Posted on 16 Aug 2022 under Bugs & Fixes, General Tech.
The GroupBy during my WriteToText operation fails because it runs out of memory, which kills my Dataflow job. Running the job locally, I run out of memory as well.
Based on the WriteToText source code, it seems that specifying the number of shards should help with this issue. I am not sure how to choose the number of shards, though. Can anyone explain a process for choosing the number of shards?
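For what it's worth, one rough way to pick `num_shards` (the parameter `WriteToText` accepts) is to target a fixed shard size and divide: shards ≈ total output bytes / target bytes per shard, clamped to at least 1. A minimal sketch of that heuristic; the 256 MB target is an assumption to tune for your workers, not a documented default:

```python
import math

def choose_num_shards(total_output_bytes, target_shard_bytes=256 * 1024 * 1024):
    """Heuristic shard count: aim for roughly fixed-size shards.

    `target_shard_bytes` (~256 MB here) is an assumed target, not a
    Beam default; shrink it if workers still run out of memory.
    Always returns at least 1 shard.
    """
    return max(1, math.ceil(total_output_bytes / target_shard_bytes))
```

For example, ~10 GiB of expected output with a 256 MiB target yields 40 shards, which you could then pass as `beam.io.WriteToText(path, num_shards=40)`.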
I expect a better sharding approach could make the pipeline less efficient but keep it from crashing. In general, I am not sure how to make Dataflow pipelines more robust against failures caused by large outliers.
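Not from the original post, but one standard way to harden a GroupBy against large outlier keys is to pre-aggregate under a salted key, so no single group has to fit all of a hot key's values in one worker's memory (the Beam Python SDK exposes a built-in version of this via `beam.CombinePerKey(...).with_hot_key_fanout(n)`). A plain-Python sketch of the salting idea, with hypothetical helper names:

```python
import random
from collections import defaultdict

def salted_sums(pairs, fanout=10):
    """Two-stage per-key sum with key salting (illustrative names).

    Stage 1 groups by (key, salt), so each partial group holds only
    ~1/fanout of a hot key's values; stage 2 merges at most `fanout`
    small partial results per key instead of every raw value.
    """
    # Stage 1: partial sums keyed by (key, random salt).
    partial = defaultdict(int)
    for k, v in pairs:
        partial[(k, random.randrange(fanout))] += v
    # Stage 2: merge the partial sums back to the original key.
    totals = defaultdict(int)
    for (k, _salt), s in partial.items():
        totals[k] += s
    return dict(totals)
```

This only helps when the per-group work is combinable (sums, counts, top-N, and so on); if you truly need every raw value of a hot key in one place, salting merely delays the blow-up to the merge stage.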
For a bit more context, my error message on Dataflow looked like this: