manpreet
Best Answer
2 years ago
The GroupBy during my WriteToText operation fails because it runs out of memory, which kills my Dataflow job. Running the job locally, I run out of memory as well.
Based on the WriteToText source code, it seems to me that specifying the number of shards should help with this issue. I am not sure how to choose the number of shards, though. Can anyone explain a process for choosing the number of shards?
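For illustration only, here is one hypothetical heuristic for picking `num_shards` (this is an assumption, not an official Beam recommendation): choose enough shards that each output file stays near a target size, so the per-shard groups produced by the internal GroupBy stay small. The helper name, the size estimate, and the 256 MB target below are all invented for this sketch; only the `num_shards` parameter of `beam.io.WriteToText` comes from the Beam API.

```python
import math

def choose_num_shards(estimated_output_bytes, target_shard_bytes=256 * 1024 * 1024):
    """Hypothetical heuristic: aim for output shards of roughly
    target_shard_bytes each. WriteToText(..., num_shards=N) forces
    exactly N output files, so more shards means each shard's group
    in the internal GroupBy holds less data in memory."""
    return max(1, math.ceil(estimated_output_bytes / target_shard_bytes))

# Example: a ~10 GB output with 256 MB target shards -> 40 shards.
shards = choose_num_shards(10 * 1024 ** 3)
print(shards)  # 40

# Usage sketch inside a pipeline (path is a placeholder):
# records | beam.io.WriteToText('gs://my-bucket/output', num_shards=shards)
```

Note that forcing a fixed shard count can reduce parallelism elsewhere, so it trades some efficiency for a bounded memory footprint per shard.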
I expect that a better sharding approach could make the pipeline less efficient, but at least it wouldn't crash. In general, I am not sure how to make Dataflow pipelines more robust against failures caused by large outliers.
For a bit more context, my error message on Dataflow looked like this: