manpreet
Best Answer
2 years ago
The GroupBy during my WriteToText operation fails because it runs out of memory, which kills my Dataflow job. Running the job locally, I run out of memory as well.
Based on the WriteToText source code, it seems to me that specifying the number of shards should help with this issue. I am not sure how to choose the number of shards, though. Can anyone explain a process for choosing the number of shards?
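For reference, one common heuristic is to size each output shard to a fixed byte budget and derive `num_shards` from an estimate of the total output size. The 256 MB target below is an assumed figure, not a Beam default; tune it to your workers' memory. The resulting count would be passed to `beam.io.WriteToText` via its `num_shards` parameter. A minimal sketch:

```python
import math

def choose_num_shards(estimated_output_bytes, target_shard_bytes=256 * 1024 * 1024):
    """Pick a shard count so each output file is roughly target_shard_bytes.

    The 256 MB-per-shard target is an assumption for illustration; pick a
    value small enough that one shard's worth of data fits comfortably in
    a worker's memory. The result would be used like:
        beam.io.WriteToText(output_path, num_shards=choose_num_shards(...))
    """
    return max(1, math.ceil(estimated_output_bytes / target_shard_bytes))

# e.g. an estimated ~50 GB of output -> 200 shards of ~256 MB each
print(choose_num_shards(50 * 1024**3))  # 200
```

Fixing `num_shards` forces Beam to redistribute records across that many bundles before writing, which bounds how much data any single worker must hold for one output file.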
I expect a better sharding approach could make the pipeline less efficient, but it should stop it from crashing. In general, I am not sure how to make Dataflow pipelines more robust against failures caused by large outliers.
For a bit more context, my error message on Dataflow looked like this: