How to debug Beam WriteToText running out of memory?




Posted on 16 Aug 2022 in General Tech, Bugs & Fixes.


Answers (1)

manpreet (Best Answer, 2 years ago)

 

The GroupByKey step inside my WriteToText operations fails by running out of memory, which kills my Dataflow job. When I run the job locally, I run out of memory as well.

Based on the WriteToText source code, it seems that specifying the number of shards should help with the issue. I am not sure how to choose that number, though. Can anyone explain a process for choosing the number of shards?
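To make the question concrete, here is one possible starting point, offered as a rough sketch rather than an established rule: derive a shard count from an estimate of the total output size and a target size per file. The function name `estimate_num_shards`, the 256 MiB default target, and the clamping bounds are all assumptions made for illustration; nothing in the WriteToText source prescribes them.

```python
def estimate_num_shards(total_output_bytes,
                        target_shard_bytes=256 * 1024 * 1024,
                        min_shards=1,
                        max_shards=10_000):
    """Hypothetical heuristic: pick a shard count from the output size.

    total_output_bytes: rough estimate of everything the write will emit.
    target_shard_bytes: assumed comfortable size for one output file.
    min_shards / max_shards: assumed bounds to keep the result sane.
    """
    if total_output_bytes < 0:
        raise ValueError("total_output_bytes must be non-negative")
    if target_shard_bytes <= 0:
        raise ValueError("target_shard_bytes must be positive")

    # Integer ceiling division: shards needed so that no shard
    # exceeds the target size on average.
    wanted = -(-total_output_bytes // target_shard_bytes)

    # Clamp to the assumed bounds.
    return max(min_shards, min(max_shards, wanted))
```

The result could then be passed as the `num_shards` argument of `beam.io.WriteToText`, for example `beam.io.WriteToText(output_prefix, num_shards=estimate_num_shards(estimated_size))`, where `output_prefix` and `estimated_size` are placeholders. Whether this actually avoids the out-of-memory failure here is untested; the Beam documentation also notes that constraining the number of shards is likely to reduce pipeline performance.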

I expect that a better sharding approach could make the pipeline less efficient but keep it from crashing. More generally, I am not sure how to make Dataflow pipelines more robust against failures caused by large outliers.

For a bit more context, my error message on Dataflow looked like this:


Workflow failed. Causes: S31:ReadData/Read+BaseNLP+SplitBaseDoc+WriteJSONBaseNLPToGS/Write/WriteImpl/WriteBundles/WriteBundles+SplitSentences+NormalisedNESplitSentences+NamedEntitiesSplit+LinkedEntitiesSplit+ExtractMetadata+ExtractSentCoOcc+ExtractDocCoOcc+WriteJSONDocumentToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteDocCoOccToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteJSONDocumentToGS/Write/WriteImpl/Pair+WriteNamedEntitiesToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteNormalisedSentenceNEToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteNormalisedSentenceNEToGS/Write/WriteImpl/Pair+WriteNormalisedSentenceNEToGS/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteNormalisedSentenceNEToGS/Write/WriteImpl/GroupByKey/Reify+WriteNormalisedSentenceNEToGS/Write/WriteImpl/GroupByKey/Write+WriteJSONDocToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteJSONDocToGS/Write/WriteImpl/Pair+WriteJSONDocToGS/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteJSONDocToGS/Write/WriteImpl/GroupByKey/Reify+WriteJSONDocToGS/Write/WriteImpl/GroupByKey/Write+WriteDocCoOccToGS/Write/WriteImpl/Pair+WriteSentCoOccToGS/Write/WriteImpl/WriteBundles/WriteBundles+WriteSentCoOccToGS/Write/WriteImpl/Pair+WriteSentCoOccToGS/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteSentCoOccToGS/Write/WriteImpl/GroupByKey/Reify+WriteSentCoOccToGS/Write/WriteImpl/
                                                
                                                
