Posted on 16 Aug 2022 in General Tech (Technology & Software).
I want to write an application that can generate reports and enable interactive, OLAP-style data analysis of monitoring data from a large production system. (I know there are some problematic trade-off decisions ahead, but let's set them aside for now.) I identified the following possibilities for the basic tech stack:
Based on my research, I tend to believe that Hadoop/HBase/Hive would be the most common combination, but that impression comes only from a number of forum questions and product presentations. Can someone share their general opinion on the subject, or, to be more specific, answer the following questions:
If you ran HBase on the same cluster as Hadoop, you would really cut down the memory available for MapReduce jobs, and you don't actually need HBase's random read/update capability for an OLAP system. You can load your data into the Hadoop cluster using Flume or manually. Equipment monitoring data lends itself to partitioning by time, for example by calendar date. Once you have loaded the data into a directory structure that maps to a partitioned Hive table, you can query it with HiveQL. For the trickiest analyses you can either write MapReduce jobs in Java or use Pig.
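To make the partitioned-table idea concrete, here is a minimal HiveQL sketch. The table, column, and path names are all hypothetical, chosen only to illustrate date partitioning over monitoring data; adapt them to your actual layout.

```sql
-- Hypothetical schema over data loaded into date-named directories,
-- e.g. /data/monitoring/dt=2022-08-16/
CREATE EXTERNAL TABLE monitoring_events (
  host        STRING,
  metric      STRING,
  value       DOUBLE,
  event_time  TIMESTAMP
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/monitoring';

-- Register a newly loaded day with the Hive metastore.
ALTER TABLE monitoring_events ADD PARTITION (dt = '2022-08-16');

-- Filtering on the partition column prunes the scan to the
-- matching directories instead of reading the whole data set.
SELECT metric, AVG(value)
FROM monitoring_events
WHERE dt BETWEEN '2022-08-01' AND '2022-08-16'
GROUP BY metric;
```

The payoff of the date partitioning is the WHERE clause on `dt`: Hive only reads the directories for the requested days.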
The problem is that responses will not come instantaneously. This is fine for ad hoc analysis, but it can be frustrating if you are trying to look at commonly used, pre-determined metrics. In the latter case you should consider precalculating those metrics and loading the results into a memory cache or even a relational database. I have seen such frequently used results cached in HBase; I just cannot get over wasting half of the available RAM on a cluster for that purpose.
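The precalculation step could itself be a scheduled Hive job. A rough sketch, again with hypothetical table and column names, might look like this:

```sql
-- Illustrative summary table for pre-aggregated daily metrics.
CREATE TABLE daily_metric_summary (
  metric     STRING,
  avg_value  DOUBLE,
  max_value  DOUBLE
)
PARTITIONED BY (dt STRING);

-- Run once per day (e.g. from a scheduler) after the day's
-- raw data has been loaded; overwrites that day's partition.
INSERT OVERWRITE TABLE daily_metric_summary PARTITION (dt = '2022-08-16')
SELECT metric, AVG(value), MAX(value)
FROM monitoring_events
WHERE dt = '2022-08-16'
GROUP BY metric;
```

The small summary table can then be exported into the memory cache or relational database mentioned above, so the common dashboards never touch the raw data.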