Which NoSQL technology for Geospatial and Time Queries?

Posted on 16 Aug 2022, this text provides information on Technology & Software related to General Tech.

manpreet 2 years ago


I am looking for a NoSQL technology that can process geospatial as well as time queries at large scale with decent performance. I want to batch-process several hundred GBs to TBs of data with the chosen NoSQL technology together with Spark. This will, of course, run on a cluster with several nodes.

Types of queries I want to run:

  • "normal" queries for attributes like "field <= value"
  • Basic geospatial queries, such as querying all data that lies within a bounding box (bbox).
  • Time queries like "date <= 01.01.2011" or "time >= 11:00 and time <= 14:00"
  • a combination of all three query types (something like "query all data where the location is within the bbox and the date is 01.01.2011 and time <= 14:00 and field_x <= 100")
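To make the combined query concrete, here is a minimal plain-Python sketch of the predicate that any candidate store (or a Spark filter) would have to evaluate per record. The record layout (lon, lat, ts, field_x), the BBox type, and the sample data are hypothetical, and 01.01.2011 is read as 1 January 2011.

```python
from dataclasses import dataclass
from datetime import date, datetime, time


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout, for illustration only.
    lon: float
    lat: float
    ts: datetime
    field_x: float


@dataclass(frozen=True)
class BBox:
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        return self.min_lon <= lon <= self.max_lon and self.min_lat <= lat <= self.max_lat


def matches(r: Record, bbox: BBox, day: date, latest: time, max_x: float) -> bool:
    """Combined query: inside bbox AND on day AND time <= latest AND field_x <= max_x."""
    return (
        bbox.contains(r.lon, r.lat)
        and r.ts.date() == day
        and r.ts.time() <= latest
        and r.field_x <= max_x
    )


records = [
    Record(13.40, 52.52, datetime(2011, 1, 1, 9, 30), 42.0),   # matches
    Record(13.40, 52.52, datetime(2011, 1, 1, 15, 0), 42.0),   # too late in the day
    Record(2.35, 48.86, datetime(2011, 1, 1, 10, 0), 10.0),    # outside the bbox
    Record(13.41, 52.50, datetime(2011, 1, 2, 10, 0), 10.0),   # wrong date
    Record(13.39, 52.53, datetime(2011, 1, 1, 11, 0), 150.0),  # field_x too large
]

bbox = BBox(13.0, 52.3, 13.8, 52.7)
hits = [r for r in records if matches(r, bbox, date(2011, 1, 1), time(14, 0), 100.0)]
print(len(hits))
```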

I am currently evaluating which technologies fit my use case, but I'm overwhelmed by the sheer number of technologies available. I have considered popular options like MongoDB and Cassandra. Both seem applicable to my use case (Cassandra only with Stratio's Lucene index), but there might be a different technology that works even better.

Is there any technology that will heavily outperform others based on these requirements?

manpreet 2 years ago


I want to batch-process several hundred GBs to TBs of data

That's not really a Cassandra use case. Cassandra is primarily optimized for write performance; if you have a really huge number of writes, Cassandra could be a good option for you. Cassandra isn't a database for exploratory queries; it is a database for known queries. On the read side, Cassandra is optimized for sequential reads: it can only efficiently read data in the order defined by the table's primary key. It is possible to work around this (e.g. with ALLOW FILTERING), but that is not recommended. With the wrong data model, huge amounts of data can become a problem in Cassandra. Maybe a Hadoop-based database system is a better option for you.
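The point about sequential reads can be illustrated with a toy model of a single partition in which rows are kept sorted by clustering key. This is not Cassandra's actual storage engine; the ToyPartition class and its row counter are hypothetical and only show why a clustering-key range touches one contiguous slice while a filter on another column has to examine every row.

```python
from bisect import bisect_left, bisect_right


class ToyPartition:
    """Toy stand-in for one partition: rows kept sorted by clustering key.

    Hypothetical illustration only, not how Cassandra stores data internally.
    """

    def __init__(self) -> None:
        self.keys: list[int] = []    # clustering keys, kept sorted
        self.rows: list[dict] = []   # row payloads, parallel to self.keys
        self.rows_touched = 0        # how many rows reads have examined

    def insert(self, key: int, row: dict) -> None:
        i = bisect_left(self.keys, key)
        self.keys.insert(i, key)
        self.rows.insert(i, row)

    def range_read(self, lo: int, hi: int) -> list[dict]:
        """Clustering-key range: find the bounds, then read one contiguous slice."""
        i, j = bisect_left(self.keys, lo), bisect_right(self.keys, hi)
        self.rows_touched += j - i
        return self.rows[i:j]

    def filter_scan(self, column: str, max_value: float) -> list[dict]:
        """Non-key predicate: every row must be examined (compare ALLOW FILTERING)."""
        self.rows_touched += len(self.rows)
        return [r for r in self.rows if r[column] <= max_value]


p = ToyPartition()
for minute in range(0, 24 * 60, 10):  # one row every 10 minutes of a day
    p.insert(minute, {"minute": minute, "field_x": minute % 200})

p.rows_touched = 0
slice_rows = p.range_read(11 * 60, 14 * 60)  # 11:00 to 14:00
print(len(slice_rows), p.rows_touched)

p.rows_touched = 0
scan_rows = p.filter_scan("field_x", 100)
print(len(scan_rows), p.rows_touched)
```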

Time queries like "date <= 01.01.2011" or "time >= 11:00 and time <= 14:00"

Cassandra is really good for time series data.
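A common way to model time series for Cassandra is to bucket the partition key by a time unit and cluster rows by timestamp within each bucket. The sketch below assumes a hypothetical one-day bucket and computes which partitions a time-range query has to visit; day_bucket and buckets_for_range are illustrative names, not a library API.

```python
from datetime import date, datetime, timedelta


def day_bucket(ts: datetime) -> date:
    """Hypothetical bucketing rule: one partition per calendar day."""
    return ts.date()


def buckets_for_range(start: datetime, end: datetime) -> list[date]:
    """All day buckets (partitions) that the range [start, end] touches."""
    first, last = day_bucket(start), day_bucket(end)
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


# "date = 01.01.2011 and 11:00 <= time <= 14:00" touches exactly one partition.
one_day = buckets_for_range(datetime(2011, 1, 1, 11), datetime(2011, 1, 1, 14))

# A window spanning midnight twice touches three partitions, each read as a slice.
three_days = buckets_for_range(datetime(2010, 12, 31, 22), datetime(2011, 1, 2, 3))
print(one_day, three_days)
```

With one-day buckets, a pure time-of-day filter such as "time >= 11:00 and time <= 14:00" across many dates still needs one slice per day, so the bucket size has to be chosen with the expected queries in mind.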

"normal" queries for attributes like "field <= value"

If you know the queries before you model your database, Cassandra is also a good choice.

a combination of all three query types (something like "query all data where the location is within the bbox and the date is 01.01.2011 and time <= 14:00 and field_x <= 100")

Cassandra could be a good solution. Why "could"? As I said, you have to know these queries before you create your tables. If you know that you will have thousands of queries that need a time range and a location (city, country, continent, etc.), it is a good solution for you.
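For the combined query, one possible query-first layout (an assumption for illustration, not the only option) uses a partition key built from a coarse grid cell plus the day, with time as the clustering column. The sketch below maps a bbox to the partition keys that must be read; the 0.1-degree grid and the function names are hypothetical, and the grid stands in for something like a geohash prefix.

```python
import math
from datetime import date

CELL_DEG = 0.1  # hypothetical grid resolution in degrees


def cell_of(lon: float, lat: float) -> tuple[int, int]:
    """Map a coordinate to an integer grid cell (stand-in for e.g. a geohash prefix)."""
    return (math.floor(lon / CELL_DEG), math.floor(lat / CELL_DEG))


def cells_for_bbox(min_lon: float, min_lat: float,
                   max_lon: float, max_lat: float) -> list[tuple[int, int]]:
    """All grid cells that intersect the bbox; each one is a partition to read."""
    x0, y0 = cell_of(min_lon, min_lat)
    x1, y1 = cell_of(max_lon, max_lat)
    return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]


def partition_keys(bbox: tuple[float, float, float, float],
                   day: date) -> list[tuple[int, int, date]]:
    """Partition key = (cell_x, cell_y, day), fixed up front for this query shape."""
    return [(x, y, day) for (x, y) in cells_for_bbox(*bbox)]


keys = partition_keys((13.05, 52.35, 13.25, 52.45), date(2011, 1, 1))
print(len(keys))
```

Inside each partition, time <= 14:00 becomes a clustering-key slice, while field_x <= 100 and the exact bbox test (grid cells can extend past the bbox) still have to be applied to the rows that were read, for example in Spark.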

time queries on a large scale with decent performance.

Cassandra will have the best performance in this use case, because the data are already stored in the needed order.

MongoDB is a good replacement for MySQL use cases where you need better scalability and flexibility and you care about consistency, although scaling MongoDB is not as simple as scaling Cassandra. Cassandra has eventual consistency, is scalable, and fits when performance is really important. MongoDB also supports relations; Cassandra does not. In Cassandra, everything is denormalized for the sake of performance.
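Denormalization in Cassandra usually means writing the same record once per query table instead of joining at read time. The following sketch uses plain dicts as stand-ins for two hypothetical tables keyed by (city, day) and (country, day), echoing the location examples in this answer; the table names and record fields are assumptions.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical query tables, one per known query shape.
by_city_day: dict[tuple[str, date], list[dict]] = defaultdict(list)
by_country_day: dict[tuple[str, date], list[dict]] = defaultdict(list)


def write(record: dict) -> None:
    """Denormalized write: copy the record into every table that serves a query."""
    day = record["ts"].date()
    by_city_day[(record["city"], day)].append(record)
    by_country_day[(record["country"], day)].append(record)


write({"city": "Berlin", "country": "DE", "ts": datetime(2011, 1, 1, 9, 30), "field_x": 42})
write({"city": "Munich", "country": "DE", "ts": datetime(2011, 1, 1, 12, 0), "field_x": 7})

# Each query reads one partition of the table built for it; no joins are needed.
print(len(by_city_day[("Berlin", date(2011, 1, 1))]))
print(len(by_country_day[("DE", date(2011, 1, 1))]))
```

The cost of this design is write amplification and duplicated storage, which is the trade-off Cassandra accepts in exchange for single-partition reads.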

