How to design schema denormalization to handle data changes?


Posted on 16 Aug 2022 in General Tech, Bugs & Fixes.

manpreet (Best Answer, 2 years ago)

 

I'm designing a schema for MongoDB, and keep running into scenarios where future updates might invalidate my cached copies of data. One example is for Users, Orders and Addresses.

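const mongoose = require('mongoose');

// A user keeps an editable list of addresses.
const UserSchema = new mongoose.Schema({
    addresses: [{ street: String, city: String, state: String, zip: String }]
});

// An order embeds its own copy of the address chosen at checkout.
const OrderSchema = new mongoose.Schema({
    address: { street: String, city: String, state: String, zip: String }
});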

Denormalizing the data where possible seems to be the standard approach, since MongoDB isn't meant to be a relational database. However, the following scenario confuses me:

  1. User adds an Address to their User document.
  2. User places an Order and selects an Address from their list of addresses.
  3. The address data is copied into the Order document when the order is persisted.
  4. Before the order is shipped, the user discovers that they mistyped the address.
  5. User changes that incorrect Address in their User document.
  6. The Address in the Order document must also change; otherwise the order will be shipped to an invalid address.

This seems to point towards the need for a reference using mongoose.Schema.Types.ObjectId to simulate a relational structure between the collections. (In that case, there would also be an Addresses collection, of course.) However, there are other considerations such as the history aspect of the denormalization. I want to store the actual address to which the order was shipped, even if that address is later deleted or changed. With denormalization this would seem to be easier than the relational paradigm.
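One possible middle ground between the two needs above is to keep the embedded copy in the Order but also record which address it was copied from, so that a correction can be pushed into orders that have not shipped yet while shipped orders keep their historical snapshot. The following is only a minimal sketch using plain in-memory objects instead of real collections; the fields `id`, `userId`, `sourceAddressId` and `status`, and the helper names `placeOrder` and `correctAddress`, are hypothetical and not part of the schemas in the question.

```javascript
// Hypothetical in-memory stand-ins for User and Order documents.
// The fields id, userId, sourceAddressId and status are assumptions.

// Copy the chosen address into the order and remember where it came from.
function placeOrder(user, addressId) {
    const source = user.addresses.find((a) => a.id === addressId);
    if (!source) throw new Error(`Unknown address: ${addressId}`);
    const { id, ...fields } = source;
    return {
        userId: user.id,
        status: 'pending',
        sourceAddressId: id,
        address: { ...fields } // independent snapshot of the address
    };
}

// Edit the user's address and propagate the change only to pending orders.
function correctAddress(user, orders, addressId, changes) {
    const target = user.addresses.find((a) => a.id === addressId);
    if (!target) throw new Error(`Unknown address: ${addressId}`);
    Object.assign(target, changes);
    for (const order of orders) {
        const sameSource =
            order.userId === user.id && order.sourceAddressId === addressId;
        if (sameSource && order.status === 'pending') {
            Object.assign(order.address, changes);
        }
    }
}

// Usage: one order has shipped, one is still pending.
const user = {
    id: 'u1',
    addresses: [
        { id: 'a1', street: '12 Mian St', city: 'Chicago', state: 'IL', zip: '60601' }
    ]
};
const shipped = placeOrder(user, 'a1');
shipped.status = 'shipped';
const pending = placeOrder(user, 'a1');

correctAddress(user, [shipped, pending], 'a1', { street: '12 Main St' });
console.log(pending.address.street); // the pending order sees the correction
console.log(shipped.address.street); // the shipped order keeps its snapshot
```

With Mongoose, the loop would presumably become a single `Order.updateMany` call filtered on the source address id and the order status, but that depends on how the real schemas are laid out.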

One approach I considered is to create an Addresses collection and mark a record as invalidated, rather than removing it, when it is deleted while already referenced in an Order. When an address is modified, I would need to check the Orders collection to see whether that Address is referenced. If it has already been referenced in a shipped order, I would have to leave it alone (for historical purposes) and create an additional Address document with the new changes. All of this sounds a bit complicated compared to the denormalization approach.
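The Addresses-collection approach described above could be sketched roughly as follows, again with in-memory arrays standing in for collections. The fields `id`, `invalidated`, `addressId` and `status` and all function names are hypothetical. Two details go beyond what the paragraph states and are assumptions of this sketch: the superseded record is flagged as invalidated, and orders that have not shipped are repointed to the new record.

```javascript
// Hypothetical in-memory Addresses and Orders collections.
let nextId = 1;

function createAddress(addresses, fields) {
    const address = { id: nextId++, invalidated: false, ...fields };
    addresses.push(address);
    return address;
}

// True if any shipped order references this address.
function isUsedByShippedOrder(orders, addressId) {
    return orders.some((o) => o.addressId === addressId && o.status === 'shipped');
}

// Modify in place when safe; otherwise leave the old record untouched
// for history and create a new record carrying the changes.
function updateAddress(addresses, orders, addressId, changes) {
    const current = addresses.find((a) => a.id === addressId);
    if (!current) throw new Error(`Unknown address: ${addressId}`);
    if (!isUsedByShippedOrder(orders, addressId)) {
        Object.assign(current, changes);
        return current;
    }
    current.invalidated = true; // assumption: hide the superseded record
    const { id, invalidated, ...fields } = current;
    const replacement = createAddress(addresses, { ...fields, ...changes });
    // Assumption: orders that have not shipped should follow the correction.
    for (const order of orders) {
        if (order.addressId === addressId && order.status !== 'shipped') {
            order.addressId = replacement.id;
        }
    }
    return replacement;
}

// Delete: keep an invalidated record if any order still references it.
function deleteAddress(addresses, orders, addressId) {
    const index = addresses.findIndex((a) => a.id === addressId);
    if (index === -1) return;
    if (orders.some((o) => o.addressId === addressId)) {
        addresses[index].invalidated = true;
    } else {
        addresses.splice(index, 1);
    }
}

// Usage: an address already used by a shipped order is corrected, then deleted.
const addresses = [];
const orders = [];
const original = createAddress(addresses, {
    street: '12 Mian St', city: 'Chicago', state: 'IL', zip: '60601'
});
orders.push({ addressId: original.id, status: 'shipped' });

const corrected = updateAddress(addresses, orders, original.id, { street: '12 Main St' });
deleteAddress(addresses, orders, corrected.id); // unreferenced, so removed
deleteAddress(addresses, orders, original.id);  // referenced, so only flagged
```

The amount of bookkeeping in even this small sketch illustrates why the approach feels heavier than simply embedding a copy in each Order.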

The next part of the issue concerns querying and reporting. If I want to pull up a list of all Users who have ever had an address in Illinois, I would need to traverse both the Addresses collection and the Orders collection, because a user may have had an Illinois address, used it in a shipped order, and then deleted it from the Addresses collection.
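To make the reporting problem concrete, here is a small sketch of the "ever had an address in Illinois" lookup over both sources. It assumes, hypothetically, that current addresses and orders each carry a `userId` and that orders embed an address snapshot; none of these fields appear in the schemas in the question.

```javascript
// Hypothetical in-memory data: current addresses plus orders with snapshots.
function usersWithAddressInState(addresses, orders, state) {
    const userIds = new Set();
    // Addresses the user still has on file.
    for (const address of addresses) {
        if (address.state === state) userIds.add(address.userId);
    }
    // Addresses that only survive as snapshots inside orders.
    for (const order of orders) {
        if (order.address.state === state) userIds.add(order.userId);
    }
    return [...userIds].sort();
}

// Usage: u1 deleted their Illinois address after ordering with it.
const currentAddresses = [
    { userId: 'u2', state: 'IL' },
    { userId: 'u3', state: 'WI' }
];
const pastOrders = [
    { userId: 'u1', address: { state: 'IL' } },
    { userId: 'u2', address: { state: 'IL' } }
];
const illinoisUsers = usersWithAddressInState(currentAddresses, pastOrders, 'IL');
console.log(illinoisUsers);
```

Against a real database, each loop would likely map to a `distinct` query on the corresponding collection, with the two result sets merged in application code.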

How do the smartest MongoDB data architects handle situations like this? I'm an experienced relational database architect, but am somewhat baffled by the conceptual framework of NoSQL. Thanks!
