Is MongoDB the ideal database system for e-commerce?

Programmers, architects, and managers working on e-commerce projects often face a crucial decision: choosing the right database to store information about products or services. Just as physical products are stored in warehouses, their virtual counterparts are housed in databases.

When selecting a Database Management System (DBMS) for your online store, it’s essential to consider various factors like flexibility, high availability, reliability, managing concurrent queries, and data timeliness. MongoDB stands out as a widely used system that caters to these requirements, and I will delve into its capabilities in this article.

E-commerce encompasses more than just online retail stores.

In basic terms, e-commerce refers to business transactions conducted electronically over the Internet. This includes buying and selling, where payments and deliveries can occur either online or offline. While online stores are the most well-known form of e-commerce, it’s important to recognize that e-commerce encompasses various other platforms such as auction sites, e-currency exchange offices, electronic banking, and online betting platforms.

Challenges in Managing Your E-commerce Database

In the e-commerce sector, databases play unique and specialized roles.

An appropriately set up database system ought to:

Ensure 24/7 data availability
Sustain a high query rate during peak usage
Handle large data storage needs
Dynamically and continuously update information about changes, like product availability in the catalog

During peak sales periods like Black Friday or Cyber Monday, when query volumes surge, ensuring these database capabilities becomes particularly crucial. Hence, e-commerce firms should prioritize database scalability.

Should you choose a relational database or a non-relational database?

When examining data storage options for e-commerce services, the most prominent choice is between relational (SQL) and non-relational (NoSQL) databases. Strictly speaking, SQL (Structured Query Language) is the language used to query and modify data in relational databases, but it is common to call this type of database simply an “SQL database” for the sake of comparison. This also makes the name of the second type easier to remember: a NoSQL database, often read as “not only SQL”.

There are five fundamental differences between these database types:

SQL

Clearly defined data relationships
Data is stored in tables
Defined schema
Preferred for multi-row transactions
Vertically scalable

NoSQL

No rigid relationships; the data is loosely coupled
Data stored as documents, graphs, or key-value pairs
Dynamic schema, unstructured data
Preferred when fast data retrieval is important
Horizontally scalable

Indeed, NoSQL databases align seamlessly with the demands and prerequisites of the e-commerce sector regarding data availability and storage. At present, MongoDB stands out as the most widely used database system of this kind.

What is MongoDB?

MongoDB is a document database designed for ease of development and scaling. Documents are stored in Binary JSON (BSON) format and presented to applications as JSON-like documents. Using a JSON-based format makes it easy to convert queries and results into a form that the front-end code of an e-commerce application can interpret, and JSON itself is human-readable. MongoDB also offers features such as hierarchical (nested) documents, automatic sharding, and built-in replication, which improve scalability and ensure high availability.

Now that we’ve identified the key challenges in e-commerce and established MongoDB as an apt choice for data storage, let’s delve into how MongoDB can bolster the e-commerce sector.

Benefits of NoSQL Databases in E-commerce, Illustrated by MongoDB

Dynamic schemas

Dynamic schemas in MongoDB allow documents within a collection to have different fields, and even different types for the same field, which makes it easier to map documents to entities or objects. In practice, however, documents within a collection usually share a similar structure. To enforce this consistency, MongoDB lets you define validation rules on a per-collection basis.
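As a rough illustration of per-collection validation rules, the sketch below builds a `$jsonSchema` validator as a plain Python dictionary. The product fields (`sku`, `name`, `price`, `attributes`) are assumptions made up for this example, and the helper function only imitates the "required fields" part of the check locally; it is not how MongoDB itself evaluates the rules.

```python
# Hypothetical product schema; the field names are illustrative assumptions.
product_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["sku", "name", "price"],
        "properties": {
            "sku": {"bsonType": "string"},
            "name": {"bsonType": "string"},
            "price": {"bsonType": ["double", "int", "decimal"], "minimum": 0},
            # Free-form attributes keep the schema flexible per product type.
            "attributes": {"bsonType": "object"},
        },
    }
}


def missing_required_fields(document):
    """Local illustration only: list the required fields absent from a document."""
    required = product_validator["$jsonSchema"]["required"]
    return [field for field in required if field not in document]


shirt = {"sku": "TS-001", "name": "T-shirt", "price": 19.99, "attributes": {"size": "M"}}
draft = {"name": "Unnamed product"}

print(missing_required_fields(shirt))  # []
print(missing_required_fields(draft))  # ['sku', 'price']
```

With PyMongo, a dictionary like this would typically be supplied as the `validator` option when the collection is created, for example `db.create_collection("products", validator=product_validator)`.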

Easy hierarchization of data

The JSON format simplifies data structuring by offering two options: embedding one document within another, or storing a reference to it. The choice between these methods should be made on a case-by-case basis for each collection. Embedding is typically favored because it allows data to be retrieved with a single query, which improves performance. References may be more suitable for intricate hierarchical structures, or where the advantages of embedding are outweighed by data duplication issues, such as having to update every copy when the duplicated data changes.
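The difference between the two approaches can be sketched with plain Python dictionaries standing in for documents. The products, variants, and supplier below are invented for illustration, and the dictionary lookup stands in for the extra query that a reference would require.

```python
# Embedded: variants live inside the product document,
# so a single read returns everything needed to render the product page.
product_embedded = {
    "_id": "TS-001",
    "name": "T-shirt",
    "variants": [
        {"size": "M", "stock": 12},
        {"size": "L", "stock": 0},
    ],
}

# Referenced: the product stores only the supplier's id;
# the supplier data is kept once, in a separate collection.
suppliers = {"SUP-7": {"_id": "SUP-7", "name": "Acme Textiles"}}
product_referenced = {"_id": "TS-002", "name": "Hoodie", "supplier_id": "SUP-7"}


def sizes_in_stock(product):
    """Answerable from the product document alone (embedding)."""
    return [v["size"] for v in product["variants"] if v["stock"] > 0]


def supplier_name(product, supplier_collection):
    """Needs a second lookup, standing in for a second query (reference)."""
    return supplier_collection[product["supplier_id"]]["name"]


print(sizes_in_stock(product_embedded))              # ['M']
print(supplier_name(product_referenced, suppliers))  # Acme Textiles
```

Renaming the supplier here touches one document only, whereas an embedded copy of the supplier would have to be updated in every product that carries it.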

Replication

MongoDB employs a concept known as a Replica Set, which consists of nodes containing identical data. This setup facilitates data replication, serving to enhance availability and safeguard against database server failures. A well-structured architecture also leads to expedited data access.

We’ll delve into the fundamental principles and replication mechanisms using the diagram provided below.

The replica set consists of a Primary member and Secondary members. Optionally, it can also include a special member known as the Arbiter, which holds no copy of the data but votes in the election of a new primary if the current one becomes unavailable.

Write operations are exclusively executed on the Primary instance, after which MongoDB’s built-in mechanism duplicates the data across the other instances.

By default, read operations also go through the Primary instance, but clients can be configured, via the read preference, to send queries to secondary servers. Such reads are eventually consistent, meaning that they may return slightly outdated data.
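One common way to express this choice is in the connection string. The sketch below assembles such a URI using the standard `replicaSet` and `readPreference` connection options; the host names and the replica set name `rs0` are placeholders invented for the example.

```python
from urllib.parse import urlencode

# Hypothetical replica set members; replace with real host names.
hosts = [
    "db1.example.com:27017",
    "db2.example.com:27017",
    "db3.example.com:27017",
]

# secondaryPreferred: read from a secondary when one is available,
# otherwise fall back to the primary.
options = {"replicaSet": "rs0", "readPreference": "secondaryPreferred"}

uri = "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)
print(uri)
```

Whether slightly stale reads are acceptable depends on the data: a product description can usually tolerate them, while stock levels at checkout are safer read from the primary.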

The heartbeat mechanism has each node (member) check the others for availability every 2 seconds. If the primary server is unreachable, a new one is elected.

During an election, the voting members choose a new primary from among the eligible members, favoring those with a higher priority. As per the documentation, a replica set can have up to 50 members, but only 7 of them can vote, with the successor being elected from among them. The remaining servers, referred to as non-voting members, must have their “votes” and “priority” properties set to 0. An odd number of voting members is advised to avoid tied votes; the recommended minimum size of a replica set is therefore 3.
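The advice about odd member counts follows from simple majority arithmetic, sketched below. The function names are made up for this example; the limit of 7 voting members is the one quoted above.

```python
MAX_VOTING_MEMBERS = 7  # limit on voting members quoted above


def votes_needed(voting_members):
    """Smallest number of votes that forms a strict majority."""
    if not 1 <= voting_members <= MAX_VOTING_MEMBERS:
        raise ValueError("a replica set has between 1 and 7 voting members")
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """How many voting members can fail while a primary can still be elected."""
    return voting_members - votes_needed(voting_members)


for n in (2, 3, 4, 5, 7):
    print(n, votes_needed(n), fault_tolerance(n))
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
# 7 4 3
```

A fourth voting member raises the required majority without letting the set survive any more failures than three members would, which is why even counts bring little benefit.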

Sharding

Sharding is the process of splitting a dataset into smaller parts and distributing them across multiple servers. This allows the database to scale horizontally with virtually no constraints. In MongoDB, sharding is implemented by a sharded cluster, which comprises:

Shard – a replica set that holds a portion of the collection's data, divided into chunks.

Router (mongos) – acting much like a load balancer, it directs each request to the relevant shard or shards based on the cluster configuration, so that the workload is distributed evenly.

Config server – responsible for storing metadata and the cluster configuration.

These are the key considerations when selecting the shard key, that is, the document field by which the data is sharded:

Cardinality: Evaluate how many distinct parts the collection can be divided into based on the chosen key.

Frequency: Determine whether any value occurs much more often than others.

Monotonicity: Assess whether new key values steadily increase or decrease.

Query Frequency: Prioritize keys used in the most frequently executed queries.
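Some of these properties can be estimated from a sample of existing documents before committing to a key. The sketch below is one possible way to do that; the `orders` sample and its field names are invented for illustration, and the monotonicity check only looks at insertion order within the sample.

```python
from collections import Counter


def key_profile(documents, field):
    """Summarise how a candidate shard key field behaves in a sample."""
    values = [doc[field] for doc in documents]
    counts = Counter(values)
    top_value, top_count = counts.most_common(1)[0]
    return {
        # Number of distinct values: an upper bound on how finely data can split.
        "cardinality": len(counts),
        # Share of documents holding the most frequent value (hot-spot risk).
        "top_value": top_value,
        "top_share": top_count / len(values),
        # True if values never decrease in insertion order.
        "non_decreasing": all(a <= b for a, b in zip(values, values[1:])),
    }


# Hypothetical sample of order documents.
orders = [
    {"order_id": 1001, "country": "PL", "customer_id": "c-17"},
    {"order_id": 1002, "country": "PL", "customer_id": "c-03"},
    {"order_id": 1003, "country": "DE", "customer_id": "c-42"},
    {"order_id": 1004, "country": "PL", "customer_id": "c-17"},
]

for field in ("country", "order_id", "customer_id"):
    print(field, key_profile(orders, field))
```

In this sample `country` has few distinct values and one dominant value, `order_id` only ever grows, and `customer_id` sits in between, which is the kind of comparison the considerations above call for.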

MongoDB offers two sharding strategies:

Hashed Sharding

In this strategy, MongoDB automatically computes a hash of the shard key value. This method works well when key values change monotonically, because hashing spreads the documents evenly across shards. The drawback is that for range queries, the matching documents are unlikely to reside in a single shard. Consequently, the router may need to query all shards, since it cannot determine which of them contain the sought-after documents.

Ranged Sharding

Under this strategy, each shard manages chunks of the collection based on a defined range of key values. This approach works well when the key has many distinct values that rarely repeat. The notable advantage is the ability to target a query at a specific shard, which markedly improves query speed. MongoDB's built-in mechanism is responsible for splitting and allocating the chunks. It keeps them evenly distributed and tries to maintain a relative balance in their sizes. It's important to choose the shard key carefully: once a collection is sharded, MongoDB offers no option to merge the data back together. The only recourse is to shard it again using a different key.
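The trade-off between the two strategies can be made concrete with a toy router. This is a simplified model under stated assumptions: four shards, integer keys, fixed ranges of 250 keys per shard, and an MD5-based stand-in hash taken modulo the shard count. MongoDB's real hashing function and chunk balancing work differently.

```python
import hashlib

NUM_SHARDS = 4
RANGE_SIZE = 250  # hypothetical: keys 0-249 -> shard 0, 250-499 -> shard 1, ...


def hashed_shard(key):
    """Toy hashed routing; a stand-in, not MongoDB's actual hash function."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def ranged_shard(key):
    """Toy ranged routing: contiguous key ranges map to the same shard."""
    return min(key // RANGE_SIZE, NUM_SHARDS - 1)


def shards_for_range(route, low, high):
    """Set of shards a range query low..high (inclusive) has to visit."""
    return {route(key) for key in range(low, high + 1)}


# Documents per shard for 1000 sequential (monotonically increasing) keys.
hashed_load = [0] * NUM_SHARDS
ranged_load = [0] * NUM_SHARDS
for key in range(1000):
    hashed_load[hashed_shard(key)] += 1
    ranged_load[ranged_shard(key)] += 1

print("range query 100-199, ranged:", shards_for_range(ranged_shard, 100, 199))
print("range query 100-199, hashed:", shards_for_range(hashed_shard, 100, 199))
print("newest keys 900-999, ranged:", shards_for_range(ranged_shard, 900, 999))
print("hashed load per shard:", hashed_load)
```

In this model the ranged router answers the range query from a single shard but sends all of the newest keys to the last shard, while the hashed router spreads new keys around at the cost of visiting several shards for the same range query.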

Change Streams

Starting from version 3.6, MongoDB provides the capability to monitor changes in a specific collection, and since version 4.0 also in a whole database or the entire deployment, with the exception of the admin, local, and config databases. This is achieved by opening a cursor that iterates over the events for the given scope. Because this mechanism is built on the aggregation framework, it is possible to listen only for specific changes or to transform the received notifications. The primary prerequisite is a replica set, as notifications are emitted once a change has been persisted to a majority of data-bearing members.
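Filtering is expressed as an ordinary aggregation pipeline. The sketch below defines a pipeline that keeps only insert and update events, then imitates its `$match` stage locally on a few hand-written sample events; the sample events and the helper function are illustrative assumptions, not output captured from a real deployment.

```python
# Pipeline that keeps only inserts and updates.
pipeline = [
    {"$match": {"operationType": {"$in": ["insert", "update"]}}},
]


def passes_filter(event):
    """Local stand-in for the $match stage above, for illustration only."""
    allowed = pipeline[0]["$match"]["operationType"]["$in"]
    return event["operationType"] in allowed


# Hand-written events shaped like change stream notifications.
sample_events = [
    {"operationType": "insert", "documentKey": {"_id": "TS-001"}},
    {"operationType": "delete", "documentKey": {"_id": "TS-002"}},
    {
        "operationType": "update",
        "documentKey": {"_id": "TS-001"},
        "updateDescription": {"updatedFields": {"stock": 11}, "removedFields": []},
    },
]

kept = [event["operationType"] for event in sample_events if passes_filter(event)]
print(kept)  # ['insert', 'update']
```

With PyMongo, such a pipeline would typically be passed to `collection.watch(pipeline)`, and the application would iterate over the returned cursor, for example to refresh product availability in a cache.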

Change streams rely on the oplog, a dedicated capped collection that stores information about operations affecting the current data state. Entries in this collection are rotated: when the collection reaches its size limit, the oldest entries are purged. Therefore, it's essential to choose an appropriate size for this collection, taking the event frequency into account, so that you can capture the desired events before they are removed.
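A back-of-the-envelope estimate of how long events stay available can help with sizing. The sketch below assumes a constant write rate and a constant average entry size, both of which are simplifications, and the figures used are invented for the example.

```python
GIB = 1024 ** 3


def oplog_window_hours(oplog_size_bytes, entries_per_second, avg_entry_bytes):
    """Rough time span the oplog covers before the oldest entries are purged."""
    bytes_per_hour = entries_per_second * avg_entry_bytes * 3600
    return oplog_size_bytes / bytes_per_hour


# Hypothetical workload: 10 GiB oplog, 512-byte entries on average.
normal_day = oplog_window_hours(10 * GIB, entries_per_second=200, avg_entry_bytes=512)
peak_sale = oplog_window_hours(10 * GIB, entries_per_second=2000, avg_entry_bytes=512)

print(f"normal day: {normal_day:.1f} h")
print(f"peak sale:  {peak_sale:.1f} h")
```

Under these assumptions a tenfold surge in writes, as during a sale, shrinks the window tenfold, so a consumer that falls behind during a peak has far less time to catch up.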

Conclusion

As per forecasts, the e-commerce sector in Poland is poised for sustained growth over the coming years. Customer expectations for websites or applications are on the rise. Key elements in enhancing Customer Experience encompass factors like availability, speed, and reliability. A well-tailored database system such as MongoDB not only exhibits robust resilience to failures but also offers scalability and the capacity to efficiently organize and manage substantial volumes of data. This positions it as an ideal choice for fulfilling the requirements of various e-commerce projects.