Every application needs, sooner or later, to store data that survives the restart of the process. The decision about how and where to persist that data is one of the most influential and long-lasting in any architecture: it shapes performance, scalability, consistency, and even operational cost. For decades the default answer was "a relational database," but the explosion of large-scale web applications popularized a range of alternatives grouped under the term NoSQL. In this lesson you will learn the real differences between the two worlds, what guarantees they offer (ACID versus BASE), what families of NoSQL databases exist, and, above all, what objective criteria to use when choosing. The goal is not to declare a winner, but for you to know how to reason through the decision.

Contents

  1. The relational (SQL) model in essence
  2. What "NoSQL" really means
  3. The four NoSQL families: document, key-value, columnar, and graph
  4. Transactional guarantees: ACID versus BASE
  5. SQL vs NoSQL comparison table
  6. Selection criteria and antipatterns
  7. Polyglot persistence

  1. The relational (SQL) model in essence

A relational database organizes data into tables (relations) made up of rows and columns, with a rigid schema defined in advance. Relationships between tables are expressed through foreign keys, and we query the data with SQL (Structured Query Language). Examples: PostgreSQL, MySQL, Oracle, SQL Server.

Let's look at a simple relational schema for customers and their orders:

-- Customers table: each customer has a unique identifier
CREATE TABLE customers (
    id          BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    name        VARCHAR(120) NOT NULL,
    email       VARCHAR(180) NOT NULL UNIQUE,
    created_at  TIMESTAMP    NOT NULL DEFAULT now()
);

-- Orders table: the customer_id column "points to" customers.id
CREATE TABLE orders (
    id           BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    customer_id  BIGINT       NOT NULL REFERENCES customers(id),
    total        NUMERIC(10,2) NOT NULL CHECK (total >= 0),
    status       VARCHAR(20)  NOT NULL DEFAULT 'PENDING'
);

Let's analyze the code fragment by fragment:

  • PRIMARY KEY GENERATED ALWAYS AS IDENTITY: defines the primary key, the unique identifier of each row. The database generates it automatically, so there will never be two customers with the same id.
  • UNIQUE on email: the database guarantees that no email is repeated. It is an integrity rule that lives in the data, not in the application code.
  • REFERENCES customers(id): declares a foreign key. The database prevents inserting an order whose customer_id does not exist, avoiding "orphan orders." This is called referential integrity.
  • CHECK (total >= 0): a constraint that rejects negative totals. Again, the rule is enforced by the engine.

To retrieve a customer's orders, we combine both tables with a JOIN:

SELECT c.name, o.id AS order_id, o.total
FROM customers c
JOIN orders o ON o.customer_id = c.id   -- joins rows that share the id
WHERE c.email = '[email protected]'
ORDER BY o.total DESC;

The JOIN is the star operation of the relational model: it allows you to compose normalized data (without duplication) at query time. The strength is the flexibility to ask unforeseen questions; the cost is that very large JOINs can be slow and hard to scale horizontally.

  1. What "NoSQL" really means

"NoSQL" is an unfortunate name: it does not mean "no SQL" (many of these databases offer similar query languages) but rather "Not Only SQL." It groups engines that fully or partially give up the relational model to gain horizontal scalability, schema flexibility, or performance on specific accesses. Their common traits:

  • Flexible schema: each record can have different fields without prior migrations.
  • Native horizontal scaling: they are designed to spread data across many nodes (sharding) from day one.
  • Denormalization: instead of a JOIN, data is duplicated so that each read touches a single record.
  • Relaxed guarantees: they often sacrifice immediate consistency in exchange for availability and speed.

  1. The four NoSQL families

Not all NoSQL databases are the same. They are classified into four major families according to their data model:

3.1 Document

They store self-contained documents, usually in JSON/BSON. Each document groups all related information together. Examples: MongoDB, Couchbase. The same customer with their orders from the SQL example would look like this in a single document:

{
  "_id": "cli_1042",
  "name": "Ana López",
  "email": "[email protected]",
  "orders": [
    { "order_id": "ord_5001", "total": 49.90, "status": "SHIPPED" },
    { "order_id": "ord_5002", "total": 12.50, "status": "PENDING" }
  ]
}

Notice the key difference from SQL:

  • The orders are embedded inside the customer, in an orders array. To display the customer's record with their orders, a single read is enough, no JOIN.
  • There is no fixed schema: another customer might not have the orders field, or have an additional phone field, without anyone having to alter the "table."
  • The price of this convenience is duplication: if a product's name changes and it is copied across thousands of documents, they all have to be updated.

3.2 Key-value

The simplest model: a giant dictionary that associates a key with an opaque value. Examples: Redis, DynamoDB (in its basic usage). Ideal for sessions, caches, and counters. Access pseudocode:

SET session:abc123  "{userId: 1042, expires: 1700000000}"
GET session:abc123

It is extremely fast because the operation is a direct access by key, but you cannot query by the content of the value: you can only search by the exact key.

3.3 Columnar (wide-column)

They store data by columns/column families instead of by rows, optimized for massive writes and reads over huge ranges. Examples: Apache Cassandra, HBase. They shine in time series and event logs at petabyte scale.

3.4 Graph

They model entities (nodes) and their relationships (edges) as first-class citizens. Examples: Neo4j, Amazon Neptune. When what matters are deep relationships (social networks, fraud detection, recommendations, "friends of friends"), they traverse connections far more efficiently than a recursive JOIN in SQL.

  1. Transactional guarantees: ACID versus BASE

Here is the most important conceptual difference. A transaction is a set of operations that must be treated as a single unit.

ACID (typical of the relational world) guarantees:

Letter Property Meaning
A Atomicity All or nothing: either all operations are applied or none are.
C Consistency The database moves from one valid state to another valid state (constraints are respected).
I Isolation Concurrent transactions do not interfere with each other.
D Durability Once committed, it survives system crashes.

Classic example: transferring money between two accounts must be atomic.

BEGIN;                                             -- starts the transaction
UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- deducts from the source account
UPDATE accounts SET balance = balance + 100 WHERE id = 2; -- adds to the destination account
COMMIT;                                            -- commits both at once

If the system crashes between the two UPDATEs, atomicity ensures that neither is applied: the money never "disappears." This guarantee is indispensable in banking, accounting, or insurance.

BASE (common in distributed NoSQL) is the opposite approach, designed for systems that prioritize availability at large scale:

  • Basically Available: the system always responds, even if with somewhat stale data.
  • Soft state: the state may change over time without new writes (due to propagation between nodes).
  • Eventually consistent: if writes stop arriving, all nodes will eventually converge to the same value, but for a brief interval they may differ.

The choice between ACID and BASE is tied to the CAP theorem (covered in Module 4): in the face of a network partition, a distributed system must choose between consistency and availability. ACID tends to favor consistency; BASE, availability.

  1. SQL vs NoSQL comparison table

Aspect SQL (relational) NoSQL (generic)
Data model Tables with a fixed schema Documents, key-value, columns, or graphs
Schema Rigid, defined beforehand Flexible, dynamic
Relationships JOIN and referential integrity Denormalization / embedding
Transactions Full ACID Often BASE / limited ACID
Scaling Mainly vertical Native horizontal
Language Standard SQL Varies by engine
Ad-hoc queries Excellent Limited (depends on prior design)
Best for Structured data with strong rules Large volume, high scale, changing schema

  1. Selection criteria and antipatterns

Don't choose based on trends. Ask yourself these questions:

  1. Does your data have complex relationships and strong integrity rules? → SQL.
  2. Do you need unpredictable ad-hoc queries? → SQL (JOINs save you).
  3. Do you have a very well-known access pattern and an enormous volume? → NoSQL designed around that pattern.
  4. Does the schema evolve constantly and is the data semi-structured? → document.
  5. Are relationships the critical part (deep traversals)? → graph.
  6. Do you need ultra-fast access by key (sessions, cache)? → key-value.

Common Mistakes and Tips

  • Choosing NoSQL "because it scales" without having a scaling problem. Most applications never reach the volume where horizontal scaling offsets the loss of JOINs and transactions. Start with SQL unless you have a clear reason.
  • Modeling a document as if it were a table. If in document NoSQL you end up doing a "manual JOIN" in code, you chose the wrong model or the wrong document design.
  • Believing that NoSQL has no schema. It does have one; it just lives in your code instead of in the database. This shifts the validation responsibility to the application.
  • Assuming "eventual consistency" is harmless. In a bank balance or a stock level, reading a stale value can be a serious error. Know your consistency requirements.
  • Tip: measure before migrating. A well-placed index in SQL solves a great many performance problems mistakenly attributed to the "relational model."

Exercises

Exercise 1. You have a banking application that records account movements and must guarantee that money is never lost or duplicated. SQL or NoSQL? Justify with two concrete properties.

Exercise 2. Design, in JSON, a document for a social network user profile that includes their basic data and a list of their three most recent embedded posts. Indicate what maintenance problem you introduce by embedding the posts.

Exercise 3. For a product catalog where each category has completely different attributes (a book has "author," a television has "inches"), which database family fits best and why?

Solutions

Solution 1. Relational SQL with ACID transactions. Reasons: atomicity guarantees that a movement (debit + credit) is applied in full or not at all, and durability ensures that once committed it survives a crash. In addition, referential integrity and CHECK constraints prevent invalid states such as disallowed negative balances.

Solution 2. Example document:

{
  "_id": "usr_77",
  "name": "Marc Riera",
  "bio": "Photography enthusiast",
  "recent_posts": [
    { "id": "post_901", "text": "Good morning!", "likes": 12 },
    { "id": "post_900", "text": "Sunset photo", "likes": 48 }
  ]
}

Maintenance problem: the posts are duplicated. If a post is edited or its likes count changes, it must be updated both in its original collection and inside every user document that embeds it, with a risk of inconsistencies.

Solution 3. A document database. Its flexible schema allows each product to have exactly the attributes of its category without empty columns or separate tables per type, something that in SQL would force awkward EAV schemas or many tables.

Conclusion

You have learned that SQL and NoSQL are not rivals, but tools with different trade-offs: SQL provides a strong schema, flexible JOINs, and ACID guarantees; NoSQL provides a flexible schema, horizontal scaling and, in exchange, usually relaxes consistency (BASE). You know the four NoSQL families and a set of criteria for deciding, remembering that many modern architectures adopt polyglot persistence (using the right tool for each need). Once you have decided where to store the data, the next question is how to structure the code that accesses it without polluting the business logic. We'll cover that in the next lesson: Data Access Patterns: Repository, Unit of Work, and DAO.

Application Architecture Course

Module 1: Fundamentals of Application Architecture

Module 2: Design Principles and Tactics

Module 3: Architectural Styles and Patterns

Module 4: Distributed Architectures and Microservices

Module 5: Event-Driven Architectures and Messaging

Module 6: Domain-Driven Design (DDD)

Module 7: Data and Persistence

Module 8: Cloud Architecture and Deployment

Module 9: Quality, Security and Observability

Module 10: Evolution, Governance and Case Studies

© Copyright 2026. All rights reserved