Every application needs, sooner or later, to store data that survives the restart of the process. The decision about how and where to persist that data is one of the most influential and long-lasting in any architecture: it shapes performance, scalability, consistency, and even operational cost. For decades the default answer was "a relational database," but the explosion of large-scale web applications popularized a range of alternatives grouped under the term NoSQL. In this lesson you will learn the real differences between the two worlds, what guarantees they offer (ACID versus BASE), what families of NoSQL databases exist, and, above all, what objective criteria to use when choosing. The goal is not to declare a winner, but for you to know how to reason through the decision.
Contents
- The relational (SQL) model in essence
- What "NoSQL" really means
- The four NoSQL families: document, key-value, columnar, and graph
- Transactional guarantees: ACID versus BASE
- SQL vs NoSQL comparison table
- Selection criteria and antipatterns
- Polyglot persistence
- The relational (SQL) model in essence
A relational database organizes data into tables (relations) made up of rows and columns, with a rigid schema defined in advance. Relationships between tables are expressed through foreign keys, and we query the data with SQL (Structured Query Language). Examples: PostgreSQL, MySQL, Oracle, SQL Server.
Let's look at a simple relational schema for customers and their orders:
-- Customers table: each customer has a unique identifier
CREATE TABLE customers (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
name VARCHAR(120) NOT NULL,
email VARCHAR(180) NOT NULL UNIQUE,
created_at TIMESTAMP NOT NULL DEFAULT now()
);
-- Orders table: the customer_id column "points to" customers.id
CREATE TABLE orders (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
customer_id BIGINT NOT NULL REFERENCES customers(id),
total NUMERIC(10,2) NOT NULL CHECK (total >= 0),
status VARCHAR(20) NOT NULL DEFAULT 'PENDING'
);Let's analyze the code fragment by fragment:
PRIMARY KEY GENERATED ALWAYS AS IDENTITY: defines the primary key, the unique identifier of each row. The database generates it automatically, so there will never be two customers with the sameid.UNIQUEonemail: the database guarantees that no email is repeated. It is an integrity rule that lives in the data, not in the application code.REFERENCES customers(id): declares a foreign key. The database prevents inserting an order whosecustomer_iddoes not exist, avoiding "orphan orders." This is called referential integrity.CHECK (total >= 0): a constraint that rejects negative totals. Again, the rule is enforced by the engine.
To retrieve a customer's orders, we combine both tables with a JOIN:
SELECT c.name, o.id AS order_id, o.total FROM customers c JOIN orders o ON o.customer_id = c.id -- joins rows that share the id WHERE c.email = '[email protected]' ORDER BY o.total DESC;
The JOIN is the star operation of the relational model: it allows you to compose normalized data (without duplication) at query time. The strength is the flexibility to ask unforeseen questions; the cost is that very large JOINs can be slow and hard to scale horizontally.
- What "NoSQL" really means
"NoSQL" is an unfortunate name: it does not mean "no SQL" (many of these databases offer similar query languages) but rather "Not Only SQL." It groups engines that fully or partially give up the relational model to gain horizontal scalability, schema flexibility, or performance on specific accesses. Their common traits:
- Flexible schema: each record can have different fields without prior migrations.
- Native horizontal scaling: they are designed to spread data across many nodes (sharding) from day one.
- Denormalization: instead of a
JOIN, data is duplicated so that each read touches a single record. - Relaxed guarantees: they often sacrifice immediate consistency in exchange for availability and speed.
- The four NoSQL families
Not all NoSQL databases are the same. They are classified into four major families according to their data model:
3.1 Document
They store self-contained documents, usually in JSON/BSON. Each document groups all related information together. Examples: MongoDB, Couchbase. The same customer with their orders from the SQL example would look like this in a single document:
{
"_id": "cli_1042",
"name": "Ana López",
"email": "[email protected]",
"orders": [
{ "order_id": "ord_5001", "total": 49.90, "status": "SHIPPED" },
{ "order_id": "ord_5002", "total": 12.50, "status": "PENDING" }
]
}Notice the key difference from SQL:
- The orders are embedded inside the customer, in an
ordersarray. To display the customer's record with their orders, a single read is enough, noJOIN. - There is no fixed schema: another customer might not have the
ordersfield, or have an additionalphonefield, without anyone having to alter the "table." - The price of this convenience is duplication: if a product's name changes and it is copied across thousands of documents, they all have to be updated.
3.2 Key-value
The simplest model: a giant dictionary that associates a key with an opaque value. Examples: Redis, DynamoDB (in its basic usage). Ideal for sessions, caches, and counters. Access pseudocode:
It is extremely fast because the operation is a direct access by key, but you cannot query by the content of the value: you can only search by the exact key.
3.3 Columnar (wide-column)
They store data by columns/column families instead of by rows, optimized for massive writes and reads over huge ranges. Examples: Apache Cassandra, HBase. They shine in time series and event logs at petabyte scale.
3.4 Graph
They model entities (nodes) and their relationships (edges) as first-class citizens. Examples: Neo4j, Amazon Neptune. When what matters are deep relationships (social networks, fraud detection, recommendations, "friends of friends"), they traverse connections far more efficiently than a recursive JOIN in SQL.
- Transactional guarantees: ACID versus BASE
Here is the most important conceptual difference. A transaction is a set of operations that must be treated as a single unit.
ACID (typical of the relational world) guarantees:
| Letter | Property | Meaning |
|---|---|---|
| A | Atomicity | All or nothing: either all operations are applied or none are. |
| C | Consistency | The database moves from one valid state to another valid state (constraints are respected). |
| I | Isolation | Concurrent transactions do not interfere with each other. |
| D | Durability | Once committed, it survives system crashes. |
Classic example: transferring money between two accounts must be atomic.
BEGIN; -- starts the transaction UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- deducts from the source account UPDATE accounts SET balance = balance + 100 WHERE id = 2; -- adds to the destination account COMMIT; -- commits both at once
If the system crashes between the two UPDATEs, atomicity ensures that neither is applied: the money never "disappears." This guarantee is indispensable in banking, accounting, or insurance.
BASE (common in distributed NoSQL) is the opposite approach, designed for systems that prioritize availability at large scale:
- Basically Available: the system always responds, even if with somewhat stale data.
- Soft state: the state may change over time without new writes (due to propagation between nodes).
- Eventually consistent: if writes stop arriving, all nodes will eventually converge to the same value, but for a brief interval they may differ.
The choice between ACID and BASE is tied to the CAP theorem (covered in Module 4): in the face of a network partition, a distributed system must choose between consistency and availability. ACID tends to favor consistency; BASE, availability.
- SQL vs NoSQL comparison table
| Aspect | SQL (relational) | NoSQL (generic) |
|---|---|---|
| Data model | Tables with a fixed schema | Documents, key-value, columns, or graphs |
| Schema | Rigid, defined beforehand | Flexible, dynamic |
| Relationships | JOIN and referential integrity |
Denormalization / embedding |
| Transactions | Full ACID | Often BASE / limited ACID |
| Scaling | Mainly vertical | Native horizontal |
| Language | Standard SQL | Varies by engine |
| Ad-hoc queries | Excellent | Limited (depends on prior design) |
| Best for | Structured data with strong rules | Large volume, high scale, changing schema |
- Selection criteria and antipatterns
Don't choose based on trends. Ask yourself these questions:
- Does your data have complex relationships and strong integrity rules? → SQL.
- Do you need unpredictable ad-hoc queries? → SQL (
JOINs save you). - Do you have a very well-known access pattern and an enormous volume? → NoSQL designed around that pattern.
- Does the schema evolve constantly and is the data semi-structured? → document.
- Are relationships the critical part (deep traversals)? → graph.
- Do you need ultra-fast access by key (sessions, cache)? → key-value.
Common Mistakes and Tips
- Choosing NoSQL "because it scales" without having a scaling problem. Most applications never reach the volume where horizontal scaling offsets the loss of
JOINs and transactions. Start with SQL unless you have a clear reason. - Modeling a document as if it were a table. If in document NoSQL you end up doing a "manual JOIN" in code, you chose the wrong model or the wrong document design.
- Believing that NoSQL has no schema. It does have one; it just lives in your code instead of in the database. This shifts the validation responsibility to the application.
- Assuming "eventual consistency" is harmless. In a bank balance or a stock level, reading a stale value can be a serious error. Know your consistency requirements.
- Tip: measure before migrating. A well-placed index in SQL solves a great many performance problems mistakenly attributed to the "relational model."
Exercises
Exercise 1. You have a banking application that records account movements and must guarantee that money is never lost or duplicated. SQL or NoSQL? Justify with two concrete properties.
Exercise 2. Design, in JSON, a document for a social network user profile that includes their basic data and a list of their three most recent embedded posts. Indicate what maintenance problem you introduce by embedding the posts.
Exercise 3. For a product catalog where each category has completely different attributes (a book has "author," a television has "inches"), which database family fits best and why?
Solutions
Solution 1. Relational SQL with ACID transactions. Reasons: atomicity guarantees that a movement (debit + credit) is applied in full or not at all, and durability ensures that once committed it survives a crash. In addition, referential integrity and CHECK constraints prevent invalid states such as disallowed negative balances.
Solution 2. Example document:
{
"_id": "usr_77",
"name": "Marc Riera",
"bio": "Photography enthusiast",
"recent_posts": [
{ "id": "post_901", "text": "Good morning!", "likes": 12 },
{ "id": "post_900", "text": "Sunset photo", "likes": 48 }
]
}Maintenance problem: the posts are duplicated. If a post is edited or its likes count changes, it must be updated both in its original collection and inside every user document that embeds it, with a risk of inconsistencies.
Solution 3. A document database. Its flexible schema allows each product to have exactly the attributes of its category without empty columns or separate tables per type, something that in SQL would force awkward EAV schemas or many tables.
Conclusion
You have learned that SQL and NoSQL are not rivals, but tools with different trade-offs: SQL provides a strong schema, flexible JOINs, and ACID guarantees; NoSQL provides a flexible schema, horizontal scaling and, in exchange, usually relaxes consistency (BASE). You know the four NoSQL families and a set of criteria for deciding, remembering that many modern architectures adopt polyglot persistence (using the right tool for each need). Once you have decided where to store the data, the next question is how to structure the code that accesses it without polluting the business logic. We'll cover that in the next lesson: Data Access Patterns: Repository, Unit of Work, and DAO.
Application Architecture Course
Module 1: Fundamentals of Application Architecture
- What Is Application Architecture?
- The Role of the Software Architect
- Quality Attributes and Non-Functional Requirements
- Architectural Decisions and Trade-offs
- Architecture Documentation: Views and the C4 Model
Module 2: Design Principles and Tactics
- Coupling, Cohesion and Separation of Concerns
- SOLID Principles Applied to Architecture
- DRY, KISS, YAGNI and Other Design Principles
- Architectural Tactics for Quality Attributes
- Managing Technical Debt
Module 3: Architectural Styles and Patterns
- Monolithic Architecture
- Layered Architecture (N-Tier)
- Client-Server Architecture
- Hexagonal Architecture (Ports and Adapters)
- Clean and Onion Architecture
Module 4: Distributed Architectures and Microservices
- Introduction to Distributed Systems
- Microservices Architecture
- Service Decomposition and Bounded Contexts
- API Gateway, Service Discovery and Inter-Service Communication
- Resilience Patterns: Circuit Breaker, Retry and Bulkhead
- The CAP Theorem and Data Consistency
Module 5: Event-Driven Architectures and Messaging
- Fundamentals of Event-Driven Architecture
- Asynchronous Messaging: Queues and Brokers
- Event Patterns: Event Sourcing and CQRS
- Managing Distributed Transactions: The Saga Pattern
- Real-Time Data Streaming
Module 6: Domain-Driven Design (DDD)
- Core DDD Concepts
- Strategic Design: Bounded Contexts and Ubiquitous Language
- Tactical Design: Entities, Aggregates and Repositories
- Context Mapping
Module 7: Data and Persistence
- Persistence Strategies: SQL vs NoSQL
- Data Access Patterns: Repository, Unit of Work and DAO
- Database per Service and Distributed Data Management
- Caching and Invalidation Strategies
Module 8: Cloud Architecture and Deployment
- Cloud Computing Fundamentals (IaaS, PaaS, SaaS)
- Containers and Orchestration with Docker and Kubernetes
- Serverless Architecture
- Cloud-Native Design Patterns
- Infrastructure as Code (IaC)
Module 9: Quality, Security and Observability
- Scalability: Horizontal vs Vertical and Load Balancing
- High Availability and Fault Tolerance
- Security by Design and Authentication/Authorization
- Observability: Logging, Metrics and Tracing
- Performance and Load Testing
