Once the persistence technology has been decided, a recurring design problem arises: how do we connect the business logic to the database without scattering SQL statements and engine-specific dependencies throughout the code? If every part of the application talks directly to the database, we end up with tight coupling, code that is hard to test in isolation, and business rules mixed with infrastructure details. Data access patterns solve exactly this: they introduce a layer that isolates how data is stored from what is done with it. In this lesson we will study three core patterns (DAO, Repository, and Unit of Work) plus the Data Mapper versus Active Record dichotomy, with examples in Java so you can see the differences in real code.

The problem: coupling to persistence
The DAO pattern (Data Access Object)
The Repository pattern
DAO vs Repository: how do they differ?
The Unit of Work pattern
Data Mapper vs Active Record
Common Mistakes and Tips
Exercises
Solutions
Conclusion

The problem: coupling to persistence

Imagine a service that mixes business logic with direct database access:

public class OrderService {
    public void confirm(long orderId) throws SQLException {
        Connection con = DriverManager.getConnection("jdbc:postgresql://...");
        PreparedStatement ps = con.prepareStatement(
            "UPDATE orders SET status = 'CONFIRMED' WHERE id = ?");
        ps.setLong(1, orderId);
        ps.executeUpdate();   // business logic and SQL mixed together
        con.close();
    }
}

This has serious problems: the service depends on JDBC and on PostgreSQL, it cannot be unit-tested without a real database, the connection leaks if executeUpdate throws, and the SQL statement will be repeated in every place that needs to touch orders. The following patterns separate responsibilities to avoid this.

The DAO pattern (Data Access Object)

A DAO encapsulates all access to a specific data source (a table, normally) and exposes persistence-oriented operations. Its vocabulary is that of the database: insert, update, delete, find by key.

First we define the interface, which hides the implementation details:

public interface OrderDao {
    void insert(Order order);
    void update(Order order);
    void delete(long id);
    Order findById(long id);
    List<Order> findByCustomerId(long customerId);
}

Each method reflects an operation against the table. Now a JDBC implementation:

public class OrderDaoJdbc implements OrderDao {

    private final DataSource dataSource;   // injected connection pool

    public OrderDaoJdbc(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public Order findById(long id) {
        String sql = "SELECT id, customer_id, total, status FROM orders WHERE id = ?";
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setLong(1, id);                       // binds the id to the ? placeholder
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    return new Order(
                        rs.getLong("id"),
                        rs.getLong("customer_id"),
                        rs.getBigDecimal("total"),
                        rs.getString("status"));
                }
                return null;
            }
        } catch (SQLException e) {
            throw new DataAccessException("Error finding order " + id, e);
        }
    }
    // ... remaining methods
}

Key points of this code:

Injected DataSource: the DAO does not create the connection, it receives it. This allows changing the data source (including a mocked one in tests).
try-with-resources: automatically closes Connection, PreparedStatement, and ResultSet even if there is an exception, avoiding resource leaks.
PreparedStatement with ?: prevents SQL injection and allows the execution plan to be reused.
Exception translation: converts the (low-level) SQLException into a custom exception, so that the rest of the application does not depend on JDBC.

The DAO centralizes all the SQL for the orders table in a single place.
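
Because the service now depends only on the OrderDao interface, the JDBC implementation can be swapped for a test double. The following is a minimal sketch of that idea, not part of the original listing: InMemoryOrderDao is a hypothetical class, Order is simplified to a record with the four columns from the SELECT above, and OrderService.confirm is rewritten under the assumption that it loads the order and updates its status through the DAO.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the lesson's Order class (assumption: four fields, immutable).
record Order(long id, long customerId, BigDecimal total, String status) {}

// Same contract as the OrderDao interface shown earlier.
interface OrderDao {
    void insert(Order order);
    void update(Order order);
    void delete(long id);
    Order findById(long id);
    List<Order> findByCustomerId(long customerId);
}

// Hypothetical test double: keeps "rows" in a map instead of a database table.
class InMemoryOrderDao implements OrderDao {
    private final Map<Long, Order> rows = new HashMap<>();

    @Override public void insert(Order order) {
        if (rows.putIfAbsent(order.id(), order) != null) {
            throw new IllegalStateException("Duplicate key " + order.id());
        }
    }
    @Override public void update(Order order) {
        if (rows.replace(order.id(), order) == null) {
            throw new IllegalStateException("No row with key " + order.id());
        }
    }
    @Override public void delete(long id) { rows.remove(id); }
    @Override public Order findById(long id) { return rows.get(id); }
    @Override public List<Order> findByCustomerId(long customerId) {
        List<Order> result = new ArrayList<>();
        for (Order o : rows.values()) {
            if (o.customerId() == customerId) result.add(o);
        }
        return result;
    }
}

// The service from the first example, now free of JDBC and SQL.
class OrderService {
    private final OrderDao orderDao;
    OrderService(OrderDao orderDao) { this.orderDao = orderDao; }

    void confirm(long orderId) {
        Order order = orderDao.findById(orderId);
        if (order == null) throw new IllegalArgumentException("Unknown order " + orderId);
        orderDao.update(new Order(order.id(), order.customerId(), order.total(), "CONFIRMED"));
    }
}
```

A unit test can build an OrderService over an InMemoryOrderDao, call confirm, and check the stored status without starting a database.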

The Repository pattern

A Repository operates at a higher level of abstraction: it behaves like an in-memory collection of domain objects. It is part of the vocabulary of Domain-Driven Design (covered in Module 6) and is associated with an aggregate, not a table. Its intent is for the business code to work as if it held a list of objects, unaware that there is a database behind it.

public interface OrderRepository {
    void save(Order order);              // "add to the collection"
    Optional<Order> get(OrderId id);
    List<Order> pendingFor(CustomerId customer);  // query in domain language
}

Subtle differences from the DAO:

It uses domain identifiers (OrderId, CustomerId), not raw longs.
It returns Optional instead of null, explicitly expressing absence.
The methods speak the ubiquitous language of the business (pendingFor), not that of SQL.
A single save method internally decides whether to insert or update; the caller neither knows nor cares.
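
To make the "collection" idea concrete, here is a minimal in-memory sketch of the interface above. It is an illustration under assumptions: OrderId and CustomerId are modeled as simple records, Order is reduced to three fields, and InMemoryOrderRepository is a hypothetical class that a real persistence-backed implementation would replace.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical value objects wrapping the raw keys.
record OrderId(long value) {}
record CustomerId(long value) {}

// Simplified aggregate root; a real Order would also carry lines, totals, etc.
record Order(OrderId id, CustomerId customer, String status) {
    boolean isPending() { return "PENDING".equals(status); }
}

interface OrderRepository {
    void save(Order order);
    Optional<Order> get(OrderId id);
    List<Order> pendingFor(CustomerId customer);
}

class InMemoryOrderRepository implements OrderRepository {
    private final Map<OrderId, Order> orders = new LinkedHashMap<>();

    @Override public void save(Order order) {
        // put() covers both cases: a new id is added, an existing id is replaced.
        orders.put(order.id(), order);
    }

    @Override public Optional<Order> get(OrderId id) {
        return Optional.ofNullable(orders.get(id));   // absence is explicit, never null
    }

    @Override public List<Order> pendingFor(CustomerId customer) {
        List<Order> result = new ArrayList<>();
        for (Order o : orders.values()) {
            if (o.customer().equals(customer) && o.isPending()) result.add(o);
        }
        return result;
    }
}
```

Saving the same OrderId twice leaves a single entry, which is the "insert or update" decision the caller never has to make.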

An implementation with Spring Data JPA reduces the code to almost nothing:

public interface OrderRepository extends JpaRepository<Order, Long> {
    // Spring generates the query from the method name
    List<Order> findByCustomerIdAndStatus(Long customerId, String status);
}

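Here JpaRepository already provides save, findById, and other basic operations, and Spring derives the query from the method name findByCustomerIdAndStatus, so no manual SQL is needed. Note that this shortcut exposes raw types (Long, String) and a query-style name; to keep the domain vocabulary of the earlier interface (pendingFor, OrderId), it can be wrapped behind a domain-facing repository.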

DAO vs Repository: how do they differ?

This is the most common confusion. Both abstract data access, but their intent and level differ:

Aspect	DAO	Repository
Orientation	To the table / data source	To the domain aggregate
Vocabulary	Persistence (insert, update)	Business (collection of objects)
Granularity	One per table, normally	One per aggregate root
Origin	Classic data layer pattern	Tactical DDD pattern
Relationship to SQL	Its operations mirror SQL statements	Hidden behind the "collection" abstraction

In practice, a Repository usually relies on one or more DAOs or on an ORM. They are not mutually exclusive: the Repository is the face the domain sees, the DAO is the internal machinery.
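
The layering described above can be sketched as a Repository that delegates to a DAO. This is an illustrative sketch under assumptions, not the lesson's own code: OrderRow is a hypothetical row-shaped type introduced to make the mapping visible, the DAO contract is trimmed to three methods, and MapOrderDao is a hypothetical stand-in for a JDBC implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Domain side: identifiers and objects in business terms.
record OrderId(long value) {}
record Order(OrderId id, String status) {}

// Persistence side (assumption): a row-shaped record with a raw key.
record OrderRow(long id, String status) {}

interface OrderDao {
    void insert(OrderRow row);
    void update(OrderRow row);
    OrderRow findById(long id);   // returns null when the row is missing
}

interface OrderRepository {
    void save(Order order);
    Optional<Order> get(OrderId id);
}

// The face the domain sees; the DAO is the internal machinery.
class DaoBackedOrderRepository implements OrderRepository {
    private final OrderDao dao;
    DaoBackedOrderRepository(OrderDao dao) { this.dao = dao; }

    @Override public void save(Order order) {
        OrderRow row = new OrderRow(order.id().value(), order.status());
        // The repository decides between insert and update; the caller does not.
        if (dao.findById(row.id()) == null) dao.insert(row); else dao.update(row);
    }

    @Override public Optional<Order> get(OrderId id) {
        // Translate the DAO's null into an Optional and the row into a domain object.
        return Optional.ofNullable(dao.findById(id.value()))
                       .map(r -> new Order(new OrderId(r.id()), r.status()));
    }
}

// Hypothetical map-backed DAO so the sketch runs without a database.
class MapOrderDao implements OrderDao {
    private final Map<Long, OrderRow> rows = new HashMap<>();
    int inserts, updates;   // counters that reveal which path the repository took

    @Override public void insert(OrderRow row) { rows.put(row.id(), row); inserts++; }
    @Override public void update(OrderRow row) { rows.put(row.id(), row); updates++; }
    @Override public OrderRow findById(long id) { return rows.get(id); }
}
```

The existence check before writing is the simplest possible strategy; an ORM or a database-level upsert would typically replace it in production code.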

The Unit of Work pattern

What happens when a business operation modifies several objects and they all must be committed together? If we call save one by one, we could commit half and fail on the other half. The Unit of Work solves this: it keeps a list of affected objects during a business operation and coordinates the write and the transaction management as a single atomic unit.

public class StockTransferService {

    private final OrderRepository orderRepo;
    private final StockRepository stockRepo;

    // Constructor injection: the final fields must be assigned here
    public StockTransferService(OrderRepository orderRepo, StockRepository stockRepo) {
        this.orderRepo = orderRepo;
        this.stockRepo = stockRepo;
    }

    @Transactional   // delimits the Unit of Work: all or nothing
    public void confirmOrder(OrderId id) {
        Order order = orderRepo.get(id)
            .orElseThrow(() -> new OrderNotFoundException(id));

        order.confirm();                  // changes the status in memory
        stockRepo.deduct(order.lines());  // deducts stock in memory

        orderRepo.save(order);            // marks for persistence
        // On a normal return, the transaction COMMITs EVERYTHING together;
        // if a RuntimeException or Error escapes, it ROLLs BACK EVERYTHING.
    }
}

How it works:

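Spring's @Transactional annotation delimits the Unit of Work: Spring opens a transaction on entry and commits it (COMMIT) when the method returns normally, or rolls it back (ROLLBACK) if an unchecked exception (RuntimeException or Error) escapes. Checked exceptions do not trigger a rollback by default; that requires the rollbackFor attribute.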
The modifications to order and to stock accumulate and are applied atomically. It is not possible to confirm the order without deducting the stock.
In JPA/Hibernate, the EntityManager (with its persistence context) is itself a Unit of Work: it tracks changes to the loaded entities (dirty checking) and writes them to the database when it flushes, which happens at commit time at the latest.

The Unit of Work provides two benefits: atomicity (the A guarantee of ACID, coordinated from the application level) and efficiency (writes are deferred and sent together at flush time, where they can be batched, instead of issuing one statement per change as it happens).
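
To show the mechanism without a framework, the following is a hand-rolled sketch of a Unit of Work over an in-memory store. Everything in it is hypothetical (Store, UnitOfWork, registerUpsert, registerDecrement); it only illustrates the idea of recording changes and applying them all or not at all, and it is not how Spring or Hibernate are implemented internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory "database": a single key/value table.
class Store {
    private Map<String, Integer> rows = new HashMap<>();

    Integer read(String key) { return rows.get(key); }
    Map<String, Integer> snapshot() { return new HashMap<>(rows); }
    void replaceWith(Map<String, Integer> newRows) { rows = newRows; }
}

class UnitOfWork {
    // A recorded change, applied later against a working copy of the data.
    private interface Change { void applyTo(Map<String, Integer> rows); }

    private final Store store;
    private final List<Change> pending = new ArrayList<>();

    UnitOfWork(Store store) { this.store = store; }

    // Record an insert-or-update; nothing is written yet.
    void registerUpsert(String key, int value) {
        pending.add(rows -> rows.put(key, value));
    }

    // Record a decrement that fails at commit time if the value would go negative.
    void registerDecrement(String key, int amount) {
        pending.add(rows -> {
            Integer current = rows.get(key);
            if (current == null || current < amount) {
                throw new IllegalStateException("Insufficient quantity for " + key);
            }
            rows.put(key, current - amount);
        });
    }

    // Apply every recorded change to a copy, then publish the copy in one step.
    // If any change throws, the copy is discarded and the store is untouched.
    void commit() {
        Map<String, Integer> working = store.snapshot();
        try {
            for (Change change : pending) change.applyTo(working);
            store.replaceWith(working);
        } finally {
            pending.clear();
        }
    }
}
```

Registering an order together with a stock decrement that exceeds the available quantity makes commit throw, and neither change becomes visible in the store. A real database reaches the same outcome with transactions and rollback instead of copying data.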

Data Mapper vs Active Record

These two patterns describe how an object relates to its representation in the database.

Active Record: the domain object contains its own persistence logic. The row and the object are the same thing.

// Active Record style (typical of Ruby on Rails; some Java frameworks offer it too)
Order order = Order.findById(42);
order.setStatus("CONFIRMED");
order.save();   // the object itself knows how to write itself to the database

Advantage: fast and direct for simple CRUD. Drawback: it mixes business and persistence in the same class, hindering testing and violating the separation of responsibilities.

Data Mapper: a separate layer (the "mapper") translates between the domain objects and the database. The domain object knows nothing about persistence.

// Data Mapper style: the object is "ignorant" of the database
Order order = new Order(/* pure business data */);
order.confirm();                 // only business logic, no save()
entityManager.persist(order);    // the mapper (JPA) takes care of writing

JPA and its implementations, such as Hibernate, follow the Data Mapper pattern. Comparison:

Criterion	Active Record	Data Mapper
Domain-DB coupling	High (in the same class)	Low (separated)
Learning curve	Low	Medium
Domain testability	Limited	High (pure domain)
Suitable for	Simple CRUD, prototypes	Complex domains, DDD
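
The contrast in the table can be seen side by side in a small runnable sketch. All names here are hypothetical (OrdersTable, ActiveOrder, PlainOrder, OrderMapper), and a static map stands in for the orders table so that both styles write to the same place.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory table shared by both styles: order id -> status.
class OrdersTable {
    static final Map<Long, String> ROWS = new HashMap<>();
}

// Active Record: the object knows how to load and save itself.
class ActiveOrder {
    private final long id;
    private String status;

    ActiveOrder(long id, String status) { this.id = id; this.status = status; }

    static ActiveOrder findById(long id) {
        String status = OrdersTable.ROWS.get(id);
        return status == null ? null : new ActiveOrder(id, status);
    }

    void confirm() { status = "CONFIRMED"; }           // business rule
    void save() { OrdersTable.ROWS.put(id, status); }  // persistence in the same class
    String status() { return status; }
}

// Data Mapper: the domain object has no reference to storage at all...
class PlainOrder {
    private final long id;
    private String status;

    PlainOrder(long id, String status) { this.id = id; this.status = status; }

    void confirm() { status = "CONFIRMED"; }           // business rule only
    long id() { return id; }
    String status() { return status; }
}

// ...and a separate mapper moves data between the object and the table.
class OrderMapper {
    private final Map<Long, String> table;
    OrderMapper(Map<Long, String> table) { this.table = table; }

    PlainOrder find(long id) {
        String status = table.get(id);
        return status == null ? null : new PlainOrder(id, status);
    }

    void persist(PlainOrder order) { table.put(order.id(), order.status()); }
}
```

PlainOrder can be unit-tested by constructing it and calling confirm, with no table in sight. Testing ActiveOrder's save and findById always involves the storage it is bound to.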

Common Mistakes and Tips

Leaking infrastructure exceptions into the domain. If a SQLException or a JPA PersistenceException bubbles up to the business service, you have broken the isolation. Translate them in the DAO/Repository.
A "fat" Repository with hundreds of methods. If a repository grows out of control, the aggregate is probably poorly defined. One repository per aggregate root.
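Using lazily loaded associations of JPA entities outside the transaction. Once the persistence context is closed the entity is detached, and touching an uninitialized lazy association throws Hibernate's dreaded LazyInitializationException. Return DTOs or load what you need inside the transaction.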
Putting @Transactional on private methods or internal calls within the same class. Spring uses proxies; the transaction is not activated on self-invocations. Place it at the public entry point.
Tip: don't abstract for the sake of abstracting. In small applications, Spring Data JPA already gives you Repository + Unit of Work without writing a DAO by hand. Add layers only when they provide value.
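
The self-invocation pitfall can be reproduced with a plain JDK dynamic proxy. The sketch below is a simulation under assumptions, not Spring's actual code: OrderOperations, OrderOperationsImpl, and TransactionalProxy are hypothetical names, and a counter stands in for "a transaction was opened".

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

interface OrderOperations {
    void confirmAll();
    void confirmOne();
}

class OrderOperationsImpl implements OrderOperations {
    @Override public void confirmAll() {
        // Self-invocation: this call goes through "this", not through the proxy,
        // so no transactional behavior is applied to confirmOne here.
        this.confirmOne();
    }

    @Override public void confirmOne() {
        // business logic would go here
    }
}

class TransactionalProxy {
    // Wraps the target so that every call arriving through the proxy "opens a transaction".
    static OrderOperations wrap(OrderOperations target, AtomicInteger transactionsOpened) {
        return (OrderOperations) Proxy.newProxyInstance(
            OrderOperations.class.getClassLoader(),
            new Class<?>[] { OrderOperations.class },
            (proxy, method, args) -> {
                transactionsOpened.incrementAndGet();     // stand-in for BEGIN
                try {
                    return method.invoke(target, args);   // a COMMIT would follow here
                } catch (InvocationTargetException e) {
                    throw e.getCause();                   // a ROLLBACK would happen here
                }
            });
    }
}
```

Calling confirmAll on the wrapped object increments the counter once, not twice: the inner confirmOne never passes through the proxy. Calling confirmOne directly on the proxy does increment it.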

Exercises

Exercise 1. Explain in your own words why a Repository returns Optional<Order> instead of null, and what risk it avoids.

Exercise 2. You have to register a new customer and create their first order in a single operation that must be atomic. Sketch the service method, indicating where you would place the transactional boundary (Unit of Work).

Exercise 3. Classify each situation as Active Record or Data Mapper: (a) the Invoice class has a save() method; (b) an EntityManager persists an Invoice object that contains only business rules.

Solutions

Solution 1. Optional<Order> explicitly expresses in the type that the order may not exist. It forces the caller to handle the absence (with orElseThrow, map, etc.) and avoids the NullPointerExceptions that arise from using an unchecked null. The intent of "there may be no result" is documented in the method signature.
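
As a small illustration of the handling styles mentioned in this solution, the sketch below shows map with orElse and orElseThrow. OrderLookup is a hypothetical class backed by a map, and Order is reduced to two fields.

```java
import java.util.Map;
import java.util.Optional;

// Simplified Order for the illustration (assumption: id and status only).
record Order(long id, String status) {}

class OrderLookup {
    private final Map<Long, Order> orders;
    OrderLookup(Map<Long, Order> orders) { this.orders = orders; }

    // Absence is part of the return type, so callers cannot overlook it.
    Optional<Order> get(long id) {
        return Optional.ofNullable(orders.get(id));
    }

    // Style 1: transform the value if present, fall back to a default otherwise.
    String statusOf(long id) {
        return get(id).map(Order::status).orElse("UNKNOWN");
    }

    // Style 2: treat absence as an error with a meaningful message.
    Order require(long id) {
        return get(id).orElseThrow(
            () -> new IllegalArgumentException("No order with id " + id));
    }
}
```

With a null-returning method, forgetting the check compiles and fails later with a NullPointerException. Here the caller has to pick one of these styles before reaching the Order at all.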

Solution 2.

@Transactional   // Unit of Work boundary: both saves are committed together
public void registerCustomerWithFirstOrder(CustomerData data, OrderData line) {
    Customer customer = new Customer(data.name(), data.email());
    customerRepo.save(customer);

    Order order = customer.createOrder(line);
    orderRepo.save(order);
    // COMMIT on exit; if the second save fails, ROLLBACK of the customer too.
}

The transactional boundary is placed on the service method (@Transactional), so that the creation of the customer and of the order form a single atomic unit.

Solution 3. (a) Active Record: the Invoice object contains its own persistence logic (save()). (b) Data Mapper: persistence lives outside (in the EntityManager) and the Invoice only has business rules.

Conclusion

You have seen how to isolate business logic from persistence details through four complementary ideas: the DAO centralizes SQL by data source, the Repository offers the domain a collection of objects in its own language, the Unit of Work guarantees the atomicity of operations that touch several objects, and the Data Mapper vs Active Record dichotomy decides how much the domain object knows about its own persistence. Frameworks like Spring Data JPA combine several of these patterns for you. So far we have assumed a single database, but in microservices architectures each service has its own, which raises new challenges around distributed queries and transactions. That is what we will address in the next lesson: Database per Service and Distributed Data Management.

Data Access Patterns: Repository, Unit of Work, and DAO
