Once the persistence technology has been chosen, a recurring design problem arises: how do we connect the business logic to the database without scattering SQL statements through the code and depending on a specific engine? If every part of the application talks directly to the database, we end up with tight coupling, code that cannot be tested without a real database, and business rules mixed with infrastructure details. Data access patterns solve exactly this: they introduce a layer that isolates how data is stored from what is done with it. In this lesson we will study four topics: the DAO, Repository, and Unit of Work patterns, and the Data Mapper versus Active Record dichotomy, with examples in Java so you can see the differences in real code.
Contents
- The problem: coupling to persistence
- The DAO pattern (Data Access Object)
- The Repository pattern
- DAO vs Repository: how do they differ?
- The Unit of Work pattern
- Data Mapper vs Active Record
- How they all fit together
The problem: coupling to persistence
Imagine a service that mixes business logic with direct database access:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class OrderService {
        public void confirm(long orderId) throws SQLException {
            Connection con = DriverManager.getConnection("jdbc:postgresql://...");
            PreparedStatement ps = con.prepareStatement(
                    "UPDATE orders SET status = 'CONFIRMED' WHERE id = ?");
            ps.setLong(1, orderId);
            ps.executeUpdate(); // business logic and SQL mixed together
            con.close();
        }
    }

This has serious problems: the service depends on JDBC and on PostgreSQL, it is impossible to test without a real database, and the SQL statement will be repeated in every place that needs to touch orders. The following patterns separate responsibilities to avoid this.
The DAO pattern (Data Access Object)
A DAO encapsulates all access to a specific data source (a table, normally) and exposes persistence-oriented operations. Its vocabulary is that of the database: insert, update, delete, find by key.
First we define the interface, which hides the implementation details:

    import java.util.List;

    public interface OrderDao {
        void insert(Order order);
        void update(Order order);
        void delete(long id);
        Order findById(long id);
        List<Order> findByCustomerId(long customerId);
    }

Each method reflects an operation against the table. Now a JDBC implementation:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    public class OrderDaoJdbc implements OrderDao {
        private final DataSource dataSource; // injected connection pool

        public OrderDaoJdbc(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        @Override
        public Order findById(long id) {
            String sql = "SELECT id, customer_id, total, status FROM orders WHERE id = ?";
            try (Connection con = dataSource.getConnection();
                 PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setLong(1, id); // binds the id to the ? placeholder
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        return new Order(
                                rs.getLong("id"),
                                rs.getLong("customer_id"),
                                rs.getBigDecimal("total"),
                                rs.getString("status"));
                    }
                    return null; // no row with that id
                }
            } catch (SQLException e) {
                // DataAccessException is a custom exception defined by the application
                throw new DataAccessException("Error finding order " + id, e);
            }
        }
        // ... remaining methods
    }

Key points of this code:
- Injected DataSource: the DAO does not build its own connection source, it receives one. This allows changing the data source (including a mocked one in tests).
- try-with-resources: automatically closes Connection, PreparedStatement, and ResultSet even if there is an exception, avoiding resource leaks.
- PreparedStatement with ?: prevents SQL injection and allows the execution plan to be reused.
- Exception translation: converts the (low-level) SQLException into a custom exception, so that the rest of the application does not depend on JDBC.
The DAO centralizes all the SQL for the orders table in a single place.
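Because the rest of the application depends only on the OrderDao interface, a test can swap the JDBC implementation for an in-memory one. The following is a minimal sketch of that idea, not part of the lesson's code base: InMemoryOrderDao, its seed helper, OrderConfirmationService, FakeDaoDemo, and the simplified Order record (with a withStatus helper) are hypothetical names introduced for illustration, and the interface is trimmed to the two methods the sketch needs.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the lesson's Order class (assumption: four fields).
record Order(long id, long customerId, BigDecimal total, String status) {
    Order withStatus(String newStatus) {
        return new Order(id, customerId, total, newStatus);
    }
}

// Trimmed version of the OrderDao interface: only what this sketch needs.
interface OrderDao {
    void update(Order order);
    Order findById(long id);
}

// Hypothetical test double: keeps "rows" in a map instead of a database.
class InMemoryOrderDao implements OrderDao {
    private final Map<Long, Order> rows = new HashMap<>();

    void seed(Order order) { rows.put(order.id(), order); }

    @Override
    public void update(Order order) { rows.put(order.id(), order); }

    @Override
    public Order findById(long id) { return rows.get(id); }
}

// The service sees only the interface; it cannot tell whether JDBC is behind it.
class OrderConfirmationService {
    private final OrderDao orderDao;

    OrderConfirmationService(OrderDao orderDao) { this.orderDao = orderDao; }

    void confirm(long orderId) {
        Order order = orderDao.findById(orderId);
        if (order == null) {
            throw new IllegalArgumentException("Unknown order " + orderId);
        }
        orderDao.update(order.withStatus("CONFIRMED"));
    }
}

class FakeDaoDemo {
    public static void main(String[] args) {
        InMemoryOrderDao dao = new InMemoryOrderDao();
        dao.seed(new Order(42L, 7L, new BigDecimal("19.90"), "PENDING"));
        new OrderConfirmationService(dao).confirm(42L);
        System.out.println(dao.findById(42L).status()); // prints CONFIRMED
    }
}
```

With a double like this, the confirmation logic can be exercised in a plain unit test, with no connection pool and no PostgreSQL instance.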
The Repository pattern
A Repository operates at a higher level of abstraction: it behaves like an in-memory collection of domain objects. It is part of the vocabulary of Domain-Driven Design (covered in Module 6) and is associated with an aggregate, not a table. Its intent is for the business code to work as if it held a list of objects, unaware that a database sits behind it.

    import java.util.List;
    import java.util.Optional;

    public interface OrderRepository {
        void save(Order order);                       // "add to the collection"
        Optional<Order> get(OrderId id);
        List<Order> pendingFor(CustomerId customer);  // query in domain language
    }

Subtle differences from the DAO:
- It uses domain identifiers (OrderId, CustomerId), not raw longs.
- It returns Optional instead of null, explicitly expressing absence.
- The methods speak the ubiquitous language of the business (pendingFor), not that of SQL.
- A single save method internally decides whether to insert or update; the caller neither knows nor cares.
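To make the collection metaphor concrete, here is a minimal in-memory sketch of such a repository. It is an illustration under assumptions, not a production implementation: the OrderId and CustomerId records, the three-field Order record, the "PENDING" status string, and the InMemoryOrderRepository and RepositoryDemo classes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical value objects wrapping the raw keys.
record OrderId(long value) {}
record CustomerId(long value) {}

// Simplified aggregate (assumption: status is a plain string).
record Order(OrderId id, CustomerId customer, String status) {}

interface OrderRepository {
    void save(Order order);
    Optional<Order> get(OrderId id);
    List<Order> pendingFor(CustomerId customer);
}

// In-memory implementation: literally a collection, as the metaphor suggests.
class InMemoryOrderRepository implements OrderRepository {
    private final Map<OrderId, Order> orders = new LinkedHashMap<>();

    @Override
    public void save(Order order) {
        // put() covers both insert and update; the caller never chooses.
        orders.put(order.id(), order);
    }

    @Override
    public Optional<Order> get(OrderId id) {
        return Optional.ofNullable(orders.get(id));
    }

    @Override
    public List<Order> pendingFor(CustomerId customer) {
        List<Order> result = new ArrayList<>();
        for (Order order : orders.values()) {
            if (order.customer().equals(customer) && order.status().equals("PENDING")) {
                result.add(order);
            }
        }
        return result;
    }
}

class RepositoryDemo {
    public static void main(String[] args) {
        OrderRepository repo = new InMemoryOrderRepository();
        CustomerId customer = new CustomerId(7L);
        repo.save(new Order(new OrderId(1L), customer, "PENDING"));
        repo.save(new Order(new OrderId(1L), customer, "CONFIRMED")); // same id: replaces
        System.out.println(repo.pendingFor(customer).size());      // prints 0
        System.out.println(repo.get(new OrderId(2L)).isPresent()); // prints false
    }
}
```

A database-backed implementation would keep the same interface and only change what happens inside save, get, and pendingFor.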
An implementation with Spring Data JPA reduces the code to almost nothing:

    import java.util.List;
    import org.springframework.data.jpa.repository.JpaRepository;

    public interface OrderRepository extends JpaRepository<Order, Long> {
        // Spring Data derives the query from the method name
        List<Order> findByCustomerIdAndStatus(Long customerId, String status);
    }

Here JpaRepository already provides save, findById, etc., and Spring derives the query from the name findByCustomerIdAndStatus (which assumes that Order is a JPA entity with customerId and status properties). The business intent stays in the name, without manual SQL.
DAO vs Repository: how do they differ?
This is the most common confusion. Both abstract data access, but their intent and level differ:
| Aspect | DAO | Repository |
|---|---|---|
| Orientation | To the table / data source | To the domain aggregate |
| Vocabulary | Persistence (insert, update) | Business (collection of objects) |
| Granularity | One per table, normally | One per aggregate root |
| Origin | Classic data layer pattern | Tactical DDD pattern |
| Knows SQL | Yes, it exposes it conceptually | No, it hides it behind the "collection" |
In practice, a Repository usually relies on one or more DAOs or on an ORM. They are not mutually exclusive: the Repository is the face the domain sees, the DAO is the internal machinery.
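The layering described above can be sketched as a repository that delegates to a DAO. This is one possible arrangement under assumptions, not a prescribed design: OrderRow, DaoBackedOrderRepository, MapOrderDao, and LayeringDemo are hypothetical names, the Order aggregate is reduced to two fields, and the find-then-insert-or-update step ignores concurrency for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

record OrderId(long value) {}
record Order(OrderId id, String status) {}   // simplified aggregate

// Row-shaped data, as the DAO sees it (hypothetical).
record OrderRow(long id, String status) {}

interface OrderDao {
    void insert(OrderRow row);
    void update(OrderRow row);
    OrderRow findById(long id);   // null when absent, in DAO style
}

interface OrderRepository {
    void save(Order order);
    Optional<Order> get(OrderId id);
}

// The face the domain sees; the DAO is the internal machinery.
class DaoBackedOrderRepository implements OrderRepository {
    private final OrderDao dao;

    DaoBackedOrderRepository(OrderDao dao) { this.dao = dao; }

    @Override
    public void save(Order order) {
        OrderRow row = new OrderRow(order.id().value(), order.status());
        // Simplification: a real implementation must handle concurrent writers.
        if (dao.findById(row.id()) == null) {
            dao.insert(row);
        } else {
            dao.update(row);
        }
    }

    @Override
    public Optional<Order> get(OrderId id) {
        return Optional.ofNullable(dao.findById(id.value()))
                .map(row -> new Order(new OrderId(row.id()), row.status()));
    }
}

// Map-backed DAO so the sketch runs without a database.
class MapOrderDao implements OrderDao {
    private final Map<Long, OrderRow> rows = new HashMap<>();
    int inserts = 0;
    int updates = 0;

    @Override public void insert(OrderRow row) { rows.put(row.id(), row); inserts++; }
    @Override public void update(OrderRow row) { rows.put(row.id(), row); updates++; }
    @Override public OrderRow findById(long id) { return rows.get(id); }
}

class LayeringDemo {
    public static void main(String[] args) {
        MapOrderDao dao = new MapOrderDao();
        OrderRepository repo = new DaoBackedOrderRepository(dao);
        repo.save(new Order(new OrderId(1L), "PENDING"));   // becomes a DAO insert
        repo.save(new Order(new OrderId(1L), "CONFIRMED")); // becomes a DAO update
        System.out.println(dao.inserts + " insert, " + dao.updates + " update"); // prints 1 insert, 1 update
    }
}
```

The domain code calls save twice and never learns that one call became an insert and the other an update.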
The Unit of Work pattern
What happens when a business operation modifies several objects and they all must be committed together? If we call save one by one, we could commit half and fail on the other half. The Unit of Work solves this: it keeps a list of affected objects during a business operation and coordinates writing them out and managing the transaction as a single atomic unit.

    import org.springframework.transaction.annotation.Transactional;

    public class StockTransferService {
        private final OrderRepository orderRepo;
        private final StockRepository stockRepo;

        public StockTransferService(OrderRepository orderRepo, StockRepository stockRepo) {
            this.orderRepo = orderRepo;
            this.stockRepo = stockRepo;
        }

        @Transactional // delimits the Unit of Work: all or nothing
        public void confirmOrder(OrderId id) {
            Order order = orderRepo.get(id)
                    .orElseThrow(() -> new OrderNotFoundException(id));
            order.confirm();                  // changes the status in memory
            stockRepo.deduct(order.lines());  // deducts stock in memory
            orderRepo.save(order);            // marks for persistence
            // On exiting the method, @Transactional COMMITs EVERYTHING together;
            // if an unchecked exception escapes, it ROLLBACKs EVERYTHING.
        }
    }

How it works:
- Spring's @Transactional annotation delimits the Unit of Work: it opens a transaction on entry and commits it (COMMIT) on exiting without error, or rolls it back (ROLLBACK) if an unchecked exception (a RuntimeException or an Error) escapes the method. Checked exceptions do not trigger a rollback by default; use the rollbackFor attribute if you need that.
- The modifications to order and to stock accumulate and are applied atomically. It is not possible to confirm the order without deducting the stock.
- In JPA/Hibernate, the EntityManager is itself a Unit of Work: it tracks changes to the loaded entities (dirty checking) and writes them to the database on flush, which happens automatically before the transaction commits.
The Unit of Work provides two benefits: atomicity (the A in ACID, enforced from the application through the transaction boundary) and efficiency (writes are deferred until flush, so they can be batched and sent in fewer round-trips to the database).
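To see the mechanism without any framework, here is a toy, hand-rolled Unit of Work. It is a sketch under strong assumptions and does not reflect how Spring or Hibernate are implemented: InMemoryDatabase, UnitOfWork, StockChanges, and UnitOfWorkDemo are hypothetical, the "database" is a map of SKU to quantity, and atomicity is obtained by applying all changes to a private copy and publishing the copy only if every change succeeds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy "database": its state is only ever replaced wholesale on commit.
class InMemoryDatabase {
    private Map<String, Integer> stock = new HashMap<>();

    Map<String, Integer> snapshot() { return new HashMap<>(stock); }
    void replaceWith(Map<String, Integer> newState) { stock = newState; }
    Integer get(String sku) { return stock.get(sku); }
}

// Hypothetical Unit of Work: collects changes, applies them all or not at all.
class UnitOfWork {
    private final InMemoryDatabase db;
    private final List<Consumer<Map<String, Integer>>> pending = new ArrayList<>();

    UnitOfWork(InMemoryDatabase db) { this.db = db; }

    // Records a change; nothing touches the database yet.
    void registerChange(Consumer<Map<String, Integer>> change) { pending.add(change); }

    // Applies every change to a private copy and publishes it only on success.
    void commit() {
        Map<String, Integer> working = db.snapshot();
        try {
            for (Consumer<Map<String, Integer>> change : pending) {
                change.accept(working);
            }
            db.replaceWith(working);   // reached only if no change threw
        } finally {
            pending.clear();
        }
    }
}

class StockChanges {
    static Consumer<Map<String, Integer>> set(String sku, int quantity) {
        return stock -> stock.put(sku, quantity);
    }

    static Consumer<Map<String, Integer>> deduct(String sku, int quantity) {
        return stock -> {
            int current = stock.getOrDefault(sku, 0);
            if (current < quantity) {
                throw new IllegalStateException("Insufficient stock for " + sku);
            }
            stock.put(sku, current - quantity);
        };
    }
}

class UnitOfWorkDemo {
    public static void main(String[] args) {
        InMemoryDatabase db = new InMemoryDatabase();
        UnitOfWork setup = new UnitOfWork(db);
        setup.registerChange(StockChanges.set("A", 5));
        setup.commit();

        UnitOfWork uow = new UnitOfWork(db);
        uow.registerChange(StockChanges.deduct("A", 2));
        uow.registerChange(StockChanges.deduct("B", 1)); // fails: there is no stock for B
        try {
            uow.commit();
        } catch (IllegalStateException e) {
            System.out.println("rolled back");
        }
        System.out.println(db.get("A")); // prints 5: the first deduction was not published
    }
}
```

A real Unit of Work delegates the all-or-nothing guarantee to a database transaction rather than copying state, but the division of labor is the same: register changes during the operation, write them out together at the end.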
Data Mapper vs Active Record
These two patterns describe how an object relates to its representation in the database.
Active Record: the domain object contains its own persistence logic. The row and the object are the same thing.

    // Active Record style (typical of frameworks like Ruby on Rails or some in Java)
    Order order = Order.findById(42);
    order.setStatus("CONFIRMED");
    order.save(); // the object itself knows how to write itself to the database

Advantage: fast and direct for simple CRUD. Drawback: it mixes business and persistence in the same class, hindering testing and violating the separation of responsibilities.
Data Mapper: a separate layer (the "mapper") translates between the domain objects and the database. The domain object knows nothing about persistence.

    // Data Mapper style: the object is "ignorant" of the database
    Order order = new Order(/* pure business data */);
    order.confirm();               // only business logic, no save()
    entityManager.persist(order);  // the mapper (JPA) takes care of writing

Hibernate/JPA are implementations of Data Mapper. Comparison:
| Criterion | Active Record | Data Mapper |
|---|---|---|
| Domain-DB coupling | High (in the same class) | Low (separated) |
| Learning curve | Low | Medium |
| Domain testability | Limited | High (pure domain) |
| Suitable for | Simple CRUD, prototypes | Complex domains, DDD |
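The contrast in the table can be shown side by side in plain Java. The sketch below is illustrative only: InvoiceTable is a map standing in for a database table, and ActiveRecordInvoice, Invoice, InvoiceMapper, and MappingStylesDemo are hypothetical classes, not the API of any framework.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Stand-in for a database table: invoice id -> total.
class InvoiceTable {
    static final Map<Long, BigDecimal> ROWS = new HashMap<>();
}

// Active Record: business data and persistence live in the same class.
class ActiveRecordInvoice {
    private final long id;
    private BigDecimal total;

    ActiveRecordInvoice(long id, BigDecimal total) { this.id = id; this.total = total; }

    void applyDiscount(BigDecimal amount) { total = total.subtract(amount); }
    BigDecimal total() { return total; }

    void save() { InvoiceTable.ROWS.put(id, total); }   // the object stores itself

    static ActiveRecordInvoice findById(long id) {
        BigDecimal stored = InvoiceTable.ROWS.get(id);
        return stored == null ? null : new ActiveRecordInvoice(id, stored);
    }
}

// Data Mapper: the domain class holds only business rules...
class Invoice {
    private final long id;
    private BigDecimal total;

    Invoice(long id, BigDecimal total) { this.id = id; this.total = total; }

    void applyDiscount(BigDecimal amount) { total = total.subtract(amount); }
    long id() { return id; }
    BigDecimal total() { return total; }
}

// ...and a separate mapper moves data between the object and the table.
class InvoiceMapper {
    void persist(Invoice invoice) { InvoiceTable.ROWS.put(invoice.id(), invoice.total()); }

    Invoice find(long id) {
        BigDecimal stored = InvoiceTable.ROWS.get(id);
        return stored == null ? null : new Invoice(id, stored);
    }
}

class MappingStylesDemo {
    public static void main(String[] args) {
        ActiveRecordInvoice first = new ActiveRecordInvoice(1L, new BigDecimal("100"));
        first.applyDiscount(new BigDecimal("10"));
        first.save();
        System.out.println(ActiveRecordInvoice.findById(1L).total()); // prints 90

        Invoice second = new Invoice(2L, new BigDecimal("50"));
        second.applyDiscount(new BigDecimal("5"));
        InvoiceMapper mapper = new InvoiceMapper();
        mapper.persist(second);
        System.out.println(mapper.find(2L).total()); // prints 45
    }
}
```

Note that Invoice can be unit tested with no reference to InvoiceTable at all, whereas ActiveRecordInvoice cannot be compiled without it.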
Common Mistakes and Tips
- Leaking infrastructure exceptions into the domain. If a SQLException or a JPA PersistenceException bubbles up to the business service, you have broken the isolation. Translate them in the DAO/Repository.
- A "fat" Repository with hundreds of methods. If a repository grows out of control, the aggregate is probably poorly defined. Keep one repository per aggregate root.
- Returning managed JPA entities outside the transaction. Accessing an unloaded lazy association once the session is closed causes the dreaded LazyInitializationException. Return DTOs or load what you need inside the transaction.
- Putting @Transactional on private methods or relying on it for internal calls within the same class. Spring applies transactions through proxies, so the annotation has no effect on self-invocations. Place it at the public entry point.
- Tip: don't abstract for the sake of abstracting. In small applications, Spring Data JPA already gives you Repository + Unit of Work without writing a DAO by hand. Add layers only when they provide value.
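The self-invocation pitfall is easier to believe once you see it. The sketch below simulates the mechanism with a JDK dynamic proxy, one of the proxy types Spring can use; it is a simplified stand-in, not Spring's actual implementation. OrderOperations, OrderOperationsImpl, and TransactionalProxyDemo are hypothetical names, and recording the method name stands in for "begin a transaction".

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface OrderOperations {
    void confirmAll();
    void confirmOne();
}

class OrderOperationsImpl implements OrderOperations {
    @Override
    public void confirmAll() {
        confirmOne();   // self-invocation: the call goes to "this", not to the proxy
    }

    @Override
    public void confirmOne() {
        // business logic would go here
    }
}

class TransactionalProxyDemo {
    // Wraps the target in a proxy that records every call it intercepts.
    static OrderOperations wrap(OrderOperations target, List<String> intercepted) {
        InvocationHandler handler = (proxy, method, args) -> {
            intercepted.add(method.getName());   // stand-in for "begin transaction"
            return method.invoke(target, args);
        };
        return (OrderOperations) Proxy.newProxyInstance(
                OrderOperations.class.getClassLoader(),
                new Class<?>[] { OrderOperations.class },
                handler);
    }

    public static void main(String[] args) {
        List<String> intercepted = new ArrayList<>();
        OrderOperations proxied = wrap(new OrderOperationsImpl(), intercepted);
        proxied.confirmAll();
        System.out.println(intercepted); // prints [confirmAll]
    }
}
```

Only confirmAll appears in the list: the inner call to confirmOne never passed through the proxy, which is why an annotation on that method would have been ignored in this call path.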
Exercises
Exercise 1. Explain in your own words why a Repository returns Optional<Order> instead of null, and what risk it avoids.
Exercise 2. You have to register a new customer and create their first order in a single operation that must be atomic. Sketch the service method, indicating where you would place the transactional boundary (Unit of Work).
Exercise 3. Classify each situation as Active Record or Data Mapper: (a) the Invoice class has a save() method; (b) an EntityManager persists an Invoice object that contains only business rules.
Solutions
Solution 1. Optional<Order> explicitly expresses in the type that the order may not exist. It forces the caller to handle the absence (with orElseThrow, map, etc.) and avoids the NullPointerExceptions that arise from using an unchecked null. The intent of "there may be no result" is documented in the method signature.
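As a small illustration of how the caller is pushed to handle absence, consider this sketch. OrderLookup, its two helper methods, the two-field Order record, and OptionalDemo are hypothetical and exist only for this example.

```java
import java.util.Map;
import java.util.Optional;

record Order(long id, String status) {}

class OrderLookup {
    private final Map<Long, Order> orders = Map.of(1L, new Order(1L, "PENDING"));

    Optional<Order> get(long id) {
        return Optional.ofNullable(orders.get(id));
    }

    // Absence becomes a default value.
    String statusOrUnknown(long id) {
        return get(id).map(Order::status).orElse("UNKNOWN");
    }

    // Absence becomes an explicit, meaningful exception.
    Order require(long id) {
        return get(id).orElseThrow(() -> new IllegalArgumentException("No order " + id));
    }
}

class OptionalDemo {
    public static void main(String[] args) {
        OrderLookup lookup = new OrderLookup();
        System.out.println(lookup.statusOrUnknown(1L)); // prints PENDING
        System.out.println(lookup.statusOrUnknown(2L)); // prints UNKNOWN
    }
}
```

In both helpers the missing order is dealt with at the point of lookup, so no null reference can travel further into the business code.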
Solution 2.

    @Transactional // Unit of Work boundary: both saves are committed together
    public void registerCustomerWithFirstOrder(CustomerData data, OrderData line) {
        Customer customer = new Customer(data.name(), data.email());
        customerRepo.save(customer);
        Order order = customer.createOrder(line);
        orderRepo.save(order);
        // COMMIT on exit; if the second save fails, ROLLBACK of the customer too.
    }

The transactional boundary is placed on the service method (@Transactional), so that the creation of the customer and of the order form a single atomic unit.
Solution 3. (a) Active Record: the Invoice object contains its own persistence logic (save()). (b) Data Mapper: persistence lives outside (in the EntityManager) and the Invoice only has business rules.
Conclusion
You have seen how to isolate business logic from persistence details through four complementary patterns: the DAO centralizes SQL by data source, the Repository offers the domain a collection of objects in its own language, the Unit of Work guarantees the atomicity of operations that touch several objects, and the Data Mapper vs Active Record dichotomy decides how much the domain object knows about its own persistence. Frameworks like Spring Data JPA combine several of these patterns for you. So far we have assumed a single database; but in microservices architectures each service has its own, which opens new challenges of distributed queries and transactions. That is what we will address in the next lesson: Database per Service and Distributed Data Management.
