A system can do exactly what it was asked to do (meet all its functional requirements) and still be a failure because it is slow, crashes every week, cannot withstand an attack, or costs a fortune to maintain. All of that is governed by the quality attributes, also called non-functional requirements. They are, to a large extent, what architecture seeks to achieve: the "how well" versus the "what it does." In this lesson you will learn to identify them, to distinguish them from functional requirements, and, above all, to specify them in a measurable way through quality scenarios, because an attribute that cannot be measured cannot be designed for or verified.

Contents

  1. Functional versus non-functional
  2. Catalog of quality attributes
  3. The problem of vague requirements
  4. Quality scenarios: how to specify them
  5. Architectural tactics to achieve them
  6. Conflicts between attributes

  1. Functional versus non-functional

It is worth being clear about the boundary from the start.

| Type of requirement | Answers | Example |
|---|---|---|
| Functional | What does the system do? | "The user can transfer money between their accounts" |
| Non-functional (quality attribute) | How well does it do it? | "The transfer completes in under 2 seconds 99% of the time" |

Non-functional requirements are also known as quality attributes, NFRs (Non-Functional Requirements), or the -ilities (scalability, availability, etc.). They are a direct responsibility of the architecture: functionality can be implemented in many ways, but only certain structures achieve the required availability or performance.

  2. Catalog of quality attributes

Formal taxonomies exist (for example, the ISO/IEC 25010 standard). Below are the attributes most relevant in practice:

| Attribute | Brief definition | How it is usually measured |
|---|---|---|
| Performance | Response speed and efficient resource use | Latency (ms), throughput (requests/s) |
| Scalability | Ability to grow under more load | Concurrent users supported, cost per unit of load |
| Availability | Proportion of operational time | % of uptime (e.g., 99.9%), MTBF, MTTR |
| Reliability | Functioning correctly without failures | Error rate, failures per million operations |
| Security | Protecting data and functions from improper access | Vulnerabilities, regulatory compliance |
| Maintainability | Ease of change and correction | Average time to implement a change, complexity |
| Testability | Ease of testing the system | Coverage, test suite runtime |
| Usability | Ease of use for the end user | Task completion rate, learning time |
| Observability | Ability to understand internal state from the outside | Coverage of logs, metrics, and traces |
| Portability | Ease of moving the system to another environment | Migration effort |

Note on availability: the uptime percentage has very concrete consequences for the downtime tolerated per year.

| Availability | Maximum downtime per year (approx.) |
|---|---|
| 99% ("two nines") | ~3.65 days |
| 99.9% ("three nines") | ~8.77 hours |
| 99.99% ("four nines") | ~52.6 minutes |
| 99.999% ("five nines") | ~5.26 minutes |

Each additional "nine" drives up cost and complexity. That is why it is an attribute that must be negotiated with the business, not set to the maximum "just in case."
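
The downtime figures above follow from simple arithmetic: the tolerated downtime is the unavailable fraction multiplied by the length of a year. The sketch below reproduces them; the class and method names are hypothetical, and it assumes a 365.25-day year, which is what the approximate values in the table appear to use.

```java
// Sketch: converts an availability target into the downtime it allows per year.
// Assumption: a year of 365.25 days (525,960 minutes).
final class DowntimeBudget {
    static final double MINUTES_PER_YEAR = 365.25 * 24 * 60;

    /** Returns the maximum downtime, in minutes per year, for an availability percentage. */
    static double minutesPerYear(double availabilityPercent) {
        if (availabilityPercent < 0 || availabilityPercent > 100) {
            throw new IllegalArgumentException("availability must be between 0 and 100");
        }
        return MINUTES_PER_YEAR * (100.0 - availabilityPercent) / 100.0;
    }

    public static void main(String[] args) {
        double[] targets = {99.0, 99.9, 99.99, 99.999};
        for (double target : targets) {
            System.out.printf("%.3f%% -> %.2f minutes/year%n", target, minutesPerYear(target));
        }
    }
}
```

Each extra nine divides the downtime budget by ten, which is why the jump from three to four nines usually requires automated recovery rather than manual intervention.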

  3. The problem of vague requirements

Compare these two ways of expressing the same requirement:

  • Vague: "The system must be fast."
  • Measurable: "95% of product searches must respond in under 300 ms with 1,000 concurrent users."

The first is useless: fast for whom, in what operation, under what load, measured against what threshold? You cannot design against it or verify whether it is met. The second is measurable, specific, and verifiable. The golden rule is:

A quality attribute that is not measurable is not a requirement; it is a wish.

  4. Quality scenarios: how to specify them

The SEI (Software Engineering Institute) proposes quality scenarios as the standard way to specify non-functional attributes. A scenario has six parts:

| Part | Meaning | Example (performance) |
|---|---|---|
| Source of the stimulus | Who/what generates the event | An external user |
| Stimulus | The event that arrives | Launches a product search |
| Artifact | Part of the system affected | The catalog service |
| Environment | Conditions under which it occurs | At peak hour, with 1,000 concurrent users |
| Response | What the system does | Returns the search results |
| Response measure | Quantifiable success criterion | In under 300 ms 95% of the time |

Read as a sentence: "An external user launches a product search in the catalog service at peak hour with 1,000 concurrent users; the system returns the results in under 300 ms 95% of the time."

We can represent it in a structured way:

quality_scenario:
  attribute: performance
  source: external_user
  stimulus: product_search
  artifact: catalog_service
  environment: peak_hour_1000_concurrent_users
  response: return_search_results
  measure:
    metric: response_time_ms
    target: 300
    percentile: 95   # 95% of requests must meet the target

This YAML formalizes the scenario so it can be discussed and verified. What matters is not the specific format but that each scenario includes the six parts. The percentile: 95 field is key: specifying a percentile (P95, P99) instead of an average avoids deception, because a low average can hide a small percentage of disastrously slow requests that ruin the experience.
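
To see why a percentile is safer than an average, here is a minimal sketch that computes both for the same latency sample. The sample values are made up for illustration, the class name is hypothetical, and the percentile uses the nearest-rank method, which is only one of several common definitions.

```java
import java.util.Arrays;

// Sketch: compares the mean with a percentile of a latency sample.
// The sample below is invented; real data would come from load tests or monitoring.
final class LatencyStats {
    static double mean(long[] samplesMs) {
        return Arrays.stream(samplesMs).average()
                .orElseThrow(() -> new IllegalArgumentException("no samples"));
    }

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // 90 fast requests at 100 ms and 10 slow ones at 2,000 ms.
        long[] samples = new long[100];
        Arrays.fill(samples, 0, 90, 100L);
        Arrays.fill(samples, 90, 100, 2000L);

        System.out.println("mean = " + mean(samples) + " ms");
        System.out.println("p95  = " + percentile(samples, 95) + " ms");
        System.out.println("meets 300 ms at P95? " + (percentile(samples, 95) <= 300));
    }
}
```

With this sample the mean is 290 ms, which looks compliant with a 300 ms target, while P95 is 2,000 ms: one request in ten is far outside the limit, and only the percentile exposes it.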

An availability scenario could be written like this:

quality_scenario:
  attribute: availability
  source: system_node
  stimulus: unexpected_instance_crash
  artifact: payment_service
  environment: normal_operation
  response: reroute_traffic_to_healthy_instance_without_data_loss
  measure:
    uptime_target: "99.9%"
    max_recovery_time_seconds: 30

Here the stimulus comes not from a user but from an internal failure (an instance crashing). The expected response is that the system recovers automatically by rerouting traffic, with an availability target of 99.9% and a maximum recovery time of 30 seconds. Specifying it this way allows fault tolerance to be designed and tested concretely.
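
One way to make such a scenario testable is to check observed outages against its two measures. The following is only a sketch under stated assumptions: the class, its methods, and the outage durations are hypothetical, and it treats the duration of each outage as its recovery time.

```java
// Sketch: checks observed outages against an availability scenario's measures
// (uptime target and maximum recovery time). All data here is invented.
final class AvailabilityCheck {
    /** Uptime percentage over an observation window, given outage durations in seconds. */
    static double uptimePercent(long windowSeconds, long[] outageSeconds) {
        if (windowSeconds <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        long down = 0;
        for (long outage : outageSeconds) {
            down += outage;
        }
        return 100.0 * (windowSeconds - down) / windowSeconds;
    }

    /** True when every outage recovered within the limit and the uptime target was met. */
    static boolean meetsScenario(long windowSeconds, long[] outageSeconds,
                                 double uptimeTargetPercent, long maxRecoverySeconds) {
        for (long outage : outageSeconds) {
            if (outage > maxRecoverySeconds) {
                return false;
            }
        }
        return uptimePercent(windowSeconds, outageSeconds) >= uptimeTargetPercent;
    }

    public static void main(String[] args) {
        long thirtyDays = 30L * 24 * 60 * 60;   // observation window in seconds
        long[] quickRecoveries = {12, 25, 8};   // three crashes, each recovered in under 30 s
        long[] oneSlowRecovery = {12, 45};      // one recovery took 45 s
        System.out.println(meetsScenario(thirtyDays, quickRecoveries, 99.9, 30));
        System.out.println(meetsScenario(thirtyDays, oneSlowRecovery, 99.9, 30));
    }
}
```

Note that the second data set fails the scenario even though its total uptime is well above 99.9%: the two measures are independent, and both have to hold.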

  5. Architectural tactics to achieve them

A tactic is a design decision that influences a specific quality attribute. Some common ones:

  • Performance: caching, indexes, load balancing, asynchronous processing.
  • Availability: redundancy, automatic failover, health checks, circuit breakers.
  • Scalability: horizontal scaling (more instances), data partitioning (sharding), queues to absorb spikes.
  • Security: authentication and authorization, encryption in transit and at rest, principle of least privilege.
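  • Maintainability: low coupling, high cohesion, modularity, automated tests.

// Example of an availability tactic: a simplified circuit breaker.
// If a dependent service fails repeatedly, we stop calling it for a while
// to avoid exhausting resources and give it room to recover ("fail fast").
// Response and RemoteService stand in for the application's own types.
// Not thread-safe: this is a sketch of the concept only.
public class CircuitBreaker {
    private final int threshold = 5;              // after 5 failures in a row, we open the circuit
    private final long coolDownMillis = 30_000;   // illustrative value: how long the circuit stays open
    private int consecutiveFailures = 0;
    private boolean open = false;
    private long openedAt = 0;                    // when the circuit was last opened

    public Response call(RemoteService service) {
        if (open && System.currentTimeMillis() - openedAt < coolDownMillis) {
            // Circuit open: we don't call; we return a fallback response.
            return Response.degraded();
        }
        // Circuit closed, or cool-down elapsed: this call goes through
        // (in the second case it acts as a trial call).
        try {
            Response r = service.invoke();
            consecutiveFailures = 0;              // success: we reset the counter
            open = false;                         // and close the circuit
            return r;
        } catch (Exception e) {
            consecutiveFailures++;
            if (consecutiveFailures >= threshold) {
                open = true;                      // too many failures: we open the circuit
                openedAt = System.currentTimeMillis();
            }
            return Response.degraded();
        }
    }
}

This Java example implements, in a very simplified way, the circuit breaker pattern, a classic availability tactic. The idea: if a remote service fails threshold times in a row (consecutiveFailures >= threshold), the circuit "opens" (open = true) and we stop calling it, returning an immediate degraded response instead of waiting for it to fail again. Once coolDownMillis has elapsed, the next call is let through as a trial: a success closes the circuit and a failure reopens it. This prevents a downed service from dragging down the entire system (domino effect) and gives it time to recover. The class is not thread-safe, and the 30-second cool-down is an illustrative value. In production you would use mature libraries such as Resilience4j, but this is the concept.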

  6. Conflicts between attributes

Quality attributes are almost never independent: improving one usually harms another. That is why architecture is the art of compromise (which we will look at in detail in the next lesson).

| If you improve... | You may harm... | Reason |
|---|---|---|
| Security (more encryption/validation) | Performance | Encryption and checks consume time |
| Availability (more redundancy) | Cost | More servers and replication cost money |
| Performance (more caching) | Consistency | Cached data can become stale |
| Scalability (distributed systems) | Maintainability/simplicity | Distributed systems are more complex to operate and debug |
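
The conflict between caching and consistency can be made concrete with a small time-to-live cache. This is a minimal sketch, not a production cache: the class name, the injected clock, and the sample data are all assumptions made for illustration, and the class is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Sketch: a tiny time-to-live cache. Reads within the TTL skip the slow source
// (better performance) but may return an outdated value (weaker consistency).
final class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long loadedAt;
        Entry(T value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> source;   // the slow, authoritative lookup
    private final long ttlMillis;
    private final LongSupplier clock;      // injected so the example can control time

    TtlCache(Function<K, V> source, long ttlMillis, LongSupplier clock) {
        this.source = source;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAt >= ttlMillis) {
            entry = new Entry<>(source.apply(key), now);   // miss or expired: reload
            entries.put(key, entry);
        }
        return entry.value;
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = new HashMap<>();      // stands in for a database
        stock.put("sku-1", 10);
        long[] now = {0};
        TtlCache<String, Integer> cache = new TtlCache<>(stock::get, 1000, () -> now[0]);

        System.out.println(cache.get("sku-1"));   // first read: loaded from the source
        stock.put("sku-1", 7);                    // the source changes
        now[0] = 500;
        System.out.println(cache.get("sku-1"));   // within the TTL: the cached value is returned
        now[0] = 1000;
        System.out.println(cache.get("sku-1"));   // TTL expired: reloaded from the source
    }
}
```

The TTL is the knob that expresses the trade-off: a longer TTL means fewer trips to the source and a longer window in which readers can see stale data.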

The practical conclusion is that there is no "optimal" architecture in the abstract: there is the architecture appropriate for a prioritized set of quality attributes. That is why the architect's first job is to get the business to prioritize: what matters more, cost or availability? speed or perfect consistency?

Common Mistakes and Tips

  • Leaving NFRs for the end. Quality attributes must be captured at the start, because they shape the entire structure. Adding availability to a system that was not designed for it is extremely expensive.
  • Specifying them vaguely. "Fast," "secure," "scalable" are not requirements. Use quality scenarios with concrete measures.
  • Asking for the maximum in everything. Wanting five nines of availability, minimal latency, and minimal cost all at once is contradictory. You have to prioritize.
  • Using averages instead of percentiles. An average hides the worst cases. Measure P95/P99.
  • Tip: turn every NFR into something verifiable. If you don't know how to measure that it is met, rewrite it until you can.

Exercises

Exercise 1. Rewrite the following vague requirement as a quality scenario with its six parts: "The login system must be secure and fast."

Exercise 2. An online store requires 99.99% availability. Roughly how much downtime per year does it tolerate? What two tactics would you propose to achieve it, and which attribute would be harmed?

Exercise 3. Identify the conflict between attributes in this decision: "We are going to encrypt all data in the database and validate every field in three separate layers." Which attribute improves and which suffers?

Solutions

Solution 1. Example scenario for the performance part of login: Source: a registered user. Stimulus: submits their credentials. Artifact: the authentication service. Environment: normal operation with 500 concurrent users. Response: validates the credentials and returns a session token. Measure: in under 500 ms 99% of the time. (For the security part, a separate scenario would be written, e.g.: after 5 failed attempts in 1 minute from the same IP, the system locks the account and logs the security event.)

Solution 2. 99.99% tolerates approximately 52.6 minutes of downtime per year. Possible tactics: redundancy with several instances in different zones, and automatic failover with a load balancer and health checks. The harmed attribute is primarily cost (more infrastructure and replication), and secondarily operational complexity/maintainability.

Solution 3. It improves security (encryption and exhaustive validation). Performance suffers: encryption/decryption and triple validation add latency and CPU consumption on every operation. It can also affect maintainability if the validation is duplicated across three layers without a single source of truth.

Conclusion

You have learned to distinguish functional from non-functional requirements, to handle a catalog of quality attributes, to specify them in a measurable way with six-part quality scenarios, to apply tactics to achieve them, and to recognize that they almost always conflict with one another. It is precisely that constant clash between attributes—availability versus cost, performance versus consistency—that is at the heart of architectural work. In the next lesson we will tackle head-on how those hard decisions are made: architectural decisions and trade-offs, including formal evaluation methods such as ATAM.

© Copyright 2026. All rights reserved