In this section, we will explore the Kafka Schema Registry, a critical component for managing and enforcing data schemas in Kafka. This module will cover the following topics:
- Introduction to Schema Registry
- Setting Up Schema Registry
- Using Schema Registry with Kafka Producers
- Using Schema Registry with Kafka Consumers
- Schema Evolution and Compatibility
- Practical Exercises
Introduction to Schema Registry
What is Schema Registry?
The Schema Registry is a service that provides a RESTful interface for storing and retrieving Avro schemas. It is part of the Confluent Platform and is used to manage the schemas for Kafka topics, ensuring that data is serialized and deserialized consistently.
Key Features
- Centralized Schema Management: Stores all schemas in a central repository.
- Schema Versioning: Supports multiple versions of schemas.
- Compatibility Checks: Ensures that new schema versions are compatible with previous versions.
- RESTful Interface: Provides a REST API for managing schemas.
Why Use Schema Registry?
- Data Consistency: Ensures that data produced and consumed is consistent with the defined schema.
- Schema Evolution: Allows for the evolution of schemas over time without breaking existing data.
- Interoperability: Facilitates interoperability between different systems and applications.
Setting Up Schema Registry
Prerequisites
- A Kafka cluster up and running.
- The Confluent Platform installed.
Installation Steps
1. Download and install Schema Registry:

```bash
wget http://packages.confluent.io/archive/6.0/confluent-6.0.0.tar.gz
tar -xvf confluent-6.0.0.tar.gz
cd confluent-6.0.0
```

2. Start Schema Registry:

```bash
./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties
```

3. Verify Schema Registry is running: query the REST API at http://localhost:8081. On a fresh installation, a request to http://localhost:8081/subjects should return an empty JSON array (`[]`).
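If you prefer to script this check, the snippet below queries two standard REST endpoints. This is a minimal sketch, assuming the third-party `requests` library is installed and Schema Registry is running on its default port 8081.

```python
import requests  # third-party HTTP client: pip install requests

BASE_URL = 'http://localhost:8081'  # default Schema Registry port

# List all registered subjects; a fresh installation returns [].
print(requests.get(f'{BASE_URL}/subjects').json())

# Show the global compatibility level (BACKWARD by default).
print(requests.get(f'{BASE_URL}/config').json())
```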
Using Schema Registry with Kafka Producers
Registering a Schema
Before producing messages, you need to register the schema with the Schema Registry.
```python
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

schema_registry_url = 'http://localhost:8081'

# Avro schema for the message value, defined as a JSON string.
value_schema_str = """
{
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"}
    ]
}
"""
value_schema = avro.loads(value_schema_str)

# AvroProducer handles Avro serialization and schema registration.
avro_producer = AvroProducer({
    'bootstrap.servers': 'localhost:9092',
    'schema.registry.url': schema_registry_url
}, default_value_schema=value_schema)

# The schema is registered with the Schema Registry on the first produce call.
avro_producer.produce(topic='users', value={'name': 'John', 'age': 30})
avro_producer.flush()
```
Explanation
- Schema Definition: The schema is defined in Avro format as a JSON string.
- AvroProducer: The `AvroProducer` class is used to produce messages with Avro serialization.
- Schema Registration: The schema is automatically registered with the Schema Registry the first time a message is produced.
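As a side note, recent versions of the confluent-kafka-python client treat `AvroProducer` as legacy in favor of an explicit `AvroSerializer`. The sketch below shows the same produce flow with the newer API; it assumes the `users` topic and `User` schema from the example above and a client version of roughly 1.5 or later.

```python
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer

value_schema_str = """
{
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"}
    ]
}
"""

# Client for the Schema Registry REST API.
schema_registry_client = SchemaRegistryClient({'url': 'http://localhost:8081'})

# Serializer that registers the schema and Avro-encodes each value.
avro_serializer = AvroSerializer(schema_registry_client, value_schema_str)

producer = SerializingProducer({
    'bootstrap.servers': 'localhost:9092',
    'value.serializer': avro_serializer,
})
producer.produce(topic='users', value={'name': 'John', 'age': 30})
producer.flush()
```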
Using Schema Registry with Kafka Consumers
Consuming Messages with Schema Registry
```python
from confluent_kafka.avro import AvroConsumer

schema_registry_url = 'http://localhost:8081'

# AvroConsumer fetches schemas from the registry to deserialize values.
avro_consumer = AvroConsumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'user-consumers',
    'schema.registry.url': schema_registry_url
})
avro_consumer.subscribe(['users'])

while True:
    msg = avro_consumer.poll(10)
    if msg is None:
        continue
    if msg.error():
        print(f"Consumer error: {msg.error()}")
        continue
    print(f"Received message: {msg.value()}")
    avro_consumer.commit()
```
Explanation
- AvroConsumer: The `AvroConsumer` class is used to consume messages with Avro deserialization.
- Schema Deserialization: The writer's schema is fetched from the Schema Registry, using the schema ID embedded in each message, to deserialize the value.
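For completeness, here is a matching sketch with the newer consumer API (`DeserializingConsumer` plus `AvroDeserializer`), under the same client-version assumption as the producer note above:

```python
from confluent_kafka import DeserializingConsumer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroDeserializer

schema_registry_client = SchemaRegistryClient({'url': 'http://localhost:8081'})

# The deserializer looks up the writer's schema by the ID embedded in each message.
avro_deserializer = AvroDeserializer(schema_registry_client)

consumer = DeserializingConsumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'user-consumers',
    'value.deserializer': avro_deserializer,
    'auto.offset.reset': 'earliest',
})
consumer.subscribe(['users'])

while True:
    msg = consumer.poll(10)
    if msg is None or msg.error():
        continue
    print(f"Received message: {msg.value()}")
```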
Schema Evolution and Compatibility
Schema Evolution
Schema evolution allows you to update the schema over time. For example, you might add a new field to the schema.
Compatibility Types
- Backward Compatibility: New schema can read data written by the old schema.
- Forward Compatibility: Old schema can read data written by the new schema.
- Full Compatibility: Both backward and forward compatibility.
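Compatibility is enforced per subject when a new schema version is registered; the default level is BACKWARD. The level can be changed through the REST API, as sketched below using the third-party `requests` library and the subject `users-value` (by default, a topic's value schema is registered under the subject `<topic>-value`).

```python
import requests

BASE_URL = 'http://localhost:8081'
HEADERS = {'Content-Type': 'application/vnd.schemaregistry.v1+json'}

# Set the compatibility level for the subject that holds the
# value schema of the 'users' topic.
resp = requests.put(
    f'{BASE_URL}/config/users-value',
    headers=HEADERS,
    json={'compatibility': 'BACKWARD'},
)
print(resp.json())  # e.g. {'compatibility': 'BACKWARD'}
```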
Example of Schema Evolution
Original Schema:
{ "namespace": "example.avro", "type": "record", "name": "User", "fields": [ {"name": "name", "type": "string"}, {"name": "age", "type": "int"} ] }
Updated Schema:
{ "namespace": "example.avro", "type": "record", "name": "User", "fields": [ {"name": "name", "type": "string"}, {"name": "age", "type": "int"}, {"name": "email", "type": ["null", "string"], "default": null} ] }
Practical Exercises
Exercise 1: Register a New Schema
- Define a new Avro schema for a `Product` record with fields `id` (int), `name` (string), and `price` (float).
- Register the schema with the Schema Registry.
- Produce a message using the new schema.
Solution
```python
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

schema_registry_url = 'http://localhost:8081'

value_schema_str = """
{
    "namespace": "example.avro",
    "type": "record",
    "name": "Product",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "price", "type": "float"}
    ]
}
"""
value_schema = avro.loads(value_schema_str)

avro_producer = AvroProducer({
    'bootstrap.servers': 'localhost:9092',
    'schema.registry.url': schema_registry_url
}, default_value_schema=value_schema)

avro_producer.produce(topic='products', value={'id': 1, 'name': 'Laptop', 'price': 999.99})
avro_producer.flush()
```
Exercise 2: Consume Messages with Schema Registry
- Consume messages from the `products` topic.
- Print the consumed messages to the console.
Solution
```python
from confluent_kafka.avro import AvroConsumer

schema_registry_url = 'http://localhost:8081'

avro_consumer = AvroConsumer({
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'product-consumers',
    'schema.registry.url': schema_registry_url
})
avro_consumer.subscribe(['products'])

while True:
    msg = avro_consumer.poll(10)
    if msg is None:
        continue
    if msg.error():
        print(f"Consumer error: {msg.error()}")
        continue
    print(f"Received message: {msg.value()}")
    avro_consumer.commit()
```
Conclusion
In this module, we covered the Kafka Schema Registry, its setup, and how to use it with Kafka producers and consumers. We also discussed schema evolution and compatibility, which are crucial for maintaining data consistency over time. The practical exercises provided hands-on experience with registering schemas and consuming messages using the Schema Registry.
Next, we will delve into more advanced topics in Kafka, such as performance tuning and multi-data center setups.