Introduction
Understanding the architecture of Elasticsearch is crucial for effectively using and managing it. This section will cover the core components and how they interact to provide a scalable, distributed search and analytics engine.
Key Components of Elasticsearch Architecture
- Cluster
- Definition: A cluster is a collection of one or more nodes (servers) that together hold all of your data and provide federated indexing and search capabilities across those nodes.
- Naming: A cluster is identified by its name, which defaults to "elasticsearch". A node joins a cluster by name, so all nodes in the same cluster must be configured with the same cluster name.
- Node
- Definition: A node is a single server that is part of a cluster, stores data, and participates in the cluster’s indexing and search capabilities.
- Types of Nodes:
- Master Node: Responsible for cluster-wide settings and operations, such as creating or deleting an index, tracking nodes in the cluster, and deciding which shards to allocate to which nodes.
- Data Node: Stores data and performs data-related operations such as CRUD, search, and aggregations.
- Ingest Node: Preprocesses documents before indexing, such as enriching data or transforming it.
- Coordinating Node: Routes each request to the nodes that hold the relevant shards, then merges (reduces) the per-shard results of search and aggregation requests into a single response.
- Index
- Definition: An index is a collection of documents that have somewhat similar characteristics. It is identified by a name, which is used to refer to the index when performing indexing, search, update, and delete operations.
- Structure: An index is divided into one or more primary shards, and each primary shard can have zero or more replicas.
- Shard
- Definition: A shard is a single, self-contained Lucene index. It is the basic unit of storage and search in Elasticsearch.
- Types of Shards:
- Primary Shard: The original shard that holds the data.
- Replica Shard: A copy of the primary shard, used for failover and increased search throughput.
- Document
- Definition: A document is a basic unit of information that can be indexed. It is expressed in JSON (JavaScript Object Notation) format.
- Structure: Each document is stored in an index and has a unique identifier.
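To make the relationship between documents, shards, and the coordinating role more concrete, here is a toy in-memory sketch in Python. It is not how Elasticsearch is implemented: the function names are hypothetical, `zlib.crc32` is only a stand-in for Elasticsearch's internal routing hash, and each shard is modeled as a plain dictionary. It assumes an index with two primary shards.

```python
import heapq
import zlib

NUM_PRIMARY_SHARDS = 2  # assumption: a small index with two primary shards


def shard_for(doc_id, num_primary_shards=NUM_PRIMARY_SHARDS):
    """Map a document ID to a primary shard number.

    Stand-in only: Elasticsearch applies its own hash function here;
    CRC32 just keeps this sketch deterministic.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def index_documents(docs, num_primary_shards=NUM_PRIMARY_SHARDS):
    """Distribute {doc_id: body} across shards, each modeled as a dict."""
    shards = [{} for _ in range(num_primary_shards)]
    for doc_id, body in docs.items():
        shards[shard_for(doc_id, num_primary_shards)][doc_id] = body
    return shards


def search_top(shards, field, top_k):
    """Coordinating-style search: query each shard, then merge the results."""
    # Scatter: every shard returns only its own local top_k hits.
    per_shard_hits = [
        heapq.nlargest(top_k, ((body[field], doc_id) for doc_id, body in shard.items()))
        for shard in shards
    ]
    # Gather: reduce the per-shard hits to one global top_k list.
    merged = heapq.nlargest(top_k, (hit for hits in per_shard_hits for hit in hits))
    return [doc_id for _, doc_id in merged]


# Hypothetical documents, each a JSON-like body keyed by its unique ID.
docs = {
    "a": {"price": 10},
    "b": {"price": 30},
    "c": {"price": 20},
    "d": {"price": 5},
}
shards = index_documents(docs)
print(search_top(shards, "price", 2))  # IDs of the two highest-priced documents
```

Whichever shard each document lands on, the merged result is the same. This is the point of the reduce step: each shard only knows its local hits, and the coordinating role combines them into one answer.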
Elasticsearch Architecture Diagram
Below is a simplified diagram to illustrate the architecture:
Cluster
├── Node 1 (Master + Data)
│   ├── Index 1
│   │   ├── Primary Shard 1
│   │   └── Replica Shard 2
│   └── Index 2
│       ├── Primary Shard 3
│       └── Replica Shard 4
├── Node 2 (Data)
│   ├── Index 1
│   │   ├── Primary Shard 2
│   │   └── Replica Shard 1
│   └── Index 2
│       ├── Primary Shard 4
│       └── Replica Shard 3
└── Node 3 (Ingest)
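One property of this layout is worth checking: a replica is never placed on the same node as its primary, which is what lets the cluster survive the loss of a node. The Python sketch below encodes the diagram by hand and checks that property. It assumes "Replica Shard N" denotes the copy of "Primary Shard N"; the helper names are hypothetical.

```python
# Layout copied from the diagram: node -> list of (kind, shard_number).
LAYOUT = {
    "Node 1": [("primary", 1), ("replica", 2), ("primary", 3), ("replica", 4)],
    "Node 2": [("primary", 2), ("replica", 1), ("primary", 4), ("replica", 3)],
    "Node 3": [],  # ingest-only node, holds no shards
}


def colocated_copies(layout):
    """Return (node, shard) pairs where a primary and its replica share a node."""
    problems = []
    for node, shards in layout.items():
        primaries = {number for kind, number in shards if kind == "primary"}
        replicas = {number for kind, number in shards if kind == "replica"}
        problems.extend((node, number) for number in sorted(primaries & replicas))
    return problems


def survives_loss_of(layout, lost_node):
    """True if every shard still has at least one copy after losing a node."""
    all_shards = {number for shards in layout.values() for _, number in shards}
    remaining = {
        number
        for node, shards in layout.items()
        if node != lost_node
        for _, number in shards
    }
    return all_shards == remaining


print(colocated_copies(LAYOUT))            # []
print(survives_loss_of(LAYOUT, "Node 1"))  # True
print(survives_loss_of(LAYOUT, "Node 2"))  # True
```

Because each data node holds one copy of every shard, either data node can fail without any shard becoming unavailable.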
Practical Example: Setting Up a Cluster
Step 1: Configure the Cluster Name
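In the elasticsearch.yml configuration file, set the cluster name (replace my-cluster with your own name; it must be identical on every node):
cluster.name: my-cluster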
Step 2: Configure Node Roles
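Define the roles of each node in its elasticsearch.yml file. The boolean settings below are the legacy form: they were deprecated in Elasticsearch 7.9 in favor of the node.roles setting and removed in 8.0.
# For a master node
node.master: true
node.data: false
node.ingest: false

# For a data node
node.master: false
node.data: true
node.ingest: false

# For an ingest node
node.master: false
node.data: false
node.ingest: true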
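For versions that use the node.roles list, the small Python sketch below shows how the three boolean combinations map onto a node.roles line. The helper name is hypothetical, and the mapping covers only the master, data, and ingest roles discussed in this section, so treat it as an approximation rather than a complete migration tool.

```python
def node_roles_line(master=False, data=False, ingest=False):
    """Render a node.roles line for the given role flags."""
    roles = [
        name
        for name, enabled in (("master", master), ("data", data), ("ingest", ingest))
        if enabled
    ]
    # An empty list leaves the node with no roles, i.e. coordinating only.
    return "node.roles: [" + ", ".join(roles) + "]"


print(node_roles_line(master=True))  # node.roles: [master]
print(node_roles_line(data=True))    # node.roles: [data]
print(node_roles_line(ingest=True))  # node.roles: [ingest]
```

Each printed line would replace the corresponding three-line block in that node's elasticsearch.yml.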
Step 3: Start the Nodes
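Start each node by running the Elasticsearch startup script from that node's installation directory (shown here for an archive installation on Linux or macOS):
./bin/elasticsearch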
Step 4: Verify the Cluster
Use the following commands to check the cluster health and node information:
curl -X GET "localhost:9200/_cluster/health?pretty"
curl -X GET "localhost:9200/_cat/nodes?v"
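The health check can also be scripted. The Python sketch below is a minimal example, assuming a node reachable over plain HTTP at localhost:9200 without authentication. The function names and the sample payload are hypothetical; the field names are those returned by the cluster health API.

```python
import json
import urllib.request

STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primary shards are allocated, but some replicas are not",
    "red": "at least one primary shard is not allocated",
}


def fetch_health(base_url="http://localhost:9200"):
    """GET /_cluster/health and return the parsed JSON body."""
    with urllib.request.urlopen(base_url + "/_cluster/health") as response:
        return json.load(response)


def summarize_health(health):
    """Turn a cluster health payload into a one-line summary."""
    status = health["status"]
    return (
        f"{health['cluster_name']}: {status} "
        f"({STATUS_MEANING.get(status, 'unknown status')}); "
        f"nodes={health['number_of_nodes']}, "
        f"data nodes={health['number_of_data_nodes']}, "
        f"unassigned shards={health['unassigned_shards']}"
    )


# Hypothetical payload for illustration; use fetch_health() against a live node.
sample = {
    "cluster_name": "my-cluster",
    "status": "yellow",
    "number_of_nodes": 3,
    "number_of_data_nodes": 1,
    "unassigned_shards": 2,
}
print(summarize_health(sample))
```

Calling `summarize_health(fetch_health())` against a running node prints the same kind of summary for the real cluster.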
Exercises
Exercise 1: Create a Cluster
- Set up a three-node Elasticsearch cluster with one master node, one data node, and one ingest node.
- Verify the cluster health and node roles using the provided commands.
Solution
- Configure the elasticsearch.yml files for each node as described in the practical example.
- Start each node and verify the cluster using:
curl -X GET "localhost:9200/_cluster/health?pretty"
curl -X GET "localhost:9200/_cat/nodes?v"
Exercise 2: Add an Index and Shards
- Create an index named test_index with 2 primary shards and 1 replica.
- Verify the index and shard allocation.
Solution
- Create the index using:
curl -X PUT "localhost:9200/test_index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    }
  }
}
'
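- Verify the index and shard allocation using:
curl -X GET "localhost:9200/_cat/indices/test_index?v"
curl -X GET "localhost:9200/_cat/shards/test_index?v"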
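To know what to expect when verifying, the shard arithmetic for test_index can be sketched in Python. The helper names are hypothetical, and the calculation assumes the only allocation constraint is that a primary and its replicas must sit on different data nodes; real clusters apply further rules, such as disk-based limits.

```python
import json


def index_settings(primaries, replicas):
    """Build the request body used to create the index."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primaries,
                "number_of_replicas": replicas,
            }
        }
    }


def expected_shard_copies(primaries, replicas):
    """Total shard copies: each primary plus its replicas."""
    return primaries * (1 + replicas)


def assignable_copies(primaries, replicas, data_nodes):
    """Copies that can be placed when each copy of a shard needs its own node."""
    copies_per_shard = min(1 + replicas, data_nodes)
    return primaries * copies_per_shard


print(json.dumps(index_settings(2, 1), indent=2))
print(expected_shard_copies(2, 1))  # 4
print(assignable_copies(2, 1, 1))   # 2
print(assignable_copies(2, 1, 2))   # 4
```

On the three-node cluster from Exercise 1, only one node holds data, so the two replica copies stay unassigned and the cluster health reports yellow rather than green. Adding a second data node allows all four copies to be allocated.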
Conclusion
In this section, we covered the fundamental components of Elasticsearch architecture, including clusters, nodes, indices, shards, and documents. We also provided practical steps to set up a cluster and manage indices and shards. Understanding these concepts is essential for effectively using Elasticsearch and preparing for more advanced topics.