Introduction

Understanding the architecture of Elasticsearch is crucial for using and managing it effectively. This section covers the core components and how they interact to form a scalable, distributed search and analytics engine.

Key Components of Elasticsearch Architecture

  1. Cluster

  • Definition: A cluster is a collection of one or more nodes (servers) that together hold all of your data and provide federated indexing and search across them.
  • Naming: Each cluster is identified by its name, which defaults to "elasticsearch". A node joins a cluster only if its cluster.name setting matches that name, so every node in the same cluster must use the same value.
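To confirm that two nodes share a cluster name, you can compare the cluster_name field each node reports from its root endpoint (GET /). The following is a minimal bash sketch, assuming the response is plain JSON and that sed is available; the function name cluster_name_of and the second port 9201 are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cluster_name field from the JSON that
# a node returns for GET / (read from stdin). Uses sed instead of jq so
# it needs no extra tools; it assumes the field appears once.
cluster_name_of() {
  sed -n 's/.*"cluster_name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example (assumes two local nodes on ports 9200 and 9201):
#   a=$(curl -s localhost:9200 | cluster_name_of)
#   b=$(curl -s localhost:9201 | cluster_name_of)
#   [ "$a" = "$b" ] && echo "same cluster: $a"
```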

  2. Node

  • Definition: A node is a single running instance of Elasticsearch that belongs to a cluster. Depending on its roles, a node may store data, manage the cluster, preprocess documents, or coordinate requests.
  • Types of Nodes:
    • Master Node: A master-eligible node can be elected as the master of the cluster. The elected master is responsible for cluster-wide settings and operations, such as creating or deleting an index, tracking nodes in the cluster, and deciding which shards to allocate to which nodes.
    • Data Node: Stores data and performs data-related operations such as CRUD, search, and aggregations.
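    • Ingest Node: Runs ingest pipelines that preprocess documents before they are indexed, for example to transform or enrich them.
    • Coordinating Node: Routes requests to the nodes that hold the relevant shards, then gathers and reduces the per-shard results of searches and aggregations into a single response. Every node performs this role; a node with no other roles is a dedicated coordinating-only node.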
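You can see which roles each node actually has with the cat nodes API. Its node.role column abbreviates roles as letters, for example m for master-eligible, d for data, i for ingest, and - for a coordinating-only node. The bash sketch below filters that output by role letter; it assumes you request only the name and node.role columns with h=name,node.role, and the function name nodes_with_role is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "name node.role" lines (the output of
# GET _cat/nodes?h=name,node.role) from stdin and print the names of
# nodes whose role string contains the given letter.
nodes_with_role() {
  local letter="$1"
  awk -v r="$letter" 'index($2, r) > 0 { print $1 }'
}

# Example (assumes a node listening on localhost:9200):
#   curl -s "localhost:9200/_cat/nodes?h=name,node.role" | nodes_with_role m
```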

  3. Index

  • Definition: An index is a collection of documents that have somewhat similar characteristics. It is identified by a name, which is used to refer to the index when performing indexing, search, update, and delete operations.
  • Structure: An index is split into one or more primary shards, and each primary shard can have zero or more replicas.
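The number of shard copies an index occupies follows directly from these two settings: every primary shard plus each of its replicas, that is number_of_shards × (1 + number_of_replicas). A small bash sketch of that arithmetic (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: total shard copies for an index, computed as
# number_of_shards * (1 + number_of_replicas).
total_shard_copies() {
  local primaries="$1" replicas="$2"
  echo $(( primaries * (1 + replicas) ))
}

total_shard_copies 2 1   # 2 primaries + 2 replicas -> prints 4
total_shard_copies 3 2   # 3 primaries + 6 replicas -> prints 9
```

Because a replica is never placed on the same node as its primary, an index with number_of_replicas set to R needs at least R + 1 data nodes before every copy can be assigned.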

  4. Shard

  • Definition: A shard is a single Lucene instance. It is the basic unit of storage and search in Elasticsearch.
  • Types of Shards:
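    • Primary Shard: Holds the original copy of the data. Each document belongs to exactly one primary shard, and writes are applied to the primary first and then copied to its replicas.
    • Replica Shard: A copy of a primary shard, used for failover and to increase search throughput. A replica is never allocated to the same node as its primary.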
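Elasticsearch decides which primary shard a document belongs to by hashing its routing value (the document _id by default) and taking the result modulo the number of primary shards, roughly shard = hash(_routing) % number_of_primary_shards. The bash sketch below illustrates only the idea: it substitutes cksum for the Murmur3 hash Elasticsearch actually uses, so it will not reproduce a real cluster's shard assignments, and route_to_shard is a hypothetical name. Because the result depends on the number of primary shards, that number is fixed when the index is created.

```shell
#!/usr/bin/env bash
# Illustration only: map a routing value (e.g. a document _id) to a
# primary shard with hash(routing) % number_of_primary_shards.
# cksum (a CRC) stands in for Elasticsearch's Murmur3 hash, so the
# shard numbers differ from what a real cluster would choose.
route_to_shard() {
  local routing="$1" num_primaries="$2" hash
  hash=$(printf '%s' "$routing" | cksum | awk '{ print $1 }')
  echo $(( hash % num_primaries ))
}

# The same ID always lands on the same shard for a given shard count.
for id in doc-1 doc-2 doc-3; do
  echo "$id -> shard $(route_to_shard "$id" 2)"
done
```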

  5. Document

  • Definition: A document is a basic unit of information that can be indexed. It is expressed in JSON (JavaScript Object Notation) format.
  • Structure: Each document is stored in an index and has an identifier (_id) that is unique within that index.
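A document is indexed by sending its JSON body to the index's _doc endpoint with an ID of your choice (PUT /<index>/_doc/<id>). The bash wrapper below is a minimal sketch, assuming an unsecured cluster reachable at the URL in the hypothetical ES_URL variable (default http://localhost:9200); index_doc is also a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: index (create or replace) one JSON document with
# an explicit ID via PUT /<index>/_doc/<id>.
# ES_URL is an assumed variable; it defaults to an unsecured local node.
index_doc() {
  local index="$1" id="$2" body="$3"
  curl -s -X PUT "${ES_URL:-http://localhost:9200}/${index}/_doc/${id}?pretty" \
    -H 'Content-Type: application/json' \
    -d "$body"
}

# Example:
#   index_doc test_index 1 '{"title": "Hello Elasticsearch", "views": 3}'
#   curl -s "localhost:9200/test_index/_doc/1?pretty"   # fetch it back
```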

Elasticsearch Architecture Diagram

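Below is a simplified diagram of a three-node cluster. For readability, shards are numbered across the whole cluster (Index 1 holds shards 1 and 2, Index 2 holds shards 3 and 4), and each replica sits on a different node from its primary: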

Cluster
  ├── Node 1 (Master + Data)
  │     ├── Index 1
  │     │     ├── Primary Shard 1
  │     │     └── Replica Shard 2
  │     └── Index 2
  │           ├── Primary Shard 3
  │           └── Replica Shard 4
  ├── Node 2 (Data)
  │     ├── Index 1
  │     │     ├── Primary Shard 2
  │     │     └── Replica Shard 1
  │     └── Index 2
  │           ├── Primary Shard 4
  │           └── Replica Shard 3
  └── Node 3 (Ingest)
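The layout above follows one rule worth checking on a real cluster: a replica is never placed on the same node as its primary. The bash sketch below reads shard rows in the form "index shard prirep node" (the columns returned by _cat/shards?h=index,shard,prirep,node) and reports any primary/replica pair that shares a node. The name check_replica_placement is hypothetical, and rows without a node column (unassigned copies) are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical checker: read "index shard prirep node" rows from stdin
# (prirep is p for primary, r for replica) and print every shard whose
# primary and a replica sit on the same node. Exits 1 if any are found.
check_replica_placement() {
  awk '
    NF < 4 { next }   # unassigned copy: no node column
    {
      key = $1 " " $2
      if ($3 == "p") primary[key] = $4; else replicas[key] = replicas[key] " " $4
    }
    END {
      bad = 0
      for (key in primary) {
        n = split(replicas[key], nodes, " ")
        for (i = 1; i <= n; i++)
          if (nodes[i] == primary[key]) { print "conflict: " key " on " nodes[i]; bad = 1 }
      }
      exit bad
    }'
}

# Example (assumes a node on localhost:9200):
#   curl -s "localhost:9200/_cat/shards?h=index,shard,prirep,node" | check_replica_placement
```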

Practical Example: Setting Up a Cluster

Step 1: Configure the Cluster Name

In the elasticsearch.yml configuration file, set the cluster name:

cluster.name: my_cluster

Step 2: Configure Node Roles

Define the roles of each node in the elasticsearch.yml file:

# For a dedicated master-eligible node
node.roles: [ master ]

# For a dedicated data node
node.roles: [ data ]

# For a dedicated ingest node
node.roles: [ ingest ]

# Versions before 7.9 used the boolean settings node.master, node.data
# and node.ingest instead; those settings were removed in 8.0.
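For a local, single-machine test of the three-node layout, each node needs its own configuration file, data directory, and ports. The bash sketch below writes one elasticsearch.yml per node using the node.roles list (Elasticsearch 7.9 and later). The ./nodes directory layout, node names, ports, and the function name write_node_config are hypothetical choices, and security is switched off only so that the plain-HTTP curl commands in this section work. discovery.seed_hosts tells each node where to find the others, and cluster.initial_master_nodes names the node allowed to bootstrap the first master election.

```shell
#!/usr/bin/env bash
# Hypothetical generator: write an elasticsearch.yml for one node of a
# local test cluster. Each node gets its own data and log directories
# and its own HTTP/transport ports so three nodes can share one machine.
write_node_config() {
  local name="$1" roles="$2" http_port="$3" transport_port="$4"
  local dir="$PWD/nodes/$name"
  mkdir -p "$dir/config" "$dir/data" "$dir/logs"
  cat > "$dir/config/elasticsearch.yml" <<EOF
cluster.name: my_cluster
node.name: $name
node.roles: [ $roles ]
path.data: $dir/data
path.logs: $dir/logs
http.port: $http_port
transport.port: $transport_port
discovery.seed_hosts: ["127.0.0.1:9300"]
cluster.initial_master_nodes: ["es-master"]
# Local testing only, to match the plain-HTTP curl commands in this section.
xpack.security.enabled: false
EOF
}

write_node_config es-master master 9200 9300
write_node_config es-data   data   9201 9301
write_node_config es-ingest ingest 9202 9302
```

Before running the generator, copy your installation's default config directory (it also contains files such as jvm.options and log4j2.properties that Elasticsearch reads at startup) to nodes/<name>/config for each node; the generator then overwrites only elasticsearch.yml. Start each node with the ES_PATH_CONF environment variable pointing at its directory, for example ES_PATH_CONF=$PWD/nodes/es-data/config ./bin/elasticsearch. The cluster.initial_master_nodes line is only needed the first time the cluster forms.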

Step 3: Start the Nodes

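Start each node with the command below, run from that node's own Elasticsearch installation (every node needs its own configuration file and data directory):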

./bin/elasticsearch

Step 4: Verify the Cluster

Use the following commands to check the cluster health and list the nodes with their roles:

curl -X GET "localhost:9200/_cluster/health?pretty"
curl -X GET "localhost:9200/_cat/nodes?v"
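In scripts it is often more useful to act on the health status field (green, yellow, or red) than to read the whole response. The sketch below extracts that field with sed; es_health_status is a hypothetical name. The cluster health API can also block until a status is reached through its wait_for_status and timeout parameters, as the second comment shows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "status" value (green, yellow or red)
# from a _cluster/health JSON response read on stdin.
es_health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Examples (assume a node on localhost:9200):
#   curl -s "localhost:9200/_cluster/health" | es_health_status
#   # Wait up to 30 seconds for at least yellow before continuing:
#   curl -s "localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s" | es_health_status
```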

Exercises

Exercise 1: Create a Cluster

  1. Set up a three-node Elasticsearch cluster with one master node, one data node, and one ingest node.
  2. Verify the cluster health and node roles using the provided commands.

Solution

  1. Configure the elasticsearch.yml files for each node as described in the practical example.
  2. Start each node and verify the cluster using:
curl -X GET "localhost:9200/_cluster/health?pretty"
curl -X GET "localhost:9200/_cat/nodes?v"

Exercise 2: Add an Index and Shards

  1. Create an index named test_index with 2 primary shards and 1 replica.
  2. Verify the index and shard allocation.

Solution

  1. Create the index using:
curl -X PUT "localhost:9200/test_index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    }
  }
}
'
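  2. Verify the index and shard allocation using:
curl -X GET "localhost:9200/_cat/indices?v"
curl -X GET "localhost:9200/_cat/shards?v"
If the cluster from Exercise 1 has only one data node, the two replica shards show as UNASSIGNED and the cluster health is yellow, because Elasticsearch never allocates a replica to the same node as its primary. Add a second data node to have the replicas assigned and the health turn green.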

Conclusion

In this section, we covered the fundamental components of Elasticsearch architecture, including clusters, nodes, indices, shards, and documents. We also provided practical steps to set up a cluster and manage indices and shards. Understanding these concepts is essential for effectively using Elasticsearch and preparing for more advanced topics.

© Copyright 2024. All rights reserved