Introduction

In Elasticsearch, a mapping defines how a document and its fields are stored and indexed, while analyzers control how text is broken into searchable tokens during indexing and searching. Understanding both is crucial for effective data modeling and search optimization.

Key Concepts

Mapping

  • Definition: Mapping is the process of defining how a document and its fields are stored and indexed.
  • Types of Fields:
    • Text: Used for full-text search; the value is analyzed into tokens.
    • Keyword: Used for exact-value structured data such as IDs, email addresses, and status codes; not analyzed.
    • Numeric: Used for numbers (integer, long, float, double, etc.).
    • Date: Used for date values (ISO-8601 strings or epoch milliseconds by default).
    • Boolean: Used for true/false values.
    • Object: Used for single JSON objects; inner fields are flattened into the parent document.
    • Nested: Used for arrays of objects whose fields must be queried independently (see the example below).
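
The object/nested distinction matters as soon as a field holds an array of objects: object flattens the array, while nested keeps each object separate. The sketch below illustrates this (the index and field names are made up for the example):

PUT /blog_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "comments": {
        "type": "nested",
        "properties": {
          "author": {
            "type": "keyword"
          },
          "rating": {
            "type": "integer"
          }
        }
      }
    }
  }
}

With a plain object field, a query for comments where author is "alice" and rating is 5 could match a document in which alice wrote one comment and a different commenter gave the 5-star rating, because the array values are flattened together. The nested type prevents this by requiring both conditions to match within the same comment.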

Analyzers

  • Definition: Analyzers define the pipeline that converts text into tokens, applied both at index time and at search time.
  • Components (demonstrated with the _analyze API below):
    • Character Filters: Preprocess the raw text (e.g., strip HTML tags).
    • Tokenizer: Splits the text into individual terms or tokens.
    • Token Filters: Modify tokens (e.g., lowercase them, remove stop words).
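
You can watch all three stages work together with the _analyze API, which runs text through an analysis chain without creating an index. This minimal sketch uses only built-in components:

POST /_analyze
{
  "char_filter": ["html_strip"],
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "<p>The QUICK Brown Fox!</p>"
}

The character filter strips the <p> tags, the standard tokenizer splits the text on word boundaries and drops the punctuation, and the lowercase filter normalizes case, producing the tokens the, quick, brown, and fox.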

Practical Examples

Example 1: Basic Mapping

PUT /my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "author": {
        "type": "keyword"
      },
      "price": {
        "type": "float"
      },
      "publish_date": {
        "type": "date"
      },
      "available": {
        "type": "boolean"
      }
    }
  }
}

Explanation:

  • title is a text field, analyzed for full-text search.
  • author is a keyword field for exact matches, filtering, and aggregations.
  • price is a float field for numerical values and range queries.
  • publish_date is a date field, accepting ISO-8601 strings or epoch milliseconds by default.
  • available is a boolean field.
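
To see the text/keyword difference at search time, index a sample document (the values are made up) and compare a full-text match query against the analyzed title field with an exact term query against the author field:

POST /my_index/_doc/1
{
  "title": "Elasticsearch in Action",
  "author": "Jane Doe",
  "price": 29.99,
  "publish_date": "2024-01-15",
  "available": true
}

# Matches: "title" was analyzed, so the lowercase token "elasticsearch" exists
GET /my_index/_search
{
  "query": {
    "match": { "title": "elasticsearch" }
  }
}

# Matches only the exact stored value; a term query for "jane" alone would find nothing
GET /my_index/_search
{
  "query": {
    "term": { "author": "Jane Doe" }
  }
}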

Example 2: Custom Analyzer

PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "html_strip": {
          "type": "html_strip"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "standard"
        }
      },
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": ["the", "is", "and"]
        }
      },
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "my_tokenizer",
          "filter": ["lowercase", "my_stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}

Explanation:

  • Character Filter: html_strip removes HTML tags before tokenization. (Since html_strip is also a built-in character filter, redefining it under char_filter is optional, but doing so keeps the example explicit.)
  • Tokenizer: my_tokenizer is the standard tokenizer, which splits text on word boundaries.
  • Token Filters: lowercase normalizes case, then my_stop removes the listed stop words.
  • Analyzer: my_custom_analyzer chains the character filter, tokenizer, and token filters, in that order.
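
A quick way to verify the analyzer behaves as intended is to run sample text through it with the _analyze API:

POST /my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "<p>The Quick Fox is Fast</p>"
}

This should return the tokens quick, fox, and fast: the HTML tags are stripped, all terms are lowercased, and the stop words "the" and "is" are removed.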

Exercises

Exercise 1: Create a Custom Mapping

Task: Create an index with the following fields:

  • name (text)
  • age (integer)
  • email (keyword)
  • signup_date (date)

Solution:

PUT /user_index
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      },
      "email": {
        "type": "keyword"
      },
      "signup_date": {
        "type": "date"
      }
    }
  }
}
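
To confirm the result, retrieve the stored mapping, and note that the typed fields now reject malformed values (a sketch; the sample document is made up):

GET /user_index/_mapping

# Rejected: "thirty" cannot be parsed into the integer field "age"
POST /user_index/_doc/1
{
  "name": "Alice",
  "age": "thirty",
  "email": "alice@example.com",
  "signup_date": "2024-03-01"
}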

Exercise 2: Define a Custom Analyzer

Task: Create an index with a custom analyzer that:

  • Removes HTML tags.
  • Uses the standard tokenizer.
  • Converts text to lowercase.
  • Removes the stop words "a", "an", and "the".

Solution:

PUT /text_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "html_strip": {
          "type": "html_strip"
        }
      },
      "tokenizer": {
        "standard_tokenizer": {
          "type": "standard"
        }
      },
      "filter": {
        "stop_filter": {
          "type": "stop",
          "stopwords": ["a", "an", "the"]
        }
      },
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard_tokenizer",
          "filter": ["lowercase", "stop_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "custom_analyzer"
      }
    }
  }
}
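
As in Example 2, the _analyze API confirms the behavior:

POST /text_index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "<b>The Title of an Article</b>"
}

This should return the tokens title, of, and article: only "a", "an", and "the" are removed, so "of" survives.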

Common Mistakes and Tips

  • Mistake: Relying on dynamic mapping or choosing the wrong field type.
    • Tip: Define mappings explicitly and match each field type to the data you intend to store; a field's type cannot be changed after documents are indexed without reindexing. A multi-field (see the sketch below) keeps both full-text and exact-match options open.
  • Mistake: Overlooking the importance of analyzers for text fields.
    • Tip: Customize analyzers to improve search relevance, and verify their output with the _analyze API before indexing real data.
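
A common safeguard for the first mistake is a multi-field, which indexes the same string both as analyzed text and as an exact keyword, so you do not have to pick one up front. A minimal sketch (the index name products and sub-field name raw are illustrative):

PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Queries against name are analyzed for full-text search, while name.raw supports exact filtering, sorting, and aggregations.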

Conclusion

In this section, we covered the basics of mapping and analyzers in Elasticsearch. We learned how to define mappings for different field types and create custom analyzers to process text data. Understanding these concepts is essential for effective data modeling and search optimization in Elasticsearch. In the next section, we will explore index templates and how they can be used to manage mappings and settings for multiple indices.
