What is mapping in Elasticsearch, and why is it so important?


What is Mapping?

Mapping in Elasticsearch refers to the process of defining how documents and their fields are stored and indexed. In simpler terms, it’s a schema that defines the data structure within an index. Elasticsearch uses a JSON-based language for mapping, allowing users to specify the data types for each field, configure the index, and define various properties for efficient storage and retrieval.

Types of Mapping:

1. Static Mapping:

  • In static mapping, the data types and settings for fields are explicitly defined by the user during index creation.
  • This approach offers precision and control over the data structure but requires careful planning and consideration of the data model.

2. Dynamic Mapping:

  • Dynamic mapping allows Elasticsearch to automatically detect and define the data type of each new field from its value in the first document in which that field appears.
  • It provides flexibility but may lead to unexpected mappings if not managed properly, making it crucial to understand and control dynamic mapping behavior.
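To make the contrast concrete, here is a minimal sketch in Python (standard library only) that builds the JSON body of a statically mapped index, which you would send as a `PUT /<index>` request. The field names are hypothetical. `"dynamic": "strict"` is a real mapping parameter: with it, Elasticsearch rejects documents that contain undeclared fields instead of guessing their types.

```python
import json


def static_mapping_body() -> dict:
    """Body for PUT /<index>: every field type is declared up front (hypothetical fields)."""
    return {
        "mappings": {
            # "strict" makes Elasticsearch reject documents with undeclared fields,
            # so no field type is ever guessed by dynamic mapping.
            "dynamic": "strict",
            "properties": {
                "title": {"type": "text"},
                "status": {"type": "keyword"},
                "views": {"type": "integer"},
                "created_at": {"type": "date"},
            },
        }
    }


if __name__ == "__main__":
    # Paste this JSON into the body of a PUT request in Postman (or curl).
    print(json.dumps(static_mapping_body(), indent=2))
```

Other accepted values of `dynamic` are `true` (the default), `false` (new fields stay in `_source` but are not indexed) and, in recent versions, `runtime`.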

Dynamic mapping is convenient, but explicit mappings give you stability and control over the data structure. And now, MY PERSONAL RECOMMENDATION FROM PRACTICE:

Never (and I mean never) rely on the default dynamic mapping. Always define your mappings explicitly, or you may run into serious problems and spend a lot of time debugging.

Let me show you in practice why dynamic mapping is a bad decision, especially when you are just starting your adventure with Elasticsearch. First, let’s create a test index, assuming that Elasticsearch is already up and running at localhost:9202. I will use Postman for that purpose.
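If you would rather script the walkthrough than click through Postman, this step can be sketched in Python with only the standard library (`urllib.request`). The base URL mirrors the localhost:9202 address above; the index name `test_index` is hypothetical, and the sketch assumes a local cluster without authentication or TLS.

```python
import json
import urllib.request
from typing import Optional, Tuple

# Assumptions: local, unsecured cluster at the address used in this post;
# "test_index" is a hypothetical index name.
ES_URL = "http://localhost:9202"
INDEX = "test_index"


def send(method: str, path: str, body: Optional[dict] = None) -> dict:
    """Send a JSON request to Elasticsearch and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def create_index_request() -> Tuple[str, str, Optional[dict]]:
    # No "mappings" section: every field type will be guessed dynamically.
    return "PUT", f"/{INDEX}", None


if __name__ == "__main__":
    method, path, body = create_index_request()
    print(method, ES_URL + path)
```

With the cluster running, `send(*create_index_request())` performs the call and returns Elasticsearch's JSON reply. Leaving out a `mappings` section is exactly what hands the field types over to dynamic mapping.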

Now, let’s index a test document with only one field, created_at:
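The indexing request can be sketched the same way. The index name and the date value are hypothetical; the value deliberately puts a space between date and time. As far as we know, default date detection only recognizes `strict_date_optional_time` (ISO 8601 style, with a `T` separator) and `yyyy/MM/dd` forms, so a string like this is mapped as text, which matches the behavior described in this post.

```python
import json
from typing import Tuple

INDEX = "test_index"  # hypothetical index name


def index_document_request() -> Tuple[str, str, dict]:
    # Hypothetical value: date and time separated by a space, not by "T".
    doc = {"created_at": "2024-01-15 10:30:00"}
    # refresh=true makes the document searchable immediately, so a search
    # sent right afterwards (e.g. from a script) can already see it.
    return "POST", f"/{INDEX}/_doc?refresh=true", doc


if __name__ == "__main__":
    method, path, body = index_document_request()
    print(method, path)
    print(json.dumps(body))
```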

And now the most interesting part: let’s run a simple date range query:
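A sketch of the date range query, with a hypothetical index name and bounds written in a hypothetical `yyyy-MM-dd HH:mm:ss` format:

```python
import json
from typing import Tuple

INDEX = "test_index"  # hypothetical index name


def range_query_request() -> Tuple[str, str, dict]:
    query = {
        "query": {
            "range": {
                "created_at": {
                    # Hypothetical bounds; on a date field they are parsed
                    # with that field's mapped format.
                    "gte": "2024-01-01 00:00:00",
                    "lte": "2024-12-31 23:59:59",
                }
            }
        }
    }
    # _search accepts a JSON body with both GET and POST; POST avoids
    # HTTP clients that drop bodies on GET requests.
    return "POST", f"/{INDEX}/_search", query


if __name__ == "__main__":
    method, path, body = range_query_request()
    print(method, path)
    print(json.dumps(body, indent=2))
```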

And here is the big surprise: our search query returned zero results. Holy moly! What is going on here? The cause of the problem is MAPPING. When we added the document, we did not set any mapping, so Elasticsearch tried to determine the field type by itself. Here is the mapping it created dynamically:
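The dynamically created mapping can be retrieved with `GET /test_index/_mapping` (index name hypothetical). For a string that is not detected as a date, default dynamic mapping typically produces a `text` field with a `keyword` sub-field. The sketch below hard-codes that response shape and reads the field type out of it, so the check can be scripted instead of inspected by eye.

```python
def field_type(mapping_response: dict, index: str, field: str) -> str:
    """Return the mapped type of a top-level field from a GET /<index>/_mapping response."""
    return mapping_response[index]["mappings"]["properties"][field]["type"]


# Shape of the mapping that default dynamic mapping typically creates for a
# string that is not detected as a date: text plus a keyword sub-field.
DYNAMIC_STRING_MAPPING = {
    "test_index": {  # hypothetical index name
        "mappings": {
            "properties": {
                "created_at": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                }
            }
        }
    }
}

if __name__ == "__main__":
    print(field_type(DYNAMIC_STRING_MAPPING, "test_index", "created_at"))  # text
```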

As you can see, Elasticsearch mapped the created_at field as text, and a range query on a text field does not compare values as dates. In this case, we want created_at to be treated as a date field. Since the type of an existing field cannot be changed in place, let’s fix it by deleting the index, recreating it from scratch, and setting the mapping explicitly when creating it:
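The fix can be sketched as two requests: delete the index, then recreate it with `created_at` declared as `date`. The index name is hypothetical, and the `format` string assumes the documents store `created_at` as `yyyy-MM-dd HH:mm:ss` strings (also hypothetical). The `format` matters: a date field's default format is `strict_date_optional_time||epoch_millis`, which would reject a value like `2024-01-15 10:30:00` at index time.

```python
import json
from typing import List, Optional, Tuple

INDEX = "test_index"  # hypothetical index name


def recreate_index_requests() -> List[Tuple[str, str, Optional[dict]]]:
    """Drop the dynamically mapped index, then recreate it with an explicit mapping."""
    mapping = {
        "mappings": {
            "properties": {
                "created_at": {
                    "type": "date",
                    # Hypothetical format; it must match how the values are written.
                    "format": "yyyy-MM-dd HH:mm:ss",
                }
            }
        }
    }
    return [
        ("DELETE", f"/{INDEX}", None),
        ("PUT", f"/{INDEX}", mapping),
    ]


if __name__ == "__main__":
    for method, path, body in recreate_index_requests():
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```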

Now, when we index our test document again and rerun the range query, we get the expected result:
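To check the result from a script rather than by eye, the number of hits can be read from the `_search` response. A minimal sketch; it assumes the Elasticsearch 7+ response shape, where `hits.total` is an object, and falls back to the older plain-integer form.

```python
def hit_count(search_response: dict) -> int:
    """Number of matching documents reported by a _search response."""
    total = search_response["hits"]["total"]
    # From 7.0 on, hits.total is an object such as {"value": 1, "relation": "eq"};
    # older versions (or rest_total_hits_as_int=true) return a plain integer.
    return total["value"] if isinstance(total, dict) else int(total)
```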

I hope you now understand why mapping is so important. The problem described above is far from the worst you can run into in practice without control over your mapping settings. To finish, I would like to share some best practices for mapping.

Best Practices for Mapping in Elasticsearch:

  1. Predefine Mapping for Stability: While dynamic mapping offers flexibility, it’s advisable to predefine mappings for stability and control over the data structure.
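  2. Regularly Review and Update Mapping: As data evolves, review and update mappings to accommodate new requirements and ensure optimal performance. New fields can be added to an existing mapping, but changing the type of an existing field requires creating a new index and reindexing the data.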
  3. Understand Analyzers and Tokenizers: Gain a good understanding of analyzers and tokenizers to configure how text fields are processed during indexing, ensuring accurate and relevant search results.
  4. Use Index Templates: Index templates automatically apply predefined mappings (and settings) to every new index whose name matches a pattern, streamlining the process and maintaining consistency across the cluster.
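A minimal sketch of a composable index template (the `PUT /_index_template/<name>` API, available since Elasticsearch 7.8), built as a Python dict. The template name, index pattern and fields are hypothetical; any index created later whose name matches `app-events-*` would receive this mapping.

```python
import json
from typing import Tuple


def index_template_request() -> Tuple[str, str, dict]:
    """PUT /_index_template/<name>: mappings applied to every new matching index."""
    template = {
        "index_patterns": ["app-events-*"],  # hypothetical naming scheme
        "template": {
            "mappings": {
                # Undeclared fields are rejected instead of being guessed.
                "dynamic": "strict",
                "properties": {
                    "created_at": {"type": "date"},
                    "message": {"type": "text"},
                    "level": {"type": "keyword"},
                },
            }
        },
    }
    return "PUT", "/_index_template/app-events", template


if __name__ == "__main__":
    method, path, body = index_template_request()
    print(method, path)
    print(json.dumps(body, indent=2))
```

Keeping mappings in a template means every new index gets the same, reviewed field types, instead of each one depending on whatever its first document happened to contain.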

Conclusion:

In conclusion, mapping is a foundational concept in Elasticsearch that significantly influences the efficiency, accuracy, and adaptability of data indexing. By carefully defining and managing mappings, users can harness the full potential of Elasticsearch, ensuring optimal performance and scalability for diverse data sets. Understanding the nuances of mapping is essential for anyone working with Elasticsearch, empowering them to make informed decisions that align with the specific requirements of their applications.

