
Elasticsearch


  • Open-source, distributed, highly available search engine (multitenant)
  • Built in Java
  • Distributed RESTful search engine
  • REST/JSON based, with a native Java API
  • Document-oriented and schema-free (no need to define a schema up front)
  • Built on top of Lucene
  • Capable of full-text search, filtering, highlighting, sorting, pagination and suggestions
  • Extensible ecosystem – custom plugins can extend functionality (e.g. aggregation functions, analyzers); clients exist for many languages (Java, Python, JavaScript); integrates with Kibana (graphical overview of data), Logstash and Hadoop
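To make the REST/JSON and search-feature bullets concrete, here is a minimal sketch of a search request body in Elasticsearch's Query DSL that combines a full-text query, a filter, highlighting, sorting and pagination. The index name `blog`, the fields `title`, `status` and `published_at`, and the helper `build_search_body` are hypothetical; it assumes `status` is mapped as a keyword field so the `term` filter matches exact values. The code only builds and prints the JSON body; it does not contact a cluster.

```python
import json


def build_search_body(text, status, page, page_size):
    """Build a Query DSL body for a hypothetical 'blog' index.

    Field names (title, status, published_at) are illustrative assumptions;
    'status' is assumed to be mapped as a keyword field.
    """
    return {
        "query": {
            "bool": {
                # Full-text search: 'match' analyzes the text and scores hits.
                "must": [{"match": {"title": text}}],
                # Filter: exact-value condition with no effect on scoring.
                "filter": [{"term": {"status": status}}],
            }
        },
        # Highlight matching terms in the 'title' field of each hit.
        "highlight": {"fields": {"title": {}}},
        # Sort newest first.
        "sort": [{"published_at": {"order": "desc"}}],
        # Pagination: 'from' is the offset of the first hit, 'size' the page length.
        "from": (page - 1) * page_size,
        "size": page_size,
    }


if __name__ == "__main__":
    body = build_search_body("distributed search", "published", page=2, page_size=10)
    # Would be sent as the request body of: POST /blog/_search
    print(json.dumps(body, indent=2))
```

Clauses placed under `filter` do not contribute to relevance scoring and can be cached, which is why exact-value conditions usually go there rather than in `must`.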

Terminology

  1. NRT (Near Realtime) – there is a slight delay (about 1 second, the default refresh interval) between when a document is indexed and when it becomes searchable
  2. Cluster – a collection of nodes. Each cluster has a unique name (default: elasticsearch)
  3. Node – a single server that is part of a cluster; each node has a unique name
  4. Index – a collection of documents with similar characteristics (fields)
  5. Type – a logical category within an index, defined for documents that have similar fields, e.g. in a blog: post data, comment data, user data. (Mapping types were phased out starting with Elasticsearch 6.0 and removed in 8.0.)
  6. Document – the basic unit of information that can be indexed, expressed in JSON
  7. Shards (sharding – data partitioning) – an index is subdivided into multiple pieces called shards. Each shard is an independent Lucene index and can be hosted on any node in the cluster. Splitting the logical data over several machines gives write scalability and control over how data flows
  8. Replica shard (data duplication) – a copy of a shard for failover and scaling; searches can be executed on all replicas in parallel (read scalability), and replicas remove the single point of failure (SPOF)
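The shard and replica definitions can be illustrated with a small sketch. It builds the settings body for creating an index with a given number of primary shards and replicas (`number_of_shards`, `number_of_replicas`), counts the shard copies the cluster must hold, and applies the simplified routing rule described in Elasticsearch: The Definitive Guide, `shard = hash(_routing) % number_of_primary_shards`, where the routing value defaults to the document id. The helper names are hypothetical, and `zlib.crc32` stands in for the Murmur3 hash Elasticsearch actually uses, so the shard numbers will not match a real cluster.

```python
import zlib


def index_settings(primaries, replicas):
    # Request body for creating an index, e.g. PUT /blog (index name hypothetical).
    return {"settings": {"number_of_shards": primaries,
                         "number_of_replicas": replicas}}


def total_shard_copies(primaries, replicas):
    # Every primary shard has 'replicas' copies, so the cluster stores this many shards.
    return primaries * (1 + replicas)


def route(doc_id, primaries):
    # Simplified routing: shard = hash(routing) % number_of_primary_shards,
    # with the routing value defaulting to the document id.
    # crc32 is a stand-in for Elasticsearch's Murmur3 hash.
    return zlib.crc32(doc_id.encode("utf-8")) % primaries


if __name__ == "__main__":
    print(index_settings(3, 1))
    print("shard copies:", total_shard_copies(3, 1))
    for doc_id in ["post-1", "post-2", "post-3", "post-4"]:
        # Stable for a fixed primary count; a different count can move the document.
        print(doc_id, "-> shard", route(doc_id, 3), "of 3 |", route(doc_id, 4), "of 4")
```

Because the primary shard count is part of the routing formula, it is fixed when the index is created; changing it would send existing documents to different shards. Replicas carry no such constraint, so `number_of_replicas` can be changed on a live index.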

Concept – Distributed

  1. Screen 1 – the number of primary shards and replicas is set up when the index is created (a single node is running)
  2. When a second node starts, the cluster looks like Screen 2 – shards with a green background are primary shards; a document is indexed on its primary shard first and then copied to the replica shards
  3. When a third node starts, the cluster looks like Screen 3

Screen 1 [image: distributed]

Screen 2 [image: distributed-2]

Screen 3 [image: distributed-3]
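The three screens follow one rule: Elasticsearch never places a replica on the same node as its primary (or another copy of the same shard). With a single node the replicas therefore stay unassigned and cluster health is yellow (all primaries active, some replicas not); once more nodes join, the replicas are allocated and health turns green. The sketch below reproduces that effect with a toy round-robin placement for 3 primaries and 1 replica. The functions `allocate` and `health` are hypothetical, and this is not Elasticsearch's real shard allocator.

```python
def allocate(num_nodes, primaries, replicas):
    """Toy round-robin shard placement (NOT Elasticsearch's real allocator).

    Returns (nodes, unassigned): nodes maps node index -> list of labels such
    as "P0" (primary of shard 0) or "R0" (a replica of shard 0).
    """
    nodes = {n: [] for n in range(num_nodes)}
    unassigned = []
    for shard in range(primaries):
        # Primary of shard s goes to node s mod N.
        nodes[shard % num_nodes].append(f"P{shard}")
        for k in range(1, replicas + 1):
            if k >= num_nodes:
                # Only N distinct nodes exist; this copy would have to share a
                # node with another copy of the same shard, so it stays unassigned.
                unassigned.append(f"R{shard}")
            else:
                # The k-th replica goes k nodes after the primary, so copies never share a node.
                nodes[(shard + k) % num_nodes].append(f"R{shard}")
    return nodes, unassigned


def health(unassigned):
    # Simplified: this toy always assigns every primary, so any unassigned
    # replica means "yellow"; nothing unassigned means "green".
    return "yellow" if unassigned else "green"


if __name__ == "__main__":
    for n in (1, 2, 3):
        nodes, unassigned = allocate(n, primaries=3, replicas=1)
        print(f"{n} node(s):", nodes, "unassigned:", unassigned, health(unassigned))
```

With one node all three replicas stay unassigned; with two or three nodes every replica finds a node that does not hold its primary. A real cluster reaches a similar end state by relocating existing shards when nodes join, rather than recomputing the placement from scratch.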

Advanced Concepts

Resources

  1. https://www.elastic.co/ (blog)
  2. Elasticsearch
  3. Elasticsearch Reference
  4. Elasticsearch on GitHub
  5. Elasticsearch Guide (Elasticsearch: The Definitive Guide, online)
  6. Elasticsearch Java API
  7. Book – Elasticsearch: The Definitive Guide
  8. Plugins (writing custom plugins)
  9. Luke (Lucene index toolbox)