Elasticsearch: Search Engine & Analytics

Queries, mappings, aggregations, index management: a complete guide to building full-text search and analytics with Elasticsearch

1. Introduction to Elasticsearch

Elasticsearch is a search engine and analytics database built on top of the Apache Lucene library. It is very fast at full-text search, filtering, and aggregations over large data sets, from millions to billions of documents.

Elasticsearch is part of the Elastic Stack (formerly the ELK Stack), which consists of:

Component | Function
Elasticsearch | Search engine & analytics database
Logstash | Data pipeline: collect, transform, load
Kibana | Visualization & dashboard UI
Beats | Lightweight data shippers (Filebeat, Metricbeat)

When Should You Use Elasticsearch?

Scenario | Good fit?
Full-text search (e-commerce, blog, docs) | ✅ Excellent fit
Log monitoring & analytics | ✅ Excellent fit
Real-time analytics (dashboards) | ✅ Good fit
ACID transaction database | ❌ Not a fit: use an RDBMS
Relational data with JOINs | ❌ Not a fit: use an RDBMS
Data archiving (cold storage) | ⚠️ Possible but expensive: use S3

Diagram: Elasticsearch Cluster Architecture
┌─────────────────────────────────────────────────────────────────┐
│                ELASTICSEARCH CLUSTER                             │
│                                                                 │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐         │
│  │  Master Node  │  │  Master Node  │  │  Master Node  │         │
│  │  (eligible)   │  │  (eligible)   │  │  (active)     │         │
│  └──────────────┘  └──────────────┘  └──────────────┘         │
│                                                                 │
│  Index: products (3 shards, 1 replica)                          │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐                 │
│  │  Shard 0    │ │  Shard 1    │ │  Shard 2    │  ← Primary    │
│  │  (Node 1)   │ │  (Node 2)   │ │  (Node 3)   │               │
│  └────────────┘ └────────────┘ └────────────┘                 │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐                 │
│  │  Replica 0  │ │  Replica 1  │ │  Replica 2  │  ← Replicas   │
│  │  (Node 2)   │ │  (Node 3)   │ │  (Node 1)   │               │
│  └────────────┘ └────────────┘ └────────────┘                 │
│                                                                 │
│  Client → sends a request to any node (the coordinating node)   │
│  Node → routes to the right shards → returns the result         │
└─────────────────────────────────────────────────────────────────┘
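
The routing step in the diagram can be sketched in a few lines of Python. This is a simplified illustration, not Elasticsearch's actual implementation: Elasticsearch hashes the routing value (the document ID by default) with Murmur3, while the sketch uses `zlib.crc32` as a stand-in so it runs with the standard library only. The helper names `route_to_shard` and `replica_node` are hypothetical, and the replica placement simply mirrors the layout drawn above; in a real cluster the master node decides shard allocation.

```python
import zlib


def route_to_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard for a document ID.

    Stand-in hash: Elasticsearch uses Murmur3 on the routing value;
    zlib.crc32 is used here only to keep the sketch stdlib-only.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


def replica_node(primary_node: int, node_count: int) -> int:
    """Place the replica on the next node (1-based), never on the primary's node."""
    return primary_node % node_count + 1


# The same ID always lands on the same shard, which is why the number of
# primary shards cannot be changed without reindexing.
for doc_id in ["1", "2", "3", "4", "5"]:
    print(doc_id, "→ shard", route_to_shard(doc_id, 3))
```

Because the shard number depends on the modulo of the primary shard count, this also hints at why that count is fixed when the index is created.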

2. Installation & Setup

Bash: Installing Elasticsearch
# =============================================
# DOCKER (fastest way for development)
# =============================================

# Elasticsearch + Kibana (security is disabled below: local development only)
docker network create elastic

docker run -d --name elasticsearch \
  --net elastic \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  docker.elastic.co/elasticsearch/elasticsearch:8.14.0

docker run -d --name kibana \
  --net elastic \
  -p 5601:5601 \
  -e "ELASTICSEARCH_HOSTS=http://elasticsearch:9200" \
  docker.elastic.co/kibana/kibana:8.14.0

# Wait ~30 seconds, then test
curl http://localhost:9200
# Kibana: http://localhost:5601

# =============================================
# UBUNTU/DEBIAN (manual installation)
# =============================================
# apt-key is deprecated; store the signing key in a dedicated keyring instead
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | \
  sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
sudo apt install apt-transport-https
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | \
  sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update && sudo apt install elasticsearch

# Edit config (development settings; do not expose an unsecured node to a public network)
sudo nano /etc/elasticsearch/elasticsearch.yml
# cluster.name: my-cluster
# network.host: 0.0.0.0
# discovery.type: single-node
# xpack.security.enabled: false

# Start
sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch

# =============================================
# TEST API
# =============================================
# Cluster health
curl -s http://localhost:9200/_cluster/health | python3 -m json.tool

# Cluster info
curl -s http://localhost:9200 | python3 -m json.tool

# Node stats
curl -s http://localhost:9200/_nodes/stats | python3 -m json.tool

3. Index & Document Management

In Elasticsearch, data is stored as documents inside an index. A rough analogy: index = table, document = row, field = column. Unlike an RDBMS, though, each document is a flexible JSON object, and fields that are not declared in the mapping are added dynamically (often described as schema-free).
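
To make the "flexible JSON object" idea concrete, here is a toy sketch of how field types could be guessed from the first document that contains a field. It only approximates Elasticsearch's dynamic mapping: the real feature also detects dates, and it gives strings a `text` field with a `keyword` sub-field. The function names `guess_field_type` and `guess_mapping` are hypothetical.

```python
def guess_field_type(value) -> str:
    """Guess an Elasticsearch-like field type from a Python/JSON value."""
    # bool must be checked before int: in Python, bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"  # dynamic mapping would also add a keyword sub-field
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its elements
        return guess_field_type(value[0]) if value else "unknown"
    return "unknown"


def guess_mapping(doc: dict) -> dict:
    """Map every top-level field of a document to a guessed type."""
    return {field: guess_field_type(value) for field, value in doc.items()}


# Two documents in the same index may carry different fields
doc_a = {"name": "Laptop ASUS ROG Strix", "price": 25000000, "stock": 15}
doc_b = {"name": "Mouse Logitech MX Master 3", "tags": ["wireless"], "in_stock": True}
print(guess_mapping(doc_a))
print(guess_mapping(doc_b))
```

In practice an explicit mapping, as in the next block, is usually the safer choice because guessed types cannot be changed later.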

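REST API: Index & Document
# =============================================
# CREATING AN INDEX
# =============================================

# Create an index with settings
# Note: the built-in "stop" filter uses English stop words by default;
# section 7 defines an Indonesian stop-word list.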
curl -X PUT "http://localhost:9200/products" -H 'Content-Type: application/json' -d '{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "indonesian": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": { "type": "text", "analyzer": "indonesian" },
      "description": { "type": "text" },
      "price": { "type": "float" },
      "category": { "type": "keyword" },
      "stock": { "type": "integer" },
      "created_at": { "type": "date" },
      "tags": { "type": "keyword" }
    }
  }
}'

# Check the index
curl -s "http://localhost:9200/products" | python3 -m json.tool

# Delete the index (skip this if you want to run the examples that follow)
curl -X DELETE "http://localhost:9200/products"


# =============================================
# INDEX DOCUMENT (Create/Update)
# =============================================

# Index a document with an auto-generated ID
curl -X POST "http://localhost:9200/products/_doc" -H 'Content-Type: application/json' -d '{
  "name": "Laptop ASUS ROG Strix",
  "description": "Laptop gaming dengan processor Intel i9 dan GPU RTX 4090",
  "price": 25000000,
  "category": "Elektronik",
  "stock": 15,
  "tags": ["gaming", "laptop", "asus"],
  "created_at": "2026-06-26"
}'

# Index a document with a specific ID
curl -X PUT "http://localhost:9200/products/_doc/1" -H 'Content-Type: application/json' -d '{
  "name": "Keyboard Mechanical Keychron K2",
  "description": "Keyboard mechanical wireless dengan switch Gateron",
  "price": 1200000,
  "category": "Aksesoris",
  "stock": 50,
  "tags": ["keyboard", "mechanical", "wireless"],
  "created_at": "2026-06-20"
}'

# Bulk index (many documents in one request)
curl -X POST "http://localhost:9200/_bulk" -H 'Content-Type: application/json' -d '
{"index": {"_index": "products", "_id": "2"}}
{"name": "Mouse Logitech MX Master 3", "description": "Mouse wireless ergonomis", "price": 1500000, "category": "Aksesoris", "stock": 30, "created_at": "2026-06-22"}
{"index": {"_index": "products", "_id": "3"}}
{"name": "Monitor LG 27 4K", "description": "Monitor 4K UHD untuk desain dan gaming", "price": 5500000, "category": "Elektronik", "stock": 10, "created_at": "2026-06-21"}
{"index": {"_index": "products", "_id": "4"}}
{"name": "Samsung Galaxy S24 Ultra", "description": "Smartphone flagship dengan S-Pen dan kamera 200MP", "price": 19000000, "category": "Elektronik", "stock": 25, "created_at": "2026-06-15"}
{"index": {"_index": "products", "_id": "5"}}
{"name": "Meja Gaming Secretlab", "description": "Meja gaming dengan cable management dan RGB", "price": 3500000, "category": "Furniture", "stock": 8, "created_at": "2026-06-18"}
'


# =============================================
# GET DOCUMENT
# =============================================
curl -s "http://localhost:9200/products/_doc/1" | python3 -m json.tool


# =============================================
# UPDATE DOCUMENT
# =============================================
curl -X POST "http://localhost:9200/products/_update/1" -H 'Content-Type: application/json' -d '{
  "doc": {
    "price": 1100000,
    "stock": 45
  }
}'

# Scripted update (decrement stock by params.qty)
curl -X POST "http://localhost:9200/products/_update/1" -H 'Content-Type: application/json' -d '{
  "script": {
    "source": "ctx._source.stock -= params.qty",
    "params": { "qty": 2 }
  }
}'


# =============================================
# DELETE DOCUMENT
# =============================================
curl -X DELETE "http://localhost:9200/products/_doc/1"
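
The body of a `_bulk` request is newline-delimited JSON: one action line followed by one document line, with a trailing newline at the end. As a small sketch, the payload used above could be assembled in Python like this. The helper name `build_bulk_body` is hypothetical and the sketch only covers the `index` action.

```python
import json


def build_bulk_body(index: str, docs: dict) -> str:
    """Build an NDJSON _bulk payload from a {doc_id: source} mapping."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: what to do and where
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, on a single line
        lines.append(json.dumps(source))
    # The bulk API expects the payload to end with a newline
    return "\n".join(lines) + "\n"


body = build_bulk_body("products", {
    "2": {"name": "Mouse Logitech MX Master 3", "price": 1500000},
    "3": {"name": "Monitor LG 27 4K", "price": 5500000},
})
print(body)
```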

4. Mappings: Schema Definition

A mapping defines how Elasticsearch stores and indexes each field. It is similar to a schema in an RDBMS, but more flexible.
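
How a field is indexed decides what a query can find. The sketch below contrasts an analyzed `text` field with a `keyword` field. It is a rough stand-in, assuming the analyzer only splits on non-word characters and lowercases; the real standard analyzer follows Unicode text segmentation rules. All function names are hypothetical.

```python
import re


def analyze_text(value: str) -> list:
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def analyze_keyword(value: str) -> list:
    """A keyword field indexes the whole value as one term, unchanged."""
    return [value]


def term_matches(query_term: str, indexed_terms: list) -> bool:
    """A term lookup succeeds only if the exact term was indexed."""
    return query_term in indexed_terms


title = "Laptop Gaming ASUS"
print(analyze_text(title))     # individual lowercase tokens
print(analyze_keyword(title))  # the original string as a single term
print(term_matches("laptop", analyze_text(title)))
print(term_matches("laptop", analyze_keyword(title)))
```

This is why a multi-field mapping (a `text` field with a `keyword` sub-field) is a common pattern: one field serves search, the other serves sorting and aggregations.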

REST API: Mappings
# =============================================
# COMMON DATA TYPES
# =============================================
# text      → Full-text searchable (analyzed and tokenized)
# keyword   → Exact match (not analyzed)
# integer, long, float, double → Numeric
# boolean   → true/false
# date      → Date (various formats)
# object    → Nested JSON object
# nested    → Array of objects (each object independently searchable)
# geo_point → lat/lon coordinates
# ip        → IP address

curl -X PUT "http://localhost:9200/articles" -H 'Content-Type: application/json' -d '{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "content": { "type": "text" },
      "author": {
        "properties": {
          "name": { "type": "text" },
          "email": { "type": "keyword" }
        }
      },
      "tags": { "type": "keyword" },
      "views": { "type": "long" },
      "rating": { "type": "float" },
      "published_at": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      },
      "status": { "type": "keyword" },
      "comments": {
        "type": "nested",
        "properties": {
          "user": { "type": "keyword" },
          "text": { "type": "text" },
          "date": { "type": "date" }
        }
      },
      "location": { "type": "geo_point" }
    }
  }
}'


# =============================================
# TEXT vs KEYWORD (an important difference!)
# =============================================
# text: "Laptop Gaming ASUS" → split into ["laptop", "gaming", "asus"]
#   → Good for: a search for "laptop" finds "Laptop Gaming ASUS"
#   → NOT usable for: sorting, exact aggregations (by default)

# keyword: "Laptop Gaming ASUS" → kept whole as "Laptop Gaming ASUS"
#   → Good for: exact filters, sorting, aggregations
#   → NOT usable for: full-text search on individual words

# Multi-field: both!
# "title": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }
# → title → full-text search
# → title.keyword → sorting & aggregations


# =============================================
# CHECK & UPDATE MAPPING
# =============================================
# Get mapping
curl -s "http://localhost:9200/articles/_mapping" | python3 -m json.tool

# Add new fields (you CANNOT change the type of an existing field!)
curl -X PUT "http://localhost:9200/articles/_mapping" -H 'Content-Type: application/json' -d '{
  "properties": {
    "language": { "type": "keyword" },
    "read_time_minutes": { "type": "integer" }
  }
}'
⚠️ Existing Mappings Cannot Be Changed

Once a field is defined, its data type cannot be changed. If you need a different type, create a new index with the correct mapping, then copy the data from the old index to the new one with the _reindex API.
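
The two-step procedure just described can be written down as a short plan of REST calls. This is a sketch only: the index name `articles_v2`, the new type for `views`, and the helper `plan_type_change` are illustrative assumptions, and the function merely builds the requests rather than sending them.

```python
def plan_type_change(old_index: str, new_index: str, new_mappings: dict) -> list:
    """Return the REST calls, in order, as (method, path, body) tuples."""
    return [
        # Step 1: create the new index with the corrected mapping
        ("PUT", f"/{new_index}", {"mappings": new_mappings}),
        # Step 2: copy every document from the old index into the new one
        ("POST", "/_reindex", {
            "source": {"index": old_index},
            "dest": {"index": new_index},
        }),
    ]


steps = plan_type_change(
    "articles",
    "articles_v2",  # hypothetical name for the replacement index
    {"properties": {"views": {"type": "integer"}}},
)
for method, path, body in steps:
    print(method, path, body)
```

Pointing applications at an alias instead of the concrete index name makes the final switch to the new index easier.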

5. Query DSL: Search Documents

Elasticsearch uses a JSON-based Query DSL (Domain Specific Language) for searching. Queries fall into two categories: leaf queries (match, term, range) and compound queries (bool, dis_max).
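
Since a query is just nested JSON, leaf and compound queries compose naturally as plain dictionaries. The sketch below uses small hypothetical builder functions (they are not part of any Elasticsearch client) to show a compound `bool` query wrapping three leaf queries.

```python
def match(field: str, text: str) -> dict:
    """Leaf query: analyzed full-text match."""
    return {"match": {field: text}}


def term(field: str, value) -> dict:
    """Leaf query: exact term match."""
    return {"term": {field: value}}


def range_(field: str, **bounds) -> dict:
    """Leaf query: numeric or date range, e.g. range_("price", gte=1, lte=9)."""
    return {"range": {field: bounds}}


def bool_(must=None, filter_=None, should=None, must_not=None) -> dict:
    """Compound query: combines other queries; empty clauses are omitted."""
    clauses = {"must": must, "filter": filter_, "should": should, "must_not": must_not}
    return {"bool": {name: queries for name, queries in clauses.items() if queries}}


body = {
    "query": bool_(
        must=[match("name", "laptop")],
        filter_=[term("category", "Elektronik"), range_("price", lte=20000000)],
    )
}
print(body)
```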

REST API: Query DSL
# =============================================
# MATCH ALL (fetch everything)
# =============================================
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": { "match_all": {} }
}'


# =============================================
# MATCH (full-text search)
# =============================================
# Search for "gaming laptop" in the name field
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "match": {
      "name": "gaming laptop"
    }
  }
}'
# This finds "Laptop ASUS ROG Strix": match uses OR by default, so one matching word ("laptop") is enough


# =============================================
# MATCH_PHRASE (exact phrase)
# =============================================
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "match_phrase": {
      "description": "wireless ergonomis"
    }
  }
}'
# Matches only if the words "wireless ergonomis" appear adjacent and in this order


# =============================================
# TERM (exact match, for keyword fields)
# =============================================
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "term": {
      "category": "Elektronik"
    }
  }
}'


# =============================================
# RANGE (numeric/date range)
# =============================================
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "range": {
      "price": {
        "gte": 1000000,
        "lte": 10000000
      }
    }
  }
}'

# Date range
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "range": {
      "created_at": {
        "gte": "2026-06-01",
        "lte": "2026-06-30",
        "format": "yyyy-MM-dd"
      }
    }
  }
}'


# =============================================
# EXISTS (field present or not)
# =============================================
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "exists": {
      "field": "description"
    }
  }
}'


# =============================================
# WILDCARD & PREFIX
# =============================================
# Prefix (runs against indexed terms; on a text field these are the lowercased tokens)
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "prefix": {
      "name": "lap"
    }
  }
}'

# Wildcard
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "wildcard": {
      "name": "*rog*"
    }
  }
}'


# =============================================
# SORTING & PAGINATION
# =============================================
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": { "match_all": {} },
  "sort": [
    { "price": { "order": "desc" } },
    { "created_at": { "order": "desc" } }
  ],
  "from": 0,
  "size": 10
}'


# =============================================
# SOURCE FILTERING (return only selected fields)
# =============================================
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": { "match_all": {} },
  "_source": ["name", "price", "category"]
}'

6. Bool Query: Combining Filters

The bool query is the most frequently used query: it combines multiple conditions with AND, OR, and NOT semantics.
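
The matching rules of a bool query can be modelled with ordinary Python predicates. The sketch below covers matching only, not scoring, and the function name `bool_matches` is hypothetical. It assumes the documented default for `minimum_should_match`: 1 when there are `should` clauses but no `must` or `filter`, otherwise 0.

```python
def bool_matches(doc, must=(), filter_=(), should=(), must_not=(),
                 minimum_should_match=None) -> bool:
    """Return True if doc satisfies the bool query (scoring is not modelled)."""
    if not all(pred(doc) for pred in must):
        return False
    if not all(pred(doc) for pred in filter_):
        return False
    if any(pred(doc) for pred in must_not):
        return False
    if minimum_should_match is None:
        # should is only mandatory when it is the sole positive clause
        minimum_should_match = 1 if (should and not must and not filter_) else 0
    matched_should = sum(1 for pred in should if pred(doc))
    return matched_should >= minimum_should_match


keyboard = {"name": "Keyboard Mechanical Keychron K2", "category": "Aksesoris", "price": 1200000}
desk = {"name": "Meja Gaming Secretlab", "category": "Furniture", "price": 3500000}

is_cheap = lambda d: d["price"] <= 2000000
is_accessory = lambda d: d["category"] == "Aksesoris"
is_furniture = lambda d: d["category"] == "Furniture"

print(bool_matches(keyboard, filter_=[is_cheap], should=[is_furniture]))
print(bool_matches(keyboard, filter_=[is_cheap], should=[is_furniture], minimum_should_match=1))
print(bool_matches(desk, should=[is_accessory, is_furniture], must_not=[is_cheap]))
```

In the first call the unmatched `should` clause does not exclude the document; it would only have raised the score. The second call makes it mandatory through `minimum_should_match`.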

REST API: Bool Query
# =============================================
# BOOL QUERY STRUCTURE
# =============================================
# {
#   "bool": {
#     "must": [...],        // AND: all must match (affects score)
#     "filter": [...],      // AND: all must match (NO score, cacheable)
#     "should": [...],      // OR: adds to score; at least one is required only
#                           //     without must/filter, or via minimum_should_match
#     "must_not": [...]     // NOT: none may match
#   }
# }

# =============================================
# EXAMPLE: Find an affordable gaming laptop
# =============================================
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "laptop" } }
      ],
      "filter": [
        { "term": { "category": "Elektronik" } },
        { "range": { "price": { "lte": 20000000 } } },
        { "range": { "stock": { "gt": 0 } } }
      ]
    }
  }
}'


# =============================================
# EXAMPLE: Search within either of two categories (minimum_should_match makes one required)
# =============================================
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "wireless gaming" } }
      ],
      "should": [
        { "term": { "category": "Elektronik" } },
        { "term": { "category": "Aksesoris" } }
      ],
      "minimum_should_match": 1,
      "must_not": [
        { "term": { "status": "deleted" } }
      ]
    }
  }
}'


# =============================================
# MUST vs FILTER: the scoring difference
# =============================================
# must:   affects _score (relevance score)
# filter: does NOT affect _score (faster, and frequently used filters can be cached)

# Example: filter is the better choice for yes/no conditions
# "price below 5 million" → filter (no score needed)
# "contains the word gaming" → must (score needed for ranking)


# =============================================
# NESTED QUERY (for nested fields)
# =============================================
curl -X GET "http://localhost:9200/articles/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "match": { "comments.text": "artikel bagus" } },
            { "term": { "comments.user": "budi" } }
          ]
        }
      }
    }
  }
}'

7. Full-Text Search & Scoring

REST API: Full-Text Search
# =============================================
# MULTI_MATCH (search across several fields)
# =============================================
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "multi_match": {
      "query": "gaming wireless",
      "fields": ["name^3", "description^2", "tags"],
      "type": "best_fields",
      "tie_breaker": 0.3
    }
  }
}'
# name^3 = matches in name weigh 3x the default
# best_fields = take the score of the best-matching field
# tie_breaker = fraction (0-1) of the other matching fields' scores added on top


# =============================================
# FUZZY SEARCH (typo tolerance)
# =============================================
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "match": {
      "name": {
        "query": "kyboard mekanikal",
        "fuzziness": "AUTO"
      }
    }
  }
}'
# "kyboard" → can still find "keyboard"
# fuzziness: AUTO = the allowed edit distance scales with the term length


# =============================================
# HIGHLIGHT (mark the matching words)
# =============================================
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "query": {
    "match": {
      "description": "gaming laptop"
    }
  },
  "highlight": {
    "fields": {
      "description": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"],
        "fragment_size": 150,
        "number_of_fragments": 3
      }
    }
  }
}'
# Result: "<em>Laptop</em> <em>gaming</em> dengan processor Intel..." (every matching query term is wrapped)


# =============================================
# CUSTOM ANALYZER (for Indonesian)
# =============================================
curl -X PUT "http://localhost:9200/produk_id" -H 'Content-Type: application/json' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "indonesian_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "indonesian_stop",
            "indonesian_stemmer"
          ]
        }
      },
      "filter": {
        "indonesian_stop": {
          "type": "stop",
          "stopwords": ["dan", "di", "ke", "dari", "yang", "ini", "itu", "untuk", "dengan", "pada"]
        },
        "indonesian_stemmer": {
          "type": "stemmer",
          "language": "indonesian"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "nama": { "type": "text", "analyzer": "indonesian_analyzer" },
      "deskripsi": { "type": "text", "analyzer": "indonesian_analyzer" }
    }
  }
}'
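
To see why the fuzzy search earlier in this section finds "keyboard" for the typo "kyboard", it helps to compute the edit distance by hand. The sketch below is a simplification: it uses plain Levenshtein distance, whereas Elasticsearch by default also counts a swap of two adjacent characters as a single edit. The thresholds follow the documented `AUTO` rule (terms of 0-2 characters must match exactly, 3-5 allow one edit, longer terms allow two). Function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Allowed edits for fuzziness AUTO, based on the term length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(query_term: str, indexed_term: str) -> bool:
    return levenshtein(query_term, indexed_term) <= auto_fuzziness(query_term)


print(levenshtein("kyboard", "keyboard"), fuzzy_match("kyboard", "keyboard"))
print(levenshtein("mekanikal", "mechanical"), fuzzy_match("mekanikal", "mechanical"))
```

Under these assumptions "kyboard" is one edit away from "keyboard" and matches, while "mekanikal" needs three edits to become "mechanical" and does not. The example query still returns the keyboard because `match` combines its terms with OR by default.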

8. Aggregations: Analytics

Aggregations let you analyze your data: averages, distributions, top-N, histograms, and more. They resemble GROUP BY in SQL, but are considerably more powerful.
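
As a mental model, a `terms` bucket aggregation with an `avg` sub-aggregation behaves like the in-memory grouping below. This is only an illustration of the semantics using the sample products from section 3; the function name `terms_with_avg` is hypothetical, and Elasticsearch computes the real thing per shard and merges the results.

```python
from collections import defaultdict


def terms_with_avg(docs: list, bucket_field: str, metric_field: str) -> list:
    """Group docs by bucket_field and average metric_field within each group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[bucket_field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    # Like the terms aggregation, order buckets by doc_count, largest first
    return sorted(buckets, key=lambda bucket: bucket["doc_count"], reverse=True)


products = [
    {"category": "Elektronik", "price": 25000000},
    {"category": "Aksesoris", "price": 1200000},
    {"category": "Aksesoris", "price": 1500000},
    {"category": "Elektronik", "price": 5500000},
    {"category": "Elektronik", "price": 19000000},
    {"category": "Furniture", "price": 3500000},
]
for bucket in terms_with_avg(products, "category", "price"):
    print(bucket)
```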

REST API: Aggregations
# =============================================
# BUCKET AGGREGATION (group data)
# =============================================

# Distribution per category
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "per_kategori": {
      "terms": {
        "field": "category",
        "size": 10
      }
    }
  }
}'
# Example result (assuming all six sample documents above are indexed):
# "buckets": [
#   { "key": "Elektronik", "doc_count": 3 },
#   { "key": "Aksesoris", "doc_count": 2 },
#   { "key": "Furniture", "doc_count": 1 }
# ]


# =============================================
# METRIC AGGREGATION (compute statistics)
# =============================================
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "harga_stats": {
      "stats": {
        "field": "price"
      }
    },
    "harga_avg": {
      "avg": { "field": "price" }
    },
    "harga_max": {
      "max": { "field": "price" }
    },
    "harga_min": {
      "min": { "field": "price" }
    },
    "total_stock": {
      "sum": { "field": "stock" }
    },
    "jumlah_produk": {
      "value_count": { "field": "price" }
    }
  }
}'
# stats returns: count, min, max, avg, sum


# =============================================
# NESTED AGGREGATION (bucket + metric)
# =============================================
# Average price per category
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "per_kategori": {
      "terms": { "field": "category" },
      "aggs": {
        "avg_harga": { "avg": { "field": "price" } },
        "max_harga": { "max": { "field": "price" } },
        "min_harga": { "min": { "field": "price" } },
        "harga_stats": { "stats": { "field": "price" } }
      }
    }
  }
}'


# =============================================
# RANGE & HISTOGRAM
# =============================================
# Range buckets
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "harga_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "key": "Murah (<1jt)", "to": 1000000 },
          { "key": "Sedang (1-5jt)", "from": 1000000, "to": 5000000 },
          { "key": "Mahal (>5jt)", "from": 5000000 }
        ]
      }
    }
  }
}'

# Histogram
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "harga_histogram": {
      "histogram": {
        "field": "price",
        "interval": 2000000
      }
    }
  }
}'

# Date histogram (time-series)
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "per_bulan": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month",
        "format": "yyyy-MM"
      },
      "aggs": {
        "total_revenue": {
          "sum": { "field": "price" }
        }
      }
    }
  }
}'


# =============================================
# PERCENTILES
# =============================================
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "harga_percentiles": {
      "percentiles": {
        "field": "price",
        "percents": [25, 50, 75, 95, 99]
      }
    }
  }
}'

9. Python Client (elasticsearch-py)

Python: elasticsearch-py
from elasticsearch import Elasticsearch
from datetime import datetime

# =============================================
# CONNECTION
# =============================================
es = Elasticsearch(
    "http://localhost:9200",
    # basic_auth=("username", "password")  # if security is enabled
)

# Check the connection
print(es.info())


# =============================================
# CREATE INDEX
# =============================================
def create_products_index():
    if es.indices.exists(index="products"):
        es.indices.delete(index="products")

    es.indices.create(index="products", body={
        "settings": {
            "number_of_shards": 2,
            "number_of_replicas": 1
        },
        "mappings": {
            "properties": {
                "name": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword"}}
                },
                "description": {"type": "text"},
                "price": {"type": "float"},
                "category": {"type": "keyword"},
                "stock": {"type": "integer"},
                "tags": {"type": "keyword"},
                "created_at": {"type": "date"}
            }
        }
    })


# =============================================
# INDEX DOCUMENT
# =============================================
def index_product(product_id, data):
    data["created_at"] = datetime.now().isoformat()
    return es.index(index="products", id=product_id, body=data)

# Example
index_product("1", {
    "name": "Laptop ASUS ROG Strix",
    "description": "Laptop gaming dengan Intel i9 dan RTX 4090",
    "price": 25000000,
    "category": "Elektronik",
    "stock": 15,
    "tags": ["gaming", "laptop", "asus"]
})


# =============================================
# SEARCH
# =============================================
def search_products(query_text, category=None, min_price=None, max_price=None):
    must_clauses = []
    filter_clauses = []

    if query_text:
        must_clauses.append({
            "multi_match": {
                "query": query_text,
                "fields": ["name^3", "description^2", "tags"]
            }
        })

    if category:
        filter_clauses.append({"term": {"category": category}})

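    # Compare against None so that a bound of 0 is not silently ignored
    if min_price is not None or max_price is not None:
        price_range = {}
        if min_price is not None:
            price_range["gte"] = min_price
        if max_price is not None:
            price_range["lte"] = max_price
        filter_clauses.append({"range": {"price": price_range}})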

    body = {
        "query": {
            "bool": {
                "must": must_clauses if must_clauses else [{"match_all": {}}],
                "filter": filter_clauses
            }
        },
        "sort": [{"_score": "desc"}, {"price": "asc"}],
        "size": 20
    }

    response = es.search(index="products", body=body)
    hits = response["hits"]["hits"]
    total = response["hits"]["total"]["value"]

    # Merge each hit's _source with its relevance score (dict union needs Python 3.9+)
    return {"total": total, "products": [hit["_source"] | {"_score": hit["_score"]} for hit in hits]}


# Example usage
results = search_products("gaming laptop", max_price=20000000)
print(f"Found {results['total']} products:")
for p in results["products"]:
    print(f"  - {p['name']}: Rp{p['price']:,.0f} (score: {p['_score']:.2f})")


# =============================================
# AGGREGATIONS
# =============================================
def get_category_stats():
    response = es.search(index="products", body={
        "size": 0,
        "aggs": {
            "per_category": {
                "terms": {"field": "category"},
                "aggs": {
                    "avg_price": {"avg": {"field": "price"}},
                    "max_price": {"max": {"field": "price"}},
                    "total_stock": {"sum": {"field": "stock"}}
                }
            }
        }
    })
    return response["aggregations"]["per_category"]["buckets"]


# =============================================
# BULK INDEX
# =============================================
from elasticsearch.helpers import bulk

def bulk_index_products(products):
    actions = [
        {
            "_index": "products",
            "_id": p["id"],
            "_source": p
        }
        for p in products
    ]
    return bulk(es, actions)

# bulk_index_products([
#     {"id": "10", "name": "Mouse Logitech", "price": 500000, "category": "Aksesoris"},
#     {"id": "11", "name": "Keyboard Razer", "price": 1500000, "category": "Aksesoris"},
# ])


# =============================================
# UPDATE & DELETE
# =============================================
# Update
es.update(index="products", id="1", body={
    "doc": {"price": 24000000, "stock": 12}
})

# Delete
es.delete(index="products", id="1")

10. Best Practices & Optimization

Best Practices

Practice | Details
Use filter for exact matches | Filters skip scoring and can be cached, so they are faster than must
Avoid deep pagination | from + size is capped at 10,000 by default. Use search_after to go deeper
Choose text vs keyword correctly | text for search, keyword for filter/sort/aggregate
Limit _source fields | Fetch only the fields you need
Use the bulk API | Use bulk for mass inserts/updates, 1,000-5,000 documents per batch
Shard sizing | 1 shard = 10-50 GB (ideal). Too many shards adds overhead
Index Lifecycle Management | Use an ILM policy to manage the index lifecycle (hot → warm → cold → delete)
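
The batching advice for the bulk API can be sketched with a small generator that slices any iterable of documents into fixed-size batches. The name `chunked` and the default of 1,000 are illustrative choices. Note that the elasticsearch-py `bulk` helper is documented as accepting a `chunk_size` argument for the same purpose, so a manual helper like this is mostly useful when the documents come from a stream you want to control yourself.

```python
from itertools import islice


def chunked(iterable, batch_size: int = 1000):
    """Yield lists of at most batch_size items until the iterable is exhausted."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 2,500 hypothetical documents are split into batches of 1,000, 1,000, and 500
documents = ({"id": str(i), "name": f"Product {i}"} for i in range(2500))
for batch in chunked(documents, batch_size=1000):
    print(len(batch))
```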

Anti-Patterns

Anti-Pattern | Why it hurts | Solution
Wildcard query (*abc*) | Very slow: a leading wildcard scans every term | Use match/match_phrase
Deep pagination (from: 10000) | Every request must collect and sort all hits from the preceding pages | Use search_after
Scripted sort | Very slow on large indices | Pre-compute the value at index time
One giant index | Shards grow too large and are hard to manage | Use time-based indices + an alias
Too many fields | Mapping explosion: the default limit is 1,000 fields | Use nested/flattened structures
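
The search_after alternative to deep pagination works by passing the sort values of the last hit of one page as the starting point of the next. Below is a minimal sketch under a few assumptions: `search` is a hypothetical callable that takes a request body and returns a response shaped like the _search API (for example a thin wrapper around `es.search`), and the documents carry a unique, sortable `id` field that serves as the tiebreaker.

```python
def iter_all_hits(search, page_size: int = 100):
    """Yield every hit, page by page, using search_after instead of from."""
    body = {
        "size": page_size,
        # search_after needs a deterministic sort; "id" is an assumed
        # unique field that breaks ties between equal prices
        "sort": [{"price": "asc"}, {"id": "asc"}],
    }
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Continue right after the last hit of this page
        body["search_after"] = hits[-1]["sort"]
```

A caller could then loop over `iter_all_hits(lambda body: es.search(index="products", body=body))` without ever sending a large `from` value. Each request only asks for the next `page_size` hits after a known position, so the cost per page stays flat.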

11. Quiz: Test Your Understanding!

After reading the tutorial above, answer the following 5 questions:

Question 1: What is the difference between the text and keyword data types in Elasticsearch?

a) text is for numbers, keyword is for text
b) text is analyzed for full-text search, keyword is stored as-is for exact matching
c) text and keyword are the same
d) text is for indexing, keyword cannot be searched

Question 2: How do you combine several filter conditions in Elasticsearch?

a) Use an AND/OR clause
b) Use a bool query with must, filter, should, must_not
c) Use a WHERE clause
d) Use a JOIN query

Question 3: What are aggregations for in Elasticsearch?

a) Indexing documents faster
b) Analyzing data: statistics, distributions, grouping
c) Deleting old data automatically
d) Encrypting data for security

Question 4: Why should you avoid from: 10000, size: 10 for deep pagination?

a) Because Elasticsearch does not support offsets above 1000
b) Because every page request loads all the data of the preceding pages into memory
c) Because pagination only works in Kibana
d) Because from + size may not exceed 100

Question 5: What is the difference between must and filter in a bool query?

a) must is for text, filter is for keyword
b) must affects the relevance score, filter does not (it is faster and cacheable)
c) must is for SELECT, filter is for DELETE
d) There is no difference