SQL Query Optimization Advanced

📋 Daftar Isi

Membaca Execution Plans
Jenis Scan & Join
Index Tuning Strategies
Database Statistics
Query Hints
Table Partitioning
Teknik Lanjutan
Masalah Umum & Solusi
Quiz Pemahaman

1. Membaca Execution Plans

Execution plan (atau query plan) adalah rencana yang dibuat oleh query optimizer untuk mengeksekusi SQL query. Memahami execution plan adalah skill paling penting dalam SQL optimization — tanpa ini, optimasi hanya berdasarkan tebakan.

1.1 PostgreSQL EXPLAIN

SQL — EXPLAIN di PostgreSQL

-- EXPLAIN saja: menampilkan plan tanpa mengeksekusi
EXPLAIN SELECT * FROM orders WHERE customer_id = 1234;

-- EXPLAIN ANALYZE: mengeksekusi dan menampilkan actual timing
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 1234;

-- EXPLAIN dengan opsi detail
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
    SELECT o.*, c.name
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    WHERE o.total > 1000
    ORDER BY o.created_at DESC
    LIMIT 20;

-- Output dan artinya:
-- Limit  (cost=1.23..4.56 rows=20 width=120)
--   -> Sort  (cost=1.23..4.56 rows=100 width=120)
--         Sort Key: o.created_at DESC
--         -> Hash Join  (cost=1.23..4.56 rows=100 width=120)
--               Hash Cond: (o.customer_id = c.id)
--               -> Seq Scan on orders o  (cost=0.00..1.23 rows=100)
--                     Filter: (total > 1000)
--                     Rows Removed by Filter: 900
--               -> Hash  (cost=1.00..1.00 rows=50)
--                     -> Seq Scan on customers c  (cost=0.00..1.00)

-- Format lain:
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT ...;
EXPLAIN (ANALYZE, BUFFERS, FORMAT YAML) SELECT ...;

-- Interpretasi biaya:
-- cost=startup_cost..total_cost
-- startup_cost: biaya sebelum baris pertama dihasilkan
-- total_cost: total biaya untuk semua baris
-- rows: estimasi jumlah baris yang dihasilkan
-- width: rata-rata lebar baris dalam bytes

1.2 MySQL EXPLAIN

SQL — EXPLAIN di MySQL

-- EXPLAIN standar
EXPLAIN SELECT * FROM orders WHERE customer_id = 1234;

-- EXPLAIN FORMAT=JSON (detail)
EXPLAIN FORMAT=JSON SELECT o.*, c.name
FROM orders o JOIN customers c ON o.customer_id = c.id
WHERE o.total > 1000;

-- EXPLAIN ANALYZE (MySQL 8.0.18+)
EXPLAIN ANALYZE SELECT o.*, c.name
FROM orders o JOIN customers c ON o.customer_id = c.id
WHERE o.total > 1000;

-- Kolom penting di EXPLAIN output:
-- type: ALL (full scan), index, range, ref, eq_ref, const, system
-- possible_keys: index yang bisa digunakan
-- key: index yang benar-benar digunakan
-- rows: estimasi baris yang di-scan
-- filtered: persentase baris yang lolos filter
-- Extra: informasi tambahan (Using index, Using temporary, Using filesort)

-- Optimasi: hindari type=ALL (full table scan)
-- Target: type=const atau type=ref

💡 Pelajari EXPLAIN untuk Database Anda

Setiap database (PostgreSQL, MySQL, SQL Server, Oracle) memiliki format EXPLAIN yang berbeda. Luangkan waktu untuk memahami output EXPLAIN database yang Anda gunakan — ini investasi terbaik untuk skill optimization.

2. Jenis Scan & Join

2.1 Jenis Scan

Scan Type	Kecepatan	Kapan Terjadi
Seq Scan	Lambat	Tidak ada index yang cocok, atau tabel sangat kecil
Index Scan	Sedang	Index tersedia, beberapa baris perlu diambil dari heap
Index Only Scan	Cepat	Semua kolom yang dibutuhkan ada di index
Bitmap Index Scan	Cepat	Kombinasi beberapa kondisi index, banyak baris

2.2 Jenis Join

SQL — Join Strategies

-- Nested Loop Join: cocok untuk small result sets
-- Setiap baris dari outer table di-match dengan inner table
-- Baik jika inner table punya index pada join column
EXPLAIN ANALYZE
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id  -- index on c.id
WHERE o.id = 1234;

-- Hash Join: cocok untuk medium-large tables
-- Build hash table dari smaller table, probe dengan larger table
-- PostgreSQL otomatis memilih ini
EXPLAIN ANALYZE
SELECT * FROM large_orders o
JOIN large_customers c ON o.customer_id = c.id;

-- Merge Join: cocok untuk sorted data, large tables
-- Kedua tabel di-sort dulu, lalu merge seperti merge sort
-- Paling efisien untuk tabel yang sudah terurut
EXPLAIN ANALYZE
SELECT * FROM sorted_orders o
JOIN sorted_customers c ON o.customer_id = c.id;

-- Tips: bantu optimizer memilih join yang tepat
-- Pastikan join columns punya index
-- Update statistics: ANALYZE orders; ANALYZE customers;
-- Pertimbangkan work_mem untuk Hash Join
SET work_mem = '256MB';  -- Default 4MB, naikkan untuk query besar

3. Index Tuning Strategies

3.1 Composite Index

SQL — Composite Index yang Efektif

-- Composite index: mengikuti aturan "equality, range, sort"
-- Urutan kolom sangat penting!

-- Index untuk query ini:
SELECT * FROM orders
WHERE status = 'completed'
  AND created_at > '2026-01-01'
ORDER BY total DESC;

-- Index optimal (equality dulu, lalu range, lalu sort):
CREATE INDEX idx_orders_status_created_total
ON orders (status, created_at, total DESC);

-- Index yang TIDAK optimal:
CREATE INDEX idx_orders_created_status
ON orders (created_at, status);  -- range di awal = efisiensi berkurang

-- Covering index: semua kolom yang dibutuhkan ada di index
-- Menghindari table lookup (heap access)
CREATE INDEX idx_orders_covering
ON orders (customer_id, status, created_at)
INCLUDE (total, items_count);

-- Partial index: index hanya untuk subset data
-- Lebih kecil, lebih cepat, hemat storage
CREATE INDEX idx_orders_pending
ON orders (customer_id, created_at)
WHERE status = 'pending';

-- Expression index: index berdasarkan ekspresi
CREATE INDEX idx_users_lower_email
ON users (LOWER(email));

-- Digunakan oleh query:
SELECT * FROM users WHERE LOWER(email) = 'budi@example.com';

3.2 Index Anti-Patterns

SQL — Hindari Index Anti-Patterns

-- 1. HINDARI: Terlalu banyak index
-- Setiap INSERT/UPDATE harus update semua index
-- Target: 3-5 index per tabel

-- 2. HINDARI: Index pada kolom yang jarang di-filter
-- Cek index usage:
SELECT
    indexrelname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;  -- Index dengan 0 scan = tidak dipakai!

-- 3. HINDARI: LIKE '%prefix' (tidak bisa pakai index)
-- LIKE 'prefix%' bisa pakai index, tapi '%prefix%' tidak bisa
-- Solusi: gunakan full-text search atau trigram index
CREATE INDEX idx_name_trgm ON users USING gin (name gin_trgm_ops);

-- 4. HINDARI: Fungsi pada kolom di WHERE clause
-- TIDAK menggunakan index:
SELECT * FROM orders WHERE YEAR(created_at) = 2026;
-- BISA menggunakan index:
SELECT * FROM orders
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01';

-- 5. HINDARI: Implicit type conversion
-- Jika kolom bertipe VARCHAR, jangan gunakan integer di WHERE
-- TIDAK optimal: WHERE phone = 0812345678
-- OPTIMAL: WHERE phone = '0812345678'

4. Database Statistics

Statistics memberikan informasi tentang distribusi data dalam tabel kepada query optimizer. Tanpa statistics yang akurat, optimizer bisa memilih plan yang buruk.

SQL — Statistik di PostgreSQL

-- Update statistics manual
ANALYZE orders;
ANALYZE customers;

-- Melihat statistics
SELECT
    attname AS column_name,
    n_distinct,              -- Estimasi nilai unik (-1 = unique)
    most_common_vals,        -- Nilai yang paling umum
    most_common_freqs,       -- Frekuensi nilai paling umum
    histogram_bounds,        -- Distribusi data
    correlation              -- Korelasi dengan physical order
FROM pg_stats
WHERE tablename = 'orders'
ORDER BY attname;

-- Melihat statistik tabel
SELECT
    relname,
    n_live_tup,              -- Jumlah baris hidup
    n_dead_tup,              -- Jumlah baris mati (tupes yang dihapus)
    last_analyze,            -- Terakhir analyze
    last_autoanalyze         -- Terakhir auto-analyze
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_live_tup DESC;

-- Mengubah target statistik untuk kolom tertentu
-- Default 100, maksimum 10000
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;

-- Atau global:
SET default_statistics_target = 200;
ANALYZE orders;

-- MySQL: Update statistics
ANALYZE TABLE orders, customers;

-- Cek index statistics di MySQL
SHOW INDEX FROM orders;

⚠️ Statistics Outdated = Bad Plans

Jika Anda melihat perubahan drastis pada query plan setelah banyak INSERT/UPDATE/DELETE, kemungkinan statistics sudah outdated. Jalankan ANALYZE untuk tabel yang sering berubah. PostgreSQL memiliki auto-analyze, tetapi untuk tabel yang sangat aktif, scheduled ANALYZE sangat disarankan.

5. Query Hints

Query hints memungkinkan Anda memberi petunjuk kepada optimizer tentang cara mengeksekusi query. Gunakan hints hanya ketika optimizer memilih plan yang buruk — jangan gunakan secara rutin karena bisa menghambat optimizer saat data berubah.

SQL — Query Hints di Berbagai Database

-- PostgreSQL: menggunakan CTE untuk mempengaruhi plan
WITH forced_index AS (
    SELECT * FROM orders WHERE customer_id = 1234
)
SELECT * FROM forced_index WHERE total > 1000;

-- PostgreSQL: set local untuk session-scope hint
SET LOCAL enable_seqscan = off;
SET LOCAL enable_hashjoin = off;
SET LOCAL enable_nestloop = off;
SET LOCAL enable_mergejoin = off;
-- Setelah query, settings otomatis di-reset

-- PostgreSQL: mengatur work_mem per query
SET LOCAL work_mem = '512MB';

-- MySQL: Index hints
SELECT * FROM orders FORCE INDEX (idx_orders_customer)
WHERE customer_id = 1234;

SELECT * FROM orders IGNORE INDEX (idx_orders_created)
WHERE status = 'pending';

SELECT * FROM orders USE INDEX (idx_orders_status)
WHERE status = 'completed' AND total > 1000;

-- MySQL: Join order hint
SELECT STRAIGHT_JOIN o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.total > 1000;

-- SQL Server: hints
SELECT * FROM orders WITH (INDEX(idx_orders_customer))
WHERE customer_id = 1234;

SELECT o.*, c.name
FROM orders o
INNER HASH JOIN customers c ON o.customer_id = c.id
WHERE o.total > 1000;

-- Oracle: hints
SELECT /*+ INDEX(orders idx_orders_customer) */
    * FROM orders WHERE customer_id = 1234;

SELECT /*+ USE_HASH(o c) */
    o.*, c.name FROM orders o JOIN customers c ON o.customer_id = c.id;

6. Table Partitioning

Partitioning membagi tabel besar menjadi beberapa bagian yang lebih kecil berdasarkan kriteria tertentu (tanggal, range, list). Query yang hanya membutuhkan sebagian data bisa skip partition yang tidak relevan — disebut partition pruning.

SQL — Partitioning di PostgreSQL

-- Range Partitioning berdasarkan tanggal
CREATE TABLE orders (
    id BIGSERIAL,
    customer_id BIGINT NOT NULL,
    total DECIMAL(10,2),
    status VARCHAR(20),
    created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

-- Membuat partisi per bulan
CREATE TABLE orders_2026_01 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE orders_2026_02 PARTITION OF orders
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE orders_2026_03 PARTITION OF orders
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');

-- Index di setiap partisi
CREATE INDEX idx_orders_2026_01_customer ON orders_2026_01(customer_id);
CREATE INDEX idx_orders_2026_02_customer ON orders_2026_02(customer_id);
CREATE INDEX idx_orders_2026_03_customer ON orders_2026_03(customer_id);

-- Auto-create partition dengan pg_partman (ekstensi)
CREATE EXTENSION pg_partman;
SELECT partman.create_parent(
    p_parent_table := 'public.orders',
    p_control := 'created_at',
    p_type := 'range',
    p_interval := '1 month',
    p_premake := 3  -- Buat 3 partisi ke depan
);

-- Query yang trigger partition pruning
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE created_at >= '2026-01-01' AND created_at < '2026-02-01';
-- Hanya scan orders_2026_01, bukan semua partisi!

-- List Partitioning berdasarkan region
CREATE TABLE users_global (
    id BIGSERIAL,
    region VARCHAR(10) NOT NULL,
    name VARCHAR(100)
) PARTITION BY LIST (region);

CREATE TABLE users_id PARTITION OF users_global FOR VALUES IN ('ID');
CREATE TABLE users_my PARTITION OF users_global FOR VALUES IN ('MY');
CREATE TABLE users_sg PARTITION OF users_global FOR VALUES IN ('SG');

-- Hash Partitioning (distribusi merata)
CREATE TABLE sessions (
    id UUID DEFAULT gen_random_uuid(),
    user_id BIGINT NOT NULL,
    data JSONB
) PARTITION BY HASH (user_id);

CREATE TABLE sessions_0 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_1 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_2 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_3 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 3);

7. Teknik Lanjutan

7.1 Materialized Views

SQL — Pre-Computed Results

-- Materialized view untuk aggregasi yang sering dijalankan
CREATE MATERIALIZED VIEW mv_monthly_revenue AS
SELECT
    date_trunc('month', created_at) AS month,
    status,
    COUNT(*) AS order_count,
    SUM(total) AS revenue,
    AVG(total) AS avg_order_value,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
GROUP BY date_trunc('month', created_at), status;

CREATE UNIQUE INDEX idx_mv_monthly_revenue ON mv_monthly_revenue(month, status);

-- Refresh materialized view (secara berkala)
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_monthly_revenue;
-- CONCURRENTLY: tidak memblokir reads saat refresh

-- CTE optimization: gunakan MATERIALIZED/NOT MATERIALIZED
WITH monthly_stats AS MATERIALIZED (
    -- Dipaksa dihitung sekali, hasilnya di-cache
    SELECT date_trunc('month', created_at) AS month, COUNT(*) AS cnt
    FROM orders GROUP BY 1
)
SELECT * FROM monthly_stats WHERE cnt > 100;

WITH active_users AS NOT MATERIALIZED (
    -- Di-inline ke parent query, tidak di-cache
    SELECT id FROM users WHERE is_active = true
)
SELECT * FROM orders WHERE customer_id IN (SELECT id FROM active_users);

7.2 Query Rewriting Patterns

SQL — Contek-Transformasi Query

-- 1. EXISTS lebih cepat dari IN untuk large subqueries
-- LAMBAT:
SELECT * FROM customers WHERE id IN (SELECT customer_id FROM orders);
-- CEPAT:
SELECT * FROM customers c WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

-- 2. UNION ALL lebih cepat dari UNION (tidak perlu deduplicate)
-- Jika Anda tahu tidak ada duplikat:
SELECT id, name FROM users_id UNION ALL SELECT id, name FROM users_my;

-- 3. LIMIT + Index = cepat
-- Pastikan ORDER BY menggunakan kolom yang ter-index
SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;
-- Index: CREATE INDEX idx_orders_created ON orders (created_at DESC);

-- 4. Window functions vs self-join
-- LAMBAT (self-join):
SELECT o.*, (SELECT COUNT(*) FROM orders WHERE customer_id = o.customer_id)
FROM orders o;
-- CEPAT (window function):
SELECT o.*, COUNT(*) OVER (PARTITION BY customer_id) FROM orders o;

-- 5. Batch processing vs row-by-row
-- LAMBAT (1000 UPDATE terpisah):
UPDATE orders SET status = 'processed' WHERE id = 1;
UPDATE orders SET status = 'processed' WHERE id = 2;
-- CEPAT (batch):
UPDATE orders SET status = 'processed' WHERE id IN (1, 2, 3, ..., 1000);

8. Masalah Umum & Solusi

Masalah	Gejala	Solusi
Full Table Scan	EXPLAIN menunjukkan Seq Scan pada tabel besar	Buat index yang tepat, update statistics
Index Not Used	Index ada tapi EXPLAIN tidak menggunakannya	Cek implicit conversion, fungsi di WHERE, statistik
N+1 Query	1 query utama + N query child	Gunakan JOIN atau batch loading
Sort Overflow	"External merge" di EXPLAIN	Naikkan work_mem, atau buat index yang match ORDER BY
Lock Contention	Query lambat karena menunggu lock	Kurangi transaksi, gunakan SELECT FOR UPDATE SKIP LOCKED
Temp Table Spill	Hash aggregate ke disk	Naikkan work_mem, kurangi result set
Stale Statistics	Plan berubah drastis	Jalankan ANALYZE, pertimbangkan manual statistics
Correlated Subquery	Query lambat karena subquery dijalankan per baris	Rewrite ke JOIN atau window function

9. Quiz: Uji Pemahamanmu!

Setelah membaca tutorial di atas, jawablah 5 pertanyaan berikut:

Pertanyaan 1: Apa yang dilakukan EXPLAIN ANALYZE berbeda dari EXPLAIN biasa?

a) Menampilkan lebih banyak kolom

b) Mengeksekusi query dan menampilkan actual timing dan row counts

c) Mengoptimalkan query secara otomatis

d) Menyimpan plan ke disk

Pertanyaan 2: Urutan kolom yang optimal dalam composite index untuk query dengan WHERE status='x' AND date > '2026' ORDER BY total?

a) (date, status, total)

b) (total, date, status)

c) (status, date, total)

d) Urutan tidak berpengaruh

Pertanyaan 3: Apa fungsi dari partition pruning?

a) Menghapus partisi yang tidak dipakai

b) Memungkinkan query melewati partisi yang tidak relevan dengan filter

c) Menggabungkan beberapa partisi menjadi satu

d) Menambah replica ke setiap partisi

Pertanyaan 4: Mengapa EXISTS lebih cepat dari IN untuk large subqueries?

a) EXISTS menggunakan index yang berbeda

b) EXISTS bisa berhenti mencari setelah menemukan match pertama

c) IN hanya mendukung primary key

d) Tidak ada perbedaan performa

Pertanyaan 5: Apa yang harus dilakukan jika index yang sudah dibuat tidak digunakan oleh optimizer?

a) Hapus index tersebut

b) Cek statistics, implicit conversion, fungsi di WHERE, dan perbandingan data types

c) Gunakan query hint untuk memaksa index

d) Restart database server