TimescaleDB: Time-Series SQL Database Lengkap

📋 Daftar Isi

Pengenalan TimescaleDB
Instalasi & Setup
Hypertables
Time-Series Queries
Compression
Continuous Aggregates
Retention Policies
Use Cases: IoT, Monitoring, Finance
Best Practices & Performance
Quiz Pemahaman

1. Pengenalan TimescaleDB

TimescaleDB adalah ekstensi PostgreSQL untuk data time-series. Dibangun di atas PostgreSQL, TimescaleDB menambahkan fitur khusus untuk data berbasis waktu — hypertables, compression, dan continuous aggregates — sambil mempertahankan kompatibilitas penuh dengan SQL dan ekosistem PostgreSQL.

Mengapa Time-Series Data Berbeda?

Karakteristik	Database Relasional Biasa	Time-Series Database
Pola Insert	Bervariasi (CRUD campur)	Insert-heavy (append-only)
Pola Query	Point lookup, join kompleks	Range scan, aggregation per waktu
Data Volume	Jutaan baris	Miliaran baris
Retensi	Tidak terbatas	Perlu auto-expire data lama
Indexing	B-tree, hash, GIN	Time-based partitioning
Compression	Opsional	Kritis (hemat storage)

TimescaleDB vs Alternatif

Aspek	TimescaleDB	InfluxDB	Apache Druid
Base	PostgreSQL	Custom	Custom
Query Language	SQL	Flux / InfluxQL	SQL (Druid SQL)
Full SQL Support	Ya (100%)	Terbatas	Terbatas
JOIN dengan tabel lain	Ya	Tidak	Terbatas
ACID Transactions	Ya	Tidak	Tidak
Compression	95%+ ratio	Baik	Sangat baik
Ecosystem	Seluruh PostgreSQL	Grafana, Telegraf	Grafana, Superset

Diagram: Arsitektur TimescaleDB

+--------------------------------------------------------------------+
|                  ARSITEKTUR TIMESCALEDB                              |
|                                                                    |
|  +----------------------------------------------------------+      |
|  |                    PostgreSQL                             |      |
|  |  +----------------------------------------------------+  |      |
|  |  |              TimescaleDB Extension                  |  |      |
|  |  |                                                    |  |      |
|  |  |  +----------+  +----------+  +----------+         |  |      |
|  |  |  |Hypertable|  |Continuous|  |Compression|         |  |      |
|  |  |  | (tabel)  |  |Aggregate |  | Policies  |         |  |      |
|  |  |  +----------+  +----------+  +----------+         |  |      |
|  |  |       |              |              |              |  |      |
|  |  |  +----------+  +----------+  +----------+         |  |      |
|  |  |  | Chunk 1  |  | Chunk 2  |  | Chunk N  |         |  |      |
|  |  |  | (partisi)|  | (partisi)|  | (partisi)|         |  |      |
|  |  |  +----------+  +----------+  +----------+         |  |      |
|  |  +----------------------------------------------------+  |      |
|  |                                                          |      |
|  |  Standard PostgreSQL: tables, indexes, views, functions  |      |
|  +----------------------------------------------------------+      |
+--------------------------------------------------------------------+

2. Instalasi & Setup

Bash — Instalasi TimescaleDB

# =============================================
# Docker (recommended untuk development)
# =============================================
docker run -d --name timescaledb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=*** \
  timescale/timescaledb:latest-pg16

# =============================================
# Ubuntu/Debian
# =============================================
# Tambah repository Timescale
sudo add-apt-repository ppa:timescale/timescaledb-ppa
sudo apt update
sudo apt install timescaledb-2-postgresql-16

# Atau gunakan script setup
# https://docs.timescale.com/install/latest/installation-linux/

# =============================================
# Aktifkan ekstensi di PostgreSQL
# =============================================
psql -U postgres -d your_database

-- Aktifkan ekstensi
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Verifikasi
SELECT default_version, installed_version
FROM pg_available_extensions
WHERE name = 'timescaledb';

3. Hypertables

Hypertable adalah konsep inti TimescaleDB. Secara luar, hypertable terlihat seperti tabel PostgreSQL biasa. Di balik layar, TimescaleDB otomatis mempartisi data berdasarkan kolom waktu (time column) menjadi chunks yang lebih kecil.

Membuat Hypertable

SQL — Hypertables

-- =============================================
-- BUAT TABEL STANDAR dulu, lalu convert ke hypertable
-- =============================================

-- Tabel sensor readings (IoT)
CREATE TABLE sensor_readings (
    time         TIMESTAMPTZ NOT NULL,
    sensor_id    INTEGER NOT NULL,
    temperature  DOUBLE PRECISION,
    humidity     DOUBLE PRECISION,
    pressure     DOUBLE PRECISION,
    battery      DOUBLE PRECISION
);

-- Convert ke hypertable (time_column = 'time')
SELECT create_hypertable('sensor_readings', 'time');

-- =============================================
-- HYPERTABLE dengan partisi tambahan
-- =============================================

-- Partition by time DAN sensor_id (multi-dimensional)
CREATE TABLE sensor_data (
    time         TIMESTAMPTZ NOT NULL,
    location_id  INTEGER NOT NULL,
    sensor_id    INTEGER NOT NULL,
    metric_name  TEXT NOT NULL,
    metric_value DOUBLE PRECISION
);

SELECT create_hypertable(
    'sensor_data',
    'time',
    partitioning_column => 'location_id',
    number_partitions => 4
);

-- =============================================
-- TABEL LAINNYA: metrik aplikasi
-- =============================================
CREATE TABLE app_metrics (
    time         TIMESTAMPTZ NOT NULL,
    service      TEXT NOT NULL,
    endpoint     TEXT NOT NULL,
    response_ms  DOUBLE PRECISION,
    status_code  INTEGER,
    is_error     BOOLEAN DEFAULT FALSE
);

SELECT create_hypertable('app_metrics', 'time');

-- =============================================
-- TABEL: transaksi keuangan
-- =============================================
CREATE TABLE financial_ticks (
    time         TIMESTAMPTZ NOT NULL,
    symbol       TEXT NOT NULL,
    price        NUMERIC(15,4),
    volume       BIGINT,
    bid          NUMERIC(15,4),
    ask          NUMERIC(15,4)
);

SELECT create_hypertable('financial_ticks', 'time');

-- Cek hypertables
SELECT * FROM timescaledb_information.hypertables;

Mengelola Chunks

SQL — Chunk Management

-- Lihat semua chunks dari sebuah hypertable
SELECT chunk_name, range_start, range_end, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_readings'
ORDER BY range_start DESC;

-- Lihat ukuran hypertable (termasuk index dan chunks)
SELECT
    hypertable_name,
    pg_size_pretty(hypertable_size(format('%I.%I', hypertable_schema, hypertable_name)::regclass)) AS total_size,
    num_chunks
FROM timescaledb_information.hypertables;

-- Set chunk interval (default 7 hari)
SELECT set_chunk_time_interval('sensor_readings', INTERVAL '1 day');

-- Add compression policy (otomatis compress chunk lama)
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');

Insert Data ke Hypertable

SQL — Insert Data

-- Insert data biasa (sama seperti PostgreSQL)
INSERT INTO sensor_readings (time, sensor_id, temperature, humidity, pressure, battery)
VALUES
    (NOW() - INTERVAL '5 minutes', 1, 25.5, 65.2, 1013.25, 85.0),
    (NOW() - INTERVAL '4 minutes', 1, 25.7, 64.8, 1013.20, 84.5),
    (NOW() - INTERVAL '3 minutes', 2, 22.3, 70.1, 1013.30, 92.0),
    (NOW() - INTERVAL '2 minutes', 2, 22.5, 69.8, 1013.28, 91.5),
    (NOW() - INTERVAL '1 minute',  1, 26.0, 63.5, 1013.15, 84.0);

-- Generate data untuk testing (1 juta baris)
INSERT INTO sensor_readings (time, sensor_id, temperature, humidity, pressure, battery)
SELECT
    NOW() - (random() * INTERVAL '30 days'),
    (random() * 100)::INT + 1,
    20 + random() * 15,
    40 + random() * 40,
    1000 + random() * 30,
    50 + random() * 50
FROM generate_series(1, 1000000);

-- Bulk insert untuk performa
-- Gunakan COPY atau INSERT ... VALUES batch
COPY sensor_readings (time, sensor_id, temperature, humidity, pressure, battery)
FROM '/path/to/data.csv'
WITH (FORMAT csv, HEADER true);

4. Time-Series Queries

TimescaleDB menambahkan fungsi-fungsi khusus untuk query time-series yang tidak ada di PostgreSQL standar.

Fungsi TimeBucket

SQL — Time-Series Queries

-- =============================================
-- time_bucket: Group by interval waktu
-- (mirip date_trunc tapi lebih fleksibel)
-- =============================================

-- Rata-rata per 5 menit
SELECT
    time_bucket('5 minutes', time) AS bucket,
    sensor_id,
    AVG(temperature) AS avg_temp,
    AVG(humidity) AS avg_humidity,
    MAX(temperature) AS max_temp,
    MIN(temperature) AS min_temp,
    COUNT(*) AS readings
FROM sensor_readings
WHERE time >= NOW() - INTERVAL '1 day'
GROUP BY bucket, sensor_id
ORDER BY bucket DESC, sensor_id;

-- Rata-rata per jam
SELECT
    time_bucket('1 hour', time) AS jam,
    AVG(temperature) AS avg_temp,
    MAX(temperature) AS max_temp,
    MIN(temperature) AS min_temp
FROM sensor_readings
WHERE time >= NOW() - INTERVAL '7 days'
GROUP BY jam
ORDER BY jam DESC;

-- Rata-rata per hari (dengan custom origin)
SELECT
    time_bucket('1 day', time, TIMESTAMPTZ '2026-01-01') AS hari,
    AVG(temperature) AS avg_temp
FROM sensor_readings
GROUP BY hari
ORDER BY hari DESC;

Fungsi Time-Series Lainnya

SQL — Advanced Time-Series Functions

-- =============================================
-- first() dan last(): Ambil nilai pertama/terakhir
-- dalam interval
-- =============================================
SELECT
    time_bucket('1 hour', time) AS jam,
    sensor_id,
    first(temperature, time) AS temp_awal,
    last(temperature, time)  AS temp_akhir,
    last(temperature, time) - first(temperature, time) AS perubahan
FROM sensor_readings
WHERE time >= NOW() - INTERVAL '24 hours'
GROUP BY jam, sensor_id
ORDER BY jam DESC;

-- =============================================
-- time_bucket_gapfill: Isi gap waktu yang kosong
-- =============================================
SELECT
    time_bucket_gapfill('1 hour', time) AS jam,
    sensor_id,
    AVG(temperature) AS avg_temp,
    interpolate(AVG(temperature)) AS interpolated_temp,
    locf(AVG(temperature)) AS last_known_temp
FROM sensor_readings
WHERE time >= NOW() - INTERVAL '24 hours'
  AND sensor_id = 1
GROUP BY jam, sensor_id
ORDER BY jam;

-- =============================================
-- Moving Average dengan window function
-- =============================================
SELECT
    time,
    sensor_id,
    temperature,
    AVG(temperature) OVER (
        PARTITION BY sensor_id
        ORDER BY time
        ROWS BETWEEN 9 PRECEDING AND CURRENT ROW
    ) AS moving_avg_10,
    temperature - LAG(temperature) OVER (
        PARTITION BY sensor_id ORDER BY time
    ) AS delta
FROM sensor_readings
WHERE sensor_id = 1
  AND time >= NOW() - INTERVAL '2 hours'
ORDER BY time DESC;

5. Compression

Kompression adalah fitur kritis untuk time-series data karena volume data yang sangat besar. TimescaleDB menawarkan native columnar compression yang bisa mengurangi storage hingga 95%+.

Aktifkan Compression

SQL — Compression

-- =============================================
-- LANGKAH 1: Tentukan segmentby dan orderby
-- =============================================
ALTER TABLE sensor_readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id',
    timescaledb.compress_orderby = 'time DESC'
);

-- =============================================
-- LANGKAH 2: Auto-compress chunk yang lebih lama
-- dari 7 hari
-- =============================================
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');

-- =============================================
-- Manual compress: compress chunk tertentu
-- =============================================
-- Cari chunk yang belum di-compress
SELECT chunk_name, range_start, range_end, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_readings'
  AND is_compressed = FALSE
ORDER BY range_start;

-- Compress secara manual
SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk');
SELECT compress_chunk('_timescaledb_internal._hyper_1_2_chunk');

-- =============================================
-- Lihat hasil kompresi
-- =============================================
SELECT
    pg_size_pretty(before_compression_total_bytes) AS sebelum,
    pg_size_pretty(after_compression_total_bytes) AS sesudah,
    ROUND(
        (1.0 - after_compression_total_bytes::float / before_compression_total_bytes) * 100,
        2
    ) AS persen_penghematan
FROM hypertable_compression_stats('sensor_readings');

💡 Compression Strategy

Atur segmentby berdasarkan kolom yang sering di-group atau filter. Atur orderby berdasarkan kolom yang sering di-sort. Misal: segmentby = 'sensor_id' dan orderby = 'time DESC' untuk query per sensor dalam rentang waktu.

6. Continuous Aggregates

Continuous Aggregate adalah materialized view yang otomatis di-update saat data baru masuk. Berbeda dari materialized view biasa yang harus di-refresh manual, continuous aggregate hanya meng-update chunk yang berubah.

Membuat Continuous Aggregate

SQL — Continuous Aggregates

-- =============================================
-- CONTINUOUS AGGREGATE: Statistik per jam
-- =============================================
CREATE MATERIALIZED VIEW stats_per_jam
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS jam,
    sensor_id,
    AVG(temperature) AS avg_temp,
    MIN(temperature) AS min_temp,
    MAX(temperature) AS max_temp,
    AVG(humidity) AS avg_humidity,
    AVG(pressure) AS avg_pressure,
    COUNT(*) AS jumlah_pembacaan
FROM sensor_readings
GROUP BY jam, sensor_id;

-- Query continuous aggregate (sangat cepat!)
SELECT * FROM stats_per_jam
WHERE jam >= NOW() - INTERVAL '24 hours'
  AND sensor_id = 1
ORDER BY jam DESC;


-- =============================================
-- CONTINUOUS AGGREGATE: Statistik per hari
-- =============================================
CREATE MATERIALIZED VIEW stats_per_hari
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 day', time) AS hari,
    sensor_id,
    AVG(temperature) AS avg_temp,
    MIN(temperature) AS min_temp,
    MAX(temperature) AS max_temp,
    STDDEV(temperature) AS std_temp,
    COUNT(*) AS readings_count
FROM sensor_readings
GROUP BY hari, sensor_id;


-- =============================================
-- Auto-refresh policy
-- =============================================
-- Refresh otomatis setiap 1 jam
SELECT add_continuous_aggregate_policy('stats_per_jam',
    start_offset    => INTERVAL '3 days',
    end_offset      => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour'
);

-- Refresh otomatis setiap 1 hari
SELECT add_continuous_aggregate_policy('stats_per_hari',
    start_offset    => INTERVAL '30 days',
    end_offset      => INTERVAL '1 day',
    schedule_interval => INTERVAL '1 day'
);

-- Manual refresh
CALL refresh_continuous_aggregate('stats_per_jam', NOW() - INTERVAL '1 day', NOW());

-- Lihat semua continuous aggregates
SELECT * FROM timescaledb_information.continuous_aggregates;

Diagram: Continuous Aggregates Hierarchy

+--------------------------------------------------------------------+
|            CONTINUOUS AGGREGATES HIERARCHY                           |
|                                                                    |
|  Raw Data (Hypertable)                                             |
|  +--------------------------------------------------------------+  |
|  | sensor_readings: Miliaran baris                              |  |
|  | Per-second / per-minute data                                 |  |
|  +--------------------------------------------------------------+  |
|                          |                                         |
|                          v                                         |
|  stats_per_jam (Continuous Aggregate)                              |
|  +--------------------------------------------------------------+  |
|  | Rata-rata per jam per sensor                                 |  |
|  | Jauh lebih sedikit baris                                     |  |
|  +--------------------------------------------------------------+  |
|                          |                                         |
|                          v                                         |
|  stats_per_hari (Continuous Aggregate)                             |
|  +--------------------------------------------------------------+  |
|  | Rata-rata per hari per sensor                                |  |
|  | Paling cepat untuk query jangka panjang                      |  |
|  +--------------------------------------------------------------+  |
+--------------------------------------------------------------------+

7. Retention Policies

Data time-series sering memiliki masa retensi — data lama tidak lagi relevan dan harus dihapus untuk menghemat storage.

SQL — Retention Policies

-- =============================================
-- Auto-drop chunk yang lebih lama dari 90 hari
-- =============================================
SELECT add_retention_policy('sensor_readings', INTERVAL '90 days');

-- Retensi berbeda per hypertable
SELECT add_retention_policy('app_metrics', INTERVAL '30 days');
SELECT add_retention_policy('financial_ticks', INTERVAL '365 days');

-- Lihat semua retention policies
SELECT * FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention';

-- Manual drop chunk lama
SELECT drop_chunks('sensor_readings', INTERVAL '90 days');

-- Preview chunk yang akan di-drop (tanpa benar-benar drop)
SELECT show_chunks('sensor_readings', older_than => INTERVAL '90 days');

-- Hapus retention policy
SELECT remove_retention_policy('sensor_readings');


-- =============================================
-- Data Tiering: Pindahkan data lama ke storage murah
-- =============================================
-- TimescaleDB mendukung multi-tier storage
-- Data panas (recent) di SSD, data dingin (lama) di object storage

-- Contoh setup tiering (memerlukan license Tiered)
-- SELECT add_tiering_policy('sensor_readings', INTERVAL '30 days');

8. Use Cases: IoT, Monitoring, Finance

IoT: Monitoring Sensor

SQL — IoT Monitoring

-- Deteksi anomali: sensor dengan temperatur di atas rata-rata
WITH sensor_avg AS (
    SELECT
        sensor_id,
        AVG(temperature) AS avg_temp,
        STDDEV(temperature) AS std_temp
    FROM sensor_readings
    WHERE time >= NOW() - INTERVAL '24 hours'
    GROUP BY sensor_id
)
SELECT
    sr.time,
    sr.sensor_id,
    sr.temperature,
    sa.avg_temp,
    sa.std_temp,
    (sr.temperature - sa.avg_temp) / NULLIF(sa.std_temp, 0) AS z_score
FROM sensor_readings sr
JOIN sensor_avg sa ON sr.sensor_id = sa.sensor_id
WHERE sr.time >= NOW() - INTERVAL '1 hour'
  AND ABS((sr.temperature - sa.avg_temp) / NULLIF(sa.std_temp, 0)) > 3
ORDER BY ABS((sr.temperature - sa.avg_temp) / NULLIF(sa.std_temp, 0)) DESC
LIMIT 20;

-- Sensor yang offline (tidak ada data dalam 30 menit terakhir)
SELECT
    s.sensor_id,
    MAX(s.time) AS last_reading,
    NOW() - MAX(s.time) AS offline_duration
FROM sensor_readings s
WHERE time >= NOW() - INTERVAL '2 hours'
GROUP BY s.sensor_id
HAVING NOW() - MAX(s.time) > INTERVAL '30 minutes'
ORDER BY offline_duration DESC;

Monitoring: Aplikasi Performance

SQL — App Monitoring

-- Endpoint terlambat (response time > 1 detik)
SELECT
    time_bucket('5 minutes', time) AS bucket,
    endpoint,
    AVG(response_ms) AS avg_ms,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY response_ms) AS p95_ms,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY response_ms) AS p99_ms,
    COUNT(*) AS total_requests,
    SUM(CASE WHEN is_error THEN 1 ELSE 0 END) AS errors,
    ROUND(SUM(CASE WHEN is_error THEN 1 ELSE 0 END)::NUMERIC / COUNT(*) * 100, 2) AS error_rate
FROM app_metrics
WHERE time >= NOW() - INTERVAL '1 hour'
GROUP BY bucket, endpoint
HAVING AVG(response_ms) > 1000
ORDER BY avg_ms DESC;

-- Status code distribution per jam
SELECT
    time_bucket('1 hour', time) AS jam,
    status_code,
    COUNT(*) AS jumlah
FROM app_metrics
WHERE time >= NOW() - INTERVAL '24 hours'
GROUP BY jam, status_code
ORDER BY jam DESC, jumlah DESC;

Finance: Stock Ticks

SQL — Financial Data

-- OHLCV (Open, High, Low, Close, Volume) per 5 menit
SELECT
    time_bucket('5 minutes', time) AS bucket,
    symbol,
    first(price, time) AS open,
    MAX(price) AS high,
    MIN(price) AS low,
    last(price, time) AS close,
    SUM(volume) AS volume
FROM financial_ticks
WHERE time >= NOW() - INTERVAL '1 day'
  AND symbol = 'BBCA'
GROUP BY bucket, symbol
ORDER BY bucket DESC;

-- VWAP (Volume Weighted Average Price)
SELECT
    time_bucket('1 hour', time) AS jam,
    symbol,
    SUM(price * volume)::NUMERIC / NULLIF(SUM(volume), 0) AS vwap,
    SUM(volume) AS total_volume
FROM financial_ticks
WHERE time >= NOW() - INTERVAL '24 hours'
GROUP BY jam, symbol
ORDER BY jam DESC;

9. Best Practices & Performance

Praktik	Penjelasan
Chunk interval yang tepat	Default 7 hari. Untuk data volume sangat tinggi, gunakan 1 hari atau bahkan 1 jam
Segmentby yang tepat	Pilih kolom yang sering di-GROUP BY atau filter untuk compression
Continuous aggregates	Buat aggregate per jam/hari untuk query yang sering diulang
Retention policies	Hapus data lama secara otomatis untuk hemat storage
Bulk insert	Gunakan COPY atau batch INSERT, bukan satu-satu
Jangan UPDATE/DELETE frequent	Time-series data bersifat append-only. Update jarang terjadi
Index minimal	Tambah index hanya pada kolom yang sering di-filter (selain time)
EXPLAIN ANALYZE	Selalu cek execution plan untuk query besar

⚠️ Hindari Kesalahan Umum

Jangan gunakan SELECT * pada hypertable dengan miliaran baris. Selalu filter berdasarkan rentang waktu (WHERE time BETWEEN ...) agar query memanfaatkan chunk pruning. Tanpa filter waktu, query scan semua chunks — sangat lambat.

10. Quiz Pemahaman

📝 Quiz: TimescaleDB

Apa itu hypertable dan bagaimana cara membuatnya?
Jawaban: Hypertable adalah tabel yang otomatis dipartisi berdasarkan waktu menjadi chunks. Dibuat dengan SELECT create_hypertable('nama_tabel', 'kolom_waktu').
Apa keuntungan continuous aggregate dibanding materialized view biasa?
Jawaban: Continuous aggregate otomatis meng-update hanya chunk yang berubah, sedangkan materialized view biasa harus di-refresh manual (full refresh).
Apa fungsi time_bucket()?
Jawaban: Mengelompokkan timestamp ke interval tertentu (5 menit, 1 jam, 1 hari) untuk aggregation. Lebih fleksibel dari date_trunc().
Apa itu segmentby dalam compression?
Jawaban: Kolom yang digunakan untuk membagi data sebelum dikompres. Query yang filter berdasarkan segmentby kolom akan lebih cepat karena hanya decompress segmen yang relevan.
Mengapa filter waktu sangat penting di hypertable?
Jawaban: Tanpa filter waktu, query harus scan semua chunks (full table scan). Dengan filter waktu, TimescaleDB hanya membaca chunks yang relevan (chunk pruning).

🎯 Langkah Selanjutnya

Pelajari Database Sharding untuk strategi skalabilitas horizontal, atau kembali ke SQL Stored Procedures untuk membuat fungsi-fungsi yang berinteraksi dengan data time-series.