OpenTelemetry: Observability for Developers

A complete guide to OpenTelemetry: distributed tracing, metrics collection, log correlation, exporters (Jaeger, Prometheus, Grafana), auto-instrumentation, and observability best practices

1. Introduction to OpenTelemetry

OpenTelemetry (OTel) is an open-source observability framework for collecting, processing, and exporting telemetry data (traces, metrics, and logs) from applications. It is a CNCF (Cloud Native Computing Foundation) project with contributions from Google, Microsoft, Amazon, and many other large technology companies.

Before OpenTelemetry, developers had to choose a proprietary observability vendor such as Datadog, New Relic, or Dynatrace. Each vendor shipped its own SDK, and those SDKs were not compatible with one another. OpenTelemetry changes this by providing a universal standard that works with any backend.

OpenTelemetry is the merger of two earlier projects: OpenTracing (a standard API for tracing) and OpenCensus (an SDK for metrics and tracing). OTel combines the best of both into a single unified standard covering traces, metrics, and logs.

Why OpenTelemetry?

Advantage | Description
Vendor neutral | A universal standard: export to any backend (Jaeger, Prometheus, Grafana, Datadog, and others)
Multi-language | SDKs for 11+ languages, including Java, Python, Go, JavaScript, .NET, Rust, and Swift
CNCF project | Backed by major companies, which supports long-term sustainability
Auto-instrumentation | Automatic instrumentation for popular frameworks with minimal code changes
Traces + metrics + logs | One framework for all telemetry data types
W3C standard | Trace context propagation follows the W3C Trace Context standard
Correlation | Traces, metrics, and logs can be correlated through shared context such as trace IDs
Production ready | Used in production by large companies
Diagram: OpenTelemetry Ecosystem
┌──────────────────────────────────────────────────────────────────┐
│                     OPENTELEMETRY ECOSYSTEM                      │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                  APPLICATION (Your Code)                   │  │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐            │  │
│  │  │  OTel API  │  │  OTel SDK  │  │   Auto-    │            │  │
│  │  │  (stable)  │  │  (config)  │  │ Instrument │            │  │
│  │  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘            │  │
│  └────────┼───────────────┼───────────────┼───────────────────┘  │
│           │               │               │                      │
│           ▼               ▼               ▼                      │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                OTel SDK (Process Pipeline)                 │  │
│  │  ┌────────────────┐  ┌────────────────┐  ┌──────────────┐  │  │
│  │  │ TracerProvider │  │ MeterProvider  │  │  Log Bridge  │  │  │
│  │  │                │  │                │  │ (LogEmitter) │  │  │
│  │  └───────┬────────┘  └───────┬────────┘  └──────┬───────┘  │  │
│  │          │                   │                  │          │  │
│  │          ▼                   ▼                  ▼          │  │
│  │  ┌──────────────────────────────────────────────────────┐  │  │
│  │  │                    OTel Exporter                     │  │  │
│  │  │    ┌──────────┐   ┌──────────┐   ┌────────────┐      │  │  │
│  │  │    │ OTLP     │   │ Jaeger   │   │ Prometheus │      │  │  │
│  │  │    │ Exporter │   │ Exporter │   │ Exporter   │      │  │  │
│  │  │    └──────────┘   └──────────┘   └────────────┘      │  │  │
│  │  └──────────────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────────────┘  │
│                              │                                   │
│           ┌──────────────────┼──────────────────┐                │
│           ▼                  ▼                  ▼                │
│    ┌──────────────┐   ┌──────────────┐   ┌──────────────┐        │
│    │    Jaeger    │   │  Prometheus  │   │   Grafana    │        │
│    │   (Traces)   │   │  (Metrics)   │   │ (Dashboard)  │        │
│    └──────────────┘   └──────────────┘   └──────────────┘        │
└──────────────────────────────────────────────────────────────────┘

OpenTelemetry vs Alternatives

Feature | OpenTelemetry | Datadog | New Relic
Price | 🟢 Open source | 🔴 Paid (expensive) | 🟡 Free tier + paid
Vendor lock-in | 🟢 None | 🔴 Tied to the vendor | 🟡 Moderate
Data ownership | 🟢 Full control | 🔴 On the vendor's servers | 🟡 Partial
Setup | 🟡 More complex | 🟢 Very easy | 🟢 Easy
Backend | 🔴 You set it up yourself | 🟢 Managed | 🟢 Managed
Standardization | 🟢 CNCF standard | 🔴 Proprietary | 🟡 Partial

2. The Three Pillars of Observability

OpenTelemetry covers the three pillars of observability, which are correlated with one another:

Diagram: The Three Pillars of Observability
┌────────────────────────────────────────────────────────────────┐
│                 THREE PILLARS OF OBSERVABILITY                 │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                          TRACES                          │  │
│  │  "How does a request flow through the system?"           │  │
│  │                                                          │  │
│  │  Request → Service A → Service B → Service C             │  │
│  │  Trace ID: abc123                                        │  │
│  │  Span 1 (100ms) → Span 2 (50ms) → Span 3 (30ms)          │  │
│  │  ──────────────────────────────────────────────────────  │  │
│  │  Useful for: latency debugging, distributed tracing,     │  │
│  │  dependency analysis                                     │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                         METRICS                          │  │
│  │  "How healthy is my system?"                             │  │
│  │                                                          │  │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐                  │  │
│  │  │Request/s │ │Latency   │ │Error Rate│                  │  │
│  │  │   450    │ │   25ms   │ │   0.1%   │                  │  │
│  │  └──────────┘ └──────────┘ └──────────┘                  │  │
│  │  ──────────────────────────────────────────────────────  │  │
│  │  Useful for: alerting, dashboards, capacity planning,    │  │
│  │  SLA monitoring                                          │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                           LOGS                           │  │
│  │  "What happened in the system?"                          │  │
│  │                                                          │  │
│  │  [2026-06-26T10:00:00Z] INFO  trace_id=abc123            │  │
│  │    span_id=def456 "Processing user order"                │  │
│  │  [2026-06-26T10:00:01Z] ERROR trace_id=abc123            │  │
│  │    span_id=ghi789 "Database connection timeout"          │  │
│  │  ──────────────────────────────────────────────────────  │  │
│  │  Useful for: error debugging, audit trails, compliance,  │  │
│  │  detailed context                                        │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                       CORRELATION                        │  │
│  │                                                          │  │
│  │  Traces → Metrics → Logs                                 │  │
│  │  All data is correlated via trace ID and span ID         │  │
│  │  → From an alert (metrics) → to the trace → to the logs  │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘

3. OpenTelemetry Architecture

Main Components

Component | Function
API | The interface applications use to create spans, metrics, and logs. Stable, with backward-compatibility guarantees
SDK | The implementation of the API: processing, sampling, and export. Configurable
Collector | A proxy that receives, processes, and exports telemetry data. Can be deployed as an agent or a gateway
Instrumentation libraries | Libraries that automatically instrument popular frameworks (Express, Flask, Spring, and others)
Exporters | Send data to a backend (Jaeger, Prometheus, Grafana Tempo, Datadog, and others)
Context propagation | Carries trace context between services (W3C Trace Context, Baggage)
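
To make context propagation more concrete, here is a minimal sketch of reading and writing the W3C `traceparent` HTTP header with only the Python standard library. The names `SpanContext`, `parse_traceparent`, `format_traceparent`, and `child_context` are hypothetical helpers invented for this illustration, not OpenTelemetry APIs, and the parser is simplified to the four-field layout defined for version `00`. In a real service the SDK's propagators handle this for you.

```python
import re
import secrets
from typing import NamedTuple, Optional


class SpanContext(NamedTuple):
    trace_id: str  # 32 lowercase hex characters
    span_id: str   # 16 lowercase hex characters
    sampled: bool


# version - trace-id - parent-id - trace-flags
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the incoming span context, or None if the header is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid according to the specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def format_traceparent(ctx: SpanContext) -> str:
    """Serialize a span context for an outgoing request."""
    return f"00-{ctx.trace_id}-{ctx.span_id}-{'01' if ctx.sampled else '00'}"


def child_context(parent: SpanContext) -> SpanContext:
    """Keep the trace ID and sampling decision, mint a new span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
outgoing = child_context(incoming)
print(format_traceparent(outgoing))
```

The trace ID stays the same across every hop, which is what lets a backend stitch spans from different services into one trace; only the span ID changes per operation.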

Data Flow

Text
Data flow in OpenTelemetry:

Application Code
  → OTel API (create spans, record metrics)
    → OTel SDK (process, sample, batch)
      → OTel Exporter (send data)
        → OTel Collector (optional: receive, process, export)
          → Backend (Jaeger, Prometheus, Grafana, etc.)

Modes:
  1. Direct Export: App → Exporter → Backend
  2. Via Collector: App → Exporter → Collector → Backend (recommended)
  3. Via Agent: App → gRPC/HTTP → Collector Agent → Backend
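
The "process, sample, batch" step in the flow above can be pictured with a small sketch. This is a simplified model written for illustration, not the SDK's batch processor: `SimpleBatchProcessor` and its `max_batch_size` parameter are hypothetical names, and the timer-based flush that real batch processors also perform is left out.

```python
from typing import Callable, Dict, List

Span = Dict[str, object]


class SimpleBatchProcessor:
    """Buffers finished spans and hands them to an exporter in batches."""

    def __init__(self, export: Callable[[List[Span]], None], max_batch_size: int = 3) -> None:
        self._export = export
        self._max_batch_size = max_batch_size
        self._buffer: List[Span] = []

    def on_end(self, span: Span) -> None:
        # Called when a span finishes; export once the buffer is full.
        self._buffer.append(span)
        if len(self._buffer) >= self._max_batch_size:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered; a shutdown hook would call this too.
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._export(batch)


# The "exporter" here just collects batches in a list.
exported_batches: List[List[Span]] = []
processor = SimpleBatchProcessor(exported_batches.append, max_batch_size=3)

for i in range(7):
    processor.on_end({"name": f"span-{i}"})
processor.flush()  # drain the remainder, as on graceful shutdown

print([len(batch) for batch in exported_batches])  # [3, 3, 1]
```

Batching trades a little latency for far fewer network calls, which is also why a graceful shutdown that flushes the buffer matters: without it the last partial batch would be lost.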

4. Distributed Tracing

Distributed tracing lets you follow the journey of a request from start to finish as it passes through the services in your system. Each "span" represents one unit of work, and a collection of spans forms a "trace".

Tracing Concepts

Diagram: Distributed Trace Example
Trace ID: abc123def456

┌──────────────────────────────────────────────────────────────────┐
│ Span A: API Gateway                                              │
│ [==============================================================] │
│ 100ms                                                            │
│                                                                  │
│  ┌────────────────────────────────────────────┐                  │
│  │ Span B: User Service                       │                  │
│  │ [========================================] │                  │
│  │ 60ms                                       │                  │
│  │                                            │                  │
│  │  ┌────────────────────┐                    │                  │
│  │  │ Span C: Database   │                    │                  │
│  │  │ [================] │                    │                  │
│  │  │ 20ms               │                    │                  │
│  │  └────────────────────┘                    │                  │
│  │                                            │                  │
│  │  ┌──────────────────────┐                  │                  │
│  │  │ Span D: Redis Cache  │                  │                  │
│  │  │ [==================] │                  │                  │
│  │  │ 15ms                 │                  │                  │
│  │  └──────────────────────┘                  │                  │
│  └────────────────────────────────────────────┘                  │
│                                                                  │
│  ┌──────────────────────────────────┐                            │
│  │ Span E: Order Service            │                            │
│  │ [==============================] │                            │
│  │ 30ms                             │                            │
│  └──────────────────────────────────┘                            │
└──────────────────────────────────────────────────────────────────┘

Terminology:
  Trace      = End-to-end journey of a single request
  Span       = One unit of work (function call, DB query, HTTP request)
  Context    = Metadata (trace ID, span ID, baggage) that is propagated
  Parent     = A span that starts other spans
  Child      = A span started within a parent span
  Attributes = Key-value pairs on a span (http.method, db.system, etc.)
  Events     = Timestamped events on a span (exception, log, etc.)
  Status     = Span status (OK, ERROR, UNSET)
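
The parent and child relationship can be modeled in a few lines. The sketch below is illustrative only: the `Span` dataclass and `self_times` function are hypothetical, the durations are taken from the trace diagram above, and the span IDs are made up. It computes how much time each span spends in its own work, assuming child spans run one after another without overlapping.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    name: str
    duration_ms: int
    parent_id: Optional[str] = None  # None marks the root span


spans = [
    Span("a", "API Gateway", 100),
    Span("b", "User Service", 60, parent_id="a"),
    Span("c", "Database", 20, parent_id="b"),
    Span("d", "Redis Cache", 15, parent_id="b"),
    Span("e", "Order Service", 30, parent_id="a"),
]


def self_times(all_spans: List[Span]) -> Dict[str, int]:
    """Duration of each span minus the total duration of its direct children."""
    child_total: Dict[str, int] = {}
    for span in all_spans:
        if span.parent_id is not None:
            child_total[span.parent_id] = child_total.get(span.parent_id, 0) + span.duration_ms
    return {s.name: s.duration_ms - child_total.get(s.span_id, 0) for s in all_spans}


result = self_times(spans)
print(result)
# {'API Gateway': 10, 'User Service': 25, 'Database': 20, 'Redis Cache': 15, 'Order Service': 30}
```

Reading a trace this way shows where latency actually accumulates: here the gateway itself adds only 10 ms, while most of the request is spent inside the User Service and Order Service.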

Implementing Tracing in Node.js

JavaScript
// tracing.js: OpenTelemetry tracing setup (load BEFORE application code)
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-http');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');
const { Resource } = require('@opentelemetry/resources');
const { ATTR_SERVICE_NAME, ATTR_SERVICE_VERSION } = require('@opentelemetry/semantic-conventions');

// Define the resource (service metadata)
const resource = new Resource({
  [ATTR_SERVICE_NAME]: 'user-service',
  [ATTR_SERVICE_VERSION]: '1.0.0',
  'deployment.environment': process.env.NODE_ENV || 'development',
});

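// OTEL_EXPORTER_OTLP_ENDPOINT holds the base URL. The exporter uses an explicit
// `url` option as-is, so the per-signal path is appended here.
const otlpBaseUrl = (process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318')
  .replace(/\/+$/, '');

// Set up the trace exporter
const traceExporter = new OTLPTraceExporter({
  url: `${otlpBaseUrl}/v1/traces`,
});

// Set up the metric exporter
const metricExporter = new OTLPMetricExporter({
  url: `${otlpBaseUrl}/v1/metrics`,
});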

// Initialize SDK
const sdk = new NodeSDK({
  resource,
  traceExporter,
  metricReader: new PeriodicExportingMetricReader({
    exporter: metricExporter,
    exportIntervalMillis: 15000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      // Per-instrumentation configuration
      '@opentelemetry/instrumentation-http': {
        enabled: true,
      },
      '@opentelemetry/instrumentation-express': {
        enabled: true,
      },
      '@opentelemetry/instrumentation-pg': {
        enabled: true,
      },
      '@opentelemetry/instrumentation-redis': {
        enabled: true,
      },
    }),
  ],
});

// Start SDK
sdk.start();
console.log('OpenTelemetry SDK initialized');

// Graceful shutdown
process.on('SIGTERM', () => {
  sdk.shutdown()
    .then(() => console.log('OpenTelemetry SDK shut down'))
    .catch((err) => console.error('Error shutting down SDK:', err))
    .finally(() => process.exit(0));
});

Application Code with Tracing

JavaScript
// app.js: Express app with OpenTelemetry
// IMPORTANT: require('./tracing') must be the very first import
require('./tracing');

const express = require('express');
const { trace, SpanStatusCode } = require('@opentelemetry/api');
const app = express();

// Get a tracer for this service
const tracer = trace.getTracer('user-service', '1.0.0');

app.get('/users/:id', async (req, res) => {
  // Auto-instrumentation has already created a span for this request,
  // but we can add custom spans. startActiveSpan (rather than startSpan)
  // makes the span current, so the database span below becomes its child.
  await tracer.startActiveSpan(
    'fetch-user-data',
    { attributes: { 'user.id': req.params.id, 'http.method': req.method } },
    async (span) => {
      try {
        // Simulate a database call
        const user = await tracer.startActiveSpan('database.query', async (dbSpan) => {
          try {
            dbSpan.setAttribute('db.system', 'postgresql');
            dbSpan.setAttribute('db.statement', 'SELECT * FROM users WHERE id = $1');

            // `db` stands in for your own data-access module
            const result = await db.getUser(req.params.id);

            dbSpan.setAttribute('db.rows_affected', result.rowCount);
            return result;
          } finally {
            dbSpan.end(); // end the span even if the query throws
          }
        });

        // Add an event to the span
        span.addEvent('user_found', {
          'user.name': user.name,
          'user.email': user.email,
        });

        span.setStatus({ code: SpanStatusCode.OK });
        res.json(user);
      } catch (error) {
        // Record the error on the span
        span.recordException(error);
        span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
        res.status(500).json({ error: error.message });
      } finally {
        span.end();
      }
    }
  );
});

app.listen(3000, () => {
  console.log('Server running on port 3000');
});

Implementation in Python

Python
# tracing.py: OpenTelemetry setup for Python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.instrumentation.psycopg2 import Psycopg2Instrumentor

# Define the resource
resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: "python-api",
    ResourceAttributes.SERVICE_VERSION: "1.0.0",
    "deployment.environment": "production",
})

# Set up the provider and exporter
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(OTLPSpanExporter(
    endpoint="http://localhost:4318/v1/traces"
))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Auto-instrumentation
FlaskInstrumentor().instrument()
RequestsInstrumentor().instrument()
Psycopg2Instrumentor().instrument()

# app.py: Flask app with custom spans
import tracing  # noqa: F401  (configures OpenTelemetry before the Flask app is created)

from flask import Flask, jsonify
from opentelemetry import trace

app = Flask(__name__)
tracer = trace.get_tracer("python-api")

@app.route('/api/users/<user_id>')
def get_user(user_id):
    with tracer.start_as_current_span("get-user-details") as span:
        span.set_attribute("user.id", user_id)

        # Database query; `db` stands in for your own data-access layer,
        # and the psycopg2 calls underneath are auto-instrumented
        with tracer.start_as_current_span("db-query") as db_span:
            db_span.set_attribute("db.system", "postgresql")
            user = db.execute("SELECT * FROM users WHERE id = %s", [user_id])

        span.add_event("user_fetched", {"user.id": user_id})
        return jsonify(user)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

5. Metrics Collection

OpenTelemetry Metrics lets you collect numeric data about system performance: request counts, latency, error rates, resource usage, and custom business metrics.

Metric Types

Type | Function | Examples
Counter | A value that only ever increases | Total requests, total errors
Gauge | A current value that can go up and down | Active connections, memory usage
Histogram | A distribution of values | Request latency, response size
Up/Down Counter | A counter that can go up and down | Active tasks, queue size
JavaScript
// metrics.js: custom metrics with OpenTelemetry
const { metrics } = require('@opentelemetry/api');

// Get a meter
const meter = metrics.getMeter('user-service', '1.0.0');

// 1. Counter: total requests
const requestCounter = meter.createCounter('http.server.requests', {
  description: 'Total HTTP requests',
  unit: '1',
});

// 2. Histogram: request duration
const requestDuration = meter.createHistogram('http.server.duration', {
  description: 'HTTP request duration',
  unit: 'ms',
});

// 3. UpDownCounter: active connections (a count that rises and falls)
const activeConnections = meter.createUpDownCounter('http.server.active_connections', {
  description: 'Number of active connections',
  unit: '1',
});

// 4. Custom business metric
const orderCounter = meter.createCounter('business.orders.created', {
  description: 'Total orders created',
  unit: '1',
});

const orderValue = meter.createHistogram('business.orders.value', {
  description: 'Order value distribution',
  unit: 'USD',
});

// Use the metrics in request handling (`app` is the Express app from app.js)
app.use((req, res, next) => {
  const startTime = Date.now();

  // Increment active connections
  activeConnections.add(1, { method: req.method });

  res.on('finish', () => {
    const duration = Date.now() - startTime;

    // Record request
    requestCounter.add(1, {
      method: req.method,
      route: req.route?.path || 'unknown',
      status_code: res.statusCode,
    });

    // Record duration
    requestDuration.record(duration, {
      method: req.method,
      route: req.route?.path || 'unknown',
    });

    // Decrement active connections
    activeConnections.add(-1, { method: req.method });
  });

  next();
});

// Record business metrics
app.post('/api/orders', async (req, res) => {
  const order = await createOrder(req.body);

  orderCounter.add(1, {
    category: order.category,
    region: order.region,
  });

  orderValue.record(order.total, {
    category: order.category,
  });

  res.json(order);
});
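
A histogram such as the request-duration instrument above does not ship every raw measurement to the backend. A common aggregation is the explicit-bucket histogram, which keeps a count per bucket plus a running sum. The sketch below illustrates that idea in plain Python; `BucketHistogram` is a hypothetical class for this example, and the boundaries are arbitrary values, not SDK defaults.

```python
from bisect import bisect_left
from typing import List, Sequence


class BucketHistogram:
    """Explicit-bucket histogram: keeps counts per bucket, not raw values."""

    def __init__(self, boundaries: Sequence[float]) -> None:
        self.boundaries: List[float] = sorted(boundaries)
        # One bucket per boundary plus a final overflow bucket.
        self.counts: List[int] = [0] * (len(self.boundaries) + 1)
        self.count = 0
        self.total = 0.0

    def record(self, value: float) -> None:
        # bisect_left makes each boundary an inclusive upper bound:
        # a value equal to 10 lands in the (-inf, 10] bucket.
        self.counts[bisect_left(self.boundaries, value)] += 1
        self.count += 1
        self.total += value


# Hypothetical boundaries in milliseconds, chosen only for this example.
request_duration = BucketHistogram([10, 50, 100])
for duration_ms in [5, 10, 25, 75, 250]:
    request_duration.record(duration_ms)

print(request_duration.counts)  # [2, 1, 1, 1]
print(request_duration.total / request_duration.count)  # 73.0
```

Because only bucket counts are stored, percentiles computed by the backend are estimates whose precision depends on how well the boundaries match the real latency range.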

6. Log Correlation

OpenTelemetry Logs lets you correlate log entries with traces and metrics. With OTel, every log entry can carry a trace ID and span ID, so you can navigate directly from a log to its related trace.

JavaScript
// logging.js: logs with OpenTelemetry correlation
const { logs, SeverityNumber } = require('@opentelemetry/api-logs');

const logger = logs.getLogger('user-service');

function logWithContext(level, message, attributes = {}) {
  logger.emit({
    severityText: level,
    severityNumber: level === 'ERROR' ? SeverityNumber.ERROR
      : level === 'WARN' ? SeverityNumber.WARN
      : SeverityNumber.INFO,
    body: message,
    // When a log is emitted inside an active span, the Logs SDK attaches that
    // span's trace ID and span ID to the record, so they are not set here.
    attributes,
  });
}

// Example usage (`app` and `getUser` come from your application code)
app.get('/api/users/:id', async (req, res) => {

  logWithContext('INFO', 'Processing user request', {
    'user.id': req.params.id,
    'http.method': req.method,
  });

  try {
    const user = await getUser(req.params.id);
    logWithContext('INFO', 'User fetched successfully', {
      'user.id': user.id,
      'user.name': user.name,
    });
    res.json(user);
  } catch (error) {
    logWithContext('ERROR', 'Failed to fetch user', {
      'error.message': error.message,
      'error.stack': error.stack,
    });
    res.status(500).json({ error: error.message });
  }
});

Structured Logging Output

JSON
{
  "timestamp": "2026-06-26T10:00:00.123Z",
  "severityText": "INFO",
  "severityNumber": 9,
  "body": "User fetched successfully",
  "resource": {
    "service.name": "user-service",
    "service.version": "1.0.0",
    "deployment.environment": "production"
  },
  "attributes": {
    "user.id": "12345",
    "user.name": "Budi Santoso",
    "trace_id": "abc123def45678901234567890123456",
    "span_id": "def4567890123456"
  }
}

With trace_id and span_id on every record, you can:
1. Jump from a log line to its trace in Jaeger or Grafana
2. Filter logs by trace_id
3. See every log emitted during a single request
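
The same correlation idea can be sketched with Python's standard `logging` module. This is an illustration under stated assumptions, not how the OpenTelemetry SDK is implemented: `current_span` is a hypothetical context variable standing in for the SDK's active span context, and `TraceContextFilter` is a made-up filter that copies the IDs onto each log record.

```python
import contextvars
import io
import logging

# Hypothetical stand-in for the active span context: (trace_id, span_id) or None.
current_span = contextvars.ContextVar("current_span", default=None)


class TraceContextFilter(logging.Filter):
    """Copies the current trace and span IDs onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        ctx = current_span.get()
        record.trace_id, record.span_id = ctx if ctx else ("-", "-")
        return True  # never drop the record, only enrich it


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"
))
handler.addFilter(TraceContextFilter())

logger = logging.getLogger("correlation-demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

# Simulate entering and leaving a span.
token = current_span.set(("abc123def45678901234567890123456", "def4567890123456"))
logger.info("User fetched successfully")
current_span.reset(token)
logger.info("Outside any span")

lines = stream.getvalue().splitlines()
same_request = [line for line in lines if "trace_id=abc123def45678901234567890123456" in line]
print(same_request)
```

Filtering on the trace ID, as in the last two lines, is the plain-text equivalent of clicking from a trace to its logs in a UI.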

7. Auto-instrumentation

One of OpenTelemetry's most powerful features is auto-instrumentation: libraries that automatically create spans for common operations (HTTP calls, database queries, cache operations) without any hand-written instrumentation code.

Supported Libraries per Language

Language | Libraries with auto-instrumentation support
Node.js | Express, Fastify, Koa, HTTP, PostgreSQL, MySQL, Redis, MongoDB, gRPC, GraphQL, AWS SDK
Python | Flask, Django, FastAPI, HTTPX, psycopg2, SQLAlchemy, Redis, Celery, requests
Java | Spring, Servlet, JDBC, MongoDB, Redis, Kafka, gRPC, OkHttp, Apache HTTP
Go | net/http, gRPC, database/sql, MongoDB, Redis
.NET | ASP.NET Core, EF Core, SQL Client, HTTP Client, gRPC, MongoDB

Zero-Code Instrumentation (Java)

Bash
# Java agent: no code changes required
# Download the OpenTelemetry Java agent
curl -LO https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar

# Run the application with the agent attached.
# Port 4317 is OTLP over gRPC, so the protocol is set explicitly
# (recent agent versions default to http/protobuf on port 4318).
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.service.name=my-java-app \
  -Dotel.exporter.otlp.protocol=grpc \
  -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
  -jar my-app.jar

# The same configuration through environment variables
export OTEL_SERVICE_NAME=my-java-app
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1
export OTEL_RESOURCE_ATTRIBUTES="service.version=1.0.0,deployment.environment=prod"

java -javaagent:opentelemetry-javaagent.jar -jar my-app.jar
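
The `parentbased_traceidratio` sampler with argument `0.1` configured above keeps roughly 10% of new traces and otherwise follows the decision already made by the caller. The sketch below is a simplified model of that behavior, written for illustration: the function names are hypothetical, and the exact comparison used by a given SDK may differ from this one.

```python
import random
from typing import Optional

TRACE_ID_LIMIT = (1 << 64) - 1  # compare on the low 64 bits of the trace ID


def ratio_sampled(trace_id: int, ratio: float) -> bool:
    """Decide from the trace ID alone, so every service reaches the same answer."""
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


def parent_based_sampled(trace_id: int, ratio: float, parent_sampled: Optional[bool]) -> bool:
    """Follow the parent's decision if there is a parent, else apply the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sampled(trace_id, ratio)


# Simulate 10,000 root requests with random 128-bit trace IDs (fixed seed).
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(parent_based_sampled(t, 0.1, None) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} root traces")
```

Because child spans inherit the parent's decision, a trace is either recorded in full or not at all, which avoids traces with missing segments.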

Zero-Code Instrumentation (Python)

Bash
# Python: auto-instrumentation through the opentelemetry-instrument CLI
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install

# Port 4318 is OTLP over HTTP, so select that protocol explicitly
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

# Run with auto-instrumentation
opentelemetry-instrument \
  --service_name my-python-app \
  --exporter_otlp_endpoint http://localhost:4318 \
  python my_app.py

# Or with gunicorn
opentelemetry-instrument \
  --service_name my-python-app \
  gunicorn -w 4 -b 0.0.0.0:8000 myapp:app

8. Manual Instrumentation

For specific business operations that auto-instrumentation does not cover, you need to write manual instrumentation. This gives you richer detail about your business logic.

JavaScript
// Manual instrumentation for business logic
const { trace, SpanKind, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('order-service', '1.0.0');

async function processOrder(orderData) {
  // Create a parent span for the business operation
  return tracer.startActiveSpan(
    'process-order',
    {
      kind: SpanKind.INTERNAL,
      attributes: {
        'order.id': orderData.id,
        'order.customer_id': orderData.customerId,
        'order.item_count': orderData.items.length,
        'order.total': orderData.total,
      },
    },
    async (parentSpan) => {
      try {
        // 1. Validate the order
        const validatedOrder = await tracer.startActiveSpan(
          'validate-order',
          async (span) => {
            try {
              span.setAttribute('order.items', JSON.stringify(orderData.items));
              const result = await validateOrder(orderData);
              span.addEvent('validation_passed');
              return result;
            } finally {
              span.end(); // end the span even if validation throws
            }
          }
        );

        // 2. Process the payment
        const payment = await tracer.startActiveSpan(
          'process-payment',
          {
            attributes: {
              'payment.method': orderData.paymentMethod,
              'payment.amount': orderData.total,
              'payment.currency': 'IDR',
            },
          },
          async (span) => {
            try {
              const result = await paymentService.charge(orderData);
              span.setAttribute('payment.transaction_id', result.transactionId);
              span.setStatus({ code: SpanStatusCode.OK });
              span.end();
              return result;
            } catch (err) {
              span.recordException(err);
              span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
              span.end();
              throw err;
            }
          }
        );

        // 3. Send the notification
        await tracer.startActiveSpan('send-notification', async (span) => {
          try {
            span.setAttribute('notification.type', 'order_confirmation');
            span.setAttribute('notification.channel', 'email');
            await sendEmail(orderData.customerId, 'Order confirmed');
            span.addEvent('notification_sent');
          } finally {
            span.end(); // end the span even if sending fails
          }
        });

        parentSpan.setStatus({ code: SpanStatusCode.OK });
        return { success: true, orderId: orderData.id };
      } catch (error) {
        parentSpan.recordException(error);
        parentSpan.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
        throw error;
      } finally {
        parentSpan.end();
      }
    }
  );
}

9. Exporters & Backends

OpenTelemetry can export to a wide range of observability backends. Your choice of backend determines how you store, visualize, and query telemetry data.

Backend Options

Backend | Data Type | Open Source | Notes
Jaeger | Traces | ✅ | Distributed trace visualization and trace comparison
Zipkin | Traces | ✅ | Alternative to Jaeger for distributed tracing
Prometheus | Metrics | ✅ | Time-series database for metrics, queried with PromQL
Grafana | All | ✅ | Dashboard visualization; connects to all backends
Grafana Tempo | Traces | ✅ | Scalable trace backend from Grafana Labs
Grafana Loki | Logs | ✅ | Log aggregation system from Grafana Labs
Elasticsearch | All | ✅ | Full-text search for logs and traces
Datadog | All | ❌ | Managed platform; accepts OTLP directly
New Relic | All | ❌ | Managed platform; accepts OTLP directly
Honeycomb | All | ❌ | Focused on high-cardinality data exploration

Setup Grafana Stack (Recommended)

YAML
# docker-compose.yml: full Grafana observability stack
version: '3.8'
services:
  # OpenTelemetry Collector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # gRPC
      - "4318:4318"   # HTTP

  # Jaeger: traces
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      COLLECTOR_OTLP_ENABLED: "true"
    ports:
      - "16686:16686"  # Jaeger UI
      - "14250:14250"  # gRPC

  # Prometheus: metrics (prometheus.yml needs a scrape job for otel-collector:8889)
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"    # Prometheus UI

  # Grafana: dashboards
  grafana:
    image: grafana/grafana:latest
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    ports:
      - "3000:3000"    # Grafana UI
    volumes:
      - grafana-storage:/var/lib/grafana

  # Loki: logs
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"

  # Tempo: traces (alternative to Jaeger)
  tempo:
    image: grafana/tempo:latest
    ports:
      - "3200:3200"

volumes:
  grafana-storage:

OTel Collector Configuration

YAML
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

processors:
  batch:
    timeout: 5s
    send_batch_size: 1024

  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128

  # Tail-based sampling: keep errors and slow requests
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: error-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-request-policy
        type: latency
        latency: {threshold_ms: 1000}
      - name: probabilistic-policy
        type: probabilistic
        probabilistic: {sampling_percentage: 10}

exporters:
  # Traces → Jaeger
  otlp/jaeger:
    endpoint: "jaeger:4317"
    tls:
      insecure: true

  # Metrics → Prometheus
  prometheus:
    endpoint: "0.0.0.0:8889"

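  # Logs → Loki
  # Note: the loki exporter is deprecated in recent collector-contrib releases;
  # pin an image version that still ships it rather than relying on :latest.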
  loki:
    endpoint: "http://loki:3100/loki/api/v1/push"

  # Debug: print to stdout
  debug:
    verbosity: basic

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]  # sample first, then batch what is kept
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loki]
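
The three tail_sampling policies above can be read as "keep the trace if any policy says yes". The following is a minimal sketch of that decision in plain JavaScript, not the Collector's actual implementation. The span shape (`status`, `startMs`, `endMs`) and the function names are assumptions made for illustration, and the real processor derives the probabilistic decision from the trace ID rather than from a random number.

```javascript
// Simplified model of the three policies in otel-collector-config.yaml.
// A trace is an array of spans: { status: 'OK' | 'ERROR' | 'UNSET', startMs, endMs }.
// All names here are hypothetical; this is not Collector code.

function hasError(trace) {
  return trace.some((span) => span.status === 'ERROR');
}

// Trace duration: earliest span start to latest span end.
function traceDurationMs(trace) {
  const start = Math.min(...trace.map((span) => span.startMs));
  const end = Math.max(...trace.map((span) => span.endMs));
  return end - start;
}

function shouldKeep(
  trace,
  { thresholdMs = 1000, samplingPercentage = 10, random = Math.random } = {}
) {
  // error-policy: keep every trace that contains an error span
  if (hasError(trace)) return { keep: true, policy: 'error-policy' };
  // slow-request-policy: keep traces longer than the latency threshold
  if (traceDurationMs(trace) > thresholdMs) {
    return { keep: true, policy: 'slow-request-policy' };
  }
  // probabilistic-policy: keep roughly samplingPercentage % of the rest
  if (random() * 100 < samplingPercentage) {
    return { keep: true, policy: 'probabilistic-policy' };
  }
  return { keep: false, policy: null };
}

const failed = [{ status: 'ERROR', startMs: 0, endMs: 40 }];
const slow = [
  { status: 'OK', startMs: 0, endMs: 900 },
  { status: 'OK', startMs: 850, endMs: 1500 },
];
console.log(shouldKeep(failed).policy); // error-policy
console.log(shouldKeep(slow).policy);   // slow-request-policy
```

Because the decision needs the whole trace, the Collector buffers spans for `decision_wait` (10s in the config above) before evaluating the policies.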

10. OpenTelemetry Collector

The OpenTelemetry Collector is the central component that receives telemetry from all applications, processes it (filtering, batching, transforming), and exports it to one or more backends. This removes vendor lock-in on the application side: apps only need to send data to the Collector.

Deployment Modes

Diagram: Collector Deployment Modes
  MODE 1: AGENT (sidecar per host)
  ┌──────────────────────────────────────┐
  │  Host / Pod                          │
  │  ┌──────────┐  ┌──────────────────┐  │
  │  │  App     │  │  OTel Collector  │  │
  │  │          │──│  (Agent Mode)    │  │
  │  └──────────┘  └──────────────────┘  │
  └──────────────────────────────────────┘
  - Low latency, high availability
  - Co-located with application

  MODE 2: GATEWAY (centralized)
  ┌──────────┐ ┌──────────┐ ┌──────────┐
  │  App 1   │ │  App 2   │ │  App 3   │
  └────┬─────┘ └────┬─────┘ └────┬─────┘
       │            │            │
       └────────────┼────────────┘
                    ▼
  ┌────────────────────────────────────┐
  │     OTel Collector (Gateway)       │
  │     - Receive from all apps        │
  │     - Process (sample, filter)     │
  │     - Export to multiple backends  │
  └──────────┬─────────────┬───────────┘
             ▼             ▼
      ┌──────────┐  ┌──────────┐
      │  Jaeger  │  │Prometheus│
      └──────────┘  └──────────┘
  - Centralized processing
  - Easier management

  MODE 3: COMBINED (Agent + Gateway)
  App → Agent Collector → Gateway Collector → Backends
  - Best of both worlds
  - Agent handles initial processing
  - Gateway handles routing & export

11. Best Practices

Best Practice Detail
Load OTel First | Always import and set up OpenTelemetry BEFORE importing any other library, so that auto-instrumentation can hook into those libraries
Resource Attributes | Set service.name, service.version, and deployment.environment on every service
Sampling | Use sampling in production; do not trace 100% of requests. Use tail-based sampling to retain error traces
OTLP Protocol | Use OTLP (OpenTelemetry Protocol) as the default exporter; it is OpenTelemetry's native protocol and is accepted by the Collector and by many backends directly
Collector | Always route telemetry through an OTel Collector to decouple the app from the backend
Batch Processor | Use the batch processor to reduce the number of export calls
Semantic Conventions | Use the OTel semantic conventions for attribute names so they stay consistent across services
Graceful Shutdown | Call sdk.shutdown() on SIGTERM to flush any data that has not been exported yet
Cardinality | Avoid high-cardinality metric attributes (for example user_id); they can consume a lot of memory
Context Propagation | Make sure trace context is propagated on every service call: HTTP headers, message queue headers
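
To make the Context Propagation row concrete, here is a minimal sketch of the W3C `traceparent` header that carries trace context between services. In a real application the OpenTelemetry propagators build and parse this header for you. The helper names below (`formatTraceparent`, `parseTraceparent`, `injectTraceparent`) are hypothetical, and the sketch handles only version `00` of the header.

```javascript
// traceparent layout (version 00): version-traceId-parentSpanId-flags
// e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Build the header value from the current span context.
function formatTraceparent({ traceId, spanId, sampled }) {
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

// Parse an incoming header; returns null when the value is malformed.
function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(header);
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  if (version !== '00') return null; // this sketch only understands version 00
  // All-zero IDs are invalid according to the W3C Trace Context spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Copy the context onto outgoing headers (HTTP or message queue metadata).
function injectTraceparent(headers, spanContext) {
  return { ...headers, traceparent: formatTraceparent(spanContext) };
}

const outgoing = injectTraceparent(
  { 'content-type': 'application/json' },
  { traceId: '4bf92f3577b34da6a3ce929d0e0e4736', spanId: '00f067aa0ba902b7', sampled: true }
);
console.log(outgoing.traceparent);
// 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```

The receiving service parses the header and starts its own span with the same trace ID, which is what links spans from different services into one trace.
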
💡 Tips

Start with auto-instrumentation to get visibility quickly without many code changes. Once you are comfortable, add manual instrumentation for business-critical paths that need more detail. Grafana + Prometheus + Jaeger + Loki make a complete open-source observability stack.

⚠️ Warning

Be careful with high-cardinality attributes on metrics: attributes such as user_id or request_id can generate millions of time series that consume a lot of memory in Prometheus. Keep high-cardinality data in traces and logs, not in metrics.
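
A rough way to see why this matters: in the worst case, the number of time series for one metric is the product of the number of distinct values of each attribute. The sketch below uses made-up cardinalities (5 methods, 20 routes, 6 status codes, 100,000 users) purely for illustration; `estimateSeriesCount` is a hypothetical helper, not part of any OpenTelemetry or Prometheus API.

```javascript
// Upper bound on time series for one metric: multiply the number of
// distinct values of every attribute attached to it.
function estimateSeriesCount(attributeCardinalities) {
  return Object.values(attributeCardinalities).reduce(
    (product, distinctValues) => product * distinctValues,
    1
  );
}

// Bounded attributes only (hypothetical numbers).
const bounded = estimateSeriesCount({ method: 5, route: 20, status_code: 6 });

// Same metric with user_id added as an attribute.
const withUserId = estimateSeriesCount({
  method: 5,
  route: 20,
  status_code: 6,
  user_id: 100000,
});

console.log(bounded);    // 600
console.log(withUserId); // 60000000
```

Not every combination occurs in practice, so the real count is usually lower, but a single unbounded attribute still multiplies the series count by orders of magnitude.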

12. Comprehension Quiz

📝 Quiz: Understanding OpenTelemetry

1. What are the three pillars of observability covered by OpenTelemetry?

2. What is a "span" in distributed tracing?

3. Why is OpenTelemetry preferable to proprietary vendor SDKs?

4. What does the OpenTelemetry Collector do?

5. What is "auto-instrumentation" in OpenTelemetry?

πŸ” Zoom
100%
🎨 Tema