1. Introduction to OpenTelemetry
OpenTelemetry (OTel) is an open-source observability framework for collecting, processing, and exporting telemetry data (traces, metrics, and logs) from applications. It is a CNCF (Cloud Native Computing Foundation) project with contributions from Google, Microsoft, Amazon, and many other large technology companies.
Before OpenTelemetry, developers typically had to commit to a proprietary observability vendor such as Datadog, New Relic, or Dynatrace. Each vendor shipped its own SDK, and those SDKs were not compatible with one another. OpenTelemetry changes this by providing a common standard that works with any backend.
OpenTelemetry is the result of merging two earlier projects: OpenTracing (a standard API for tracing) and OpenCensus (libraries for metrics and tracing). OTel combines the strengths of both into a single unified standard that covers traces, metrics, and logs.
Why OpenTelemetry?
| Advantage | Explanation |
|---|---|
| Vendor Neutral | A common standard: export to any backend (Jaeger, Prometheus, Grafana, Datadog, etc.) |
| Multi-language | SDKs are available for 11+ languages: Java, Python, Go, JavaScript, .NET, Rust, Swift, etc. |
| CNCF Project | Backed by major companies, which supports long-term sustainability |
| Auto-instrumentation | Automatic instrumentation for popular frameworks with minimal code changes |
| Traces + Metrics + Logs | One framework for all types of telemetry data |
| W3C Standard | Trace context propagation follows the W3C standard |
| Correlation | Traces, metrics, and logs can be correlated through shared trace and span IDs |
| Production Ready | Used in production by large companies |
OPENTELEMETRY ECOSYSTEM

  APPLICATION (Your Code)
    [ OTel API (stable) ]   [ OTel SDK (config) ]   [ Auto-Instrument ]
              |
              v
  OTel SDK (Process Pipeline)
    [ TracerProvider ]   [ MeterProvider ]   [ Log Bridge (LogEmitter) ]
              |
              v
  OTel Exporter
    [ OTLP Exporter ]   [ Jaeger Exporter ]   [ Prometheus Exporter ]
              |
              v
    [ Jaeger (Traces) ]   [ Prometheus (Metrics) ]   [ Grafana (Dashboard) ]
OpenTelemetry vs. Alternatives
| Feature | OpenTelemetry | Datadog | New Relic |
|---|---|---|---|
| Price | 🟢 Open-source | 🔴 Paid (expensive) | 🟡 Free tier + paid |
| Vendor Lock-in | 🟢 None | 🔴 Tied to the vendor | 🟡 Moderate |
| Data Ownership | 🟢 Full control | 🔴 On the vendor's servers | 🟡 Partial |
| Setup | 🟡 More complex | 🟢 Very easy | 🟢 Easy |
| Backend | 🔴 You set it up yourself | 🟢 Managed | 🟢 Managed |
| Standardization | 🟢 CNCF standard | 🔴 Proprietary | 🟡 Partial |
2. The Three Pillars of Observability
OpenTelemetry covers the three pillars of observability, which are correlated with one another:
THE THREE PILLARS OF OBSERVABILITY

TRACES: "How does a request flow through the system?"
  Request → Service A → Service B → Service C
  Trace ID: abc123
  Span 1 (100ms) → Span 2 (50ms) → Span 3 (30ms)
  Useful for: latency debugging, distributed system tracing, dependency analysis

METRICS: "How healthy is my system?"
  Requests/s: 450 | Latency: 25ms | Error rate: 0.1%
  Useful for: alerting, dashboards, capacity planning, SLA monitoring

LOGS: "What happened in the system?"
  [2026-06-26T10:00:00Z] INFO trace_id=abc123 span_id=def456 "Processing user order"
  [2026-06-26T10:00:01Z] ERROR trace_id=abc123 span_id=ghi789 "Database connection timeout"
  Useful for: error debugging, audit trails, compliance, detailed context

CORRELATION
  Trace ↔ Metrics ↔ Logs
  All data is correlated via trace ID and span ID
  → From an alert (metrics), to the trace, to the logs
3. OpenTelemetry Architecture
Main Components
| Component | Function |
|---|---|
| API | The interface applications use to create spans, metrics, and logs. Stable, with strong backward-compatibility guarantees |
| SDK | The implementation of the API: processing, sampling, export. Configurable |
| Collector | A proxy that receives, processes, and exports telemetry data. Can be deployed as an agent or a gateway |
| Instrumentation Libraries | Libraries that automatically instrument popular frameworks (Express, Flask, Spring, etc.) |
| Exporters | Send data to a backend (Jaeger, Prometheus, Grafana Tempo, Datadog, etc.) |
| Context Propagation | Propagates trace context between services (W3C TraceContext, Baggage) |
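To make the Context Propagation row concrete, here is a minimal sketch of how a W3C `traceparent` header could be parsed and forwarded, using only the Python standard library. The helper names (`SpanContext`, `parse_traceparent`, `child_traceparent`) are hypothetical and not part of any OpenTelemetry package; the sketch only accepts the four-field form of the header, and in practice the SDK's propagators do this for you.

```python
import re
import secrets
from typing import NamedTuple, Optional

# W3C Trace Context header layout: version-traceid-parentid-flags
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<span_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


class SpanContext(NamedTuple):
    trace_id: str  # 32 lowercase hex characters
    span_id: str   # 16 lowercase hex characters
    sampled: bool  # lowest bit of the flags field


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the span context carried by a traceparent header, or None if invalid."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None or match["version"] == "ff":
        return None
    # All-zero IDs are explicitly invalid.
    if match["trace_id"] == "0" * 32 or match["span_id"] == "0" * 16:
        return None
    sampled = bool(int(match["flags"], 16) & 0x01)
    return SpanContext(match["trace_id"], match["span_id"], sampled)


def child_traceparent(parent: SpanContext) -> str:
    """Build the header for an outgoing call: same trace ID, fresh span ID."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{new_span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
print(ctx.trace_id, ctx.sampled)
print(child_traceparent(ctx))
```

The trace ID stays the same across every hop while each service contributes its own span ID, which is what lets a backend stitch the spans back into one trace.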
Data Flow
Data flow in OpenTelemetry:
Application Code
↓ OTel API (create spans, record metrics)
↓ OTel SDK (process, sample, batch)
↓ OTel Exporter (send data)
↓ OTel Collector (optional: receive, process, export)
↓ Backend (Jaeger, Prometheus, Grafana, etc.)
Modes:
1. Direct Export: App → Exporter → Backend
2. Via Collector: App → Exporter → Collector → Backend (recommended)
3. Via Agent: App → gRPC/HTTP → Collector Agent → Backend
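The "batch" step in the SDK stage can be illustrated with a small buffer that groups items before handing them to an exporter. This is a simplified sketch, not the SDK's batch processor: `BatchBuffer` and its parameters are hypothetical names, and it flushes only on size or on an explicit `flush()`, whereas a real processor also flushes on a timer.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class BatchBuffer(Generic[T]):
    """Collects items and hands them to `export` in batches."""

    def __init__(self, export: Callable[[List[T]], None], max_batch_size: int = 512) -> None:
        if max_batch_size < 1:
            raise ValueError("max_batch_size must be at least 1")
        self._export = export
        self._max_batch_size = max_batch_size
        self._items: List[T] = []

    def add(self, item: T) -> None:
        """Buffer one item and export automatically once the batch is full."""
        self._items.append(item)
        if len(self._items) >= self._max_batch_size:
            self.flush()

    def flush(self) -> None:
        """Export whatever is buffered; call this on shutdown so nothing is lost."""
        if self._items:
            batch, self._items = self._items, []
            self._export(batch)


sent: List[List[str]] = []
buffer = BatchBuffer(sent.append, max_batch_size=3)
for name in ["span-1", "span-2", "span-3", "span-4"]:
    buffer.add(name)
buffer.flush()  # the final partial batch is only sent because of this call
print(sent)  # [['span-1', 'span-2', 'span-3'], ['span-4']]
```

Batching trades a little delay for far fewer network calls, and the explicit flush is why a graceful shutdown matters: without it the last partial batch would never leave the process.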
4. Distributed Tracing
Distributed tracing lets you follow a request's journey from start to finish as it passes through the services in your system. Each "span" represents one unit of work, and a set of spans forms one "trace".
Tracing Concepts
Trace ID: abc123def456

Span A: API Gateway (100ms)
  Span B: User Service (60ms)
    Span C: Database (20ms)
    Span D: Redis Cache (15ms)
  Span E: Order Service (30ms)

Terminology:
Trace = End-to-end journey of a request
Span = One unit of work (function call, DB query, HTTP request)
Context = Metadata (trace ID, span ID, baggage) that is propagated
Parent = A span that calls another span
Child = A span that is called by a parent span
Attributes = Key-value pairs on a span (http.method, db.system, etc.)
Events = Timestamped events on a span (exception, log, etc.)
Status = Span status (OK, ERROR, UNSET)
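The parent/child relationship above can be sketched as a small data model. This is an illustration only: the `Span` dataclass and `render_trace` function are hypothetical and much simpler than what a tracing backend stores, but they show how a flat list of spans with parent IDs becomes the nested view.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    duration_ms: int


def render_trace(spans: List[Span]) -> List[str]:
    """Return one indented line per span, children listed under their parent."""
    children: Dict[Optional[str], List[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms}ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# The same five spans as in the diagram above.
spans = [
    Span("a", None, "API Gateway", 100),
    Span("b", "a", "User Service", 60),
    Span("c", "b", "Database", 20),
    Span("d", "b", "Redis Cache", 15),
    Span("e", "a", "Order Service", 30),
]
print("\n".join(render_trace(spans)))
```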
Tracing Implementation in Node.js
// tracing.js - OpenTelemetry setup (load BEFORE application code)
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-http');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');
// Note: in @opentelemetry/resources 2.x, use resourceFromAttributes({...}) instead of new Resource({...})
const { Resource } = require('@opentelemetry/resources');
const { ATTR_SERVICE_NAME, ATTR_SERVICE_VERSION } = require('@opentelemetry/semantic-conventions');

// Define the resource (service metadata)
const resource = new Resource({
  [ATTR_SERVICE_NAME]: 'user-service',
  [ATTR_SERVICE_VERSION]: '1.0.0',
  'deployment.environment': process.env.NODE_ENV || 'development',
});

// OTEL_EXPORTER_OTLP_ENDPOINT is a base URL. A `url` passed to an exporter
// constructor is used as-is, so the per-signal path is appended explicitly.
const otlpEndpoint = (process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318').replace(/\/$/, '');

// Set up the trace exporter
const traceExporter = new OTLPTraceExporter({
  url: `${otlpEndpoint}/v1/traces`,
});

// Set up the metric exporter
const metricExporter = new OTLPMetricExporter({
  url: `${otlpEndpoint}/v1/metrics`,
});

// Initialize the SDK
const sdk = new NodeSDK({
  resource,
  traceExporter,
  metricReader: new PeriodicExportingMetricReader({
    exporter: metricExporter,
    exportIntervalMillis: 15000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      // Per-instrumentation configuration
      '@opentelemetry/instrumentation-http': {
        enabled: true,
      },
      '@opentelemetry/instrumentation-express': {
        enabled: true,
      },
      '@opentelemetry/instrumentation-pg': {
        enabled: true,
      },
      '@opentelemetry/instrumentation-redis': {
        enabled: true,
      },
    }),
  ],
});

// Start the SDK
sdk.start();
console.log('OpenTelemetry SDK initialized');

// Graceful shutdown: flush pending telemetry before exiting
process.on('SIGTERM', () => {
  sdk.shutdown()
    .then(() => console.log('OpenTelemetry SDK shut down'))
    .catch((err) => console.error('Error shutting down SDK:', err))
    .finally(() => process.exit(0));
});
Application Code with Tracing
// app.js - Express app with OpenTelemetry
// IMPORTANT: require('./tracing') must come FIRST, before any other module
require('./tracing');
const express = require('express');
const { trace, SpanStatusCode } = require('@opentelemetry/api');
const app = express();

// Get a tracer for this service
const tracer = trace.getTracer('user-service', '1.0.0');

app.get('/users/:id', async (req, res) => {
  // Auto-instrumentation already creates a span for this request; custom
  // spans add detail. startActiveSpan makes the new span the active one,
  // so spans created inside the callback become its children.
  await tracer.startActiveSpan('fetch-user-data', {
    attributes: {
      'user.id': req.params.id,
      'http.method': req.method,
    },
  }, async (span) => {
    try {
      const user = await tracer.startActiveSpan('database.query', async (dbSpan) => {
        try {
          dbSpan.setAttribute('db.system', 'postgresql');
          dbSpan.setAttribute('db.statement', 'SELECT * FROM users WHERE id = $1');
          // `db` is a placeholder for your own data-access layer
          const result = await db.getUser(req.params.id);
          dbSpan.setAttribute('db.rows_affected', result.rowCount);
          return result;
        } finally {
          dbSpan.end(); // end the span even if the query throws
        }
      });
      // Add an event to the span
      span.addEvent('user_found', {
        'user.name': user.name,
        'user.email': user.email,
      });
      span.setStatus({ code: SpanStatusCode.OK });
      res.json(user);
    } catch (error) {
      // Record the error on the span
      span.recordException(error);
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: error.message,
      });
      res.status(500).json({ error: error.message });
    } finally {
      span.end();
    }
  });
});

app.listen(3000, () => {
  console.log('Server running on port 3000');
});
Implementation in Python
# tracing.py - OpenTelemetry setup for Python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.instrumentation.psycopg2 import Psycopg2Instrumentor

# Define the resource
resource = Resource.create({
    ResourceAttributes.SERVICE_NAME: "python-api",
    ResourceAttributes.SERVICE_VERSION: "1.0.0",
    "deployment.environment": "production",
})

# Set up the provider and exporter
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(OTLPSpanExporter(
    endpoint="http://localhost:4318/v1/traces"
))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Auto-instrumentation
FlaskInstrumentor().instrument()
RequestsInstrumentor().instrument()
Psycopg2Instrumentor().instrument()
# app.py - Flask app with custom spans
import tracing  # noqa: F401  (runs the setup in tracing.py before the app is created)
from flask import Flask, jsonify
from opentelemetry import trace

app = Flask(__name__)
tracer = trace.get_tracer("python-api")


@app.route('/api/users/<user_id>')
def get_user(user_id):
    with tracer.start_as_current_span("get-user-details") as span:
        span.set_attribute("user.id", user_id)
        # Database query (psycopg2 calls are auto-instrumented)
        with tracer.start_as_current_span("db-query") as db_span:
            db_span.set_attribute("db.system", "postgresql")
            # `db` is a placeholder for your own database helper
            user = db.execute("SELECT * FROM users WHERE id = %s", [user_id])
        span.add_event("user_fetched", {"user.id": user_id})
        return jsonify(user)


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
5. Metrics Collection
OpenTelemetry Metrics lets you collect numeric data about system performance: request count, latency, error rate, resource usage, and custom business metrics.
Metric Types
| Type | Function | Examples |
|---|---|---|
| Counter | A value that only ever increases | Total requests, total errors |
| Gauge | A value that can go up and down | Active connections, memory usage |
| Histogram | A distribution of values | Request latency, response size |
| Up/Down Counter | A counter that can increase and decrease | Active tasks, queue size |
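A histogram differs from the other types in that it does not keep individual values; it counts how many recorded values fall into each bucket. The following standard-library sketch shows that bucketing step, assuming explicit bucket boundaries with inclusive upper bounds. The function name and the boundaries are chosen for illustration and are not SDK defaults.

```python
from bisect import bisect_left
from typing import List, Sequence


def bucket_counts(values: Sequence[float], bounds: Sequence[float]) -> List[int]:
    """Count values per bucket.

    `bounds` must be sorted. Bucket 0 holds values <= bounds[0], bucket i holds
    values in (bounds[i-1], bounds[i]], and the last bucket holds everything
    above the final bound.
    """
    counts = [0] * (len(bounds) + 1)
    for value in values:
        # bisect_left returns the index of the first bound >= value
        counts[bisect_left(bounds, value)] += 1
    return counts


latencies_ms = [3, 12, 25, 25, 48, 110, 900]
bounds_ms = [10, 25, 50, 100]  # hypothetical boundaries
print(bucket_counts(latencies_ms, bounds_ms))  # [1, 3, 1, 0, 2]
```

Because only the per-bucket counts are exported, percentiles computed from a histogram are estimates whose precision depends on how well the boundaries match the real latency range.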
// metrics.js - Custom metrics with OpenTelemetry
// Assumes `app` (the Express app) and `createOrder` are defined elsewhere
const { metrics } = require('@opentelemetry/api');

// Get a meter
const meter = metrics.getMeter('user-service', '1.0.0');

// 1. Counter: total requests
const requestCounter = meter.createCounter('http.server.requests', {
  description: 'Total HTTP requests',
  unit: '1',
});

// 2. Histogram: request duration
const requestDuration = meter.createHistogram('http.server.duration', {
  description: 'HTTP request duration',
  unit: 'ms',
});

// 3. UpDownCounter: active connections (incremented and decremented by the app)
const activeConnections = meter.createUpDownCounter('http.server.active_connections', {
  description: 'Number of active connections',
  unit: '1',
});

// 4. Custom business metrics
const orderCounter = meter.createCounter('business.orders.created', {
  description: 'Total orders created',
  unit: '1',
});
const orderValue = meter.createHistogram('business.orders.value', {
  description: 'Order value distribution',
  unit: 'USD',
});

// Using the metrics in application code
app.use((req, res, next) => {
  const startTime = Date.now();
  // Increment active connections
  activeConnections.add(1, { method: req.method });
  res.on('finish', () => {
    const duration = Date.now() - startTime;
    // Record the request
    requestCounter.add(1, {
      method: req.method,
      route: req.route?.path || 'unknown',
      status_code: res.statusCode,
    });
    // Record the duration
    requestDuration.record(duration, {
      method: req.method,
      route: req.route?.path || 'unknown',
    });
    // Decrement active connections
    activeConnections.add(-1, { method: req.method });
  });
  next();
});

// Record business metrics
app.post('/api/orders', async (req, res) => {
  const order = await createOrder(req.body);
  orderCounter.add(1, {
    category: order.category,
    region: order.region,
  });
  orderValue.record(order.total, {
    category: order.category,
  });
  res.json(order);
});
6. Log Correlation
OpenTelemetry Logs lets you correlate log entries with traces and metrics. With OTel, each log entry can carry a trace ID and a span ID, so you can navigate directly from a log line to the related trace.
// logging.js - Logs with OpenTelemetry correlation
// Requires a LoggerProvider to be registered by the SDK; without one, emit() does nothing.
// Assumes `app` (the Express app) and `getUser` are defined elsewhere.
const { logs, SeverityNumber } = require('@opentelemetry/api-logs');
const logger = logs.getLogger('user-service');

function logWithContext(level, message, attributes = {}) {
  logger.emit({
    severityText: level,
    severityNumber: level === 'ERROR' ? SeverityNumber.ERROR
      : level === 'WARN' ? SeverityNumber.WARN
      : SeverityNumber.INFO,
    body: message,
    attributes: {
      ...attributes,
      // The OTel SDK adds the trace ID and span ID automatically
      // when the log is emitted inside an active span context
    },
  });
}

// Example usage
app.get('/api/users/:id', async (req, res) => {
  logWithContext('INFO', 'Processing user request', {
    'user.id': req.params.id,
    'http.method': req.method,
  });
  try {
    const user = await getUser(req.params.id);
    logWithContext('INFO', 'User fetched successfully', {
      'user.id': user.id,
      'user.name': user.name,
    });
    res.json(user);
  } catch (error) {
    logWithContext('ERROR', 'Failed to fetch user', {
      'error.message': error.message,
      'error.stack': error.stack,
    });
    res.status(500).json({ error: error.message });
  }
});
Structured Logging Output
{
  "timestamp": "2026-06-26T10:00:00.123Z",
  "severityText": "INFO",
  "severityNumber": 9,
  "body": "User fetched successfully",
  "resource": {
    "service.name": "user-service",
    "service.version": "1.0.0",
    "deployment.environment": "production"
  },
  "attributes": {
    "user.id": "12345",
    "user.name": "Budi Santoso",
    "trace_id": "abc123def45678901234567890123456",
    "span_id": "def4567890123456"
  }
}
// With trace_id and span_id, you can:
// 1. Click through from a log to its trace in Jaeger/Grafana
// 2. Filter logs by trace_id
// 3. See all logs that belong to a single request
7. Auto-instrumentation
One of OpenTelemetry's most powerful features is auto-instrumentation: libraries that automatically create spans for common operations (HTTP calls, database queries, cache operations) without any hand-written instrumentation code.
Supported Libraries per Language
| Language | Libraries Supported by Auto-instrumentation |
|---|---|
| Node.js | Express, Fastify, Koa, HTTP, PostgreSQL, MySQL, Redis, MongoDB, gRPC, GraphQL, AWS SDK |
| Python | Flask, Django, FastAPI, HTTPX, psycopg2, SQLAlchemy, Redis, Celery, requests |
| Java | Spring, Servlet, JDBC, MongoDB, Redis, Kafka, gRPC, OkHttp, Apache HTTP |
| Go | net/http, gRPC, database/sql, MongoDB, Redis |
| .NET | ASP.NET Core, EF Core, SQL Client, HTTP Client, gRPC, MongoDB |
Zero-Code Instrumentation (Java)
# Java agent: zero code changes
# Download the OTel Java agent
curl -LO https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar
# Run the application with the agent
# Port 4317 is OTLP/gRPC; recent agent versions default to http/protobuf
# (port 4318), so the protocol is set explicitly here
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.service.name=my-java-app \
  -Dotel.exporter.otlp.protocol=grpc \
  -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
  -jar my-app.jar
# Alternatively, configure through environment variables
export OTEL_SERVICE_NAME=my-java-app
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1
export OTEL_RESOURCE_ATTRIBUTES="service.version=1.0.0,deployment.environment=prod"
java -javaagent:opentelemetry-javaagent.jar -jar my-app.jar
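The `parentbased_traceidratio` sampler with an argument of `0.1` keeps roughly 10% of new traces and otherwise follows the parent's decision. A minimal sketch of that idea is shown below; the function names are hypothetical, and comparing the low 64 bits of the trace ID against a threshold is one common way to implement it, so individual SDKs may differ in detail.

```python
from typing import Optional

_LOW_64_BITS = (1 << 64) - 1


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep a trace when the low 64 bits of its ID fall below ratio * 2**64.

    The decision depends only on the trace ID, so every service that applies
    the same ratio reaches the same decision for the same trace.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * (1 << 64))
    return (trace_id & _LOW_64_BITS) < bound


def parent_based(parent_sampled: Optional[bool], trace_id: int, ratio: float) -> bool:
    """Follow the parent's decision if there is one; otherwise apply the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return should_sample(trace_id, ratio)


print(parent_based(None, 0x01, 0.1))           # root span with a low trace ID: True
print(parent_based(None, _LOW_64_BITS, 0.1))   # root span with a high trace ID: False
print(parent_based(True, _LOW_64_BITS, 0.1))   # parent was sampled: True
```

Respecting the parent's decision is what keeps traces complete: once the first service decides to record a request, downstream services record their part too instead of re-rolling the dice.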
Zero-Code Instrumentation (Python)
# Python: auto-instrumentation via the opentelemetry-instrument CLI
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
# Run with auto-instrumentation
# (4317 is the OTLP/gRPC port; gRPC is the default protocol of opentelemetry-distro)
opentelemetry-instrument \
  --service_name my-python-app \
  --exporter_otlp_endpoint http://localhost:4317 \
  python my_app.py
# Or with gunicorn
opentelemetry-instrument \
  --service_name my-python-app \
  gunicorn -w 4 -b 0.0.0.0:8000 myapp:app
8. Manual Instrumentation
For business-specific operations that auto-instrumentation does not cover, you need to write manual instrumentation. This yields richer detail about what your business logic is doing.
// Manual instrumentation for business logic
// validateOrder, paymentService, and sendEmail are placeholders for your own code
const { trace, SpanKind, SpanStatusCode } = require('@opentelemetry/api');
const tracer = trace.getTracer('order-service', '1.0.0');

async function processOrder(orderData) {
  // Create the parent span for the business operation
  return tracer.startActiveSpan(
    'process-order',
    {
      kind: SpanKind.INTERNAL,
      attributes: {
        'order.id': orderData.id,
        'order.customer_id': orderData.customerId,
        'order.item_count': orderData.items.length,
        'order.total': orderData.total,
      },
    },
    async (parentSpan) => {
      try {
        // 1. Validate the order
        const validatedOrder = await tracer.startActiveSpan(
          'validate-order',
          async (span) => {
            try {
              span.setAttribute('order.items', JSON.stringify(orderData.items));
              const result = await validateOrder(orderData);
              span.addEvent('validation_passed');
              return result;
            } finally {
              span.end(); // end the span even if validation throws
            }
          }
        );
        // 2. Process the payment
        const payment = await tracer.startActiveSpan(
          'process-payment',
          {
            attributes: {
              'payment.method': orderData.paymentMethod,
              'payment.amount': orderData.total,
              'payment.currency': 'IDR',
            },
          },
          async (span) => {
            try {
              const result = await paymentService.charge(orderData);
              span.setAttribute('payment.transaction_id', result.transactionId);
              span.setStatus({ code: SpanStatusCode.OK });
              return result;
            } catch (err) {
              span.recordException(err);
              span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
              throw err;
            } finally {
              span.end();
            }
          }
        );
        // 3. Send the notification
        await tracer.startActiveSpan('send-notification', async (span) => {
          try {
            span.setAttribute('notification.type', 'order_confirmation');
            span.setAttribute('notification.channel', 'email');
            await sendEmail(orderData.customerId, 'Order confirmed');
            span.addEvent('notification_sent');
          } finally {
            span.end();
          }
        });
        parentSpan.setStatus({ code: SpanStatusCode.OK });
        return { success: true, orderId: orderData.id };
      } catch (error) {
        parentSpan.recordException(error);
        parentSpan.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
        throw error;
      } finally {
        parentSpan.end();
      }
    }
  );
}
9. Exporters & Backends
OpenTelemetry supports exporting to many observability backends. Your choice of backend determines how you store, visualize, and query telemetry data.
Backend Options
| Backend | Data Type | Open Source | Notes |
|---|---|---|---|
| Jaeger | Traces | ✅ | Distributed trace visualization, trace comparison |
| Zipkin | Traces | ✅ | An alternative to Jaeger for distributed tracing |
| Prometheus | Metrics | ✅ | Time-series database for metrics, PromQL |
| Grafana | All | ✅ | Dashboard visualization that connects to all of these backends |
| Grafana Tempo | Traces | ✅ | Scalable trace backend from Grafana Labs |
| Grafana Loki | Logs | ✅ | Log aggregation system from Grafana Labs |
| Elasticsearch | All | ✅ | Full-text search for logs and traces |
| Datadog | All | ❌ | Managed platform that accepts OTLP directly |
| New Relic | All | ❌ | Managed platform that accepts OTLP directly |
| Honeycomb | All | ❌ | Focused on high-cardinality data exploration |
Setup Grafana Stack (Recommended)
# docker-compose.yml - Full Grafana observability stack
version: '3.8'
services:
  # OpenTelemetry Collector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # gRPC
      - "4318:4318"   # HTTP
  # Jaeger: traces
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      COLLECTOR_OTLP_ENABLED: "true"
    ports:
      - "16686:16686" # Jaeger UI
      - "14250:14250" # gRPC
  # Prometheus: metrics
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"   # Prometheus UI
  # Grafana: dashboards
  grafana:
    image: grafana/grafana:latest
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    ports:
      - "3000:3000"   # Grafana UI
    volumes:
      - grafana-storage:/var/lib/grafana
  # Loki: logs
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
  # Tempo: traces (alternative to Jaeger)
  tempo:
    image: grafana/tempo:latest
    ports:
      - "3200:3200"
volumes:
  grafana-storage:
OTel Collector Configuration
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"
processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  # Tail-based sampling: keep errors and slow requests
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: error-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-request-policy
        type: latency
        latency: {threshold_ms: 1000}
      - name: probabilistic-policy
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
exporters:
  # Traces → Jaeger
  otlp/jaeger:
    endpoint: "jaeger:4317"
    tls:
      insecure: true
  # Metrics → Prometheus
  prometheus:
    endpoint: "0.0.0.0:8889"
  # Logs → Loki via its OTLP endpoint (the dedicated loki exporter is
  # no longer shipped in recent collector-contrib releases)
  otlphttp/loki:
    endpoint: "http://loki:3100/otlp"
  # Debug: print to stdout
  debug:
    verbosity: basic
service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter first, batch last so whole sampled traces are batched
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/loki]
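The three tail-sampling policies in this configuration combine as "keep the trace if any policy says keep". The decision logic can be sketched as follows; `keep_trace` is a hypothetical function written for illustration, it takes an already-assembled trace rather than buffering spans for `decision_wait`, and treating a duration exactly at the threshold as slow is an assumption.

```python
import random
from typing import Callable, Iterable


def keep_trace(
    span_statuses: Iterable[str],
    trace_duration_ms: float,
    *,
    threshold_ms: float = 1000,
    sampling_percentage: float = 10,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether to keep a finished trace.

    span_statuses holds the status of every span in the trace
    ("OK", "ERROR", or "UNSET"); rng returns a float in [0, 1).
    """
    # error-policy: any failed span keeps the whole trace
    if any(status == "ERROR" for status in span_statuses):
        return True
    # slow-request-policy: keep traces at or above the latency threshold
    if trace_duration_ms >= threshold_ms:
        return True
    # probabilistic-policy: keep a random share of the remaining traces
    return rng() * 100 < sampling_percentage


print(keep_trace(["OK", "ERROR"], 40))   # True: contains an error
print(keep_trace(["OK", "OK"], 1500))    # True: slower than 1000 ms
print(keep_trace(["OK"], 40, rng=lambda: 0.5))  # False: healthy, fast, and not in the 10%
```

This is only possible at the Collector, after the trace has finished: a head-based sampler in the SDK has to decide before it knows whether the request will fail or be slow.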
10. OpenTelemetry Collector
The OpenTelemetry Collector is the central component that receives telemetry from all applications, processes it (filter, batch, transform), and exports it to one or more backends. It removes vendor lock-in on the application side, because applications only need to send data to the Collector.
Deployment Modes
MODE 1: AGENT (sidecar per host)
+--------------------------------------+
|  Host / Pod                          |
|  +----------+  +------------------+  |
|  |   App    |->|  OTel Collector  |  |
|  |          |  |  (Agent Mode)    |  |
|  +----------+  +------------------+  |
+--------------------------------------+
- Low latency, high availability
- Co-located with application
MODE 2: GATEWAY (centralized)
+----------+  +----------+  +----------+
|  App 1   |  |  App 2   |  |  App 3   |
+----+-----+  +----+-----+  +----+-----+
     |             |             |
     +-------------+-------------+
                   v
+-------------------------------------+
|  OTel Collector (Gateway)           |
|  - Receive from all apps            |
|  - Process (sample, filter)         |
|  - Export to multiple backends      |
+----------+-------------+------------+
           v             v
     +----------+  +----------+
     |  Jaeger  |  |Prometheus|
     +----------+  +----------+
- Centralized processing
- Easier management
MODE 3: COMBINED (Agent + Gateway)
App → Agent Collector → Gateway Collector → Backends
- Best of both worlds
- Agent handles initial processing
- Gateway handles routing & export
11. Best Practices
| Best Practice | Detail |
|---|---|
| Load OTel First | Always import and set up OpenTelemetry BEFORE importing other libraries, so that auto-instrumentation can hook into them |
| Resource Attributes | Set service.name, service.version, and deployment.environment on every service |
| Sampling | Use sampling in production instead of tracing 100% of requests. Use tail-based sampling to retain error traces |
| OTLP Protocol | Use OTLP (OpenTelemetry Protocol) as the default exporter; it is OTel's native protocol and is widely supported by backends |
| Collector | Use the OTel Collector as an intermediary to decouple applications from the backend |
| Batch Processor | Use the batch processor to reduce the number of export calls |
| Semantic Conventions | Use the OTel semantic conventions for attribute names so they stay consistent across services |
| Graceful Shutdown | Call sdk.shutdown() on SIGTERM to flush any data that has not been exported yet |
| Cardinality | Avoid high-cardinality metric attributes (for example user_id), which can consume a lot of memory |
| Context Propagation | Make sure trace context is propagated on all service calls: HTTP headers, message queue headers |
Start with auto-instrumentation to get visibility quickly without many code changes. Once you are comfortable, add manual instrumentation for business-critical paths that need more detail. Grafana + Prometheus + Jaeger + Loki together form a complete open-source observability stack.
Be careful with high-cardinality attributes on metrics: attributes such as user_id or request_id can produce millions of time series, which consume a lot of memory in Prometheus. Keep high-cardinality data in traces and logs, not in metrics.
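The arithmetic behind this warning is simple: in the worst case, one metric produces a separate time series for every combination of attribute values. The sketch below computes that upper bound; the function name and the cardinality figures are hypothetical and chosen only to show the scale of the effect.

```python
from math import prod
from typing import Mapping


def max_series_count(attribute_cardinalities: Mapping[str, int]) -> int:
    """Upper bound on time series for one metric: the product of the number
    of distinct values each attribute can take."""
    return prod(attribute_cardinalities.values())


# Bounded attributes keep the series count manageable.
bounded = {"method": 5, "route": 20, "status_code": 6}
print(max_series_count(bounded))  # 600

# Adding a single unbounded attribute multiplies everything.
with_user_id = {**bounded, "user_id": 100_000}
print(max_series_count(with_user_id))  # 60000000
```

Real traffic rarely hits every combination, so the actual count is usually lower, but the multiplication explains why one per-user or per-request attribute is enough to overwhelm a metrics backend.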
12. Comprehension Quiz
Quiz: Understanding OpenTelemetry
1. What are the three pillars of observability covered by OpenTelemetry?
2. What is a "span" in distributed tracing?
3. What advantages does OpenTelemetry have over proprietary vendor SDKs?
4. What does the OpenTelemetry Collector do?
5. What is "auto-instrumentation" in OpenTelemetry?