1. Introduction to MLOps
MLOps (Machine Learning Operations) is a practice that combines Machine Learning, DevOps, and Data Engineering to streamline the deployment, monitoring, and maintenance of ML models in production. MLOps ensures that a model trained in the lab runs reliably in the real world.
Many data scientists spend months training an accurate model, then struggle to deploy it. According to research, only about 50% of ML models make it into production. MLOps exists to bridge the gap between experimentation and production.
Why Does MLOps Matter?
| Challenge | Explanation |
|---|---|
| Model Drift | Model performance degrades over time as the data changes (concept drift, data drift) |
| Reproducibility | Experiment results are hard to reproduce exactly |
| Scalability | A model that runs on a laptop cannot necessarily handle thousands of requests per second |
| Monitoring | There is no way to know when the model starts producing poor predictions |
| Versioning | Model, data, and code versions are not tracked properly |
| Collaboration | There is a gap between the data science team and the engineering/DevOps team |
MLOps Maturity Levels
| Level | Characteristics | Automation |
|---|---|---|
| Level 0: Manual | Models are deployed manually, with no monitoring | Low |
| Level 1: Pipeline | Automated training pipeline, model versioning, basic monitoring | Medium |
| Level 2: CI/CD | Fully automated pipeline, A/B testing, auto-retraining, real-time monitoring | High |
MLOps lifecycle (diagram): Data Engineering → Model Training → Model Validation → Model Packaging (Docker) → Model Serving (API) → Monitoring & Alerting → back to Data Engineering. Underneath all stages runs the CI/CD pipeline: Test → Build → Deploy → Monitor.
MLOps is not only about tools; it is a mindset. Start simple: version control for code, a model registry, and basic monitoring. Add automation gradually as your project requires it.
2. Model Serialization
Model serialization is the process of saving a trained ML model to a file so that it can be loaded again without retraining. It is the first step toward deployment.
Saving a Model in Different Formats
import joblib
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# ===== Train the model =====
X, y = make_classification(
    n_samples=1000, n_features=20,
    n_informative=15, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")

# ===== METHOD 1: joblib (recommended for scikit-learn) =====
# Save the model
joblib.dump(model, 'model_rf.joblib')
# Load the model
loaded_model = joblib.load('model_rf.joblib')
print(f"Accuracy (loaded): {loaded_model.score(X_test, y_test):.4f}")
# Save with compression (0-9; higher is smaller but slower)
joblib.dump(model, 'model_rf_compressed.joblib', compress=3)

# ===== METHOD 2: pickle (Python built-in) =====
import pickle
with open('model_rf.pkl', 'wb') as f:
    pickle.dump(model, f)
with open('model_rf.pkl', 'rb') as f:
    loaded_pkl = pickle.load(f)

# ===== METHOD 3: ONNX (cross-platform) =====
# pip install skl2onnx onnxruntime
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
# Convert to ONNX; None keeps the batch dimension dynamic, 20 is the feature count
initial_type = [('float_input', FloatTensorType([None, 20]))]
onnx_model = convert_sklearn(model, initial_types=initial_type)
with open('model_rf.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())
# Inference with ONNX Runtime
import onnxruntime as ort
session = ort.InferenceSession('model_rf.onnx')
input_name = session.get_inputs()[0].name
# The declared input type is float32, so cast explicitly
result = session.run(None, {input_name: X_test[:5].astype(np.float32)})
print(f"ONNX predictions: {result[0]}")

# ===== METHOD 4: PyTorch model =====
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.sigmoid(self.fc3(x))
        return x

# Instantiate the network (training is omitted here)
model_pytorch = SimpleNet(20)
# Save the entire model (stores a reference to the class, so loading needs the same code)
torch.save(model_pytorch, 'pytorch_model.pth')
# Save only the state_dict (lighter)
torch.save(model_pytorch.state_dict(), 'pytorch_state_dict.pth')
# Load it back
loaded_pytorch = SimpleNet(20)
loaded_pytorch.load_state_dict(torch.load('pytorch_state_dict.pth'))
loaded_pytorch.eval()  # switch to inference mode
Pickle and joblib files can execute arbitrary code when they are loaded. Never load a model from an untrusted source. For better security in production, use a format such as ONNX or SavedModel.
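When a pickle or joblib artifact has to be used anyway, one common safeguard is to record a checksum at training time and verify it before loading. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `verify_artifact` are hypothetical, and the check only helps if the recorded digest itself comes from a trusted place (for example the training pipeline's own metadata).

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, expected_sha256: str) -> None:
    """Raise ValueError if the file does not match the recorded digest."""
    actual = sha256_of_file(path)
    # compare_digest does a constant-time comparison of the two strings
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"Checksum mismatch for {path}: got {actual}")
```

A typical flow would be to store `sha256_of_file('model_rf.joblib')` next to the model right after `joblib.dump`, then call `verify_artifact('model_rf.joblib', recorded_digest)` immediately before `joblib.load`. This detects a swapped or corrupted file; it does not make the pickle format itself safe.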
3. API Serving with FastAPI
Once the model is saved, the next step is to build an API so that other applications can access it. FastAPI is a modern, high-performance Python framework that suits ML model serving because it supports async handlers and automatic data validation.
# app.py: FastAPI model serving
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
from typing import List
import joblib
import numpy as np
import time
import logging

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize FastAPI
app = FastAPI(
    title="ML Model API",
    description="API for serving a classification model",
    version="1.0.0"
)

# Load the model at startup
model = None

# Note: recent FastAPI releases deprecate on_event in favor of lifespan handlers;
# on_event still works with the version pinned in this tutorial.
@app.on_event("startup")
async def load_model():
    global model
    model = joblib.load('model_rf.joblib')
    logger.info("Model loaded successfully")

# ===== Request & Response Schemas =====
class PredictionRequest(BaseModel):
    features: List[float] = Field(
        ...,
        min_length=20,
        max_length=20,
        description="20 input features for the prediction"
    )

class PredictionResponse(BaseModel):
    prediction: int
    probability: float
    latency_ms: float

class BatchRequest(BaseModel):
    instances: List[List[float]]

class HealthResponse(BaseModel):
    status: str
    model_loaded: bool
    version: str

# ===== Endpoints =====
@app.get("/health", response_model=HealthResponse)
async def health_check():
    return HealthResponse(
        status="healthy",
        model_loaded=model is not None,
        version="1.0.0"
    )

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded yet")
    try:
        start_time = time.time()
        # Preprocessing: one row with 20 columns
        features = np.array(request.features).reshape(1, -1)
        # Prediction
        prediction = model.predict(features)[0]
        probability = model.predict_proba(features)[0].max()
        latency = (time.time() - start_time) * 1000
        logger.info(f"Prediction: {prediction}, Prob: {probability:.4f}, "
                    f"Latency: {latency:.2f}ms")
        return PredictionResponse(
            prediction=int(prediction),
            probability=float(probability),
            latency_ms=round(latency, 2)
        )
    except Exception as e:
        logger.error(f"Prediction error: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/predict/batch")
async def predict_batch(request: BatchRequest):
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded yet")
    start_time = time.time()
    features = np.array(request.instances)
    predictions = model.predict(features).tolist()
    probabilities = model.predict_proba(features).max(axis=1).tolist()
    latency = (time.time() - start_time) * 1000
    return {
        "predictions": predictions,
        "probabilities": probabilities,
        "latency_ms": round(latency, 2),
        "count": len(predictions)
    }

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
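One gap in the batch endpoint is that `BatchRequest` accepts rows of any length, whereas the single-prediction schema enforces exactly 20 features. A malformed batch would then fail inside NumPy or scikit-learn and surface as an unhandled server error. The sketch below shows one possible pre-check in plain Python; `validate_batch` and the `MAX_BATCH_SIZE` cap are hypothetical names and values, not part of the original app.

```python
from typing import List, Sequence

N_FEATURES = 20        # matches the 20-feature model used in this tutorial
MAX_BATCH_SIZE = 1000  # hypothetical cap; tune it for your service


def validate_batch(instances: Sequence[Sequence[float]],
                   n_features: int = N_FEATURES,
                   max_batch_size: int = MAX_BATCH_SIZE) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems: List[str] = []
    if not instances:
        problems.append("batch is empty")
    if len(instances) > max_batch_size:
        problems.append(
            f"batch has {len(instances)} rows, limit is {max_batch_size}"
        )
    for i, row in enumerate(instances):
        if len(row) != n_features:
            problems.append(
                f"row {i} has {len(row)} features, expected {n_features}"
            )
    return problems
```

Inside `predict_batch`, the result could be checked before building the array, for example `problems = validate_batch(request.instances)` followed by `raise HTTPException(status_code=422, detail=problems)` when the list is non-empty.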
Testing the API
import requests

BASE_URL = "http://localhost:8000"

# Health check
response = requests.get(f"{BASE_URL}/health")
print(response.json())

# Single prediction
data = {
    "features": [0.5, 1.2, -0.3, 0.8, 1.1, -0.5, 0.2, 0.9,
                 -1.0, 0.4, 0.7, -0.2, 1.3, 0.1, -0.8, 0.6,
                 -0.4, 1.5, 0.3, -0.7]
}
response = requests.post(f"{BASE_URL}/predict", json=data)
print(response.json())
# Example response shape: {"prediction": 1, "probability": 0.8934, "latency_ms": 2.31}

# Batch prediction
batch_data = {
    "instances": [
        [0.5, 1.2, -0.3, 0.8, 1.1, -0.5, 0.2, 0.9,
         -1.0, 0.4, 0.7, -0.2, 1.3, 0.1, -0.8, 0.6,
         -0.4, 1.5, 0.3, -0.7],
        [-0.3, 0.5, 1.1, -0.8, 0.2, 0.9, -1.0, 0.4,
         0.7, -0.2, 1.3, 0.1, -0.5, 0.6, 0.8, -0.4,
         1.2, -0.1, 0.5, 0.3]
    ]
}
response = requests.post(f"{BASE_URL}/predict/batch", json=batch_data)
print(response.json())
4. Containerization with Docker
Docker ensures that your ML application runs the same way in every environment, from a developer laptop to a production server. It packages the model, dependencies, and runtime into a single portable container.
Dockerfile for the ML Model
# Dockerfile for the ML model API
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies
# (curl is required by the HEALTHCHECK below; the slim image does not ship it)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first so this layer is cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy model and source code
COPY model_rf.joblib .
COPY app.py .

# Create a non-root user (security best practice)
RUN useradd -m mluser
USER mluser

# Expose port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Run the API server
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
# requirements.txt
fastapi==0.111.0
uvicorn==0.30.1
scikit-learn==1.5.0
joblib==1.4.2
numpy==1.26.4
pydantic==2.7.4
Docker Compose for the Full Stack
version: '3.8'
services:
  # ML model API
  ml-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      # Note: app.py as written loads a hard-coded path;
      # read MODEL_PATH via os.environ for this variable to take effect
      - MODEL_PATH=/app/model_rf.joblib
      - LOG_LEVEL=INFO
    restart: always
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: '1.0'
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Prometheus monitoring
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  # Grafana dashboard
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      # Default password for local use only; change it before exposing Grafana
      - GF_SECURITY_ADMIN_PASSWORD=admin
    depends_on:
      - prometheus
Build and Run
# Build the image
docker build -t ml-model-api:v1.0 .

# Run the container
docker run -d \
  --name ml-api \
  -p 8000:8000 \
  --memory=1g \
  --cpus=1 \
  ml-model-api:v1.0

# Check that the container is running
docker ps

# Follow the logs
docker logs -f ml-api

# Start the stack with docker compose
docker compose up -d

# Scale the service (multiple instances)
# Note: with the fixed host port mapping "8000:8000", only one replica can bind
# the port; scaling needs a host port range or a reverse proxy in front
docker compose up -d --scale ml-api=3

# Stop everything
docker compose down
For a smaller image, use a multi-stage build: separate the (large) dependency-installation stage from the runtime stage. This can reduce image size by 50-70%, especially if you need a compiler or other build tools.
5. Monitoring Models in Production
Deploying a model is not the end of the story. An ML model needs continuous monitoring because its performance can degrade over time, either because the input data changes (data drift) or because the relationship between features and target changes (concept drift).
What Should Be Monitored?
| Metric | Description | Tool |
|---|---|---|
| Latency | API response time (P50, P95, P99) | Prometheus, Grafana |
| Throughput | Number of requests per second | Prometheus |
| Error Rate | Percentage of failed requests | Prometheus, Alert Manager |
| Prediction Distribution | Distribution of prediction outputs | Evidently AI, WhyLabs |
| Feature Drift | Changes in the distribution of input data | Evidently AI, Alibi Detect |
| Model Accuracy | Actual vs expected accuracy (when ground truth is available) | Custom metrics |
| Resource Usage | CPU, memory, and GPU utilization | cAdvisor, Node Exporter |
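To make the P50/P95/P99 notation in the latency row concrete, here is a small sketch that computes those percentiles from raw latency samples with the standard library. The function name `latency_percentiles` is hypothetical, and note the difference from a monitoring stack: Prometheus estimates quantiles from histogram buckets rather than from raw samples, so its numbers are approximations of what this computes directly.

```python
import statistics
from typing import Dict, Sequence


def latency_percentiles(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Compute P50, P95 and P99 from raw latency samples (milliseconds)."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    # Synthetic samples: mostly fast requests with a slow tail
    samples = [2.0] * 90 + [10.0] * 8 + [250.0] * 2
    print(latency_percentiles(samples))
```

The synthetic data illustrates why tail percentiles matter: the median stays at the fast value while the highest percentiles are dominated by the few slow requests, which an average would blur.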
Implementing Monitoring with Prometheus
# monitoring.py: add metrics to the FastAPI app
# (assumes `app` from app.py is in scope; pip install prometheus-client)
from prometheus_client import (
    Counter, Histogram, Gauge, generate_latest, CONTENT_TYPE_LATEST
)
from fastapi import Response
import time

# Metric definitions
REQUEST_COUNT = Counter(
    'ml_predictions_total',
    'Total number of predictions',
    ['endpoint', 'status']
)
PREDICTION_LATENCY = Histogram(
    'ml_prediction_latency_seconds',
    'Prediction time',
    buckets=[0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0]
)
PREDICTION_DISTRIBUTION = Counter(
    'ml_prediction_distribution',
    'Distribution of prediction results',
    ['prediction_class']
)
FEATURE_DRIFT_SCORE = Gauge(
    'ml_feature_drift_score',
    'Drift score for input features'
)

# Middleware that counts every request by path and status code
@app.middleware("http")
async def monitor_requests(request, call_next):
    response = await call_next(request)
    REQUEST_COUNT.labels(
        endpoint=request.url.path,
        status=str(response.status_code)
    ).inc()
    return response

# Metrics endpoint for Prometheus to scrape
@app.get("/metrics")
async def metrics():
    return Response(
        content=generate_latest(),
        media_type=CONTENT_TYPE_LATEST
    )

# Modified predict endpoint with metric tracking
# (this replaces the earlier /predict handler in app.py, where np, model,
# PredictionRequest and PredictionResponse are defined)
@app.post("/predict")
async def predict(request: PredictionRequest):
    start_time = time.time()
    # Predict as before
    features = np.array(request.features).reshape(1, -1)
    prediction = model.predict(features)[0]
    probability = model.predict_proba(features)[0].max()
    # Record metrics (the histogram is in seconds)
    latency = time.time() - start_time
    PREDICTION_LATENCY.observe(latency)
    PREDICTION_DISTRIBUTION.labels(
        prediction_class=str(prediction)
    ).inc()
    return PredictionResponse(
        prediction=int(prediction),
        probability=float(probability),
        latency_ms=round(latency * 1000, 2)
    )
Data Drift Detection
# pip install evidently
# Note: these imports match the Report API of older Evidently releases; newer
# releases reorganized the package, so pin the version or adjust the imports.
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
import pandas as pd

# X_train and X_test come from the serialization example in section 2
# Reference data (training data)
reference_data = pd.DataFrame(X_train, columns=[f"feat_{i}" for i in range(20)])
# New data (stand-in for production data)
production_data = pd.DataFrame(X_test, columns=[f"feat_{i}" for i in range(20)])

# Build the drift report
report = Report(metrics=[DataDriftPreset()])
report.run(
    reference_data=reference_data,
    current_data=production_data
)

# Save the HTML report
report.save_html("drift_report.html")

# Check whether drift was detected
result = report.as_dict()
is_drift = result['metrics'][0]['result']['dataset_drift']
print(f"Data drift detected: {is_drift}")
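To show what a drift score measures underneath a tool like Evidently, the sketch below computes the Population Stability Index (PSI) for a single feature in plain Python. This is a different, simpler technique than the tests Evidently runs by default, and the function name `psi`, the 10 equal-width bins, and the `eps` smoothing are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of one feature."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference feature is constant")
    width = (hi - lo) / n_bins  # equal-width bins over the reference range

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))
```

Identical distributions give a PSI of 0, and the value grows as the current data moves away from the reference. A commonly quoted rule of thumb treats values below about 0.1 as stable and values above about 0.25 as significant drift, but those cutoffs are conventions, not guarantees. A per-feature score like this is the kind of value that could be written to the `FEATURE_DRIFT_SCORE` gauge defined earlier.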
6. CI/CD Pipeline for ML
CI/CD (Continuous Integration / Continuous Deployment) for ML needs a different approach from conventional software development because, in addition to code, data and models must be versioned as well. The full pipeline follows.
GitHub Actions CI/CD Pipeline
# .github/workflows/ml-pipeline.yml
name: ML Pipeline
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  # ===== STAGE 1: Test =====
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov
      - name: Run unit tests
        run: |
          pytest tests/ -v --cov=src --cov-report=xml
      - name: Lint check
        run: |
          pip install ruff
          ruff check src/
          ruff format --check src/
  # ===== STAGE 2: Train & Evaluate =====
  train:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Train model
        run: python src/train.py
      - name: Evaluate model
        run: |
          python src/evaluate.py
          # Fail if accuracy is below the threshold
          python -c "
          import json
          with open('metrics.json') as f:
              m = json.load(f)
          assert m['accuracy'] >= 0.85, f'Accuracy {m[\"accuracy\"]} below threshold'
          print(f'Accuracy: {m[\"accuracy\"]} OK')
          "
      - name: Upload model artifact
        uses: actions/upload-artifact@v4
        with:
          name: model
          path: model_rf.joblib
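  # ===== STAGE 3: Build Docker Image =====
  build:
    needs: train
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - name: Download model
        uses: actions/download-artifact@v4
        with:
          name: model
      # The image is tagged under the registry account so the push can succeed;
      # an un-namespaced name would target the registry's reserved namespace
      - name: Build Docker image
        run: |
          docker build -t ${{ secrets.DOCKER_USERNAME }}/ml-model-api:${{ github.sha }} .
          docker tag ${{ secrets.DOCKER_USERNAME }}/ml-model-api:${{ github.sha }} ${{ secrets.DOCKER_USERNAME }}/ml-model-api:latest
      - name: Push to registry
        run: |
          echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin
          docker push ${{ secrets.DOCKER_USERNAME }}/ml-model-api:${{ github.sha }}
          docker push ${{ secrets.DOCKER_USERNAME }}/ml-model-api:latest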
  # ===== STAGE 4: Deploy =====
  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      # Assumes SSH access from the runner to the server is already configured
      - name: Deploy to server
        run: |
          ssh ${{ secrets.SERVER_USER }}@${{ secrets.SERVER_HOST }} \
            "cd /opt/ml-api && \
             docker compose pull && \
             docker compose up -d --force-recreate"
      - name: Health check
        run: |
          # Give the container a moment to start before probing it
          sleep 10
          curl -f http://${{ secrets.SERVER_HOST }}:8000/health
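The inline `python -c` accuracy check in the Evaluate step works, but a small script is easier to test and extend to more metrics. The following is a sketch of such a gate; the names `check_metrics`, `main`, and `DEFAULT_THRESHOLDS` are hypothetical, and it assumes `metrics.json` is a flat JSON object of numeric metrics, as the workflow's own check implies.

```python
import json
from pathlib import Path
from typing import Dict, List

# 0.85 mirrors the accuracy threshold used in the workflow
DEFAULT_THRESHOLDS: Dict[str, float] = {"accuracy": 0.85}


def check_metrics(metrics_path: str, thresholds: Dict[str, float]) -> List[str]:
    """Return failure messages; an empty list means the gate passes."""
    metrics = json.loads(Path(metrics_path).read_text())
    failures: List[str] = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing from {metrics_path}")
        elif value < minimum:
            failures.append(
                f"{name}: {value:.4f} is below threshold {minimum:.4f}"
            )
    return failures


def main(metrics_path: str = "metrics.json") -> int:
    """Print the gate result and return a process exit code (0 = pass)."""
    problems = check_metrics(metrics_path, DEFAULT_THRESHOLDS)
    for problem in problems:
        print(f"FAIL {problem}")
    if not problems:
        print("All metric thresholds met")
    return 1 if problems else 0
```

To use it as a pipeline step, the script would end with an `if __name__ == "__main__":` guard that calls `sys.exit(main())`, so a non-zero exit code fails the job just as the `assert` does. Returning messages instead of asserting also reports every failing metric at once rather than stopping at the first.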
CI/CD pipeline for ML (diagram): Git Push (code) → Test (pytest, lint) → Train & Evaluate (accuracy ≥ X?). If the check fails, the change is rejected. If it passes: Build Docker Image → Deploy to Production → Health Check & Monitor.
7. MLOps Best Practices
The following summarizes the best practices to follow when applying MLOps in your project:
MLOps Checklist
| Area | Best Practice | Tools |
|---|---|---|
| Version Control | Version code, data, and models separately | Git, DVC, MLflow |
| Experiment Tracking | Record every experiment with its parameters and metrics | MLflow, Weights & Biases |
| Testing | Unit tests for the data pipeline, model, and API | pytest, great_expectations |
| CI/CD | Automate testing, building, and deployment | GitHub Actions, GitLab CI |
| Containerization | Package the model in Docker for reproducibility | Docker, Kubernetes |
| Monitoring | Monitor latency, error rate, and data drift | Prometheus, Grafana, Evidently |
| Model Registry | Central storage for model artifacts | MLflow Registry, Vertex AI |
| Feature Store | Keep features consistent between training and serving | Feast, Tecton |
Do not try to adopt every best practice at once. Start with the most important ones: (1) version control for code, (2) correct model serialization, (3) a well-structured API, and (4) basic monitoring. Add the remaining MLOps capabilities gradually.
8. Quiz: Test Your Understanding!
After reading the tutorial above, answer the following 5 questions to test your understanding of MLOps: