MLflow for Experiment Tracking: Runs, Metrics, Model Registry & Deployment

A complete MLflow tutorial for experiment tracking: logging parameters, metrics, and artifacts, using the model registry, comparing runs, and deploying ML models to production.

1. Introduction to MLflow

MLflow is an open-source platform for managing the machine learning lifecycle. It helps you track experiments, manage models, and deploy them to production.

Why Does MLflow Matter?

Imagine you try 50 hyperparameter combinations. Without tracking, you soon lose track of which one was best. MLflow records the parameters, metrics, and artifacts of every run (automatically, if you enable autologging), so you can compare them later.

Diagram: MLflow Components
┌──────────────────────────────────────────────────────────────┐
│                      MLflow COMPONENTS                       │
│                                                              │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐  │
│  │ Tracking       │  │ Projects       │  │ Models         │  │
│  │                │  │                │  │                │  │
│  │ • Parameters   │  │ • Packaging    │  │ • Registry     │  │
│  │ • Metrics      │  │ • Reproduce    │  │ • Versioning   │  │
│  │ • Artifacts    │  │ • Sharing      │  │ • Deploy       │  │
│  │ • Tags         │  │ • Conda/Docker │  │ • Serve        │  │
│  └────────────────┘  └────────────────┘  └────────────────┘  │
│                                                              │
│  ┌────────────────┐                                          │
│  │ Evaluate       │                                          │
│  │                │                                          │
│  │ • Curves       │                                          │
│  │ • Metrics      │                                          │
│  │ • Comparison   │                                          │
│  └────────────────┘                                          │
└──────────────────────────────────────────────────────────────┘

2. Setup & Installation

Bash: Installing MLflow
# Install MLflow
pip install mlflow

# Verify the installation
mlflow --version

# Start the MLflow UI
mlflow ui
# Open http://localhost:5000

# Or start it with a database backend store (SQLite)
mlflow ui --backend-store-uri sqlite:///mlflow.db --port 5000

# For production, use a remote tracking server
# mlflow server --backend-store-uri postgresql://... --default-artifact-root s3://...
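Before pointing training code at the server, it can help to confirm that it is reachable. The stdlib-only sketch below assumes the tracking server (started with mlflow ui or mlflow server) listens on http://localhost:5000 and exposes a /health route that answers HTTP 200; recent MLflow versions do, but treat that as an assumption for your setup. The helper name tracking_server_is_up is hypothetical. Training scripts would then point at the same address with mlflow.set_tracking_uri("http://localhost:5000").

```python
import urllib.request


def tracking_server_is_up(base_url: str = "http://localhost:5000", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200 (hypothetical helper)."""
    url = base_url.rstrip("/") + "/health"  # assumption: the server's health-check route
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, connection refused and timeouts are all OSError subclasses
        return False


if __name__ == "__main__":
    print("MLflow server reachable:", tracking_server_is_up())
```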

3. Experiments & Runs

Python: MLflow Tracking Basics
# =============================================
# MLflow Experiments & Runs
# =============================================
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Example data: a synthetic regression set standing in for the house-price data
X, y = make_regression(n_samples=1000, n_features=8, noise=15.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Set the experiment (groups related runs; created if it does not exist yet)
mlflow.set_experiment("prediksi-harga-rumah")

# ----- Run 1: Linear Regression -----
with mlflow.start_run(run_name="linear-regression-v1"):
    # Log parameters (inputs fixed before training)
    mlflow.log_param("model_type", "LinearRegression")
    mlflow.log_param("fit_intercept", True)

    # Training
    model = LinearRegression(fit_intercept=True)
    model.fit(X_train, y_train)

    # Evaluate on the test set
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    r2 = r2_score(y_test, predictions)

    # Log metrics (outputs measured after training)
    mlflow.log_metric("mse", mse)
    mlflow.log_metric("rmse", mse ** 0.5)
    mlflow.log_metric("r2", r2)

    # Log tags (free-form metadata)
    mlflow.set_tag("developer", "budi")
    mlflow.set_tag("dataset", "rumah-jakarta-v2")

    # Log the model in MLflow format under the run's "model" artifact path
    mlflow.sklearn.log_model(model, "model")

    print(f"Run ID: {mlflow.active_run().info.run_id}")
    print(f"MSE: {mse:.4f}, R²: {r2:.4f}")

# ----- Run 2: Random Forest -----
with mlflow.start_run(run_name="random-forest-v1"):
    mlflow.log_param("model_type", "RandomForest")
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 10)

    rf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)
    rf.fit(X_train, y_train)

    predictions = rf.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    r2 = r2_score(y_test, predictions)

    mlflow.log_metric("mse", mse)
    mlflow.log_metric("rmse", mse ** 0.5)
    mlflow.log_metric("r2", r2)

    mlflow.sklearn.log_model(rf, "model")
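To make the experiment/run hierarchy concrete, here is a toy, stdlib-only model of it: an experiment is a named container, and each run inside it carries its own ID, parameters, metrics, and tags. The classes Experiment and Run below are hypothetical illustrations, not MLflow's entity classes; only the 32-character hex shape of the run ID mirrors what MLflow produces.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Run:
    """Toy stand-in for one run (not mlflow.entities.Run)."""
    run_name: str
    # 32-character hex ID, the same shape as MLflow run IDs
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    params: dict = field(default_factory=dict)   # inputs, set once per run
    metrics: dict = field(default_factory=dict)  # outputs
    tags: dict = field(default_factory=dict)     # free-form metadata


@dataclass
class Experiment:
    """Toy stand-in for an experiment: a named container of runs."""
    name: str
    runs: list = field(default_factory=list)

    def start_run(self, run_name: str) -> Run:
        run = Run(run_name=run_name)
        self.runs.append(run)
        return run


experiment = Experiment("prediksi-harga-rumah")
lr_run = experiment.start_run("linear-regression-v1")
lr_run.params.update({"model_type": "LinearRegression", "fit_intercept": True})
lr_run.tags.update({"developer": "budi", "dataset": "rumah-jakarta-v2"})
rf_run = experiment.start_run("random-forest-v1")
rf_run.params.update({"model_type": "RandomForest", "n_estimators": 100, "max_depth": 10})

for run in experiment.runs:
    print(experiment.name, run.run_name, run.run_id, run.params)
```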

4. Logging Parameters & Metrics

Parameters vs. Metrics vs. Artifacts vs. Tags

Type      | Example                           | Role
Parameter | learning_rate, n_estimators       | Input (set before training)
Metric    | accuracy, loss, f1                | Output (during/after training)
Artifact  | model.pkl, confusion_matrix.png   | File (training output)
Tag       | developer=budi, status=production | Free-form metadata
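The table hides one practical rule: within a run, MLflow stores params as strings and refuses to change a param once logged (re-logging the same value is accepted), while a metric key accumulates a history of values per step (step 0 if none is given) and a tag can simply be overwritten. The ToyRun class below is a hypothetical stdlib sketch of those rules, not MLflow's implementation; it raises ValueError where MLflow raises its own exception type.

```python
class ToyRun:
    """Hypothetical toy run illustrating MLflow's logging rules (not the MLflow API)."""

    def __init__(self):
        self.params = {}   # key -> single string value, fixed once logged
        self.metrics = {}  # key -> list of (step, value), grows over time
        self.tags = {}     # key -> value, may be overwritten

    def log_param(self, key, value):
        # Re-logging a param with a *different* value in the same run is rejected
        if key in self.params and self.params[key] != str(value):
            raise ValueError(f"param {key!r} already logged as {self.params[key]!r}")
        self.params[key] = str(value)  # params are stored as strings

    def log_metric(self, key, value, step=0):
        self.metrics.setdefault(key, []).append((step, float(value)))

    def set_tag(self, key, value):
        self.tags[key] = str(value)


run = ToyRun()
run.log_param("learning_rate", 0.1)
run.log_param("learning_rate", 0.1)             # same value again: accepted
for epoch, loss in enumerate([0.9, 0.6, 0.5]):  # toy loss values
    run.log_metric("val_loss", loss, step=epoch)
run.set_tag("status", "staging")
run.set_tag("status", "production")             # tags can be overwritten

try:
    run.log_param("learning_rate", 0.01)        # different value: rejected
except ValueError as err:
    print("rejected:", err)

print(run.params)   # {'learning_rate': '0.1'}
print(run.metrics)  # {'val_loss': [(0, 0.9), (1, 0.6), (2, 0.5)]}
```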
Python: Logging Metrics with a Step
# =============================================
# Logging Metrics per Step (boosting round)
# =============================================
import mlflow
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Example data; replace with your own train/validation split
X, y = make_regression(n_samples=1000, n_features=8, noise=15.0, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

params = {
    "n_estimators": 200,
    "max_depth": 6,
    "learning_rate": 0.1,
    "subsample": 0.8,
    "colsample_bytree": 0.8
}

with mlflow.start_run(run_name="xgboost-detailed"):
    # Log params (plus the model name, handy for filtering in the UI)
    mlflow.log_params({"model": "XGBoost", **params})

    # eval_set makes XGBoost record the loss (RMSE) after every boosting round
    model = xgb.XGBRegressor(**params, eval_metric="rmse")
    model.fit(X_train, y_train,
              eval_set=[(X_train, y_train), (X_val, y_val)],
              verbose=False)
    history = model.evals_result()

    # Log the per-round curve: step = boosting round
    for step, (train_loss, val_loss) in enumerate(
        zip(history["validation_0"]["rmse"], history["validation_1"]["rmse"])
    ):
        mlflow.log_metrics({"train_loss": train_loss, "val_loss": val_loss}, step=step)

    # Log the final score once, after training
    mlflow.log_metric("val_r2", r2_score(y_val, model.predict(X_val)))
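Logging with step pays off when you read the curve back. MlflowClient().get_metric_history(run_id, "val_loss") returns a list of metric objects with .step and .value attributes, so [(m.step, m.value) for m in ...] turns it into plain pairs. The stdlib sketch below assumes you already have those pairs and shows how to find the best step and check whether validation loss stopped improving; the patience rule is an illustrative early-stopping heuristic, not something MLflow does for you.

```python
def best_step(history):
    """Return the (step, value) pair with the lowest value, e.g. the best val_loss."""
    return min(sorted(history), key=lambda pair: pair[1])


def stopped_improving(history, patience=3):
    """True if none of the last `patience` steps beat the best value seen before them."""
    history = sorted(history)  # sort by step so the input order does not matter
    if len(history) <= patience:
        return False
    best_before = min(value for _, value in history[:-patience])
    return all(value >= best_before for _, value in history[-patience:])


# Toy curve: (step, val_loss) pairs, e.g. built from get_metric_history(...)
curve = [(0, 1.00), (1, 0.80), (2, 0.70), (3, 0.72), (4, 0.75), (5, 0.74)]
print(best_step(curve))          # (2, 0.7)
print(stopped_improving(curve))  # True
```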

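5. Artifacts & Model Logging

Python: Logging Artifacts
# =============================================
# Logging Artifacts
# =============================================
# Assumes the training step already produced:
#   model               - a fitted sklearn model
#   history             - dict with "loss" and "val_loss" lists (one value per epoch)
#   X_train             - training features
#   y_test, predictions - test labels and the model's predictions
import joblib
import matplotlib.pyplot as plt
import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

with mlflow.start_run(run_name="with-artifacts"):
    # ... training code ...

    # 1. Save & log the loss-curve plot
    #    (mlflow.log_figure(fig, "plots/loss_curve.png") does this without a temp file)
    fig, ax = plt.subplots()
    ax.plot(history["loss"], label="Train Loss")
    ax.plot(history["val_loss"], label="Val Loss")
    ax.legend()
    fig.savefig("loss_curve.png")
    plt.close(fig)
    mlflow.log_artifact("loss_curve.png", artifact_path="plots")

    # 2. Log a confusion matrix (only meaningful for classification models)
    cm = confusion_matrix(y_test, predictions)
    disp = ConfusionMatrixDisplay(cm)
    disp.plot()
    disp.figure_.savefig("confusion_matrix.png")
    plt.close(disp.figure_)
    mlflow.log_artifact("confusion_matrix.png", artifact_path="plots")

    # 3. Log a raw model file (stored as a plain file, not as a loadable MLflow model)
    joblib.dump(model, "model.pkl")
    mlflow.log_artifact("model.pkl", artifact_path="models")

    # 4. Log the sklearn model in MLflow format (recommended)
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="harga-rumah-model",
        input_example=X_train[:3],
        # Signature = input schema + output schema, inferred from the same rows
        signature=infer_signature(X_train, model.predict(X_train))
    )

    # 5. Log the training dataset as an artifact
    mlflow.log_artifact("data/train.csv", artifact_path="datasets")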
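The artifact_path argument only decides the sub-directory inside the run's artifact store; log_artifact keeps just the file's base name. Any logged file can later be addressed with a runs:/<run_id>/<artifact_path>/<file name> URI. The helper artifact_uri below is a hypothetical stdlib sketch of that mapping (it assumes POSIX-style local paths and a made-up run ID), useful for predicting where each file from the block above ends up.

```python
from pathlib import PurePosixPath
from typing import Optional


def artifact_uri(run_id: str, local_file: str, artifact_path: Optional[str] = None) -> str:
    """Build the runs:/ URI for log_artifact(local_file, artifact_path) (hypothetical helper)."""
    name = PurePosixPath(local_file).name  # only the base name is kept, not "data/"
    parts = [artifact_path, name] if artifact_path else [name]
    return f"runs:/{run_id}/" + "/".join(parts)


run_id = "0123abcd"  # made-up run ID for illustration
for local_file, path in [("loss_curve.png", "plots"),
                         ("model.pkl", "models"),
                         ("data/train.csv", "datasets")]:
    print(artifact_uri(run_id, local_file, path))
```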

6. Run Comparison & Visualization

Python: Compare Runs Programmatically
# =============================================
# Compare Runs
# =============================================
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()
experiment = client.get_experiment_by_name("prediksi-harga-rumah")
if experiment is None:
    raise ValueError("Experiment 'prediksi-harga-rumah' not found")

# Search runs (returns a pandas DataFrame, one row per run)
runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    filter_string="metrics.r2 > 0.7",
    order_by=["metrics.r2 DESC"],
    max_results=10
)
if runs.empty:
    raise ValueError("No runs match the filter")

# Print results (the run name lives in the "tags.mlflow.runName" column)
print("Top runs by R²:")
for _, run in runs.iterrows():
    print(f"  {run['tags.mlflow.runName']}: R²={run['metrics.r2']:.4f}, "
          f"MSE={run['metrics.mse']:.4f}")

# Best run = first row, because results are sorted by R² descending
best_run = runs.iloc[0]
best_model_uri = f"runs:/{best_run['run_id']}/model"
print(f"\nBest model: {best_model_uri}")

# Compare parameters side by side (NaN where a run did not log that column)
cols_to_compare = [
    "params.model_type", "params.n_estimators", "params.max_depth",
    "metrics.r2", "metrics.mse", "metrics.rmse"
]
print(runs.reindex(columns=cols_to_compare).to_string())

# MLflow UI: open http://localhost:5000
# → Select the experiment → tick 2+ runs → Compare
# → View scatter plots, parallel coordinates, box plots
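The filter_string and order_by arguments behave like an ordinary filter followed by a sort. The stdlib sketch below imitates "metrics.r2 > 0.7" plus "metrics.r2 DESC" on a list of plain dicts; search_runs here is a hypothetical toy function (not mlflow.search_runs), and the run IDs and scores are made up for the illustration.

```python
def search_runs(runs, min_r2=0.7, max_results=10):
    """Toy version of filter_string="metrics.r2 > 0.7" plus order_by=["metrics.r2 DESC"]."""
    # Runs that never logged r2 cannot satisfy the filter
    matching = [run for run in runs if run["metrics"].get("r2", float("-inf")) > min_r2]
    matching.sort(key=lambda run: run["metrics"]["r2"], reverse=True)
    return matching[:max_results]


# Hypothetical runs; IDs and scores are invented for this illustration
runs = [
    {"run_id": "aaa", "metrics": {"r2": 0.65}},
    {"run_id": "bbb", "metrics": {"r2": 0.91}},
    {"run_id": "ccc", "metrics": {"r2": 0.78}},
    {"run_id": "ddd", "metrics": {}},
]
top = search_runs(runs)
print([run["run_id"] for run in top])     # ['bbb', 'ccc']
print(f"runs:/{top[0]['run_id']}/model")  # runs:/bbb/model
```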

7. Model Registry

The Model Registry is a central hub for managing a model's lifecycle, from staging to production. Note: since MLflow 2.9, stages (Staging, Production, Archived) are deprecated in favor of model version aliases (for example models:/harga-rumah-predictor@champion); the stage-based calls below still work in MLflow 2.x but emit deprecation warnings.

Python: Model Registry
# =============================================
# Model Registry
# =============================================
import mlflow
import mlflow.pyfunc
import mlflow.sklearn
from mlflow.tracking import MlflowClient

client = MlflowClient()
model_name = "harga-rumah-predictor"

# 1. Register the model while logging it (`model` is a fitted sklearn model)
with mlflow.start_run():
    mlflow.sklearn.log_model(
        model, "model",
        registered_model_name=model_name
    )

# 2. Move the new version through the stages
#    A freshly registered version has no stage yet ("None")
latest_version = client.get_latest_versions(model_name, stages=["None"])[0]

# Staging
client.transition_model_version_stage(
    name=model_name,
    version=latest_version.version,
    stage="Staging"
)

# 3. Load the model by stage and test it before promoting it
staging_model = mlflow.pyfunc.load_model(f"models:/{model_name}/Staging")

# After testing → Production
client.transition_model_version_stage(
    name=model_name,
    version=latest_version.version,
    stage="Production"
)
production_model = mlflow.pyfunc.load_model(f"models:/{model_name}/Production")

# 4. Add a description
client.update_model_version(
    name=model_name,
    version=latest_version.version,
    description="Random Forest model with R²=0.89, trained on June 2026 data"
)

# 5. Archive older Production versions
#    (alternatively, pass archive_existing_versions=True when promoting)
for mv in client.search_model_versions(f"name='{model_name}'"):
    if mv.current_stage == "Production" and mv.version != latest_version.version:
        client.transition_model_version_stage(
            name=model_name, version=mv.version, stage="Archived"
        )
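The stage workflow is essentially a small state machine over model versions. The stdlib sketch below models it with a {version: stage} dict; transition is a hypothetical helper, not the MlflowClient method, though its archive_existing_versions flag mirrors the documented behaviour of the real parameter (archive every other version already in the target stage, and only valid for Staging or Production).

```python
STAGES = {"None", "Staging", "Production", "Archived"}


def transition(stages, version, new_stage, archive_existing_versions=False):
    """Return a new {version: stage} mapping after moving one version (toy model)."""
    if new_stage not in STAGES:
        raise ValueError(f"unknown stage: {new_stage!r}")
    if archive_existing_versions and new_stage not in ("Staging", "Production"):
        raise ValueError("archive_existing_versions only applies to Staging/Production")
    updated = dict(stages)
    if archive_existing_versions:
        # Every other version currently in the target stage gets archived
        for other, stage in stages.items():
            if other != version and stage == new_stage:
                updated[other] = "Archived"
    updated[version] = new_stage
    return updated


registry = {"1": "Production", "2": "None"}  # version 2 was just registered
registry = transition(registry, "2", "Staging")
registry = transition(registry, "2", "Production", archive_existing_versions=True)
print(registry)  # {'1': 'Archived', '2': 'Production'}
```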

8. Deployment

Bash: Deploying a Model with MLflow
# =============================================
# Deployment Options
# =============================================

# 1. Serve the model as a REST API
#    (add --env-manager local to reuse the current Python environment
#     instead of building a new one for the model)
mlflow models serve -m "models:/harga-rumah-predictor/Production" -p 5001

# Test the endpoint (column names must match the model's input signature)
curl -X POST http://localhost:5001/invocations \
  -H "Content-Type: application/json" \
  -d '{"dataframe_split": {"columns": ["luas", "kamar", "lokasi"], 
       "data": [[120, 3, "jakarta-selatan"]]}}'

# 2. Docker deployment (the container serves on port 8080)
mlflow models build-docker -m "models:/harga-rumah-predictor/Production" -n "harga-rumah-api"
docker run -p 5001:8080 harga-rumah-api

# 3. Deploy to the cloud (Azure ML, AWS SageMaker, Databricks)
# mlflow deployments create -t sagemaker -m model_name ...
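The curl call above can also be made from Python. The stdlib sketch below builds the same dataframe_split body and POSTs it to /invocations; the function names build_payload and predict are hypothetical, the URL assumes the server from step 1 on port 5001, and the response shape (in MLflow 2.x, a JSON object with a "predictions" key) should be checked against your MLflow version.

```python
import json
import urllib.request


def build_payload(columns, rows):
    """Request body in the "dataframe_split" format used by the curl example."""
    return {"dataframe_split": {"columns": list(columns), "data": [list(row) for row in rows]}}


def predict(rows, columns, url="http://localhost:5001/invocations", timeout=10.0):
    """POST rows to a model started with `mlflow models serve`; return the parsed JSON."""
    body = json.dumps(build_payload(columns, rows)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the server from step 1 running:
# result = predict([[120, 3, "jakarta-selatan"]], ["luas", "kamar", "lokasi"])
# print(result)  # MLflow 2.x: {"predictions": [...]}
```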

9. Advanced: Autolog & Integrations

Python: Autolog
# =============================================
# Autologging
# =============================================
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestRegressor

# Enable autologging for scikit-learn
mlflow.sklearn.autolog()

# From here on, every sklearn fit() call is logged automatically
# (X_train, y_train as prepared in section 3)
mlflow.set_experiment("autolog-demo")

with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=100, max_depth=10)
    model.fit(X_train, y_train)
    # → parameters, training metrics, and the model are logged automatically

# Autolog for other frameworks (each requires that library to be installed):
# mlflow.pytorch.autolog()     # PyTorch Lightning
# mlflow.tensorflow.autolog()  # TensorFlow
# mlflow.keras.autolog()       # Keras
# mlflow.xgboost.autolog()     # XGBoost
# mlflow.lightgbm.autolog()    # LightGBM
# mlflow.autolog() enables it for every supported library that is installed
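Conceptually, autologging works by wrapping ("patching") a library's training entry points so that each fit() call records its inputs without extra code in your script. The stdlib sketch below imitates that idea with a hypothetical autolog function and a hypothetical ToyRegressor; MLflow's real implementation is far more involved (it patches many methods, handles failures safely, and writes to the active run), so read this only as an analogy.

```python
import functools

LOGGED = []  # stands in for the tracking store


def autolog(cls):
    """Wrap cls.fit so every call records the estimator's settings (toy analogy)."""
    original_fit = cls.fit

    @functools.wraps(original_fit)
    def patched_fit(self, X, y):
        # Settings = attributes without a trailing "_" (sklearn marks fitted ones with "_")
        params = {k: v for k, v in vars(self).items() if not k.endswith("_")}
        result = original_fit(self, X, y)  # the real training runs unchanged
        LOGGED.append({"params": params, "n_samples": len(X)})
        return result

    cls.fit = patched_fit
    return cls


class ToyRegressor:
    """Hypothetical estimator that always predicts the (shrunk) training mean."""

    def __init__(self, shrink=0.0):
        self.shrink = shrink

    def fit(self, X, y):
        self.mean_ = (1 - self.shrink) * sum(y) / len(y)  # fitted attribute
        return self


autolog(ToyRegressor)  # analogous to calling mlflow.sklearn.autolog() once
ToyRegressor(shrink=0.1).fit([[1], [2]], [2.0, 4.0])
print(LOGGED)  # [{'params': {'shrink': 0.1}, 'n_samples': 2}]
```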

Python: Hyperparameter Search + MLflow
# =============================================
# Hyperparameter Search + MLflow
# =============================================
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Switch sklearn autologging off so the manual logging below is not duplicated
# (with autolog on, GridSearchCV is logged automatically, including child runs)
mlflow.sklearn.autolog(disable=True)

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [5, 10, 15],
    "min_samples_split": [2, 5, 10]
}

mlflow.set_experiment("grid-search-rf")

with mlflow.start_run(run_name="grid-search"):
    grid = GridSearchCV(
        RandomForestRegressor(random_state=42),
        param_grid, cv=5, scoring="r2", n_jobs=-1
    )
    grid.fit(X_train, y_train)

    # Log the best configuration in the parent run
    mlflow.log_params(grid.best_params_)
    mlflow.log_metric("best_cv_r2", grid.best_score_)
    mlflow.sklearn.log_model(grid.best_estimator_, "best_model")

    # Log every configuration as a nested (child) run
    for i, params in enumerate(grid.cv_results_["params"]):
        with mlflow.start_run(run_name=f"config-{i}", nested=True):
            mlflow.log_params(params)
            mlflow.log_metric("mean_cv_r2", grid.cv_results_["mean_test_score"][i])
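It is worth knowing in advance how many child runs and model fits that grid produces. The stdlib sketch below expands the same param_grid with itertools.product; expand_grid is a hypothetical helper, and iterating keys in sorted order is meant to match how scikit-learn's ParameterGrid enumerates combinations (so the list should line up with cv_results_["params"]), which is worth verifying on your sklearn version.

```python
from itertools import product

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [5, 10, 15],
    "min_samples_split": [2, 5, 10],
}


def expand_grid(grid):
    """List every parameter combination, one dict per combination (hypothetical helper)."""
    keys = sorted(grid)  # deterministic key order
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


configs = expand_grid(param_grid)
print(len(configs))      # 27 child runs: one per combination
print(len(configs) * 5)  # 135 model fits with cv=5 (plus one refit of the best model)
print(configs[0])        # {'max_depth': 5, 'min_samples_split': 2, 'n_estimators': 50}
```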

10. Comprehension Quiz

1. What is the main purpose of MLflow Tracking?

2. What is the difference between a parameter and a metric in MLflow?

3. What is the Model Registry for?

4. What is the advantage of mlflow.sklearn.autolog()?

5. What is an artifact in MLflow?

Summary

📝 Key Points
  • MLflow Tracking: log params, metrics, and artifacts for every run
  • Experiment: groups the runs of one project
  • Model Registry: manages the model lifecycle (staging → production → archived)
  • Autolog: automatic logging for sklearn, PyTorch Lightning, TensorFlow, and more
  • Deployment: serve a model via a REST API, Docker, or the cloud
  • MLflow UI: a visual dashboard for comparing runs