1. Introduction to Time Series
A time series is data collected sequentially over a span of time, with each observation tied to a timestamp. Time series forecasting is the process of predicting future values from historical patterns.
Example Time Series Applications
| Domain | Example Data | Prediction |
|---|---|---|
| Finance | Daily stock prices | Tomorrow's stock price |
| Retail | Monthly sales | Next month's demand |
| Weather | Daily temperature | Next week's temperature |
| Traffic | Hourly website visitors | Next traffic peak |
| Energy | Daily electricity consumption | Future energy demand |
| Health | Daily COVID-19 cases | Next week's trend |
Time Series Components

Y(t) = Trend(t) + Seasonal(t) + Cyclic(t) + Noise(t)

- Trend: long-term upward or downward pattern
- Seasonal: pattern that repeats at a fixed period (for example, every year)
- Cyclic: fluctuations without a fixed period (for example, economic cycles)
- Noise: random variation
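To make the additive decomposition concrete, the sketch below builds a toy daily series from a trend, a weekly seasonal term, and Gaussian noise. The function name `make_additive_series`, the slope, the amplitude, and the noise level are all illustrative assumptions, and the cyclic term is left out for brevity.

```python
import math
import random

def make_additive_series(n_days=365, seed=42):
    """Build a toy daily series as trend + seasonal + noise (no cyclic term)."""
    rng = random.Random(seed)
    trend = [100 + 0.5 * t for t in range(n_days)]                          # slow linear growth
    seasonal = [10 * math.sin(2 * math.pi * t / 7) for t in range(n_days)]  # weekly pattern
    noise = [rng.gauss(0, 2) for _ in range(n_days)]                        # random variation
    series = [tr + se + no for tr, se, no in zip(trend, seasonal, noise)]
    return series, trend, seasonal, noise

series, trend, seasonal, noise = make_additive_series()
print(len(series), round(series[0], 2))
```

Because the components are known here, a decomposition method can be checked against them, which is not possible with real data.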
2. Exploratory Data Analysis
# =============================================
# Time Series EDA
# =============================================
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Load the dataset
df = pd.read_csv("data/penjualan.csv", parse_dates=["tanggal"])
df = df.set_index("tanggal")
df = df.asfreq("D") # Set daily frequency; dates missing from the file become NaN rows
# ----- 1. Basic Statistics -----
print(f"Shape: {df.shape}")
print(f"Date range: {df.index.min()} → {df.index.max()}")
print(f"Missing values: {df['jumlah'].isna().sum()}")
print(df.describe())
# ----- 2. Time Series Plot -----
fig, axes = plt.subplots(3, 1, figsize=(14, 10))
# Main plot
axes[0].plot(df.index, df["jumlah"], linewidth=0.8)
axes[0].set_title("Time Series Plot")
axes[0].set_ylabel("Sales Quantity")
# Rolling statistics
rolling_mean = df["jumlah"].rolling(window=30).mean()
rolling_std = df["jumlah"].rolling(window=30).std()
axes[1].plot(df.index, df["jumlah"], alpha=0.5, label="Original")
axes[1].plot(df.index, rolling_mean, color="red", label="Rolling Mean (30d)")
axes[1].plot(df.index, rolling_std, color="green", label="Rolling Std (30d)")
axes[1].legend()
axes[1].set_title("Rolling Statistics")
# Distribution
axes[2].hist(df["jumlah"].dropna(), bins=50, edgecolor="black")
axes[2].set_title("Distribution")
plt.tight_layout()
plt.savefig("eda_timeseries.png", dpi=150)
# ----- 3. ACF & PACF -----
fig, axes = plt.subplots(1, 2, figsize=(14, 4))
plot_acf(df["jumlah"].dropna(), lags=60, ax=axes[0])
plot_pacf(df["jumlah"].dropna(), lags=60, ax=axes[1])
axes[0].set_title("Autocorrelation Function (ACF)")
axes[1].set_title("Partial Autocorrelation Function (PACF)")
plt.tight_layout()
plt.savefig("acf_pacf.png", dpi=150)
# ----- 4. Seasonal Decomposition -----
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(
    df["jumlah"].dropna(),
    model="additive",  # or "multiplicative"
    period=30          # Seasonal period (about one month for daily data)
)
fig, axes = plt.subplots(4, 1, figsize=(14, 10))
decomposition.observed.plot(ax=axes[0], title="Observed")
decomposition.trend.plot(ax=axes[1], title="Trend")
decomposition.seasonal.plot(ax=axes[2], title="Seasonal")
decomposition.resid.plot(ax=axes[3], title="Residual")
plt.tight_layout()
plt.savefig("decomposition.png", dpi=150)
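The ACF plot above shows, for each lag, how strongly the series correlates with a shifted copy of itself. As a minimal sketch of what is being plotted, the hypothetical helper `autocorrelation` below computes the usual sample autocorrelation for one lag on a toy series that repeats every 4 steps; it is meant for intuition, not as a replacement for `plot_acf`.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation at a given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

# Toy series that repeats every 4 steps
toy = [0, 1, 0, -1] * 25
for lag in (1, 2, 4):
    print(lag, round(autocorrelation(toy, lag), 2))
```

The value at lag 4 is close to 1 because the series lines up with itself after one full period, while lag 2 is strongly negative because the series is then half a period out of phase. A seasonal pattern in real data shows up the same way, as spikes at multiples of the period.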
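3. Decomposition & Stationarity
Before modeling, check whether the series is stationary, meaning its statistical properties (mean, variance) do not change over time. Many classical models, ARIMA among them, assume a stationary series, so non-stationary data is usually differenced or transformed first.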
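As a small illustration of why differencing helps, the sketch below takes a series with a pure linear trend and compares the mean of its first and second half before and after first differencing. The helpers `difference` and `half_means` are hypothetical names, and comparing half-means is only a crude stand-in for a formal test such as ADF.

```python
def difference(series, d=1):
    """Apply first differencing d times: y'(t) = y(t) - y(t-1)."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out

def half_means(series):
    """Mean of the first and second half, a crude check for a shifting mean."""
    mid = len(series) // 2
    return sum(series[:mid]) / mid, sum(series[mid:]) / (len(series) - mid)

trending = [3 + 2 * t for t in range(20)]
print(half_means(trending))              # (12.0, 32.0): the mean drifts upward
print(half_means(difference(trending)))  # (2.0, 2.0): the mean is constant
```

A quadratic trend would need `d=2`, which is why the differencing order is treated as a parameter to tune.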
# =============================================
# Stationarity Test & Differencing
# =============================================
from statsmodels.tsa.stattools import adfuller
# ----- Augmented Dickey-Fuller (ADF) Test -----
def test_stationarity(series, name="Series"):
    """Run the ADF test and report whether the unit-root null is rejected at the 5% level."""
    result = adfuller(series.dropna())
    print(f"\n{'='*50}")
    print(f"ADF Test: {name}")
    print(f"{'='*50}")
    print(f"ADF Statistic: {result[0]:.4f}")
    print(f"p-value: {result[1]:.4f}")
    print("Critical Values:")
    for key, val in result[4].items():
        print(f"  {key}: {val:.4f}")
    if result[1] < 0.05:
        print("✅ STATIONARY (p < 0.05)")
    else:
        print("❌ NOT STATIONARY (p >= 0.05)")
    return result[1] < 0.05
# Test the original data
test_stationarity(df["jumlah"], "Original")
# ----- Differencing to make the series stationary -----
# First differencing
df["jumlah_diff1"] = df["jumlah"].diff()
test_stationarity(df["jumlah_diff1"], "First Difference")
# Second differencing (if the series is still non-stationary)
df["jumlah_diff2"] = df["jumlah"].diff().diff()
test_stationarity(df["jumlah_diff2"], "Second Difference")
# Seasonal differencing (lag 12 suits monthly data; for daily data a weekly lag of 7 is a common choice)
df["jumlah_seasonal_diff"] = df["jumlah"].diff(12)
test_stationarity(df["jumlah_seasonal_diff"], "Seasonal Difference")
# Log transform (to stabilize the variance)
df["jumlah_log"] = np.log1p(df["jumlah"])
test_stationarity(df["jumlah_log"], "Log Transform")
4. ARIMA & SARIMA
ARIMA (AutoRegressive Integrated Moving Average) is a classical statistical model for time series forecasting. It has three parameters: p (AR order), d (degree of differencing), and q (MA order).
ARIMA(p, d, q)

AR(p): AutoRegressive
Y(t) = c + φ1·Y(t-1) + φ2·Y(t-2) + ... + φp·Y(t-p) + ε(t)
- Uses past values for prediction
- p is typically read from the PACF plot

I(d): Integrated (Differencing)
Y'(t) = Y(t) - Y(t-1), applied d times
- Makes the data stationary
- d = 0, 1, or 2 (usually 1)

MA(q): Moving Average
Y(t) = c + ε(t) + θ1·ε(t-1) + θ2·ε(t-2) + ... + θq·ε(t-q)
- Uses past errors for prediction
- q is typically read from the ACF plot

SARIMA = ARIMA + seasonal component (P, D, Q, s)
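The AR part is the easiest to see in isolation. The sketch below simulates an AR(1) process, recovers c and φ1 by ordinary least squares of Y(t) on Y(t-1), and makes a one-step-ahead forecast. The names `simulate_ar1` and `fit_ar1` and all parameter values are assumptions chosen for illustration; statsmodels fits ARIMA differently (by maximum likelihood), so this only shows the idea behind the AR equation.

```python
import random

def simulate_ar1(n, c=1.0, phi=0.7, sigma=1.0, seed=0):
    """Simulate Y(t) = c + phi * Y(t-1) + eps(t) with Gaussian noise."""
    rng = random.Random(seed)
    y = [c / (1 - phi)]  # start at the process mean
    for _ in range(n - 1):
        y.append(c + phi * y[-1] + rng.gauss(0, sigma))
    return y

def fit_ar1(y):
    """Estimate (c, phi) by least squares of Y(t) on Y(t-1)."""
    x, target = y[:-1], y[1:]
    mx, mt = sum(x) / len(x), sum(target) / len(target)
    cov = sum((a - mx) * (b - mt) for a, b in zip(x, target))
    var = sum((a - mx) ** 2 for a in x)
    phi = cov / var
    return mt - phi * mx, phi

y = simulate_ar1(2000)
c_hat, phi_hat = fit_ar1(y)
next_value = c_hat + phi_hat * y[-1]  # one-step-ahead forecast
print(round(phi_hat, 2), round(next_value, 2))
```

With 2000 points the estimate of φ1 should land close to the true 0.7; shorter series give noisier estimates.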
# =============================================
# ARIMA & SARIMA Forecasting
# =============================================
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
import warnings
warnings.filterwarnings("ignore")
# Train-test split (time-based, no shuffling)
train_size = int(len(df) * 0.8)
train = df["jumlah"].iloc[:train_size]
test = df["jumlah"].iloc[train_size:]
# ----- ARIMA -----
# Auto ARIMA to search for the best parameters
# pip install pmdarima
from pmdarima import auto_arima
auto_model = auto_arima(
    train,
    start_p=0, max_p=5,
    start_q=0, max_q=5,
    d=None,  # Determined automatically
    seasonal=False,
    trace=True,  # Print the search progress
    error_action="ignore",
    suppress_warnings=True,
    stepwise=True
)
print(f"Best ARIMA: {auto_model.order}")
# Example output: Best ARIMA: (2, 1, 1)
# Fit ARIMA with the order found above
arima_model = ARIMA(train, order=auto_model.order)
arima_fit = arima_model.fit()
print(arima_fit.summary())
# Forecast
arima_forecast = arima_fit.forecast(steps=len(test))
arima_forecast.index = test.index
# ----- SARIMA (Seasonal ARIMA) -----
sarima_model = SARIMAX(
    train,
    order=(1, 1, 1),  # (p, d, q)
    seasonal_order=(1, 1, 1, 12),  # (P, D, Q, s); s=12 suits monthly data, s=7 a weekly cycle in daily data
    enforce_stationarity=False,
    enforce_invertibility=False
)
sarima_fit = sarima_model.fit(disp=False)
print(sarima_fit.summary())
sarima_forecast = sarima_fit.forecast(steps=len(test))
sarima_forecast.index = test.index
# Plot the results
fig, ax = plt.subplots(figsize=(14, 5))
train.plot(ax=ax, label="Train")
test.plot(ax=ax, label="Test")
arima_forecast.plot(ax=ax, label="ARIMA Forecast", linestyle="--")
sarima_forecast.plot(ax=ax, label="SARIMA Forecast", linestyle="--")
ax.legend()
plt.title("ARIMA vs SARIMA Forecast")
plt.savefig("arima_forecast.png", dpi=150)
5. Facebook Prophet
Prophet is a library from Meta (Facebook) designed specifically for time series forecasting. Its strengths: it is easy to use, handles missing data, supports holiday effects, and is robust to outliers.
# =============================================
# Facebook Prophet
# =============================================
# pip install prophet
from prophet import Prophet
import pandas as pd
# Prophet requires the columns 'ds' (date) and 'y' (value)
df_prophet = df.reset_index()
df_prophet = df_prophet.rename(columns={"tanggal": "ds", "jumlah": "y"})
# Train-test split
train_p = df_prophet[:train_size]
test_p = df_prophet[train_size:]
# Indonesian holidays (defined first so they can be passed to the constructor;
# creating a second Prophet(holidays=...) later would discard the settings below)
holidays = pd.DataFrame({
    "holiday": ["lebaran", "lebaran", "natal", "natal"],
    "ds": pd.to_datetime(["2025-03-30", "2026-03-20", "2025-12-25", "2026-12-25"]),
    "lower_window": [-3, -3, -1, -1],
    "upper_window": [3, 3, 1, 1],
})
# Initialize the model
model = Prophet(
    growth="linear",  # "linear" or "logistic"
    yearly_seasonality=True,  # Yearly pattern
    weekly_seasonality=True,  # Weekly pattern
    daily_seasonality=False,  # Daily (within-day) pattern
    changepoint_prior_scale=0.05,  # Trend flexibility
    seasonality_prior_scale=10,  # Strength of the seasonal component
    holidays=holidays,
    holidays_prior_scale=10,  # Strength of the holiday effect
    interval_width=0.95,  # Width of the uncertainty interval
)
# Add a custom seasonality (if needed)
model.add_seasonality(
    name="monthly",
    period=30.5,
    fourier_order=5
)
# Fit the model
model.fit(train_p)
# Predict
future = model.make_future_dataframe(periods=len(test_p), freq="D")
forecast = model.predict(future)
# Forecast components
fig1 = model.plot(forecast)
plt.title("Prophet Forecast")
plt.savefig("prophet_forecast.png", dpi=150)
fig2 = model.plot_components(forecast)
plt.savefig("prophet_components.png", dpi=150)
# Extract the predictions for the test period
prophet_pred = forecast.iloc[train_size:][["ds", "yhat", "yhat_lower", "yhat_upper"]]
print(prophet_pred.head())
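The `add_seasonality` call above takes a `period` and a `fourier_order`. Prophet represents each seasonality as a Fourier series, so a higher order lets the seasonal curve change more quickly. The sketch below shows what such features look like for one timestamp; `fourier_features` is a hypothetical helper written for illustration, not Prophet's internal function.

```python
import math

def fourier_features(t_days, period, order):
    """Return [sin(2*pi*k*t/P), cos(2*pi*k*t/P)] for k = 1..order at time t (in days)."""
    feats = []
    for k in range(1, order + 1):
        angle = 2 * math.pi * k * t_days / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats

# Same settings as the "monthly" seasonality above: period=30.5, fourier_order=5
row = fourier_features(t_days=10, period=30.5, order=5)
print(len(row))  # 2 features (sin, cos) per harmonic
```

Each harmonic adds two columns, so `fourier_order=5` means the monthly component is fitted with 10 coefficients. Raising the order fits sharper seasonal shapes at the cost of a higher overfitting risk.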
6. LSTM Deep Learning
LSTM (Long Short-Term Memory) is a type of RNN that is well suited to time series because it can capture long-term dependencies.
# =============================================
# LSTM Time Series Forecasting
# =============================================
import torch
import torch.nn as nn
from sklearn.preprocessing import MinMaxScaler
# ----- 1. Data Preparation -----
# Drop NaN rows (introduced by asfreq) so they do not turn the loss into NaN
values = df[["jumlah"]].dropna().values
scaler = MinMaxScaler(feature_range=(0, 1))
# Fit the scaler on the first 80% only, so test-period statistics do not leak into training
scaler.fit(values[:int(len(values) * 0.8)])
scaled_data = scaler.transform(values)
def create_sequences(data, seq_length=30):
    """Build input sequences for the LSTM."""
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i + seq_length])
        y.append(data[i + seq_length])
    return torch.FloatTensor(np.array(X)), torch.FloatTensor(np.array(y))
SEQ_LENGTH = 30 # Use the last 30 days to predict the next day
X, y = create_sequences(scaled_data, SEQ_LENGTH)
# Split (time-ordered, no shuffling)
train_size_seq = int(len(X) * 0.8)
X_train, X_test = X[:train_size_seq], X[train_size_seq:]
y_train, y_test = y[:train_size_seq], y[train_size_seq:]
# ----- 2. LSTM Model -----
class LSTMModel(nn.Module):
    def __init__(self, input_size=1, hidden_size=64, num_layers=2, output_size=1):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=0.2
        )
        self.fc = nn.Sequential(
            nn.Linear(hidden_size, 32),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(32, output_size)
        )
    def forward(self, x):
        lstm_out, _ = self.lstm(x)
        last_output = lstm_out[:, -1, :]  # Take the output of the last time step
        return self.fc(last_output)
model_lstm = LSTMModel(input_size=1, hidden_size=64, num_layers=2)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model_lstm.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5)
# ----- 3. Training -----
EPOCHS = 100
BATCH_SIZE = 32
for epoch in range(EPOCHS):
    model_lstm.train()
    total_loss = 0
    for i in range(0, len(X_train), BATCH_SIZE):
        batch_X = X_train[i:i+BATCH_SIZE]
        batch_y = y_train[i:i+BATCH_SIZE]
        optimizer.zero_grad()
        output = model_lstm(batch_X)
        loss = criterion(output, batch_y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    # Lower the learning rate when the summed epoch loss stops improving
    scheduler.step(total_loss)
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}/{EPOCHS}, Loss: {total_loss:.4f}")
# ----- 4. Forecasting -----
model_lstm.eval()
with torch.no_grad():
    predictions = model_lstm(X_test).numpy()
# Inverse transform back to the original units
predictions = scaler.inverse_transform(predictions)
actuals = scaler.inverse_transform(y_test.numpy())
print(f"First 5 predictions: {predictions[:5].flatten()}")
print(f"First 5 actual values: {actuals[:5].flatten()}")
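The sliding-window step in `create_sequences` is easier to follow on a tiny input. The sketch below repeats the same windowing logic on a plain Python list, using a hypothetical helper `sliding_windows` and a window of 3 instead of 30.

```python
def sliding_windows(values, seq_length):
    """Pair each window of seq_length values with the value that follows it."""
    X, y = [], []
    for i in range(len(values) - seq_length):
        X.append(values[i:i + seq_length])
        y.append(values[i + seq_length])
    return X, y

X_toy, y_toy = sliding_windows([1, 2, 3, 4, 5, 6], seq_length=3)
print(X_toy)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(y_toy)  # [4, 5, 6]
```

A series of length n yields n - seq_length training pairs, so very long windows on short series leave little data to train on.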
7. Feature Engineering
# =============================================
# Feature Engineering untuk Time Series
# =============================================
import pandas as pd
import numpy as np
def create_time_features(df, target_col="jumlah"):
    """Create features from a time series indexed by date."""
    df = df.copy()
    # ----- Date-based features -----
    df["dayofweek"] = df.index.dayofweek  # 0=Monday, 6=Sunday
    df["dayofmonth"] = df.index.day
    df["dayofyear"] = df.index.dayofyear
    df["weekofyear"] = df.index.isocalendar().week.astype(int)
    df["month"] = df.index.month
    df["quarter"] = df.index.quarter
    df["year"] = df.index.year
    df["is_weekend"] = (df.index.dayofweek >= 5).astype(int)
    df["is_month_start"] = df.index.is_month_start.astype(int)
    df["is_month_end"] = df.index.is_month_end.astype(int)
    # ----- Lag features -----
    for lag in [1, 2, 3, 7, 14, 28]:
        df[f"lag_{lag}"] = df[target_col].shift(lag)
    # Shift by one step so the features below use only values known before time t;
    # computing them on the unshifted target would leak the value being predicted
    past = df[target_col].shift(1)
    # ----- Rolling statistics -----
    for window in [7, 14, 30]:
        df[f"rolling_mean_{window}"] = past.rolling(window).mean()
        df[f"rolling_std_{window}"] = past.rolling(window).std()
        df[f"rolling_min_{window}"] = past.rolling(window).min()
        df[f"rolling_max_{window}"] = past.rolling(window).max()
    # ----- Expanding features -----
    df["expanding_mean"] = past.expanding().mean()
    # ----- Percentage change -----
    df["pct_change_1"] = past.pct_change(1)
    df["pct_change_7"] = past.pct_change(7)
    # ----- Difference -----
    df["diff_1"] = past.diff(1)
    df["diff_7"] = past.diff(7)
    # ----- Cyclical encoding (sin/cos) -----
    df["month_sin"] = np.sin(2 * np.pi * df["month"] / 12)
    df["month_cos"] = np.cos(2 * np.pi * df["month"] / 12)
    df["day_sin"] = np.sin(2 * np.pi * df["dayofweek"] / 7)
    df["day_cos"] = np.cos(2 * np.pi * df["dayofweek"] / 7)
    return df.dropna()
df_features = create_time_features(df)
print(f"Features: {df_features.shape[1]} kolom")
print(df_features.columns.tolist())
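The sin/cos columns above exist because a raw month number puts December (12) far from January (1), even though they are adjacent in the calendar. The sketch below, with hypothetical helpers `encode_month` and `distance`, checks that the encoding places neighboring months the same distance apart all the way around the year.

```python
import math

def encode_month(month):
    """Map month 1..12 onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * month / 12
    return math.sin(angle), math.cos(angle)

def distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

dec, jan, feb, jul = (encode_month(m) for m in (12, 1, 2, 7))
print(round(distance(dec, jan), 3), round(distance(jan, feb), 3))
print(round(distance(jan, jul), 3))  # opposite sides of the circle
```

Both sin and cos are needed: with sin alone, two different months can share the same value.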
8. Evaluation Metrics
| Metric | Formula | Interpretation |
|---|---|---|
| MAE | mean(\|actual - pred\|) | Average error in the original units |
| MSE | mean((actual - pred)²) | Penalizes large errors heavily |
| RMSE | √MSE | Same units as MAE, but more sensitive to outliers |
| MAPE | mean(\|actual - pred\| / \|actual\|) × 100% | Error as a percentage, easy to interpret; undefined when actual = 0 |
| R² | 1 - (SS_res / SS_tot) | Proportion of variance explained by the model |
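To make the formulas in the table concrete, here is a worked example on three made-up observations, computed with plain Python so that each step is visible. The numbers are invented for illustration only.

```python
import math

actual = [100.0, 200.0, 300.0]
pred = [110.0, 190.0, 330.0]

errors = [a - p for a, p in zip(actual, pred)]  # -10, 10, -30
n = len(actual)

mae = sum(abs(e) for e in errors) / n
mse = sum(e ** 2 for e in errors) / n
rmse = math.sqrt(mse)
mape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n * 100

mean_actual = sum(actual) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(f"MAE={mae:.2f} RMSE={rmse:.2f} MAPE={mape:.2f}% R2={r2:.3f}")
# MAE=16.67 RMSE=19.15 MAPE=8.33% R2=0.945
```

RMSE comes out larger than MAE because the single error of 30 is squared before averaging, which is the outlier sensitivity noted in the table.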
# =============================================
# Evaluasi Forecasting
# =============================================
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
def evaluate_forecast(actual, predicted, model_name="Model"):
    """Print and return MAE, RMSE, MAPE, and R² for one forecast."""
    mae = mean_absolute_error(actual, predicted)
    mse = mean_squared_error(actual, predicted)
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs((actual - predicted) / actual)) * 100  # Undefined if actual contains 0
    r2 = r2_score(actual, predicted)
    print(f"\n{'='*40}")
    print(f"Evaluation: {model_name}")
    print(f"{'='*40}")
    print(f"MAE: {mae:.2f}")
    print(f"RMSE: {rmse:.2f}")
    print(f"MAPE: {mape:.2f}%")
    print(f"R²: {r2:.4f}")
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}
# Evaluate all models
evaluate_forecast(test.values, arima_forecast.values, "ARIMA")
evaluate_forecast(test.values, sarima_forecast.values, "SARIMA")
evaluate_forecast(test_p["y"].values, prophet_pred["yhat"].values, "Prophet")
9. Deployment
# =============================================
# Deploy Forecasting API
# =============================================
from fastapi import FastAPI
from pydantic import BaseModel
import pickle
app = FastAPI()
# Load the model
# Note: the Prophet documentation recommends JSON serialization
# (prophet.serialize.model_to_json / model_from_json) over pickle.
with open("model_prophet.pkl", "rb") as f:
    model = pickle.load(f)
class ForecastRequest(BaseModel):
    periods: int = 30  # Number of days to forecast
    freq: str = "D"  # Frequency
@app.post("/forecast")
async def forecast(req: ForecastRequest):
    future = model.make_future_dataframe(periods=req.periods, freq=req.freq)
    pred = model.predict(future)
    # Keep only the future rows, not the fitted history
    result = pred[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(req.periods)
    return {
        "forecast": result.to_dict(orient="records"),
        "model": "Prophet"
    }
@app.get("/health")
async def health():
    return {"status": "ok"}
# Run: uvicorn app:app --port 8000
10. Comprehension Quiz
1. What does stationarity mean in a time series?
2. Which parameters make up ARIMA?
3. What are Prophet's main advantages over ARIMA?
4. Why are lag features important in time series forecasting?
5. Which metric is easiest to interpret for business stakeholders?
Summary
- Time Series: time-ordered data with trend, seasonal, and noise components
- Stationarity: a key assumption for ARIMA; check it with the ADF test
- ARIMA: classical statistical model with parameters (p, d, q)
- Prophet: easy to use, robust, supports holidays and seasonality
- LSTM: deep learning for complex patterns; needs a lot of data
- Feature Engineering: lags, rolling statistics, cyclical encoding
- Evaluation: MAE, RMSE, MAPE, R²