- Introduction to Transfer Learning
- Why Does Transfer Learning Matter?
- Feature Extraction: Freezing Pre-trained Layers
- Fine-Tuning: Adapting the Model to a New Dataset
- Popular Architectures: VGG, ResNet, EfficientNet
- Implementation: Transfer Learning with PyTorch
- Implementation: Transfer Learning with TensorFlow
- Transfer Learning Strategies & Best Practices
- Comprehension Quiz
1. Introduction to Transfer Learning
Transfer learning is a deep learning technique in which a model already trained on one task (the source task) serves as the starting point for a new task (the target task). Instead of training from scratch, we "transfer" the knowledge the pre-trained model has already learned.
Transfer Learning Analogy
TRAINING FROM SCRATCH
A baby learning from zero:
1. Recognize colors
2. Recognize shapes
3. Recognize objects
4. Recognize animals
5. Cat vs. dog
Takes YEARS to learn.

TRANSFER LEARNING
An adult who can already drive a car learns to drive a truck:
- Already knows the traffic rules
- Already knows the accelerator and brake
- Only needs to adjust to the size of the truck
Takes DAYS or WEEKS.

Training from scratch = learning everything from the beginning
Transfer learning = reusing knowledge that already exists
How Does Transfer Learning Work?
Pre-trained CNN (ImageNet)
| Layers | What they learn | Generality | What to do |
|---|---|---|---|
| Layers 1-3 (early/general) | Edge detectors, color gradients, simple patterns | VERY GENERAL: useful for almost all computer vision tasks | FREEZE these layers for feature extraction |
| Layers 4-7 (middle) | Texture patterns, shapes & corners, complex patterns | FAIRLY GENERAL: useful for many CV tasks | Freeze or fine-tune (small lr) |
| Layers 8+ (late/specific) | Object parts, specific objects, task-specific features | VERY SPECIFIC: only for the original task (ImageNet, 1000 classes) | REPLACE with new layers for our task |
2. Why Does Transfer Learning Matter?
| Benefit | Explanation | Impact |
|---|---|---|
| Less Data | The model already "knows" general features, so it does not need millions of examples | A few hundred or a few thousand images can be enough |
| Faster Training | No training from scratch; only the final layers are adapted | From weeks to hours |
| Better Performance | Pre-trained features are usually better than random initialization | Higher accuracy, especially with small datasets |
| Lower Resource Cost | No GPU cluster needed for training | Can be trained on a laptop or a single GPU |
| Domain Adaptation | Knowledge from a large domain can be adapted to a specific domain | Medical imaging, satellite imagery, etc. |
When Should You Use Transfer Learning?
Is your task similar to the pre-trained model's task?
- Yes: is your dataset LARGE or SMALL?
  - Large: fine-tune ALL layers (small lr)
  - Small: feature extraction (freeze the base, train the head)
- No, and your dataset is SMALL: train from scratch (or look for a better-matched model)
Rule of Thumb:
- Small data + similar task → feature extraction (freeze everything, train the head)
- Large data + similar task → fine-tune the top (unfreeze the last layers)
- Large data + different task → fine-tune everything (very small lr)
- Small data + different task → look for another pre-trained model
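The rule of thumb above can be sketched as a small lookup function. This is only an illustration: `choose_strategy` and its two boolean parameters are hypothetical names, and what counts as "large" or "similar" is left to the reader's judgment.

```python
def choose_strategy(data_is_large: bool, task_is_similar: bool) -> str:
    """Map the two rule-of-thumb questions to a suggested strategy."""
    if task_is_similar:
        if data_is_large:
            return "fine-tune top layers"
        return "feature extraction"
    if data_is_large:
        return "fine-tune all layers"
    return "look for another pre-trained model"


if __name__ == "__main__":
    # Print the suggestion for every combination of the two questions.
    for large in (False, True):
        for similar in (True, False):
            print(f"large={large}, similar={similar}: {choose_strategy(large, similar)}")
```

In practice the boundaries are fuzzy, so it is reasonable to start with the cheaper option (feature extraction) and only move to fine-tuning if validation accuracy stalls.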
3. Feature Extraction: Freezing Pre-trained Layers
Feature extraction is a strategy in which the pre-trained model is used as a fixed feature extractor. All convolutional layers are frozen: their weights are not updated during training. Only the new classifier layers at the end (called the "head" or "top") are trained.
FEATURE EXTRACTION
Pre-trained model (ResNet50 on ImageNet)
1. Conv layers: FROZEN (weights are not updated)
   Input → [Conv → ReLU → Pool] × N → features
2. New classifier head: TRAINABLE
   GlobalAvgPool → FC(2048, 512) → ReLU → Dropout(0.5) → FC(512, num_classes)
Only the head is trained → training is VERY FAST
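To get a feel for why training only the head is fast, the sketch below counts the trainable parameters of the head in the diagram, FC(2048, 512) followed by FC(512, num_classes). The helper names are hypothetical and `num_classes=10` is an assumed example value; pooling, ReLU, and dropout contribute no parameters.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


def head_params(feature_dim: int, hidden_dim: int, num_classes: int) -> int:
    """Parameters in FC(feature_dim, hidden_dim) -> FC(hidden_dim, num_classes)."""
    return linear_params(feature_dim, hidden_dim) + linear_params(hidden_dim, num_classes)


if __name__ == "__main__":
    # num_classes=10 is an assumed example value, not a requirement.
    total = head_params(2048, 512, 10)
    print(f"Trainable head parameters: {total:,}")
```

With these assumed sizes the head has roughly one million parameters, a small fraction of the roughly 25.6M parameters of ResNet-50.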
4. Fine-Tuning: Adapting the Model to a New Dataset
Fine-tuning goes a step further than feature extraction: we retrain some or all of the pre-trained model's layers, but with a very small learning rate so that the weight updates do not destroy the knowledge already learned.
| | Strategy 1: Fine-tune the top (unfreeze the last layers) | Strategy 2: Fine-tune everything (unfreeze all, different lr per group) |
|---|---|---|
| Conv layers 1-4 | Frozen | Trainable, LR = 1e-5 |
| Conv layers 5-6 | Trainable, LR = 1e-4 | Trainable, LR = 1e-4 |
| New head | Trainable, LR = 1e-3 | Trainable, LR = 1e-3 |
| Suitable for | Small to medium data; fairly similar task; faster training | LARGE data; fairly different task; higher performance |
Differential Learning Rates
When fine-tuning, use a different learning rate for each group of layers:
- Early layers (general features): very small LR (1e-5 or less)
- Middle layers (mid-level features): small LR (1e-4)
- Late layers (task-specific) and the new head: larger LR (1e-3)
Reason: the early layers are already very general and need little change, while the late layers need more adaptation to the new task.
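One way to produce such a schedule is to start from the head's learning rate and shrink it by a constant factor for each earlier group. The sketch below is illustrative only: `assign_group_lrs`, the group names, and the factor of 0.1 are assumptions chosen to reproduce the 1e-5 / 1e-4 / 1e-3 values listed above.

```python
def assign_group_lrs(group_names, head_lr=1e-3, decay=0.1):
    """Give the last group head_lr and multiply by `decay` for each earlier group."""
    last = len(group_names) - 1
    return [
        {"name": name, "lr": head_lr * decay ** (last - i)}
        for i, name in enumerate(group_names)
    ]


if __name__ == "__main__":
    # Groups are ordered from the input side to the output side.
    for group in assign_group_lrs(["early", "middle", "late_and_head"]):
        print(f"{group['name']:>14}: lr={group['lr']:.0e}")
```

PyTorch optimizers accept a list of dictionaries with `params` and `lr` keys, so a list like this maps naturally onto optimizer parameter groups once each name is replaced by the corresponding module's parameters.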
5. Popular Architectures: VGG, ResNet, EfficientNet
Evolution of CNN Architectures
| Model | Year | Key Strength | Parameters | ImageNet Top-5 Accuracy |
|---|---|---|---|---|
| AlexNet | 2012 | Proved that deep learning works | 60M | 84.7% |
| VGG-16/19 | 2014 | Simple architecture, stacked 3×3 convolutions | 138M | 92.7% |
| GoogLeNet/Inception | 2014 | Inception module (multi-scale) | 6.8M | 93.3% |
| ResNet-50/101/152 | 2015 | Skip connections (residual) | 25.6M | 96.4% |
| DenseNet | 2017 | Dense connections between all layers | 8M-20M | 94.5% |
| EfficientNet | 2019 | Compound scaling (width × depth × resolution) | 5.3M-66M | 97.1% |
| Vision Transformer (ViT) | 2020 | Transformer for vision | 86M+ | 97.7% |
ResNet: Skip Connections
Plain CNN (no skip connection):
Input → [Conv → ReLU] → Output
Problem: vanishing gradients in very deep networks. The block has to learn the full mapping H(x) directly.
ResNet (with skip connection):
Input → [Conv → ReLU] → (+) → Output, with the input also routed to (+) through an identity path
Advantage: gradients can flow directly through the skip connection.
F(x) = H(x) - x (the network learns the RESIDUAL)
H(x) = F(x) + x (the block outputs residual + input)
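A toy calculation can make the gradient argument concrete. The sketch below assumes every layer is a scalar map F(x) = w·x with a small slope w, which is a deliberate oversimplification: a plain stack multiplies the gradient by w at each layer, while a residual stack multiplies it by (1 + w) because H(x) = F(x) + x. The function names are hypothetical.

```python
def plain_gradient(layer_slope: float, depth: int) -> float:
    """d(output)/d(input) for `depth` stacked layers y = w * x."""
    return layer_slope ** depth


def residual_gradient(layer_slope: float, depth: int) -> float:
    """Same stack, but each layer computes y = w * x + x (identity skip)."""
    return (1.0 + layer_slope) ** depth


if __name__ == "__main__":
    w, depth = 0.01, 10  # assumed toy values
    print(f"plain:    {plain_gradient(w, depth):.3e}")
    print(f"residual: {residual_gradient(w, depth):.3e}")
```

With these toy values the plain gradient collapses toward zero while the residual gradient stays close to 1, which is the intuition behind the identity path.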
6. Implementation: Transfer Learning with PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models, transforms, datasets
from torch.utils.data import DataLoader
import time
import copy
# === 1. DATA PREPARATION ===
# Training transforms (with augmentation)
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],   # ImageNet mean
                         [0.229, 0.224, 0.225])   # ImageNet std
])
# Validation transforms (no augmentation)
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])
# Load the dataset (replace with the path to your own dataset)
# train_dataset = datasets.ImageFolder('data/train', train_transform)
# val_dataset = datasets.ImageFolder('data/val', val_transform)
# train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
# val_loader = DataLoader(val_dataset, batch_size=32)
# === 2. LOAD PRE-TRAINED MODEL ===
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
print("=== ResNet50 Architecture ===")
print(model)
# === 3. FEATURE EXTRACTION (Strategy 1: freeze everything, train the head) ===
# Freeze all parameters
for param in model.parameters():
    param.requires_grad = False
# Replace the classifier head for 10 classes
# (parameters of newly created modules are trainable by default)
num_classes = 10
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 512),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(512, num_classes)
)
# Count the trainable parameters
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())
print(f"\n=== Feature Extraction Mode ===")
print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
print(f"Frozen parameters: {total_params - trainable_params:,}")
print(f"Trainable %: {trainable_params/total_params*100:.1f}%")
# === 4. FINE-TUNING (Strategy 2: unfreeze some layers) ===
def unfreeze_layers(model, layer_names):
    """Freeze the whole model, then unfreeze the named top-level children."""
    # Freeze everything first
    for param in model.parameters():
        param.requires_grad = False
    # Unfreeze by name. Slicing list(model.children()) from the end would
    # select avgpool (which has no parameters) and fc, leaving layer4 frozen.
    for name in layer_names:
        for param in getattr(model, name).parameters():
            param.requires_grad = True
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"\nUnfroze: {', '.join(layer_names)}")
    print(f"Trainable: {trainable:,} / {total:,} ({trainable/total*100:.1f}%)")
# Unfreeze the last residual stage and the classifier head (layer4 + fc)
unfreeze_layers(model, ['layer4', 'fc'])
# === 5. DIFFERENTIAL LEARNING RATES ===
# Earlier layers → smaller lr, later layers → larger lr
optimizer = optim.Adam([
    {'params': model.layer4.parameters(), 'lr': 1e-4},
    {'params': model.fc.parameters(), 'lr': 1e-3},
], weight_decay=1e-4)
# Scheduler: reduce the lr when the monitored loss plateaus
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=3
)
criterion = nn.CrossEntropyLoss()
# === 6. TRAINING LOOP ===
# Note: train_loader and val_loader are the module-level loaders from step 1.
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print(f'\nEpoch {epoch+1}/{num_epochs}')
        print('-' * 40)
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()
                dataloader = train_loader
            else:
                model.eval()
                dataloader = val_loader
            running_loss = 0.0
            running_corrects = 0
            for inputs, labels in dataloader:
                inputs = inputs.to(device)
                labels = labels.to(device)
                optimizer.zero_grad()
                # Track gradients only in the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    if phase == 'train':
                        loss.backward()
                        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
                        optimizer.step()
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            epoch_loss = running_loss / len(dataloader.dataset)
            epoch_acc = running_corrects.double() / len(dataloader.dataset)
            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
            if phase == 'val':
                # The scheduler monitors the validation loss
                scheduler.step(epoch_loss)
                # Keep a copy of the best weights seen so far
                if epoch_acc > best_acc:
                    best_acc = epoch_acc
                    best_model_wts = copy.deepcopy(model.state_dict())
    print(f'\nBest val Acc: {best_acc:.4f}')
    model.load_state_dict(best_model_wts)
    return model
# Run training (uncomment once the data is available)
# model = train_model(model, criterion, optimizer, scheduler, num_epochs=20)
# === 7. PREDICTION ===
def predict_image(model, image_path, transform):
    from PIL import Image
    # Use the device the model already lives on to avoid a device mismatch
    device = next(model.parameters()).device
    model.eval()
    image = Image.open(image_path).convert('RGB')
    image = transform(image).unsqueeze(0).to(device)
    with torch.no_grad():
        output = model(image)
        probabilities = torch.nn.functional.softmax(output[0], dim=0)
        predicted_class = torch.argmax(probabilities).item()
        confidence = probabilities[predicted_class].item()
    return predicted_class, confidence, probabilities
print("\nTransfer learning pipeline ready!")
print("  Uncomment the train_model() call and prepare a dataset to start training.")
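The training loop calls `scheduler.step(epoch_loss)` once per validation phase. The pure-Python sketch below imitates the core idea of a reduce-on-plateau schedule so the effect of `factor` and `patience` is easier to see. It is a simplification, not the library's implementation: the real scheduler also supports an improvement threshold, a cooldown, and a minimum learning rate, and `simulate_plateau_schedule` is a hypothetical helper.

```python
def simulate_plateau_schedule(val_losses, lr=1e-3, factor=0.5, patience=3):
    """Return the lr in effect after each epoch's validation loss is reported."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        # Reduce only after more than `patience` epochs without improvement.
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


if __name__ == "__main__":
    losses = [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]  # made-up validation losses
    for epoch, lr in enumerate(simulate_plateau_schedule(losses), start=1):
        print(f"epoch {epoch}: lr={lr}")
```

With `patience=3`, the learning rate is halved on the fourth consecutive epoch that fails to improve on the best loss.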
7. Implementation: Transfer Learning with TensorFlow/Keras
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.utils import image_dataset_from_directory
# === 1. LOAD DATA ===
IMG_SIZE = 224
BATCH_SIZE = 32
# Load the dataset (replace with the path to your own dataset)
# train_ds = image_dataset_from_directory(
#     'data/train', image_size=(IMG_SIZE, IMG_SIZE),
#     batch_size=BATCH_SIZE, validation_split=0.2,
#     subset='training', seed=42
# )
# val_ds = image_dataset_from_directory(
#     'data/train', image_size=(IMG_SIZE, IMG_SIZE),
#     batch_size=BATCH_SIZE, validation_split=0.2,
#     subset='validation', seed=42
# )
NUM_CLASSES = 10
# === 2. DATA AUGMENTATION ===
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.1),
])
# === 3. LOAD PRE-TRAINED EfficientNetB0 ===
base_model = EfficientNetB0(
    include_top=False,    # Without the classifier head
    weights='imagenet',   # Pre-trained on ImageNet
    input_shape=(IMG_SIZE, IMG_SIZE, 3)
)
# === 4. FEATURE EXTRACTION (freeze base model) ===
base_model.trainable = False
# Build the new model
inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
x = data_augmentation(inputs)                  # Augmentation
x = base_model(x, training=False)              # Feature extraction
x = layers.GlobalAveragePooling2D()(x)         # Pooling
x = layers.Dropout(0.3)(x)                     # Regularization
x = layers.Dense(256, activation='relu')(x)    # Hidden layer
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(NUM_CLASSES, activation='softmax')(x)
model = models.Model(inputs, outputs)
print("=== Model Summary (Feature Extraction) ===")
model.summary()
# === 5. TRAIN FEATURE EXTRACTION ===
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
# history1 = model.fit(train_ds, validation_data=val_ds, epochs=10)
# === 6. FINE-TUNING (unfreeze some layers) ===
base_model.trainable = True
# Freeze everything except the last 30 layers
for layer in base_model.layers[:-30]:
    layer.trainable = False
# Count trainable layers inside the base model (not the outer model)
print(f"\nTrainable base layers: {sum(1 for l in base_model.layers if l.trainable)}")
# Recompile with a MUCH SMALLER learning rate
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),  # Small LR!
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
# history2 = model.fit(train_ds, validation_data=val_ds, epochs=20)
# === 7. LEARNING RATE SCHEDULER ===
lr_scheduler = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.5, patience=3, min_lr=1e-7
)
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=7, restore_best_weights=True
)
# history2 = model.fit(
#     train_ds, validation_data=val_ds, epochs=50,
#     callbacks=[lr_scheduler, early_stopping]
# )
print("\nTensorFlow transfer learning pipeline ready!")
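The `EarlyStopping` callback above stops training once `val_loss` has not improved for `patience` epochs and, with `restore_best_weights=True`, rolls back to the best epoch. The sketch below imitates that logic in plain Python under simplifying assumptions (no `min_delta`, strict improvement only); `simulate_early_stopping` is a hypothetical helper, not part of Keras.

```python
def simulate_early_stopping(val_losses, patience=7):
    """Return (stop_epoch, best_epoch), both 0-indexed."""
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # New best: remember it and reset the patience counter.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Training ran to completion without triggering the stop.
    return len(val_losses) - 1, best_epoch


if __name__ == "__main__":
    losses = [1.0, 0.8, 0.9, 0.85, 0.7]  # made-up validation losses
    stop, best = simulate_early_stopping(losses, patience=2)
    print(f"stopped at epoch {stop}, best weights from epoch {best}")
```

Note that in this made-up run the loss would have improved again at the last epoch, which is why a patience that is too small can stop training prematurely.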
8. Transfer Learning Strategies & Best Practices
Tips for Choosing a Pre-trained Model
| Factor | Consideration | Recommendation |
|---|---|---|
| Dataset Size | Small dataset (<1000) vs. large (>10K) | Small → small model (MobileNet), Large → large model (ResNet152) |
| Resources | Edge device vs. server | Edge → MobileNet/EfficientNet-Lite, Server → ResNet/ViT |
| Speed | Real-time vs. batch | Real-time → MobileNet, Batch → ResNet/EfficientNet |
| Domain Similarity | Source dataset vs. target dataset | Similar → feature extraction, Different → fine-tune more layers |
Common Transfer Learning Mistakes
- Learning rate too large during fine-tuning → destroys the pre-trained knowledge (catastrophic forgetting)
- Not using data augmentation → augmentation is very important, especially when the dataset is small
- Forgetting the correct normalization → use the SAME mean/std as the pre-trained model (ImageNet: mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225])
- Unfreezing all layers right away → start with everything frozen, then unfreeze gradually
- Not using early stopping → fine-tuning overfits easily on small datasets
- Using a model that is too large for a small dataset → large model + little data = overfitting
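The normalization point can be illustrated on a single pixel. The sketch below applies the ImageNet mean and standard deviation quoted above to one RGB value that has already been scaled to the [0, 1] range; `normalize_pixel` is a hypothetical helper written for illustration, whereas in a real pipeline the framework's own normalization transform would do this per image.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1]."""
    return tuple((c - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


if __name__ == "__main__":
    # A pixel equal to the dataset mean maps to zero in every channel.
    print(normalize_pixel(IMAGENET_MEAN))
    # A pure white pixel ends up a little over two standard deviations above the mean.
    print(normalize_pixel((1.0, 1.0, 1.0)))
```

Feeding a pre-trained network inputs normalized with different statistics shifts every activation away from what the frozen layers saw during pre-training, which is why the same mean/std should be reused.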
9. Comprehension Quiz
Quiz: Transfer Learning
1. What is transfer learning?
2. In feature extraction, what happens to the pre-trained layers?
3. Why must the learning rate be very small during fine-tuning?
4. Which strategy is the RIGHT one for a small dataset whose task is similar to ImageNet?
5. What is the function of the skip connection in ResNet?
- Transfer learning reuses the knowledge of a pre-trained model for a new task
- Feature extraction = freeze the base model, train only the classifier head
- Fine-tuning = unfreeze some or all layers with a small learning rate
- Differential learning rates = small lr for early layers, larger lr for late layers
- ResNet addresses vanishing gradients with skip connections
- EfficientNet scales width, depth, and resolution together
- For small datasets: feature extraction first, then gradual fine-tuning