Computer Vision: Object Detection — YOLO, R-CNN, SSD

📋 Table of Contents

Introduction to Object Detection
IoU & Non-Maximum Suppression (NMS)
Anchor Boxes & Bounding Box Regression
R-CNN Family: R-CNN → Fast R-CNN → Faster R-CNN
YOLO: You Only Look Once
SSD: Single Shot Detector
mAP: Mean Average Precision
Implementation: Object Detection with YOLOv8
Comprehension Quiz

1. Introduction to Object Detection

Object detection is a computer vision task that aims not only to recognize what objects appear in an image (classification) but also where they are located (localization). The output is a bounding box together with a class label and a confidence score for each detected object.
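
To make the output format concrete, the sketch below models one detection as a small record. The `Detection` class, its field names, and the sample values are illustrative assumptions and do not come from any particular library.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical record type, for illustration only)."""
    x1: float          # left edge of the bounding box, in pixels
    y1: float          # top edge
    x2: float          # right edge
    y2: float          # bottom edge
    label: str         # predicted class name
    confidence: float  # score in [0, 1]

    def area(self) -> float:
        """Box area in square pixels (0 for degenerate boxes)."""
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


# A detector returns one record per object found in the image (made-up values)
detections = [
    Detection(40, 60, 180, 220, "cat", 0.95),
    Detection(210, 80, 390, 260, "dog", 0.87),
]

# Typical post-processing step: keep only confident detections
confident = [d.label for d in detections if d.confidence >= 0.9]
print(confident)
```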

Differences Between Computer Vision Tasks

Diagram: Classification vs Detection vs Segmentation

  1. IMAGE                2. OBJECT               3. SEMANTIC              4. INSTANCE
     CLASSIFICATION          DETECTION               SEGMENTATION            SEGMENTATION
  
  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
  │   ┌──────┐   │     │ ┌─[🐱 0.95]  │     │ ┌▓▓▓▓▓▓▓▓▓▓┐ │     │ ┌▒▒▒▒▒▒▒▒▒▒┐ │
  │   │      │   │     │ │ ╔════╗     │     │ │▓▓▓▓▓▓▓▓▓▓│ │     │ │▒▒▒▒▒▒▒▒▒▒│ │
  │   │ 🐱🐕 │   │     │ │ ║ 🐱 ║     │     │ │▓▓▓▓▓▓▓▓▓▓│ │     │ │▒▒▒▒▒▒▒▒▒▒│ │
  │   │      │   │     │ │ ╚════╝     │     │ │            │ │     │ │          │ │
  │   └──────┘   │     │ └────────────│     │ ┌──────────┐ │     │ ┌░░░░░░░░░░┐ │
  │   Label: ??? │     │ ┌─[🐕 0.87]  │     │ │░░░░░░░░░░│ │     │ │░░░░░░░░░░│ │
  │              │     │ │ ╔════╗     │     │ │░░░░░░░░░░│ │     │ │░░░░░░░░░░│ │
  └──────────────┘     │ │ ║ 🐶 ║     │     │ └░░░░░░░░░░┘ │     │ └──────────┘ │
                       │ │ ╚════╝     │     └──────────────┘     └──────────────┘
  Output: "cat"        │ └────────────│     Label per pixel:     Label + instance
  (one label)          └──────────────┘     ▓=cat, ░=dog         separate per object
  
  □ = Bounding box       Most popular         More detailed       Per-pixel, with
                           for detection       but more            separate objects
                                               complex

Applications of Object Detection

Application	Description	Examples
Autonomous vehicles	Detecting pedestrians, vehicles, and traffic signs	Tesla Autopilot, Waymo
Surveillance / Security	Detecting people and suspicious activity	Smart CCTV, intruder detection
Medical imaging	Detecting tumors, lesions, and abnormal cells	Cancer detection from CT scans
E-commerce	Visual product search	Finding products from a photo
Manufacturing	Quality control, defect detection	Detecting product defects on the production line
Agriculture	Detecting pests and plant diseases	Drone-based crop monitoring
AR / VR	Detecting objects in real time for augmented reality	Google Lens, Snapchat filters

2. IoU & Non-Maximum Suppression (NMS)

Intersection over Union (IoU)

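IoU is a metric that measures how much two bounding boxes overlap. In evaluation it decides whether a detection counts as correct (true positive) or incorrect (false positive): a detection is a true positive when its IoU with a ground-truth box of the same class reaches a chosen threshold.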

Diagram: IoU (Intersection over Union)

  IoU = Area of Intersection / Area of Union
  
  IoU = 0 (No overlap)        IoU = 0.5 (Common threshold)  IoU = 1.0 (Perfect)
  ┌────────┐ ┌────────┐      ┌────────┐                    ┌────────┐
  │ ██████ │ │ ██████ │      │ ███████│█                   │ ██████ │
  │ ██████ │ │ ██████ │      │ ███████│█                   │ ██████ │
  │ ██████ │ │ ██████ │      │ ███████│█                   │ ██████ │
  └────────┘ └────────┘      └────────┘                    └────────┘
  
  IoU formula:
  ┌──────────────────────────────────────────────────────────┐
  │          Area(A ∩ B)                                      │
  │ IoU = ─────────────────                                   │
  │        Area(A ∪ B)                                        │
  │                                                           │
  │        Area(A ∩ B)                                        │
  │      = ─────────────────────────                           │
  │        Area(A) + Area(B) - Area(A ∩ B)                    │
  └──────────────────────────────────────────────────────────┘
  
  Common criteria:
  IoU ≥ 0.5  → True Positive (TP)   ← PASCAL VOC
  IoU ≥ 0.75 → True Positive (TP)   ← COCO (strict, AP75)
  IoU = 0.50:0.05:0.95 → averaged   ← COCO primary metric
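
As a worked example of the formula and thresholds above, the sketch below computes the IoU of two equally sized boxes, one shifted 20 pixels to the right of the other. The helper name `iou_xyxy` and the box coordinates are assumptions chosen for illustration.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0, 0, 100, 100)
prediction = (20, 0, 120, 100)   # same size, shifted 20 px to the right

# intersection = 80 * 100 = 8000; union = 10000 + 10000 - 8000 = 12000
iou = iou_xyxy(ground_truth, prediction)
print(f"IoU = {iou:.3f}")
print("True positive at IoU >= 0.5: ", iou >= 0.5)
print("True positive at IoU >= 0.75:", iou >= 0.75)
```

The same detection counts as a true positive under the PASCAL VOC criterion but not under the stricter 0.75 threshold, which is why the threshold must always be reported alongside the metric.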

Non-Maximum Suppression (NMS)

Object detectors often produce many overlapping bounding boxes for a single object. NMS is a post-processing algorithm that removes these duplicates and keeps only the highest-scoring detection.

Diagram: Non-Maximum Suppression

  BEFORE NMS                         AFTER NMS
  
  ┌──────────────────────┐          ┌──────────────────────┐
  │ ┌─[🐱 0.92]          │          │                      │
  │ │╔════════════════╗   │          │  ╔══════════════╗    │
  │ │║  ┌─[🐱 0.89]   ║   │          │  ║ [🐱 0.92]   ║    │
  │ │║  │ ┌─[🐱 0.85] ║   │          │  ║  🐱          ║    │
  │ │║  │ │  🐱       ║   │    ──►   │  ║              ║    │
  │ │║  │ └─────────── ║   │          │  ╚══════════════╝    │
  │ │║  └───────────────║   │          │                      │
  │ │╚════════════════╝   │          │                      │
  │ └────────────────────── │          │                      │
  │ ┌─[🐱 0.45] (low conf) │          │  (Duplicated boxes   │
  │ │ 🐱                   │          │   removed by NMS)    │
  │ └────────────────────── │          │                      │
  └──────────────────────┘          └──────────────────────┘
  
  NMS algorithm:
  1. Sort all detections by confidence score (descending)
  2. Take the highest-scoring detection → add it to the final list
  3. Compute its IoU with every detection not yet processed
  4. Remove every detection whose IoU ≥ threshold (e.g. 0.5)
  5. Repeat from step 2 until no detections remain

Python: NMS Implementation

import numpy as np

def compute_iou(box1, box2):
    """Compute IoU between two bounding boxes in [x1, y1, x2, y2] format."""
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])
    
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    
    union = area1 + area2 - intersection
    
    return intersection / union if union > 0 else 0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """
    NMS from scratch (class-agnostic: all boxes compete with each other).
    
    Args:
        boxes: Array of shape (N, 4) in [x1, y1, x2, y2] format
        scores: Array of shape (N,) with confidence scores
        iou_threshold: IoU threshold for suppression
    
    Returns:
        List of indices of the boxes that are kept
    """
    # Sort by score (descending)
    sorted_indices = np.argsort(scores)[::-1]
    
    keep = []
    
    while len(sorted_indices) > 0:
        # Take the box with the highest score
        current = sorted_indices[0]
        keep.append(int(current))
        
        if len(sorted_indices) == 1:
            break
        
        # Compute IoU against all remaining boxes
        remaining = sorted_indices[1:]
        ious = np.array([compute_iou(boxes[current], boxes[i]) for i in remaining])
        
        # Drop boxes with IoU >= threshold (too much overlap)
        mask = ious < iou_threshold
        sorted_indices = remaining[mask]
    
    return keep


# === USAGE EXAMPLE ===
boxes = np.array([
    [100, 100, 200, 200],   # Box A
    [110, 110, 210, 210],   # Box B (overlaps A)
    [105, 105, 205, 205],   # Box C (overlaps A)
    [300, 300, 400, 400],   # Box D (a different object)
    [305, 305, 405, 405],   # Box E (overlaps D)
])

scores = np.array([0.92, 0.89, 0.85, 0.95, 0.88])

print("=== Before NMS ===")
for i, (box, score) in enumerate(zip(boxes, scores)):
    print(f"  Box {i}: {box} | Score: {score:.2f}")

# Run NMS
kept_indices = non_max_suppression(boxes, scores, iou_threshold=0.5)

print("\n=== After NMS (threshold=0.5) ===")
for idx in kept_indices:
    print(f"  Box {idx}: {boxes[idx]} | Score: {scores[idx]:.2f}")

# Using torchvision's NMS (faster); requires torch and torchvision
import torch
from torchvision.ops import nms as torch_nms

# Convert explicitly to float32 tensors, as torchvision.ops.nms expects
boxes_tensor = torch.tensor(boxes, dtype=torch.float32)
scores_tensor = torch.tensor(scores, dtype=torch.float32)
kept_torch = torch_nms(boxes_tensor, scores_tensor, iou_threshold=0.5)
print(f"\ntorchvision NMS result: {kept_torch.numpy()}")

3. Anchor Boxes & Bounding Box Regression

Anchor boxes (also called default boxes) are a set of reference bounding boxes of various sizes and aspect ratios, tiled regularly over the feature map. The detector learns to predict offsets relative to these anchors rather than absolute coordinates.

Diagram: Anchor Boxes

  Input Image with Anchor Boxes
  
  ┌────────────────────────────────────┐
  │  ┌─────────┐  ┌──────────────┐    │
  │  │ Tall    │  │              │    │    Anchor boxes are placed
  │  │ Anchor  │  │  Wide Anchor │    │    at every grid position
  │  │ (1:2)   │  │  (2:1)       │    │    in a range of sizes:
  │  └─────────┘  │              │    │
  │               └──────────────┘    │    • Small, medium, large
  │  ┌──────────┐                     │    • Ratios: 1:1, 1:2, 2:1
  │  │ Square   │                     │
  │  │ Anchor   │                     │    Each anchor predicts:
  │  │ (1:1)    │                     │    • Offset dx, dy, dw, dh
  │  └──────────┘                     │    • Class probabilities
  └────────────────────────────────────┘    • Objectness score
  
  Bounding Box Regression:
  ┌──────────────────────────────────────────────────────────┐
  │ The model predicts OFFSETS, not absolute coordinates:    │
  │                                                          │
  │   tx = (x - xa) / wa    ← x offset relative to anchor   │
  │   ty = (y - ya) / ha    ← y offset relative to anchor   │
  │   tw = log(w / wa)      ← width offset (log scale)      │
  │   th = log(h / ha)      ← height offset (log scale)     │
  │                                                          │
  │ Where (xa, ya, wa, ha) = anchor box (center, size)       │
  │       (x, y, w, h)     = ground-truth box (center, size) │
  └──────────────────────────────────────────────────────────┘
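
The offset equations above can be sketched as a pair of functions: `encode` turns a ground-truth box into regression targets, and `decode` inverts the mapping to recover a box from predicted offsets. The function names and the sample anchor and box values are assumptions for illustration; boxes are given as (center x, center y, width, height).

```python
import math


def encode(gt, anchor):
    """Ground-truth box -> regression targets (tx, ty, tw, th)."""
    x, y, w, h = gt
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(targets, anchor):
    """Offsets (tx, ty, tw, th) -> box (x, y, w, h); the inverse of encode()."""
    tx, ty, tw, th = targets
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))


anchor = (100.0, 100.0, 50.0, 100.0)   # tall 1:2 anchor (made-up values)
gt = (110.0, 95.0, 60.0, 80.0)         # ground-truth box (made-up values)

targets = encode(gt, anchor)
restored = decode(targets, anchor)
print("targets: ", [round(t, 4) for t in targets])
print("restored:", [round(v, 4) for v in restored])
```

Dividing by the anchor size and taking the log of the size ratio keeps the targets in a similar numeric range for small and large anchors, which is the usual motivation for this parameterization.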

4. R-CNN Family: R-CNN → Fast R-CNN → Faster R-CNN

The R-CNN family (Regions with CNN features) takes a two-stage approach to object detection: first generate region proposals (candidate object regions), then classify each region and refine its box.

Evolution of R-CNN

Diagram: Evolution from R-CNN to Faster R-CNN

  R-CNN (2014)                      Fast R-CNN (2015)
  ┌──────────────────────┐          ┌──────────────────────┐
  │ Input Image           │          │ Input Image           │
  │       ↓               │          │       ↓               │
  │ Selective Search      │          │ CNN Backbone          │
  │ (~2000 region props.) │          │ (whole image)         │
  │       ↓               │          │       ↓               │
  │ CNN per region        │          │ RoI Pooling           │
  │ (VERY SLOW!)          │          │ (extract per region)  │
  │       ↓               │          │       ↓               │
  │ SVM Classifier        │          │ FC → Class + Box      │
  │       ↓               │          │                       │
  │ BBox Regressor        │          │ ~20× faster!          │
  └──────────────────────┘          └──────────────────────┘
  ~47 seconds/image                 ~2 seconds/image
  
  Faster R-CNN (2015)
  ┌──────────────────────────────────────┐
  │ Input Image                           │
  │       ↓                               │
  │ CNN Backbone (shared features)        │
  │       ↓                               │
  │ Region Proposal Network (RPN)         │ ← Replaces
  │ (generate proposals from CNN features)│   Selective Search
  │       ↓                               │   with a neural network!
  │ RoI Pooling                           │
  │       ↓                               │
  │ FC → Class + Box Refinement           │
  └──────────────────────────────────────┘
  ~0.2 seconds/image (GPU)

Region Proposal Network (RPN)

Component	Function
Anchor Generator	Creates anchor boxes at every feature-map position
Classification Head	Predicts whether each anchor contains an object (objectness)
Regression Head	Refines the position and size of the anchor boxes
Proposal Layer	Selects the top-N proposals by score, then applies NMS
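
A minimal sketch of what the anchor generator does: for every feature-map cell it emits one anchor per (scale, ratio) pair, centered on that cell in image coordinates. The function name, the stride of 16, and the scale and ratio values are assumptions for illustration (three scales and three ratios give the nine anchors per position commonly associated with Faster R-CNN).

```python
import math


def generate_anchors(fmap_h, fmap_w, stride, scales, ratios):
    """Return anchors as (cx, cy, w, h), one set per feature-map cell."""
    anchors = []
    for row in range(fmap_h):
        for col in range(fmap_w):
            # Center of this cell, mapped back to image coordinates
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:          # ratio = width / height
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    anchors.append((cx, cy, w, h))
    return anchors


anchors = generate_anchors(
    fmap_h=2, fmap_w=3, stride=16,
    scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0),
)
print(f"{len(anchors)} anchors, first: {anchors[0]}")
```

Scaling width by sqrt(ratio) and height by 1/sqrt(ratio) changes the shape while keeping the anchor area equal to scale squared.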

5. YOLO: You Only Look Once

YOLO (Redmon et al., 2016) reshaped object detection with a one-stage approach: all objects are detected in a single forward pass. This makes YOLO very fast and well suited to real-time applications.

YOLO vs Two-Stage Detectors

Diagram: Two-Stage vs One-Stage (YOLO)

  TWO-STAGE (Faster R-CNN)          ONE-STAGE (YOLO)
  
  ┌──────────────────────┐          ┌──────────────────────┐
  │ Image                 │          │ Image                 │
  │   ↓                   │          │   ↓                   │
  │ Stage 1: RPN          │          │ Single CNN            │
  │ (Generate proposals)  │          │ (One forward pass)    │
  │   ↓                   │          │   ↓                   │
  │ Stage 2: Classify +   │          │ Direct output:        │
  │ Regress each proposal │          │ • Grid predictions    │
  │   ↓                   │          │ • Classes + Boxes     │
  │ Output                │          │ • Confidence scores   │
  └──────────────────────┘          └──────────────────────┘
  
  ✓ More accurate                   ✓ VERY FAST (real-time)
  ✗ Slower                          ✓ Simple (end-to-end)
  Best for: high precision          ✗ Initially less accurate
                                    Best for: real-time, edge devices

How YOLO Works

Diagram: YOLO Grid System

  YOLO divides the image into an S×S grid
  Each cell predicts B bounding boxes
  
  ┌───┬───┬───┬───┬───┬───┬───┐
  │   │   │   │   │   │   │   │  S=7 (7×7 grid for YOLOv1)
  ├───┼───┼───┼───┼───┼───┼───┤
  │   │   │   │   │   │   │   │  Each cell predicts:
  ├───┼───┼───┼───┼───┼───┼───┤  • B bounding boxes (x,y,w,h,conf)
  │   │   │   │★│   │   │   │  • C class probabilities
  ├───┼───┼───┼───┼───┼───┼───┤
  │   │   │   │   │   │   │   │  Output tensor:
  ├───┼───┼───┼───┼───┼───┼───┤  S × S × (B×5 + C)
  │   │   │   │   │   │   │   │
  ├───┼───┼───┼───┼───┼───┼───┤  Center of object determines
  │   │   │   │   │   │   │   │  which cell is responsible
  └───┴───┴───┴───┴───┴───┴───┘
         ★ = Cell responsible for
             detecting the object
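
The grid assignment and output size can be sketched in a few lines. The function names are assumptions; the defaults S=7, B=2, C=20 and the 448×448 input correspond to the original YOLOv1 setup on PASCAL VOC, and the box center used below is a made-up value.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains the box center."""
    col = min(int(cx / img_w * S), S - 1)   # clamp centers on the right edge
    row = min(int(cy / img_h * S), S - 1)   # clamp centers on the bottom edge
    return row, col


def output_shape(S=7, B=2, C=20):
    """Each cell predicts B boxes (x, y, w, h, conf) plus C class scores."""
    return (S, S, B * 5 + C)


cell = responsible_cell(cx=224, cy=200, img_w=448, img_h=448)
shape = output_shape()
print("responsible cell (row, col):", cell)
print("output tensor shape:", shape)
```

Because each cell shares one set of class probabilities across its B boxes, this design can struggle when several small objects have centers in the same cell.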

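Evolution of YOLO

Version	Year	Key features	Reported speed	Reported accuracy
YOLOv1	2016	One-stage pioneer, very fast	45 FPS	63.4% mAP (PASCAL VOC 2007)
YOLOv2	2017	Batch norm, anchor boxes, multi-scale training	40 FPS	78.6% mAP (PASCAL VOC 2007)
YOLOv3	2018	FPN-style detection at 3 scales, Darknet-53	~20-45 FPS (by input size)	51.5-57.9% AP50 (COCO)
YOLOv4	2020	CSPDarknet, SPP, PANet	~65 FPS (Tesla V100)	43.5% AP, 65.7% AP50 (COCO)
YOLOv5	2020	PyTorch, auto-anchor, easy deployment	Varies with model size and hardware	about 28-51% mAP50-95 (COCO, nano to extra-large)
YOLOv8	2023	Anchor-free, Ultralytics framework	Varies with model size and hardware	about 37-54% mAP50-95 (COCO, nano to extra-large)
YOLO11	2024	Architecture improvements, efficiency gains	Varies with model size and hardware	about 39-55% mAP50-95 (COCO, nano to extra-large)

Note: these figures are as reported by the respective authors on different benchmarks, metrics, and hardware, so they are not directly comparable across rows.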

📌 Notes on YOLO Versions

YOLOv1-v3: Developed by Joseph Redmon and collaborators (Darknet framework)
YOLOv4: Developed by Alexey Bochkovskiy and collaborators
YOLOv5, YOLOv8, YOLO11: Developed by Ultralytics (Python/PyTorch, very widely used); other versions such as YOLOv6 and YOLOv7 come from different teams
YOLOv8: Ultralytics release from 2023, anchor-free, more accurate than YOLOv5, with a simple API

6. SSD: Single Shot Detector

SSD (Liu et al., 2016) is another one-stage detector; it detects objects at multiple scales using feature maps taken from several layers. SSD is efficient and became the basis for many later architectures.

SSD Multi-Scale Detection

Diagram: SSD Multi-Scale Feature Maps

  Input Image (300×300)
         ↓
  ┌──────────────────┐
  │ VGG-16 Backbone  │  Feature maps shrink →
  │ (up to conv5_3)  │  detect LARGER objects
  └────────┬─────────┘
           ↓
  ┌──────────────────┐
  │ Extra Conv Layers │
  └────────┬─────────┘
           ↓
  ┌──────────────────────────────────────────────┐
  │                                              │
  │  38×38 ← Detect SMALL objects (head, wheel)  │
  │      ↓                                       │
  │  19×19 ← Detect MEDIUM objects (person)      │
  │      ↓                                       │
  │  10×10 ← Detect MEDIUM-LARGE objects (car)   │
  │      ↓                                       │
  │   5×5  ← Detect LARGE objects (truck)        │
  │      ↓                                       │
  │   3×3  ← Detect VERY LARGE objects           │
  │      ↓                                       │
  │   1×1  ← Detect the LARGEST objects          │
  │                                              │
  └──────────────────────────────────────────────┘
  
  Advantages of multi-scale detection:
  • Early layers (high resolution) → fine detail for small objects
  • Late layers (low resolution) → semantic features for large objects
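
To see how many candidate boxes these six feature maps produce together, the sketch below multiplies each map's cell count by its number of default boxes per location. The per-location counts (4, 6, 6, 6, 4, 4) are not given in the diagram; they are taken here from the SSD300 configuration described in the SSD paper and should be treated as an assumption of this example.

```python
# Feature-map sizes from the diagram above
feature_map_sizes = [38, 19, 10, 5, 3, 1]

# Default boxes per location for each map (assumed SSD300 configuration)
boxes_per_location = [4, 6, 6, 6, 4, 4]

per_layer = [
    size * size * k for size, k in zip(feature_map_sizes, boxes_per_location)
]
total = sum(per_layer)

for size, count in zip(feature_map_sizes, per_layer):
    print(f"{size:>2}x{size:<2} feature map -> {count:>4} default boxes")
print(f"Total default boxes per image: {total}")
```

Most of the boxes come from the 38×38 map, which is why the high-resolution layer matters so much for small objects and why NMS is needed afterwards.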

Comparison: YOLO vs SSD vs Faster R-CNN

Aspect	YOLOv8	SSD	Faster R-CNN
Stage	One-stage	One-stage	Two-stage
Speed	★★★ Very fast	★★☆ Fast	★☆☆ Slow
Accuracy (mAP)	★★☆ High	★★☆ Fairly high	★★★ Very high
Small objects	★★☆	★★☆ Multi-scale helps	★★★
Real-time?	✅ Yes (30-300+ FPS)	✅ Yes (20-60 FPS)	⚠️ 5-15 FPS
Edge deployment	✅ Easy	✅ Fairly easy	❌ Difficult
Best for	Production, real-time	Mobile/embedded	Research, high precision

7. mAP: Mean Average Precision

mAP (Mean Average Precision) is the standard metric for evaluating object detection performance. It summarizes precision and recall in a single number by averaging the per-class Average Precision.

Precision & Recall in Object Detection

Diagram: TP, FP, FN in Object Detection

  Ground Truth          Model Predictions       Evaluation
  ┌──────────────┐     ┌──────────────┐       ┌──────────────────────────────┐
  │              │     │              │       │                              │
  │  ╔═══╗      │     │  ╔═══╗       │       │ Detection 1: IoU=0.85 ≥ 0.5 │
  │  ║ 🐱 ║      │     │  ║🐱 ║       │       │ → TRUE POSITIVE (TP) ✅      │
  │  ╚═══╝      │     │  ╚═══╝       │       │                              │
  │              │     │              │       │ Detection 2: IoU=0.72 ≥ 0.5 │
  │      ╔═══╗  │     │  ╔═══╗       │       │ → TRUE POSITIVE (TP) ✅      │
  │      ║ 🐶 ║  │     │  ║🐶 ║       │       │                              │
  │      ╚═══╝  │     │  ╚═══╝       │       │ Detection 3: no matching     │
  │              │     │  ╔═══╗       │       │ ground-truth box             │
  │              │     │  ║?? ║       │       │ → FALSE POSITIVE (FP) ❌     │
  │              │     │  ╚═══╝       │       │                              │
  └──────────────┘     └──────────────┘       │ Ground truth 🐶 detected     │
                                               │ → FN = 0                     │
                                               │ (a missed object = FN)       │
                                               └──────────────────────────────┘

  Precision = TP / (TP + FP)    ← "Of all detections, what % are correct?"
  Recall    = TP / (TP + FN)    ← "Of all real objects, what % were detected?"
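
Plugging the counts from the diagram (TP = 2, FP = 1, FN = 0) into these formulas gives a quick sanity check. The helper name `precision_recall` is an assumption for this sketch.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from raw counts, guarding against division by zero."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Counts from the diagram: two correct detections, one spurious box, no misses
p, r = precision_recall(tp=2, fp=1, fn=0)
print(f"Precision = {p:.3f}")
print(f"Recall    = {r:.3f}")
```

The spurious third box lowers precision without affecting recall, which is why both numbers are needed to describe a detector.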

mAP Calculation Steps

📐 Steps to Compute mAP

1. For each class, compute the precision-recall curve:
   - Sort all detections by confidence (descending)
   - Mark each detection as TP or FP according to the IoU threshold
   - Accumulate TP and FP counts → compute precision and recall at each point
2. Average Precision (AP) = area under the PR curve for one class
3. mAP = mean of AP over all K classes: mAP = Σ APₖ / K

COCO mAP: the mean of mAP over the IoU thresholds 0.50, 0.55, 0.60, ..., 0.95 (10 thresholds)
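
The COCO-style averaging can be sketched as follows: build the ten IoU thresholds, check at which of them a given detection still counts as a true positive, and average a per-threshold score. The function names are assumptions, and the IoU of 0.72 is the value used for the second detection in the diagram above.

```python
# The 10 COCO IoU thresholds: 0.50, 0.55, ..., 0.95
thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def tp_flags(iou, thresholds):
    """For one matched detection, report at which thresholds it is a TP."""
    return {t: iou >= t for t in thresholds}


def coco_style_mean(score_by_threshold):
    """Average a metric (e.g. mAP) that was computed once per IoU threshold."""
    return sum(score_by_threshold.values()) / len(score_by_threshold)


flags = tp_flags(0.72, thresholds)
passed = [t for t, ok in flags.items() if ok]
print("Thresholds:", thresholds)
print("IoU=0.72 is a TP at:", passed)
```

A detection with IoU 0.72 contributes to only half of the thresholds, so the COCO metric rewards tightly localized boxes more than mAP@0.5 does.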

Python: Computing mAP from Scratch

import numpy as np

# Note: compute_iou() is the helper defined in the NMS implementation above.

def compute_ap(recalls, precisions):
    """Compute Average Precision using 11-point interpolation."""
    # 11-point interpolation (PASCAL VOC 2007 style): recall levels 0.0, 0.1, ..., 1.0
    ap = 0.0
    for t in np.linspace(0, 1, 11):
        precisions_at_recall = precisions[recalls >= t]
        if len(precisions_at_recall) > 0:
            ap += np.max(precisions_at_recall)
    return ap / 11.0


def compute_precision_recall(detections, ground_truths, iou_threshold=0.5):
    """
    Compute the precision-recall curve.
    Pass the detections and ground truths of one class at a time
    to obtain a per-class curve.
    
    detections: list of (confidence, bbox, class)
    ground_truths: list of (bbox, class)
    """
    # Sort detections by confidence (descending)
    detections = sorted(detections, key=lambda x: x[0], reverse=True)
    
    tp = np.zeros(len(detections))
    fp = np.zeros(len(detections))
    matched_gt = set()
    
    for i, (conf, det_box, det_class) in enumerate(detections):
        best_iou = 0
        best_gt_idx = -1
        
        for j, (gt_box, gt_class) in enumerate(ground_truths):
            if j in matched_gt or det_class != gt_class:
                continue
            iou = compute_iou(det_box, gt_box)
            if iou > best_iou:
                best_iou = iou
                best_gt_idx = j
        
        if best_iou >= iou_threshold:
            tp[i] = 1
            matched_gt.add(best_gt_idx)
        else:
            fp[i] = 1
    
    # Cumulative TP and FP
    tp_cumsum = np.cumsum(tp)
    fp_cumsum = np.cumsum(fp)
    
    precisions = tp_cumsum / (tp_cumsum + fp_cumsum)
    recalls = tp_cumsum / len(ground_truths)
    
    return recalls, precisions


# === EXAMPLE mAP CALCULATION ===
# Simulated detections: (confidence, [x1,y1,x2,y2], class)
detections = [
    (0.95, [100, 100, 200, 200], 'cat'),
    (0.88, [300, 300, 400, 400], 'dog'),
    (0.82, [110, 105, 205, 210], 'cat'),  # Duplicate detection
    (0.75, [50, 50, 150, 150], 'cat'),    # False positive
    (0.70, [310, 310, 410, 410], 'dog'),
]

# Ground truth: ([x1,y1,x2,y2], class)
ground_truths = [
    ([105, 105, 205, 205], 'cat'),
    ([305, 305, 405, 405], 'dog'),
    ([500, 500, 600, 600], 'dog'),  # Not detected (FN)
]

# Compute a PR curve and AP separately for each class
classes = sorted({cls for _, cls in ground_truths})
ap_per_class = {}

for cls in classes:
    cls_dets = [d for d in detections if d[2] == cls]
    cls_gts = [g for g in ground_truths if g[1] == cls]
    recalls, precisions = compute_precision_recall(cls_dets, cls_gts)

    print(f"=== Precision-Recall Analysis: {cls} ===")
    for i, (r, p) in enumerate(zip(recalls, precisions)):
        print(f"  Detection {i+1}: P={p:.3f}, R={r:.3f}")

    ap_per_class[cls] = compute_ap(recalls, precisions)
    print(f"  Average Precision (AP): {ap_per_class[cls]:.3f}\n")

# mAP = mean of the per-class APs
mean_ap = np.mean(list(ap_per_class.values()))
print(f"mAP@0.5: {mean_ap:.3f}")
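
The steps earlier define AP as the area under the PR curve, while the code above approximates it with 11 sampled recall levels. As a complement, here is a minimal sketch of the all-point interpolation variant, which integrates the interpolated curve directly. The function name and the sample recall and precision values are assumptions for illustration, and this is a swapped-in alternative rather than the method used in the code above.

```python
def average_precision_all_points(recalls, precisions):
    """Area under the interpolated PR curve (all-point interpolation)."""
    # Sentinel points so the curve spans recall 0 to 1
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]

    # Make precision non-increasing by scanning from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])

    # Sum rectangle areas wherever recall increases
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


# Made-up curve: TP, FP, TP for a class with two ground-truth objects
sample_recalls = [0.5, 0.5, 1.0]
sample_precisions = [1.0, 0.5, 2 / 3]
sample_ap = average_precision_all_points(sample_recalls, sample_precisions)
print(f"AP (all-point interpolation): {sample_ap:.3f}")
```

The two variants usually give close but not identical numbers, so results should state which interpolation was used.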

8. Implementation: Object Detection with YOLOv8

YOLOv8 from Ultralytics is one of the most widely used frameworks for object detection. Its API is compact: training, validation, and inference each take only a few lines of code.

Python: YOLOv8 Object Detection with Ultralytics

# pip install ultralytics
from ultralytics import YOLO
import cv2
import numpy as np

# === 1. INFERENCE WITH A PRE-TRAINED MODEL ===
# Load a model pre-trained on the COCO dataset (80 classes)
model = YOLO('yolov8n.pt')  # 'n'=nano, 's'=small, 'm'=medium, 'l'=large, 'x'=extra-large

# Run inference on an image
results = model('https://ultralytics.com/images/bus.jpg')

# Parse results
for result in results:
    boxes = result.boxes          # Bounding boxes
    masks = result.masks          # Segmentation masks (None for a detection model)
    probs = result.probs          # Classification probabilities (None for a detection model)
    
    print("\n=== Detections in the image ===")
    print(f"Number of objects detected: {len(boxes)}")
    
    for box in boxes:
        # Bounding box coordinates
        x1, y1, x2, y2 = box.xyxy[0].tolist()    # Format (x1, y1, x2, y2)
        confidence = box.conf[0].item()             # Confidence score
        class_id = int(box.cls[0].item())           # Class ID
        class_name = model.names[class_id]          # Class name
        
        print(f"  {class_name}: {confidence:.2f} "
              f"→ [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]")

# Save the result with bounding boxes drawn
result_plot = results[0].plot()  # Image array with annotated boxes
cv2.imwrite('detection_result.jpg', result_plot)
print("\n✓ Result saved: detection_result.jpg")

# === 2. VIDEO DETECTION ===
def detect_video(model, video_path, output_path='output.mp4'):
    """Run object detection on a video and write the annotated result."""
    cap = cv2.VideoCapture(video_path)
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (w, h))
    
    frame_count = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        
        # Run YOLOv8 inference
        results = model(frame, verbose=False)
        
        # Draw boxes on frame
        annotated = results[0].plot()
        out.write(annotated)
        
        frame_count += 1
        if frame_count % 30 == 0:
            print(f"  Processed {frame_count} frames...")
    
    cap.release()
    out.release()
    print(f"✓ Video saved: {output_path} ({frame_count} frames)")

# Uncomment to process a video:
# detect_video(model, 'input_video.mp4')

# === 3. WEBCAM REAL-TIME DETECTION ===
def detect_webcam(model):
    """Real-time object detection from a webcam (press 'q' to quit)."""
    cap = cv2.VideoCapture(0)  # Camera index 0
    
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        
        results = model(frame, verbose=False)
        annotated = results[0].plot()
        
        cv2.imshow('YOLOv8 Real-Time Detection', annotated)
        
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    
    cap.release()
    cv2.destroyAllWindows()

# Uncomment to use the webcam:
# detect_webcam(model)

Fine-Tuning YOLOv8 on a Custom Dataset

Python: Training YOLOv8 on a Custom Dataset

from ultralytics import YOLO

# === FINE-TUNE YOLOv8 ON A CUSTOM DATASET ===

# 1. Load pre-trained model (transfer learning)
model = YOLO('yolov8m.pt')  # Start from the pre-trained medium model

# 2. Train on the custom dataset
# The dataset must be in YOLO format:
#   dataset/
#     ├── train/
#     │   ├── images/ (.jpg/.png images)
#     │   └── labels/ (.txt labels, one line per object: class x_center y_center w h, normalized to 0-1)
#     ├── val/
#     │   ├── images/
#     │   └── labels/
#     └── data.yaml
#
# Example data.yaml:
#   train: dataset/train/images
#   val: dataset/val/images
#   nc: 3  # number of classes
#   names: ['cat', 'dog', 'bird']  # class names

results = model.train(
    data='dataset/data.yaml',   # Path to the data config
    epochs=100,                 # Number of epochs
    imgsz=640,                  # Input image size
    batch=16,                   # Batch size
    lr0=0.001,                  # Initial learning rate (Ultralytics suggests ~1e-3 for Adam-type optimizers, 1e-2 for SGD)
    lrf=0.01,                   # Final learning rate as a fraction of lr0 (final = lr0 * lrf)
    warmup_epochs=3,            # Warmup epochs
    optimizer='AdamW',          # Optimizer
    # Training-time augmentation is set by the options below. The `augment`
    # flag is listed under prediction settings in the Ultralytics config
    # (test-time augmentation), so it is omitted here.
    mosaic=1.0,                 # Mosaic augmentation probability
    mixup=0.0,                  # Mixup augmentation
    copy_paste=0.0,             # Copy-paste augmentation
    device=0,                   # GPU device (0, 1, 'cpu')
    project='runs/detect',      # Output directory
    name='custom_model',        # Experiment name
    exist_ok=False,
    pretrained=True,
    verbose=True,
)

# 3. Evaluate the model on the validation set
metrics = model.val()
print("\n=== Evaluation Results ===")
print(f"mAP@50:    {metrics.box.map50:.4f}")
print(f"mAP@50-95: {metrics.box.map:.4f}")
print(f"Precision: {metrics.box.mp:.4f}")
print(f"Recall:    {metrics.box.mr:.4f}")

# 4. Export ke berbagai format
model.export(format='onnx')     # ONNX
model.export(format='torchscript')  # TorchScript
# model.export(format='tflite')  # TensorFlow Lite
# model.export(format='coreml')  # CoreML (iOS)

print("\n✓ Model exported!")
print("  - ONNX: best.onnx")
print("  - TorchScript: best.torchscript")
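
The label files described in the dataset layout above store one object per line as `class x_center y_center w h`, with all four box values normalized by the image size. The sketch below converts a pixel-space box into such a line; the function name `to_yolo_label` and the sample box and image size are assumptions for illustration.

```python
def to_yolo_label(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space (x1, y1, x2, y2) box to one YOLO label line."""
    x_center = (x1 + x2) / 2 / img_w   # normalized box center
    y_center = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w              # normalized box size
    h = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}"


# A 200x200 px box in a 640x480 image, class 0 (made-up values)
line = to_yolo_label(0, 100, 200, 300, 400, img_w=640, img_h=480)
print(line)
```

Each image gets a `.txt` file with the same base name in the `labels/` folder, containing one such line per annotated object.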

Using Pre-trained YOLOv8 Models for Different Tasks

Python — YOLOv8 Multi-Task

from ultralytics import YOLO

# === YOLOv8 FOR DIFFERENT TASKS ===

# 1. OBJECT DETECTION
model_det = YOLO('yolov8n.pt')
results = model_det('image.jpg')

# 2. INSTANCE SEGMENTATION
model_seg = YOLO('yolov8n-seg.pt')
results = model_seg('image.jpg')

# 3. IMAGE CLASSIFICATION
model_cls = YOLO('yolov8n-cls.pt')
results = model_cls('image.jpg')

# 4. POSE ESTIMATION
model_pose = YOLO('yolov8n-pose.pt')
results = model_pose('image.jpg')

# 5. OBB (ORIENTED BOUNDING BOX)
model_obb = YOLO('yolov8n-obb.pt')
results = model_obb('image.jpg')

# === EXAMPLE: POSE ESTIMATION ===
model_pose = YOLO('yolov8n-pose.pt')
results = model_pose('person.jpg')

for result in results:
    keypoints = result.keypoints
    if keypoints is not None:
        for person_kp in keypoints:
            # 17 COCO keypoints
            # 0=nose, 1=left_eye, 2=right_eye, ...
            # 5=left_shoulder, 6=right_shoulder
            # 11=left_hip, 12=right_hip
            kp_data = person_kp.data[0]  # (17, 3): x, y, confidence
            print(f"Pose keypoints: {kp_data.shape}")
            print(f"  Nose: ({kp_data[0][0]:.0f}, {kp_data[0][1]:.0f})")

print("\n✓ All YOLOv8 models are ready to use!")

9. Comprehension Quiz

🎯 Article Summary

Object detection = classification + localization: it finds both "what" and "where"
IoU measures the overlap between a prediction and the ground truth
NMS removes duplicate, overlapping detections
Two-stage (Faster R-CNN): more accurate but slower
One-stage (YOLO, SSD): very fast, well suited to real-time use
YOLOv8 (Ultralytics) is one of the most widely used frameworks today: easy to use, very fast, and multi-task
mAP is the standard evaluation metric: the mean over classes of Average Precision, which is the area under each class's precision-recall curve
For real projects: start with a pre-trained YOLOv8 model and fine-tune it on a custom dataset