SOC Operations and Incident Triage

📋 Table of Contents

Introduction to SOC
SOC Team Structure
Monitoring Workflow
Incident Triage
Alert Prioritization
SIEM Integration
Incident Response
SOC Metrics
SOAR & Automation
Quiz

1. Introduction to the Security Operations Center

A Security Operations Center (SOC) is a cybersecurity command center that operates 24/7 to monitor, detect, analyze, and respond to security incidents in real time. The SOC is the organization's front line of defense against increasingly sophisticated cyberattacks.

By one frequently cited estimate, a cyberattack occurs every 39 seconds, so an effective SOC is no longer a luxury but a necessity. A SOC lets an organization detect threats before they cause significant damage, respond to incidents within minutes, and meet compliance requirements such as PCI DSS, HIPAA, and ISO 27001.

📋 What You Will Learn

Structure and functions of the SOC team
24/7 monitoring and alerting workflow
Incident triage and severity prioritization
Integration with SIEM tools
Incident response and remediation
SOAR and security process automation

SOC Maturity Levels

Diagram: SOC Maturity Model

┌─────────────────────────────────────────────────────────────┐
│                    SOC MATURITY LEVELS                        │
│                                                              │
│  Level 1: Perimeter Security                                 │
│  ├── Firewall, IDS/IPS                                       │
│  └── Basic log monitoring                                    │
│                                                              │
│  Level 2: Log Management & SIEM                              │
│  ├── Centralized logging                                     │
│  ├── SIEM deployment                                         │
│  └── Basic correlation rules                                 │
│                                                              │
│  Level 3: Threat Detection & Response                        │
│  ├── Advanced analytics                                      │
│  ├── Threat intelligence integration                         │
│  └── Incident response procedures                            │
│                                                              │
│  Level 4: Proactive Hunting                                  │
│  ├── Threat hunting teams                                    │
│  ├── Red team exercises                                      │
│  └── Behavioral analytics                                    │
│                                                              │
│  Level 5: Adaptive & Autonomous                              │
│  ├── AI/ML-powered detection                                 │
│  ├── Automated response (SOAR)                               │
│  └── Continuous improvement                                  │
└─────────────────────────────────────────────────────────────┘

2. SOC Team Structure

A SOC team consists of several analyst levels with different responsibilities. The tiered model ensures that each incident is handled by personnel with the appropriate expertise.

Level	Role	Responsibilities
Tier 1 (L1)	Triage Analyst	Alert monitoring, false-positive filtering, incident escalation
Tier 2 (L2)	Incident Responder	In-depth analysis, investigation, containment
Tier 3 (L3)	Threat Hunter	Proactive hunting, malware analysis, forensics
Tier 4	SOC Manager	Team management, reporting, security strategy

Tier 1: Triage Analyst

L1 analysts are the first line of defense. They monitor the SIEM dashboard, triage alerts, and decide whether to escalate to L2. Required skills include a basic understanding of networking, operating systems, and cybersecurity.

Checklist: Alert Triage

# =============================================
# Alert Triage Checklist for Tier 1 Analysts
# =============================================

# 1. Verify the Alert
#    - Is the alert valid or a false positive?
#    - Is there additional context in the logs?
#    - Does it correlate with other alerts?

# 2. Data Enrichment
#    - Check the IP address against threat intelligence
#    - Verify the user involved
#    - Check the affected assets

# Example: Checking an external IP on abuse.ch (ThreatFox).
# Private addresses such as 192.168.x.x never appear in public threat intel,
# so look up the external address. abuse.ch may require an Auth-Key header
# for API access; check its current API documentation.
curl -s "https://threatfox-api.abuse.ch/api/v1/" \
  -d '{"query": "search_ioc", "search_term": "203.0.113.50"}'

# Example: Domain lookup on VirusTotal
curl --request GET \
  --url "https://www.virustotal.com/api/v3/domains/malicious.com" \
  --header "x-apikey: YOUR_API_KEY"

# 3. Severity Classification
#    - Critical: Active exploitation, data breach
#    - High: Successful unauthorized access
#    - Medium: Suspicious activity, policy violation
#    - Low: Informational, reconnaissance

# 4. Documentation & Escalation
#    - Fill in the incident template
#    - Attach all evidence
#    - Escalate to L2 if severity >= Medium
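Steps 3 and 4 of the checklist can be sketched in a few lines of Python. This is only an illustration: the boolean alert fields (`active_exploitation`, `data_breach`, `unauthorized_access`, `suspicious_activity`, `policy_violation`) are hypothetical names chosen for the example, not fields of any particular SIEM.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def classify(alert: dict) -> Severity:
    """Map hypothetical alert flags to the checklist's severity levels."""
    if alert.get("active_exploitation") or alert.get("data_breach"):
        return Severity.CRITICAL
    if alert.get("unauthorized_access"):
        return Severity.HIGH
    if alert.get("suspicious_activity") or alert.get("policy_violation"):
        return Severity.MEDIUM
    # Informational events and reconnaissance fall through to Low.
    return Severity.LOW


def needs_escalation(severity: Severity) -> bool:
    """Checklist rule: escalate to L2 when severity >= Medium."""
    return severity >= Severity.MEDIUM


sev = classify({"unauthorized_access": True})
print(sev.name, needs_escalation(sev))
```

Using an `IntEnum` keeps the ">= Medium" rule a plain comparison instead of a lookup table.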

3. SOC Monitoring Workflow

An effective SOC workflow follows a repeating cycle: Detect → Triage → Investigate → Respond → Recover → Lessons Learned. Each stage has standardized procedures and tools.

Monitored Data Sources

Network Logs: Firewall, router, switch, IDS/IPS
Endpoint Logs: EDR, antivirus, OS event logs
Application Logs: Web server, database, authentication
Cloud Logs: AWS CloudTrail, Azure Activity Log, GCP Audit
Email Logs: Email gateway, anti-spam, DMARC reports
Identity Logs: Active Directory, LDAP, SSO, MFA

Template: Shift Handover

# =============================================
# Template: SOC Shift Handover
# =============================================

# Date & Time: 2024-01-15 08:00 WIB
# Previous Shift: Night Shift (00:00 - 08:00)
# Analyst: John Doe

# SHIFT SUMMARY
# =================================================
# Total Alerts Received: 342
# Alerts Dismissed (FP): 298
# Alerts Open (awaiting investigation): 38
# Confirmed Incidents: 6

# ACTIVE INCIDENTS (not yet resolved)
# =================================================
# INC-2024-0142: SSH brute force from 203.0.113.50
#   Status: Under investigation (L2)
#   Action: IP blocked at the firewall
#   Next: Check for any other compromise

# INC-2024-0145: Data exfiltration indicator
#   Status: Containment done, eradication pending
#   Action: Host WS-FINANCE-05 isolated from the network
#   Next: Full disk imaging and malware analysis

# RISING ALERT TRENDS
# =================================================
# Phishing emails up 300%: new campaign
# Brute force against the VPN portal from 3 different IPs
# Unusual DNS queries from the HR server

# ACTION ITEMS FOR THE NEXT SHIFT
# =================================================
# 1. Follow up on INC-2024-0142 and INC-2024-0145
# 2. Monitor the phishing campaign and update email rules
# 3. Review VPN access logs for the last 24 hours
# 4. Update threat intel feeds

4. Incident Triage Framework

Triage is the process of deciding the priority and urgency of an alert. A good framework shortens response time and ensures resources are allocated where they are needed.

Severity Matrix

Severity	Impact	Response Time	Examples
Critical (P1)	Business halted, data breach	< 15 minutes	Active ransomware, data exfiltration
High (P2)	Significant impact	< 1 hour	Unauthorized admin access, malware
Medium (P3)	Potential risk	< 4 hours	Suspicious process, policy violation
Low (P4)	Informational	< 24 hours	Failed login, port scan
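The response-time column lends itself to an automated SLA check. The sketch below is a minimal illustration that assumes the targets from the matrix; the function names `response_deadline` and `is_breached` are made up for this example.

```python
from datetime import datetime, timedelta

# Response-time targets taken from the severity matrix.
RESPONSE_SLA = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=1),
    "P3": timedelta(hours=4),
    "P4": timedelta(hours=24),
}


def response_deadline(priority: str, detected_at: datetime) -> datetime:
    """Latest time by which the first response should have happened."""
    return detected_at + RESPONSE_SLA[priority]


def is_breached(priority: str, detected_at: datetime, now: datetime) -> bool:
    """True when the response-time target has already been missed."""
    return now > response_deadline(priority, detected_at)


detected = datetime(2024, 1, 15, 3, 22, 45)
print(response_deadline("P1", detected))  # 2024-01-15 03:37:45
```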

Triage Decision Tree

Diagram: Triage Decision Tree

┌───────────────────────────────────────┐
│         ALERT RECEIVED                │
└─────────────┬─────────────────────────┘
              ▼
┌───────────────────────────────────────┐
│  Is this a FALSE POSITIVE?            │
│  (known benign, scheduled scan)       │
└──────┬────────────────┬───────────────┘
       ▼ YES            ▼ NO
┌──────────────┐  ┌─────────────────────┐
│  Dismiss &   │  │  Is the target a    │
│  Document    │  │  CRITICAL ASSET?    │
└──────────────┘  └──┬───────────┬──────┘
                     ▼ YES       ▼ NO
              ┌────────────┐ ┌─────────────┐
              │  P1/P2     │ │  Signs of   │
              │  Immediate │ │  COMPROMISE?│
              │  Response  │ │  (exec, C2) │
              └────────────┘ └──┬──────┬───┘
                                ▼ YES  ▼ NO
                        ┌──────────┐ ┌───────┐
                        │  P2      │ │  P3/P4│
                        │  Escalate│ │  Log &│
                        │  to L2   │ │  Track│
                        └──────────┘ └───────┘
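The decision tree translates directly into a chain of early returns. This is a sketch only: the keys `false_positive`, `critical_asset`, and `compromise_indicators` are hypothetical flags that an analyst or an enrichment step would have to set beforehand.

```python
def triage(alert: dict) -> str:
    """Walk the triage decision tree from top to bottom."""
    if alert.get("false_positive"):
        return "Dismiss & Document"
    if alert.get("critical_asset"):
        return "P1/P2: Immediate Response"
    if alert.get("compromise_indicators"):
        return "P2: Escalate to L2"
    return "P3/P4: Log & Track"


print(triage({"critical_asset": True}))
print(triage({"compromise_indicators": True}))
```

The order of the checks matters: the false-positive test comes first so that known benign activity against critical assets (for example a scheduled scan) does not trigger an immediate response.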

SIEM Alert — Brute Force Detection

# =============================================
# Example Alert: Brute Force Detection
# =============================================

# Alert Details:
# Name: Multiple Failed Login Attempts
# Source: Active Directory
# Timestamp: 2024-01-15 03:22:45 WIB
# Count: 47 failed attempts in 5 minutes
# Target: DC-FINANCE-01 (Domain Controller)
# Source IP: 10.10.50.23 (Workstation HR)
# Account: administrator

# TRIAGE ANALYSIS
# =================================================

# Step 1: Verify alert validity
# - Check if IP belongs to legitimate scanner
# - Verify with asset inventory
# - Check if there's a maintenance window

# Step 2: Context enrichment
# - User "administrator" is sensitive account
# - DC-FINANCE-01 is critical asset
# - Source workstation belongs to HR department
# - Time 03:22 is outside business hours

# Step 3: Severity determination
# - Target: Critical (Domain Controller)
# - Account: Critical (administrator)
# - Time: Suspicious (after hours)
# - Volume: High (47 attempts)
# => SEVERITY: P1 — CRITICAL

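# Step 4: Immediate actions
# 1. Block the source IP with a host firewall rule on the target.
#    Run in cmd.exe, where "^" continues a line (a backslash does not).
netsh advfirewall firewall add rule name="BLOCK-BF" ^
  dir=in action=block remoteip=10.10.50.23

# 2. Disable the targeted account (run on the domain controller).
#    No compromise is confirmed yet, so first make sure another
#    administrative account is available.
net user administrator /active:no

# 3. Check for a successful logon
# Query (Splunk SPL): did any logon succeed from this IP?
# Field names depend on how Windows events are parsed in your SIEM.
index=windows EventCode=4624 IpAddress="10.10.50.23"
  Account_Name="administrator"

# 4. Isolate the affected workstation
# (via EDR or network segmentation)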
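The "47 failed attempts in 5 minutes" condition behind this alert is a sliding-window count. The following sketch shows one way such a rule might be evaluated outside a SIEM; the threshold of 10 and the event tuple layout are assumptions made for the example, not values from the alert above.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def detect_bruteforce(events, threshold=10, window=timedelta(minutes=5)):
    """Return (source_ip, account) pairs with >= threshold failures per window.

    `events` is an iterable of (timestamp, source_ip, account) tuples for
    failed logons, sorted by timestamp.
    """
    recent = defaultdict(deque)  # (ip, account) -> timestamps inside the window
    flagged = set()
    for ts, ip, account in events:
        q = recent[(ip, account)]
        q.append(ts)
        # Drop failures that have fallen out of the sliding window.
        while q and ts - q[0] > window:
            q.popleft()
        if len(q) >= threshold:
            flagged.add((ip, account))
    return flagged


start = datetime(2024, 1, 15, 3, 18, 0)
failures = [(start + timedelta(seconds=6 * i), "10.10.50.23", "administrator")
            for i in range(47)]
print(detect_bruteforce(failures))
```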

5. Alert Prioritization

Alert fatigue is one of the biggest challenges a SOC faces. Some industry surveys put the average at around 11,000 alerts per day, of which only 1-5% require real investigation.

Strategies for Reducing Alert Fatigue

Tune correlation rules: Remove or modify rules that generate too many false positives
Whitelist known good: Identify and whitelist recurring benign activity
Context enrichment: Add context such as asset criticality, user role, and threat intel
Risk scoring: Compute a risk score from multiple factors for automatic prioritization
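For the first strategy, a simple starting point is to measure the false-positive ratio per rule from already-closed alerts. The sketch below assumes each closed alert is a dict with hypothetical `rule` and `verdict` keys; the cut-offs (at least 20 alerts, at least 90% false positives) are arbitrary example values.

```python
from collections import Counter


def noisy_rules(closed_alerts, min_alerts=20, fp_threshold=0.9):
    """List (rule, fp_ratio) for rules that are mostly false positives."""
    total = Counter()
    false_positives = Counter()
    for alert in closed_alerts:
        total[alert["rule"]] += 1
        if alert["verdict"] == "false_positive":
            false_positives[alert["rule"]] += 1

    candidates = []
    for rule, count in total.items():
        ratio = false_positives[rule] / count
        # Ignore rules with too few alerts to judge reliably.
        if count >= min_alerts and ratio >= fp_threshold:
            candidates.append((rule, ratio))
    # Noisiest rules first.
    return sorted(candidates, key=lambda item: item[1], reverse=True)


history = ([{"rule": "dns-long-query", "verdict": "false_positive"}] * 19
           + [{"rule": "dns-long-query", "verdict": "true_positive"}])
print(noisy_rules(history))
```

Rules that show up in this list are candidates for tuning or whitelisting, not for blind deletion.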

Python — Risk Scoring Engine

# =============================================
# Risk Scoring Engine for Alert Prioritization
# =============================================

class AlertRiskScorer:
    def __init__(self):
        self.asset_weights = {
            "domain_controller": 10,
            "database_server": 9,
            "web_server": 7,
            "workstation": 4,
            "printer": 1
        }
        self.time_multiplier = {
            "business_hours": 1.0,
            "after_hours": 1.3,
            "holiday": 1.5
        }

    def calculate_score(self, alert):
        base_score = 0
        # Factor 1: Asset Criticality (0-30)
        asset_type = alert.get("asset_type", "workstation")
        base_score += self.asset_weights.get(asset_type, 4) * 3
        # Factor 2: Threat Intel Match (0-25)
        if alert.get("threat_intel_match"):
            base_score += 25
        # Factor 3: Attack Confidence (0-20)
        confidence = alert.get("confidence", 0.5)
        base_score += int(confidence * 20)
        # Factor 4: User Risk (5-15)
        if alert.get("privileged_user"):
            base_score += 15
        else:
            base_score += 5
        # Factor 5: Time-based (multiplier)
        time_factor = self.time_multiplier.get(
            alert.get("time_category"), 1.0)
        base_score = int(base_score * time_factor)
        return min(max(base_score, 0), 100)

    def get_priority(self, score):
        if score >= 80: return "P1 (Critical)"
        if score >= 60: return "P2 (High)"
        if score >= 40: return "P3 (Medium)"
        return "P4 (Low)"

# Example usage
scorer = AlertRiskScorer()
alert = {
    "asset_type": "domain_controller",
    "threat_intel_match": True,
    "confidence": 0.85,
    "privileged_user": True,
    "time_category": "after_hours"
}
score = scorer.calculate_score(alert)
priority = scorer.get_priority(score)
print(f"Risk Score: {score}/100, {priority}")
# Output: Risk Score: 100/100, P1 (Critical)
# (30 + 25 + 17 + 15 = 87; 87 * 1.3 = 113, capped at 100)

6. SIEM Integration

The SIEM is the heart of SOC operations. Good integration ensures comprehensive detection across all data sources.

Example Correlation Rule

Sigma Rule — Credential Dumping

# =============================================
# Sigma Rule: Credential Dumping Detection
# =============================================
title: Credential Dumping via LSASS Access
id: a1234567-89ab-cdef-0123-456789abcdef
status: stable
description: Detects suspicious access to LSASS process

logsource:
  category: process_access
  product: windows

detection:
  selection:
    TargetImage|endswith: '\lsass.exe'
    GrantedAccess|contains:
      - '0x1010'
      - '0x1410'
      - '0x1438'
  filter:
    SourceImage|endswith:
      - '\wmiprvse.exe'
      - '\taskmgr.exe'
  condition: selection and not filter

falsepositives:
  - Legitimate security tools
  - System processes

level: high
tags:
  - attack.credential_access
  - attack.t1003.001
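To make the `selection and not filter` condition concrete, here is a hand-written Python equivalent of the rule's logic. It is not how a Sigma backend compiles rules; it only assumes events arrive as dicts with the same field names the rule uses, and it lowercases values because Sigma string matching is case-insensitive by default.

```python
ACCESS_MASKS = ("0x1010", "0x1410", "0x1438")
ALLOWED_SOURCES = ("\\wmiprvse.exe", "\\taskmgr.exe")


def matches_lsass_rule(event: dict) -> bool:
    """Evaluate the rule's condition: selection and not filter."""
    target = event.get("TargetImage", "").lower()
    source = event.get("SourceImage", "").lower()
    access = event.get("GrantedAccess", "").lower()

    # selection: target is lsass.exe AND access mask contains a listed value
    selection = target.endswith("\\lsass.exe") and any(
        mask in access for mask in ACCESS_MASKS
    )
    # filter: known benign source images (str.endswith accepts a tuple)
    filtered = source.endswith(ALLOWED_SOURCES)
    return selection and not filtered


event = {
    "TargetImage": r"C:\Windows\System32\lsass.exe",
    "SourceImage": r"C:\Temp\dump.exe",
    "GrantedAccess": "0x1010",
}
print(matches_lsass_rule(event))  # True
```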

7. Incident Response

Incident Response (IR) is a structured process for handling security incidents. The NIST SP 800-61 framework (Rev. 2) defines four main phases: Preparation; Detection & Analysis; Containment, Eradication & Recovery; and Post-Incident Activity.

IR Playbook — Ransomware

# =============================================
# INCIDENT RESPONSE PLAYBOOK: RANSOMWARE
# =============================================

# PHASE 1: DETECTION & TRIAGE (0-30 minutes)
# =================================================
# 1. Identify ransomware indicators:
#    - Mass file extension changes
#    - Ransom note files appearing
#    - Unusual process execution
#    - Volume shadow copy deletion
#    - High CPU/disk I/O

# 2. Validate the alert:
#    - Verify file encryption evidence
#    - Check for ransom note content
#    - Identify ransomware variant (ID Ransomware)

# 3. Severity: P1 — CRITICAL

# PHASE 2: CONTAINMENT (30 minutes - 2 hours)
# =================================================
# 1. Network isolation:
#    - Disconnect affected hosts from network
#    - Block lateral movement paths
#    - Isolate network segments

# 2. Account security:
#    - Force password reset for compromised accounts
#    - Disable suspicious service accounts
#    - Review and revoke VPN access

# 3. Evidence preservation:
#    - Capture memory dump
#    - Image affected systems
#    - Collect logs from all sources
#    - Document timeline

# PHASE 3: ERADICATION (2-24 hours)
# =================================================
# 1. Malware removal:
#    - Identify all affected systems
#    - Remove malware artifacts
#    - Patch exploited vulnerabilities

# 2. Root cause analysis:
#    - Initial infection vector
#    - Privilege escalation path
#    - Lateral movement methods

# PHASE 4: RECOVERY (1-7 days)
# =================================================
# 1. System restoration:
#    - Restore from clean backups
#    - Verify system integrity
#    - Rebuild compromised systems

# 2. Monitoring:
#    - Enhanced monitoring for 30 days
#    - Watch for re-infection indicators

# PHASE 5: LESSONS LEARNED (7-14 days)
# =================================================
# 1. Post-incident review meeting
# 2. Update detection rules
# 3. Improve security controls
# 4. Update this playbook
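One of the Phase 1 indicators, volume shadow copy deletion, can be spotted from process command lines. The sketch below checks for two commonly abused Windows commands (`vssadmin delete shadows` and `wmic shadowcopy delete`); the pattern list is illustrative and far from exhaustive, and the function name is made up for this example.

```python
import re

# Command-line patterns associated with shadow copy deletion (not exhaustive).
SHADOW_COPY_PATTERNS = [
    re.compile(r"vssadmin(\.exe)?\s+delete\s+shadows", re.IGNORECASE),
    re.compile(r"wmic(\.exe)?\s+shadowcopy\s+delete", re.IGNORECASE),
]


def is_shadow_copy_deletion(command_line: str) -> bool:
    """True if the command line matches a known shadow copy deletion pattern."""
    return any(p.search(command_line) for p in SHADOW_COPY_PATTERNS)


print(is_shadow_copy_deletion("vssadmin.exe Delete Shadows /All /Quiet"))  # True
print(is_shadow_copy_deletion("vssadmin list shadows"))                    # False
```

In practice this check would run against process-creation logs from the EDR or OS event logs listed among the monitored data sources.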

8. SOC Metrics & KPIs

Measuring SOC performance is essential for continuous improvement and for demonstrating the value of the security investment.

KPI	Description	Target
MTTD	Mean Time to Detect	< 1 hour
MTTR	Mean Time to Respond	< 4 hours
MTTC	Mean Time to Contain	< 8 hours
False Positive Rate	% of alerts that turn out to be FPs	< 50%
Coverage	% of MITRE ATT&CK techniques monitored	> 70%
Escalation Rate	% of alerts that require escalation	5-15%

💡 Tips

Focus on MTTD and MTTR as the primary metrics. Both directly reflect how effectively the SOC detects and responds to threats. Aim for a 10-20% improvement each quarter.
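MTTD and MTTR are plain averages over incident timestamps. The sketch below assumes each incident record carries three hypothetical fields (`occurred`, `detected`, `responded`) and measures MTTD from occurrence to detection and MTTR from detection to response; organizations define these intervals differently, so treat the choice as an assumption.

```python
from datetime import datetime
from statistics import mean


def mean_hours(incidents, start_key, end_key):
    """Average elapsed time in hours between two timestamps per incident."""
    return mean(
        (i[end_key] - i[start_key]).total_seconds() / 3600 for i in incidents
    )


incidents = [
    {"occurred": datetime(2024, 1, 15, 3, 0),
     "detected": datetime(2024, 1, 15, 3, 30),
     "responded": datetime(2024, 1, 15, 5, 30)},
    {"occurred": datetime(2024, 1, 16, 10, 0),
     "detected": datetime(2024, 1, 16, 11, 30),
     "responded": datetime(2024, 1, 16, 15, 30)},
]

mttd = mean_hours(incidents, "occurred", "detected")   # 1.0 hour
mttr = mean_hours(incidents, "detected", "responded")  # 3.0 hours
print(f"MTTD: {mttd:.1f} h, MTTR: {mttr:.1f} h")
```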

9. SOAR & Automation

SOAR (Security Orchestration, Automation, and Response) lets a SOC automate repetitive tasks so that analysts can focus on higher-level analysis.

Python — SOAR Phishing Playbook

# =============================================
# SOAR Playbook: Automated Phishing Response
# =============================================

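# Note: threat_intel_lookup, quarantine_email, block_sender, block_urls,
# check_url_clicks, isolate_endpoints, reset_passwords, and create_ticket
# are not defined in this excerpt; they are assumed to wrap the SIEM,
# email, and EDR integrations passed to the constructor.
class PhishingPlaybook: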
    def __init__(self, siem, email, edr):
        self.siem = siem
        self.email = email
        self.edr = edr

    def execute(self, alert):
        # Step 1: Extract IOCs
        iocs = self.extract_iocs(alert["email_data"])
        # Step 2: Threat intel lookup
        ti = self.threat_intel_lookup(iocs)
        malicious_score = ti.get("score", 0)

        if malicious_score > 70:
            # Step 3a: Auto remediate
            self.quarantine_email(alert["msg_id"])
            self.block_sender(alert["sender"])
            self.block_urls(iocs["urls"])
            # Step 4: Check link clicks
            affected = self.check_url_clicks(iocs["urls"])
            if affected:
                self.isolate_endpoints(affected)
                self.reset_passwords(affected)
            return {"action": "auto_remediated"}
        else:
            # Step 3b: Escalate to analyst
            self.create_ticket(alert, ti)
            return {"action": "escalated"}

    def extract_iocs(self, email_data):
        import re
        urls = re.findall(
            r'https?://[^\s<>"]+', email_data.get("body", ""))
        return {
            "sender": email_data.get("from"),
            "subject": email_data.get("subject"),
            "urls": urls,
            "attachments": email_data.get("attachments", [])
        }

⚠️ Important Note

Automation does not replace human analysts. Always keep a human in the loop for high-impact decisions such as isolating production systems or blocking IP ranges. Automation works best for enrichment, ticketing, and low-risk remediation.
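A human-in-the-loop gate can be as simple as refusing to run high-impact actions without a named approver. The sketch below is an illustration under stated assumptions: the action names and the `dispatch` function are hypothetical, and a real SOAR platform would call its own integrations where this code only returns a status.

```python
# Actions that must never run without analyst approval (example set).
HIGH_IMPACT_ACTIONS = {"isolate_production_host", "block_ip_range"}


def dispatch(action, target, approved_by=None):
    """Execute low-risk actions directly; queue high-impact ones for approval."""
    if action in HIGH_IMPACT_ACTIONS and approved_by is None:
        return {"action": action, "target": target,
                "status": "pending_approval"}
    # In a real playbook the integration call would happen here.
    return {"action": action, "target": target,
            "status": "executed", "approved_by": approved_by}


print(dispatch("create_ticket", "INC-2024-0142"))
print(dispatch("block_ip_range", "203.0.113.0/24"))
print(dispatch("block_ip_range", "203.0.113.0/24", approved_by="l2-analyst"))
```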

10. Comprehension Quiz

Summary

📝 Key Points

Tiered Model: The SOC uses tiers L1 through L3, plus a SOC Manager, so that incidents are handled by staff with the appropriate expertise
Triage: A prioritization framework ensures resources are allocated correctly
SIEM: The heart of SOC operations for correlation and detection
Incident Response: The four-phase NIST process: Preparation; Detection & Analysis; Containment, Eradication & Recovery; Post-Incident Activity
SOAR: Playbook automation to reduce analyst workload