SOC Operations and Incident Triage

A complete guide to building and operating a Security Operations Center, from team structure and monitoring to triage and incident response.

1. Introduction to the Security Operations Center

A Security Operations Center (SOC) is a cybersecurity command center that operates 24/7 to monitor, detect, analyze, and respond to security incidents in real time. The SOC is an organization's front line of defense against increasingly sophisticated cyberattacks.

By one widely cited estimate, a cyberattack occurs every 39 seconds, so an effective SOC is no longer a luxury but a necessity. A SOC lets an organization detect threats before they cause significant damage, respond to incidents within minutes, and meet compliance requirements such as PCI DSS, HIPAA, and ISO 27001.

📋 What You Will Learn
  • SOC team structure and functions
  • 24/7 monitoring and alerting workflow
  • Incident triage and severity prioritization
  • Integration with SIEM tools
  • Incident response and remediation
  • SOAR and security process automation

SOC Maturity Levels

Diagram: SOC Maturity Model

SOC MATURITY LEVELS

Level 1: Perimeter Security
├── Firewall, IDS/IPS
└── Basic log monitoring

Level 2: Log Management & SIEM
├── Centralized logging
├── SIEM deployment
└── Basic correlation rules

Level 3: Threat Detection & Response
├── Advanced analytics
├── Threat intelligence integration
└── Incident response procedures

Level 4: Proactive Hunting
├── Threat hunting teams
├── Red team exercises
└── Behavioral analytics

Level 5: Adaptive & Autonomous
├── AI/ML-powered detection
├── Automated response (SOAR)
└── Continuous improvement

2. SOC Team Structure

A SOC team is made up of several analyst tiers with distinct responsibilities. The tiered model ensures each incident is handled by personnel with the appropriate expertise.

Level       | Role               | Responsibilities
Tier 1 (L1) | Triage Analyst     | Alert monitoring, false-positive filtering, incident escalation
Tier 2 (L2) | Incident Responder | In-depth analysis, investigation, containment
Tier 3 (L3) | Threat Hunter      | Proactive hunting, malware analysis, forensics
Tier 4      | SOC Manager        | Team management, reporting, security strategy

Tier 1: Triage Analyst

L1 analysts are the first line of defense. They monitor the SIEM dashboard, triage alerts, and decide whether to escalate to L2. The role requires a basic understanding of networking, operating systems, and cybersecurity.

Checklist: Alert Triage
# =============================================
# Alert Triage Checklist: Tier 1 Analyst
# =============================================

# 1. Verify the Alert
#    - Is the alert valid or a false positive?
#    - Is there additional context in the logs?
#    - Does it correlate with other alerts?

# 2. Data Enrichment
#    - Check the IP address against threat intelligence
#    - Verify the user involved
#    - Check the affected assets

# Example: checking an IP in abuse.ch ThreatFox
# (query a public IP; private ranges such as 192.168.x.x never appear in threat feeds.
#  abuse.ch may also require an Auth-Key header, so check the current API documentation.)
curl -s "https://threatfox-api.abuse.ch/api/v1/" \
  -d '{"query": "search_ioc", "search_term": "203.0.113.50"}'

# Example: domain lookup in VirusTotal
curl --request GET \
  --url "https://www.virustotal.com/api/v3/domains/malicious.com" \
  --header "x-apikey: YOUR_API_KEY"

# 3. Severity Classification
#    - Critical: Active exploitation, data breach
#    - High: Successful unauthorized access
#    - Medium: Suspicious activity, policy violation
#    - Low: Informational, reconnaissance

# 4. Documentation & Escalation
#    - Fill in the incident template
#    - Attach all evidence
#    - Escalate to L2 if severity >= Medium
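
The severity levels and the escalation rule from step 4 of the checklist can be sketched in a few lines of Python. This is only an illustration of the rule as written; the names `Severity` and `should_escalate` are assumptions made for this example and do not come from any SIEM product.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the checklist, ordered so they can be compared."""
    LOW = 1       # informational, reconnaissance
    MEDIUM = 2    # suspicious activity, policy violation
    HIGH = 3      # successful unauthorized access
    CRITICAL = 4  # active exploitation, data breach


def should_escalate(severity: Severity) -> bool:
    """Checklist rule: escalate to L2 when severity >= Medium."""
    return severity >= Severity.MEDIUM


for level in Severity:
    action = "escalate to L2" if should_escalate(level) else "log and track at L1"
    print(f"{level.name}: {action}")
```

Using an ordered enum keeps the ">= Medium" comparison explicit, so changing the escalation threshold later is a one-line edit.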

3. SOC Monitoring Workflow

An effective SOC workflow follows a repeating cycle: Detect → Triage → Investigate → Respond → Recover → Lessons Learned. Each stage has standardized procedures and tools.
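
The cycle can be modeled as a small state machine. The sketch below is a minimal illustration; the `Stage` enum and the `next_stage` helper are hypothetical names chosen for this example.

```python
from enum import Enum


class Stage(Enum):
    DETECT = "Detect"
    TRIAGE = "Triage"
    INVESTIGATE = "Investigate"
    RESPOND = "Respond"
    RECOVER = "Recover"
    LESSONS_LEARNED = "Lessons Learned"


# Enum members iterate in definition order, which matches the cycle.
ORDER = list(Stage)


def next_stage(current: Stage) -> Stage:
    """Return the stage that follows `current`, wrapping back to Detect."""
    index = ORDER.index(current)
    return ORDER[(index + 1) % len(ORDER)]


stage = Stage.DETECT
for _ in range(len(ORDER)):
    print(stage.value, "->", next_stage(stage).value)
    stage = next_stage(stage)
```

The wrap-around from Lessons Learned back to Detect reflects the point that findings from one incident feed the next round of detection.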

Monitored Data Sources

Template: Shift Handover
# =============================================
# Template: SOC Shift Handover
# =============================================

# Date & Time: 2024-01-15 08:00 WIB
# Previous Shift: Night Shift (00:00 - 08:00)
# Analyst: John Doe

# SHIFT SUMMARY
# =================================================
# Total Alerts Received: 342
# Alerts Dismissed (FP): 298
# Alerts Open (awaiting investigation): 38
# Confirmed Incidents: 6

# ACTIVE INCIDENTS (not yet resolved)
# =================================================
# INC-2024-0142: SSH brute force from 203.0.113.50
#   Status: Under investigation (L2)
#   Action: IP blocked at the firewall
#   Next: Check for any other compromise

# INC-2024-0145: Data exfiltration indicator
#   Status: Containment done, eradication pending
#   Action: Host WS-FINANCE-05 isolated from the network
#   Next: Full disk imaging and malware analysis

# RISING ALERT TRENDS
# =================================================
# Phishing emails up 300% (new campaign)
# Brute force against the VPN portal from 3 different IPs
# Unusual DNS queries from the HR server

# ACTION ITEMS FOR THE NEXT SHIFT
# =================================================
# 1. Follow up on INC-2024-0142 and INC-2024-0145
# 2. Monitor the phishing campaign and update email rules
# 3. Review VPN access logs for the last 24 hours
# 4. Update threat intel feeds
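
The summary counts in a handover are worth sanity-checking before the shift changes. The sketch below uses the numbers from the template; the `ShiftSummary` class is a hypothetical helper written for this example, not part of any ticketing tool.

```python
from dataclasses import dataclass


@dataclass
class ShiftSummary:
    total_alerts: int
    dismissed_fp: int
    open_alerts: int
    confirmed_incidents: int

    def is_consistent(self) -> bool:
        """Every alert should be dismissed, still open, or a confirmed incident."""
        accounted = self.dismissed_fp + self.open_alerts + self.confirmed_incidents
        return accounted == self.total_alerts

    def false_positive_rate(self) -> float:
        """Share of all received alerts dismissed as false positives, in percent."""
        return 100 * self.dismissed_fp / self.total_alerts


night_shift = ShiftSummary(total_alerts=342, dismissed_fp=298,
                           open_alerts=38, confirmed_incidents=6)
print("Counts add up:", night_shift.is_consistent())
print(f"False positive rate: {night_shift.false_positive_rate():.1f}%")
```

For the sample shift, 298 + 38 + 6 = 342, so the counts are consistent, and the false positive rate works out to about 87.1%.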

4. Incident Triage Framework

Triage is the process of deciding the priority and urgency of an alert. A good framework shortens response time and ensures resources are allocated where they are needed.

Severity Matrix

Severity      | Impact                       | Response Time | Examples
Critical (P1) | Business halted, data breach | < 15 minutes  | Active ransomware, data exfiltration
High (P2)     | Significant impact           | < 1 hour      | Unauthorized admin access, malware
Medium (P3)   | Potential risk               | < 4 hours     | Suspicious process, policy violation
Low (P4)      | Informational                | < 24 hours    | Failed login, port scan
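
The response-time column translates directly into deadlines. The following sketch assumes the four targets from the matrix; the function names and the sample detection time are illustrative only.

```python
from datetime import datetime, timedelta

# Response-time targets from the severity matrix.
RESPONSE_SLA = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=1),
    "P3": timedelta(hours=4),
    "P4": timedelta(hours=24),
}


def response_deadline(priority: str, detected_at: datetime) -> datetime:
    """Latest time by which a response should start for this priority."""
    return detected_at + RESPONSE_SLA[priority]


def is_breached(priority: str, detected_at: datetime, now: datetime) -> bool:
    """True when the response-time target has already passed."""
    return now > response_deadline(priority, detected_at)


detected = datetime(2024, 1, 15, 3, 22, 45)  # sample detection time
for priority in RESPONSE_SLA:
    print(priority, "respond by", response_deadline(priority, detected))
```

A dashboard or ticketing rule could call `is_breached` periodically to flag incidents that have passed their target.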

Triage Decision Tree

Diagram: Triage Decision Tree
┌───────────────────────────────────────┐
│         ALERT RECEIVED                │
└─────────────┬─────────────────────────┘
              ▼
┌───────────────────────────────────────┐
│  Is it a FALSE POSITIVE?              │
│  (known benign, scheduled scan)       │
└──────┬────────────────┬───────────────┘
       ▼ YES            ▼ NO
┌──────────────┐  ┌─────────────────────┐
│  Dismiss &   │  │  Is the target a    │
│  Document    │  │  CRITICAL ASSET?    │
└──────────────┘  └──┬───────────┬──────┘
                     ▼ YES       ▼ NO
              ┌────────────┐ ┌─────────────┐
              │  P1/P2     │ │  Is there   │
              │  Immediate │ │  COMPROMISE?│
              │  Response  │ │  (exec, C2) │
              └────────────┘ └──┬──────┬───┘
                                ▼ YES  ▼ NO
                        ┌──────────┐ ┌───────┐
                        │  P2      │ │  P3/P4│
                        │  Escalate│ │  Log &│
                        │  to L2   │ │  Track│
                        └──────────┘ └───────┘

SIEM Alert: Brute Force Detection
# =============================================
# Contoh Alert: Brute Force Detection
# =============================================

# Alert Details:
# Name: Multiple Failed Login Attempts
# Source: Active Directory
# Timestamp: 2024-01-15 03:22:45 WIB
# Count: 47 failed attempts in 5 minutes
# Target: DC-FINANCE-01 (Domain Controller)
# Source IP: 10.10.50.23 (Workstation HR)
# Account: administrator

# TRIAGE ANALYSIS
# =================================================

# Step 1: Verify alert validity
# - Check if IP belongs to legitimate scanner
# - Verify with asset inventory
# - Check if there's a maintenance window

# Step 2: Context enrichment
# - User "administrator" is sensitive account
# - DC-FINANCE-01 is critical asset
# - Source workstation belongs to HR department
# - Time 03:22 is outside business hours

# Step 3: Severity determination
# - Target: Critical (Domain Controller)
# - Account: Critical (administrator)
# - Time: Suspicious (after hours)
# - Volume: High (47 attempts)
# => SEVERITY: P1 (CRITICAL)

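# Step 4: Immediate actions
# 1. Block source IP at the host firewall
#    (netsh runs in cmd.exe, which has no backslash line continuation,
#     so the command stays on one line)
netsh advfirewall firewall add rule name="BLOCK-BF" dir=in action=block remoteip=10.10.50.23

# 2. Disable the targeted account if compromise is suspected
#    (/domain applies the change to the domain account, not a local one)
net user administrator /active:no /domain

# 3. Check for successful login
# Query (Splunk SPL; field names depend on the installed add-on):
# did any login succeed from this IP?
index=windows EventCode=4624 IpAddress="10.10.50.23"
  Account_Name="administrator"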

# 4. Isolate affected workstation
# (via EDR or network segmentation)
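
The triage decision tree from this section can also be written as a short function, which makes the order of the three questions explicit. This is a sketch of the tree as drawn; the function name and the boolean parameters are assumptions for illustration.

```python
def triage_priority(false_positive: bool, critical_asset: bool,
                    compromise_evidence: bool) -> str:
    """Walk the triage decision tree and return the resulting action."""
    if false_positive:
        return "Dismiss & Document"
    if critical_asset:
        return "P1/P2: Immediate Response"
    if compromise_evidence:
        return "P2: Escalate to L2"
    return "P3/P4: Log & Track"


# The brute-force alert above: not a known-benign source, and the target
# is a domain controller, so the tree stops at the critical-asset branch.
print(triage_priority(false_positive=False, critical_asset=True,
                      compromise_evidence=False))
```

Because the critical-asset question comes before the compromise question, an alert against a domain controller gets an immediate response even before compromise is confirmed.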

5. Alert Prioritization

Alert fatigue is among the biggest challenges a SOC faces. By a commonly quoted figure, an average SOC receives about 11,000 alerts per day, of which only 1-5% require real investigation.
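
To put those percentages in absolute terms, here is the arithmetic as a short Python sketch. It uses only the figures quoted in the paragraph; nothing here is measured data.

```python
ALERTS_PER_DAY = 11_000          # figure quoted in the text
ACTIONABLE_SHARE = (0.01, 0.05)  # 1-5% need real investigation

low, high = (round(ALERTS_PER_DAY * share) for share in ACTIONABLE_SHARE)
print(f"Alerts needing investigation: {low}-{high} per day")
print(f"Noise to filter out: {ALERTS_PER_DAY - high}-{ALERTS_PER_DAY - low} per day")
```

In other words, roughly 110 to 550 alerts per day deserve an analyst's attention, while more than 10,000 are noise that prioritization has to filter out.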

Strategies for Reducing Alert Fatigue

Python: Risk Scoring Engine
# =============================================
# Risk Scoring Engine for Alert Prioritization
# =============================================

class AlertRiskScorer:
    def __init__(self):
        self.asset_weights = {
            "domain_controller": 10,
            "database_server": 9,
            "web_server": 7,
            "workstation": 4,
            "printer": 1
        }
        self.time_multiplier = {
            "business_hours": 1.0,
            "after_hours": 1.3,
            "holiday": 1.5
        }

    def calculate_score(self, alert):
        base_score = 0
        # Factor 1: Asset Criticality (0-30)
        asset_type = alert.get("asset_type", "workstation")
        base_score += self.asset_weights.get(asset_type, 4) * 3
        # Factor 2: Threat Intel Match (0-25)
        if alert.get("threat_intel_match"):
            base_score += 25
        # Factor 3: Attack Confidence (0-20)
        confidence = alert.get("confidence", 0.5)
        base_score += int(confidence * 20)
        # Factor 4: User Risk (0-15)
        if alert.get("privileged_user"):
            base_score += 15
        else:
            base_score += 5
        # Factor 5: Time-based (multiplier)
        time_factor = self.time_multiplier.get(
            alert.get("time_category"), 1.0)
        base_score = int(base_score * time_factor)
        return min(max(base_score, 0), 100)

    def get_priority(self, score):
        if score >= 80: return "P1 - Critical"
        if score >= 60: return "P2 - High"
        if score >= 40: return "P3 - Medium"
        return "P4 - Low"

# Example usage
scorer = AlertRiskScorer()
alert = {
    "asset_type": "domain_controller",
    "threat_intel_match": True,
    "confidence": 0.85,
    "privileged_user": True,
    "time_category": "after_hours"
}
score = scorer.calculate_score(alert)
priority = scorer.get_priority(score)
print(f"Risk Score: {score}/100 ({priority})")
# Raw score: (30 + 25 + 17 + 15) * 1.3 = 113.1, capped at 100
# Output: Risk Score: 100/100 (P1 - Critical)

6. SIEM Integration

The SIEM is the heart of SOC operations. Good integration ensures comprehensive detection across all data sources.

Example Correlation Rule

Sigma Rule: Credential Dumping
# =============================================
# Sigma Rule: Credential Dumping Detection
# =============================================
title: Credential Dumping via LSASS Access
id: a1234567-89ab-cdef-0123-456789abcdef
status: stable  # valid Sigma statuses: stable, test, experimental, deprecated, unsupported
description: Detects suspicious access to LSASS process

logsource:
  category: process_access
  product: windows

detection:
  selection:
    TargetImage|endswith: '\lsass.exe'
    GrantedAccess|contains:
      - '0x1010'
      - '0x1410'
      - '0x1438'
  filter:
    SourceImage|endswith:
      - '\wmiprvse.exe'
      - '\taskmgr.exe'
  condition: selection and not filter

falsepositives:
  - Legitimate security tools
  - System processes

level: high
tags:
  - attack.credential_access
  - attack.t1003.001
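
To see how the rule's `selection and not filter` condition behaves, the sketch below evaluates the same logic against two hand-made events in Python. It is only an illustration of the matching semantics (lists are OR, fields within a block are AND, string matching is case-insensitive). Real deployments normally convert Sigma rules into SIEM queries with a Sigma backend instead of evaluating them like this, and the event dictionaries here are hypothetical.

```python
RULE = {
    "selection": {
        "TargetImage|endswith": ["\\lsass.exe"],
        "GrantedAccess|contains": ["0x1010", "0x1410", "0x1438"],
    },
    "filter": {
        "SourceImage|endswith": ["\\wmiprvse.exe", "\\taskmgr.exe"],
    },
}


def field_matches(event: dict, key: str, patterns: list) -> bool:
    """Apply one 'Field|modifier' test; any listed pattern may match (OR)."""
    field, modifier = key.split("|")
    value = str(event.get(field, "")).lower()
    if modifier == "endswith":
        return any(value.endswith(p.lower()) for p in patterns)
    if modifier == "contains":
        return any(p.lower() in value for p in patterns)
    raise ValueError(f"unsupported modifier: {modifier}")


def block_matches(event: dict, block: dict) -> bool:
    """All fields inside a block must match (AND)."""
    return all(field_matches(event, k, v) for k, v in block.items())


def rule_fires(event: dict) -> bool:
    """condition: selection and not filter"""
    return (block_matches(event, RULE["selection"])
            and not block_matches(event, RULE["filter"]))


suspicious = {"SourceImage": "C:\\Temp\\dump.exe",
              "TargetImage": "C:\\Windows\\System32\\lsass.exe",
              "GrantedAccess": "0x1410"}
benign = dict(suspicious, SourceImage="C:\\Windows\\System32\\taskmgr.exe")
print(rule_fires(suspicious), rule_fires(benign))
```

The second event touches LSASS with the same access mask but comes from Task Manager, so the filter block suppresses it.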

7. Incident Response

Incident Response (IR) is a structured process for handling security incidents. NIST SP 800-61 (Rev. 2) defines four main phases: Preparation; Detection & Analysis; Containment, Eradication & Recovery; and Post-Incident Activity.

IR Playbook: Ransomware
# =============================================
# INCIDENT RESPONSE PLAYBOOK: RANSOMWARE
# =============================================

# PHASE 1: DETECTION & TRIAGE (0-30 minutes)
# =================================================
# 1. Identify ransomware indicators:
#    - Mass file extension changes
#    - Ransom note files appearing
#    - Unusual process execution
#    - Volume shadow copy deletion
#    - High CPU/disk I/O

# 2. Validate the alert:
#    - Verify file encryption evidence
#    - Check for ransom note content
#    - Identify ransomware variant (ID Ransomware)

# 3. Severity: P1 (CRITICAL)

# PHASE 2: CONTAINMENT (30 minutes - 2 hours)
# =================================================
# 1. Network isolation:
#    - Disconnect affected hosts from network
#    - Block lateral movement paths
#    - Isolate network segments

# 2. Account security:
#    - Force password reset for compromised accounts
#    - Disable suspicious service accounts
#    - Review and revoke VPN access

# 3. Evidence preservation:
#    - Capture memory dump
#    - Image affected systems
#    - Collect logs from all sources
#    - Document timeline

# PHASE 3: ERADICATION (2-24 hours)
# =================================================
# 1. Malware removal:
#    - Identify all affected systems
#    - Remove malware artifacts
#    - Patch exploited vulnerabilities

# 2. Root cause analysis:
#    - Initial infection vector
#    - Privilege escalation path
#    - Lateral movement methods

# PHASE 4: RECOVERY (1-7 days)
# =================================================
# 1. System restoration:
#    - Restore from clean backups
#    - Verify system integrity
#    - Rebuild compromised systems

# 2. Monitoring:
#    - Enhanced monitoring for 30 days
#    - Watch for re-infection indicators

# PHASE 5: LESSONS LEARNED (7-14 days)
# =================================================
# 1. Post-incident review meeting
# 2. Update detection rules
# 3. Improve security controls
# 4. Update this playbook
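
The time windows in the phase headings can be turned into a simple lookup that tells an incident lead which phase the playbook expects at a given point. This is a sketch based only on those headings; `PHASE_WINDOWS` and `expected_phase` are hypothetical names.

```python
from datetime import timedelta

# (phase name, start, end) windows taken from the playbook headings.
PHASE_WINDOWS = [
    ("Detection & Triage", timedelta(0), timedelta(minutes=30)),
    ("Containment", timedelta(minutes=30), timedelta(hours=2)),
    ("Eradication", timedelta(hours=2), timedelta(hours=24)),
    ("Recovery", timedelta(days=1), timedelta(days=7)),
    ("Lessons Learned", timedelta(days=7), timedelta(days=14)),
]


def expected_phase(elapsed: timedelta) -> str:
    """Return the phase the playbook expects at `elapsed` time since detection."""
    for name, start, end in PHASE_WINDOWS:
        if start <= elapsed < end:
            return name
    return "Beyond playbook timeline"


for elapsed in (timedelta(minutes=10), timedelta(hours=5), timedelta(days=3)):
    print(elapsed, "->", expected_phase(elapsed))
```

The windows are targets, not hard limits; an incident still in containment after five hours is a signal to escalate, not an error in the data.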

8. SOC Metrics & KPI

Measuring SOC performance is essential for continuous improvement and for demonstrating the value of security investment.

KPI                 | Description                         | Target
MTTD                | Mean Time to Detect                 | < 1 hour
MTTR                | Mean Time to Respond                | < 4 hours
MTTC                | Mean Time to Contain                | < 8 hours
False Positive Rate | % of alerts that turn out to be FPs | < 50%
Coverage            | % of MITRE ATT&CK that is monitored | > 70%
Escalation Rate     | % of alerts that need escalation    | 5-15%

💡 Tips

Focus on MTTD and MTTR as the primary metrics. Both directly reflect how effectively the SOC detects and responds to threats. Aim for a 10-20% improvement each quarter.
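
As a rough illustration of how MTTD and MTTR are computed, the sketch below averages the gaps between timestamps on two made-up incident records. The records, field names, and the choice to measure MTTR from detection to the start of response are assumptions for this example; use whatever definitions your SOC has agreed on.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: when the activity started, when the SOC
# detected it, and when the response began.
INCIDENTS = [
    {"occurred": datetime(2024, 1, 15, 3, 0),
     "detected": datetime(2024, 1, 15, 3, 30),
     "responded": datetime(2024, 1, 15, 4, 30)},
    {"occurred": datetime(2024, 1, 16, 9, 0),
     "detected": datetime(2024, 1, 16, 10, 0),
     "responded": datetime(2024, 1, 16, 13, 0)},
]


def mean_hours(incidents, start_key, end_key):
    """Average gap between two timestamps across incidents, in hours."""
    gaps = [(i[end_key] - i[start_key]).total_seconds() / 3600 for i in incidents]
    return mean(gaps)


mttd = mean_hours(INCIDENTS, "occurred", "detected")   # occurrence -> detection
mttr = mean_hours(INCIDENTS, "detected", "responded")  # detection -> response
print(f"MTTD: {mttd:.2f} h (target < 1 h): {'OK' if mttd < 1 else 'MISSED'}")
print(f"MTTR: {mttr:.2f} h (target < 4 h): {'OK' if mttr < 4 else 'MISSED'}")
```

Tracking these two numbers per quarter gives a direct way to check the 10-20% improvement goal.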

9. SOAR & Automation

SOAR (Security Orchestration, Automation, and Response) lets a SOC automate repetitive tasks so analysts can focus on higher-level analysis.

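Python: SOAR Phishing Playbook
# =============================================
# SOAR Playbook: Automated Phishing Response
# =============================================
import re


class PhishingPlaybook:
    # siem, email, and edr are client objects for the respective platforms.
    # Every method called on them below is a hypothetical placeholder;
    # map each one to what your SOAR platform or vendor SDK provides.
    def __init__(self, siem, email, edr):
        self.siem = siem
        self.email = email
        self.edr = edr

    def execute(self, alert):
        # Step 1: Extract IOCs
        iocs = self.extract_iocs(alert["email_data"])
        # Step 2: Threat intel lookup
        ti = self.threat_intel_lookup(iocs)
        malicious_score = ti.get("score", 0)

        if malicious_score > 70:
            # Step 3a: Auto remediate
            self.quarantine_email(alert["msg_id"])
            self.block_sender(alert["sender"])
            self.block_urls(iocs["urls"])
            # Step 4: Check link clicks
            affected = self.check_url_clicks(iocs["urls"])
            if affected:
                self.isolate_endpoints(affected)
                self.reset_passwords(affected)
            return {"action": "auto_remediated"}
        else:
            # Step 3b: Escalate to analyst
            self.create_ticket(alert, ti)
            return {"action": "escalated"}

    def extract_iocs(self, email_data):
        # Collect URLs from the body; a URL ends at whitespace, <, >, or "
        urls = re.findall(
            r'https?://[^\s<>"]+', email_data.get("body", ""))
        return {
            "sender": email_data.get("from"),
            "subject": email_data.get("subject"),
            "urls": urls,
            "attachments": email_data.get("attachments", [])
        }

    # --- Integration points (hypothetical client methods) ---
    def threat_intel_lookup(self, iocs):
        return self.siem.lookup_iocs(iocs)  # expected to return {"score": 0-100}

    def quarantine_email(self, msg_id):
        self.email.quarantine(msg_id)

    def block_sender(self, sender):
        self.email.block_sender(sender)

    def block_urls(self, urls):
        self.email.block_urls(urls)

    def check_url_clicks(self, urls):
        return self.siem.find_url_clicks(urls)  # users or hosts that clicked

    def isolate_endpoints(self, affected):
        self.edr.isolate(affected)

    def reset_passwords(self, affected):
        self.edr.reset_passwords(affected)

    def create_ticket(self, alert, ti):
        self.siem.create_ticket(alert, ti)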

⚠️ Important Note

Automation does not replace human analysts. Always keep a human in the loop for high-impact decisions such as isolating production systems or blocking IP ranges. Automation works best for enrichment, ticketing, and low-risk remediation.
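
One way to enforce that rule is an approval gate in front of every automated action. The sketch below is a minimal illustration; the action names and the `decide` function are assumptions made for this example, not features of a specific SOAR product.

```python
from typing import Optional

# Actions the note calls high-impact; the names are illustrative only.
HIGH_IMPACT_ACTIONS = {"isolate_production_system", "block_ip_range"}
# Low-risk work the note considers suitable for full automation.
LOW_RISK_ACTIONS = {"enrich_alert", "create_ticket", "quarantine_email"}


def decide(action: str, approved_by: Optional[str] = None) -> str:
    """Return what the automation should do with a requested action."""
    if action in LOW_RISK_ACTIONS:
        return "execute automatically"
    if action in HIGH_IMPACT_ACTIONS and approved_by:
        return f"execute (approved by {approved_by})"
    # High-impact without approval, or an unknown action: fail safe.
    return "hold for analyst approval"


print(decide("create_ticket"))
print(decide("block_ip_range"))
print(decide("block_ip_range", approved_by="L2 analyst"))
```

Treating unknown actions as high-impact by default means a new playbook step cannot run unattended until someone classifies it.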

10. Comprehension Quiz

1. What is the main responsibility of a Tier 1 (L1) analyst?

2. What is the response time target for a P1 Critical incident?

3. What does MTTD stand for?

4. Which kind of tool automates security playbooks?

5. What is the first step after ransomware is detected?

Summary

📝 Key Points
  • Tiered Model: the SOC uses tiers L1-L4 to handle incidents according to their severity
  • Triage: a prioritization framework ensures resources go where they are needed
  • SIEM: the heart of SOC operations for correlation and detection
  • Incident Response: the NIST process has four phases (Preparation; Detection & Analysis; Containment, Eradication & Recovery; Post-Incident Activity)
  • SOAR: automated playbooks that reduce analyst workload