Python Profiling & Optimization: A Complete Guide

📋 Table of Contents

Introduction to Profiling
timeit: Quick Benchmarks
cProfile: Function-Level Profiling
Analyzing Results with pstats
line_profiler: Line-by-Line Profiling
memory_profiler: Memory Profiling
tracemalloc: Tracking Memory Allocations
String & I/O Optimization
Data Structure Optimization
Loop & Comprehension Optimization
Async & Concurrency
Comprehension Quiz

1. Introduction to Profiling

Profiling is the process of analyzing a program's performance to find bottlenecks: the parts of the code that run slowest or use the most memory. Without profiling, optimization rests on guesswork, which is often wrong.

Donald Knuth famously wrote: "Premature optimization is the root of all evil." Profiling ensures we optimize the parts that actually need improvement, not the parts that are already fast enough.

Types of Profiling

Type	What Is Measured	Tool
Time Profiling	How long each function/line takes to run	cProfile, line_profiler, timeit
Memory Profiling	How much memory is used	memory_profiler, tracemalloc
I/O Profiling	Time spent waiting on I/O operations	asyncio debug mode, strace
CPU Profiling	CPU usage per thread/function	py-spy, vmprof

Diagram: Profiling Workflow

┌─────────────────────────────────────────────────────────────────┐
│                PROFILING WORKFLOW                                │
│                                                                 │
│  ┌──────────┐    ┌───────────┐    ┌───────────┐    ┌────────┐  │
│  │ Identify  │──▶│  Profile  │──▶│  Analyze   │──▶│ Optimize│  │
│  │ Problem   │   │  with Tool│   │  Results   │   │ Code    │  │
│  └──────────┘    └───────────┘    └───────────┘    └────────┘  │
│       │                │                │               │       │
│  "App is slow"  cProfile/       "Function X uses  "Use a       │
│                  line_profiler    80% of time"     dict lookup" │
│                                                                 │
│  ┌──────────┐    ┌───────────┐                                  │
│  │ Measure   │◀──│  Repeat   │  ← Iterate until target is met   │
│  │ Impact    │   │  Process  │                                  │
│  └──────────┘    └───────────┘                                  │
└─────────────────────────────────────────────────────────────────┘

Golden Rules of Profiling

Profile First, Optimize Later: Don't guess; measure with data
Target the Bottleneck: Typically about 80% of the time is spent in 20% of the code (Pareto principle)
Benchmark Before & After: Always measure the improvement after optimizing
Profile in a Realistic Environment: Use production-like data
Don't Forget Memory: Speed isn't everything; memory matters too
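
The "benchmark before and after" rule can be sketched as a small helper that first checks both versions return the same result and only then compares timings. The names `dedupe_slow`, `dedupe_fast`, and `compare_implementations` are hypothetical, chosen for illustration only; the de-duplication task is an assumed stand-in for whatever code is being optimized.

```python
import timeit


def dedupe_slow(items):
    """Baseline: order-preserving de-duplication using list membership (O(n^2))."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def dedupe_fast(items):
    """Candidate: same result via dict.fromkeys, which keeps insertion order."""
    return list(dict.fromkeys(items))


def compare_implementations(before, after, arg, number=10, repeat=3):
    """Verify both versions agree, then return (t_before, t_after) in seconds."""
    assert before(arg) == after(arg), "optimized version changed the result"
    t_before = min(timeit.repeat(lambda: before(arg), number=number, repeat=repeat))
    t_after = min(timeit.repeat(lambda: after(arg), number=number, repeat=repeat))
    return t_before, t_after


sample = [i % 200 for i in range(2_000)]
t_before, t_after = compare_implementations(dedupe_slow, dedupe_fast, sample)
print(f"before: {t_before:.4f}s  after: {t_after:.4f}s  speedup: {t_before / t_after:.1f}x")
```

Checking correctness before timing guards against a common trap: a "faster" version that is only faster because it does less work or returns something different.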

2. timeit: Quick Benchmarks

The timeit module is Python's built-in benchmarking tool for measuring the execution time of small code snippets. By default it temporarily disables the garbage collector while timing and uses a high-resolution timer (time.perf_counter).

Basic timeit Usage

"""timeit: benchmark different ways of building a list."""
import timeit

# Benchmark: List comprehension vs loop vs map
setup = "data = list(range(1000))"

t1 = timeit.timeit("[x**2 for x in data]", setup=setup, number=10000)
t2 = timeit.timeit("""
result = []
for x in data:
    result.append(x**2)
""", setup=setup, number=10000)
t3 = timeit.timeit("list(map(lambda x: x**2, data))", setup=setup, number=10000)

print(f"List comprehension : {t1:.4f}s")
print(f"For loop + append  : {t2:.4f}s")
print(f"Map + lambda       : {t3:.4f}s")
print(f"\nComprehension is {t2/t1:.1f}x faster than the loop!")
print(f"Comprehension is {t3/t1:.1f}x faster than map!")

# Sample results (illustrative; they vary by machine):
# List comprehension : 2.1456s
# For loop + append  : 3.8723s
# Map + lambda       : 3.2154s
# Comprehension is 1.8x faster than the loop!
# Comprehension is 1.5x faster than map!

timeit from the Command Line

# Benchmark from the command line
python -m timeit "'-'.join(str(n) for n in range(100))"
# 10000 loops, best of 5: 25.3 usec per loop

python -m timeit "'-'.join(map(str, range(100)))"
# 10000 loops, best of 5: 19.8 usec per loop

# With setup code
python -m timeit -s "data = list(range(1000))" "[x**2 for x in data]"
# 5000 loops, best of 5: 85.2 usec per loop

# Set the number of loops (-n) and repeats (-r)
python -m timeit -n 1000 -r 5 "[x**2 for x in range(1000)]"
# 1000 loops, best of 5: 312 usec per loop

Comprehensive Benchmarks with timeit

"""Compare the performance of various Python operations."""
import timeit
from typing import Callable


def benchmark(func: Callable, *args, number: int = 10000, label: str = ""):
    """Helper that times func(*args) and prints total and per-call time."""
    stmt = lambda: func(*args)
    time_taken = timeit.timeit(stmt, number=number)
    per_call = time_taken / number * 1_000_000  # Convert to microseconds
    print(f"  {label:<35} {time_taken:.4f}s total | {per_call:.2f}μs/call")
    return time_taken


# === Compare lookup methods ===
data_list = list(range(10000))
data_set = set(range(10000))
data_dict = {i: i for i in range(10000)}

print("=== Lookup Performance (10,000 iterations) ===")

# List lookup (O(n))
benchmark(lambda: 9999 in data_list, number=10000,
          label="'in' list (worst case)")

# Set lookup (O(1))
benchmark(lambda: 9999 in data_set, number=10000,
          label="'in' set (O(1))")

# Dict lookup (O(1))
benchmark(lambda: 9999 in data_dict, number=10000,
          label="'in' dict (O(1))")

print("\n=== String Concatenation ===")

# String concatenation (slow)
def concat_strings_loop(n):
    result = ""
    for i in range(n):
        result += str(i)
    return result

# Join (fast)
def concat_strings_join(n):
    return "".join(str(i) for i in range(n))

# f-string (fast)
def concat_strings_fstring(n):
    return "".join(f"{i}" for i in range(n))

benchmark(concat_strings_loop, 1000, number=1000, label="String += loop")
benchmark(concat_strings_join, 1000, number=1000, label="Join generator")
benchmark(concat_strings_fstring, 1000, number=1000, label="Join f-string")

print("\n=== Dictionary Methods ===")

data = {str(i): i for i in range(1000)}

benchmark(lambda: data.get("999"), number=100000, label="dict.get()")
benchmark(lambda: data["999"], number=100000, label="dict[key]")
benchmark(lambda: data.setdefault("999", 0), number=100000, label="dict.setdefault()")

print("\n=== Sorting Methods ===")

import random
unsorted = list(range(1000))
random.shuffle(unsorted)

benchmark(lambda: sorted(unsorted), number=1000, label="sorted() — new list")
benchmark(lambda: unsorted.copy().sort(), number=1000, label="copy + sort() — in-place")

💡 timeit Tips

Use -r 5 (the default) or more repeats to get stable results. The command-line tool reports the best of the repeats, so more repeats make that minimum less sensitive to background noise on the machine.
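
The same "best of several repeats" idea is available from Python code through timeit.repeat. The sketch below is a minimal illustration, assuming a fixed loop count of 1000 picked arbitrarily; it also shows how the garbage collector can be re-enabled in the setup string when GC cost should be part of the measurement.

```python
import timeit

stmt = "[x * x for x in data]"
setup = "data = list(range(1000))"
number = 1000  # loops per run (arbitrary choice for this sketch)

# repeat() returns one total time per run; divide by `number` for per-loop time.
runs = [t / number for t in timeit.repeat(stmt, setup=setup, repeat=5, number=number)]
best, worst = min(runs), max(runs)
print(f"best of 5 : {best * 1e6:.1f} usec per loop")
print(f"worst of 5: {worst * 1e6:.1f} usec per loop")

# timeit disables the garbage collector while timing. Re-enable it in setup
# when GC cost is part of what should be measured.
with_gc = timeit.timeit(stmt, setup="import gc; gc.enable(); " + setup, number=number)
print(f"GC enabled: {with_gc / number * 1e6:.1f} usec per loop")
```

The minimum is usually the most useful figure: slower runs tend to reflect interference from other processes rather than the code under test.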

3. cProfile: Function-Level Profiling

cProfile is Python's built-in profiler. It reports call counts and execution time for every function in a program, which makes it the first tool to reach for when a program feels slow.

Basic cProfile Usage

"""cProfile: profiling a Python program."""
import cProfile
import pstats
from io import StringIO


# === Example program to profile ===
def fibonacci_recursive(n: int) -> int:
    """Recursive Fibonacci: very slow, O(2^n)."""
    if n <= 1:
        return n
    return fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2)


def fibonacci_memoized(n: int, memo: dict = None) -> int:
    """Fibonacci with memoization: O(n)."""
    if memo is None:
        memo = {}
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    memo[n] = fibonacci_memoized(n - 1, memo) + fibonacci_memoized(n - 2, memo)
    return memo[n]


def data_processing():
    """Process data: an example bottleneck."""
    data = list(range(10000))

    # Deliberately inefficient: explicit loop with append
    result = []
    for x in data:
        if x % 2 == 0:
            result.append(x ** 2)

    # Sort the results
    result.sort(reverse=True)

    return result[:100]


def main():
    """Main function to be profiled."""
    # Fibonacci
    fib_recursive = fibonacci_recursive(30)
    fib_memo = fibonacci_memoized(100)

    # Data processing
    top_data = data_processing()

    return fib_recursive, fib_memo, top_data


# === Profiling with cProfile ===
if __name__ == "__main__":
    # Approach 1: profile an entire function call
    profiler = cProfile.Profile()
    profiler.enable()

    result = main()

    profiler.disable()

    # Display the results
    stream = StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative")  # Sort by cumulative time
    stats.print_stats(20)  # Show the top 20 rows

    print(stream.getvalue())

Profiling from the Command Line

# Profile a Python script
python -m cProfile -s cumulative my_script.py

# Sort by internal time (tottime)
python -m cProfile -s tottime my_script.py

# Sort by number of calls
python -m cProfile -s calls my_script.py

# Save the profiling results to a file
python -m cProfile -o profile_output.prof my_script.py

# Analyze the profile file
python -c "
import pstats
stats = pstats.Stats('profile_output.prof')
stats.sort_stats('cumulative')
stats.print_stats(30)
"

Reading cProfile Output

"""Reading and analyzing cProfile output."""
import cProfile
import pstats
from io import StringIO


def profile_function(func, *args, **kwargs):
    """Helper for profiling an individual function."""
    profiler = cProfile.Profile()
    profiler.enable()

    result = func(*args, **kwargs)

    profiler.disable()

    # Analysis
    stream = StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative")
    stats.print_stats()

    output = stream.getvalue()
    print(output)

    return result, stats


# Example cProfile output, annotated (schematic: timings are placeholders and vary by machine):
"""
   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.045    0.045 main.py:25(main)
2692537/1    0.043    0.000    0.043    0.043 main.py:7(fibonacci_recursive)
        1    0.000    0.000    0.002    0.002 main.py:20(data_processing)
        1    0.001    0.001    0.001    0.001 {method 'sort' of 'list' objects}
"""
# Column guide:
# ncalls  = number of calls; shown as "total/primitive" for recursive functions
#           (fibonacci_recursive(30) makes 2692537 calls, 1 of them non-recursive)
# tottime = time spent in the function itself (excluding sub-calls)
# percall = tottime / ncalls
# cumtime = inclusive time (including sub-calls)
# percall = cumtime / primitive calls
# filename:lineno(function) = where the function is defined
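
To make the difference between tottime and cumtime concrete, the sketch below profiles a function that only delegates (`outer`) and one that does the actual work (`inner`), then reads the columns programmatically. Both function names are hypothetical. It assumes Python 3.9 or later, where pstats.Stats.get_stats_profile() is available; note that it keys results by bare function name, so functions sharing a name would collide.

```python
import cProfile
import pstats


def inner(n):
    """Does the real work, so most of its time is its own (tottime)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def outer(n):
    """Mostly delegates, so its time is almost entirely cumulative (cumtime)."""
    results = []
    for _ in range(20):
        results.append(inner(n))
    return results


profiler = cProfile.Profile()
profiler.enable()
outer(20_000)
profiler.disable()

# get_stats_profile() exposes the same columns that print_stats() displays.
profile_data = pstats.Stats(profiler).get_stats_profile()
for name in ("outer", "inner"):
    row = profile_data.func_profiles[name]
    print(f"{name:<6} ncalls={row.ncalls:<4} tottime={row.tottime:.3f} cumtime={row.cumtime:.3f}")
```

In the printed rows, `outer` should show a small tottime but a cumtime at least as large as `inner`'s, which is the pattern to look for when deciding whether a slow function is itself expensive or merely calls something expensive.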

Profile Decorator

"""Decorator for automatic profiling."""
import cProfile
import pstats
import functools
from io import StringIO


def profile(sort_by="cumulative", lines=20):
    """Decorator that profiles the decorated function on every call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            profiler.enable()
            result = func(*args, **kwargs)
            profiler.disable()

            stream = StringIO()
            stats = pstats.Stats(profiler, stream=stream)
            stats.sort_stats(sort_by)
            stats.print_stats(lines)

            print(f"\n📊 Profile for {func.__name__}:")
            print(stream.getvalue())
            return result
        return wrapper
    return decorator


@profile(sort_by="tottime", lines=10)
def process_data(n: int):
    """Example function being profiled."""
    data = [i ** 2 for i in range(n)]
    filtered = [x for x in data if x % 3 == 0]
    result = sorted(filtered, reverse=True)
    return result[:10]


# Run it: the profile is printed automatically
result = process_data(100000)

4. Analyzing Results with pstats

pstats is the module for reading and analyzing profile files produced by cProfile. It provides several ways to sort, filter, and display profile data.

"""Analyzing a profile with pstats."""
import pstats
from pstats import SortKey


# Load the profile from a file
stats = pstats.Stats("profile_output.prof")

# Different sort orders
print("=== Sorted by Cumulative Time ===")
stats.sort_stats(SortKey.CUMULATIVE)
stats.print_stats(20)

print("\n=== Sorted by Total Time ===")
stats.sort_stats(SortKey.TIME)
stats.print_stats(20)

print("\n=== Sorted by Calls ===")
stats.sort_stats(SortKey.CALLS)
stats.print_stats(20)

# Show only entries whose file or function name matches a pattern
print("\n=== Filter by Module ===")
stats.sort_stats(SortKey.CUMULATIVE)
stats.print_stats("my_module")

# Print callers (who calls this function)
print("\n=== Callers ===")
stats.print_callers("process_data")

# Print callees (which functions it calls)
print("\n=== Callees ===")
stats.print_callees("main")

# Visualize with snakeviz (install: pip install snakeviz)
# Run in a terminal:
# snakeviz profile_output.prof
# This opens an interactive visualization in the browser.

💡 Visualizing with snakeviz

Run pip install snakeviz, then snakeviz profile_output.prof, to get an interactive browser-based visualization that makes the call tree and the bottlenecks much easier to understand.
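
The save-then-analyze workflow can also be done entirely from Python, without the command line. The following is a minimal sketch: `build_squares` and `workload` are hypothetical functions, and the temporary directory is only there so the example cleans up nothing important.

```python
import cProfile
import io
import os
import pstats
import tempfile
from pstats import SortKey


def build_squares(n):
    return [i * i for i in range(n)]


def workload():
    return sorted(build_squares(50_000), reverse=True)[:10]


# Profile the workload and write the raw data to a .prof file.
profiler = cProfile.Profile()
profiler.runcall(workload)
prof_path = os.path.join(tempfile.mkdtemp(), "profile_output.prof")
profiler.dump_stats(prof_path)

# Load it back later (possibly in another process) and format a short report.
buffer = io.StringIO()
stats = pstats.Stats(prof_path, stream=buffer)
stats.strip_dirs().sort_stats(SortKey.TIME).print_stats(10)
report = buffer.getvalue()
print(report)
```

strip_dirs() removes leading directory paths from file names, which keeps the report narrow enough to read; the same .prof file can be opened with snakeviz.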

5. line_profiler: Line-by-Line Profiling

line_profiler reports how long each line of a function takes to run. This is very useful for pinpointing the specific lines that form a bottleneck.

Installation and Usage

# Installation
pip install line_profiler

# Using the @profile decorator
# Add @profile to each function you want to profile
# Then run:
kernprof -l -v my_script.py

# Options:
# -l : line-by-line profiling
# -v : verbose (print the results immediately)
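
Because kernprof injects `profile` into Python's builtins at run time, a script decorated with @profile raises NameError when run with plain python. One common workaround, sketched below as an assumption about how you might want the script to behave, is a no-op fallback decorator; `running_total` is a hypothetical example function.

```python
import builtins

# kernprof injects `profile` into builtins; outside kernprof the name is missing.
# Assumption: a no-op stand-in is acceptable when the script runs normally.
if not hasattr(builtins, "profile"):
    def profile(func):
        """Fallback decorator that returns the function unchanged."""
        return func


@profile
def running_total(values):
    total = 0
    for value in values:
        total += value
    return total


print(running_total(range(10)))  # 45
```

With this guard in place the same file runs both under kernprof -l -v and under a plain interpreter, so the decorators do not have to be removed after a profiling session.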

line_profiler Usage Example

"""Example usage of line_profiler.
Save as: line_profile_example.py
Run: kernprof -l -v line_profile_example.py
"""
import numpy as np


@profile  # Injected by kernprof at run time; no import needed
def matrix_operations(n: int):
    """Matrix operations with room for optimization."""

    # Create random matrices
    matrix_a = np.random.rand(n, n)
    matrix_b = np.random.rand(n, n)

    # Matrix multiplication
    result = np.dot(matrix_a, matrix_b)

    # Normalization
    norm = np.linalg.norm(result)
    normalized = result / norm

    # Statistics
    mean_val = np.mean(normalized)
    std_val = np.std(normalized)
    max_val = np.max(normalized)

    return normalized, mean_val, std_val, max_val


@profile
def data_transformation(data: list[int]) -> list[int]:
    """Transform data: an example bottleneck."""

    # Step 1: Filter
    filtered = []
    for x in data:                      # Can be slow
        if x % 2 == 0:
            filtered.append(x)

    # Step 2: Transform
    transformed = []
    for x in filtered:                  # Can be slow
        transformed.append(x ** 2 + 3 * x + 1)

    # Step 3: Sort
    transformed.sort()                  # O(n log n)

    return transformed


if __name__ == "__main__":
    result = matrix_operations(500)
    data = list(range(100000))
    result2 = data_transformation(data)

line_profiler Output

# Example line_profiler output (schematic; line numbers and timings are illustrative):
"""
Total time: 2.34567 s
File: line_profile_example.py
Function: matrix_operations at line 8

Line #  Hits     Time  Per Hit  % Time  Line Contents
=============================================================
     8                                           @profile
     9                                           def matrix_operations(n):
    10         1     12345  12345.0   0.5     matrix_a = np.random.rand(n, n)
    11         1     11234  11234.0   0.5     matrix_b = np.random.rand(n, n)
    12
    13         1   2300000 2300000.0  98.1     result = np.dot(matrix_a, matrix_b)
    14
    15         1      1234   1234.0   0.1     norm = np.linalg.norm(result)
    16         1      2345   2345.0   0.1     normalized = result / norm
    ...
"""
# Interpretation:
# Line 13 (np.dot) takes 98.1% of the time, so it is the bottleneck.
# Time is given in timer units (typically microseconds), so 2300000 is about 2.3 s.
# Optimization: use a BLAS-optimized NumPy build, or reduce the matrix dimensions

6. memory_profiler: Memory Profiling

memory_profiler shows the memory usage of each line of code in a program. This is important for finding memory leaks and reducing memory consumption.

Installation and Usage

# Installation
pip install memory_profiler
pip install matplotlib  # For plotting memory usage

# Run memory profiling
python -m memory_profiler my_script.py

# Profile with a memory graph
mprof run my_script.py
mprof plot  # Plot memory usage over time

Memory Profiling Example

"""memory_profiler: memory usage profiling example.
Save as: memory_profile_example.py
Run: python -m memory_profiler memory_profile_example.py
"""
from memory_profiler import profile
import sys


@profile
def load_large_data():
    """Example function that consumes a lot of memory."""

    # Method 1: Load all data into memory (memory-hungry)
    data_list = []
    for i in range(100000):
        data_list.append({
            "id": i,
            "name": f"User_{i}",
            "email": f"user{i}@example.com",
            "scores": list(range(10)),
        })

    # Method 2: Aggregate with a generator expression (no intermediate list,
    # although data_list itself is already in memory)
    total = sum(
        item["scores"][0]
        for item in data_list
    )

    # Method 3: Filter in memory
    filtered = [item for item in data_list if item["id"] < 1000]

    return total, len(filtered)


@profile
def process_with_pandas():
    """Example of processing data with pandas."""
    import pandas as pd
    import numpy as np

    # Create a large DataFrame
    df = pd.DataFrame({
        "id": range(100000),
        "value": np.random.randn(100000),
        "category": np.random.choice(["A", "B", "C"], 100000),
    })

    # Operations that add memory (each new column is a full-length array)
    df["squared"] = df["value"] ** 2
    df["label"] = df["category"].map({"A": "Alpha", "B": "Beta", "C": "Charlie"})

    # Aggregation
    summary = df.groupby("category").agg({
        "value": ["mean", "std"],
        "squared": "sum",
    })

    return summary


@profile
def streaming_vs_loading():
    """Comparison: streaming vs loading all data."""

    # Method 1: Load everything (memory-hungry)
    all_data = [list(range(1000)) for _ in range(1000)]  # 1M items

    # Method 2: Generator (memory-efficient)
    def data_generator():
        for i in range(1000):
            yield list(range(1000))

    # Sum using the generator
    total = 0
    for chunk in data_generator():
        total += sum(chunk)

    # Cleanup
    del all_data

    return total


if __name__ == "__main__":
    load_large_data()
    # process_with_pandas()  # Uncomment if pandas is installed
    streaming_vs_loading()

memory_profiler Output

# Example memory_profiler output (schematic; values vary by machine):
"""
Filename: memory_profile_example.py

Line #    Mem usage    Increment  Occurrences   Line Contents
============================================================
     8     45.2 MiB     0.0 MiB           1   @profile
     9                                         def load_large_data():
    10
    11     45.2 MiB     0.0 MiB           1       data_list = []
    12    156.7 MiB   111.5 MiB      100001       for i in range(100000):
    13    156.7 MiB     0.0 MiB      100000           data_list.append({...})
    14
    15    156.7 MiB     0.0 MiB           1       total = sum(...)
    16
    17    157.2 MiB     0.5 MiB           1       filtered = [item for item in ...]
    18
    19    157.2 MiB     0.0 MiB           1       return total, len(filtered)
"""
# Interpretation:
# Lines 12-13: data_list takes 111.5 MiB, which is the main problem.
# Line 17: the filter adds only 0.5 MiB, which is not a concern.
# Optimization: use a generator or chunked processing
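
The "generator or chunked processing" suggestion can be sketched with the standard library alone. The example below is an illustration under assumptions: `make_records`, `total_all_at_once`, `total_in_chunks`, and `peak_kib` are hypothetical names, the record shape loosely mirrors the example above, and tracemalloc is used in place of memory_profiler so that the block has no third-party dependency.

```python
import tracemalloc
from itertools import islice


def make_records(n):
    """Yield records lazily instead of building the whole list up front."""
    for i in range(n):
        yield {"id": i, "name": f"User_{i}", "scores": list(range(10))}


def total_all_at_once(n):
    records = list(make_records(n))            # every record is alive at once
    return sum(r["scores"][1] for r in records)


def total_in_chunks(n, chunk_size=1_000):
    records = make_records(n)
    total = 0
    # Only about chunk_size records need to be alive at any moment.
    while chunk := list(islice(records, chunk_size)):
        total += sum(r["scores"][1] for r in chunk)
    return total


def peak_kib(func, *args):
    """Run func and return (result, peak traced memory in KiB)."""
    tracemalloc.start()
    result = func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak / 1024


full_result, full_peak = peak_kib(total_all_at_once, 20_000)
chunk_result, chunk_peak = peak_kib(total_in_chunks, 20_000)
print(f"all at once: {full_peak:,.0f} KiB   chunked: {chunk_peak:,.0f} KiB")
```

Both functions compute the same total; the chunked version trades a little bookkeeping for a peak that depends on the chunk size rather than on the size of the whole data set.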

⚠️ Memory Leaks in Python

Memory leaks in Python usually come from: (1) circular references that are not garbage collected, (2) caches or accumulators that keep growing, (3) closures that capture large variables, or (4) global variables that keep accumulating data.

7. tracemalloc: Tracking Memory Allocations

tracemalloc is a built-in Python module that tracks memory allocations. Unlike memory_profiler, which only reports process memory (RSS), tracemalloc shows exactly which Python allocations were made and where.

"""tracemalloc: detailed tracking of memory allocations."""
import tracemalloc
import linecache


def display_top_allocations(snapshot, previous, key_type="lineno", limit=10):
    """Show the top memory allocations in snapshot, compared against previous."""
    top_stats = snapshot.compare_to(previous, key_type)

    print(f"\n📊 Top {limit} memory allocations:")
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        print(
            f"  #{index}: {frame.filename}:{frame.lineno}: "
            f"{stat.size / 1024:.1f} KiB"
        )
        # Show the source line
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print(f"         → {line}")

    other = top_stats[limit:]
    if other:
        size = sum(stat.size for stat in other)
        print(f"  Others: {size / 1024:.1f} KiB")

    total = sum(stat.size for stat in top_stats)
    print(f"  Total: {total / 1024:.1f} KiB")


# Start tracking
tracemalloc.start(25)  # Keep 25 frames per traceback for more detail

# Initial snapshot
previous_snapshot = tracemalloc.take_snapshot()

# === Memory allocations ===
# Example 1: Large list
big_list = [list(range(1000)) for _ in range(100)]

# Example 2: Large dictionary
big_dict = {f"key_{i}": list(range(100)) for i in range(1000)}

# Example 3: Unnecessarily large strings
strings = [f"string_{i}" * 100 for i in range(1000)]

# Snapshot after the allocations
current_snapshot = tracemalloc.take_snapshot()

# Show the comparison (the earlier snapshot is passed in explicitly)
display_top_allocations(current_snapshot, previous_snapshot, limit=10)

# General statistics
current, peak = tracemalloc.get_traced_memory()
print(f"\n📈 Current memory: {current / 1024:.1f} KiB")
print(f"📈 Peak memory: {peak / 1024:.1f} KiB")
# get_tracemalloc_memory() reports tracemalloc's own bookkeeping overhead
print(f"📈 tracemalloc overhead: {tracemalloc.get_tracemalloc_memory() / 1024:.1f} KiB")

# Stop tracking
tracemalloc.stop()

8. String & I/O Optimization

String and I/O operations are often unexpected bottlenecks. The techniques below are effective ways to speed them up.

"""Optimizing string and I/O operations."""
import time
import io
import sys


# ===== String Concatenation =====
def slow_concat(n: int) -> str:
    """❌ Slow: repeated += (also leaves a trailing comma)"""
    result = ""
    for i in range(n):
        result += str(i) + ","
    return result


def fast_concat(n: int) -> str:
    """✅ Fast: build a list, then join"""
    parts = [str(i) for i in range(n)]
    return ",".join(parts)


def fastest_concat(n: int) -> str:
    """✅ Concise: join over a generator expression (usually no faster than the list version, since join materializes its input first)"""
    return ",".join(str(i) for i in range(n))


# ===== String Formatting =====
def format_old(n: int) -> str:
    """❌ % formatting (old style)"""
    return "User %s has %d points" % ("Budi", n)


def format_format(n: int) -> str:
    """🟡 .format()"""
    return "User {} has {} points".format("Budi", n)


def format_fstring(n: int) -> str:
    """✅ f-string (usually the fastest and most readable)"""
    name = "Budi"
    return f"User {name} has {n} points"


# ===== I/O Optimization =====
def write_slow(filepath: str, data: list[str]):
    """❌ Open and close the file for every write"""
    for line in data:
        with open(filepath, "a") as f:
            f.write(line + "\n")


def write_fast(filepath: str, data: list[str]):
    """✅ Open once and write everything"""
    with open(filepath, "w") as f:
        f.writelines(line + "\n" for line in data)


def write_fastest(filepath: str, data: list[str]):
    """✅ Manual buffering: assemble in memory, then write once (measure before assuming it beats write_fast)"""
    buffer = io.StringIO()
    for line in data:
        buffer.write(line)
        buffer.write("\n")
    with open(filepath, "w") as f:
        f.write(buffer.getvalue())


# ===== Benchmark =====
if __name__ == "__main__":
    import timeit

    n = 10000

    t1 = timeit.timeit(lambda: slow_concat(n), number=10)
    t2 = timeit.timeit(lambda: fast_concat(n), number=10)
    t3 = timeit.timeit(lambda: fastest_concat(n), number=10)

    print("=== String Concatenation ===")
    print(f"  += concatenation : {t1:.4f}s")
    print(f"  join + list      : {t2:.4f}s")
    print(f"  join + generator : {t3:.4f}s")
    print(f"  Speedup: {t1/t3:.1f}x faster with join")

9. Data Structure Optimization

"""Data structure optimization: choosing the right structure."""
import timeit
import sys
from collections import deque, defaultdict, Counter


# ===== 1. List vs Set vs Dict Lookup =====
print("=== Lookup Performance ===")
n = 100000

data_list = list(range(n))
data_set = set(range(n))
data_dict = dict.fromkeys(range(n))

target = n - 1  # Worst case for the list

t_list = timeit.timeit(lambda: target in data_list, number=1000)
t_set = timeit.timeit(lambda: target in data_set, number=1000)
t_dict = timeit.timeit(lambda: target in data_dict, number=1000)

print(f"  List lookup : {t_list:.4f}s  (O(n))")
print(f"  Set lookup  : {t_set:.4f}s  (O(1))")
print(f"  Dict lookup : {t_dict:.4f}s  (O(1))")


# ===== 2. collections.defaultdict vs dict.setdefault =====
print("\n=== Grouping Data ===")

def group_with_setdefault(data):
    """Using dict.setdefault: more verbose."""
    result = {}
    for item in data:
        result.setdefault(item[0], []).append(item)
    return result

def group_with_defaultdict(data):
    """Using defaultdict: cleaner."""
    result = defaultdict(list)
    for item in data:
        result[item[0]].append(item)
    return result

sample_data = [("A", 1), ("B", 2), ("A", 3), ("C", 4), ("B", 5)] * 1000

t1 = timeit.timeit(lambda: group_with_setdefault(sample_data), number=100)
t2 = timeit.timeit(lambda: group_with_defaultdict(sample_data), number=100)

print(f"  dict.setdefault : {t1:.4f}s")
print(f"  defaultdict     : {t2:.4f}s")


# ===== 3. Counter vs Manual Counting =====
print("\n=== Counting Elements ===")

data = ["apple", "banana", "apple", "cherry", "banana", "apple"] * 1000

def count_manual(data):
    counts = {}
    for item in data:
        counts[item] = counts.get(item, 0) + 1
    return counts

def count_counter(data):
    return Counter(data)

t1 = timeit.timeit(lambda: count_manual(data), number=100)
t2 = timeit.timeit(lambda: count_counter(data), number=100)

print(f"  Manual counting : {t1:.4f}s")
print(f"  Counter         : {t2:.4f}s")


# ===== 4. deque vs list untuk Queue =====
print("\n=== Queue Performance (pop from front) ===")

queue_list = list(range(10000))
queue_deque = deque(range(10000))

t1 = timeit.timeit(lambda: queue_list.pop(0), number=100)
t2 = timeit.timeit(lambda: queue_deque.popleft(), number=100)

print(f"  list.pop(0)     : {t1:.4f}s  (O(n))")
print(f"  deque.popleft() : {t2:.4f}s  (O(1))")


# ===== 5. __slots__: Saving Memory =====
print("\n=== __slots__ Memory Savings ===")

class RegularClass:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

class SlottedClass:
    __slots__ = ["x", "y", "z"]
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

regular = RegularClass(1, 2, 3)
slotted = SlottedClass(1, 2, 3)

# A regular instance pays for the object itself plus its __dict__.
# sys.getsizeof is shallow: the attribute values are not included.
regular_size = sys.getsizeof(regular) + sys.getsizeof(regular.__dict__)
print(f"  Regular instance : {regular_size} bytes (object + __dict__)")
print(f"  Slotted instance : {sys.getsizeof(slotted)} bytes (no __dict__)")

# Create 100k instances of each
regulars = [RegularClass(i, i+1, i+2) for i in range(100000)]
slotteds = [SlottedClass(i, i+1, i+2) for i in range(100000)]

mem_regular = sum(sys.getsizeof(r) + sys.getsizeof(r.__dict__) for r in regulars)
mem_slotted = sum(sys.getsizeof(s) for s in slotteds)

print(f"  100k Regular : ~{mem_regular / 1024 / 1024:.1f} MiB")
print(f"  100k Slotted : ~{mem_slotted / 1024 / 1024:.1f} MiB")
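
Returning to the lookup comparison at the start of this section: in practice the data often arrives as a list, and the useful optimization is a one-off conversion to a set before repeated membership tests. The sketch below illustrates that pattern; `filter_with_list`, `filter_with_set`, and the sample sizes are hypothetical choices.

```python
import timeit

allowed_ids = list(range(0, 4_000, 2))     # arrives as a list
incoming = list(range(2_000))


def filter_with_list(incoming, allowed):
    # Each membership test scans the list: O(len(allowed)) per element.
    return [x for x in incoming if x in allowed]


def filter_with_set(incoming, allowed):
    allowed_set = set(allowed)             # one-off O(n) conversion
    # Each membership test is a hash lookup: O(1) on average.
    return [x for x in incoming if x in allowed_set]


# Same result either way; only the cost differs.
assert filter_with_list(incoming, allowed_ids) == filter_with_set(incoming, allowed_ids)

t_list = timeit.timeit(lambda: filter_with_list(incoming, allowed_ids), number=3)
t_set = timeit.timeit(lambda: filter_with_set(incoming, allowed_ids), number=3)
print(f"list membership: {t_list:.4f}s   set membership: {t_set:.4f}s")
```

The conversion only pays off when there are enough lookups to amortize it; for a single membership test, building the set costs more than scanning the list once.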

10. Loop & Comprehension Optimization

"""Loop and comprehension optimization."""
import timeit


# ===== 1. Comprehension vs Loop =====
print("=== List Creation Methods ===")

def with_loop(n):
    result = []
    for i in range(n):
        if i % 2 == 0:
            result.append(i ** 2)
    return result

def with_comprehension(n):
    # Same filter as with_loop, so the benchmark compares like with like
    return [i ** 2 for i in range(n) if i % 2 == 0]

def with_generator(n):
    return list(i ** 2 for i in range(n) if i % 2 == 0)

def with_map_filter(n):
    return list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, range(n))))

n = 100000
t1 = timeit.timeit(lambda: with_loop(n), number=100)
t2 = timeit.timeit(lambda: with_comprehension(n), number=100)
t3 = timeit.timeit(lambda: with_generator(n), number=100)
t4 = timeit.timeit(lambda: with_map_filter(n), number=100)

print(f"  Loop + append     : {t1:.4f}s")
print(f"  Comprehension     : {t2:.4f}s  ({t1/t2:.1f}x faster)")
print(f"  Generator + list  : {t3:.4f}s")
print(f"  map + filter      : {t4:.4f}s")


# ===== 2. Dictionary Comprehension =====
print("\n=== Dict Creation ===")

def dict_with_loop(n):
    result = {}
    for i in range(n):
        result[str(i)] = i ** 2
    return result

def dict_comprehension(n):
    return {str(i): i ** 2 for i in range(n)}

t1 = timeit.timeit(lambda: dict_with_loop(n), number=100)
t2 = timeit.timeit(lambda: dict_comprehension(n), number=100)

print(f"  Loop              : {t1:.4f}s")
print(f"  Comprehension     : {t2:.4f}s  ({t1/t2:.1f}x faster)")


# ===== 3. Loop Optimizations =====
print("\n=== Loop Optimization Tips ===")

data = list(range(10000))

# ❌ Slow: accumulate in a Python-level loop
def slow_sum(data):
    total = 0
    for x in data:
        total += x
    return total

# ✅ Fast: built-in sum() runs the loop in C
def fast_sum(data):
    return sum(data)

t1 = timeit.timeit(lambda: slow_sum(data), number=1000)
t2 = timeit.timeit(lambda: fast_sum(data), number=1000)

print(f"  Manual loop sum   : {t1:.4f}s")
print(f"  Built-in sum()    : {t2:.4f}s  ({t1/t2:.1f}x faster)")


# ===== 4. Local Variable vs Global =====
print("\n=== Local vs Global Variable Access ===")

global_data = list(range(1000))
OFFSET = 1  # Module-level name that is read inside the loop

def sum_global():
    total = 0
    for x in global_data:
        total += x + OFFSET      # Global lookup on every iteration
    return total

def sum_local(data=None, offset=OFFSET):
    if data is None:
        data = global_data
    total = 0
    for x in data:
        total += x + offset      # Local lookup on every iteration
    return total

# The gap is usually small on recent CPython versions; measure before relying on it
t1 = timeit.timeit(sum_global, number=10000)
t2 = timeit.timeit(sum_local, number=10000)

print(f"  Global variable   : {t1:.4f}s")
print(f"  Local parameter   : {t2:.4f}s  ({t1/t2:.1f}x faster)")


# ===== 5. Enumerate vs Manual Counter =====
print("\n=== Enumerate vs Counter ===")

def with_counter(n):
    i = 0
    result = []
    for x in range(n):
        result.append((i, x))
        i += 1
    return result

def with_enumerate(n):
    return [(i, x) for i, x in enumerate(range(n))]

t1 = timeit.timeit(lambda: with_counter(10000), number=1000)
t2 = timeit.timeit(lambda: with_enumerate(10000), number=1000)

print(f"  Manual counter    : {t1:.4f}s")
print(f"  enumerate()       : {t2:.4f}s")

11. Async & Concurrency

For programs that do a lot of I/O (network requests, file reads, database queries), async and concurrency can improve performance dramatically.

"""Async & Concurrent: for I/O-bound tasks."""
import asyncio
import aiohttp
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import multiprocessing


# ===== Sync vs Async Comparison =====
async def fetch_url_async(session, url: str) -> str:
    """Fetch a URL asynchronously."""
    async with session.get(url) as response:
        return await response.text()


async def fetch_multiple_async(urls: list[str]) -> list[str]:
    """Fetch multiple URLs concurrently."""
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url_async(session, url) for url in urls]
        return await asyncio.gather(*tasks)


# ===== ThreadPoolExecutor: for I/O-bound work =====
def download_file(url: str) -> int:
    """Simulate downloading a file."""
    import random
    time.sleep(random.uniform(0.1, 0.5))  # Simulated network delay
    return len(url)


def download_with_threads(urls: list[str]) -> list[int]:
    """Download using ThreadPoolExecutor."""
    with ThreadPoolExecutor(max_workers=10) as executor:
        results = list(executor.map(download_file, urls))
    return results


# ===== ProcessPoolExecutor: for CPU-bound work =====
def cpu_intensive_task(n: int) -> int:
    """Simulate a CPU-intensive task."""
    return sum(i ** 2 for i in range(n))


def process_with_pool(numbers: list[int]) -> list[int]:
    """Process using ProcessPoolExecutor."""
    with ProcessPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
        results = list(executor.map(cpu_intensive_task, numbers))
    return results


# ===== Benchmark =====
if __name__ == "__main__":
    urls = [f"https://httpbin.org/delay/{i % 3}" for i in range(10)]

    # Sequential (slow)
    start = time.time()
    results_seq = [download_file(url) for url in urls]
    time_seq = time.time() - start
    print(f"Sequential  : {time_seq:.2f}s")

    # ThreadPool (fast for I/O)
    start = time.time()
    results_thread = download_with_threads(urls)
    time_thread = time.time() - start
    print(f"ThreadPool  : {time_thread:.2f}s  ({time_seq/time_thread:.1f}x faster)")

    # Async (well suited to I/O). Note: this path makes real HTTP requests to
    # httpbin.org, while the two above only sleep, so the ratio is indicative only.
    start = time.time()
    results_async = asyncio.run(fetch_multiple_async(urls))
    time_async = time.time() - start
    print(f"Async       : {time_async:.2f}s  ({time_seq/time_async:.1f}x faster)")

    # CPU-bound with ProcessPool
    numbers = [10**6] * 8

    start = time.time()
    results_seq = [cpu_intensive_task(n) for n in numbers]
    time_seq = time.time() - start
    print(f"\nCPU Sequential : {time_seq:.2f}s")

    start = time.time()
    results_par = process_with_pool(numbers)
    time_par = time.time() - start
    print(f"CPU Parallel   : {time_par:.2f}s  ({time_seq/time_par:.1f}x faster)")

📋 When to Use What?

Task Type	Tool	Reason
I/O Bound (network, file)	asyncio / ThreadPool	The GIL is not a problem because threads spend their time waiting on I/O
CPU Bound (computation)	ProcessPool / multiprocessing	Bypasses the GIL with separate processes
Mixed	asyncio + ProcessPool	Combine both
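
For the I/O-bound row, the effect of asyncio can be seen without aiohttp or network access. The sketch below is a self-contained illustration in which asyncio.sleep stands in for a network call; `fake_request`, `run_sequential`, `run_concurrent`, and `timed` are hypothetical names, and the 0.1 s delay is an arbitrary assumption.

```python
import asyncio
import time


async def fake_request(request_id, delay=0.1):
    """Stand-in for a network call: asyncio.sleep yields control like real I/O."""
    await asyncio.sleep(delay)
    return request_id


async def run_sequential(n):
    # Each await finishes before the next request starts.
    return [await fake_request(i) for i in range(n)]


async def run_concurrent(n):
    # gather() schedules all requests at once and preserves result order.
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


def timed(coro_func, n):
    start = time.perf_counter()
    result = asyncio.run(coro_func(n))
    return result, time.perf_counter() - start


seq_result, seq_time = timed(run_sequential, 5)
con_result, con_time = timed(run_concurrent, 5)
print(f"sequential: {seq_time:.2f}s   concurrent: {con_time:.2f}s")
```

The sequential version takes roughly the sum of the delays, while the concurrent version takes roughly the longest single delay, which is the whole benefit of overlapping waits on a single thread.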

12. Comprehension Quiz

Test your understanding of Python profiling and optimization:

1. Which tool is best suited to profiling code line by line?

2. Which column in cProfile output shows total time including sub-functions?

3. Which data structure is most efficient for lookup operations?

4. For CPU-bound tasks, which concurrency tool is appropriate?

5. Why is a list comprehension faster than a loop with append?