mitmproxy Tutorial: Intercept, Inspect & Modify HTTP Traffic

mitmproxy is the Swiss Army knife of proxy debugging. It intercepts HTTP/HTTPS traffic, lets you inspect every request and response in real time, modify traffic on the fly with Python scripts, and replay captured sessions. For web scrapers, it is indispensable for reverse-engineering APIs, debugging blocked requests, and understanding anti-bot behavior.

This tutorial covers installation, basic usage, scripting, and real-world scraping applications.

Installation

# macOS
brew install mitmproxy

# Linux
pip install mitmproxy

# Windows
pip install mitmproxy

# Docker
docker run --rm -it -p 8080:8080 mitmproxy/mitmproxy

# Verify installation
mitmproxy --version

Three mitmproxy Interfaces

mitmproxy ships with three interfaces:

| Command   | Interface       | Use Case              |
|-----------|-----------------|-----------------------|
| mitmproxy | Interactive TUI | Manual inspection     |
| mitmweb   | Web-based GUI   | Visual debugging      |
| mitmdump  | Command line    | Scripting, automation |

Quick Start

Step 1: Start mitmproxy

# Start on default port 8080
mitmproxy -p 8080

# Or use the web interface
mitmweb -p 8080
# Open http://127.0.0.1:8081 for the web UI

Step 2: Configure Client to Use the Proxy

# curl
curl -x http://localhost:8080 http://httpbin.org/get

# Python
export HTTP_PROXY=http://localhost:8080
export HTTPS_PROXY=http://localhost:8080
python my_scraper.py

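# Or configure in code (Python)
import requests
proxies = {
    'http': 'http://localhost:8080',
    'https': 'http://localhost:8080',
}
# verify=False skips certificate checks; use it only until the CA certificate from Step 3 is installed
r = requests.get('https://httpbin.org/get', proxies=proxies, verify=False)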

Step 3: Install the CA Certificate (for HTTPS)

mitmproxy generates a CA certificate for TLS interception the first time it runs. Install it on the client to avoid SSL errors:

# The CA cert is at ~/.mitmproxy/mitmproxy-ca-cert.pem

# macOS: Add to system keychain
sudo security add-trusted-cert -d -r trustRoot \
    -k /Library/Keychains/System.keychain \
    ~/.mitmproxy/mitmproxy-ca-cert.pem

# Linux (Debian/Ubuntu): Copy and update the trust store
sudo cp ~/.mitmproxy/mitmproxy-ca-cert.pem /usr/local/share/ca-certificates/mitmproxy.crt
sudo update-ca-certificates

# Python requests: Set env var
export REQUESTS_CA_BUNDLE="$HOME/.mitmproxy/mitmproxy-ca-cert.pem"

# Or use the verify parameter (Python does not expand "~", so expand it explicitly)
import os
requests.get(url, proxies=proxies, verify=os.path.expanduser('~/.mitmproxy/mitmproxy-ca-cert.pem'))
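
The proxy address and certificate path tend to get repeated across scripts, so it can help to build them in one place. The following is a minimal sketch, assuming the default `~/.mitmproxy` directory and the port 8080 used above; `mitmproxy_request_kwargs` is a hypothetical helper name, not part of mitmproxy or requests.

```python
from pathlib import Path

# Default location of the CA certificate that mitmproxy generates (assumed)
CA_CERT = Path.home() / ".mitmproxy" / "mitmproxy-ca-cert.pem"


def mitmproxy_request_kwargs(host="localhost", port=8080, ca_cert=CA_CERT):
    """Build keyword arguments for requests.get() / requests.post().

    Hypothetical helper: if the CA file is missing it falls back to
    verify=False, which is acceptable only for local debugging.
    """
    proxy_url = f"http://{host}:{port}"
    ca_path = Path(ca_cert).expanduser()  # expand "~" explicitly
    return {
        "proxies": {"http": proxy_url, "https": proxy_url},
        "verify": str(ca_path) if ca_path.is_file() else False,
    }
```

A call such as `requests.get(url, **mitmproxy_request_kwargs())` then routes the request through the proxy and verifies against the mitmproxy CA when the file exists.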

Writing mitmproxy Scripts (Addons)

The real power of mitmproxy is Python scripting. Create addons that intercept and modify traffic:

Basic Request Logger

# logger.py
from mitmproxy import http

class RequestLogger:
    def __init__(self):
        self.count = 0

    def request(self, flow: http.HTTPFlow):
        """Called for each request."""
        self.count += 1
        print(f"[{self.count}] {flow.request.method} {flow.request.pretty_url}")
        print(f"  Headers: {dict(flow.request.headers)}")

    def response(self, flow: http.HTTPFlow):
        """Called for each response."""
        # content can be None when the body is unavailable
        size = len(flow.response.content or b"")
        print(f"  → {flow.response.status_code} ({size} bytes)")

addons = [RequestLogger()]

# Run: mitmdump -s logger.py -p 8080

API Discovery Script

# api_discovery.py
"""Discover internal API endpoints used by a website."""
from mitmproxy import http
import json

class APIDiscovery:
    def __init__(self):
        self.apis = []

    def response(self, flow: http.HTTPFlow):
        content_type = flow.response.headers.get('content-type', '')

        # Find JSON API responses
        if 'application/json' in content_type:
            body = flow.response.content or b''
            try:
                data = json.loads(body)
            except ValueError:
                # Not valid JSON (or not valid UTF-8) despite the header
                return

            # Describe the top-level shape; JSON can also be a bare
            # string, number, boolean, or null
            if isinstance(data, dict):
                response_keys = list(data.keys())
            elif isinstance(data, list):
                response_keys = f"array[{len(data)}]"
            else:
                response_keys = type(data).__name__

            api_info = {
                'url': flow.request.pretty_url,
                'method': flow.request.method,
                'status': flow.response.status_code,
                'response_size': len(body),
                'response_keys': response_keys,
                'request_headers': dict(flow.request.headers),
            }
            self.apis.append(api_info)
            print("\n=== API FOUND ===")
            print(f"  {flow.request.method} {flow.request.pretty_url}")
            print(f"  Response keys: {api_info['response_keys']}")
            print(f"  Size: {api_info['response_size']} bytes")

    def done(self):
        """Called when mitmproxy shuts down."""
        with open('discovered_apis.json', 'w') as f:
            json.dump(self.apis, f, indent=2)
        print(f"\nSaved {len(self.apis)} API endpoints to discovered_apis.json")

addons = [APIDiscovery()]
# Run: mitmdump -s api_discovery.py -p 8080
# Browse the target website, then Ctrl+C to save results
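
A browsing session usually hits the same endpoint many times with different query strings, so the saved file benefits from a summary pass. The following is a sketch of one way to group the entries, assuming the record layout written by the script above (`url`, `method`, `status`, `response_keys`); `summarize_endpoints` is a hypothetical helper name.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def summarize_endpoints(apis):
    """Group captured API calls by (method, host, path), ignoring query strings."""
    summary = defaultdict(lambda: {"hits": 0, "statuses": set(), "keys": set()})
    for api in apis:
        parts = urlsplit(api["url"])
        entry = summary[(api["method"], parts.netloc, parts.path)]
        entry["hits"] += 1
        entry["statuses"].add(api["status"])
        # response_keys is a list for JSON objects and a string otherwise
        if isinstance(api.get("response_keys"), list):
            entry["keys"].update(api["response_keys"])
    return dict(summary)
```

To use it, load `discovered_apis.json` with `json.load` and pass the resulting list to `summarize_endpoints`; each key in the result is one distinct endpoint.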

Request Modifier

# modifier.py
"""Modify requests and responses in transit."""
from mitmproxy import http

class RequestModifier:
    def request(self, flow: http.HTTPFlow):
        # Change User-Agent
        flow.request.headers['User-Agent'] = (
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
            'AppleWebKit/537.36 (KHTML, like Gecko) '
            'Chrome/120.0.0.0 Safari/537.36'
        )

        # Add custom headers
        flow.request.headers['Accept-Language'] = 'en-US,en;q=0.9'

        # Remove tracking headers
        for header in ['x-request-id', 'x-correlation-id']:
            if header in flow.request.headers:
                del flow.request.headers[header]

    def response(self, flow: http.HTTPFlow):
        # Log rate limit headers to study throttling behavior
        for header in ['x-ratelimit-remaining', 'retry-after']:
            if header in flow.response.headers:
                print(f"  Rate limit: {header}={flow.response.headers[header]}")

        # Inject custom response header
        flow.response.headers['x-proxied-by'] = 'mitmproxy'

addons = [RequestModifier()]
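
The modifier above rewrites every request that passes through the proxy. When only one site is under study, scoping the changes to that host avoids side effects on unrelated traffic. The following is a sketch under stated assumptions: `api.example.com` and the header values are placeholders, `ScopedModifier` is a hypothetical class name, and the code relies only on the `flow.request.pretty_host` and `flow.request.headers` attributes of a mitmproxy flow.

```python
class ScopedModifier:
    """Apply header overrides only to requests for selected hosts."""

    def __init__(self, hosts, overrides):
        self.hosts = {h.lower() for h in hosts}
        self.overrides = dict(overrides)

    def request(self, flow):
        # Leave traffic for every other host untouched
        if flow.request.pretty_host.lower() not in self.hosts:
            return
        for name, value in self.overrides.items():
            flow.request.headers[name] = value


# Placeholder host and header values, for illustration only
addons = [
    ScopedModifier(
        hosts=["api.example.com"],
        overrides={"Accept-Language": "en-US,en;q=0.9"},
    )
]
```

Because the class needs no mitmproxy imports, it can be checked with a stand-in flow object before loading it with `mitmdump -s`.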

Anti-Bot Analysis Script

# antibot_analyzer.py
"""Analyze anti-bot challenges and fingerprinting attempts."""
from mitmproxy import http
import re

class AntiBotAnalyzer:
    def __init__(self):
        self.fingerprint_scripts = []
        self.challenges = []

    def response(self, flow: http.HTTPFlow):
        content_type = flow.response.headers.get('content-type', '')

        # Detect Cloudflare challenge
        if flow.response.status_code == 403:
            body = flow.response.content.decode('utf-8', errors='ignore')
            if 'cf-browser-verification' in body or 'cloudflare' in body.lower():
                print(f"  CLOUDFLARE CHALLENGE: {flow.request.pretty_url}")
                self.challenges.append({
                    'type': 'cloudflare',
                    'url': flow.request.pretty_url,
                })

        # Detect fingerprinting JavaScript
        if 'javascript' in content_type:
            body = flow.response.content.decode('utf-8', errors='ignore')
            fp_indicators = [
                'canvas.toDataURL', 'webgl', 'AudioContext',
                'navigator.plugins', 'screen.width',
                'window.chrome', 'Notification.permission',
            ]
            found = [i for i in fp_indicators if i in body]
            if found:
                print(f"  FINGERPRINTING: {flow.request.pretty_url}")
                print(f"    Techniques: {', '.join(found)}")

        # Detect CAPTCHA
        if flow.response.status_code in [403, 429]:
            body = flow.response.content.decode('utf-8', errors='ignore')
            if any(x in body for x in ['recaptcha', 'hcaptcha', 'turnstile']):
                print(f"  CAPTCHA DETECTED: {flow.request.pretty_url}")

addons = [AntiBotAnalyzer()]
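
The block detection can also live in a plain function, which makes it possible to test against saved response bodies without running the proxy. The following is a sketch that reuses the status codes and marker strings from the script above; `classify_block` is a hypothetical name, and unlike the addon it matches case-insensitively.

```python
CAPTCHA_MARKERS = ("recaptcha", "hcaptcha", "turnstile")
CLOUDFLARE_MARKERS = ("cf-browser-verification", "cloudflare")


def classify_block(status_code, body):
    """Return labels describing why a response looks like a block page."""
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="ignore")
    lowered = body.lower()

    labels = []
    if status_code == 403 and any(m in lowered for m in CLOUDFLARE_MARKERS):
        labels.append("cloudflare")
    if status_code in (403, 429) and any(m in lowered for m in CAPTCHA_MARKERS):
        labels.append("captcha")
    return labels
```

Inside an addon, the call would be `classify_block(flow.response.status_code, flow.response.content or b"")`.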

Advanced Usage

Capture and Replay

# Capture traffic to file
mitmdump -w captured_traffic.flow -p 8080

# Replay the captured client requests
mitmdump --client-replay captured_traffic.flow

# Replay through another proxy
mitmdump --client-replay captured_traffic.flow --mode upstream:http://other-proxy:8080

Filter Specific Traffic

# Only show specific domains
mitmdump -p 8080 --set flow_detail=2 "~d api.example.com"

# Only show JSON responses (~ts matches the response Content-Type)
mitmdump -p 8080 "~ts application/json"

# Only show POST requests
mitmdump -p 8080 "~m POST"

# Combine filters (~c matches the response status code)
mitmdump -p 8080 "~d api.example.com & ~m POST & ~c 200"
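
When matching flows need custom handling rather than just display, the same conditions can be checked inside an addon. The following is a sketch that approximates the combined filter above (domain, POST method, status 200) with plain attribute checks; `matches` and `MatchPrinter` are hypothetical names, and the domain test is a simple regex search on the host.

```python
import re


def matches(flow, domain=r"api\.example\.com", method="POST", status=200):
    """Approximate a domain + method + status-code filter for one flow."""
    if flow.response is None:  # no response yet, so the status cannot match
        return False
    return (
        re.search(domain, flow.request.pretty_host, re.IGNORECASE) is not None
        and flow.request.method.upper() == method.upper()
        and flow.response.status_code == status
    )


class MatchPrinter:
    def response(self, flow):
        """Print only the flows that satisfy all three conditions."""
        if matches(flow):
            print(f"MATCH {flow.request.method} {flow.request.pretty_url}")


addons = [MatchPrinter()]
```

For display-only filtering the command-line expressions remain simpler; an addon is worth the extra code when each match should trigger something, such as saving the body.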

Chain with Upstream Proxy

# Route mitmproxy through another proxy (credentials go in --upstream-auth)
mitmproxy --mode upstream:http://proxy.example.com:8080 --upstream-auth user:pass -p 9090

# Now: Client → mitmproxy:9090 → upstream-proxy:8080 → Target
# You can inspect traffic while using a residential proxy

Use Cases for Web Scraping

| Use Case                   | How mitmproxy Helps                             |
|----------------------------|-------------------------------------------------|
| Reverse-engineer APIs      | Discover internal endpoints and parameters      |
| Debug blocked requests     | Compare blocked vs successful request headers   |
| Analyze anti-bot           | See what fingerprinting scripts are loaded      |
| Extract authentication     | Capture tokens, cookies, session setup          |
| Compare browser vs scraper | See exact differences in requests               |
| Record and replay          | Develop offline against captured traffic        |
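
Comparing a browser request with a scraper request mostly comes down to diffing two header sets. The following is a minimal sketch of such a comparison, assuming each request's headers are available as a plain dictionary (for example the `request_headers` field saved by the API discovery script); `diff_headers` is a hypothetical helper name.

```python
def diff_headers(browser, scraper):
    """Compare two header mappings, treating header names case-insensitively.

    Returns (missing, extra, changed):
      missing - names the browser sent but the scraper did not
      extra   - names only the scraper sent
      changed - {name: (browser_value, scraper_value)} where values differ
    """
    b = {k.lower(): v for k, v in browser.items()}
    s = {k.lower(): v for k, v in scraper.items()}
    missing = sorted(b.keys() - s.keys())
    extra = sorted(s.keys() - b.keys())
    changed = {
        name: (b[name], s[name])
        for name in sorted(b.keys() & s.keys())
        if b[name] != s[name]
    }
    return missing, extra, changed
```

Headers in `missing` are usually the first thing to add to the scraper. Note that a dictionary does not preserve duplicate headers or reveal the order in which they were sent, both of which some anti-bot systems inspect.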

FAQ

Does mitmproxy work with HTTPS?

Yes. mitmproxy generates a CA certificate and performs TLS interception (MITM). You need to install the CA cert on your client/system to avoid SSL errors. Without the CA cert installed, you will get certificate warnings.

Can I use mitmproxy with Playwright or Puppeteer?

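Yes. Configure the browser to use mitmproxy as its proxy and install the CA certificate. Playwright takes a proxy setting at launch and an option to ignore HTTPS errors on the browser context (ignore_https_errors in Python). Puppeteer can pass Chromium's --proxy-server and --ignore-certificate-errors flags through its launch arguments. Either way, you can inspect all browser network traffic.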
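
As an illustration of the Playwright side, the following sketch uses Playwright's Python sync API to send a Chromium session through the proxy. It assumes mitmproxy is listening on localhost:8080 and that the `playwright` package and its Chromium build are installed; `fetch_title` is a hypothetical function name.

```python
MITMPROXY_URL = "http://localhost:8080"  # assumes mitmproxy on its default port


def fetch_title(url, proxy_url=MITMPROXY_URL):
    """Load a page in Chromium with all traffic routed through mitmproxy."""
    # Imported lazily so this module loads even where Playwright is absent
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(proxy={"server": proxy_url})
        # ignore_https_errors skips certificate checks; installing the
        # mitmproxy CA certificate is the cleaner option when possible
        context = browser.new_context(ignore_https_errors=True)
        page = context.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
    return title
```

With mitmproxy or mitmweb running, calling `fetch_title("https://example.com")` should make the page's requests appear in the flow list.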

Will websites detect that I am using mitmproxy?

mitmproxy’s TLS interception creates a different TLS fingerprint than a real browser (the JA3 hash will differ). Some anti-bot systems detect this. For stealth, use mitmproxy only for analysis, then apply findings to your scraper using curl-impersonate or real browsers.

How much does mitmproxy slow down my traffic?

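mitmproxy adds a small amount of latency per request for TLS interception, typically a few milliseconds, though the exact figure depends on your hardware and on the addons you load. For debugging and development, this is negligible. For production scraping, remove mitmproxy from the chain after analysis.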

Can mitmproxy handle thousands of concurrent connections?

mitmproxy is designed for development and debugging, not production traffic. It handles hundreds of concurrent connections well. For high-throughput interception, use mitmdump (lower overhead) or consider dedicated solutions.

