Web Scraping Monitoring with Grafana: Complete Setup
Grafana transforms raw scraping metrics into visual dashboards that let you spot problems before they impact your data collection. Combined with Prometheus for metric storage, this monitoring stack provides real-time visibility into every aspect of your web scraping infrastructure.
Metrics to Monitor
| Category | Metrics | Why |
|---|---|---|
| Throughput | Requests/sec, pages/min | Capacity planning |
| Success Rate | 2xx vs 4xx/5xx ratio | Quality indicator |
| Latency | P50, P95, P99 response times | Performance |
| Proxy Health | Pool size, dead proxies | Infrastructure |
| Data Quality | Items extracted, empty responses | Output value |
| Errors | CAPTCHA rate, block rate, timeouts | Problem detection |
Prometheus Metrics in Python
```python
from urllib.parse import urlparse
import time

import requests
from prometheus_client import Counter, Histogram, Gauge, start_http_server

SCRAPE_REQUESTS = Counter('scraper_requests_total', 'Total requests', ['status', 'domain'])
SCRAPE_DURATION = Histogram('scraper_duration_seconds', 'Request duration', ['domain'])
PROXY_POOL = Gauge('proxy_pool_size', 'Proxy pool by status', ['status'])
ITEMS_EXTRACTED = Counter('items_extracted_total', 'Data items extracted', ['type'])
ACTIVE_WORKERS = Gauge('scraper_active_workers', 'Active scraping workers')
CAPTCHA_ENCOUNTERS = Counter('captcha_encounters_total', 'CAPTCHA challenges', ['domain'])

class MonitoredScraper:
    def __init__(self):
        # Expose metrics at http://localhost:8000/metrics for Prometheus to scrape
        start_http_server(8000)

    def scrape_page(self, url, proxy):
        domain = urlparse(url).netloc
        ACTIVE_WORKERS.inc()
        start = time.time()
        try:
            resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=15)
            SCRAPE_DURATION.labels(domain=domain).observe(time.time() - start)
            if resp.status_code == 200:
                SCRAPE_REQUESTS.labels(status='success', domain=domain).inc()
                if 'captcha' in resp.text.lower():
                    CAPTCHA_ENCOUNTERS.labels(domain=domain).inc()
                return resp
            SCRAPE_REQUESTS.labels(status=f'http_{resp.status_code}', domain=domain).inc()
            return None
        except requests.RequestException:
            SCRAPE_REQUESTS.labels(status='error', domain=domain).inc()
            return None
        finally:
            ACTIVE_WORKERS.dec()
```

Grafana Dashboard JSON
Build the dashboard from the panels below; the full step-by-step setup follows in the Complete Dashboard Setup section.
Key Panels
- Request Rate — `rate(scraper_requests_total[5m])` — Time series showing requests per second
- Success Rate — `sum(rate(scraper_requests_total{status="success"}[5m])) / sum(rate(scraper_requests_total[5m])) * 100` — Single stat showing percentage
- P95 Latency — `histogram_quantile(0.95, rate(scraper_duration_seconds_bucket[5m]))` — Time series
- Active Workers — `scraper_active_workers` — Gauge
- CAPTCHA Rate — `rate(captcha_encounters_total[5m])` — Alert threshold
- Proxy Pool Health — `proxy_pool_size` — Stacked bar by status
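To see what the P95 panel computes, `histogram_quantile` can be sketched in plain Python: it finds the cumulative bucket containing the target rank and interpolates linearly inside it. This is a simplified illustration with hypothetical bucket counts; PromQL's real implementation also handles `+Inf` buckets and per-series rates.

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets,
    mirroring PromQL's linear interpolation within a bucket.
    buckets: sorted list of (upper_bound, cumulative_count)."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            # Interpolate linearly between the bucket's lower and upper bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Cumulative counts: 60 requests <= 0.5s, 90 <= 1s, 100 <= 2s
buckets = [(0.5, 60), (1.0, 90), (2.0, 100)]
print(histogram_quantile(0.95, buckets))  # -> 1.5
```

The 95th request falls in the 1s–2s bucket, halfway through it, hence 1.5s.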
Alert Rules
```yaml
groups:
  - name: scraping_alerts
    rules:
      - alert: HighErrorRate
        expr: sum(rate(scraper_requests_total{status!="success"}[5m])) / sum(rate(scraper_requests_total[5m])) > 0.2
        for: 5m
        annotations:
          summary: "Error rate above 20%"
      - alert: CAPTCHASpike
        expr: rate(captcha_encounters_total[5m]) > 0.1
        for: 3m
        annotations:
          summary: "CAPTCHA rate increasing"
      - alert: ProxyPoolLow
        expr: proxy_pool_size{status="healthy"} < 10
        for: 2m
        annotations:
          summary: "Healthy proxy pool below 10"
```

FAQ
What is the minimum Grafana setup for scraping monitoring?
Start with three panels: success rate, request latency, and error count. These three metrics give you immediate visibility into whether your scraping is working. Add proxy-specific metrics as your operation grows.
Can I monitor proxies from multiple providers?
Yes. Tag your Prometheus metrics with a provider label and create Grafana panels that compare performance across proxy providers. This helps identify which provider performs best for specific targets.
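The comparison that a `provider` label enables can be sketched in plain Python (hypothetical sample counts, not live Prometheus data):

```python
# Per-provider request counters, as a `provider` label would partition them
counts = {
    "provider_a": {"success": 970, "total": 1000},
    "provider_b": {"success": 840, "total": 1000},
}

def success_rates(counts):
    """Return success percentage per provider, best performer first."""
    rates = {p: 100.0 * c["success"] / c["total"] for p, c in counts.items()}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)

print(success_rates(counts))
# -> [('provider_a', 97.0), ('provider_b', 84.0)]
```

In Grafana the equivalent comparison is a single query, e.g. `sum by (provider)(rate(proxy_success_total[5m]))`.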
How long should I retain scraping metrics?
Keep high-resolution data (15s intervals) for 7 days and downsampled data for 90 days. This lets you investigate recent issues in detail while maintaining long-term trend visibility.
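Downsampling itself is handled by tools such as Thanos or Mimir rather than by Grafana, but the core idea reduces to averaging fixed windows. A sketch with hypothetical 15-second samples:

```python
def downsample(samples, factor):
    """Average consecutive groups of `factor` samples.
    E.g. factor=20 turns 15s resolution into 5m resolution."""
    return [
        sum(samples[i:i + factor]) / len(samples[i:i + factor])
        for i in range(0, len(samples), factor)
    ]

# Forty 15s samples (10 minutes) -> two 5m averages
samples = [1.0] * 20 + [3.0] * 20
print(downsample(samples, 20))  # -> [1.0, 3.0]
```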
Complete Dashboard Setup
Step 1: Install Grafana with Docker
```shell
docker run -d \
  --name=grafana \
  -p 3000:3000 \
  -v grafana-storage:/var/lib/grafana \
  grafana/grafana:latest
```

Step 2: Add Prometheus Data Source
1. Open Grafana at `http://localhost:3000` (default login: admin/admin)
2. Navigate to Configuration > Data Sources
3. Add Prometheus with URL `http://prometheus:9090`
4. Click "Save & Test"
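The same data source can also be registered programmatically through Grafana's HTTP API. A sketch using only the standard library; the credentials and URLs are the defaults from the steps above:

```python
import base64
import json
from urllib import request

def datasource_payload(name, prom_url):
    """JSON body for Grafana's POST /api/datasources endpoint."""
    return {
        "name": name,
        "type": "prometheus",
        "url": prom_url,
        "access": "proxy",  # Grafana's backend proxies the queries
        "isDefault": True,
    }

def add_datasource(grafana="http://localhost:3000", user="admin", password="admin"):
    """Register Prometheus as a data source via the HTTP API."""
    creds = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = json.dumps(datasource_payload("Prometheus", "http://prometheus:9090"))
    req = request.Request(
        grafana + "/api/datasources",
        data=body.encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + creds,
        },
        method="POST",
    )
    return request.urlopen(req)  # raises HTTPError on failure
```

In production, prefer an API token over basic auth for the `Authorization` header.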
Step 3: Create Dashboard
Panel 1: Request Success Rate

```
Query: sum(rate(scraper_requests_total{status="success"}[5m])) / sum(rate(scraper_requests_total[5m])) * 100
Visualization: Gauge
Thresholds: 90 (green), 70 (yellow), 0 (red)
```

Panel 2: Request Latency Distribution

```
Query A: histogram_quantile(0.50, rate(scraper_duration_seconds_bucket[5m]))
Query B: histogram_quantile(0.95, rate(scraper_duration_seconds_bucket[5m]))
Query C: histogram_quantile(0.99, rate(scraper_duration_seconds_bucket[5m]))
Legend: P50, P95, P99
```

Panel 3: Error Breakdown

```
Query: sum by (status)(rate(scraper_requests_total{status!="success"}[5m]))
Visualization: Pie Chart
```

Panel 4: Throughput Over Time

```
Query: sum(rate(scraper_requests_total[5m])) * 60
Unit: requests/min
Visualization: Time Series
```

Panel 5: Active Workers

```
Query: scraper_active_workers
Visualization: Stat with sparkline
```

Panel 6: Data Items Extracted

```
Query: sum(increase(items_extracted_total[1h]))
Visualization: Bar Chart, grouped by type
```

Custom Grafana Variables
Add dashboard variables for dynamic filtering:
```
Variable: domain
Query: label_values(scraper_requests_total, domain)
Multi-value: Yes

Variable: time_range
Values: 5m, 15m, 1h, 6h, 24h
```

Use variables in queries: `rate(scraper_requests_total{domain=~"$domain"}[$time_range])`
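If you generate panel JSON from code, the same substitution pattern can be captured in a small helper (a sketch; the function name is hypothetical, and Grafana performs the actual `$domain`/`$time_range` substitution at render time):

```python
def templated_rate_query(metric, domain="$domain", time_range="$time_range"):
    """Build a PromQL rate() query with Grafana variable placeholders."""
    return f'rate({metric}{{domain=~"{domain}"}}[{time_range}])'

print(templated_rate_query("scraper_requests_total"))
# -> rate(scraper_requests_total{domain=~"$domain"}[$time_range])
```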
Integration with Slack Alerts
```yaml
# Grafana notification channel
apiVersion: 1
notifiers:
  - name: slack-scraping
    type: slack
    settings:
      url: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
      channel: "#scraping-alerts"
      reminder_frequency: "30m"
```

Proxy-Specific Dashboards
Create a dedicated proxy monitoring section:
```python
# Additional Prometheus metrics for proxy monitoring
PROXY_LATENCY = Histogram(
    'proxy_latency_seconds',
    'Proxy response latency',
    ['provider', 'country', 'proxy_type']
)
PROXY_SUCCESS = Counter(
    'proxy_success_total',
    'Successful proxy requests',
    ['provider', 'country']
)
PROXY_FAILURES = Counter(
    'proxy_failures_total',
    'Failed proxy requests',
    ['provider', 'country', 'error_type']
)
PROXY_BANDWIDTH = Counter(
    'proxy_bandwidth_bytes_total',
    'Bytes transferred through proxy',
    ['provider', 'direction']  # direction: upload/download
)
```

Proxy Dashboard Panels
| Panel | PromQL | Purpose |
|---|---|---|
| Proxy Success by Provider | sum by (provider)(rate(proxy_success_total[5m])) | Compare provider reliability |
| Latency by Country | histogram_quantile(0.95, sum by (country, le)(rate(proxy_latency_seconds_bucket[5m]))) | Geographic performance |
| Bandwidth Usage | sum(increase(proxy_bandwidth_bytes_total[24h])) / 1073741824 | Daily GB usage |
| Error Types | sum by (error_type)(rate(proxy_failures_total[5m])) | Diagnose proxy issues |
Best Practices for Scraping Dashboards
- Use annotations — Mark deployment times, proxy pool changes, and target site updates on your time series
- Set up on-call rotation — Use Grafana OnCall to route alerts to the right team member
- Create runbooks — Link alert descriptions to troubleshooting documentation
- Use variables — Allow filtering by domain, proxy provider, and time range
- Export dashboards — Store dashboard JSON in version control alongside your scraping code
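The export step can be scripted against Grafana's HTTP API, which returns a dashboard's full JSON model by UID. A standard-library sketch; the UID and token are placeholders:

```python
import json
from urllib import request

def dashboard_url(uid, grafana="http://localhost:3000"):
    """Endpoint that returns a dashboard's full JSON model."""
    return f"{grafana}/api/dashboards/uid/{uid}"

def export_dashboard(uid, token, grafana="http://localhost:3000"):
    """Fetch dashboard JSON via GET /api/dashboards/uid/{uid}."""
    req = request.Request(
        dashboard_url(uid, grafana),
        headers={"Authorization": f"Bearer {token}"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["dashboard"]

# Write the model to a file tracked in version control:
# with open("dashboard.json", "w") as f:
#     json.dump(export_dashboard("YOUR_UID", "YOUR_TOKEN"), f, indent=2)
```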
For the complete monitoring stack, combine Grafana with our web scraping dashboard guide and proxy health monitor.
Related Reading
- Build an Anti-Detection Test Suite: Verify Browser Stealth
- Build a News Crawler in Python: Step-by-Step Tutorial
- AJAX Request Interception: Scraping API Calls Directly
- Azure Functions for Serverless Web Scraping: the Complete Guide
- How to Configure Proxies on iPhone and Android
- How to Use Proxies in Node.js (Axios, Fetch, Puppeteer)