How to Scrape G2 Reviews with Proxies in 2026
G2 is the world’s leading B2B software review platform, hosting millions of verified reviews across thousands of product categories. For SaaS companies, investors, and market researchers, G2 data is a goldmine of competitive intelligence. However, scraping G2 at any meaningful scale requires proxy infrastructure to bypass their anti-bot protections.
This guide covers how to extract G2 review data using Python with proxy rotation, from individual product reviews to large-scale category analysis.
Why G2 Data Matters
G2 reviews drive real business decisions in the B2B software space:
- Competitive intelligence — Understand how customers rate your competitors, what they love, and what they hate
- Product development — Mine review text for feature requests and pain points
- Market mapping — Identify all players in a software category with their positioning
- Sales enablement — Build battlecards using competitor weakness data
- Investment research — Evaluate SaaS companies based on customer sentiment trends
- Content marketing — Create comparison content backed by real user data
- Win/loss analysis — Understand why customers choose one product over another
Data Points to Extract
G2 provides rich structured data on every product:
| Data Point | Source | Use Case |
|---|---|---|
| Overall rating | Product page | Quick comparison |
| Review text | Review cards | Sentiment analysis |
| Star breakdown | Rating distribution | Quality assessment |
| Reviewer info | Review metadata | Company size, industry |
| Pros and cons | Structured fields | Feature comparison |
| Alternatives listed | Comparison section | Competitive mapping |
| Category ranking | Grid reports | Market positioning |
| Implementation rating | Specific metric | Ease of adoption |
| Support rating | Specific metric | Service quality |
| Feature ratings | Individual scores | Detailed comparison |
| Review date | Timestamp | Trend analysis |
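If you plan to store these fields, it helps to give them a consistent shape up front. Here is a minimal record sketch; the field names are ours, not G2's, and everything is optional since not every review card exposes every field:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class G2Review:
    """One scraped review; fields mirror the table above."""
    rating: Optional[str] = None
    title: Optional[str] = None
    pros: Optional[str] = None
    cons: Optional[str] = None
    reviewer: Optional[str] = None
    company_size: Optional[str] = None
    date: Optional[str] = None
```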
Understanding G2’s Anti-Bot Defenses
G2 employs several protective measures:
- Cloudflare protection — G2 sits behind Cloudflare, which provides bot detection, JavaScript challenges, and IP reputation scoring (a quick way to spot a challenge response is sketched after this list)
- Rate limiting — Aggressive request throttling per IP
- JavaScript rendering — Review content loaded dynamically
- Session validation — Cookie and token checks across page loads
- CAPTCHA triggers — Cloudflare Turnstile challenges for suspicious traffic
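You can often recognize a Cloudflare challenge before trying to parse anything. Here is a minimal detection helper; this is a heuristic sketch, and the markers are ones Cloudflare challenge pages have commonly used, so they may change over time:

```python
import requests

# Markers commonly seen on Cloudflare challenge pages (subject to change)
CF_MARKERS = ("Just a moment", "challenge-platform", "cf_chl")

def is_cloudflare_challenge(response: requests.Response) -> bool:
    """Heuristic: does this response look like a Cloudflare challenge page?"""
    if response.status_code not in (403, 503):
        return False
    return any(marker in response.text for marker in CF_MARKERS)
```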
Setting Up Your Environment
```bash
pip install requests beautifulsoup4 lxml fake-useragent cloudscraper
```

We use cloudscraper to handle Cloudflare’s JavaScript challenges automatically.
Python Code: Scraping G2 Reviews

```python
import cloudscraper
from bs4 import BeautifulSoup
import json
import time
import random
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class G2Scraper:
    def __init__(self, proxy_list: list):
        self.proxy_list = proxy_list
        self.base_url = "https://www.g2.com"
        self.reviews = []

    def get_proxy(self) -> dict:
        proxy = random.choice(self.proxy_list)
        return {"http": f"http://{proxy}", "https": f"http://{proxy}"}

    def create_scraper_session(self):
        """Create a cloudscraper session to handle Cloudflare."""
        scraper = cloudscraper.create_scraper(
            browser={
                "browser": "chrome",
                "platform": "windows",
                "desktop": True
            }
        )
        return scraper

    def scrape_product_reviews(self, product_slug: str, max_pages: int = 20):
        """Scrape all reviews for a G2 product."""
        session = self.create_scraper_session()
        for page in range(1, max_pages + 1):
            url = f"{self.base_url}/products/{product_slug}/reviews?page={page}"
            logger.info(f"Scraping reviews page {page}: {url}")
            try:
                response = session.get(
                    url,
                    proxies=self.get_proxy(),
                    timeout=30
                )
                if response.status_code == 200:
                    page_reviews = self.parse_reviews_page(response.text)
                    if not page_reviews:
                        logger.info(f"No more reviews on page {page}")
                        break
                    self.reviews.extend(page_reviews)
                    logger.info(f"Extracted {len(page_reviews)} reviews from page {page}")
                elif response.status_code == 403:
                    logger.warning("Cloudflare block detected -- rotating session")
                    session = self.create_scraper_session()
                    time.sleep(random.uniform(10, 20))
                    continue
                else:
                    logger.error(f"Status {response.status_code}")
            except Exception as e:
                logger.error(f"Request failed: {e}")
                session = self.create_scraper_session()
            time.sleep(random.uniform(4, 8))

    def parse_reviews_page(self, html: str) -> list:
        """Parse review data from G2 reviews page."""
        soup = BeautifulSoup(html, "lxml")
        reviews = []
        review_cards = soup.select("[class*='review'], [id*='review']")
        for card in review_cards:
            review = {}
            # Star rating
            rating_el = card.select_one("[class*='stars'], [class*='rating']")
            if rating_el:
                # G2 uses star icons; count filled stars or read aria-label
                aria = rating_el.get("aria-label", "")
                if "out of" in aria:
                    review["rating"] = aria.split(" out of")[0].strip()
            # Review title
            title_el = card.select_one("h3, [class*='review-title']")
            if title_el:
                review["title"] = title_el.get_text(strip=True)
            # What do you like best
            pros_el = card.select_one("[class*='like-best'], [data-testid*='like']")
            if pros_el:
                review["pros"] = pros_el.get_text(strip=True)
            # What do you dislike
            cons_el = card.select_one("[class*='dislike'], [data-testid*='dislike']")
            if cons_el:
                review["cons"] = cons_el.get_text(strip=True)
            # Reviewer details
            reviewer_el = card.select_one("[class*='reviewer'], [class*='user-info']")
            if reviewer_el:
                review["reviewer"] = reviewer_el.get_text(strip=True)
            # Company size
            company_el = card.select_one("[class*='company-size'], [class*='segment']")
            if company_el:
                review["company_size"] = company_el.get_text(strip=True)
            # Date
            date_el = card.select_one("time, [class*='date']")
            if date_el:
                review["date"] = date_el.get("datetime", date_el.get_text(strip=True))
            if review.get("title") or review.get("pros"):
                reviews.append(review)
        return reviews

    def scrape_product_info(self, product_slug: str) -> dict:
        """Scrape product overview information."""
        session = self.create_scraper_session()
        url = f"{self.base_url}/products/{product_slug}/reviews"
        try:
            response = session.get(
                url,
                proxies=self.get_proxy(),
                timeout=30
            )
            if response.status_code != 200:
                return {}
            soup = BeautifulSoup(response.text, "lxml")
            info = {}
            # Product name
            name_el = soup.select_one("h1, [class*='product-name']")
            if name_el:
                info["name"] = name_el.get_text(strip=True)
            # Overall rating
            rating_el = soup.select_one("[class*='overall-rating'], [class*='avg-rating']")
            if rating_el:
                info["overall_rating"] = rating_el.get_text(strip=True)
            # Total reviews
            count_el = soup.select_one("[class*='review-count'], [class*='total-reviews']")
            if count_el:
                info["total_reviews"] = count_el.get_text(strip=True)
            # Category
            category_el = soup.select_one("[class*='category-link'], [class*='breadcrumb']")
            if category_el:
                info["category"] = category_el.get_text(strip=True)
            # Alternatives
            alternatives = []
            alt_els = soup.select("[class*='alternative'] a, [class*='competitor'] a")
            for alt in alt_els:
                alternatives.append(alt.get_text(strip=True))
            info["alternatives"] = alternatives
            return info
        except Exception as e:
            logger.error(f"Product info scrape failed: {e}")
            return {}

    def scrape_category(self, category_slug: str, max_pages: int = 5) -> list:
        """Scrape all products in a G2 category."""
        session = self.create_scraper_session()
        products = []
        for page in range(1, max_pages + 1):
            url = f"{self.base_url}/categories/{category_slug}?page={page}"
            logger.info(f"Scraping category page {page}")
            try:
                response = session.get(
                    url,
                    proxies=self.get_proxy(),
                    timeout=30
                )
                if response.status_code == 200:
                    soup = BeautifulSoup(response.text, "lxml")
                    cards = soup.select("[class*='product-card'], [class*='listing']")
                    for card in cards:
                        name_el = card.select_one("a[href*='/products/']")
                        if name_el:
                            products.append({
                                "name": name_el.get_text(strip=True),
                                # Keep only the first path segment after /products/
                                "slug": name_el["href"].split("/products/")[1].split("/")[0],
                                "url": self.base_url + name_el["href"]
                            })
                    if not cards:
                        break
            except Exception as e:
                logger.error(f"Category scrape failed: {e}")
            time.sleep(random.uniform(3, 7))
        return products


# Usage
if __name__ == "__main__":
    proxies = [
        "user:pass@residential1.proxy.com:8080",
        "user:pass@residential2.proxy.com:8080",
        "user:pass@residential3.proxy.com:8080",
    ]
    scraper = G2Scraper(proxy_list=proxies)
    # Scrape reviews for a specific product. Pass the bare product slug;
    # the scraper appends /reviews to the URL itself.
    scraper.scrape_product_reviews("slack", max_pages=10)
    # Get product info
    info = scraper.scrape_product_info("slack")
    print(f"Product: {info.get('name')}")
    print(f"Total reviews scraped: {len(scraper.reviews)}")
    with open("g2_reviews.json", "w") as f:
        json.dump({
            "product_info": info,
            "reviews": scraper.reviews
        }, f, indent=2)
```

Proxy Rotation Strategy for G2
G2’s Cloudflare protection requires a strategic approach to proxy usage:
- Use residential proxies exclusively — Cloudflare flags datacenter IPs immediately
- Rotate IPs every 2-3 requests — More frequent rotation helps avoid Cloudflare’s behavioral scoring
- Sticky sessions for detail pages — When scraping a single product’s reviews across pages, maintain the same IP for 3-5 pages before rotating (a minimal rotator is sketched after this list)
- US-based IPs preferred — G2 is primarily a US platform; US residential IPs get the least friction
- Pool size — Maintain a pool of at least 50 residential IPs for sustained scraping
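Here is a minimal sketch of the sticky-session idea above: hold one IP for a few pages, then rotate. The proxy strings follow the same user:pass@host:port format used earlier, and pages_per_ip maps to the 3-5 page window:

```python
import random

class StickyProxyRotator:
    """Serve the same proxy for a few requests, then rotate to a fresh IP."""

    def __init__(self, proxy_list: list, pages_per_ip: int = 4):
        self.proxy_list = proxy_list
        self.pages_per_ip = pages_per_ip
        self.current = random.choice(proxy_list)
        self.uses = 0

    def get(self) -> dict:
        if self.uses >= self.pages_per_ip:
            # Sticky window exhausted -- pick a new IP
            self.current = random.choice(self.proxy_list)
            self.uses = 0
        self.uses += 1
        return {"http": f"http://{self.current}",
                "https": f"http://{self.current}"}
```

Swap this in for the random get_proxy() method in G2Scraper when you want per-product stickiness instead of per-request rotation.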
Calculate bandwidth costs for your G2 scraping project with our proxy cost calculator.
Advanced Techniques
Extracting JSON-LD Data
G2 embeds structured data in JSON-LD format on product pages:

```python
def extract_structured_data(html: str) -> list:
    """Extract JSON-LD structured data from G2 pages."""
    soup = BeautifulSoup(html, "lxml")
    scripts = soup.find_all("script", type="application/ld+json")
    data = []
    for script in scripts:
        try:
            parsed = json.loads(script.string)
            data.append(parsed)
        except (json.JSONDecodeError, TypeError):
            continue
    return data
```
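As a usage sketch: review platforms like G2 typically expose a Product-typed JSON-LD block carrying an aggregateRating, but the exact schema can vary, so guard the lookups:

```python
# 'html' would come from one of the scraper requests above
blocks = extract_structured_data(html)
for block in blocks:
    # Guarded lookups: the JSON-LD shape is not guaranteed
    if isinstance(block, dict) and block.get("@type") == "Product":
        agg = block.get("aggregateRating", {})
        print(agg.get("ratingValue"), agg.get("reviewCount"))
```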
Sentiment Analysis Pipeline
Once you have review data, run sentiment analysis to quantify opinions:

```python
def analyze_sentiment(reviews: list) -> dict:
    """Basic keyword-based sentiment analysis on G2 reviews."""
    positive_keywords = ["love", "great", "excellent", "easy", "intuitive", "powerful"]
    negative_keywords = ["slow", "buggy", "expensive", "confusing", "lacking", "poor"]
    pos_count = 0
    neg_count = 0
    for review in reviews:
        text = (review.get("pros", "") + " " + review.get("cons", "")).lower()
        for kw in positive_keywords:
            if kw in text:
                pos_count += 1
        for kw in negative_keywords:
            if kw in text:
                neg_count += 1
    return {
        "positive_mentions": pos_count,
        "negative_mentions": neg_count,
        "sentiment_ratio": pos_count / max(neg_count, 1)
    }
```
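For example, run it over the reviews collected by the scraper earlier:

```python
stats = analyze_sentiment(scraper.reviews)
print(f"Positive mentions: {stats['positive_mentions']}")
print(f"Negative mentions: {stats['negative_mentions']}")
print(f"Sentiment ratio: {stats['sentiment_ratio']:.2f}")
```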
Troubleshooting
Problem: Cloudflare challenges blocking every request
- Use the cloudscraper library instead of plain requests; it handles JavaScript challenges automatically.
- If still blocked, switch to a headless browser (Playwright) with stealth plugins; a fallback sketch follows this list.
- Verify proxy quality — low-reputation IPs trigger Cloudflare more aggressively.
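A headless-browser fallback looks roughly like this. This is a sketch using Playwright's sync API; the proxy value is a placeholder, and stealth hardening (e.g. a stealth plugin) is applied separately and omitted here:

```python
# pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright

def fetch_with_browser(url: str, proxy: str) -> str:
    """Render a page in headless Chromium and return the final HTML."""
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy={"server": f"http://{proxy}"}  # placeholder proxy address
        )
        page = browser.new_page()
        # Wait for network to go quiet so lazy-loaded reviews render
        page.goto(url, wait_until="networkidle", timeout=60000)
        html = page.content()
        browser.close()
        return html
```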
Problem: Reviews page returns empty content
- G2 lazy-loads reviews via JavaScript. Check if the initial HTML contains review data or if it requires JS execution.
- Look for API endpoints in network traffic that return review JSON directly.
Problem: Getting different data than what the browser shows
- G2 may serve different content based on authentication state. Some review details are gated behind G2 login.
- Use browser cookies from a logged-in session to access full review content.
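One way to reuse a logged-in session is to copy its cookies into your scraper session. A sketch, assuming cloudscraper as above; the cookie name and value are placeholders you would export from your browser's dev tools:

```python
import cloudscraper

scraper = cloudscraper.create_scraper()
# Placeholder cookies exported from a logged-in browser session
session_cookies = {
    "session_id": "value-from-devtools",
}
for name, value in session_cookies.items():
    scraper.cookies.set(name, value, domain=".g2.com")
```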
Problem: Rate limited after a small number of requests
- Increase delays between requests to 5-10 seconds (a backoff sketch follows this list).
- Rotate both IP and User-Agent on every request.
- Spread scraping across different times of day.
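For the pacing side, a simple exponential backoff with jitter is a reasonable starting point; a minimal sketch:

```python
import random
import time

def backoff_sleep(attempt: int, base: float = 5.0, cap: float = 120.0) -> None:
    """Sleep base * 2^attempt seconds (capped), plus random jitter."""
    delay = min(cap, base * (2 ** attempt))
    time.sleep(delay + random.uniform(0, delay * 0.5))
```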
Verify your proxy IP is clean using our IP lookup tool.
Legal and Ethical Considerations
Scraping G2 reviews involves several legal considerations:
- G2’s Terms of Service — G2 prohibits automated scraping in their ToS. Commercial use of scraped data could expose you to legal claims.
- Review ownership — Reviews are written by users but licensed to G2. Republishing full review text may infringe on G2’s rights.
- Personal data — Reviewer names, titles, and company affiliations constitute personal data under GDPR and CCPA. Handle this data with care.
- Fair use — Aggregating and analyzing review data for research purposes may fall under fair use, but this varies by jurisdiction.
- G2 API alternatives — G2 offers official API access for some data. Consider using official channels for commercial applications.
- Competitive use — Using scraped competitor reviews in marketing materials could raise unfair competition claims.
Always consult with a legal professional before scraping review platforms at scale.
Conclusion
G2 reviews are one of the most valuable data sources for B2B competitive intelligence. Scraping them requires Cloudflare bypass capabilities, residential proxies, and careful rate management. The cloudscraper library handles most Cloudflare challenges, but for the most reliable results, consider combining it with headless browser automation. Start with a specific product or category and expand your scraping scope gradually as you refine your approach.
Related Reading
- How to Scrape Airbnb Listings with Proxies in 2026
- How to Scrape Facebook Marketplace with Proxies in 2026
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)
- ASEAN Data Protection Laws: A Web Scraping Compliance Matrix