Automotive Review Aggregation Using Proxy Networks

Consumer reviews often influence automotive purchasing decisions more than advertising or marketing campaigns. In Southeast Asia, where word-of-mouth and peer recommendations carry particular weight, what consumers say about vehicles is valuable intelligence for manufacturers, dealers, and market researchers. Aggregating reviews from multiple platforms gives a fuller view of consumer sentiment than any single source can provide.

This guide covers how to use proxy networks to collect, aggregate, and analyze automotive reviews from across Southeast Asian platforms.

The Value of Automotive Review Data

For Manufacturers

  • Product feedback: Identify recurring complaints and praise points across markets
  • Competitive analysis: Compare sentiment for your vehicles versus competitors
  • Market-specific insights: Understand how the same vehicle is perceived differently in Singapore versus Thailand
  • Feature prioritization: Learn which features matter most to consumers in each market

For Dealers

  • Inventory guidance: Stock vehicles with the strongest consumer sentiment
  • Sales enablement: Arm sales teams with data on what buyers value most
  • Service improvements: Address common complaint areas proactively
  • Marketing content: Highlight genuinely praised features in advertising

For Consumers and Media

  • Unbiased assessments: Aggregate many opinions to reduce individual bias
  • Long-term reliability data: Owner reviews over time reveal reliability patterns
  • Regional relevance: Reviews from local markets address local conditions

Review Data Sources in Southeast Asia

Automotive Review Platforms

Regional:

  • SGCarMart reviews (Singapore)
  • WapCar reviews (Malaysia)
  • Autofun reviews (Thailand, Indonesia, Philippines)
  • OTO reviews (Indonesia)
  • Carmudi reviews (regional)

Global:

  • Google Reviews (dealer and service center reviews)
  • Facebook page reviews
  • YouTube video reviews (comments and engagement data)

Forum Communities:

  • MyCarForum (Singapore)
  • Paultan.org (Malaysia)
  • Headlight Magazine Forum (Thailand)
  • Kaskus Automotive (Indonesia)

E-Commerce and Marketplace:

  • Carousell ratings (seller reviews)
  • Carro vehicle reviews
  • Carsome customer reviews

Data Points Available from Reviews

  • Star rating (numerical score)
  • Review text (pros, cons, detailed feedback)
  • Reviewer profile (ownership status, duration of ownership)
  • Vehicle details (make, model, year, variant)
  • Review date
  • Helpful votes / engagement metrics
  • Photos or videos attached to reviews
  • Response from dealer or manufacturer
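
Because each platform exposes these fields under different names and formats, it helps to map them onto one shared record before aggregation. The following is a minimal sketch of such a record; the `ReviewRecord` class, its field names, and the `from_raw` helper are illustrative assumptions, not part of any platform's API.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReviewRecord:
    """Hypothetical normalized review; field names are illustrative."""
    platform: str
    country: str
    text: str = ""
    rating: Optional[float] = None          # star rating on a 5-point scale, if present
    reviewer: Optional[str] = None
    ownership_duration: Optional[str] = None
    variant: Optional[str] = None
    date: Optional[str] = None              # kept as the platform's raw date string
    helpful_count: int = 0
    media_urls: list = field(default_factory=list)
    dealer_response: Optional[str] = None

    @classmethod
    def from_raw(cls, raw: dict) -> "ReviewRecord":
        """Build a record from a scraped dict, ignoring unknown keys and None values."""
        known = {k: v for k, v in raw.items()
                 if k in cls.__dataclass_fields__ and v is not None}
        # Helpful counts are often scraped as text, so keep the first run of digits
        digits = re.search(r"\d+", str(known.get("helpful_count", "")))
        known["helpful_count"] = int(digits.group(0)) if digits else 0
        return cls(**known)


record = ReviewRecord.from_raw({
    "platform": "sgcarmart",
    "country": "SG",
    "rating": 4.0,
    "text": "Smooth ride and a quiet cabin.",
    "helpful_count": "12 found this helpful",
    "unknown_field": "ignored",
})
print(record.platform, record.rating, record.helpful_count)
```

Keeping the raw date string and converting it in a later step avoids losing information when platforms use different date formats.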

Proxy Setup for Review Collection

Why Proxies Are Needed

Review platforms protect their content because reviews represent their core value proposition. Scraping from a single IP address typically leads to:

  • IP blocks after moderate request volumes
  • CAPTCHAs that interrupt collection
  • Rate limiting that makes large-scale collection impractical
  • Geographic content restrictions

DataResearchTools mobile proxies solve these problems by providing:

  • Fresh mobile IPs that review platforms trust
  • Geographic targeting to access country-specific reviews
  • Session management for navigating paginated review sections
  • Sufficient bandwidth for large-scale text collection
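
from uuid import uuid4


class ReviewProxyManager:
    """Builds per-session proxy settings in the format the requests library expects."""

    def __init__(self, api_key):
        self.api_key = api_key
        self.endpoint = "proxy.dataresearchtools.com"

    def get_proxy(self, country):
        # A unique session ID per call gives each scraping session its own proxy session
        session_id = uuid4().hex[:8]
        auth = f"{self.api_key}:country-{country}-type-mobile-session-{session_id}"
        proxy_url = f"http://{auth}@{self.endpoint}:8080"
        # HTTP and HTTPS traffic both go through the same proxy endpoint
        return {"http": proxy_url, "https": proxy_url}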
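
Blocks, CAPTCHAs, and rate limits usually surface as non-200 responses or network errors, so a common pattern is to retry with a fresh proxy session and a growing delay. The sketch below shows one way to do that; `fetch_with_rotation` and its `fetch` and `get_proxy` parameters are hypothetical hooks supplied by the caller, not part of the proxy service.

```python
import random
import time


def fetch_with_rotation(fetch, get_proxy, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(proxy) until it returns a 200 response, using a new proxy per attempt.

    `fetch` is any callable that takes a proxy dict and returns an object with a
    `status_code` attribute (for example a requests.Response). `get_proxy` returns
    a fresh proxy dict. Returns None when every attempt fails.
    """
    for attempt in range(max_attempts):
        proxy = get_proxy()
        try:
            response = fetch(proxy)
            if response.status_code == 200:
                return response
        except OSError:
            # Network-level failure; the exceptions raised by requests derive from OSError
            pass
        if attempt < max_attempts - 1:
            # Exponential backoff with a little jitter before the next session
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    return None
```

With the requests library this could be called as `fetch_with_rotation(lambda proxy: requests.get(url, proxies=proxy, timeout=30), lambda: manager.get_proxy("SG"))`, where `manager` is a `ReviewProxyManager` instance and `url` is the page to fetch.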

Scraping Automotive Reviews

SGCarMart Reviews

import re

import requests
from bs4 import BeautifulSoup

# get_random_mobile_ua() and safe_text() are helper functions assumed to be defined elsewhere:
# the first returns a mobile User-Agent string, the second returns the stripped text of the
# first element matching a CSS selector (or None when nothing matches).


class SGCarMartReviewScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager
        self.base_url = "https://www.sgcarmart.com"

    def scrape_model_reviews(self, make, model):
        proxy = self.proxy_manager.get_proxy("SG")

        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({
            "User-Agent": get_random_mobile_ua(),
            "Accept-Language": "en-SG,en;q=0.9"
        })

        # Navigate to review section
        try:
            response = session.get(
                f"{self.base_url}/new_cars/review/{make}/{model}/",
                timeout=30
            )
        except requests.RequestException:
            # Proxy or network failure: treat as "no reviews collected"
            return []

        if response.status_code != 200:
            return []

        return self.parse_reviews(response.text)

    def parse_reviews(self, html):
        soup = BeautifulSoup(html, 'html.parser')
        reviews = []

        # Selectors are placeholders to adjust against the live markup. Broad substring
        # selectors such as [class*="review"] are avoided here because they also match
        # wrappers and child elements, which produces duplicate or partial records.
        for review_element in soup.select('.review-item'):
            review = {
                "platform": "sgcarmart",
                "country": "SG",
                "rating": self.extract_rating(review_element),
                "title": safe_text(review_element, '.review-title, h4'),
                "text": safe_text(review_element, '.review-text, .review-content'),
                "pros": self.extract_list(review_element, '.pros li'),
                "cons": self.extract_list(review_element, '.cons li'),
                "reviewer": safe_text(review_element, '.reviewer-name, .author'),
                "date": safe_text(review_element, '.review-date, time'),
                "ownership_duration": safe_text(review_element, '.ownership, [class*="ownership"]'),
                "variant": safe_text(review_element, '.vehicle-variant, [class*="variant"]'),
                "helpful_count": safe_text(review_element, '.helpful-count, [class*="helpful"]'),
            }

            # Keep only records that carry review text or a rating
            if review.get("text") or review.get("rating"):
                reviews.append(review)

        return reviews

    def extract_rating(self, element):
        """Return the rating on a 5-point scale, or None if no rating is found"""
        # Try to find star rating (one filled star element per point)
        stars = element.select('.star.filled, [class*="star"][class*="active"]')
        if stars:
            return len(stars)

        # Try to find numeric rating such as "4.5 / 5" or "9 / 10"
        rating_text = safe_text(element, '.rating, [class*="rating"]')
        if rating_text:
            match = re.search(r'(\d+(?:\.\d+)?)\s*/\s*(\d+)', rating_text)
            if match and float(match.group(2)) > 0:
                # Rescale so ratings given on different scales stay comparable
                return round(float(match.group(1)) / float(match.group(2)) * 5, 2)

        return None

    def extract_list(self, element, selector):
        items = element.select(selector)
        return [item.get_text(strip=True) for item in items if item.get_text(strip=True)]
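
The scraper above calls two helpers, `safe_text` and `get_random_mobile_ua`, that are not defined in this guide. A minimal sketch of how they might look follows. Both implementations are assumptions, and the user-agent strings are illustrative examples rather than a maintained pool.

```python
import random

# Illustrative user-agent strings; a real pool should be larger and kept current
MOBILE_USER_AGENTS = [
    "Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1",
]


def get_random_mobile_ua():
    """Pick one mobile User-Agent string at random."""
    return random.choice(MOBILE_USER_AGENTS)


def safe_text(element, selector):
    """Return the stripped text of the first match for `selector`, or None.

    `element` is expected to behave like a BeautifulSoup Tag, that is, to offer
    select_one(selector) whose result offers get_text(strip=True).
    """
    match = element.select_one(selector)
    if match is None:
        return None
    text = match.get_text(strip=True)
    return text or None
```

Returning `None` for both missing elements and empty text lets callers use a single truthiness check, as `parse_reviews` does when it filters out empty records.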

Google Reviews for Dealers

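from urllib.parse import quote, urlsplit

from playwright.sync_api import sync_playwright

# get_random_mobile_ua() is a helper assumed to be defined elsewhere.


class GoogleDealerReviewScraper:
    # Browser locale per market; extend as needed
    LOCALES = {"SG": "en-SG", "MY": "ms-MY", "TH": "th-TH", "ID": "id-ID", "PH": "en-PH"}

    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def get_locale(self, country):
        return self.LOCALES.get(country, "en-US")

    def scrape_dealer_reviews(self, dealer_name, country):
        proxy = self.proxy_manager.get_proxy(country)

        # Playwright takes proxy credentials as separate fields, not inside the server URL
        parts = urlsplit(proxy["http"])
        proxy_settings = {
            "server": f"{parts.scheme}://{parts.hostname}:{parts.port}",
            "username": parts.username,
            "password": parts.password,
        }

        with sync_playwright() as p:
            browser = p.chromium.launch(proxy=proxy_settings)
            try:
                context = browser.new_context(
                    user_agent=get_random_mobile_ua(),
                    locale=self.get_locale(country)
                )
                page = context.new_page()

                # Search for dealer on Google Maps (the query must be URL-encoded)
                search_query = quote(f"{dealer_name} car dealer")
                page.goto(f"https://www.google.com/maps/search/{search_query}", wait_until="networkidle")

                page.wait_for_timeout(3000)

                # Click on reviews section (selector is a placeholder for the live markup)
                reviews_button = page.query_selector('[class*="reviews"]')
                if reviews_button:
                    reviews_button.click()
                    page.wait_for_timeout(2000)

                # Collect the reviews currently rendered; no scrolling is performed here
                return self.collect_visible_reviews(page)
            finally:
                # Always release the browser, even if navigation fails
                browser.close()

    def collect_visible_reviews(self, page):
        # "rating" is returned as the raw aria-label text and needs numeric parsing
        # before it can be averaged with ratings from other platforms
        reviews = page.evaluate("""
            () => {
                const reviewElements = document.querySelectorAll('[data-review-id]');
                return Array.from(reviewElements).slice(0, 50).map(el => ({
                    rating: el.querySelector('[class*="star"]')?.getAttribute('aria-label'),
                    text: el.querySelector('[class*="body"], [class*="text"]')?.textContent?.trim(),
                    reviewer: el.querySelector('[class*="author"], [class*="name"]')?.textContent?.trim(),
                    date: el.querySelector('[class*="date"], time')?.textContent?.trim(),
                }));
            }
        """)
        return [r for r in reviews if r.get("text")]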
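
The dealer scraper returns each rating as the raw `aria-label` text of the star element, so it has to be converted to a number before it can be averaged with other sources. The sketch below assumes the label contains the score as its first number (for example "4 stars"). The exact label wording varies by locale and is an assumption here, as is the `parse_star_label` name.

```python
import re
from typing import Optional


def parse_star_label(label: Optional[str], scale: float = 5.0) -> Optional[float]:
    """Extract the first number from a star aria-label, or None if it is unusable."""
    if not label:
        return None
    # Accept both "4.5" and "4,5" because some locales use a decimal comma
    match = re.search(r"\d+(?:[.,]\d+)?", label)
    if not match:
        return None
    value = float(match.group(0).replace(",", "."))
    # Discard values outside the expected scale rather than guessing
    return value if 0 <= value <= scale else None


for raw in ["4 stars", "Rated 4,5 out of 5", None, "no rating"]:
    print(repr(raw), "->", parse_star_label(raw))
```

Applying this to every record before reporting keeps text ratings from being silently dropped or breaking the average.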

Forum Review Scraping

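import random
import time
from urllib.parse import urlsplit

import requests

# get_random_ua() is a helper assumed to be defined elsewhere.


class ForumReviewScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def scrape_forum_threads(self, forum_url, make, model, country):
        proxy = self.proxy_manager.get_proxy(country)

        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({"User-Agent": get_random_ua()})

        # Search forum for vehicle reviews. The /search path and its parameters are
        # generic placeholders; each forum has its own search endpoint.
        search_url = f"{forum_url}/search"
        params = {"q": f"{make} {model} review owner", "type": "thread"}

        try:
            response = session.get(search_url, params=params, timeout=30)
        except requests.RequestException:
            return []
        if response.status_code != 200:
            return []

        # parse_search_results() and scrape_thread() are forum-specific and not shown here:
        # the first returns a list of {"title", "url"} dicts, the second the posts of one thread.
        threads = self.parse_search_results(response.text)
        reviews = []

        for thread in threads[:20]:
            thread_content = self.scrape_thread(session, thread["url"])
            if thread_content:
                reviews.append({
                    "platform": "forum",
                    "forum_name": urlsplit(forum_url).netloc,
                    "country": country,
                    "thread_title": thread["title"],
                    "posts": thread_content,
                    "url": thread["url"],
                })
            # Random pause between threads to keep the request rate low
            time.sleep(random.uniform(2, 5))

        return reviews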
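
Forum results are grouped by thread under a `posts` key, while the sentiment code later in this guide reads a `text` field from each review. One way to bridge the two is to flatten every thread into one review-shaped record per post. The sketch below assumes each post is either a plain string or a dict with a `text` key (plus optional `author` and `date`); that post structure and the `flatten_forum_threads` name are assumptions, since `scrape_thread` is not shown.

```python
def flatten_forum_threads(threads):
    """Turn thread dicts (each with a 'posts' list) into one review-shaped dict per post."""
    reviews = []
    for thread in threads:
        for post in thread.get("posts") or []:
            # Posts are assumed to be plain strings or dicts with a "text" key
            if isinstance(post, str):
                post = {"text": post}
            text = (post.get("text") or "").strip()
            if not text:
                continue
            reviews.append({
                "platform": thread.get("forum_name") or thread.get("platform", "forum"),
                "country": thread.get("country"),
                "title": thread.get("thread_title"),
                "text": text,
                "reviewer": post.get("author"),
                "date": post.get("date"),
                "rating": None,  # forum posts carry no star rating
                "url": thread.get("url"),
            })
    return reviews


sample_threads = [{
    "platform": "forum",
    "forum_name": "forum.example.com",
    "country": "SG",
    "thread_title": "Two years of ownership",
    "url": "https://forum.example.com/t/1",
    "posts": ["Reliable so far.", {"text": "  ", "author": "a"}, {"text": "Cabin is noisy.", "author": "b"}],
}]
flat = flatten_forum_threads(sample_threads)
print(len(flat), flat[1]["reviewer"])
```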

Sentiment Analysis

Text Processing Pipeline

import re


def has_word(text, word):
    """True if `word` occurs in `text` as a whole word, allowing common suffixes.

    Plain substring checks would count "abs" in "absolutely" or "fault" in "default".
    This is a rough approximation; a stemmer would catch more inflections.
    """
    return re.search(rf"\b{re.escape(word)}(?:s|es|d|ed|ing)?\b", text) is not None


class AutomotiveSentimentAnalyzer:
    def __init__(self):
        self.aspect_keywords = {
            "reliability": ["reliable", "breakdown", "repair", "problem", "issue", "fault", "defect", "warranty"],
            "comfort": ["comfortable", "ride", "smooth", "noise", "cabin", "seat", "ergonomic", "spacious"],
            "performance": ["power", "acceleration", "engine", "speed", "torque", "handling", "responsive"],
            "fuel_economy": ["fuel", "consumption", "mileage", "economy", "efficient", "petrol", "diesel", "range"],
            "value": ["value", "price", "worth", "expensive", "cheap", "affordable", "overpriced", "bargain"],
            "safety": ["safe", "safety", "airbag", "braking", "abs", "stability", "crash", "accident"],
            "technology": ["infotainment", "screen", "bluetooth", "camera", "sensor", "carplay", "android"],
            "maintenance": ["service", "maintenance", "parts", "cost", "servicing", "workshop", "dealer"],
            "design": ["design", "look", "style", "attractive", "ugly", "modern", "dated", "exterior", "interior"],
            "resale": ["resale", "depreciation", "value retention", "sell", "trade-in"],
        }

    def analyze_review(self, review_text):
        """Analyze a single review for sentiment by aspect"""
        if not review_text or not review_text.strip():
            return None

        text_lower = review_text.lower()
        aspects = {}

        for aspect, keywords in self.aspect_keywords.items():
            # Number of distinct aspect keywords that appear in the review
            mentions = sum(1 for kw in keywords if has_word(text_lower, kw))
            if mentions > 0:
                sentiment = self.estimate_sentiment_for_aspect(text_lower, keywords)
                aspects[aspect] = {
                    "mentioned": True,
                    "keyword_count": mentions,
                    "sentiment": sentiment,
                }

        return {
            "overall_sentiment": self.estimate_overall_sentiment(text_lower),
            "aspects": aspects,
            "word_count": len(review_text.split()),
        }

    def estimate_sentiment_for_aspect(self, text, aspect_keywords):
        """Estimate sentiment for an aspect from the sentences that mention it"""
        positive_words = ["good", "great", "excellent", "amazing", "love", "best", "perfect",
                         "impressed", "recommend", "fantastic", "smooth", "quiet"]
        negative_words = ["bad", "poor", "terrible", "worst", "hate", "disappointing",
                         "horrible", "annoying", "noisy", "expensive", "cheap", "problem"]

        # Score only sentences containing an aspect keyword, so that praise for one
        # aspect is not attributed to every other aspect mentioned in the review
        sentences = re.split(r"[.!?\n]+", text)
        context = " ".join(s for s in sentences if any(has_word(s, kw) for kw in aspect_keywords))

        positive_count = sum(1 for w in positive_words if has_word(context, w))
        negative_count = sum(1 for w in negative_words if has_word(context, w))

        total = positive_count + negative_count
        if total == 0:
            return "neutral"

        ratio = positive_count / total
        if ratio > 0.6:
            return "positive"
        elif ratio < 0.4:
            return "negative"
        return "mixed"

    def estimate_overall_sentiment(self, text):
        positive_indicators = ["recommend", "love", "great", "excellent", "happy", "satisfied", "best"]
        negative_indicators = ["regret", "disappointed", "avoid", "worst", "terrible", "unhappy", "waste"]

        pos = sum(1 for w in positive_indicators if has_word(text, w))
        neg = sum(1 for w in negative_indicators if has_word(text, w))

        if pos > neg:
            return "positive"
        elif neg > pos:
            return "negative"
        return "neutral"
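
Keyword counting reads "not good" as a positive signal, because it sees "good" and ignores the negation. A simple heuristic, which is not part of the analyzer above, is to flip a word's polarity when a negator appears within the few tokens before it. The sketch below illustrates the idea with shortened word lists; the function name, the lists, and the three-token window are all assumptions.

```python
import re

POSITIVE = {"good", "great", "excellent", "love", "smooth", "quiet", "recommend"}
NEGATIVE = {"bad", "poor", "terrible", "noisy", "disappointing", "problem"}
NEGATORS = {"not", "no", "never", "hardly", "isn't", "wasn't", "don't", "doesn't", "didn't"}


def score_with_negation(text, window=3):
    """Return (positive_hits, negative_hits), flipping polarity after a nearby negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = neg = 0
    for i, token in enumerate(tokens):
        if token not in POSITIVE and token not in NEGATIVE:
            continue
        # A negator within the preceding `window` tokens inverts the word's polarity
        negated = any(t in NEGATORS for t in tokens[max(0, i - window):i])
        is_positive = (token in POSITIVE) != negated
        if is_positive:
            pos += 1
        else:
            neg += 1
    return pos, neg


print(score_with_negation("The ride is not good and the cabin is noisy"))  # (0, 2)
print(score_with_negation("Great engine, never had a problem"))            # (2, 0)
```

A fixed window is crude: it can flip words that the negator does not actually govern. For production use, a trained sentiment model or a dependency parse would be more dependable.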

Aggregated Sentiment Report

import statistics


def pct(count, total):
    """Share of `count` in `total` as a percentage rounded to one decimal"""
    return round(count / total * 100, 1)


class SentimentReportGenerator:
    def generate_model_report(self, make, model, reviews):
        """Generate a comprehensive sentiment report for a vehicle model"""
        analyzer = AutomotiveSentimentAnalyzer()

        analyzed_reviews = []
        for review in reviews:
            # Scraped fields may be missing or None, so fall back to empty values
            text = review.get("text") or ""
            pros = " ".join(review.get("pros") or [])
            cons = " ".join(review.get("cons") or [])
            full_text = f"{text} {pros} {cons}".strip()

            analysis = analyzer.analyze_review(full_text)
            if analysis:
                analyzed_reviews.append({
                    **review,
                    "analysis": analysis,
                })

        if not analyzed_reviews:
            return None

        # Aggregate sentiment by aspect
        aspect_summary = {}
        for aspect in analyzer.aspect_keywords:
            aspect_reviews = [r for r in analyzed_reviews
                            if aspect in r["analysis"]["aspects"]]

            if aspect_reviews:
                sentiments = [r["analysis"]["aspects"][aspect]["sentiment"] for r in aspect_reviews]
                n = len(sentiments)
                # "mixed" is reported too, so the four shares add up to 100%
                aspect_summary[aspect] = {
                    "mention_count": n,
                    "mention_pct": pct(n, len(analyzed_reviews)),
                    "positive_pct": pct(sentiments.count("positive"), n),
                    "negative_pct": pct(sentiments.count("negative"), n),
                    "neutral_pct": pct(sentiments.count("neutral"), n),
                    "mixed_pct": pct(sentiments.count("mixed"), n),
                }

        # Overall sentiment distribution
        overall_sentiments = [r["analysis"]["overall_sentiment"] for r in analyzed_reviews]
        total = len(overall_sentiments)

        # Ratings distribution: only numeric ratings can be averaged, so text ratings
        # (such as raw aria-label strings) must be converted to numbers beforehand
        ratings = [r["rating"] for r in analyzed_reviews if isinstance(r.get("rating"), (int, float))]
        avg_rating = statistics.mean(ratings) if ratings else None

        return {
            "vehicle": f"{make} {model}",
            "total_reviews": len(analyzed_reviews),
            "sources": sorted({r["platform"] for r in analyzed_reviews if r.get("platform")}),
            "countries": sorted({r["country"] for r in analyzed_reviews if r.get("country")}),
            "average_rating": avg_rating,
            "overall_sentiment": {
                "positive": pct(overall_sentiments.count("positive"), total),
                "negative": pct(overall_sentiments.count("negative"), total),
                "neutral": pct(overall_sentiments.count("neutral"), total),
            },
            "aspect_analysis": aspect_summary,
            # Thresholds: over 60% positive is a strength, over 40% negative a weakness
            "strengths": [a for a, s in aspect_summary.items() if s["positive_pct"] > 60],
            "weaknesses": [a for a, s in aspect_summary.items() if s["negative_pct"] > 40],
        }
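
The report is a nested dictionary, which is convenient for storage but hard to scan. The sketch below renders it as a few lines of plain text and marks aspects that rest on very few mentions. The `summarize_report` function, the `min_mentions` cutoff of 5, and the sample numbers are illustrative assumptions, not real measurements.

```python
def summarize_report(report, min_mentions=5):
    """Render a report dict as plain-text lines, flagging thinly supported aspects."""
    lines = [f"{report['vehicle']}: {report['total_reviews']} reviews"]
    rating = report.get("average_rating")
    if rating is not None:
        lines.append(f"Average rating: {rating:.2f}")
    # Most-mentioned aspects first
    aspects = sorted(report["aspect_analysis"].items(),
                     key=lambda item: item[1]["mention_count"], reverse=True)
    for name, stats in aspects:
        note = "" if stats["mention_count"] >= min_mentions else " (low sample)"
        lines.append(
            f"  {name}: {stats['positive_pct']}% positive, "
            f"{stats['negative_pct']}% negative, "
            f"{stats['mention_count']} mentions{note}"
        )
    return "\n".join(lines)


# Made-up numbers, used only to show the output format
sample_report = {
    "vehicle": "Example Sedan",
    "total_reviews": 40,
    "average_rating": 4.2,
    "aspect_analysis": {
        "resale": {"mention_count": 3, "positive_pct": 33.3, "negative_pct": 66.7},
        "comfort": {"mention_count": 22, "positive_pct": 72.7, "negative_pct": 13.6},
    },
}
print(summarize_report(sample_report))
```

Flagging low-sample aspects matters because a 66.7% negative share based on three mentions says far less than the same share based on three hundred.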

Competitive Sentiment Comparison

class CompetitiveSentimentAnalysis:
    def compare_models(self, models_data):
        """Compare sentiment across competing models"""
        comparison = []

        for model_key, report in models_data.items():
            if not report:
                continue

            # Aspect with the highest share of positive mentions, if any aspect was detected
            aspects = report["aspect_analysis"]
            top_aspect = max(aspects, key=lambda a: aspects[a]["positive_pct"]) if aspects else None

            comparison.append({
                "vehicle": model_key,
                "total_reviews": report["total_reviews"],
                "avg_rating": report.get("average_rating"),
                "positive_pct": report["overall_sentiment"]["positive"],
                "strengths": report["strengths"],
                "weaknesses": report["weaknesses"],
                "top_aspect": top_aspect,
            })

        # Rank models from most to least positive overall sentiment
        return sorted(comparison, key=lambda x: x.get("positive_pct", 0), reverse=True)
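
Sorting by raw positive percentage favors models with very few reviews: three positive reviews out of three outrank 180 out of 200. One alternative, which is not what `compare_models` does, is to rank by the lower bound of the Wilson score interval, which discounts small samples. The sketch below assumes comparison entries carry the `positive_pct` and `total_reviews` keys produced above; the function names are hypothetical.

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a positive share (z=1.96 is about 95%)."""
    if total == 0:
        return 0.0
    p = positive / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


def rank_by_confidence(comparison):
    """Sort comparison entries by Wilson lower bound instead of raw positive share."""
    def score(entry):
        total = entry["total_reviews"]
        # Recover the positive count from the stored percentage
        positive = round(entry["positive_pct"] / 100 * total)
        return wilson_lower_bound(positive, total)
    return sorted(comparison, key=score, reverse=True)


# Made-up entries to show the effect of sample size
entries = [
    {"vehicle": "Model A", "positive_pct": 100.0, "total_reviews": 3},
    {"vehicle": "Model B", "positive_pct": 90.0, "total_reviews": 200},
]
print([e["vehicle"] for e in rank_by_confidence(entries)])
```

Here Model B ranks first despite its lower raw percentage, because 200 reviews support its score far more strongly than three.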

Building a Review Aggregation Pipeline

End-to-End Pipeline

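import hashlib
import random
import time


class ReviewAggregationPipeline:
    def __init__(self, proxy_manager, db):
        self.proxy_manager = proxy_manager
        self.db = db
        # initialize_scrapers() is not shown; it is assumed to return a mapping of
        # country code to a list of scrapers that each expose scrape_model_reviews(make, model).
        # The dealer and forum scrapers use other method names and need thin adapters.
        self.scrapers = self.initialize_scrapers()
        self.analyzer = AutomotiveSentimentAnalyzer()
        self.report_gen = SentimentReportGenerator()

    def run_pipeline(self, make, model, countries):
        # Step 1: Collect reviews from all sources
        all_reviews = []
        for country in countries:
            for scraper in self.scrapers.get(country, []):
                reviews = scraper.scrape_model_reviews(make, model)
                all_reviews.extend(reviews)
                time.sleep(random.uniform(2, 5))

        # Step 2: Deduplicate reviews
        unique_reviews = self.deduplicate_reviews(all_reviews)

        # Step 3: Analyze sentiment (returns None when no review could be analyzed)
        report = self.report_gen.generate_model_report(make, model, unique_reviews)

        # Step 4: Store results
        if report is not None:
            self.db.save_review_report(make, model, report)
        self.db.save_raw_reviews(make, model, unique_reviews)

        return report

    def deduplicate_reviews(self, reviews):
        seen_digests = set()
        unique = []
        for review in reviews:
            # Normalize case and whitespace so trivially different copies still match
            text = " ".join((review.get("text") or "").lower().split())
            if not text:
                # Rating-only reviews cannot be compared by text, so keep them all
                unique.append(review)
                continue
            # hashlib gives a digest that is stable across runs, unlike the built-in hash()
            digest = hashlib.sha1(text[:200].encode("utf-8")).hexdigest()
            if digest not in seen_digests:
                seen_digests.add(digest)
                unique.append(review)
        return unique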
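
Hash-based deduplication only catches reviews whose opening text matches exactly. Reviews reposted across platforms often differ by a word or a signature line. A common technique for catching these, which the pipeline above does not use, is Jaccard similarity over word shingles. The sketch below is one minimal version; the function names, the three-word shingle size, and the 0.8 threshold are assumptions to tune on real data.

```python
import re


def shingles(text, size=3):
    """Set of overlapping word n-grams from lowercased text, ignoring punctuation."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(reviews, threshold=0.8):
    """Keep the first review of each group whose shingle sets overlap by `threshold` or more."""
    kept, kept_shingles = [], []
    for review in reviews:
        current = shingles(review.get("text") or "")
        if current and any(jaccard(current, seen) >= threshold for seen in kept_shingles):
            continue
        kept.append(review)
        kept_shingles.append(current)
    return kept


sample_reviews = [
    {"text": "Great car, very smooth ride and quiet cabin!"},
    {"text": "great car very smooth ride and quiet cabin"},
    {"text": "Fuel consumption is higher than expected on highways"},
    {"text": None, "rating": 5},
]
print(len(drop_near_duplicates(sample_reviews)))
```

The pairwise comparison is quadratic in the number of reviews, which is fine for a few thousand records. Larger collections would call for an index such as MinHash with locality-sensitive hashing.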

Conclusion

Automotive review aggregation transforms scattered consumer opinions into structured market intelligence. By systematically collecting reviews from across Southeast Asian platforms, forums, and social media, businesses can understand consumer sentiment with a depth and breadth that no single source provides.

DataResearchTools mobile proxies enable reliable review collection from platforms that actively protect their content. With mobile IPs from carriers across Singapore, Malaysia, Thailand, Indonesia, and the Philippines, DataResearchTools helps your review scrapers reach the major automotive review sources in the region with fewer interruptions.

The insights from aggregated review data, covering strengths, weaknesses, competitive positioning, and market-specific sentiment, directly inform product development, marketing strategy, and sales approaches. For any business operating in the Southeast Asian automotive market, systematic review aggregation is a foundation for customer-centric decision-making.

