How to Scrape Food Influencer Content for Marketing Intelligence

Food influencers wield enormous power in Southeast Asia’s dining and food delivery markets. From Instagram food photographers in Singapore to TikTok food reviewers in Thailand and YouTube mukbang creators in Indonesia, these content creators shape consumer preferences and drive restaurant traffic. For F&B brands, understanding this influencer landscape through systematic data collection provides a significant marketing advantage.

This guide covers how to scrape and analyze food influencer content across social media platforms in Southeast Asia.

The Food Influencer Landscape in SEA

Platform Distribution

Food influencer content is spread across multiple platforms, each with different content formats and audiences:

| Platform | Content Type | Key SEA Markets | Audience Profile |
|---|---|---|---|
| Instagram | Photos, Reels, Stories | All SEA markets | 18-35, visual-first |
| TikTok | Short videos, reviews | TH, ID, PH, MY | 16-30, trend-driven |
| YouTube | Long-form reviews, vlogs | All SEA markets | 20-40, research-oriented |
| Facebook | Reviews, live streams | PH, TH, MY | 25-45, community-driven |
| XiaoHongShu | Photo reviews, guides | SG, MY (Chinese speakers) | 20-35, lifestyle-focused |

Data Opportunities

Scraping food influencer content reveals:

  • Trending restaurants: Which restaurants are getting influencer attention
  • Popular cuisines: What food types are generating the most content
  • Promotion effectiveness: How sponsored content performs vs. organic
  • Sentiment patterns: What influencers praise or criticize
  • Competitor coverage: Which competitors are investing in influencer marketing
  • Content gaps: Underserved niches in food content
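
Sentiment patterns are on this list, but none of the code later in this guide measures them. As a starting point, here is a minimal keyword-based scorer. The lexicon is a small hypothetical sample (English plus the Malay/Indonesian "sedap" and "enak", both meaning "delicious"), not a validated resource, and keyword_sentiment is an illustrative helper rather than a recommended production method.

```python
import re

# Hypothetical, deliberately tiny lexicon; real use needs per-language curation
POSITIVE = {"delicious", "yummy", "tasty", "worth", "recommend", "sedap", "enak"}
NEGATIVE = {"bland", "overpriced", "disappointing", "soggy", "rude", "stale"}


def keyword_sentiment(text: str) -> float:
    """Score text in [-1, 1] as (positive hits - negative hits) / total hits."""
    words = re.findall(r"\w+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # no sentiment keywords found
    return (pos - neg) / (pos + neg)
```

Run it over each post's caption to get a rough praise-versus-criticism signal. It ignores negation ("not delicious" scores as positive), and Thai, which is written without spaces between words, would need a word segmenter before keyword matching.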

Building an Influencer Intelligence System

Core Architecture

import requests
import time
import random
import json
from datetime import datetime
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class InfluencerProfile:
    platform: str
    username: str
    display_name: str
    followers: int
    following: int
    post_count: int
    bio: str
    country: str
    engagement_rate: float = 0.0
    avg_likes: int = 0
    avg_comments: int = 0
    food_content_ratio: float = 0.0
    categories: List[str] = field(default_factory=list)

@dataclass
class FoodPost:
    platform: str
    post_id: str
    author: str
    content_text: str
    hashtags: List[str]
    mentions: List[str]
    likes: int
    comments: int
    shares: int
    posted_at: datetime
    location: Optional[str] = None
    restaurant_mentioned: Optional[str] = None
    is_sponsored: bool = False
    media_urls: List[str] = field(default_factory=list)
    engagement_rate: float = 0.0

class FoodInfluencerScraper:
    def __init__(self, proxy_user, proxy_pass):
        self.proxy_user = proxy_user
        self.proxy_pass = proxy_pass

    def _get_session(self, country="SG"):
        session = requests.Session()
        proxy_host = f"{country.lower()}-mobile.dataresearchtools.com"
        session.proxies = {
            "http": f"http://{self.proxy_user}:{self.proxy_pass}@{proxy_host}:8080",
            "https": f"http://{self.proxy_user}:{self.proxy_pass}@{proxy_host}:8080"
        }
        session.headers.update({
            "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
                         "AppleWebKit/605.1.15 Mobile/15E148",
            "Accept": "application/json"
        })
        return session
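
InfluencerProfile declares avg_likes, avg_comments and engagement_rate fields, but nothing in this guide fills them in. A minimal sketch of deriving them from an author's collected posts follows. author_metrics is a hypothetical helper, and it assumes engagement rate means average likes + comments + shares per post as a percentage of followers, the same definition score_food_influencers uses later on.

```python
from typing import Dict, Sequence


def author_metrics(posts: Sequence, followers: int) -> Dict[str, float]:
    """Average likes/comments and engagement rate for one author's posts.

    `posts` can be FoodPost objects or anything with likes, comments and
    shares attributes.
    """
    if not posts:
        return {"avg_likes": 0, "avg_comments": 0, "engagement_rate": 0.0}
    n = len(posts)
    avg_likes = sum(p.likes for p in posts) / n
    avg_comments = sum(p.comments for p in posts) / n
    avg_engagement = sum(p.likes + p.comments + p.shares for p in posts) / n
    # Percentage of followers; guard against accounts reporting zero followers
    rate = avg_engagement / followers * 100 if followers > 0 else 0.0
    return {
        "avg_likes": round(avg_likes),
        "avg_comments": round(avg_comments),
        "engagement_rate": round(rate, 2),
    }
```

The returned keys match the dataclass field names, so they can be copied onto a profile, e.g. `profile.engagement_rate = metrics["engagement_rate"]`. The averages only cover the posts you actually collected, not the account's full history.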

Instagram Food Content Scraping

# Defined as a method of FoodInfluencerScraper
def scrape_instagram_hashtag(self, hashtag, country="SG", max_posts=100):
    """Scrape posts from a food-related Instagram hashtag."""
    session = self._get_session(country)
    posts = []

    # Use Instagram's internal web API (not an official, documented API,
    # so the response shape can change without notice)
    response = session.get(
        f"https://www.instagram.com/api/v1/tags/{hashtag}/sections/",
        headers={"X-IG-App-ID": "936619743392459"}
    )

    if response.status_code != 200:
        return posts

    data = response.json()
    sections = data.get("sections", [])

    for section in sections:
        medias = section.get("layout_content", {}).get("medias", [])
        for media_item in medias:
            media = media_item.get("media", {})
            post = self._parse_instagram_post(media)
            if post:
                posts.append(post)

            if len(posts) >= max_posts:
                # Return rather than break: break would only exit the inner
                # loop, and later sections would keep adding posts
                return posts

    return posts

def _parse_instagram_post(self, media):
    """Parse an Instagram media object into a FoodPost."""
    caption = media.get("caption", {})
    caption_text = caption.get("text", "") if caption else ""

    hashtags = self._extract_hashtags(caption_text)
    mentions = self._extract_mentions(caption_text)

    user = media.get("user", {})

    return FoodPost(
        platform="instagram",
        post_id=str(media.get("pk", "")),
        author=user.get("username", ""),
        content_text=caption_text,
        hashtags=hashtags,
        mentions=mentions,
        likes=media.get("like_count", 0),
        comments=media.get("comment_count", 0),
        shares=0,
        posted_at=datetime.fromtimestamp(media.get("taken_at", 0)),
        location=media.get("location", {}).get("name") if media.get("location") else None,
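        # Heuristic: looks for a "paid_partnership" marker in the sponsor tags
        is_sponsored="paid_partnership" in str(media.get("sponsor_tags", [])),
        # "or" fallbacks guard against missing/None fields and an empty candidates list
        media_urls=[
            ((media.get("image_versions2") or {}).get("candidates") or [{}])[0].get("url", "")
        ]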
    )

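def _extract_hashtags(self, text):
    """Extract hashtags from text."""
    import re
    # \w stops at Thai vowel and tone marks (Unicode category Mn), which would
    # cut tags like #กินเที่ยว short, so the Thai block is added explicitly
    return re.findall(r'#([\w\u0E00-\u0E7F]+)', text)

def _extract_mentions(self, text):
    """Extract @mentions from text."""
    import re
    # Handles may contain periods (e.g. @food.sg); a trailing period is not captured
    return re.findall(r'@(\w+(?:\.\w+)*)', text)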

TikTok Food Content Scraping

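# Defined as a method of FoodInfluencerScraper
def scrape_tiktok_food_content(self, hashtag, country="TH", max_videos=50):
    """Scrape food-related TikTok videos."""
    session = self._get_session(country)
    videos = []

    # Fetches a single page; collecting more videos than one page holds
    # requires advancing "cursor" on repeated requests
    response = session.get(
        "https://www.tiktok.com/api/challenge/item_list/",
        params={
            "challengeName": hashtag,
            "count": min(max_videos, 30),
            "cursor": 0
        }
    )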

    if response.status_code != 200:
        return videos

    data = response.json()
    items = data.get("itemList", [])

    for item in items[:max_videos]:
        author = item.get("author", {})
        stats = item.get("stats", {})

        video = FoodPost(
            platform="tiktok",
            post_id=item.get("id", ""),
            author=author.get("uniqueId", ""),
            content_text=item.get("desc", ""),
            hashtags=[c.get("title", "") for c in item.get("challenges", [])],
            mentions=self._extract_mentions(item.get("desc", "")),
            likes=stats.get("diggCount", 0),
            comments=stats.get("commentCount", 0),
            shares=stats.get("shareCount", 0),
            posted_at=datetime.fromtimestamp(item.get("createTime", 0)),
            media_urls=[item.get("video", {}).get("cover", "")]
        )

        # Detect restaurant mentions
        video.restaurant_mentioned = self._detect_restaurant_mention(
            video.content_text, video.hashtags
        )

        videos.append(video)

    return videos
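
scrape_tiktok_food_content calls self._detect_restaurant_mention, but the guide never defines it. One simple approach, sketched below under the assumption that you maintain your own list of restaurant names to watch, is to match those names against the caption and the hashtags. detect_restaurant_mention is a hypothetical standalone version; as a method it could read the list from an attribute such as self.known_restaurants (also hypothetical).

```python
import re
from typing import Iterable, List, Optional


def detect_restaurant_mention(
    text: str, hashtags: List[str], known_restaurants: Iterable[str]
) -> Optional[str]:
    """Return the first known restaurant named in the caption or hashtags."""
    caption = text.lower()
    tag_set = {h.lower() for h in hashtags}
    for name in known_restaurants:
        lowered = name.lower()
        # Hashtags usually drop spaces ("#jumboseafood"), so compare a compact form
        compact = re.sub(r"\s+", "", lowered)
        # Lookarounds stop a short name from matching inside a longer word
        pattern = rf"(?<!\w){re.escape(lowered)}(?!\w)"
        if re.search(pattern, caption) or compact in tag_set:
            return name
    return None
```

Matching is case-insensitive. The lookarounds suit space-separated captions; because Thai is written without spaces between words, Thai captions are better served by a plain substring check or a word segmenter.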

Analyzing Food Influencer Data

Restaurant Mention Analysis

from collections import defaultdict

def analyze_restaurant_mentions(posts, known_restaurants=None):
    """Analyze which restaurants are most frequently mentioned by influencers."""
    # Optional allow-list (case-insensitive); when given, other names are ignored
    allowed = {r.lower() for r in known_restaurants} if known_restaurants else None

    restaurant_mentions = defaultdict(lambda: {
        "mention_count": 0,
        "total_engagement": 0,
        "avg_engagement": 0,
        "posts": [],
        "platforms": set()
    })

    for post in posts:
        engagement = post.likes + post.comments + post.shares
        # Location tags and text mentions both count; the set stops a post whose
        # tag and caption name the same place from being counted twice
        names = {n for n in (post.location, post.restaurant_mentioned) if n}
        for name in names:
            if allowed is not None and name.lower() not in allowed:
                continue
            data = restaurant_mentions[name]
            data["mention_count"] += 1
            data["total_engagement"] += engagement
            data["posts"].append(post.post_id)
            data["platforms"].add(post.platform)

    # Calculate averages and convert sets to lists (JSON-serializable)
    for data in restaurant_mentions.values():
        data["avg_engagement"] = round(
            data["total_engagement"] / data["mention_count"]
        )
        data["platforms"] = sorted(data["platforms"])

    return dict(sorted(
        restaurant_mentions.items(),
        key=lambda x: x[1]["total_engagement"],
        reverse=True
    ))
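
Location tags and caption mentions often name the same venue in slightly different forms, for example with a branch in parentheses or different capitalization, and the function above counts each form separately. A heuristic normalizer like the hypothetical normalize_restaurant_name below can be applied to names before they are used as dictionary keys in analyze_restaurant_mentions. It is only a sketch: it won't merge genuinely different spellings or transliterations.

```python
import re
import unicodedata


def normalize_restaurant_name(name: str) -> str:
    """Heuristically collapse variants of one venue name into a single key."""
    # NFKC folds full-width and compatibility characters; then lowercase
    name = unicodedata.normalize("NFKC", name).lower()
    # Drop parenthesised branch or outlet names: "Jumbo Seafood (Riverside Point)"
    name = re.sub(r"\(.*?\)", " ", name)
    # Replace punctuation with spaces, keeping & and apostrophes. Checking the
    # Unicode category (rather than using \w) leaves Thai vowel and tone marks intact
    name = "".join(
        " " if unicodedata.category(ch).startswith("P") and ch not in "&'" else ch
        for ch in name
    )
    # Collapse runs of whitespace
    return re.sub(r"\s+", " ", name).strip()
```

Keep the original display name alongside the normalized key if you need it for reports.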

Trending Food Analysis

def analyze_food_trends(posts, timeframe_days=30):
    """Identify trending food topics from influencer content."""
    from collections import Counter
    from datetime import timedelta

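    # posted_at comes from datetime.fromtimestamp(), i.e. naive local time,
    # so compare against local "now" rather than utcnow()
    cutoff = datetime.now() - timedelta(days=timeframe_days)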
    recent_posts = [p for p in posts if p.posted_at >= cutoff]

    # Analyze hashtags
    all_hashtags = []
    for post in recent_posts:
        all_hashtags.extend([h.lower() for h in post.hashtags])

    food_related_hashtags = [
        h for h in all_hashtags
        if any(keyword in h for keyword in [
            "food", "eat", "restaurant", "cafe", "makan", "กิน", "makanan",
            "delivery", "yummy", "delicious", "brunch", "dinner", "lunch",
            "noodle", "rice", "chicken", "burger", "pizza", "sushi",
            "coffee", "boba", "tea", "dessert", "cake"
        ])
    ]

    hashtag_counts = Counter(food_related_hashtags)

    # Calculate engagement per hashtag
    hashtag_engagement = {}
    for post in recent_posts:
        engagement = post.likes + post.comments + post.shares
        for hashtag in post.hashtags:
            h = hashtag.lower()
            if h not in hashtag_engagement:
                hashtag_engagement[h] = {"total": 0, "count": 0}
            hashtag_engagement[h]["total"] += engagement
            hashtag_engagement[h]["count"] += 1

    trending = []
    for hashtag, count in hashtag_counts.most_common(50):
        eng_data = hashtag_engagement.get(hashtag, {"total": 0, "count": 1})
        trending.append({
            "hashtag": f"#{hashtag}",
            "post_count": count,
            "total_engagement": eng_data["total"],
            "avg_engagement": round(eng_data["total"] / eng_data["count"]),
            "trend_score": count * (eng_data["total"] / eng_data["count"]) / 1000
        })

    trending.sort(key=lambda x: x["trend_score"], reverse=True)
    return trending
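
analyze_food_trends ranks hashtags by volume and engagement within one window, which favors evergreen tags over tags that are actually rising. One way to surface growth, sketched here as a hypothetical hashtag_growth helper, is to compare each tag's post count in the latest window with the window before it.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def hashtag_growth(
    tagged_posts: Iterable[Tuple[datetime, Iterable[str]]],
    now: datetime,
    window_days: int = 7,
) -> Dict[str, float]:
    """Growth of each tag as (count in latest window) / (count in prior window).

    Tags absent from the prior window get float("inf").
    """
    window = timedelta(days=window_days)
    current, previous = Counter(), Counter()
    for posted_at, hashtags in tagged_posts:
        tags = {h.lower() for h in hashtags}  # count each tag once per post
        if now - window <= posted_at <= now:
            current.update(tags)
        elif now - 2 * window <= posted_at < now - window:
            previous.update(tags)
    return {
        tag: (count / previous[tag] if previous[tag] else float("inf"))
        for tag, count in current.items()
    }
```

Pass `((p.posted_at, p.hashtags) for p in posts)` together with a "now" of the same kind as posted_at (naive local time, since the scrapers build it with datetime.fromtimestamp). Brand-new tags come back as infinity, so in practice you would require a minimum current count before ranking.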

Influencer Identification and Scoring

def score_food_influencers(profiles, posts_by_author):
    """Score and rank food influencers by relevance and influence."""
    scored_influencers = []

    for profile in profiles:
        author_posts = posts_by_author.get(profile.username, [])
        if not author_posts:
            continue

        # Calculate engagement metrics
        total_engagement = sum(
            p.likes + p.comments + p.shares for p in author_posts
        )
        avg_engagement = total_engagement / len(author_posts) if author_posts else 0

        engagement_rate = (avg_engagement / profile.followers * 100) if profile.followers > 0 else 0

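        # Food content ratio: a post counts as food content if any hashtag
        # contains one of these keywords (e.g. "sgfood" contains "food")
        food_keywords = ["food", "eat", "restaurant", "makan", "delicious", "yummy"]
        food_posts = [
            p for p in author_posts
            if any(k in h.lower() for h in p.hashtags for k in food_keywords)
        ]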
        food_ratio = len(food_posts) / len(author_posts) if author_posts else 0

        # Sponsored content ratio
        sponsored = [p for p in author_posts if p.is_sponsored]
        sponsored_ratio = len(sponsored) / len(author_posts) if author_posts else 0

        # Calculate influence score
        influence_score = (
            min(profile.followers / 10000, 30) +  # Reach (max 30 pts)
            min(engagement_rate * 10, 30) +         # Engagement (max 30 pts)
            food_ratio * 20 +                        # Food focus (max 20 pts)
            min(len(author_posts) / 10, 10) +       # Activity (max 10 pts)
            (10 if profile.country else 0)           # Country known (10 pts)
        )

        scored_influencers.append({
            "username": profile.username,
            "platform": profile.platform,
            "followers": profile.followers,
            "engagement_rate": round(engagement_rate, 2),
            "food_content_ratio": round(food_ratio, 2),
            "sponsored_ratio": round(sponsored_ratio, 2),
            "influence_score": round(influence_score, 1),
            "avg_engagement": round(avg_engagement),
            "country": profile.country,
            "tier": classify_influencer_tier(profile.followers)
        })

    return sorted(scored_influencers, key=lambda x: x["influence_score"], reverse=True)

def classify_influencer_tier(followers):
    """Classify influencer by follower count."""
    if followers >= 1000000:
        return "mega"
    elif followers >= 100000:
        return "macro"
    elif followers >= 10000:
        return "mid"
    elif followers >= 1000:
        return "micro"
    else:
        return "nano"
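
The influence score adds five capped components, which is easier to reason about with the parts separated. influence_score_breakdown is a hypothetical helper that mirrors the formula in score_food_influencers term by term.

```python
from typing import Dict


def influence_score_breakdown(
    followers: int,
    engagement_rate: float,  # percent, e.g. 2.5 for 2.5%
    food_ratio: float,       # 0.0 to 1.0
    post_count: int,
    has_country: bool,
) -> Dict[str, float]:
    """Mirror the influence_score formula, returning each component separately."""
    parts = {
        "reach": min(followers / 10000, 30),          # caps at 300,000 followers
        "engagement": min(engagement_rate * 10, 30),  # caps at 3% engagement
        "food_focus": food_ratio * 20,
        "activity": min(post_count / 10, 10),         # caps at 100 collected posts
        "location": 10 if has_country else 0,
    }
    # The right-hand side is evaluated before "total" is added
    parts["total"] = sum(parts.values())
    return parts
```

For example, an account with 50,000 followers, a 2.5% engagement rate, 80% food content, 25 collected posts and a known country scores 5 + 25 + 16 + 2.5 + 10 = 58.5. Because reach maxes out at 300,000 followers, a 300,000-follower account and a 3-million-follower account earn the same reach points; classify_influencer_tier is what separates them.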

SEA-Specific Food Hashtags

Track these popular food hashtags across SEA markets:

Singapore

  • #sgfood, #sgeats, #singaporefood, #sgfoodie, #burpple, #hungrygowhere
  • #hawkerfood, #sgcafe, #sgfoodporn, #sgrestaurant

Malaysia

  • #myfood, #malaysianfood, #klfood, #makansedap, #penangfood
  • #mamak, #kopitiam, #streetfoodmalaysia

Thailand

  • #bangkokfood, #thaifood, #กินเที่ยว, #อาหารอร่อย, #ร้านอาหาร
  • #streetfoodthailand, #bangkokeats, #thaifoodie

Philippines

  • #foodph, #manilafood, #filipinofood, #kainanph, #eatsph
  • #foodmanila, #cebufoodeats

Indonesia

  • #kuliner, #kulinerindonesia, #makanenak, #jakartafood
  • #kulinerjakarta, #makanmakan, #jajanan
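
To track these hashtags systematically, it helps to keep them in a config keyed by the country codes that _get_session uses to pick a proxy. In the sketch below, HASHTAGS_BY_COUNTRY holds only a sample of the tags above and build_crawl_plan is a hypothetical helper; it turns the config into (country, hashtag) jobs, stripping the leading # because the scrape functions insert the tag into the request as-is.

```python
from typing import Dict, List, Tuple

# Sample of the tags above, keyed by the country code passed to _get_session
HASHTAGS_BY_COUNTRY: Dict[str, List[str]] = {
    "SG": ["#sgfood", "#hawkerfood", "#burpple"],
    "MY": ["#klfood", "#makansedap", "#penangfood"],
    "TH": ["#bangkokfood", "#อาหารอร่อย"],
    "PH": ["#foodph", "#manilafood"],
    "ID": ["#kuliner", "#jakartafood"],
}


def build_crawl_plan(hashtags_by_country: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flatten the config into (country, bare_hashtag) jobs, dropping duplicates."""
    plan = []
    for country, tags in hashtags_by_country.items():
        seen = set()
        for tag in tags:
            bare = tag.lstrip("#").strip().lower()
            if bare and bare not in seen:  # skip blanks and case-variant repeats
                seen.add(bare)
                plan.append((country, bare))
    return plan
```

Each job can then be handed to a scraper method, e.g. `scraper.scrape_instagram_hashtag(tag, country=country)`, with a pause between requests.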

Competitive Influencer Marketing Analysis

from collections import Counter

def analyze_competitor_influencer_strategy(competitor_name, posts):
    """Analyze how a competitor uses food influencers."""
    name = competitor_name.lower()
    competitor_mentions = [
        p for p in posts
        if name in p.content_text.lower() or
        name in [m.lower() for m in p.mentions]
    ]

    if not competitor_mentions:
        return {"competitor": competitor_name, "influencer_activity": "none_detected"}

    sponsored = [p for p in competitor_mentions if p.is_sponsored]
    organic = [p for p in competitor_mentions if not p.is_sponsored]

    # Engagement (likes + comments + shares) per author, used to rank influencers
    author_engagement = Counter()
    for p in competitor_mentions:
        author_engagement[p.author] += p.likes + p.comments + p.shares
    platforms = {p.platform for p in competitor_mentions}
    total_engagement = sum(author_engagement.values())

    return {
        "competitor": competitor_name,
        "total_mentions": len(competitor_mentions),
        "sponsored_posts": len(sponsored),
        "organic_posts": len(organic),
        "unique_influencers": len(author_engagement),
        "platforms_used": sorted(platforms),
        # Engagement, not reach: the scraped posts carry no impression counts
        "total_engagement": total_engagement,
        "avg_engagement_per_post": round(total_engagement / len(competitor_mentions)),
        # Ranked by engagement on competitor posts (a set has no meaningful order)
        "top_influencers": [author for author, _ in author_engagement.most_common(10)],
        "estimated_sponsored_spend": estimate_influencer_spend(sponsored)
    }

def estimate_influencer_spend(sponsored_posts):
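    """Estimate influencer marketing spend based on post metrics.

    Very rough: follower counts are inferred from likes, and each tier gets
    a flat per-post rate.
    """
    total_estimate = 0
    for post in sponsored_posts:
        # Rough follower estimate: likes * 20 assumes ~5% of followers like the post
        followers_estimate = post.likes * 20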
        if followers_estimate >= 1000000:
            total_estimate += 5000  # Mega influencer
        elif followers_estimate >= 100000:
            total_estimate += 1500  # Macro
        elif followers_estimate >= 10000:
            total_estimate += 500   # Mid
        else:
            total_estimate += 150   # Micro
    return total_estimate
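
Promotion effectiveness (how sponsored content performs vs. organic) is one of the data opportunities listed earlier, but the competitor analysis only counts the two kinds of posts. A minimal comparison of average engagement, assuming is_sponsored has been populated, could look like this; sponsored_vs_organic is a hypothetical helper.

```python
from statistics import mean
from typing import Dict, Iterable


def sponsored_vs_organic(posts: Iterable) -> Dict[str, float]:
    """Average engagement (likes + comments + shares) of sponsored vs organic posts.

    `posts` can be FoodPost objects or anything with likes, comments, shares
    and is_sponsored attributes.
    """
    sponsored, organic = [], []
    for p in posts:
        engagement = p.likes + p.comments + p.shares
        (sponsored if p.is_sponsored else organic).append(engagement)
    avg_sponsored = mean(sponsored) if sponsored else 0.0
    avg_organic = mean(organic) if organic else 0.0
    return {
        "avg_sponsored": avg_sponsored,
        "avg_organic": avg_organic,
        # Above 1.0: sponsored posts out-engage organic ones on average.
        # NaN when there are no organic posts to compare against.
        "lift": avg_sponsored / avg_organic if avg_organic else float("nan"),
    }
```

Two caveats: the TikTok scraper never sets is_sponsored, so every TikTok post lands in the organic bucket, and raw engagement favors large accounts, so comparisons are fairer within a single influencer tier.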

Why Mobile Proxies for Social Media Scraping

Social media platforms implement aggressive anti-scraping measures that make mobile proxies essential:

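  1. Rate limiting by IP: platforms cap the number of requests per IP address. Mobile carrier IPs are shared by many real users behind carrier-grade NAT, so platforms are reluctant to block them outright and treat them with higher trust.
  2. Geo-restricted content: what a platform shows depends on the viewer's location, and mobile proxies provide authentic in-country IPs.
  3. Mobile-first APIs: social apps serve different content to mobile and desktop users.
  4. Account safety: requests from mobile IPs are less likely to trigger security challenges such as CAPTCHAs or login verification.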
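
None of the scraper methods retry when a platform throttles them. A common complement to rotating IPs is exponential backoff with jitter; the sketch below is a hypothetical get_with_backoff wrapper around session.get. The retried status codes and delays are assumptions, not documented platform limits, and sleep is injectable so the function can be tested without waiting.

```python
import random
import time
from typing import Callable, Tuple


def get_with_backoff(
    session,
    url: str,
    max_retries: int = 4,
    base_delay: float = 2.0,
    retry_statuses: Tuple[int, ...] = (429, 500, 502, 503),
    sleep: Callable[[float], None] = time.sleep,
    **kwargs,
):
    """GET `url` via a requests.Session-like object, retrying throttled or
    failed responses with exponential backoff plus random jitter.

    Returns the last response, which may still be an error after max_retries.
    """
    response = None
    for attempt in range(max_retries + 1):
        response = session.get(url, **kwargs)
        if response.status_code not in retry_statuses:
            return response
        if attempt < max_retries:
            # 2s, 4s, 8s, ... plus up to 1s of jitter so workers don't retry in lockstep
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    return response
```

Inside the scraper methods, `session.get(url, params=...)` would become `get_with_backoff(session, url, params=...)`; extra keyword arguments are forwarded unchanged.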

DataResearchTools mobile proxies provide the authentic mobile carrier IPs needed to access social media content across all SEA markets, ensuring you collect comprehensive food influencer data without detection.

Conclusion

Food influencer intelligence gives F&B brands a powerful lens into consumer trends, competitive marketing strategies, and brand perception across Southeast Asia. By systematically scraping and analyzing influencer content with DataResearchTools mobile proxies, businesses can identify trending restaurants, discover effective content strategies, and make data-driven decisions about their own influencer marketing investments.

The key is building a systematic monitoring pipeline that tracks relevant hashtags, influencer profiles, and competitor mentions across platforms. Over time, this data reveals patterns in consumer preferences and marketing effectiveness that are impossible to see through manual observation alone.

