Instagram Scraper
Browser-based tool to discover Instagram profiles by location/category and scrape their public info, stats, images, and engagement with export options.
A browser-based Instagram profile discovery and scraping tool.
Part of ScrapeClaw — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, and Facebook built with Python & Playwright, no API keys required.
---
name: instagram-scraper
description: Discover and scrape Instagram profiles from your browser.
emoji: 📸
version: 1.0.6
author: influenza
tags:
  - instagram
  - scraping
  - social-media
  - influencer-discovery
metadata:
  clawdbot:
    requires:
      bins:
        - python3
        - chromium
    config:
      stateDirs:
        - data/output
        - data/queue
        - thumbnails
      outputFormats:
        - json
        - csv
---
This skill provides a two-phase Instagram scraping system: first discover profiles by location and category, then scrape each discovered profile's public data.
When setting up Google Custom Search, specify instagram.com as the site to search.

For OpenClaw agent integration, the skill provides JSON output:
```bash
# Discover profiles (returns JSON)
python main.py discover --location "Miami" --category "fitness" --output json

# Scrape a single profile (returns JSON)
python main.py scrape --username influencer123 --output json
```
```json
{
  "username": "example_user",
  "full_name": "Example User",
  "bio": "Fashion blogger | NYC",
  "followers": 125000,
  "following": 1500,
  "posts_count": 450,
  "is_verified": false,
  "is_private": false,
  "influencer_tier": "mid",
  "category": "fashion",
  "location": "New York",
  "profile_pic_local": "thumbnails/example_user/profile_abc123.jpg",
  "content_thumbnails": [
    "thumbnails/example_user/content_1_def456.jpg",
    "thumbnails/example_user/content_2_ghi789.jpg"
  ],
  "post_engagement": [
    {"post_url": "https://instagram.com/p/ABC123/", "likes": 5420, "comments": 89}
  ],
  "scrape_timestamp": "2025-02-09T14:30:00"
}
```
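As a sketch of how a downstream consumer might use this output, here is a small function (hypothetical, not part of the skill) that computes an average engagement rate from the `post_engagement` and `followers` fields shown above:

```python
import json


def engagement_rate(profile: dict) -> float:
    """Average (likes + comments) per post, as a fraction of followers."""
    posts = profile.get("post_engagement", [])
    followers = profile.get("followers", 0)
    if not posts or not followers:
        return 0.0
    total = sum(p["likes"] + p["comments"] for p in posts)
    return total / len(posts) / followers


profile = json.loads("""{
    "username": "example_user",
    "followers": 125000,
    "post_engagement": [
        {"post_url": "https://instagram.com/p/ABC123/", "likes": 5420, "comments": 89}
    ]
}""")
print(f"{engagement_rate(profile):.2%}")  # → 4.41%
```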
| Tier | Follower Range |
|---|---|
| nano | < 1,000 |
| micro | 1,000 - 10,000 |
| mid | 10,000 - 100,000 |
| macro | 100,000 - 1M |
| mega | > 1,000,000 |
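The tier bands above can be expressed as a simple classifier. This is an illustrative sketch, not the skill's actual code; where the table's ranges share a boundary (e.g. 10,000 appears in both micro and mid), the lower bound is treated as inclusive:

```python
def influencer_tier(followers: int) -> str:
    """Map a follower count onto the tier bands from the table above."""
    if followers < 1_000:
        return "nano"
    if followers < 10_000:
        return "micro"
    if followers < 100_000:
        return "mid"
    if followers <= 1_000_000:
        return "macro"
    return "mega"


print(influencer_tier(125_000))  # → macro
```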
Files produced:

- Queue: `data/queue/{location}_{category}_{timestamp}.json`
- Profiles: `data/output/{username}.json`
- Thumbnails: `thumbnails/{username}/profile_*.jpg`, `thumbnails/{username}/content_*.jpg`
- Exports: `data/export_{timestamp}.json`, `data/export_{timestamp}.csv`

Edit `config/scraper_config.json`:
```json
{
  "proxy": {
    "enabled": false,
    "provider": "brightdata",
    "country": "",
    "sticky": true,
    "sticky_ttl_minutes": 10
  },
  "google_search": {
    "enabled": true,
    "api_key": "",
    "search_engine_id": "",
    "queries_per_location": 3
  },
  "scraper": {
    "headless": false,
    "min_followers": 1000,
    "download_thumbnails": true,
    "max_thumbnails": 6
  },
  "cities": ["New York", "Los Angeles", "Miami", "Chicago"],
  "categories": ["fashion", "beauty", "fitness", "food", "travel", "tech"]
}
```
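A minimal sketch of loading this config with fallback defaults for the `scraper` section (the function name `load_config` and the defaults-merging behavior are assumptions for illustration, not the skill's actual loader):

```python
import json
from pathlib import Path

# Defaults mirroring the scraper section of config/scraper_config.json
SCRAPER_DEFAULTS = {
    "headless": False,
    "min_followers": 1000,
    "download_thumbnails": True,
    "max_thumbnails": 6,
}


def load_config(path: str = "config/scraper_config.json") -> dict:
    """Read the JSON config; missing scraper keys fall back to defaults."""
    p = Path(path)
    cfg = json.loads(p.read_text()) if p.exists() else {}
    cfg["scraper"] = {**SCRAPER_DEFAULTS, **cfg.get("scraper", {})}
    return cfg
```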
The scraper automatically filters out profiles below the configured `min_followers` threshold (1,000 by default).
Running a scraper at scale without a residential proxy will get your IP blocked fast. Here's why proxies are essential for long-running scrapes:
| Advantage | Description |
|---|---|
| Avoid IP Bans | Residential IPs look like real household users, not data-center bots. Instagram is far less likely to flag them. |
| Automatic IP Rotation | Each request (or session) gets a fresh IP, so rate-limits never stack up on one address. |
| Geo-Targeting | Route traffic through a specific country/city so scraped content matches the target audience's locale. |
| Sticky Sessions | Keep the same IP for a configurable window (e.g. 10 min) — critical for maintaining a consistent browsing session. |
| Higher Success Rate | Rotating residential IPs deliver 95%+ success rates compared to ~30% with data-center proxies on Instagram. |
| Long-Running Scrapes | Scrape thousands of profiles over hours or days without interruption. |
| Concurrent Scraping | Run multiple browser instances across different IPs simultaneously. |
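The sticky-session idea from the table can be sketched as follows: reuse one session ID within a TTL window (the config's `sticky_ttl_minutes`), then mint a fresh one. This is an illustrative class, not the skill's implementation; the injectable `clock` parameter exists only to make the behavior testable:

```python
import time
import uuid


class StickySession:
    """Reuse a session ID within a TTL window, then mint a fresh one."""

    def __init__(self, ttl_minutes: float = 10, clock=time.monotonic):
        self.ttl = ttl_minutes * 60
        self.clock = clock
        self._id = None
        self._born = 0.0

    def session_id(self) -> str:
        now = self.clock()
        if self._id is None or now - self._born > self.ttl:
            # Expired (or first use): start a new session
            self._id = uuid.uuid4().hex[:8]
            self._born = now
        return self._id
```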
We have affiliate partnerships with top residential proxy providers. Using these links supports continued development of this skill:
| Provider | Best For | Sign Up |
|---|---|---|
| Bright Data | World's largest network, 72M+ IPs, enterprise-grade | 👉 Get Bright Data |
| IPRoyal | Pay-as-you-go, 195+ countries, no traffic expiry | 👉 Get IPRoyal |
| Storm Proxies | Fast & reliable, developer-friendly API, competitive pricing | 👉 Get Storm Proxies |
| NetNut | ISP-grade network, 52M+ IPs, direct connectivity | 👉 Get NetNut |
Sign up with any provider above, then grab your proxy credentials (username and password):
```bash
export PROXY_ENABLED=true
export PROXY_PROVIDER=brightdata   # brightdata | iproyal | stormproxies | netnut | custom
export PROXY_USERNAME=your_user
export PROXY_PASSWORD=your_pass
export PROXY_COUNTRY=us            # optional: two-letter country code
export PROXY_STICKY=true           # optional: keep same IP per session
```
These are auto-configured when you set the provider name:
| Provider | Host | Port |
|---|---|---|
| Bright Data | | |
| IPRoyal | | |
| Storm Proxies | | |
| NetNut | | |
Override with `PROXY_HOST` / `PROXY_PORT` env vars if your plan uses a different gateway.
For any other proxy service, set provider to `custom` and supply host/port manually:
```json
{
  "proxy": {
    "enabled": true,
    "provider": "custom",
    "host": "your.proxy.host",
    "port": 8080,
    "username": "user",
    "password": "pass"
  }
}
```
Once configured, the scraper picks up the proxy automatically — no extra flags needed:
```bash
# Discover and scrape as usual — proxy is applied automatically
python main.py discover --location "Miami" --category "fitness"
python main.py scrape --username influencer123
```

The log will confirm the proxy is active:
```
INFO - Proxy enabled: <ProxyManager provider=brightdata enabled host=brd.superproxy.io:22225>
INFO - Browser using proxy: brightdata → brd.superproxy.io:22225
```
```python
from proxy_manager import ProxyManager

# From config (auto-reads config/scraper_config.json)
pm = ProxyManager.from_config()

# From environment variables
pm = ProxyManager.from_env()

# Manual construction
pm = ProxyManager(
    provider="brightdata",
    username="your_user",
    password="your_pass",
    country="us",
    sticky=True,
)

# For Playwright browser context
proxy = pm.get_playwright_proxy()
# → {"server": "http://brd.superproxy.io:22225", "username": "user-country-us-session-abc123", "password": "pass"}

# For requests / aiohttp
proxies = pm.get_requests_proxy()
# → {"http": "http://user:pass@host:port", "https": "http://user:pass@host:port"}

# Force a new IP (rotates the session ID)
pm.rotate_session()

# Debug info
print(pm.info())
```
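The session-tagged username shown in the Playwright example (`user-country-us-session-abc123`) can be sketched as a formatting helper. This is an assumption-laden illustration: the function `brightdata_username` is hypothetical, and the exact parameter syntax is provider-specific, so check your provider's docs:

```python
import secrets


def brightdata_username(user: str, country: str = "", session: str = "") -> str:
    """Build a Bright Data-style tagged proxy username (sketch only)."""
    parts = [user]
    if country:
        parts += ["country", country]
    if session:
        parts += ["session", session]
    return "-".join(parts)


session = secrets.token_hex(4)  # fresh session tag → fresh sticky IP
print(brightdata_username("your_user", "us", session))
```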
Tips for long-running scrapes:

- Enable sticky sessions with `"sticky": true`.
- Set `"country": "us"` (or your target region) so Instagram serves content in the expected locale.
- Call `pm.rotate_session()` between large batches of profiles to get a fresh IP.
- Set `delay_between_profiles` in config to avoid aggressive patterns.