scrapling
Web scraping with Scrapling - HTTP fetching, stealth browser automation, Cloudflare bypass, and spider crawling via CLI and Python.
Web scraping with Scrapling - HTTP fetching, stealth browser automation, Cloudflare bypass, and spider crawling via CLI and Python.
Real data. Real impact.
Emerging
Developers
Per week
Excellent
Skills give you superpowers. Install in 30 seconds.
Scrapling is a web scraping framework with anti-bot bypass, stealth browser automation, and a spider framework. It provides three fetching strategies (HTTP, dynamic JS, stealth/Cloudflare) and a full CLI.
This skill is for educational and research purposes only. Users must comply with local/international data scraping laws and respect website Terms of Service.
web_extract tool does not return the data you needpip install "scrapling[all]" scrapling install
Minimal install (HTTP only, no browser):
pip install scrapling
With browser automation only:
pip install "scrapling[fetchers]" scrapling install
| Approach | Class | Use When |
|---|---|---|
| HTTP | / | Static pages, APIs, fast bulk requests |
| Dynamic | / | JS-rendered content, SPAs |
| Stealth | / | Cloudflare, anti-bot protected sites |
| Spider | | Multi-page crawling with link following |
scrapling extract get 'https://example.com' output.md
With CSS selector and browser impersonation:
scrapling extract get 'https://example.com' output.md \ --css-selector '.content' \ --impersonate 'chrome'
scrapling extract fetch 'https://example.com' output.md \ --css-selector '.dynamic-content' \ --disable-resources \ --network-idle
scrapling extract stealthy-fetch 'https://protected-site.com' output.html \ --solve-cloudflare \ --block-webrtc \ --hide-canvas
scrapling extract post 'https://example.com/api' output.json \ --json '{"query": "search term"}'
The output format is determined by the file extension:
.html -- raw HTML.md -- converted to Markdown.txt -- plain text.json / .jsonl -- JSONfrom scrapling.fetchers import Fetcher page = Fetcher.get('https://quotes.toscrape.com/') quotes = page.css('.quote .text::text').getall() for q in quotes: print(q)
from scrapling.fetchers import FetcherSession with FetcherSession(impersonate='chrome') as session: page = session.get('https://example.com/', stealthy_headers=True) links = page.css('a::attr(href)').getall() for link in links[:5]: sub = session.get(link) print(sub.css('h1::text').get())
page = Fetcher.post('https://api.example.com/data', json={"key": "value"}) page = Fetcher.put('https://api.example.com/item/1', data={"name": "updated"}) page = Fetcher.delete('https://api.example.com/item/1')
page = Fetcher.get('https://example.com', proxy='http://user:pass@proxy:8080')
For pages that require JavaScript execution (SPAs, lazy-loaded content):
from scrapling.fetchers import DynamicFetcher page = DynamicFetcher.fetch('https://example.com', headless=True) data = page.css('.js-loaded-content::text').getall()
page = DynamicFetcher.fetch( 'https://example.com', wait_selector=('.results', 'visible'), network_idle=True, )
Blocks fonts, images, media, stylesheets (~25% faster):
from scrapling.fetchers import DynamicSession with DynamicSession(headless=True, disable_resources=True, network_idle=True) as session: page = session.fetch('https://example.com') items = page.css('.item::text').getall()
from playwright.sync_api import Page from scrapling.fetchers import DynamicFetcher def scroll_and_click(page: Page): page.mouse.wheel(0, 3000) page.wait_for_timeout(1000) page.click('button.load-more') page.wait_for_selector('.extra-results') page = DynamicFetcher.fetch('https://example.com', page_action=scroll_and_click) results = page.css('.extra-results .item::text').getall()
For Cloudflare-protected or heavily fingerprinted sites:
from scrapling.fetchers import StealthyFetcher page = StealthyFetcher.fetch( 'https://protected-site.com', headless=True, solve_cloudflare=True, block_webrtc=True, hide_canvas=True, ) content = page.css('.protected-content::text').getall()
from scrapling.fetchers import StealthySession with StealthySession(headless=True, solve_cloudflare=True) as session: page1 = session.fetch('https://protected-site.com/page1') page2 = session.fetch('https://protected-site.com/page2')
All fetchers return a
Selector object with these methods:
page.css('h1::text').get() # First h1 text page.css('a::attr(href)').getall() # All link hrefs page.css('.quote .text::text').getall() # Nested selection
page.xpath('//div[@class="content"]/text()').getall() page.xpath('//a/@href').getall()
page.find_all('div', class_='quote') # By tag + attribute page.find_by_text('Read more', tag='a') # By text content page.find_by_regex(r'\$\d+\.\d{2}') # By regex pattern
Find elements with similar structure (useful for product listings, etc.):
first_product = page.css('.product')[0] all_similar = first_product.find_similar()
el = page.css('.target')[0] el.parent # Parent element el.children # Child elements el.next_sibling # Next sibling el.prev_sibling # Previous sibling
For multi-page crawling with link following:
from scrapling.spiders import Spider, Request, Response class QuotesSpider(Spider): name = "quotes" start_urls = ["https://quotes.toscrape.com/"] concurrent_requests = 10 download_delay = 1 async def parse(self, response: Response): for quote in response.css('.quote'): yield { "text": quote.css('.text::text').get(), "author": quote.css('.author::text').get(), "tags": quote.css('.tag::text').getall(), } next_page = response.css('.next a::attr(href)').get() if next_page: yield response.follow(next_page) result = QuotesSpider().start() print(f"Scraped {len(result.items)} quotes") result.items.to_json("quotes.json")
Route requests to different fetcher types:
from scrapling.fetchers import FetcherSession, AsyncStealthySession class SmartSpider(Spider): name = "smart" start_urls = ["https://example.com/"] def configure_sessions(self, manager): manager.add("fast", FetcherSession(impersonate="chrome")) manager.add("stealth", AsyncStealthySession(headless=True), lazy=True) async def parse(self, response: Response): for link in response.css('a::attr(href)').getall(): if "protected" in link: yield Request(link, sid="stealth") else: yield Request(link, sid="fast", callback=self.parse)
spider = QuotesSpider(crawldir="./crawl_checkpoint") spider.start() # Ctrl+C to pause, re-run to resume from checkpoint
scrapling install after pip install -- without it, DynamicFetcher and StealthyFetcher will failsolve_cloudflare=True adds 5-15 seconds to fetch time -- only enable when neededMIT
mkdir -p ~/.hermes/skills/research/scrapling && curl -o ~/.hermes/skills/research/scrapling/SKILL.md https://raw.githubusercontent.com/NousResearch/hermes-agent/main/optional-skills/research/scrapling/SKILL.md1,500+ AI skills, agents & workflows. Install in 30 seconds. Part of the Torly.ai family.
© 2026 Torly.ai. All rights reserved.