AnyCrawl-API
Perform high-performance web scraping, crawling, and Google search with multi-engine support and structured data extraction via AnyCrawl API.
AnyCrawl API integration for OpenClaw - Scrape, Crawl, and Search web content with high-performance multi-threaded crawling.
Set your API key as an environment variable:

```bash
export ANYCRAWL_API_KEY="your-api-key"
```

Make it permanent by adding it to ~/.bashrc or ~/.zshrc:

```bash
echo 'export ANYCRAWL_API_KEY="your-api-key"' >> ~/.bashrc
source ~/.bashrc
```

Get your API key at: https://anycrawl.dev

Alternatively, set the key via the OpenClaw config:

```bash
openclaw config.patch --set ANYCRAWL_API_KEY="your-api-key"
```
anycrawl_scrape: Scrape a single URL and convert it to LLM-ready structured data.
Parameters:
- url (string, required): URL to scrape
- engine (string, optional): Scraping engine - "cheerio" (default), "playwright", "puppeteer"
- formats (array, optional): Output formats - ["markdown"], ["html"], ["text"], ["json"], ["screenshot"]
- timeout (number, optional): Timeout in milliseconds (default: 30000)
- wait_for (number, optional): Delay before extraction in ms (browser engines only)
- wait_for_selector (string/object/array, optional): Wait for CSS selectors
- include_tags (array, optional): Include only these HTML tags (e.g., ["h1", "p", "article"])
- exclude_tags (array, optional): Exclude these HTML tags
- proxy (string, optional): Proxy URL (e.g., "http://proxy:port")
- json_options (object, optional): JSON extraction with schema/prompt
- extract_source (string, optional): "markdown" (default) or "html"

Examples:
```javascript
// Basic scrape with default cheerio
anycrawl_scrape({ url: "https://example.com" })

// Scrape SPA with Playwright
anycrawl_scrape({
  url: "https://spa-example.com",
  engine: "playwright",
  formats: ["markdown", "screenshot"]
})
```
```javascript
// Extract structured JSON
anycrawl_scrape({
  url: "https://product-page.com",
  engine: "cheerio",
  json_options: {
    schema: {
      type: "object",
      properties: {
        product_name: { type: "string" },
        price: { type: "number" },
        description: { type: "string" }
      },
      required: ["product_name", "price"]
    },
    user_prompt: "Extract product details from this page"
  }
})
```
anycrawl_search: Search Google and return structured results.
Parameters:
- query (string, required): Search query
- engine (string, optional): Search engine - "google" (default)
- limit (number, optional): Max results per page (default: 10)
- offset (number, optional): Number of results to skip (default: 0)
- pages (number, optional): Number of pages to retrieve (default: 1, max: 20)
- lang (string, optional): Language locale (e.g., "en", "zh", "vi")
- safe_search (number, optional): 0 (off), 1 (medium), 2 (high)
- scrape_options (object, optional): Scrape each result URL with these options

Examples:
```javascript
// Basic search
anycrawl_search({ query: "OpenAI ChatGPT" })

// Multi-page search in Vietnamese
anycrawl_search({ query: "hướng dẫn Node.js", pages: 3, lang: "vi" })
```
```javascript
// Search and auto-scrape results
anycrawl_search({
  query: "best AI tools 2026",
  limit: 5,
  scrape_options: { engine: "cheerio", formats: ["markdown"] }
})
```
anycrawl_crawl_start: Start crawling an entire website (async job).
Parameters:
- url (string, required): Seed URL to start crawling
- engine (string, optional): "cheerio" (default), "playwright", "puppeteer"
- strategy (string, optional): "all", "same-domain" (default), "same-hostname", "same-origin"
- max_depth (number, optional): Max depth from seed URL (default: 10)
- limit (number, optional): Max pages to crawl (default: 100)
- include_paths (array, optional): Path patterns to include (e.g., ["/blog/*"])
- exclude_paths (array, optional): Path patterns to exclude (e.g., ["/admin/*"])
- scrape_paths (array, optional): Only scrape URLs matching these patterns
- scrape_options (object, optional): Per-page scrape options

Examples:
```javascript
// Crawl entire website
anycrawl_crawl_start({
  url: "https://docs.example.com",
  engine: "cheerio",
  max_depth: 5,
  limit: 50
})

// Crawl only blog posts
anycrawl_crawl_start({
  url: "https://example.com",
  strategy: "same-domain",
  include_paths: ["/blog/"],
  exclude_paths: ["/blog/tags/"],
  scrape_options: { formats: ["markdown"] }
})
```
```javascript
// Crawl product pages only
anycrawl_crawl_start({
  url: "https://shop.example.com",
  strategy: "same-domain",
  scrape_paths: ["/products/*"],
  limit: 200
})
```
anycrawl_crawl_status: Check the status of a crawl job.
Parameters:
- job_id (string, required): Crawl job ID

Example:
```javascript
anycrawl_crawl_status({ job_id: "7a2e165d-8f81-4be6-9ef7-23222330a396" })
```
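Because crawling runs as an async job, callers typically poll the status until the job finishes. A minimal sketch, assuming the tools are available as async functions and that the status payload exposes a `status` field with values like "completed" and "failed" (field name and values are assumptions, not documented above):

```javascript
// Poll crawl status until the job finishes (sketch only).
// Assumes data.status reaches "completed" or "failed" — adjust to the
// actual response shape returned by anycrawl_crawl_status.
async function waitForCrawl(job_id, intervalMs = 5000) {
  while (true) {
    const res = await anycrawl_crawl_status({ job_id });
    if (res.data.status === "completed") return res.data;
    if (res.data.status === "failed") throw new Error("Crawl failed: " + job_id);
    await new Promise((resolve) => setTimeout(resolve, intervalMs)); // wait before polling again
  }
}
```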
anycrawl_crawl_results: Get crawl results (paginated).
Parameters:
- job_id (string, required): Crawl job ID
- skip (number, optional): Number of results to skip (default: 0)

Example:
```javascript
// Get first 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 0 })

// Get next 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 100 })
```
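To collect every page of a large crawl, advance `skip` until a short batch comes back. A minimal sketch, assuming results arrive in batches of up to 100 under a `data.results` field (the field name is an assumption):

```javascript
// Page through crawl results by advancing `skip` (sketch only).
async function getAllCrawlResults(job_id) {
  const all = [];
  for (let skip = 0; ; skip += 100) {
    const res = await anycrawl_crawl_results({ job_id, skip });
    const batch = res.data.results ?? []; // assumed field name
    all.push(...batch);
    if (batch.length < 100) break; // short batch => last page
  }
  return all;
}
```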
Cancel a running crawl job.
Parameters:
- job_id (string, required): Crawl job ID

anycrawl_search_and_scrape: Quick helper that searches Google and then scrapes the top results.
Parameters:
- query (string, required): Search query
- max_results (number, optional): Max results to scrape (default: 3)
- scrape_engine (string, optional): Engine for scraping (default: "cheerio")
- formats (array, optional): Output formats (default: ["markdown"])
- lang (string, optional): Search language

Example:
```javascript
anycrawl_search_and_scrape({ query: "latest AI news", max_results: 5, formats: ["markdown"] })
```
| Engine | Best For | Speed | JS Rendering |
|---|---|---|---|
| Cheerio | Static HTML, news, blogs | ⚡ Fastest | ❌ No |
| Playwright | SPAs, complex web apps | 🐢 Slower | ✅ Yes |
| Puppeteer | Chrome-specific, metrics | 🐢 Slower | ✅ Yes |
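When in doubt, a common pattern is to try the fast static engine first and fall back to a browser engine only if the page appears to be JS-rendered. A minimal sketch (the `data.markdown` field name and the empty-content check are assumptions, not confirmed above):

```javascript
// Try cheerio first, retry with playwright if no content came back (sketch only).
async function scrapeWithFallback(url) {
  const fast = await anycrawl_scrape({ url, engine: "cheerio", formats: ["markdown"] });
  if (fast.success && fast.data.markdown && fast.data.markdown.trim()) {
    return fast; // static HTML was enough
  }
  return anycrawl_scrape({ url, engine: "playwright", formats: ["markdown"] });
}
```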
All responses follow this structure:
{ "success": true, "data": { ... }, "message": "Optional message" }
Error response:
{ "success": false, "error": "Error type", "message": "Human-readable message" }
- 400 - Bad Request (validation errors)
- 401 - Unauthorized (invalid API key)
- 402 - Payment Required (insufficient credits)
- 404 - Not Found
- 429 - Rate limit exceeded
- 500 - Internal server error

No automatic installation available. Please visit the source repository for installation instructions.