arxiv
Search and retrieve academic papers from arXiv using their free REST API. No API key needed. Search by keyword, author, category, or ID. Combine with web_extract or the ocr-and-documents skill to read full paper content.
Search and retrieve academic papers from arXiv via their free REST API. No API key, no dependencies — just curl.
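| Action | Command |
|---|---|
| Search papers | `curl -s "https://export.arxiv.org/api/query?search_query=all:QUERY&max_results=5"` |
| Get specific paper | `curl -s "https://export.arxiv.org/api/query?id_list=2402.03300"` |
| Read abstract (web) | `web_extract(urls=["https://arxiv.org/abs/2402.03300"])` |
| Read full paper (PDF) | `web_extract(urls=["https://arxiv.org/pdf/2402.03300"])` |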
The API returns Atom XML. Parse with `grep`/`sed` or pipe through `python3` for clean output.
```bash
# Raw Atom XML
curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5"

# Parsed output, newest first
curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5&sortBy=submittedDate&sortOrder=descending" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom'}
root = ET.parse(sys.stdin).getroot()
for i, entry in enumerate(root.findall('a:entry', ns)):
    title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
    arxiv_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
    published = entry.find('a:published', ns).text[:10]
    authors = ', '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
    summary = entry.find('a:summary', ns).text.strip()[:200]
    cats = ', '.join(c.get('term') for c in entry.findall('a:category', ns))
    print(f'{i+1}. [{arxiv_id}] {title}')
    print(f'   Authors: {authors}')
    print(f'   Published: {published} | Categories: {cats}')
    print(f'   Abstract: {summary}...')
    print(f'   PDF: https://arxiv.org/pdf/{arxiv_id}')
    print()
"
```
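| Prefix | Searches | Example |
|---|---|---|
| `all:` | All fields | `all:transformer+attention` |
| `ti:` | Title | `ti:"chain+of+thought"` |
| `au:` | Author | `au:hinton` |
| `abs:` | Abstract | `abs:reinforcement+learning` |
| `cat:` | Category | `cat:cs.LG` |
| `co:` | Comment | `co:accepted` |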
```
# AND (default when using +)
search_query=all:transformer+attention

# OR
search_query=all:GPT+OR+all:BERT

# AND NOT
search_query=all:language+model+ANDNOT+all:vision

# Exact phrase
search_query=ti:"chain+of+thought"

# Combined
search_query=au:hinton+AND+cat:cs.LG
```
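When queries are assembled in code rather than typed by hand, the prefixes and boolean operators above can be combined programmatically. The following is a minimal sketch, not part of the skill: `field_term` and `build_search_url` are hypothetical helper names, and the sketch assumes every term is joined with the same operator. `urlencode` percent-encodes the colons and turns spaces into `+`, which the server decodes back to the forms shown above.

```python
from urllib.parse import urlencode

ARXIV_API = "https://export.arxiv.org/api/query"


def field_term(prefix, text):
    """Return one fielded term, e.g. ti:"chain of thought" (hypothetical helper)."""
    text = text.strip()
    # Multi-word values are quoted so they are matched as an exact phrase.
    return f'{prefix}:"{text}"' if " " in text else f"{prefix}:{text}"


def build_search_url(terms, operator="AND", max_results=5):
    """Join (prefix, text) pairs with one boolean operator and URL-encode the result."""
    if operator not in {"AND", "OR", "ANDNOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    query = f" {operator} ".join(field_term(p, t) for p, t in terms)
    params = {"search_query": query, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_search_url([("au", "hinton"), ("cat", "cs.LG")])
print(url)
```

Mixing operators in a single query (for example `A AND B ANDNOT C`) would need a slightly richer input structure than a flat list of pairs.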
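| Parameter | Options |
|---|---|
| `sortBy` | `relevance`, `lastUpdatedDate`, `submittedDate` |
| `sortOrder` | `ascending`, `descending` |
| `start` | Result offset (0-based) |
| `max_results` | Number of results (default 10, max 30000) |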
```bash
# Latest 10 papers in cs.AI
curl -s "https://export.arxiv.org/api/query?search_query=cat:cs.AI&sortBy=submittedDate&sortOrder=descending&max_results=10"
```
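Larger result sets can be walked page by page with the `start` and `max_results` parameters. The sketch below only builds the URLs; `page_urls` and the page size of 100 are illustrative assumptions rather than anything the skill prescribes, and the caller is expected to pause between requests.

```python
from urllib.parse import urlencode

ARXIV_API = "https://export.arxiv.org/api/query"


def page_urls(search_query, total, page_size=100,
              sort_by="submittedDate", sort_order="descending"):
    """Yield one query URL per page until `total` results are covered."""
    for start in range(0, total, page_size):
        params = {
            "search_query": search_query,
            "start": start,  # 0-based offset of the first result on this page
            "max_results": min(page_size, total - start),  # last page may be short
            "sortBy": sort_by,
            "sortOrder": sort_order,
        }
        yield f"{ARXIV_API}?{urlencode(params)}"


for url in page_urls("cat:cs.AI", total=250):
    print(url)
```

For 250 results this produces three URLs, with offsets 0, 100, and 200; the final page requests only the remaining 50.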
```bash
# By arXiv ID
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300"

# Multiple papers
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300,2401.12345,2403.00001"
```
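IDs often arrive as full URLs or with a version suffix (for example `http://arxiv.org/abs/1706.03762v7`), so it can help to normalize them before building an `id_list` query. This is a rough sketch under stated assumptions: `split_arxiv_id` and `id_list_url` are hypothetical helpers, and the regular expression covers only the common new-style (`2402.03300`) and old-style (`hep-th/0601001`) shapes.

```python
import re

# New-style IDs look like 2402.03300; old-style IDs look like hep-th/0601001.
# Either form may carry a version suffix such as v7.
_ID_RE = re.compile(
    r"(?P<base>\d{4}\.\d{4,5}|[a-z\-]+(?:\.[A-Z]{2})?/\d{7})(?:v(?P<version>\d+))?$"
)


def split_arxiv_id(value):
    """Return (base_id, version) from a bare ID or an abs/pdf URL; version may be None."""
    match = _ID_RE.search(value.strip())
    if match is None:
        raise ValueError(f"not an arXiv ID: {value!r}")
    version = match.group("version")
    return match.group("base"), int(version) if version else None


def id_list_url(ids):
    """Build an id_list query from the unversioned form of each ID."""
    bases = [split_arxiv_id(i)[0] for i in ids]
    return "https://export.arxiv.org/api/query?id_list=" + ",".join(bases)


print(split_arxiv_id("http://arxiv.org/abs/1706.03762v7"))
print(id_list_url(["2402.03300v2", "https://arxiv.org/abs/1706.03762"]))
```

Dropping the version is a design choice for this sketch; keep the suffix instead when a specific immutable version is required.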
After fetching metadata for a paper, generate a BibTeX entry:
{% raw %}
```bash
curl -s "https://export.arxiv.org/api/query?id_list=1706.03762" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom', 'arxiv': 'http://arxiv.org/schemas/atom'}
root = ET.parse(sys.stdin).getroot()
entry = root.find('a:entry', ns)
if entry is None:
    sys.exit('Paper not found')
title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
authors = ' and '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
year = entry.find('a:published', ns).text[:4]
raw_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
cat = entry.find('arxiv:primary_category', ns)
primary = cat.get('term') if cat is not None else 'cs.LG'
# Citation key: first author's last name, year, and the ID without dots
last_name = entry.find('a:author', ns).find('a:name', ns).text.split()[-1]
print(f'@article{{{last_name}{year}_{raw_id.replace(\".\", \"\")},')
print(f'  title = {{{title}}},')
print(f'  author = {{{authors}}},')
print(f'  year = {{{year}}},')
print(f'  eprint = {{{raw_id}}},')
print(f'  archivePrefix = {{arXiv}},')
print(f'  primaryClass = {{{primary}}},')
print(f'  url = {{https://arxiv.org/abs/{raw_id}}}')
print('}')
"
```
{% endraw %}
After finding a paper, read it:
```
# Abstract page (fast, metadata + abstract)
web_extract(urls=["https://arxiv.org/abs/2402.03300"])

# Full paper (PDF → markdown via Firecrawl)
web_extract(urls=["https://arxiv.org/pdf/2402.03300"])
```
For local PDF processing, see the `ocr-and-documents` skill.
| Category | Field |
|---|---|
| `cs.AI` | Artificial Intelligence |
| `cs.CL` | Computation and Language (NLP) |
| `cs.CV` | Computer Vision |
| `cs.LG` | Machine Learning |
| `cs.CR` | Cryptography and Security |
| `stat.ML` | Machine Learning (Statistics) |
| `math.OC` | Optimization and Control |
| `physics.comp-ph` | Computational Physics |
Full list: https://arxiv.org/category_taxonomy
The `scripts/search_arxiv.py` script handles XML parsing and provides clean output:
```bash
python scripts/search_arxiv.py "GRPO reinforcement learning"
python scripts/search_arxiv.py "transformer attention" --max 10 --sort date
python scripts/search_arxiv.py --author "Yann LeCun" --max 5
python scripts/search_arxiv.py --category cs.AI --sort date
python scripts/search_arxiv.py --id 2402.03300
python scripts/search_arxiv.py --id 2402.03300,2401.12345
```
No dependencies — uses only Python stdlib.
arXiv doesn't provide citation data or recommendations. Use the Semantic Scholar API for that — free, no key needed for basic use (1 req/sec), returns JSON.
```bash
# By arXiv ID
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300?fields=title,authors,citationCount,referenceCount,influentialCitationCount,year,abstract" | python3 -m json.tool

# By Semantic Scholar paper ID or DOI
curl -s "https://api.semanticscholar.org/graph/v1/paper/DOI:10.1234/example?fields=title,citationCount"

# Papers that cite this paper
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300/citations?fields=title,authors,year,citationCount&limit=10" | python3 -m json.tool

# Papers that this paper references
curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:2402.03300/references?fields=title,authors,year,citationCount&limit=10" | python3 -m json.tool

# Keyword search
curl -s "https://api.semanticscholar.org/graph/v1/paper/search?query=GRPO+reinforcement+learning&limit=5&fields=title,authors,year,citationCount,externalIds" | python3 -m json.tool

# Recommendations seeded from a paper
curl -s -X POST "https://api.semanticscholar.org/recommendations/v1/papers/" \
  -H "Content-Type: application/json" \
  -d '{"positivePaperIds": ["arXiv:2402.03300"], "negativePaperIds": []}' | python3 -m json.tool

# Author search
curl -s "https://api.semanticscholar.org/graph/v1/author/search?query=Yann+LeCun&fields=name,hIndex,citationCount,paperCount" | python3 -m json.tool
```
Useful values for the `fields` parameter: `title`, `authors`, `year`, `abstract`, `citationCount`, `referenceCount`, `influentialCitationCount`, `isOpenAccess`, `openAccessPdf`, `fieldsOfStudy`, `publicationVenue`, `externalIds` (contains arXiv ID, DOI, etc.)
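Requesting only the fields that are needed keeps responses small. The sketch below shows one way to build a paper-details URL from a field list and to pull a few values out of the decoded JSON. `paper_url` and `summarize` are hypothetical helper names, and the sample payload is made-up data shaped like a response, not real output.

```python
import json
from urllib.parse import quote

S2_GRAPH = "https://api.semanticscholar.org/graph/v1"


def paper_url(arxiv_id, fields):
    """Build a paper-details URL that requests only the listed fields."""
    return f"{S2_GRAPH}/paper/arXiv:{quote(arxiv_id, safe='/.')}?fields={','.join(fields)}"


def summarize(payload):
    """Pick a few commonly used values out of a decoded paper response."""
    authors = [a.get("name", "") for a in payload.get("authors") or []]
    return {
        "title": payload.get("title"),
        "year": payload.get("year"),
        "citations": payload.get("citationCount", 0),
        "authors": authors,
    }


# Made-up payload for illustration only; real responses come from the API.
sample = json.loads(
    '{"title": "Example Paper", "year": 2024, "citationCount": 3,'
    ' "authors": [{"authorId": "1", "name": "A. Author"}]}'
)
print(paper_url("2402.03300", ["title", "authors", "year", "citationCount"]))
print(summarize(sample))
```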
1. Discover: `python scripts/search_arxiv.py "your topic" --sort date --max 10`
2. Assess impact: `curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:ID?fields=citationCount,influentialCitationCount"`
3. Read the abstract: `web_extract(urls=["https://arxiv.org/abs/ID"])`
4. Read the full paper: `web_extract(urls=["https://arxiv.org/pdf/ID"])`
5. Find related work: `curl -s "https://api.semanticscholar.org/graph/v1/paper/arXiv:ID/references?fields=title,citationCount&limit=20"`
6. Look up authors: `curl -s "https://api.semanticscholar.org/graph/v1/author/search?query=NAME"`

| API | Rate | Auth |
|---|---|---|
| arXiv | ~1 req / 3 seconds | None needed |
| Semantic Scholar | 1 req / second | None (100/sec with API key) |
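A script that loops over many requests can stay within these limits by enforcing a minimum gap between calls. The class below is a small sketch of that idea; `MinIntervalLimiter` is a hypothetical name, and the injectable `clock` and `sleep` arguments exist only to make the behavior easy to test.

```python
import time


class MinIntervalLimiter:
    """Block so that consecutive calls are at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous call: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


arxiv_limiter = MinIntervalLimiter(3.0)  # roughly 1 request per 3 seconds
s2_limiter = MinIntervalLimiter(1.0)     # 1 request per second without a key
```

Calling `arxiv_limiter.wait()` immediately before each arXiv request is enough; the first call returns at once and later calls sleep only when they arrive too early.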
- Pipe JSON responses through `python3 -m json.tool` for readability
- arXiv IDs: old format (`hep-th/0601001`) vs new (`2402.03300`)
- PDF: `https://arxiv.org/pdf/{id}`; Abstract: `https://arxiv.org/abs/{id}`
- HTML (where available): `https://arxiv.org/html/{id}`
- For local PDF processing, see the `ocr-and-documents` skill
- `arxiv.org/abs/1706.03762` always resolves to the latest version
- `arxiv.org/abs/1706.03762v1` points to a specific immutable version
- The API `<id>` field returns the versioned URL (e.g., `http://arxiv.org/abs/1706.03762v7`)

Papers can be withdrawn after submission. When this happens:

- The `<summary>` field contains a withdrawal notice (look for "withdrawn" or "retracted")

License: MIT
```bash
mkdir -p ~/.hermes/skills/research/arxiv && curl -o ~/.hermes/skills/research/arxiv/SKILL.md https://raw.githubusercontent.com/NousResearch/hermes-agent/main/skills/research/arxiv/SKILL.md
```