page-agent
Embed alibaba/page-agent into your own web application: a pure-JavaScript in-page GUI agent that ships as a single <script> tag or npm package and lets end-users of your site drive the UI with natural language.
alibaba/page-agent (https://github.com/alibaba/page-agent, 17k+ stars, MIT) is an in-page GUI agent written in TypeScript. It lives inside a webpage, reads the DOM as text (no screenshots, no multi-modal LLM), and executes natural-language instructions like "click the login button, then fill username as John" against the current page. Pure client-side — the host site just includes a script and passes an OpenAI-compatible LLM endpoint.
Load this skill when a user wants to:
/v1/chat/completions

Path 1: demo CDN. The fastest way to see it work. It uses alibaba's free testing LLM proxy, for evaluation only and subject to their terms.
Add to any HTML page (or load it on any page with the bookmarklet form below):
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js" crossorigin="anonymous"></script>
A panel appears. Type an instruction. Done.
Bookmarklet form (drop into bookmarks bar, click on any page):
javascript:(function(){var s=document.createElement('script');s.src='https://cdn.jsdelivr.net/npm/page-agent@1.8.0/dist/iife/page-agent.demo.js';document.head.appendChild(s);})();
Path 2: inside an existing web project (React / Vue / Svelte / plain):
npm install page-agent
Wire it up with your own LLM endpoint — never ship the demo CDN to real users:
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: process.env.LLM_API_KEY, // never hardcode; resolved at build time, so it still ships in the bundle
  language: 'en-US',
})

// Show the panel for end users:
agent.panel.show()

// Or drive it programmatically (top-level await needs an ES module context):
await agent.execute('Click submit button, then fill username as John')
Provider examples (any OpenAI-compatible endpoint works):
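| Provider | baseURL | apiKey |
|---|---|---|
| Qwen / DashScope | https://dashscope.aliyuncs.com/compatible-mode/v1 | DashScope API key |
| OpenAI | https://api.openai.com/v1 | OpenAI API key |
| Ollama (local) | http://localhost:11434/v1 | Any placeholder, e.g. NA (Ollama ignores it) |
| OpenRouter | https://openrouter.ai/api/v1 | OpenRouter API key |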
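Any OpenAI-compatible endpoint works, so switching providers changes only model, baseURL, and apiKey. The sketch below keeps the base URLs from the table in one place. ProviderName, LLMConnection, and connectionFor are hypothetical names and are not page-agent exports. The returned object is intended to be spread into new PageAgent({...}).

```typescript
// Hypothetical preset helper; these names are illustrative, not page-agent exports.
type ProviderName = "dashscope" | "openai" | "ollama" | "openrouter";

interface LLMConnection {
  model: string;
  baseURL: string;
  apiKey: string;
}

// OpenAI-compatible base URLs from the provider table.
const BASE_URLS: Record<ProviderName, string> = {
  dashscope: "https://dashscope.aliyuncs.com/compatible-mode/v1",
  openai: "https://api.openai.com/v1",
  ollama: "http://localhost:11434/v1",
  openrouter: "https://openrouter.ai/api/v1",
};

function connectionFor(provider: ProviderName, model: string, apiKey?: string): LLMConnection {
  // Ollama ignores the key, so fall back to a placeholder; hosted providers need a real one.
  const key = apiKey ?? (provider === "ollama" ? "NA" : "");
  if (key === "") {
    throw new Error(`${provider} requires an apiKey`);
  }
  return { model, baseURL: BASE_URLS[provider], apiKey: key };
}
```

For example: new PageAgent({ ...connectionFor('ollama', 'qwen3:14b'), language: 'en-US' }). Throwing on a missing key surfaces the misconfiguration at startup. Otherwise it would only show up later as a failed LLM request.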
Key config fields (passed to new PageAgent({...})):
- model, baseURL, apiKey: LLM connection
- language: UI language (en-US, zh-CN, etc.)

Security. Don't put your apiKey in client-side code for a real deployment. Proxy LLM calls through your backend and point baseURL at your proxy. The demo CDN needs no key of your own only because it points at the proxy alibaba runs for evaluation.
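Below is a minimal sketch of such a proxy in TypeScript, using Node's built-in http module and global fetch (Node 18+). It assumes that page-agent's LLM client POSTs to ${baseURL}/chat/completions, as OpenAI-compatible clients conventionally do. Under that assumption the browser would use baseURL '/v1' (or the proxy's absolute URL) with a dummy apiKey. UPSTREAM_BASE_URL, UPSTREAM_API_KEY, upstreamUrl, and startProxy are hypothetical names.

```typescript
import { createServer, type Server } from "node:http";

// Hypothetical env var names: the real key lives only in the server's environment.
const UPSTREAM_BASE_URL = process.env.UPSTREAM_BASE_URL ?? "https://api.openai.com/v1";
const UPSTREAM_API_KEY = process.env.UPSTREAM_API_KEY ?? "";

// Map the browser-facing path to the upstream URL; only chat completions are forwarded.
function upstreamUrl(path: string, base: string): string | null {
  if (path !== "/v1/chat/completions") return null;
  return base.replace(/\/+$/, "") + "/chat/completions";
}

function startProxy(port: number): Server {
  const server = createServer(async (req, res) => {
    const target = upstreamUrl(req.url ?? "", UPSTREAM_BASE_URL);
    if (req.method !== "POST" || target === null) {
      res.writeHead(404).end();
      return;
    }

    // Read the JSON body the browser sent; its Authorization header is not forwarded.
    const chunks: Buffer[] = [];
    for await (const chunk of req) chunks.push(chunk as Buffer);

    try {
      const upstream = await fetch(target, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${UPSTREAM_API_KEY}`, // the real key is added only here
        },
        body: Buffer.concat(chunks).toString("utf8"),
      });
      // Buffer the whole reply before writing headers, so a failure can still become a 502.
      const body = Buffer.from(await upstream.arrayBuffer());
      res.writeHead(upstream.status, {
        "Content-Type": upstream.headers.get("content-type") ?? "application/json",
      });
      res.end(body);
    } catch {
      res.writeHead(502).end(); // upstream unreachable or failed mid-response
    }
  });
  server.listen(port);
  return server;
}
```

Call startProxy(8787) and serve it on the same origin as your app. Otherwise, add CORS handling: a cross-origin request carrying an Authorization header triggers an OPTIONS preflight, and this sketch rejects that preflight. A production version should also authenticate your own users and rate-limit them; without that, anyone can spend your key through the proxy. Streamed replies are buffered here and arrive in one piece.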
Path 3: local development. Use this when the user wants to modify page-agent itself, test it against arbitrary sites via a local IIFE bundle, or develop the browser extension.
git clone https://github.com/alibaba/page-agent.git
cd page-agent
npm ci   # exact lockfile install (or `npm i` to allow updates)
Create .env in the repo root with an LLM endpoint. Example:
LLM_MODEL_NAME=gpt-4o-mini
LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.openai.com/v1
Ollama flavor:
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=NA
LLM_MODEL_NAME=qwen3:14b
Common commands:
npm start            # docs/website dev server
npm run build        # build every package
npm run dev:demo     # serve IIFE bundle at http://localhost:5174/page-agent.demo.js
npm run dev:ext      # develop the browser extension (WXT + React)
npm run build:ext    # build the extension
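Before reaching for a bookmarklet, you can confirm the dev:demo server is up. The curl -I check can be scripted as shown here. This is a sketch that assumes Node 18+ for global fetch. The URL comes from the dev:demo comment above, and looksLikeScript and checkDemoBundle are hypothetical helpers. The check accepts text/javascript as well as application/javascript, because dev servers differ in which one they send.

```typescript
// Hypothetical scripted version of: curl -I http://localhost:5174/page-agent.demo.js
const DEMO_URL = "http://localhost:5174/page-agent.demo.js";

// True for a 200 response whose media type is a JavaScript type (charset etc. ignored).
function looksLikeScript(status: number, contentType: string | null): boolean {
  const type = (contentType ?? "").split(";")[0].trim().toLowerCase();
  return status === 200 && (type === "application/javascript" || type === "text/javascript");
}

async function checkDemoBundle(url: string = DEMO_URL): Promise<boolean> {
  try {
    const res = await fetch(url, { method: "HEAD" });
    return looksLikeScript(res.status, res.headers.get("content-type"));
  } catch {
    return false; // connection refused usually means dev:demo is not running
  }
}
```

Usage: checkDemoBundle().then((ok) => console.log(ok ? 'bundle is served' : 'run npm run dev:demo first')).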
Test on any website using the local IIFE bundle. Add this bookmarklet:
javascript:(function(){var s=document.createElement('script');s.src=`http://localhost:5174/page-agent.demo.js?t=${Math.random()}`;s.onload=()=>console.log('PageAgent ready!');document.head.appendChild(s);})();
Then run npm run dev:demo, click the bookmarklet on any page, and the local build is injected. The bundle rebuilds automatically on save, and the random ?t= query string makes each click fetch a fresh copy.
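Both bookmarklets on this page use the same trick: they inject a <script> element whose src points at the IIFE bundle. A small generator makes it easy to aim that loader at another build, such as a different pinned version. This is a sketch: makeLoaderBookmarklet is a hypothetical name, and its cacheBust option reproduces the ?t=${Math.random()} suffix used by the local-dev bookmarklet.

```typescript
// Hypothetical generator for loader bookmarklets like the two on this page.
function makeLoaderBookmarklet(scriptUrl: string, cacheBust = false): string {
  // With cacheBust, append ?t= (or &t=) plus a random number evaluated at click time.
  const sep = scriptUrl.includes("?") ? "&" : "?";
  // JSON.stringify yields a safely escaped JS string literal for the URL.
  const srcExpr = cacheBust
    ? `${JSON.stringify(scriptUrl + sep + "t=")}+Math.random()`
    : JSON.stringify(scriptUrl);
  const body =
    "var s=document.createElement('script');" +
    `s.src=${srcExpr};` +
    "s.onload=function(){console.log('PageAgent ready!')};" +
    "document.head.appendChild(s);";
  return `javascript:(function(){${body}})();`;
}
```

Paste the returned string into a bookmark's URL field. Browsers percent-decode javascript: URLs before running them, so keep the script URL free of % escapes.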
Warning: your .env LLM_API_KEY is inlined into the IIFE bundle during dev builds. Don't share the bundle. Don't commit it. Don't paste the URL into Slack. (Verified: grepping the public dev bundle returns the literal values from .env.)
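That warning suggests a cheap check before sharing any build: search the bundle for each value in .env. Below is a minimal sketch that assumes a plain KEY=VALUE .env file with no multi-line values or variable expansion. parseDotenv and findLeakedValues are hypothetical helpers, not part of the repo's tooling.

```typescript
// Parse a simple KEY=VALUE .env file (comments and blank lines skipped).
function parseDotenv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, "$2");
    vars.set(key, value);
  }
  return vars;
}

// Names of variables whose literal values appear in the bundle text.
function findLeakedValues(bundle: string, dotenv: string, minLength = 8): string[] {
  const leaked: string[] = [];
  for (const [key, value] of parseDotenv(dotenv)) {
    // Very short values (e.g. "NA") match too easily to mean anything.
    if (value.length >= minLength && bundle.includes(value)) leaked.push(key);
  }
  return leaked;
}
```

Feed it the contents of .env and of the built IIFE file (read with node:fs). Any LLM_API_KEY hit means the bundle must not leave your machine.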
Monorepo with npm workspaces. Key packages:
| Package | Path | Purpose |
|---|---|---|
| | Main entry with UI panel |
| | Core agent logic, no UI |
| | MCP server (beta) |
| — | | LLM client |
| — | | DOM ops + visual feedback |
| — | | Panel + i18n |
| — | | Chrome/Firefox extension |
| — | | Docs + landing site |
After Path 1 or Path 2: … baseURL, or a bad API key) … baseURL

After Path 3:
- npm run dev:demo prints Accepting connections at http://localhost:5174
- curl -I http://localhost:5174/page-agent.demo.js returns HTTP/1.1 200 OK with Content-Type: application/javascript

Pitfalls:
- An apiKey passed to new PageAgent({apiKey: ...}) ships in your JS bundle. Always proxy through your own backend for real deployments.
- Restart the dev server after editing .env in Path 3; Vite only reads env at startup.
- Node.js ^22.13.0 || >=24 is required. Node 20 will fail npm ci with engine errors.

License: MIT
mkdir -p ~/.hermes/skills/web-development/page-agent && curl -o ~/.hermes/skills/web-development/page-agent/SKILL.md https://raw.githubusercontent.com/NousResearch/hermes-agent/main/optional-skills/web-development/page-agent/SKILL.md
© 2026 Torly.ai. All rights reserved.