Hermes on Modal: Serverless Claude Agents with Near-Zero Idle Cost
Using Hermes's Modal terminal backend to run Claude agents serverlessly. Cost math vs a VPS, cold-start tradeoffs, and when each model actually wins.
Hermes supports six terminal backends: local, docker, ssh, daytona, singularity, and modal. Each is a different answer to the same question — where do the commands your agent runs actually execute? This post is about modal, which is the most interesting answer if your workload is bursty.
Modal is a serverless Python platform. You define functions in Python, Modal packages them, runs them in the cloud, and bills per second of compute used. No servers to manage, no idle costs. When Hermes is configured with the Modal backend, every terminal command the agent issues spins up a Modal function, runs in a fresh container, and shuts down when done.
Most agent workloads are spiky. You ask the agent to do something in the morning, it runs for two minutes, it sits idle until afternoon, you ask something else, it runs for five minutes. A VPS is paying for 24 hours to get 10 minutes of useful work. Modal inverts that — you pay for the 10 minutes.
For scheduled jobs (covered in scheduling-claude-agents-hermes-cron-daily-reports) the inversion is even starker. A daily 8am PR digest that runs for 90 seconds costs cents a month on Modal. The same workload on a VPS pays for the other roughly 1,438 minutes of the day as pure idle.
Conceptually, Hermes ships a Modal "runner" function that accepts a shell command, executes it inside a Modal container, and returns stdout/stderr. When the agent wants to run ls -la or git status, Hermes dispatches that command to Modal instead of executing it locally.
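A minimal sketch of what such a runner could look like, built on Modal's Python SDK. The app name and function name here are illustrative (mirroring the config below), not Hermes's actual implementation; the `execute` core is plain Python so you can see exactly what runs inside the container:

```python
import subprocess


def execute(cmd: str) -> dict:
    """Run one shell command, capturing output -- the body of the runner."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return {"stdout": proc.stdout, "stderr": proc.stderr, "code": proc.returncode}


try:
    import modal  # the wrapper only matters when Modal is installed and configured

    app = modal.App("hermes-runner")  # hypothetical app name, matching the config below

    @app.function(
        image=modal.Image.debian_slim(python_version="3.11"),
        cpu=2,
        memory=4096,   # MiB
        timeout=300,   # seconds
    )
    def run_command(cmd: str) -> dict:
        """Each call lands in a fresh (or warm) Modal container."""
        return execute(cmd)
except ImportError:
    pass  # the local sketch above still works without the modal package
```

With a deployed app, a dispatch is one remote call, e.g. `run_command.remote("git status")`, and billing stops when the function returns.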
A rough config shape (see docs for exact field names):
```yaml
terminal:
  backend: modal
  modal:
    app_name: "hermes-runner"
    image: "python:3.11-slim"
    cpu: 2
    memory_mb: 4096
    timeout_seconds: 300
    mounts:
      - local_path: "~/.hermes"
        remote_path: "/root/.hermes"
```
The image field picks the base container. If your tasks need specific binaries (Node, ripgrep, curl), either pick a fatter base image or define a custom Modal image with those installed. Modal caches built images, so the first call pays the build cost and subsequent calls reuse the cached layers.
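A custom image definition could look like the sketch below (package names and versions are assumptions; this is a build-time fragment, not something Hermes requires):

```python
import modal

# Built once and cached; subsequent calls reuse the layers instead of reinstalling.
image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("ripgrep", "curl", "nodejs", "npm")  # system binaries the agent needs
    .pip_install("requests")                          # Python deps for agent tooling
)
```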
Let us put real numbers on it. Modal charges roughly per vCPU-second and per GB-RAM-second of compute. For a 2 vCPU / 4 GB agent running a single 30-second command, you are paying cents at most. A VPS at $5/month is about $0.17/day regardless of usage.
Scenario A: bursty workload. Ten agent commands per day, 15 seconds each, on 2 vCPU / 4 GB. Daily Modal usage: 150 seconds of compute, which works out to cents per month. The VPS costs $5/month regardless. Modal wins by a wide margin.
Scenario B: steady workload. A long-running agent processing tasks continuously for 4 hours a day. Daily compute: 14,400 seconds. At typical Modal rates this exceeds the $5 VPS. VPS wins.
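The arithmetic behind these two scenarios can be sketched with placeholder rates. The rates below are illustrative assumptions, not Modal's published pricing; plug in the numbers from their pricing page:

```python
def monthly_modal_cost(seconds_per_day: float, vcpus: float, gb_ram: float,
                       cpu_rate: float, ram_rate: float, days: int = 30) -> float:
    """Dollars per month for per-second billed compute."""
    per_second = vcpus * cpu_rate + gb_ram * ram_rate
    return seconds_per_day * per_second * days


# Assumed illustrative rates: $ per vCPU-second and $ per GB-RAM-second.
CPU_RATE, RAM_RATE = 0.000030, 0.000010

bursty = monthly_modal_cost(150, 2, 4, CPU_RATE, RAM_RATE)     # Scenario A
steady = monthly_modal_cost(14_400, 2, 4, CPU_RATE, RAM_RATE)  # Scenario B
VPS = 5.00  # flat monthly price, usage-independent
# bursty lands well under the $5 VPS; steady lands well over it
```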
Scenario C: always-on messaging gateway. Hermes daemon needs to be alive to listen for Telegram messages. This is not a fit for Modal at all — the gateway daemon itself wants an always-on host. You run the daemon on a VPS and let the VPS dispatch terminal commands to Modal for heavy work. Hybrid.
The break-even rule of thumb: under 2-3 hours of active compute per day, Modal wins. Over that, a VPS wins.
Every Modal function invocation pays a cold-start penalty of a few seconds. For an agent that runs ls && cat file.txt && grep foo file.txt as three separate commands, that is three cold starts. Over a long session this adds up — both in wall-clock latency and in billed seconds.
Two mitigations. First, the Hermes Modal backend may keep containers warm between commands in a single session (see docs for exact behavior — this is backend-specific). Second, write skills that batch commands: one longer script instead of many short ones. Cold-start cost is paid per invocation, not per second, so fewer-larger commands are cheaper than many-shorter commands.
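Batching can be as simple as joining commands into one script before dispatch. The `batch` helper below is hypothetical, not a Hermes API:

```python
def batch(commands: list[str]) -> str:
    """Fold N shell commands into one script: one cold start instead of N."""
    return "\n".join(["set -e", *commands])  # set -e aborts at the first failure


# Three invocations, three cold starts:
#   run("ls -la"); run("cat file.txt"); run("grep foo file.txt")
# One invocation, one cold start:
script = batch(["ls -la", "cat file.txt", "grep foo file.txt"])
```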
Modal has GPUs. If your agent occasionally needs to transcribe audio with Whisper, run an embedding model, or invoke a small local LLM, you can request a T4 or A10G for just the duration of that call. Typical rates are a few cents per minute.
This matters because your main agent can be Claude Sonnet 4.6 running via API, and your supporting tools can be GPU-backed models running on Modal, all composed in a single Hermes session. You only pay for GPU when the tool actually needs it.
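A sketch of what a GPU-backed tool function could look like. The app name, image contents, and model choice are assumptions for illustration; the point is that the `gpu="T4"` request exists only for the duration of the call:

```python
import modal

app = modal.App("hermes-gpu-tools")  # hypothetical app name

# Hypothetical image with Whisper and its ffmpeg dependency preinstalled.
image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("ffmpeg")
    .pip_install("openai-whisper")
)


@app.function(image=image, gpu="T4", timeout=600)
def transcribe(audio: bytes) -> str:
    """Transcribe audio on a T4; the GPU is billed only while this runs."""
    import whisper  # available inside the container image, not locally

    path = "/tmp/input.wav"
    with open(path, "wb") as f:
        f.write(audio)
    return whisper.load_model("base").transcribe(path)["text"]
```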
The Modal backend is not an all-or-nothing choice; Hermes can route different tasks to different backends.
A mature setup uses Modal for the heavy parallel work, local for quick queries, and SSH for operating on production infra.
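That hybrid shape might look something like the config below. The `routes` field and its keys are invented for illustration; consult the Hermes docs for the real schema:

```yaml
terminal:
  backend: modal            # default: bursty, heavy, or untrusted work
  routes:                   # illustrative only -- not a documented field
    quick: local            # fast read-only commands stay on the daemon host
    prod: ssh               # production infra is operated over SSH
```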
So when should you reach for each alternative?
Modal as a Hermes backend is most valuable for the shape of work that does not fit a VPS: occasional bursts, GPU-requiring tools, untrusted code, parallel fan-out across many containers. For the boring always-on case, stick with a cheap VPS. For everything else, Modal earns its keep.