Hermes on Modal: Serverless Claude Agents with Near-Zero Idle Cost
Using Hermes's Modal terminal backend to run Claude agents serverlessly. Cost math vs a VPS, cold-start tradeoffs, and when each model actually wins.
Hermes supports six terminal backends: local, docker, ssh, daytona, singularity, and modal. Each is a different answer to the same question — where do the commands your agent runs actually execute? This post is about modal, which is the most interesting answer if your workload is bursty.
Modal is a serverless Python platform. You define functions in Python, Modal packages them, runs them in the cloud, and bills per second of compute used. No servers to manage, no idle costs. When Hermes is configured with the Modal backend, every terminal command the agent issues spins up a Modal function, runs in a fresh container, and shuts down when done.
Most agent workloads are spiky. You ask the agent to do something in the morning, it runs for two minutes, it sits idle until afternoon, you ask something else, it runs for five minutes. A VPS is paying for 24 hours to get 10 minutes of useful work. Modal inverts that — you pay for the 10 minutes.
For scheduled jobs (covered in scheduling-claude-agents-hermes-cron-daily-reports) the inversion is even starker. A daily 8am PR digest that runs for 90 seconds costs cents a month on Modal. The same workload on a VPS pays for the other roughly 1,438 minutes of the day as pure idle.
Conceptually, Hermes ships a Modal "runner" function that accepts a shell command, executes it inside a Modal container, and returns stdout/stderr. When the agent wants to run ls -la or git status, Hermes dispatches that command to Modal instead of executing it locally.
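A minimal sketch of what such a runner could look like, built on Modal's Python SDK. The app name and function name here are illustrative (mirroring the config below), not Hermes's actual implementation; the `execute` core is plain Python so you can see exactly what runs inside the container:

```python
import subprocess


def execute(cmd: str) -> dict:
    """Run one shell command, capturing output -- the body of the runner."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return {"stdout": proc.stdout, "stderr": proc.stderr, "code": proc.returncode}


try:
    import modal  # the wrapper only matters when Modal is installed and configured

    app = modal.App("hermes-runner")  # hypothetical app name, matching the config below

    @app.function(
        image=modal.Image.debian_slim(python_version="3.11"),
        cpu=2,
        memory=4096,   # MiB
        timeout=300,   # seconds
    )
    def run_command(cmd: str) -> dict:
        """Each call lands in a fresh (or warm) Modal container."""
        return execute(cmd)
except ImportError:
    pass  # the local sketch above still works without the modal package
```

With a deployed app, a dispatch is one remote call, e.g. `run_command.remote("git status")`, and billing stops when the function returns.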
A rough config shape (see docs for exact field names):
```yaml
terminal:
  backend: modal
  modal:
    app_name: "hermes-runner"
    image: "python:3.11-slim"
    cpu: 2
    memory_mb: 4096
    timeout_seconds: 300
    mounts:
      - local_path: "~/.hermes"
        remote_path: "/root/.hermes"
```
The image field picks the base container. If your tasks need specific binaries (Node, ripgrep, curl), either pick a fatter base image or define a custom Modal image with those installed. Modal caches built images, so the first call pays the build cost and subsequent calls reuse the cached layers.
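A custom image definition could look like the sketch below (package names and versions are assumptions; this is a build-time fragment, not something Hermes requires):

```python
import modal

# Built once and cached; subsequent calls reuse the layers instead of reinstalling.
image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("ripgrep", "curl", "nodejs", "npm")  # system binaries the agent needs
    .pip_install("requests")                          # Python deps for agent tooling
)
```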
Let us put real numbers on it. Modal charges roughly per vCPU-second and per GB-RAM-second of compute. For a 2 vCPU / 4 GB agent running a single 30-second command, you are paying cents at most. A VPS at $5/month is about $0.17/day regardless of usage.
Scenario A: bursty workload. Ten agent commands per day, 15 seconds each, on 2 vCPU / 4 GB. Daily Modal usage: 150 seconds of compute, which works out to cents per month. The VPS costs $5/month regardless. Modal wins by a wide margin.
Scenario B: steady workload. A long-running agent processing tasks continuously for 4 hours a day. Daily compute: 14,400 seconds. At typical Modal rates this exceeds the $5 VPS. VPS wins.
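The arithmetic behind these two scenarios can be sketched with placeholder rates. The rates below are illustrative assumptions, not Modal's published pricing; plug in the numbers from their pricing page:

```python
def monthly_modal_cost(seconds_per_day: float, vcpus: float, gb_ram: float,
                       cpu_rate: float, ram_rate: float, days: int = 30) -> float:
    """Dollars per month for per-second billed compute."""
    per_second = vcpus * cpu_rate + gb_ram * ram_rate
    return seconds_per_day * per_second * days


# Assumed illustrative rates: $ per vCPU-second and $ per GB-RAM-second.
CPU_RATE, RAM_RATE = 0.000030, 0.000010

bursty = monthly_modal_cost(150, 2, 4, CPU_RATE, RAM_RATE)     # Scenario A
steady = monthly_modal_cost(14_400, 2, 4, CPU_RATE, RAM_RATE)  # Scenario B
VPS = 5.00  # flat monthly price, usage-independent
# bursty lands well under the $5 VPS; steady lands well over it
```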
Scenario C: always-on messaging gateway. Hermes daemon needs to be alive to listen for Telegram messages. This is not a fit for Modal at all — the gateway daemon itself wants an always-on host. You run the daemon on a VPS and let the VPS dispatch terminal commands to Modal for heavy work. Hybrid.
The break-even rule of thumb: under 2-3 hours of active compute per day, Modal wins. Over that, a VPS wins.
Every Modal function invocation pays a cold-start penalty of a few seconds. For an agent that runs ls && cat file.txt && grep foo file.txt as three separate commands, that is three cold starts. Over a long session this adds up — both in wall-clock latency and in billed seconds.
Two mitigations. First, the Hermes Modal backend may keep containers warm between commands in a single session (see docs for exact behavior — this is backend-specific). Second, write skills that batch commands: one longer script instead of many short ones. Cold-start cost is paid per invocation, not per second, so fewer-larger commands are cheaper than many-shorter commands.
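Batching can be as simple as joining commands into one script before dispatch. The `batch` helper below is hypothetical, not a Hermes API:

```python
def batch(commands: list[str]) -> str:
    """Fold N shell commands into one script: one cold start instead of N."""
    return "\n".join(["set -e", *commands])  # set -e aborts at the first failure


# Three invocations, three cold starts:
#   run("ls -la"); run("cat file.txt"); run("grep foo file.txt")
# One invocation, one cold start:
script = batch(["ls -la", "cat file.txt", "grep foo file.txt"])
```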
Modal has GPUs. If your agent occasionally needs to transcribe audio with Whisper, run an embedding model, or invoke a small local LLM, you can request a T4 or A10G for just the duration of that call. Typical rates are a few cents per minute.
This matters because your main agent can be Claude Sonnet 4.6 running via API, and your supporting tools can be GPU-backed models running on Modal, all composed in a single Hermes session. You only pay for GPU when the tool actually needs it.
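A sketch of what a GPU-backed tool function could look like. The app name, image contents, and model choice are assumptions for illustration; the point is that the `gpu="T4"` request exists only for the duration of the call:

```python
import modal

app = modal.App("hermes-gpu-tools")  # hypothetical app name

# Hypothetical image with Whisper and its ffmpeg dependency preinstalled.
image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("ffmpeg")
    .pip_install("openai-whisper")
)


@app.function(image=image, gpu="T4", timeout=600)
def transcribe(audio: bytes) -> str:
    """Transcribe audio on a T4; the GPU is billed only while this runs."""
    import whisper  # available inside the container image, not locally

    path = "/tmp/input.wav"
    with open(path, "wb") as f:
        f.write(audio)
    return whisper.load_model("base").transcribe(path)["text"]
```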
The Modal backend is not an all-or-nothing choice; Hermes can route different tasks to different backends.
A mature setup uses Modal for the heavy parallel work, local for quick queries, and SSH for operating on production infra.
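That hybrid shape might look something like the config below. The `routes` field and its keys are invented for illustration; consult the Hermes docs for the real schema:

```yaml
terminal:
  backend: modal            # default: bursty, heavy, or untrusted work
  routes:                   # illustrative only -- not a documented field
    quick: local            # fast read-only commands stay on the daemon host
    prod: ssh               # production infra is operated over SSH
```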
So when should you reach for each alternative?
Modal as a Hermes backend is most valuable for the shape of work that does not fit a VPS: occasional bursts, GPU-requiring tools, untrusted code, parallel fan-out across many containers. For the boring always-on case, stick with a cheap VPS. For everything else, Modal earns its keep.