

Momentic is a managed testing platform for the web. Tests are written in YAML and executed on a managed runner. A multi-modal step cache stores locator metadata per step and auto-heals in place when the UI drifts. AI primitives cover action, assertion, visual diff, and typed extraction. AI provider calls route with cross-provider failover behind a single managed surface, and a dashboard captures run videos, traces, network activity, heal events, and AI reasoning.

Stagehand is an open-source TypeScript library from Browserbase. It adds four AI primitives (act, observe, extract, agent) on top of Playwright. With env: "BROWSERBASE" it gains Browserbase Cache (server-side, on by default) and the Browserbase Model Gateway (one Browserbase key routes to OpenAI, Anthropic, and Google). It’s well-suited to teams that want programmatic TypeScript control with a thin AI layer over Playwright.

Speed and caching

  • What’s cached: Momentic stores multi-modal locator data per step; Stagehand stores the resolved action per act call.
  • DOM-drift resilience: in Momentic, unrelated DOM changes don’t invalidate the entry; in Stagehand, any structural shift in the a11y tree (banner mount, modal open, streaming content) flips the hash, and the docs recommend waitForLoadState("networkidle") before every action.
  • Heal on miss: Momentic re-resolves and updates the entry in place, recording a heal event on the run; Stagehand re-resolves and writes a new entry under the new key.
  • Storage: Momentic’s cache is managed and git-aware; Stagehand offers Browserbase Cache (server-side, requires env: "BROWSERBASE") or Local Cache (JSON in the repo).
  • Smart waiting: Momentic builds in waits for navigation, load, screenshots, DOM mutations, and same-origin requests (3s default, configurable); Stagehand relies on Playwright actionability plus manual waitForLoadState / waitForResponse.
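The waitForLoadState("networkidle") advice can be wrapped in a small helper so it isn't forgotten before any single action. This is an illustrative sketch: actStable and the ActablePage interface are hypothetical names, not part of Stagehand's API.

```typescript
// Minimal stand-in for the relevant slice of Stagehand's Playwright page.
interface ActablePage {
  waitForLoadState(state: "networkidle"): Promise<void>;
  act(instruction: string): Promise<void>;
}

// Apply the docs' recommendation before every act, so the a11y tree is as
// stable as possible when the cache key is computed.
async function actStable(page: ActablePage, instruction: string): Promise<void> {
  await page.waitForLoadState("networkidle");
  await page.act(instruction);
}
```

The trade-off is wall-clock time: every action now pays for the network to settle, even when the target element was already stable.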

How the multi-modal cache works

A cached step stores more than one way to find the target: where it sits on screen, what it looks like, what text it contains, and the structural and accessibility attributes around it. Which of those signals matters for a given step is inferred from the natural-language description. “The red Cancel button below the Order Summary header” leans on visual and positional signals; “the Submit button in the form” leans on structure and role. When a step replays, the runner checks the stored signals against the live page and runs the action without invoking the LLM when there’s a match. On a miss, the locator agent (auto-heal) re-resolves the original description against the live page, updates the cache entry in place, and the run continues. A heal event is recorded against the run.

Stagehand’s Browserbase Cache keys on the page’s accessibility tree, so any background change (a transient banner, a streaming widget, an A/B variant) flips the key even when the target itself hasn’t changed. The Stagehand caching docs recommend page.waitForLoadState("networkidle") before every action to keep the tree stable enough to hit. On miss, Stagehand re-runs the LLM and writes a new entry under the new key; the old entry isn’t reused. Local Cache writes a JSON file under cacheDir; the team owns expiry, ignore rules, and lock contention.
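The fragility of a whole-tree key can be made concrete. The sketch below models the key as a hash of the serialized accessibility tree; that is an assumption for illustration, not Stagehand's actual implementation, but it shows why an unrelated banner mount invalidates a cached entry whose target never moved.

```typescript
import { createHash } from "node:crypto";

// Illustration only: model the cache key as a hash of the serialized a11y tree.
function cacheKey(a11yTree: object): string {
  return createHash("sha256").update(JSON.stringify(a11yTree)).digest("hex");
}

const before = {
  role: "main",
  children: [{ role: "button", name: "Submit" }],
};
const after = {
  role: "main",
  children: [
    { role: "banner", name: "Cookie notice" }, // unrelated banner mounted
    { role: "button", name: "Submit" },        // target unchanged
  ],
};

// The Submit button is identical in both trees, but the serialized trees
// differ, so the key flips and the cached entry for "click Submit" misses.
```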

AI primitives and assertions

  • Primitives: Momentic ships 40+ step types (act, assert, extract, assertVisual, drag-and-drop, file upload, hover, <select>, native dialogs, scroll, etc.); Stagehand ships four (act, observe, extract, agent).
  • Asserts: in Momentic, assert is a first-class step type that fails by default; Stagehand has no native assert and drops to Playwright expect, or builds one from observe() / extract() plus custom code.
  • Visual regression: Momentic’s assertVisual is agent-scored against a golden; Stagehand drops to Playwright toHaveScreenshot() (pixel / hash diff).
  • Managed AI: Momentic handles cross-provider failover on the platform; Stagehand’s Browserbase Model Gateway routes one key to OpenAI / Anthropic / Google with retries and backoff, the customer picks the model per call, and there is no advertised cross-provider failover.
Momentic step types (see test format)
  • Action: act, click, type, hover, scroll, navigate, dragAndDrop, fileUpload, select, dialog, refresh
  • Assert: assert, assertVisual, checkPageContains, checkElement<...>
  • Extract: extract (typed via JSON schema)
  • Control flow: if/then/else, modules, parameter inputs
Stagehand assertion pattern, for contrast
import { z } from "zod";
import { expect } from "@playwright/test";

const { ok } = await page.extract({
  instruction: "Is the dashboard chart visible and not cut off?",
  schema: z.object({ ok: z.boolean() }),
});
expect(ok).toBe(true);
There’s no failure mode beyond a boolean. The team owns thresholds, schema design, and reporter integration.
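One way a team might own that reporter integration is a small helper that carries the model's reason into the failure message, so a red CI run says more than "expected true". The assertAi helper and its AiCheck shape below are hypothetical, not part of Stagehand or Playwright; the schema would gain a reason field alongside ok.

```typescript
// Hypothetical shape for an extract() result that includes the model's reason.
interface AiCheck {
  ok: boolean;
  reason: string;
}

// Hypothetical assertion helper: surface the reason in the thrown error so it
// lands in whatever reporter the host runner uses.
function assertAi(check: AiCheck, label: string): void {
  if (!check.ok) {
    throw new Error(`AI assertion failed [${label}]: ${check.reason}`);
  }
}
```

Usage would follow the extract pattern above, e.g. assertAi(await page.extract({ ... }), "dashboard chart"), with the schema asking the model for both fields.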

CI, recovery, and observability

  • Runner: Momentic ships a built-in CLI (momentic run); Stagehand is bring-your-own (Vitest, Mocha, Playwright Test).
  • Reporters: Momentic emits junit, allure, playwright-json, and buildkite-json; Stagehand emits whatever the host runner emits.
  • Sharding: Momentic supports --shard-index <i> / --shard-count <n> (1-indexed) with a deterministic alphabetical partition; in Stagehand, sharding is owned by the host runner.
  • Quarantine: first-class in Momentic (tests run, results report, exit code unaffected unless --only-quarantined); owned by the team in Stagehand.
  • Recovery: Momentic’s LLM agent proposes test edits in the dashboard; Stagehand has none.
  • Dashboard: Momentic captures run videos, traces, heal events, AI reasoning, screenshots, and network activity; Stagehand has Browserbase session replays.
Momentic’s quarantine flags in detail:
  • Default: quarantined tests run, results report, statuses do not affect exit code.
  • --skip-quarantined: quarantined tests are skipped entirely.
  • --only-quarantined: only quarantined tests run; statuses do affect exit code.
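The deterministic alphabetical partition behind --shard-index / --shard-count can be sketched as follows. This is an illustrative reimplementation, not Momentic's actual code, and it assumes a round-robin split over the sorted list; the real partition might instead chunk contiguous blocks. The property that matters is the same either way: sorting first makes the split stable across machines.

```typescript
// Deterministic sharding sketch: sort alphabetically, then take every
// shardCount-th test starting at shardIndex (1-indexed, as in the CLI flags).
function shard(tests: string[], shardIndex: number, shardCount: number): string[] {
  const sorted = [...tests].sort(); // stable order regardless of discovery order
  return sorted.filter((_, i) => i % shardCount === shardIndex - 1);
}
```

Because the partition depends only on test names, two CI machines given the same suite and different indices never overlap and never drop a test.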

Authoring side-by-side

Same flow, both tools:
// Stagehand
import { Stagehand } from "@browserbasehq/stagehand";
import { expect } from "@playwright/test";
import { z } from "zod";

const stagehand = new Stagehand({ env: "BROWSERBASE" });
await stagehand.init();
const { page } = stagehand;

await page.goto("https://app.example.com");
await page.act("Sign in with [email protected] / secret");
const { ok } = await page.extract({
  instruction: "Is the dashboard chart visible and not cut off?",
  schema: z.object({ ok: z.boolean() }),
});
expect(ok).toBe(true);
Agentic v2:
fileType: momentic/test/v2
id: sign-in-and-verify
url: https://app.example.com
steps:
  - act: Sign in with [email protected] / secret
  - assert: The dashboard chart is visible and not cut off
Explicit v2 (same flow, step-by-step):
fileType: momentic/test/v2
id: sign-in-and-verify
url: https://app.example.com
steps:
  - type:
      text: [email protected]
      into: Email
  - type:
      text: secret
      into: Password
  - click: Sign in
  - assert: The dashboard chart is visible and not cut off

A more realistic test

The hello-world above doesn’t exercise the v2 surface. A representative checkout regression with module reuse, parameter inputs, typed extraction, and a conditional looks like this:
checkout.test.yaml
fileType: momentic/test/v2
id: checkout-with-promo
url: https://shop.example.com
steps:
  - module:
      path: ../modules/sign-in.module.yaml
      inputs:
        email: "{{ env.QA_EMAIL }}"
        password: "{{ env.QA_PASSWORD }}"
  - act: Add the Tetris Eye Sweatshirt (size M) to the cart
  - navigate:
      url: https://shop.example.com/checkout
  - type:
      text: "{{ env.PROMO_CODE }}"
      into: Promo code field
  - click: Apply
  - if:
      condition:
        assert: A success banner saying the promo was applied is visible
      then:
        - extract:
            goal: The discounted subtotal in the order summary
            schema:
              type: object
              properties:
                amount:
                  type: number
              required: [amount]
      else:
        - assert: An invalid-promo error is visible
  - assertVisual:
      that: The order summary section is fully visible and not cut off
The matching module:
../modules/sign-in.module.yaml
fileType: momentic/module/v2
moduleId: sign-in
name: Sign in
steps:
  - type:
      text: "{{ inputs.email }}"
      into: Email
  - type:
      text: "{{ inputs.password }}"
      into: Password
  - click: Sign in
  - assert: The dashboard chart is visible and not cut off
MCP servers for coding agents

Both ship MCP servers for coding agents, but the integration shape differs. Momentic’s MCP server runs the agent as a preview-then-commit loop: the agent calls momentic_preview_step to verify a candidate against the live page, then momentic_run_step to commit. Authoring tools let the agent edit a slice of a saved test rather than rewriting the whole file (see the MCP server docs for the full tool surface). Stagehand’s MCP server exposes act, observe, extract, and agent against a live page: the agent runs the flow once from memory and either keeps the snippet or rewrites it, with no per-step preview / commit loop.
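The preview-then-commit loop can be sketched as below. Only the tool names (momentic_preview_step, momentic_run_step) come from the docs; the McpClient shape, argument structure, and return type are assumptions for illustration.

```typescript
// Minimal stand-in for an MCP client's tool-call surface (assumed shape).
interface McpClient {
  call(tool: string, args: Record<string, unknown>): Promise<{ ok: boolean }>;
}

// Preview a candidate step against the live page; only commit it to the
// saved test if the preview succeeds. Otherwise the agent revises the step
// and previews again, instead of rewriting the whole file from memory.
async function commitStep(
  mcp: McpClient,
  step: Record<string, unknown>,
): Promise<boolean> {
  const preview = await mcp.call("momentic_preview_step", { step });
  if (!preview.ok) return false; // revise and retry, don't commit
  const run = await mcp.call("momentic_run_step", { step });
  return run.ok;
}
```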

When to pick which

Stagehand is the right call if you already have a TypeScript Playwright suite you want to keep, you want programmatic control of every step, you’re already on Browserbase, and your AI use cases are narrow (one or two acts sprinkled into mostly-deterministic Playwright code). Momentic is the right call if wall-clock run time matters at scale, you need AI assertions and visual diffs as first-class primitives, selector maintenance is a real recurring cost, and you expect healing, recovery, quarantine, and a managed dashboard out of the box rather than as bring-your-own components.