Appium - Momentic

Momentic is a managed testing platform for iOS and Android. Tests are YAML, executed on managed remote emulators and simulators. A multi-modal step cache stores locator metadata per step and auto-heals in place when the UI drifts. AI primitives cover action, assertion, visual diff, and typed extraction. AI providers route with cross-provider failover. A dashboard captures run videos, view hierarchies, heal events, and AI reasoning.

Appium is an open-source mobile automation framework that exposes the WebDriver / W3C protocol across iOS, Android, and other platforms via swappable drivers (UiAutomator2, XCUITest, Espresso, Flutter, Mac, Windows). Tests are written in TypeScript / Python / Java / Ruby / .NET against the Appium client of choice. It’s well-suited to teams that want OSS, multi-language flexibility, physical-hardware support, and deep customization at the driver level, plus the bandwidth to maintain a verbose, locator-heavy codebase.

Speed and caching

	Momentic	Appium
What’s cached	Multi-modal locator data per step (docs).	Nothing. Every locator strategy re-queries the device.
Heal on miss	Re-resolves and updates the entry in place. Heal event on the run.	Not supported. A miss is a test failure after the WebDriverWait timeout.
Waiting	Built-in: navigation, `load`, screenshots, DOM / view-hierarchy mutations, same-origin requests. 3s default, configurable.	`implicitlyWait` (per-driver) or explicit `WebDriverWait` + `ExpectedConditions` per query. Boilerplate per step.
Storage	Managed, git-aware.	N/A.
Cost of a UI change	Auto-heal absorbs renamed IDs, localized strings, reordered hierarchies.	One client-code edit per broken selector. XPath edits often require full re-derivation.

How the multi-modal cache works

A cached step stores more than one way to find the target: where it sits on screen, what it looks like, what text it contains, and the accessibility and structural attributes around it. Which of those signals matters for a given step is inferred from the natural-language description. “The red Cancel button below the Order Summary header” leans on visual and positional signals; “the Sign in button” leans on accessibility and text. When a step replays, the runner checks the stored signals against the live UI and runs the action without invoking the LLM when there’s a match.
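To make the idea of weighted signals concrete, here is a minimal sketch of how a cached step might be scored against the live UI. Every name in it (`CachedStep`, `matchScore`, `isCacheHit`, the weights, the 0.75 threshold) is hypothetical and chosen for illustration; it is not Momentic's data model or matching algorithm.

```typescript
// Hypothetical shapes for illustration only; not Momentic's actual API.
type Signal = "position" | "appearance" | "text" | "accessibility" | "structure";

interface CachedStep {
  description: string;
  // Stored value per signal, captured on the run that populated the cache.
  signals: Record<Signal, string>;
  // Assumed relative importance of each signal, inferred from the description.
  weights: Record<Signal, number>;
}

type LiveElement = Record<Signal, string>;

// Weighted share (0..1) of stored signals that agree with the live element.
function matchScore(step: CachedStep, live: LiveElement): number {
  let matched = 0;
  let total = 0;
  for (const key of Object.keys(step.weights) as Signal[]) {
    total += step.weights[key];
    if (step.signals[key] === live[key]) matched += step.weights[key];
  }
  return total === 0 ? 0 : matched / total;
}

// A hit means the action could replay without calling the LLM.
function isCacheHit(step: CachedStep, live: LiveElement, threshold = 0.75): boolean {
  return matchScore(step, live) >= threshold;
}

// "the Sign in button" leans on text and accessibility, so those weigh most.
const signIn: CachedStep = {
  description: "the Sign in button",
  signals: { position: "bottom", appearance: "blue-pill", text: "Sign in", accessibility: "sign_in", structure: "Form>Button" },
  weights: { position: 1, appearance: 1, text: 4, accessibility: 4, structure: 0 },
};

// The button moved and was restyled, but its text and accessibility id are unchanged.
const restyled: LiveElement = { position: "top", appearance: "green-pill", text: "Sign in", accessibility: "sign_in", structure: "Form>Button" };

// The text and the accessibility id both changed.
const renamed: LiveElement = { position: "bottom", appearance: "blue-pill", text: "Log in", accessibility: "log_in", structure: "Form>Button" };

console.log(isCacheHit(signIn, restyled)); // true: the heavily weighted signals still agree
console.log(isCacheHit(signIn, renamed)); // false: this step would need re-resolution
```

Under these assumed weights, a restyle alone does not invalidate the step, while a change to the signals the description depends on does.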

What happens on a UI change

The following sequence exercises the headline difference. Take a sign-in screen whose Email field has accessibilityIdentifier = "email_input", after a passing baseline run where the cache is warm.

Refactor: the app team renames email_input to email_field. The XPath position of the field also shifts because a container was added above it.

Appium replay:

driver.$("~email_input") issues a findElement request against the device. The WebDriver waits up to implicitlyWait (or the configured WebDriverWait timeout) for the element to appear.
The timeout elapses with no match. The client surfaces the W3C "no such element" error (NoSuchElementError in Selenium-style clients).
The test stops; the CI job is red. Someone edits the client code to use the new accessibility ID (or rewrites the XPath against the new hierarchy), opens a PR, gets it merged, and re-runs CI. If the XPath path was deep, the edit can cascade across multiple steps.

Momentic replay:

The cached locator for the Email step misses on the live device.
The locator agent re-resolves the step from its original natural-language description, "Email".
The new locator binds, the step runs, the test passes.
The cache entry is updated in place. A heal event is attached to the run for review. Subsequent runs hit the cache normally.
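The Momentic replay steps above can be sketched as a small function. This is an illustration under stated assumptions, not Momentic's implementation: `CacheEntry`, `HealEvent`, `replayStep`, and the `resolve` callback (standing in for the locator agent) are all hypothetical names.

```typescript
// Hypothetical types for illustration; not Momentic's actual API.
interface CacheEntry { description: string; locator: string }
interface HealEvent { description: string; from: string; to: string }
interface ReplayResult { locator: string; healed: boolean }

function replayStep(
  entry: CacheEntry,
  liveLocators: Set<string>, // locators that currently resolve on the device
  resolve: (description: string) => string | undefined, // stand-in for the locator agent
  healEvents: HealEvent[],
): ReplayResult {
  // Cache hit: replay without re-resolving.
  if (liveLocators.has(entry.locator)) {
    return { locator: entry.locator, healed: false };
  }
  // Cache miss: re-resolve from the natural-language description.
  const fresh = resolve(entry.description);
  if (fresh === undefined || !liveLocators.has(fresh)) {
    throw new Error(`Could not re-resolve step: ${entry.description}`);
  }
  // Record the heal for review, then update the cache entry in place.
  healEvents.push({ description: entry.description, from: entry.locator, to: fresh });
  entry.locator = fresh;
  return { locator: fresh, healed: true };
}

// The rename from the walkthrough: email_input became email_field.
const entry: CacheEntry = { description: "Email", locator: "~email_input" };
const live = new Set(["~email_field", "~password_input"]);
const agent = (description: string) => (description === "Email" ? "~email_field" : undefined);
const healEvents: HealEvent[] = [];

const first = replayStep(entry, live, agent, healEvents); // misses, heals
const second = replayStep(entry, live, agent, healEvents); // hits the updated cache
console.log(first.healed, second.healed, healEvents.length); // true false 1
```

The second call shows why later runs are cheap: once the entry is updated, the miss does not recur.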

Across a suite this is the difference between a renamed-ID incident and a no-op.

Technical details

Smart waiting

Momentic’s default smart wait is 3000ms and configurable per test. The runner waits on a combination of navigation, load, screenshots, DOM / view-hierarchy mutations, and same-origin requests until the UI is quiet or the timeout elapses.

Appium waiting, for contrast

implicitlyWait is per-driver. Too low -> flakes; too high -> padded runs.
Explicit waits (WebDriverWait + ExpectedConditions.visibilityOfElementLocated, presenceOf, elementToBeClickable) are per query. Common to layer 3-5 explicit waits per logical step.
No notion of network quiescence — teams instrument their own request interceptors or poll the UI.
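The "quiet or timeout" rule can be modelled as a pure function over activity timestamps. This is a sketch, not Momentic's algorithm: the function name `settleTime` and the `quietWindowMs` parameter are assumptions, since the source only says the runner waits until the UI is quiet or the timeout (3000ms by default) elapses.

```typescript
// Returns the time (ms since the wait began) at which the wait would end.
// activityMs: timestamps of observed activity (mutations, requests, navigations).
// quietWindowMs: assumed length of inactivity that counts as "quiet".
// timeoutMs: upper bound on the wait (3000 is the default named in the text).
function settleTime(activityMs: number[], quietWindowMs: number, timeoutMs: number): number {
  let lastActivity = 0;
  for (const t of [...activityMs].sort((a, b) => a - b)) {
    if (t >= timeoutMs) break; // activity after the timeout is never observed
    if (t - lastActivity >= quietWindowMs) break; // the UI was already quiet before this event
    lastActivity = t;
  }
  return Math.min(lastActivity + quietWindowMs, timeoutMs);
}

// A short burst of activity: the wait ends one quiet window after the last event.
console.log(settleTime([100, 250, 400], 500, 3000)); // 900

// A late event arrives after the UI already went quiet, so it is ignored.
console.log(settleTime([100, 2000], 500, 3000)); // 600

// Continuous activity every 200ms: the timeout is what ends the wait.
const busy = Array.from({ length: 25 }, (_, i) => (i + 1) * 200);
console.log(settleTime(busy, 500, 3000)); // 3000
```

The third case is the one a fixed implicit wait handles poorly: a wait tuned for the busy screen pads every quiet one.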

Locators and AI primitives

	Momentic	Appium
Locator model	Natural-language descriptions resolved by an AI agent against a11y tree + view hierarchy + screenshot. Cached, auto-healed.	`accessibility id`, `id`, `xpath`, `class name`, `-android uiautomator` (UiSelector), `-ios predicate string`, `-ios class chain`, `-image` (template match).
Visual cues	Color, icon, relative size, position part of the locator.	`-image` strategy is template matching only (no semantic context).
Agentic step	`act` accepts a multi-step goal; the agent plans and executes.	Not supported.
AI assert	`assert` is a first-class step type, fails by default.	Not built-in. Teams build with `getText` + manual checks.
Visual diff	`assertVisually`, agent-scored against a golden.	Not built-in. Third-party plugins (e.g. Applitools) bolt on.
AI provider	Managed; cross-provider failover handled by the platform.	None. Teams integrate LLMs themselves.

Technical details

Momentic mobile step types

Action: act, tap, doubleTap, longPress, type, swipe, scroll, back, dismissKeyboard, launchApp, terminateApp
Assert: assert, assertVisually, checkElement<...>
Extract: extract (typed via JSON schema)
Control flow: if/then, modules, parameter inputs

Appium locator trade-offs, for contrast

accessibility id is the most stable strategy but only exists when developers explicitly set contentDescription (Android) / accessibilityIdentifier (iOS). Production apps frequently miss them on dynamic content.
id (resource-id on Android) breaks under refactors and A/B testing.
xpath is the catch-all but is slow on large hierarchies and breaks on any structural change.
-image does template matching; works for static images, fails on themed UIs.
No locator strategy carries semantic intent. A failing step has no description to recover from.
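As a rough illustration of the strategies listed above, the sketch below maps each one to the selector string a WebdriverIO test would pass to `driver.$()`. The prefixes follow WebdriverIO's mobile selector conventions as I understand them and should be checked against the client version in use; the `Strategy` type, `toSelector`, and the example values are hypothetical.

```typescript
// Hypothetical helper type; the selector prefixes are WebdriverIO conventions.
type Strategy =
  | { kind: "accessibilityId"; value: string }
  | { kind: "androidUiAutomator"; value: string }
  | { kind: "iosPredicate"; value: string }
  | { kind: "iosClassChain"; value: string }
  | { kind: "xpath"; value: string };

// Build the string that would be passed to driver.$(...).
function toSelector(s: Strategy): string {
  switch (s.kind) {
    case "accessibilityId":
      return `~${s.value}`;
    case "androidUiAutomator":
      return `android=${s.value}`;
    case "iosPredicate":
      return `-ios predicate string:${s.value}`;
    case "iosClassChain":
      return `-ios class chain:${s.value}`;
    case "xpath":
      return s.value;
  }
}

// Candidate locators for one Email field, most stable first (example values).
const emailField: Strategy[] = [
  { kind: "accessibilityId", value: "email_input" },
  { kind: "androidUiAutomator", value: 'new UiSelector().resourceId("com.example:id/email")' },
  { kind: "iosPredicate", value: "type == 'XCUIElementTypeTextField' AND name == 'email_input'" },
  { kind: "xpath", value: '//*[@content-desc="email_input"]' },
];

const selectors = emailField.map(toSelector);
console.log(selectors);
```

Every entry in the list is an opaque string tied to the current build. None of them records that the target is "the Email field", which is the gap the last point above describes.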

Recovery, quarantine, and CI

	Momentic	Appium
Failure recovery	LLM agent proposes test edits in the dashboard.	Not supported.
Quarantine	First-class: tests run, results report, exit code unaffected unless `--only-quarantined`.	Not supported.
Sharding	`--shard-index <i>` / `--shard-count <n>`, 1-indexed. Deterministic alphabetical partition.	Owned by the host runner (pytest, Mocha, JUnit).
Reporters	`junit`, `allure`, `playwright-json`, `buildkite-json`.	Whatever the host runner emits (Allure, ExtentReports).
Device fleet	Remote Android 14/15 emulators and iOS 26 simulators with sub-1s provisioning, multi-region. Local AVDs / simulators supported.	Bring-your-own: local devices, Appium server cluster, or device cloud (BrowserStack, Sauce, LambdaTest, AWS Device Farm).
Dashboard	Run videos, traces, heal events, AI reasoning, screenshots, network.	Third-party (Allure, vendor dashboards).
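The quarantine row can be read as a rule for computing the process exit code. The sketch below encodes one interpretation: quarantined failures are reported but ignored, unless the run is restricted to quarantined tests, in which case their failures count. The `TestResult` shape and `exitCode` function are hypothetical, and the exact `--only-quarantined` semantics are an assumption drawn from the one-line description in the table.

```typescript
// Hypothetical result shape; not the Momentic CLI's actual output format.
interface TestResult { name: string; passed: boolean; quarantined: boolean }

// onlyQuarantined models a run started with --only-quarantined (assumed semantics).
function exitCode(results: TestResult[], onlyQuarantined = false): number {
  // Decide which results are allowed to influence the exit code.
  const counted = results.filter((r) => (onlyQuarantined ? r.quarantined : !r.quarantined));
  return counted.some((r) => !r.passed) ? 1 : 0;
}

const run: TestResult[] = [
  { name: "sign-in", passed: true, quarantined: false },
  { name: "promo-banner", passed: false, quarantined: true }, // known flaky, quarantined
];

console.log(exitCode(run)); // 0: the quarantined failure is reported but does not fail CI
console.log(exitCode(run, true)); // 1: in a quarantine-only run the failure counts
```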

Technical details

Sharding: --shard-index <i> / --shard-count <n>. Deterministic, contiguous partition of the suite.

Device provisioning: each test gets its own device session, so parallel runs don’t share device state. No per-test execution cap.

Appium grid, for contrast: Appium servers run locally or in a Selenium Grid. Device farms layer their own provisioning on top. Common pain points: stale UI hierarchies between sessions, dangling driver processes, capability drift across drivers and OS versions.
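A deterministic, alphabetical, contiguous partition with a 1-indexed shard number, as the sharding flags are described above, could look like the sketch below. The `shard` function name and the way remainder tests are distributed (earlier shards receive one extra) are assumptions; the source does not specify how uneven suites are split.

```typescript
// Returns the tests assigned to shard `shardIndex` (1-indexed) out of `shardCount`.
function shard(tests: string[], shardIndex: number, shardCount: number): string[] {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError("shardCount must be a positive integer");
  }
  if (!Number.isInteger(shardIndex) || shardIndex < 1 || shardIndex > shardCount) {
    throw new RangeError("shardIndex must be between 1 and shardCount");
  }
  // Sorting first makes the split independent of discovery order.
  const sorted = [...tests].sort();
  const base = Math.floor(sorted.length / shardCount);
  const extra = sorted.length % shardCount; // assumed: the first `extra` shards get one more test
  const i = shardIndex - 1;
  const start = i * base + Math.min(i, extra);
  const size = base + (i < extra ? 1 : 0);
  return sorted.slice(start, start + size);
}

const suite = ["checkout", "account", "sign-in", "onboarding", "search"];
console.log(shard(suite, 1, 2)); // ["account", "checkout", "onboarding"]
console.log(shard(suite, 2, 2)); // ["search", "sign-in"]
```

Because the split depends only on the sorted names and the two flags, every CI worker can compute its own slice without coordination.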

Authoring side-by-side

// Appium with WebdriverIO
import { remote } from "webdriverio";

const driver = await remote({
  hostname: "localhost",
  port: 4723,
  capabilities: {
    platformName: "Android",
    "appium:deviceName": "emulator-5554",
    "appium:automationName": "UiAutomator2",
    "appium:app": "/path/to/app.apk",
  },
});

const email = await driver.$("~email_input"); // accessibility id
await email.waitForDisplayed({ timeout: 10_000 });
await email.setValue("[email protected]");
const password = await driver.$("~password_input");
await password.setValue("secret");
const signIn = await driver.$('//*[@text="Sign in"]'); // XPath
await signIn.click();
await driver.waitUntil(
  async () => (await driver.$('//*[contains(@text, "Welcome")]')).isDisplayed(),
  { timeout: 10_000 },
);
// "Chart visible and not cut off" requires custom logic.
await driver.deleteSession();

Agentic simplified format:

fileType: momentic/test/v2
id: sign-in-and-verify
steps:
  - act: Sign in with [email protected] / secret
  - assert: The dashboard chart is visible and not cut off

Explicit simplified format (same flow, step-by-step):

fileType: momentic/test/v2
id: sign-in-and-verify
steps:
  - type:
      text: [email protected]
      into: Email
  - type:
      text: secret
      into: Password
  - tap: Sign in
  - assert: The dashboard chart is visible and not cut off

A more realistic test

The hello-world above doesn’t show the full simplified format surface. A representative onboarding regression with module reuse, parameter inputs, typed extraction, and a conditional looks like this:

onboarding.test.yaml

fileType: momentic/test/v2
id: onboarding-with-promo
steps:
  - launchApp
  - module:
      path: ../modules/sign-in.module.yaml
      inputs:
        EMAIL: env.QA_EMAIL
        PASSWORD: env.QA_PASSWORD
  - act: Skip the onboarding tour and land on Home
  - tap: Account
  - type:
      text: "{{ env.PROMO_CODE }}"
      into: Promo code field
  - tap: Apply
  - if:
      assert: A success banner saying the promo was applied is visible
      then:
        - extract:
            goal: The discounted monthly total shown on the plan card
            schema:
              type: object
              properties:
                amount:
                  type: number
              required: [amount]
  - if:
      assert: An invalid-promo error is visible
      then:
        - assert: The plan price is unchanged
  - assertVisually: The plan card is fully visible and not cut off

The matching module:

../modules/sign-in.module.yaml

fileType: momentic/module/v2
id: sign-in
name: Sign in
parameters:
  - name: EMAIL
  - name: PASSWORD
steps:
  - type:
      text: "{{ env.EMAIL }}"
      into: Email
  - type:
      text: "{{ env.PASSWORD }}"
      into: Password
  - tap: Sign in
  - assert: The Home tab is visible

There is no equivalent first-class surface in Appium. Reuse is by extracting host-language helpers; extraction is whatever the client codes; conditionals are if in the host language; visual assertions need a third-party plugin.
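To illustrate "extraction is whatever the client codes", here is roughly what the typed `amount` extraction from the YAML test becomes when hand-written in the host language. The `PlanTotal` interface, the `parsePlanTotal` helper, and the assumed card text format (a currency symbol followed by a number) are all hypothetical.

```typescript
// Mirrors the { amount: number } schema from the YAML example.
interface PlanTotal { amount: number }

// Pull the first currency amount out of the plan card's text.
// Assumes a leading $, € or £ and optional thousands separators.
function parsePlanTotal(cardText: string): PlanTotal {
  const match = cardText.match(/[$€£]\s*(\d[\d,]*(?:\.\d+)?)/);
  if (!match) {
    throw new Error(`No amount found in: ${cardText}`);
  }
  const amount = Number(match[1].replace(/,/g, ""));
  if (!Number.isFinite(amount)) {
    throw new Error(`Unparseable amount: ${match[1]}`);
  }
  return { amount };
}

console.log(parsePlanTotal("Pro plan: $12.99 / month")); // { amount: 12.99 }
console.log(parsePlanTotal("Team: $1,299.50 billed monthly")); // { amount: 1299.5 }
```

In an Appium test the input would come from something like the element's `getText()` result, and the regular expression would need updating whenever the card's copy or currency format changes.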

When to pick which

Appium is the right call if you have an existing Appium suite the team wants to keep, you have a hard requirement for an OSS WebDriver-protocol layer, you need multi-language clients, or you do deep customization at the driver level (custom plugins, native command extensions). Momentic is the right call if wall-clock run time matters at scale, selector maintenance is a real recurring cost, you want AI assertions that fail the test by default, you’d rather author in YAML than maintain a multi-language WebDriver codebase, and you expect healing, recovery, quarantine, sub-second emulator boots, and run videos out of the box.
