Changelog

All notable changes to rn-dev-agent will be documented in this file.

Format follows Keep a Changelog.

Fixed (GH #119 — AutoRepairOutcome cascading-selector clarification)

  • AutoRepairOutcome.nextFailedSelector: new optional field, populated when auto-repair succeeded but the post-repair retry failed on a DIFFERENT selector. Lets MTTR analysis distinguish “patch didn’t work” from “cascading failure — patch worked, next selector broke.” Without this, the telemetry made every cascading failure look like a failed patch.
  • Absent when retry passed (happy path) OR when retry failed on the SAME selector as the patch (= patch didn’t actually fix it). Codex flagged the misclassification at conf 85 in the PR #115 review.
  • 3 new regression tests cover the three cases. Suite 1312 → 1315 passing.
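The three cases above can be sketched as a small classifier. This is a minimal illustration, not the project's code: only `nextFailedSelector` is named in the changelog, while `patchedSelector`, `retryPassed`, and `classifyOutcome` are hypothetical names introduced here.

```typescript
// Hypothetical outcome shape: only `nextFailedSelector` comes from the changelog.
interface AutoRepairOutcome {
  patchedSelector: string;      // assumed field: the selector the patch rewrote
  retryPassed: boolean;         // assumed field: result of the post-repair retry
  nextFailedSelector?: string;  // set only when the retry failed on a DIFFERENT selector
}

type RepairVerdict = "repaired" | "patch-failed" | "cascading-failure";

function classifyOutcome(outcome: AutoRepairOutcome): RepairVerdict {
  // Happy path: the retry passed, so the field is absent by definition.
  if (outcome.retryPassed) return "repaired";
  // Retry failed. A populated field means the patch held and a later selector broke.
  return outcome.nextFailedSelector !== undefined ? "cascading-failure" : "patch-failed";
}
```

With a classifier like this, MTTR analysis can count `cascading-failure` separately instead of folding it into failed patches.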

Added (GH #116 — wire cdp_run_action into /run-action slash command)

  • maestro_run now accepts params: Record<string, string> that get forwarded to maestro-runner as -e KEY=VALUE argv pairs. Keys must match [A-Z_][A-Z0-9_]* (Maestro env-style convention) — anything else is refused at the handler boundary so a hostile payload can’t become a shell-injectable flag. Values must be strings. Since the invocation uses execFile (not exec), values are passed as separate argv entries — shell metacharacters are inert by construction.
  • cdp_run_action forwards params through to both the first maestro_run call AND the post-repair retry, so a parameterised flow replays identically after auto-repair.
  • /rn-dev-agent:run-action slash command is rewritten to call cdp_run_action via MCP rather than shelling out to maestro-runner directly. User invocations of run-action wizard-create-task -e TITLE=... now benefit from auto-repair, structured RunRecords, and the GH #120 per-phase timing. The slash command still parses args locally (positional + -e + --platform + --dry-run + new --no-auto-repair) but delegates execution to the MCP tool.
  • --dry-run keeps the bash-only path since cdp_run_action always executes.
  • 6 new handler tests cover: malformed-key refusal (5 shell-injection shapes), non-string-value refusal, well-formed key acceptance, cdp_run_action’s params forwarding to the first maestro_run call, end-to-end params threading via a real temp project fixture, and params persistence into the post-repair retry path. Suite: 1312 → 1318 passing.
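The key check and argv construction described above might look roughly like this. It is a sketch under assumptions: `buildEnvArgv` is a hypothetical helper name, and the real handler's error shape is not shown in the changelog.

```typescript
// Key pattern quoted in the changelog entry (Maestro env-style names).
const ENV_KEY = /^[A-Z_][A-Z0-9_]*$/;

// Hypothetical helper: turns params into "-e KEY=VALUE" argv pairs.
function buildEnvArgv(params: Record<string, unknown>): string[] {
  const argv: string[] = [];
  for (const key of Object.keys(params)) {
    const value = params[key];
    if (!ENV_KEY.test(key)) {
      throw new Error("refused param key: " + JSON.stringify(key));
    }
    if (typeof value !== "string") {
      throw new Error("param " + key + " must be a string");
    }
    // Each pair is its own argv entry, so the value is never parsed by a shell
    // when the caller uses execFile rather than exec.
    argv.push("-e", key + "=" + value);
  }
  return argv;
}
```

A value such as `foo; rm -rf /` passes through unchanged as a single argv entry, while a key such as `--flag` or `title` is refused before anything is spawned.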

Step #4 of issue #116 (“Live smoke: replay wizard-create-task with -e TITLE=foo end-to-end on a booted simulator”) is left for a maintainer-driven verification — it requires a live simulator with the test app and is outside the scope of this code-only PR.

Hardened (GH #113 — saveAction precondition becomes a runtime soft-assertion)

  • saveAction now throws SaveActionPreconditionError when the on-disk YAML has been edited externally since the in-memory action was loaded. Previously the “caller already gated actionWasEditedExternally” contract was implicit, surfaced only in a comment block. A future caller (e.g. the planned #104 auto-repair-on-failure wiring) could silently clobber a real human edit if it missed the contract.
  • Bypassed on first write (file doesn’t exist yet) since there’s no prior state to protect.
  • Both current callers (cdp_repair_action, cdp_record_test_save_as_action) gate correctly, so the new guard fires only for misbehaving new callers.
  • Error message references GH #113, the offending YAML path, and points the developer at actionWasEditedExternally / saveActionWithCAS.
  • 3 new regression tests; suite: 1312 → 1315 passing.
  • One stat() per save on the happy path — negligible cost.

Fixed (GH #112 — sidecar-io Windows path bug)

  • sidecarPathFor now extracts the basename via split(/[\\/]/).pop() instead of split('/').pop(). The old form returned the entire backslash-containing path as a single segment on Windows, producing absurd deeply-nested directory trees through subsequent join(parent, 'state', base). PR #109’s atomic-writer trusted this output and ensureDir-ed it, making the pre-existing latent bug more impactful. Gemini flagged at conf 88 in the PR #109 multi-LLM review.
  • 4 new regression tests cover POSIX-style paths, .yml extension, Windows-style backslash input, and mixed-separator input. The fix works on both POSIX and Windows runtimes since the separator split is explicit rather than platform-native. Suite: 1312 → 1316 passing.
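The difference between the two split forms is easy to see in isolation. The helper names below are hypothetical; only the two `split(...).pop()` expressions come from the entry above.

```typescript
// Old form: only "/" is treated as a separator.
function basenameSlashOnly(p: string): string {
  return p.split("/").pop() ?? p;
}

// New form: either separator ends a segment, regardless of the host platform.
function basenameAnySeparator(p: string): string {
  return p.split(/[\\/]/).pop() ?? p;
}
```

For a Windows-style input such as `C:\proj\.rn-agent\actions\login.yaml`, the slash-only form returns the whole string as one segment, whereas the explicit character class returns `login.yaml`.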

Hardened (GH #111 — atomic-writer concurrent pairWrite races)

  • pairWrite now uses unique .tmp.<pid>.<time36>.<rand36> suffixes so two concurrent writers against the same action don’t share a tmp namespace. Without this, cleanupOrphans could unlinkSync writer A’s in-flight tmp file mid-rename, producing an opaque ENOENT for the user. Gemini flagged the failure mode at conf 82 in the PR #109 multi-LLM review.
  • cleanupOrphans is age-bounded: it scans the target directory for files matching the path prefix and only unlinks orphans older than ORPHAN_MAX_AGE_MS = 5 minutes. A concurrent writer’s fresh tmp file is preserved; a crashed process’s stale tmp file becomes eligible for sweep after 5 minutes.
  • New _readdir test seam for deterministic cleanup mocking.
  • 7 new regression tests cover unique-stamp generation, stale-orphan sweep, fresh-orphan preservation, prefix anchoring, missing-dir tolerance, the exported constant value, and round-trip orphan-free state across 5 repeated pairWrite calls. Three existing tests updated to match the new .tmp.<stamp> filename shape. Suite: 1312 → 1319 passing.
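A minimal sketch of the two ideas above, unique tmp suffixes and an age-bounded sweep, with the clock, pid, and directory listing passed in so it stays deterministic. `tmpSuffix`, `sweepableOrphans`, and `TmpEntry` are hypothetical names; only the suffix shape and `ORPHAN_MAX_AGE_MS` come from the changelog.

```typescript
const ORPHAN_MAX_AGE_MS = 5 * 60 * 1000;

// Builds ".tmp.<pid>.<time36>.<rand36>". `rand` is expected in [0, 1).
function tmpSuffix(pid: number, nowMs: number, rand: number): string {
  const rand36 = Math.floor(rand * Math.pow(36, 8)).toString(36);
  return ".tmp." + pid + "." + nowMs.toString(36) + "." + rand36;
}

interface TmpEntry {
  name: string;    // file name inside the target directory
  mtimeMs: number; // last-modified time in milliseconds
}

// Returns only tmp files for this target that are old enough to be orphans.
function sweepableOrphans(entries: TmpEntry[], targetBase: string, nowMs: number): string[] {
  const prefix = targetBase + ".tmp.";
  return entries
    .filter((e) => e.name.indexOf(prefix) === 0 && nowMs - e.mtimeMs > ORPHAN_MAX_AGE_MS)
    .map((e) => e.name);
}
```

Anchoring on the full `<target>.tmp.` prefix keeps the sweep away from other actions' tmp files, and the age bound keeps it away from a concurrent writer that is still mid-rename.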

Hardened (GH #110 — agent-device test seam fuse)

  • _setRunAgentDeviceForTest is now one-way fused. Once any production runAgentDevice call has dispatched in this process, attempting to install a new override throws with the message “blown fuse — a production runAgentDevice call (cliArgs[0]="...") already dispatched in this process. ... (GH #110 hardening)”. The fuse fires BEFORE any tier selection (Codex review conf 90), so a production call that throws downstream still seals the seam.
  • No reset escape hatch. A reset seam would be functionally equivalent to no fuse — any code that could call reset is the same code that could leak the override (Codex review conf 90). Tests that genuinely need both production and override paths should use Node 22’s node --test --test-isolation=process to get a fresh worker per file.
  • Throw, not no-op (Codex review conf 95). Silent no-op would let a forgotten afterEach(() => _setRunAgentDeviceForTest(null)) route a test through the real agent-device CLI, producing ENOENT errors that look nothing like a test-seam bug. The fuse error message includes the cliArgs[0] that blew it, so post-mortem debugging can identify which production call leaked first — that’s the test missing its cleanup.
  • 5 new subprocess-isolated regression tests cover: override is honored pre-fuse; setting null pre-fuse re-arms cleanly; production dispatch blows the fuse before tier completion; error message carries GH #110 + remediation hint; standard afterEach null cleanup remains legal. Suite: 1312 → 1317 passing.
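The one-way fuse can be sketched as a closure with no reset path. This is an illustration under assumptions: `createFusedSeam` and its method names are hypothetical, the error text is abbreviated, and treating a `null` cleanup as always legal is this sketch's reading of the entry rather than confirmed behavior.

```typescript
type RunAgentDevice = (cliArgs: string[]) => string;

function createFusedSeam(production: RunAgentDevice) {
  let override: RunAgentDevice | null = null;
  let blownBy: string | null = null; // cliArgs[0] of the first production dispatch

  return {
    setOverrideForTest(fn: RunAgentDevice | null): void {
      // Installing a real override after a production dispatch throws loudly.
      if (fn !== null && blownBy !== null) {
        throw new Error('blown fuse: production call (cliArgs[0]="' + blownBy + '") already dispatched');
      }
      override = fn;
    },
    run(cliArgs: string[]): string {
      if (override) return override(cliArgs);
      // Blow the fuse before any real work, so a production call that
      // throws downstream still seals the seam.
      if (blownBy === null) blownBy = cliArgs[0] ?? "";
      return production(cliArgs);
    },
  };
}
```

There is deliberately no reset method: any code able to call it could equally leak the override, which is the argument the entry makes against an escape hatch.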

Added (GH #106 — flow + skeleton bundling in experience export/import)

  • exportExperience() now bundles .rn-agent/actions/*.yaml flows and .rn-agent/skeleton.yaml alongside heuristics + failure stats — matching what the rn-agent-export / rn-agent-import command docs have advertised since D1204. Until now the underlying script handled only heuristics, so a teammate exporting + importing got the muscle memory metadata but not the actual reusable actions that ARE the L3 corpus.
  • New src/experience/flow-bundle.ts exposes pure anonymize/restore helpers: rewrites appId: between com.example.<slug> (export) and the local project’s bundleId (import), truncates author-prose comment lines longer than 200 chars while preserving M7 fields verbatim, and extracts ${VAR} placeholders so the importer can surface them.
  • Placeholder manifest comments (Codex review A, HIGH conf): on import, if a flow contains ${UPPER_CASE_VARS}, prepend a # placeholders: VAR1, VAR2 — supply via -e KEY=VALUE on replay line above the M7 header. Codex’s call: don’t suffix every placeholder flow with .needs-review.yaml (that punishes correctly-authored flows); don’t go silent (violates spirit of acceptance criterion); use a grep-able comment instead.
  • appId: rewrite is line-wise (Codex review B, HIGH conf), so legitimate multi-line top sections (a # shared across envs comment above appId:) round-trip cleanly. Hard-fails only when zero appId: lines exist.
  • Conflict semantics: an imported flow whose id already exists locally lands at <id>.imported.yaml so the user can diff and merge manually. Same pattern for skeleton (skeleton.imported.yaml).
  • Status forced to experimental on import — keeps imported flows from claiming active before a local replay proves they work.
  • Sidecars are not bundled. Per-developer runtime state (runHistory, repairHistory, stats) is exactly what shouldn’t travel; on import the local loadOrInitSidecar seeds a fresh sidecar on first replay.
  • --no-flows / --no-skeleton opt-outs on both ExportOptions and ImportOptions (default both true).
  • Defense in depth: import-side flow id is re-validated against ^[A-Za-z0-9_-]+$ so a hand-crafted bundle with a path-traversal id can’t escape .rn-agent/actions/. Malformed flows are skipped with a one-line stderr log.
  • 29 new tests — 18 pure-helper tests on flow-bundle.ts + 11 integration tests with real temp dirs covering export, import, opt-outs, conflict-rename, malformed-bundle defense in depth, and no-app.json fallback. Suite: 1312 → 1341 passing.
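Two of the pure checks described above, placeholder extraction and import-side id validation, could be sketched as follows. The function names are hypothetical and the placeholder pattern is an assumption about what counts as an upper-case variable; the id pattern is the one quoted in the entry.

```typescript
// Import-side id pattern quoted in the changelog.
const FLOW_ID = /^[A-Za-z0-9_-]+$/;

function isSafeFlowId(id: string): boolean {
  return FLOW_ID.test(id);
}

// Collects distinct ${UPPER_CASE} placeholder names in first-seen order.
// The exact variable grammar is an assumption for this sketch.
function extractPlaceholders(yamlText: string): string[] {
  const re = /\$\{([A-Z_][A-Z0-9_]*)\}/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(yamlText)) !== null) {
    if (names.indexOf(match[1]) === -1) names.push(match[1]);
  }
  return names;
}
```

An importer could join the returned names into the grep-able `# placeholders:` comment, and skip any bundle entry whose id fails `isSafeFlowId` before building a path from it.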

Added (GH #91 acceptance #3 closeout — per-project verification config)

  • verification.successShapes and verification.mutationMethods per-project overrides in .rn-agent/config.json for the mutation-absence detector. Closes the last open acceptance criterion on GH #91. Detector itself shipped in fed0dd0 (Apr 28).
  • New loadVerificationConfig(projectRoot) reads the config once per project root and caches the result. Defaults are preserved (no behavior change) on missing file, parse error, missing verification block, empty arrays, or all-invalid regex strings — apps that don’t opt in see zero change.
  • ReDoS-via-typo guard (Codex review conf 90): patterns longer than 200 chars are dropped before compilation, and matched-input length is capped at 256 chars in isSuccessShape. Bounds regex evaluation cost on the cdp_navigate / cdp_navigation_state / proof_step hot path so a developer typo can’t stall the MCP event loop.
  • Empty-array means defaults, not “disable detection” (Codex review conf 92). Silent loss of a safety net is the worse failure mode; explicit disable is reserved for a future verification.disable: true flag.
  • Observability: one stderr log line on first config load per project root ([verification] loaded config from .../.rn-agent/config.json (patterns: N, methods: M)). Makes “is my config picked up?” a one-line check, without needing SIGHUP/watcher reload machinery.
  • 18 new tests cover the loader, overrides, ReDoS guards, cache behavior, and the observability log. Suite: 1312 → 1330 tests, all passing.
  • device_press / cdp_interact wirings remain intentionally deferred as documented in the original fed0dd0 commit message: these tools don’t carry nav-state intent, and the success-shape signal is captured downstream by the next cdp_navigation_state call. Adding nav-state fetches per tap would bloat the hot path for noise this PR considers low-value.
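The pattern-length and input-length guards described above can be sketched like this. The two limits come from the entry; the function signatures, the fallback-to-defaults wiring, and the choice to truncate (rather than reject) long input are assumptions of this sketch.

```typescript
const MAX_PATTERN_LEN = 200; // patterns longer than this are dropped before compilation
const MAX_INPUT_LEN = 256;   // matched input is capped to bound evaluation cost

// Compiles user-supplied pattern sources; falls back to defaults when nothing usable remains.
function compilePatterns(sources: unknown[], defaults: RegExp[]): RegExp[] {
  const compiled: RegExp[] = [];
  for (const src of sources) {
    if (typeof src !== "string" || src.length === 0 || src.length > MAX_PATTERN_LEN) continue;
    try {
      compiled.push(new RegExp(src));
    } catch {
      // Invalid regex (for example an unbalanced paren): drop it silently here.
    }
  }
  // Empty or all-invalid input means "use defaults", never "disable detection".
  return compiled.length > 0 ? compiled : defaults;
}

function isSuccessShape(input: string, patterns: RegExp[]): boolean {
  const capped = input.slice(0, MAX_INPUT_LEN);
  return patterns.some((p) => p.test(capped));
}
```

Falling back to defaults on an empty result mirrors the entry's point that silently losing the safety net is the worse failure mode.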

Fixed (Phase 134.2-followup — device_deeplink url injection)

  • device_deeplink now POSIX-quotes the caller-supplied url before passing it through adb shell am start -d <url>. The Phase 134.2 fix validated packageName but missed url. Deepsec revalidation (run 20260512193352) re-flagged the deeplink as HIGH because the url arg still flowed unescaped into the Android remote shell, where argv is joined with spaces and re-interpreted as a raw command line. A URL like myapp://path;reboot would have executed reboot after the am start completed.
  • Two-layer defense:
    1. Validation at the handler boundary: reject any url that contains control characters or newlines (which would break out of the POSIX-quoted string itself), or exceeds 4096 chars.
    2. POSIX single-quote wrap: every shell metacharacter inside the URL (;, |, $, `, &) becomes inert. Same pattern as device-interact.ts:524 (buildAdbInputTextArgv).
  • Legitimate URLs with &, ?, =, # continue to work — the quote wrap makes those literal arguments to am start -d, not shell expansion targets.
  • 4 new unit tests in phase-134-2-adb-shell-arg-hardening.test.js:
    • newline-injected URL rejected
    • control-char-bearing URL rejected
    • oversized URL (>4096 chars) rejected
    • legitimate URL with query+fragment passes validation
  • Full unit suite: 1308 → 1312 passing, 0 failing.
  • Closes the last HIGH-severity finding from the original deepsec scan. Post-merge: CRITICAL = 0, HIGH = 0 (100% security-class findings closed).
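The two layers above, boundary validation followed by a POSIX single-quote wrap, might look roughly like this. `validateDeeplinkUrl` and `posixQuote` are hypothetical names; the 4096-character limit and the control-character rule come from the entry.

```typescript
const MAX_URL_LEN = 4096;

// Layer 1: reject input the quoting layer should never have to see.
function validateDeeplinkUrl(url: string): void {
  if (url.length > MAX_URL_LEN) throw new Error("url exceeds 4096 chars");
  // Covers newlines, carriage returns, and the other C0 controls plus DEL.
  if (/[\u0000-\u001f\u007f]/.test(url)) throw new Error("url contains control characters");
}

// Layer 2: wrap in single quotes; an embedded single quote is closed,
// escaped, and reopened, which is the standard POSIX idiom.
function posixQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}
```

After quoting, `myapp://path;reboot` reaches the device shell as one literal argument, so the `;` no longer ends the `am start` command.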

Fixed (Phase 134.5 — workflow + correctness sweep, closes 3 MEDIUM + 2 BUG)

  • GitHub Actions pinned to immutable commit SHAs. Both ci.yml and deploy-docs.yml previously used mutable @v4 tags — a moved tag (or compromised maintainer account) would silently substitute different code on the next run. Now pinned to:

    • actions/checkout@de0fac2e… (v6.0.2)
    • actions/setup-node@48b55a01… (v6.4.0)
    • actions/upload-pages-artifact@fc324d35… (v5.0.0)
    • actions/deploy-pages@cd2ce8fc… (v5.0.0)

    Per GitHub’s official security-hardening guide: “Pinning an action to a full-length commit SHA is currently the only way to use an action as an immutable release.”

  • deploy-docs.yml per-job least-privilege permissions. Previously pages: write + id-token: write were granted at the workflow level, so the build job (which only needs contents: read) had Pages-write and OIDC capability it didn’t use. Those permissions are now scoped to the deploy job only.

  • maestro_test_all’s pattern arg now guards against regex DoS. Caller-supplied pattern is length-capped (256 chars) and RegExp construction is try/catch wrapped; on any error, discovery proceeds without filtering rather than crashing.

  • cdp_connect’s already-connected bundleId check uses word-boundary matching instead of includes(). Prevents false-positive “already connected” when the live target is e.g. com.example.app-test and the caller asked for com.example.app.

  • Workflows: SHA-pin comments include the resolved version name (# v6.0.2) so Dependabot can read both and bump them together.
  • tools/connection.ts bundleId match uses a regex with non-bundle-id boundary characters ([^A-Za-z0-9._-]) — bundle IDs share . and - with their surrounding context, so \b alone wouldn’t work.
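The boundary-aware match can be sketched as below. The boundary character class is the one quoted in the entry; `targetMatchesBundleId`, `escapeRegExp`, and the idea of matching against a free-form target description are assumptions for illustration.

```typescript
// Escapes regex metacharacters so the bundle ID is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True when bundleId appears as a whole token: bounded by string edges or by
// characters that cannot be part of a bundle ID.
function targetMatchesBundleId(targetDescription: string, bundleId: string): boolean {
  const boundary = "[^A-Za-z0-9._-]";
  const re = new RegExp("(^|" + boundary + ")" + escapeRegExp(bundleId) + "($|" + boundary + ")");
  return re.test(targetDescription);
}
```

Because `-` and `.` are excluded from the boundary class, `com.example.app` does not match inside `com.example.app-test`, which plain `includes()` and `\b` would both accept.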

Fixed (Phase 134.4 — CDP multiplexer trust boundary, closes 1 HIGH)

  • CDP multiplexer now requires a per-instance capability token in the WebSocket upgrade path. Previously any process that could discover the ephemeral loopback port could connect and send arbitrary CDP commands (Runtime.evaluate, Page.navigate, etc.) to the running Hermes runtime, bypassing Claude Code’s tool-permission prompts entirely. This included a browser tab scanning local ports, a sibling shell, or any malicious process with loopback access.
  • The token is 32 bytes of crypto.randomBytes (43 char base64url), unique per multiplexer instance, included in the WebSocket URL path as ws://127.0.0.1:<port>/<token>. The verifyClient handler uses timingSafeEqual on equal-length buffers to compare — no timing side channel leaks the token.
  • The exposed proxyUrl (from client.startProxy()) and the DevTools URL (from cdp_open_devtools) automatically include the token. Without the token in the path, the multiplexer returns 401 Unauthorized at upgrade time.
  • Token never appears in logs — log lines reference ws://127.0.0.1:<port>/<token> literally, not the actual value.
  • New exports from cdp/multiplexer.ts: generateCapabilityToken() and verifyConsumerPath(reqUrl, expectedToken). Both pure functions for unit testing. CDPMultiplexer.token getter for callers building DevTools URLs.
  • 9 new unit tests cover the token verification truth table: legitimate token accepted; wrong token / missing token / empty token / length mismatch / query-style appendage / non-string inputs all rejected. Plus uniqueness across instances.
  • Implements the deepsec recommendation: per-proxy high-entropy capability token required during WebSocket upgrade, rejection before consumer registration.
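A minimal sketch of the token generation and path check, using Node's `crypto` module. The two function names match the exports listed above, but the bodies here are an illustration written from the description, not the shipped implementation.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// 32 random bytes encode to 43 base64url characters (no padding).
function generateCapabilityToken(): string {
  return randomBytes(32).toString("base64url");
}

// Accepts only "/<token>" with an exact, constant-time match.
function verifyConsumerPath(reqUrl: unknown, expectedToken: unknown): boolean {
  if (typeof reqUrl !== "string" || typeof expectedToken !== "string") return false;
  if (expectedToken.length === 0 || reqUrl.charAt(0) !== "/") return false;
  const presented = Buffer.from(reqUrl.slice(1), "utf8");
  const expected = Buffer.from(expectedToken, "utf8");
  // timingSafeEqual requires equal lengths; a mismatch is a rejection, and it
  // also rejects query-style appendages such as "/<token>?x=1".
  if (presented.length !== expected.length) return false;
  return timingSafeEqual(presented, expected);
}
```

In a WebSocket server, a check like this would run during the upgrade, before the consumer is registered, so an unauthenticated client never reaches the CDP session.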

Fixed (Phase 134.3 — path containment, closes 2 HIGH + 3 MEDIUM)

  • Action IDs are validated against a strict regex at every MCP tool boundary that uses them as a path segment under .rn-agent/actions/. cdp_run_action and cdp_repair_action reject IDs like ../etc/passwd with BAD_FILENAME before any file read. Closes 2 deepsec HIGH path-traversal findings.
  • actionPathFor enforces both regex + assertWithinDir as defense-in-depth. Even if a future caller bypasses the boundary regex, the path resolution refuses to land outside the actions dir.
  • device_screenshot rejects paths containing .. segments. A malicious agent could otherwise pass path: '../../../etc/passwd' and overwrite arbitrary files. Absolute paths to legitimate locations (e.g. ~/Desktop) remain allowed.
  • Default screenshot filename gets a random suffix so parallel same-millisecond calls can’t clobber each other’s output. Was /tmp/rn-screenshot-<ts>.jpg; now /tmp/rn-screenshot-<ts>-<rand>.jpg.
  • cross_platform_verify scanDir rejects .. traversal. Refuses to enumerate the filesystem outside the caller’s directory.
  • New module scripts/cdp-bridge/src/domain/path-safety.ts: isValidActionId, assertValidActionId, assertWithinDir, isWithinDir, pathHasTraversal, plus PathTraversalError. Reused across action-store, repair-action, run-action, device-list, cross-platform-verify. Same chokepoint discipline as Phase 134.1 (Maestro validator) and 134.2 (bundle-ID validation).
  • 12 new unit tests cover path-traversal payloads, absolute paths, control chars, sibling-dir prefix collision, and the backward-parity path. Total test count: 1284 → 1296 passing, 0 failing.
  • Implements proposed D1214 from workspace ROADMAP Phase 134.1 (“Tool-supplied paths must pass assertWithinDir(p, projectActionsDir) before any fs.write* / createWriteStream call”).
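The regex-plus-containment layering can be sketched with Node's `path` module. The helper names mirror those listed above, but the bodies are illustrative; in particular the action-ID pattern is an assumption (the changelog only says "strict regex"), and a plain `Error` stands in for `PathTraversalError`.

```typescript
import * as path from "node:path";

// Assumed pattern for this sketch; the real regex is not quoted in the changelog.
const ACTION_ID = /^[A-Za-z0-9_-]+$/;

function isValidActionId(id: string): boolean {
  return ACTION_ID.test(id);
}

// True when any segment, under either separator, is "..".
function pathHasTraversal(p: string): boolean {
  return p.split(/[\\/]/).indexOf("..") !== -1;
}

// True when candidate resolves strictly inside dir. Comparing the relative
// path avoids the sibling-prefix trap ("actions-evil" vs "actions").
function isWithinDir(candidate: string, dir: string): boolean {
  const rel = path.relative(path.resolve(dir), path.resolve(candidate));
  return rel !== "" && rel.split(path.sep)[0] !== ".." && !path.isAbsolute(rel);
}

function actionPathFor(actionsDir: string, id: string): string {
  if (!isValidActionId(id)) throw new Error("BAD_FILENAME");
  const full = path.join(actionsDir, id + ".yaml");
  // Defense in depth: still refuse if a future change weakens the regex.
  if (!isWithinDir(full, actionsDir)) throw new Error("path escapes actions dir");
  return full;
}
```

The second check is redundant today by construction, which is the point: it keeps holding if a later caller bypasses the boundary regex.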

Fixed (Phase 134.2 — adb shell-arg hardening, closes 5 HIGH)

  • appId / packageName are validated against the bundle-ID regex before any adb shell invocation. In the prompt-injection threat model, adb shell <cmd> <appId> re-interprets argv under the device shell — a metachar-laden appId becomes command injection on the connected Android device/emulator. Each tool now rejects malformed bundle IDs at its handler boundary with INVALID_APPID / DEVICE_RESET_INVALID_APPID / INVALID_PACKAGE_NAME. Closes 5 deepsec HIGH findings.
  • device_permission (tools/device-permission.ts) — both grant/revoke/reset and query actions
  • device_reset_state (tools/device-reset-state.ts) — at the entry point, before any of permission / terminate / launch helpers run
  • device_deeplink (tools/device-deeplink.ts) — packageName arg passed to adb shell am start -n <packageName> is now validated; packageName remains optional
  • device_snapshot action=open with attachOnly=true (tools/device-session.ts) — appId reaching adb shell pidof <appId> is validated
  • All 5 sites reuse isValidBundleId from domain/maestro-validator.ts (introduced in Phase 134.1). Single regex chokepoint: no per-call-site validation logic to drift. Implements proposed D1213 (workspace ROADMAP Phase 134.1).
  • 13 new unit tests in phase-134-2-adb-shell-arg-hardening.test.js cover newline injection, shell metachars (;, |, backtick, $()), and the backward-parity path (valid + hyphenated bundle IDs).
  • Full unit suite: 1284 passing, 0 failing.
  • New error codes added to types.ts: INVALID_APPID, DEVICE_RESET_INVALID_APPID, INVALID_PACKAGE_NAME.

Fixed (Phase 134.1 — Maestro/YAML hardening, closes 7 CRITICAL + 2 HIGH)

  • runScript and other host-executing Maestro directives are now rejected by default. New central validator at scripts/cdp-bridge/src/domain/maestro-validator.ts enforces a strict command allowlist on every Maestro-emitting AND Maestro-executing path. In the prompt-injection threat model — where a malicious project file can reach the agent — this closes the highest-impact RCE class: 7 CRITICAL deepsec findings from the 2026-05-12 baseline scan.
  • appId / bundle IDs are validated against a strict reverse-DNS regex. No string concatenation into the Maestro YAML header anywhere; all flow construction goes through buildMaestroFlow() which serializes via the yaml library. Newline / --- / unicode-line-break injection in app.json slugs no longer becomes Maestro directive injection.
  • Project-supplied login flows (auto-login.ts) are now parse-and-inline, not blind-replay. Previously .maestro/subflows/login.yaml was loaded with only a clearState: true strip and wrapped in runFlow: file: .... The new flow parses + validates the project file, then inlines the validated commands directly into the wrapper — runFlow is no longer in the allowlist, so the indirection can’t smuggle a malicious flow back in.
  • maestro_test_all no longer trusts disk content. Every discovered .yaml is read + parse-and-validated + re-serialized to a canonical temp file before execution. Auto-discovery was the highest-trust gap in the codebase: a single prompt-injected save earlier in a session would have let attacker steps replay on every subsequent test_all invocation.
  • Test-recorder fixes (test-recorder-generators.ts): produces keys now pass through stripNewlines (closes the comment-escape CRITICAL); swipe directions are enum-constrained (closes the recording-replay CRITICAL).
  • replaceIdSelector refuses unsafe testIDs — the running app’s snapshot is attacker-controlled in the threat model; testIDs containing newlines or document separators no longer become Maestro injection vectors during auto-repair.
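The allowlist idea can be sketched over an already-parsed flow, where each step is either a bare command name or a single-key map. The list contents and function names below are hypothetical; the changelog confirms only that `runScript` and `runFlow` are not allowed and that `launchApp` is a permitted step.

```typescript
type MaestroCommand = string | Record<string, unknown>;

// Illustrative allowlist only; the real list lives in maestro-validator.ts.
const ALLOWED_COMMANDS = ["launchApp", "tapOn", "inputText", "assertVisible", "swipe"];

function commandName(cmd: MaestroCommand): string | null {
  if (typeof cmd === "string") return cmd;
  const keys = Object.keys(cmd);
  // A step map must carry exactly one command key.
  return keys.length === 1 ? keys[0] : null;
}

// Returns a list of violations; an empty list means the flow may run.
function validateCommands(commands: MaestroCommand[]): string[] {
  const errors: string[] = [];
  commands.forEach((cmd, i) => {
    const name = commandName(cmd);
    if (name === null) {
      errors.push("step " + i + ": expected exactly one command key");
    } else if (ALLOWED_COMMANDS.indexOf(name) === -1) {
      errors.push("step " + i + ': command "' + name + '" is not allowlisted');
    }
  });
  return errors;
}
```

An allowlist fails closed: a directive the validator has never heard of is rejected by default, which is what makes it suitable for host-executing commands.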

Multi-LLM review fixes (caught before merge)

  • Allowlist extended with swipeUp/swipeDown/swipeLeft/swipeRight. test-recorder emits the shorthand form; without these, every recorded action with a swipe would have failed at replay. Both Codex (92% confidence) and Gemini (98% confidence) flagged it: a real regression the test-recorder round-trip test now guards against.
  • Bundle-ID regex allows hyphens — Apple’s CFBundleIdentifier docs permit them, Expo apps commonly use them (com.my-app.testapp). Earlier hyphen-less regex would have refused legitimate apps.
  • auto-login no longer double-prepends launchApp when the project’s flow already begins with one.
  • maestro-run uses unique-per-call temp filenames so parallel calls can’t race on a shared /tmp/rn-maestro-inline.yaml.
  • Dropped over-strict --- substring rejection — \n rejection already catches the actual document-separator attack; mid-scalar --- (in legitimate text like “section --- title”) is harmless when emitted through yaml.stringify.
  • 26 new validator unit tests (phase-134-1-maestro-validator.test.js) cover the exact deepsec attack vectors plus the multi-LLM-caught regressions (hyphenated bundle IDs, swipe shorthand round-trip).
  • Files migrated through the validator: maestro-invoke.ts, maestro-run.ts, maestro-generate.ts, maestro-test-all.ts, auto-login.ts, test-recorder-generators.ts, repair-engine.ts.
  • Full unit suite: 1271 passing, 0 failing. TypeScript compiles clean.
  • Proposed decisions logged in workspace ROADMAP Phase 134.1:
    • D1212: Reject runScript by default in plugin-emitted and plugin-replayed Maestro flows.
    • D1213: Strict bundle-ID regex at every appId boundary.

Fixed (Phase 134.0 — Android capturer exit-code race, deepsec follow-up)

  • defaultAndroidCapturer no longer reports success on adb non-zero exit when the stream finished first. The prior implementation settled on whichever of out.on('finish') or proc.on('close', code) fired first. Node doesn’t order these events, so a truncated/corrupt screenshot could be reported as ok: true when adb exited non-zero AFTER the WriteStream drained. The new two-track settle requires BOTH streamFinished === true AND procCode === 0 before reporting success; on any failure path the partial file is unlinked so resizeWithSips never reads a corrupt artifact.
  • The decision logic is extracted as a pure resolveCaptureOutcome(streamFinished, procCode) helper exported for unit testing; the event-wiring stays small enough to read at a glance.
  • Caught by the first deepsec full-repo scan (run 20260512130956-afb409ef9132fae2). Neither reviewer in the PR-A multi-LLM review (Codex + Gemini) flagged this class of race, because their attention was on the stream-flush race, not the exit-code race. Phase 134 of the workspace ROADMAP sequences the remaining 7 CRITICAL + 11 HIGH findings from the same scan.
  • 3 new unit tests in gh-136-screenshot-raw-platform.test.js cover the resolveCaptureOutcome truth table (pending / success / failure including the exact deepsec scenario: streamFinished=true, procCode=non-zero). Full unit suite: 1245 passing, 0 failing.
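The two-track settle reduces to a small truth table. The function name and parameters come from the entry above; the return type and the use of `null` for "process has not exited yet" are assumptions of this sketch.

```typescript
type CaptureOutcome = "pending" | "success" | "failure";

// procCode is null until the adb process has closed (assumed representation).
function resolveCaptureOutcome(streamFinished: boolean, procCode: number | null): CaptureOutcome {
  // A non-zero exit is a failure no matter what the stream did.
  if (procCode !== null && procCode !== 0) return "failure";
  // Success needs BOTH tracks: the file fully flushed and a clean exit.
  if (streamFinished && procCode === 0) return "success";
  // Otherwise at least one event has not fired yet.
  return "pending";
}
```

Each event handler can call this after updating its own flag and settle the promise only when the result is no longer `pending`, which removes any dependence on event ordering.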

Fixed (GH #136 — PR-A: Multi-Device Screenshot Routing)

  • device_screenshot platform: "ios" | "android" now disambiguates reliably when both an iOS sim and an Android emu are booted. Previously the call routed through agent-device --platform, which silently fell back to the active session and returned a wrong-platform image (iPhone-resolution JPEG when platform: "android" was requested). The explicit-platform path now resolves the booted device directly via xcrun simctl list -j devices booted (iOS) or adb devices (Android), then captures via xcrun simctl io <UDID> screenshot --type=jpeg <path> or adb -s <emu-id> exec-out screencap -p — bypassing agent-device entirely.
  • Backward-safe by design. The raw path fires only when the caller passes platform explicitly. Calls that omit the field (the common single-device case) keep the existing runAgentDevice flow exactly. Any failure in the raw path (resolver miss, command error, missing xcrun/adb) gracefully degrades to runAgentDevice — no new error surface for users.
  • New module scripts/cdp-bridge/src/tools/device-screenshot-raw.ts — pure parsers (parseSimctlBootedUDID, parseAdbDevicesEmu) plus the tryRawScreenshot orchestrator, with test seams (_setForTest, _resetForTest) for resolver/capturer injection.
  • New _setRunAgentDeviceForTest / _resetRunAgentDeviceForTest seams on device-list.ts for integration tests, mirroring the GH #136 PR-B picker convention.
  • 14 new unit tests in gh-136-screenshot-raw-platform.test.js cover the pure parsers, orchestrator branches (both iOS and Android arms, success + capturer-failure for each), raw-vs-fallback dispatch, and that the resize pipeline + EPHEMERAL_PATH advisory still wrap the raw result identically. Full unit suite at 1242 passing, 0 failing.
  • Multi-LLM review (Codex + Gemini, parallel) flagged an Android WriteStream flush race in the default capturer: out.end() returns before bytes drain, so the promise could resolve ok: true while resizeWithSips reads a truncated PNG (>64KB pipe buffer = real risk for high-res emulators). Fixed by waiting on the 'finish' event and destroying the stream on timeout / proc-error paths. The fix is to defaultAndroidCapturer only — iOS uses execFile-completion ordering which doesn’t have this race.
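The two pure parsers named above could be sketched as follows. The bodies are written from the documented output of `adb devices` and `xcrun simctl list -j devices booted`; the return types and the first-match policy are assumptions.

```typescript
// Returns the first emulator serial in "device" state, or null.
function parseAdbDevicesEmu(stdout: string): string | null {
  for (const line of stdout.split(/\r?\n/)) {
    const m = /^(emulator-\d+)\s+device\b/.exec(line.trim());
    if (m) return m[1];
  }
  return null;
}

// Returns the UDID of the first booted simulator, or null.
// simctl's JSON groups devices by runtime identifier under a "devices" key.
function parseSimctlBootedUDID(stdout: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const devices = (parsed as { devices?: Record<string, unknown> }).devices;
  if (typeof devices !== "object" || devices === null) return null;
  for (const runtime of Object.keys(devices)) {
    const list = devices[runtime];
    if (!Array.isArray(list)) continue;
    for (const d of list as Array<{ udid?: unknown; state?: unknown }>) {
      if (d && d.state === "Booted" && typeof d.udid === "string") return d.udid;
    }
  }
  return null;
}
```

Returning `null` on any parse miss fits the entry's fallback design: a resolver miss simply hands control back to the existing `runAgentDevice` path.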

Fixed (GH #136 — PR-B: Dev-Client Picker Reliability)

  • cdp_status no longer hangs 60s on the Expo Dev Client picker. The picker probe now runs before autoConnect instead of inside the post-failure catch block. When the picker is up, the plugin dismisses it first, then connects normally — no Metro discovery timeout to wait through.
  • dismissPicker now matches LAN IPs and .local hostnames. Replaces the literal localhost / 127.0.0.1 / 10.0.2.2 list with a three-pass matcher: literal IPs (backward parity), <host>:<port> port-pattern matching with port range validation, then first non-footer row below the picker title. Catches the previously-missed real-world setups.
  • Auto-advance race detection. dismissPicker re-probes isDevClientPickerShowing() before tapping; if the picker auto-dismissed mid-flight (single-server case has ~3-5s grace), returns success without tapping. Closes the ~30% race failure for Maestro flows wrapping post-launch in runFlow when: visible: "DEVELOPMENT SERVERS".
  • Tighter waitForBundle cadence. 100ms polling for the first second, 500ms thereafter, 10s overall budget (was 2s polling + 20s budget). Single-server pickers settle in ~200ms.
  • New pure helpers parsePortPatternEntry and parseFirstServerEntry in scripts/cdp-bridge/src/tools/dev-client-picker.ts — testable without the agent-device CLI in the loop.
  • New runAgentDeviceFn and hasActiveSessionFn test seams (underscore-prefixed exports) follow the codebase convention from gh-61-b1-deep-link-depth.test.js.
  • 18 new unit tests covering helpers, dismissPicker integration, auto-advance race, and the cdp_status flow inversion.
  • Plugin: 0.44.27 → 0.44.28
  • MCP server (cdp-bridge): 0.38.22 → 0.38.23
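The port-pattern pass of the picker matcher can be illustrated with a small parser. `parsePortPatternEntry` is named in the entry above, but its real signature is not shown; the input shape, return type, and host grammar here are assumptions.

```typescript
interface ServerEntry {
  host: string;
  port: number;
}

// Finds the first <host>:<port> in a picker row label and validates the port range.
function parsePortPatternEntry(label: string): ServerEntry | null {
  const m = /([A-Za-z0-9][A-Za-z0-9.-]*):(\d{1,5})\b/.exec(label);
  if (!m) return null;
  const port = Number(m[2]);
  if (port < 1 || port > 65535) return null;
  return { host: m[1], port: port };
}
```

Matching on the `host:port` shape rather than a fixed host list is what lets LAN IPs such as `192.168.1.42:8081` and `.local` hostnames through.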

M6 / Phase 112 — Object.freeze test recorder. Closes the last remaining Phase 90 metro-mcp pattern-adoption story. Adds a new cdp_record_test_* tool family (7 tools) that records real user interactions on the running app and emits replayable Maestro YAML or Detox JS — without any app code changes. Bumps MCP server to 0.36.0 (new tools). All Phase 90 Tier 3 + Tier 4 (M6-M11) now shipped.

React’s createElement calls Object.freeze(props) in dev mode before sealing them. cdp_record_test_start monkey-patches Object.freeze inside Hermes — when React asks to freeze a props object that has onPress/onLongPress/onChangeText/onSubmitEditing/onScroll*, we wrap each handler with event-emission BEFORE letting the freeze proceed. Already-mounted scroll containers are caught via a fiber re-render walk (stateNode.forceUpdate() for class, renderer.overrideProps(fiber, ['__mcpInit'], 1) for function). Route is captured via the onCommitFiberRoot hook reading __RN_AGENT.getNavState(), cached into a closure variable so the Object.freeze hot-path stays synchronous.

Three deliberate deviations from metro-mcp’s reference

  1. Finger-direction swipe semantics: dy > 0 (contentOffset increased; finger went UP) emits direction: 'up', matching Maestro’s swipeUp and Detox’s .swipe('up'). metro-mcp emits the inverted content-delta direction, producing YAML that replays in the wrong direction.
  2. 500-event cap with priority eviction — long sessions are capped; on overflow, oldest swipe/type events are dropped first (taps + navigates carry higher information value). truncated: true bubbles up to the stop envelope.
  3. Route caching via the commit hook — eliminates per-event CDP round-trips. metro-mcp expects the user app to install globalThis.__METRO_MCP_NAV_REF__; we instead read our existing __RN_AGENT.getNavState() once per React commit and cache the active route name in the IIFE closure.
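
The first two deviations can be sketched in a few lines. This is an illustration only: swipeDirection and pushWithEviction are hypothetical names, the 'down' branch is inferred from the finger-direction rule rather than stated in the entry, and the fallback used when only taps and navigates remain is an assumption.

```typescript
type EventType = "tap" | "type" | "swipe" | "navigate";
interface RecordedEvent { type: EventType; seq: number }

const MAX_EVENTS = 500;

/** Finger-direction semantics: content offset increasing (dy > 0) means the finger moved up. */
function swipeDirection(dy: number): "up" | "down" | null {
  if (dy === 0) return null; // no vertical movement, nothing to emit
  return dy > 0 ? "up" : "down"; // 'down' is the inferred mirror case
}

/**
 * Appends an event. On overflow, drops the oldest swipe/type event first.
 * Returns true when something was evicted (the caller would set truncated: true).
 */
function pushWithEviction(buffer: RecordedEvent[], event: RecordedEvent, cap: number = MAX_EVENTS): boolean {
  buffer.push(event);
  if (buffer.length <= cap) return false;
  const lowPriority = buffer.findIndex((e) => e.type === "swipe" || e.type === "type");
  // Assumption: when only taps/navigates remain, fall back to dropping the oldest event.
  buffer.splice(lowPriority >= 0 ? lowPriority : 0, 1);
  return true;
}
```
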
  • NEW scripts/cdp-bridge/src/cdp/test-recorder-helpers.ts — five injected JS string constants (DEV_CHECK_JS, START_RECORDING_JS, STOP_RECORDING_JS, READ_EVENTS_JS, buildAnnotationJs(note) template). The Object.freeze interceptor IIFE is ~250 lines, mirrors metro-mcp’s structure with the deviations above. Includes the M8 1..5 renderer-loop port for fiber root resolution and a session-token (__METRO_MCP_REC_SESSION__) so stale wrappers from a prior start-stop cycle gracefully no-op when a new session begins.
  • NEW scripts/cdp-bridge/src/tools/test-recorder.ts — 7 handler factories (createRecordTestStartHandler, createRecordTestStopHandler, createRecordTestGenerateHandler, createRecordTestAnnotateHandler, createRecordTestSaveHandler, createRecordTestLoadHandler, createRecordTestListHandler), the RecordedEvent discriminated union, module-level storedEvents state, and pure helpers (deduplicateEvents, sanitizeFilename, getRecordingsDir, typeCounts). Test-only DI hooks (_resetState, _setStoredEvents, _getStoredEvents) for hermetic integration tests.
  • NEW scripts/cdp-bridge/src/tools/test-recorder-generators.ts: generateMaestro + generateDetox + selector helpers (maestroSelector, detoxSelector, nextSelector). All user-controlled string interpolation (annotations, testName, bundleId, route names) goes through stripNewlines() to prevent comment-context escape (Gemini/Codex review).
  • 7 new tools registered in src/index.ts: cdp_record_test_start, cdp_record_test_stop, cdp_record_test_generate, cdp_record_test_annotate, cdp_record_test_save, cdp_record_test_load, cdp_record_test_list. Storage location: <projectRoot>/.rn-agent/recordings/<sanitized>.json. Appium format accepted in zod schema but returns NOT_IMPLEMENTED at runtime — Maestro + Detox cover our use cases.
  • 11 new error codes in ToolErrorCode (DEV_MODE_REQUIRED, EVAL_FAILED, BAD_RESPONSE, START_FAILED, NO_EVENTS, NOT_IMPLEMENTED, NOT_RECORDING, NO_PROJECT_ROOT, BAD_FILENAME, LOAD_FAILED, BAD_RECORDING).
  • __HELPERS_VERSION__ 15 → 16 in injected-helpers.ts to invalidate cached helpers on devices that connected before this release.
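
As an illustration of the comment-context sanitization mentioned for the generators, the sketch below shows one plausible shape for stripNewlines(). The body is an assumption (the entry only names the helper), and maestroComment / detoxComment are hypothetical emitters added to show where it would be applied.

```typescript
/**
 * Collapses every run of line terminators into a single space so the value
 * cannot end a single-line comment early. U+2028 and U+2029 are included
 * because JavaScript treats them as line terminators too.
 */
function stripNewlines(value: string): string {
  return value.replace(/[\r\n\u2028\u2029]+/g, " ");
}

// Hypothetical emitters: a YAML comment for Maestro, a JS comment for Detox.
function maestroComment(note: string): string {
  return `# ${stripNewlines(note)}`;
}

function detoxComment(note: string): string {
  return `// ${stripNewlines(note)}`;
}
```

With this in place a note such as "reached checkout\nstep:malicious" stays on one commented line instead of producing a second, uncommented line in the generated file.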

Running total: 549 → 605 passing, zero failures. +56 M6 tests across 5 files:

  • test-recorder-deduplicate.test.js (6) — type/tap collision rules
  • test-recorder-storage.test.js (5) — sanitizeFilename, getRecordingsDir
  • test-recorder-generators.test.js (15) — Maestro YAML + Detox JS output snapshots, swipe direction semantic, newline sanitization regression
  • test-recorder-js-guard.test.js (14) — structural invariants of the injected JS strings (Object.freeze override, cleanup, session token, 1..5 renderer loop, eviction policy, all 7 handler names, fiber re-render walk, finger-direction comment)
  • test-recorder-integration.test.js (15) — handler-level mock-CDP round-trips including DEV gate, start→stop→generate flow, save→load round-trip, NO_EVENTS / NOT_IMPLEMENTED / LOAD_FAILED error paths

Multi-LLM (Gemini + Codex). Three high-confidence findings, all applied inline before commit:

  • Codex (conf 95) + Gemini (conf 90) — newline injection in generators. Annotation note, testName, bundleId, and route names are user-controlled strings interpolated into single-line YAML/JS comments. A multi-line note ("reached checkout\nstep:malicious") escapes the comment context and either corrupts Maestro YAML (stray top-level mapping) or executes arbitrary JS in Detox tests. Fix: stripNewlines() helper in generators applied at every interpolation site, plus regression tests asserting attack strings stay inside comments.
  • Codex (conf 85) — Detox submit fallback used device.pressBack() which is Android-only and semantically wrong (back button vs return key). Fix: replaced with // submit: missing testID/label — replay manually comment.
  • Gemini (conf 80) — start-stop-start within a single MCP process leaves stale wrappers on already-frozen props from session 1. Their captured __currentRoute closure is stale in session 2 and they emit events with wrong route. Fix: session token (globalThis.__METRO_MCP_REC_SESSION__ + closure-captured sessionId); each wrapper checks the current global token against its captured token before emitting. Stale wrappers from prior sessions silently call through to the original handler without recording.
  • This is a Dev Client / dev-mode feature only. Calling cdp_record_test_start on a release build returns DEV_MODE_REQUIRED (release builds pre-freeze props at Metro bundling time, so the interceptor can never fire).
  • Recordings are stored project-locally (<projectRoot>/.rn-agent/recordings/) — commit them with feature branches or .gitignore the directory if you prefer ephemeral recording.
  • The existing maestro_generate (replay-based, no recording required) stays available for the case where you already know the steps.

M9 / Phase 111 — /rn-dev-agent:setup now detects USB-connected physical devices and applies (or hints at) the required prerequisites. Closes the Phase 90 Tier 4 M9 story. Auto-runs adb reverse tcp:8081 tcp:8081 on each physical Android so the device can reach Metro. Checks for idb-companion on physical iOS and prints brew install idb-companion when missing. Documents WiFi-debugging as unsupported (matching metro-mcp’s stance).

MCP server unchanged — this is the first story in the Phase 90 pattern-adoption batch with no scripts/cdp-bridge/ changes. MCP stays at 0.35.0.

  • NEW scripts/check-physical-devices.sh (executable). OS-aware bash probe. Android path uses adb devices filtered by emulator- prefix exclusion, iterates results, auto-runs adb -s <dev> reverse tcp:8081 tcp:8081. iOS path (gated on uname -s == Darwin) uses xcrun xctrace list devices, awk-extracts the == Devices == section, positive-filters for iOS form factors (iPhone/iPad/iPod/Apple TV/Apple Vision/Apple Watch) to exclude the host Mac, checks both idb_companion and idb-companion binary names on PATH. Linux/WSL hosts see an explicit “Physical iOS probe skipped (requires macOS; host is $HOST_OS)” line rather than a misleading “no iOS device”.
  • New step 10 in skills/rn-setup/SKILL.md — “Physical device prerequisites (optional)” invokes the script. Advisory — exits 0 in all cases. No-op when no physical devices are connected.
  • 8 structural test guards in scripts/cdp-bridge/test/unit/physical-devices-script-guard.test.js — pin the script’s invariants (exists + executable, bash shebang, expected probes, emulator filter, form-factor filter, idb-companion binary-name coverage, brew install hint, WiFi stance).
  • skills/rn-setup/SKILL.md: “9 checks” / “9-row” language updated to “10 checks” / “10-row” in three downstream references (Rationalizations, Red Flags, Verification checklist). New row added to the output-format table. New physical-device item added to the Verification checklist.
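
The Android half of the probe is a bash script, but its filter logic is easy to show in TypeScript. This is a sketch only: physicalAndroidSerials and adbReverseArgs are hypothetical names, and requiring the device state column to read "device" is an assumption that the entry does not state (it only mentions the emulator- prefix exclusion).

```typescript
/** Extracts serials of non-emulator devices from `adb devices` output. */
function physicalAndroidSerials(adbDevicesOutput: string): string[] {
  const serials: string[] = [];
  for (const line of adbDevicesOutput.split(/\r?\n/)) {
    const parts = line.trim().split(/\s+/);
    const serial = parts[0];
    const state = parts[1];
    // Assumption: skip headers, blank lines, and unauthorized/offline entries.
    if (!serial || state !== "device") continue;
    // Emulators report serials such as "emulator-5554".
    if (serial.startsWith("emulator-")) continue;
    serials.push(serial);
  }
  return serials;
}

/** Builds the argv for `adb -s <serial> reverse tcp:8081 tcp:8081`. */
function adbReverseArgs(serial: string, port: number = 8081): string[] {
  return ["-s", serial, "reverse", `tcp:${port}`, `tcp:${port}`];
}
```

A device attached through adb connect <ip> shows up with an ip:port serial and passes the same filter, which is consistent with the note that adb reverse also runs over the TCP transport.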

Running total: 541 → 549 passing, zero failures. 8 new structural guards — live functional smoke happens during every /setup invocation.

Multi-LLM (Gemini + Codex). Gemini clean (0 findings). Codex caught two valid issues, both applied inline:

  • Confidence 90 — stale “9 checks” / “9-row” copy after adding section 10. Fixed.
  • Confidence 85 — no OS guard meant Linux/WSL hosts would silently show “No physical iOS detected” without context. Fixed — HOST_OS detected + iOS branch gated on Darwin.

Gemini dismissals validated: awk section filter correct against live xctrace output (anchored regex handles “== Devices Offline ==” and name-mid-line == cases); adb reverse is idempotent; idb-companion binary check covers both brew-published variants; form-factor regex correctly matches “Apple TV HD”-style prefix names.

Ran on a dev machine with no physical devices connected. The first run misreported the MacBook itself as a “physical iOS device” because xcrun xctrace list devices surfaces the host Mac under == Devices == for Mac Catalyst targeting. Fixed via a positive form-factor filter; the re-run correctly reports “No physical iOS devices detected” and explicitly labels the host OS.

  • adb reverse auto-run is stateful — idempotent per-device port-forwarding, but still a side effect. Documented as expected setup behavior.
  • idb-companion not auto-installed — brew installs are slow and can fail; hint-only is the canonical pattern for missing deps in this skill.
  • WiFi debugging not supported automatically — matches metro-mcp. Users can adb connect <ip> manually and the script runs adb reverse over the TCP transport.
  • Structural-only tests — no bats dependency. Live smoke during /setup is the functional validation.

D668 in rn-dev-agent-workspace/docs/DECISIONS.md. Phase 111 in rn-dev-agent-workspace/docs/ROADMAP.md. metro-mcp reference: troubleshooting “Physical Device Setup”.

M10 / Phase 110 — architecture detection + CPU profiler hint. Closes the Phase 90 Tier 4 M10 story. cdp_status.app.architecture now surfaces one of 'new' | 'old' | 'unknown' based on Fabric/bridge globals inside the running app. When cdp_cpu_profile fails AND the target is running on Old Architecture, the error result now includes an advisory hint pointing to cdp_heap_usage as an alternative and suggesting newArchitecture: true in app.json. MCP server bumped to 0.35.0.

  • cdp_status.app.architecture — new optional field: 'new' when globalThis.nativeFabricUIManager is present (Fabric loaded), 'old' when globalThis.__fbBatchedBridge is present and Fabric is absent (classic bridge), 'unknown' when neither signal exists or the probe throws. Fabric wins on transient “both present” interop state.
  • OLD_ARCH_PROFILER_HINT exported constant in scripts/cdp-bridge/src/tools/profiling.ts. Attached as meta.hint to cdp_cpu_profile failures when the architecture probe returns 'old'.
  • narrowArchitecture exported helper in scripts/cdp-bridge/src/tools/status.ts. Whitelists the union — any unexpected string or missing value collapses to 'unknown', so TypeScript’s union guarantee holds at runtime regardless of what future helper bundles emit.
  • __HELPERS_VERSION__ bumped 14 → 15 in injected-helpers.ts. Forces Hermes re-injection on next connect so apps pick up the new getAppInfo() shape. Existing freshness check (D502) triggers re-injection automatically.
  • getAppInfo() extended to compute architecture at probe time (wrapped in try/catch — probe failure defaults to 'unknown').
  • buildStatusResult() AND the __DEV__=false recovery retry path both write status.app.architecture so consumers see consistent values regardless of whether the recovery branch fired.
  • cpuProfile error catch now performs a single-shot architecture probe (via new internal safeProbeArchitecture) before returning failResult. Probe failure → 'unknown' → no hint.
  • 16 new tests. 6 detection in test/unit/injected-helpers.test.js (Fabric-only, bridge-only, neither, both-present → 'new', null-Fabric guard, helpers version = 15). 10 in new test/unit/m10-architecture.test.js (3 narrowArchitecture tests, 4 status-handler integration tests, 3 profiler-hint integration tests including a probe-itself-throws path).
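
The detection and narrowing rules above fit in a few lines. The sketch below assumes "present" means neither null nor undefined (consistent with the null-Fabric guard test) and takes the globals as a parameter so it can run outside Hermes. detectArchitecture and profilerHintFor are hypothetical names, and the hint text is a placeholder rather than the real OLD_ARCH_PROFILER_HINT.

```typescript
type Architecture = "new" | "old" | "unknown";

interface ArchGlobals {
  nativeFabricUIManager?: unknown;
  __fbBatchedBridge?: unknown;
}

/** Probes Fabric/bridge globals; any throw during the probe collapses to 'unknown'. */
function detectArchitecture(g: ArchGlobals): Architecture {
  try {
    if (g.nativeFabricUIManager != null) return "new"; // Fabric wins when both are present
    if (g.__fbBatchedBridge != null) return "old";
    return "unknown";
  } catch {
    return "unknown";
  }
}

/** Whitelists the union so unexpected values never leak past the type. */
function narrowArchitecture(value: unknown): Architecture {
  return value === "new" || value === "old" ? value : "unknown";
}

/** Placeholder hint text; attached only when the probe says 'old'. */
function profilerHintFor(arch: Architecture): string | undefined {
  return arch === "old" ? "placeholder: try cdp_heap_usage or enable the New Architecture" : undefined;
}
```
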

Running total: 525 → 541 passing, zero failures.

Multi-LLM (Gemini + Codex). Both clean — zero high-confidence findings. Independently validated: Fabric is always object-typed in RN (never function); the extra error-path probe adds only ~50-200ms to an already-failed CPU profile call; narrowArchitecture whitelist is injection-safe; __DEV__=false recovery-path parity is complete; v14 → v15 cache invalidation is clean via the existing __v freshness check.

  • Fabric-wins on “both present” is a deliberate heuristic for interop-mode transients. Documented.
  • 'unknown' is non-actionable — consumers should treat as “skip arch-specific hints,” not as “assume old.”
  • Profiler hint is advisory, not blocking: cdp_cpu_profile still runs on Old Arch; some profiles do succeed. Hint only fires on actual failures.

D667 in rn-dev-agent-workspace/docs/DECISIONS.md. Phase 110 in rn-dev-agent-workspace/docs/ROADMAP.md. Related: D502 (helper freshness check catches stale v14 caches), M8/D663 (prior __HELPERS_VERSION__ bump pattern).

M11 / Phase 108 — Metro --clear hint on empty buffers. Closes the Phase 90 Tier 4 UX story from the metro-mcp pattern adoption audit. When cdp_console_log or cdp_network_log return empty results AND the CDP session has been idle for more than 60s (measured as max(connectedAt, lastEventAt)), the tool result now includes meta.hint suggesting npx expo start --clear / npx react-native start --reset-cache. This surfaces a failure mode (stale Metro bundle cache) that previously required users to find it in a troubleshooting doc. MCP server bumped to 0.34.0.

Version note: M11 was originally tagged 0.37.0 (reserved before M7 shipped). Main moved to 0.38.0 before this PR merged, so M11 rebased to 0.39.0 above M7. The 0.37.0 reservation was abandoned; no ## [0.37.0] entry exists.

  • scripts/cdp-bridge/src/tools/metro-clear-hint.ts — pure helper. Exports METRO_CLEAR_HINT_THRESHOLD_MS = 60_000, METRO_CLEAR_HINT_TEXT, and shouldShowMetroClearHint(deps, resultIsEmpty): boolean where deps = { connectedAt, lastEventAt?, now }. Idle reference is max(connectedAt, lastEventAt ?? connectedAt) — any activity resets the clock.
  • CDPClient._connectedAt — timestamp of the current connection; null when disconnected. Reset via buildResettableState so every reconnect (including B132 proxy auto-resume) re-stamps correctly.
  • CDPClient._timeNowFn — injectable clock. Constructor now accepts new CDPClient(port?, timeNowFn?) (backwards compat). Defaults to Date.now.
  • DeviceBufferManager.getLastPush(deviceKey): number | undefined — public accessor over the existing internal lastPush map. Used by cdp_network_log to gauge per-device idle time.
  • cdp_console_log — on empty entries, wraps the result with { meta: { hint: METRO_CLEAR_HINT_TEXT } } when the connection has been idle for > 60s. Console has no per-buffer lastPush today (it queries in-app __RN_AGENT.getConsole() rather than our ring buffer), so the idle reference falls back to connectedAt only.
  • cdp_network_log — same pattern as console, but passes lastEventAt = client.networkBufferManager.getLastPush(scope). For scope === 'all' (cross-device query), falls back to connectedAt only since there’s no single-device lastPush to consult.
  • 20 new tests: 10 pure-helper at test/unit/metro-clear-hint.test.js (threshold boundaries, null-connectedAt, both-timestamp-max permutations, exactly-at-threshold, undefined-lastEventAt fallback, hint-text content); 10 integration at test/unit/metro-clear-hint-integration.test.js (CDPClient surface: null-on-fresh, injected fn returned, Date.now default; console-log handler: present / below / above threshold; network-log handler: present / recent-push / stale-both / scope='all').
  • Mock helper extended: test/helpers/mock-cdp-client.js gained connectedAt + now getters with sane defaults that suppress the hint unless explicitly overridden.
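
The idle rule is compact enough to sketch directly from the signature given above. Two details are assumptions: now is modelled as a plain timestamp rather than a function, and an idle time exactly at the threshold does not fire, which follows the "more than 60s" wording.

```typescript
const METRO_CLEAR_HINT_THRESHOLD_MS = 60_000;

interface HintDeps {
  connectedAt: number | null; // null while disconnected
  lastEventAt?: number; // last buffer push, when one is known
  now: number; // assumption: a timestamp, not a clock function
}

/** True when the result is empty and the session has been idle past the threshold. */
function shouldShowMetroClearHint(deps: HintDeps, resultIsEmpty: boolean): boolean {
  if (!resultIsEmpty || deps.connectedAt === null) return false;
  // Any activity resets the clock: the idle reference is the later of the two timestamps.
  const idleSince = Math.max(deps.connectedAt, deps.lastEventAt ?? deps.connectedAt);
  return deps.now - idleSince > METRO_CLEAR_HINT_THRESHOLD_MS;
}
```

For example, a session connected 61s ago whose last network push was 31s ago stays quiet, while the same session with no push at all would get the hint.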

Running total after rebase onto M7-included main: 505 → 525 passing, zero failures.

Multi-LLM (Gemini + Codex) on original PR. Both clean. Gemini verified the B132 auto-resume path correctly re-stamps _connectedAt via the existing handleClose → resetState → reconnect → connectToTarget chain. Codex flagged one sub-threshold observation (clock-skew between DeviceBufferManager.push using real Date.now() vs. CDPClient using the injected _timeNowFn) — moot in production where both are Date.now.

  • Stateless — fires every call past threshold on genuinely idle apps. Accepted per design; LLM context usually absorbs repeated hints.

D665 in rn-dev-agent-workspace/docs/DECISIONS.md. Phase 108 in rn-dev-agent-workspace/docs/ROADMAP.md. metro-mcp reference: troubleshooting “Empty Results or Stale Data” (top-3 user issue).

M7 / Phase 109 — fast-runner tri-state liveness probe. Closes the Phase 90 Tier 3 M7 story and functionally retires the shape-equivalent leftover from R3. Previously isFastRunnerAvailable() only checked PID; a process whose PID was alive but whose HTTP server had wedged was reported as available, and every iOS device_press / device_fill / device_swipe stalled on a 10s fetch timeout before falling through to the daemon. M7 adds a probeFastRunnerLiveness() helper that distinguishes 'alive' | 'stale' | 'dead' via PID check + /health probe, plus an explicit reapStaleFastRunner() helper for the SIGTERM→grace→SIGKILL escalation. tryFastRunner is rewired to branch on the tri-state. MCP server bumped to 0.33.0.

Version note: this release skipped 0.37.0 (reserved for M11 PR #53 at the time). Main moved past 0.37.0 before M11 merged, so M11 shipped as 0.39.0 above this entry instead of slotting in below.

  • FastRunnerLiveness type + probeFastRunnerLiveness(deps?) + reapStaleFastRunner(deps?) in scripts/cdp-bridge/src/fast-runner-session.ts. All deps injectable (getState, processAlive, httpProbe, clearState, sendSignal, sleep) for hermetic tests — mirrors the lockfile.ts pattern.
  • Tri-state probe semantics: 'alive' when /health returns {ok:true}; 'stale' on any HTTP error or ok:false (including AbortError, ECONNREFUSED, 500, timeout); 'dead' when no state file or PID has exited (and state is cleared).
  • Graceful reap: SIGTERM → 500ms grace → SIGKILL if still alive → clear state. ESRCH tolerance on signal send.
  • tryFastRunner in scripts/cdp-bridge/src/agent-device-wrapper.ts now awaits the tri-state probe at entry. 'alive' proceeds; 'stale' reaps + returns null (daemon fallthrough); 'dead' + iOS cold-launches via startFastRunner; 'dead' + non-iOS returns null.
  • fastHealthCheck refactored to delegate to the new defaultHttpProbe helper (single source of truth for the /health call shape). Wrapped in try/catch to preserve its original boolean contract — caught during review.
  • Dangling ChildProcess handle after reap (Gemini review finding, confidence 84): clearStateFile() now nulls runnerProcess alongside runnerState. Self-heals via on('exit') but closes the window cleanly so a concurrent stopFastRunner() doesn’t double-signal a dead PID.
  • 17 new tests in test/unit/fast-runner-liveness.test.js. 8 probe variations (null state, dead PID, alive+healthy, alive+ok:false, HTTP 500, AbortError, ECONNREFUSED, timeout forwarding) + 2 cleanup invariants (probe is read-only on living processes; only 'dead' discovery clears state) + 6 reap variations (no-op on null state, SIGTERM-only success, SIGTERM-ignored→SIGKILL, ESRCH tolerance, graceMs override, default graceMs) + 1 default-timeout check. All hermetic.
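
The tri-state semantics and the tryFastRunner branching can be read as a decision table. The sketch below is deliberately synchronous so the logic stays visible; the real probe involves an HTTP call and is presumably async. probeLiveness, nextAction, and the simplified ProbeDeps shape are hypothetical, and the reap escalation (SIGTERM, grace period, SIGKILL) is not modelled.

```typescript
type FastRunnerLiveness = "alive" | "stale" | "dead";

// Simplified, synchronous stand-ins for the injectable deps named in the entry.
interface ProbeDeps {
  getState: () => { pid: number } | null;
  processAlive: (pid: number) => boolean;
  httpProbe: () => { ok: boolean }; // throws on timeout, refused connection, HTTP 500, ...
  clearState: () => void;
}

function probeLiveness(deps: ProbeDeps): FastRunnerLiveness {
  const state = deps.getState();
  if (state === null) return "dead";
  if (!deps.processAlive(state.pid)) {
    deps.clearState(); // only the 'dead' discovery clears state
    return "dead";
  }
  try {
    return deps.httpProbe().ok ? "alive" : "stale";
  } catch {
    return "stale"; // PID is alive but the HTTP server is not answering properly
  }
}

type Action = "use-fast-runner" | "reap-then-daemon" | "cold-launch" | "daemon";

/** Mirrors the branching described for tryFastRunner. */
function nextAction(liveness: FastRunnerLiveness, platform: "ios" | "android"): Action {
  switch (liveness) {
    case "alive":
      return "use-fast-runner";
    case "stale":
      return "reap-then-daemon";
    case "dead":
      return platform === "ios" ? "cold-launch" : "daemon";
  }
}
```
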

Running total: 488 → 505 passing, zero failures.

  • Concurrent 'dead' probes can race two xcodebuild spawns if two DIFFERENT MCP tool calls arrive within the 30s startup window. Flagged sub-threshold (Gemini, confidence 82) — MCP SDK serializes tool invocations per connection, so the race window is narrow. Will follow up with in-flight promise cache on startFastRunner if observed in practice.
  • SIGKILL on xcodebuild PID may orphan xctest children briefly. macOS launchd reaps within seconds.
  • Legacy isFastRunnerAvailable(): boolean retained for sync callers (e.g., post-spawn check, status tool). Documented as coarse in JSDoc.
  • Stale detection conflates “hung” with “misbehaving-but-responsive”: a runner returning {ok:false} is reaped even if it might self-recover. Conservative — prefer respawn over hang.

Multi-LLM (Gemini + Codex). Two findings applied. Codex (confidence 90): the fastHealthCheck refactor dropped its outer try/catch — fixed by re-wrapping. Gemini (confidence 84): reap left runnerProcess dangling — fixed by nulling in clearStateFile. Two sub-threshold findings deferred (concurrent dead-probe race, SIGKILL xcodebuild orphan) — noted in Known Limits.

Story R3 (“fast-runner restart”) from Phase 85 was marked DONE during the Phase 92 stability sweep with the note that the implementation shape differed from the original spec (PID probe instead of /ping; restart integrated into session open). M7 ships the full spec: tri-state /health probe, explicit stale detection, graceful reap. The functional gap R3 left is closed.

D666 in rn-dev-agent-workspace/docs/DECISIONS.md. Phase 109 in rn-dev-agent-workspace/docs/ROADMAP.md. metro-mcp reference: src/plugins/devtools.ts::tryFocusExisting.

B133 / Phase 107 — M8 loose ends. Closes the carveout logged during M8’s Phase 106 review (Gemini finding, flagged at confidence 85, folded out of M8 per story boundary). Ports the 1..5 getFiberRoots probe into cdp_set_shared_value so that tool works on apps where __REACT_DEVTOOLS_GLOBAL_HOOK__.renderers is empty or missing. Also refreshes the cdp_open_devtools tool description, which had been frozen at M1a-era text and was factually misleading after M1b (Phase 104) and B132 (Phase 105) shipped. MCP server bumped to 0.31.1.

  • B133: cdp_set_shared_value inline renderer walk in src/index.ts:356-397. Replaced the stale Array.from(hook.renderers.keys()) loop with the 1..5 getFiberRoots probe pattern — mirrors M8’s findActiveRenderer and REACT_READY_PROBE_JS. Intentional divergence: the allRoots accumulator is preserved (not early-return) because Reanimated worklets can mount in a secondary renderer at a different ID from the React DOM/native renderer, and cdp_set_shared_value must tolerate that to locate its target testID.
  • cdp_open_devtools tool description in src/index.ts:798. Old text stopped at M1a/Phase 100 detection; new text honestly reflects the Phase 100 (M1a) → Phase 104 (M1b CDPClient proxy wiring) → Phase 105 (B132 auto-resume) chain. Returns shape now documents hermesWsUrl + proxyPort that the handler has been emitting since M1b.
  • 3 new structural guard tests at scripts/cdp-bridge/test/unit/shared-value-renderer-probe-guard.test.js. Walks all src/*.ts and fails on any reintroduction of hook.renderers.keys(). Pins index.ts to contain the 1..5 probe loop + the typeof getFiberRoots === 'function' guard. Pattern mirrors the existing screenshot-bypass-guard.test.js (B121).

Running total: 485 → 488 passing, zero failures.

Multi-LLM (Gemini + Codex). Both clean. Gemini validated the fix matches what they flagged during M8 review. Codex noted the allRoots accumulator divergence from M8’s pattern is intentionally correct for worklet-aware fiber walks.

D664 in rn-dev-agent-workspace/docs/DECISIONS.md. Phase 107 in rn-dev-agent-workspace/docs/ROADMAP.md. Parent: D663 / M8 / Phase 106.

M8 / Phase 106 — renderer 1..5 probe for fiber root resolution. Closes the Tier 3 story from the Phase 90 metro-mcp pattern adoption audit. Two places in the plugin gated React introspection on __REACT_DEVTOOLS_GLOBAL_HOOK__.renderers.size > 0: injected-helpers.ts::findActiveRenderer (used by 9 downstream consumers) and cdp/setup.ts::waitForReact (the 30s readiness gate before helper injection). Both now brute-probe getFiberRoots(i) for i in 1..5, mirroring metro-mcp’s FIBER_ROOT_JS pattern. Apps where hook.renderers is empty or missing (React Native macros, Reanimated worklets, React DevTools loaded ahead of first render) now return live fiber trees instead of silent empties. MCP server bumped to 0.31.0.

  • REACT_READY_PROBE_JS exported constant in scripts/cdp-bridge/src/injected-helpers.ts. Eval-ready IIFE string with the same 1..5 getFiberRoots probe as findActiveRenderer. Single source of truth for the cross-file readiness invariant — setup.ts now imports and awaits it directly instead of reconstructing a narrower inline check.
  • findActiveRenderer() in the injected helper bundle now brute-probes getFiberRoots(1..5) instead of iterating hook.renderers.entries(). Dropped the early-return guard !hook.renderers || hook.renderers.size === 0 that caused silent empty-tree returns on affected apps. __HELPERS_VERSION__ bumped 13 → 14 so in-flight sessions pick up the new helper on next connect.
  • waitForReact in cdp/setup.ts now awaits REACT_READY_PROBE_JS instead of __REACT_DEVTOOLS_GLOBAL_HOOK__.renderers?.size > 0. Without this companion fix, M8’s helper change was blunted: waitForReact would time out after 30s on exactly the apps M8 was meant to help, before injection began. Side benefit: the new probe refuses to declare “ready” until a fiber root actually exists, tightening the gate’s semantic correctness.
  • 10 new tests in scripts/cdp-bridge/test/unit/injected-helpers.test.js. 5 for findActiveRenderer (happy-path, skip-to-renderer-4, renderers-map-empty, all-empty, missing-getFiberRoots) and 5 for REACT_READY_PROBE_JS (run in isolated vm sandboxes — pin the probe’s public behavior so helper + probe can’t silently diverge). Running total: 475 → 485, zero failures.
  • Renderer IDs 6+ unreachable — matches metro-mcp’s identical bound. Never observed in practice.
  • cdp_set_shared_value in src/index.ts:359-363 still uses the hook.renderers.keys() pattern. Out-of-scope for M8; filed as B133 for a separate PR. Low-severity since cdp_set_shared_value is a niche proof-capture tool.
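
A rough sketch of the 1..5 probe, written in TypeScript rather than as the injected JS string. It assumes the DevTools hook exposes getFiberRoots(rendererId) returning a Set of roots; findActiveRendererId and isReactReady are hypothetical names, not the helper's real exports.

```typescript
interface DevToolsHookLike {
  renderers?: Map<number, unknown>; // deliberately ignored by the probe
  getFiberRoots?: (rendererId: number) => Set<unknown> | undefined;
}

const MAX_RENDERER_ID = 5;

/** Returns the first renderer ID in 1..5 that has at least one fiber root, else null. */
function findActiveRendererId(hook: DevToolsHookLike | undefined): number | null {
  const getFiberRoots = hook?.getFiberRoots;
  if (typeof getFiberRoots !== "function") return null;
  for (let id = 1; id <= MAX_RENDERER_ID; id++) {
    const roots = getFiberRoots.call(hook, id);
    if (roots && roots.size > 0) return id;
  }
  return null;
}

/** Readiness in the sense described above: a fiber root actually exists. */
function isReactReady(hook: DevToolsHookLike | undefined): boolean {
  return findActiveRendererId(hook) !== null;
}
```

The point of the change is visible in the first branch: an empty or missing renderers map no longer short-circuits the lookup, so only the presence of real roots decides the outcome.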

Multi-LLM (Gemini + Codex). Codex clean. Gemini flagged setup.ts’s sibling readiness gate at confidence 85 — originally scoped out of M8, folded in on user direction to preserve end-to-end benefit. Would have shipped as half-a-fix otherwise.

D663 in rn-dev-agent-workspace/docs/DECISIONS.md. Phase 106 in rn-dev-agent-workspace/docs/ROADMAP.md. metro-mcp reference pattern: src/utils/fiber.ts FIBER_ROOT_JS.

B132 / Phase 105 — proxy auto-resume across reconnect. Closes the known limitation logged during M1b review: the multiplexer captured hermesUrl once at startProxy time, so any event that invalidated the target URL (hot reload, target eviction, Metro restart) left the proxy routing to a dead upstream with every MCP call silently timing out. This release auto-suspends the proxy when the MCP’s CDP WebSocket closes, runs the normal reconnect loop directly against Hermes, then auto-resumes the proxy against the refreshed target URL. MCP server bumped to 0.30.0.

  • CDPClient._proxyDesired intent flag. Tracks user’s standing wish for a proxy separately from the live _proxyUrl. Set by successful startProxy(), cleared by stopProxy() / disconnect(). Preserved across internal _suspendProxy() so auto-resume knows to re-allocate.
  • CDPClient._suspendProxy() / _resumeProxy() internal lifecycle. _suspendProxy clears _proxyUrl synchronously (so the reconnect loop observes cleared state before the multiplexer’s HTTP server finishes its async shutdown) and tears down the old multiplexer best-effort. _resumeProxy rehydrates a fresh multiplexer against the CURRENT _connectedTarget.webSocketDebuggerUrl.
  • ReconnectContext.afterReconnect?: () => Promise<void>. New optional callback fired inside reconnect() after discoverAndConnect resolves successfully. Used by CDPClient to auto-resume the proxy. Hook failures are caught + logged, never propagated — a post-reconnect hook cannot undo a successful reconnect.
  • CDPClient._softReconnectDirect(). Bypasses the new softReconnect wrapper for internal callers like _doStartProxy, avoiding infinite-rollback where the wrapper would suspend the just-allocated multiplexer. Named method (not inline call) so tests can stub the direct path independently of the public softReconnect.
  • CDPClient.softReconnect() now wraps with suspend→reconnect→resume when a proxy is active. Covers all auto-recovery paths (e.g., cdp_status’s __DEV__=false recovery). When no proxy is active, behavior is unchanged.
  • CDPClient.handleClose() now fires-and-forgets _suspendProxy() before delegating to the reconnect machinery. _suspendProxy is ordered so its synchronous preamble (clearing _proxyUrl) runs before handleCloseFn returns control — guaranteeing the reconnect loop’s discoverAndConnect → connectToTarget → ctx.getProxyUrl() sees null. Multiplexer HTTP server shutdown runs concurrently but harmlessly (no one still routes to it).
  • B132: stale hermesUrl in multiplexer after target change or Metro reload — see Added. The multiplexer now rehydrates against the fresh target URL on every reconnect, eliminating the silent “proxy routes to dead upstream” failure mode.
  • 462 → 475 tests passing (+13): 10 CDPClient proxy-lifecycle tests (_suspendProxy sync behavior, _resumeProxy guards + one-shot failure policy, softReconnect wrapper suspend+resume, stopProxy/disconnect clearing desired flag, end-to-end URL rehydration via both softReconnect wrapper and afterReconnect callback trigger paths) + 3 reconnect-loop tests (afterReconnect fires exactly once on success, undefined-callback backwards compat, hook failure doesn’t propagate).
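
The split between standing intent and live state is the core of this change, and a toy model shows it. The class below is a synchronous simplification under stated assumptions: ProxyLifecycle and its allocate callback are hypothetical, while the two fields mirror _proxyDesired and _proxyUrl as described above.

```typescript
class ProxyLifecycle {
  desired = false; // standing intent, survives suspend()
  url: string | null = null; // live proxy URL, cleared on suspend()
  private readonly allocate: (targetUrl: string) => string;

  constructor(allocate: (targetUrl: string) => string) {
    this.allocate = allocate;
  }

  start(targetUrl: string): void {
    this.url = this.allocate(targetUrl);
    this.desired = true; // set only after a successful start
  }

  stop(): void {
    this.url = null;
    this.desired = false;
  }

  /** Clears the live URL synchronously so a reconnect goes direct; intent is preserved. */
  suspend(): void {
    this.url = null;
  }

  /** Re-allocates against the CURRENT target URL; a failure clears the intent (one-shot policy). */
  resume(currentTargetUrl: string): void {
    if (!this.desired || this.url !== null) return;
    try {
      this.url = this.allocate(currentTargetUrl);
    } catch {
      this.desired = false;
    }
  }
}
```

Clearing the intent on a failed resume means later reconnects do not keep retrying silently, which matches the "predictable over resilient" choice recorded for this release.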

Gemini + Codex both returned clean (no high-confidence issues). Six critical race-condition questions verified: suspend-before-reconnect ordering, double-resume hazard on preemption, _doStartProxy rollback correctness, _resumeProxy failure policy, _startProxyInFlight concurrency sharing, end-to-end test soundness. One below-threshold observation from Gemini (the end-to-end test exercised the softReconnect-wrapper path but not the afterReconnect trigger specifically) closed with an additional focused test.

  • _resumeProxy failure → clear _proxyDesired (predictable over resilient). Silent-retry-on-every-reconnect would mask structural bugs. User sees the log warning once and can re-run cdp_open_devtools to retry manually.
  • handleClose uses void this._suspendProxy() (fire-and-forget). The synchronous preamble is sufficient to redirect the reconnect; awaiting mux.stop() would block the handleClose path unnecessarily.
  • D662 in DECISIONS.md. Phase 105 in ROADMAP.md. B132 closed in BUGS.md. Parent: D661 / Phase 104 (M1b, 2026-04-21). Branch: fix/b132-proxy-auto-resume.

M1b / Phase 104 — CDP proxy routing integration. Completes the M1 story split from 2026-04-20: on RN < 0.85, cdp_open_devtools now starts the multiplexer proxy automatically and re-routes the MCP’s own CDP WebSocket through it, so React Native DevTools can connect to the same proxy as a second consumer. Both coexist on single-debugger Hermes without evicting each other. MCP server bumped to 0.29.0.

  • CDPClient.startProxy(opts?) / stopProxy(). Lifecycle methods that create/dispose the multiplexer and soft-reconnect the MCP’s CDP WebSocket to route through it. Idempotent when already active.
  • cdp_status.proxy block. Reports { active, port, url, consumerCount }. consumerCount observes the 1 → 2 transition when DevTools connects.
  • cdp_open_devtools proxy-active mode. New fields: hermesWsUrl (direct Hermes URL, upstream of proxy) and proxyPort (bound loopback port). devtoolsUrl now always non-null and points DevTools at ws=127.0.0.1:PROXY_PORT when proxy-active.
  • CDPMultiplexer bounded resources. hermesBufferMaxSize option (default 1000) with drop-oldest enforcement in sendToHermes(); routingTimeoutMs option (default 60s) with periodic sweeper. Test-only getters hermesBufferSize / routingTableSize for regression assertions.
  • cdp_open_devtools mode rename: 'proxy-required' → 'proxy-active' when RN < 0.85 is detected (or version probe fails — conservative default). Previously: returned workaround guidance + null devtoolsUrl. Now: proxy auto-starts, devtoolsUrl populated, DevTools is usable immediately. The old 'proxy-required' mode no longer exists.
  • CDPClient.disconnect() tears down multiplexer if one is active. The only reliable SIGTERM hook for the proxy; matches the precedent set by MetroEventsClient cleanup in the same path.
  • CDPMultiplexer.start() failure cleanup sets state='stopping' during cleanup (matching stop()), not 'stopped' before. Closes a concurrent-start race where a second caller would observe “stopped” and allocate on top of in-flight teardown.
  • Unbounded hermesBuffer during CONNECTING window — messages from fast/misbehaving consumers could pile up between new WebSocket(hermesUrl) and the open event. Cap + drop-oldest now enforced.
  • routingTable leaks when Hermes goes partial-death — entries allocated per consumer→upstream request were only cleaned up on close events. Unresponsive-but-not-closed upstreams leaked routing entries indefinitely. Periodic sweeper evicts entries past routingTimeoutMs.
  • 451 → 462 tests passing (+11): 3 prereq regression tests (hermesBuffer drop-oldest, routing sweeper, failed-start cleanup) + 1 open-devtools startProxy-error path + 10 CDPClient lifecycle tests. 2 existing cdp_open_devtools tests rewritten for proxy-active mode. Shared helper test/helpers/mock-hermes.js extracted for reuse across proxy tests.
  • CDPClient.startProxy concurrency guard (flagged by both Gemini + Codex at 92-95% confidence). Two parallel callers would each allocate a CDPMultiplexer; the second overwrote _multiplexer and orphaned the first (port bound, sweeper running, unreferenced). Fixed with an _startProxyInFlight promise cache that serializes concurrent callers on the same in-flight promise and clears in a finally so failed attempts don’t poison retries.
  • Rollback-path test coverage (flagged by both at 85-90% confidence). The catch block tearing the multiplexer back down when softReconnect throws post-allocation was unreachable by the existing mock-client tests (softReconnect never rejected). Added 3 tests using a real mock Hermes that exercise the rollback, the concurrency guard, and the in-flight-cache-clears-on-failure behavior.
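
The startProxy concurrency guard described above can be sketched as an in-flight promise cache. This is a minimal illustration, not the plugin's actual code: `ProxyOwner`, `Multiplexer`, and `createMultiplexer` are hypothetical names, and the real CDPClient also soft-reconnects its WebSocket, which is omitted here.

```typescript
// Hypothetical stand-in for CDPMultiplexer; only object identity matters in this sketch.
interface Multiplexer {
  port: number;
}

class ProxyOwner {
  private multiplexer: Multiplexer | null = null;
  private startInFlight: Promise<Multiplexer> | null = null;
  private createMultiplexer: () => Promise<Multiplexer>;

  constructor(createMultiplexer: () => Promise<Multiplexer>) {
    this.createMultiplexer = createMultiplexer;
  }

  startProxy(): Promise<Multiplexer> {
    // Idempotent when a proxy is already active.
    if (this.multiplexer !== null) return Promise.resolve(this.multiplexer);
    // Concurrent callers join the attempt that is already running.
    if (this.startInFlight !== null) return this.startInFlight;

    const attempt = this.createMultiplexer().then((mux) => {
      this.multiplexer = mux;
      return mux;
    });
    this.startInFlight = attempt;

    // Clear the cache on success AND on failure so a failed start cannot poison retries.
    const clear = (): void => {
      if (this.startInFlight === attempt) this.startInFlight = null;
    };
    attempt.then(clear, clear);
    return attempt;
  }
}
```

Clearing the cache from a handler attached after the assignment (rather than in a `finally` inside the async body) avoids one subtle ordering problem: if the start logic failed before its first await, a `finally` inside it would run before the cache was assigned, and the rejected promise would stay cached.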

Stale hermesUrl after target change or bundle reload — multiplexer captures the URL once at startProxy time. If Hermes regenerates the URL (reload, eviction, Metro restart), the proxy forwards to a dead upstream until cdp_disconnect + re-run cdp_open_devtools. Pre-existing M1a design limit, not introduced by M1b. Filed as B132 in BUGS.md for follow-up (requires multiplexer upstream-refresh API or client-level teardown-and-restart on target change).

  • D661 in DECISIONS.md. Phase 104 in ROADMAP.md. Parent: M1a / D654 (Phase 100, 2026-04-20). Branch: feat/m1b-cdp-proxy-routing.

Phase 90 metro-mcp pattern adoption (Tier 1 + Tier 2) plus story-driven bug sweep. MCP server bumped to 0.28.0. Seven PRs merged on main since v0.25.0 without intermediate public releases; v0.33.0 is the first public-release checkpoint for all of it.

  • cdp_metro_events MCP tool (M5 / D656). Read Metro reporter events (bundle_build_started / bundle_build_done / bundle_build_failed, reloads) captured by the MetroEventsClient attached alongside every CDP session. Accepts limit / type filter / clearErrors. Returns { eventsConnected, lastBuild, buildErrors, events, count, eventsReason?, hint? }.
  • cdp_open_devtools MCP tool (M1a / D654). Reports the React Native DevTools frontend URL + whether DevTools can coexist with the MCP session on the current RN version. On RN ≥ 0.85 returns a direct URL (native multi-debugger). On RN < 0.85 returns explicit guidance — full proxy auto-wiring deferred to M1b.
  • cdp_status.metro fields eventsConnected / lastBuild / buildErrors / eventsReason (M5 + B129). Surface bundler state and incompatibility reasons. On Expo-managed projects eventsReason: "expo-cli-incompatible" is set because Expo CLI hijacks /events for its manifest protocol.
  • cdp_status.capabilities.supportsMultipleDebuggers (M1a / D654). True when RN ≥ 0.85.
  • Single-instance MCP lockfile at /tmp/rn-dev-agent-cdp-${uid}-${hash}.lock (M3 / D652). Two Claude Code windows in the same project no longer fight over the single Hermes CDP slot — the second exits with code 11 and an actionable stderr message. --no-lock CLI flag for CI parallelism. Three-tier stale-lock reclaim: PID alive (kill(pid, 0)) + process name (ps -p <pid> -o args=) + mtime < 24h.
  • cdp_network_log + cdp_network_body gain optional device arg (M4 / D655). Default scope is the active device; pass "all" for a chronologically-merged union across every device.
  • rn-best-practices rule 5.2 (R7 / D650). Documents the presentation: 'transparentModal' blank-white bug on RN 0.76.7 + Bridgeless + react-native-screens 4.4.x and the dark BlurView workaround.
  • Exponential reconnect with jitter replaces the old linear 1.5s × 30 retry loop (M2 / D653). Curve: [0, 500, 1000, 2000, 4000, 8000, 16000, 30000, ...] ±500ms jitter. Attempt 0 returns 0ms so hot-reload reconnects stay instant. Metro CPU wake-ups in the first 60s of an outage drop from ~40 to ~7 attempts (5× less hammering). interruptibleSleep polls the dispose / soft-reconnect flags every 500ms so softReconnect’s 3s bail window still preempts a 30s cap sleep.
  • DeviceBufferManager for network events is now a process-scoped singleton at src/cdp/network-buffer-manager.ts (B128 / D657). Previously owned by CDPClient, so cdp_connect(force:true) / cdp_restart wiped all per-device buffers. Now buffers survive the canonical platform-switch use case. Memory unchanged (100 × 10 = 1000 entries total).
  • Platform inference reads Metro’s deviceName before falling back to package-list heuristics (B131 / D660). Dual-install bundles (same com.example.app on both iOS sim + Android emulator) are now correctly disambiguated by "iPhone 17 Pro" vs "sdk_gphone16k_arm64 - 17 - API 37" instead of defaulting to iOS + ambiguousPlatform: true.
  • Runner-leak recovery closeSession wrapper now also calls clearActiveSession() + stopFastRunner() (B130 / D659). Matches the normal close path. Stale fast-runner ref-map no longer survives recovery, so the post-recovery snapshot lands via daemon/CLI (with @eN refs) instead of fast-runner (tree-shaped, no refs) — which means device_fill / press / find actually work after recovery fires.
  • B128: per-device buffers wiped on reconnect — see Changed. Root cause: DeviceBufferManager lifetime was tied to CDPClient instance, not MCP process.
  • B129: Expo /events endpoint incompatibility surfaces silently. MetroEventsClient now probes HTTP GET /events before the WS upgrade. If the body matches the Expo manifest shape (runtimeVersion string OR launchAsset.url string), it marks state 'incompatible' with eventsReason: "expo-cli-incompatible" and an actionable hint. Probe failure (timeout / non-200) falls through to the WS attempt and does not mark the endpoint incompatible.
  • B130: device_fill “No snapshot in session” after runner-leak recovery — see Changed.
  • B131: cdp_connect({platform: "android"}) errored with “no matches” on dual-install bundles — see Changed.
  • M2 multi-review catch: softReconnect preemption race at 30s cap. interruptibleSleep polls the dispose/soft-reconnect flags every 500ms so preemption latency stays bounded regardless of the sleep duration.
  • M5 multi-review catches — double-schedule on initial connect failure (error + close both fired), start() during reconnecting double-connected, port mismatch after CDP port change, stop() during CONNECTING state crashed the process via unhandled handshake-abort error. All four fixed with targeted regression tests.
  • M3 pre-release multi-review catch: ps -p <pid> -o comm= returned only "node" for Node-launched scripts, which meant the cdp-bridge needle match would NEVER succeed in production → the lockfile would be a no-op. Switched to -o args= which returns the full command line. This bug would have shipped silently without multi-review.
  • Unit test suite: 24,246ms → 3,151ms (87% faster) after adding skipIncompatibilityProbe: true to pre-B129 MetroEventsClient tests that use the WS-only mock server. The mock doesn’t respond to HTTP GET; every test was paying a 1500ms probe timeout.
  • Screenshot downscale via sips (B120/D647 from 0.26.0 — first public release). device_screenshot auto-resizes to max 800px width via macOS sips, saving ~35–46% on iPhone captures with no readability loss.
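
The M2 reconnect curve listed above can be reproduced with a few lines. This is a sketch under stated assumptions: the function names are hypothetical, the jitter is assumed to be uniform in ±500ms, attempt 0 is assumed to skip jitter entirely, and the attempt count ignores the time each connection attempt itself takes.

```typescript
const BASE_MS = 500;
const CAP_MS = 30000;
const JITTER_MS = 500;

// Un-jittered curve: 0, 500, 1000, 2000, 4000, 8000, 16000, 30000, 30000, ...
function baseDelayMs(attempt: number): number {
  if (attempt <= 0) return 0; // attempt 0 is instant so hot-reload reconnects stay fast
  return Math.min(BASE_MS * Math.pow(2, attempt - 1), CAP_MS);
}

// Adds symmetric jitter. `random` is injectable so the result is testable.
function reconnectDelayMs(attempt: number, random: () => number = Math.random): number {
  const base = baseDelayMs(attempt);
  if (base === 0) return 0;
  const jitter = (random() * 2 - 1) * JITTER_MS;
  return Math.max(0, Math.round(base + jitter));
}

// Counts how many attempts start within a window, using the un-jittered curve.
function attemptsWithin(windowMs: number): number {
  let elapsed = 0;
  let attempt = 0;
  while (true) {
    elapsed += baseDelayMs(attempt);
    if (elapsed > windowMs) return attempt;
    attempt += 1;
  }
}
```

With these assumptions, `attemptsWithin(60000)` evaluates to 7 (attempts start at 0, 0.5, 1.5, 3.5, 7.5, 15.5, and 31.5 seconds), which lines up with the "~7 attempts" figure, while a fixed 1.5s interval gives 60 / 1.5 = 40.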

Tests: 272 → 448 (+176 across the series):

  • M3: 14 hermetic unit + 4 real-process regression (stale-lock reclaim, multi-project coexistence, process-name validation against a real child process)
  • M2: 8 curve tests + 6 interruptibleSleep tests
  • M1a: 7 pure-function + 10 multiplexer integration + 5 tool handler
  • M4: 20 DeviceBufferManager tests + updates to 6 pre-existing network-tool tests
  • M5: 13 feature + 4 pass-1 regression + 1 pass-2 crash regression
  • B128-B131: 4 singleton + 10 Expo detector + 2 recovery-close-wrapper contract + 7 deviceName inference

Every feature PR and the fix PR went through a 2-pass multi-review (Gemini + Codex in parallel). Pass 1 blockers were caught and fixed pre-merge. Three of the M3/M2/M5 blockers would have silently degraded or broken production had they shipped without review. The pattern “hermetic injection for unit coverage + at least one integration test per feature exercising the real default against a real external thing” is captured in D652 and was reinforced by every subsequent fix.

All three cross-platform validation stories (M4 network isolation, M5 Metro events, device interaction parity) and the B128-B131 fix validation story were executed live against both iOS and Android simulators. 8/8 assertions pass in the B128-B131 validation. Artifacts are in docs/stories/*.md and docs/proof/*.jpg in the workspace repo.

  • Required action: restart Claude Code after /plugin update rn-dev-agent to load the new MCP server. /reload-plugins alone does not respawn MCP subprocesses.
  • Expected behavior change (B131): cdp_connect({platform: "android"}) now succeeds on apps with the same bundleId installed on both iOS and Android. Callers that relied on explicit targetId for disambiguation are unaffected — the platform filter is now an additional valid path.
  • Expected behavior change (B128): network buffers persist across cdp_connect(force:true) / cdp_restart. To explicitly wipe, call cdp_network_log({clear: true}) (scoped to active device) or pass device: "<key>" for a specific device.
  • Expected behavior change (B129): on Expo-managed projects, cdp_status.metro.eventsConnected is now correctly false (previously true with silent empty events). Applications watching lastBuild should also watch eventsReason for the "expo-cli-incompatible" signal.
  • Two-window workflow (M3): opening the same project in two Claude Code windows now exits the second MCP with code 11 and the conflict message. Kill PID or close the other window to resolve.
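
The three-tier stale-lock check behind the single-instance lockfile can be sketched as a pure decision function. Everything here is illustrative: `classifyLock`, `LockProbes`, and the verdict names are hypothetical, the order in which the tiers are evaluated is an assumption, and the probes are injected rather than calling `kill(pid, 0)` or `ps` directly so the logic can be exercised without real processes.

```typescript
// Injected probes; real ones would wrap kill(pid, 0) and `ps -p <pid> -o args=`.
interface LockProbes {
  isPidAlive: (pid: number) => boolean;
  commandLine: (pid: number) => string | null;
  nowMs: () => number;
}

interface LockRecord {
  pid: number;
  mtimeMs: number;
}

type LockVerdict = "held" | "stale-dead-pid" | "stale-wrong-process" | "stale-too-old";

const MAX_LOCK_AGE_MS = 24 * 60 * 60 * 1000;

function classifyLock(lock: LockRecord, probes: LockProbes, needle: string): LockVerdict {
  // Tier 1: the recorded PID must still exist.
  if (!probes.isPidAlive(lock.pid)) return "stale-dead-pid";
  // Tier 2: the PID must still belong to the expected process (guards against PID reuse).
  const args = probes.commandLine(lock.pid);
  if (args === null || args.indexOf(needle) === -1) return "stale-wrong-process";
  // Tier 3: the lockfile must have been touched within the last 24 hours.
  if (probes.nowMs() - lock.mtimeMs >= MAX_LOCK_AGE_MS) return "stale-too-old";
  return "held";
}
```

The second tier is where the `comm=` versus `args=` difference matters: a command line of just `"node"` never contains a needle such as `"cdp-bridge"`, so a lock would always look stale, whereas the full argument string does contain it.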
| Area | iOS | Android |
| --- | --- | --- |
| CDP connect + targets | | ✅ (after B131 fix) |
| Per-device network buffers | | |
| Cross-device 'all' merge | | |
| Metro events (Expo → incompatible) | | |
| device_fill post-recovery | ✅ (B130) | n/a (no runner-leak on Android) |
| cdp_open_devtools native mode | n/a (RN 0.76 < 0.85) | n/a |
| Single-instance lockfile | | |
  • Closed: M3 + M2 + M1a + M4 + M5 + R7 (Phase 85) + B128-B131. Phase 90 Tier 1 + Tier 2 complete.
  • Open (carveouts): M1b (CDPClient proxy routing — needs live simulator for end-to-end verification); Tier 3 (M6 test recorder, M7 fast-runner liveness, M8 renderer 1..5 loop); Tier 4 (M9–M11 polish).
  • Noted during validation but not blocking: cdp_store_state dot-path resolver breaks on hyphenated Zustand keys; stale agent-device daemon sessions (rn-agent-recovery-*) persist across MCP boots and cause DEVICE_IN_USE on first session open. Workarounds: pass storeType without path; agent-device close --session <name>.

Three-PR stability sprint: zombie target disambiguation (B111), MCP process lifecycle hardening (B76 + zombie cleanup), and security documentation (B5). MCP server bumped to 0.20.0. Version 0.24.0 was skipped because inter-PR version coordination moved main from 0.23.0 → 0.24.0 (PR #32) → 0.25.0 (PR #33) without a public release at the intermediate step.

  • cdp_restart MCP tool — in-process soft state reset (disconnect + new CDPClient + autoConnect). Recovers from stuck connection state without losing the CC session. Does NOT reload new dist/ — that still requires a full Claude Code restart (B76/D644).
  • cdp.bundleId field on cdp_status — surfaces the connected target’s description (Metro reports the bundleId there) for “which app am I connected to?” debugging (B111/D643).
  • README ## Security section — documents that cdp_evaluate runs unrestricted JS in the app’s Hermes runtime; recommends local-dev-only usage and treating the agent like a developer with shell access (B5/PR #34).
  • B111 (CRITICAL — silent data corruption): CDP target selection picked zombie over fresh app target. selectTarget now hard-fails on explicit targetId / bundleId mismatch with actionable warnings listing available ids/descriptions; autoConnect auto-populates preferredBundleId from resolveBundleId(platform); bundleId/preferredBundleId matching is case-insensitive; deterministic sort tie-break (page-id desc → preferredBundleId-matched first → ascending lex by full id) (D643).
  • B76: MCP server cannot be restarted within a session — fixed via the new cdp_restart tool for in-process reset. SIGUSR2 handler retained for future supervisor wiring (CC does not auto-respawn MCP subprocesses today) (D644).
  • MCP zombie subprocesses surviving parent CC quit — root cause: the 5s setInterval background Metro poll held the Node event loop alive indefinitely when CC closed stdin without SIGTERM. New lifecycle/graceful-shutdown.ts factory funnels SIGTERM/SIGINT/SIGHUP/SIGUSR2/stdin.end/uncaughtException into a single idempotent shutdown path (clears bgPoll → disconnects CDP → stops fast-runner → exit) with a 3s timeout race for stuck cleanup (D644).
  • CDPClient.disconnect() race safety — added 2-line idempotent guard so concurrent cdp_restart + signal-shutdown don’t race (D644).
  • Latent production bug surfaced by CI: setTimeout(...).unref() on the load-bearing graceful-shutdown timeout meant the timer wouldn’t fire when the event loop had no other work, defeating its purpose. Removed .unref() so the timer always keeps the loop alive long enough to force-exit (D644 follow-up).
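
The single idempotent shutdown path with a timeout race, as described in the bullets above, might look roughly like this. It is a sketch rather than the contents of lifecycle/graceful-shutdown.ts: `createGracefulShutdown`, the `exit` callback, and the exit codes are assumptions made for illustration.

```typescript
type CleanupStep = () => Promise<void> | void;

function createGracefulShutdown(
  steps: CleanupStep[],
  exit: (code: number) => void,
  timeoutMs: number = 3000,
): () => Promise<void> {
  let inFlight: Promise<void> | null = null;

  async function run(): Promise<void> {
    // Run every step in order; one failing step must not block the rest.
    const cleaned = (async (): Promise<boolean> => {
      for (const step of steps) {
        try {
          await step();
        } catch (err) {
          // Swallowed on purpose: shutdown should keep going.
        }
      }
      return false;
    })();

    const timedOut = new Promise<boolean>((resolve) => {
      // No .unref() here: the timer has to keep the event loop alive to force the exit.
      const timer = setTimeout(() => resolve(true), timeoutMs);
      cleaned.then(() => clearTimeout(timer));
    });

    const hitTimeout = await Promise.race([cleaned, timedOut]);
    exit(hitTimeout ? 1 : 0); // exit codes are an assumption of this sketch
  }

  return (): Promise<void> => {
    // Every trigger funnels into the same promise, so cleanup runs at most once.
    if (inFlight === null) inFlight = Promise.resolve().then(run);
    return inFlight;
  };
}
```

In a real server the returned function would be registered for each trigger (SIGTERM, SIGINT, SIGHUP, SIGUSR2, stdin end, uncaught exceptions), with the steps being the background-poll clear, CDP disconnect, and fast-runner stop, in that order.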

Verified-stale (closed via empirical sweep, no code change in this release)

  • B73 (HIGH): MCP dies on Metro restart. Verified empirically as already fixed by the earlier reconnect loop + background poll pattern (D622). MCP survives Metro death and auto-reconnects when Metro returns.
  • B84, B100, B110, B112 — fixes had already shipped through earlier hardening phases; BUGS.md was stale.
  • All Phase 85 R-stories (R1-R10 except R7) — closed; R7 (transparentModal) noted as react-native-screens upstream.

Tests: 249 → 272 (+23). New: 10 for B111 (selectTarget hard-fail, case-insensitive, deterministic sort tie-break, discoverAndConnect throw-on-empty); 13 for B76 (gracefulShutdown factory + cdp_restart handler, including a concurrent-race test that proves idempotency under parallel invocation).

PRs #32 (B111) and #33 (B76) reviewed independently by Gemini + Codex. PR #32: 0 high-confidence issues. PR #33: 1 important (SIGUSR1 → SIGUSR2 to avoid Node --inspect collision) + 3 advisories — all 4 applied as follow-ups before merge.

  • Restart Claude Code after /plugin update rn-dev-agent to pick up the new MCP server (/reload-plugins does NOT restart MCP subprocesses).
  • New tool cdp_restart is available immediately. Use it for in-session state reset without losing CC context. Loading new dist/ after npm run build still requires a full CC quit + reopen.
  • Behavioral change (B111): callers that previously passed an explicit targetId or bundleId that didn’t match any target used to silently connect to whatever sorted first; now they get a clear error with the available ids/descriptions listed. Any caller relying on the old silent-fallthrough behavior was already getting wrong data — the new error is strictly better.
  • Behavioral change (B76): SIGINT, SIGHUP, and stdin EOF now route through graceful shutdown (previously only SIGTERM). Subprocess termination is cleaner; no zombie MCP processes after CC quit.
  • 5-gate live smoke for B76 fix (CC restart → cdp_restart tool present → invocation → MCP PID unchanged) — all green.
  • 4-gate live smoke for B111 fix (kill Metro test → bad targetId reject → bad bundleId reject → auto-select picks live target) — all green.
  • B73 verification trace at docs/proof/b73-b76-mcp-lifecycle/b73-verification.log in the workspace repo.

Plugin code-side stability backlog effectively cleared after this release. All Phase 85 R-stories closed (R7 deferred as upstream). Remaining open items in BUGS.md are out-of-scope for plugin code (workspace test-app cosmetic, environmental Hermes/Android, accepted-tradeoff items).

Major session of correctness and performance fixes surfaced by end-to-end benchmarks and a live feature-dev run. MCP server bumped to 0.18.0.

  • cdp_native_errors MCP tool — reads xcrun simctl log show on iOS / adb logcat -d on Android, parses known native-module / bundle-fetch / FATAL EXCEPTION patterns, dedupes by message body. Fills the gap when cdp_error_log / cdp_console_log stay empty because native errors fired before __RN_AGENT injected. cdp_status also emits a suspicion hint pointing at this tool when connected && !helpersInjected && !hasRedBox && errorCount === 0 (B114/D642).
  • targetId + bundleId filters on cdp_connect — disambiguate zombie Expo Go host pages from real app targets (B111/D635).
  • attachOnly: true on device_snapshot — skip app launch when it’s already running; verifies via xcrun simctl spawn booted launchctl list / adb shell pidof. Prevents the ~12s app-restart cascade. Exported isAppRunning(platform, bundleId, probes?) helper (B112/D641).
  • Platform-aware CDP timeouts: defaultTimeout(platform) and timeoutForMethod(method, platform) apply a 2× Android multiplier via a single constant ANDROID_MULTIPLIER. CDPClient routes Runtime.evaluate paths through it using this._connectedTarget?.platform. iOS unchanged (B118/D637).
  • platform param on device_screenshot — inherits from client.connectedTarget?.platform or accepts explicit override. When no active session is open, the wrapper appends --platform <p> to agent-device CLI args. Session-bound dispatch remains the canonical path (B117/D638, partial — upstream agent-device CLI ignores --platform without a session; workaround via open session).
  • simctl listapps cross-check in platform inference: cdp/discovery.ts::inferPlatforms now reads both adb shell pm list packages AND xcrun simctl listapps booted; targets installed on both platforms are flagged with ambiguousPlatform: true. Readers are injectable for unit testing (B116/D639).
  • Tab-dispatch fix for cdp_nav_graph: buildTabNavigateArgs(tab, screen, params) emits the flat ref.navigate(tab, params) when target === tab, nested form when they differ. Prevents self-referential navigate('TasksTab', { screen: 'TasksTab' }) that left RN stuck on the old tab (B115/D640).
  • B110: MCP server reports stale version — server version now read from package.json at module load; sync-versions.sh gained a regex guard against hardcoded version: literals in src/ (D630).
  • B113: device_screenshot --format always rejected — agent-device >= 0.8.0 doesn’t accept --format. Refactored into buildScreenshotArgs() + thin delegate; now uses --out <path> explicitly, extension drives encoding (D636).
  • Freshness probe caching — 2s TTL per connectionGeneration, WeakMap-keyed. Saves 30-150ms per back-to-back tool call by skipping redundant __RN_AGENT.__v round-trips (D631).
  • Structured error codes on ResultEnvelope: ToolErrorCode union (STALE_TARGET, HELPERS_STALE, RECONNECT_TIMEOUT, NOT_CONNECTED, HELPERS_NOT_INJECTED). Agents can branch on code instead of regex on error text. Back-compat preserved (D634).
  • Extracted cdp/recovery.ts: probeFreshness() + recoverFromStaleTarget() moved out of utils.ts. Replaced error-string matching for stale detection with the __RN_AGENT.__v probe as the primary signal (D633).
  • RingBuffer requestId index: optional indexKey extractor builds a parallel Map<key, item>; getByKey(id) is O(1). Swapped 5 call sites (event-handlers.ts ×4, tools/network-body.ts ×1) from findLast → getByKey (D632).
  • CDP module extraction continued: cdp/connect.ts (213 lines), cdp/helper-expr.ts, cdp/recovery.ts (99 lines). CDPClient facade shrunk further; every module now has a typed Context interface instead of reaching into the facade directly.
  • cdp/state.ts setter-based ResettableState interface — replaces as unknown as CDPResettableState cast. Renaming a private field on CDPClient now produces a real TypeScript error.
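
The RingBuffer requestId index mentioned above pairs a fixed-size buffer with a parallel Map so that lookups by key avoid a linear scan. The following is a minimal sketch, not the plugin's RingBuffer: the class name and internals are hypothetical, and it assumes buffered items are objects so that identity comparison is meaningful on eviction.

```typescript
class IndexedRingBuffer<T, K> {
  private slots: (T | undefined)[];
  private head = 0; // position of the oldest item
  private count = 0;
  private byKey = new Map<K, T>();
  private capacity: number;
  private indexKey: (item: T) => K | undefined;

  constructor(capacity: number, indexKey: (item: T) => K | undefined) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
    this.indexKey = indexKey;
    this.slots = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    if (this.count === this.capacity) {
      // Evict the oldest item. Drop its index entry only if the entry still
      // points at this exact item; a newer item may have reused the same key.
      const evicted = this.slots[this.head] as T;
      const evictedKey = this.indexKey(evicted);
      if (evictedKey !== undefined && this.byKey.get(evictedKey) === evicted) {
        this.byKey.delete(evictedKey);
      }
      this.head = (this.head + 1) % this.capacity;
      this.count -= 1;
    }
    const tail = (this.head + this.count) % this.capacity;
    this.slots[tail] = item;
    this.count += 1;
    const key = this.indexKey(item);
    if (key !== undefined) this.byKey.set(key, item); // latest item wins, like findLast
  }

  // O(1) lookup of the most recent item with this key.
  getByKey(key: K): T | undefined {
    return this.byKey.get(key);
  }

  get size(): number {
    return this.count;
  }
}
```

The identity check on eviction is the part that is easy to get wrong: without it, evicting an old entry would also delete the index entry of a newer item that shares its key.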

Cross-platform benchmark (Task Power User flow + Priority Filter Row feature):

  • iOS (iPhone 17 Pro, iOS 26.3): 3.37s / 29 calls / 0 failures (cdp_interact p50 7ms)
  • Android (Pixel_9_Pro, API 37) pre-fix: 16.11s / 32 calls / 3 failures (incl. 5.3s typeText timeout)
  • Android post-fix: ~7.2s / 24 calls / 0 failures — 55% faster, zero false-negative timeouts (cdp_interact p50 16ms, p95 45ms)

Tests: 158 → 249 (+91 this release cycle).

D630 through D642 in rn-dev-agent-workspace/docs/DECISIONS.md.

  • cdp_set_shared_value tool — Drive Reanimated SharedValue animations by testID for proof captures when gesture/scroll synthesis is unavailable. Walks the React fiber tree, finds the named prop, sets .value on the JS thread (D623).
  • Fast-runner auto-restart — When fast-runner dies mid-session, tryFastRunner automatically attempts one restart using the session’s deviceId. isFastRunnerAvailable() now probes process liveness via kill(pid, 0) instead of just checking state file (D620).
  • Reload counter + NativeWind corruption warning. cdp_status warns after 5+ cdp_reload calls in a session: “NativeWind stylesheet may be corrupted” (D622).
  • Auto-open device session in Phase 5.5 — The 8-phase pipeline skill now mandates opening a device session at verification start, preventing fallback to bash commands (D619).
  • 9 new unit tests for deviceId parsing — covers all 4 agent-device response shapes, UDID regex validation, priority ordering, and edge cases. Test count: 139 → 148 (D625).
  • B103: cdp_navigate false success — Fallback navigation path now verifies the target screen exists in the navigation state after dispatch. Returns error if screen not found in any navigator (D616).
  • B106: device_scroll/device_swipe deadlock on Reanimated screens — Routes through fast-runner HID synthesis when available, bypassing agent-device daemon’s waitForIdle which deadlocks with Reanimated worklets (D610).
  • B107: deviceId parsing for agent-device v0.8.0 — Parses data.device_udid, data.id, and data.device.id (when object). Prefers device_udid over generic id. Validates against UDID regex before ensureFastRunner (D611, D618).
  • R2: device_screenshot ignores requested path — Fast-runner screenshot tier now copies the captured PNG to the requested output path instead of always writing to /tmp (D617).
  • R5: Scroll amount semantics diverge — Dropped * 2 factor in fast-runner scroll computation to match agent-device daemon’s interpretation of amount (D621).
  • MCP-only proof capture enforcement — Added “Never use xcrun simctl for screenshots” and “Never use sleep for settling” to skill boundaries (D624).
  • hono 4.12.12 → 4.12.14 — Fixes HTML injection in JSX SSR (Dependabot #5). Transitive dep of @modelcontextprotocol/sdk.
  • MCP tool count: 51 → 52 (cdp_set_shared_value). CDP tools: 24 → 25.
  • Plugin version: 0.21.1 → 0.22.0. MCP server: 0.16.0 → 0.17.0.
  • Decisions logged: D610-D625 (16 new).
  • MCP tools unavailable in spawned subagents (GH #31) — Agents split into protocol playbooks (parent-session-only) and spawnable workers.
  • MCP server reconnection failure after upgrade (#30) — Renaming the mcpServers key from rn-dev-agent-cdp to cdp in v0.19.1 broke Claude Code session reconnection. Added upgrade detection in SessionStart hook: compares plugin version against last-seen version, outputs restart notice on upgrade.
  • Convention D605: MCP server keys in plugin.json must never be renamed in minor or patch versions. Major versions may rename with explicit migration notes.

If CDP tools fail after upgrading, restart Claude Code to reinitialize MCP servers. This is a one-time issue caused by the server key rename in v0.19.1.

  • Experience Engine (Phases A-D) — self-improving failure pattern learning system:
    • Phase B: Classification + Retrieval — normalized error signatures, failure family matching, three-layer experience cascade (seed → project → user), environment fingerprint filtering
    • Phase B: Ghost Recovery — auto-recovers FF_STALE_CDP transparently (depth-1 circuit breaker, 30s cooldown, 15s timeout)
    • Phase C: Compaction + Promotion — telemetry scanner, candidate generator, auto-promotes ghost recoveries, stale heuristic decay, rn-agent-compact command
    • Phase D: Sharing + Polish — anonymized export/import, experience health dashboard, rn-agent-export, rn-agent-import, rn-agent-health commands
  • Auto-handle Dev Client picker (#9) — cdp_status detects and dismisses the Expo Dev Client server picker via device_find, auto-retries CDP connection after dismissal
  • FF_DEV_CLIENT_PICKER failure family in seed experience
  • MCP tool count: 25 (unchanged). Command count: 6 → 10 (4 new experience engine commands).
  • cdp_status refactored: extracted buildStatusResult() helper, picker detection in catch block
  • record_proof.sh standardized video output (#14): always MP4 with -movflags +faststart, ffprobe validation before copy, graceful fallback preserving correct extension
  • All command/skill .mov references updated to .mp4
  • Zod schemas tightened: count, holdMs, durationMs, amount, scale now have min/max bounds
  • ENAMETOOLONG on marketplace install (#6) — changed to local source "./" in marketplace.json
  • Shell globbing vulnerability in androidClipboardFill — escape *?[]{} chars
  • Missing -s device serial in adb calls — added getAdbSerial() helper
  • Platform detection gap: isAndroidSession() falls back to ANDROID_SERIAL env
  • Misleading disableDevMenu fallback — removed unrelated setIsDebuggingRemotely call
  • ANDROID_SDK_ROOT not honored in run.sh — maps to ANDROID_HOME
  • Ineffective ANDROID_SERIAL export — persisted to file for cross-process access
  • Inexact package matching in post-edit health check — exact match with grep -cxF
  • Video corruption (#14) — record to temp, convert on stop, validate with ffprobe
  • Double .mp4.mp4 extension — strip any extension before appending .mp4
  • device_longpress — long press by @ref or coordinates with configurable duration. Enables context menus, drag initiation, hold-to-delete.
  • device_scroll — native directional scroll with configurable amount (0-1). Smoother than swipe for list scrolling.
  • device_scrollintoview — scroll until element visible by text or @ref. Works with ScrollView content (FlatList virtualizes, so elements must be rendered).
  • device_pinch — pinch/zoom gesture with scale factor and optional center point. iOS simulator only.
  • device_press enhanced — added doubleTap, count (repeated taps), and holdMs (long press via ref) options.
  • device_swipe enhanced — now supports coordinate-based swipes (x1,y1,x2,y2,durationMs) for precise gestures (drag-to-reorder, bottom sheets, pull-to-refresh). Direction shortcut still works, now delegates to native scroll.
  • MCP tool count: 21 → 25 (4 new device gesture tools).
  • disableDevMenu action for cdp_dev_settings (#8) — suppresses shake-to-show dev menu via DevSettings.setIsShakeToShowDevMenuEnabled(false). Auto-called before proof recordings.
  • Pre-recording readiness check in proof-capture and rn-feature-dev Phase 8 (#8) — verifies valid navigation route (not Dev Client picker) and disables dev menu before recording starts.
  • Dev Client clearState warning in rn-testing skill (#8) — all Maestro YAML examples updated to not use clearState:true.
  • rn-tester agent Safety Constraints now explicitly forbid clearState:true with Dev Client builds.
  • Video label subcommand (record_proof.sh label) — adds timed text labels to proof videos in a dedicated dark bar below the video content. Cross-platform (works on any .mp4). Uses Pillow for rendering, auto-installs in venv if missing.
  • Android emulator readiness script (scripts/ensure-android-ready.sh) — checks boot completion, cleans stale port forwarding, auto-selects ANDROID_SERIAL, warns about Play Protect. Runs on SessionStart.
  • Android text input workaround: device_fill auto-detects Android sessions and chunks long/special-char strings into safe 10-char segments via adb shell input text.
  • Android app installation check in post-edit health check — verifies expo.android.package via adb shell pm list packages.
  • Android-Specific Testing Rules section in rn-testing skill — maestro-runner enforcement, text input best practices, boot timing, Play Protect.
  • 2 new failure families: FF_MAESTRO_GRPC_ANDROID and FF_ANDROID_TEXT_INPUT_CRASH in seed experience.
  • 3 new platform quirks: PQ_ANDROID_MAESTRO_GRPC, PQ_ANDROID_TEXT_INPUT_CRASH, PQ_ANDROID_PLAY_PROTECT.
  • maestro-runner enforced on Android — all agents (rn-tester, rn-debugger) and skills now require maestro-runner over classic Maestro for Android flows. Classic Maestro’s gRPC driver is unreliable (upstream #998).
  • All Maestro commands now include --platform flag explicitly.
  • Maestro gRPC UNAVAILABLE on Android (#7) — bypassed by enforcing maestro-runner which uses HTTP to UIAutomator2 instead of gRPC.
  • mobile_type_keys crashes app on Android (#7) — special characters and long strings now auto-chunked.
  • ENAMETOOLONG on marketplace install (#6) — repo renamed from react-native-dev-claude-plugin to rn-dev-agent, shortening marketplace qualifier from 39 to 21 chars on every cached path.
  • Shortened 9 long reference filenames in skills/rn-best-practices/references/ (max 42 → 31 chars).
  • Updated all internal references: plugin.json, marketplace.json, README install commands, troubleshooting, and source clone instructions.
  • collect_logs tool — multi-source log collection from JS console, native iOS (xcrun simctl log stream), and native Android (adb logcat) in parallel. Results merged by timestamp.
  • App-Side Dev Bridge (@rn-dev-agent/runtime) — stable public API replacing fragile fiber walks for navigation state, store state, console, and errors. Local dev-bridge.ts for test-app integration.
  • Vercel RN Best Practices skill — 36 rules from vercel-labs/agent-skills + 3 custom rules. Pass 4 keyword-triggered reviewer integration.
  • Post-edit health check hook — detects app crashes after source file edits via PostToolUse hook. Gated on active CDP session to avoid false positives.
  • MCP server resilience — reconnect window extended to 46s (30 attempts), background Metro poll for auto-reconnect after Metro restart.
  • DiagnosticsScreen (test-app) — dev-only screen with FlashList log viewer, level filter pills, and pull-to-refresh for collect_logs validation.
  • GlobalSearchModal (test-app) — FlashList with heterogeneous items, cross-store search, text highlighting.
  • TaskStatsCard (test-app) — Reanimated animated progress bar with staggered entries.
  • Auto-update guide in README for marketplace plugin users.
  • Navigation debugging recipe — B75 nested navigator patterns documented in skills/rn-debugging/references/.
  • Plugin now requires Node.js >= 22 (LTS).
  • Reviewer agent (Pass 4) loads best-practice rules based on keyword triggers in reviewed code.
  • Architect agent references CRITICAL/HIGH rules when designing component architecture.
  • cdp_status reports capabilities.bridgeDetected and capabilities.bridgeVersion.
  • Bridge-aware routing in navigation state, store state, console log, error log, and dispatch tools.
  • Health check hook gated on active CDP session flag file (/tmp/rn-dev-agent-cdp-active).
  • Bridgeless mode target detection checks both .title and .description fields.
  • Post-edit health check false positives outside RN projects (GH #1).
  • Post-edit health check false positives when app not installed or simulator booted without app (GH #2).
  • Console double-wrapping on Fast Refresh via global sentinel.
  • Store auto-detection re-scans globals on every call instead of caching first result.
  • Bridge detector validates required methods instead of accepting any truthy global.
  • Reconnect resets bridge state in handleClose() and softReconnect().
  • Initial release.
  • 19 MCP tools: 11 CDP (status, evaluate, reload, component tree, navigation state, store state, error log, network log, console log, interact, dev settings) + 8 device (list, screenshot, snapshot, find, press, fill, swipe, back).
  • 3 skills: rn-device-control, rn-testing, rn-debugging.
  • 5 agents: rn-tester, rn-debugger, rn-code-explorer, rn-code-architect, rn-code-reviewer.
  • 5 commands: rn-feature-dev, test-feature, debug-screen, check-env, build-and-test.
  • Injected helpers IIFE for Hermes runtime introspection.
  • Ring buffers for console (200), network (100), and error (50) events.
  • Network fallback for RN < 0.83 via fetch/XHR monkey-patches.
  • Auto-discovery across Metro ports 8081/8082/19000/19006.
  • maestro-runner and agent-device auto-installation hooks.