Browser Evaluate CDP Refactor Plan¶
Context¶
act:evaluate executes user provided JavaScript in the page. Today it runs via Playwright
(page.evaluate or locator.evaluate). Playwright serializes CDP commands per page, so a
stuck or long running evaluate can block the page command queue and make every later action
on that tab look "stuck".
PR #13498 adds a pragmatic safety net (bounded evaluate, abort propagation, and best-effort
recovery). This document describes a larger refactor that makes act:evaluate inherently
isolated from Playwright so a stuck evaluate cannot wedge normal Playwright operations.
Goals¶
act:evaluatecannot permanently block later browser actions on the same tab.- Timeouts are single source of truth end to end so a caller can rely on a budget.
- Abort and timeout are treated the same way across HTTP and in-process dispatch.
- Element targeting for evaluate is supported without switching everything off Playwright.
- Maintain backward compatibility for existing callers and payloads.
Non-goals¶
- Replace all browser actions (click, type, wait, etc.) with CDP implementations.
- Remove the existing safety net introduced in PR #13498 (it remains a useful fallback).
- Introduce new unsafe capabilities beyond the existing
browser.evaluateEnabledgate. - Add process isolation (worker process/thread) for evaluate. If we still see hard to recover stuck states after this refactor, that is a follow-up idea.
Current Architecture (Why It Gets Stuck)¶
At a high level:
- Callers send
act:evaluateto the browser control service. - The route handler calls into Playwright to execute the JavaScript.
- Playwright serializes page commands, so an evaluate that never finishes blocks the queue.
- A stuck queue means later click/type/wait operations on the tab can appear to hang.
Proposed Architecture¶
1. Deadline Propagation¶
Introduce a single budget concept and derive everything from it:
- Caller sets
timeoutMs(or a deadline in the future). - The outer request timeout, route handler logic, and the execution budget inside the page all use the same budget, with small headroom where needed for serialization overhead.
- Abort is propagated as an
AbortSignaleverywhere so cancellation is consistent.
Implementation direction:
- Add a small helper (for example
createBudget({ timeoutMs, signal })) that returns: signal: the linked AbortSignaldeadlineAtMs: absolute deadlineremainingMs(): remaining budget for child operations- Use this helper in:
src/browser/client-fetch.ts(HTTP and in-process dispatch)src/node-host/runner.ts(proxy path)- browser action implementations (Playwright and CDP)
2. Separate Evaluate Engine (CDP Path)¶
Add a CDP based evaluate implementation that does not share Playwright's per page command queue. The key property is that the evaluate transport is a separate WebSocket connection and a separate CDP session attached to the target.
Implementation direction:
- New module, for example
src/browser/cdp-evaluate.ts, that: - Connects to the configured CDP endpoint (browser level socket).
- Uses
Target.attachToTarget({ targetId, flatten: true })to get asessionId. - Runs either:
Runtime.evaluatefor page level evaluate, orDOM.resolveNodeplusRuntime.callFunctionOnfor element evaluate.
- On timeout or abort:
- Sends
Runtime.terminateExecutionbest-effort for the session. - Closes the WebSocket and returns a clear error.
- Sends
Notes:
- This still executes JavaScript in the page, so termination can have side effects. The win is that it does not wedge the Playwright queue, and it is cancelable at the transport layer by killing the CDP session.
3. Ref Story (Element Targeting Without A Full Rewrite)¶
The hard part is element targeting. CDP needs a DOM handle or backendDOMNodeId, while
today most browser actions use Playwright locators based on refs from snapshots.
Recommended approach: keep existing refs, but attach an optional CDP resolvable id.
3.1 Extend Stored Ref Info¶
Extend the stored role ref metadata to optionally include a CDP id:
- Today:
{ role, name, nth } - Proposed:
{ role, name, nth, backendDOMNodeId?: number }
This keeps all existing Playwright based actions working and allows CDP evaluate to accept
the same ref value when the backendDOMNodeId is available.
3.2 Populate backendDOMNodeId At Snapshot Time¶
When producing a role snapshot:
- Generate the existing role ref map as today (role, name, nth).
- Fetch the AX tree via CDP (
Accessibility.getFullAXTree) and compute a parallel map of(role, name, nth) -> backendDOMNodeIdusing the same duplicate handling rules. - Merge the id back into the stored ref info for the current tab.
If mapping fails for a ref, leave backendDOMNodeId undefined. This makes the feature
best-effort and safe to roll out.
3.3 Evaluate Behavior With Ref¶
In act:evaluate:
- If
refis present and hasbackendDOMNodeId, run element evaluate via CDP. - If
refis present but has nobackendDOMNodeId, fall back to the Playwright path (with the safety net).
Optional escape hatch:
- Extend the request shape to accept
backendDOMNodeIddirectly for advanced callers (and for debugging), while keepingrefas the primary interface.
4. Keep A Last Resort Recovery Path¶
Even with CDP evaluate, there are other ways to wedge a tab or a connection. Keep the existing recovery mechanisms (terminate execution + disconnect Playwright) as a last resort for:
- legacy callers
- environments where CDP attach is blocked
- unexpected Playwright edge cases
Implementation Plan (Single Iteration)¶
Deliverables¶
- A CDP based evaluate engine that runs outside the Playwright per-page command queue.
- A single end-to-end timeout/abort budget used consistently by callers and handlers.
- Ref metadata that can optionally carry
backendDOMNodeIdfor element evaluate. act:evaluateprefers the CDP engine when possible and falls back to Playwright when not.- Tests that prove a stuck evaluate does not wedge later actions.
- Logs/metrics that make failures and fallbacks visible.
Implementation Checklist¶
- Add a shared "budget" helper to link
timeoutMs+ upstreamAbortSignalinto: - a single
AbortSignal - an absolute deadline
- a
remainingMs()helper for downstream operations - Update all caller paths to use that helper so
timeoutMsmeans the same thing everywhere: src/browser/client-fetch.ts(HTTP and in-process dispatch)src/node-host/runner.ts(node proxy path)- CLI wrappers that call
/act(add--timeout-mstobrowser evaluate) - Implement
src/browser/cdp-evaluate.ts: - connect to the browser-level CDP socket
Target.attachToTargetto get asessionId- run
Runtime.evaluatefor page evaluate - run
DOM.resolveNode+Runtime.callFunctionOnfor element evaluate - on timeout/abort: best-effort
Runtime.terminateExecutionthen close the socket - Extend stored role ref metadata to optionally include
backendDOMNodeId: - keep existing
{ role, name, nth }behavior for Playwright actions - add
backendDOMNodeId?: numberfor CDP element targeting - Populate
backendDOMNodeIdduring snapshot creation (best-effort): - fetch AX tree via CDP (
Accessibility.getFullAXTree) - compute
(role, name, nth) -> backendDOMNodeIdand merge into the stored ref map - if mapping is ambiguous or missing, leave the id undefined
- Update
act:evaluaterouting: - if no
ref: always use CDP evaluate - if
refresolves to abackendDOMNodeId: use CDP element evaluate - otherwise: fall back to Playwright evaluate (still bounded and abortable)
- Keep the existing "last resort" recovery path as a fallback, not the default path.
- Add tests:
- stuck evaluate times out within budget and the next click/type succeeds
- abort cancels evaluate (client disconnect or timeout) and unblocks subsequent actions
- mapping failures cleanly fall back to Playwright
- Add observability:
- evaluate duration and timeout counters
- terminateExecution usage
- fallback rate (CDP -> Playwright) and reasons
Acceptance Criteria¶
- A deliberately hung
act:evaluatereturns within the caller budget and does not wedge the tab for later actions. timeoutMsbehaves consistently across CLI, agent tool, node proxy, and in-process calls.- If
refcan be mapped tobackendDOMNodeId, element evaluate uses CDP; otherwise the fallback path is still bounded and recoverable.
Testing Plan¶
- Unit tests:
(role, name, nth)matching logic between role refs and AX tree nodes.- Budget helper behavior (headroom, remaining time math).
- Integration tests:
- CDP evaluate timeout returns within budget and does not block the next action.
- Abort cancels evaluate and triggers termination best-effort.
- Contract tests:
- Ensure
BrowserActRequestandBrowserActResponseremain compatible.
Risks And Mitigations¶
- Mapping is imperfect:
- Mitigation: best-effort mapping, fallback to Playwright evaluate, and add debug tooling.
Runtime.terminateExecutionhas side effects:- Mitigation: only use on timeout/abort and document the behavior in errors.
- Extra overhead:
- Mitigation: only fetch AX tree when snapshots are requested, cache per target, and keep CDP session short lived.
- Extension relay limitations:
- Mitigation: use browser level attach APIs when per page sockets are not available, and keep the current Playwright path as fallback.
Open Questions¶
- Should the new engine be configurable as
playwright,cdp, orauto? - Do we want to expose a new "nodeRef" format for advanced users, or keep
refonly? - How should frame snapshots and selector scoped snapshots participate in AX mapping?