Add browser execution, recording, and LLM self-healing #149
Stefz29 wants to merge 4 commits into browser-use:main
Replace all hardcoded ChatBrowserUse cloud API calls with a configurable LLM provider factory (workflow_use/llm/provider.py) that reads from .env. Supports local inference via LM Studio/Ollama (OpenAI-compatible API) or Browser Use cloud as fallback. Unblocks the entire CLI and backend which previously crashed without a BROWSER_USE_API_KEY.

- New module: workflow_use/llm/provider.py with get_llm() factory
- Updated cli.py, backend/service.py, healing/_agent/controller.py
- Added langchain-openai and python-dotenv as dependencies
- Updated .env.example with LLM_PROVIDER, LLM_BASE_URL, LLM_MODEL config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
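The configuration step described in this commit can be sketched as follows. This is a minimal illustration, not the code in `workflow_use/llm/provider.py`: the helper `resolve_llm_config`, the `LLMConfig` type, the provider value `local`, and the default URL and model name are all assumptions. Only the variable names `LLM_PROVIDER`, `LLM_BASE_URL`, `LLM_MODEL`, `BROWSER_USE_API_KEY` and the provider value `browser_use` come from the PR.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    """Resolved provider settings (hypothetical type, for illustration)."""
    provider: str
    base_url: Optional[str]
    model: Optional[str]


def resolve_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Read provider settings from the environment (hypothetical helper)."""
    # 'local' as the default provider name is an assumption.
    provider = env.get("LLM_PROVIDER", "local").strip().lower()
    if provider not in {"local", "browser_use"}:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")

    if provider == "browser_use":
        # Only the cloud branch needs the API key, so local setups no longer crash.
        if not env.get("BROWSER_USE_API_KEY"):
            raise ValueError("BROWSER_USE_API_KEY is required for browser_use")
        return LLMConfig(provider, None, env.get("LLM_MODEL"))

    # Local OpenAI-compatible server; default URL and model are assumptions.
    return LLMConfig(
        provider,
        env.get("LLM_BASE_URL", "http://localhost:1234/v1"),
        env.get("LLM_MODEL", "local-model"),
    )
```

A factory such as `get_llm()` could then pass the resolved base URL and model name to whichever OpenAI-compatible client the project uses.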
When a deterministic step fails (element not found, selector stale), the system now:

1. Captures a screenshot of the current page
2. Sends screenshot + step context to the LLM (vision)
3. LLM diagnoses the failure and suggests corrected selectors
4. Retries the step with corrections (up to 2 attempts)
5. Persists successful fixes back to the workflow YAML file

New module: workflow_use/healing/step_healer.py
Modified: workflow_use/workflow/service.py (enable_self_healing param, retry loop)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
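The retry flow listed in this commit can be sketched roughly as below. It is an illustration under stated assumptions, not the code in `step_healer.py`: `run_with_healing`, `StepError`, and `HealingReport` are hypothetical names, the `diagnose` callable stands in for the screenshot capture plus vision LLM call, and persisting the fix to YAML is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Step = Dict[str, str]


class StepError(Exception):
    """Raised when a deterministic step fails (e.g. element not found)."""


@dataclass
class HealingReport:
    healed: bool                      # True only if a correction was needed and worked
    attempts: int                     # number of healing attempts used
    step: Step                        # the step definition that finally succeeded
    tried: List[Step] = field(default_factory=list)  # corrections that failed


def run_with_healing(
    step: Step,
    execute: Callable[[Step], None],
    diagnose: Callable[[Step, str, List[Step]], Optional[Step]],
    max_attempts: int = 2,
) -> HealingReport:
    try:
        execute(step)
        return HealingReport(healed=False, attempts=0, step=step)
    except StepError as exc:
        error = str(exc)

    tried: List[Step] = []
    for attempt in range(1, max_attempts + 1):
        # Stand-in for: capture screenshot, send it with step context to the LLM.
        correction = diagnose(step, error, tried)
        if correction is None:
            break
        candidate = {**step, **correction}
        try:
            execute(candidate)
            return HealingReport(True, attempt, candidate, tried)
        except StepError as exc:
            error = str(exc)
            tried.append(correction)
    raise StepError(f"step not healed after {len(tried)} attempt(s): {error}")
```

Returning the corrected step rather than writing it to disk keeps the decision to persist with the caller, which can confirm the fix first.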
Integrates key patterns from autoresearch-mlx into the step healer:

- Snapshot/revert: saves workflow YAML before healing, reverts on failure (mirrors autoresearch's git reset --hard on worse results)
- TSV results tracking: logs every attempt to logs/healing/healing_results.tsv with timestamp, step, status (keep/discard/crash/skip), confidence, timing
- Fail-fast sanity checks: skips unhealable errors (browser crash, OOM) without wasting LLM calls (mirrors autoresearch's loss > 100 check)
- Exponential backoff: 1s, 2s, 4s between retry attempts (mirrors autoresearch's 2**attempt download retry pattern)
- Previous attempts context: sends failed corrections to LLM so it doesn't repeat the same fix (mirrors autoresearch's results.tsv history)
- Session summary: logs keep/discard/crash stats at workflow end
- mark_healing_outcome(): caller confirms if fix actually worked before persisting (mirrors autoresearch's keep-or-revert decision)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
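Three of the patterns above (fail-fast checks, exponential backoff, TSV logging) are small enough to sketch together. The function names, the list of error markers, and the timestamp format are assumptions made for illustration; the column order follows the commit message (timestamp, step, status, confidence, timing).

```python
import csv
import time
from typing import List, TextIO

# Error fragments treated as unhealable; this list is illustrative only.
UNHEALABLE_MARKERS = ("browser crashed", "out of memory", "target closed")


def is_unhealable(error: str) -> bool:
    """Fail fast: skip the LLM call when the error cannot be fixed by a new selector."""
    lowered = error.lower()
    return any(marker in lowered for marker in UNHEALABLE_MARKERS)


def backoff_delays(attempts: int, base: float = 1.0) -> List[float]:
    """Delays of base * 2**i seconds, i.e. 1s, 2s, 4s for three attempts."""
    return [base * 2 ** i for i in range(attempts)]


def log_attempt(out: TextIO, step: int, status: str, confidence: float, seconds: float) -> None:
    """Append one tab-separated row: timestamp, step, status, confidence, timing."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([
        time.strftime("%Y-%m-%dT%H:%M:%S"),
        step,
        status,                 # keep / discard / crash / skip
        f"{confidence:.2f}",
        f"{seconds:.1f}",
    ])
```

A caller would typically check `is_unhealable()` first, then sleep for each value from `backoff_delays()` between attempts and log every outcome.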
Phase 2a - Extension Independence:

- Add recorder_router.py with 6 API endpoints for recording events
- Add CORS support for Chrome extensions in api.py
- Make backend URL configurable (stored in chrome.storage.sync)
- Add settings-view.tsx for backend URL config + health check
- Add tab-navigation.tsx with Dashboard/Record/Settings tabs
- Update sidepanel index.tsx with tab navigation wrapper
- Fix isRecordingEnabled defaulting to true (should be false)
- Fix START_RECORDING to always broadcast to content scripts
- Add chrome.scripting.executeScript() to inject into existing tabs
- Add stopped-view.tsx save-to-backend button
- Add storage, scripting, activeTab permissions to wxt.config.ts

Phase 2b - In-Browser Execution:

- Create content-executor.ts for replaying steps in the user's Chrome
- Multi-strategy element finding: target_text → CSS → XPath
- waitForElement with MutationObserver for dynamic content
- Real DOM events for React/Vue compatibility
- Visual highlight overlay on target elements
- Add ExecutionEngine class to background.ts (~250 lines)
- State machine: idle → running → waiting_nav → healing → completed/failed
- Navigation detection + content script re-injection
- Screenshot capture for self-healing
- Service worker keep-alive during execution
- Add dashboard-view.tsx with workflow list, Run/Stop buttons, progress bar
- Add CLAUDE.md and architecture docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
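The execution state machine named above can be modelled compactly. The real `ExecutionEngine` lives in TypeScript in `background.ts`; the Python sketch below is language-neutral in intent, and only the six state names come from the commit message. The transition edges, the `transition` and `is_active` helpers, and the choice of which states count as active are assumptions.

```python
from enum import Enum


class ExecState(str, Enum):
    IDLE = "idle"
    RUNNING = "running"
    WAITING_NAV = "waiting_nav"
    HEALING = "healing"
    COMPLETED = "completed"
    FAILED = "failed"


S = ExecState

# Allowed transitions; the exact edges are an assumption, not taken from the PR.
TRANSITIONS = {
    S.IDLE: {S.RUNNING},
    S.RUNNING: {S.WAITING_NAV, S.HEALING, S.COMPLETED, S.FAILED, S.IDLE},
    S.WAITING_NAV: {S.RUNNING, S.FAILED, S.IDLE},
    S.HEALING: {S.RUNNING, S.FAILED, S.IDLE},
    S.COMPLETED: {S.RUNNING},
    S.FAILED: {S.RUNNING},
}

# States in which a run is still in progress.
ACTIVE_STATES = {S.RUNNING, S.WAITING_NAV, S.HEALING}


def transition(current: ExecState, target: ExecState) -> ExecState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def is_active(state: ExecState) -> bool:
    """True while a workflow run should block a second start."""
    return state in ACTIVE_STATES
```

Keeping the set of active states in one place means UI code and the engine can share a single definition of "a run is in progress".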
12 issues found across 26 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="workflows/workflow_use/llm/provider.py">
<violation number="1" location="workflows/workflow_use/llm/provider.py:39">
P2: `browser_use` provider hardcodes a single model and ignores purpose/env model selection contract.</violation>
</file>
<file name="extension/wxt.config.ts">
<violation number="1" location="extension/wxt.config.ts:17">
P2: Manifest host permissions are too narrow (localhost-only) for newly added cross-site script injection/automation, causing executeScript failures on non-localhost pages.</violation>
</file>
<file name="extension/src/entrypoints/background.ts">
<violation number="1" location="extension/src/entrypoints/background.ts:47">
P2: Backend URL is loaded asynchronously, but request handlers use BACKEND_BASE_URL immediately; early messages after service-worker startup can still hit the default backend before storage restore completes.</violation>
<violation number="2" location="extension/src/entrypoints/background.ts:695">
P2: START_RECORDING reinjects the content script on every start without a singleton guard. Re-injecting the same content script typically re-runs its module initialization, which can attach duplicate listeners and cause duplicated event handling in a tab.</violation>
<violation number="3" location="extension/src/entrypoints/background.ts:987">
P2: ExecutionEngine resumes long-running async steps without re-checking state; STOP_EXECUTION (or a second start) can be overwritten mid-flight, allowing execution to continue against mutated shared state.</violation>
</file>
<file name="docs/RESUME_INSTRUCTIONS.md">
<violation number="1" location="docs/RESUME_INSTRUCTIONS.md:15">
P2: Resume instructions are hardcoded to a single developer’s absolute local paths, so the documented recovery steps will fail for any checkout not located at `/Users/SZ/Desktop/Claude_APPS/workflow-use`. Use repo-relative paths or a placeholder (e.g., `<repo-root>`).</violation>
</file>
<file name="extension/src/entrypoints/content-executor.ts">
<violation number="1" location="extension/src/entrypoints/content-executor.ts:331">
P2: `input` target selection supports contenteditable/ARIA textbox elements, but `executeInput()` only writes through native input/textarea value setters, causing input replay failures on non-native text fields.</violation>
<violation number="2" location="extension/src/entrypoints/content-executor.ts:460">
P2: `key_press` steps are incorrectly blocked by mandatory element lookup, preventing fallback to activeElement/body.</violation>
</file>
<file name="CLAUDE.md">
<violation number="1" location="CLAUDE.md:62">
P2: Architecture flow in CLAUDE.md is outdated: recording endpoint/service no longer matches actual runtime path.</violation>
</file>
<file name="extension/src/entrypoints/sidepanel/components/dashboard-view.tsx">
<violation number="1" location="extension/src/entrypoints/sidepanel/components/dashboard-view.tsx:45">
P2: Dashboard does not initialize browser execution state on mount, so reopening sidepanel during an active run can show incorrect Run/Stop/progress UI until a later status event arrives.</violation>
<violation number="2" location="extension/src/entrypoints/sidepanel/components/dashboard-view.tsx:241">
P1: Browser execution UI only treats `running` as active, so Run can be re-enabled (and Stop hidden) during `waiting_nav`/`healing` and before first status update, allowing duplicate `EXECUTE_IN_BROWSER` requests.</violation>
</file>
<file name="workflows/workflow_use/healing/step_healer.py">
<violation number="1" location="workflows/workflow_use/healing/step_healer.py:348">
P2: Final healing keep/discard status is only updated in memory; `healing_results.tsv` remains with `pending` rows and never reflects the actual outcome.</violation>
</file>
+ <Button
+   size="sm"
+   onClick={() => executeInBrowser(wf.name)}
+   disabled={isExecuting || browserExecution?.state === "running"}
P1: Browser execution UI only treats running as active, so Run can be re-enabled (and Stop hidden) during waiting_nav/healing and before first status update, allowing duplicate EXECUTE_IN_BROWSER requests.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At extension/src/entrypoints/sidepanel/components/dashboard-view.tsx, line 241:
<comment>Browser execution UI only treats `running` as active, so Run can be re-enabled (and Stop hidden) during `waiting_nav`/`healing` and before first status update, allowing duplicate `EXECUTE_IN_BROWSER` requests.</comment>
<file context>
@@ -0,0 +1,296 @@
+ <Button
+ size="sm"
+ onClick={() => executeInBrowser(wf.name)}
+ disabled={isExecuting || browserExecution?.state === "running"}
+ className="text-xs px-2.5 py-1 h-7"
+ title="Replay in THIS browser (keeps your login sessions)"
</file context>
+
+ if provider == 'browser_use':
+     from browser_use.llm import ChatBrowserUse
+     return ChatBrowserUse(model='bu-latest')
P2: browser_use provider hardcodes a single model and ignores purpose/env model selection contract.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At workflows/workflow_use/llm/provider.py, line 39:
<comment>`browser_use` provider hardcodes a single model and ignores purpose/env model selection contract.</comment>
<file context>
@@ -0,0 +1,58 @@
+
+ if provider == 'browser_use':
+ from browser_use.llm import ChatBrowserUse
+ return ChatBrowserUse(model='bu-latest')
+
+ # Local provider (LM Studio, Ollama, or any OpenAI-compatible server)
</file context>
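One way to address the hardcoded model is to resolve it through a fixed precedence instead. The sketch below is an assumption about a possible fix, not the project's code: `pick_model` and `DEFAULT_MODELS` are hypothetical names, and purpose-based selection is omitted because that contract is not visible in this excerpt. Only `LLM_MODEL`, `browser_use`, and `bu-latest` come from the PR.

```python
import os
from typing import Mapping, Optional

# Per-provider fallbacks; 'bu-latest' is the value currently hardcoded in the diff.
DEFAULT_MODELS = {"browser_use": "bu-latest"}


def pick_model(
    provider: str,
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Precedence: explicit argument, then LLM_MODEL, then the provider default."""
    return explicit or env.get("LLM_MODEL") or DEFAULT_MODELS.get(provider)
```

With this shape the `browser_use` branch would pass `pick_model('browser_use')` to the client constructor rather than a literal string.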
- permissions: ["tabs", "sidePanel", "<all_urls>"],
- host_permissions: ["http://127.0.0.1/*"],
+ permissions: ["tabs", "sidePanel", "storage", "scripting", "activeTab", "<all_urls>"],
+ host_permissions: ["http://127.0.0.1/*", "http://localhost/*"],
P2: Manifest host permissions are too narrow (localhost-only) for newly added cross-site script injection/automation, causing executeScript failures on non-localhost pages.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At extension/wxt.config.ts, line 17:
<comment>Manifest host permissions are too narrow (localhost-only) for newly added cross-site script injection/automation, causing executeScript failures on non-localhost pages.</comment>
<file context>
@@ -13,8 +13,8 @@ export default defineConfig({
- permissions: ["tabs", "sidePanel", "<all_urls>"],
- host_permissions: ["http://127.0.0.1/*"],
+ permissions: ["tabs", "sidePanel", "storage", "scripting", "activeTab", "<all_urls>"],
+ host_permissions: ["http://127.0.0.1/*", "http://localhost/*"],
// options_page: "options.html",
// action: {
</file context>
Suggested change:
- host_permissions: ["http://127.0.0.1/*", "http://localhost/*"],
+ host_permissions: ["http://127.0.0.1/*", "http://localhost/*", "http://*/*", "https://*/*"],
+ let PYTHON_SERVER_ENDPOINT = `${BACKEND_BASE_URL}/api/recorder/event`;
+
+ // Load saved backend URL from storage
+ chrome.storage.sync.get(["backendUrl"], (result) => {
P2: Backend URL is loaded asynchronously, but request handlers use BACKEND_BASE_URL immediately; early messages after service-worker startup can still hit the default backend before storage restore completes.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At extension/src/entrypoints/background.ts, line 47:
<comment>Backend URL is loaded asynchronously, but request handlers use BACKEND_BASE_URL immediately; early messages after service-worker startup can still hit the default backend before storage restore completes.</comment>
<file context>
@@ -35,10 +35,24 @@ export default defineBackground(() => {
+ let PYTHON_SERVER_ENDPOINT = `${BACKEND_BASE_URL}/api/recorder/event`;
+
+ // Load saved backend URL from storage
+ chrome.storage.sync.get(["backendUrl"], (result) => {
+ if (result.backendUrl) {
+ BACKEND_BASE_URL = result.backendUrl;
</file context>
+
+ // Ensure content scripts are injected into all open tabs
+ // (they may not be present if extension was reloaded after tabs were opened)
+ chrome.tabs.query({}, (tabs) => {
P2: START_RECORDING reinjects the content script on every start without a singleton guard. Re-injecting the same content script typically re-runs its module initialization, which can attach duplicate listeners and cause duplicated event handling in a tab.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At extension/src/entrypoints/background.ts, line 695:
<comment>START_RECORDING reinjects the content script on every start without a singleton guard. Re-injecting the same content script typically re-runs its module initialization, which can attach duplicate listeners and cause duplicated event handling in a tab.</comment>
<file context>
@@ -670,20 +684,35 @@ export default defineBackground(() => {
+
+ // Ensure content scripts are injected into all open tabs
+ // (they may not be present if extension was reloaded after tabs were opened)
+ chrome.tabs.query({}, (tabs) => {
+ tabs.forEach((tab) => {
+ if (tab.id && tab.url && !tab.url.startsWith("chrome://") && !tab.url.startsWith("chrome-extension://")) {
</file context>
+ }
+
+ // For all other steps, find the target element
+ const element = await waitForElement(step);
P2: key_press steps are incorrectly blocked by mandatory element lookup, preventing fallback to activeElement/body.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At extension/src/entrypoints/content-executor.ts, line 460:
<comment>`key_press` steps are incorrectly blocked by mandatory element lookup, preventing fallback to activeElement/body.</comment>
<file context>
@@ -0,0 +1,563 @@
+ }
+
+ // For all other steps, find the target element
+ const element = await waitForElement(step);
+
+ if (!element) {
</file context>
+   : nativeInputValueSetter;
+
+ if (setter) {
+   setter.call(inputEl, value);
P2: input target selection supports contenteditable/ARIA textbox elements, but executeInput() only writes through native input/textarea value setters, causing input replay failures on non-native text fields.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At extension/src/entrypoints/content-executor.ts, line 331:
<comment>`input` target selection supports contenteditable/ARIA textbox elements, but `executeInput()` only writes through native input/textarea value setters, causing input replay failures on non-native text fields.</comment>
<file context>
@@ -0,0 +1,563 @@
+ : nativeInputValueSetter;
+
+ if (setter) {
+ setter.call(inputEl, value);
+ } else {
+ inputEl.value = value;
</file context>
+
+ ### Communication Flow
+ ```
+ Extension content.ts → background.ts → HTTP POST :7331/event → RecordingService
P2: Architecture flow in CLAUDE.md is outdated: recording endpoint/service no longer matches actual runtime path.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At CLAUDE.md, line 62:
<comment>Architecture flow in CLAUDE.md is outdated: recording endpoint/service no longer matches actual runtime path.</comment>
<file context>
@@ -0,0 +1,110 @@
+
+### Communication Flow
+```
+Extension content.ts → background.ts → HTTP POST :7331/event → RecordingService
+UI (React :5173) → REST API :8000 → backend/routers.py → WorkflowService
+Workflow execution: Workflow.run() → _execute_step() → controller actions → Playwright
</file context>
Suggested change:
- Extension content.ts → background.ts → HTTP POST :7331/event → RecordingService
+ Extension content.ts → background.ts → HTTP POST :8000/api/recorder/event → backend/recorder_router.py
+ }, []);
+
+ useEffect(() => {
+   fetchWorkflows();
P2: Dashboard does not initialize browser execution state on mount, so reopening sidepanel during an active run can show incorrect Run/Stop/progress UI until a later status event arrives.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At extension/src/entrypoints/sidepanel/components/dashboard-view.tsx, line 45:
<comment>Dashboard does not initialize browser execution state on mount, so reopening sidepanel during an active run can show incorrect Run/Stop/progress UI until a later status event arrives.</comment>
<file context>
@@ -0,0 +1,296 @@
+ }, []);
+
+ useEffect(() => {
+ fetchWorkflows();
+
+ // Listen for execution status updates
</file context>
+ # Find the last pending result for this step
+ for result in reversed(self._results):
+     if result.step_index == step_index and result.status == 'pending':
+         result.status = 'keep' if success else 'discard'
P2: Final healing keep/discard status is only updated in memory; healing_results.tsv remains with pending rows and never reflects the actual outcome.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At workflows/workflow_use/healing/step_healer.py, line 348:
<comment>Final healing keep/discard status is only updated in memory; `healing_results.tsv` remains with `pending` rows and never reflects the actual outcome.</comment>
<file context>
@@ -0,0 +1,545 @@
+ # Find the last pending result for this step
+ for result in reversed(self._results):
+ if result.step_index == step_index and result.status == 'pending':
+ result.status = 'keep' if success else 'discard'
+ if success:
+ self.total_healed += 1
</file context>
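A possible way to keep the TSV in step with the in-memory status is to rewrite the matching `pending` row when the outcome is confirmed. The sketch below is an assumption, not the healer's code: `finalize_pending_row` is a hypothetical helper, and the column order (timestamp, step, status, then the rest) is assumed from the commit message.

```python
import csv
from pathlib import Path
from typing import List


def finalize_pending_row(tsv_path: Path, step_index: int, success: bool) -> bool:
    """Rewrite the last 'pending' row for step_index as keep/discard.

    Returns True if a row was updated, False if no pending row matched.
    """
    with tsv_path.open(newline="") as fh:
        rows: List[List[str]] = list(csv.reader(fh, delimiter="\t"))

    for row in reversed(rows):
        # Assumed column order: timestamp, step, status, ...
        if len(row) >= 3 and row[1] == str(step_index) and row[2] == "pending":
            row[2] = "keep" if success else "discard"
            break
    else:
        return False

    # Write to a sibling temp file, then swap it in so readers never see a partial file.
    tmp = tsv_path.with_suffix(".tmp")
    with tmp.open("w", newline="") as fh:
        csv.writer(fh, delimiter="\t", lineterminator="\n").writerows(rows)
    tmp.replace(tsv_path)
    return True
```

Calling this from `mark_healing_outcome()` right after the in-memory update would keep both views consistent; appending a second, final row instead of rewriting is an alternative if the log should stay append-only.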
Summary
This PR implements major enhancements to the workflow automation system:
Browser Execution & Recording
- […] (`content-executor.ts`) for direct DOM manipulation
- […] `recorder_router.py`
Self-Healing System
- […] (`step_healer.py`) to automatically fix failed workflow steps
Infrastructure & Config
Breaking Changes
- `workflow_use/healing/_agent/controller.py` - removed old controller pattern
Summary by cubic
Adds in‑browser workflow execution with recording and LLM self‑healing, plus a configurable local LLM provider. Improves the extension UI with a dashboard and settings, and removes hardcoded cloud dependencies.
New Features
- […] `content-executor.ts` + background `ExecutionEngine` for step queueing, navigation handling, and screenshots.
- […] `/api/recorder/*`; background script reads backend URL from `chrome.storage.sync`.
- […] `workflow_use/healing/step_healer.py` with snapshot/revert, retries, TSV logs; integrated into `workflow/service.py`.
- […] `workflow_use/llm/provider.py` loads `.env` to use local LM Studio/Ollama via `langchain-openai` or Browser Use cloud.
- […] `storage`, `scripting`, `activeTab`), and new docs.
Migration
- […] `workflows/.env.example` to `.env` and set `LLM_PROVIDER`, `LLM_BASE_URL`, `LLM_MODEL` (defaults provided).
- […] `http://127.0.0.1:8000`) and run health check.
- […] `/api/recorder/*`; recording is off by default in the background script.
- […] `ChatBrowserUse` usage with `get_llm()` from `workflow_use.llm`.
Written for commit f03985c. Summary will update on new commits.