Add browser execution, recording, and LLM self-healing#149

Open
Stefz29 wants to merge 4 commits into browser-use:main from Stefz29:feat/local-llm-provider

@Stefz29 Stefz29 commented Mar 17, 2026

Summary

This PR implements major enhancements to the workflow automation system:

Browser Execution & Recording

  • Added in-browser content executor (content-executor.ts) for direct DOM manipulation
  • Implemented independent recording functionality via new recorder_router.py
  • Enhanced background script with improved message routing and execution control
  • Created comprehensive sidepanel UI with dashboard, settings, and navigation components

Self-Healing System

  • Implemented LLM-powered step healing (step_healer.py) to automatically fix failed workflow steps
  • Added configurable LLM provider abstraction replacing hardcoded implementations
  • Integrated autoresearch-mlx patterns into the step healer: snapshot/revert, TSV result logging, fail-fast checks, and exponential backoff

Infrastructure & Config

  • Added detailed documentation (PHASE2_PLAN, PHASE2B_BROWSER_EXECUTION, RESUME_INSTRUCTIONS)
  • Updated environment configuration with new workflow parameters
  • Upgraded dependencies and reorganized module structure
  • Added CLAUDE.md for development guidance

Breaking Changes

  • Modified workflow_use/healing/_agent/controller.py - removed old controller pattern
  • Updated CLI and service configurations for new recorder and LLM provider architecture

Summary by cubic

Adds in‑browser workflow execution with recording and LLM self‑healing, plus a configurable local LLM provider. Improves the extension UI with a dashboard and settings, and removes hardcoded cloud dependencies.

  • New Features

    • In‑browser execution: content-executor.ts + background ExecutionEngine for step queueing, navigation handling, and screenshots.
    • Independent recording: new backend router at /api/recorder/*; background script reads backend URL from chrome.storage.sync.
    • Self‑healing: workflow_use/healing/step_healer.py with snapshot/revert, retries, TSV logs; integrated into workflow/service.py.
    • Configurable LLMs: workflow_use/llm/provider.py loads .env to use local LM Studio/Ollama via langchain-openai or Browser Use cloud.
    • Sidepanel UI: Dashboard to list/run/stop workflows; Settings to test and save backend URL; save‑to‑backend from Stopped view.
    • Infra: CORS for Chrome extensions, updated permissions (storage, scripting, activeTab), and new docs.
  • Migration

    • Copy workflows/.env.example to .env and set LLM_PROVIDER, LLM_BASE_URL, LLM_MODEL (defaults provided).
    • Rebuild/reload the extension; open Settings to confirm backend URL (default http://127.0.0.1:8000) and run health check.
    • Recorder API moved to /api/recorder/*; recording is off by default in the background script.
    • Breaking: old healing controller removed; replace direct ChatBrowserUse usage with get_llm() from workflow_use.llm.

Written for commit f03985c. Summary will update on new commits.

Stefz29 and others added 4 commits March 15, 2026 03:45
Replace all hardcoded ChatBrowserUse cloud API calls with a configurable
LLM provider factory (workflow_use/llm/provider.py) that reads from .env.
Supports local inference via LM Studio/Ollama (OpenAI-compatible API) or
Browser Use cloud as fallback. Unblocks the entire CLI and backend which
previously crashed without a BROWSER_USE_API_KEY.

- New module: workflow_use/llm/provider.py with get_llm() factory
- Updated cli.py, backend/service.py, healing/_agent/controller.py
- Added langchain-openai and python-dotenv as dependencies
- Updated .env.example with LLM_PROVIDER, LLM_BASE_URL, LLM_MODEL config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When a deterministic step fails (element not found, selector stale),
the system now:
1. Captures a screenshot of the current page
2. Sends screenshot + step context to the LLM (vision)
3. LLM diagnoses the failure and suggests corrected selectors
4. Retries the step with corrections (up to 2 attempts)
5. Persists successful fixes back to the workflow YAML file

New module: workflow_use/healing/step_healer.py
Modified: workflow_use/workflow/service.py (enable_self_healing param, retry loop)
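
The retry flow described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in step_healer.py: `run_step_with_healing`, `StepError`, `execute`, and `heal` are hypothetical names, and the screenshot capture plus LLM diagnosis is reduced to a single callable that returns a corrected step (or `None` when it has no suggestion).

```python
from typing import Callable, Optional


class StepError(Exception):
    """Raised when a deterministic step fails (e.g. element not found)."""


def run_step_with_healing(
    step: dict,
    execute: Callable[[dict], None],
    heal: Callable[[dict, Exception], Optional[dict]],
    max_heal_attempts: int = 2,
) -> dict:
    """Run a step; on failure, ask `heal` for a corrected step and retry.

    Returns the step definition that finally succeeded so the caller can
    persist it. Re-raises the last error if healing does not help.
    """
    try:
        execute(step)
        return step
    except StepError as exc:
        last_error: Exception = exc

    current = step
    for _ in range(max_heal_attempts):
        # Stands in for "screenshot + step context -> LLM -> corrected selectors".
        corrected = heal(current, last_error)
        if corrected is None:
            break  # the healer has no suggestion; give up
        try:
            execute(corrected)
            return corrected
        except StepError as exc:
            last_error = exc
            current = corrected
    raise last_error
```

Returning the corrected step, rather than writing it to disk inside the loop, leaves the decision to persist the fix with the caller.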

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Integrates key patterns from autoresearch-mlx into the step healer:

- Snapshot/revert: saves workflow YAML before healing, reverts on failure
  (mirrors autoresearch's git reset --hard on worse results)
- TSV results tracking: logs every attempt to logs/healing/healing_results.tsv
  with timestamp, step, status (keep/discard/crash/skip), confidence, timing
- Fail-fast sanity checks: skips unhealable errors (browser crash, OOM)
  without wasting LLM calls (mirrors autoresearch's loss > 100 check)
- Exponential backoff: 1s, 2s, 4s between retry attempts
  (mirrors autoresearch's 2**attempt download retry pattern)
- Previous attempts context: sends failed corrections to LLM so it
  doesn't repeat the same fix (mirrors autoresearch's results.tsv history)
- Session summary: logs keep/discard/crash stats at workflow end
- mark_healing_outcome(): caller confirms if fix actually worked before
  persisting (mirrors autoresearch's keep-or-revert decision)
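
Two of the patterns listed above, snapshot/revert and exponential backoff, can be sketched together as below. This is an illustration under assumptions rather than the step healer's actual code: `snapshot`, `backoff_delays`, and `heal_with_revert` are hypothetical names, and the exact attempt count is a guess (here: one initial try, then one retry after each delay).

```python
import shutil
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator, Sequence


@contextmanager
def snapshot(path: Path) -> Iterator[None]:
    """Copy `path` aside and restore it if the body raises."""
    with tempfile.TemporaryDirectory() as tmp:
        backup = Path(tmp) / path.name
        shutil.copy2(path, backup)
        try:
            yield
        except Exception:
            shutil.copy2(backup, path)  # revert to the pre-healing file
            raise


def backoff_delays(count: int, base: float = 1.0) -> list[float]:
    """Exponential delays: base, 2*base, 4*base, ..."""
    return [base * 2**i for i in range(count)]


def heal_with_revert(
    path: Path,
    apply_fix: Callable[[Path], None],
    delays: Sequence[float],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try `apply_fix` once, then once more after each delay in `delays`."""
    for wait in [None, *delays]:
        if wait is not None:
            sleep(wait)
        try:
            with snapshot(path):
                apply_fix(path)
            return True
        except Exception:
            continue  # file was already reverted by snapshot()
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; `backoff_delays(3)` yields the 1s, 2s, 4s sequence mentioned in the commit message.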

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Phase 2a — Extension Independence:
- Add recorder_router.py with 6 API endpoints for recording events
- Add CORS support for Chrome extensions in api.py
- Make backend URL configurable (stored in chrome.storage.sync)
- Add settings-view.tsx for backend URL config + health check
- Add tab-navigation.tsx with Dashboard/Record/Settings tabs
- Update sidepanel index.tsx with tab navigation wrapper
- Fix isRecordingEnabled defaulting to true (should be false)
- Fix START_RECORDING to always broadcast to content scripts
- Add chrome.scripting.executeScript() to inject into existing tabs
- Add stopped-view.tsx save-to-backend button
- Add storage, scripting, activeTab permissions to wxt.config.ts

Phase 2b — In-Browser Execution:
- Create content-executor.ts for replaying steps in the user's Chrome
  - Multi-strategy element finding: target_text → CSS → XPath
  - waitForElement with MutationObserver for dynamic content
  - Real DOM events for React/Vue compatibility
  - Visual highlight overlay on target elements
- Add ExecutionEngine class to background.ts (~250 lines)
  - State machine: idle → running → waiting_nav → healing → completed/failed
  - Navigation detection + content script re-injection
  - Screenshot capture for self-healing
  - Service worker keep-alive during execution
- Add dashboard-view.tsx with workflow list, Run/Stop buttons, progress bar
- Add CLAUDE.md and architecture docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai bot left a comment

12 issues found across 26 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="workflows/workflow_use/llm/provider.py">

<violation number="1" location="workflows/workflow_use/llm/provider.py:39">
P2: `browser_use` provider hardcodes a single model and ignores purpose/env model selection contract.</violation>
</file>

<file name="extension/wxt.config.ts">

<violation number="1" location="extension/wxt.config.ts:17">
P2: Manifest host permissions are too narrow (localhost-only) for newly added cross-site script injection/automation, causing executeScript failures on non-localhost pages.</violation>
</file>

<file name="extension/src/entrypoints/background.ts">

<violation number="1" location="extension/src/entrypoints/background.ts:47">
P2: Backend URL is loaded asynchronously, but request handlers use BACKEND_BASE_URL immediately; early messages after service-worker startup can still hit the default backend before storage restore completes.</violation>

<violation number="2" location="extension/src/entrypoints/background.ts:695">
P2: START_RECORDING reinjects the content script on every start without a singleton guard. Re-injecting the same content script typically re-runs its module initialization, which can attach duplicate listeners and cause duplicated event handling in a tab.</violation>

<violation number="3" location="extension/src/entrypoints/background.ts:987">
P2: ExecutionEngine resumes long-running async steps without re-checking state; STOP_EXECUTION (or a second start) can be overwritten mid-flight, allowing execution to continue against mutated shared state.</violation>
</file>

<file name="docs/RESUME_INSTRUCTIONS.md">

<violation number="1" location="docs/RESUME_INSTRUCTIONS.md:15">
P2: Resume instructions are hardcoded to a single developer’s absolute local paths, so the documented recovery steps will fail for any checkout not located at `/Users/SZ/Desktop/Claude_APPS/workflow-use`. Use repo-relative paths or a placeholder (e.g., `<repo-root>`).</violation>
</file>

<file name="extension/src/entrypoints/content-executor.ts">

<violation number="1" location="extension/src/entrypoints/content-executor.ts:331">
P2: `input` target selection supports contenteditable/ARIA textbox elements, but `executeInput()` only writes through native input/textarea value setters, causing input replay failures on non-native text fields.</violation>

<violation number="2" location="extension/src/entrypoints/content-executor.ts:460">
P2: `key_press` steps are incorrectly blocked by mandatory element lookup, preventing fallback to activeElement/body.</violation>
</file>

<file name="CLAUDE.md">

<violation number="1" location="CLAUDE.md:62">
P2: Architecture flow in CLAUDE.md is outdated: recording endpoint/service no longer matches actual runtime path.</violation>
</file>

<file name="extension/src/entrypoints/sidepanel/components/dashboard-view.tsx">

<violation number="1" location="extension/src/entrypoints/sidepanel/components/dashboard-view.tsx:45">
P2: Dashboard does not initialize browser execution state on mount, so reopening sidepanel during an active run can show incorrect Run/Stop/progress UI until a later status event arrives.</violation>

<violation number="2" location="extension/src/entrypoints/sidepanel/components/dashboard-view.tsx:241">
P1: Browser execution UI only treats `running` as active, so Run can be re-enabled (and Stop hidden) during `waiting_nav`/`healing` and before first status update, allowing duplicate `EXECUTE_IN_BROWSER` requests.</violation>
</file>

<file name="workflows/workflow_use/healing/step_healer.py">

<violation number="1" location="workflows/workflow_use/healing/step_healer.py:348">
P2: Final healing keep/discard status is only updated in memory; `healing_results.tsv` remains with `pending` rows and never reflects the actual outcome.</violation>
</file>

<Button
  size="sm"
  onClick={() => executeInBrowser(wf.name)}
  disabled={isExecuting || browserExecution?.state === "running"}

@cubic-dev-ai cubic-dev-ai bot Mar 17, 2026

P1: Browser execution UI only treats running as active, so Run can be re-enabled (and Stop hidden) during waiting_nav/healing and before first status update, allowing duplicate EXECUTE_IN_BROWSER requests.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At extension/src/entrypoints/sidepanel/components/dashboard-view.tsx, line 241:

<comment>Browser execution UI only treats `running` as active, so Run can be re-enabled (and Stop hidden) during `waiting_nav`/`healing` and before first status update, allowing duplicate `EXECUTE_IN_BROWSER` requests.</comment>

<file context>
@@ -0,0 +1,296 @@
+                      <Button
+                        size="sm"
+                        onClick={() => executeInBrowser(wf.name)}
+                        disabled={isExecuting || browserExecution?.state === "running"}
+                        className="text-xs px-2.5 py-1 h-7"
+                        title="Replay in THIS browser (keeps your login sessions)"
</file context>

if provider == 'browser_use':
	from browser_use.llm import ChatBrowserUse
	return ChatBrowserUse(model='bu-latest')

@cubic-dev-ai cubic-dev-ai bot Mar 17, 2026

P2: browser_use provider hardcodes a single model and ignores purpose/env model selection contract.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At workflows/workflow_use/llm/provider.py, line 39:

<comment>`browser_use` provider hardcodes a single model and ignores purpose/env model selection contract.</comment>

<file context>
@@ -0,0 +1,58 @@
+
+	if provider == 'browser_use':
+		from browser_use.llm import ChatBrowserUse
+		return ChatBrowserUse(model='bu-latest')
+
+	# Local provider (LM Studio, Ollama, or any OpenAI-compatible server)
</file context>
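
One possible way to address this, sketched under assumptions: resolve the model name from the environment before constructing any provider, falling back to a per-provider default. `LLM_MODEL` and `bu-latest` come from this PR; the `resolve_model` helper, the `LLM_MODEL_<PURPOSE>` variable convention, and the `local-model` default are hypothetical and only illustrate the idea.

```python
import os
from typing import Mapping, Optional


def resolve_model(
    provider: str,
    purpose: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick a model name for `provider`.

    Order: purpose-specific variable (hypothetical LLM_MODEL_<PURPOSE>),
    then LLM_MODEL, then a per-provider default.
    """
    # 'bu-latest' is the value hardcoded in the PR; 'local-model' is a placeholder.
    defaults = {"browser_use": "bu-latest", "local": "local-model"}
    if purpose:
        specific = env.get(f"LLM_MODEL_{purpose.upper()}")
        if specific:
            return specific
    return env.get("LLM_MODEL") or defaults.get(provider, "local-model")
```

With a helper like this, the `browser_use` branch could call `ChatBrowserUse(model=resolve_model('browser_use', purpose))` instead of passing a literal. Whether a single `LLM_MODEL` value should apply to both the cloud and local providers is a design decision the sketch does not settle.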

- permissions: ["tabs", "sidePanel", "<all_urls>"],
- host_permissions: ["http://127.0.0.1/*"],
+ permissions: ["tabs", "sidePanel", "storage", "scripting", "activeTab", "<all_urls>"],
+ host_permissions: ["http://127.0.0.1/*", "http://localhost/*"],

@cubic-dev-ai cubic-dev-ai bot Mar 17, 2026

P2: Manifest host permissions are too narrow (localhost-only) for newly added cross-site script injection/automation, causing executeScript failures on non-localhost pages.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At extension/wxt.config.ts, line 17:

<comment>Manifest host permissions are too narrow (localhost-only) for newly added cross-site script injection/automation, causing executeScript failures on non-localhost pages.</comment>

<file context>
@@ -13,8 +13,8 @@ export default defineConfig({
-    permissions: ["tabs", "sidePanel", "<all_urls>"],
-    host_permissions: ["http://127.0.0.1/*"],
+    permissions: ["tabs", "sidePanel", "storage", "scripting", "activeTab", "<all_urls>"],
+    host_permissions: ["http://127.0.0.1/*", "http://localhost/*"],
     // options_page: "options.html",
     // action: {
</file context>
Suggested change
- host_permissions: ["http://127.0.0.1/*", "http://localhost/*"],
+ host_permissions: ["http://127.0.0.1/*", "http://localhost/*", "http://*/*", "https://*/*"],

let PYTHON_SERVER_ENDPOINT = `${BACKEND_BASE_URL}/api/recorder/event`;

// Load saved backend URL from storage
chrome.storage.sync.get(["backendUrl"], (result) => {

@cubic-dev-ai cubic-dev-ai bot Mar 17, 2026

P2: Backend URL is loaded asynchronously, but request handlers use BACKEND_BASE_URL immediately; early messages after service-worker startup can still hit the default backend before storage restore completes.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At extension/src/entrypoints/background.ts, line 47:

<comment>Backend URL is loaded asynchronously, but request handlers use BACKEND_BASE_URL immediately; early messages after service-worker startup can still hit the default backend before storage restore completes.</comment>

<file context>
@@ -35,10 +35,24 @@ export default defineBackground(() => {
+  let PYTHON_SERVER_ENDPOINT = `${BACKEND_BASE_URL}/api/recorder/event`;
+
+  // Load saved backend URL from storage
+  chrome.storage.sync.get(["backendUrl"], (result) => {
+    if (result.backendUrl) {
+      BACKEND_BASE_URL = result.backendUrl;
</file context>

// Ensure content scripts are injected into all open tabs
// (they may not be present if extension was reloaded after tabs were opened)
chrome.tabs.query({}, (tabs) => {

@cubic-dev-ai cubic-dev-ai bot Mar 17, 2026

P2: START_RECORDING reinjects the content script on every start without a singleton guard. Re-injecting the same content script typically re-runs its module initialization, which can attach duplicate listeners and cause duplicated event handling in a tab.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At extension/src/entrypoints/background.ts, line 695:

<comment>START_RECORDING reinjects the content script on every start without a singleton guard. Re-injecting the same content script typically re-runs its module initialization, which can attach duplicate listeners and cause duplicated event handling in a tab.</comment>

<file context>
@@ -670,20 +684,35 @@ export default defineBackground(() => {
+
+      // Ensure content scripts are injected into all open tabs
+      // (they may not be present if extension was reloaded after tabs were opened)
+      chrome.tabs.query({}, (tabs) => {
+        tabs.forEach((tab) => {
+          if (tab.id && tab.url && !tab.url.startsWith("chrome://") && !tab.url.startsWith("chrome-extension://")) {
</file context>

}

// For all other steps, find the target element
const element = await waitForElement(step);

@cubic-dev-ai cubic-dev-ai bot Mar 17, 2026

P2: key_press steps are incorrectly blocked by mandatory element lookup, preventing fallback to activeElement/body.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At extension/src/entrypoints/content-executor.ts, line 460:

<comment>`key_press` steps are incorrectly blocked by mandatory element lookup, preventing fallback to activeElement/body.</comment>

<file context>
@@ -0,0 +1,563 @@
+    }
+
+    // For all other steps, find the target element
+    const element = await waitForElement(step);
+
+    if (!element) {
</file context>

    : nativeInputValueSetter;

if (setter) {
  setter.call(inputEl, value);

@cubic-dev-ai cubic-dev-ai bot Mar 17, 2026

P2: input target selection supports contenteditable/ARIA textbox elements, but executeInput() only writes through native input/textarea value setters, causing input replay failures on non-native text fields.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At extension/src/entrypoints/content-executor.ts, line 331:

<comment>`input` target selection supports contenteditable/ARIA textbox elements, but `executeInput()` only writes through native input/textarea value setters, causing input replay failures on non-native text fields.</comment>

<file context>
@@ -0,0 +1,563 @@
+      : nativeInputValueSetter;
+
+  if (setter) {
+    setter.call(inputEl, value);
+  } else {
+    inputEl.value = value;
</file context>

### Communication Flow
```
Extension content.ts → background.ts → HTTP POST :7331/event → RecordingService
```

@cubic-dev-ai cubic-dev-ai bot Mar 17, 2026

P2: Architecture flow in CLAUDE.md is outdated: recording endpoint/service no longer matches actual runtime path.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At CLAUDE.md, line 62:

<comment>Architecture flow in CLAUDE.md is outdated: recording endpoint/service no longer matches actual runtime path.</comment>

<file context>
@@ -0,0 +1,110 @@
+
+### Communication Flow
+```
+Extension content.ts → background.ts → HTTP POST :7331/event → RecordingService
+UI (React :5173) → REST API :8000 → backend/routers.py → WorkflowService
+Workflow execution: Workflow.run() → _execute_step() → controller actions → Playwright
</file context>
Suggested change
- Extension content.ts → background.ts → HTTP POST :7331/event → RecordingService
+ Extension content.ts → background.ts → HTTP POST :8000/api/recorder/event → backend/recorder_router.py

}, []);

useEffect(() => {
fetchWorkflows();

@cubic-dev-ai cubic-dev-ai bot Mar 17, 2026

P2: Dashboard does not initialize browser execution state on mount, so reopening sidepanel during an active run can show incorrect Run/Stop/progress UI until a later status event arrives.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At extension/src/entrypoints/sidepanel/components/dashboard-view.tsx, line 45:

<comment>Dashboard does not initialize browser execution state on mount, so reopening sidepanel during an active run can show incorrect Run/Stop/progress UI until a later status event arrives.</comment>

<file context>
@@ -0,0 +1,296 @@
+  }, []);
+
+  useEffect(() => {
+    fetchWorkflows();
+
+    // Listen for execution status updates
</file context>

# Find the last pending result for this step
for result in reversed(self._results):
	if result.step_index == step_index and result.status == 'pending':
		result.status = 'keep' if success else 'discard'

@cubic-dev-ai cubic-dev-ai bot Mar 17, 2026

P2: Final healing keep/discard status is only updated in memory; healing_results.tsv remains with pending rows and never reflects the actual outcome.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At workflows/workflow_use/healing/step_healer.py, line 348:

<comment>Final healing keep/discard status is only updated in memory; `healing_results.tsv` remains with `pending` rows and never reflects the actual outcome.</comment>

<file context>
@@ -0,0 +1,545 @@
+		# Find the last pending result for this step
+		for result in reversed(self._results):
+			if result.step_index == step_index and result.status == 'pending':
+				result.status = 'keep' if success else 'discard'
+				if success:
+					self.total_healed += 1
</file context>
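
A possible shape for the fix, as a sketch only: after updating the in-memory result, rewrite the matching row in the TSV so the file reflects the final outcome. The `update_pending_row` helper and the column names `step` and `status` are assumptions based on the commit message; the real file layout in step_healer.py may differ.

```python
import csv
from pathlib import Path


def update_pending_row(tsv_path: Path, step_index: int, success: bool) -> bool:
    """Rewrite the last 'pending' row for `step_index` as 'keep' or 'discard'.

    Returns True if a row was updated, False if no matching row exists.
    """
    with tsv_path.open(newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    # Walk backwards so only the most recent pending attempt is touched.
    for row in reversed(rows):
        if row["step"] == str(step_index) and row["status"] == "pending":
            row["status"] = "keep" if success else "discard"
            break
    else:
        return False

    # Write to a sibling temp file, then swap it in, so a crash mid-write
    # does not leave a truncated log behind.
    tmp = tsv_path.with_suffix(".tmp")
    with tmp.open("w", newline="") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
    tmp.replace(tsv_path)
    return True
```

An alternative that avoids rewriting the file is to append a second row with the final status and treat the log as append-only; which one fits depends on how the TSV is consumed.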
