
feat: add MCP server for AI agent integration #349

Open · ms6rb wants to merge 110 commits into usestrix:main from ms6rb:feat/mcp-orchestration


ms6rb (Contributor) commented Mar 8, 2026

Summary

  • Adds strix-mcp, an MCP (Model Context Protocol) server that exposes Strix's Docker sandbox tools to AI coding agents (Claude Code, Cursor, Windsurf, and any MCP-compatible client)
  • 13 proxied tools mirroring native Strix (terminal, browser, proxy, file editing, sitemap); str_replace_editor has only partial parity (no create/view/insert)
  • 10 MCP orchestration tools: scan lifecycle, subagent dispatch, vulnerability dedup/chaining, knowledge modules
  • Tech stack auto-detection (code-based + HTTP fingerprinting) with recommended scan plans
  • Automatic vulnerability chaining across findings (10 chain rules)
  • OWASP Top 10 categorization and finding persistence in the strix_runs/ format
  • Coverage map documenting parity with native Strix and roadmap for unsupported tools
  • Multi-client installation docs (Claude Code, Cursor, Windsurf)
  • Small MCP section added to root README
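For reviewers, the OWASP Top 10 categorization bullet can be pictured as a keyword lookup on the finding title. This is a minimal sketch, assuming a substring match; `OWASP_KEYWORDS` and `categorize` are hypothetical names, and the real strix-mcp mapping table is not shown in this PR.

```python
# Hypothetical keyword -> OWASP Top 10 (2021) map; not the PR's actual table.
OWASP_KEYWORDS: dict[str, str] = {
    "idor": "A01:2021 Broken Access Control",
    "open_redirect": "A01:2021 Broken Access Control",
    "path_traversal": "A01:2021 Broken Access Control",
    "sqli": "A03:2021 Injection",
    "xss": "A03:2021 Injection",
    "exposed_env": "A05:2021 Security Misconfiguration",
    "actuator": "A05:2021 Security Misconfiguration",
    "ssrf": "A10:2021 Server-Side Request Forgery",
}


def categorize(title: str) -> str:
    """Return the first OWASP category whose keyword appears in the title."""
    # Normalize spaces and hyphens so "Open Redirect" matches "open_redirect".
    normalized = title.lower().replace(" ", "_").replace("-", "_")
    for keyword, category in OWASP_KEYWORDS.items():
        if keyword in normalized:
            return category
    return "Uncategorized"
```

Substring matching is cheap but can misfire on unrelated words, so a real implementation may prefer an explicit category field on each report.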

Test plan

  • 112 unit tests passing (chaining, stack detection, resources, tools, persistence)
  • Integration test with live Docker sandbox
  • Manual end-to-end scan with Claude Code
  • Manual verification with Cursor/Windsurf MCP clients

🤖 Generated with Claude Code

ms6rb and others added 30 commits March 8, 2026 17:12
Security testing playbook for NestJS applications covering guard bypass,
validation pipe exploits, module boundary leaks, cross-transport auth
inconsistencies, passport/JWT misuse, serialization leaks, ORM injection,
CRUD generator gaps, and rate limiting bypass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FastMCP server exposing Strix security sandbox tools to Claude Code,
compatible with the skills-based module system. Includes:

- Web target HTTP fingerprinting in start_scan
- Finding deduplication with title normalization and merge-on-insert
- list_vulnerability_reports, list_modules, get_scan_status tools
- Richer end_scan summary with OWASP grouping and dedup stats
- Web-only methodology branch with adjusted subagent template
- 49 unit tests covering all new functionality

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
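The "title normalization and merge-on-insert" dedup described in this commit could look roughly like the following. It is a sketch under assumptions: the normalization rule, the `endpoints`/`severity` fields, and the `add_finding` helper are hypothetical, not the PR's code.

```python
import re

# Hypothetical severity ranking used to keep the worse severity on merge.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    title = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(title.split())


def add_finding(findings: list[dict], new: dict) -> bool:
    """Insert `new`, or merge it into an existing finding whose normalized
    title matches. Returns True only when a new entry was appended."""
    key = normalize_title(new["title"])
    for existing in findings:
        if normalize_title(existing["title"]) == key:
            # Merge: union the endpoints and keep the higher severity.
            merged = set(existing.get("endpoints", [])) | set(new.get("endpoints", []))
            existing["endpoints"] = sorted(merged)
            existing["severity"] = max(
                existing["severity"], new["severity"], key=SEVERITY_RANK.get
            )
            return False
    findings.append(new)
    return True
```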
- Fix guard ordering claim: NestJS uses AND logic, bypass is
  metadata-driven via @Public()/@SetMetadata(), not order-driven
- Add missing validation requirements for ORM injection and
  cache poisoning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add nestjs module to nestjs trigger, add domain/subdomain_takeover
- Add info_disclosure, open_redirect, path_traversal to web_app rules
- Add 4 agent templates: NestJS, info disclosure, path traversal, subdomain
- Expand HTTP probe paths from 5 to 18 (actuator, .env, swagger, etc.)
- Detect Spring Actuator, exposed .env, Swagger from probe results
- Add affected_endpoint and cvss_score to vulnerability reports
- Update methodology subagent templates with new report fields
- 8 new tests (57 total)
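Detecting Spring Actuator, an exposed .env, and Swagger "from probe results" amounts to matching each probed path's status and body against a signature. A minimal sketch, assuming probe results arrive as a path → (status, body) dict; `SIGNATURES` and `detect_exposures` are hypothetical. The `"_links"` check relies on Spring Boot's HAL-style actuator index.

```python
import re

# Hypothetical signatures: (label, probed path, predicate on status and body).
SIGNATURES = [
    ("spring_actuator", "/actuator",
     lambda status, body: status == 200 and '"_links"' in body),
    ("exposed_env", "/.env",
     # Require KEY=value lines so HTML soft-404 pages served with 200 do not match.
     lambda status, body: status == 200
     and re.search(r"^[A-Z_][A-Z0-9_]*=", body, re.MULTILINE) is not None),
    ("swagger", "/swagger.json",
     lambda status, body: status == 200 and ('"swagger"' in body or '"openapi"' in body)),
]


def detect_exposures(probes: dict[str, tuple[int, str]]) -> list[str]:
    """Return the labels of all signatures that match the probe results."""
    return [
        label
        for label, path, matches in SIGNATURES
        if path in probes and matches(*probes[path])
    ]
```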
- Add MODULE_RULES and agent templates for Django, WordPress, Laravel,
  Rails, Express, and Flask — detected frameworks now get dedicated
  testing agents instead of only generic web testing
- Auto-fetch OpenAPI/Swagger spec when swagger is detected during
  fingerprinting — extracts endpoint list and passes to coordinator
  for better subagent targeting
- Add missing OWASP keywords: open_redirect, subdomain_takeover,
  information_disclosure, prototype_pollution, exposed_env, actuator
- Update methodology with OpenAPI auto-discovery guidance
- 10 new tests (67 total)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
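Extracting an endpoint list from an auto-fetched OpenAPI/Swagger spec can be as simple as flattening the `paths` object. This sketch assumes the spec is already parsed JSON; `extract_endpoints` is a hypothetical helper, and only Swagger 2's `basePath` is honored (OpenAPI 3 `servers` are ignored for brevity).

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def extract_endpoints(spec: dict) -> list[str]:
    """Flatten a Swagger 2 / OpenAPI 3 `paths` object into 'METHOD /path' strings."""
    base = spec.get("basePath", "") if "swagger" in spec else ""
    endpoints = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                endpoints.append(f"{key.upper()} {base.rstrip('/')}{path}")
    return sorted(endpoints)
```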
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gather

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lan output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…t tool

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Individual markdown files per finding, CSV index sorted by severity,
get_finding tool for selective recall, minimal tool responses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
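A rough picture of "individual markdown files per finding, CSV index sorted by severity": one `.md` file per finding plus an index ordered critical first. The file names (`vulnerabilities.csv`, `<id>.md`), column set, and `persist_findings` helper are assumptions for illustration, not the PR's on-disk layout.

```python
import csv
from pathlib import Path

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


def persist_findings(findings: list[dict], run_dir: Path) -> Path:
    """Write one markdown file per finding and a severity-sorted CSV index."""
    run_dir.mkdir(parents=True, exist_ok=True)
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER.get(f["severity"], 99))
    index = run_dir / "vulnerabilities.csv"  # hypothetical index file name
    with index.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["id", "title", "severity", "file"])
        for finding in ordered:
            md = run_dir / f"{finding['id']}.md"
            md.write_text(
                f"# {finding['title']}\n\n**Severity:** {finding['severity']}\n\n"
                f"{finding.get('description', '')}\n",
                encoding="utf-8",
            )
            writer.writerow([finding["id"], finding["title"], finding["severity"], md.name])
    return index
```

Keeping the index small lets a tool like get_finding recall one file on demand instead of returning every report in each response.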
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d finding recall

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes: keep end_scan name (avoids collision with native finish_scan),
remove wrong test_integration change, add strix-agent dependency,
add server.py resource descriptions, add pyproject.toml metadata task.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move create_vulnerability_report to MCP Orchestration (not proxied)
- Note str_replace_editor as partial parity (no create/view/insert)
- Add native create_vulnerability_report to Not Yet Supported
- Update design doc with final decisions, mark as superseded by plan

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix proxied tool count (14 -> 13)
- Add agent_id parameter documentation requirement for all proxied tools
- Add workflow section to README template

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove register_agent as public tool (dispatch_agent handles it)
- Update all 23 tool descriptions with parameter docs and enum values
- Add agent_id documentation to all 13 proxied tools
- Consistent formatting across MCP-only and proxied tools

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ms6rb and others added 3 commits March 17, 2026 10:15
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the tracer fails to initialize, create_vulnerability_report
silently returns phantom report IDs that are never persisted.
list_vulnerability_reports then returns empty results.

- Log the actual exception on tracer init failure (was silently swallowed)
- Warn when create_vulnerability_report files without a tracer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ms6rb (Contributor, Author) commented Mar 17, 2026

@greptileai

ms6rb and others added 25 commits March 17, 2026 10:34
The dedup merge path mutated dicts from get_existing_vulnerabilities(),
relying on them being shared references to the tracer's internal list.
If the tracer ever returns copies, merges would be silently discarded.

Access tracer.vulnerability_reports[idx] directly instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
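The copy-versus-reference hazard this commit fixes can be shown with a stand-in tracer that returns deep copies: edits to the snapshot are lost, while writing through `vulnerability_reports[idx]` persists. `Tracer` here is a hypothetical stand-in, and `merge_into` is an illustrative helper, not the Strix API.

```python
import copy


class Tracer:
    """Hypothetical stand-in for the Strix tracer."""

    def __init__(self) -> None:
        self.vulnerability_reports: list[dict] = []

    def get_existing_vulnerabilities(self) -> list[dict]:
        # Returns copies: mutating these does NOT change the tracer's state.
        return copy.deepcopy(self.vulnerability_reports)


def merge_into(tracer: Tracer, idx: int, extra_endpoints: list[str]) -> None:
    """Apply a dedup merge through the tracer's own list, addressed by index."""
    report = tracer.vulnerability_reports[idx]
    report["endpoints"] = sorted(set(report.get("endpoints", [])) | set(extra_endpoints))
```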
start_scan now returns a "tracer" field ("active", "failed", or
"unavailable") and a warning if findings won't be persisted. This
makes tracer init failures visible to the agent instead of silently
succeeding and then failing on nuclei_scan/create_vulnerability_report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
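The "tracer" status field plus warning, combined with logging the real exception instead of swallowing it, might be structured like this. A minimal sketch: `init_tracer`, the `factory` callable, the logger name, and the warning text are all hypothetical.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("strix_mcp")  # hypothetical logger name


def init_tracer(factory) -> tuple[object | None, dict]:
    """Build a tracer and report 'active', 'failed', or 'unavailable'."""
    tracer = None
    if factory is None:
        status = {"tracer": "unavailable"}
    else:
        try:
            tracer = factory()
            status = {"tracer": "active"}
        except Exception:
            # Log the traceback rather than failing silently.
            logger.exception("Tracer initialization failed")
            status = {"tracer": "failed"}
    if tracer is None:
        status["warning"] = "Findings will not be persisted for this scan."
    return tracer, status
```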
list_requests passed end_page=None to the sandbox, which crashes
with 'NoneType - int' when the sandbox does pagination arithmetic.
Only include optional params in the proxy call when they have values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
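"Only include optional params when they have values" boils down to dropping `None` entries before the proxy call. The helper and parameter names below are hypothetical; note that `0` and `False` are kept, since they are real values.

```python
from typing import Any


def build_proxy_args(**params: Any) -> dict[str, Any]:
    """Drop parameters left as None so the sandbox applies its own defaults
    instead of doing pagination arithmetic on None."""
    return {name: value for name, value in params.items() if value is not None}
```

Usage: `build_proxy_args(start_page=1, end_page=None)` yields `{"start_page": 1}`.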
download_sourcemaps:
- Handle both sandbox response formats ({"response": {"body": ...}} and {"body": ...})
- Return html_length for debugging empty-result cases

nuclei_scan:
- Capture stderr instead of discarding to /dev/null
- Return nuclei_stderr in response when present (template errors, binary issues)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
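Handling both sandbox response shapes can be isolated in one small normalizer, so `html_length` is computed from the same value either way. `extract_body` is a hypothetical name; the two shapes are the ones listed in the commit message.

```python
def extract_body(response: dict) -> str:
    """Accept both {"response": {"body": ...}} and {"body": ...} shapes."""
    inner = response.get("response")
    if isinstance(inner, dict) and "body" in inner:
        return inner["body"] or ""
    return response.get("body") or ""
```

With this in place, `html_length` is simply `len(extract_body(resp))`.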
Sync with upstream v0.8.3 (sandbox 0.1.13, load_skill tool, chaining
templates migrated to load_skill). Add 6 new MCP recon/analysis tools:
compare_sessions (session diffing for IDOR), firebase_audit (Firestore
ACL matrix), analyze_js_bundles (JS pattern extraction), discover_api
(GraphQL/gRPC/OpenAPI detection), discover_services (third-party CMS
detection + Sanity GROQ probing), reason_chains (cross-tool chain
reasoning). Add browser_security skill with address bar spoofing and
prompt injection test templates. Update methodology with tool-call
discipline, scope guidance, and recon tool integration. 193 tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move 6 analysis tools (compare_sessions, firebase_audit, analyze_js_bundles,
discover_api, reason_chains, discover_services) from tools.py into a new
tools_analysis.py module with register_analysis_tools(). Pure refactor with
no behavior changes. Removes unused imports from tools.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move nuclei_scan and download_sourcemaps to dedicated tools_recon module,
reducing tools.py from 915 to 584 lines. Pure refactor, no behavior change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove unused re, VALID_NOTE_CATEGORIES from tools.py. Remove unused
Tracer, set_global_tracer, datetime/UTC from tools_analysis.py. Remove
redundant local asyncio/hashlib re-imports shadowing top-level imports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 0.1.13 image has a 0-byte docker-entrypoint.sh (upstream build bug),
causing "exec format error" on startup. Pinned to 0.1.12 until fixed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ing, SAML, supply chain, postMessage, OAuth, prototype pollution, LLM injection

High-impact vulnerability skills based on 2025-2026 HackerOne bounty
research. Covers the top-paying attack classes currently underrepresented
in the skill catalog.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enhance analyze_js_bundles with CSPT sink detection, postMessage listener
enumeration, and internal package name discovery. Add new cross-tool chain
patterns for CSPT, supply chain, OAuth, cache poisoning, smuggling, and
LLM injection. Update methodology vulnerability priorities and chaining
patterns to reflect 2025-2026 bounty landscape.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
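Chain rules of the kind referenced here can be modeled as "these finding categories together imply this chain". The `ChainRule` shape, category names, and the three example rules are illustrative assumptions; the PR does not show its rule format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainRule:
    """Hypothetical rule: all `requires` categories present => `chain` applies."""
    requires: frozenset[str]
    chain: str


RULES = [
    ChainRule(frozenset({"open_redirect", "oauth"}), "OAuth token theft via open redirect"),
    ChainRule(frozenset({"cspt", "csrf"}), "CSPT-to-CSRF"),
    ChainRule(frozenset({"ssrf", "cloud_metadata"}), "SSRF to cloud credential theft"),
]


def find_chains(categories: set[str]) -> list[str]:
    """Return every chain whose required categories are all present."""
    return [rule.chain for rule in RULES if rule.requires <= categories]
```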
Automated HTTP request smuggling detection (CL.TE, TE.CL, TE.TE, TE.0
variants with proxy fingerprinting) and web cache poisoning/deception
testing (unkeyed headers, parser discrepancy paths).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
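For context on the CL.TE variant, the classic timing probe sends conflicting Content-Length and Transfer-Encoding headers: a front end that honors Content-Length: 4 forwards only the first four body bytes ("1", CR, LF, "A"), while a chunked-parsing back end waits for the rest of the chunk stream, producing a measurable delay. This sketch only builds the raw bytes (`cl_te_timing_probe` is a hypothetical name); sending them needs a raw socket, since high-level clients such as httpx normalize these headers. The TE.0 variant is not shown.

```python
def cl_te_timing_probe(host: str, path: str = "/") -> bytes:
    """Build a raw CL.TE timing probe (only for targets you are authorized to test)."""
    body = b"1\r\nA\r\nX"  # Content-Length below covers only b"1\r\nA"
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Transfer-Encoding: chunked\r\n"
        "Content-Length: 4\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body
```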
- TE.0 probe now sends actual chunked body with CL:0 (was empty)
- Document httpx limitation for duplicate TE header probes
- Add test_request_smuggling and test_cache_poisoning to methodology recon

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…k_ssrf, dangling_resources, pg_tenant_audit

Battle-tested skills from a Neon bug bounty session that found 2 High-severity
bugs (SSRF CVSS 8.6, PKCE bypass CVSS 8.1). Covers OAuth server enumeration,
webhook SSRF methodology, dangling resource detection, and managed PostgreSQL
tenant isolation auditing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…+ chain body format warning

K8s service enumeration wordlist generator for SSRF probing. Blind SSRF
oracle calibration tool (retry/timing/status differentials). Agent
authorization context in templates to prevent refusals. Chain reasoning
body format compatibility warning for webhook SSRF.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
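"Oracle calibration" for blind SSRF can be read as: measure a known-reachable and a known-unreachable target first, then label each probe by which baseline it resembles. This sketch uses only timing; the `classify` helper and the 20% separation threshold are assumptions, and the real tool also considers retry and status differentials.

```python
from statistics import mean


def classify(probe_ms: float, reachable_ms: list[float], unreachable_ms: list[float]) -> str:
    """Label a probe's latency against two calibrated baselines."""
    r, u = mean(reachable_ms), mean(unreachable_ms)
    # Hypothetical rule: baselines must differ by at least 20% to be usable.
    if abs(r - u) < 0.2 * max(r, u):
        return "inconclusive"
    return "reachable" if abs(probe_ms - r) < abs(probe_ms - u) else "unreachable"
```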
Add k8s_enumerate tests (4), ssrf_oracle tests (2), body_format_warning
tests (2). Add k8s_enumerate, ssrf_oracle, oauth_audit, webhook_ssrf,
dangling_resources, pg_tenant_audit to methodology recon directives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t, load_skill overflow

- download_sourcemaps: fix regex to match type=module crossorigin scripts
- k8s_enumerate: map services to default ports instead of cartesian product,
  add scheme parameter (default https), cap output size
- load_skill: add max_content_length (50K) and summary_only mode to prevent
  MCP buffer overflow on large skills

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
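Mapping each service to its default port (rather than crossing every service with every port) keeps the URL list linear in the number of services. The small `DEFAULT_PORTS` table and `service_urls` helper are a hypothetical excerpt; only the `https` default and the output cap come from the commit.

```python
# Hypothetical excerpt of a service -> default port table.
DEFAULT_PORTS = {"grafana": 3000, "prometheus": 9090, "argocd-server": 443, "kubernetes": 443}


def service_urls(namespace: str, scheme: str = "https", max_urls: int = 50) -> list[str]:
    """One URL per service on its default port, capped at max_urls."""
    urls = [
        f"{scheme}://{svc}.{namespace}.svc.cluster.local:{port}/"
        for svc, port in DEFAULT_PORTS.items()
    ]
    return urls[:max_urls]
```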
…proxy

Root cause: nuclei loaded all 2252 templates (5249 requests) through the
Caido proxy, exceeding the 600s timeout on most targets.

Fixes:
- Default to focused tags (exposure,misconfig,cve,takeover,default-login,token)
  instead of all templates — reduces to ~500-800 requests
- Add -env-vars=false to bypass system proxy for direct scanning
- Add -no-httpx to skip probe (target already known live)
- Replace -silent with -stats for progress visibility
- Parse and return last stats line in scan_progress field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
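The resulting invocation can be assembled as an argv list, with `-stats` output parsed from stderr. The flags are copied from this commit message rather than independently verified against nuclei's CLI (in particular the proxy-bypass effect of `-env-vars=false`); `nuclei_command` and `last_stats_line` are hypothetical helpers, and the parser naively assumes the last non-empty stderr line is the latest stats line.

```python
from __future__ import annotations

FOCUSED_TAGS = "exposure,misconfig,cve,takeover,default-login,token"


def nuclei_command(target: str, tags: str = FOCUSED_TAGS) -> list[str]:
    """Build the nuclei argv described in the commit above."""
    return ["nuclei", "-u", target, "-tags", tags, "-env-vars=false", "-no-httpx", "-stats"]


def last_stats_line(stderr: str) -> str | None:
    """Return the last non-empty stderr line (assumed to be the latest stats line)."""
    lines = [line for line in stderr.splitlines() if line.strip()]
    return lines[-1] if lines else None
```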
…aphs

- k8s_enumerate: distribute max_urls evenly across namespaces instead of
  truncating first namespaces. Remove cross-product from short_forms.
- load_skill: summary_only now returns title + first paragraph (up to
  500 chars) instead of just the # heading line.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
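The load_skill behavior from this and the earlier overflow commit (a 50K content cap, and `summary_only` returning the title plus the first paragraph up to 500 characters) can be sketched as below. Splitting paragraphs on blank lines and the `load_skill_text` name are assumptions.

```python
MAX_CONTENT_LENGTH = 50_000  # the 50K cap mentioned for load_skill


def load_skill_text(markdown: str, summary_only: bool = False, limit: int = 500) -> str:
    """Return the capped skill text, or '# Title' plus its opening paragraph."""
    if not summary_only:
        return markdown[:MAX_CONTENT_LENGTH]
    blocks = [b.strip() for b in markdown.split("\n\n") if b.strip()]
    title = next((b.splitlines()[0] for b in blocks if b.startswith("# ")), "")
    # The first block that is not a heading counts as the opening paragraph.
    first = next((b for b in blocks if not b.startswith("#")), "")
    return f"{title}\n\n{first[:limit]}".strip()
```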
- k8s_enumerate: services mapped to likely namespaces (grafana→monitoring,
  kubernetes→default, argocd-server→argocd, etc). Unmapped services only
  in default+kube-system. Reduces 488→73 URLs. max_urls=0 returns empty.
- ssrf_oracle: use https:// for all test URLs to isolate IP/hostname
  validation from scheme validation. Document retry oracle limitation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
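The last two k8s_enumerate changes (services mapped to likely namespaces with a default/kube-system fallback, max_urls spread across namespaces, max_urls=0 returning nothing) combine naturally. This sketch uses round-robin interleaving as one way to "distribute evenly"; the mapping excerpt and `candidate_hosts` helper are hypothetical.

```python
from itertools import zip_longest

# Hypothetical excerpt of the service -> likely-namespace map.
SERVICE_NAMESPACES = {
    "grafana": ["monitoring"],
    "kubernetes": ["default"],
    "argocd-server": ["argocd"],
}
FALLBACK_NAMESPACES = ["default", "kube-system"]


def candidate_hosts(services: list[str], max_urls: int) -> list[str]:
    """Build in-cluster hostnames, sharing the max_urls budget across namespaces."""
    if max_urls <= 0:
        return []
    per_ns: dict[str, list[str]] = {}
    for svc in services:
        for ns in SERVICE_NAMESPACES.get(svc, FALLBACK_NAMESPACES):
            per_ns.setdefault(ns, []).append(f"{svc}.{ns}.svc.cluster.local")
    # Round-robin so the cap does not truncate later namespaces entirely.
    interleaved = [h for group in zip_longest(*per_ns.values()) for h in group if h]
    return interleaved[:max_urls]
```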