feat: add MCP server for AI agent integration #349
Open
ms6rb wants to merge 110 commits into usestrix:main from
Security testing playbook for NestJS applications covering guard bypass, validation pipe exploits, module boundary leaks, cross-transport auth inconsistencies, passport/JWT misuse, serialization leaks, ORM injection, CRUD generator gaps, and rate limiting bypass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FastMCP server exposing Strix security sandbox tools to Claude Code, compatible with the skills-based module system. Includes:
- Web target HTTP fingerprinting in start_scan
- Finding deduplication with title normalization and merge-on-insert
- list_vulnerability_reports, list_modules, get_scan_status tools
- Richer end_scan summary with OWASP grouping and dedup stats
- Web-only methodology branch with adjusted subagent template
- 49 unit tests covering all new functionality
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
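The "title normalization and merge-on-insert" deduplication mentioned above could look roughly like the following. This is a minimal sketch, not the PR's implementation: the normalization rule, the `evidence` field, and both function names are assumptions made for illustration.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace (assumed normalization rule)."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def insert_finding(findings: list[dict], new: dict) -> dict:
    """Append `new`, or merge it into an existing finding with the same normalized title."""
    key = normalize_title(new["title"])
    for existing in findings:
        if normalize_title(existing["title"]) == key:
            # Merge-on-insert: keep the existing record and add any unseen evidence.
            evidence = existing.setdefault("evidence", [])
            for item in new.get("evidence", []):
                if item not in evidence:
                    evidence.append(item)
            return existing
    findings.append(new)
    return new
```

With this shape, two reports titled "SQL Injection in /login" and "sql-injection in /login!" collapse into one record carrying the evidence of both.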
- Fix guard ordering claim: NestJS uses AND logic, bypass is metadata-driven via @public()/@SetMetadata, not order-driven
- Add missing validation requirements for ORM injection and cache poisoning
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add nestjs module to nestjs trigger, add domain/subdomain_takeover
- Add info_disclosure, open_redirect, path_traversal to web_app rules
- Add 4 agent templates: NestJS, info disclosure, path traversal, subdomain
- Expand HTTP probe paths from 5 to 18 (actuator, .env, swagger, etc.)
- Detect Spring Actuator, exposed .env, Swagger from probe results
- Add affected_endpoint and cvss_score to vulnerability reports
- Update methodology subagent templates with new report fields
- 8 new tests (57 total)
- Add MODULE_RULES and agent templates for Django, WordPress, Laravel, Rails, Express, and Flask, so detected frameworks now get dedicated testing agents instead of only generic web testing
- Auto-fetch OpenAPI/Swagger spec when swagger is detected during fingerprinting; extracts the endpoint list and passes it to the coordinator for better subagent targeting
- Add missing OWASP keywords: open_redirect, subdomain_takeover, information_disclosure, prototype_pollution, exposed_env, actuator
- Update methodology with OpenAPI auto-discovery guidance
- 10 new tests (67 total)
…gather Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lan output Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…t tool Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Individual markdown files per finding, CSV index sorted by severity, get_finding tool for selective recall, minimal tool responses. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
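A severity-sorted CSV index like the one described here can be sketched as follows. The column set, the severity vocabulary, and the `build_index` name are assumptions for illustration; the PR's actual file layout is not shown in this conversation.

```python
import csv
import io

# Assumed severity vocabulary, most severe first.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


def build_index(findings: list[dict]) -> str:
    """Return CSV text listing findings, most severe first."""
    ordered = sorted(
        findings,
        # Unknown severities sort after all known ones.
        key=lambda f: SEVERITY_ORDER.get(f["severity"].lower(), len(SEVERITY_ORDER)),
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "severity", "title"])
    for finding in ordered:
        writer.writerow([finding["id"], finding["severity"], finding["title"]])
    return buffer.getvalue()
```

Keeping the index small and sorted lets an agent read it first and then fetch a single finding with get_finding rather than loading every report.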
…d finding recall Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes: keep end_scan name (avoids collision with native finish_scan), remove wrong test_integration change, add strix-agent dependency, add server.py resource descriptions, add pyproject.toml metadata task. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move create_vulnerability_report to MCP Orchestration (not proxied)
- Note str_replace_editor as partial parity (no create/view/insert)
- Add native create_vulnerability_report to Not Yet Supported
- Update design doc with final decisions, mark as superseded by plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix proxied tool count (14 -> 13)
- Add agent_id parameter documentation requirement for all proxied tools
- Add workflow section to README template
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove register_agent as public tool (dispatch_agent handles it)
- Update all 23 tool descriptions with parameter docs and enum values
- Add agent_id documentation to all 13 proxied tools
- Consistent formatting across MCP-only and proxied tools
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the tracer fails to initialize, create_vulnerability_report silently returns phantom report IDs that are never persisted. list_vulnerability_reports then returns empty results.
- Log the actual exception on tracer init failure (was silently swallowed)
- Warn when create_vulnerability_report files without a tracer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The dedup merge path mutated dicts from get_existing_vulnerabilities(), relying on them being shared references to the tracer's internal list. If the tracer ever returns copies, merges would be silently discarded. Access tracer.vulnerability_reports[idx] directly instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
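The failure mode described above can be reproduced in a few lines. `FakeTracer` below is a hypothetical stand-in written for this sketch; only the names `get_existing_vulnerabilities` and `vulnerability_reports` come from the commit message, and the real tracer may behave differently.

```python
import copy


class FakeTracer:
    """Hypothetical stand-in whose getter returns copies, the case the fix guards against."""

    def __init__(self) -> None:
        self.vulnerability_reports = [{"title": "XSS", "evidence": ["a"]}]

    def get_existing_vulnerabilities(self) -> list[dict]:
        return copy.deepcopy(self.vulnerability_reports)


def merge_via_getter(tracer: FakeTracer, idx: int, evidence: str) -> None:
    # Mutates a copy: the change is silently discarded.
    tracer.get_existing_vulnerabilities()[idx]["evidence"].append(evidence)


def merge_direct(tracer: FakeTracer, idx: int, evidence: str) -> None:
    # Mutates the tracer's own list, so the merge persists.
    tracer.vulnerability_reports[idx]["evidence"].append(evidence)
```

Indexing `tracer.vulnerability_reports` directly removes the dependency on whether the getter happens to return shared references.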
start_scan now returns a "tracer" field ("active", "failed", or
"unavailable") and a warning if findings won't be persisted. This
makes tracer init failures visible to the agent instead of silently
succeeding and then failing on nuclei_scan/create_vulnerability_report.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
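One way to derive the "tracer" field is sketched below. The three status strings come from the commit message; the distinction drawn between "failed" and "unavailable", the warning wording, and the `tracer_fields` helper are assumptions.

```python
from typing import Optional


def tracer_fields(tracer: Optional[object], init_error: Optional[Exception]) -> dict:
    """Build the tracer-related part of a start_scan response (illustrative only)."""
    if tracer is not None:
        return {"tracer": "active"}
    if init_error is not None:
        # Assumed meaning of "failed": initialization was attempted and raised.
        return {
            "tracer": "failed",
            "warning": f"Tracer failed to initialize ({init_error}); findings will not be persisted.",
        }
    # Assumed meaning of "unavailable": no tracer could be created at all.
    return {
        "tracer": "unavailable",
        "warning": "No tracer is available; findings will not be persisted.",
    }
```

Surfacing the status at scan start lets the agent stop or adjust before it spends effort on reports that would never be saved.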
list_requests passed end_page=None to the sandbox, which crashes with 'NoneType - int' when the sandbox does pagination arithmetic. Only include optional params in the proxy call when they have values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
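The fix amounts to dropping unset optional parameters before the proxy call. A minimal sketch, assuming a keyword-argument style call; `build_proxy_args` and `start_page` are hypothetical names, while `end_page` is taken from the commit message.

```python
def build_proxy_args(**optional) -> dict:
    """Keep only the parameters that were actually provided (not None)."""
    return {name: value for name, value in optional.items() if value is not None}


# Example: end_page is omitted entirely instead of being forwarded as None.
args = build_proxy_args(start_page=1, end_page=None)
```

Filtering on `is not None` rather than truthiness matters here, because a legitimate value such as `0` must still be forwarded.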
download_sourcemaps:
- Handle both sandbox response formats ({"response": {"body": ...}} and {"body": ...})
- Return html_length for debugging empty-result cases
nuclei_scan:
- Capture stderr instead of discarding to /dev/null
- Return nuclei_stderr in response when present (template errors, binary issues)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
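Handling both sandbox response shapes for download_sourcemaps can be sketched as follows. The two shapes and the `html_length` field come from the commit message; the `extract_html` helper and the rest of the return structure are assumptions.

```python
def extract_html(result: dict) -> dict:
    """Pull the HTML body out of either response shape and report its length."""
    response = result.get("response")
    if isinstance(response, dict):
        body = response.get("body")  # shape: {"response": {"body": ...}}
    else:
        body = result.get("body")  # shape: {"body": ...}
    html = body or ""
    # html_length helps tell an empty page apart from a parsing problem.
    return {"html": html, "html_length": len(html)}
```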
Sync with upstream v0.8.3 (sandbox 0.1.13, load_skill tool, chaining templates migrated to load_skill). Add 6 new MCP recon/analysis tools: compare_sessions (session diffing for IDOR), firebase_audit (Firestore ACL matrix), analyze_js_bundles (JS pattern extraction), discover_api (GraphQL/gRPC/OpenAPI detection), discover_services (third-party CMS detection + Sanity GROQ probing), reason_chains (cross-tool chain reasoning). Add browser_security skill with address bar spoofing and prompt injection test templates. Update methodology with tool-call discipline, scope guidance, and recon tool integration. 193 tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move 6 analysis tools (compare_sessions, firebase_audit, analyze_js_bundles, discover_api, reason_chains, discover_services) from tools.py into a new tools_analysis.py module with register_analysis_tools(). Pure refactor with no behavior changes. Removes unused imports from tools.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move nuclei_scan and download_sourcemaps to dedicated tools_recon module, reducing tools.py from 915 to 584 lines. Pure refactor, no behavior change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove unused re, VALID_NOTE_CATEGORIES from tools.py. Remove unused Tracer, set_global_tracer, datetime/UTC from tools_analysis.py. Remove redundant local asyncio/hashlib re-imports shadowing top-level imports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 0.1.13 image has a 0-byte docker-entrypoint.sh (upstream build bug), causing "exec format error" on startup. Pinned to 0.1.12 until fixed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ing, SAML, supply chain, postMessage, OAuth, prototype pollution, LLM injection High-impact vulnerability skills based on 2025-2026 HackerOne bounty research. Covers the top-paying attack classes currently underrepresented in the skill catalog. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enhance analyze_js_bundles with CSPT sink detection, postMessage listener enumeration, and internal package name discovery. Add new cross-tool chain patterns for CSPT, supply chain, OAuth, cache poisoning, smuggling, and LLM injection. Update methodology vulnerability priorities and chaining patterns to reflect 2025-2026 bounty landscape. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Automated HTTP request smuggling detection (CL.TE, TE.CL, TE.TE, TE.0 variants with proxy fingerprinting) and web cache poisoning/deception testing (unkeyed headers, parser discrepancy paths). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TE.0 probe now sends actual chunked body with CL:0 (was empty)
- Document httpx limitation for duplicate TE header probes
- Add test_request_smuggling and test_cache_poisoning to methodology recon
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…k_ssrf, dangling_resources, pg_tenant_audit Battle-tested skills from a Neon bug bounty session that found 2 High-severity bugs (SSRF CVSS 8.6, PKCE bypass CVSS 8.1). Covers OAuth server enumeration, webhook SSRF methodology, dangling resource detection, and managed PostgreSQL tenant isolation auditing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…+ chain body format warning K8s service enumeration wordlist generator for SSRF probing. Blind SSRF oracle calibration tool (retry/timing/status differentials). Agent authorization context in templates to prevent refusals. Chain reasoning body format compatibility warning for webhook SSRF. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add k8s_enumerate tests (4), ssrf_oracle tests (2), body_format_warning tests (2). Add k8s_enumerate, ssrf_oracle, oauth_audit, webhook_ssrf, dangling_resources, pg_tenant_audit to methodology recon directives. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t, load_skill overflow
- download_sourcemaps: fix regex to match type=module crossorigin scripts
- k8s_enumerate: map services to default ports instead of cartesian product, add scheme parameter (default https), cap output size
- load_skill: add max_content_length (50K) and summary_only mode to prevent MCP buffer overflow on large skills
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
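The max_content_length cap on load_skill could be implemented along these lines. This is only a sketch: reading "50K" as 50,000 characters, the `cap_content` helper, and the `truncated` and `total_length` fields are all assumptions.

```python
def cap_content(content: str, max_content_length: int = 50_000) -> dict:
    """Truncate oversized skill content and say so in the response."""
    return {
        "content": content[:max_content_length],
        "truncated": len(content) > max_content_length,
        "total_length": len(content),
    }
```

Reporting the original length alongside the truncated text lets the caller decide whether to request the rest.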
…proxy
Root cause: nuclei loaded all 2252 templates (5249 requests) through Caido proxy, exceeding 600s timeout on most targets.
Fixes:
- Default to focused tags (exposure,misconfig,cve,takeover,default-login,token) instead of all templates (reduces to ~500-800 requests)
- Add -env-vars=false to bypass system proxy for direct scanning
- Add -no-httpx to skip probe (target already known live)
- Replace -silent with -stats for progress visibility
- Parse and return last stats line in scan_progress field
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…aphs
- k8s_enumerate: distribute max_urls evenly across namespaces instead of truncating first namespaces. Remove cross-product from short_forms.
- load_skill: summary_only now returns title + first paragraph (up to 500 chars) instead of just the # heading line.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
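Distributing a URL budget evenly across namespaces, rather than filling it from the first namespaces only, can be done with a round-robin pass. The sketch below assumes candidate URLs are already grouped per namespace; `distribute_urls` and its input shape are hypothetical.

```python
def distribute_urls(urls_by_namespace: dict[str, list[str]], max_urls: int) -> list[str]:
    """Take one URL per namespace in turn until max_urls is reached."""
    if max_urls <= 0:
        return []
    selected: list[str] = []
    iterators = [iter(urls) for urls in urls_by_namespace.values()]
    while iterators and len(selected) < max_urls:
        still_active = []
        for iterator in iterators:
            if len(selected) >= max_urls:
                break
            url = next(iterator, None)
            if url is not None:
                selected.append(url)
                still_active.append(iterator)
        # Exhausted namespaces drop out; the rest get another turn.
        iterators = still_active
    return selected
```

With a cap of 3 and namespaces holding three and one URLs, both namespaces are represented in the output instead of the first one consuming the whole budget.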
- k8s_enumerate: services mapped to likely namespaces (grafana→monitoring, kubernetes→default, argocd-server→argocd, etc). Unmapped services only in default+kube-system. Reduces 488→73 URLs. max_urls=0 returns empty.
- ssrf_oracle: use https:// for all test URLs to isolate IP/hostname validation from scheme validation. Document retry oracle limitation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
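The service-to-namespace mapping can be illustrated with a small lookup table. The three mappings and the default + kube-system fallback come from the commit message; the `candidate_urls` helper and the use of the standard `<service>.<namespace>.svc.cluster.local` DNS form are assumptions about how URLs are built.

```python
# Mappings quoted in the commit message; the real table is presumably longer.
SERVICE_NAMESPACES = {
    "grafana": ["monitoring"],
    "kubernetes": ["default"],
    "argocd-server": ["argocd"],
}
FALLBACK_NAMESPACES = ["default", "kube-system"]


def candidate_urls(services: list[str], scheme: str = "https") -> list[str]:
    """Pair each service with its likely namespaces instead of every namespace."""
    urls = []
    for service in services:
        for namespace in SERVICE_NAMESPACES.get(service, FALLBACK_NAMESPACES):
            urls.append(f"{scheme}://{service}.{namespace}.svc.cluster.local")
    return urls
```

Restricting each service to plausible namespaces is what shrinks the candidate list, since the full cross-product mostly yields hostnames that cannot exist.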
Summary
strix-mcp, an MCP (Model Context Protocol) server that exposes Strix's Docker sandbox tools to AI coding agents (Claude Code, Cursor, Windsurf, and any MCP-compatible client)
strix_runs/ format
Test plan
🤖 Generated with Claude Code