A complete, production-grade framework for AI agents to autonomously develop software projects across multiple sessions — with built-in safety, quality assurance, and regression protection.
This specification defines a structured approach for AI agents to build complete software projects incrementally, solving the core challenge of context window fragmentation in long-running development tasks.
Inspired by Anthropic's research on effective harnesses for long-running agents.
- Two-Phase Agent Pattern — Initializer Agent + Coding Agent
- Structured State Management — Progress files, feature lists, git history
- Incremental Development — One feature at a time, never one-shot
- Clean Handoffs — Every session leaves a mergeable, documented state
- Test-Driven Verification — Prove features work with tool-based evidence
- Rapid Onboarding — Standard startup procedure for every new session
- Security by Default — Command whitelist & safe execution guardrails
- Regression Protection — Detect and fix broken features before adding new ones
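For illustration, the "one feature at a time" selection over `feature_list.json` might look like the sketch below. The field names (`id`, `name`, `status`, `core`) are assumptions for this example, not the spec's canonical schema; see `examples/feature_list_example.json` for the real shape.

```python
import json

# Hypothetical feature_list.json content (schema is illustrative only).
feature_list = json.loads("""
{
  "features": [
    {"id": "F1", "name": "User login",     "status": "done", "core": true},
    {"id": "F2", "name": "Password reset", "status": "todo", "core": false},
    {"id": "F3", "name": "Audit log",      "status": "todo", "core": false}
  ]
}
""")

def next_feature(features):
    """Incremental development: return the first feature not yet done."""
    return next((f for f in features if f["status"] != "done"), None)

print(next_feature(feature_list["features"])["id"])  # F2
```

Because the file is the source of truth, each session can re-derive "what to do next" from it alone, without relying on prior conversation context.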
```
ai-driven-dev-spec/
├── README.md                         # This file
├── docs/
│   └── specification.md              # Detailed core specification
├── templates/
│   ├── scaffold/                     # Project scaffold templates
│   │   ├── .ai/
│   │   │   ├── feature_list.json     # Feature tracking (source of truth)
│   │   │   ├── progress.md           # Human-readable progress log
│   │   │   ├── architecture.md       # Architecture decisions
│   │   │   └── session_log.jsonl     # Machine-readable session log
│   │   ├── init.sh                   # Automated environment bootstrap
│   │   └── .gitignore                # Standard gitignore
│   └── prompts/
│       ├── initializer_prompt.md     # System prompt for Phase 1
│       ├── coding_prompt.md          # System prompt for Phase 2
│       ├── testing_prompt.md         # Testing & QA guide
│       └── review_prompt.md          # Code review checklist
├── workflows/
│   ├── init-project.md               # Step-by-step project initialization
│   ├── dev-session.md                # Single dev session workflow
│   ├── test-verify.md                # Testing & verification workflow
│   └── handoff.md                    # Session handoff checklist
└── examples/
    ├── feature_list_example.json     # Full-featured example
    └── app_spec_example.md           # Sample app specification
```
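As a sketch of the machine-readable log, each line of `session_log.jsonl` could be a single JSON object, which keeps the log trivially appendable and parseable. The field names below are illustrative, not the template's actual schema:

```python
import io
import json

# Simulate appending one hypothetical session record (field names are
# assumptions for illustration) to a JSONL log.
log = io.StringIO()
entry = {"session": 12, "feature": "F2", "result": "passed",
         "commit": "abc1234", "tests_run": 14}
log.write(json.dumps(entry) + "\n")

# One JSON object per line: parse the log back line by line.
log.seek(0)
sessions = [json.loads(line) for line in log]
print(sessions[0]["feature"])  # F2
```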
- Read the Core Specification (`docs/specification.md`)
- Copy `templates/scaffold/` into your new project
- Customize the prompts from `templates/prompts/`
- Follow the workflows in `workflows/`
- Tell the AI: "Please read the files in the `.ai/` directory and start working according to the development specifications"
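The copy step can be sketched in Python. The paths are illustrative, and a minimal fake scaffold is fabricated in a temporary directory so the snippet runs stand-alone:

```python
import pathlib
import shutil
import tempfile

# Fabricate a minimal stand-in for templates/scaffold/ (illustrative only;
# the real scaffold also contains init.sh, .gitignore, and more .ai/ files).
root = pathlib.Path(tempfile.mkdtemp())
scaffold = root / "ai-driven-dev-spec" / "templates" / "scaffold"
(scaffold / ".ai").mkdir(parents=True)
(scaffold / ".ai" / "progress.md").write_text("# Progress Log\n")

# Copy the scaffold into a new project directory, .ai/ state files included.
project = root / "myproject"
shutil.copytree(scaffold, project)
print((project / ".ai" / "progress.md").exists())  # True
```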
| Feature | v2.1 | v2.2 |
|---|---|---|
| Data collection infrastructure | ✅ | ✅ |
| Learning value capture | ✅ | ✅ |
| Performance metrics | ✅ | ✅ |
| Modular harness config | ✅ | ✅ |
| Failure analysis tools | ✅ | ✅ |
| Pre-Completion Checklist Middleware | ❌ | ✅ NEW |
| Auto Context Injection Middleware | ❌ | ✅ NEW |
| Deterministic context injection | ❌ | ✅ NEW |
| Ralph Wiggum Loop pattern | ❌ | ✅ NEW |
Inspired by LangChain's Harness Engineering approach:
- **Pre-Completion Checklist Middleware**
  - Mandatory exit gate before ending any session
  - Blocks session termination until all checks pass
  - Forces the build-verify loop that models don't naturally enter
- **Auto Context Injection Middleware**
  - Directory context: auto-injects the current directory structure
  - Tool context: auto-detects available tools
  - Failure pattern context: auto-injects recent failure patterns
  - Deterministic context injection helps agents verify their work
- **Ralph Wiggum Loop Pattern**
  - A hook forces the agent to continue executing on exit
  - Ensures a verification pass against the task spec
  - Prevents premature session termination
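A minimal sketch of the pre-completion checklist gate, assuming a simple list of named checks; the function and check names here are hypothetical, not the middleware's actual API:

```python
def run_checks(checks):
    """Run every (name, predicate) pair; return the names that failed."""
    return [name for name, check in checks if not check()]

def try_end_session(checks):
    """Exit gate: refuse to end the session while any check fails."""
    failures = run_checks(checks)
    if failures:
        return f"blocked: {', '.join(failures)}"
    return "session may end"

# Illustrative checks; real ones would shell out to the build and test suite.
checks = [
    ("build passes",        lambda: True),
    ("tests pass",          lambda: True),
    ("progress.md updated", lambda: False),
]
print(try_end_session(checks))  # blocked: progress.md updated
```

The point of the gate is that "done" is decided by the checklist, not by the model's own judgment, which forces the build-verify loop the bullet points describe.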
| Feature | v2.0 | v2.1 |
|---|---|---|
| Dual-agent pattern | ✅ | ✅ |
| Feature list tracking | ✅ Enhanced | ✅ Enhanced |
| Git integration | ✅ | ✅ |
| Test cases in feature list | ✅ | ✅ |
| Security command whitelist | ✅ | ✅ |
| Environment health checks | ✅ Full | ✅ Full |
| Regression detection | ✅ | ✅ |
| Core feature locking | ✅ | ✅ |
| Browser automation (E2E) | ✅ | ✅ |
| Session stability verification | ✅ | ✅ |
| Multi-language support | ✅ | ✅ |
| Detailed testing prompt | ✅ | ✅ |
| Code review checklist | ✅ | ✅ |
| Data collection infrastructure | ❌ | ✅ NEW |
| Learning value capture | ❌ | ✅ NEW |
| Performance metrics | ❌ | ✅ NEW |
| Modular harness config | ❌ | ✅ NEW |
| "Built for deletion" principle | ❌ | ✅ NEW |
| Failure analysis tools | ❌ | ✅ NEW |
| Automated insights generation | ❌ | ✅ NEW |
- **Data Collection Infrastructure**
  - Automated collection of failure cases, success patterns, and performance metrics
  - Structured data storage in JSONL format
  - Privacy-aware anonymization
- **Learning Value Capture**
  - Every failure is training data
  - Every success is a best practice
  - Feedback loop for continuous improvement
- **Performance Metrics System**
  - Reliability metrics (completion rate, regression rate, retry rate)
  - Efficiency metrics (development time, estimation accuracy)
  - Quality metrics (test coverage, code quality)
- **Modular Harness Configuration**
  - "Built for deletion" philosophy
  - Enable/disable modules based on model capabilities
  - Documented rationale for each constraint
- **Analysis Tools**
  - `analyze_failures.py`: identify common failure patterns
  - `generate_metrics.py`: generate comprehensive performance reports
  - Automated weekly/monthly analysis
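The reliability metrics could be derived from `session_log.jsonl` roughly as follows; the record schema (`result`, `regression` fields) is an assumption for illustration, not the output format of `generate_metrics.py`:

```python
import json

# Illustrative session_log.jsonl lines (schema is an assumption).
lines = [
    '{"feature": "F1", "result": "passed", "regression": false}',
    '{"feature": "F2", "result": "failed", "regression": false}',
    '{"feature": "F2", "result": "passed", "regression": true}',
    '{"feature": "F3", "result": "passed", "regression": false}',
]
sessions = [json.loads(line) for line in lines]

# Reliability metrics: share of sessions that completed, share that
# introduced a regression.
completion_rate = sum(s["result"] == "passed" for s in sessions) / len(sessions)
regression_rate = sum(s["regression"] for s in sessions) / len(sessions)
print(f"completion={completion_rate:.2f} regression={regression_rate:.2f}")
# completion=0.75 regression=0.25
```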
Philosophy: ADDS is not just a development framework—it's a data collection platform and infrastructure for model evolution.
GNU General Public License v3.0 (GPL-3.0)
See LICENSE file for full text.
Key features:
- ✅ Free to use, modify, and distribute
- ✅ Copyleft: derivative works must also be licensed under GPL-3.0
- ✅ Patent grant included
- ✅ Source code availability required