Large language models (LLMs) have recently shown strong potential for Automated Program Repair (APR), yet most existing approaches remain unimodal and fail to leverage the rich diagnostic signals contained in visual artifacts such as screenshots and control-flow graphs. In practice, many bug reports convey critical information visually (e.g., layout breakage or missing widgets), but directly using such dense visual inputs often causes context loss and noise, making it difficult for MLLMs to ground visual observations into precise fault localization and executable patches. To bridge this semantic gap, we propose SVRepair, a multimodal APR framework with structured visual representation. SVRepair first fine-tunes a vision-language model, Structured Visual Representation (SVR), to uniformly transform heterogeneous visual artifacts into a semantic scene graph that captures GUI elements and their structural relations (e.g., hierarchy), providing normalized, code-relevant context for downstream repair. Building on the graph, SVRepair drives a coding agent to localize faults and synthesize patches, and further introduces an iterative visual-artifact segmentation strategy that progressively narrows the input to bug-centered regions to suppress irrelevant context and reduce hallucinations. Extensive experiments across multiple benchmarks demonstrate state-of-the-art performance: SVRepair achieves 36.47% accuracy on SWE-Bench M, 38.02% on MMCode, and 95.12% on CodeVision, validating the effectiveness of SVRepair for multimodal program repair.
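To make the idea of a scene graph with hierarchy and bug-centered cropping more concrete, here is a minimal sketch. It is not the SVR output schema: the `SceneNode` fields, the `find` and `crop_box` helpers, and the pixel margin are all hypothetical names chosen for illustration. It only shows how a tree of GUI elements with bounding boxes could be narrowed to the region around a few suspect elements.

```python
from dataclasses import dataclass, field


@dataclass
class SceneNode:
    """One GUI element in a hypothetical scene graph (field names are illustrative)."""
    node_id: str
    role: str                          # e.g. "button", "panel"
    bbox: tuple[int, int, int, int]    # (left, top, right, bottom) in pixels
    children: list["SceneNode"] = field(default_factory=list)


def find(node, node_id):
    """Depth-first search for a node by id; returns None if it is absent."""
    if node.node_id == node_id:
        return node
    for child in node.children:
        hit = find(child, node_id)
        if hit is not None:
            return hit
    return None


def crop_box(root, suspect_ids, margin=8):
    """Union bounding box of the suspect nodes, padded by `margin` and clamped to the root."""
    boxes = [n.bbox for n in (find(root, i) for i in suspect_ids) if n is not None]
    if not boxes:
        return root.bbox  # nothing to narrow to, so keep the full artifact
    left = max(root.bbox[0], min(b[0] for b in boxes) - margin)
    top = max(root.bbox[1], min(b[1] for b in boxes) - margin)
    right = min(root.bbox[2], max(b[2] for b in boxes) + margin)
    bottom = min(root.bbox[3], max(b[3] for b in boxes) + margin)
    return (left, top, right, bottom)
```

Repeating `crop_box` with a shrinking set of suspect ids is one simple way to picture progressive narrowing; the actual segmentation strategy in SVRepair may differ.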
📦 Model Weights: CodeFuse-SVR-8B
- Python 3.10+
- Agent: CodeFuse-Agent
- SWE-Bench-Multimodal: SWE-bench Multimodal
# Clone repository
git clone https://github.com/codefuse-ai/CodeFuse-SVR.git
cd CodeFuse-SVR
# Install dependencies
export PYTHONPATH=./
pip install -r requirements.txt

Process SWE-bench Multimodal data and download repositories:
# output.json will be used as input_data in subsequent steps
python src/utils/process_data_dl_repo.py --parquet-path /path/to/your/test-00000-of-00001.parquet --output-path /path/to/your/output.json --repo-dir /path/to/your/repo

Use the provided script:
# Run full SVRepair pipeline
bash scripts/full_run.sh

# API Configuration
export OPENAI_API_KEY="your-agent-api-key"
export VLM_API_KEY="your-vlm-api-key"
# System Configuration
export PYTHONPATH=./

{
"grommet__grommet-6282": {
"repo": "grommet/grommet",
"instance_id": "grommet__grommet-6282",
"base_commit": "xxxxx",
"patch": "",
"test_patch": "",
"problem_statement": "Bug description...",
"hints_text": "",
"created_at": "",
"image_assets": "{\"problem_statement\": [\"url/of/image1.png\", \"url/of/image2.png\"]}",
"version": "",
"FAIL_TO_PASS": "",
"PASS_TO_PASS": ""
}
}

SVRepair provides a comprehensive CLI with the following commands:
Run the complete SVRepair workflow:
python main.py full-run \
--input_data PATH \
--output_dir PATH \
--repo_path PATH \
--image_dir PATH \
--vlm_model MODEL \
--vlm_url URL \
--model_name MODEL \
--base_url URL \
[--temperature 0.0] \
[--max_workers 4] \
[--copy_repo] \
[--project_name svrepair]

Generate structured representations from visual artifacts:
python main.py generate-image-ir \
--model_name MODEL \
--base_url URL \
--input_data PATH \
--image_dir PATH \
--result_path PATH \
--output_dir PATH \
[--max_workers 4]

Generate code patches based on image IR:
python main.py generate-patch \
--image_ir_path PATH \
--output_dir PATH \
--repo_path PATH \
--model_name MODEL \
--base_url URL \
[--temperature 0.0] \
[--copy_repo]

Validate generated patches:
python main.py validation \
--image_ir_path PATH \
--result_path PATH \
--output_dir PATH \
--model_name MODEL \
--base_url URL \
[--max_workers 4] \
[--repo_path PATH]

Localize code segments based on visual context:
python main.py localization \
--repo_path PATH \
--image_dir PATH \
--output_dir PATH \
--model_name MODEL \
--base_url URL \
--result_path PATH

After running SVRepair, you'll find the following structure:
results/
├── image_ir_data.json # Image IR results
├── instance_1/ # Generated patch files
│   ├── cropped_image.png
│   ├── instance_1.patch
│   ├── resp.json
│   ├── subgraph_instance_1.json
│   └── ...
├── instance_2/
│   └── ...
├── swebench_image_cropped_instance.json # All generated patches
├── all_validation_failed_instance.json # Validation results
├── all_subgraphs_merged.json # Subgraph localization results
└── SVR-VL_result_path.json # patch-diff
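
Given the layout above, the per-instance patch files can be gathered with a few lines of Python. This is a sketch, not part of the SVRepair CLI: `collect_patches` is a hypothetical helper, and it assumes each instance directory contains a patch named after the directory (as `instance_1/instance_1.patch` suggests).

```python
from pathlib import Path


def collect_patches(results_dir):
    """Map instance directory name -> patch text for every <name>/<name>.patch found."""
    patches = {}
    for instance_dir in sorted(Path(results_dir).iterdir()):
        if not instance_dir.is_dir():
            continue  # skip top-level JSON summaries such as image_ir_data.json
        patch_file = instance_dir / f"{instance_dir.name}.patch"
        if patch_file.is_file():
            patches[instance_dir.name] = patch_file.read_text(encoding="utf-8")
    return patches


if __name__ == "__main__":
    # "results" is the directory name shown in the tree above.
    if Path("results").is_dir():
        for name, text in collect_patches("results").items():
            print(f"{name}: {len(text.splitlines())} patch lines")
```

Instances without a patch file are simply left out of the returned mapping, which makes it easy to compare against the list of expected instance ids.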
Patch File Format:
--- a/file/path
+++ b/file/path
@@ -10,7 +10,7 @@
- old code
+ new code

The SVRepair pipeline consists of 6 main stages:
- Image IR Generation: Convert visual artifacts to structured representations
- Patch Generation: Generate code patches based on IR and problem statements
- Validation: Validate patches using rule-based and agent-based methods
- Localization: Identify relevant code segments from visual context
- Refined Patch Generation: Generate improved patches using localization results
- Result Processing: Compile final results and generate evaluation files
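
As a quick orientation aid, the stages listed above can be lined up against the CLI subcommands documented earlier. The pairing below is an assumption based on matching names, not something this README states explicitly; stages with no dedicated subcommand in the documentation are marked `None`.

```python
# Stage names come from the list above. The stage -> subcommand pairing is an
# assumption inferred from the names of the documented CLI subcommands.
PIPELINE = [
    ("Image IR Generation", "generate-image-ir"),
    ("Patch Generation", "generate-patch"),
    ("Validation", "validation"),
    ("Localization", "localization"),
    ("Refined Patch Generation", None),  # no dedicated subcommand documented
    ("Result Processing", None),         # no dedicated subcommand documented
]


def describe(pipeline):
    """Return one human-readable line per stage, numbered in pipeline order."""
    lines = []
    for index, (stage, subcommand) in enumerate(pipeline, start=1):
        if subcommand:
            how = f"python main.py {subcommand}"
        else:
            how = "no standalone subcommand documented"
        lines.append(f"{index}. {stage}: {how}")
    return lines


if __name__ == "__main__":
    print("\n".join(describe(PIPELINE)))
```

Running all six stages end to end is what `python main.py full-run` is documented to do; the individual subcommands are useful when only one stage needs to be repeated.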
This project is licensed under the MIT License - see the LICENSE file for details.
@misc{tang2026svrepairstructuredvisualreasoning,
title={SVRepair: Structured Visual Reasoning for Automated Program Repair},
author={Xiaoxuan Tang and Jincheng Wang and Liwei Luo and Jingxuan Xu and Sheng Zhou and Dajun Chen and Wei Jiang and Yong Li},
year={2026},
eprint={2602.06090},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2602.06090},
}
