[EMNLP 2025] OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking

zjunlp/OmniThink

OmniThink

Expanding Knowledge Boundaries in Machine Writing through Thinking

🔔News

  • 2025-08-24, We have added offline local search support using RAGFlow technology. You can now search local documents without an internet connection.
  • 2025-03-12, We have optimized the Docker usage for OmniThink.
  • 2025-02-20, We have added the evaluation methods from the paper to OmniThink, and we will integrate more evaluation methods in the future.
  • 2025-01-28, We have added support for the deepseek-reasoner model. You can run ./examples/deepseekr1.py to test OmniThink's performance with deepseek-reasoner.
Previous News
  • 2025-01-18, We open-sourced OmniThink, a machine writing framework.

🌻Acknowledgement

📖 Quick Start

  • 🌏 The Online Demo is available at ModelScope now!

📌 Introduction

Welcome to OmniThink, an innovative machine writing framework designed to replicate the human cognitive process of iterative expansion and reflection in generating insightful long-form articles.

  • Iterative Expansion and Reflection: OmniThink uses a unique mechanism that simulates human cognitive behaviors to deepen the understanding of complex topics.
  • Enhanced Knowledge Density: OmniThink focuses on expanding knowledge boundaries, resulting in articles that are rich in information and insights.
  • Comprehensive Article Generation: OmniThink constructs outlines and generates articles, delivering high-quality content that is both coherent and contextually robust.

🛠 Dependencies

📦 Conda

conda create -n OmniThink python=3.11
conda activate OmniThink
git clone https://github.com/zjunlp/OmniThink.git
cd OmniThink
# Install requirements
pip install -r requirements.txt

🔍 Local Search Support

OmniThink now supports offline local search using RAGFlow technology! This feature allows you to:

  • Search local documents without an internet connection
  • Use vector embeddings for semantic search
  • Index and retrieve your own document collections
  • Maintain data privacy with local-only processing

Local Search Features

  • OfflineRAGFlow: Core RAG engine with FAISS vector database
  • LocalSearch: DSPy-compatible search interface
  • Sentence Transformers: High-quality text embeddings
  • Smart Chunking: Intelligent document segmentation
  • Semantic Retrieval: Context-aware search results
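
To make the idea of semantic retrieval over embeddings concrete, here is a minimal pure-Python sketch of top-k search by cosine similarity. It is not OmniThink's implementation: the features above name Sentence Transformers embeddings and a FAISS index, whereas the helper names `cosine` and `top_k` and the toy 2-dimensional vectors below are hypothetical and serve only as an illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          doc_vecs: List[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar vectors."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Toy embeddings (hypothetical); a real setup would embed text chunks instead.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k([1.0, 0.0], doc_vecs, k=2))
```

A vector index such as FAISS performs the same kind of nearest-neighbour lookup, but at a scale where a linear scan like this one would be too slow.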

Quick Local Search Setup

from src.tools.rm import OfflineRAGFlow, LocalSearch

# Initialize the local RAG engine
rag_engine = OfflineRAGFlow(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    chunk_size=800,
    overlap=120,
    k=5
)

# Add documents to your local index
rag_engine.ingest(
    text="Your document content here...",
    meta={"title": "Document Title", "doc_id": "doc1"}
)

# Create DSPy-compatible search interface
local_search = LocalSearch(search=rag_engine, k=3)

# Use in your DSPy pipeline
results = local_search.forward("your search query")
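
The `chunk_size` and `overlap` arguments above suggest sliding-window chunking, where consecutive chunks share some text so that a sentence cut at a boundary still appears whole in one chunk. The sketch below illustrates that general technique under stated assumptions: it counts characters (the unit OmniThink uses is not specified here), and `chunk_text` is a hypothetical helper, not part of `OfflineRAGFlow`.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 120) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_text("x" * 2000, chunk_size=800, overlap=120)
print([len(c) for c in chunks])  # [800, 800, 640]
```

A larger overlap reduces the chance of splitting a relevant passage across chunks, at the cost of indexing more redundant text.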

🐳 Docker

git clone https://github.com/zjunlp/OmniThink.git
docker pull zjunlp/omnithink:latest
docker run -it zjunlp/omnithink:latest

🔑 Before running, please export the LM API key and the SEARCH key as environment variables:

export LM_KEY=YOUR_API_KEY
export SEARCHKEY=YOUR_SEARCHKEY
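
If you want to confirm that both keys are visible to Python before starting a long run, a small check along these lines may help. This is a sketch, not part of OmniThink: only the variable names `LM_KEY` and `SEARCHKEY` come from the commands above, and `missing_keys` is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Variable names taken from the export commands above.
REQUIRED_KEYS = ("LM_KEY", "SEARCHKEY")


def missing_keys(env: Optional[Mapping[str, str]] = None,
                 required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required keys are set.")
```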

Local Search Dependencies

For local search functionality, additional packages are required:

# Install local search dependencies
pip install sentence-transformers faiss-cpu numpy

# Or use the updated requirements.txt
pip install -r requirements.txt

You can define your own LM API and SEARCH API.

Note that the output of the LM should be a LIST.
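
One way to meet the list requirement when wrapping your own model endpoint is a thin adapter that normalizes whatever the endpoint returns. The sketch below is an assumption about how such a wrapper could look, not OmniThink's interface: `as_list_lm` and `fake_api` are hypothetical names, and the list is assumed to contain strings.

```python
from typing import Callable, List, Union

RawOutput = Union[str, List[str]]


def as_list_lm(call_api: Callable[[str], RawOutput]) -> Callable[[str], List[str]]:
    """Wrap a completion function so that it always returns a list of strings."""
    def wrapped(prompt: str) -> List[str]:
        output = call_api(prompt)
        if isinstance(output, str):
            return [output]  # a single completion becomes a one-element list
        return [str(item) for item in output]
    return wrapped


def fake_api(prompt: str) -> str:
    """Stand-in for a real LM call (hypothetical); returns one string."""
    return "echo: " + prompt


lm = as_list_lm(fake_api)
print(lm("hello"))  # ['echo: hello']
```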

Results in OmniThink

The performance of OmniThink is shown below:

Generate Article in OmniThink

Only one command is required:

sh run.sh

You can find the generated article, outline, and mind map in ./results/.

🔍 Evaluation

We provide convenient scripts for evaluating your method. The evaluation is divided into three categories: Rubric_Grading, Knowledge_Density, and Information_Diversity.

The evaluation uses the FActScore library. Please clone it into the eval directory before starting the evaluation:

cd eval
git clone https://github.com/shmsw25/FActScore.git

For Rubric Grading

python Rubric_Grading.py \
 --articlepath articlepath \
 --modelpath modelpath

For Information Diversity

python Information_Diversity.py \
 --mappath mappath \
 --model_path model_path

For Knowledge Density

python Knowledge_Density.py \
 --articlepath articlepath \
 --api_path api_path \
 --threads threads
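
For intuition about what a knowledge-density score measures, the sketch below assumes the metric is the number of unique atomic facts divided by the article length. This is an illustrative assumption, not the logic of `Knowledge_Density.py`: the facts are passed in directly instead of being extracted with FActScore, duplicates are detected by normalized string equality, length is counted in whitespace-separated words, and `knowledge_density` is a hypothetical function name.

```python
from typing import Iterable


def knowledge_density(atomic_facts: Iterable[str], article_text: str) -> float:
    """Unique atomic facts per word of article text (assumed definition)."""
    unique = set()
    for fact in atomic_facts:
        # Normalize case and whitespace so trivial variants count once.
        key = " ".join(fact.lower().split())
        if key:
            unique.add(key)

    length = len(article_text.split())
    return len(unique) / length if length else 0.0


facts = [
    "Paris is in France.",
    "paris  is in france.",  # duplicate after normalization
    "The Seine flows through Paris.",
]
print(knowledge_density(facts, "one two three four"))  # 0.5
```

Under this definition, padding an article with repeated facts lowers the score, because the length grows while the count of unique facts does not.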

Citation

If you find our repo useful in your research, please consider citing:

@misc{xi2025omnithinkexpandingknowledgeboundaries,
      title={OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking}, 
      author={Zekun Xi and Wenbiao Yin and Jizhan Fang and Jialong Wu and Runnan Fang and Ningyu Zhang and Jiang Yong and Pengjun Xie and Fei Huang and Huajun Chen},
      year={2025},
      eprint={2501.09751},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.09751}, 
}