Skip to content

A powerful Python library for converting plain text files (.txt) to professional EPUB eBooks with intelligent chapter detection and AI-enhanced structure analysis.

License

Notifications You must be signed in to change notification settings

oomol-lab/txt-to-epub-converter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

14 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

TXT to EPUB Converter

PyPI version Python Versions License: MIT

A powerful Python library for converting plain text files (.txt) to professional EPUB eBooks with intelligent chapter detection and AI-enhanced structure analysis.

ไธญๆ–‡ๆ–‡ๆกฃ | English

โœจ Features

  • ๐Ÿ“š Intelligent Chapter Detection: Automatically identifies hierarchical structure (volumes, chapters, sections) using pattern matching
  • ๐Ÿค– AI-Enhanced Parsing (Optional): Integrates with OpenAI-compatible LLMs for improved chapter title generation and structure analysis
  • ๐ŸŽฏ Resume Support: Built-in checkpoint mechanism allows resuming interrupted conversions
  • ๐ŸŒ Multi-Language Support: Handles both Chinese (GB18030, GBK, UTF-8) and English text with automatic encoding detection
  • ๐Ÿ’ง Watermark Support: Optional watermark text for copyright protection
  • โœ… Content Validation: Automatic word count validation ensures conversion integrity
  • โšก Progress Tracking: Real-time progress bar with detailed status updates
  • ๐ŸŽจ Professional Formatting: Clean, readable EPUB output with proper CSS styling

๐Ÿš€ Installation

Install from PyPI (Recommended)

pip install txt-to-epub-converter

Install from Source

git clone https://github.com/yourusername/txt-to-epub-converter.git
cd txt-to-epub-converter
pip install -e .

Optional Dependencies

For AI-enhanced parsing (requires OpenAI-compatible API):

pip install txt-to-epub-converter[ai]

For development:

pip install txt-to-epub-converter[dev]

๐Ÿ“– Quick Start

Basic Usage

from txt_to_epub import txt_to_epub

# Simple conversion
result = txt_to_epub(
    txt_file="my_novel.txt",
    epub_file="output/my_novel.epub",
    title="My Novel",
    author="Author Name"
)

print(f"Conversion completed: {result['output_file']}")
print(f"Chapters: {result['chapters_count']}")
print(f"Validation: {'โœ“ Passed' if result['validation_passed'] else 'โœ— Failed'}")

Advanced Configuration

from txt_to_epub import txt_to_epub, ParserConfig

# Custom configuration
config = ParserConfig(
    # Chapter detection patterns
    chapter_patterns=[
        r'^็ฌฌ[0-9้›ถไธ€ไบŒไธ‰ๅ››ไบ”ๅ…ญไธƒๅ…ซไนๅ็™พๅƒ]+็ซ \s+.+$',  # Chinese: ็ฌฌ1็ซ  ๆ ‡้ข˜
        r'^Chapter\s+\d+[:\s]+.+$'                      # English: Chapter 1: Title
    ],

    # Enable AI assistance
    enable_llm_assistance=True,
    llm_api_key="your-api-key",
    llm_base_url="https://api.openai.com/v1",
    llm_model="gpt-4o-mini",

    # Watermark
    enable_watermark=True,
    watermark_text="ยฉ 2026 Author Name. All rights reserved.",

    # Content filtering
    min_chapter_length=100,  # Minimum characters per chapter
    max_chapter_length=50000 # Maximum characters per chapter
)

# Convert with custom config
result = txt_to_epub(
    txt_file="my_book.txt",
    epub_file="output/my_book.epub",
    title="My Book",
    author="Author Name",
    cover_image="cover.jpg",  # Optional cover image
    config=config,
    enable_resume=True         # Enable checkpoint resume
)

๐ŸŽฏ Use Cases

Converting Web Novels

Perfect for converting downloaded web novels with standard chapter formatting:

from txt_to_epub import txt_to_epub

result = txt_to_epub(
    txt_file="web_novel.txt",
    epub_file="web_novel.epub",
    title="Epic Fantasy Novel",
    author="Web Author"
)

Converting Technical Documentation

Handles technical books with hierarchical structure:

from txt_to_epub import txt_to_epub, ParserConfig

config = ParserConfig(
    volume_patterns=[r'^Part\s+\d+[:\s]+.+$'],
    chapter_patterns=[r'^Chapter\s+\d+[:\s]+.+$'],
    section_patterns=[r'^\d+\.\d+\s+.+$']
)

result = txt_to_epub(
    txt_file="programming_guide.txt",
    epub_file="programming_guide.epub",
    title="Programming Guide",
    author="Tech Writer",
    config=config
)

Batch Conversion

Convert multiple files efficiently:

from txt_to_epub import txt_to_epub
from pathlib import Path

txt_files = Path("books").glob("*.txt")

for txt_file in txt_files:
    epub_file = f"output/{txt_file.stem}.epub"

    try:
        result = txt_to_epub(
            txt_file=str(txt_file),
            epub_file=epub_file,
            title=txt_file.stem.replace("_", " ").title(),
            author="Collection"
        )
        print(f"โœ“ Converted: {txt_file.name}")
    except Exception as e:
        print(f"โœ— Failed: {txt_file.name} - {e}")

๐Ÿ› ๏ธ Configuration Options

ParserConfig Parameters

Parameter Type Default Description
chapter_patterns List[str] Built-in patterns Regex patterns for chapter detection
volume_patterns List[str] Built-in patterns Regex patterns for volume detection
section_patterns List[str] Built-in patterns Regex patterns for section detection
min_chapter_length int 50 Minimum characters per chapter
max_chapter_length int 100000 Maximum characters per chapter
enable_llm_assistance bool False Enable AI-enhanced parsing
llm_api_key str None OpenAI-compatible API key
llm_base_url str OpenAI URL API base URL
llm_model str "gpt-4o-mini" Model name
enable_watermark bool False Enable watermark
watermark_text str None Watermark text

txt_to_epub() Parameters

Parameter Type Required Description
txt_file str Yes Input TXT file path
epub_file str Yes Output EPUB file path
title str No Book title (default: "My Book")
author str No Author name (default: "Unknown")
cover_image str No Cover image path (PNG/JPG)
config ParserConfig No Custom configuration
show_progress bool No Show progress bar (default: True)
enable_resume bool No Enable checkpoint resume (default: False)

๐Ÿ“Š Output Structure

The converter generates EPUB files with the following structure:

output.epub
โ”œโ”€โ”€ Volume 1: Title
โ”‚   โ”œโ”€โ”€ Chapter 1: Title
โ”‚   โ”œโ”€โ”€ Chapter 2: Title
โ”‚   โ””โ”€โ”€ ...
โ”œโ”€โ”€ Volume 2: Title
โ”‚   โ””โ”€โ”€ ...
โ””โ”€โ”€ Chapter N: Title (standalone chapters without volumes)
    โ”œโ”€โ”€ Section 1.1
    โ””โ”€โ”€ Section 1.2

๐Ÿค– AI-Enhanced Features

When enable_llm_assistance=True:

  1. Smart Title Generation: Generates descriptive titles for chapters without clear titles
  2. Table of Contents Detection: Removes redundant TOC sections automatically
  3. Batch Processing: Processes multiple chapters in parallel for efficiency
  4. Cost Tracking: Reports API usage and costs

Example with AI:

from txt_to_epub import txt_to_epub, ParserConfig

config = ParserConfig(
    enable_llm_assistance=True,
    llm_api_key="sk-...",
    llm_model="gpt-4o-mini"  # Fast and cost-effective
)

result = txt_to_epub(
    txt_file="novel.txt",
    epub_file="novel.epub",
    title="My Novel",
    author="Author",
    config=config
)

# AI usage stats are logged automatically

๐Ÿ”„ Resume Feature

The resume feature allows you to continue interrupted conversions:

result = txt_to_epub(
    txt_file="large_book.txt",
    epub_file="large_book.epub",
    title="Large Book",
    author="Author",
    enable_resume=True  # Enable checkpoint resume
)

If the conversion is interrupted (Ctrl+C, crash, etc.), simply run the same command again. The converter will:

  • Detect the previous state file
  • Verify the source file hasn't changed
  • Resume from the last processed chapter
  • Clean up the state file when complete

๐Ÿ“ Content Validation

Every conversion includes automatic validation:

=== Conversion Content Integrity Report ===
Source file: my_novel.txt
Original characters: 123,456
Converted characters: 123,450
Match rate: 99.99%

โœ“ Content integrity verification passed

๐ŸŽจ Supported Text Formats

Chapter Title Formats

Chinese:

  • ็ฌฌไธ€็ซ  ๆ ‡้ข˜ (Traditional numbering)
  • ็ฌฌ1็ซ  ๆ ‡้ข˜ (Arabic numerals)
  • ็ฌฌ001็ซ  ๆ ‡้ข˜ (Zero-padded)
  • Chapter 1: ๆ ‡้ข˜ (Mixed)

English:

  • Chapter 1: Title
  • Chapter One: Title
  • CHAPTER 1 - TITLE
  • 1. Title

Volume/Book Formats

  • ็ฌฌไธ€ๅท ๆ ‡้ข˜ / ็ฌฌ1ๅท ๆ ‡้ข˜ (Chinese)
  • Volume 1: Title / Book 1: Title (English)
  • Part I: Title (Roman numerals)

๐Ÿงช Testing

Run the test suite:

# Install dev dependencies
pip install -e .[dev]

# Run tests
pytest

# Run with coverage
pytest --cov=txt_to_epub --cov-report=html

๐Ÿ“š Examples

Check the examples directory for complete examples:

๐Ÿค Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Development Setup

# Clone repository
git clone https://github.com/yourusername/txt-to-epub-converter.git
cd txt-to-epub-converter

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
pip install -e .[dev]

# Run tests
pytest

# Format code
black src/txt_to_epub

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • EbookLib - EPUB file generation
  • chardet - Character encoding detection
  • OpenAI - LLM assistance (optional)

๐Ÿ“ฎ Support

๐Ÿ—บ๏ธ Roadmap

  • Support for more eBook formats (MOBI, PDF)
  • GUI application
  • Command-line interface (CLI)
  • Cloud service integration
  • Enhanced AI features (style analysis, content summarization)
  • Multi-language UI

Made with โค๏ธ by the TXT to EPUB Converter Team

Star โญ this repository if you find it helpful!

About

A powerful Python library for converting plain text files (.txt) to professional EPUB eBooks with intelligent chapter detection and AI-enhanced structure analysis.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

No packages published

Languages