Add Pocket TTS support to LiveKit Agents #4836
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9991bfd9bc
    if not producer_task.done():
        producer_task.cancel()
    with contextlib.suppress(BaseException):
        await producer_task
Stop producer thread before releasing generation lock
When synthesis times out or is cancelled, this finalizer only calls producer_task.cancel() on an asyncio.to_thread task; cancellation does not stop the underlying worker thread, so generate_audio_stream(...) can keep running after _push_generated_audio exits and releases _generation_lock. In practice, a request that hits conn_options.timeout can leave a background generation active while the next request starts, defeating the serialization guarantee and causing concurrent model access/thread leaks under repeated timeouts.
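The behavior described in this comment can be reproduced in isolation. The following is a minimal sketch, not code from the PR: `slow_generation` is a hypothetical stand-in for the blocking `generate_audio_stream(...)` call, and the timings are arbitrary. It shows that cancelling a task created from `asyncio.to_thread` marks the task as cancelled while the worker thread keeps running to completion.

```python
import asyncio
import threading
import time


def slow_generation(finished: threading.Event) -> None:
    """Hypothetical blocking call; it has no way to observe task cancellation."""
    time.sleep(0.2)
    finished.set()  # still reached after the awaiting task was cancelled


async def demo() -> tuple[bool, bool]:
    finished = threading.Event()
    task = asyncio.create_task(asyncio.to_thread(slow_generation, finished))
    await asyncio.sleep(0.05)  # give the worker thread time to start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    task_cancelled = task.cancelled()
    # Wait for the worker off the event loop to see whether it ran to the end.
    thread_finished = await asyncio.to_thread(finished.wait, 2.0)
    return task_cancelled, thread_finished


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Both values come back true: the task reports cancellation, yet the thread finishes its work. Any lock released at the point of cancellation is therefore released while the generation is still in progress.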
…guration for pocket_tts module
- Fix P1 producer thread leak: add threading.Event stop signal with stop_event.set() in finally block to properly stop producer on timeout
- Replace queue.Queue with asyncio.Queue + loop.call_soon_threadsafe for thread-safe producer-consumer communication
- Add text sanitization (_sanitize_tts_text) to strip markdown formatting
- Add text chunking (_chunk_tts_text) with max 220 chars for lower latency
- Replace asyncio.Lock with asyncio.Semaphore(max_concurrent_generations)
- Add built-in debug telemetry with wall_to_audio_ratio warnings
- Improve channel detection heuristic (axis with dim <= 8 = channels)
- Make PocketTTS the primary class name (TTS = PocketTTS alias)
- Raise ValueError for non-native sample_rate instead of warning+ignore
- Silence pocket_tts library loggers to reduce console spam
- Resolve uv.lock merge conflicts from upstream rebase

Co-Authored-By: Claude Opus 4.6 <[email protected]>
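The first two items of this commit message can be illustrated with a small, self-contained sketch. It is not the PR's implementation: `fake_audio_stream`, `produce`, and `consume` are hypothetical names, and the never-ending generator only stands in for the model's blocking chunk stream. The worker thread checks a `threading.Event` between chunks, hands each chunk to the loop with `loop.call_soon_threadsafe` (because `asyncio.Queue` is not thread-safe), and the consumer sets the event in `finally` and then waits for the thread to exit.

```python
import asyncio
import itertools
import threading
import time
from typing import Iterator, Optional


def fake_audio_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for a blocking chunk generator that never ends."""
    for i in itertools.count():
        time.sleep(0.001)  # simulate model work per chunk
        yield bytes([i % 256]) * 4


def produce(
    loop: asyncio.AbstractEventLoop,
    out: "asyncio.Queue[Optional[bytes]]",
    stop_event: threading.Event,
) -> None:
    """Runs in a worker thread and checks stop_event between chunks."""
    try:
        for chunk in fake_audio_stream():
            if stop_event.is_set():
                break
            # Schedule the put on the loop thread instead of touching the queue here.
            loop.call_soon_threadsafe(out.put_nowait, chunk)
    finally:
        loop.call_soon_threadsafe(out.put_nowait, None)  # end-of-stream sentinel


async def consume(max_chunks: int) -> int:
    loop = asyncio.get_running_loop()
    out: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue()
    stop_event = threading.Event()
    producer = asyncio.create_task(asyncio.to_thread(produce, loop, out, stop_event))
    received = 0
    try:
        while received < max_chunks:
            chunk = await out.get()
            if chunk is None:
                break
            received += 1
    finally:
        stop_event.set()  # ask the worker thread to stop
        await producer    # and wait until it actually has
    return received


if __name__ == "__main__":
    print(asyncio.run(consume(3)))
```

Because the stand-in generator never ends on its own, `consume` only returns if the stop signal reaches the thread. A real generator that blocks for a long time inside a single step would still delay shutdown until that step completes.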
Force-pushed: 86b99f2 to 508b39f
… and consistency

- Rename PocketTTS to TTS for better alignment with class usage
- Update references in the codebase to reflect the new class name
- Adjust error messages and logging for consistency
- Ensure all related tests are updated to use the new class name
    cleaned_lines: list[str] = []
    for raw_line in normalized.split("\n"):
        line = raw_line.strip()
        if not line:
            continue
        line = _BULLET_PREFIX_RE.sub("", line)
        line = line.lstrip("#> ").strip()
        line = line.replace("**", "")
        line = line.replace("__", "")
        line = line.replace("`", "")
        line = line.replace("*", "")
        line = line.replace("|", " ")
        cleaned_lines.append(line)
🟡 SynthesizeStream raises APIError when pushed text sanitizes to empty
When text pushed to SynthesizeStream is non-whitespace but sanitizes to empty (e.g., "####", "**", "#"), _flush_text_buffer at line 373 returns early without calling start_segment/end_segment, so no audio is generated and no segments are created. After _run() completes, the base class _main_task (livekit-agents/livekit/agents/tts/tts.py:464-466) checks self._pushed_text.strip() — which is still the original non-empty text — and finds pushed_duration(idx=-1) <= 0.0, raising APIError("no audio frames were pushed for text: ..."). This triggers retries that all fail identically since _input_ch is already consumed. The ChunkedStream._run() correctly handles this at line 316 by setting self._input_text = "", but SynthesizeStream has no equivalent reset of self._pushed_text.
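The failure mode hinges on input that is non-empty before sanitization and empty after it. The sketch below is a simplified approximation of the sanitizer under review, not the PR's code: the `_BULLET_PREFIX_RE` pattern is an assumption (the real one is not shown here), joining lines with spaces is an assumption, and `speakable_or_none` is a hypothetical helper showing the kind of guard a caller could apply before deciding whether any audio is expected.

```python
import re
from typing import Optional

# Assumed pattern; the PR's actual _BULLET_PREFIX_RE is not shown in this thread.
_BULLET_PREFIX_RE = re.compile(r"^\s*(?:[-+*]|\d+[.)])\s+")


def sanitize(text: str) -> str:
    """Strip common markdown markers and drop lines with nothing left to speak."""
    cleaned = []
    for raw_line in text.replace("\r\n", "\n").split("\n"):
        line = raw_line.strip()
        if not line:
            continue
        line = _BULLET_PREFIX_RE.sub("", line)
        line = line.lstrip("#> ").strip()
        for token in ("**", "__", "`", "*"):
            line = line.replace(token, "")
        line = line.replace("|", " ").strip()
        if line:  # keep only lines that still have speakable content
            cleaned.append(line)
    return " ".join(cleaned)


def speakable_or_none(pushed_text: str) -> Optional[str]:
    """Hypothetical guard: None means there is nothing to synthesize."""
    return sanitize(pushed_text) or None


if __name__ == "__main__":
    for sample in ("####", "**", "# Hello **world**"):
        print(repr(sample), "->", repr(speakable_or_none(sample)))
```

With a guard like this, the stream can treat a `None` result as "no text was pushed" and reset whatever state the base class later inspects, rather than reporting missing audio for text that was never meant to be spoken.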
- Introduce mypy configuration to ignore missing imports for the mistralai module and its submodules, enhancing type checking flexibility.
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3 :: Only",
]
dependencies = ["livekit-agents>=1.4.1", "pocket-tts>=1.0.3"]
🔴 Missing numpy in declared package dependencies
The plugin directly imports numpy (import numpy as np at livekit-plugins/livekit-plugins-pocket/livekit/plugins/pocket/tts.py:27) and uses it extensively in _tensor_to_pcm_bytes, but numpy is not listed in the dependencies array of pyproject.toml. While numpy is listed as an optional dependency of livekit-agents (under the codecs extra at livekit-agents/pyproject.toml:61), it is not a core dependency. If a user installs livekit-plugins-pocket without the codecs extra, numpy may not be present, causing an ImportError at import time.
Suggested change:
    -dependencies = ["livekit-agents>=1.4.1", "pocket-tts>=1.0.3"]
    +dependencies = ["livekit-agents>=1.4.1", "pocket-tts>=1.0.3", "numpy>=1.26.0"]
    self._max_concurrent_generations = max_concurrent_generations
    self._model: Any = TTSModel.load_model(temp=temperature, lsd_decode_steps=lsd_decode_steps)
    self._voice_state: Any = self._load_voice_state(voice)
    self._generation_semaphore = asyncio.Semaphore(max_concurrent_generations)
🟡 Semaphore created in __init__ is bound to one event loop, breaks if used from a different loop
The asyncio.Semaphore is created in __init__ (tts.py:106), which may run in one event loop context (or no loop at all). In Python 3.10-3.11, a Semaphore created outside an async context will bind to the loop on first acquire. If a TTS instance is created and then used from a different event loop (e.g., created in one thread/context and used in another), the semaphore will raise RuntimeError or silently misbehave. While this is an edge case, other plugins in this repo typically create such primitives lazily or at usage time rather than in __init__.
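One way to follow the "create lazily" suggestion is sketched below. This is an illustration under assumptions, not the plugin's code: `LazySemaphoreOwner`, `_get_semaphore`, and `generate` are hypothetical names, and `generate` only simulates work. The semaphore is created on first use inside a running loop and recreated if the instance is later used from a different loop.

```python
import asyncio
from typing import Optional


class LazySemaphoreOwner:
    """Hypothetical holder that defers semaphore creation to first use."""

    def __init__(self, max_concurrent: int = 1) -> None:
        self._max_concurrent = max_concurrent
        self._semaphore: Optional[asyncio.Semaphore] = None
        self._semaphore_loop: Optional[asyncio.AbstractEventLoop] = None

    def _get_semaphore(self) -> asyncio.Semaphore:
        # Must be called from a coroutine, so a running loop always exists here.
        loop = asyncio.get_running_loop()
        if self._semaphore is None or self._semaphore_loop is not loop:
            self._semaphore = asyncio.Semaphore(self._max_concurrent)
            self._semaphore_loop = loop
        return self._semaphore

    async def generate(self, text: str) -> str:
        async with self._get_semaphore():
            await asyncio.sleep(0)  # placeholder for the real generation work
            return text.upper()


if __name__ == "__main__":
    owner = LazySemaphoreOwner(max_concurrent=1)
    # Two separate asyncio.run calls use two different event loops.
    print(asyncio.run(owner.generate("first loop")))
    print(asyncio.run(owner.generate("second loop")))
```

The trade-off is that the concurrency limit then applies per event loop rather than globally. If the underlying model must be serialized across threads as well, a `threading` primitive would be needed in addition.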
- Add a check to set _input_text to an empty string if no speakable content is found after sanitization.
- Update the _sanitize_tts_text function to strip whitespace and only append non-empty lines to the cleaned output.
This PR introduces a new livekit-plugins-pocket integration that adds support for the Kyutai Pocket TTS model in LiveKit Agents.