Add Pocket TTS support to LiveKit Agents#4836

Open
dvalle08 wants to merge 7 commits into livekit:main from dvalle08:feat/pocket-tts-plugin

@dvalle08

This PR introduces a new livekit-plugins-pocket integration that adds support for the Kyutai Pocket TTS model in LiveKit Agents.

@CLAassistant

CLAassistant commented Feb 15, 2026

CLA assistant check
All committers have signed the CLA.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9991bfd9bc

Comment on lines +202 to +205
if not producer_task.done():
    producer_task.cancel()
with contextlib.suppress(BaseException):
    await producer_task

P1: Stop producer thread before releasing generation lock

When synthesis times out or is cancelled, this finalizer only calls producer_task.cancel() on an asyncio.to_thread task; cancellation does not stop the underlying worker thread, so generate_audio_stream(...) can keep running after _push_generated_audio exits and releases _generation_lock. In practice, a request that hits conn_options.timeout can leave a background generation active while the next request starts, defeating the serialization guarantee and causing concurrent model access/thread leaks under repeated timeouts.
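The failure mode described above can be avoided with a cooperative stop signal. The following is a minimal sketch, not the PR's actual code: `produce_chunks` is a hypothetical stand-in for the blocking generator, and the queue hand-off via `loop.call_soon_threadsafe` is one possible design. The consumer sets a `threading.Event` in its `finally` block and then waits for the worker thread to exit before returning, so any lock held by the caller is released only after generation has really stopped.

```python
import asyncio
import threading
import time


def produce_chunks(loop, out_queue, stop_event, n_chunks=50):
    """Hypothetical blocking producer standing in for a streaming TTS generator."""
    produced = 0
    for i in range(n_chunks):
        if stop_event.is_set():  # cooperative cancellation point
            break
        time.sleep(0.01)  # simulate blocking synthesis work
        # asyncio.Queue is not thread-safe, so hand the item to the loop thread.
        loop.call_soon_threadsafe(out_queue.put_nowait, i)
        produced += 1
    loop.call_soon_threadsafe(out_queue.put_nowait, None)  # end-of-stream sentinel
    return produced


async def consume_with_timeout(timeout):
    """Drain the producer for at most `timeout` seconds, then stop it cleanly."""
    loop = asyncio.get_running_loop()
    out_queue = asyncio.Queue()
    stop_event = threading.Event()
    producer_task = asyncio.create_task(
        asyncio.to_thread(produce_chunks, loop, out_queue, stop_event)
    )
    received = 0

    async def drain():
        nonlocal received
        while True:
            item = await out_queue.get()
            if item is None:
                return
            received += 1

    try:
        await asyncio.wait_for(drain(), timeout)
    except asyncio.TimeoutError:
        pass  # the caller gave up; the thread is still running at this point
    finally:
        stop_event.set()  # ask the worker thread to stop
        produced = await producer_task  # wait until the thread has really exited
    return received, produced
```

Calling `asyncio.run(consume_with_timeout(0.05))` returns after the worker has observed the stop signal, so far fewer than 50 chunks are produced. The cost of this design is that cleanup can block for up to one producer iteration.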

@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

dvalle08 and others added 3 commits March 15, 2026 16:21
- Fix P1 producer thread leak: add threading.Event stop signal with
  stop_event.set() in finally block to properly stop producer on timeout
- Replace queue.Queue with asyncio.Queue + loop.call_soon_threadsafe for
  thread-safe producer-consumer communication
- Add text sanitization (_sanitize_tts_text) to strip markdown formatting
- Add text chunking (_chunk_tts_text) with max 220 chars for lower latency
- Replace asyncio.Lock with asyncio.Semaphore(max_concurrent_generations)
- Add built-in debug telemetry with wall_to_audio_ratio warnings
- Improve channel detection heuristic (axis with dim <= 8 = channels)
- Make PocketTTS the primary class name (TTS = PocketTTS alias)
- Raise ValueError for non-native sample_rate instead of warning+ignore
- Silence pocket_tts library loggers to reduce console spam
- Resolve uv.lock merge conflicts from upstream rebase

Co-Authored-By: Claude Opus 4.6 <[email protected]>
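
The chunking step mentioned in the commit message is not shown in this thread. As a rough illustration only, a greedy chunker with a 220-character cap could look like the sketch below. The names `chunk_text` and `_split_words` and the sentence-boundary regex are assumptions made for this example, not the PR's `_chunk_tts_text`.

```python
import re

# Assumed sentence boundary: whitespace that follows ., ! or ?
_SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def _split_words(sentence, max_chars):
    """Split one over-long sentence on word boundaries, hard-splitting huge words."""
    pieces = []
    current = ""
    for word in sentence.split():
        while len(word) > max_chars:  # a single word longer than the cap
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text, max_chars=220):
    """Greedily pack sentences into chunks of at most `max_chars` characters."""
    chunks = []
    current = ""
    for sentence in _SENTENCE_END_RE.split(text.strip()):
        if not sentence:
            continue
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            pieces = _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)  # current is non-empty whenever this runs
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Smaller chunks let the first audio frames arrive sooner, at the cost of more generation calls and possibly less natural prosody across chunk boundaries.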
@dvalle08 dvalle08 force-pushed the feat/pocket-tts-plugin branch from 86b99f2 to 508b39f on March 15, 2026 21:48
… and consistency

- Rename PocketTTS to TTS for better alignment with class usage
- Update references in the codebase to reflect the new class name
- Adjust error messages and logging for consistency
- Ensure all related tests are updated to use the new class name
@dvalle08 dvalle08 changed the title from "Add support for pocket plugin in pyproject.toml and update mypy configuration for pocket_tts module" to "Add Pocket TTS support to LiveKit Agents" on Mar 15, 2026

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

Comment on lines +402 to +414
cleaned_lines: list[str] = []
for raw_line in normalized.split("\n"):
    line = raw_line.strip()
    if not line:
        continue
    line = _BULLET_PREFIX_RE.sub("", line)
    line = line.lstrip("#> ").strip()
    line = line.replace("**", "")
    line = line.replace("__", "")
    line = line.replace("`", "")
    line = line.replace("*", "")
    line = line.replace("|", " ")
    cleaned_lines.append(line)

@devin-ai-integration devin-ai-integration bot Mar 15, 2026

🟡 SynthesizeStream raises APIError when pushed text sanitizes to empty

When text pushed to SynthesizeStream is non-whitespace but sanitizes to empty (e.g., "####", "**", "#"), _flush_text_buffer at line 373 returns early without calling start_segment/end_segment, so no audio is generated and no segments are created. After _run() completes, the base class _main_task (livekit-agents/livekit/agents/tts/tts.py:464-466) checks self._pushed_text.strip() — which is still the original non-empty text — and finds pushed_duration(idx=-1) <= 0.0, raising APIError("no audio frames were pushed for text: ..."). This triggers retries that all fail identically since _input_ch is already consumed. The ChunkedStream._run() correctly handles this at line 316 by setting self._input_text = "", but SynthesizeStream has no equivalent reset of self._pushed_text.
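
To make the empty-after-sanitization case concrete, here is a standalone sketch modeled on the commented lines. It makes several assumptions: `_BULLET_PREFIX_RE` is not shown in this thread, so the pattern below is a guess; joining lines with a space is also a guess; and `speakable_or_none` is a hypothetical helper illustrating the kind of guard a caller could apply before starting a segment.

```python
import re

# Assumed pattern; the PR's actual _BULLET_PREFIX_RE is not shown in this thread.
_BULLET_PREFIX_RE = re.compile(r"^\s*(?:[-+*]|\d+[.)])\s+")


def sanitize_tts_text(text):
    """Strip common markdown markers and drop lines left with nothing speakable."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    cleaned_lines = []
    for raw_line in normalized.split("\n"):
        line = raw_line.strip()
        if not line:
            continue
        line = _BULLET_PREFIX_RE.sub("", line)
        line = line.lstrip("#> ").strip()
        for token in ("**", "__", "`", "*"):
            line = line.replace(token, "")
        line = line.replace("|", " ").strip()
        if line:  # keep only lines that still have content after cleaning
            cleaned_lines.append(line)
    return " ".join(cleaned_lines)


def speakable_or_none(text):
    """Hypothetical guard: return cleaned text, or None when nothing can be spoken."""
    cleaned = sanitize_tts_text(text)
    return cleaned or None
```

With this sketch, inputs such as `"####"`, `"**"`, and `"#"` all clean to an empty string, so a caller that only checks the original text for non-whitespace would still believe there is something to synthesize.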

- Introduce mypy configuration to ignore missing imports for the mistralai module and its submodules, enhancing type checking flexibility.

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3 :: Only",
]
dependencies = ["livekit-agents>=1.4.1", "pocket-tts>=1.0.3"]

🔴 Missing numpy in declared package dependencies

The plugin directly imports numpy (import numpy as np at livekit-plugins/livekit-plugins-pocket/livekit/plugins/pocket/tts.py:27) and uses it extensively in _tensor_to_pcm_bytes, but numpy is not listed in the dependencies array of pyproject.toml. While numpy is listed as an optional dependency of livekit-agents (under the codecs extra at livekit-agents/pyproject.toml:61), it is not a core dependency. If a user installs livekit-plugins-pocket without the codecs extra, numpy may not be present, causing an ImportError at import time.

Suggested change
- dependencies = ["livekit-agents>=1.4.1", "pocket-tts>=1.0.3"]
+ dependencies = ["livekit-agents>=1.4.1", "pocket-tts>=1.0.3", "numpy>=1.26.0"]

self._max_concurrent_generations = max_concurrent_generations
self._model: Any = TTSModel.load_model(temp=temperature, lsd_decode_steps=lsd_decode_steps)
self._voice_state: Any = self._load_voice_state(voice)
self._generation_semaphore = asyncio.Semaphore(max_concurrent_generations)

🟡 Semaphore created in __init__ is bound to one event loop, breaks if used from a different loop

The asyncio.Semaphore is created in __init__ (tts.py:106), which may run in one event loop context (or no loop at all). In Python 3.10-3.11, a Semaphore created outside an async context will bind to the loop on first acquire. If a TTS instance is created and then used from a different event loop (e.g., created in one thread/context and used in another), the semaphore will raise RuntimeError or silently misbehave. While this is an edge case, other plugins in this repo typically create such primitives lazily or at usage time rather than in __init__.
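
One way to create the primitive lazily, as the comment suggests, is sketched below. `LazySemaphoreOwner` and its `generate` method are hypothetical names for illustration and do not reflect the plugin's API. The semaphore is built on first use inside a running loop and rebuilt if the instance is later used from a different loop.

```python
import asyncio
from typing import Optional


class LazySemaphoreOwner:
    """Hypothetical holder that creates its semaphore on first use in a running loop."""

    def __init__(self, max_concurrent: int = 1) -> None:
        self._max_concurrent = max_concurrent
        self._semaphore: Optional[asyncio.Semaphore] = None
        self._semaphore_loop: Optional[asyncio.AbstractEventLoop] = None

    def _get_semaphore(self) -> asyncio.Semaphore:
        loop = asyncio.get_running_loop()
        # Rebuild the semaphore when this instance is used from a new event loop.
        if self._semaphore is None or self._semaphore_loop is not loop:
            self._semaphore = asyncio.Semaphore(self._max_concurrent)
            self._semaphore_loop = loop
        return self._semaphore

    async def generate(self, text: str) -> str:
        async with self._get_semaphore():
            await asyncio.sleep(0)  # placeholder for the real generation work
            return text.upper()
```

A caveat of this sketch: the concurrency limit applies per event loop, so two loops running in different threads could still reach the model at the same time. A cross-thread guard would be needed if that scenario matters.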


- Add a check to set _input_text to an empty string if no speakable content is found after sanitization.
- Update the _sanitize_tts_text function to strip whitespace and only append non-empty lines to the cleaned output.