
fix(sarvam-tts): correct mime_type from audio/mp3 to audio/wav #5086

Merged
davidzhao merged 1 commit into livekit:main from shmundada93:fix/sarvam-tts-mime-type on Mar 12, 2026

@shmundada93
Contributor

Summary

The Sarvam TTS API returns WAV (RIFF) audio data, but the plugin declares mime_type="audio/mp3" in both ChunkedStream (line 670) and SynthesizeStream (line 710). This causes the LiveKit audio decoder to attempt MP3 decoding on WAV data, resulting in:

av.error.InvalidDataError: Invalid data found when processing input: 'avcodec_send_packet()'

followed by:

APIError: no audio frames were pushed for text: <text>

This affects all Sarvam TTS models (bulbul:v2, bulbul:v3-beta, bulbul:v3).
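Because the decoder chooses its codec from the declared mime_type rather than from the bytes themselves, this kind of mismatch can be caught by inspecting the payload's leading bytes. Below is a minimal sketch under that assumption. `sniff_audio_mime` is a hypothetical helper and is not part of the LiveKit Sarvam plugin or the Sarvam SDK. It tells a RIFF/WAVE header apart from the two common ways an MP3 stream begins: an ID3v2 tag, or a bare MPEG frame sync word.

```python
def sniff_audio_mime(data: bytes) -> str:
    """Guess an audio payload's container from its leading bytes.

    Hypothetical helper for illustration only; it returns the same
    mime_type strings that appear in this PR.
    """
    # WAV: "RIFF" <4-byte little-endian size> "WAVE"
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio/wav"
    # MP3: either an ID3v2 tag or an MPEG audio frame (11-bit sync word, all ones)
    if data[:3] == b"ID3":
        return "audio/mp3"
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "audio/mp3"
    raise ValueError("unrecognized audio container")


if __name__ == "__main__":
    wav_prefix = b"RIFF" + (36).to_bytes(4, "little") + b"WAVE"
    print(sniff_audio_mime(wav_prefix))  # audio/wav
```

A check like this could serve as a debug assertion next to the place where the mime_type is declared. The PR itself simply hard-codes the correct value, which is reasonable given that every model returns WAV.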

Root cause

Verified by calling the Sarvam REST API directly and inspecting the raw response:

import base64
raw = base64.b64decode(audios[0])  # audios: base64-encoded clips from the Sarvam response
raw[:4]  # b'RIFF' (WAV header, not MP3)

All models and sample rates (8000, 16000, 22050, 24000) return RIFF/WAV audio.
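The four-byte RIFF check confirms the container but not the format details. To also verify which sample rate a response actually carries (the PR lists 8000, 16000, 22050 and 24000 Hz), one could walk the RIFF chunks and read the standard `fmt ` fields with `struct`. This is a sketch that assumes a conventional PCM WAV layout. `read_wav_format` is a hypothetical helper and not an API of either SDK.

```python
import io
import struct
import wave


def read_wav_format(raw: bytes) -> dict:
    """Walk the RIFF chunks of a WAV payload and return its fmt fields."""
    if raw[:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE payload")
    offset = 12  # the first chunk starts after "RIFF" <size> "WAVE"
    while offset + 8 <= len(raw):
        chunk_id, chunk_size = struct.unpack_from("<4sI", raw, offset)
        if chunk_id == b"fmt ":
            audio_format, channels, sample_rate, _byte_rate, _block_align, bits = (
                struct.unpack_from("<HHIIHH", raw, offset + 8)
            )
            return {
                "audio_format": audio_format,  # 1 means integer PCM
                "channels": channels,
                "sample_rate": sample_rate,
                "bits_per_sample": bits,
            }
        # Chunks are word-aligned, so an odd-sized chunk is followed by one pad byte.
        offset += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no fmt chunk found")


if __name__ == "__main__":
    # Build a small synthetic WAV in memory with the stdlib writer.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(22050)
        w.writeframes(b"\x00\x00" * 100)
    print(read_wav_format(buf.getvalue()))
    # {'audio_format': 1, 'channels': 1, 'sample_rate': 22050, 'bits_per_sample': 16}
```

Against real output, `raw` would be the decoded `audios[0]` from the snippet above instead of a synthetic clip.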

Fix

Change mime_type="audio/mp3" to mime_type="audio/wav" in both:

  • ChunkedStream._run() (HTTP batch path)
  • SynthesizeStream._run() (WebSocket streaming path)

Test plan

  • Tested with bulbul:v2 (anushka, en-IN) — audio decodes correctly
  • Tested with bulbul:v3 (shubh, hi-IN) — audio decodes correctly
  • Tested with bulbul:v3 (ritu, en-IN) — audio decodes correctly
  • Tested with bulbul:v3 + temperature=0.3 — audio decodes correctly
  • Tested with bulbul:v3 + enable_preprocessing=True — audio decodes correctly
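The test plan above exercises the plugin against live Sarvam voices. A lighter offline regression check in the same spirit might base64-decode a WAV payload, parse it with Python's standard `wave` module, and assert that it yields a non-zero number of frames. This is only a sketch. It uses synthetic silent clips in place of real API output and does not go through LiveKit's PyAV-based decoder. Both helper names are hypothetical.

```python
import base64
import io
import wave


def decoded_frame_count(audio_b64: str) -> int:
    """Decode a base64 WAV clip with the stdlib and return its frame count."""
    raw = base64.b64decode(audio_b64)
    with wave.open(io.BytesIO(raw), "rb") as clip:
        return clip.getnframes()


def make_test_clip(sample_rate: int, n_frames: int) -> str:
    """Build a silent 16-bit mono WAV and base64-encode it like an API payload."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return base64.b64encode(buf.getvalue()).decode("ascii")


if __name__ == "__main__":
    for rate in (8000, 16000, 22050, 24000):
        clip = make_test_clip(rate, rate // 10)  # 100 ms of silence
        # A clip that decoded to zero frames would correspond to the
        # "no audio frames were pushed" failure described in the summary.
        assert decoded_frame_count(clip) == rate // 10
    print("all clips decoded")
```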

The Sarvam TTS API returns WAV (RIFF) audio data, but the plugin
declares mime_type="audio/mp3" in both ChunkedStream and
SynthesizeStream. This causes the LiveKit audio decoder to attempt
MP3 decoding on WAV data, resulting in:

  av.error.InvalidDataError: Invalid data found when processing input

Confirmed by inspecting raw API responses: all Sarvam TTS models
(bulbul:v2, v3-beta, v3) return base64-encoded WAV with RIFF headers.

This fix updates both emission points to use the correct mime_type.
@CLAassistant

CLAassistant commented Mar 11, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.


Member

@davidzhao davidzhao left a comment


lg

@davidzhao davidzhao merged commit 7f1a351 into livekit:main Mar 12, 2026
11 checks passed

3 participants