test: add 140 extensive edge-case tests across all modules#4
test: add 140 extensive edge-case tests across all modules#4Alex-Wengg merged 7 commits intomainfrom
Conversation
39d5e30 to
35a0771
Compare
Covers ITN and TN directions, sentence-level processing, roundtrip
consistency, boundary conditions, cross-tagger interference, and
realistic TTS preprocessing scenarios. Includes comprehensive date
testing (all 31 days, all 12 months, numeric formats, decades, year
verbalization).
Documents 7 known issues found during testing:
- Decimal tagger intercepts scale numbers before cardinal (e.g. "one million" → "1 million")
- Time tagger rejects hours 10-19 without AM/PM to avoid year conflicts
- "o clock" (with space) not matched by time tagger
- Measure tagger adds space before % ("50 %" not "50%")
- Sentence ITN consumes "and" between number groups
- Adjacent number words combined greedily in sentence mode
- Comma-attached tokens break sentence-mode date matching
Rename internal modules from academic naming (ITN/TN) to consumer- oriented naming (ASR/TTS) to match FluidAudio conventions. Fix three bugs flagged in PR review: - Year verbalization: X01-X09 years now produce "oh" form (1901 → "nineteen oh one" instead of "nineteen one") - URL parser: case-insensitive protocol detection now matches case-insensitive prefix stripping (HTTP:// works correctly) - number_to_words: i64::MIN no longer panics on negation overflow (uses wrapping_neg + u64 instead of -n)
9123b7f to
4afad25
Compare
| let cents: i64 = if cents_str.len() == 1 { | ||
| cents_str.parse::<i64>().ok()? * 10 | ||
| } else if cents_str.len() == 2 { | ||
| cents_str.parse().ok()? | ||
| } else { | ||
| // More than 2 decimal places — take first 2 | ||
| cents_str[..2].parse().ok()? | ||
| }; |
There was a problem hiding this comment.
🔴 Panic on currency input with trailing decimal point (e.g., "$5.")
In parse_dollars_cents, when the amount string ends with a dot (e.g., "$5."), splitn(2, '.') produces ["5", ""], making cents_str an empty string (len=0). The three-branch if/else if/else at lines 151-158 doesn't handle len=0: it falls through to the else branch which assumes cents_str.len() >= 3 and executes cents_str[..2], causing a panic (index out of bounds). This is reachable from the public tn_normalize("$5.") API — the money tagger's validation at src/tts/money.rs:90-96 allows dots, so "5." passes through to parse_dollars_cents. Compare with src/tts/decimal.rs:53 which correctly guards against empty fractional parts with if frac_str.is_empty() { return None; }.
| let cents: i64 = if cents_str.len() == 1 { | |
| cents_str.parse::<i64>().ok()? * 10 | |
| } else if cents_str.len() == 2 { | |
| cents_str.parse().ok()? | |
| } else { | |
| // More than 2 decimal places — take first 2 | |
| cents_str[..2].parse().ok()? | |
| }; | |
| let cents: i64 = if cents_str.is_empty() { | |
| 0 | |
| } else if cents_str.len() == 1 { | |
| cents_str.parse::<i64>().ok()? * 10 | |
| } else if cents_str.len() == 2 { | |
| cents_str.parse().ok()? | |
| } else { | |
| // More than 2 decimal places — take first 2 | |
| cents_str[..2].parse().ok()? | |
| }; |
Was this helpful? React with 👍 or 👎 to provide feedback.
| if clean.contains('.') { | ||
| let parts: Vec<&str> = clean.splitn(2, '.').collect(); | ||
| if parts.len() == 2 { | ||
| let int_val: i64 = parts[0].parse().ok()?; |
There was a problem hiding this comment.
🟡 ? operator inside for loop causes premature return from entire function instead of continuing to next unit match
In the measure TN tagger's parse() function, the ? operator on lines 155 and 169 propagates None out of the entire parse() function when a number fails to parse as i64. Since these are inside a for loop iterating over candidate unit matches, this aborts the loop prematurely instead of continue-ing to the next unit. For example, if a number like .5 (empty integer part) is matched against %, the "".parse::<i64>() fails and the function returns None immediately, without trying any remaining units. While the practical impact is limited (other units for the same input usually have the same number part), this violates the intended control flow of trying all units before giving up.
Prompt for agents
In src/tts/measure.rs, replace the two uses of the ? operator inside the for loop with continue statements to avoid prematurely returning None from the entire parse() function.
Line 155: Change `let int_val: i64 = parts[0].parse().ok()?;` to use a match or if-let that calls continue on parse failure:
let Ok(int_val) = parts[0].parse::<i64>() else { continue; };
Line 169: Change `let n: i64 = clean.parse().ok()?;` similarly:
let Ok(n) = clean.parse::<i64>() else { continue; };
This ensures that if a number fails to parse for one unit match, the loop continues trying other unit matches.
Was this helpful? React with 👍 or 👎 to provide feedback.
Summary
Known Issues Documented
normalize("one million")→"1 million"not"1000000"normalize("twelve fifteen")→"1215"not"12:15"normalize("two o clock")→ passthrough"o'clock"or"oclock", not"o clock"normalize("fifty percent")→"50 %"not"50%""{number} {unit}"words_to_numberfilters "and" as separatorTest plan