Conversation
Greptile SummaryThis PR improves transcript-based speaker detection by making it context-aware of already-known people in the user's profile. The two main changes are: (1) a session-scoped Key changes:
Issues found:
Confidence Score: 4/5Mostly safe to merge but the possessive-apostrophe false-positive in the known-name regex path should be fixed before landing to avoid incorrect speaker labelling in production One P1 logic issue: the \b anchor in _detect_known_name_from_text matches before possessive 's, so "this is bob's phone" incorrectly identifies Bob as the speaker. The broad "this is / it is / it's" introduction phrases make this more likely to surface than with the existing stricter patterns. The remaining P2 items (missed cache update on fallback, English-only patterns without documentation) are non-blocking quality concerns. backend/utils/speaker_identification.py — the _detect_known_name_from_text regex boundary logic needs a negative lookahead for possessives Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[segment.text arrives] --> B{detect_speaker_from_text\nwith known_names}
B --> C{Match 33-language\nregex patterns?}
C -- Yes --> D[_normalize_detected_name\nfrom regex capture]
C -- No --> E{known_names\nprovided?}
E -- No --> Z[return None]
E -- Yes --> F[_detect_known_name_from_text\nnormalize text + names to lowercase]
F --> G{re.search English\nself-reference patterns\nfor each known name}
G -- Match --> H[_normalize_detected_name\nof known_name]
G -- No match --> Z
D --> I[detected_name]
H --> I
I --> J{Cache hit in\nknown_people_by_name_key?}
J -- Yes --> K[Use cached person\nset detected_name = person name]
J -- No --> L[get_person_by_name\nFirestore fallback]
L -- Found --> M[Use found person\n⚠️ NOT added to cache]
L -- Not found --> N[create_person\nupdate cache]
K --> O[SpeakerLabelSuggestionEvent]
M --> O
N --> O
Reviews (1): Last reviewed commit: "feat(backend): improve transcript speake..." | Re-trigger Greptile |
| for known_name in sorted_names: | ||
| escaped_name = re.escape(normalize_person_name_key(known_name)) | ||
| for pattern in SELF_REFERENCE_KNOWN_NAME_PATTERNS: | ||
| if re.search(rf"\b{pattern.format(name=escaped_name)}\b", normalized_text): |
There was a problem hiding this comment.
Possessive
's triggers false-positive speaker detection
The final \b word-boundary anchor in the pattern matches between a word character and an apostrophe, so phrases like "this is bob's phone" or "it's anne-marie's idea" will still match the pattern and incorrectly attribute the segment to the known person.
Concretely, for known_name = "Bob" and normalized text "this is bob's phone":
- Pattern:
r"\bthis is\s+bob\b" \bfires betweenband'becausebis\wand'is\W— this IS a regex word boundary.
The broader "this is / it is / it's" phrases make this especially risky compared to the stricter "I am / My name is" anchors that were already present. The fix is to add a negative lookahead for possessives:
| if re.search(rf"\b{pattern.format(name=escaped_name)}\b", normalized_text): | |
| if re.search(rf"\b{pattern.format(name=escaped_name)}\b(?!['\u2019]s)", normalized_text): |
| if person is None: | ||
| person = user_db.get_person_by_name(uid, detected_name) | ||
| if person: | ||
| person_id = person['id'] | ||
| detected_name = person['name'] |
There was a problem hiding this comment.
Successful
get_person_by_name result not added to cache
When the local cache misses but the Firestore fallback (get_person_by_name) succeeds, the found person is never stored in known_people_by_name_key. For conversations where a speaker repeatedly uses the same self-introduction phrase, every occurrence triggers a redundant Firestore read instead of being served from cache after the first hit.
The creation path at line 2276 correctly updates the cache; the fallback-hit path should mirror it:
| if person is None: | |
| person = user_db.get_person_by_name(uid, detected_name) | |
| if person: | |
| person_id = person['id'] | |
| detected_name = person['name'] | |
| if person is None: | |
| person = user_db.get_person_by_name(uid, detected_name) | |
| if person: | |
| known_people_by_name_key[normalize_person_name_key(person['name'])] = person |
| SELF_REFERENCE_KNOWN_NAME_PATTERNS = [ | ||
| r"(?:i am|i'm|my name is|this is|it is|it's|call me|name is)\s+{name}", | ||
| r"{name}\s+(?:here|speaking)", | ||
| ] |
There was a problem hiding this comment.
Known-name patterns are English-only, silently skipping 32 other languages
SELF_REFERENCE_KNOWN_NAME_PATTERNS uses exclusively English introduction phrases (i am, this is, my name is, etc.). The 33 language patterns in SPEAKER_IDENTIFICATION_PATTERNS handle non-English phrases but require uppercase ASR output. If a French ASR produces "je suis pierre" and "Pierre" is in the user's contacts, neither path will match.
Consider adding a comment to document this scope explicitly so future contributors don't assume cross-language support:
# NOTE: These patterns are English-only. Lowercase-ASR normalization for other
# languages is not yet supported — non-English patterns in
# SPEAKER_IDENTIFICATION_PATTERNS still require title-cased ASR output.
SELF_REFERENCE_KNOWN_NAME_PATTERNS = [
...
]|
Follow-up pushed on head 2ff07e1 to address the review notes:\n\n- fixed the possessive false-positive in known-name detection ( his is bob's ... no longer resolves to Bob)\n- cached successful get_person_by_name fallback hits back into the session map to avoid repeated Firestore reads\n- documented the known-name fallback as an English-only heuristic for now\n- added a regression test for the possessive case\n\nValidation in this environment:\n- python -m py_compile utils/speaker_identification.py routers/transcribe.py tests/unit/test_speaker_id_pipeline.py\n- targeted inline Python smoke checks for known-name matching, including the possessive regression |
|
Follow-up pushed on the current head to close the remaining real regression here. The known-name fallback no longer treats possessive phrases like \ his is bob's phone\ or \ his is bob’s phone\ as a self-introduction, and I added a regression check for both variants. Re-verified with \python -m py_compile backend/utils/speaker_identification.py backend/routers/transcribe.py backend/tests/unit/test_speaker_id_pipeline.py\ plus a targeted inline Python smoke check of the known-name path in this environment. |
|
Closing — unsolicited AI-generated PR. Please open an issue or discuss before submitting PRs. by AI for @beastoin |
|
Hey @rbxict 👋 Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request. After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:
Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out. Thank you for being part of the Omi community! 💜 |
Summary
/listenflow so lowercase ASR transcripts likethis is bobcan resolve to existing people without duplicate person creationthis is bob's phoneandthis is bob?s phoneso they do not get mislabeled as self-introductionsWhy this approach
Testing
python -m py_compile backend/utils/speaker_identification.py backend/routers/transcribe.py backend/tests/unit/test_speaker_id_pipeline.pypytest backend/tests/unit/test_speaker_id_pipeline.pywas not runnable here because this local environment is missing the backend test dependencies needed for import-time modulesPayout Details
Wallet (EVM): 0xe744f6791a685b0A0cC316ED44375B69361c837F
Solana: 8BsByR6rPqxDPku6dYtdoiSk6bdgE9YETbLQF2RGSw1C
Closes #3039
PoA:
poa_21ba67cc5737dade