feat: solution for issue #3039 by rbxict · Pull Request #6136 · BasedHardware/omi

rbxict · 2026-03-28T20:38:38Z

Summary

make transcript-based speaker detection context-aware for already known people instead of relying only on exact-case regex matches
normalize known person names locally in the /listen flow so lowercase ASR transcripts like this is bob can resolve to existing people without duplicate person creation
reject possessive phrases like this is bob's phone and this is bob?s phone so they do not get mislabeled as self-introductions
add focused regression coverage for lowercase ASR, multi-word names, apostrophes/hyphens, and false-positive addressee or possessive cases

Why this approach

keeps inference fast enough for live transcripts
adds no new model downloads or runtime dependencies
improves real production cases where ASR lowercases names but the user already has a matching person profile

Testing

python -m py_compile backend/utils/speaker_identification.py backend/routers/transcribe.py backend/tests/unit/test_speaker_id_pipeline.py
targeted inline Python smoke check for known-name transcript matching, including straight and curly-apostrophe possessive regressions
full pytest backend/tests/unit/test_speaker_id_pipeline.py was not runnable here because this local environment is missing the backend test dependencies needed for import-time modules

Payout Details

Wallet (EVM): 0xe744f6791a685b0A0cC316ED44375B69361c837F
Solana: 8BsByR6rPqxDPku6dYtdoiSk6bdgE9YETbLQF2RGSw1C

Closes #3039

PoA: poa_21ba67cc5737dade

greptile-apps · 2026-03-28T20:45:04Z

Greptile Summary

This PR improves transcript-based speaker detection by making it context-aware of already-known people in the user's profile. The two main changes are: (1) a session-scoped known_people_by_name_key cache built at stream-start from get_people, and (2) a new _detect_known_name_from_text path that matches against normalized (lowercase) known names using English self-reference phrases, allowing ASR output like "this is bob" to resolve to an existing "Bob" profile without creating a duplicate.

Key changes:

normalize_person_name_key / _normalize_detected_name utilities enable case-insensitive cache lookups and proper title-casing of matched names, including hyphenated and apostrophe-containing names
detect_speaker_from_text gains an optional known_names parameter; the 33-language regex path runs first, and the new known-name path is a fallback for lowercase ASR
The creation path in transcribe.py now stores newly created persons back into the cache to prevent repeat DB writes in the same session
Five new unit tests cover the targeted scenarios (lowercase ASR, multi-word names, punctuated names, addressee rejection, normalization)

Issues found:

The final \\b word-boundary anchor in _detect_known_name_from_text still fires before possessive 's, so \"this is bob's phone\" will false-positive as speaker detection for a known person "Bob" — the \"this is / it is\" phrases are broader than the existing "I am / My name is" patterns and make this more likely to trigger
When the get_person_by_name Firestore fallback succeeds, the result is not stored in known_people_by_name_key, causing a fresh Firestore read on every subsequent segment in the same session that matches the same speaker
SELF_REFERENCE_KNOWN_NAME_PATTERNS is English-only; non-English lowercase ASR is not handled by the new path, which is not documented

Confidence Score: 4/5

Mostly safe to merge but the possessive-apostrophe false-positive in the known-name regex path should be fixed before landing to avoid incorrect speaker labelling in production

One P1 logic issue: the \b anchor in _detect_known_name_from_text matches before possessive 's, so "this is bob's phone" incorrectly identifies Bob as the speaker. The broad "this is / it is / it's" introduction phrases make this more likely to surface than with the existing stricter patterns. The remaining P2 items (missed cache update on fallback, English-only patterns without documentation) are non-blocking quality concerns.

backend/utils/speaker_identification.py — the _detect_known_name_from_text regex boundary logic needs a negative lookahead for possessives

Important Files Changed

Filename	Overview
backend/utils/speaker_identification.py	Adds normalize_person_name_key, _normalize_detected_name, and _detect_known_name_from_text helpers plus English-only SELF_REFERENCE_KNOWN_NAME_PATTERNS; the final \b anchor in the detection regex produces false positives on possessive constructs like "this is bob's"
backend/routers/transcribe.py	Introduces a session-scoped known_people_by_name_key cache populated at stream start; integrates it into the text-based detection flow to avoid duplicate person creation; cache is not refreshed after a successful get_person_by_name fallback, causing repeated Firestore reads per recurrence
backend/tests/unit/test_speaker_id_pipeline.py	Adds five focused unit tests for the new known-name detection path covering lowercase ASR, multi-word names, punctuated names, addressee rejection, and key normalization; all cases are well-chosen and correctly structured

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[segment.text arrives] --> B{detect_speaker_from_text\nwith known_names}
    B --> C{Match 33-language\nregex patterns?}
    C -- Yes --> D[_normalize_detected_name\nfrom regex capture]
    C -- No --> E{known_names\nprovided?}
    E -- No --> Z[return None]
    E -- Yes --> F[_detect_known_name_from_text\nnormalize text + names to lowercase]
    F --> G{re.search English\nself-reference patterns\nfor each known name}
    G -- Match --> H[_normalize_detected_name\nof known_name]
    G -- No match --> Z
    D --> I[detected_name]
    H --> I
    I --> J{Cache hit in\nknown_people_by_name_key?}
    J -- Yes --> K[Use cached person\nset detected_name = person name]
    J -- No --> L[get_person_by_name\nFirestore fallback]
    L -- Found --> M[Use found person\n⚠️ NOT added to cache]
    L -- Not found --> N[create_person\nupdate cache]
    K --> O[SpeakerLabelSuggestionEvent]
    M --> O
    N --> O

_{Reviews (1): Last reviewed commit: "feat(backend): improve transcript speake..." | Re-trigger Greptile}

greptile-apps · 2026-03-28T20:45:08Z

backend/utils/speaker_identification.py

+    for known_name in sorted_names:
+        escaped_name = re.escape(normalize_person_name_key(known_name))
+        for pattern in SELF_REFERENCE_KNOWN_NAME_PATTERNS:
+            if re.search(rf"\b{pattern.format(name=escaped_name)}\b", normalized_text):


Possessive 's triggers false-positive speaker detection

The final \b word-boundary anchor in the pattern matches between a word character and an apostrophe, so phrases like "this is bob's phone" or "it's anne-marie's idea" will still match the pattern and incorrectly attribute the segment to the known person.

Concretely, for known_name = "Bob" and normalized text "this is bob's phone":

Pattern: r"\bthis is\s+bob\b"

\b fires between b and ' because b is \w and ' is \W — this IS a regex word boundary.

The broader "this is / it is / it's" phrases make this especially risky compared to the stricter "I am / My name is" anchors that were already present. The fix is to add a negative lookahead for possessives:

Suggested change

if re.search(rf"\b{pattern.format(name=escaped_name)}\b", normalized_text):

if re.search(rf"\b{pattern.format(name=escaped_name)}\b(?!['\u2019]s)", normalized_text):

greptile-apps · 2026-03-28T20:45:09Z

backend/routers/transcribe.py

+                        if person is None:
+                            person = user_db.get_person_by_name(uid, detected_name)
                        if person:
                            person_id = person['id']
+                            detected_name = person['name']


Successful get_person_by_name result not added to cache

When the local cache misses but the Firestore fallback (get_person_by_name) succeeds, the found person is never stored in known_people_by_name_key. For conversations where a speaker repeatedly uses the same self-introduction phrase, every occurrence triggers a redundant Firestore read instead of being served from cache after the first hit.

The creation path at line 2276 correctly updates the cache; the fallback-hit path should mirror it:

Suggested change

if person is None:

person = user_db.get_person_by_name(uid, detected_name)

if person:

person_id = person['id']

detected_name = person['name']

if person is None:

person = user_db.get_person_by_name(uid, detected_name)

if person:

known_people_by_name_key[normalize_person_name_key(person['name'])] = person

greptile-apps · 2026-03-28T20:45:10Z

backend/utils/speaker_identification.py

+SELF_REFERENCE_KNOWN_NAME_PATTERNS = [
+    r"(?:i am|i'm|my name is|this is|it is|it's|call me|name is)\s+{name}",
+    r"{name}\s+(?:here|speaking)",
+]


Known-name patterns are English-only, silently skipping 32 other languages

SELF_REFERENCE_KNOWN_NAME_PATTERNS uses exclusively English introduction phrases (i am, this is, my name is, etc.). The 33 language patterns in SPEAKER_IDENTIFICATION_PATTERNS handle non-English phrases but require uppercase ASR output. If a French ASR produces "je suis pierre" and "Pierre" is in the user's contacts, neither path will match.

Consider adding a comment to document this scope explicitly so future contributors don't assume cross-language support:

# NOTE: These patterns are English-only. Lowercase-ASR normalization for other # languages is not yet supported — non-English patterns in # SPEAKER_IDENTIFICATION_PATTERNS still require title-cased ASR output. SELF_REFERENCE_KNOWN_NAME_PATTERNS = [ ... ]

rbxict · 2026-03-28T20:49:42Z

Follow-up pushed on head 2ff07e1 to address the review notes:\n\n- fixed the possessive false-positive in known-name detection ( his is bob's ... no longer resolves to Bob)\n- cached successful get_person_by_name fallback hits back into the session map to avoid repeated Firestore reads\n- documented the known-name fallback as an English-only heuristic for now\n- added a regression test for the possessive case\n\nValidation in this environment:\n- python -m py_compile utils/speaker_identification.py routers/transcribe.py tests/unit/test_speaker_id_pipeline.py\n- targeted inline Python smoke checks for known-name matching, including the possessive regression

rbxict · 2026-03-29T22:01:33Z

Follow-up pushed on the current head to close the remaining real regression here. The known-name fallback no longer treats possessive phrases like \ his is bob's phone\ or \ his is bob’s phone\ as a self-introduction, and I added a regression check for both variants. Re-verified with \python -m py_compile backend/utils/speaker_identification.py backend/routers/transcribe.py backend/tests/unit/test_speaker_id_pipeline.py\ plus a targeted inline Python smoke check of the known-name path in this environment.

beastoin · 2026-03-30T02:16:30Z

Closing — unsolicited AI-generated PR. Please open an issue or discuss before submitting PRs. by AI for @beastoin

github-actions · 2026-03-30T02:16:41Z

Hey @rbxict 👋

Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request.

After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:

Project standards — Ensuring consistency across the codebase
User needs — Making sure changes align with what our users need
Code best practices — Maintaining code quality and maintainability
Project direction — Keeping aligned with our roadmap and vision

Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out.

Thank you for being part of the Omi community! 💜

feat(backend): improve transcript speaker detection for known people

27750cc

greptile-apps bot reviewed Mar 28, 2026

View reviewed changes

fix(backend): close speaker detection review gaps

2ff07e1

rbxict added 2 commits March 29, 2026 01:43

fix: reuse multilingual speaker name patterns

d66c775

fix(backend): reject possessive known-name matches

7e79ea4

beastoin closed this Mar 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: solution for issue #3039#6136

feat: solution for issue #3039#6136
rbxict wants to merge 4 commits intoBasedHardware:mainfrom
rbxict:bounty-fix-3039

rbxict commented Mar 28, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 28, 2026

Uh oh!

greptile-apps bot Mar 28, 2026

Uh oh!

greptile-apps bot Mar 28, 2026

Uh oh!

greptile-apps bot Mar 28, 2026

Uh oh!

rbxict commented Mar 28, 2026

Uh oh!

rbxict commented Mar 29, 2026

Uh oh!

beastoin commented Mar 30, 2026

Uh oh!

github-actions bot commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	if re.search(rf"\b{pattern.format(name=escaped_name)}\b", normalized_text):
	if re.search(rf"\b{pattern.format(name=escaped_name)}\b(?!['\u2019]s)", normalized_text):

Conversation

rbxict commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this approach

Testing

Payout Details

Uh oh!

greptile-apps bot commented Mar 28, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

rbxict commented Mar 28, 2026

Uh oh!

rbxict commented Mar 29, 2026

Uh oh!

beastoin commented Mar 30, 2026

Uh oh!

github-actions bot commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rbxict commented Mar 28, 2026 •

edited

Loading