Conversation
Pull request overview
Improves the daily digest pipeline to better satisfy Issue #44 by enforcing per-subscriber location relevance during evaluation and tightening send/log semantics to reduce “sent log ≠ delivered email” mismatches and improve retry idempotency.
Changes:
- Track job URLs per normalized location bucket and only evaluate unseen jobs from a subscriber’s own location bucket.
- Split evaluated jobs into “good matches” vs “low score” and adjust logging so good matches are logged only after a successful send (low-score jobs are logged even on send failure).
- Add tests covering send-failure logging, retry behavior, and location relevance; mark R7 roadmap steps as done.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `daily_task.py` | Implements location-bucket URL tracking and adjusts send/log ordering & retry semantics. |
| `tests/test_daily_task.py` | Adds regression tests for send/log integrity, retry behavior, and location relevance filtering. |
| `docs/strategy/ROADMAP.md` | Marks Issue #44 (R7) steps as completed. |
daily_task.py (Outdated)

```python
# Send succeeded — log ALL evaluated jobs and mark subscriber as sent
all_eval_ids = low_score_ids + good_match_ids
if all_eval_ids:
    log_sent_jobs(db, sub_id, all_eval_ids)
mark_subscriber_last_sent(db, sub_id)
```
mark_subscriber_last_sent() is now called after log_sent_jobs(). If log_sent_jobs() fails (DB/network issue), the subscriber won't get last_sent_at updated, which can cause duplicate sends for weekly cadence (and reruns in general) even though the email already went out. Consider marking last_sent_at immediately after a successful send_daily_digest() and treating log_sent_jobs() as best-effort (e.g., wrap/log exceptions) so a logging failure doesn’t lead to repeat emails.
Suggested change:

```diff
-# Send succeeded — log ALL evaluated jobs and mark subscriber as sent
-all_eval_ids = low_score_ids + good_match_ids
-if all_eval_ids:
-    log_sent_jobs(db, sub_id, all_eval_ids)
-mark_subscriber_last_sent(db, sub_id)
+# Send succeeded — first mark subscriber as sent, then best-effort log ALL evaluated jobs
+try:
+    mark_subscriber_last_sent(db, sub_id)
+except Exception:
+    log.exception(
+        " sub=%s — failed to mark last_sent_at; subscriber may receive duplicate digests",
+        sub_id,
+    )
+all_eval_ids = low_score_ids + good_match_ids
+if all_eval_ids:
+    try:
+        log_sent_jobs(db, sub_id, all_eval_ids)
+    except Exception:
+        log.exception(
+            " sub=%s — failed to log sent jobs; will retry evaluation next run",
+            sub_id,
+        )
```
daily_task.py (Outdated)

```python
# Split evaluated jobs by score threshold.
# Low-score IDs are always safe to log (we never want to re-evaluate them).
# Good-match IDs are only logged after a successful send so they retry
# on the next run if the email fails.
ej_urls = {id(ej): _job_url(ej) for ej in evaluated}
good_matches = [ej for ej in evaluated if ej.evaluation.score >= sub_min_score]
low_score_ids = [
    url_to_db_id[ej_urls[id(ej)]]
    for ej in evaluated
    if ej.evaluation.score < sub_min_score and ej_urls[id(ej)] in url_to_db_id
]
good_match_ids = [url_to_db_id[ej_urls[id(ej)]] for ej in good_matches if ej_urls[id(ej)] in url_to_db_id]
```
The ej_urls = {id(ej): _job_url(ej) for ej in evaluated} pattern is hard to follow and relies on object identity as a key. It would be clearer (and less error-prone) to compute the URL alongside each EvaluatedJob (e.g., build a list of (ej, url) pairs or a direct ej -> url mapping) and reuse that when building *_ids and email_jobs.
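A self-contained sketch of the suggested refactor, using toy `EvaluatedJob`/`Evaluation` stand-ins (the real code computes URLs via a `_job_url` helper): the URL is paired with each job once, so no `id()`-keyed dict is needed and both ID lists reuse the same pairs.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    score: float


@dataclass
class EvaluatedJob:
    url: str
    evaluation: Evaluation


def split_by_score(evaluated, url_to_db_id, sub_min_score):
    """Split evaluated jobs into good-match and low-score DB IDs."""
    # Compute each job's URL once, alongside the object itself.
    pairs = [(ej, ej.url) for ej in evaluated]
    good_match_ids = [
        url_to_db_id[url]
        for ej, url in pairs
        if ej.evaluation.score >= sub_min_score and url in url_to_db_id
    ]
    low_score_ids = [
        url_to_db_id[url]
        for ej, url in pairs
        if ej.evaluation.score < sub_min_score and url in url_to_db_id
    ]
    return good_match_ids, low_score_ids


jobs = [EvaluatedJob("a", Evaluation(0.9)), EvaluatedJob("b", Evaluation(0.2))]
good, low = split_by_score(jobs, {"a": 1, "b": 2}, 0.5)
assert good == [1] and low == [2]
```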
daily_task.py (Outdated)

```diff
@@ -113,6 +113,8 @@ def main() -> int:
     # ── 5. Search once per unique (query-set, location) ──────────────────
     # Collect all jobs keyed by title|company to avoid duplication
```

The comment says jobs are keyed by title|company, but the actual dedupe key includes title|company_name|location. Updating the comment would prevent confusion when reasoning about deduping and cache/log keys.

Suggested change:

```diff
-# Collect all jobs keyed by title|company to avoid duplication
+# Collect all jobs keyed by title|company_name|location to avoid duplication
```
daily_task.py (Outdated)

```diff
-unseen_urls = [url for url in all_urls if url_to_db_id.get(url) and url_to_db_id[url] not in sent_ids]
+sub_loc = normalize_location(sub.get("target_location") or "")
+sub_urls = location_urls.get(sub_loc, set())
+unseen_urls = [url for url in sub_urls if url_to_db_id.get(url) and url_to_db_id[url] not in sent_ids]
```
location_urls maps each bucket to a set, so sub_urls iteration order is non-deterministic. That makes the evaluation/email job ordering non-reproducible across runs, which can complicate debugging and lead to flaky behavior if downstream code assumes stable ordering. Consider sorting unseen_urls (or storing URLs in a deterministic sequence) before building unseen_jobs/calling evaluate_all_jobs.
Suggested change:

```diff
-unseen_urls = [url for url in sub_urls if url_to_db_id.get(url) and url_to_db_id[url] not in sent_ids]
+unseen_urls = sorted(
+    url
+    for url in sub_urls
+    if url_to_db_id.get(url) and url_to_db_id[url] not in sent_ids
+)
```
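A small runnable illustration of the concern, with made-up URLs and IDs: string hashing is randomized per interpreter run, so iterating a set of URLs can yield a different order each time, while `sorted()` pins the result regardless of insertion or hash order.

```python
# Toy stand-ins for the real per-location URL set and DB-ID lookup.
sub_urls = {
    "https://jobs.example/c",
    "https://jobs.example/a",
    "https://jobs.example/b",
}
url_to_db_id = {"https://jobs.example/a": 1, "https://jobs.example/b": 2}
sent_ids = {2}

# Sorting makes the output deterministic: set iteration order could yield
# the surviving URLs in any order, but sorted() always gives the same list.
unseen_urls = sorted(
    url
    for url in sub_urls
    if url_to_db_id.get(url) and url_to_db_id[url] not in sent_ids
)
assert unseen_urls == ["https://jobs.example/a"]
```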
…relevance.
Closes #44