Skip to content

feat: enhance daily digest logic for send/log integrity and location …#80

Merged
TheTrueAI merged 2 commits intomainfrom
r7
Mar 19, 2026
Merged

feat: enhance daily digest logic for send/log integrity and location …#80
TheTrueAI merged 2 commits intomainfrom
r7

Conversation

@TheTrueAI
Copy link
Owner

…relevance.

Closes #44

Copilot AI review requested due to automatic review settings March 19, 2026 21:41
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves the daily digest pipeline to better satisfy Issue #44 by enforcing per-subscriber location relevance during evaluation and tightening send/log semantics to reduce “sent log ≠ delivered email” mismatches and improve retry idempotency.

Changes:

  • Track job URLs per normalized location bucket and only evaluate unseen jobs from a subscriber’s own location bucket.
  • Split evaluated jobs into “good matches” vs “low score” and adjust logging so good matches are logged only after a successful send (low-score jobs are logged even on send failure).
  • Add tests covering send-failure logging, retry behavior, and location relevance; mark R7 roadmap steps as done.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
daily_task.py Implements location-bucket URL tracking and adjusts send/log ordering & retry semantics.
tests/test_daily_task.py Adds regression tests for send/log integrity, retry behavior, and location relevance filtering.
docs/strategy/ROADMAP.md Marks Issue #44 (R7) steps as completed.

daily_task.py Outdated
Comment on lines +275 to +279
# Send succeeded — log ALL evaluated jobs and mark subscriber as sent
all_eval_ids = low_score_ids + good_match_ids
if all_eval_ids:
log_sent_jobs(db, sub_id, all_eval_ids)
mark_subscriber_last_sent(db, sub_id)
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mark_subscriber_last_sent() is now called after log_sent_jobs(). If log_sent_jobs() fails (DB/network issue), the subscriber won't get last_sent_at updated, which can cause duplicate sends for weekly cadence (and reruns in general) even though the email already went out. Consider marking last_sent_at immediately after a successful send_daily_digest() and treating log_sent_jobs() as best-effort (e.g., wrap/log exceptions) so a logging failure doesn’t lead to repeat emails.

Suggested change
# Send succeeded — log ALL evaluated jobs and mark subscriber as sent
all_eval_ids = low_score_ids + good_match_ids
if all_eval_ids:
log_sent_jobs(db, sub_id, all_eval_ids)
mark_subscriber_last_sent(db, sub_id)
# Send succeeded — first mark subscriber as sent, then best-effort log ALL evaluated jobs
try:
mark_subscriber_last_sent(db, sub_id)
except Exception:
log.exception(
" sub=%s — failed to mark last_sent_at; subscriber may receive duplicate digests",
sub_id,
)
all_eval_ids = low_score_ids + good_match_ids
if all_eval_ids:
try:
log_sent_jobs(db, sub_id, all_eval_ids)
except Exception:
log.exception(
" sub=%s — failed to log sent jobs; will retry evaluation next run",
sub_id,
)

Copilot uses AI. Check for mistakes.
daily_task.py Outdated
Comment on lines +212 to +223
# Split evaluated jobs by score threshold.
# Low-score IDs are always safe to log (we never want to re-evaluate them).
# Good-match IDs are only logged after a successful send so they retry
# on the next run if the email fails.
ej_urls = {id(ej): _job_url(ej) for ej in evaluated}
good_matches = [ej for ej in evaluated if ej.evaluation.score >= sub_min_score]
low_score_ids = [
url_to_db_id[ej_urls[id(ej)]]
for ej in evaluated
if ej.evaluation.score < sub_min_score and ej_urls[id(ej)] in url_to_db_id
]
good_match_ids = [url_to_db_id[ej_urls[id(ej)]] for ej in good_matches if ej_urls[id(ej)] in url_to_db_id]
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ej_urls = {id(ej): _job_url(ej) for ej in evaluated} pattern is hard to follow and relies on object identity as a key. It would be clearer (and less error-prone) to compute the URL alongside each EvaluatedJob (e.g., build a list of (ej, url) pairs or a direct ej -> url mapping) and reuse that when building *_ids and email_jobs.

Copilot uses AI. Check for mistakes.
daily_task.py Outdated
@@ -113,6 +113,8 @@ def main() -> int:
# ── 5. Search once per unique (query-set, location) ──────────────────
# Collect all jobs keyed by title|company to avoid duplication
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says jobs are keyed by title|company, but the actual dedupe key includes title|company_name|location. Updating the comment would prevent confusion when reasoning about deduping and cache/log keys.

Suggested change
# Collect all jobs keyed by title|company to avoid duplication
# Collect all jobs keyed by title|company_name|location to avoid duplication

Copilot uses AI. Check for mistakes.
daily_task.py Outdated
unseen_urls = [url for url in all_urls if url_to_db_id.get(url) and url_to_db_id[url] not in sent_ids]
sub_loc = normalize_location(sub.get("target_location") or "")
sub_urls = location_urls.get(sub_loc, set())
unseen_urls = [url for url in sub_urls if url_to_db_id.get(url) and url_to_db_id[url] not in sent_ids]
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

location_urls is a set, so sub_urls iteration order is non-deterministic. That makes the evaluation/email job ordering non-reproducible across runs, which can complicate debugging and lead to flaky behavior if downstream assumes stable ordering. Consider sorting unseen_urls (or storing URLs in a deterministic sequence) before building unseen_jobs/calling evaluate_all_jobs.

Suggested change
unseen_urls = [url for url in sub_urls if url_to_db_id.get(url) and url_to_db_id[url] not in sent_ids]
unseen_urls = sorted(
url
for url in sub_urls
if url_to_db_id.get(url) and url_to_db_id[url] not in sent_ids
)

Copilot uses AI. Check for mistakes.
@TheTrueAI TheTrueAI merged commit a80bf1a into main Mar 19, 2026
4 checks passed
@TheTrueAI TheTrueAI deleted the r7 branch March 19, 2026 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Digest correctness: location relevance + send/log integrity + idempotency

2 participants