
feat: add per-SP piece cleanup job to bound storage growth#346

Open
Chaitu-Tatipamula wants to merge 8 commits into FilOzone:main from Chaitu-Tatipamula:feat/piece-cleanup

Conversation

@Chaitu-Tatipamula
Collaborator

Summary

Adds a piece_cleanup job type that periodically deletes the oldest pieces from each SP once total stored data exceeds a configurable quota (MAX_DATASET_STORAGE_SIZE_BYTES, default 24 GiB). Cleanup runs on the existing sp.work queue with singleton-per-SP semantics, ensuring it doesn't overlap with deal/retrieval work.

Closes #281

What it does

  • Quota-based cleanup: Deletes oldest pieces (FIFO) until SP is back under quota
  • Over-quota gating: Skips new deal creation for SPs that exceed the storage quota
  • Idempotent deletion: Treats contract reverts for already-deleted pieces as success and continues
  • Runtime cap: Aborts cleanup after MAX_PIECE_CLEANUP_RUNTIME_SECONDS (default 300s)
  • Rate-based scheduling: JOB_PIECE_CLEANUP_PER_SP_PER_HOUR (default 1), consistent with DEALS_PER_SP_PER_HOUR pattern
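The cleanup behavior above can be sketched as a small pure function. All names here (`Piece`, `cleanupUntilUnderQuota`, `deletePiece`) are illustrative, not the PR's actual API; the real `PieceCleanupService` works against the Deal repository and the Synapse SDK.

```typescript
interface Piece {
  id: string;
  sizeBytes: number;
  createdAt: number; // epoch ms; oldest pieces go first (FIFO)
}

interface CleanupResult {
  deleted: string[];
  bytesFreed: number;
  noProgress: boolean; // true when still over quota but nothing could be freed
}

function cleanupUntilUnderQuota(
  pieces: Piece[],
  storedBytes: number,
  quotaBytes: number,
  // deletePiece returns true on success or when the piece was already
  // deleted (idempotent deletion); false on a hard failure.
  deletePiece: (p: Piece) => boolean,
): CleanupResult {
  const candidates = [...pieces].sort((a, b) => a.createdAt - b.createdAt);
  const deleted: string[] = [];
  let bytesFreed = 0;

  for (const piece of candidates) {
    if (storedBytes - bytesFreed <= quotaBytes) break; // back under quota
    if (deletePiece(piece)) {
      deleted.push(piece.id);
      bytesFreed += piece.sizeBytes;
    }
  }

  const stillOver = storedBytes - bytesFreed > quotaBytes;
  // No-progress bail-out: if every deletion failed, report it instead of
  // looping forever over the same candidates.
  return { deleted, bytesFreed, noProgress: stillOver && bytesFreed === 0 };
}
```

With `storedBytes = 30`, `quotaBytes = 15`, and three 10-byte pieces, the loop deletes the two oldest and stops, mirroring the "delete oldest until back under quota" rule.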

Changes

Config & Entity

  • New env vars: MAX_DATASET_STORAGE_SIZE_BYTES, JOB_PIECE_CLEANUP_PER_SP_PER_HOUR, MAX_PIECE_CLEANUP_RUNTIME_SECONDS
  • Deal entity: cleaned_up (bool) and cleaned_up_at (timestamptz) columns
  • piece_cleanup added to JobType union
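A rough sketch of the defaults described above; the actual `app.config.ts` uses Joi validation and may name the config fields differently.

```typescript
// Hypothetical config loading for the three new env vars; field names
// are illustrative. Defaults mirror the PR description.
const pieceCleanupConfig = {
  maxDatasetStorageSizeBytes: Number(
    process.env.MAX_DATASET_STORAGE_SIZE_BYTES ?? 24 * 1024 ** 3, // 24 GiB
  ),
  jobPieceCleanupPerSpPerHour: Number(
    process.env.JOB_PIECE_CLEANUP_PER_SP_PER_HOUR ?? 1,
  ),
  maxPieceCleanupRuntimeSeconds: Number(
    process.env.MAX_PIECE_CLEANUP_RUNTIME_SECONDS ?? 300,
  ),
};
```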

Core Service (piece-cleanup/)

  • PieceCleanupService — cleanup loop, quota check, Synapse SDK integration
  • isProviderOverQuota() — used by deal handler to gate new deals
  • Shared Synapse instance with on-demand fallback
  • No-progress bail-out to prevent infinite loops when all deletions fail
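The gating check is conceptually a simple comparison. This is a minimal sketch only: the real `isProviderOverQuota()` computes stored bytes from the Deal table rather than taking them as a parameter.

```typescript
// Illustrative over-quota gate; in the service, storedBytes comes from a
// SUM over DEAL_CREATED deals for the SP.
function isProviderOverQuota(storedBytes: number, quotaBytes: number): boolean {
  return storedBytes > quotaBytes;
}
```

The deal handler would consult this before creating a new deal, so an SP that has exceeded the quota stops accumulating data until cleanup catches up.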

Jobs Integration (jobs/)

  • handlePieceCleanupJob with AbortController timeout
  • Over-quota check in handleDealJob
  • Rate-based interval in getIntervalSecondsForRates
  • Always-scheduled in ensureScheduleRows
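The runtime cap can be sketched with a plain `AbortController`. Names here (`runWithRuntimeCap`, `work`) are hypothetical; `handlePieceCleanupJob` wires the signal into the cleanup loop.

```typescript
// Run async work under a wall-clock cap, aborting via AbortSignal when
// the cap is hit. The finally block clears the timer in all paths
// (the kind of timer leak this PR fixed in handleDealJob).
async function runWithRuntimeCap<T>(
  maxRuntimeSeconds: number,
  work: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), maxRuntimeSeconds * 1000);
  try {
    return await work(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}
```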

Documentation

  • docs/environment-variables.md — 3 new env vars
  • docs/jobs.md — piece_cleanup in job types table, capacity formula
  • docs/runbooks/jobs.md — pause/resume/trigger SQL examples
  • docs/checks/production-configuration-and-approval-methodology.md — updated FAQ

Testing

  • 24 new tests in piece-cleanup.service.spec.ts
  • Updated jobs.service.spec.ts for new config structure
  • 292/292 tests pass, biome ci clean

Things to be considered

  1. Soft-delete vs hard-delete: Currently soft-deletes deal rows (cleaned_up=true).
  2. Failure backoff: Currently logs piece_cleanup_no_progress and retries the next hour. We could alert when this event appears 3+ times within a cleanup interval (1 hour).

…ce cleanup

- Add MAX_DATASET_STORAGE_SIZE_BYTES (24 GiB default), JOB_PIECE_CLEANUP_PER_SP_PER_HOUR (default 1),
  MAX_PIECE_CLEANUP_RUNTIME_SECONDS (default 300s) to app.config.ts
- Add IPieceCleanupConfig and extend IJobsConfig with cleanup rate/timeout fields
- Add cleanedUp and cleanedUpAt columns to Deal entity
- Add piece_cleanup to JobType union in job-schedule-state entity
- Add PIECE_CLEANUP_QUEUE constant to job-queues.ts
- Update docs/environment-variables.md with new variable documentation
- Add PieceCleanupService with while-loop cleanup until under quota
- Implement idempotent deletion (not found / already deleted = success)
- Add no-progress bail-out guard to prevent infinite loops
- Add isProviderOverQuota() for deal creation gating
- Add over-quota gating in handleDealJob (skip deal if SP above quota)
- Use jobs.maxPieceCleanupRuntimeSeconds for timeout in handlePieceCleanupJob
- Add PieceCleanupModule to JobsModule and AppModule imports
- Update jobs.service.spec.ts config mock for new structure
- Add 23 unit tests covering all edge cases
@FilOzzy FilOzzy added this to FOC Mar 10, 2026
@github-project-automation github-project-automation bot moved this to 📌 Triage in FOC Mar 10, 2026
- Removed unused `PIECE_CLEANUP_QUEUE` constant. Piece cleanup jobs correctly run on the shared `sp.work` singleton queue.
- Fixed a timeout leak in `handleDealJob` by moving the SP over-quota check before setting up the `AbortController` timer.
- Optimized `cleanupPiecesForProvider` to create a single `StorageContext` before the batch deletion loop.
@BigLep BigLep requested a review from Copilot March 10, 2026 23:54
@BigLep BigLep moved this from 📌 Triage to 🔎 Awaiting review in FOC Mar 10, 2026
Contributor

Copilot AI left a comment


Pull request overview

Adds a new “piece cleanup” mechanism to automatically delete old pieces from storage providers when they exceed a configured quota, and integrates it into the pg-boss job scheduler (including optional deal-job gating when over quota).

Changes:

  • Introduces PieceCleanupService (+ module + tests) to compute stored bytes per SP and delete oldest pieces until back under quota.
  • Extends pg-boss scheduling/worker handling with a new per-SP piece_cleanup job type and adds deal-job over-quota gating.
  • Adds new env/config knobs and documentation for quota size, cleanup rate, and cleanup runtime cap.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
docs/environment-variables.md Documents new env vars for quota + cleanup scheduling/runtime.
apps/backend/src/piece-cleanup/piece-cleanup.service.ts Implements quota check + cleanup loop and Synapse delete calls.
apps/backend/src/piece-cleanup/piece-cleanup.service.spec.ts Adds unit tests for stored-bytes calc, cleanup looping, abort/idempotency.
apps/backend/src/piece-cleanup/piece-cleanup.module.ts Wires cleanup service with TypeORM Deal repository + Wallet SDK.
apps/backend/src/jobs/jobs.service.ts Adds piece_cleanup job handling, schedules it per SP, and gates deal jobs when over quota.
apps/backend/src/jobs/jobs.service.spec.ts Updates schedule-row expectations and constructor deps for new service/config.
apps/backend/src/jobs/jobs.module.ts Imports PieceCleanupModule so JobsService can use it.
apps/backend/src/database/entities/job-schedule-state.entity.ts Extends JobType union to include piece_cleanup.
apps/backend/src/database/entities/deal.entity.ts Adds cleanedUp / cleanedUpAt fields to track deletions.
apps/backend/src/config/app.config.ts Adds Joi validation + config loading for new piece cleanup env vars.


Comment on lines +240 to +243
.select("COALESCE(SUM(deal.piece_size), 0)", "totalBytes")
.where("deal.sp_address = :spAddress", { spAddress })
.andWhere("deal.status = :status", { status: DealStatus.DEAL_CREATED })
.andWhere("deal.piece_id IS NOT NULL")

Copilot AI Mar 11, 2026


getStoredBytesForProvider() sums only deal.piece_size and doesn’t exclude rows with missing data_set_id, but cleanup candidates are filtered by dataSetId IS NOT NULL and bytesRemoved falls back to fileSize. This can undercount stored bytes (if piece_size is NULL/0) and/or treat non-cleanable rows as contributing to quota, leading to incorrect over-quota gating and cleanup behavior. Align the stored-bytes query with candidate selection (e.g., include data_set_id IS NOT NULL and sum COALESCE(piece_size, file_size) or whatever reflects actual stored bytes).

Suggested change
.select("COALESCE(SUM(deal.piece_size), 0)", "totalBytes")
.where("deal.sp_address = :spAddress", { spAddress })
.andWhere("deal.status = :status", { status: DealStatus.DEAL_CREATED })
.andWhere("deal.piece_id IS NOT NULL")
.select("COALESCE(SUM(COALESCE(deal.piece_size, deal.file_size)), 0)", "totalBytes")
.where("deal.sp_address = :spAddress", { spAddress })
.andWhere("deal.status = :status", { status: DealStatus.DEAL_CREATED })
.andWhere("deal.piece_id IS NOT NULL")
.andWhere("deal.data_set_id IS NOT NULL")

Collaborator Author


I agree with the alignment, but we're intentionally not adding COALESCE(piece_size, file_size) to the sum query.

file_size is the raw data size before Filecoin sealing. piece_size is the actual padded on-chain piece size, which is what physically occupies disk; a 10 MB file becomes a 16 MB piece after padding, so file_size consistently undercounts real disk usage. Using it in the quota calculation would make the quota metric less accurate.

For a deal to reach DEAL_CREATED status with a valid piece_id, piece_size should always be populated by the upload pipeline. If it's somehow null, that is a data integrity issue.

I am going to remove the fileSize fallback from the cleanup loop's bytesRemoved counter. Since piece_size is always populated alongside piece_id and DEAL_CREATED status, a piece_size = 0 candidate should never happen; if it somehow does, crediting 0 bytes and letting the no-progress bail-out terminate the loop is the correct behavior.

Let me know if there should be any corrections! @BigLep
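As a rough illustration of the padding math mentioned in the reply above (ignoring minimum piece size and other edge cases): Filecoin's Fr32 padding expands data by 128/127, and the padded piece size is rounded up to the next power of two, which is why a 10 MiB file lands in a 16 MiB piece.

```typescript
// Sketch of padded piece size: Fr32 expansion (x128/127), then round up
// to the next power of two. Simplified; not the Synapse SDK's actual API.
function paddedPieceSize(fileSizeBytes: number): number {
  const fr32 = Math.ceil((fileSizeBytes * 128) / 127);
  return 2 ** Math.ceil(Math.log2(fr32));
}
```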

- Drop dataSetId from the deletePiece
- Align getStoredBytesForProvider with getCleanupCandidates
- Add unit tests for over-quota gating in handleDealJob
…ounter

pieceSize is set atomically with pieceId and DEAL_CREATED status
by the upload pipeline. There is no real scenario where a cleanup
candidate has pieceSize = 0.
@BigLep
Contributor

BigLep commented Mar 12, 2026

Note: we may be delayed in looking at this since it's not a priority for the M4.1 mainnet readiness milestone we're working on.


Labels

None yet

Projects

Status: 🔎 Awaiting review

Development

Successfully merging this pull request may close these issues.

Cleanup old pieces

3 participants