feat: add per-SP piece cleanup job to bound storage growth #346

Chaitu-Tatipamula wants to merge 8 commits into FilOzone:main from
Conversation
…ce cleanup

- Add MAX_DATASET_STORAGE_SIZE_BYTES (24 GiB default), JOB_PIECE_CLEANUP_PER_SP_PER_HOUR (default 1), MAX_PIECE_CLEANUP_RUNTIME_SECONDS (default 300s) to app.config.ts
- Add IPieceCleanupConfig and extend IJobsConfig with cleanup rate/timeout fields
- Add cleanedUp and cleanedUpAt columns to Deal entity
- Add piece_cleanup to JobType union in job-schedule-state entity
- Add PIECE_CLEANUP_QUEUE constant to job-queues.ts
- Update docs/environment-variables.md with new variable documentation
- Add PieceCleanupService with while-loop cleanup until under quota
- Implement idempotent deletion (not found / already deleted = success)
- Add no-progress bail-out guard to prevent infinite loops
- Add isProviderOverQuota() for deal creation gating
- Add over-quota gating in handleDealJob (skip deal if SP above quota)
- Use jobs.maxPieceCleanupRuntimeSeconds for timeout in handlePieceCleanupJob
- Add PieceCleanupModule to JobsModule and AppModule imports
- Update jobs.service.spec.ts config mock for new structure
- Add 23 unit tests covering all edge cases
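A minimal sketch of the while-loop cleanup with the no-progress bail-out described above, using in-memory stubs in place of the Deal repository and Synapse delete calls (the function shape and parameter names here are illustrative, not the PR's exact signatures):

```typescript
interface Deal {
  id: number;
  pieceSize: number;
}

// Hypothetical sketch of cleanupPiecesForProvider: delete candidates until
// the SP is back under quota, bailing out if no progress can be made.
async function cleanupPiecesForProvider(
  getStoredBytes: () => Promise<number>,
  getCandidates: () => Promise<Deal[]>,
  deletePiece: (deal: Deal) => Promise<void>,
  quotaBytes: number,
): Promise<number> {
  let deleted = 0;
  let storedBytes = await getStoredBytes();

  while (storedBytes > quotaBytes) {
    const candidates = await getCandidates();
    if (candidates.length === 0) break; // nothing left to clean

    let bytesRemoved = 0;
    for (const deal of candidates) {
      if (storedBytes - bytesRemoved <= quotaBytes) break; // back under quota
      await deletePiece(deal); // idempotent: not-found / already-deleted = success
      bytesRemoved += deal.pieceSize;
      deleted++;
    }

    // No-progress bail-out guard: prevents an infinite loop when every
    // candidate credits 0 bytes (e.g. a pieceSize = 0 data anomaly).
    if (bytesRemoved === 0) break;
    storedBytes -= bytesRemoved;
  }
  return deleted;
}
```

With a quota of 15 and 30 bytes stored across three 10-byte pieces, the loop deletes two pieces and stops once the remaining 10 bytes are under quota.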
- Removed unused `PIECE_CLEANUP_QUEUE` constant. Piece cleanup jobs correctly run on the shared `sp.work` singleton queue.
- Fixed a timeout leak in `handleDealJob` by moving the SP over-quota check before setting up the `AbortController` timer.
- Optimized `cleanupPiecesForProvider` to create a single `StorageContext` before the batch deletion loop.
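The timeout-leak fix above amounts to an ordering rule: run the cheap over-quota check before the timer exists, so a skipped deal never leaves a dangling timeout. A hedged sketch (all names here are hypothetical stand-ins, not the PR's actual handler):

```typescript
// Illustrative ordering for handleDealJob: gate first, then arm the timer.
async function handleDealJob(
  spAddress: string,
  isOverQuota: (sp: string) => Promise<boolean>,
  runDeal: (signal: AbortSignal) => Promise<void>,
  timeoutMs: number,
): Promise<"skipped" | "done"> {
  // 1. Gate first: if the SP is above quota, skip before any timer is created,
  //    so there is nothing to leak.
  if (await isOverQuota(spAddress)) {
    return "skipped";
  }

  // 2. Only now set up the AbortController timer for the actual deal work...
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    await runDeal(controller.signal);
    return "done";
  } finally {
    clearTimeout(timer); // ...and always clear it on every exit path
  }
}
```

Had the quota check come after `setTimeout`, an early `return "skipped"` without a `finally` would have left the timer pending.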
Pull request overview
Adds a new “piece cleanup” mechanism to automatically delete old pieces from storage providers when they exceed a configured quota, and integrates it into the pg-boss job scheduler (including optional deal-job gating when over quota).
Changes:

- Introduces PieceCleanupService (+ module + tests) to compute stored bytes per SP and delete oldest pieces until back under quota.
- Extends pg-boss scheduling/worker handling with a new per-SP piece_cleanup job type and adds deal-job over-quota gating.
- Adds new env/config knobs and documentation for quota size, cleanup rate, and cleanup runtime cap.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| docs/environment-variables.md | Documents new env vars for quota + cleanup scheduling/runtime. |
| apps/backend/src/piece-cleanup/piece-cleanup.service.ts | Implements quota check + cleanup loop and Synapse delete calls. |
| apps/backend/src/piece-cleanup/piece-cleanup.service.spec.ts | Adds unit tests for stored-bytes calc, cleanup looping, abort/idempotency. |
| apps/backend/src/piece-cleanup/piece-cleanup.module.ts | Wires cleanup service with TypeORM Deal repository + Wallet SDK. |
| apps/backend/src/jobs/jobs.service.ts | Adds piece_cleanup job handling, schedules it per SP, and gates deal jobs when over quota. |
| apps/backend/src/jobs/jobs.service.spec.ts | Updates schedule-row expectations and constructor deps for new service/config. |
| apps/backend/src/jobs/jobs.module.ts | Imports PieceCleanupModule so JobsService can use it. |
| apps/backend/src/database/entities/job-schedule-state.entity.ts | Extends JobType union to include piece_cleanup. |
| apps/backend/src/database/entities/deal.entity.ts | Adds cleanedUp / cleanedUpAt fields to track deletions. |
| apps/backend/src/config/app.config.ts | Adds Joi validation + config loading for new piece cleanup env vars. |
```ts
.select("COALESCE(SUM(deal.piece_size), 0)", "totalBytes")
.where("deal.sp_address = :spAddress", { spAddress })
.andWhere("deal.status = :status", { status: DealStatus.DEAL_CREATED })
.andWhere("deal.piece_id IS NOT NULL")
```
getStoredBytesForProvider() sums only deal.piece_size and doesn’t exclude rows with missing data_set_id, but cleanup candidates are filtered by dataSetId IS NOT NULL and bytesRemoved falls back to fileSize. This can undercount stored bytes (if piece_size is NULL/0) and/or treat non-cleanable rows as contributing to quota, leading to incorrect over-quota gating and cleanup behavior. Align the stored-bytes query with candidate selection (e.g., include data_set_id IS NOT NULL and sum COALESCE(piece_size, file_size) or whatever reflects actual stored bytes).
Suggested change:

```diff
-.select("COALESCE(SUM(deal.piece_size), 0)", "totalBytes")
+.select("COALESCE(SUM(COALESCE(deal.piece_size, deal.file_size)), 0)", "totalBytes")
 .where("deal.sp_address = :spAddress", { spAddress })
 .andWhere("deal.status = :status", { status: DealStatus.DEAL_CREATED })
 .andWhere("deal.piece_id IS NOT NULL")
+.andWhere("deal.data_set_id IS NOT NULL")
```
I agree with the alignment, but we're intentionally not adding COALESCE(piece_size, file_size) to the sum query.

file_size is the raw data size before Filecoin sealing. piece_size is the actual padded on-chain piece size, which is what physically occupies disk: a 10 MB file becomes a 16 MiB piece after padding, so file_size consistently undercounts real disk usage. Using it in a quota calculation would make the quota metric less accurate.

For a deal to reach DEAL_CREATED status with a valid piece_id, piece_size should always be populated by upload. If it's somehow null, that's a data-integrity issue.

I'm going to remove the fileSize fallback from the cleanup loop's bytesRemoved counter. Since piece_size is always populated alongside piece_id and DEAL_CREATED status, a piece_size = 0 candidate should never happen; if it somehow does, crediting 0 bytes and letting the no-progress bail-out terminate the loop is the correct behavior.
Let me know if there should be any corrections! @BigLep
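The 10 MB → 16 MiB figure in the reply above follows from how Filecoin pads pieces: Fr32 bit-padding expands the raw bytes by 128/127, and the piece size is then rounded up to the next power of two. A simplified model for illustration only (not the Synapse SDK's actual padding code):

```typescript
// Rough model of padded piece size: Fr32 overhead (×128/127), then round up
// to a power of two. Minimum piece size of 128 bytes is assumed here.
function paddedPieceSize(rawBytes: number): number {
  const fr32 = Math.ceil((rawBytes * 128) / 127); // Fr32 bit-padding overhead
  let piece = 128;
  while (piece < fr32) piece *= 2; // round up to the next power of two
  return piece;
}

const MB = 1_000_000;
const MiB = 1024 * 1024;
// A 10 MB file pads to ~10.08 MB, which rounds up to a 16 MiB piece —
// so summing file_size would undercount real disk usage by ~37% here.
paddedPieceSize(10 * MB) / MiB; // → 16
```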
- Drop dataSetId from the deletePiece
- Align getStoredBytesForProvider with getCleanupCandidates
- Add unit tests for over-quota gating in handleDealJob
…ounter

pieceSize is set atomically with pieceId and DEAL_CREATED status by the upload pipeline. There is no real scenario where a cleanup candidate has pieceSize = 0.
…ot a function" warnings
Note: we may be delayed looking at this since it's not a priority for the M4.1 mainnet readiness milestone we're working on.
Summary
Adds a piece_cleanup job type that periodically deletes the oldest pieces from each SP once total stored data exceeds a configurable quota (MAX_DATASET_STORAGE_SIZE_BYTES, default 24 GiB). Cleanup runs on the existing sp.work queue with singleton-per-SP semantics, ensuring it doesn't overlap with deal/retrieval work.

Closes #281
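One way singleton-per-SP semantics can be expressed on a shared pg-boss queue is to give every piece_cleanup job for a given SP the same singleton key, so at most one such job is queued or active per SP at a time. The payload shape, key format, and throttle option below are assumptions for illustration, not the PR's actual scheduling code:

```typescript
// Hypothetical send arguments for a per-SP piece_cleanup job on the shared
// "sp.work" queue. singletonKey dedupes per SP; singletonSeconds throttles
// to the configured hourly rate (JOB_PIECE_CLEANUP_PER_SP_PER_HOUR).
function pieceCleanupSendArgs(spAddress: string, perHour: number) {
  return [
    "sp.work", // shared queue also used for deal/retrieval work
    { type: "piece_cleanup", spAddress },
    {
      singletonKey: `piece_cleanup:${spAddress}`, // one queued job per SP
      singletonSeconds: Math.floor(3600 / perHour), // e.g. 3600s at 1/hour
    },
  ] as const;
}

// Usage against a live pg-boss instance would look roughly like:
//   await boss.send(...pieceCleanupSendArgs(sp.address, 1));
```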
What it does

- Deletes oldest pieces (FIFO) until SP is back under quota
- Runtime capped by MAX_PIECE_CLEANUP_RUNTIME_SECONDS (default 300s)
- Scheduled at JOB_PIECE_CLEANUP_PER_SP_PER_HOUR (default 1), consistent with the DEALS_PER_SP_PER_HOUR pattern

Changes
Config & Entity

- New env vars: MAX_DATASET_STORAGE_SIZE_BYTES, JOB_PIECE_CLEANUP_PER_SP_PER_HOUR, MAX_PIECE_CLEANUP_RUNTIME_SECONDS
- New cleaned_up (bool) and cleaned_up_at (timestamptz) columns
- piece_cleanup added to the JobType union

Core Service (piece-cleanup/)

- PieceCleanupService — cleanup loop, quota check, Synapse SDK integration
- isProviderOverQuota() — used by deal handler to gate new deals

Jobs Integration (jobs/)

- handlePieceCleanupJob with AbortController timeout
- Over-quota gating in handleDealJob
- Updates to getIntervalSecondsForRates and ensureScheduleRows

Documentation

- docs/environment-variables.md — 3 new env vars
- docs/jobs.md — piece_cleanup in job types table, capacity formula
- docs/runbooks/jobs.md — pause/resume/trigger SQL examples
- docs/checks/production-configuration-and-approval-methodology.md — updated FAQ

Testing

- piece-cleanup.service.spec.ts
- Updated jobs.service.spec.ts for new config structure

Things to be considered