feat(cli): CPLYTM-1291 content sync tool org wide doc #2

Open
sonupreetam wants to merge 11 commits into complytime:main from sonupreetam:006-go-sync-tool

Conversation

@sonupreetam sonupreetam commented Mar 13, 2026

Summary

Adds a Go CLI tool (cmd/sync-content/) that syncs all repos registered in the complytime governance registry (peribolos.yaml), fetches README content and metadata via the GitHub REST API, applies Markdown transforms, and generates Hugo-compatible project pages and landing page card data. A declarative config overlay (sync-config.yaml) provides precision control for repos needing custom documentation layouts.

Key capabilities

  • Governance-driven repo listing: repos are sourced from peribolos.yaml — new repos appear on the site when added to the governance registry
  • Config-driven precision sync: sync-config.yaml controls per-repo file destinations, frontmatter injection, and transforms (strip_badges, rewrite_links, inject_frontmatter)
  • Content approval gate: .content-lock.json pins each repo to an approved branch SHA; production deploys fetch content at locked SHAs only
  • Two-tier SHA change detection: branch SHA for fast pre-filtering, README SHA for content-level accuracy
  • Dry-run by default: --write flag required for any disk I/O
  • Concurrent processing: bounded worker pool (--workers) with race-safe implementation
  • Stale content cleanup: manifest-based orphan tracking removes all generated files when repos are removed
  • CI integration: GITHUB_OUTPUT variables and GITHUB_STEP_SUMMARY for GitHub Actions
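
The two-tier SHA check listed above can be sketched roughly as follows. This is a minimal illustration and not the code in cmd/sync-content/: the `syncState` type, the `needsSync` function, and the `fetchReadmeSHA` callback are hypothetical names, and the sketch assumes the README SHA is only looked up when the branch SHA has moved.

```go
package main

import "fmt"

// syncState is a hypothetical record of what the previous sync saw for a repo.
type syncState struct {
	BranchSHA string // head commit of the tracked branch
	ReadmeSHA string // blob SHA of the README at that commit
}

// needsSync reports whether a repo's README changed since prev.
// Tier 1 compares branch SHAs and returns early when they match, so
// fetchReadmeSHA (a stand-in for an API call) runs only when needed.
// Tier 2 compares README SHAs, because a new commit may not touch the README.
func needsSync(prev syncState, branchSHA string, fetchReadmeSHA func() (string, error)) (bool, error) {
	if prev.BranchSHA == branchSHA {
		return false, nil
	}
	readmeSHA, err := fetchReadmeSHA()
	if err != nil {
		return false, fmt.Errorf("fetch README SHA: %w", err)
	}
	return readmeSHA != prev.ReadmeSHA, nil
}

func main() {
	// Placeholder SHAs for illustration only.
	prev := syncState{BranchSHA: "aaa111", ReadmeSHA: "r1"}
	lookup := func() (string, error) { return "r1", nil }

	changed, _ := needsSync(prev, "aaa111", lookup)
	fmt.Println("same branch SHA:", changed)

	changed, _ = needsSync(prev, "bbb222", lookup)
	fmt.Println("new commit, same README:", changed)
}
```

The early return is what makes the branch SHA a cheap pre-filter: unchanged repos cost one comparison and no extra request.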

CLI interface

| Flag | Default | Description |
| --- | --- | --- |
| --org | complytime | GitHub organization to scan |
| --token | $GITHUB_TOKEN | GitHub API token (or set env var) |
| --config | (none) | Path to sync-config.yaml for config-driven file syncs |
| --write | false | Required to write files to disk (default: dry-run) |
| --output | . | Hugo site root directory |
| --workers | 5 | Max concurrent repo processing goroutines |
| --timeout | 3m | Overall timeout for all API operations |
| --include | (all) | Comma-separated repo allowlist |
| --exclude | (see config) | Comma-separated repo names to skip |
| --repo | (none) | Sync only this repo (e.g., complytime/complyctl) |
| --summary | (none) | Write markdown change summary to file |
| --lock | (none) | Path to .content-lock.json for content approval gating |
| --update-lock | false | Write current upstream SHAs to lockfile (requires --lock) |
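
For reviewers who want to see how a subset of these flags and their defaults might be wired up, here is a minimal sketch using the standard library `flag` package. The `options` struct and `parseFlags` function are hypothetical and only cover some of the flags in the table; the actual tool may parse and validate them differently.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"time"
)

// options holds a subset of the CLI flags from the table above (hypothetical type).
type options struct {
	Org        string
	Config     string
	Lock       string
	Write      bool
	UpdateLock bool
	Workers    int
	Timeout    time.Duration
}

// parseFlags parses args with the documented defaults and checks the one
// dependency the table states: --update-lock requires --lock.
func parseFlags(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("sync-content", flag.ContinueOnError)
	fs.StringVar(&o.Org, "org", "complytime", "GitHub organization to scan")
	fs.StringVar(&o.Config, "config", "", "path to sync-config.yaml")
	fs.StringVar(&o.Lock, "lock", "", "path to .content-lock.json")
	fs.BoolVar(&o.Write, "write", false, "write files to disk (default: dry-run)")
	fs.BoolVar(&o.UpdateLock, "update-lock", false, "write upstream SHAs to the lockfile")
	fs.IntVar(&o.Workers, "workers", 5, "max concurrent repo workers")
	fs.DurationVar(&o.Timeout, "timeout", 3*time.Minute, "overall API timeout")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.UpdateLock && o.Lock == "" {
		return o, errors.New("--update-lock requires --lock")
	}
	if o.Workers < 1 {
		return o, errors.New("--workers must be at least 1") // assumed validation
	}
	return o, nil
}

func main() {
	// Fixed arguments keep the example deterministic.
	o, err := parseFlags([]string{"--write", "--workers", "3"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	mode := "dry-run"
	if o.Write {
		mode = "write"
	}
	fmt.Printf("org=%s mode=%s workers=%d timeout=%s\n", o.Org, mode, o.Workers, o.Timeout)
}
```

Because `--write` defaults to false, a caller that passes no flags stays in dry-run mode.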

Output structure

content/docs/projects/
├── _index.md                 # Hand-maintained section index (committed)
└── {repo}/                   # Generated per-repo content (gitignored)
    ├── _index.md             # Section index — frontmatter only, no body
    ├── overview.md           # README content as child page (weight: 1)
    └── {doc}.md              # Doc pages from discovery.scan_paths

data/
└── projects.json             # Landing page project cards (gitignored)

.sync-manifest.json           # Written file manifest for orphan cleanup (gitignored)
.content-lock.json            # Approved upstream SHAs per repo (committed)
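
The manifest-based orphan cleanup can be pictured as a set difference between the paths recorded in the previous .sync-manifest.json and the paths written in the current run. The sketch below assumes the manifest boils down to a flat list of relative paths; the `orphans` function and the `repo-a`/`repo-b` names are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
)

// orphans returns the paths recorded in the previous manifest that the
// current run did not write again, sorted for deterministic output.
func orphans(previous, current []string) []string {
	keep := make(map[string]struct{}, len(current))
	for _, p := range current {
		keep[p] = struct{}{}
	}
	var stale []string
	for _, p := range previous {
		if _, ok := keep[p]; !ok {
			stale = append(stale, p)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// Placeholder repo names; repo-b has been removed since the last sync.
	previous := []string{
		"content/docs/projects/repo-a/_index.md",
		"content/docs/projects/repo-a/overview.md",
		"content/docs/projects/repo-b/_index.md",
	}
	current := []string{
		"content/docs/projects/repo-a/_index.md",
		"content/docs/projects/repo-a/overview.md",
	}
	for _, p := range orphans(previous, current) {
		fmt.Println("would remove:", p)
	}
}
```

Tracking only files the tool itself wrote means hand-maintained files such as content/docs/projects/_index.md never become cleanup candidates.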

Reviewer guide

Pre-reading (recommended before reviewing code):

  1. specs/006-go-sync-tool/spec.md — full scope, user stories, acceptance criteria
  2. sync-config.yaml — the actual config file
  3. Reference document: sync-content

Suggested review order (commits are layered bottom-up by dependency):

| Commit | Scope | What to look for |
| --- | --- | --- |
| 1 | Config, path, lock + tests | Types, YAML parsing, path traversal guards |
| 2 | GitHub API, manifest, discovery + tests | REST client, httptest stubs, rate limiting |
| 3 | Hugo output, transforms, cleanup + layouts | Frontmatter gen, link rewriting, Hugo templates |
| 4 | Sync orchestration + main refactor | Worker pool, end-to-end flow, main.go slimdown |
| 5 | CI workflows, sync-config, lockfile | GitHub Actions, deploy pipeline integration |
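
When reviewing the path traversal guards in commit 1, it may help to have a reference shape in mind. The following is a generic sketch of such a guard, not the PR's implementation; `safeJoin` is a hypothetical helper, and it checks lexical paths only, so it does not resolve symlinks.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins rel onto root and rejects any result that would land
// outside root. It works on cleaned, lexical paths only.
func safeJoin(root, rel string) (string, error) {
	if filepath.IsAbs(rel) {
		return "", fmt.Errorf("absolute path not allowed: %q", rel)
	}
	full := filepath.Join(root, rel) // Join also cleans "." and ".." elements
	back, err := filepath.Rel(root, full)
	if err != nil {
		return "", err
	}
	// A relative path that starts with ".." points above root.
	if back == ".." || strings.HasPrefix(back, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q escapes %q", rel, root)
	}
	return full, nil
}

func main() {
	for _, rel := range []string{
		"content/docs/projects/repo-a/overview.md", // placeholder repo name
		"../../etc/passwd",
	} {
		p, err := safeJoin("site", rel)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", p)
	}
}
```

Comparing the cleaned result against the root, instead of scanning the input for "..", also catches inputs such as "a/../../b".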

Workflow runs (manual dispatch on fork — all green):

| Workflow | Run |
| --- | --- |
| CI | #23267705888 |
| Deploy Hugo to GitHub Pages | #23267713720 |
| Content Sync Check | #23485842117 |

Test plan

  • go vet ./... passes
  • gofmt -l ./cmd/sync-content/ reports no unformatted files
  • go test -race ./cmd/sync-content/... passes (57 test functions, 10 test files)
  • Dry-run produces zero files: go run ./cmd/sync-content --org complytime --config sync-config.yaml
  • Write mode generates correct output: go run ./cmd/sync-content --org complytime --config sync-config.yaml --write
  • Hugo builds with zero errors after sync: hugo --minify --gc
  • CI workflow (ci.yml) passes on this PR
  • --lock skips repos not in .content-lock.json
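
The last item, skipping repos that are absent from .content-lock.json, could be exercised against logic along these lines. The lockfile shape used here (a JSON object mapping repo name to approved SHA) is an assumption, as are the `lockfile` type, the `gate` function, and the placeholder repo names and SHA.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lockfile maps a repo name to its approved branch SHA (assumed shape).
type lockfile map[string]string

// gate splits repos into those pinned by the lock, keyed to the SHA to
// fetch at, and those skipped because no approved SHA is recorded.
func gate(repos []string, lock lockfile) (pinned map[string]string, skipped []string) {
	pinned = make(map[string]string)
	for _, r := range repos {
		sha, ok := lock[r]
		if !ok || sha == "" {
			skipped = append(skipped, r)
			continue
		}
		pinned[r] = sha
	}
	return pinned, skipped
}

func main() {
	// Sample lock content; the repo name and SHA are placeholders.
	raw := []byte(`{"repo-a": "abc1234"}`)
	var lock lockfile
	if err := json.Unmarshal(raw, &lock); err != nil {
		fmt.Println("invalid lockfile:", err)
		return
	}
	pinned, skipped := gate([]string{"repo-a", "repo-b"}, lock)
	fmt.Println("fetch at locked SHAs:", pinned)
	fmt.Println("skipped (not approved):", skipped)
}
```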

@sonupreetam sonupreetam force-pushed the 006-go-sync-tool branch 4 times, most recently from cd090d6 to d8e0bf2 on March 13, 2026 19:33
@sonupreetam sonupreetam changed the title from "feat(sync): add Go content sync tool for org-wide documentation" to "feat(sync): CPLYTM-1291 content sync tool for org-wide doc" on Mar 13, 2026
@sonupreetam sonupreetam marked this pull request as draft March 13, 2026 23:35
@sonupreetam sonupreetam force-pushed the 006-go-sync-tool branch 2 times, most recently from ef80241 to 7726b55 on March 16, 2026 13:03

@marcusburghardt marcusburghardt left a comment

@sonupreetam, I shared some initial considerations. I think the most important one is to rely on peribolos instead of an API call. We may not want to include everything, such as private repositories or occasional test repositories (it may happen). Everything defined in peribolos is formally in use, so we need fewer filters and exceptions. This could also simplify the logic and reduce the permissions the token needs.

@sonupreetam sonupreetam force-pushed the 006-go-sync-tool branch 2 times, most recently from 0f5b310 to b3ac7d2 on March 18, 2026 23:24
@sonupreetam sonupreetam marked this pull request as ready for review March 19, 2026 08:48
@sonupreetam sonupreetam requested a review from a team March 19, 2026 12:50
@sonupreetam

@marcusburghardt Thank you for your feedback; I have addressed your suggestions.

AlexXuan233 previously approved these changes Mar 23, 2026

@AlexXuan233 AlexXuan233 left a comment

LGTM

@marcusburghardt marcusburghardt left a comment

I have only two minor comments; they would be nice to have but do not block the PR. It is up to you, @sonupreetam, whether to incorporate them. :) Thanks

@sonupreetam sonupreetam dismissed stale reviews from marcusburghardt and AlexXuan233 via 4a59692 March 24, 2026 10:29
@sonupreetam sonupreetam force-pushed the 006-go-sync-tool branch 3 times, most recently from 98340e9 to 7c5d33c on March 24, 2026 11:03
@sonupreetam sonupreetam changed the title from "feat(sync): CPLYTM-1291 content sync tool for org-wide doc" to "feat(cmd): CPLYTM-1291 content sync tool for org wide doc" on Mar 24, 2026
@sonupreetam sonupreetam changed the title from "feat(cmd): CPLYTM-1291 content sync tool for org wide doc" to "feat(cli): CPLYTM-1291 content sync tool org wide doc" on Mar 24, 2026
@marcusburghardt

@sonupreetam, there is a merge commit in the branch. I don't think it was intentional. Could you confirm, please?

@marcusburghardt

It seems we also need a configuration file for mega-linter, since its defaults enable more checks than we need.

@sonupreetam sonupreetam force-pushed the 006-go-sync-tool branch 2 times, most recently from 9d9321c to dd8ea4e on March 25, 2026 08:28
…ment

Signed-off-by: Sonu Preetam <spreetam@redhat.com>
…covery
…cleanup
…egration
@marcusburghardt marcusburghardt left a comment

LGTM

3 participants