
Add LAVA artifact upload via axum HTTP server (rework of #17)#116

Draft
Copilot wants to merge 4 commits into main from copilot/update-lava-jobs-artifact-upload

Conversation

Contributor

Copilot AI commented Mar 29, 2026

Reworks PR #17 against current main (clap, LazyLock, async_trait, etc.) and addresses all 7 review comments. LAVA jobs can POST artifact files to an ephemeral per-job URL; the runner collects them and includes them in the GitLab artifact archive.

Design

  • No global static — UploadServer is created in main() and threaded into Run via closure capture
  • No periodic cleanup — JobArtifacts is an RAII handle; dropping it removes the job key from UploadStore automatically
  • No Arc<Mutex<JobArtifacts>> scatter — the HTTP handler sends uploads through UploadStore, which holds the inner state map; Run owns JobArtifacts directly
  • Single env var — LAVA_ARTIFACT_UPLOAD_BASE_URL (full URL); the server only starts if it is set. Optional LAVA_ARTIFACT_UPLOAD_LISTEN_ADDR (default 0.0.0.0:0)
  • Artifact path is just the filename — no {id}_artifacts/ prefix (artifacts only come from submit jobs)
  • Path sanitization — uploaded paths are validated via sanitize_artifact_path(): .., absolute paths, and empty paths are rejected; redundant . components are stripped
  • Disk-backed caching — files ≥ 1 MB are spilled to a NamedTempFile to avoid large in-memory allocations; smaller files remain as Bytes
  • Per-job upload limit — each job is capped at 1 GB of total artifact data (overwrites correctly adjust the running total); a matching DefaultBodyLimit is applied at the axum router level
  • Meaningful HTTP status codes — the upload handler returns 400 Bad Request for invalid paths, 404 Not Found for unknown job keys, 413 Payload Too Large when the per-job limit is exceeded, and 500 Internal Server Error on I/O failures

Changes

Cargo.toml

  • Add axum = "0.8" and enable tokio's "full" feature set

src/upload.rs (new)

  • UploadServer — owns base_url and Arc<Mutex<UploadStore>>; add_new_job() returns an owned JobArtifacts
  • UploadStore — shared with axum handler via State; routes uploaded bytes to the right job by key; validates and sanitizes the path before storing
  • JobArtifacts — RAII; Drop removes key from store
  • ArtifactFile — Arc-wrapped enum: Memory(Bytes) for small files, Disk(NamedTempFile) for large files; Clone is cheap (ref-count only)
  • sanitize_artifact_path() — rejects unsafe path components; returns a normalized /-joined relative path
  • ARTIFACT_JOB_LIMIT (1 GB) — enforced in JobArtifactsInner; overwrites subtract the old file size before checking

src/main.rs

  • TransformVariables gains a runner map → exposes {{ runner.ARTIFACT_UPLOAD_URL }} in LAVA job YAML templates
  • LavaUploadableFileType::Artifact { path, file } variant added (stores ArtifactFile); get_path() returns bare path; get_data() opens a reader without an extra heap copy
  • LavaUploadableFile.store is now Option<Arc<AvailableArtifactStore>> (None for artifact files)
  • Run gains upload_server: Option<Arc<Mutex<UploadServer>>> and artifacts: Option<JobArtifacts>
  • transform() takes upload_url: &str; command() submit branch registers a job slot and stores the handle
  • get_uploadable_files() appends artifact entries collected from JobArtifacts
  • upload_artifact axum handler receives State<Arc<Mutex<UploadStore>>> — no global; returns appropriate HTTP status codes
  • main() spawns upload server only when env var is present; uses tokio::net::TcpListener::bind + axum::serve; applies DefaultBodyLimit
# In your LAVA job definition template:
actions:
  - deploy:
      # ... upload results back using the runner-provided URL:
      # {{ runner.ARTIFACT_UPLOAD_URL }}myresult.bin
Original prompt

This is a reworked version of PR #17 ("Allow LAVA jobs to pass artifacts back to the runner") that:

  1. Fixes all merge conflicts against the current main branch (which has diverged significantly — it now uses clap instead of structopt, LazyLock instead of lazy_static, async_trait, newer dependency versions, etc.)
  2. Addresses all 7 review comments left on PR Allow LAVA jobs to pass artifacts back to the runner #17

Background

The feature adds a lightweight axum HTTP server to lava-gitlab-runner so that LAVA jobs can POST artifact files back to the runner, which then includes them in the GitLab artifact archive. The LAVA job learns the upload URL from the runner.ARTIFACT_UPLOAD_URL Handlebars template variable.

Files to change

Cargo.toml

Add the following dependencies (compatible with the current version pinning style in main):

axum = "0.8"
rand_chacha = "0.3"
tokio = { version = "1", features = ["full"] }  # already present but ensure "full" or at minimum "net" feature

Do not add local-ip-address (see review comment below).

src/upload.rs (new file)

Implement the upload server with the following design changes vs PR #17:

Review comment addressed: "why is a periodic cleanup required? or rather can't the registration be dropped whenever the relevant jobs get dropped? Also I'm not a big fan of scattering Arc<Mutex<T>> around"

Design:

  • UploadServer holds base_url: Option<Url> (a full URL, not host+port)
  • UploadServer::add_new_job() returns a JobArtifacts directly (owned, not Arc<Mutex<...>>)
  • JobArtifacts is the registration object held by Run. When Run drops it, the LAVA job key is automatically gone — no separate cleanup needed.
  • The web server needs to look up jobs. Use a tokio::sync::Mutex<BTreeMap<String, tokio::sync::mpsc::Sender<(String, Bytes)>>> inside UploadServer so the HTTP handler can send files to the right JobArtifacts via a channel. This avoids Arc<Mutex<JobArtifacts>> scatter.
  • Or alternatively: use a DashMap or Arc<Mutex<BTreeMap<String, Weak<...>>>> — keep it simple but clean.

Actually the cleanest design that avoids scattering Arc<Mutex<T>>:

// src/upload.rs

use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};
use bytes::Bytes;
use rand::distributions::Alphanumeric;
use rand::{Rng, SeedableRng};
use rand_chacha::ChaCha20Rng;
use tracing::warn;

/// Shared state between the HTTP server and the job handlers.
/// Wrapped in Arc<Mutex<>> once, held by both UploadServer and the axum handler.
pub struct UploadStore {
    jobs: BTreeMap<String, Arc<Mutex<JobArtifacts>>>,
}

impl UploadStore {
    fn new() -> Self {
        Self { jobs: BTreeMap::new() }
    }

    /// Called by the axum handler to store an uploaded file.
    pub fn upload_file(&mut self, key: &str, path: &str, data: Bytes) {
        if let Some(ja) = self.jobs.get(key) {
            ja.lock().unwrap().upload_artifact(path, data);
        } else {
            warn!("Ignoring attempt to upload {} for non-existent job key", path);
        }
    }
}

pub struct UploadServer {
    rng: ChaCha20Rng,
    base_url: Option<url::Url>,
    store: Arc<Mutex<UploadStore>>,
}

impl UploadServer {
    pub fn new() -> Self {
        Self {
            rng: ChaCha20Rng::from_entropy(),
            base_url: None,
            store: Arc::new(Mutex::new(UploadStore::new())),
        }
    }

    /// Set the externally-routable base URL for the upload server.
    /// E.g. "https://my-upload-host:2456/some/prefix"
    pub fn set_base_url(&mut self, base_url: url::Url) {
        self.base_url = Some(base_url);
    }

    pub fn store(&self) -> Arc<Mutex<UploadStore>> {
        self.store.clone()
    }

    /// Create a new JobArtifacts registration. The returned value is owned by Run.
    /// When it is dropped, its key is removed from the store automatically.
    pub fn add_new_job(&mut self) -> Option<JobArtifacts> {
        let base_url = self.base_url.as_ref()?;
        let key = self.generate_unique_id();
        let mut upload_url = base_url.clone();
        upload_url.path_segments_mut().ok()?.push(&key);
        // ensure trailing slash
        let upload_url_str = format!("{}/", upload_url);

        let ja_inner = Arc::new(Mutex::new(JobArtifactsInner::new()));
        self.store.lock().unwrap().jobs.insert(key.clone(), ja_inner.clone());
        Some(JobArtifacts {
            key,
            upload_url: upload_url_str,
            inner: ja_inner,
            store: self.store.clone(),
        })
    }

    fn generate_unique_id(&mut self) -> String {
        (&mut self.rng)
            .sample_iter(&Alphanumeric)
            .take(64)
            .map(char::from)
            .collect()
    }
}

/// Owned by Run. Dropped when Run is done — automatically deregisters from the store.
pub struct JobArtifacts {
    key: String,
    upload_url: String,
    inner: Arc<Mutex<JobArtifactsInner>>,
    store: Arc<Mutex<UploadStore>>,
}

impl JobArtifacts {
    pub fn upload_url...





- Add axum = "0.8" dependency; enable full tokio features
- New src/upload.rs: UploadServer, UploadStore, JobArtifacts (RAII-based,
  auto-deregisters on drop, no global static, no periodic cleanup)
- Add runner.ARTIFACT_UPLOAD_URL template variable via TransformVariables.runner
- Add Artifact{path, data} to LavaUploadableFileType (path only, no id prefix)
- Run struct gains upload_server and artifacts fields
- transform() takes upload_url param; submit command creates JobArtifacts
- get_uploadable_files() includes artifacts from submit jobs
- upload_artifact axum handler uses State<Arc<Mutex<UploadStore>>>
- main() starts server only if LAVA_ARTIFACT_UPLOAD_BASE_URL is set
- Single env var replaces separate host/port; uses TcpListener::bind + axum::serve

Agent-Logs-Url: https://github.com/collabora/lava-gitlab-runner/sessions/831d8c7f-5dc6-47d2-82c1-7e79eed0c6bf

Co-authored-by: sjoerdsimons <22603932+sjoerdsimons@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix merge conflicts and address review comments for artifact upload Add LAVA artifact upload via axum HTTP server (rework of #17) Mar 29, 2026
Copilot AI requested a review from sjoerdsimons March 29, 2026 14:23
@sjoerdsimons sjoerdsimons requested a review from Copilot March 29, 2026 14:32
@sjoerdsimons
Contributor

@copilot please fix CI failures


Copilot AI left a comment


Pull request overview

Adds an optional embedded HTTP upload endpoint so LAVA “submit” jobs can POST back additional artifact files, which the runner then includes in the GitLab artifact archive.

Changes:

  • Introduce UploadServer/UploadStore/JobArtifacts (RAII) to register per-job upload keys and collect uploaded bytes.
  • Extend template variables with runner.ARTIFACT_UPLOAD_URL and include uploaded artifacts in get_uploadable_files().
  • Start an axum server when LAVA_ARTIFACT_UPLOAD_BASE_URL is set; bind address controlled via LAVA_ARTIFACT_UPLOAD_LISTEN_ADDR.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 5 comments.

File Description
Cargo.toml Enables full Tokio features and adds axum dependency for the HTTP server.
Cargo.lock Locks new dependency graph introduced by axum/tokio feature changes.
src/upload.rs Implements per-job upload registration + in-memory artifact collection via shared store.
src/main.rs Wires upload URL into templating, collects uploaded artifacts, and spawns the axum server.


Comment on lines +19 to +23
/// Called by the axum handler to store an uploaded file.
pub fn upload_file(&mut self, key: &str, path: &str, data: Bytes) {
    if let Some(ja) = self.jobs.get(key) {
        ja.lock().unwrap().upload_artifact(path, data);
    } else {

Copilot AI Mar 29, 2026


upload_file() accepts an arbitrary path from the HTTP route and stores it verbatim. This allows .., absolute paths, path separators, and name collisions (e.g., overwriting *_log.yaml) inside the artifact archive. Validate/sanitize path to a single safe filename (and/or a constrained relative path) before storing.

Contributor


@copilot apply changes based on this feedback -- artifact overwrites are fine, but paths should be sanitized or even enforced to be normalized

Contributor Author


Fixed in a9c59c3. Added sanitize_artifact_path() which iterates std::path::Path::components() and rejects any .., absolute/prefix/root components, and empty results. Only Normal components (and skipped .) are kept, then joined with /.

src/main.rs Outdated
Comment on lines +971 to +975
async fn upload_artifact(
    State(store): State<Arc<Mutex<UploadStore>>>,
    Path((key, path)): Path<(String, String)>,
    body: Bytes,
) {

Copilot AI Mar 29, 2026


Uploads are fully buffered into memory (body: Bytes) and then retained in a per-job map with no size/count limits. A malicious or misconfigured job can OOM the runner by POSTing large/many artifacts. Add request body limits (e.g., axum DefaultBodyLimit / tower limit layer) and enforce a per-job/per-runner quota (or stream artifacts to disk).

Contributor


@copilot apply changes based on this feedback This should use a cache for bigger files, but also enforce an upload limit per job. iirc the gitlab-runner crate already does that for artifact downloads

Contributor Author


Fixed in a9c59c3. Files >= 1 MB are spilled to a NamedTempFile (via tempfile); smaller files stay as Bytes in memory. JobArtifactsInner tracks total_bytes (accounting for overwrites by subtracting the old size before adding the new) and rejects uploads that would exceed 1 GB per job. A DefaultBodyLimit::max(ARTIFACT_JOB_LIMIT) is also applied to the axum router to enforce the limit at the HTTP layer.

Comment on lines +971 to 977
async fn upload_artifact(
    State(store): State<Arc<Mutex<UploadStore>>>,
    Path((key, path)): Path<(String, String)>,
    body: Bytes,
) {
    store.lock().unwrap().upload_file(&key, &path, body);
}

Copilot AI Mar 29, 2026


The upload handler always returns 200 OK and UploadStore::upload_file silently ignores unknown job keys. This can make artifact loss hard to detect/debug. Consider returning an explicit status (e.g., 404 for unknown key, 400 for invalid path) and logging the key/path together for traceability.

Contributor


@copilot apply changes based on this feedback

Contributor Author


Fixed in a9c59c3. upload_artifact now returns impl IntoResponse with: 200 OK on success, 400 Bad Request for an invalid/unsafe path, 404 Not Found for an unknown job key, 413 Payload Too Large when the per-job limit is exceeded, and 500 Internal Server Error on I/O errors.

Contributor Author

Copilot AI commented Mar 29, 2026

@copilot please fix CI failures

Fixed in b2400c1 — the CI failure was a cargo fmt formatting issue. Running cargo fmt cleaned up whitespace and line-wrapping in src/main.rs and src/upload.rs.

…HTTP status codes

Agent-Logs-Url: https://github.com/collabora/lava-gitlab-runner/sessions/439b5174-50be-4263-8ccd-e88b204f9251

Co-authored-by: sjoerdsimons <22603932+sjoerdsimons@users.noreply.github.com>