3. GitLab CI/CD Pipeline Infrastructure

This document provides a comprehensive reference for the GitLab CI/CD pipeline infrastructure used by the global-workflow project. It covers the repository mirroring strategy between GitHub and GitLab, the pipeline architecture and configuration, the GitLab runner deployment on RDHPCS systems, and the day-to-day maintenance procedures that keep the system operational.

3.1. Overview

The global-workflow CI/CD system uses GitLab CI/CD as the execution engine for continuous integration testing across NOAA’s Research and Development High-Performance Computing Systems (RDHPCS). GitHub remains the authoritative repository where all development, code review, and pull request activity occurs.

The fundamental challenge this infrastructure solves is that NOAA’s HPC systems (Hera, Gaea, Orion, Hercules, Ursa) are not directly accessible from GitHub Actions runners. By mirroring the repository to GitLab and placing GitLab runners directly on those HPC systems, the project gains the ability to build and test the workflow in the same environments where it will be deployed operationally.

Fig. 3.1 High-level CI/CD architecture showing repository mirroring and pipeline flow.

The architecture can also be summarized textually:

┌──────────────────────────┐         ┌───────────────────────────┐         ┌──────────────────────────┐
│   GitHub (Authoritative) │  Pull   │  Licensed GitLab Instance │  Push   │  VLab Community GitLab   │
│   github.com/NOAA-EMC/   │ Mirror  │  (Premium — Mirroring     │ Mirror  │  vlab.noaa.gov/          │
│   global-workflow        ├────────►│   Only)                   ├────────►│  gitlab-community/...    │
│                          │         │                           │         │  (CI/CD Pipelines here)  │
└──────────┬───────────────┘         └───────────────────────────┘         └────────────┬─────────────┘
           │                                                                            │
           │  GitHub Actions                                               Pipeline Stages
           │  (API Trigger)                                                             │
           │                                                    ┌───────────────────────▼──────────┐
           │                                                    │  1. Build → 2. Setup → 3. Run →  │
           └───────────────────────────────────────────────────►│           4. Finalize            │
                                                                └──────────────────┬───────────────┘
                                                                                   │
                       ┌───────────────────────────────────────────────────────────▼───────────┐
                       │        RDHPCS GitLab Shell Runners                                    │
                        │  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐               │
                        │  │  Hera  │ │ Gaea C6│ │ Orion  │ │Hercules│ │  Ursa  │               │
                        │  │17 cases│ │15 cases│ │8 cases │ │10 cases│ │17 cases│               │
                        │  └────────┘ └────────┘ └────────┘ └────────┘ └────────┘               │
                       └───────────────────────────────────────────────────────────────────────┘

3.1.1. Key Design Principles

  • GitHub is authoritative: All development happens on GitHub (https://github.com/NOAA-EMC/global-workflow). GitLab is used solely as a CI execution platform.

  • Two-tier mirroring: A licensed GitLab instance performs the pull mirror from GitHub, and subsequently push mirrors to the NOAA community GitLab instance.

  • HPC-native testing: Runners execute directly on the target HPC nodes, ensuring tests build and run against the real Spack-Stack software environment.

  • Multi-modal pipelines: The system supports both comprehensive end-to-end experiment cases and fast CTest-based functional checks.

  • GitHub feedback loop: Pipeline results flow back to GitHub through PR labels, PR comments (including error log gists), and status badges.

3.2. Repository Mirroring: GitHub to GitLab

Because GitHub is the authoritative source of truth and GitLab is the CI execution platform, a reliable synchronization mechanism is required. The global-workflow project uses a two-stage mirroring strategy involving two GitLab instances.

3.2.1. Pull Mirroring (Licensed GitLab Instance)

The first stage uses pull mirroring, a feature that is only available on licensed (paid) tiers of GitLab (Premium or Ultimate). A single licensed GitLab instance is configured to pull from the authoritative GitHub repository:

Table 3.1 Pull Mirror Configuration

  Setting             Value
  -----------------   -----------------------------------------------
  Source repository   https://github.com/NOAA-EMC/global-workflow.git
  Direction           Pull
  Scope               All branches
  Sync frequency      Automatic (every few minutes)

The licensed instance’s sole purpose is mirroring — it does not run any CI/CD pipelines itself. Its pull mirror keeps the GitLab copy synchronized with GitHub, and its push mirror (described below) propagates changes onward.

Note

Pull mirroring is an advanced feature available only on licensed instances of GitLab (Premium tier and above). It is not available on GitLab Community Edition (CE) or the free tier. This is why a separate licensed instance is required for the first stage of the mirror chain.

3.2.2. Push Mirroring (Community GitLab at VLab)

The second stage uses push mirroring from the licensed GitLab instance to the NOAA community GitLab instance hosted at VLab:

Table 3.2 Push Mirror Configuration

  Setting             Value
  -----------------   -----------------------------------------------------------------------------------
  Target repository   https://vlab.noaa.gov/gitlab-community/NWS/Operations/NCEP/EMC/global-workflow.git
  Direction           Push
  Scope               All branches
  Sync frequency      Automatic (every few minutes)

The VLab community GitLab instance is where the CI/CD pipelines actually execute. GitLab runners deployed on RDHPCS systems register against this instance, and all pipeline stages (build, setup, test, finalize) run here. This instance also provides the broader NOAA user community with read access to the repository.

3.2.3. Mirror Chain Summary

The complete mirror chain is:

GitHub (authoritative)
    │
    │  Pull Mirror (licensed GitLab feature)
    ▼
Licensed GitLab Instance (mirroring only)
    │
    │  Push Mirror (available on all GitLab tiers)
    ▼
VLab Community GitLab (CI/CD pipelines execute here, NOAA-wide access)

Both mirrored repositories track all branches, ensuring that any branch pushed to GitHub (including PR branches fetched during pipeline execution) is available for CI testing.

Important

Developers should never push directly to either GitLab instance. All code changes must flow through GitHub. The GitLab mirrors are read-only copies maintained by the mirroring configuration.

3.3. Pipeline Architecture

The pipeline is defined across four YAML configuration files that are included from the top-level .gitlab-ci.yml:

Table 3.3 Pipeline Configuration Files

  File                          Purpose
  ---------------------------   ---------------------------------------------------------------------
  .gitlab-ci.yml                Main orchestration: stages, variables, base templates, build template
  dev/ci/gitlab-ci-cases.yml    Templates for standard experiment test cases (setup, run, finalize)
  dev/ci/gitlab-ci-ctests.yml   Templates for CTest-based functional testing (CMake/CTest)
  dev/ci/gitlab-ci-hosts.yml    Host-specific jobs, test matrices, runner tags, and conditional rules

3.3.1. Pipeline Stages

Every pipeline execution proceeds through four stages in order:

  1. build — Clone the repository, check out the PR branch (if applicable), build the codebase via ci_utils.sh build, and link the workflow.

  2. setup_tests — Prepare the test environment: create experiment directories (PR Cases) or configure the CMake/CTest build (CTests).

  3. run_tests — Execute the tests: run Rocoto-orchestrated experiments (PR Cases) or run ctest with specific labels (CTests).

  4. finalize — Report results: update GitHub PR labels, manage nightly directory symlinks, and update status badges.
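
The stage ordering above is declared in the top-level .gitlab-ci.yml, together with the includes from Table 3.3. A minimal fragment of that declaration looks roughly like this (illustrative only — the real file also defines variables and base templates):

```yaml
# Illustrative fragment of .gitlab-ci.yml: include list and stage order.
include:
  - local: dev/ci/gitlab-ci-cases.yml
  - local: dev/ci/gitlab-ci-ctests.yml
  - local: dev/ci/gitlab-ci-hosts.yml

stages:
  - build
  - setup_tests
  - run_tests
  - finalize
```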

3.3.2. Pipeline Modalities

The PIPELINE_TYPE variable controls which testing modality runs:

3.3.2.1. PR Cases (PIPELINE_TYPE=pr_cases)

Comprehensive end-to-end experiment testing. Each test case is defined by a YAML file in dev/ci/cases/pr/ that specifies an experiment configuration:

# Example: dev/ci/cases/pr/C48_ATM.yaml
experiment:
  net: gfs
  mode: forecast-only
  app: ATM
  resdetatmos: 48
  idate: 2021032312
  edate: 2021032312

workflow:
  engine: rocoto
  rocoto:
    maxtries: 2

The pipeline creates a full experiment directory, launches Rocoto, and monitors the workflow to completion. Failures are detected through Rocoto state tracking and reported back to the GitHub PR with error log gists.

Currently defined PR case tests include:

  • C48_ATM — Atmosphere-only forecast

  • C48_S2SW — Coupled atmosphere-ocean-ice-wave

  • C48_S2SWA_gefs — GEFS ensemble coupled run

  • C48mx500_3DVarAOWCDA — 3DVar coupled data assimilation

  • C48mx500_hybAOWCDA — Hybrid EnVar coupled data assimilation

  • C96C48_hybatmDA — Hybrid atmosphere-only data assimilation

  • C96C48_hybatmsnowDA — Hybrid atmosphere + snow data assimilation

  • C96C48_hybatmsoilDA — Hybrid atmosphere + soil data assimilation

  • C96_atm3DVar — C96 resolution 3DVar atmosphere

  • C96_gcafs_cycled — GCAFS cycled system

  • C96mx100_S2S — Seasonal-to-subseasonal coupled

  • C48_gsienkf_atmDA — GSI ensemble Kalman filter

  • C48_ufsenkf_atmDA — UFS ensemble Kalman filter

  • And others (see dev/ci/gitlab-ci-hosts.yml for per-machine matrices)

3.3.2.2. CTests (PIPELINE_TYPE=ctests)

Fast, focused functional testing using the CMake/CTest framework. These tests exercise individual Rocoto jobs (JJOBS) with pre-staged input data and verify their outputs against baselines from nightly stable runs.

The CTest flow:

  1. cmake -S "${GW_HOMEgfs}" — Configure the CTest build

  2. ctest -N — List available tests

  3. ctest -L "${CTEST_NAME}" — Run tests matching a specific label

  4. JUnit XML results are published as GitLab artifacts

CTests provide rapid developer feedback (minutes instead of hours) and are ideal for targeted validation of specific job changes.
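
The four-step flow above can be sketched as a small helper that assembles the commands; for illustration it echoes each command instead of executing it. The -B/--test-dir arguments and the --output-junit flag (CTest 3.21+) are assumptions here — the actual templates live in dev/ci/gitlab-ci-ctests.yml.

```shell
# Illustrative sketch only: echo the CTest command sequence rather than run it.
# src_dir stands in for ${GW_HOMEgfs}; label stands in for ${CTEST_NAME}.
run_ctest_sequence() {
  local src_dir=$1 build_dir=$2 label=$3
  echo "cmake -S ${src_dir} -B ${build_dir}"                                  # 1. configure
  echo "ctest --test-dir ${build_dir} -N"                                     # 2. list tests
  echo "ctest --test-dir ${build_dir} -L ${label} --output-junit results.xml" # 3+4. run label, emit JUnit XML
}

run_ctest_sequence /path/to/global-workflow ctest_build atm_jjob
```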

3.3.3. Per-Host Test Matrices

Each HPC platform runs a specific subset of test cases, defined in dev/ci/gitlab-ci-hosts.yml. The matrices reflect the software and data availability on each system:

Table 3.4 Test Cases by Platform

  Platform   Test Cases
  --------   --------------------------------------------------------------------
  Hera       C48_ATM, C48_S2SW, C48_S2SWA_gefs, C48mx500_3DVarAOWCDA,
             C48mx500_hybAOWCDA, C96C48_hybatmDA, C96C48_hybatmsnowDA,
             C96C48_hybatmsoilDA, C96C48_ufsgsi_hybatmDA, C96C48_ufs_hybatmDA,
             C96C48mx500_S2SW_cyc_gfs, C96_atm3DVar, C96_gcafs_cycled,
             C96_gcafs_cycled_noDA, C96mx100_S2S, C48_gsienkf_atmDA,
             C48_ufsenkf_atmDA
  Gaea C6    C48_ATM, C48_S2SW, C48_S2SWA_gefs, C48mx500_3DVarAOWCDA,
             C48mx500_hybAOWCDA, C96C48_hybatmDA, C96C48_hybatmsnowDA,
             C96C48_hybatmsoilDA, C96C48mx500_S2SW_cyc_gfs, C96_atm3DVar,
             C96_gcafs_cycled, C96_gcafs_cycled_noDA, C96mx100_S2S,
             C48_gsienkf_atmDA, C48_ufsenkf_atmDA
  Orion      C48_ATM, C48_S2SW, C48_S2SWA_gefs, C96C48_hybatmDA,
             C96C48mx500_S2SW_cyc_gfs, C96_atm3DVar, C96mx100_S2S,
             C96_gcafs_cycled
  Hercules   C48_ATM, C48_S2SW, C48_S2SWA_gefs, C48mx500_3DVarAOWCDA,
             C48mx500_hybAOWCDA, C96C48_hybatmDA, C96C48mx500_S2SW_cyc_gfs,
             C96_atm3DVar, C96mx100_S2S, C96_gcafs_cycled
  Ursa       C48_ATM, C48_S2SW, C48_S2SWA_gefs, C48mx500_3DVarAOWCDA,
             C48mx500_hybAOWCDA, C96C48_hybatmDA, C96C48_hybatmsnowDA,
             C96C48_hybatmsoilDA, C96C48_ufsgsi_hybatmDA, C96C48_ufs_hybatmDA,
             C96C48mx500_S2SW_cyc_gfs, C96_atm3DVar, C96mx100_S2S,
             C96_gcafs_cycled, C96_gcafs_cycled_noDA, C48_gsienkf_atmDA,
             C48_ufsenkf_atmDA

3.3.4. Pipeline Variables

The following variables control pipeline behavior and can be set from GitLab scheduled pipelines, GitHub Actions triggers, or the GitLab web UI:

Table 3.5 Key Pipeline Variables

  Variable            Default                                           Description
  -----------------   -----------------------------------------------   -------------------------------------------------------
  PIPELINE_TYPE       pr_cases                                          Testing modality: pr_cases or ctests
  GFS_CI_RUN_TYPE     pr_cases                                          Run classification: pr_cases or nightly
  RUN_ON_MACHINES     all                                               Space-separated list of machines, or all
  PR_NUMBER           0                                                 GitHub PR number (0 = develop branch)
  GITHUB_COMMIT_SHA   (empty)                                           PR head commit SHA for GitLab native GitHub integration
  GW_REPO_URL         https://github.com/NOAA-EMC/global-workflow.git   Authoritative GitHub repository URL
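
As an illustration of the RUN_ON_MACHINES convention, a job-side check might look like the following (a hypothetical helper for this document, not a function in the repository):

```shell
# Hypothetical helper: decide whether this machine is selected by
# RUN_ON_MACHINES, which is either "all" or a space-separated machine list.
should_run_on() {
  local machine=$1 selection=$2
  [[ "${selection}" == "all" ]] && return 0
  local m
  for m in ${selection}; do          # intentional word-splitting of the list
    [[ "${m}" == "${machine}" ]] && return 0
  done
  return 1
}

should_run_on hera "hera orion" && echo "run on hera"   # prints "run on hera"
```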

3.4. GitHub Actions Integration

Pipelines are triggered from GitHub via the trigger-gitlab-pipelines.yml workflow in .github/workflows/. This provides a user-friendly interface for developers to initiate CI testing.

3.4.1. Triggering a Pipeline

  1. Navigate to the Actions tab in the GitHub repository.

  2. Select the “Trigger GitLab Pipelines” workflow.

  3. Click “Run workflow” and configure the inputs:

    • PR number: Enter the PR number to test, or 0 for the develop branch.

    • Pipeline Type: Choose “PR Cases” or “CTests”.

    • Machine checkboxes: Select which RDHPCS machines to run on (Hera, Gaea C6, Orion, Hercules, Ursa).

  4. Click “Run workflow” to submit.

The workflow performs the following:

  1. Permission check: Verifies the triggering user is in the AUTHORIZED_GITLAB_TRIGGER_USERS list (stored as a GitHub repository variable).

  2. Parameter setup: Resolves the PR head commit SHA, determines the pipeline type, and builds the machine selection list.

  3. GitLab trigger: Sends a POST request to the GitLab Pipeline Trigger API with all the necessary variables.

  4. Label management: Adds CI-<Machine>-Ready labels to the PR on GitHub.
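
Step 3 above uses GitLab's pipeline trigger API, which accepts a trigger token, a ref, and variables[...] form fields. The sketch below is a hypothetical helper assembling those fields with the variable names from Table 3.5; the ref is assumed here to be develop, since the PR branch itself is fetched later in the build stage.

```shell
# Hypothetical sketch of the form fields sent to the GitLab trigger API.
# Each printed line becomes one curl -F argument; GITLAB_TRIGGER_TOKEN and
# GITLAB_TRIGGER_URL are the secrets/variables from Table 3.6.
build_trigger_form() {
  local pr=$1 ptype=$2 machines=$3
  printf '%s\n' \
    "token=${GITLAB_TRIGGER_TOKEN}" \
    "ref=develop" \
    "variables[PR_NUMBER]=${pr}" \
    "variables[PIPELINE_TYPE]=${ptype}" \
    "variables[RUN_ON_MACHINES]=${machines}"
}

# Usage sketch:
#   args=(); while IFS= read -r f; do args+=(-F "$f"); done \
#     < <(build_trigger_form 1234 pr_cases "hera orion")
#   curl -sS -X POST "${GITLAB_TRIGGER_URL}" "${args[@]}"
```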

3.4.2. Required GitHub Secrets and Variables

Table 3.6 GitHub Configuration

  Name                              Type       Description
  -------------------------------   --------   --------------------------------------------------------------------
  GITLAB_TRIGGER_TOKEN              Secret     GitLab pipeline trigger token (Settings > CI/CD > Pipeline triggers)
  GITHUBTOKEN                       Secret     GitHub personal access token with repo scope
  GW_REPO_URL                       Variable   GitHub repository URL (e.g., NOAA-EMC/global-workflow)
  GITLAB_TRIGGER_URL                Variable   GitLab trigger API endpoint URL
  AUTHORIZED_GITLAB_TRIGGER_USERS   Variable   Comma-separated list of authorized GitHub usernames

3.4.3. PR Label Lifecycle

GitHub PR labels track the CI state through the pipeline:

Table 3.7 CI Label Flow

  Label                   Set By                     Meaning
  ---------------------   ------------------------   --------------------------------------------
  CI-<Machine>-Ready      GitHub Actions             Pipeline has been triggered for this machine
  CI-<Machine>-Building   Build stage                Build is in progress
  CI-<Machine>-Running    Build stage (on success)   Tests are actively running
  CI-<Machine>-Passed     Finalize (success)         All test cases passed on this machine
  CI-<Machine>-Failed     Finalize (failure)         One or more test cases failed

When a test case fails, the run_check_gitlab_ci.sh script automatically posts a comment to the GitHub PR containing:

  • The failed case name and machine

  • The experiment directory path

  • Links to error log gists (uploaded via publish_logs.py)

3.5. Nightly Pipeline Operations

Nightly pipelines are configured as GitLab scheduled pipelines with GFS_CI_RUN_TYPE=nightly. They differ from PR-triggered pipelines in several ways:

3.5.1. Directory Management

On successful completion of a nightly pipeline:

  1. The workspace directory is renamed from the pipeline-ID format to a date-based format:

    # During execution:
    ${CI_BUILDS_DIR}/nightly_${CI_COMMIT_SHORT_SHA}_${CI_PIPELINE_ID}/
    
    # After success:
    ${CI_BUILDS_DIR}/nightly_${CI_COMMIT_SHORT_SHA}_${MMDDYY}/
    
  2. A stable symlink is created pointing to the latest successful nightly:

    ${CI_BUILDS_DIR}/stable -> nightly_${CI_COMMIT_SHORT_SHA}_${MMDDYY}/
    
  3. Old nightly directories (except the stable target) are cleaned up.

The stable directory is significant because CTest baseline data (STAGED_CTESTS) is sourced from it:

export STAGED_CTESTS=${GITLAB_BUILDS_DIR}/stable/RUNTESTS
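
The rename-and-symlink step can be sketched as follows (illustrative; finalize_nightly_dir is a hypothetical name, and the MMDDYY tag follows the pattern shown above):

```shell
# Hypothetical sketch of the nightly finalize step: rename the pipeline-ID
# workspace to the date-based name and repoint the stable symlink.
finalize_nightly_dir() {
  local builds_dir=$1 sha=$2 pipeline_id=$3
  local date_tag
  date_tag=$(date +%m%d%y)   # MMDDYY
  mv "${builds_dir}/nightly_${sha}_${pipeline_id}" \
     "${builds_dir}/nightly_${sha}_${date_tag}"
  # Relative symlink target, so the link stays valid if the filesystem is
  # mounted under a different prefix.
  ln -sfn "nightly_${sha}_${date_tag}" "${builds_dir}/stable"
}
```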

3.5.2. Badge Updates

Nightly pipelines update status badges stored as GitHub Gists. On success, a green “passed” badge is generated; on failure, a red “failed” badge is generated. These badges are referenced from the project README for visibility.

# Badge generation (from finalize stage)
curl -sSL "https://img.shields.io/badge/${machine}_nightly-passed-brightgreen" \
  -o "${badge_img_file}"
${GH} gist edit "${badge_GIST_ID}" --add "${badge_img_file}"

3.6. GitLab Runner Setup

GitLab runners are deployed on each RDHPCS system as shell executors (not Docker containers), running directly in the HPC environment with access to the native compilers, Spack-Stack modules, and shared filesystems.

3.6.1. Platform Configuration Files

Each supported platform has a configuration file at dev/ci/platforms/config.<MACHINE_ID> that defines platform-specific paths and settings:

Table 3.8 Platform Configurations

  Platform   Config File       CI Root Directory
  --------   ---------------   --------------------------------------------------------
  Hera       config.hera       /scratch3/NCEPDEV/global/role.glopara/GFS_CI_CD/HERA
  Gaea C6    config.gaeac6     /gpfs/f6/drsa-precip3/proj-shared/${USER}/GFS_CI_CD
  Orion      config.orion      /work2/noaa/global/${USER}/GFS_CI_CD/ORION
  Hercules   config.hercules   /work2/noaa/global/role-global/GFS_CI_CD/HERCULES
  Ursa       config.ursa       /scratch3/NCEPDEV/global/role.glopara/GFS_CI_CD/URSA
  WCOSS2     config.wcoss2     /lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT

Each configuration file exports the following key variables:

# Base directory for all CI operations
export GFS_CI_ROOT=/scratch3/NCEPDEV/global/role.glopara/GFS_CI_CD/HERA

# Initial condition data for experiments
export ICSDIR_ROOT=/scratch3/NCEPDEV/global/role.glopara/data/ICSDIR

# GitLab runner registration URL
export GITLAB_URL=https://vlab.noaa.gov/gitlab-community

# Human-readable runner name
export GITLAB_RUNNER_NAME="RDHPCS Hera"

# Directory where pipeline builds are stored
export GITLAB_BUILDS_DIR=${GFS_CI_ROOT}/BUILDS/GITLAB

# GitLab runner working directory (state files, config)
export GITLAB_RUNNER_DIR="${GFS_CI_ROOT}/GitLab/Runner"

# Baseline data for CTests
export STAGED_CTESTS=${GITLAB_BUILDS_DIR}/stable/RUNTESTS

# Custom Rocoto path (dry-run capable build)
export GFS_CI_ROCOTO_PATH="${GFS_CI_UTIL_PATH}/src/rocoto-1.3.7-dryrun_nodaemon/bin"

Note

Hera and Ursa share the same physical filesystem (cross-mounted), so their GFS_CI_ROOT paths include the machine name (HERA or URSA) to avoid collisions.

3.6.2. The launch_gitlab_runner.sh Script

The dev/ci/scripts/utils/gitlab/launch_gitlab_runner.sh script is the primary tool for managing GitLab runners on each RDHPCS system. It supports three operations: register, run, and unregister.

3.6.2.1. Setup Prerequisites

Before using the launch script, ensure:

  1. Platform config exists: A config.<MACHINE_ID> file must exist in dev/ci/platforms/ for the target machine.

  2. Runner token is available: The GitLab runner registration token must be available through one of:

    • Command-line argument (second positional parameter)

    • GITLAB_RUNNER_TOKEN environment variable

    • A gitlab_token file in the runner directory

  3. Runner binary: The script will automatically download the GitLab runner binary if it is not present in the GITLAB_RUNNER_DIR.

3.6.2.2. Registering a Runner

To register a new runner on an RDHPCS system:

# SSH to the target HPC system
ssh role.glopara@hera.rdhpcs.noaa.gov

# Navigate to the global-workflow checkout
cd /path/to/global-workflow

# Register the runner (token can also be in GITLAB_RUNNER_TOKEN or gitlab_token file)
dev/ci/scripts/utils/gitlab/launch_gitlab_runner.sh register <GITLAB_RUNNER_TOKEN>

The registration command configures the runner with:

  • Executor: shell (runs directly in the HPC environment)

  • Shell: bash

  • Builds directory: ${GITLAB_BUILDS_DIR} (from platform config)

  • Custom build directory: enabled (allowing .gitlab-ci.yml to override the clone path via GIT_CLONE_PATH)

  • Concurrency: 24 concurrent requests

After registration, the script updates the runner’s config.toml to set concurrent = 24.
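
These registration settings end up in the runner's config.toml. For Hera it would look roughly like the following — an illustrative sketch using the values from config.hera shown earlier; the authentication token and other generated fields are omitted, and exact fields vary by runner version:

```toml
# Illustrative config.toml sketch (generated file; not checked in).
concurrent = 24

[[runners]]
  name = "RDHPCS Hera"
  url = "https://vlab.noaa.gov/gitlab-community"
  executor = "shell"
  shell = "bash"
  builds_dir = "/scratch3/NCEPDEV/global/role.glopara/GFS_CI_CD/HERA/BUILDS/GITLAB"
  [runners.custom_build_dir]
    enabled = true
```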

3.6.2.3. Starting a Runner

To start a registered runner:

dev/ci/scripts/utils/gitlab/launch_gitlab_runner.sh run

This launches the runner as a background process using nohup. The runner’s working directory is set to ${GITLAB_RUNNER_DIR} from the platform config. Logs are written to a date-stamped log file in the working directory.

3.6.2.4. Unregistering a Runner

To remove a runner from the GitLab server:

dev/ci/scripts/utils/gitlab/launch_gitlab_runner.sh unregister

This removes the runner registration identified by ${GITLAB_RUNNER_NAME} from the GitLab server.

3.6.3. Runner Directory Layout

Each platform follows a common directory structure under its GFS_CI_ROOT:

${GFS_CI_ROOT}/
├── BUILDS/
│   └── GITLAB/             # Pipeline build artifacts
│       ├── pr_cases_<sha>_<id>/
│       ├── nightly_<sha>_<date>/
│       └── stable -> nightly_<sha>_<date>/
├── GitLab/
│   └── Runner/             # Runner working directory
│       ├── gitlab-runner   # Runner binary
│       ├── config.toml     # Runner configuration (auto-generated)
│       ├── gitlab_token    # Optional token file
│       └── launched_gitlab_runner-*.log  # Runner logs
└── Jenkins/                # Legacy Jenkins directories
    ├── agent/
    └── workspace/

3.6.4. Runner Maintenance

Common maintenance tasks:

Check if a runner is active:

ps aux | grep gitlab-runner

View runner logs:

tail -f ${GFS_CI_ROOT}/GitLab/Runner/launched_gitlab_runner-*.log

Restart a runner (e.g., after system maintenance):

# Stop any existing runner
pkill -f "gitlab-runner run"

# Start fresh
cd /path/to/global-workflow
dev/ci/scripts/utils/gitlab/launch_gitlab_runner.sh run

Re-register after token rotation:

# Unregister the old runner
dev/ci/scripts/utils/gitlab/launch_gitlab_runner.sh unregister

# Register with the new token
dev/ci/scripts/utils/gitlab/launch_gitlab_runner.sh register <NEW_TOKEN>

# Start the runner
dev/ci/scripts/utils/gitlab/launch_gitlab_runner.sh run

3.7. Pipeline Execution Details

3.7.1. Build Stage

The build stage (defined in .build_template in .gitlab-ci.yml) performs:

  1. Environment setup: Sources the platform config and validates paths.

  2. Custom Rocoto loading: If GFS_CI_ROCOTO_PATH is set in the platform config, it is prepended to PATH to use a custom Rocoto build with dry-run support.

  3. PR checkout: For PR pipelines (PR_NUMBER != 0), the build fetches the PR from GitHub and checks it out using gh pr checkout.

  4. Build execution: Calls dev/ci/scripts/utils/ci_utils.sh build.

  5. Workflow linking: Runs sorc/link_workflow.sh to create necessary symlinks.

  6. Label updates: Updates GitHub PR labels from CI-<Machine>-Ready to CI-<Machine>-Building and then to CI-<Machine>-Running.

3.7.2. Test Execution (PR Cases)

The run_check_gitlab_ci.sh script manages each experiment’s lifecycle:

  1. Launches the experiment with rocotorun.

  2. Enters a monitoring loop that alternates between rocotorun and rocotostat calls.

  3. Tracks Rocoto state through completion (DONE) or failure (FAIL, UNAVAILABLE, UNKNOWN, STALLED).

  4. On failure: extracts error logs from failed/dead tasks, uploads them as GitHub Gists, and posts a comment to the PR.

  5. Exits with rc=0 for success or rc=1 for failure.
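
The monitoring loop above can be sketched as follows. Here rocoto_state is a stub standing in for the paired rocotorun/rocotostat calls (the real script also sleeps between polls and parses rocotostat output); the terminal state names match the list above.

```shell
# Stub for illustration: reports RUNNING twice, then DONE. The real loop
# invokes rocotorun and parses rocotostat instead.
poll_count=0
rocoto_state() {
  poll_count=$((poll_count + 1))
  if [ "${poll_count}" -ge 3 ]; then ROCOTO_STATE=DONE; else ROCOTO_STATE=RUNNING; fi
}

# Poll until the experiment reaches a terminal state or the poll budget runs out.
monitor_experiment() {
  local max_polls=$1 i
  for (( i = 0; i < max_polls; i++ )); do
    rocoto_state   # stand-in for: rocotorun; rocotostat
    case "${ROCOTO_STATE}" in
      DONE) return 0 ;;                               # success
      FAIL|UNAVAILABLE|UNKNOWN|STALLED) return 1 ;;   # failure states
    esac
  done
  return 1   # poll budget exhausted
}

monitor_experiment 10 && echo "experiment DONE"   # prints "experiment DONE"
```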

3.7.3. Test Execution (CTests)

CTest execution (defined in .run_ctests_template in gitlab-ci-ctests.yml):

  1. Changes to the CTest build directory.

  2. Runs ctest -L "${CTEST_NAME}" to execute tests for a specific label.

  3. Publishes JUnit XML results as GitLab artifacts.

  4. Examines both the ctest exit code and the JUnit XML for failure indicators.

3.7.4. Finalize Stage

On success:

  • PR pipelines: Adds CI-<Machine>-Passed, removes CI-<Machine>-Running.

  • Nightly pipelines: Renames the workspace to date format, creates the stable symlink, cleans old directories, and updates status badges.

On failure:

  • PR pipelines: Adds CI-<Machine>-Failed, removes CI-<Machine>-Running.

  • Nightly pipelines: Updates the status badge to show failure.

Failure cleanup is also handled in after_script blocks that run regardless of job status, canceling any remaining batch jobs and cleaning up resources.
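
Such a block might look like the following (illustrative only; the job name is hypothetical and the cancel step is a placeholder for whatever scheduler-specific cleanup each platform needs, not an actual script in the repository):

```yaml
# Illustrative after_script on a run job; GitLab executes it whether the job
# passed, failed, or was cancelled.
run_experiments-hera:
  after_script:
    - echo "Cleaning up pipeline ${CI_PIPELINE_ID} on ${machine}"
    # Placeholder for scheduler-specific cleanup (e.g. cancelling batch jobs):
    - ./cancel_remaining_batch_jobs.sh || true
```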

3.8. Adding a New Host Platform

To extend the CI pipeline to a new RDHPCS system:

  1. Create a platform config: Add dev/ci/platforms/config.<new_machine> with the required environment variables (follow an existing config as a template).

  2. Define the test matrix: Add a case matrix in dev/ci/gitlab-ci-hosts.yml:

    .new_machine_cases_matrix: &new_machine_cases
      - caseName: ["C48_ATM", "C48_S2SW", ...]
    
  3. Add host-specific jobs: Create setup, run, and finalize jobs in dev/ci/gitlab-ci-hosts.yml that extend the appropriate templates and reference the new machine tag:

    setup_experiments-new_machine:
      extends: .setup_experiment_template
      variables:
        machine: new_machine
      tags:
        - new_machine
      parallel:
        matrix: *new_machine_cases
      needs:
        - build-new_machine
      rules:
        - if: $PIPELINE_TYPE == "pr_cases" && ...
    
  4. Add a build job: Add a build job in dev/ci/gitlab-ci-hosts.yml:

    build-new_machine:
      extends: .build_template
      variables:
        machine: new_machine
      tags:
        - new_machine
    
  5. Register a runner: SSH to the new machine and register a GitLab runner using launch_gitlab_runner.sh register.

  6. Update GitHub Actions: Add a new boolean input for the machine in .github/workflows/trigger-gitlab-pipelines.yml.

  7. Stage baseline data: Ensure nightly baseline data is available at the STAGED_CTESTS path for CTest validation.

3.9. File Reference

Table 3.9 Complete File Reference

  File Path                                               Description
  -----------------------------------------------------   --------------------------------------------------------------
  .gitlab-ci.yml                                          Main pipeline orchestration and base templates
  dev/ci/gitlab-ci-cases.yml                              Setup, run, and finalize templates for experiment cases
  dev/ci/gitlab-ci-ctests.yml                             CMake/CTest setup and execution templates
  dev/ci/gitlab-ci-hosts.yml                              Per-host job definitions, test matrices, and runner tags
  dev/ci/platforms/config.*                               Platform-specific CI/CD environment configuration
  dev/ci/cases/pr/*.yaml                                  Individual test case definitions (experiment YAML files)
  dev/ci/scripts/utils/ci_utils.sh                        Core CI utility functions (build, create_experiment, etc.)
  dev/ci/scripts/run_check_gitlab_ci.sh                   Experiment monitoring, Rocoto polling, and failure reporting
  dev/ci/scripts/utils/gitlab/launch_gitlab_runner.sh     GitLab runner registration, startup, and removal
  dev/ci/scripts/utils/gitlab/badge-updater-pipeline.yml  Standalone badge update pipeline configuration
  dev/ci/scripts/utils/publish_logs.py                    Error log upload to GitHub Gists
  dev/ci/scripts/utils/rocotostat.py                      Rocoto status parsing and reporting
  .github/workflows/trigger-gitlab-pipelines.yml          GitHub Actions workflow for triggering GitLab pipelines

3.10. Troubleshooting

3.10.1. Runner Not Picking Up Jobs

  1. Verify the runner process is active: ps aux | grep gitlab-runner

  2. Check runner logs for connection errors.

  3. Ensure the runner tags match the job tags in the pipeline configuration.

  4. Verify network connectivity to the GitLab instance from the HPC node.

3.10.2. Build Failures

  1. Check that GW_HOMEgfs is correctly set and the directory exists.

  2. Verify that Spack-Stack modules are loadable on the target platform.

  3. Review the ci_utils.sh build output in the job logs.

  4. For PR builds, ensure gh (GitHub CLI) is installed and authenticated.

3.10.3. Test Case Timeouts

  1. Rocoto-based experiments have a maximum Rocoto cycle timeout configured in the CI runner (RUNNER_SCRIPT_TIMEOUT: 8h).

  2. If experiments consistently time out, check:

    • Job scheduler queue availability on the HPC system.

    • maxtries setting in the test case YAML.

    • Whether batch jobs are being submitted and scheduled correctly.

3.10.4. CTest Baseline Mismatches

  1. Verify that STAGED_CTESTS points to a valid, recent nightly build.

  2. Confirm the stable symlink is intact and pointing to a successful nightly.

  3. Check that the baseline data matches the current develop branch state.

3.10.5. GitLab Mirror Sync Issues

  1. Verify the pull mirror is operational on the licensed GitLab instance (Settings > Repository > Mirroring repositories).

  2. Check the “Last successful update” timestamp — it should be within the last few minutes.

  3. For push mirror issues to the community instance, verify the credentials and target URL are still valid.

  4. If a specific branch is missing, trigger a manual sync from the mirroring settings page.