Self-Hosted GitLab CI/CD Platform

Context

Running a homelab that manages real infrastructure — Kubernetes clusters, Terraform modules, Ansible playbooks, Cloudflare DNS — means you need a proper GitOps workflow, not just ad-hoc scripts run from someone's laptop.

The goal was a fully self-hosted CI/CD platform where:

Every infrastructure change goes through a merge request
Pipelines run automatically on push
Built artefacts (container images) are stored in a private registry
A single set of credentials works across all services (no per-app accounts)
The platform itself is observable — not a black box

SaaS GitLab would have covered the CI/CD basics, but integrating with a private Authentik SSO, keeping container images in-house, and scraping custom metrics into an existing Prometheus stack required self-hosting.

Architecture

GitLab EE runs as a Docker Compose stack on a dedicated VM. Traefik handles TLS termination and routing at the edge — GitLab gets a clean HTTPS hostname without needing to manage certificates inside the container.

Internet → Traefik (TLS) → gitlab.<domain>           (web UI + API)
                         → registry.<domain>          (Container Registry)

SSH access for git push and git pull is exposed on a non-standard port, keeping port 22 reserved for host system SSH. This avoids the usual config gymnastics of running two SSH services on the same machine.

The container registry runs as a separate container sharing the same Traefik network, giving it its own subdomain and TLS certificate. Images are stored on a local volume, with backups flowing to TrueNAS.
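As a rough illustration of the routing and SSH setup described above, a Compose service for GitLab behind Traefik might look like the sketch below. The hostname gitlab.example.com, host port 2222, the network name traefik, the entrypoint name websecure, and the certificate resolver name letsencrypt are all assumptions, not values from the actual deployment.

```yaml
# docker-compose.yml (sketch; hostnames, port 2222, and resolver name are assumptions)
services:
  gitlab:
    image: gitlab/gitlab-ee:latest
    hostname: gitlab.example.com
    environment:
      GITLAB_OMNIBUS_CONFIG: |
        external_url 'https://gitlab.example.com'
        # TLS is terminated by Traefik, so the bundled NGINX serves plain HTTP
        nginx['listen_port'] = 80
        nginx['listen_https'] = false
        # Port advertised in SSH clone URLs
        gitlab_rails['gitlab_shell_ssh_port'] = 2222
    ports:
      # Host port 2222 maps to the container's sshd; host port 22 stays free
      - "2222:22"
    labels:
      - traefik.enable=true
      - traefik.http.routers.gitlab.rule=Host(`gitlab.example.com`)
      - traefik.http.routers.gitlab.entrypoints=websecure
      - traefik.http.routers.gitlab.tls.certresolver=letsencrypt
      - traefik.http.services.gitlab.loadbalancer.server.port=80
    networks:
      - traefik

networks:
  traefik:
    external: true
```

With a mapping like this, clone URLs take the form ssh://git@gitlab.example.com:2222/<group>/<project>.git, and the host's own sshd is untouched.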

Runner Architecture

Most self-hosted GitLab setups treat the runner as an afterthought: configured once, never revisited. That is usually where CI breaks. This setup uses two executor types, and each has its own failure modes, so when CI stops working the first question is which runner is involved.

Two runners, two executor types, deliberately separated:

Docker executor (docker tag) — Build and test jobs run inside Docker containers. Each job gets a fresh, isolated environment. No state accumulates between runs. Used for: building container images, running linters, running tests, packaging artefacts.

Shell executor (deploy tag) — Runs directly on the host with access to the Docker socket. Used for: deploying services via docker compose up, running Ansible playbooks, applying Terraform plans. This runner has host access that build runners intentionally do not.

# Runner (Docker executor): Compose service fragment
image: gitlab/gitlab-runner:alpine
volumes:
  # The runner manager uses the host Docker socket to start job containers
  - /var/run/docker.sock:/var/run/docker.sock

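The separation is a deliberate security boundary. A compromised build job in the Docker executor is confined to its own container and cannot reach the host or other running containers, provided job containers are not privileged and do not get the host Docker socket mounted. Jobs that run docker build are the exception to watch: they need a Docker daemon (a docker:dind service or the mounted socket), and either option weakens the boundary for those jobs. Only jobs explicitly tagged deploy get host access, and the pipeline restricts those jobs to main. Tags and branch rules in .gitlab-ci.yml are conventions rather than enforcement, so the deploy runner should also be marked as protected in GitLab, which limits it to protected branches such as main.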

The failure modes are also completely different. Docker executor failures are isolated to the job container — image pull errors, networking inside the container, resource limits, a bad base image. Shell executor failures are host-level — a broken deploy credential, a locked Docker socket, a full disk on the host. Knowing which executor a job uses tells you immediately where to look.

A typical pipeline flow:

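stages:
  - build
  - test
  - deploy

build-image:
  stage: build
  tags: [docker]
  script:
    # Authenticate with the job-scoped registry credentials GitLab injects
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

deploy:
  stage: deploy
  tags: [deploy]
  rules:
    # Run only for commits on main
    - if: $CI_COMMIT_BRANCH == "main"
  script:
    - docker compose pull
    - docker compose up -d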

SSO via Authentik

GitLab is registered as an OIDC application in Authentik. Local password login is disabled for non-admin users — the only way in is through SSO.

This matters for a few reasons:

One set of credentials across all homelab services. No per-service account management.
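Centralised access control. Removing a user from Authentik blocks new logins to GitLab, Grafana, Traefik dashboards, and everything else in one step. Existing GitLab sessions, SSH keys, and access tokens stay valid until they expire or the GitLab account is blocked.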
Group-based permissions — GitLab groups map to Authentik groups, so access to specific projects follows the same policy as everything else.
Audit trail — Authentik logs authentication events. One place to look if something unexpected happens.

The main tradeoff: if Authentik is down, non-admin users cannot log in to GitLab. This is acceptable in a homelab but worth designing around in production (always keep at least one local admin account with a strong password for break-glass access).
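For reference, the OIDC registration described in this section could be wired into GitLab roughly as follows. This is a sketch, not the actual configuration: the Authentik hostname auth.example.com, the application slug gitlab, the provider label, and the client ID and secret placeholders are assumptions, and the choice to let SSO-created users in without admin approval is illustrative.

```yaml
# docker-compose.yml fragment (sketch). Hostnames, the "gitlab" application
# slug, and the client ID/secret placeholders are assumptions.
services:
  gitlab:
    environment:
      GITLAB_OMNIBUS_CONFIG: |
        # Allow accounts to be created on first SSO login
        gitlab_rails['omniauth_allow_single_sign_on'] = ['openid_connect']
        # Assumption: new SSO users are active immediately, without admin approval
        gitlab_rails['omniauth_block_auto_created_users'] = false
        gitlab_rails['omniauth_providers'] = [
          {
            name: "openid_connect",
            label: "Authentik",
            args: {
              name: "openid_connect",
              scope: ["openid", "profile", "email"],
              response_type: "code",
              issuer: "https://auth.example.com/application/o/gitlab/",
              discovery: true,
              client_auth_method: "query",
              pkce: true,
              client_options: {
                identifier: "<client-id-from-authentik>",
                secret: "<client-secret-from-authentik>",
                redirect_uri: "https://gitlab.example.com/users/auth/openid_connect/callback"
              }
            }
          }
        ]
```

The redirect URI has to match the one registered on the Authentik provider exactly. Disabling password login is a separate GitLab admin setting and is not shown here.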

Container Registry

The internal registry hosts all service images. Every service that needs a custom image has a CI pipeline that builds and pushes to registry.<domain>/<group>/<project>:<tag> on merge to main.

Docker Compose stacks and Kubernetes manifests reference the internal registry directly:

image: registry.<domain>/homelab/traefik:a3f9c2e

Benefits over pulling from Docker Hub:

Pinned, verified artefacts: the image in production was built from a specific commit
No rate limits: no Docker Hub pull throttling
Offline resilient: services can restart even if external connectivity is down
Audit log: GitLab tracks who pushed what image and when

Observability

GitLab exposes a Prometheus metrics endpoint covering the full application stack:

Area	Key Metrics
Web server (Puma)	Worker count, thread pool capacity, request duration
Background jobs (Sidekiq)	Queue depth, job failure rate, processing latency
CI/CD	Pipeline creation duration, runner job counts
Database	PostgreSQL transaction rate, connection pool
Cache	Redis client connections, command rate
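
A minimal Prometheus scrape job for the GitLab web endpoint might look like the sketch below. The job name and target hostname are assumptions. GitLab serves application metrics at /-/metrics, and by default only allowlisted client IPs may read that endpoint (gitlab_rails['monitoring_whitelist']), so the Prometheus address has to be permitted there. Sidekiq, PostgreSQL, and Redis metrics usually come from separate exporter ports that are not shown here.

```yaml
# prometheus.yml fragment (sketch; job name and target are assumptions)
scrape_configs:
  - job_name: gitlab-rails
    scheme: https
    metrics_path: /-/metrics
    static_configs:
      - targets:
          - gitlab.example.com
```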

Five dedicated Grafana dashboards visualise the full stack. Alert rules fire on:

Sidekiq queue backlog exceeding threshold (pipelines stalling)
High job failure rate (something breaking in CI)
Puma worker saturation (web layer under pressure)
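
The three alerts above could be expressed as Prometheus rules along these lines. This is a sketch: the thresholds and durations are invented for illustration, and the metric names (sidekiq_queue_size, sidekiq_jobs_failed_total, puma_active_connections, puma_max_threads) are assumptions that should be checked against what the instance and its exporters actually expose.

```yaml
# gitlab-alerts.yml (sketch; thresholds and metric names are assumptions)
groups:
  - name: gitlab
    rules:
      - alert: SidekiqQueueBacklog
        # Jobs waiting across all queues; a sustained backlog stalls pipelines
        expr: sum(sidekiq_queue_size) > 100
        for: 10m
        labels:
          severity: warning
      - alert: SidekiqHighFailureRate
        # Failed background jobs per second, averaged over five minutes
        expr: sum(rate(sidekiq_jobs_failed_total[5m])) > 0.5
        for: 10m
        labels:
          severity: warning
      - alert: PumaWorkerSaturation
        # Share of Puma threads currently busy serving requests
        expr: sum(puma_active_connections) / sum(puma_max_threads) > 0.9
        for: 5m
        labels:
          severity: critical
```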

This means GitLab is treated the same as any other service in the stack — not a special case that gets ignored because "it's just a tool".

Backup Strategy

GitLab's built-in backup tool runs on a schedule and writes archives to TrueNAS over NFS. The backup includes:

All git repositories
Database (PostgreSQL)
Container registry data
Uploaded files and attachments

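The GitLab configuration file (gitlab.rb) and secrets file (gitlab-secrets.json) are backed up separately, because the built-in backup tool does not include them. This is the step that most people skip and then deeply regret during a restore: the secrets file holds the keys needed to decrypt encrypted database content such as CI/CD variables and two-factor secrets, and it cannot be regenerated.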
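
One way to land the archives on TrueNAS is to mount the NFS export as a Docker volume at GitLab's default backup path. The sketch below assumes that approach; the actual setup may instead mount NFS on the host. The TrueNAS address, export path, NFS version, and volume name are hypothetical.

```yaml
# docker-compose.yml fragment (sketch; NFS address and export path are assumptions)
services:
  gitlab:
    volumes:
      # /var/opt/gitlab/backups is the default location for backup archives
      - gitlab-backups:/var/opt/gitlab/backups

volumes:
  gitlab-backups:
    driver: local
    driver_opts:
      type: nfs
      o: addr=truenas.example.lan,rw,nfsvers=4
      device: ":/mnt/tank/backups/gitlab"
```

The backup itself is still triggered separately on a schedule (for example by running gitlab-backup create inside the container), and the configuration and secrets files need their own copy job.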

What This Enables

Every infrastructure change in the homelab goes through a GitLab merge request:

Ansible playbook update → opens an MR → pipeline lints the YAML and runs ansible-playbook --check → merge to main triggers the shell runner to apply
Terraform module change → pipeline runs terraform plan → output posted as MR comment → apply on merge
New Docker service → pipeline builds and pushes image → deploy job pulls and restarts
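
The Ansible flow could be sketched as a pipeline like the one below. The file names site.yml and inventory/hosts.ini, the stage and job names, and the python:3.12-slim image are assumptions for illustration. The dry-run job also assumes the job container can reach the managed hosts over SSH, for example with a key supplied through a CI/CD variable, which is not shown.

```yaml
# .gitlab-ci.yml for an Ansible repository (sketch; file names are hypothetical)
stages:
  - lint
  - check
  - apply

lint-playbooks:
  stage: lint
  tags: [docker]
  image: python:3.12-slim
  before_script:
    - pip install ansible ansible-lint yamllint
  script:
    - yamllint .
    - ansible-lint site.yml
  rules:
    # Run in merge request pipelines only
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

dry-run:
  stage: check
  tags: [docker]
  image: python:3.12-slim
  before_script:
    - pip install ansible
  script:
    # --check reports what would change without changing anything
    - ansible-playbook -i inventory/hosts.ini site.yml --check --diff
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

apply:
  stage: apply
  tags: [deploy]
  script:
    - ansible-playbook -i inventory/hosts.ini site.yml
  rules:
    # Run only for commits on main, i.e. after the merge
    - if: $CI_COMMIT_BRANCH == "main"
```

Keeping lint and dry-run on the Docker executor means the shell runner is only touched by the apply job on main, which matches the boundary described in the runner section.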

The result is an auditable history of every change, who made it, what the pipeline showed, and what got deployed. No tribal knowledge, no "I ran something from my laptop", no wondering what changed last Tuesday.

That's what production-grade GitOps looks like — even at homelab scale.