# Custom Repository for Linux Packages

## 1. Context
I have a pet project: an Electron application for Linux that grew over two years to roughly 25,000 monthly active users across all channels. It was distributed mainly as an AppImage. As the userbase grew, some users started to hit the limits of a single-format approach.

The awkward truth about Linux packaging is that there is no truly universal format, despite all attempts at creating one. Different distribution families handle dependencies differently. That variety is what makes the ecosystem so rich, but it also means broad coverage requires shipping several formats rather than one. To reach most users, you realistically need five: AppImage, DEB, RPM, Snap, and Flatpak.

Two of those five were already well-served. AppImage has no central repository. You distribute it as a downloadable file from your own site, and it carries its own auto-update mechanism. Snap, served through Snapcraft.io, is close to fully automated and needs little human involvement. The gap was the other three. DEB, RPM, and Flatpak had no centralized, auto-updating channel.

The obvious fix is a centralized managed repository, where the real win for users is automatic updates through their native package manager. But the existing hosted options do not always fit. PPA, COPR, and Flathub each come with policy constraints around source-code openness, licensing, or AI usage, and any of those can rule a service out for a given project. Each one also asks you to hand over your signing identity to a third party.

That leaves a concrete question. Can you self-host one centralized repository for DEB, RPM, and Flatpak, covering the gap those three formats leave, without surrendering control of your signing identity or babysitting a fleet of services? This case study is the answer.

## 2. Constraints and non-goals
- **Single application, single publisher.** This is not a multi-app, multi-tenant platform. A company or a developer shipping desktop software usually has one or a few products and will not let outsiders publish to their infrastructure, so identity and repository management for many publishers is not needed.
- **No long-running per-format daemon.** Serving packages is trivial and should not consume meaningful resources. At rest the repository is just a tree of static files.
- **Public, read-only, no auth, by design.** App building is done in CI, and distribution does not need authorization on the read path. A read-only repository open to anyone is the right fit. That means the security model is about protecting the write path and the integrity of what gets published, not about gating access to it.
- **Architectures frozen at first publish.** This is an accepted limitation. The apt index layout fixes its architecture list when the repo is first created. For a single app shipping to a known set of targets, that trade-off is acceptable up front. I come back to it in the final section.

## 3. The decision / approach

The decision reduces to a single fork. If you want object storage plus a CI job that generates the repo plus nginx to serve it, then lightweight static-metadata generators like `aptly` and `createrepo_c` are sufficient and often preferable. If you want an internal package platform with governance, meaning multiple repos, controlled promotion, upstream mirroring, and auditability, then Pulp 3 fits better.

The trade-off is operational simplicity versus built-in governance and promotion semantics. For distributing a handful of package versions of one app, the governance machinery is weight you would carry but never use, so the lightweight path wins.

The same logic applies to self-hosting at all rather than reaching for a hosted service. Self-hosting earns its keep when you want to control retention and package history, publish for multiple distributions or release channels from one pipeline, avoid vendor lock-in, or tightly control resource usage. It also wins when you have three ecosystems and one app.

I considered three alternatives before settling on the lightweight stack.

- **Flathub, PPA, COPR.** These are three different hosted services, each covering one format under its own policy regime, each requiring its own surrender of signing identity. The real cost is fragmentation across three vendors with three sets of rules.
- **flat-manager.** A second persistent daemon, only justified by token-authenticated multi-publisher uploads.
- **Pulp 3.** The right answer at higher scale, and overkill for a single-app push model.

## 4. Implementation highlights

One note before the details. When I started, I expected Flatpak to require a living daemon for its API and build backend. It turned out the whole thing could be a fully static repo served by nginx, and that discovery reshaped the architecture below.

### One service, everything else on demand

The whole system rests on a single observation. A `deb`, `rpm`, or `flatpak` repository is just a tree of static files plus some index metadata. Nothing needs to be running to serve it. The tools that build those indexes, namely `aptly`, `createrepo_c`, and `flatpak`/`ostree`, are one-shot metadata generators. They read the packages on disk, write out index files, and exit. They only need to run when the repo changes, not while it is being served.

So only one process stays up. That is nginx, serving three sibling directory trees over HTTP. Everything else runs on demand, writes into the same tree nginx serves, and exits.

```
ALWAYS RUNNING                    ON-DEMAND (run, write files, exit)
┌──────────────┐                  ┌────────────────────────────────────┐
│    nginx     │  serves  ◄────── │  aptly         (rebuild apt index) │
│ static files │   reads          │  createrepo_c  (rebuild rpm index) │
│ + ACME chal. │                  │  flatpak       (rebuild ostree)    │
└──────────────┘                  │  certbot       (issue/renew TLS)   │
                                  └────────────────────────────────────┘
```

This maps onto Docker Compose profiles. In `docker-compose.yml`, only nginx is a plain service. The four on-demand containers carry `profiles: ["tools"]`, which keeps them out of `docker compose up` entirely. They are built but never started as daemons. Each one runs via `docker compose run --rm <svc>`, wrapped by the publish and TLS scripts. The rule that keeps the architecture intact is simple: never turn a publisher into a long-running service.
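
To make the run-and-exit pattern concrete, here is a minimal Python sketch of a wrapper around `docker compose run --rm`. It is not the project's actual publish script. The service names in `ON_DEMAND_SERVICES` are hypothetical, since the text only says that four tool containers carry the `tools` profile.

```python
import subprocess
from typing import List, Sequence

# Hypothetical service names; only the "tools" profile itself comes from the text.
ON_DEMAND_SERVICES = ("aptly", "createrepo", "flatpak", "certbot")


def compose_run_argv(service: str, args: Sequence[str] = ()) -> List[str]:
    """Build the argv for a one-shot container that is removed when it exits."""
    if service not in ON_DEMAND_SERVICES:
        # nginx (or anything else) must never be launched through this path.
        raise ValueError(f"{service!r} is not an on-demand tool service")
    return ["docker", "compose", "run", "--rm", service, *args]


def run_tool(service: str, args: Sequence[str] = ()) -> None:
    """Run the tool to completion; a non-zero exit raises CalledProcessError."""
    subprocess.run(compose_run_argv(service, args), check=True)
```

Because every tool goes through one function that only knows how to run and remove a container, there is no code path that leaves a publisher running after its work is done.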

TLS fits the same shape. `certbot` is just another on-demand container. It runs to obtain or renew the Let's Encrypt certificate via the HTTP-01 challenge (which nginx serves from a shared webroot), then exits. The cert lands in a shared volume nginx reads. There is no persistent ACME agent.

Flatpak is a bit different from the rest. Flatpak's official multi-build server, flat-manager, is a long-running daemon, and adopting it would have added the one persistent service the whole design avoids. For a single-app push model, plain `flatpak build-update-repo` is one-shot exactly like `createrepo_c`, so that is what I used. flat-manager earns its keep only with token-authenticated multi-publisher uploads, build queuing, or delta generation at scale, none of which a single-app channel needs.

### The two-key trust model

With the serving model settled, the next problem was trust. There are two distinct signing operations, and conflating them is the common mistake.

|                | Package signing | Repository signing |
| -------------- | --------------- | ------------------ |
| What is signed | The individual `.deb` / `.rpm` / flatpak commit | The repo's index metadata (`Release`, `repomd.xml`, ostree `summary`) |
| Answers | "Who built this package, and is it untampered?" | "Is this the authentic index of the repo?" |
| Role | Package maintainer (possibly many) | Repository owner (exactly one) |
| Where | In CI, at build time | On the server, when the index is regenerated |
| Key lives | With the maintainer / in CI secrets | On the repo server only, `chmod 600` |

Keeping the keys and steps separate has concrete payoffs. A second package maintainer can be added with their own key without touching the repo's signing identity. A compromise of a maintainer's CI key does not compromise the index signature, and the reverse holds too. The discipline is absolute in both directions. Package signing never happens on the server, and index signing never happens in CI.

The two keys are protected differently, because they face different leaks. The maintainer key is passphrase-protected, and the passphrase is held as a separate CI secret injected only at signing time. This does not help if the whole CI secret store is read at once, since both would leak together. What it defends against is partial exposure, where the key blob alone escapes through a build artifact, a log line, an accidental commit, or a stray backup. In those cases the passphrase is still missing and the leaked blob is inert. The repo key cannot benefit from this. It signs the index unattended on the server on every publish, so any passphrase would have to sit on the same host beside it and be readable by the same automation, which is the same blast radius and also breaks unattended ostree summary signing. So it is intentionally passphrase-less and protected instead by filesystem permissions, host hardening, and a separate encrypted backup.

### The server is the enforcement point

Client capabilities for per-package signatures differ sharply by format. `apt`, for instance, does not verify per-package signatures at all. So I enforce the maintainer signature on the server, before anything is indexed. CI emits a detached, armored signature alongside each package. The publish scripts verify it against the maintainer public key and fail closed if it is missing or made by the wrong key. Without this gate, anyone who could write a file into the drop-zone could get arbitrary content signed into the index.

The publisher containers also import the repo private key in order to sign the index, so a plain `gpg --verify` would happily accept a package signed by the repo key too, which silently defeats the whole split. So verification runs in a throwaway GnuPG home that holds only the maintainer public key, then asserts the signer fingerprint matches it. A related gotcha drives another rule: always select keys by fingerprint, never by email. The two keys may share an email, and an email selector lets gpg grab the wrong one, which then fails unattended index signing in a cryptic way.
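
The isolated-keyring check can be sketched in Python as follows. This is an illustration under assumptions, not the project's publish script: the function names and arguments are hypothetical, and it relies on gpg's machine-readable `--status-fd` output, where a `VALIDSIG` line lists the signing key's fingerprint first and the primary key's fingerprint last.

```python
import subprocess
import tempfile
from typing import Optional


def primary_fingerprint(status_output: str) -> Optional[str]:
    """Extract the signer's primary-key fingerprint from gpg --status-fd output.

    Returns None when gpg reported no valid signature at all.
    """
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[:2] == ["[GNUPG:]", "VALIDSIG"]:
            return fields[-1].upper()
    return None


def verify_maintainer_signature(package: str, signature: str,
                                maintainer_pubkey: str, expected_fpr: str) -> bool:
    """Verify a detached signature in a throwaway GnuPG home; fail closed."""
    with tempfile.TemporaryDirectory() as home:  # private directory, deleted afterwards
        base = ["gpg", "--homedir", home, "--batch", "--no-tty"]
        # The keyring holds only the maintainer public key, so a signature
        # made by the repo key cannot validate here.
        subprocess.run(base + ["--import", maintainer_pubkey],
                       check=True, capture_output=True)
        result = subprocess.run(
            base + ["--status-fd", "1", "--verify", signature, package],
            capture_output=True, text=True)
    if result.returncode != 0:
        return False
    # Select by fingerprint, never by email.
    return primary_fingerprint(result.stdout) == expected_fpr.replace(" ", "").upper()
```

The fingerprint comparison is a second, independent check on top of the empty keyring: even if an extra key were imported by mistake, a signature from anything other than the expected maintainer key would still be rejected.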

### Unprivileged upload, privileged publish

CI should be able to push packages without holding a shell, Docker access, or write access to the deploy root. The design splits the two responsibilities cleanly. CI is an unprivileged, SFTP-only, chrooted `publish` user that can do exactly one thing: drop files into an incoming directory and write a trigger marker. A systemd path unit watching that marker runs the publish pipeline as root. CI never touches the running system directly.

The drop-zone deliberately lives outside the deploy root. sshd refuses to chroot a user unless the chroot directory and every parent of it are root-owned and not writable by any other user or group, but the deploy root is owned by the deploying user. So the incoming directory gets its own root-owned location rather than bending the deploy root's ownership to fit.

Because a root-run pipeline is consuming files that an unprivileged user controls, the publish script applies several defensive controls before it reads or writes anything.

- **Symlink rejection.** The publish user owns the drop-zone subdirectories over SFTP, so it could plant a symlink pointing a `.deb` at a root-readable file, or a status file at `/etc/cron.d/`. The script refuses outright if any symlink exists anywhere under the incoming root.
- **Trigger consumed up front.** The marker is removed before publishing begins, so a failed publish does not leave the path unit hot-looping on a marker that never clears. The packages stay in place for inspection.
- **Atomic, symlink-safe status writes.** Completion status is written to a fresh root-owned temp file and moved into place. `rename(2)` never follows a symlink in the final component, so a racing symlink cannot trick the root process into writing through it.
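
Two of these controls are easy to show in code. The sketch below is a Python illustration, not the project's publish script. The function names are hypothetical, and the `0o644` mode assumes the status file has to be readable by the unprivileged user that polls it.

```python
import contextlib
import os
import tempfile
from typing import List


def find_symlinks(root: str) -> List[str]:
    """Return every symlink at or under root, without following any of them."""
    found = [root] if os.path.islink(root) else []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                found.append(path)
    return found


def write_status_atomically(status_path: str, text: str) -> None:
    """Write text to a fresh temp file, then rename it over status_path."""
    directory = os.path.dirname(status_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".status-")
    try:
        os.fchmod(fd, 0o644)  # assumption: the polling user must be able to read it
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        # rename(2) semantics: an existing symlink at status_path is replaced,
        # never written through.
        os.replace(tmp_path, status_path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

A publish run would refuse to continue when `find_symlinks(incoming_root)` returns anything, and would only call `write_status_atomically` once the re-index has finished or failed.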

There is also an asynchronous-completion obstacle worth calling out, because it is what makes the split practical to operate. The re-index runs out of band, and CI has no shell on the host, but CI still needs to know when publishing actually finished rather than just when the upload landed. So CI stamps a unique run id into the trigger marker, and the pipeline echoes `done <id>` or `failed <id>` into a status file in the SFTP chroot. CI polls over SFTP and proceeds only when it sees its own run id, so a stale status from a previous run never matches. That is what lets a post-publish step, such as scaling a pay-as-you-go VM back down, safely wait for the real finish.
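
The CI side of that handshake can be sketched in Python like this. It is a sketch under assumptions: `read_status` is a hypothetical callable that fetches the status file over SFTP and returns its text (or an empty string if the file does not exist yet), and the timeout and interval defaults are illustrative.

```python
import time
from typing import Callable, Optional, Tuple


def parse_status(text: str) -> Optional[Tuple[str, str]]:
    """Parse 'done <id>' or 'failed <id>'; anything else yields None."""
    parts = text.split()
    if len(parts) == 2 and parts[0] in ("done", "failed"):
        return parts[0], parts[1]
    return None


def wait_for_publish(read_status: Callable[[], str], run_id: str,
                     timeout_s: float = 900.0, interval_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the status names this run id: True on done, False on failed.

    A status written by an earlier run carries a different id, so it never
    matches and polling simply continues until the timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        parsed = parse_status(read_status())
        if parsed is not None and parsed[1] == run_id:
            return parsed[0] == "done"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no status for run {run_id!r} within {timeout_s}s")
        sleep(interval_s)
```

Injecting `read_status` and `sleep` keeps the loop independent of any particular SFTP client and makes it testable without a server.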

### The CDN cache split

Serving 25,000 monthly users cheaply comes down to one decision. Because everything is a static file, nginx emits `ETag` and `Last-Modified` on every response, and a CDN edge can revalidate with a cheap `304`. The cache policy splits the tree in two.

- **Immutable payloads.** These are the actual `.deb` and `.rpm` files and the content-addressed flatpak/ostree objects. Once published under a path they are never rewritten, so they are cached hard at the edge and in clients (default 60 days).
- **Mutable index metadata.** This is `Release`/`InRelease`, `repomd.xml`, the ostree `summary`, the landing page, and the public keys. These are regenerated on every publish, so they get a short TTL (default 1 day), which is also the upper bound on how long a publish takes to become visible.

One file needs different treatment: the flatpak commit's detached GPG signature, a `*.commitmeta` file. It lives under the otherwise-immutable `objects/` tree, but it is mutable. It is absent until the commit is signed, and rewritten on re-sign or key rotation. Caching it as an immutable payload would let a CDN pin a stale or missing signature across a publish, and clients reject that outright with "GPG verification enabled, but no signatures found". So a preceding regex location carves it out to the short metadata TTL. Getting that one path wrong is the difference between a CDN that works and one that intermittently breaks installs.

This split is also what makes putting the repo behind a CDN safe. A client never sees a fresh index pointing at packages the edge cannot serve yet, because the index always expires faster than the payloads it references.
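
The classification rule, including the `*.commitmeta` carve-out, can be modelled in a few lines of Python. This is only a model of the decision, not the nginx configuration itself. The regular expressions and example paths are assumptions about the tree layout, and the TTLs are the defaults quoted above.

```python
import re

IMMUTABLE_TTL_S = 60 * 24 * 3600  # 60 days for payloads
METADATA_TTL_S = 24 * 3600        # 1 day for index metadata

# Checked first, like a preceding regex location: mutable despite living in objects/.
COMMITMETA = re.compile(r"/objects/[0-9a-f]{2}/[0-9a-f]+\.commitmeta$")
# Package files and content-addressed ostree objects.
IMMUTABLE = re.compile(r"\.(deb|rpm)$|/objects/[0-9a-f]{2}/[0-9a-f]+\.[a-z]+$")


def cache_ttl_seconds(path: str) -> int:
    """Return the cache lifetime for a repository path."""
    if COMMITMETA.search(path):
        return METADATA_TTL_S
    if IMMUTABLE.search(path):
        return IMMUTABLE_TTL_S
    # Indexes, summary, landing page, public keys: default to the short TTL.
    return METADATA_TTL_S
```

Defaulting unknown paths to the short TTL is the safe direction: a misclassified payload costs a few extra revalidations, while a misclassified index would pin stale metadata at the edge.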

If you want to explore the real implementation, check out the [repo with the source code](https://github.com/anechunaev/linux-repo-server). It is hosted at [repo.nechunaev.com](https://repo.nechunaev.com/).

## 5. Rollout

The repository was added alongside existing channels, each of which served a different slice of the userbase and got a different upgrade out of it.

Most users were on the AppImage, which already had a working auto-update mechanism, so for them the new repo is not a rescue. It is an additional, more native option. A few thousand users were on `deb` and `rpm` packages attached to GitHub releases, downloading and installing them by hand with no update path at all. This is the group the repository changes most, turning a manual re-download every release into a normal `apt upgrade` or `dnf upgrade`. A more recent Snap channel on Snapcraft.io served under a thousand users. Flatpak did not exist for this app before the project, so that format is a brand-new channel rather than a migration.

That framing shaped the rollout into addition rather than cutover. Nobody is forced to move. The AppImage and Snap channels keep working, the old GitHub-release packages stay where they are, and the repository simply becomes the recommended path for users who want signed, auto-updating `deb` and `rpm` installs, plus the only path for flatpak. The migration that actually matters is the hand-install deb and rpm cohort discovering they no longer have to watch for updates. That migration is opt-in by changing where they install from, not a breaking change pushed at anyone.

Provisioning the host is deliberately a single step. A fresh Debian or Ubuntu machine becomes a live repository with one Ansible playbook run. The playbook lays down the deployed project at `/srv/repo` and renders the nginx config and landing page from the same `.env` that drives everything else. There is no multi-stage bring-up to sequence.

Publishing is fully automated from the first release onward. On every release, CI signs each artifact with the maintainer key, SFTPs the packages into the drop-zone, and drops the trigger marker. The server then verifies signatures, regenerates and re-signs the indexes, and rebuilds the landing page. CI polls for its own run id in the status file and reports the release as succeeded or failed only once the server-side re-index has actually finished. So the automated publish is end-to-end from tag to live repo, with no manual step in the happy path.

A package repository has exactly one test that proves it works: a clean-machine install. Before relying on any change that touches the publish path, the nginx config, or the Ansible role, I run a real `apt install`, `dnf install`, and `flatpak install` from all three repos on freshly-provisioned VMs. These are QEMU guests restored from clean-install snapshots, so each test starts from a genuinely pristine state rather than a machine my earlier tests have already contaminated. It is a manual gate today. Automating it is feasible but out of scope for this iteration.

## 6. Results

- **Footprint**
  - Idle: about 1% vCPU and 0.5 GB RAM
  - Building: about 75% vCPU and 1 GB RAM, in a spike lasting under 50 seconds
  - Peak network: 464.3 KB/s in, 40.5 KB/s out
- **Operability:** about 15 minutes from bare host to live repo, including a full CI build-and-publish run.
- **Cost:** about $3.70 a month with my cloud provider (1 vCPU, 1 GB RAM, 50 GB NVMe plus backups, 100 TB bandwidth).
- **Reach:** I will update this after a year in production. I expect this service to serve 10k to 15k MAU of the app's total userbase.

## 7. What I'd do differently

**Bind the trust split tighter at the client.** For rpm and flatpak, clients import both the maintainer and repo public keys into a single trust set used for both package and index checks. That means a repo-key compromise on the server could be used to forge packages those clients would accept, so the package-versus-index split I rely on is not fully enforceable on the client side. The server-side verification gate, the repo key's restrictive permissions, host hardening, and an offline encrypted key backup are what bound the blast radius today. If I revisited this, I would dig into per-format mechanisms for keeping those trust sets genuinely separate at the client, and accept that some of it may simply be a constraint of the ecosystems rather than something I can close.

**Revisit freezing architectures at first publish.** The apt index layout fixes its architecture list at the first publish, which was a fine call for a single app shipping to a known set of targets. But adding an architecture later is more painful than it should be. If I were doing it again I would at least make that decision explicit and reversible from the start, rather than discovering the constraint when I need a new target.

**Match the distribution model to the scale, instead of defaulting to self-hosted.** This is the big one, and my answer genuinely depends on the numbers.

- _One app, one maintainer_, which is exactly this project. I would push the whole thing into CI and drop the server entirely. The index builders are already one-shot generators, and nothing about them strictly needs a persistent host. CI could build the indexes, sign them, and publish the static tree to object storage behind a CDN. That removes the host, the systemd path unit, the SFTP chroot, and the whole privileged-publish dance in one move. For a single-publisher push model, the server is arguably more machinery than the problem requires.
- _Past about 10 apps or more than one maintainer._ In this case the hand-rolled glue stops paying for itself. Multiple apps, more architectures, and teammates who should not be touching shell scripts are precisely the conditions where I would move to Pulp 3, which natively unifies deb, rpm, and flatpak/OSTree behind one API and replaces the signing-and-cron glue with a single managed model. It is a rare condition, because modern pipelines push most of the build, sign, and publish work into CI (so a single system user can do all of it), but it is still possible to have many apps with different lifecycles that need careful management.
- _In between._ The current stack is the right tool. It gives more control and a smaller footprint than Pulp, and it is more durable and operable than a pure-CI setup.

The thing I would keep in all three cases is the discipline that made this work. That means one source of truth for configuration, fail-closed signature verification, and the cache split, none of which is specific to self-hosting.
