How I Cut Our Docker Image From 1.2GB to 90MB
It started with a Slack message from our DevOps lead: "Our CI pipeline is taking 18 minutes to run. What the hell did you change?"
I hadn't changed anything obvious. A dependency update here, a config tweak there. But when I pulled up the image size history in our registry, the number was staring at me like an accusation: 1.2GB. Three months ago it had been around 400MB, which wasn't great either, but at least it was manageable. Somewhere along the way, through a dozen small PRs and no one really paying attention, our Node.js API image had quietly ballooned into something grotesque.
What followed was about two weeks of actual investigation, not just running docker image ls, adding an -alpine suffix to the base tag, and calling it done. It was real layer-by-layer archaeology. Here's what I found, what I fixed, and the order in which the wins actually landed.
First: Understanding What Was Actually Inside
The first tool I reached for was dive, a CLI utility that lets you walk through Docker layers interactively and see exactly what each instruction added or removed. If you haven't used it, install it now — it's one of those tools that makes you feel slightly embarrassed about every Dockerfile you've ever written.
Running dive our-api:latest immediately showed me the problem was layered (no pun intended). Layer 7 added node_modules — 680MB. Layer 9 added our compiled TypeScript output and source files together — including src/, tests/, coverage/, and a .git directory that had somehow crept in. Layer 12 ran npm install again for reasons that made sense to no one anymore.
That .git folder alone was 47MB. Our test coverage HTML reports were another 90MB. We were shipping our entire development artifact history into production containers.
Win #1: The .dockerignore That Should Have Existed Since Day One
I am not proud that we didn't have a proper .dockerignore. We had one, technically, but it was four lines long and hadn't been updated in a year. Here's roughly what I added:
.git
.gitignore
node_modules
npm-debug.log*
coverage
.nyc_output
dist
*.test.ts
*.spec.ts
__tests__
.env*
docker-compose*
*.md
.eslintrc*
.prettier*
The ESLint and Prettier configs surprised me: we were copying them in and never using them at runtime. Dead weight. One caveat: tsconfig.json has to stay in the build context, because the TypeScript compilation happens inside the image build, so it is deliberately not on this list.
This alone dropped the image to about 890MB. Still terrible, but progress.
Win #2: Multi-Stage Builds — The One That Actually Matters
Here was our original Dockerfile, simplified:
FROM node:18
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
EXPOSE 3000
CMD ["node", "dist/index.js"]
Do you see it? We're installing all dependencies, dev dependencies included (the TypeScript compiler, the test runner, all of it), and then running the app with everything still sitting there. The compiled output in dist/ only needs the production dependencies to run. Everything else is build-time tooling that has no business being in a production image.
Multi-stage builds let you use one image for the build step and then copy only the artifacts you actually need into a clean, minimal final image:
# Stage 1: build with the full toolchain (dev dependencies included)
FROM node:18-alpine AS builder
WORKDIR /app
# Copy the manifests first so the install layer stays cached until they change
COPY package*.json ./
RUN npm ci --include=dev
COPY src/ ./src/
COPY tsconfig.json ./
RUN npm run build

# Stage 2: runtime image with production dependencies only
FROM node:18-alpine AS runtime
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
# --omit=dev is the current spelling of the deprecated --only=production
RUN npm ci --omit=dev && npm cache clean --force
COPY --from=builder /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/index.js"]
A few things to notice. In the builder stage, I copy package*.json before copying the source. This is the classic Docker layer-caching trick: if your dependencies haven't changed, Docker reuses the cached npm ci layer and only re-runs it when package.json or package-lock.json actually changes. Before this, COPY . . sat ahead of npm install, so every single code change triggered a full reinstall.
In the runtime stage, I run npm ci --omit=dev separately (the older --only=production flag is deprecated in current npm versions), which means TypeScript, ts-node, jest, eslint, and everything else used only during development never enters the final image. Then npm cache clean --force wipes the npm cache that would otherwise just sit there taking up space.
After this change: 210MB, down from 890MB. That is a cut of roughly 76% from this one Dockerfile rewrite.
Win #3: Pinning Base Tags — The One People Skip
We were using node:18-alpine, which seems specific but isn't. The tag is mutable: it moves with every Node 18 patch release and every Alpine update, so node:18-alpine will silently pull a different underlying layer next month than it does today. This matters for two reasons: reproducibility and size creep.
I looked up the specific digest we wanted:
docker pull node:18.20.4-alpine3.20
docker inspect node:18.20.4-alpine3.20 --format='{{index .RepoDigests 0}}'
And pinned to the SHA256 digest in our Dockerfile:
FROM node:18.20.4-alpine3.20@sha256:a8b4f9e3c2d1...
This also surfaced something interesting: our CI was building on node:18-alpine but a slightly different version of it than our local machines, which explained a subtle difference in one UUID generation library's behavior we'd been chasing for weeks. The library used crypto.randomUUID() which behaved slightly differently across patch versions. Pinning fixed it instantly.
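If you want CI to catch base images that drift back to a mutable tag, a small lint step is enough. The following is a minimal TypeScript sketch, not something taken from our pipeline: the function name unpinnedBaseImages is hypothetical, and it assumes plain FROM lines with no build-arg substitution in the image reference.

```typescript
// Hypothetical lint helper: list FROM references that are not pinned by digest.
// Assumes simple FROM lines (no ARG substitution in the image reference).
function unpinnedBaseImages(dockerfile: string): string[] {
  const stageNames = new Set<string>();
  const unpinned: string[] = [];
  const fromLine = /^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?/i;

  for (const line of dockerfile.split(/\r?\n/)) {
    const match = fromLine.exec(line);
    if (!match) continue;

    const image = match[1] ?? "";
    const alias = match[2];

    // "FROM builder" refers to an earlier stage, and "scratch" has no digest.
    const isEarlierStage = stageNames.has(image.toLowerCase());
    if (!isEarlierStage && image !== "scratch" && !image.includes("@sha256:")) {
      unpinned.push(image);
    }
    if (alias) stageNames.add(alias.toLowerCase());
  }
  return unpinned;
}
```

Failing the build when the returned list is non-empty would give the same kind of loud failure as any other lint rule.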
For our build scripts that generated content hashes for cache-busting, we'd been using a combination of md5sum (the GNU coreutils version on the full Debian-based node image) and a custom script. On Alpine, md5sum is the BusyBox applet rather than the GNU tool, and the two are not guaranteed to accept the same options. Discovering this while debugging image sizes led us to replace it with Node's built-in crypto module directly, which was more portable anyway and removed a shell script dependency that had been a hidden source of hash inconsistencies between environments.
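For illustration, here is roughly what hashing with Node's crypto module can look like. This is a sketch under assumptions, not our actual build script: the helper names contentHash and fileHash are hypothetical, and MD5 is kept as the default only so the hex digest matches what md5sum would have printed for the same bytes.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hypothetical helper: hex digest of the given bytes.
// MD5 is fine for cache-busting, but it is not suitable for security purposes.
function contentHash(data: Buffer | string, algorithm = "md5"): string {
  return createHash(algorithm).update(data).digest("hex");
}

// Hypothetical helper: hex digest of a file's contents.
function fileHash(path: string, algorithm = "md5"): string {
  return contentHash(readFileSync(path), algorithm);
}
```

Because the digest comes from the Node runtime rather than a shell utility, the same input produces the same string on the Debian and Alpine images alike.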
Win #4: Production-Only node_modules Pruning
After the multi-stage build, I ran dive again and noticed our production node_modules was still 85MB. I went through the top ten largest packages:
- aws-sdk v2: we'd migrated to v3 modular packages six months ago. The v2 package was still in dependencies as a leftover.
- moment: we use date-fns. Someone had installed moment for a one-off utility that got deleted.
- Three different PDF generation libraries: we use exactly one of them.
Cleaning up actual unused production dependencies brought node_modules from 85MB to 31MB. This is the part no tool can automate for you — you have to actually look.
Final image: 90MB.
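To get a "top ten largest packages" list like the one above without clicking through dive, a short script can walk node_modules and rank packages by size. This is a minimal sketch with hypothetical function names. It sums apparent file sizes, so the numbers will differ somewhat from on-disk usage, and it only looks at top-level packages (nested node_modules directories are counted toward their parent).

```typescript
import { lstatSync, readdirSync } from "node:fs";
import { join } from "node:path";

interface PackageSize {
  name: string;
  bytes: number;
}

// Sum apparent file sizes under a path. Symlinks are not followed.
function directorySize(path: string): number {
  const stats = lstatSync(path);
  if (!stats.isDirectory()) return stats.size;
  let total = 0;
  for (const entry of readdirSync(path)) {
    total += directorySize(join(path, entry));
  }
  return total;
}

// Measure every top-level package, expanding @scope directories.
function measurePackages(nodeModules: string): PackageSize[] {
  const sizes: PackageSize[] = [];
  for (const entry of readdirSync(nodeModules)) {
    if (entry.startsWith(".")) continue; // skip .bin and other dot entries
    const entryPath = join(nodeModules, entry);
    if (entry.startsWith("@")) {
      for (const scoped of readdirSync(entryPath)) {
        sizes.push({
          name: `${entry}/${scoped}`,
          bytes: directorySize(join(entryPath, scoped)),
        });
      }
    } else {
      sizes.push({ name: entry, bytes: directorySize(entryPath) });
    }
  }
  return sizes;
}

// Largest first, truncated to `limit` entries. Does not mutate the input.
function largestPackages(sizes: PackageSize[], limit = 10): PackageSize[] {
  return [...sizes].sort((a, b) => b.bytes - a.bytes).slice(0, limit);
}
```

Calling largestPackages(measurePackages("node_modules")) after a production-only install gives the shortlist. Deciding which of those packages are actually unused is still a manual job.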
What Changed in CI
The pipeline went from 18 minutes to 6. Most of that gain came from layer caching actually working properly now: because we copy package*.json before the source code, the npm ci step is served from cache on the vast majority of builds. We also stopped pushing 1.2GB images to our registry, which reduced our ECR storage costs noticeably.
We added a size gate to the pipeline:
- name: Check image size
  run: |
    SIZE=$(docker image inspect our-api:${{ github.sha }} --format='{{.Size}}')
    MAX=$((150 * 1024 * 1024))
    if [ "$SIZE" -gt "$MAX" ]; then
      echo "Image size ${SIZE} exceeds 150MB limit"
      exit 1
    fi
Crude but effective. The next time someone accidentally copies in a new directory of test fixtures, the build fails loudly instead of silently shipping it to production.
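One detail worth spelling out is the arithmetic in that gate: 150 * 1024 * 1024 is 157,286,400 bytes, which is 150 MiB rather than 150 decimal megabytes, so tools that report sizes in decimal units will show a slightly larger number for the same image. The sketch below restates the same comparison in TypeScript with a more readable message. The names checkImageSize and GateResult are hypothetical, and this is an illustration of the logic, not a replacement for the shell step.

```typescript
const BYTES_PER_MIB = 1024 * 1024;

interface GateResult {
  ok: boolean;
  message: string;
}

// Hypothetical helper mirroring the shell gate: fail only when size > limit.
function checkImageSize(sizeBytes: number, limitMiB = 150): GateResult {
  const limitBytes = limitMiB * BYTES_PER_MIB;
  const sizeMiB = (sizeBytes / BYTES_PER_MIB).toFixed(1);
  const ok = sizeBytes <= limitBytes;
  const message = ok
    ? `Image is ${sizeMiB} MiB, within the ${limitMiB} MiB limit`
    : `Image is ${sizeMiB} MiB, which exceeds the ${limitMiB} MiB limit`;
  return { ok, message };
}
```

An image of exactly 157,286,400 bytes passes, matching the -gt comparison in the shell version.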
The Part Nobody Talks About
The lesson I keep thinking about isn't really about Docker. It's about how technical debt accumulates in infrastructure the same way it accumulates in code — gradually, through individually defensible decisions, until one day you're staring at a 1.2GB image wondering how you got there.
Nobody added the .git directory maliciously. Nobody made a conscious decision to install dev dependencies in production. It just happened, layer by layer, PR by PR, because nobody owned the Dockerfile the way developers own application code.
Now we review Dockerfile changes in PRs the same way we review application logic. We run dive in CI and post the output as a comment. Small images are a forcing function for actually understanding what your application needs to run — nothing more, nothing less.
The 1.2GB image was a symptom. The actual problem was inattention. The fix was just paying attention, layer by layer, until there was nothing left that wasn't necessary.