Switching to Multi-Stage Dockerfiles

21 July 2018 - .NET, Docker

I'm now using Docker for quite a few projects, but up until now my build workflow has tended to involve building the application outside of Docker. Then, whilst building the Docker image (using the Dockerfile, of course), the Dockerfile would just copy those published files into the image.

This has worked well, but it does mean that any build tools or SDKs required to build the app also have to be installed on the machine doing the build. Obviously on my own machine they are going to be installed anyway, but on a build agent that might not be the case. And you may have many build agents, all of which you would need to keep supplied with the various build tools (and up to date).

In fact, this is the exact scenario I recently had at a company I'm working with. I was adding a project (which was already using Docker) to their CI/CD pipeline. I could have installed the latest .NET SDK and Node.js on all their build agents, and then manually updated each of them every time a new version of .NET or Node came out. Or I could just install Docker. So I chose the latter. And since doing so, I've been moving all my Docker projects to build this way, as it's a much nicer (and more consistent) workflow!

The steps I had previously used for a .NET app looked something like this ...

dotnet restore
dotnet publish -c Release
docker build ...
docker push ...

As you can see, it's building the application before we do anything related to Docker.

And the Dockerfile used in that docker build step might look something like this ...

FROM microsoft/dotnet:2.1-runtime
WORKDIR /app
EXPOSE 80
COPY ./bin/Release/netcoreapp2.1/publish .
ENTRYPOINT dotnet MyProject.dll

When building this Docker image, it's literally just copying the pre-built application binaries. Docker has no concept of building our app here.

However, if we instead do the dotnet restore/publish inside the Dockerfile, then none of the required build tools (.NET SDK, Node, COBOL SDK, whatever) need to be installed on the host machine. They are included in the base image specified in your Dockerfile, which Docker pulls down from Docker Hub automatically (and caches).

Even if something you need isn't in the base image, you can still explicitly install it into the image via a RUN instruction, e.g. RUN apt-get update && apt-get install -y git.
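A minimal sketch of such a RUN step, assuming a Debian-based image such as microsoft/dotnet:2.1-sdk (so apt-get is the package manager). Base images usually ship without apt's package lists, so the step refreshes them first, installs non-interactively with -y (there's nobody to answer a prompt during docker build), and deletes the lists again. Doing all three in a single RUN keeps the downloaded lists out of the resulting layer. git is just the example package from the text.

```dockerfile
# Assumes a Debian-based base image, so apt-get is available.
FROM microsoft/dotnet:2.1-sdk

# Refresh the package lists, install git without prompting (-y), then remove
# the lists. One RUN means they never get baked into a layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends git \
 && rm -rf /var/lib/apt/lists/*
```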

Before we move onto talking about multi-stage Dockerfiles, let's break down the above example for those who are fairly new to Dockerfiles ...

The above "Single"-Stage Dockerfile explained

(feel free to skip this section if you're comfortable with Dockerfiles)

Whilst this post assumes you already know how this kind of Dockerfile works, let's still go through it one line at a time to make sure we're on the same page. Remember that this is the file used whilst you're building your Docker image, i.e. the docker build step mentioned above.

FROM microsoft/dotnet:2.1-runtime <-- This line defines the base image that our image will be built upon. Note that in this example we're referring to the runtime version of .NET Core, so it's small and suited to production use. It does not contain any .NET build tools.
WORKDIR /app <-- This line sets the working directory for the rest of the instructions in the Dockerfile (and for containers started from the image), creating the folder if it doesn't already exist. This is just the folder we're going to copy our application into.
EXPOSE 80 <-- This tells Docker which port the application inside the container listens on. It mostly serves as documentation: it doesn't publish the port on the host by itself, so you still need to map it when running the container, e.g. with docker run -p 8080:80.
COPY ./bin/Release/netcoreapp2.1/publish . <-- This copies all the pre-built application files from outside into the image. Note that the source path is relative to the build context, which is the directory you pass to the docker build command (usually ., i.e. the directory you run it from). In this example, that path is where dotnet publish -c Release built our app into. The . at the end refers to the current working directory inside the image, which is our /app folder. So this instruction is copying our application into our /app folder.
ENTRYPOINT dotnet MyProject.dll <-- This final instruction specifies the entry point when the image is instantiated into a running container. Remember a Docker image is a template that containers are created from. A container is all about one process: it's run when the container starts, and when that process exits for whatever reason, the container also exits. In this example, the dotnet command is that process, and my project DLL is an argument being passed to it.
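One variant worth knowing: the ENTRYPOINT above uses the shell form. Docker's documentation notes that the shell form starts the command as a subcommand of /bin/sh -c, so it won't be the container's PID 1 and may not receive the SIGTERM sent by docker stop. The JSON array (exec) form below starts dotnet directly, which makes "dotnet is the process, the DLL is its argument" literally true. This is a sketch of the same Dockerfile with only that line changed; MyProject.dll is the same placeholder name as above.

```dockerfile
FROM microsoft/dotnet:2.1-runtime
WORKDIR /app
EXPOSE 80
COPY ./bin/Release/netcoreapp2.1/publish .
# Exec (JSON array) form: dotnet runs directly as the container's main
# process, with no /bin/sh -c wrapper, and MyProject.dll as its argument.
ENTRYPOINT ["dotnet", "MyProject.dll"]
```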

Building your application with Docker (the old way)

So if it was a .NET application, you could use microsoft/dotnet:2.1-sdk as the base image, and that'll include your .NET build tools. No need to install .NET 2.1 on each build agent.

Before multi-stage Dockerfiles, this meant that you needed two Dockerfiles - one to build your application (using the base image with the build tools / SDK); and another to build the production image (using the lighter weight runtime base image). You do not want to use the larger SDK version in production at runtime, as the build tools are only required at build time.

Even worse, you still needed to manually copy the files from the first 'build' image to the second 'runtime' image. This meant that after building the first image, you had to create a container from it (docker create) and copy the built files out of that container (docker cp), just so you could copy them into the second image!
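For comparison, the 'build' half of that old two-Dockerfile setup might have looked something like this sketch (the file name Dockerfile.build is hypothetical). The runtime half would be the single-stage Dockerfile from earlier, and the build script still had to get the publish folder out of an image built from this file before it could build the runtime image.

```dockerfile
# Dockerfile.build (hypothetical name): builds the app with the full SDK image.
FROM microsoft/dotnet:2.1-sdk
WORKDIR /src
# Copy the whole build context (the source code) into the image.
COPY . .
RUN dotnet restore
# The output ends up in /src/bin/Release/netcoreapp2.1/publish inside this
# image, from where it still had to be copied out with docker create + docker cp.
RUN dotnet publish -c Release
```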

Enter multi-stage Dockerfiles! ...

Multi-stage Dockerfiles

I first heard about multi-stage Dockerfile support when it came out last year, but I didn't really do anything with it, as I wasn't using Docker for much back then.

Let's start with an example ...

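# Stage 1: build the app using the full .NET SDK image, naming this stage "build"
FROM microsoft/dotnet:2.1-sdk as build
WORKDIR /src
COPY . .
RUN dotnet restore
RUN dotnet publish -c Release

# Stage 2: the final image, based on the much smaller runtime-only image
FROM microsoft/dotnet:2.1-runtime
WORKDIR /app
# Copy just the published output out of the "build" stage
COPY --from=build /src/bin/Release/netcoreapp2.1/publish .
EXPOSE 80
ENTRYPOINT dotnet MyProject.dll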

The first thing you'll notice is that there are two FROM statements in the same Dockerfile! So there are two different base images: the first is the full .NET SDK image, and the second is just the cut-down runtime image.

The second thing to notice is that the first FROM statement has as build at the end, which gives that stage a name. Then in the second stage, we do COPY --from=build. So Docker is effectively doing two image builds here, but in just one Dockerfile. It copies a directory out of the first stage and into the second, and only that final stage becomes the image you tag and push, so none of the SDK ends up in production.

So this is doing for us what I described in the previous section! It's clean and succinct, and means that the build script only has to do the docker build command - it doesn't have to faff around creating containers of your build image to copy files into your release image!

Bonus tip! Avoiding the `npm install` each time for Node apps

If your Dockerfile looks like this ...

FROM node:10.6.0-alpine AS build
WORKDIR /src
COPY MyWebApp/ .
RUN npm install \
 && npm run build

FROM nginx:alpine
EXPOSE 80
COPY MyWebApp/nginx.conf /etc/nginx/nginx.conf
COPY --from=build /src/dist/ /var/www/

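Then every time any source file changes and you build your image, it'll do a full npm install. That's because COPY MyWebApp/ . invalidates Docker's build cache whenever any file in that folder changes, so every instruction after it, including the npm install, has to run again. This obviously can take quite a while!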

Instead, try this ...

FROM node:10.6.0-alpine AS build
WORKDIR /src
COPY MyWebApp/package*.json ./
RUN npm install
COPY MyWebApp/ .
RUN npm run build

FROM nginx:alpine
EXPOSE 80
COPY MyWebApp/nginx.conf /etc/nginx/nginx.conf
COPY --from=build /src/dist/ /var/www/

Notice how it's copying the package.json file (and package-lock.json, if there is one, thanks to the package*.json wildcard) first, and doing the npm install before we copy the rest of the source code? Whilst this adds an extra layer, it does mean that the npm install layer is cached, and Docker only needs to re-run the npm install when those package files change. Which is much better than re-running it whenever any file in your source code changes!
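The same trick should carry over to the .NET multi-stage example above: copy just the project file and restore first, so the restore layer is reused until the project's package references change. A sketch, assuming a single project file named MyProject.csproj (a hypothetical name, matching MyProject.dll) in the root of the build context. dotnet publish still runs an implicit restore, but the packages fetched in the cached restore layer are already in the image, so it shouldn't need to download them again.

```dockerfile
FROM microsoft/dotnet:2.1-sdk AS build
WORKDIR /src
# Copy only the project file first: this layer and the restore below are
# reused from cache until MyProject.csproj changes.
COPY MyProject.csproj ./
RUN dotnet restore
# Now copy the rest of the source; code edits only invalidate the layers below.
COPY . .
RUN dotnet publish -c Release

FROM microsoft/dotnet:2.1-runtime
WORKDIR /app
COPY --from=build /src/bin/Release/netcoreapp2.1/publish .
EXPOSE 80
ENTRYPOINT dotnet MyProject.dll
```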
