A Guide to Optimize Build-Time in Docker

20 Feb 2020

With Docker gaining ground across the technical world, deploying applications has become more practical, and many companies are joining this change by moving towards containerization. HashedIn had the opportunity to handle containerization for a client: around 150 Java, Go and Python based applications were dockerized, following best practices throughout.
One of the KPIs for our success was reducing build time. This blog covers a few tips for reducing the build time of a Docker image, with a focus on two goals:

Small image size
Fast build time

If you’re new to Docker, you can refer to our blogs: Getting started with Docker, Create Docker Image, and Step-by-step Docker tutorial for beginners.

APPROACH TOWARDS SMALL-SIZED IMAGE

Optimize RUN instructions in Dockerfile
FROM sets the base image, and every “RUN”, “COPY” and “ADD” instruction that follows it in a Dockerfile creates a layer, which is the basic building block of a Docker image. These layers have a large influence on the build time and the size of the image built from the Dockerfile.
Docker layers are very useful when several versions of a similar image are built, because unchanged layers are reused and the build becomes faster. Our requirement, however, is to keep the image build time low from the very first build.

Well, how can that be done?

Let’s have a look at the below example:

    FROM ubuntu:14.04
    RUN apt-get update -y
    RUN apt-get install -y curl
    RUN apt-get install -y postgresql
    RUN apt-get install -y postgresql-client
    RUN rm -rf /var/lib/apt/lists/*
    CMD bash

Below are some points to consider when an image is built from this Dockerfile:

Every RUN command creates a new Docker layer
The apt-get update command increases the image size
The rm -rf /var/lib/apt/lists/* command doesn’t reduce the size of the image, because the files it deletes are still stored in the earlier layers

As written, the five RUN commands generate five layers. Including dependencies this way increases the build time and also the image size. Grouping the stable dependencies together, i.e. the dependencies that rarely or never change, reduces the number of image layers and produces one larger cached layer, which reduces build time for future image builds as well. With the following technique, the number of layers can be reduced:

    FROM ubuntu:14.04
    RUN apt-get update -y && \
        apt-get install -y curl postgresql postgresql-client && \
        rm -rf /var/lib/apt/lists/*
    CMD bash

The five layers created by the five RUN commands in the previous Dockerfile have now been collapsed into a single layer. The takeaway from this example is that related dependency packages can be combined into a single layer by installing them in a single command, which makes the build faster and the image smaller.

PROPER USAGE OF DOCKER CACHE

As learned in the previous section, new Docker layers are created for every ADD, RUN and COPY command. When creating a new image, Docker first checks whether a layer with the same content and history already exists in its local cache. If it does, Docker reuses it without occupying any extra space. If it does not, Docker needs to create a new layer. Let’s consider this example:

    FROM ubuntu:16.04
    COPY ./sample-app /home/gradle/sample-app
    RUN apt-get update && apt-get upgrade -y && \
        apt-get install -y openjdk-8-jdk openjfx wget unzip && \
        wget https://services.gradle.org/distributions/gradle-2.4-bin.zip && \
        unzip gradle-2.4-bin.zip
    RUN /gradle*/bin/gradle build
    ENTRYPOINT ["sh", "-c", "java -jar sample-app.jar"]

The above Dockerfile performs the following tasks:

Pull an ubuntu image as the base image of the Docker container
Copy the application’s code into the container
Install and update all the dependencies
Build the jar file
Finally, run the jar file of the application

When we build this Dockerfile, it is processed line by line, each layer is created in turn, and after some time the image is ready. Suppose that after building the image we find a mistake in the code, or need to add a new feature that uses the same set of dependencies listed in the RUN command. We make the required code changes and build the image again.

COPY ./sample-app /home/gradle/sample-app creates a new layer every time the source code changes. Because of this, the next command, which installs all the dependencies, starts afresh: the history of the cached layers beneath it has changed, so its cache entry is no longer valid. To prevent this, it is always better to cache first the data that is least likely to change, which here means the dependencies. Thus, we should install all the dependencies first, and add the source code on top of them.
With the dependencies installed first and the source code added afterwards, the Dockerfile looks something like this:

    FROM ubuntu:16.04
    RUN apt-get update && apt-get upgrade -y && \
        apt-get install -y openjdk-8-jdk openjfx wget unzip && \
        wget https://services.gradle.org/distributions/gradle-2.4-bin.zip && \
        unzip gradle-2.4-bin.zip && \
        apt-get autoclean && apt-get autoremove -y
    COPY ./sample-app /home/gradle/sample-app
    RUN /gradle*/bin/gradle build
    ENTRYPOINT ["sh", "-c", "java -jar sample-app.jar"]

By using the above strategy, you can shorten your image’s build time and reduce the number of layers that need to be uploaded on each deployment.

CLEANING UP USING APT-GET COMMANDS

An application image can be optimized at the dependency level. Debian based images like Ubuntu provide several apt-get commands that help remove extra files that the application no longer requires. These extra files or binaries can be thought of as inter-dependencies: the packages required by the application have their own set of dependencies, which are needed during installation. Once the application’s packages are installed, some of these inter-dependencies are not required anymore. They end up taking a lot of space, thus increasing the image size and the build time. To remove these files, run the following cleanup commands:

apt-get clean
It clears the local repository of retrieved package files that are left in /var/cache. The directories it cleans out are /var/cache/apt/archives/ and /var/cache/apt/archives/partial/.

apt-get autoclean
It clears the local repository of retrieved package files, but it only removes files that can no longer be downloaded and are virtually useless. It helps to keep your cache from growing too large.

apt-get autoremove
It removes packages that were automatically installed because some other package required them but that, with those other packages removed, are no longer needed. The packages to be removed are often called “unused dependencies”.
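As noted earlier in this article, deleting files in a separate RUN instruction does not shrink the image, so the cleanup commands only help when they run in the same RUN instruction as the installation. The following is a minimal sketch of that pattern; the base image tag and the curl package are illustrative choices, not part of the client setup described here.

```dockerfile
# Sketch only: ubuntu:16.04 and curl are illustrative choices.
FROM ubuntu:16.04

# Install and clean up inside one RUN instruction, so the files that are
# removed at the end are never committed to a layer.
RUN apt-get update && \
    apt-get install -y curl && \
    apt-get autoremove -y && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

CMD ["bash"]
```

The -y flag on autoremove keeps the command from waiting for confirmation, which matters because a Docker build has no interactive terminal.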

UTILIZING .dockerignore

The .dockerignore file is used to list the files that are not required in the application container. It should be placed in the root directory of the build context for the Docker image build. Using this file, one can specify ignore rules, and exceptions to these rules, for files and folders that won’t be included in the build context and thus won’t be packed into an archive and uploaded to the Docker server. The following kinds of files should be listed in the .dockerignore file:

Build logs
Test scripts
Temporary files
Caching/intermediate artifacts
Local secrets
Local development files such as docker-compose.yml

Sample .dockerignore file:

    # ignore all markdown files (md) beside all README*.md other than README-secret.md
    *.md
    !README*.md
    README-secret.md

Thus, a .dockerignore file improves image optimization by decreasing the image size and build time, and by preventing unintended exposure of secrets.

USAGE OF SMALL BASE IMAGE

Base images that are big in size, for example Ubuntu, contain a lot of extra libraries and dependencies that are not actually required by the application the base image is used for. These non-required dependencies only make the final application image bulkier. You can consider the following alternatives for a small base image:

Alpine
Alpine Linux is a Linux distribution built around musl libc and BusyBox. The image is only 5 MB in size and has access to a package repository that is much more complete than other BusyBox based images. It uses its own package manager called apk, the OpenRC init system, and script driven set-ups.

Scratch
The scratch image is the most minimal image in Docker. It is the starting point from which other base images are built. The scratch image is actually empty, since there are no directories, libraries or dependencies present in it.

When you move to a smaller base image, at some point some dependencies will be missing, and you’ll probably have to spend some time figuring out how to install them manually. However, this is only a one-time issue which, once resolved, can lead to faster deployment of your applications.
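To make the Alpine option concrete, here is a minimal sketch of installing a package with apk on an Alpine base image. The alpine:3.19 tag and the curl package are assumptions chosen for illustration; substitute whatever your application actually needs.

```dockerfile
# Sketch only: the alpine:3.19 tag and the curl package are assumptions.
FROM alpine:3.19

# apk is Alpine's package manager. --no-cache uses a fresh package index
# for this command and does not keep a local index cache in the image.
RUN apk add --no-cache curl

# Alpine ships BusyBox sh rather than bash by default.
CMD ["sh"]
```

Because Alpine uses musl libc rather than glibc, binaries built against glibc may need extra work before they run on it, which is one form of the one-time dependency effort mentioned above.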

USING MULTI-STAGE

The multi-stage build is a feature available in Docker 17.05 and higher versions. Multi-stage builds are useful for anyone who has difficulty optimizing Dockerfiles, whether to reduce the image size or to keep them readable and easy to maintain. Following are the features of multi-stage builds:

With multi-stage builds, you can use multiple FROM statements in your Dockerfile.
Each FROM instruction can use a different base image, and each of them begins a new stage of the build.
You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.

Consider the following example. The first stage copies the source code and builds it with gradle, which installs all the dependencies of the application and packages them into a jar file. The second stage uses a base image that provides only a run-time environment, copies the artifact (that’s our jar file) from the first stage, and finally runs the application.

    # Stage 1
    FROM gradle:4.8.1 AS build
    COPY --chown=gradle:gradle . /home/gradle/sample-app
    WORKDIR /home/gradle/sample-app
    RUN gradle build -x test

    # Stage 2 (Final Stage)
    FROM openjdk:8-jre-alpine
    WORKDIR /opt
    COPY --from=build /home/gradle/sample-app/server/build/libs/sample-app.jar .
    ENTRYPOINT ["sh", "-c", "java -jar sample-app.jar"]

Now, if you read through the Dockerfile, you’ll notice that the alpine version is used only in the final stage and not in all stages. But why? That’s because the image built from this Dockerfile is, by default, the one produced by the final stage, unless another stage is selected with the --target flag of the docker build command. Only that stage determines the size of the resulting image.

The end result is a tiny production image, with a significant reduction in complexity. Multi-stage builds remove the need to create any intermediate images, and you don’t need to extract any artifacts to your local system at all.
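Multi-stage builds also pair well with the scratch base image described earlier. The sketch below assumes a hypothetical Go application (a Go module with its main package at the root of the build context); the golang:1.21 tag, the /out path and the sample-app binary name are illustrative, not taken from the client project.

```dockerfile
# Stage 1: compile the binary (hypothetical Go module in the build context)
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
# CGO_ENABLED=0 builds a statically linked binary with no libc dependency,
# which is what allows it to run on an empty base image.
RUN CGO_ENABLED=0 go build -o /out/sample-app .

# Stage 2 (Final Stage): empty base image, only the binary is copied in
FROM scratch
COPY --from=build /out/sample-app /sample-app
ENTRYPOINT ["/sample-app"]
```

Running docker build --target build on this file would stop after the first stage, which can be handy for debugging the compile step; without --target, the image is produced from the scratch stage.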

CONCLUSION

There are many more best practices to follow in order to get an optimized Docker image, such as pinning a specific version of a base image rather than using latest, or understanding when to use COPY and when to use ADD. Following the above practices will make Docker implementation easier, especially at the production level.
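As a brief illustration of those two closing points, the sketch below pins a base image tag and contrasts COPY with ADD. The file names sample-app.jar and config.tar.gz are hypothetical and assumed to be present in the build context.

```dockerfile
# Pin a specific tag instead of relying on "latest", so that repeated
# builds start from the same base image.
FROM openjdk:8-jre-alpine

WORKDIR /opt

# COPY performs a plain copy of a file from the build context.
COPY sample-app.jar .

# ADD additionally unpacks a local tar archive into the destination
# directory (config.tar.gz is a hypothetical archive).
ADD config.tar.gz /opt/config/

ENTRYPOINT ["java", "-jar", "sample-app.jar"]
```

A common rule of thumb is to prefer COPY for ordinary files and reserve ADD for cases where automatic archive extraction is actually wanted.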