A Guide to Optimize Build-Time in Docker

Sugandh Pasricha and Manish Dave

20 Feb 2020

With Docker doing the rounds in the technical world, there is now a more practical approach towards deploying applications, and several companies are becoming part of this change by opting for containerization. HashedIn was graced with such an opportunity to handle containerization for a client: around 150 Java, Go and Python based applications were dockerized, following best practices for a stupendous outcome.

 

One of the KPIs of our success was reducing build time. This blog covers a few tips on how to reduce the build time of a Docker image:

 

 

If you’re new to Docker, you can refer to our blogs: Getting started with Docker, Create Docker Image, and Step-by-step Docker tutorial for beginners.

 

APPROACH TOWARDS SMALL-SIZED IMAGE

Optimize RUN instructions in Dockerfile

 

 

Every instruction that you write in a Dockerfile, such as “FROM”, “RUN”, “COPY” and “ADD”, creates a layer, which is the basic building block of a Docker image. These layers strongly influence both the build time and the size of the image that will be built from the Dockerfile.

Docker layers can be very useful when similar images with different versions are to be built, since cached layers make the image building process faster. But our requirement is to reduce the image build time from the very beginning.

 

Well, how can that be done?

 

Let’s have a look at the below example:

 

FROM ubuntu:14.04

RUN apt-get update -y

RUN apt-get install -y curl

RUN apt-get install -y postgresql

RUN apt-get install -y postgresql-client

RUN rm -rf /var/lib/apt/lists/*

CMD bash

 

Below are some points to consider when an image is built using this Dockerfile:

Every RUN command creates a new Docker layer

  1. The apt-get update command increases the image size, because the downloaded package lists are stored in a layer
  2. The rm -rf /var/lib/apt/lists/* command, when run in a separate RUN instruction, doesn’t reduce the size of the image: the files it deletes still exist in the earlier layers

 

Considering these points, five layers are being generated for five application dependencies. This way of including dependencies increases the build time and, in turn, the image size.

Grouping the constant dependencies together, i.e. the kind of dependencies that never or rarely change, reduces the number of image layers and creates one large cached layer, which reduces the build time of future image builds as well.

 

With the following technique, the number of layers can be reduced:

 

FROM ubuntu:14.04

RUN apt-get update -y && \

    apt-get install -y curl postgresql postgresql-client && \

    rm -rf /var/lib/apt/lists/*

CMD bash

 

The five layers created by the five RUN commands in the previous Dockerfile have now been merged into one single layer.

 

Thus, the takeaway from this example is that related dependency packages can be clubbed together into a single layer by installing them in a single command, which makes the build faster and the image smaller.
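One way to see the difference for yourself (image tags here are illustrative) is docker history, which prints one row per layer of a built image:

```shell
# Build the combined-RUN version of the Dockerfile
docker build -t sample:combined .

# Inspect its layers: one row per layer, with the size of each
docker history sample:combined
```

Comparing the output for the five-RUN and single-RUN Dockerfiles shows the install steps collapsing into one layer.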

 

 

PROPER USAGE OF DOCKER CACHE

 

As learned in the previous section, a new Docker layer is created for every ADD, RUN and COPY command.

When creating a new image, Docker first checks whether a layer with the same content and history already exists on your machine. If it does, Docker can reuse it without occupying any extra space. If not, Docker needs to create a new layer.

 

Let’s consider this example:

 

FROM ubuntu:16.04

COPY ./sample-app /home/gradle/sample-app

RUN apt-get update && apt-get upgrade -y && \

  apt-get install -y openjdk-8-jdk openjfx wget unzip && \

  wget https://services.gradle.org/distributions/gradle-2.4-bin.zip && \

  unzip gradle-2.4-bin.zip

RUN /gradle*/bin/gradle build

ENTRYPOINT [ "sh", "-c", "java -jar sample-app.jar" ]

 

The above Dockerfile performs the following tasks:

  1. Copies the application source code into the image
  2. Installs the JDK, JavaFX, wget and unzip, then downloads and unpacks Gradle
  3. Builds the application into a jar file using Gradle
  4. Runs the jar file as the container’s entrypoint

Now, when we build this Dockerfile, it is processed line by line, each layer is created, and the image is eventually built.

 

After building the image, suppose we realize there’s a mistake in the code, or a new feature has to be added to the application, with the same set of dependencies mentioned in the RUN command. The required code changes are made and the image is built again.

 

COPY ./sample-app /home/gradle/sample-app will create a new layer every time there’s a change in the source code. Because of this, the next command, where all the dependencies are installed, will start afresh, since the history of the cached layers has changed.

 

To prevent this, it’s always better to cache the data that is least likely to change, which is the dependencies. Thus, we should install all the dependencies first and add the source code on top of them. After reordering, the Dockerfile looks like this:

 

FROM ubuntu:16.04

RUN apt-get update && apt-get upgrade -y && \

  apt-get install -y openjdk-8-jdk openjfx wget unzip && \

  wget https://services.gradle.org/distributions/gradle-2.4-bin.zip && \

  unzip gradle-2.4-bin.zip && \

  apt-get autoclean && apt-get autoremove -y

COPY ./sample-app /home/gradle/sample-app

RUN /gradle*/bin/gradle build

ENTRYPOINT [ "sh", "-c", "java -jar sample-app.jar" ]

 

By using the above strategy, you can shorten your image’s build time and reduce the number of layers that need to be uploaded on each deployment.

 

 

CLEANING UP USING APT-GET PACKAGES

 

An application image can also be optimized at the dependency level. Debian-based images like Ubuntu provide several apt-get commands that help remove extra files that are no longer required by the application. These extra files or binaries can be thought of as inter-dependencies: the dependencies required by the application have their own set of dependencies, which are needed only during installation. Once the application’s packages are installed, these inter-dependencies are not required anymore.

These extra files or binaries end up taking a lot of space thus increasing image size and the build-time.

In order to remove these inter-dependent files, the following apt-get cleanup commands can be run at the end of the install layer:

  1. apt-get clean – clears the local cache of downloaded package files (/var/cache/apt/archives)
  2. apt-get autoclean – removes only cached package files that can no longer be downloaded
  3. apt-get autoremove – removes packages that were installed automatically as dependencies and are no longer needed
  4. rm -rf /var/lib/apt/lists/* – deletes the package lists fetched by apt-get update
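As a sketch (the curl package here is only illustrative), apt-get cleanup belongs in the same RUN instruction as the install, so the deleted files never get committed into a layer:

```dockerfile
FROM ubuntu:16.04
# Install and clean up in the SAME layer; --no-install-recommends
# skips optional dependencies that the application does not need
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    apt-get clean && apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*
```

Running the cleanup in a later, separate RUN instruction would have no effect on image size, since the files would still exist in the earlier layer.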

UTILIZING .dockerignore

 

The .dockerignore file is used to list out the files that are not required in the application container. This file should be placed in the root of the build context used for the docker image build. In it, one can specify ignore rules, and exceptions to those rules, for files and folders that won’t be included in the build context and thus won’t be packed into an archive and uploaded to the Docker daemon.

 

The following kinds of files are typically included in a .dockerignore file: version-control directories (such as .git), build artifacts, logs and temporary files, and local environment or secret files.

 

 

Sample .dockerignore file:

 

# ignore all markdown (md) files except README*.md, but still ignore README-secret.md

*.md

!README*.md

README-secret.md

 

Thus, a .dockerignore file improves image optimization by decreasing image size and build time, and prevents unintended exposure of secrets.
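As an illustration (the entries are hypothetical and depend on your project layout), a .dockerignore for a typical Java project might look like:

```
# version control and local tooling
.git
.gradle/
# build output is produced inside the image, not copied in
build/
# logs, temp files and local secrets
*.log
.env
```

Anything matched here is never sent to the Docker daemon, so changing these files also never invalidates the build cache.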

 

USAGE OF SMALL BASE IMAGE

 

 

Base images that are big in size, for example Ubuntu, carry a lot of extra libraries and dependencies that are not actually required by the application for which the base image is used. These extra, non-required dependencies only make the final application image bulkier.

You can consider the following alternatives for a small base image:

  1. alpine – a minimal Linux distribution of around 5 MB
  2. slim variants of Debian (e.g. debian:stable-slim)
  3. scratch – an empty base image, usable for statically compiled binaries
  4. language-specific slim or alpine tags (e.g. openjdk:8-jre-alpine)

At a certain point, some dependencies will be missing when the image size is reduced, and you’ll probably have to spend some time figuring out how to install them manually. However, this is only a one-time issue; once resolved, it leads to faster deployment of your applications.
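As a sketch (the jar path is illustrative), the ubuntu:16.04 base from the earlier example could be swapped for an alpine-based JRE image, using apk, Alpine’s package manager, for any extra tools:

```dockerfile
# Alpine-based JRE image is far smaller than a full Ubuntu base
FROM openjdk:8-jre-alpine
# --no-cache installs without storing the apk package index in the image
RUN apk add --no-cache curl
WORKDIR /opt
COPY sample-app.jar .
ENTRYPOINT [ "sh", "-c", "java -jar sample-app.jar" ]
```

Note that alpine uses musl libc rather than glibc, which is one of the places missing-dependency surprises can come from.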

 

USING MULTI-STAGE

 

 

The multi-stage build is a feature implemented in Docker 17.05 or higher versions. 

Multistage builds are useful for anyone having difficulties optimizing Dockerfiles when it comes to reducing the image-size or keeping them readable and easy to maintain.

Following are the features of multi-stage builds:

  1. A single Dockerfile can contain multiple FROM instructions, each starting a new stage
  2. Artifacts can be copied from one stage to another using COPY --from
  3. Only the final stage (or the stage selected with --target) ends up in the resulting image
  4. Build tools and intermediate files are left behind in earlier stages, keeping the final image small

Consider the following example. The first stage copies the source code and builds it using Gradle, installing all the dependencies of the application and packaging them into a jar file.

The second stage uses a base image that provides a run-time environment, copies the artifact (our jar file) from the first stage, and finally runs the application.

 

#Stage 1

FROM gradle:4.8.1 AS build 

COPY --chown=gradle:gradle . /home/gradle/sample-app

WORKDIR /home/gradle/sample-app

RUN gradle build -x test

#Stage 2 (Final Stage)

FROM openjdk:8-jre-alpine 

WORKDIR /opt

COPY --from=build /home/gradle/sample-app/server/build/libs/sample-app.jar .

ENTRYPOINT [ "sh", "-c", "java -jar sample-app.jar" ]

 

Now, if you read through the Dockerfile, you’ll notice that an alpine-based image is used only in the final stage, not in all stages.

 

But Why?

 

That’s because the image built from this Dockerfile contains only the final stage by default; a different stage can be selected with the --target flag of the docker build command.
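For instance (the image tags here are illustrative), the stage to build can be chosen on the command line:

```shell
# Build only the first stage, named "build" in the Dockerfile,
# e.g. to debug a failing compile step
docker build --target build -t sample-app:builder .

# Build the whole Dockerfile; only the final stage ends up in the image
docker build -t sample-app:latest .
```

Building an intermediate stage this way is handy for inspecting build artifacts without producing the production image.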

 

The end result is the same tiny production image as before, with a significant reduction in complexity. Multi-stage builds remove the need to create intermediate images manually, and you don’t need to extract any artifacts to your local system at all.

 

CONCLUSION

There are a lot more best practices that can be followed to get an optimized Docker image, such as pinning a specific version of the base image rather than using latest, or understanding when to use COPY versus ADD, etc.

Following the above practices will surely make Docker implementation easier, especially at the production level.

 
