Packing Python scripts with pyInstaller

June 12, 2017
python docker

Utility to display HTTP response headers

Let’s build something akin to curl -I <url>, but way more complicated and with completely unnecessary steps involved. Weather report for today is sunny with a chance to learn something new.

The app

To get started, create a new directory to store your project files. We will be using the requests library as well as pyInstaller, and the easiest way to install both is with pip via a requirements.txt file.

So create a file called requirements.txt with the following content:

requests==2.12.4
pyinstaller==3.2.1

We also need some code to perform the request and show us the response headers. Create a file called run.py.

import requests
import sys

# we specifically include this package, otherwise pyInstaller
# will not automatically identify it and will omit it
from multiprocessing import Queue

# use google.com as a default url, otherwise select what the user
# supplied as a command-line argument
url = sys.argv[1] if len(sys.argv) > 1 else 'https://www.google.com'

# perform a HEAD request
head = requests.request('HEAD', url)

# and show headers
print(head.headers)
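If one header per line is easier to read than the raw dictionary, the output step could be factored into a small helper. This is only a minimal sketch and not part of the original run.py: the name format_headers and the sample dictionary are made up for illustration, and the helper accepts any mapping, so it can be tried without a network connection.

```python
def format_headers(headers):
    """Return the headers as 'Name: value' lines, sorted by header name."""
    lines = []
    # sort case-insensitively so 'expires' and 'Expires' end up together
    for name in sorted(headers, key=str.lower):
        lines.append('{}: {}'.format(name, headers[name]))
    return '\n'.join(lines)


if __name__ == '__main__':
    # hypothetical sample data standing in for head.headers
    sample = {'Server': 'gws', 'Expires': '-1'}
    print(format_headers(sample))
```

In run.py, the last line would then presumably become print(format_headers(head.headers)).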

Multi-stage Dockerfile

This is a really cool feature of Docker (since version 17.05) that allows us to skip intermediate images and produce a final, trimmed-down, production-ready image.

The idea here is to separate the building and running of our code. While we are building, we can be a bit sloppy and include more packages than we really need. But when we are running, we should be as optimized as possible.

Before Docker 17.05, we would have had to use the builder pattern, which requires at least two Dockerfiles.

Create a Dockerfile with the following content:

# ---[ Packer stage ]---
FROM python:3.5 as packer
COPY requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
WORKDIR /app
COPY . /app

# pyInstaller had no support for Python 3.6 at the time of this writing, hence python:3.5
RUN pyinstaller run.py

# ---[ Runtime stage ]---
FROM busybox:1.26.2-glibc
WORKDIR /app
COPY --from=packer /app/dist/run/ /app

# use `ldd` to find which libraries are called
COPY --from=packer /lib/x86_64-linux-gnu/libdl.so.2 /lib/x86_64-linux-gnu/libdl.so.2
COPY --from=packer /lib/x86_64-linux-gnu/libz.so.1 /lib/x86_64-linux-gnu/libz.so.1
COPY --from=packer /lib/x86_64-linux-gnu/libc.so.6 /lib/x86_64-linux-gnu/libc.so.6

# this one was called from a python .so, which `ldd` did not pick up
COPY --from=packer /lib/x86_64-linux-gnu/libutil.so.1 /lib/x86_64-linux-gnu/libutil.so.1
ENTRYPOINT ["/app/run"]

The real magic here is the --from=packer option. It tells Docker to use the filesystem from the packer stage and copy the /app/dist/run/ folder from that filesystem to the current working directory (/app) in the new filesystem. If you paid close attention, you will have noticed that we gave the first stage a name with FROM ... as packer. If we omitted the name, we could still reference that stage by its index with --from=0.
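The "use `ldd`" comment in the runtime stage hides a bit of manual work: running ldd against the packed binary and turning each resolved library into a COPY line. A rough sketch of how that step might be scripted is shown here. The helper names and the sample ldd output are made up for illustration (the output was not captured from the actual build), and, as noted in the Dockerfile, libraries loaded from a Python .so will not show up this way.

```python
import re

# matches lines such as "libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x...)"
LDD_LINE = re.compile(r'^\s*(\S+)\s+=>\s+(/\S+)\s+\(0x[0-9a-fA-F]+\)\s*$')


def resolved_libraries(ldd_output):
    """Return the sorted, de-duplicated paths that ldd resolved to a file."""
    paths = set()
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        # lines without "=> /path" (vdso, "not found", the loader) are skipped
        if match:
            paths.add(match.group(2))
    return sorted(paths)


def copy_lines(paths, stage='packer'):
    """Build one 'COPY --from=<stage> <path> <path>' line per library."""
    return ['COPY --from={0} {1} {1}'.format(stage, p) for p in paths]


if __name__ == '__main__':
    # hypothetical ldd output, for illustration only
    sample = '\n'.join([
        '\tlinux-vdso.so.1 (0x00007ffd2a5f0000)',
        '\tlibdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1c2a000000)',
        '\tlibz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f1c29e00000)',
        '\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c29a00000)',
        '\t/lib64/ld-linux-x86-64.so.2 (0x00007f1c2a400000)',
    ])
    for line in copy_lines(resolved_libraries(sample)):
        print(line)
```

In practice, the text passed to resolved_libraries would come from running ldd on /app/dist/run/run inside the packer stage.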

Your project directory should now look like this:

$ ls -l
-rw-r--r-- 1 user group 796 Jun  5 18:23 Dockerfile
-rw-r--r-- 1 user group  35 Jun  5 11:50 requirements.txt
-rw-r--r-- 1 user group 385 Jun  5 11:09 run.py

Build and run

Building the final image

With everything else in place, it’s time to actually build some images.

docker build -t jango/headers .

You can of course change jango/headers to a tag of your preference.

The neat thing about having separate build and runtime images is the file size of the final runtime image.

$ docker images
REPOSITORY       TAG       IMAGE ID            CREATED             SIZE
jango/headers    latest    f1d3bbfa9b78        2 minutes ago       31.1MB
<none>           <none>    d6e0e6f5ca78        2 minutes ago       739MB

As you can see, the build image is almost 24x larger!
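For the curious, the ratio can be checked with a couple of lines of Python. The two sizes are taken from the docker images listing above; the variable names are just for illustration.

```python
build_mb = 739.0    # untagged build-stage image, from `docker images`
runtime_mb = 31.1   # jango/headers runtime image, from `docker images`

ratio = build_mb / runtime_mb
print('build image is {:.1f}x the size of the runtime image'.format(ratio))
# prints: build image is 23.8x the size of the runtime image
```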

Running the image

docker run jango/headers

Running the image without passing any arguments will fetch google.com and output its response headers (wrapped here for readability).

{
    'Cache-Control': 'private, max-age=0', 
    'X-Frame-Options': 'SAMEORIGIN', 
    'Server': 'gws', 
    'Expires': '-1', 
    'Set-Cookie': '...'
}

To fetch some other URL, simply pass it as an argument.

docker run --rm jango/headers https://yahoo.com

We also specified --rm this time, to automatically remove the container once it has finished.

Makefile

Let’s make use of the make command to make our lives easier.

Create a file called Makefile.

DOCKER_TAG=jango/headers

# build and run are commands, not files, so mark them as phony
.PHONY: build run

build:
	docker build -t ${DOCKER_TAG} .

run:
	@docker run --rm ${DOCKER_TAG} ${URL}

Building with make

make build

Running with make

# to fetch google.com
make run

# to fetch yahoo.com
make run URL=https://yahoo.com

Update #1 (2017-06-15): clarified introduction text