Building a Docker image for a Python Django application

Tags: docker, hosting, python, raspberry-pi • Categories: Learning

After building a crypto index fund bot I wanted to host the application so the purchase routines would run automatically. In addition to this bot, there were a couple of other smaller applications I’ve been wanting to see if I could self-host (Monica, Storj, Duplicati).

In addition to what I’ve already been doing with my Raspberry Pi, I wanted to see if I could host a couple small utilities/applications on it, and wanted to explore docker more. A perfect learning project!

Open Source Docker Files

As with any learning project, I find it incredibly helpful to clone a bunch of repos with working code into a ~/Projects/docker folder so I can easily ripgrep my way through them.

And here’s my resulting Dockerfile for hosting the crypto index fund bot I’ve been playing with.

Learning Docker

I first ran into Docker at a Spree conference way before it was widely adopted. I remember thinking the technology sounded neat, but it was hard to imagine why you’d want to build a docker container.

It takes time for new technologies to make sense. Now docker containers are everywhere, and you can’t imagine living without them. Although I’ve used docker indirectly through Heroku, Dokku, or blindly running docker compose up on an open source project, I’ve never dug in and actually created my own docker image.

Here’s what I learned while writing my first image:

  • Docker has great install instructions.
    • The repository-based install instructions did not work for me. I went the sh install script route. This guide was helpful
    • Run sudo docker run hello-world to verify docker is working
  • Each command in a Dockerfile generates a new ‘layer’ (intermediate container image). These layers are incrementally built upon to generate your final docker image.
  • By default, ENTRYPOINT is not set; CMD is inherited from the base image (debian’s, for example, is bash).
  • ENTRYPOINT can only be overridden by passing --entrypoint to docker run, while CMD is overridden by whatever command you append to docker run.
  • The base images are generally pretty bare. You’ll need to install the packages that you need using something like RUN apt-get update && apt-get install -y --no-install-recommends bash
  • You’ll see set -eux at the beginning of most RUN or other shell commands executed by docker. This ensures that when one shell command fails, the failure bubbles up and the docker build fails as well. Look at the manpage for set to learn exactly what each of these flags does.
  • docker exec runs a command within an existing container, docker run creates a new container and executes the command.
  • .dockerignore is like .gitignore but for the COPY command, which is generally used to grab your source code and stuff it in the container. This is important because each command that is run in a Dockerfile attempts to create a cache of the image at that state. If you include files in COPY that are not core to your application, and they are modified often, it will cause longer docker build times, which will slow down your development loop.
  • If a docker command fails, you’ll get an image SHA that you can use to jump into the container and debug its state: docker run -it b01352c2271a bash (or inspect the image metadata with docker image inspect b01352c2271a)
  • dive is a really neat tool to inspect each layer of an image. Helpful for debugging container issues.
  • It’s not possible to map a layer SHA to a Dockerfile. When the layers are pulled onto your local machine, they aren’t tagged. Your best bet is using the FROM commands in your Dockerfile and attempting to find the source Dockerfile the tagged images were created from. However, you can publish a docker image to Docker Hub without linking it to an open source Dockerfile (this seems to be rare in practice).
  • What are the differences between all of these base image types? The most popular ones I’ve seen are Debian (buster, stretch, etc) and Alpine. This is a good explanation. Bottom line is most likely you want debian’s latest release (right now, it’s ‘buster’).
  • You may see ‘busybox’ referenced in Dockerfiles. For a while, alpine linux was popular: a slimmed-down linux base layer designed to be small (I don’t fully understand why folks are so concerned with image size). The downside is it doesn’t include important utils, like cron. This is where busybox comes in: it’s a space-efficient replacement for the GNU toolset. Most likely, you should just use the full debian image and forget about busybox.
  • However, there are cases where the busybox implementation is better and designed to play well with containerized environments. For instance, if you are running cron on debian (alpine makes this easier), it’s challenging to get stdout redirected to the parent process without busybox.
  • Build your image with docker build -t your-image-name . and then run it with docker run --env-file .env -it your-image-name (a minimal Dockerfile pulling several of these pieces together is sketched after this list)
  • You’ll see rm -rf /some/cache/folder in Dockerfiles. This is to eliminate package management cache, which increases the file size of the image.
  • apt-get clean can be used instead of rm -rf /cache/folder. I’m not sure why the rm -rf approach is more commonly used in Dockerfiles.
  • By default, COPY requires the source file to exist. However, you can use a glob to safely (optionally) copy a file if it exists: COPY *external_portfolio.json ./
  • You can have multiple FROM statements in your file. This is helpful if you need to install two runtimes (rust and python, for example).
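
To make these concrete, here’s a minimal Dockerfile sketch for a Python app that ties several of these points together. The base image tag, file layout, and entrypoint are illustrative placeholders rather than the exact files from my bot:

FROM python:3.9-slim-buster

# set -eux prints each command and fails the build if any command fails
RUN set -eux; \
    apt-get update; \
    apt-get install -y --no-install-recommends git; \
    apt-get clean; \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app

# copy the dependency manifest first so the slow install layer stays cached
# until requirements.txt actually changes
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# .dockerignore controls what this COPY pulls in; keep frequently-changing
# files out of the image so this layer's cache isn't constantly invalidated
COPY . .

# CMD rather than ENTRYPOINT so the command is easy to override at docker run time
CMD ["python", "main.py"]

Ordering the requirements.txt copy before the full source copy is the main trick for keeping docker build fast during development.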

Running Cron in a Docker Debian Container

At some point, you’ll need to run a specific command on some sort of schedule without installing a full-blown job scheduler like Resque or Celery.

The ‘easiest’ way to do that is via a simple cron entry. However, cron is not plug-n-play on docker images, as I painfully discovered.

  • Cron is not installed by default in debian base layers. This is done to save space.
  • Installing busybox does not install the cron component when using debian. This is probably because it’s available via the standard cron package.
  • Here’s how to install cron on a debian-based image: apt-get update && apt-get install -y --no-install-recommends cron && apt-get clean
  • You may be wondering: why use debian? This all seems so difficult, right? In my specific scenario, I’m using the python docker image, which defaults to debian. From what I understand, alpine can cause dependency issues with python packages that contain C extensions, since they have to be built against musl instead of glibc.
  • You don’t need to install rsyslog in order to get stdout routed to the parent process (and therefore displayed in the docker logs).
  • To get stdout routed to the parent process, add > /proc/1/fd/1 2>&1 at the end of your cron job definition.
  • By default, cron uses sh, not bash, and does not pick up any of the environment variables passed into the docker container.

Here’s my cron.sh to set up the cron schedule and execute it:

#!/bin/bash -l

set -eu

# write the container's environment variables into /etc/profile so the cron job's login shell can see them
printenv | awk -F= '{ print "export " $1 "=\"" $2 "\"" }' >> /etc/profile

echo "$SCHEDULE root sh -lc '/full/path/to/executable' > /proc/1/fd/1 2>&1" >> /etc/crontab

# run cron in the foreground so the container stays alive and failures surface
cron -L 8 -f
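
And here’s a rough sketch of how cron.sh might be wired into a debian-based image. The base image, paths, and COPY layout are illustrative assumptions rather than my exact setup:

FROM python:3.9-slim-buster

# cron is not included in the debian base layers, so install it explicitly
RUN apt-get update \
    && apt-get install -y --no-install-recommends cron \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .
RUN chmod +x cron.sh

# SCHEDULE is expected to be passed in at runtime (--env-file or docker-compose)
CMD ["/app/cron.sh"]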

It’s insane to me that this isn’t simpler. It’s another argument for keeping docker containers as simple as possible and moving as much execution logic as you can into your application.

Building a Dockerfile

In many cases, a repo will have multiple different dockerfiles. For instance, the Monica repo has a couple different dockerfiles for various purposes. You can specify which file to build using -f:

docker build -t monicahq/monicahq -f scripts/docker/Dockerfile .

The -f argument is important, as opposed to cd’ing into the directory with the Dockerfile, since we want many of the commands (notably COPY) to run from a specific directory on the host.

As build is running, it outputs a hash (e.g. c1861cb1ff7f) at each step. When the build fails, you can use that hash to debug the container by shelling in and poking around:

docker run -it c1861cb1ff7f bash

Note that run expects a command and its arguments as separate tokens; you can’t pass a quoted shell one-liner directly. If you need pipes, redirects, or multiple commands, wrap them in sh -c.
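
For example, something like this works (the paths are hypothetical):

docker run -it c1861cb1ff7f sh -c 'ls -la /app && env | sort'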

In my specific situation, my build was failing due to javascript compilation errors on the Pi. After digging into it, I realized it was going to be a major pain to build the web assets on the Raspberry Pi. I just built them locally and scp’d them over:

cd public && scp -r css/ js/ fonts/ mix-manifest.json monica@raspberrypi.local:~/monica-source/public/

After the build completes locally, you can reference the tagged image in your docker-compose.yml:

image: monicahq/monicahq

This is helpful if you are using a docker-compose.yml with a pre-existing reference to a named/tagged (with -t) image, but you need to patch the Dockerfile behind it to work properly. If you can edit the docker-compose.yml, a better approach is to just reference the sub-Dockerfile directly:

services:
  worker:
    build:
      context: .
      dockerfile: scripts/docker/Dockerfile
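
If it’s useful, image: and build: can also coexist in the same service; compose will build from the referenced Dockerfile and tag the result with the image name. A sketch (the env_file and paths are illustrative):

services:
  worker:
    image: monicahq/monicahq
    build:
      context: .
      dockerfile: scripts/docker/Dockerfile
    env_file: .env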

After you’ve rebuilt your docker image (or simply edited the component Dockerfile if you are using build), here’s how to apply the changes:

docker-compose up -d --remove-orphans

I’ll detail some learnings about docker-compose in a separate blog post in the future.

Hosting on a Raspberry Pi

The Raspberry Pi’s architecture (32-bit ARM by default) is supported by docker. However, some software isn’t packaged to run on ARM. Additionally, running images on the Pi generally isn’t as well tested as running them on a traditional x86 host like an EC2 instance.

I ran into lots of weird and interesting bugs hosting images on the Pi. I wouldn’t recommend it if you just want to get something working quickly.

Modifying a Dockerfile to work with Raspberry Pi

If you do choose to host an application on the Pi, you’ll inevitably run into weird execution issues. Here’s one that I ran into and how I debugged it.

There’s a great dockerfile for backing up a MySQL database to S3, but it was failing for me on the Pi with the following error:

exec user process caused: exec format error

It looks like this error was caused by a malformed shebang at the top of the sh files.

git clone https://github.com/schickling/dockerfiles.git schickling-dockerfiles
cd schickling-dockerfiles/mysql-backup-s3/

Both install.sh and run.sh had an extra space in their shebang line. I removed the spaces and built the docker image:

docker build -t iloveitaly/mysql-backup-s3 .

I got a build error:

fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/armv7/APKINDEX.tar.gz
ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.13/main: temporary error (try again later)
fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/armv7/APKINDEX.tar.gz
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.13/main: No such file or directory
ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.13/community: temporary error (try again later)
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.13/community: No such file or directory

I jumped into the last successful build step (note that sh needed to be used instead of bash; I’m assuming this is because alpine is used as the base image and doesn’t include bash by default):

docker run -it 186581f43b48 sh

It looks like the error is caused by a Raspberry Pi issue: the host’s libseccomp library is too old for recent alpine releases and needs to be updated on the Pi itself:

wget http://ftp.de.debian.org/debian/pool/main/libs/libseccomp/libseccomp2_2.5.1-1_armhf.deb
sudo dpkg -i libseccomp2_2.5.1-1_armhf.deb

This fixed the particular build error I was running into, but surfaced another one: the apk install command was referencing an old python package. I bumped the package version in the apk command and that was fixed.

At this point, docker build was running but executing the image caused a different error! This time python was complaining:

ModuleNotFoundError: No module named 'six'

With some googling it looks like that can happen if pypi is removed, which is what was happening in the Dockerfile script. I updated the Dockerfile to stop removing pypi, which fixed the issue.

However, when I tried to run the image with SCHEDULE: '@daily' (an environment variable consumed by the image), I ran into a go-cron failure. The package hasn’t been updated in many years, so I’m guessing it was an incompatibility with the latest alpine version.

Instead of using that package, I opted to modify the run.sh script to use native cron functionality. I found conflicting information about doing this:

  • Some claimed you needed to use complex workarounds or use some sort of wrapper (similar to the workaround described earlier in the post).
  • I found that (a) running cron in the foreground and (b) using -d 8 (an option available via busybox cron) routes all cron logs to the parent stdout, so you’ll see them in the docker logs (see the sketch below).
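
Here’s a sketch of what a modified run.sh along these lines might look like; the backup script path and crontab location are illustrative assumptions, not the exact contents of the repo:

#!/bin/sh

set -eu

# busybox cron reads per-user crontabs from /var/spool/cron/crontabs
echo "$SCHEDULE /backup.sh" > /var/spool/cron/crontabs/root

# -f keeps crond in the foreground; -d 8 logs to stderr, which is what
# surfaces the cron output in docker logs
exec crond -f -d 8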

I applied the modifications, rebuilt the container (docker build -t iloveitaly/mysql-backup-s3 .), and finally everything was working.

I ended up trying out Storj, which is a decentralized S3-compatible storage service. It comes with a generous (150GB) free tier, and it gave me an excuse to tinker around with some dweb stuff. It worked surprisingly well.

Moral of the story: if something does go wrong (and the likelihood is high when you’re using a platform with relatively low adoption like the Raspberry Pi), it’s a pain to debug and the feedback loop is slow.

Thoughts on Docker

It was fun playing with Docker images and getting a feel for the ecosystem. I’ll write about docker-compose separately, but it’s a very nice abstraction on top of a raw Dockerfile. The ecosystem has consistently improved over the years and Docker has been hugely helpful in eliminating differences between development, CI, staging, and production environments.

That being said, it was surprising to me how brittle Dockerfiles were (they broke easily on the Pi) and how slow it was to debug them. They also take up a ton of RAM on macOS. I’m due for a new MacBook, but my current machine has 16GB of RAM, and Docker ate up my free RAM and ground my computer to a halt. I can see the value in using Docker to quickly spin up a local Redis, Postgres, etc., but the speed cost for local development was too high for me.

I find it fun to play around with lower level linux system stuff, but I don’t have much patience for tinkering with it when I’m just trying to get something deployed for an application I’m building. I’m a big fan of Heroku for this reason—they build the container image(s) for you automatically with basically zero configuration on your part. If you want more control over your infrastructure, you can use the open source alternative Dokku. Or, if you still want to run Docker images manually, you can use BuildPacks to generate the docker image for you.

This is all to say, I don’t see the value in managing Dockerfiles directly unless you are a very large company that needs nuanced control over your application’s runtime environment. It’s definitely helpful to understand how this technology works under the hood, but I can’t see myself managing these Dockerfiles directly instead of using a Heroku-like system.