Building a Docker image for a Python Django application

After building a crypto index fund bot I wanted to host the application so the purchase routines would run automatically. In addition to this bot, there were a couple of other smaller applications I’ve been wanting to see if I could self-host (Monica, Storj, Duplicati).

In addition to what I’ve already been doing with my Raspberry Pi, I wanted to see if I could host a couple small utilities/applications on it, and wanted to explore docker more. A perfect learning project!

Open Source Docker Files

As with any learning project, I find it incredibly helpful to clone a bunch of repos with working code into a ~/Projects/docker so I can easily ripgrep my way through them.

https://github.com/schickling/dockerfiles/ Older, but simple Dockerfiles. Helpful to understand the basics of how to solve various problems in Docker. https://github.com/linuxserver/docker-duplicati Example of how to build a Docker image compatible with the raspberry pi. The @linuxserver group on GitHub has a lot of interesting Dockerfiles to learn from. https://github.com/monicahq/docker Docker images for a classic LAMP application. https://github.com/getsentry/sentry Image for a Python Django application https://github.com/mdn/kuma Another Python Django application example

And here’s my resulting Dockerfile for hosting the crypto index fund bot I’ve been playing with.

Learning Docker

I first ran into Docker at a Spree conference way before it was widely adopted. I remember thinking the technology sounded neat, but it was hard to imagine why you’d want to build a docker container.

It takes time for new technologies to make sense. Now docker containers are everywhere, and you can’t imagine living without them. Although I’ve used docker indirectly through Heroku, Dokku, or blindly running docker compose up on an open source project, I’ve never dug in and actually created my own docker image.

Here’s what I learned while writing my first image:

Docker has great install instructions. The repository-based install instructions did not work for me. I went the sh install script route. This guide was helpful Run sudo docker run hello-world to verify docker is working Each command in a Dockerfile generates a new ‘layer’ (intermediate container image). These layers are incrementally built upon to generate your final docker image. ENTRYPOINT always has a default of a shell, CMD is not set by default. ENTRYPOINT cannot be overwritten, CMD can when specified with a docker run command The base images are generally pretty bare. You’ll need to install the packages that you need using something like RUN apt-get update && apt-get install -y --no-install-recommends bash You’ll see set -eux at the beginning of most RUN or other shell commands executed by docker. This ensures that when one shell command fails, the failure bubbles up and the docker build fails as well. Look at the manpage for set to learn more about the specific failure codes. docker exec runs a command within an existing container, docker run creates a new container and executes the command. .dockerignore is like .gitignore but for the COPY command, which is generally used to grab your source code and stuff it in the container. This is important because each command that is run in a Dockerfile attempts to create a cache of the image at that state. If you include files in COPY that are not core to your application, and they are modified often, it will cause longer docker build times, which will slow down your development loop. If a docker command fails, you’ll get a image SHA that you can use to jump into teh container and debug its state: docker image inspect b01352c2271a dive is a really neat tool to inspect each layer of an image. Helpful for debugging container issues. It’s not possible to map a layer SHA to a Dockerfile. When the layers are pulled into your local, they aren’t tagged. Your best bet is using the FROM commands in your Dockerfile and attempting to find the source Dockerfile the tagged images were created from. However, you can publish a docker image to Docker Hub without linking it to an open source Dockerfile (this seems to be rare in practice). What are the differences between all of these base image types? The most popular ones I’ve seen are Debian (buster, stretch, etc) and Alpine. This is a good explanation. Bottom line is most likely you want debian’s latest release (right now, it’s ‘buster’). You may see ‘busybox’ referenced in Dockerfiles. For a while, alpine linux was popular. It was a slimmed-down linux base layer designed to be small (I don’t fully understand why folks are so concerned with image size). The downside is it doesn’t include important utils—like cron. This is where busybox comes in, it’s a space-efficient GNU-toolset replacement. Most likely, you should just use the full debian image and forget about busybox. However, there are cases where the busybox implementation is better and designed to play well with containerized environments. For instance, if you are running a cron (on debian, alpine makes it easier), it’s challenging to get stdout redirected to the parent process without busybox. Build your image with docker build -t your-image-name . and then run it with docker run --env-file .env -it your-image-name You’ll see rm -rf /some/cache/folder in Dockerfiles. This is to eliminate package management cache, which increases the file size of the image. apt-get clean can be used instead of rm -rf /cache/folder. I’m not sure why this is more commonly used in Dockerfiles. By default, COPY requires the source file to exist. However, you can use a glob to safely optionally copy a file COPY *external_portfolio.json ./ You can have multiple FROM statements in your file. This is helpful if you need to install two runtimes (rust and python, for example). Running Cron in a Docker Debian Container

At some point, you’ll need to run a specific command on some sort of schedule without installing a full-blown job scheduler like Resque or Celery.

The ‘easiest’ way to do that is via a simple cron entry. However, cron is not plug-n-play on docker images as I painfully discovered.

Cron is not installed by default in debian base layers. This is done to save space. Installing busybox does not install the cron component when using debian. This is probably because it’s available via the standard cron package. Here’s how to install cron on a debian-based image apt-get update && apt-get install -y --no-install-recommends cron && apt-get clean You may be wondering: why use debian? This all seems so difficult, right? In my specific scenario, I’m using the python docker image which defaults to debian. From what I understand, alpine could cause other dependency issues with python and python extensions which contain C-based extensions. You don’t need to install rsyslog in order to get stdout routed to the parent process (and therefore displayed in the docker logs). To get stdout routed to the parent process, add > /proc/1/fd/1 2>&1 at the end of your cron job definition. By default, cron uses sh not bash and does not pick up on any of the environment variables passed into the docker container. To pick up ENV vars, some people recommend executing a bash script with a login flag. This didn’t work for me. Some recommended storing ENV variables in a file and sourcing it within the cron job script. Similarly, others recommend modifying BASH_ENV in your crontab Neither of these solutions worked perfectly for me. What worked was exporting the current environment variables (being careful to handle special characters) into /etc/profile which is automatically sourced by the cron process.

Here’s my cron.sh to setup the cron schedule and execute it:

#!/bin/bash -l set -eu printenv | awk -F= '{print "export " "\""$1"\"""=""\""$2"\"" }' >> /etc/profile echo "$SCHEDULE root sh -lc '/pull/path/to/executable' > /proc/1/fd/1 2>&1" >> /etc/crontab cron -L 8 -f

It’s insane to me this isn’t more simple. Another argument for keeping docker containers as simple as possible and moving as much execution logic into your application.

Building a Dockerfile

In many cases, a repo will have multiple different dockerfiles. For instance, the Monica repo has a couple different dockerfiles for various purposes. You can specify which file to build using -f:

docker build -t monicahq/monicahq -f scripts/docker/Dockerfile

The -f argument is important, as opposed to cding into the directory with the Dockerfile, since we want many of the commands (notably COPY) to run from a specific directory on the host.

As build is running, it outputs a hash (e.g. c1861cb1ff7f) at each step. When the build fails, you can use that hash to debug the container by shelling in and poking around:

docker run -it c1861cb1ff7f bash

Note that run takes a single command. You cannot pass a shell command with arguments.

In my specific situation, my build was failing due to javascript compilation errors on the Pi. After digging into it, I realized it was going to be a major pain to build the web assets on the Raspberry Pi. I just built them locally and scp‘d them over:

cd public && scp -r css/ js/ fonts/ mix-manifest.json monica@raspberrypi.local:~/monica-source/public/

After the build is complete locally, you can use it in your docker-compose.yml:

image: monicahq/monicahq

This is helpful if you are using a docker-compose.yml with a pre-existing reference to a named/tagged (with -t) Dockerfile, but you need to patch that Dockerfile to work properly. If you can edit docker-compose.yml, a better approach is to just reference the sub-Dockerfile directly:

services: worker: build: context: . dockerfile: Dockerfile

After you’ve rebuilt your docker image (or simply edited the component Dockerfile if you are using build), here’s how to apply the changes:

docker-compose up -d --remove-orphans

I’ll detail some learnings about docker-compose in a separate blog post in the future.

Hosting on a Raspberry Pi

Raspberry Pi’s architecture (32bit ARM by default) is supported by docker. However, some software isn’t packaged to run on the Pi’s ARM architecture. Additionally, the running images on the Pi generally isn’t tested as well as a traditional EC2 instance.

I ran into lots of weird and interesting bugs hosting images on the Pi. I wouldn’t recommend it if you just want to get something working quickly.

Modifying a Dockerfile to work with Raspberry Pi

If you do choose to host an application on the Pi, you’ll inevitably run into weird execution issues. Here’s one that I ran into and how I debugged it.

There’s a great dockerfile for backing up a mysqlsql database, but it was failing for me on the Pi with the following error:

exec user process caused: exec format error

It looks like this error was caused by a missing shebang at the top of the sh files.

git clone https://github.com/schickling/dockerfiles.git schickling-dockerfiles cd schickling-dockerfiles/mysql-backup-s3/

Both install.sh and run.sh had an extra space in their shebang line.I removed the spaces and built the docker image:

docker build -t iloveitaly/mysql-backup-s3 .

I got a build error:

fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/armv7/APKINDEX.tar.gz ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.13/main: temporary error (try again later) fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/armv7/APKINDEX.tar.gz WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.13/main: No such file or directory ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.13/community: temporary error (try again later) WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.13/community: No such file or directory

I jumped into the last successful build step (note that sh needed to be used instead of bash, I’m assuming this is because alpine is used as the base image and doesn’t contain bash by default):

docker run -it 186581f43b48 sh

It looks like the error is caused by a raspberrypi issue that requires updating a specific library:

wget http://ftp.de.debian.org/debian/pool/main/libs/libseccomp/libseccomp2_2.5.1-1_armhf.deb sudo dpkg -i libseccomp2_2.5.1-1_armhf.deb

This fixed the particular build error I was running into, but caused another one: the apk install command was referencing an old python package. I bumped the apk command and that was fixed.

At this point, docker build was running but executing the image caused a different error! This time python was complaining:

ModuleNotFoundError: No module named 'six'

With some googling it looks like that can happen if the pypi project is removed, which is what was happening in the Dockerfile script. I updated the docker file to stop removing pypi which fixed the issue.

However, when I tried to run the image with a SCHEDULE: '@daily' (in the yaml file above) I ran into a go-cron failure. The package hasn’t been updated in many years, so I’m guessing it was an incompatibility with the latest alpine version.

Instead of using that package, I opted to modify the run.sh script to use the native cron functionality. I found conflicting information about using native cron functionality:

Some claimed you needed to use complex workarounds or use some sort of wrapper (similar to the workaround described earlier in the post). I found that (a) running cron in the foreground and (b) using -d 8 (option available via busybox cron) routes all cron logs to the parent stdout so you’ll see it in the docker logs.

I rebuilt the container (docker build -t iloveitaly/mysql-backup-s3 .) and applied the modifications; finally everything was working.

I ended up trying out Storj, which is a decentralized s3-compatible storage service. It comes with a generous (150gb) free tier, and it gave me an excuse to tinker around with some dweb stuff. It worked surprisingly well.

Moral of the story: if something does go wrong (high likelihood when using a system with relatively low adoption like raspberrypi) it’s a pain to debug, the feedback loop is painful.

Thoughts on Docker

It was fun playing with Docker images and getting a feel for the ecosystem. I’ll write about docker-compose separately, but it’s a very nice abstraction on top of a raw Dockerfile. The ecosystem has consistently improved over the years and Docker has been hugely helpful in eliminating differences between development, CI, staging, and production environments.

That being said, it was surprising to me how brittle Dockerfiles were (broke easily on the Pi) and how slow it was to debug them. They also take up a ton of ram on macOS. I’m due for a new MacBook, but I have 16gb of RAM, and Docker ate up my free RAM and slowed down my computer to a halt. I can see the value in using Docker to quickly spin-up a local Redis, Postgres, etc but the speed cost for local development was too high for me.

I find it fun to play around with lower level linux system stuff, but I don’t have much patience for tinkering with it when I’m just trying to get something deployed for an application I’m building. I’m a big fan of Heroku for this reason—they build the container image(s) for you automatically with basically zero configuration on your part. If you want more control over your infrastructure, you can use the open source alternative Dokku. Or, if you still want to run Docker images manually, you can use BuildPacks to generate the docker image for you.

This is all to say, I don’t see the value in managing Dockerfiles directly unless you are a very large company who needs nuanced control over your application’s runtime environment. Definitely helpful to understand how this technology works under the hood, but I can’t see myself managing these Dockerfiles directly instead of using a Heroku-like system.

Continue Reading

Blocking Ads & Monitoring External Drives with Raspberry Pi

I’ve written about how I setup my raspberry pi to host time machine backups. I took my pi a bit further and set it up as a local DNS server to block ad tracking systems and, as part of my digital minimalism kick/obsession, to block distracting websites network-wide on a schedule.

Pi-hole: block ads and trackers on your network

Pi-hole is a neat project: it hosts a local DNS server on your Pi which automatically pulls in a blacklist of domains used by advertisers. The interesting side effect is you can control the blacklist programmatically, enabling you to block distracting websites on a schedule. This is perfect for my digital minimalism toolkit.

Pi-hole has an active Discourse forum. I’ve come to love these project-specific forums instead of everything being centralized on StackOverflow. Really impressed with how simple and well designed in the install process is. Run curl -sSL https://install.pi-hole.net | bash and there’s a nice CLI wizard that walks you through the process. By the end, you can You’ll need to point your DNS resolution to your pi on your router, but you can manually override your router settings in your internet config in MacOS for testing. After you have DNS resolution setup to point to the Pi, you can access the admin via http://pi.hole/admin Upgrade your pi-hole via pihole -up There’s also an interesting project which bundles wireguard (vpn) into a docker image: https://github.com/IAmStoxe/wirehole Automatically blocking distracting websites

Now to automatically block distracting websites! I have a system for aggressively blocking distracting sites on my local machine, but I wanted to extend this network-wide.

First, we’ll need two scripts to block and allow websites. Let’s call our blocking script block.sh:

#!/bin/bash blockDomains=(facebook.com www.facebook.com pinterest.com www.pinterest.com amazon.com www.amazon.com smile.amazon.com) for domain in ${blockDomains[@]}; do pihole -b $domain done

For the allow.sh just switch the pihole command in the above script to include the -d option:

pihole -b -d $domain

You’ll need to chmod +x both allow.sh and block.sh. Put the scripts in ~/Documents/. Test them locally via ./allow.sh.

Now we need to add them to cron. Run crontab -e and add these two entries:

0 21 * * * bash -l -c '/home/pi/Documents/block.sh' | logger -p cron.info 0 6 * * * bash -l -c '/home/pi/Documents/allow.sh' | logger -p cron.info

Next, make the following changes to enable a dedicated cron log file and more verbose cron logging:

# uncomment line with #cron sudo nano /etc/rsyslog.conf # add EXTRA_OPTS='-L 15'. 15 is the *sum* of the logging options that you want to enable # I found this syntax very confusing and it wasn't until I read the manpage that I realized # why my logging levels were not taken into effect. sudo nano /etc/default/cron # restart relevant services sudo service rsyslog restart sudo service cron restart # follow the new log file tail -f /var/log/cron.log What’s all this extra stuff around our script?

I wanted to see the stdout of my cron jobs in cron.log. Here’s why the extra cruft around {block,allow}.sh enables this.

The bash -l -c is important: it ensures that the pi user’s env configuration is used, which ensures the script can find pihole and other commands you might use in the script. Sourcing the user’s environment is not recommended for a ‘real’ production system, but it’s ok for our home-based pi project.

By default, the stdout of the script run in your cron definition is not sent to the parent processes stdout. Instead, it’s emailed to you (if you don’t have email configured on your pi it will land in /var/mail/pi). To me, this is insane, but I imagine this is the result of a decision made long ago and any seasoned sysadmin has this drilled into his memory.

As an aside, it is unfortunate that many ancient decisions made on a whim continue to cause wasted hours and lots of frustration to newcomers. Think of all of the lost time, or people who give up continuing to learn, because of the unneeded barriers to entry in various technologies. Ok, back to the explanation I promised!

In order to avoid having your cron job output sent to mail you need to redirect the output. | logger does this for us and sends the stdout to syslog. the -p cron.info argument sets the facility.level of the log message. Facility is a weird word used for ‘process’ or ‘log category’ and is important because it maps the log entry to the cron.log file specified in the rsyslog.conf modification we made earlier. In other words, it sets the facility of the log message so syslog can run it through its internal ruleset engine to determine which file it should go in. man logger has more nitty-gritty details about how this works.

How long will it take for these block/allow changes to take effect?

Since pi-hole uses DNS for the blacklist, the TTL on the DNS entry matters. Luckily, it’s very very short (2m) by default. This means that it will take ~2m for websites to be blocked after the scripts above run on the Pi. You can check the local-ttl value by cat /etc/dnsmasq.d/01-pihole.conf. You can also see the TTL value on a specific DNS entry via the first number under then ANSWER SECTION response when running dig google.com.

If you want to test query response times (and the response content!) between your previous DNS server and your pi-hosted DNS server you can specify a DNS server to use: dig @raspberrypi.local facebook.com. However, something funky is going on with the query response times: code>@raspberrypi.local@192.168.1.2

Continue Reading

Time Machine Backups with a Raspberry Pi and External Drives

As I was reviewing my backup strategy, I realized I hadn’t completed a Time Machine backup on my machines in a long time. Plugging in the drive was just enough friction to forget doing it completely.

The Airport Express has a USB port to plug hard drives, printers, etc into. These devices would be magically broadcasted to the network. It was awesome, and then Apple killed the device. The Eero I upgraded to is great, but the USB port is useless.

But, there’s silver lining! I’ve been looking for a good excuse to buy a Raspberry Pi and mounting external hard drives on the network fit the bill! $35 for a tiny computer more powerful that anything I had growing up and more powerful than a $5 DigitalOcean or AWS VPS. What’s not to like?

Purchasing the Hardware Raspberry PI 4 2GB. $45. I didn’t end up using the USB-C => micro USB connector and the eBook was useless. HDMI connector was helpful. Case, fan, and power supply. $12. The 5V 3A power supply required isn’t common, so you’ll most likely need to buy one. Having a case is really nice. You’ll also need a micro SD card, but I had an extra 16GB card.

So not exactly the $35 sticker price that is advertised, but still cheap.

Setting up Raspberry Pi for Remote VNC & SSH Access

My goal was to run the Pi headless. Here’s how I got the Pi setup for VNC access over the network that works across reboots:

Download http://downloads.raspberrypi.org/NOOBS_latest. Unzip and put it on the SD card. Make sure the SD card is FAT formatted. Make sure you don’t put the unzipped folder on the root directory, but rather the contents of the unzipped folder. Startup the Pi. You’ll want a monitor connected via HDMI and a (wired) keyboard to complete the setup process. You don’t need a mouse. Setup VNC & ssh. Open up a terminal and run sudo raspi-config. Navigate to "Interfacing Options", enable VNC & SSH. Set boot options to desktop for easy VNC usage. Here’s more info You also want to set the default resolution via raspi-config or VNC won’t work when you reboot without a monitor. On your mac brew cask install vnc-viewer. Username: pi, password is what you used during the on-screen setup. You should be able to manage the device right from your mac.

At this point, you’ll have access to the PI without a keyboard and mouse. Let’s setup the Pi to serve up the hard drives over the network!

Setting Up External Hard Drives as Network Attached Storage (NAS)

Here’s a couple articles I found that were helpful:

This one is the most recent and complete: https://gregology.net/2018/09/raspberry-pi-time-machine/ https://magpi.raspberrypi.org/articles/build-a-raspberry-pi-nas https://mudge.name/2019/11/12/using-a-raspberry-pi-for-time-machine/ https://gregology.net/2018/09/raspberry-pi-time-machine/ https://github.com/mr-bt/raspberrypi-timemachine

None of the articles seemed to completely match by setup. Here’s what I wanted to setup:

I have two external drives. I wanted to use one as a networked time machine drive and the other as general storage. One of the drives had a power supply and the other did not.

Here’s how I ended up serving my two hard drives on the network:

Install the packages we’ll need: sudo apt-get --assume-yes install netatalk A quick note on HFS+ formatted drives: I ended up corrupting the drives on HFS+ mode, most likely because I aggressively turned the power on/off without unmounting the drives. I’d recommend against using HFS+ formatted drives and instead format to the linux-native Ext4. I’ve documented this below. Run netatalk -v to make sure you have a recent version and get the location to the config file. Latest version is indicated here: http://netatalk.sourceforge.net sudo nano /etc/netatalk/afp.conf to edit the config file. This is the location for version 3.1. Pull the location of the config from the output of the previous netatalk command we ran if you run into issues finding this file. The instructions inlined in the config file are pretty straightforward. In Global add mimic model = TimeCapsule6,106. This broadcasts the time machine drive to look like a ‘real’ time machine device. Here’s a list of other options you can use. Neat! https://wiki.archlinux.org/index.php/Netatalk You’ll also want to edit sudo nano /etc/nsswitch.conf and append mdns4 mdns to the line with dns. This broadcasts the drive availability on the network. Get a list of all services running on your Pi with sudo service --status-all sudo shutdown -r now or sudo reboot to restart the system from the command line Change your AFP config? sudo service netatalk restart I did find the MacOS finder was pretty glitchy when I restarted services on the Pi. I ended up force quitting the finder a couple times to pick up new drive configurations. sudo chown -R pi:pi /media/pi/MikeExternalStorage to fix strange permission issues when accessing the drive. This may have had to do with attempting to use HFS+ formatting at first, so you most likely don’t need to do this.

Here’s the final /etc/nsswitch.conf:

# /etc/nsswitch.conf # # Example configuration of GNU Name Service Switch functionality. # If you have the `glibc-doc-reference' and `info' packages installed, try: # `info libc "Name Service Switch"' for information about this file. passwd: files group: files shadow: files gshadow: files hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4 mdns networks: files protocols: db files services: db files ethers: db files rpc: db files netgroup: nis

And /etc/netatalk/afp.conf:

; ; Netatalk 3.x configuration file ; [Global] mimic model = TimeCapsule6,106 ; [Homes] ; basedir regex = /xxxx [ExternalStorage] path = /media/pi/ExternalStorage [TimeMachine] path = /media/pi/TimeMachine time machine = yes A Warning on Filesystem Types

I had everything working, and then I accidentally restarted the Pi and everything was mounted as a read-only drive. I ran mount and saw type hfsplus (ro... in the info line. RO = read only.

After some googling I found that this seemed to be caused by the HFS+ filesystem (i.e. macos’s default) that I attempted to support by installing hfsprogs hfsplus hfsutils. Don’t try to host a HFS drive via the Pi. The drivers are not great and I believe this is why I ran intro trouble here.

Here’s what I tried to fix the problem:

After some googling I found: sudo fsck.hfsplus -f /dev/sda1. This didn’t do anything for me. I tried https://askubuntu.com/questions/333287/how-to-fix-external-hard-disk-read-only. This didn’t seem to work for me because I had partitions lsblk, blkid, sudo fdisk -l, ls -l /dev/disk/by-uuid/ are useful tools for inspecting what disks/devices are mounted. sudo cp fstab fstab.bak, edit fstab with nano, UUID=1-1-1-1 /media/pi/ExternalStorage hfsplus force,rw. No quotes around uuid. Tried sudo umount /dev/sda2 && sudo mount /dev/sda2. No luck. Determine how large a folder is: sudo du -sh Tried disabling sudo service netatalk stop && sudo service avahi-daemon stop and restarting the computer. No dice.

Uh oh. Not good.

I plugged the drive into my MacOS computer and told me the drive was corrupted. I tried to repair the disk and it gave me esoteric error. I think accidentally turning off the power may have corrupted the disk. HFS+ isn’t supported natively and seems to be prone to corruption issues if drives are not unmounted properly. Ext4 is the recommended file system format.

Some reference articles:

https://askubuntu.com/questions/997279/how-to-make-hfs-external-drive-read-write-journaling-already-disabled-still-n https://www.raspberrypi.org/forums/viewtopic.php?t=109157 https://raspberrypi.stackexchange.com/questions/30151/unplugged-hfs-usb-drive-from-rpi-is-corrupted/41643

It’s surprising to me, that in 2020, file system formats still matter across operating systems. I feel like I’m transported back to the 90s. Why isn’t this a solved problem yet?

Here’s what I did to fix the issue:

Luckily, although I can’t write to my drives, I can read from them and was able duplicate the data other devices. I pulled the data off to various drives and computers using rsync. It was a huge pain: I needed to spread the data across a couple of different devices. Here’s the commands I used: sudo rsync -avh --progress pi@192.168.2.200:/media/pi/MikeExternalStorage/MikeiTunes ~/Desktop/MikeiTunes After the data was moved off I reformatted the drive: Unmount the drive: sudo umount /dev/sba2 Wipe the drive and format sudo mkfs.ext4 /dev/sba Create a mount point sudo mkdir /media/TimeMachine && sudo chmod 777 /media/TimeMachine Add the table entry: sudo nano /etc/fstab, then /dev/sda /media/TimeMachine auto defaults 0 2. fstab => "File System Table" Add a label to the drive sudo e2label /dev/sda TimeMachine https://www.raspberrypi.org/forums/viewtopic.php?t=67896 Mount the drive sudo mount /dev/sda After this was done, I rsync’d the data back to the drive. In some cases I needed to use sudo on the Pi to avoid any permission issues. sudo rsync -avh --progress --rsync-path="sudo rsync" ~/Desktop/EmilyBackup pi@192.168.7.200:/media/ExternalStorage/ I found it helpful to use screen to manage long running sessions on the PI: sudo apt install screen screen to start a new session screen ls to list all sessions. Attach to a session like screen -r THE_ID tmux is also great for this (and probably better) I also needed a way to unzip some folders. There’s not a built-in util that provides unzipping with a progress indicator. I found 7z which fits the bill: 7z x Projects_old.zip -o./. https://askubuntu.com/questions/909918/q-how-to-show-unzip-progress

Summary: don’t use HFS/mac formatted hard drives on linux!

A Note on Using Old Drives for Storage

I couldn’t get my 2TB drive to pass the SMART monitoring (details in a future blog post!) ‘long’ test. I did a bit of research and external drives tend to only last ~5 years. The 2TB drive was ~10 years old and had exhibiting some glitchy behavior. I ended up replacing it with a much smaller 2TB drive for $60.

This is all to say: it’s worth replacing drives every 5 years or so and ensure they are monitoring by SMART to catch any failures early on.

Fixing Raspberry Pi’s Emergency Mode

I connected the new drive to replace my old one, formatted it, and setup the fstab config just like the other drives. The UI started glitching out and I had two mounts setup for my old drive. I figured restarting the Pi would fix the issue, but that was a bad idea.

The Pi wouldn’t connect to the network and appeared dead. I tried unplugging the drives, but that didn’t help.

I ended up having to plug it back into a monitor and found the following message:

You are in emergency mode. After logging in, "journalctl -xb" to view system logs, "systemctl reboot" to reboot, "systemctl default" or ^D to try again to boot into default mode.

Here’s what I found:

If any of the devices in fstab cannot be found, it will hang the boot process and you’ll be kicked into emergency mode. This was surprising to me: looks like linux is not too forgiving with bad configuration. Here’s how to fix the issue: Plug your SD card into another computer, edit cmdline.txt on the root of the card, and add init=/bin/sh to the end of it. Looks like the Pi reads that txt file to determine how to boot. I believe this is a Pi-specific config file. Plug the SD card back into the pi, run mount -o remount,rw / when the prompt appears and comment out custom lines in /etc/fstab. Reboot the Pi and you’ll be back in action. I ran journalctl -xb but couldn’t find any errors specifically identifying the drive. /var/log/syslog is also a good place to look. sudo findmnt --verify --verbose is a way to verify your fstab config If you specify default,nofail in fstab it looks like you can avoid this problem. I’m not sure what the side effects of this approach is. I don’t understand why fstab definitions are necessary if the default drive config is working fine. All drives automount when connected. I ended up removing all fstab entries and using the autogenerated mount points at /media/pi.

Resources:

http://www.clarkle.com/notes/emergecy-mode-bad-fstab/ https://www.raspberrypi.org/forums/viewtopic.php?t=193153 https://unix.stackexchange.com/questions/44027/how-to-fix-boot-failure-due-to-incorrect-fstab Under Voltage & USB-Powered Devices

When I replaced my old hard drive, I grabbed a USB powered one off of Amazon. However, the Pi can only support powering a single external drive drive.

You can determine if this is happening by searching the syslogs:

cat /var/log/kern.log | grep -i 'voltage'

Some references:

https://www.raspberrypi.org/forums/viewtopic.php?t=160819 https://github.com/raspberrypi/linux/pull/2397

The solution is buying a USB hub that is externally powered, like this one.

Spotlight Indexing on NAS

It’s not possible:

https://discussions.apple.com/thread/130462?tstart=0 https://www.raspberrypi.org/forums/viewtopic.php?t=270155 https://care.qumulo.com/hc/en-us/articles/115008514788-Mac-OS-X-Spotlight-Search-and-Qumulo

As an aside, I’ve also learned it’s not possible to exclude folders with a specific pattern (such as node_modules) or with a specific dot file within the folder (.metadata-no-index). You can only control what’s indexed via the control panel.

Wow, this seems like a lot of work? Was this even a good idea?

Yes, it was. Took way more time than I expected. Probably not a great idea! If you just want to get a networked time machine up and running quickly, I wouldn’t do this.

But…I learned a bunch, which was the fun part for me.

Why is Linux still hard to use?

Way back before Heroku & AWS were a thing, I used to manage server config for various apps I developed. It was a massive pain. I remember clearly staring blankly into my terminal editing files in /etc/* as instructed by obscure blog posts across the internet and hoping things worked. Once I had things working, I left them alone.

Now, to be sure, things have gotten better. Ansible, Terraform, CDK, etc all allow you to configure servers and cloud services with code rather than manually editing files. However, these abstractions are simply that—abstractions. Many times you’ll run into issues with the underlying system config that you need to correct.

The Pi experience, which I’m assuming mirrors the general state of Linux config in general, is really bad. I forgot how incredibly valuable it is to have sane, smart defaults configured on MacOS that is tailored to the hardware which it’s running on. A Given the slow decay of Mac devices (high hopes for Apple Silicon, but overall Apple machines have gotten worse over the years), I’ve thought about moving to Linux, but this experience has eliminated that thought from my mind.

Maybe some of this pain can be chalked up to the Pi OS, but I can’t imagine things are many orders-of-magnitude better on other Linux variants. I hope I’m wrong, and I hope Linux desktops can eventually get to the ‘just works’ state that MacOS maintains.

Continue Reading