Using the Multiprocess Library in Python 3

Python has a nifty multiprocessing library which comes with a lot of helpful abstractions. However, as with concurrent programming in most languages, there are lots of footguns.

Here are some of the gotchas I ran into:

- Logging does not work as you'd expect. Global state associated with your logger will be wiped out, although if you've already defined a logger variable it will continue to reference the same object from the parent process. The easiest solution is to set up a new file-based logger in the child process. If you can't do this, you'll need to implement some sort of message-queue-based logging, which sounds terrible.
- Relatedly, be careful about using any database connections, file handles, etc. in a forked process. This can cause strange, hard-to-debug errors.
- When you pass variables to a forked process, they are 'pickled': the python data structure is serialized and then deserialized on the 'other end' (i.e. in the forked process). I was trying to decorate a function and pickle it, and ran into weird issues. Only top-level module functions can be pickled (see the short sketch after this list).
- If you are using the macos libraries via python, you cannot reference them in both the parent and the child process. The solution here is to run all functions which hit the macos libraries in a subprocess. I was not able to get the decorator in this linked post working. Here's a working example using a modified version of the source below.
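To make the pickling restriction concrete, here is a tiny sketch (my example, not code from the project) showing that a top-level function pickles fine while a nested function does not:

import pickle

def top_level_function():
    return "hello"

def make_nested_function():
    def inner():
        return "hello"
    return inner

# top-level module functions are pickled by reference (module name + function name), so this works
pickle.dumps(top_level_function)

# nested functions (and similarly, many decorated/wrapped functions) can't be looked up by name
try:
    pickle.dumps(make_nested_function())
except (pickle.PicklingError, AttributeError) as error:
    print(f"cannot pickle nested function: {error}")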

I struggled to find full working examples of using the multiprocessing library online (here's the best I found). I've included an example below that uses multiprocessing to create a forked process, execute a function in it, and return the results inline. The example does the following:

- Sends a signal from the parent process to the child process to start executing, using multiprocessing.Condition. I was not able to get this working without first notify()ing the parent process.
- Kills the child process after 10 minutes. This works around memory leaks I was running into with the applescript I was trying to execute.
- Configures logging in the forked process.
- Returns the result synchronously to the caller using a shared queue implemented with multiprocessing.Queue.

import multiprocessing
import time
import logging

forked_condition = None
forked_result_queue = None
forked_process = None
forked_time = None

logger = logging.getLogger(__name__)

def _wrapped_function(condition, result_queue, function_reference):
    # this is run in a forked process, which wipes all logging configuration
    # you'll need to reconfigure your logging instance in the forked process
    logger.setLevel(logging.DEBUG)

    first_run = True

    while True:
        with condition:
            # notify parent process that we are ready to wait for notifications
            # an alternative here that I did not attempt is waiting for `is_alive()`
            # https://stackoverflow.com/questions/57929895/python-multiprocessing-process-start-wait-for-process-to-be-started
            if first_run:
                condition.notify()
                first_run = False

            condition.wait()

        try:
            logger.debug("running operation in fork")
            result_queue.put(function_reference())
        except Exception as e:
            logger.exception("error running function in fork")
            result_queue.put(None)

def _run_in_forked_process(function_reference):
    global forked_condition, forked_result_queue, forked_process, forked_time

    # terminate the process after 10m
    if forked_time and time.time() - forked_time > 60 * 10:
        assert forked_process
        logger.debug("killing forked process, 10 minutes have passed")
        forked_process.kill()
        forked_process = None

    if not forked_process:
        forked_condition = multiprocessing.Condition()
        forked_result_queue = multiprocessing.Queue()
        forked_process = multiprocessing.Process(
            target=_wrapped_function,
            args=(forked_condition, forked_result_queue, function_reference)
        )
        forked_process.start()
        forked_time = time.time()

        # wait until the fork is ready; if this isn't done the process seems to miss
        # the parent process `notify()` call. My guess is `wait()` needs to be called before `notify()`
        with forked_condition:
            logger.debug("waiting for child process to indicate readiness")
            forked_condition.wait()

    # if forked_process is defined, forked_condition always should be as well
    assert forked_condition and forked_result_queue

    # signal to the process to run `getInfo` again and put the result on the queue
    with forked_condition:
        forked_condition.notify()

    logger.debug("waiting for result of child process")
    return forked_result_queue.get(block=True)

def _exampleFunction():
    # do something strange, like running applescript
    return "hello"

def exampleFunction():
    return _run_in_forked_process(_exampleFunction)

# you can use the wrapped function like a normal python function
print(exampleFunction())

# this doesn't make sense in a single-use script, but if you need to,
# here's how to terminate the forked process
forked_process.kill()

Note that the target environment here was macOS. This may not work perfectly on Linux or Windows; it seems as though there are additional footguns on Windows in particular.
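One cross-platform detail worth calling out (my addition, not from the original notes): multiprocessing's default start method is fork on Linux but spawn on macOS (since Python 3.8) and Windows, which is closely related to the macOS library issues above. If you want consistent behavior across platforms, you can pin the start method explicitly. A minimal sketch, assuming the target is a top-level module function so it can be pickled under spawn:

import multiprocessing

def work(queue):
    # must be a top-level function so it can be pickled under the spawn start method
    queue.put("hello")

if __name__ == "__main__":
    # "spawn" starts a fresh interpreter (the macOS and Windows default);
    # "fork" copies the parent process (the Linux default)
    ctx = multiprocessing.get_context("spawn")
    queue = ctx.Queue()
    process = ctx.Process(target=work, args=(queue,))
    process.start()
    print(queue.get())
    process.join()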

Continue Reading

Building a Docker image for a Python Django application

After building a crypto index fund bot I wanted to host the application so the purchase routines would run automatically. In addition to this bot, there were a couple of other smaller applications I’ve been wanting to see if I could self-host (Monica, Storj, Duplicati).

In addition to what I’ve already been doing with my Raspberry Pi, I wanted to see if I could host a couple small utilities/applications on it, and wanted to explore docker more. A perfect learning project!

Open Source Docker Files

As with any learning project, I find it incredibly helpful to clone a bunch of repos with working code into a ~/Projects/docker folder so I can easily ripgrep my way through them.

- https://github.com/schickling/dockerfiles/ - Older, but simple Dockerfiles. Helpful to understand the basics of how to solve various problems in Docker.
- https://github.com/linuxserver/docker-duplicati - Example of how to build a Docker image compatible with the Raspberry Pi. The @linuxserver group on GitHub has a lot of interesting Dockerfiles to learn from.
- https://github.com/monicahq/docker - Docker images for a classic LAMP application.
- https://github.com/getsentry/sentry - Image for a Python Django application.
- https://github.com/mdn/kuma - Another Python Django application example.

And here’s my resulting Dockerfile for hosting the crypto index fund bot I’ve been playing with.

Learning Docker

I first ran into Docker at a Spree conference way before it was widely adopted. I remember thinking the technology sounded neat, but it was hard to imagine why you’d want to build a docker container.

It takes time for new technologies to make sense. Now docker containers are everywhere, and you can’t imagine living without them. Although I’ve used docker indirectly through Heroku, Dokku, or blindly running docker compose up on an open source project, I’ve never dug in and actually created my own docker image.

Here’s what I learned while writing my first image:

- Docker has great install instructions. The repository-based install instructions did not work for me; I went the sh install script route. This guide was helpful.
- Run sudo docker run hello-world to verify docker is working.
- Each command in a Dockerfile generates a new 'layer' (intermediate container image). These layers are incrementally built upon to generate your final docker image.
- ENTRYPOINT always has a default of a shell; CMD is not set by default. ENTRYPOINT cannot be overwritten, but CMD can be when specified with a docker run command.
- The base images are generally pretty bare. You'll need to install the packages that you need using something like RUN apt-get update && apt-get install -y --no-install-recommends bash
- You'll see set -eux at the beginning of most RUN or other shell commands executed by docker. This ensures that when one shell command fails, the failure bubbles up and the docker build fails as well. Look at the manpage for set to learn more about the specific failure codes.
- docker exec runs a command within an existing container; docker run creates a new container and executes the command.
- .dockerignore is like .gitignore but for the COPY command, which is generally used to grab your source code and stuff it in the container. This is important because each command that is run in a Dockerfile attempts to create a cache of the image at that state. If you include files in COPY that are not core to your application, and they are modified often, it will cause longer docker build times, which will slow down your development loop.
- If a docker command fails, you'll get an image SHA that you can use to jump into the container and debug its state: docker image inspect b01352c2271a
- dive is a really neat tool to inspect each layer of an image. Helpful for debugging container issues.
- It's not possible to map a layer SHA to a Dockerfile. When the layers are pulled onto your local machine, they aren't tagged. Your best bet is using the FROM commands in your Dockerfile and attempting to find the source Dockerfile the tagged images were created from. However, you can publish a docker image to Docker Hub without linking it to an open source Dockerfile (this seems to be rare in practice).
- What are the differences between all of these base image types? The most popular ones I've seen are Debian (buster, stretch, etc) and Alpine. This is a good explanation. Bottom line: most likely you want debian's latest release (right now, it's 'buster').
- You may see 'busybox' referenced in Dockerfiles. For a while, alpine linux was popular: a slimmed-down linux base layer designed to be small (I don't fully understand why folks are so concerned with image size). The downside is it doesn't include important utils, like cron. This is where busybox comes in: it's a space-efficient GNU-toolset replacement. Most likely, you should just use the full debian image and forget about busybox. However, there are cases where the busybox implementation is better and designed to play well with containerized environments. For instance, if you are running cron (on debian; alpine makes this easier), it's challenging to get stdout redirected to the parent process without busybox.
- Build your image with docker build -t your-image-name . and then run it with docker run --env-file .env -it your-image-name
- You'll see rm -rf /some/cache/folder in Dockerfiles. This is to eliminate the package management cache, which increases the file size of the image. apt-get clean can be used instead of rm -rf /cache/folder; I'm not sure why the rm approach is more commonly used in Dockerfiles.
- By default, COPY requires the source file to exist. However, you can use a glob to safely (optionally) copy a file: COPY *external_portfolio.json ./
- You can have multiple FROM statements in your file. This is helpful if you need to install two runtimes (rust and python, for example).

Running Cron in a Docker Debian Container

At some point, you’ll need to run a specific command on some sort of schedule without installing a full-blown job scheduler like Resque or Celery.

The ‘easiest’ way to do that is via a simple cron entry. However, cron is not plug-n-play on docker images as I painfully discovered.

- Cron is not installed by default in debian base layers. This is done to save space.
- Installing busybox does not install the cron component when using debian. This is probably because it's available via the standard cron package.
- Here's how to install cron on a debian-based image: apt-get update && apt-get install -y --no-install-recommends cron && apt-get clean
- You may be wondering: why use debian? This all seems so difficult, right? In my specific scenario, I'm using the python docker image, which defaults to debian. From what I understand, alpine can cause other dependency issues with python packages which contain C-based extensions.
- You don't need to install rsyslog in order to get stdout routed to the parent process (and therefore displayed in the docker logs). To get stdout routed to the parent process, add > /proc/1/fd/1 2>&1 at the end of your cron job definition.
- By default, cron uses sh not bash and does not pick up on any of the environment variables passed into the docker container.
- To pick up ENV vars, some people recommend executing a bash script with a login flag. This didn't work for me. Some recommended storing ENV variables in a file and sourcing it within the cron job script. Similarly, others recommend modifying BASH_ENV in your crontab. Neither of these solutions worked perfectly for me. What worked was exporting the current environment variables (being careful to handle special characters) into /etc/profile, which is automatically sourced by the cron process.

Here’s my cron.sh to setup the cron schedule and execute it:

#!/bin/bash -l

set -eu

printenv | awk -F= '{print "export " "\""$1"\"""=""\""$2"\"" }' >> /etc/profile

echo "$SCHEDULE root sh -lc '/full/path/to/executable' > /proc/1/fd/1 2>&1" >> /etc/crontab

cron -L 8 -f

It's insane to me that this isn't simpler. It's another argument for keeping docker containers as simple as possible and moving as much execution logic as you can into your application.

Building a Dockerfile

In many cases, a repo will have multiple different dockerfiles. For instance, the Monica repo has a couple different dockerfiles for various purposes. You can specify which file to build using -f:

docker build -t monicahq/monicahq -f scripts/docker/Dockerfile .

The -f argument is important, as opposed to cding into the directory with the Dockerfile, since we want many of the commands (notably COPY) to run from a specific directory on the host.

As build is running, it outputs a hash (e.g. c1861cb1ff7f) at each step. When the build fails, you can use that hash to debug the container by shelling in and poking around:

docker run -it c1861cb1ff7f bash

Note that run takes a single command. You cannot pass a shell command with arguments.

In my specific situation, my build was failing due to javascript compilation errors on the Pi. After digging into it, I realized it was going to be a major pain to build the web assets on the Raspberry Pi. I just built them locally and scp‘d them over:

cd public && scp -r css/ js/ fonts/ mix-manifest.json monica@raspberrypi.local:~/monica-source/public/

After the build is complete locally, you can use it in your docker-compose.yml:

image: monicahq/monicahq

This is helpful if you are using a docker-compose.yml with a pre-existing reference to a named/tagged (with -t) Dockerfile, but you need to patch that Dockerfile to work properly. If you can edit docker-compose.yml, a better approach is to just reference the sub-Dockerfile directly:

services:
  worker:
    build:
      context: .
      dockerfile: Dockerfile

After you’ve rebuilt your docker image (or simply edited the component Dockerfile if you are using build), here’s how to apply the changes:

docker-compose up -d --remove-orphans

I’ll detail some learnings about docker-compose in a separate blog post in the future.

Hosting on a Raspberry Pi

Raspberry Pi's architecture (32bit ARM by default) is supported by docker. However, some software isn't packaged to run on the Pi's ARM architecture. Additionally, running images on the Pi generally isn't as well tested as running them on a traditional EC2 instance.

I ran into lots of weird and interesting bugs hosting images on the Pi. I wouldn’t recommend it if you just want to get something working quickly.

Modifying a Dockerfile to work with Raspberry Pi

If you do choose to host an application on the Pi, you’ll inevitably run into weird execution issues. Here’s one that I ran into and how I debugged it.

There's a great dockerfile for backing up a MySQL database to S3, but it was failing for me on the Pi with the following error:

exec user process caused: exec format error

It looks like this error is caused by a missing or malformed shebang at the top of the sh files.

git clone https://github.com/schickling/dockerfiles.git schickling-dockerfiles
cd schickling-dockerfiles/mysql-backup-s3/

Both install.sh and run.sh had an extra space in their shebang line. I removed the spaces and built the docker image:

docker build -t iloveitaly/mysql-backup-s3 .

I got a build error:

fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/main/armv7/APKINDEX.tar.gz
ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.13/main: temporary error (try again later)
fetch https://dl-cdn.alpinelinux.org/alpine/v3.13/community/armv7/APKINDEX.tar.gz
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.13/main: No such file or directory
ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.13/community: temporary error (try again later)
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.13/community: No such file or directory

I jumped into the last successful build step (note that sh needed to be used instead of bash, I’m assuming this is because alpine is used as the base image and doesn’t contain bash by default):

docker run -it 186581f43b48 sh

It looks like the error is caused by a raspberrypi issue that requires updating a specific library:

wget http://ftp.de.debian.org/debian/pool/main/libs/libseccomp/libseccomp2_2.5.1-1_armhf.deb
sudo dpkg -i libseccomp2_2.5.1-1_armhf.deb

This fixed the particular build error I was running into, but caused another one: the apk install command was referencing an old python package. I bumped the apk command and that was fixed.

At this point, docker build was running but executing the image caused a different error! This time python was complaining:

ModuleNotFoundError: No module named 'six'

With some googling, it looks like that can happen if the pypi package is removed, which is what was happening in the Dockerfile script. I updated the Dockerfile to stop removing pypi, which fixed the issue.

However, when I tried to run the image with a SCHEDULE: '@daily' (in the yaml file above) I ran into a go-cron failure. The package hasn’t been updated in many years, so I’m guessing it was an incompatibility with the latest alpine version.

Instead of using that package, I opted to modify the run.sh script to use the native cron functionality. I found conflicting information about using native cron functionality:

Some claimed you needed complex workarounds or some sort of wrapper (similar to the workaround described earlier in the post). I found that (a) running cron in the foreground and (b) using -d 8 (an option available via busybox cron) routes all cron logs to the parent stdout, so you'll see them in the docker logs.

I rebuilt the container (docker build -t iloveitaly/mysql-backup-s3 .) and applied the modifications; finally everything was working.

I ended up trying out Storj, which is a decentralized s3-compatible storage service. It comes with a generous (150gb) free tier, and it gave me an excuse to tinker around with some dweb stuff. It worked surprisingly well.

Moral of the story: if something does go wrong (a high likelihood when using a system with relatively low adoption like the raspberrypi), it's a pain to debug and the feedback loop is painful.

Thoughts on Docker

It was fun playing with Docker images and getting a feel for the ecosystem. I’ll write about docker-compose separately, but it’s a very nice abstraction on top of a raw Dockerfile. The ecosystem has consistently improved over the years and Docker has been hugely helpful in eliminating differences between development, CI, staging, and production environments.

That being said, it was surprising to me how brittle Dockerfiles were (they broke easily on the Pi) and how slow it was to debug them. They also take up a ton of RAM on macOS. I'm due for a new MacBook, but I have 16gb of RAM, and Docker ate up my free RAM and ground my computer to a halt. I can see the value in using Docker to quickly spin up a local Redis, Postgres, etc, but the speed cost for local development was too high for me.

I find it fun to play around with lower level linux system stuff, but I don’t have much patience for tinkering with it when I’m just trying to get something deployed for an application I’m building. I’m a big fan of Heroku for this reason—they build the container image(s) for you automatically with basically zero configuration on your part. If you want more control over your infrastructure, you can use the open source alternative Dokku. Or, if you still want to run Docker images manually, you can use BuildPacks to generate the docker image for you.

This is all to say, I don’t see the value in managing Dockerfiles directly unless you are a very large company who needs nuanced control over your application’s runtime environment. Definitely helpful to understand how this technology works under the hood, but I can’t see myself managing these Dockerfiles directly instead of using a Heroku-like system.

Continue Reading

Using GitHub Actions With Python, Django, Pytest, and More

GitHub Actions is a powerful tool. When GitHub was first released, it felt magical: clean, simple, extensible, and it added so much value that it felt like you should be paying for it. GitHub Actions feels similarly powerful and has positively affected the package ecosystem of many languages.

I finally had a chance to play around with it as part of building a crypto index fund bot. I wanted to setup a robust CI run which included linting, type checking, etc.

Here’s what I learned:

- It's not possible to test changes to GitHub actions locally. You can use the GH CLI locally to run them, but GH will use the latest version of the workflow that exists in your repo. The best workflow I found is working on a branch and then squashing the changes.
- You can use GitHub actions to run arbitrary scripts on a schedule. This may sound obvious, but it can be used in really interesting ways, like updating a repo every day with the results of a script.
- You can setup dependabot to submit automatic package update PRs using a .github/dependabot.yml file.
- The action/package ecosystem seems relatively weak. The GitHub-owned actions are great and work well, but even very popular flows outside of the default action set do not seem widely used and seem to have quirks.
- There are some nice linting tools available with VS Code so you don't need to remember the exact key structure of the GitHub actions yaml.
- Unlike docker's depends_on, containers running in the services key are not linked to the CI jobs in a similar way to docker compose yaml files. By 'linked' I'm referring to exposing ports, host IP, etc to the other images that are running your jobs. You need to explicitly define ports to expose on these service images, and they are all bound to localhost.
- on: workflow_dispatch does not allow you to manually trigger a workflow to run with locally modified yaml. This will only run a job in your yaml already pushed to GitHub.
- Matrix builds are easy to setup to run parallelized builds across different runtime/dependency versions. Here's an example.
- Some details about the postgres service:
  - It doesn't seem like you can create new databases using the default postgres/postgres username + password pair. You must use the default database, postgres.
  - Unlike docker, the image does not resolve the domain postgres to an IP. Use 127.0.0.1 instead.
  - You must expose the ports using ports:, otherwise the service is inaccessible to the jobs.
  - You must set the password on the image, which felt very strange to me. You'll run into errors if you don't do this.

Here’s an example .github/workflows/ci.yml file with the following features:

- Redis & postgres services for Django ORM, Django cache, and Celery queue store support
- Django test configuration specification using DJANGO_SETTINGS_MODULE. This pattern is not standard to django; here's more information about how this works and why you probably want to use it.
- Database migrations against postgres using Django
- Package installation via Poetry
- Caching package installation based on VM type and the SHA of the poetry/package lock file
- Code formatting checks using black and isort
- Type checking using pyright
- Linting using pylint
- Test runs using pytest

name: Django CI
on:
  workflow_dispatch:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
jobs:
  build:
    runs-on: ubuntu-latest
    # each step can define `env` vars, but it's easiest to define them on the build level
    # if you'll add additional jobs testing the same application later (which you probably will)
    env:
      DJANGO_SECRET_KEY: django-insecure-@o-)qrym-cn6_*mx8dnmy#m4*$j%8wyy+l=)va&pe)9e7@o4i)
      DJANGO_SETTINGS_MODULE: botweb.settings.test
      REDIS_URL: redis://localhost:6379
      TEST_DATABASE_URL: postgres://postgres:postgres@localhost:5432/postgres
    # port mapping for each of these services is required otherwise it's inaccessible to the rest of the jobs
    services:
      redis:
        image: redis
        # these options are recommended by GitHub to ensure the container is fully operational before moving on
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 6379:6379
      postgres:
        image: postgres
        ports:
          - 5432:5432
        env:
          POSTGRES_PASSWORD: postgres
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: 3.9.6
      # install packages via poetry and cache result so future CI runs are fast
      # the result is only cached if the build is successful
      # https://stackoverflow.com/questions/62977821/how-to-cache-poetry-install-for-github-actions
      - name: Install poetry
        uses: snok/install-poetry@v1.2.0
        with:
          version: 1.1.8
          virtualenvs-create: true
          virtualenvs-in-project: true
      - name: Load cached venv
        id: cached-poetry-dependencies
        uses: actions/cache@v2
        with:
          path: .venv
          key: venv-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}
      - name: Install dependencies
        run: poetry install
        if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
      - name: Linting
        run: |
          source .venv/bin/activate
          pylint **/*.py
      - name: Code Formatting
        run: |
          # it's unclear to me if `set` is required to ensure errors propagate, or if that's the default in some way
          # the examples I found did not consistently set these options or indicate that it wasn't required
          set -eax
          source .venv/bin/activate
          black --version
          black --check .
          isort **/*.py -c -v
      - name: Setup node.js (for pyright)
        uses: actions/setup-node@v2.4.0
        with:
          node-version: "12"
      - name: Run type checking
        run: |
          npm install -g pyright
          source .venv/bin/activate
          pyright .
      - name: Run DB migrations
        run: |
          source .venv/bin/activate
          python manage.py migrate
      - name: Run Tests
        run: |
          source .venv/bin/activate
          pytest

Continue Reading

Lessons learned building with Django, Celery, and Pytest

As someone who writes ruby professionally, I recently learned python to build a bot which buys an index of crypto using binance.

The best thing about ruby is Rails, so I wanted an excuse to try out Django and see how it compared. Adding multi-user mode to the crypto bot felt like a good enough excuse. My goal was to:

- Add a model for the user that persists to a database
- A cron job to kick off a job for each user, preferably using a job management library
- Add some tests for primary application flows
- Docker-compose for the DB and app admin

I’ll detail learnings around Docker in a separate post. In this post, I walk through my raw notes as I dug into the django + python ecosystem further.

(I’ve written some other learning logs in this style if you are interested)

Open Source Django Projects

I found a bunch of mature, open-source django projects that were very helpful to grep (or, ripgrep) through. Clone these into a ~/Projects/django folder so you can easily search through them locally when learning:

- https://github.com/getsentry/sentry
- https://github.com/arrobalytics/django-ledger
- https://github.com/intelowlproject/IntelOwl
- https://github.com/mdn/kuma - manages the MDN docs
- https://github.com/apache/airflow
- https://github.com/kiwicom/kiwi-structlog-config - Advanced structlog configuration examples.

More python language learnings

I learned a bunch more about the core python language. I was using the most recent (3.9) version of python at the time.

- You can setup imports in __init__ to make it more convenient for users to import from your package.
- As of python3, you don't need an __init__ within a folder to make it importable.
- You can import multiple objects in a single statement: from sentry.db.models import (one, two, three)
- iPython can be setup to automatically reload modified code.
- Somehow VS Code's python.terminal.activateEnvironment got enabled again. This does not seem to play well with poetry's venv. I disabled it and it eliminated some weird environment stuff I was running into.
- When using poetry, if you specify a dependency with path in your toml, even if it's in the dev section, it is still referenced and validated when running poetry install. This can cause issues when building dockerfiles for production while still referencing local copies of a package you are modifying.
- It doesn't seem like there is a way to force a non-nil value in mypy. If you are getting typing errors due to nil values, assert var is not None or t.cast are the best options I found.
- Inline return with a condition is possible: if not array_of_dicts: return None
- There doesn't seem to be a one-command way to install pristine packages. poetry env remove python && poetry env use python && poetry install looks like the best approach. I ran into this when I switched a package to reference a github branch; the package was already installed and poetry wouldn't reinstall it from the github repo.
- You can copy/paste functions into a REPL with iPython, but without iPython enabled it's very hard to copy/paste multiline chunks of code. This is a good reason to install iPython in your production deployment: it makes repl debugging in production much easier.
- By default, all arguments can be either keyword or positional. However, you can define certain parameters to be positional-only using a / in the function definition.
- Variable names cannot start with numbers. This may seem obvious, but when you are switching from using dicts to TypedDict you may have keys which start with a number that will only cause issues when you start to construct TypedDict instances.
- There is not a clean way to update TypedDicts. It looks like the easiest way is to create a brand new one or type cast a raw updated dict.
- Cast a union list of types to a specific type with typing.cast.
- Convert a string to an enum via EnumClassName('input_string') as long as your enum subclasses str.
- Disable typing for a specific line with # type: ignore as an inline comment.
- Memoize a function by specifying a variable as global and setting a default value for that variable within the python file the function is in. There is also @functools.cache included with the stdlib that should work in most situations.
- mypy is a popular type checker, but there's also pyright, which is installed by default with pylance (VS Code's python extension).
- pylint seems like the best linter, although I was surprised at how many different options there were. This answer helped me get it working with VS Code.
- Magic methods (i.e. __xyz__) are also called dunder methods.
- A 'sentinel value' is used to distinguish between an intentional None value and a value that indicates a failure, cache miss, no object found, etc. Think undefined vs null in Javascript. This was the first time I heard it used to describe this pattern.
- The yield keyword is interesting. It returns the value provided, but the state of the function is maintained and wrapped up in a returned iterator. Each subsequent next() will return the value of the next yield in the logic.
- Unlike ruby, it does not seem possible to add functions to the global namespace. This is a nice feature; fewer instances of 'where is this method coming from?'
- Black code formatting is really good. I thought I wouldn't like it, but I was wrong. The cognitive load it takes off your mind when you are writing code is more than I would have expected.

Structured logging with context & ENV-customized levels

structlog is a really powerful package, but the documentation is lacking and it was hard to configure. Similar to my preferred ruby logger, I wanted the ability to:

- Set global logging context
- Easily pass key/value pairs into the logger
- Configure the log level through environment variables

Here’s the configuration which worked for me:

# utils.py
import logging

import structlog
from decouple import config
from structlog.threadlocal import wrap_dict

def setLevel(level):
    level = getattr(logging, level.upper())
    structlog.configure(
        # context_class enables thread-local logging to avoid passing a log instance around
        # https://www.structlog.org/en/21.1.0/thread-local.html
        context_class=wrap_dict(dict),
        wrapper_class=structlog.make_filtering_bound_logger(level),
        cache_logger_on_first_use=True,
    )

log_level = config("LOG_LEVEL", default="WARN")
setLevel(log_level)

log = structlog.get_logger()

To add context to the logger and log a key-value pair:

from utils import log

log.bind(user_id=user.id)
log.info("something", amount=amount)

Django

- poetry add django to your existing project to get started. Then poetry shell and run django-admin startproject thename to setup the project.
- Django has an interesting set of bundled apps; the built-in admin is activeadmin-like.
- Swap the DB connection information in settings.py to use PG and poetry add psycopg2. Django will not create the database for you, so you need to run CREATE DATABASE <dbname>; to add it before running your migrations.
- The default configuration does not pull from your ENV variables. I've written a section below about application configuration; it was tricky for me coming from rails.
- django-extensions is a popular package that includes a bunch of functionality missing from the core django project. Some highlights: shell_plus, reset_db, sqlcreate.
- It doesn't look like there are any generators, unlike rails or phoenix.
- Asset management is not included. There's a host of options you can pick from.
- There's a full-featured ORM with adaptors to multiple DBs. Here are some tips and tricks:
  - There's a native JSONField type which is compatible with multiple databases. It uses jsonb under the hood when postgres is in place.
  - After you've defined a model, you autogen the migration code with python manage.py makemigrations and then run the migrations with python manage.py migrate.
  - To get everything: User.objects.all() or User.objects.iterator() to page through them.
  - Getting a single object: User.objects.get(id=1)
  - Use save() on an object to update or create it.
  - Create an object in a single line using User.objects.create(kwargs)
- You need a project (global config) and apps (the actual code that makes up the core of your application).
- It looks like django apps (INSTALLED_APPS) are sort of like rails engines, but much more lightweight. Apps can each have their own migrations and they are not stored in a global folder. For instance, the built-in auth application has a bunch of migrations that will run but are not included in your application source code. I found this confusing.
- Table names are namespaced based on which app the model is in. If you have a user model in a users app, the table will be named users_user.
- It looks like there is a unicorn equivalent, gunicorn, that is the preferred way of running web workers. It's not included or configured by default.
- Flask is a framework similar to sinatra: a simple routing and rendering web framework.
- The app scaffolding is very lightweight. Views, models, tests, and the admin UI have a standard location. Everything else is up to the user.
- There's a caching system built into django, but it doesn't support redis by default. I already have redis in place, so I don't want to use the default adapter (memcache). There's a package, django-redis, that adds redis support to django cache.
- django-extensions has a nifty SHELL_PLUS_PRE_IMPORTS = [("decimal", "Decimal")] setting that will auto-import additional packages for you. It's annoying to have to import various objects just to poke around in the REPL, and this setting eliminates that friction.

Use decimal objects for floats when decoding JSON

In my case, I needed to use Decimals instead of floats everywhere to avoid floating point arithmetic inaccuracies. Even $0.01 difference could cause issues when submitting orders to the crypto exchange.
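To make the failure mode concrete, here's a quick illustration (my example, not the bot's code) of why binary floats are risky for money math while Decimal is exact:

from decimal import Decimal

# binary floats accumulate representation error
print(0.1 + 0.2)                # 0.30000000000000004
print(0.1 + 0.2 == 0.3)         # False

# Decimal performs exact decimal arithmetic, which is what you want for prices and order sizes
print(Decimal("0.1") + Decimal("0.2"))                    # 0.3
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True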

This is really easy when parsing JSON directly:

requests.get(endpoint).json(parse_float=decimal.Decimal)

If you are using a JSONField to store float values, it gets more complicated. You can’t just pass parse_float to the JSONField constructor. A custom decoder must be created:

class CustomJSONDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        from decimal import Decimal

        kwargs["parse_float"] = Decimal
        super().__init__(*args, **kwargs)

class YourModel(models.Model):
    the_field = models.JSONField(default=dict, decoder=CustomJSONDecoder)

Multiple django environments

There is not a standard way of managing different environments (staging, development, test, prod) in django. I found this very confusing and wasted time attempting to figure out what the best practice was here.

Here are some tips & recommendations:

- Django doesn't come with the ability to parse database URLs. There's an extension, dj_database_url, for this.
- Poetry has a built-in dev category, which can be used for packages only required for development and testing. There are no separate test or development groups.
- python-dotenv seems like the best package for loading a .env file into os.environ. However, if you are building an application with multiple entrypoints (i.e. web, cli, repl, worker, etc) this gets tricky, as you need to ensure load_dotenv() is called before any code which looks at os.environ.
- After attempting to get python-dotenv working for me, I gave decouple a shot. It's much better: you use its config function to extract variables from the environment, and that function ensures .env is loaded before looking at your local os.environ. Use this package instead.
- By default, Django does not set up your settings.py to pull from the environment. You need to do this manually. I included some snippets below.
- After getting decouple in place, you'll probably want separate configurations for different environments. The best way to do this is to set DJANGO_SETTINGS_MODULE to point to a completely separate configuration file for each environment:
  - In your toml you can set [tool.pytest.ini_options] DJANGO_SETTINGS_MODULE = "app.settings.test" to force a different environment for testing.
  - In production, you'll set DJANGO_SETTINGS_MODULE to app.settings.production in the docker or heroku environment.
  - For all other environments, you'll set DJANGO_SETTINGS_MODULE to app.settings.development in your manage.py.
  - In each of these files (app/settings/development.py, app/settings/test.py, etc) you'll from .application import * and store all common configuration in app/settings/application.py. Here's a working example.
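Here's a minimal sketch of what one of those per-environment files might look like under the layout described above (the file contents are illustrative assumptions, not copied from the original project):

# app/settings/development.py
import dj_database_url
from decouple import config

from .application import *  # shared configuration lives in app/settings/application.py

DEBUG = True

# per-environment values are pulled from the environment (and .env) via decouple
DATABASES = {"default": dj_database_url.parse(config("DATABASE_URL"))}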

Here’s how to configure django cache and celery to work with redis:

CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": config("REDIS_URL"),
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}
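The snippet above covers the django cache half. For the Celery half, assuming Celery is configured to read django settings with a CELERY_ namespace via app.config_from_object("django.conf:settings", namespace="CELERY") (the standard integration pattern; the original post doesn't show its exact celery config), pointing the broker at the same redis instance is roughly:

# read by celery when app.config_from_object("django.conf:settings", namespace="CELERY") is used
CELERY_BROKER_URL = config("REDIS_URL")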

Here’s how to use dj_database_url with decouple:

DATABASES = {"default": dj_database_url.parse(config("DATABASE_URL"))}

Job management using Celery

Django does not come with a job queue. Celery is the most popular python job queue library out there; it requires a message broker (I used redis, which I already had running). It looks like it will require a decent amount of config, but I chose to use it anyway to understand how it compared to Sidekiq/Resque/ActiveJob/Oban/etc.
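For orientation before the notes below, here's a minimal sketch of the standard Celery-with-Django wiring from the Celery docs' recommended layout; this is not necessarily how the original project was structured (the botweb project name and users app are borrowed from elsewhere in this post, and run_purchases_for_user is a hypothetical task):

# botweb/celery.py
import os

from celery import Celery

# celery needs django settings configured before it touches the ORM
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "botweb.settings.development")

app = Celery("botweb")
# read any CELERY_* settings from django settings
app.config_from_object("django.conf:settings", namespace="CELERY")
# pick up tasks.py modules inside installed apps (e.g. users/tasks.py)
app.autodiscover_tasks()


# users/tasks.py
from celery import shared_task

@shared_task
def run_purchases_for_user(user_id):
    # look up the user and kick off the purchase routine here
    ...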

- poetry add celery --allow-prereleases (I needed a prerelease to work with the version of click I was using)
- If you are using redis as the broker (easier for me, since I already had it installed + running) you'll need to poetry add redis.
- Celery does not use manage.py, so it would not load the .env file. I needed to manually run load_dotenv() at the top of my celery config. I discovered that this needed to be conditionally loaded for prod, at which point I discovered that decouple is a much better package for managing configuration.
- I put my celery tasks within the users application as tasks.py. You can specify a dot-path to the celery config via the CLI: celery -A users.tasks worker --loglevel=INFO
- You can configure celery to store results. If you do, you are responsible for clearing out results; they do not expire automatically.
- Celery has a built-in cron scheduler. Very nice! There's even a nice -B option for running the scheduler within a single worker process (not recommended for prod, but nice for development).
- When I tried to access django models, I got some weird errors. There's a django-specific setup process you need to run through: DJANGO_SETTINGS_MODULE needs to be set, just like in manage.py, and you can't import django-specific modules at the top of the celery config file.
- Celery is threaded by default. If your code is not thread safe, you'll need to set --concurrency=1.
- By default, tasks do not run inline. If you want to setup an integration test for your tasks, you need to either (a) run tasks in eager mode (not recommended) or (b) setup a worker thread to run tasks for you during your tests. Eager mode is not recommended for testing, since it doesn't simulate the production environment as closely. However, running a worker thread introduces another set of issues (like database cleanup not working properly).
- There's no real downside to using @shared_task instead of @app.task. It's easier to do this from the start: less refactoring to do when your application grows.

Testing

Some more learnings about working with pytest & vcr in combination with django:

- Database cleaning is done automatically for you via @pytest.mark.django_db at the top of your test class. This is great: no need to pull in a separate database cleaner.
- To be able to run pytest which relies on django models/configuration, you need the pytest-django extension.
- You can stick any config that would be in pytest.ini in your toml file under [tool.pytest.ini_options].
- You need to setup a separate config for your database to ensure it doesn't use the same one as your development environment. The easiest way to do this is to add DJANGO_SETTINGS_MODULE = "yourapp.settings.test" to your toml file and then override the database setup in the yourapp/settings/test.py file.
- You can use pytest fixtures to implement ruby-style around functions.
- Redis/django cache is not cleared automatically between test runs. You can do this manually via from django.core.cache import cache; cache.clear()
- In a scenario where you memoize/cache a global function that isn't tied to a class, you may need to clear the cache to avoid global state causing indeterminate test results. You can do this for a single method via clear_cache() or identify all functions with an lru cache and clear them.
- Django has a test runner (python manage.py test). It seems very different (it doesn't support fixtures), and I ran into strange compatibility issues when using it. Use pytest instead.

My thoughts on Django

I continue to be impressed with the python ecosystem. The dev tooling (linting, repls, type checking, formatting, etc) is robust, there are reasonably well-written and maintained packages for everything I needed. It seems as though most packages are better maintained than the ruby equivalents. I only once had to dive into a package and hack a change I needed into the package. That’s pretty impressive, especially since the complexity of this application grew a lot more than I expected.

Working with python is just fun and fast (two things that are very important for me!). A similar level of fun to ruby, but the language is better designed and therefore easy to read. You can tell the ecosystem has more throughput: more developers are using various packages, and therefore more configuration options and bugs worked out. This increases dev velocity which matters a ton for a small side project and even more for a small startup. I don’t see a reason why I’d use ruby if I’m not building a rails-style web application.

Rails is ruby’s killer app. It’s better than Django across a couple of dimensions:

- Better defaults. Multiple environments supported out of the box.
- Expansive batteries-included components: job queuing, asset management, web workers, incoming/outgoing email processing, etc. This is the biggest gap in my mind: it takes a lot more effort & decisions to get all of these components working. Since django takes a 'bring your own application components' approach, you don't get the benefit of large companies like Shopify, GitHub, etc using these and working out all of the bugs for you.

The Django way seems to be a very slim feature set that can be easily augmented by additional packages. Generally, I like the unix-style single responsibility tooling, but in my experience, the integration + maintenance cost of adding 10s of packages is very high. I want my web framework to do a lot for me. Yes, I’m biased, since I’m used to rails but I do think this approach is just better for rapid application development.

This was a super fun project. Definitely learned to love python and appreciate the Django ecosystem.

What I’m missing

There were some things I missed from other languages, although the list is pretty short and nitpicky:

- Source code references within docs. I love this about the ruby/elixir documentation: as you are looking at the docs for a method, you can reveal the source code for that method. It was painful to (a) jump into an ipython session, (b) import the module, and (c) ?? module.reference to view the source code.
- Package documentation in Dash.
- More & better defaults in the django setup.
- Improved stdlib map-reduce. If you can't fit your data transformation into a comprehension, it's painful to write and read. You end up writing for loops and appending to arrays.
- Code references formatted in the path/to/file.py:line:col format for easy click-to-open support in various editors. This drove me nuts when debugging stack traces.
- Improved TypedDict support. It seems this is a relatively new feature, and it shows. They are frustrating to work with.

Open Questions

I hope to find an excuse to dig a bit more into the python ecosystem, specifically to learn the ML side of things. Here are some questions I still had at the end of the project:

- Does numpy/pandas eliminate data manipulation pain? My biggest gripe with python is the lack of chained data manipulation operators like ruby/elixir.
- How does the ML/AI/data science stuff work? This was one of my primary motivations for brushing up on my python skills and I'd love to deeply explore this.
- How does async/await work in python?
- How does asset management / frontend work in django?

Debugging asdf plugin issues

Although unrelated to this post, I had to debug some issues with an asdf plugin. Here’s how to do this:

- Clone the asdf plugin repo locally: git clone https://github.com/asdf-community/asdf-poetry ~/Projects/asdf-poetry
- Remove the existing version of the plugin: cd ~/.asdf/plugins && rm -rf poetry
- Symlink the repo you cloned: ln -s ~/Projects/asdf-poetry poetry

Now all commands hitting the poetry plugin will use your custom local copy.

Continue Reading

Building a Crypto Index Bot and Learning Python

A long time ago, I was contracted to build a MacOS application using PyObjc. It was a neat little app that controlled the background music at high-end bars around London. That was the last time I used python (early 2.0 days if I remember properly). Since then, python has become the language of choice for ML/AI/data science and has grown to be the 2nd most popular language.

I’ve been wanting to brush up on my python knowledge and explore the language and community. Building a bot to buy a cryptocurrency index was the perfect learning project, especially since there was a bunch of existing code on GitHub doing similar things.

You can view the final crypto index bot project here. The notes from this learning project are below. These are mainly written for me to map my knowledge in other languages to python. Hopefully, it’s also helpful for others looking to get started quickly in the language!

Tooling & Package Management

I work primarily in ruby (and still enjoy the language after years of writing professionally in it). Some of the comparisons below are to the equivalent tooling in ruby-land.

pip == bundle Package versions are specified in a requirements.txt file if you are using pip. https://rubygems.org/ = https://pypi.org/ There’s not really a rake equivalent that’s been adopted by the community. Poetry is an alternative to pip that seems to be the most popular choice for new projects. virtualenv = rbenv, but just for packages, not for the core python version, and is specific to each project. Poetry will autogen a virtualenv for you. There are dev and non-dev categories in poetry, but not a test category by default. Here’s how to add a dev dependency poetry add -D pytest If you are using the VS Code terminal, certain extensions will automatically source your virtualenv. I found this annoying and disabled this extension (can’t remember which extension was causing me issues). pyproject.toml alternative to requirements.txt, but also includes gemspec-like metadata about the package. It looks like poetry update consumes the .toml file and generates a poetry.lock. I’m guessing that other build tools also consume the .toml config and it’s not used just for poetry. The python community seems to be into toml configuration. This is used for poetry package specifications and project-specific variables. I don’t get it: it’s slightly nicer looking than JSON, but you can’t specify arrays or nested hash/dictionaries. Why not just use yaml instead? Or just keep it simple and use JSON? I ran into this issue where poetry was using the global ~/Library/Caches/pypoetry cache directory and I thought this was causing some package installation issues. I don’t think that ended up being the isweu poetry debug poetry config -vvv to see what configuration files are being loaded poetry config --list indicated that a global cache directory was being used. Tried upgrading pip, didn’t work: python3 -m pip install --upgrade pip I can’t remember how I fixed the issue, but these commands were helpful in understanding where poetry throws various code. If you want to hack on a package locally and use it in your project: vcrpy = { path = "/full/path/to/project", develop = true } in your toml file Note that you cannot use ~ in the path definition After adding this to your pyproject.toml run poetry lock && poetry install This will be easier in poetry 1.2 Want to make sure your project is pulling from your locally defined project? You can inspect the path that a module was pulled from via packagename.__file__ i.e. import vcr; print(vcr.__file__) I had trouble with a corrupted poetry env, I had to run poetry env use python to pick up my local package definition Working on a project not using poetry? Create a venv python -m venv venv && source ./venv/bin/activate If there’s a setup.py then run python setup.py install However, you can’t install ‘extra’ dependencies (like development/testing) via setup.py. It looks like pip install -e '.[dev]' It sounds like setup.py and requirements.txt do not define dev dependencies. You’ll probably need to install these manually. Look at the CI definition in the project to determine what dev dependencies need to be installed. There’s a .spec file that seems to be used with pyinstaller, a python package, when packaging a python application for distribution. Pyinstaller is primarily aimed at distributing packages for execution locally on someone’s computer. This use-case is one of the areas where python shines: there’s decent tooling for building a multi-platform desktop application. You’ll see readme-like documents written in rst (restructure text format) instead of md. 
I have no idea why markdown just isn’t used. A ‘wheel’ is an architecture-specific package bundle that contained compiled binaries. This is helpful if a python package contains non-python code that needs to be compiled since it eliminates the compile step and reduces the change of any library compatibility issues (this is a major problem in PHP-land). black looks like the most popular python code formatter. Language Multiline strings (""") at the beginning of a class or function definition isn’t just a python idiom. They are ‘docstrings’ and get automatically pulled into the autogen’d python documentation. Similar to ruby, camelCase is used for class names, snake_case is used for function/variable names. Calling a function requires parens, unlike ruby or elixir. Like javascript, return values need to explicitly be defined by return val. Conditionals do not return values, which means you need to assign variables inside the block (unlike the ability to assign a variable to the return value of a block in ruby, a feature that I love). Each folder in a python project is transformed into a package that can you import. the __init__ file in the folder is automatically imported when you import the folder name. Imports have to be explicitly defined, like javascript, to use any functions outside the set of global/built-in functions. Speaking of built-in functions, python provides a pretty random group of global functions available to you without any imports. I found this confusing: round() is a built-in but ceil() is not. When you import with a . it looks at the local directory for matching imports first. Import everything in package with from math import *. This is not good practice, but helpful for debugging/hacking. Although you can import individual functions from a package, this is not good practice. Import modules or classes, not individual functions. You have to from package.path import ClassName to pull a classname from a module. You can’t import package.path.ClassName None is nil and capitalization matters True and False are the bool values; capitalization matters. Hashes are called dicts in python Arrays are called lists in python You can check the existence of an element in a list with element in list. Super handy! Triple-quoted strings are like heredocs in other languages. They can be used for long comments or multi-line strings. Substring extraction ranges are specified by the_string[0:-1]. If you omit a starting range, 0 is used: the_string[:-1]. The traditional boolean operators && and || aren’t used. Natural language and and or is what you use instead. Keyword arguments are grouped together using **kwargs in the method definition. You can splat a dict into keyword arguments using function_call(**dict) All arguments are keyword arguments in python. More info. You can lazy-evaluate a comprehension using () instead of [] When playing with comprehensions inside of a ipython session variable scoping will not act the same as if you weren’t executing within a breakpoint(). I don’t understand the reasons for this, but beware! In addition to list comprehensions, there are dictionary comprehensions. Use {...} for these. When logic gets complex for a list comprehension, you’ll need to use a for loop instead (even if you want to do basic log debugging within a comprehension). I miss ruby’s multi-line blocks and chained maps. List comprehensions are neat, but there doesn’t seem to be a way to do complex data transformations cleanly. 
I hate having to define an array, append to it, and then return it. The filter/map/etc functions can’t be easily chained like ruby or javascript. I wonder what I’m missing here? I’ve heard of pandas/numpy, maybe this is what those libraries solve? There are strange gaps in the stdlib, especially around manipulating data structures. For instance, there’s no dead-simple way to flatten an array-of-arrays. import operator; from functools import reduce; reduce(operator.concat, array_of_arrays) Similarly, there’s no easy way to get unique values from a list. Get all of the string values of an enum [choice.value for choice in MarketIndexStrategy] By subclassing str and enum.Enum (ex: class MarketIndexStrategy(str, enum.Enum):) you can use == to compare strings to enums. There’s no ? tertiary operator, instead you can do a one-liner if-else: assignment = result if condition else alternative To enable string interpolation that references variable names you need to use f"string {variable}". Otherwise you’ll need to run format on the string to get it interpolated: "string {}".format(variable) Python has built-in tuples (1, 2, 3). I’ve always found it annoying when languages just have arrays and don’t support tuples. Unlike ruby, not all code has a return value. You have to explicitly return from a function and you can’t assign the result of a code block to a variable. There’s some really neat python packages: natural language processing, pandas, numpy. Python has gained a lot of traction in the deep learning/AI space because of the high-quality packages available. is is NOT the same as ==. is tests if the variable references the same object, not if the objects are equal in value You can’t do an inline try/catch. Many bad patterns that ruby and other languages really shouldn’t let you do are blocked. In a lot of ways, python is a simpler language that forces you to be more explicit and write simpler code. I like this aspect of the language a lot. Sets are denoted with {}, which is also used for dictionaries/hashes. Here’s how decorators work: The @decorator on top of a method is like an elixir macro or ruby metaprogramming. It transforms the method beneath the decorator. The @ syntax ("pie" operator) calls the decorator function, passing the function below the decorator as an argument to the decorator function, and reassigning the passed function to the transformed function definition. The decorator function must return a function. There is no special syntax to designate a function as a ‘decorator function’. As long it accepts a function as an argument and returns a function, it can be used as a decorator. Referencing an unspecified key in a dict raises an exception. You need to specify a default: h.get(key, None) to safely grab a value from a dict. An empty array will evaluate to false. You don’t need to if len(l) == 0:. Instead you can if !l:. Same goes with empty dicts and sets. Lambdas can only be single-line. This is a bummer, and forces you to write code in a different style. := allows you to assign and test a value within a conditional. Interesting that there’s a completely separate syntax for ‘assign & test’. __init__.py in a folder defines what happens when you import a folder reference. Here’s how classes work: class newClass(superClass): for defining a new class __init__ is the magic initialization method self.i_var within __init__ defines a new instance variable for a class. This is a good breakdown of instance and class variables. 
  - You can execute code within a class outside of a method definition, for class-level variables and logic.
  - New instances of a class are created via newClass().
  - Instance methods of a class are always passed self as the first argument.
  - Class variables are available on the instance as well, which is a bit strange. You can use class variables as default values for instance variables. This doesn’t seem like a great idea.
  - newClass.__dict__ will give you a breakdown of everything on the class. Kind of like prototype in javascript.
  - Python has multiple inheritance: class newClass(superClass1, superClass2). Inherited classes are searched left-to-right.
  - There are no private variables built into the language, but the convention for indicating a variable is private is a leading underscore, like self._private = value.
- There’s a javascript-like async/await pattern (coroutines). I didn’t dig into it, but it seems very similar to Javascript’s pattern.
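As a quick reference for the comprehension, slicing, and keyword-argument syntax mentioned above, here’s a minimal sketch (all names are made up for illustration):

```python
numbers = [1, 2, 3, 4, 5]

# list comprehension: eagerly builds a new list
squares = [n * n for n in numbers]                # [1, 4, 9, 16, 25]

# () instead of [] makes it a lazily-evaluated generator expression
lazy_squares = (n * n for n in numbers)           # nothing runs until you iterate

# dict comprehension: {} with a key: value expression
squares_by_number = {n: n * n for n in numbers}   # {1: 1, 2: 4, ...}

# membership check and slicing
3 in numbers                                      # True
word = "python"
word[0:-1]                                        # 'pytho'
word[:-1]                                         # 'pytho', omitted start defaults to 0

# keyword arguments grouped with **kwargs, and splatting a dict into them
def greet(name, greeting="hello", **kwargs):
    return f"{greeting} {name} {kwargs}"

greet("world", **{"greeting": "hi", "excited": True})  # "hi world {'excited': True}"
```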
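Here’s a minimal sketch of the decorator mechanics described above; log_calls and add are hypothetical names used only for illustration:

```python
import functools

# a decorator is just a function that accepts a function and returns a function
def log_calls(func):
    @functools.wraps(func)  # preserves the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

# the @ ("pie") syntax below is equivalent to: add = log_calls(add)
@log_calls
def add(a, b):
    return a + b

add(1, 2)  # prints "calling add", returns 3
```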
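And a sketch of the class mechanics covered above (the class names here are hypothetical):

```python
class Base:
    # code at the class level runs at definition time; this is a class variable
    default_rate = 0.05

    def __init__(self, name):
        # assigning to self creates instance variables
        self.name = name
        self._cache = None  # leading underscore: "private" by convention only

    def describe(self):
        # instance methods receive the instance as the first argument,
        # and class variables are readable through the instance
        return f"{self.name} ({self.default_rate})"


class LoudMixin:
    def shout(self):
        return self.describe().upper()


# multiple inheritance; method lookup searches left-to-right
class Account(LoudMixin, Base):
    pass


account = Account("savings")  # new instances are created by calling the class
print(account.shout())        # "SAVINGS (0.05)"
print(Base.__dict__)          # everything defined directly on the Base class
```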
Debugging & Hacking

One of the important aspects of a language for me is the REPL and tinkering/hacking environment. If I can’t open up a REPL and interactively write/debug code, I’m a much slower developer. Thus far, ruby has the best interactive development environment that I’ve encountered:

- binding.pry, and binding.pry_remote when your console isn’t running your code directly, to open a repl
- Automatic breakpoints on unhandled exceptions, in tests or when running the application locally
- Display code context in the terminal when a breakpoint is hit
- Print and inspect local variables within a breakpoint
- Navigate up and down the callstack and inspect variables and state within each frame
- Overwrite/monkeypatch existing runtime code and rerun it with the new implementation within a repl
- Define new functions within the repl
- Inspect function implementations within the repl

I’d say that python is the first language that matches ruby’s debugging/hacking environment that I’ve used. It’s great, and better than ruby in many ways.

- inspect is a very helpful stdlib package for poking at an object in a repl and figuring out the methods, variables, etc available on it.
- traceback provides some great tools for inspecting the current stack.
- How do you drop an interactive console at any point in your code? There are a couple of ways:
  - import ipdb; ipdb.set_trace() uses the ipython-enhanced repl in combination with the built-in debugger. Requires you to install a separate package.
  - There’s a breakpoint() builtin that launches the standard pdb debugger. You can configure breakpoint() to use ipdb via export PYTHONBREAKPOINT=ipdb.set_trace. All of the standard pdb functions work with ipdb.
  - import code; code.interact(local=dict(globals(), **locals())) can be used without any additional packages installed.
- bpython is a great improvement on the default python repl. You need to install this within your venv, otherwise the packages within your project’s venv won’t be available to it: pip install bpython && asdf reshim
- ipython is a bpython alternative that looks to be better maintained and integrates directly with ipdb.
- python -m ipdb script.py to automatically open up ipdb when an exception is raised when running script.py
- Some misc ipython tips and tricks:
  - If something is throwing an exception and you want to debug it: from ipdb import launch_ipdb_on_exception; with launch_ipdb_on_exception(): thing_causing_exception()
  - who / whos in whereami
  - %psource or source, like show-source
  - pp to pretty print an object
  - ipython --pdb script.py to break on unhandled exceptions
  - Great grab bag of interesting tips
  - %quickref for detailed help
  - exit gets you out of the repl entirely
- All of the pypi information is pulled from a PKG-INFO file in the root of a package.
- rich-powered tracebacks are neat, especially with locals=True.
- The ruby-like metaprogramming/monkeypatching stuff happens via the __*__ functions, which are mostly contained within the base object definitions. For instance, logging.__getattribute__('WARN') is equivalent to logging.WARN.
- You can reload code in a REPL via from importlib import reload; reload(module_name). Super helpful for hacking on a module (definitely not as nice as Elixir’s recompile).
- Monkeypatching in python isn’t as clean as ruby, which in some ways is better since monkeypatching is really an antipattern and shouldn’t be used often. Making it harder and uglier helps dissuade folks from using it. To monkeypatch, you reassign the function/method to another method: ClassName.method_name = new_method. Here’s an example, and there’s a rough sketch after this list.
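As a rough illustration of the reassignment approach described above (a sketch with made-up class names, not the linked example):

```python
class Greeter:
    def greet(self):
        return "hello"

def noisy_greet(self):
    # the replacement needs to accept self, since it becomes an instance method
    return "HELLO!!!"

Greeter.greet = noisy_greet   # patches every instance, past and future
print(Greeter().greet())      # "HELLO!!!"

# patching a single instance requires binding the function to that instance
import types
calm = Greeter()
calm.greet = types.MethodType(lambda self: "hi", calm)
print(calm.greet())           # "hi"
```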
Typing

I’ve become a huge fan of gradual types in dynamic languages. I never use them right away, but once the code hardens and I’m relatively sure I won’t need to iterate on the code design, I add some types in to improve self-documentation and make it safer to refactor in the future.

Python has a great gradual type system built-in. Way better than Ruby’s.
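Here’s a minimal sketch of what the annotations look like in practice, assuming you’re checking with mypy (the names below are hypothetical, aside from the MarketIndexStrategy enum mentioned earlier):

```python
import enum
import typing as t

# a verbose type assigned to a variable, then reused in annotations
PriceMap = t.Dict[str, t.Union[int, float]]

class MarketIndexStrategy(str, enum.Enum):
    SP500 = "sp500"
    TOTAL_MARKET = "total_market"

# argument types follow each parameter; the return type comes after ->
def average_price(
    prices: PriceMap,
    strategy: MarketIndexStrategy = MarketIndexStrategy.SP500,
) -> float:
    return sum(prices.values()) / len(prices)

# mypy checks annotated code; unannotated code is left alone
average_price({"VTI": 220.5, "VOO": 401})
```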

- mypy . on the command line to test all python files within a folder.
- If your project fails to pass mypy, it won’t cause any runtime errors by default.
- There’s a VS Code extension. This extension is included in Pylance, which you should probably be using instead, but you need to set the typing mode to ‘basic’.
- Return value types are set with -> before the : at the end of the method definition. Otherwise, typing works very similarly to other languages with gradual typing (TypeScript, Ruby, etc).
- A common pattern is importing the typing module via import typing as t.
- t.Union[str, float] for union/any types.
- You can’t merge dictionaries if you are using a TypedDict (dict | dict_to_merge). Massive PITA when mutating API data.
- Verbose types can be assigned to a variable, and that variable can be used in type definitions. Handy way to make your code a bit cleaner.
- Enums defined with enum.Enum can be types.

Testing

- Like Elixir, there are doctests that execute python within docstrings to ensure they work. Neat!
- There are built-in test libraries that look comparable to ruby’s testunit.
- pytest is similar to minitest: it provides easy plugins, some better standard functionality, and builds on top of unittest. You probably want to use pytest for your testing framework.
- setup.cfg is parsed by pytest automatically and can change how tests work.
- conftest.py is another magic file autoloaded by pytest which sets up hooks for various plugins. You can put this in the root of your project, or in test/.
- Test files must follow the naming convention test_*.py or *_test.py. If you don’t follow this convention, they won’t be picked up by pytest by default.
- breakpoint()s won’t work by default; you need to pass the -s param to pytest.
- Like ruby, there are some great plugins for recording and replaying HTTP requests. Check out pytest-recording and vcrpy. To record HTTP requests, run pytest --record-mode=once. If you want to be able to inspect & modify the API responses that are saved, use the VCR configuration option "decode_compressed_response": True.
- There’s a mocking library in the stdlib, and it’s comprehensive. I’m not sure why other languages don’t do this; everyone needs a mocking library. It looks like you set expectations on a mock after it runs, not before.
- Here’s how mocking works (there’s a sketch after this list): the @patch decorator is a clean way to manage mocking if you don’t have too many methods or objects to mock in a single test. If you add multiple patch decorators to a method, the mocks for those methods are passed in as additional arguments. The last patch applied is the first argument. mock.call_count, mock.mock_calls, and mock.mock_calls[0].kwargs are the main methods you’ll want for assertions.
- assert without parens is used in tests. This confused me, until I looked it up in the stdlib docs and realized assert is a language construct, not a method.
- tox is much more complex than pytest. It’s not a replacement for pytest, but seems to run on top of it, adding a bunch of functionality like running against multiple environments and installing additional packages. It feels confusing, almost like GitHub Actions running locally. If you want to run just a single test file, you need to specify an environment identifier and the test file: tox -epy38-requests -- -x tests/unit/test_persist.py
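Here’s a minimal sketch of the @patch flow described above, using a made-up send_report function so the example is self-contained:

```python
from unittest import mock
import smtplib

def send_report(recipient):
    # hypothetical function under test: emails a report, then logs it
    server = smtplib.SMTP("localhost")
    server.sendmail("me@example.com", recipient, "report body")
    print(f"sent report to {recipient}")

# patches are applied bottom-up: the decorator closest to the function
# (smtplib.SMTP here) becomes the first mock argument
@mock.patch("builtins.print")
@mock.patch("smtplib.SMTP")
def test_send_report(mock_smtp_class, mock_print):
    send_report("you@example.com")

    assert mock_smtp_class.call_count == 1
    # inspect the recorded calls after the fact
    instance = mock_smtp_class.return_value
    assert instance.sendmail.mock_calls[0].args == ("me@example.com", "you@example.com", "report body")
    mock_print.assert_called_once_with("sent report to you@example.com")

test_send_report()  # pytest would normally discover and run this for you
```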
My thoughts on Python

Overall, I’m impressed with how python has improved over the years. Here are some things I enjoyed:

- Gradual typing included in the core language
- Comprehensions are natural to write
- Syntax simplicity: there are not too many ways to do things, which makes code more straightforward to read
- Mature, well-designed libraries
- Virtual environments out of the box
- Robust, well-maintained developer tooling (ipdb, ipython, etc) with an advanced REPL
- Great built-in testing libraries
- Lots of example code to grep through for usage examples
- Explicit imports and local-by-default logic (unlike ruby, where it’s much easier to modify global state)
- Easy to understand runtime environment (in comparison to JavaScript & Elixir/BEAM)

The big question is if Django is a good alternative to Rails. I love Rails: it’s expansive, well-maintained, thoughtfully designed, and constantly improving. It provides a massive increase in development velocity and I haven’t found a framework that’s as complete as Rails. If Django is close to Rails, I don’t see a strong argument for choosing ruby over python for a web product.

Open Questions

Some questions I didn’t have time to answer. If I end up working on this project further, this is a list of questions I’d love to answer:

- How good is django? Does it compare to Rails, or is it less batteries-included and more similar to phoenix/JS in that sense?
- Does numpy/pandas solve the data manipulation issue? My biggest gripe with python is the lack of chained data manipulation operators like ruby.
- How does the ML/AI/data science stuff work? This was one of my primary motivations for brushing up on my python skills and I’d love to explore this deeply.
- How does async/await work in python?

Learning Resources

General guides:

- https://python-patterns.guide/python/module-globals/
- https://book.pythontips.com/en/latest/ternary_operators.html
- https://realpython.com/python-lambda/#anonymous-functions
- https://google.github.io/styleguide/pyguide.html

Monkeypatching:

- https://sharmapacific.in/monkey-patching-in-python/
- https://github.com/ytdl-org/youtube-dl/commit/00fcc17aeeab11ce694699bf183d33a3af75aab6
- https://filippo.io/instance-monkey-patching-in-python/
- https://tryolabs.com/blog/2013/07/05/run-time-method-patching-python/

Open Source Example Code

There are some great, large open source python projects to learn from:

- https://github.com/getsentry/sentry
- https://github.com/arachnys/cabot – open source APM
- https://github.com/vitorfs/bootcamp
- https://github.com/rafalp/Misago

Download these to a folder on your local machine to easily grep through them.
