Learning Swift Development for macOS by Building a Website Blocker

I loved Focus App. It blocked websites and apps on a schedule. But years ago it started glitching out: sucking up tons of RAM and freezing my computer. The bug was never fixed, so I abandoned it and switched to a host-based blocking system, which has served me well.

However, there are some issues with the host-based approach:

- I can’t block specific URLs, only hosts (Focus App couldn’t do this either)
- I can’t set a schedule
- I can’t block apps
- If I remove a host, it will not automatically get blocked unless I sleep and wake the computer
- Sleepwatcher (a CLI tool) is dead and requires some manual setup to get working

My goal is to build on top of the existing host-based system that has been working great and add another layer of focus tooling:

- CLI-first tool
- Allow configuration to be easily set using a JSON file
- Allow different blocking configurations to be scheduled
- Replace sleepwatcher by configuring script execution on wake
- Add a ‘first wake of the day’ trigger that I can tie into clean browsers and todoist scheduler
- Allow both hosts and partial-match URLs to be blocked. ‘Partial match’ means (a) anchors are excluded and (b) the configured block URL only needs to be a subset of the URL in the browser in order to be blocked (see the sketch after this list). This will enable things like blocking news or shopping searches on Google.
- Support blocking URLs in Google Chrome and Safari
- No UI; maybe build a simple REST API that could be tied into my beloved Raycast
- Run the CLI tool as privileged (in order to mutate /etc/hosts)
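
To make the ‘partial match’ rule above concrete, here is a minimal sketch of how it could work. The function name and signature are made up for illustration; this is not the actual hyper-focus implementation:

import Foundation

// Sketch of the partial-match rule: strip the anchor, then the configured
// block string only needs to appear somewhere in the browser URL.
func isBlocked(browserURL: String, blockFragment: String) -> Bool {
    // (a) anchors are excluded
    let withoutAnchor = browserURL.components(separatedBy: "#").first ?? browserURL
    // (b) the configured block URL must be a subset of the URL in the browser
    return withoutAnchor.contains(blockFragment)
}

print(isBlocked(browserURL: "https://www.google.com/search?q=nyt+news#top",
                blockFragment: "google.com/search?q=nyt+news")) // true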

With a clear goal in mind for this learning project, I was able to get started and build this out. Here are the two repos with the resulting code:

- hyper-focus CLI source code
- hyper-focus GUI via Raycast extension

I haven’t touched macOS development in years and hadn’t done any Swift development before. Below are my notes from learning Swift and macOS development.

Swift Language

- The guard statement is explicitly used to return early. It’s like unless in Ruby with some special scoping properties. More info. Specifically, guard is useful for unwrapping an optional and assigning the unwrapped variable to something that can be used in the outer scope (see the sketch after this list).
- There’s a community-built package manager, but it requires that you (a) have a Package.swift and (b) use a specific source code structure, both of which are a pain for a simple utility. I found later on that it’s better to just set up your application using Package.swift, even if it’s small: you’ll end up needing a community package, and using the swift CLI tooling is nice.
- There’s a built-in JSON decoder, but it requires you to describe the incoming JSON payload as a struct. This makes sense since Swift is strictly typed, but it makes fiddling with data structures a PITA.
- There’s no built-in logging library with levels. There’s an open-source package out there, but not having one included with the stdlib is crazy to me. Here’s a < 50 line implementation of a simple stdout log.
- @objc exposes the Swift function/class to the Objective-C side of the world. You don’t have to worry too much about this: the compiler will warn you and enforce that you put these attributes in the right places.
- You can extend existing classes via extension String and add whatever methods you’d like onto them. I’m surprised by this for what seems an otherwise very structured language, but it’s a great compromise.
- One of the guys who works on the Swift language built Rust. I don’t know Rust (it’s on my learning list!) but from what I’ve heard, and the adoption it’s gotten across the new CLI tooling that has been emerging, it’s an amazing language. Probably part of the reason Swift seems so well-designed.
- It doesn’t seem like there are union types in Swift. You have to define an enum and then unwrap the enum using a switch statement. This seems insane to me and makes for very ugly code; I must be missing something here.
- You can nest struct definitions, which is nice.
- You can’t add a trailing comma to arrays or dicts, which drives me nuts. It makes it harder to refactor code and adds mental overhead to editing anything. It’s puzzling to me why more languages don’t allow this (one of the things I love about Ruby).
- You can typecast an object to a specific type with as! SafariWindow. I imagine, since Swift is strongly typed, this has some limitations and compile errors, but I don’t know what they are and didn’t bother to learn.
- You only need an import to pull in a framework, not individual files. All files in the project are automatically compiled. Anything marked with public is available to everything in the project. This seems to indicate otherwise; some more investigation is needed here.
- Argument order matters even when using keyword arguments. Bummer.
- Crash reports are still nearly useless. They have a stack trace, but no line numbers. You need to convert the crash report into a usable stack trace, which requires a symbol-mapping file (dSYM) generated at the same time as the binary that produced the crash report. PLCrashReporter does a lot of this for you, but for a simple single-file Swift script this is a massive pain. There are no stack traces on the command line, even in debug mode.
- ! asserts that the optional is not nil. If it is, your app will crash.
- You can use as? to define a default value if a non-nil value does not exist.
- Method overloads exist, so you can define a method multiple times with different params. I really like this pattern; I wish Swift had method guards like Elixir (one of my favorite things about Elixir).
- You have to explicitly indicate that a func could throw an exception with throws in the method signature. This is interesting, and I think I like it: it makes the design of the function more explicit.
- The empty dictionary literal is [:], and you can type a dictionary holding arbitrary values via varName: [String: Any]. I think Swift dictionaries are the same as an NSDictionary under the hood.
- dispatchMain() is not the same as RunLoop.main.run(), despite what some blog articles say.
- let is like const in JavaScript; var is roughly equivalent to JavaScript’s var.
- Multiple let statements in an if can be separated by a comma. If any of the let statements results in a nil value, the if statement fails. I don’t understand the value of this syntax over &&, and I don’t like this language design choice.
- There are some magic variables. For instance, if you are in a catch block, the error variable represents the exception. If you have a global function named error, it is not accessible; it’s shadowed by the local error variable.
- I didn’t read up on Swift’s memory allocation strategy, but my assumption is that if a var isn’t referenced any longer (i.e. it’s out of scope), it’s removed/garbage collected. The foot gun here: if a class subscribes to a notification (NSWorkspace.shared.notificationCenter.addObserver) but is not assigned to a var that will continue to persist after the caller completes (i.e. a class or global variable), the object will be garbage collected, you’ll never receive that notification, and an error will not be thrown (see the sketch after this list). However, if a function creates a Task which creates its own run loop, that task will continue to run as long as the loop exists, even after the caller that created the Task has completed. I would imagine this is a bad design pattern. This also applies to other systems which receive ‘notifications’ (I use this word very vaguely because I don’t understand macOS subsystems very well/at all). It seems like there are Grand Central Dispatch queues, which feel similar to an SQS queue, and those seem to be impacted as well. Any async pub/sub type interface would be impacted by the subscriber being garbage collected, and you will not receive an error. It puzzles me why errors are not thrown.
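
A minimal sketch pulling a few of the points above together: guard-based unwrapping, the comma-separated if let syntax, a Decodable struct for the built-in JSON decoder, and a switch over an enum with associated values. The ScheduleConfig and BlockRule types are made up for illustration and are not code from hyper-focus:

import Foundation

// The built-in JSON decoder requires describing the payload as a Decodable struct.
struct ScheduleConfig: Decodable {
    let name: String
    let blockedHosts: [String]
    let start: String?
}

// Enums with associated values stand in for union types; you unwrap them with a switch.
enum BlockRule {
    case host(String)
    case urlFragment(String)
}

func describe(_ rule: BlockRule) -> String {
    switch rule {
    case .host(let host):
        return "block host \(host)"
    case .urlFragment(let fragment):
        return "block URLs containing \(fragment)"
    }
}

// guard unwraps an optional and leaves the unwrapped value available in the outer scope.
func loadConfig(from json: String) -> ScheduleConfig? {
    guard let data = json.data(using: .utf8) else { return nil }
    return try? JSONDecoder().decode(ScheduleConfig.self, from: data)
}

// Multiple lets in a single if, separated by commas: if any binding is nil, the whole if fails.
let raw = #"{"name": "work", "blockedHosts": ["news.ycombinator.com"]}"#
if let config = loadConfig(from: raw), let first = config.blockedHosts.first {
    print(describe(.host(first)))
}
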
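And a sketch of the notification foot gun mentioned above: the observer below only keeps firing because it is retained by a long-lived (here, global) variable. If it were a local variable inside a function that returned, the object would be deallocated and the wake notification would silently stop arriving. This is a hypothetical example, not the hyper-focus implementation:

import AppKit

class WakeObserver {
    init() {
        NSWorkspace.shared.notificationCenter.addObserver(
            self,
            selector: #selector(didWake(_:)),
            name: NSWorkspace.didWakeNotification,
            object: nil
        )
    }

    // @objc exposes this method to the Objective-C selector machinery.
    @objc func didWake(_ notification: Notification) {
        print("woke from sleep, running wake scripts")
    }
}

// Keep a strong reference at global scope so the observer is not deallocated.
let observer = WakeObserver()
RunLoop.main.run()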

Hosting a localhost server

This is simple as long as you bind to a local IP: localhost, 127.0.0.1, etc. If you bind to the IP address assigned by your router, you’ll run into all sorts of permissioning issues:

- The default permissioning is different depending on what macOS version you are on. Here’s an example of how to check an application’s default permissioning.
- You cannot change your entitlements/permissions if you are just building a simple binary or CLI app. You need an app with an Info.plist to set the proper security config. This is because of new security stuff that Apple has introduced. This means you need to use Xcode to set up and build your application; I couldn’t find any good examples of an app that is built without using Xcode. The alternative to this is using another layer of indirection, like tuist. This is bringing back memories of all of the stuff I hated about desktop application development.
- Don’t bind to the device IP (i.e. the wifi- or ethernet-assigned address) unless you need to. Bind to localhost so the server is only accessible on the device (see the sketch after this list).
- Swift server package options:
  - https://criollo.io
  - https://github.com/httpswift/swifter
  - https://github.com/Building42/Telegraph
  - https://github.com/envoy/Ambassador
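
As a sketch of the ‘bind to localhost only’ advice, here is roughly what that looks like with Apple’s Network framework. This is an assumption-heavy illustration: it is not one of the packages listed above and not necessarily how hyper-focus serves its API:

import Foundation
import Network

let params = NWParameters.tcp
// Bind to the loopback address so the server is only reachable from this device.
params.requiredLocalEndpoint = NWEndpoint.hostPort(host: "127.0.0.1", port: 8080)

let listener = try! NWListener(using: params)
let queue = DispatchQueue(label: "focus.server")

listener.newConnectionHandler = { connection in
    connection.start(queue: queue)
    connection.send(content: "hello\n".data(using: .utf8),
                    completion: .contentProcessed { _ in connection.cancel() })
}
listener.start(queue: queue)
dispatchMain()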

Packaging

Not using a Package.swift for anything even slightly complex will bring a world of pain:

- The VS Code tooling doesn’t work as well (no error highlights or LSP stuff)
- You can’t use a package manager and therefore can’t easily pull in community packages
- Anything that uses swift build doesn’t work

You’ll want to use a Package.swift in your project. Generating a Package.swift is pretty easy:

swift package init --type executable

When running swift build I ran into:

no such module 'PackageDescription'

This post describes the issue and the following command fixes it for me:

sudo xcode-select --reset

If you run into issues with compilation errors due to some features not being available on older macos versions, you’ll need to add a platform requirement to your Package.swift:

platforms: [ .macOS(.v13) ],

Here’s an example Package.swift for the CLI tool.
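
For reference, a minimal Package.swift along these lines might look like the following. The target name and the swift-argument-parser dependency are illustrative assumptions, not necessarily what the linked project uses:

// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "hyper-focus",
    // Require a newer macOS so recent APIs are available (see the platforms note above).
    platforms: [.macOS(.v13)],
    dependencies: [
        // Illustrative dependency; swap in whatever community packages you need.
        .package(url: "https://github.com/apple/swift-argument-parser", from: "1.2.0")
    ],
    targets: [
        .executableTarget(
            name: "hyper-focus",
            dependencies: [
                .product(name: "ArgumentParser", package: "swift-argument-parser")
            ]
        )
    ]
)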

Cleaning All Cache

I ran into a very weird build error:

❯ swift run
Building for debugging...
Build complete! (0.25s)
dyld[21481]: Symbol not found: (_$s10Foundation11JSONDecoderC6decode_4fromxxm_AA4DataVtKSeRzlFTj)
  Referenced from: '/Users/mike/Projects/focus-app/.build/x86_64-apple-macosx/debug/focus-app'
  Expected in: '/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation'
[1]    21481 abort      swift run

Even after resetting the project to a state where I knew it compiled, it still errored out. After walking away for a while, I found this post and tried updating the minimum macOS version. It magically fixed the issue.

Here’s what I used to clear all build caches:

rm -Rf .build/
rm Package.resolved
rm -Rf ~/Library/Developer/Xcode/DerivedData
rm -Rf /Users/mike/Library/Caches/org.swift.swiftpm

Open Questions

- Is there a way to open a repl with your application’s code imported? It was nice that a compiled language had a decent repl, but ideally I want to open a repl and be able to import/use my application’s code.
- How is the debugger? I just did caveman debugging for this project and didn’t bother understanding the GUI debug tooling.
- It’s unclear how good the package ecosystem is. It seems better than my Cocoa days, but there weren’t that many options and the package activity seems pretty dead.
- It doesn’t seem like you can build a .app without an Xcode project. This is annoying, especially if you are building a small tool and don’t want to learn and understand the Xcode toolchain (it still seems terrible). I wonder if I’m missing something here and if there’s some good tooling to support a CLI-based application build?
- I was surprised at how many errors were not reported. If you’ve subscribed an object as an observer to a notification center and the object is GC’d, that should give you an error. It seems like there were a good number of silent failures, which made it harder to discover unexpected failures, especially for someone who is not a desktop developer. I wonder if there are some env flags that change this behavior.
- I never understood/learned exactly what the @ does in Swift. It looks like a JS/Python decorator, but it’s unclear if all of the annotations are owned by Swift or if developers can write their own.
- Where is the documentation for all of the magic variables? i.e. error in a catch block?

Open Source

- https://github.com/Ranchero-Software/NetNewsWire
- https://github.com/rxhanson/Rectangle (has automated some of the release process)
- https://github.com/exelban/stats
- https://github.com/kean/PulsePro
- https://github.com/piemonte/Player
- https://github.com/cirruslabs/tart
- https://github.com/signalapp/Signal-iOS
- https://github.com/onevcat/Rainbow
- https://github.com/Sequel-Ace/Sequel-Ace
- https://github.com/HedvigInsurance/ugglan
- https://github.com/lvillani/chai
- https://github.com/halo/LinkLiar

Thoughts on Swift

Swift is a really nice language. I like how it is strongly typed, but the typing system is good at inferring types when it can, so you don’t have to specify that many types. The type inference seems very good—better than TypeScript, Sorbet, and python from what I can tell.

I don’t like how there are not any imports, and how anything marked as public can clutter the global namespace. I hate this about Ruby, and it’s something I think Python gets very right. I wish there were explicit imports, and that any package-level functions were forced to be called with their package name. I can understand how this would get very messy with the objc stuff, but that could have been special-cased in some way.

Some of the objc interface stuff is strange, but I think the language designers did a very good job of dealing with it in a simple way.

The tooling isn’t bad, but there are some strange gaps in the stdlib, largely because of the legacy Cocoa infrastructure you can leverage. I found this annoying: there’s not a simple logger, there’s no built-in YAML parser, etc. The Cocoa APIs have a lot of legacy decisions to deal with and they are generally a pain to use. I wish the stdlib were more expansive and designed without thinking about the legacy APIs too much.

The package manager requires you to build your application in a specific way, which is annoying, but if you follow the golden path things work in a pretty clean way. It’s nice that there is an official package manager that Apple is committed to maintaining.

After writing something simple in Swift, I found myself wishing JavaScript was Swift. It feels like JavaScript in many ways, but has fewer foot guns and is simpler. The language designers did a great job, and it felt fun to work in.

Continue Reading

My Experience With GitHub Codespaces

I have an older Intel MacBook (2016, 2.9GHz) that I use for personal projects. My corporate machine is an M1 MacBook Pro and I love it, but I’ve been holding off on replacing my personal machine until the M2 MacBook Pro comes out (hopefully soon!).

I love playing with new technology, especially developer tools, and when I got accepted to the codespaces beta I couldn’t resist tinkering with it: it would speed up my ancient MacBook, let me try some new tech, and give me the ability to learn more ML/AI tooling in the future.

Summary

I largely agree with this analysis.

Codespaces are very cool. They work better than I expected—it felt like I was developing on a local machine. Given how expensive the sticker pricing is, I don’t get why you wouldn’t just buy a more powerful local machine in a corporate setting (codespaces is free for open source work). I can’t see devs being ok with a Chromebook vs MacBook pro, so the cost savings aren’t there (i.e. buy a cheaper machine and put the savings into rented codespace).

You could run a similar dockerized setup locally on the MacBook if you wanted to normalize the dev environment (which is a big benefit, esp in larger orgs). I think this is one of the best benefits of codespaces—completely documented and normalizing your development environment so it’s portable across machines.

Notes

Here are some notes & thoughts on my experience with codespaces:

- A codespace is essentially a docker image running on a VM in the cloud, wired up to your local VS Code installation in a way that makes your experience feel like you aren’t using a remote machine.
- Amazingly, code, gh pr view --web, etc. all work (i.e. they open a local browser) and integrate with macOS. They’ve done a decent job integrating codespaces into the native experience, so you forget you are working on a remote machine. If you are curious, this is done by a magic environment variable: BROWSER=/vscode/bin/linux-x64/e7f30e38c5a4efafeec8ad52861eb772a9ee4dfb/bin/helpers/browser.sh
- Add Development Container Configuration is the command you need to run to autogen the default .devcontainer/ config for your codespace.
- Your dotfiles are magically cloned to /workspaces/.codespaces/.persistedshare/dotfiles
- File system changes are not instantly updated in the file explorer. There is a slight delay, which is frustrating.
- It looks like a reference specification has emerged after the initial beta. Lots of examples/open source code still reference some of the old stuff, so you’ll have to be careful not to cargo-cult everything if you want to build things in the latest style that will be resilient to changes.
- /workspaces/.codespaces/shared/.env has a bunch of tokens and context about the environment.
- You can have multiple windows/editors against multiple folders: clone additional folders to /workspaces and then run code . when cd’d into that folder.
- Terminal state is not restored when a codespace is paused.
- Codespace logs are persisted to /workspaces/.codespaces/.persistedshare/EnvironmentLogbackup.txt. You can also access them via the CLI: gh codespace logs
- Some of the utilities used to communicate with your local installation of VS Code are located in ~/.vscode-remote/bin/[unique sha]/bin/. It’s interesting to poke around and understand how client communication works.
- /workspaces/.codespaces/shared/.env-secrets contains GitHub credentials and other important secrets.
- CODESPACE_VSCODE_FOLDER is not set up in /etc/profile.d. It is injected into the environment via VS Code extension JavaScript; therefore, this variable is not available during postCreateCommand execution.
- If you’ve used remote SSH development, much of the magic that makes that work is used in a codespace. There’s a hidden .vscode folder installed on the remote machine and some binaries which run there to make VS Code work properly.

Load order

I couldn’t find clear documentation on the load order: when does your code get copied to the container, when do all of the VS Code tools start up on the machine, etc. See https://containers.dev/implementors/spec/ for the general devcontainer specification, but it’s not too helpful.

1. Dockerfile. Your application code does not exist, and features are not installed.
2. Features (like brew). Each feature is effectively a bundle of shell scripts that are executed serially. Application code does not exist at this point.
3. Post install. The Dockerfile is built, features are installed, application code exists, dotfiles are not installed.
4. Dotfiles. At this step (and all previous steps), code (the VS Code CLI) does not exist and has not yet been installed.

Sometime after this, the code binary is installed and some of the daemon-like processes that run on the remote machine are started up. From what I can tell, there’s not a single-run lifecycle hook that you can use at this stage.

ASDF: Version Manager for Everything

I really like asdf conceptually: one version manager to rule them all. Consistent versions and installation methods across machines and languages. Simple and beautiful. I’ve been using it for years on Elixir, Ruby, JavaScript, and Python projects and have had a great experience.

The devcontainer image examples had a completely different runtime for each major language. What if you use multiple languages? What if your environment is more custom?

I thought it would make sense to try to use asdf across all projects, as opposed to language-specific builds.

Some notes:

- If you install asdf via homebrew, it puts the asdf installation files in /home/linuxbrew/.linuxbrew/Cellar/asdf/0.10.2/libexec/asdf.sh. Many tools, including ElixirLS, assume that the full installation exists in ~/.asdf. This caused issues on the codespace: it seems as though the shell script that starts ElixirLS was not using the default shell and did not seem to be sourcing standard environment variables. I’m guessing that, depending on how the extension is built, it does not properly run in the user’s default shell environment.
- I ran into weird issues with pyright: poetry run pyright . returned zero errors, while running pyright . inside of poetry shell triggered a lot of errors relating to missing imports (related issue).
- Erlang uses devcontainers and asdf, which is a good place to look for examples.

Here’s the image I ended up building and it’s been working great across a couple of projects.

Docker Compose & Docker-in-Docker

Using docker compose (to run postgres, redis, etc) is super helpful but is not straightforward. Here’s how I got it working:

- You can specify a docker-compose.yml file to be used in your devcontainer.json. This seems like a great idea until you realize that you can’t manage the other services that are started through the compose definition at all. You are "trapped" inside your application container and cannot inspect or manage the other processes at all.
- Most of the documentation + content out there recommends using dockerComposeFile in your devcontainer.json. This is not the best way. The more flexible approach is to install docker inside a single container. This requires a bit more setup, specifically passing additional flags to the parent docker container in order to be able to run docker.

Dotfiles transformation

My dotfiles are very well documented, but they were not ready for codespaces. I needed to do some work to separate the macOS-specific stuff from the cross-platform compatible tools.

- Here’s a great guide on how to get your dotfiles set up.
- Thankfully, brew works on Linux and has a really easy integration within codespaces. This made my life easier since my dotfiles are built around brew.
- Pull out packages that are system-agnostic and stick them in a Brewfile. Here’s mine.
- Create an install script specifically for codespaces. Here’s what mine looks like.

VS Code Extensions

Extensions synced via Settings Sync are not installed automatically. You have to specify which extensions you want installed on the codespace through a separate configuration option, github.codespaces.defaultExtensions.

Homebrew Installation Failure

Homebrew installation can fail due to old packages (or old apt-get state, not sure which) installed on the image. If you use a raw base image for your codespace, you need to ensure you run apt-get update in order for the homebrew install to work properly.

Another alternative is using the dev- variant of many of the base images (here’s an example).

GPG Signing

It looks like the codespace machine calls some sort of GH API to power the GPG signing. If you have a .gitconfig in your dotfiles, it will overwrite the custom settings GitHub creates when generating the codespace machine. You’ll run into errors writing commits in this scenario.

Here’s what you need to do to fix the issue:

git config --global credential.helper /.codespaces/bin/gitcredential_github.sh
git config --global gpg.program /.codespaces/bin/gh-gpgsign

You’ll also want to ensure that GPG signing is enabled for the repository you are working in. If it’s not, you’ll get the following error:

error: gpg failed to sign the data
fatal: failed to write commit object

You can ensure you’ve allowed GPG access by going to your codespace settings and looking at the "GPG Verification" header.

As an aside, this was an interesting post detailing out how to debug git & gpg errors.

Awk, and other tools

The version of awk on some of the base machines seems old or significantly different from the macOS version. It wouldn’t even respond to awk --version. I installed the latest version via homebrew and it fixed an issue I was having with git fuzzy log, where "no commit found on line" would be displayed when viewing the commit history.

I imagine other packages are old or have strange versions installed too. If you run into issues with tooling in your dotfiles that work locally, try updating underlying packages.

Shell Snippets

Here are some useful shell commands to make integrating codespaces with your local dev environment simpler.

# gh cli does not provide an easy way to pull the codespace machine name when inside a repo
targetMachine=$(gh codespace list --repo iloveitaly/$(gh repo view --json name | jq -r ".name") --json name | jq -r '.[0].name')

# copy files from local to remote machine. Note that `$RepositoryName` is a magic variable that is substituted by the gh cli
gh codespace cp -e -c $targetMachine ./local_file 'remote:/workspaces/$RepositoryName/remote_file'

# create a new codespace for the current repo in the pwd
gh alias set cs-create --shell 'gh cs create --repo $(gh repo view --json nameWithOwner | jq -r .nameWithOwner)'

Unsupported CLI Tooling

Here are some gotchas I ran into with my tooling:

- zsh-notify: the macOS popup when a command completes won’t work anymore.
- pbcopy/pbpaste don’t work in the terminal.
- You lose all of your existing shell history. There are some neat tools out there to sync shell history across machines, which might be a way to fix this.

Open Questions

- Is there more control available for codespaces generated by a pull request? Ideally, you could have a script that would run to generate sample data, spin up a web server, etc. and make that web server available to the public internet in some secure way. I think Vercel does this in some way, but it would be neat if this was built into GitHub, tied into VS Code, and allowed for a high level of control.
- I’m still in the process of learning/mastering tmux, and there seemed to be some incompatibilities that I’ll need to work around:
  - cmd+f within the integrated shell doesn’t search through the scroll buffer
  - clipboard integration doesn’t work (the main reason for using tmux is keyboard scroll-buffer search and copy/paste support)
- pbcopy/pbpaste, which I use pretty often, don’t work. A good option is using something like Uniclip, but this will require some additional effort to get working. Other alternatives that might be worth investigating: https://github.com/jedisct1/piknik and https://gist.github.com/dergachev/8259104
- I had trouble with some specific VS Code tasks not working properly. This was due to how some tasks build the shell environment.
- Can you run GitHub Actions locally within the codespace? This would be super cool. Looks like it’s not possible right now, but there’s some open source tooling around this which looks interesting.
- There’s got to be a cleaner way of sharing a consistent ssh key with a codespace for deploys. This post had some notes around this.
- I’m not sure how the timeout works. What if I’m running a long-running test or some other terminal process? Will it be terminated? Is there a way to keep the session alive in some other side process?
- Can you mount the remote drive locally and have it available in the Finder? scping files to view and manipulate locally is going to get tiring fast.

Continue Reading

Book Notes: The Hard Thing About Hard Things

Something new I’m doing this year is book notes. I believe writing down your thoughts helps you develop, harden, and remember them. Books take a lot of time to read, so taking the time to document lessons learned is worth it.

Here are the notes for The Hard Thing About Hard Things by Ben Horowitz. Definitely worth reading, especially if you are actively building a company, although I wouldn’t say it’s in the must-read category.

Below are my notes! Enjoy.

Leadership

A much better idea would have been to give the problem to the people who could not only fix it, but who would also be personally excited and motivated to do so.

I think any good leader feels personally responsible for the outcome of whatever they are doing. Everything is their job, in the sense that ultimately if the project isn’t successful it is their fault.

However, I think Ben’s framing is important: it’s the leader’s job to clearly describe problems (instead of hiding them) no matter how large, and to get the right people aligned to the problem, people who are energized by big scary problems that need to be solved.

The more you communicate without BS—describing reality exactly how it is—the more people will trust what you say. There are no lines to read between. It takes time for this trust to filter its way through an organization, but it makes any other communication (which is a prime job of a leader) way easier in the future.

Former secretary of state Colin Powell says that leadership is the ability to get someone to follow you even if only out of curiosity.

Sometimes only the founder has the courage to ignore the data;

It’s nice to lean on data to make decisions, but all of the great decisions in life need to be made in the absence of data, in the absence of certainty. The safety of the modern world has made us less comfortable with taking risks and being decisive in areas of life where it is impossible to get certainty.

the wrong way to view an executive firing is as an executive failure; the correct way to view an executive firing is as an interview/integration process system failure.

Ben has a lot of counterintuitive thinking about executive management throughout the book. I found the thinking around executive hiring, management, etc the part of the book most worth reading.

He articulates the executive hiring, management, and firing process as incredibly messy, opaque, and constantly changing. I think this is the thing that technical founders struggle with a lot—it’s not straightforward, requires a lot of tacit knowledge that can only be acquired through experience, and requires lots of conflict-laden conversations which everyone hates.

Part of the leader’s job is the ability to step-in and cover any of the executive’s job if they leave or are fired. This helps the leader understand what’s really needed in that role at this stage of the company.

What is needed from an executive changes quickly as a company grows. It’s your job as a leader to understand what is needed right now, communicate that expectation, and then measure their performance off that revised standard. It’s up to the executive to figure out how to retool their skills to meet the new requirements; you don’t have time to help them here. If they can’t figure out the new role you need to let them go fast.

Management techniques that work with non-executives don’t work with executives. You can’t lead professional leaders in the same way. For instance, the "shit sandwich" approach feels babying to a professional when it may work well for a leaf-node individual contributor. What works on a leaf-node team doesn’t work when running a management team.

in my experience, look and feel are the top criteria for most executive searches.

Developing and holding to an independent standard in any area of life is incredibly hard. We are deeply mimetic, and avoiding pattern-matching on what the herd believes is right is one of the hardest tasks of leadership.

Consensus decisions about executives almost always sway the process away from strength and toward lack of weakness.

You want someone who is world-class at the thing you are hiring them for. Make sure your organization can swallow their faults; don’t try to avoid faults, even major ones, completely.

Relatedly, the concept of "madness of crowds" is a good mental model to keep in mind.

This is why you must look beyond the black-box results and into the sausage factory to see how things get made.

Understanding how things work at the ground-level in an organization is key to improving performance. I always thought Stripe’s leadership did a great job here: jumping into engineering teams for a week to understand what the real problems were can’t be replaced by having 100 1:1s.

I describe the CEO job as knowing what to do and getting the company to do what you want.

This is what I liked most about the book—plain descriptions of commonly amorphous concepts.

Company building

as often candidates who do well in interviews turn out to be bad employees.

If someone is good at cracking an interview, it could be a signal that they aren’t good at the core work. If someone is exceptional, they aren’t going to care about interviewing well or understanding the big-company decision-making matrix around hiring: they know they are smart and want to work at a place that values the work.

This is a distinct advantage startups have. I love the interview process at one of my new favorite productivity apps:

We don’t do whiteboard interviews and you’re always allowed to google. We’ll talk about things you’ve previously worked on and do a work trial – you’ll be paid as a contractor for this.

They can focus on the work and ignore the mess of other signals that are only important when you need to ensure quality at scale.

In good organizations, people can focus on their work and have confidence that if they get their work done, good things will happen for both the company and them personally. It is a true pleasure to work in an organization such as this. Every person can wake up knowing that the work they do will be efficient, effective, and make a difference for the organization and themselves. These things make their jobs both motivating and fulfilling.

Simple and true description of what makes a company great, and conversely what makes bureaucratic organizations painful to operate in.

Companies execute well when everybody is on the same page and everybody is constantly improving.

Constant improvement compounds over time.

What do I mean by politics? I mean people advancing their careers or agendas by means other than merit and contribution.

Good definition of politics.

I’d love to understand what companies have designed a performance process for higher management tiers that isn’t political. At larger companies, getting promoted to higher levels becomes more political almost by definition: it’s harder to describe your impact quantitatively because your work is more people-oriented and dependent on your leadership ability.

Perhaps the CEO’s most important operational responsibility is designing and implementing the communication architecture for her company.

I’d love to hear more stories about well-designed communication systems in companies.

Perhaps most important, after you and your people go through the inhuman amount of work that it will take to build a successful company, it will be an epic tragedy if your company culture is such that even you don’t want to work there.

Reminds me of the parenting idea "don’t raise kids that you don’t want to hang out with."

the challenge is to grow but degrade as slowly as possible.

Ben makes the assumption that all companies degrade over time. Things that were easy become difficult when you add more people, mostly because of the communication/coordination overhead and knowledge gaps across the organization.

I want to learn more about what organizations fought against this and when they felt there was an inflection point of degradation. How big can you grow before things degrade quickly?

Management

big company executives tend to be interrupt-driven.

They wait for problems to come to them, and they don’t execute work individually. Be aware of when you’ve reached this stage and then hire for these people. Hiring this type of person too early will most likely fail—if you are used to working in this style, it’s hard to change.

An early lesson I learned in my career was that whenever a large organization attempts to do anything, it always comes down to a single person who can delay the entire project.

Resonates with my experience. It’s amazing how one or two B players can destroy the ability to get anything significant done. The Elon Musk biography talks about how Elon’s employees were terrified of being "the blocker" and would do anything they needed to in order to avoid being that person. He would ask for status updates multiple times a day and force you to do whatever needed to be done to eliminate yourself as the primary blocker.

However, if I’d learned anything it was that conventional wisdom had nothing to do with the truth and the efficient market hypothesis was deceptive. How else could one explain Opsware trading at half of the cash we had in the bank when we had a $20 million a year contract and fifty of the smartest engineers in the world? No, markets weren’t “efficient” at finding the truth; they were just very efficient at converging on a conclusion—often the wrong conclusion.

[managing by the numbers] penalizes managers who sacrifice the future for the short term and rewards those who invest in the future even if that investment cannot be easily measured.

Not everything can be measured. You need to have qualitative and quantitative metrics, and you can’t rely too strongly on quantitative metrics. Building anything great requires great conviction in the absence of evidence supporting the outcome you believe is inevitable.

As Andy Grove points out in his management classic High Output Management, the Peter Principle is unavoidable, because there is no way to know a priori at what level in the hierarchy a manager will be incompetent.

This is the sort of thing that makes management so incredibly hard.

If you become a prosecuting attorney and hold her to the letter of the law on her commitment [to fix a problem that she discovered], you will almost certainly discourage her and everybody else from taking important risks in the future.

No easy answer to this question. You have to hold people accountable but understand the situation enough not to disincentivize critical behavior which improves the company. If you don’t do this right, people notice and will manage their work towards what is indirectly rewarded.

the best ideas, the biggest problems, and the most intense employee life issues make their way to the people who can deal with them. One-on-ones are a time-tested way to do that,

This rings true to me. Although, I think it’s critical to get as much state out of meetings and into central systems as possible, so 1:1s can be mostly focused on the small batch of critically important stuff that cannot be handled async.

Sales

There’s an interesting thread in the story of OpsWare that could yield the lesson "Don’t rely too much on whales". I don’t think anyone would disagree with this advice in the abstract, but I think practically it’s hard to build a big business without whales. I think you want to avoid being too reliant on whales, but I believe you also need to be ok pandering to your largest customers in B2B SaaS and doing what needs to happen to keep them thrilled with you.

There was a really helpful appendix with some great questions and guides for hiring a sales leader. I think these people-oriented jobs can sometimes seem like a black art compared to the hyper-logical work that technical founders start out doing.

Continue Reading

Using GitHub Actions With Python, Django, Pytest, and More

GitHub Actions is a powerful tool. When GitHub was first released, it felt magical. Clean, simple, extensible, and it added so much value that it felt like you should be paying for it. GitHub Actions feels similarly powerful and has positively affected the package ecosystem of many languages.

I finally had a chance to play around with it as part of building a crypto index fund bot. I wanted to set up a robust CI run which included linting, type checking, etc.

Here’s what I learned:

- It’s not possible to test changes to GitHub Actions locally. You can use the GH CLI locally to run them, but GH will use the latest version of the workflow that exists in your repo. The best workflow I found is working on a branch and then squashing the changes.
- You can use GitHub Actions to run arbitrary scripts on a schedule. This may sound obvious, but it can be used in really interesting ways, like updating a repo every day with the results of a script.
- You can set up dependabot to submit automatic package update PRs using a .github/dependabot.yml file.
- The action/package ecosystem seems relatively weak. The GitHub-owned actions are great and work well, but even very popular flows outside of the default action set do not seem widely used and seem to have quirks.
- There are some nice linting tools available with VS Code so you don’t need to remember the exact key structure of the GitHub Actions YAML.
- Unlike docker’s depends_on, containers running under the services key are not linked to the CI jobs in a way similar to docker compose YAML files. By ‘linked’ I’m referring to exposing ports, host IP, etc. to the other images that are running your jobs. You need to explicitly define the ports to expose on these service images, and they are all bound to localhost.
- on: workflow_dispatch does not allow you to manually trigger a workflow to run with locally modified YAML. This will only run a job from YAML already pushed to GitHub.
- Matrix builds are easy to set up to run parallelized builds across different runtime/dependency versions. Here’s an example.
- Some details about the postgres service:
  - It doesn’t seem like you can create new databases using the default postgres/postgres username + password pair. You must use the default database, postgres.
  - Unlike docker, the image does not resolve the domain postgres to an IP. Use 127.0.0.1 instead.
  - You must expose the ports using ports:, otherwise the service is inaccessible to the jobs.
  - You must set the password on the image, which felt very strange to me. You’ll run into errors if you don’t do this.

Here’s an example .github/workflows/ci.yml file with the following features:

- Redis & postgres services for Django ORM, Django cache, and Celery queue store support
- Django test configuration specification using DJANGO_SETTINGS_MODULE. This pattern is not standard to django; here’s more information about how this works and why you probably want to use it.
- Database migrations against postgres using Django
- Package installation via Poetry
- Caching package installation based on VM type and SHA of the poetry/package lock file
- Code formatting checks using black and isort
- Type checking using pyright
- Linting using pylint
- Test runs using pytest

name: Django CI
on:
  workflow_dispatch:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest

    # each step can define `env` vars, but it's easiest to define them on the build level
    # if you'll add additional jobs testing the same application later (which you probably will)
    env:
      DJANGO_SECRET_KEY: django-insecure-@o-)qrym-cn6_*mx8dnmy#m4*$j%8wyy+l=)va&pe)9e7@o4i)
      DJANGO_SETTINGS_MODULE: botweb.settings.test
      REDIS_URL: redis://localhost:6379
      TEST_DATABASE_URL: postgres://postgres:postgres@localhost:5432/postgres

    # port mapping for each of these services is required otherwise it's inaccessible to the rest of the jobs
    services:
      redis:
        image: redis
        # these options are recommended by GitHub to ensure the container is fully operational before moving
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 6379:6379
      postgres:
        image: postgres
        ports:
          - 5432:5432
        env:
          POSTGRES_PASSWORD: postgres

    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: 3.9.6

      # install packages via poetry and cache result so future CI runs are fast
      # the result is only cached if the build is successful
      # https://stackoverflow.com/questions/62977821/how-to-cache-poetry-install-for-github-actions
      - name: Install poetry
        uses: snok/install-poetry@v1.2.0
        with:
          version: 1.1.8
          virtualenvs-create: true
          virtualenvs-in-project: true
      - name: Load cached venv
        id: cached-poetry-dependencies
        uses: actions/cache@v2
        with:
          path: .venv
          key: venv-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}
      - name: Install dependencies
        run: poetry install
        if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'

      - name: Linting
        run: |
          source .venv/bin/activate
          pylint **/*.py
      - name: Code Formatting
        run: |
          # it's unclear to me if `set` is required to ensure errors propagate, or if that's by default in some way
          # the examples I found did not consistently set these options or indicate that it wasn't required
          set -eax
          source .venv/bin/activate
          black --version
          black --check .
          isort **/*.py -c -v
      - name: Setup node.js (for pyright)
        uses: actions/setup-node@v2.4.0
        with:
          node-version: "12"
      - name: Run type checking
        run: |
          npm install -g pyright
          source .venv/bin/activate
          pyright .
      - name: Run DB migrations
        run: |
          source .venv/bin/activate
          python manage.py migrate
      - name: Run Tests
        run: |
          source .venv/bin/activate
          pytest

Continue Reading

Lessons learned building with Django, Celery, and Pytest

As someone who writes ruby professionally, I recently learned python to build a bot which buys an index of crypto using binance.

The best thing about ruby is Rails, so I wanted an excuse to try out Django and see how it compared. Adding multi-user mode to the crypto bot felt like a good enough excuse. My goal was to:

- Add a model for the user that persists to a database
- A cron job to kick off a job for each user, preferably using a job management library
- Add some tests for primary application flows
- Docker-compose for the DB and app admin

I’ll detail learnings around Docker in a separate post. In this post, I walk through my raw notes as I dug into the django + python ecosystem further.

(I’ve written some other learning logs in this style if you are interested)

Open Source Django Projects

I found a bunch of mature, open-source django projects that were very helpful to grep (or, ripgrep) through. Clone these into a ~/Projects/django folder so you can easily search through them locally when learning:

- https://github.com/getsentry/sentry
- https://github.com/arrobalytics/django-ledger
- https://github.com/intelowlproject/IntelOwl
- https://github.com/mdn/kuma – manages the MDN docs
- https://github.com/apache/airflow
- https://github.com/kiwicom/kiwi-structlog-config – advanced structlog configuration examples

More python language learnings

I learned a bunch more about the core python language. I was using the most recent (3.9) version of python at the time.

- You can set up imports in __init__.py to make it more convenient for users to import from your package.
- As of python3, you don’t need an __init__.py within a folder to make it importable.
- You can import multiple objects in a single statement: from sentry.db.models import (one, two, three)
- iPython can be set up to automatically reload modified code.
- Somehow VS Code’s python.terminal.activateEnvironment got enabled again. This does not seem to play well with poetry’s venv. I disabled it and it eliminated some weird environment stuff I was running into.
- When using poetry, if you specify a dependency with path in your toml, even if it’s in the dev section, it is still referenced and validated when running poetry install. This can cause issues when building dockerfiles for production while still referencing local copies of a package you are modifying.
- It doesn’t seem like there is a way to force a non-nil value in mypy. If you are getting typing errors due to nil values, assert var is not None or t.cast are the best options I found.
- An inline return with a condition is possible: if not array_of_dicts: return None
- There doesn’t seem to be a one-command way to install pristine packages. poetry env remove python && poetry env use python && poetry install looks like the best approach. I ran into this when I switched a package to reference a github branch; the package was already installed and poetry wouldn’t reinstall it from the github repo.
- You can copy/paste functions into a REPL with iPython, but without iPython it’s very hard to copy/paste multiline chunks of code. This is a good reason to install iPython in your production deployment: it makes REPL debugging in production much easier.
- By default all arguments can be either keyword or positional. However, you can define certain parameters to be positional-only using a / in the function definition.
- Variable names cannot start with numbers. This may seem obvious, but when you are switching from using dicts to TypedDict you may have keys which start with a number that will only cause issues when you start to construct TypedDict instances.
- There is not a clean way to update TypedDicts. It looks like the easiest way is to create a brand new one or type cast a raw updated dict.
- Cast a union list of types to a specific type with typing.cast.
- Convert a string to an enum via EnumClassName('input_string'), as long as your enum has str as one of its subclasses.
- Disable typing for a specific line with # type: ignore as an inline comment.
- Memoize a function by specifying a variable as global and setting a default value for that variable within the python file the function is in. There is also @functools.cache included with the stdlib that should work in most situations.
- mypy is a popular type checker, but there’s also pyright, which is installed by default with pylance (VS Code’s python extension).
- pylint seems like the best linter, although I was surprised at how many different options there were. This answer helped me get it working with VS Code.
- Magic methods (i.e. __xyz__) are also called dunder methods.
- A ‘sentinel value’ is used to distinguish between an intentional None value and a value that indicates a failure, cache miss, no object found, etc. Think undefined vs null in JavaScript. First time I heard it used to describe this pattern.
- The yield keyword is interesting. It returns the value provided, but the state of the function is maintained and somehow wrapped in a returned iterator. Each subsequent next will return the value of the next yield in the logic.
- Unlike ruby, it does not seem possible to add functions to the global namespace. This is a nice feature: fewer instances of ‘where is this method coming from?’
- Black code formatting is really good. I thought I wouldn’t like it, but I was wrong. The cognitive load it takes off your mind when you are writing code is more than I would have expected.

Structured logging with context & ENV-customized levels

structlog is a really powerful package, but the documentation is lacking and was hard to configure. Similar to my preferred ruby logger I wanted the ability to:

- Set global logging context
- Easily pass key/value pairs into the logger
- Configure the log level through environment variables

Here’s the configuration which worked for me:

# utils.py
import logging  # needed for getattr(logging, ...) below

import structlog
from decouple import config
from structlog.threadlocal import wrap_dict


def setLevel(level):
    level = getattr(logging, level.upper())
    structlog.configure(
        # context_class enables thread-local logging to avoid passing a log instance around
        # https://www.structlog.org/en/21.1.0/thread-local.html
        context_class=wrap_dict(dict),
        wrapper_class=structlog.make_filtering_bound_logger(level),
        cache_logger_on_first_use=True,
    )


log_level = config("LOG_LEVEL", default="WARN")
setLevel(log_level)

log = structlog.get_logger()

To add context to the logger and log a key-value pair

from utils import log

log.bind(user_id=user.id)
log.info("something", amount=amount)

Django

- poetry add django to your existing project to get started. Then, poetry shell and run django-admin startproject thename to set up the project.
- Django has an interesting set of bundled apps, including an activeadmin-like admin.
- Swap the DB connection information in settings.py to use PG and poetry add psycopg2. Django will not create the database for you, so you need to run CREATE DATABASE <dbname>; to add it before running your migrations.
- The default configuration does not pull from your ENV variables. I’ve written a section below about application configuration; it was tricky for me coming from rails.
- django-extensions is a popular package that includes a bunch of missing functionality from the core django project. Some highlights: shell_plus, reset_db, sqlcreate.
- It doesn’t look like there are any generators, unlike rails or phoenix.
- Asset management is not included. There’s a host of options you can pick from.
- There’s a full-featured ORM with adaptors to multiple DBs. Here are some tips and tricks:
  - There’s a native JSONField type which is compatible with multiple databases. It uses jsonb under the hood when postgres is in place.
  - After you’ve defined a model, you autogen the migration code and then run the migrations: python manage.py makemigrations, then python manage.py migrate.
  - To get everything: User.objects.all() or User.objects.iterator() to page through them.
  - Getting a single object: User.objects.get(id=1)
  - Use save() on an object to update or create it.
  - Create an object in a single line using User.objects.create(kwargs)
- You need a project (global config) and apps (actual code that makes up the core of your application).
- It looks like django apps (INSTALLED_APPS) are sort of like rails engines, but much more lightweight. Apps can each have their own migrations and they are not stored in a global folder. For instance, the built-in auth application has a bunch of migrations that will run but are not included in your application source code. I found this confusing.
- Table names are namespaced based on which app the model is in. If you have a user model in a users app, the table will be named users_user.
- It looks like there is a unicorn equivalent, gunicorn, that is the preferred way of running web workers. It’s not included or configured by default.
- Flask is a framework similar to sinatra: a simple routing and rendering web framework.
- The app scaffolding is very lightweight. Views, models, tests, and the admin UI have a standard location. Everything else is up to the user.
- There’s a caching system built into django, but it doesn’t support redis by default. I already have redis in place, so I don’t want to use the default adapter (memcache). There’s a package, django-redis, that adds redis support to django cache.
- django-extensions has a nifty SHELL_PLUS_PRE_IMPORTS = [("decimal", "Decimal")] setting that will auto-import additional packages for you. It’s annoying to have to import various objects just to poke around in the REPL, and this setting eliminates this friction.

Use decimal objects for floats when decoding JSON

In my case, I needed to use Decimals instead of floats everywhere to avoid floating point arithmetic inaccuracies. Even $0.01 difference could cause issues when submitting orders to the crypto exchange.

This is really easy when parsing JSON directly:

requests.get(endpoint).json(parse_float=decimal.Decimal)

If you are using a JSONField to store float values, it gets more complicated. You can’t just pass parse_float to the JSONField constructor. A custom decoder must be created:

class CustomJSONDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        from decimal import Decimal

        kwargs["parse_float"] = Decimal
        super().__init__(*args, **kwargs)


class YourModel(models.Model):
    the_field = models.JSONField(default=dict, decoder=CustomJSONDecoder)

Multiple django environments

There is not a standard way of managing different environments (staging, development, test, prod) in django. I found this very confusing and wasted time attempting to figure out what the best practice was here.

Here are some tips & recommendations:

- Django doesn’t come with the ability to parse database URLs. There’s an extension, dj_database_url, for this.
- Poetry has a built-in dev category, which can be used for packages only required for development and testing. There are no separate test or development groups.
- python-dotenv seems like the best package for loading a .env file into os.environ. However, if you are building an application with multiple entrypoints (i.e. web, cli, repl, worker, etc.) this gets tricky, as you need to ensure load_dotenv() is called before any code which looks at os.environ.
- After attempting to get python-dotenv working for me, I gave decouple a shot. It’s much better: you use its config function to extract variables from the environment. That function ensures that .env is loaded before looking at your local os.environ. Use this package instead.
- By default, Django does not set up your settings.py to pull from the environment. You need to do this manually. I included some snippets below.
- After getting decouple in place, you’ll probably want separate configurations for different environments. The best way to do this is to set DJANGO_SETTINGS_MODULE to point to a completely separate configuration file for each environment:
  - In your toml you can set [tool.pytest.ini_options] DJANGO_SETTINGS_MODULE = "app.settings.test" to force a different environment for testing.
  - In production, you’ll set DJANGO_SETTINGS_MODULE to app.settings.production in the docker or heroku environment.
  - For all other environments, you’ll set DJANGO_SETTINGS_MODULE to app.settings.development in your manage.py.
  - In each of these files (app/settings/development.py, app/settings/test.py, etc.) you’ll from .application import * and store all common configuration in app/settings/application.py. Here’s a working example.

Here’s how to configure django cache and celery to work with redis:

```python
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": config("REDIS_URL"),
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}
```
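Once that’s wired up, the cache is used through django’s normal cache API (the key and value here are made up):

```python
from django.core.cache import cache

# stored in redis via django-redis using the CACHES config above
cache.set("latest_price", "123.45", timeout=60)
cache.get("latest_price")  # => "123.45", or None once the key expires
```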

Here’s how to use dj_database_url with decouple:

```python
DATABASES = {"default": dj_database_url.parse(config("DATABASE_URL"))}
```

Job management using Celery

Django does not come with a job queue. Celery is the most popular job queue library out there and requires a broker (I used redis). It looks like it will require a decent amount of config, but I chose to use it anyway to understand how it compared to Sidekiq/Resque/ActiveJob/Oban/etc.
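To make the moving pieces concrete, here’s a minimal sketch of the shape this ends up taking; the module path, settings module, and task name are assumptions based on my notes below, not a canonical layout:

```python
# users/tasks.py -- a Celery app plus a task, using redis as the broker
import os

import django
from celery import Celery, shared_task

# Celery doesn't boot through manage.py, so point it at the django settings
# and run django's setup explicitly before touching any models
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "app.settings.development")
django.setup()

app = Celery("app", broker=os.environ.get("REDIS_URL", "redis://localhost:6379/0"))


@shared_task
def sync_user(user_id):
    # keep django-specific imports out of the module top-level
    from users.models import User

    return User.objects.get(id=user_id).id
```

This matches the celery -A users.tasks worker invocation mentioned in the notes below.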

poetry add celery --allow-prereleases (I needed a prerelease to work with the version of click I was using) If you are using redis as the broker (easier for me, since I already had it installed + running) you’ll need to poetry add redis. Celery does not use manage.py so it would not load the .env file. I needed to manually run load_dotenv() at the top of the celery config. I discovered that this needed to be conditionally loaded for prod, at which point I discovered that decouple is a much better package for managing configuration. I put my celery tasks within the users application as tasks.py. You can specify a dot-path to the celery config via the CLI: celery -A users.tasks worker --loglevel=INFO. You can configure celery to store results. If you do, you are responsible for clearing out results. They do not expire automatically. Celery has a built-in cron scheduler. Very nice! There’s even a nice -B option for running the scheduler within a single worker process (not recommended for prod, but nice for development). When I tried to access django models, I got some weird errors. There’s a django-specific setup process you need to run through. DJANGO_SETTINGS_MODULE needs to be set, just like in manage.py. You can’t import django-specific modules at the top of the celery config file. Celery is threaded by default. If your code is not thread safe, you’ll need to set --concurrency=1. By default, tasks do not run inline. If you want to set up an integration test for your tasks, you need to either (a) run tasks in eager mode (not recommended) or (b) set up a worker thread to run tasks for you during your tests. Eager mode is not recommended for testing, since it doesn’t simulate the production environment as closely. However, running a worker thread introduces another set of issues (like database cleanup not working properly). There’s no real downside to using @shared_task instead of @app.task. It’s easier to do this from the start: less refactoring to do when your application grows. Testing

Some more learnings about working with pytest & vcr in combination with django:
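Before the detailed notes, here’s roughly the shape of a test that uses these pieces together (the model and its fields are made up; pytest-recording provides the vcr marker):

```python
import pytest

from users.models import User


@pytest.mark.django_db  # wraps the test in a transaction that is rolled back afterwards
@pytest.mark.vcr        # replays the HTTP cassette recorded with --record-mode=once
def test_sync_creates_user():
    User.objects.create(email="test@example.com")

    assert User.objects.count() == 1
```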

Database cleaning is done automatically for you via @pytest.mark.django_db at the top of your test class. This is great: no need to pull in a separate database cleaner. To be able to run pytest with code that relies on django models/configuration, you need the pytest-django extension. You can stick any config that would be in pytest.ini in your toml file under [tool.pytest.ini_options]. You need to set up a separate config for your database to ensure it doesn’t use the same one as your development environment. The easiest way to do this is to add DJANGO_SETTINGS_MODULE = "yourapp.settings.test" to your toml file and then override the database setup in the yourapp/settings/test.py file. You can use pytest fixtures to implement ruby-style around functions. Redis/django cache is not cleared automatically between test runs. You can do this manually via django.core.cache.cache.clear(). In a scenario where you memoize/cache a global function that isn’t tied to a class, you may need to clear the cache to avoid global state causing indeterminate test results. You can do this for a single method via cache_clear() or identify all functions with lru_cache and clear them. Django has a test runner (python manage.py test). It seems very different (doesn’t support fixtures), and I ran into strange compatibility issues when using it. Use pytest instead. My thoughts on Django

I continue to be impressed with the python ecosystem. The dev tooling (linting, repls, type checking, formatting, etc) is robust, there are reasonably well-written and maintained packages for everything I needed. It seems as though most packages are better maintained than the ruby equivalents. I only once had to dive into a package and hack a change I needed into the package. That’s pretty impressive, especially since the complexity of this application grew a lot more than I expected.

Working with python is just fun and fast (two things that are very important for me!). A similar level of fun to ruby, but the language is better designed and therefore easy to read. You can tell the ecosystem has more throughput: more developers are using various packages, and therefore more configuration options and bugs worked out. This increases dev velocity which matters a ton for a small side project and even more for a small startup. I don’t see a reason why I’d use ruby if I’m not building a rails-style web application.

Rails is ruby’s killer app. It’s better than Django across a couple of dimensions:

Better defaults. Multiple environments supported out of the box. Expansive batteries-included components. Job queuing, asset management, web workers, incoming/outgoing email processing, etc. This is the biggest gap in my mind: it takes a lot more effort & decisions to get all of these components working. Since django takes a ‘bring your own application components’ approach, you don’t get the benefit of large companies like Shopify, GitHub, etc using these and working out all of the bugs for you.

The Django way seems to be a very slim feature set that can be easily augmented by additional packages. Generally, I like the unix-style single responsibility tooling, but in my experience, the integration + maintenance cost of adding 10s of packages is very high. I want my web framework to do a lot for me. Yes, I’m biased, since I’m used to rails but I do think this approach is just better for rapid application development.

This was a super fun project. Definitely learned to love python and appreciate the Django ecosystem.

What I’m missing

There were some things I missed from other languages, although the list is pretty short and nitpicky:

Source code references within docs. I love this about the ruby/elixir documentation: as you are looking at the docs for a method, you can reveal the source code for that method. It was painful to (a) jump into an ipython session (b) import the module (c) ?? module.reference to view the source code. Package documentation in Dash. More & better defaults in django setup. Improved stdlib map-reduce. If you can’t fit your data transformation into a comprehension, it’s painful to write and read. You end up writing for loops and appending to arrays. Format code references in the path/to/file.py:line:col format for easy click-to-open support in various editors. This drove me nuts when debugging stack traces. Improved TypedDict support. It seems this is a relatively new feature, and it shows. They are frustrating to work with. Open Questions

I hope to find an excuse to dig a bit more into the python ecosystem, specifically to learn the ML side of things. Here are some questions I still had at the end of the project:

Does numpy/pandas eliminate data manipulation pain? My biggest gripe with python is the lack of chained data manipulation operators like ruby/elixir. How does the ML/AI/data science stuff work? This was one of my primary motivations for brushing up on my python skills and I’d love to deeply explore this. How does async/await work in python? How does asset management / frontend work in django? Debugging asdf plugin issues

Although unrelated to this post, I had to debug some issues with an asdf plugin. Here’s how to do this:

Clone the asdf plugin repo locally: git clone https://github.com/asdf-community/asdf-poetry ~/Projects/ Remove the existing version of the plugin: cd ~/.asdf/plugins && rm -rf poetry Symlink the repo you cloned: ln -s ~/Projects/asdf-poetry poetry

Now all commands hitting the poetry plugin will use your custom local copy.

Continue Reading

Building a Crypto Index Bot and Learning Python

A long time ago, I was contracted to build a MacOS application using PyObjc. It was a neat little app that controlled the background music at high-end bars around London. That was the last time I used python (early 2.0 days if I remember properly). Since then, python has become the language of choice for ML/AI/data science and has grown to be the 2nd most popular language.

I’ve been wanting to brush up on my python knowledge and explore the language and community. Building a bot to buy a cryptocurrency index was the perfect learning project, especially since there was a bunch of existing code on GitHub doing similar things.

You can view the final crypto index bot project here. The notes from this learning project are below. These are mainly written for me to map my knowledge in other languages to python. Hopefully, it’s also helpful for others looking to get started quickly in the language!

Tooling & Package Management

I work primarily in ruby (and still enjoy the language after years of writing professionally in it). Some of the comparisons below are to the equivalent tooling in ruby-land.

pip == bundle Package versions are specified in a requirements.txt file if you are using pip. https://rubygems.org/ = https://pypi.org/ There’s not really a rake equivalent that’s been adopted by the community. Poetry is an alternative to pip that seems to be the most popular choice for new projects. virtualenv = rbenv, but just for packages, not for the core python version, and is specific to each project. Poetry will autogen a virtualenv for you. There are dev and non-dev categories in poetry, but not a test category by default. Here’s how to add a dev dependency: poetry add -D pytest. If you are using the VS Code terminal, certain extensions will automatically source your virtualenv. I found this annoying and disabled this extension (can’t remember which extension was causing me issues). pyproject.toml is an alternative to requirements.txt, but also includes gemspec-like metadata about the package. It looks like poetry update consumes the .toml file and generates a poetry.lock. I’m guessing that other build tools also consume the .toml config and it’s not used just for poetry. The python community seems to be into toml configuration. This is used for poetry package specifications and project-specific variables. I don’t get it: it’s slightly nicer looking than JSON, but you can’t specify arrays or nested hash/dictionaries. Why not just use yaml instead? Or just keep it simple and use JSON? I ran into this issue where poetry was using the global ~/Library/Caches/pypoetry cache directory and I thought this was causing some package installation issues. I don’t think that ended up being the issue. poetry debug and poetry config -vvv show what configuration files are being loaded. poetry config --list indicated that a global cache directory was being used. Tried upgrading pip, didn’t work: python3 -m pip install --upgrade pip. I can’t remember how I fixed the issue, but these commands were helpful in understanding where poetry puts various files. If you want to hack on a package locally and use it in your project: vcrpy = { path = "/full/path/to/project", develop = true } in your toml file. Note that you cannot use ~ in the path definition. After adding this to your pyproject.toml run poetry lock && poetry install. This will be easier in poetry 1.2. Want to make sure your project is pulling from your locally defined project? You can inspect the path that a module was pulled from via packagename.__file__ i.e. import vcr; print(vcr.__file__). I had trouble with a corrupted poetry env, I had to run poetry env use python to pick up my local package definition. Working on a project not using poetry? Create a venv with python -m venv venv && source ./venv/bin/activate. If there’s a setup.py then run python setup.py install. However, you can’t install ‘extra’ dependencies (like development/testing) via setup.py. It looks like pip install -e '.[dev]' does this. It sounds like setup.py and requirements.txt do not define dev dependencies. You’ll probably need to install these manually. Look at the CI definition in the project to determine what dev dependencies need to be installed. There’s a .spec file that seems to be used with pyinstaller, a python package, when packaging a python application for distribution. Pyinstaller is primarily aimed at distributing packages for execution locally on someone’s computer. This use-case is one of the areas where python shines: there’s decent tooling for building a multi-platform desktop application. You’ll see readme-like documents written in rst (reStructuredText format) instead of md.
I have no idea why markdown just isn’t used. A ‘wheel’ is an architecture-specific package bundle that contained compiled binaries. This is helpful if a python package contains non-python code that needs to be compiled since it eliminates the compile step and reduces the change of any library compatibility issues (this is a major problem in PHP-land). black looks like the most popular python code formatter. Language Multiline strings (""") at the beginning of a class or function definition isn’t just a python idiom. They are ‘docstrings’ and get automatically pulled into the autogen’d python documentation. Similar to ruby, camelCase is used for class names, snake_case is used for function/variable names. Calling a function requires parens, unlike ruby or elixir. Like javascript, return values need to explicitly be defined by return val. Conditionals do not return values, which means you need to assign variables inside the block (unlike the ability to assign a variable to the return value of a block in ruby, a feature that I love). Each folder in a python project is transformed into a package that can you import. the __init__ file in the folder is automatically imported when you import the folder name. Imports have to be explicitly defined, like javascript, to use any functions outside the set of global/built-in functions. Speaking of built-in functions, python provides a pretty random group of global functions available to you without any imports. I found this confusing: round() is a built-in but ceil() is not. When you import with a . it looks at the local directory for matching imports first. Import everything in package with from math import *. This is not good practice, but helpful for debugging/hacking. Although you can import individual functions from a package, this is not good practice. Import modules or classes, not individual functions. You have to from package.path import ClassName to pull a classname from a module. You can’t import package.path.ClassName None is nil and capitalization matters True and False are the bool values; capitalization matters. Hashes are called dicts in python Arrays are called lists in python You can check the existence of an element in a list with element in list. Super handy! Triple-quoted strings are like heredocs in other languages. They can be used for long comments or multi-line strings. Substring extraction ranges are specified by the_string[0:-1]. If you omit a starting range, 0 is used: the_string[:-1]. The traditional boolean operators && and || aren’t used. Natural language and and or is what you use instead. Keyword arguments are grouped together using **kwargs in the method definition. You can splat a dict into keyword arguments using function_call(**dict) All arguments are keyword arguments in python. More info. You can lazy-evaluate a comprehension using () instead of [] When playing with comprehensions inside of a ipython session variable scoping will not act the same as if you weren’t executing within a breakpoint(). I don’t understand the reasons for this, but beware! In addition to list comprehensions, there are dictionary comprehensions. Use {...} for these. When logic gets complex for a list comprehension, you’ll need to use a for loop instead (even if you want to do basic log debugging within a comprehension). I miss ruby’s multi-line blocks and chained maps. List comprehensions are neat, but there doesn’t seem to be a way to do complex data transformations cleanly. 
I hate having to define an array, append to it, and then return it. The filter/map/etc functions can’t be easily chained like ruby or javascript. I wonder what I’m missing here? I’ve heard of pandas/numpy, maybe this is what those libraries solve? There are strange gaps in the stdlib, especially around manipulating data structures. For instance, there’s no dead-simple way to flatten an array-of-arrays: import operator; from functools import reduce; reduce(operator.concat, array_of_arrays). Similarly, there’s no easy way to get unique values from a list. Get all of the string values of an enum: [choice.value for choice in MarketIndexStrategy]. By subclassing str and enum.Enum (ex: class MarketIndexStrategy(str, enum.Enum):) you can use == to compare strings to enums. There’s no ? ternary operator, instead you can do a one-liner if-else: assignment = result if condition else alternative. To enable string interpolation that references variable names you need to use f"string {variable}". Otherwise you’ll need to run format on the string to get it interpolated: "string {}".format(variable). Python has built-in tuples (1, 2, 3). I’ve always found it annoying when languages just have arrays and don’t support tuples. Unlike ruby, not all code has a return value. You have to explicitly return from a function and you can’t assign the result of a code block to a variable. There are some really neat python packages: natural language processing, pandas, numpy. Python has gained a lot of traction in the deep learning/AI space because of the high-quality packages available. is is NOT the same as ==. is tests if the variable references the same object, not if the objects are equal in value. You can’t do an inline try/catch. Many bad patterns that ruby and other languages really shouldn’t let you do are blocked. In a lot of ways, python is a simpler language that forces you to be more explicit and write simpler code. I like this aspect of the language a lot. Sets are denoted with {}, which is also used for dictionaries/hashes. Here’s how decorators work: The @decorator on top of a method is like an elixir macro or ruby metaprogramming. It transforms the method beneath the decorator. The @ syntax ("pie" operator) calls the decorator function, passing the function below the decorator as an argument to the decorator function, and reassigning the passed function to the transformed function definition. The decorator function must return a function. There is no special syntax to designate a function as a ‘decorator function’. As long as it accepts a function as an argument and returns a function, it can be used as a decorator. Referencing an unspecified key in a dict raises an exception. You need to specify a default: h.get(key, None) to safely grab a value from a dict. An empty array will evaluate to false. You don’t need if len(l) == 0:. Instead you can write if not l:. Same goes with empty dicts and sets. Lambdas can only be single-line. This is a bummer, and forces you to write code in a different style. := allows you to assign and test a value within a conditional. Interesting that there’s a completely separate syntax for ‘assign & test’. __init__.py in a folder defines what happens when you import a folder reference. Here’s how classes work: class newClass(superClass): for defining a new class. __init__ is the magic initialization method. self.i_var within __init__ defines a new instance variable for a class. This is a good breakdown of instance and class variables.
You can execute code within a class outside of a method definition for class-level variables and logic. New instances of a class are created via newClass(). Instance methods of a class are always passed self as the first argument. Class variables are available on the instance as well, which is a bit strange. You can use class variables as default values for instance variables. This doesn’t seem like a great idea. newClass.__dict__ will give you a breakdown of everything on the class. Kind of like prototype in javascript. Python has multiple inheritance. class newClass(superClass1, superClass2). Inherited classes are searched left-to-right. There are no private variables built into the language, but the convention for indicating a variable is private is using a _ prefix, like self._private = value. There’s a javascript-like async/await pattern (coroutines). I didn’t dig into it, but seems very similar to Javascript’s pattern. Debugging & Hacking

One of the important aspects of a language for me is the REPL and tinkering/hacking environment. If I can’t open up a REPL and interactively write/debug code, I’m a much slower developer. Thus far, ruby has the best interactive development environment that I’ve encountered:

binding.pry and binding.pry_remote when your console isn’t running your code directly to open a repl Automatic breakpoints on unhandled exceptions, in tests or when running the application locally Display code context in terminal when a breakpoint is hit Print and inspect local variables within a breakpoint Navigate up and down the callstack and inspect variables and state within each frame Overwrite/monkeypatch existing runtime code and rerun it with the new implementation within a repl Define new functions within the repl Inspect function implementation within the repl

I’d say that python is the first language that matches ruby’s debugging/hacking environment that I’ve used. It’s great, and better than ruby in many ways.

inspect is a very helpful stdlib package for poking at an object in a repl and figuring out the methods, variables, etc available to it. traceback provides some great tools for inspecting the current stack. How do you drop an interactive console at any point in your code? There are a couple of ways: Use the ipython enhanced repl in combination with the built in debugger: import ipdb; ipdb.set_trace(). Requires you to install a separate package. There’s a breakpoint() builtin that launches the standard pdb debugger. You can configure breakpoint() to use ipdb via export PYTHONBREAKPOINT=ipdb.set_trace. All of the standard pdb functions work with ipdb. import code; code.interact(local=dict(globals(), **locals())) can be used without any additional packages installed. bpython is a great improvement to the default python repl. You need to install this within your venv otherwise the packages within your project’s venv won’t be available to it: pip install bpython && asdf reshim. ipython is a bpython alternative that looks to be better maintained and integrates directly with ipdb. python -m ipdb script.py to automatically open up ipython when an exception is raised when running script.py. Some misc ipython tips and tricks: If something is throwing an exception and you want to debug it: from ipdb import launch_ipdb_on_exception; with launch_ipdb_on_exception(): thing_causing_exception(). who / whos, whereami, %psource or source (like show-source), pp to pretty print an object. ipython --pdb script.py to break on unhandled exceptions. Great grab bag of interesting tips. %quickref for detailed help. exit gets you out of the repl entirely. All of the pypi information is pulled from a PKG-INFO file in the root of a package. rich-powered tracebacks are neat, especially with locals=True. The ruby-like metaprogramming/monkeypatching stuff happens via the __*__ functions which are mostly contained within the base object definitions. For instance, logging.__getattribute__('WARN') is equivalent to logging.WARN. You can reload code in a REPL via from importlib import reload; reload(module_name). Super helpful for hacking on a module (definitely not as nice as Elixir’s recompile). Monkeypatching in python isn’t as clean as ruby, which in some ways is better since monkeypatching is really an antipattern and shouldn’t be used often. Making it harder and more ugly helps to dissuade folks from using it. To monkeypatch, you reassign the function/method to another method: ClassName.method_name = new_method. Here’s an example. Typing

I’ve become a huge fan of gradual types in dynamic languages. I never use them right away, but once the code hardens and I’m relatively sure I won’t need to iterate on the code design, I add some types in to improve self-documentation and make it safer to refactor in the future.

Python has a great gradual type system built-in. Way better than Ruby’s.
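For example, here’s a small annotated function of the kind this enables (the function itself is made up; mypy, covered below, checks these):

```python
import typing as t


def average_fill_price(fills: t.List[t.Dict[str, t.Union[str, float]]]) -> t.Optional[float]:
    """Return the mean 'price' across fills, or None when there are none."""
    if not fills:
        return None
    prices = [float(fill["price"]) for fill in fills]
    return sum(prices) / len(prices)
```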

mypy . on the command line to test all python files within a folder. If your project fails to pass mypy, it won’t cause any runtime errors by default. There’s a VS Code extension. This extension is included in Pylance, which you should probably be using instead, but you need to set the typing mode to ‘basic’. Return value types are set with -> before the : at the end of the method definition. Otherwise, typing works very similarly to other languages with gradual typing (TypeScript, Ruby, etc). A common pattern is importing the typing module via import typing as t. t.Union[str, float] for union/any types. You can’t merge dictionaries if you are using a TypedDict (dict | dict_to_merge). Massive PITA when mutating API data. Verbose types can be assigned to a variable, and that variable can be used in type definitions. Handy way to make your code a bit cleaner. Enums defined with enum.Enum can be types. Testing Like Elixir, there are doctests that execute python within docstrings to ensure they work. Neat! There are built-in test libraries that look comparable to ruby’s testunit. pytest is similar to minitest: provides easy plugins, some better standard functionality, and builds on top of unittest. You probably want to use pytest for your testing framework. setup.cfg is parsed by pytest automatically and can change how tests work. conftest.py is another magic file autoloaded by pytest which sets up hooks for various plugins. You can put this in the root of your project, or in test/. Test files must follow a naming convention test_*.py or *_test.py. If you don’t follow this convention, they won’t be picked up by pytest by default. breakpoint()s won’t work by default, you need to pass the -s param to pytest. Like ruby, there are some great plugins for recording and replaying HTTP requests. Check out pytest-recording and vcrpy. To record HTTP requests run pytest --record-mode=once. If you want to be able to inspect & modify the API responses that are saved, use the VCR configuration option "decode_compressed_response": True. There’s a mocking library in stdlib, which is comprehensive. I’m not sure why other languages don’t do this—everyone needs a mocking library. It looks like you set expectations on a mock after it runs, not before. Here’s how mocking works: The @patch decorator is a clean way to manage mocking if you don’t have too many methods or objects to mock in a single test. If you add multiple patch decorators to a method, the mocks for those methods are passed in as additional arguments. The last patch applied is the first argument. mock.call_count, mock.mock_calls, mock.mock_calls[0].kwargs are the main methods you’ll want for assertions. assert without parens is used in tests. This confused me, until I looked it up in the stdlib docs and realized assert is a language construct not a method. tox is much more complex than pytest. It’s not a replacement for pytest, but seems to run on top of it, adding a bunch of functionality like running against multiple environments and installing additional packages. It feels confusing—almost like GitHub actions running locally. If you want to just run a single test file, you need to specify an environment identifier and test file: tox -epy38-requests -- -x tests/unit/test_persist.py My thoughts on Python

Overall, I’m impressed with how python has improved over the years. Here are some things I enjoyed:

Gradual typing included in the core language Comprehensions are natural to write Syntax simplicity: there are not too many ways to do things, which makes code more straightforward to read. Mature, well-designed libraries Virtual environments out of the box Robust, well-maintained developer tooling (ipdb, ipython, etc) with an advanced REPL Great built-in testing libraries Lots of example code to grep through for usage examples Explicit imports and local-by-default logic (unlike ruby, where it’s much easier to modify global state) Easy to understand runtime environment (in comparison to JavaScript & Elixir/BEAM)

The big question is if Django is a good alternative to Rails. I love Rails: it’s expansive, well-maintained, thoughtfully designed and constantly improving. It provides a massive increase in development velocity and I haven’t found a framework that’s as complete as Rails. If Django is close to Rails, I don’t see a strong argument for choosing ruby over python for a web product.

Open Questions

Some questions I didn’t have time to answer. If I end up working on this project further, this is a list of questions I’d love to answer:

How good is django? Does it compare to Rails, or is it less batteries-included and more similar to phoenix/JS in that sense? Does numpy/pandas solve the data manipulation issue? My biggest gripe with python is the lack of chained data manipulation operators like ruby. How does the ML/AI/data science stuff work? This was one of my primary motivations for brushing up on my python skills and I’d love to deeply explore this. How does async/await work in python? Learning Resources

General guides:

https://python-patterns.guide/python/module-globals/ https://book.pythontips.com/en/latest/ternary_operators.html https://realpython.com/python-lambda/#anonymous-functions https://google.github.io/styleguide/pyguide.html

Monkeypatching:

https://sharmapacific.in/monkey-patching-in-python/ https://github.com/ytdl-org/youtube-dl/commit/00fcc17aeeab11ce694699bf183d33a3af75aab6 https://filippo.io/instance-monkey-patching-in-python/ https://tryolabs.com/blog/2013/07/05/run-time-method-patching-python/ Open Source Example Code

There are some great, large open source python projects to learn from:

https://github.com/getsentry/sentry https://github.com/arachnys/cabot – open source APM https://github.com/vitorfs/bootcamp https://github.com/rafalp/Misago

Download these in a folder on your local to easily grep through.

Continue Reading

Building an Elixir & Phoenix Application

Learning Elixir

Ever since I ran into Elixir/Phoenix through a couple of popular Hacker News posts I’ve been interested in tinkering with the language. I have a little idea for an app that I’m just motivated enough to build that Elixir would work for. I’ve documented my learning process below by logging my thoughts as I learned Elixir via a ‘learning project’.

What I’m building

Here’s what I’d like to build:

Web app which detects the user’s location using the built-in location service in the browser The zip code of that location is determined (server or client-side) The zip code is handed off to a server-side process which renders a page with the zip code.

Here’s what I’ll need to learn:

Elixir programming language Phoenix application framework Managing packages and dependencies Erlang runtime architecture How client-side assets are managed in phoenix How routing in Phoenix works

I’m not going to be worried about deploying the application in this project.

This is going to be fun, let’s get started!

Learning Elixir & Phoenix

I’ve worked with Rails for a while now, so most of the conceptual mapping is going to be from Ruby => Elixir and Rails => Phoenix.

First, let’s get a basic Phoenix dev environment up and running: https://hexdocs.pm/phoenix/installation.html Wow: "An Erlang system running over one million (Erlang) processes may run one operating system process". Processes are not OS processes but are instead similar to green threads with much less overhead. Some tooling equivalents: https://thoughtbot.com/blog/elixir-for-rubyists. asdf, exenv, kiex == rbenv. Looks like asdf is the most popular replacement. Reading this through, I can see why rubyists are so angry about the pipe operator (|>). The elixir version is much different (better, actually useful) than the proposed ruby version. It takes the output of a previous function and uses it as the first input to the next function in the chain. "Function declarations support guards and multiple clauses". What does that mean? It sounds like you can define a method multiple times by defining what the argument shape looks like. Instead of a bunch of if conditions at the top of a function to change logic based on inputs, you simply define the function multiple times. Makes control flow easier to reason about. There’s some great syntactical sugar for array iteration: for document <- documents == documents.each { |document| ... } "I believe Elixir and Ruby are interchangeable for simple web applications with no high-traffic or that don’t require very short response times." This has been my assumption thus far: Elixir is only really helpful when performance (specifically concurrent connections) is a critical component. We will see if this plays out as I learn more. I’d recommend creating an elixir folder and cloning all of the open-source projects I reference below into it. Makes it very easy to grep (I’d recommend ripgrep, which is much better than grep) for various API usage patterns. To install elixir: brew install elixir; elixir -v verifies that we have the minimum required erlang and elixir versions. I ran this check, we are ready to go! mix is a task runner and package manager in one (rake + bundle + bin/*). It uses dot syntax instead of a colon for subcommands: bundle exec rake db:reset => mix ecto.reset. When I ran the install command for Phoenix it asked for hex. Looks like bundler/rubygems for elixir. https://hex.pm Webpack is used for frontend asset management and isn’t tied into Elixir at all (which I really like). Postgres is configured as the default DB. Now I can start running through the Phoenix hello world: https://hexdocs.pm/phoenix/up_and_running.html ecto == ActiveRecord, kind-of. Seems a bit more lightweight. Time to set up the database! config/dev.exs is the magic file. Looks like a very Rails-like folder structure at first glance. Interesting that they have a self-signed local https setup built in. That was a huge pain in ruby-land. Looks like lib/NAME_web => app/. eex === erb and has ~same templating language. https://milligram.io looks like an interesting minimalist bootstrap. This was included in the default landing page. elixir atom == ruby symbol. Erlang supports hot code updates: "We didn’t need to stop and re-start the server while we made these changes." Very cool. Later on, I learned that this isn’t as cool/easy as it sounds. Most folks don’t use this unless their applications have very specific requirements. Routing (routes.ex) looks to be very similar to rails. The biggest difference is the ability to define unique middleware stacks ("pipelines") that match against specific URL routes or content-types.
Later on, I realized there aren’t nearly as many configuration options compared to rails. For example, I don’t believe you use regexes to define a URL param constraint. Huh, alias seems to be like include within modules. Nope! Got this one wrong: Looks like it just makes it easier to type in a module reference. Instead of Some.Path.Object, with alias you can just use Object (without specifying the namespace). use is similar to include in ruby. mix phx.server === bundle exec rails server mix deps.get === bundle Plugs seem similar to Rails engines. Nope! Plug is just a middleware stack. Umbrella applications are similar to Rails engines. Dots . instead of double colons :: for nested modules: MyApp.TheModule == MyApp::TheModule Huh, never ran into the HTTP HEAD method before https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/HEAD Examples seem to indicate that router pipelines should be used for before_filter type of logic. Looks like a plug can be a full-blown module, or simply a function on the controller that’s called before the action starts executing. https://stackoverflow.com/questions/30958446/rails-before-filter-equivalent-in-phoenix You can setup after_action-like functionality, but it’s not as intuitive: https://elixirforum.com/t/phoenix-controllers-post-action-plugs/18267 fn is a lambda function in Elixir. Doesn’t look like there are multiple ways to do lambdas. Yay! I hated the many ways of defining anonymous functions in ruby that all worked slightly differently (procs, blocks, and lambdas). There is shorthand syntax fn(arg) -> arg.something end == &(&1.something) There’s defp, def, and defmodule. What’s the difference? After a bit of digging, these are core elements to elixir which slightly change how the methods are defined. defp is a private method, for instance. Ahh func/2 references the implementation of func with two arguments. When referencing a function you must specify the number of arguments using this syntax. use Phoenix.Endpoint references a macro, how exactly do macros work? Macros are Elixir’s metaprogramming primitive. That’s all for now, I’ll read more later. I ended up not doing any metaprogramming in my application but learned it a bit about it. It sounds like you essentially specify code you want to inject into a module by quoteing it within the defmacro __using__ function in your module. This __using__ function is automagically called when you use the module. This enables you to dynamically write the elixir code you want to include (you can think of a quote as dynamically eval’d code). Live reload for the front and backend is installed by default and "just works" when running a development server. Yay! Hated all of the config in rails around this. "It is also possible for an application to have multiple endpoints, each with its own supervision tree" sounds very cool. I’m guessing this allows for multiple applications to be developed within one codebase but to run as essentially separate running processes? Something to investigate in another project. Interesting that the SSL config is passed directly to the core phoenix endpoint configuration. I wonder if there is something like unicorn/puma in the mix? It looks like there is, Cowboy is the unicorn/puma equivalent. Ecto is not bundled in the Phoenix framework. It’s a separate project. Looks like phoenix favors a layered vs all-in-one approach, but is opinionated about what packages which are installed by default (which I like). 
I don’t fully understand this yet, but it looks like there is an in-memory key-value store built into OTP, which is the elixir runtime (i.e. erlang). In other words, something like Redis is built-in. What are the trade-offs here? Why use this over Redis or another key/value store? Because you can define multiple variations of a method, things like action_fallback are possible. Define error handling farther up the chain and just think about the happy path in the context of the method you are writing. Neat. "EEx is the default template system in Phoenix…It is actually part of Elixir itself" Great, so this isn’t something specific to Phoenix. This made something click for me: "pattern matching is strong typing" https://news.ycombinator.com/item?id=18842123 It seems as though one of the goals behind pattern matching + function definitions is to eliminate nested conditionals. Elixir (and probably functional programming in general) seems to favor "flat" logic: I’m not seeing many nested if statements anywhere. As I learned later, if statements are generally discouraged and hard to use as they have their own scope (you can’t modify variables in the outer scope at all). ^ ‘pins’ a variable. const in node, but slightly different because of this "matching not assignment" concept (which I don’t fully get yet). This is used a lot in Ecto queries, but I’m not sure why. Gen prefix stands for Generic NOT Generate as I thought. i.e. GenServer == Generic Server. I still don’t understand this "Let it crash" philosophy. Like, if a sub-routine of some sort fails, it would corrupt the response of any downstream logic. I can see the benefits of this for some sort of async map-reduce process, but not a standard web stack. What am I missing? After many rabbit holes, I’m ready to tackle my initial goal! I’m having a blast, it all seems very well designed: I’m getting the same feeling as when I first started learning Rails via Spree Commerce years ago. What I’m missing from Ruby

Overall, I found the built-in Elixir tooling to be top-notch. There didn’t seem to be too many obvious gaps and things generally "just worked". However, there’s some tooling from the ruby ecosystem that I was missing as I went along.

Automatically open up a REPL when an exception is thrown. In ruby, this is done via pry-rescue. Super helpful for quickly diving into the exact context where the error occurs. In Phoenix, it would be amazing if the debugging plug (which displays a page when an exception is thrown) displays the variables bound in a specific scope so I can reproduce & fix errors quickly. It would be even better if a REPL could be opened and interacted with on the exception page. better-errors does this in ruby. Given that all code in Elixir is functional, simply knowing the local variables in a specific scope would be enough to reproduce most errors and would make for a very quick debugging loop. iex -S mix phx.server feels weird. It would feel a bit nicer if there was a mix phx.console which setup IEx for you. The Allow? [Yn] prompt is annoying when I’m debugging a piece of code. It would be great if you could auto-accept require IEx; IEx.pry requests. In a debugging session, I couldn’t figure out how to navigate up and down the call stack. Is there something like pry-nav available? Scan dependencies for security issues. In ruby, this is done via bundler-audit. I couldn’t find a VS Code extension with Phoenix snippets. Built-in Structured Logging. In my experience, using structured logs is incredibly helpful in effectively debugging non-trival production systems. I’ve always found it frustrating that it’s not built-in to the language (I built one for ruby). I think it would be amazing if this was provided as an optional feature in Elixir’s logger: Logger.info "something happened", user: user.id => something happened user=1 It doesn’t seem possible to run a mix task in production when using Elixir releases. There are many scenarios where you’d want to run a misc task on production data (a report, migration, etc). In Rails-land, this has been a great tool to have to solve a myriad of operational problems when running a large-ish application. Ability to add multiple owners/authors to a hex package. This makes it challenging to hand off ownership of a package when the original creator doesn’t have the time to maintain it anymore. Coming from Rails, phoenix_html feels very limited. There are many convenience methods I’m used to in Rails that I wasn’t excited about re-implementing. In ruby, if you are working on improvements to a gem (package) you can locally override the dependency using bundle config local.gem_name ~/the_gem_path. This is a nice feature for quickly debugging packages. There’s not a built-in way to do this.

I posted about this on the Elixir forums and got helpful workarounds along with confirmations about missing functionality.

Initial impressions

I enjoyed learning Elixir! It’s a well designed language with great tooling and a very supportive community. However, it still feels too early to use for a traditional SaaS product.

Although there are packages for most needs, they just don’t have as many users as the ruby/javascript ecosystem and there’s a lot of work you’ll need to do to get any given package working for you. Phoenix is great, but it’s nowhere close to rails in terms of feature coverage and you’ll find yourself having to solve problems the Rails community has already perfected over the years. The deployment story is really poor and is not natively supported on Lambda, Heroku, etc.

There are specific use-cases where Elixir is a great choice: applications that have high concurrency and/or performance demands (i.e. chat, real-time, etc) and IoT/embedded systems (via nerves) are both situations where Elixir will shine. The Elixir language has been more carefully curated compared to ruby and continues to improve at a great velocity. It’s cool to see the creator of Elixir very active in the forums and actively listening to the users and incorporating feedback. It very much reminds me of the early days of Rails.

This is all to say, in my experience, Ruby + Rails is still the fastest way to build web applications that don’t have intense concurrency/performance requirements on day one. The ecosystem, opinionated defaults, and hardened abstractions battle-tested by large companies (Shopify, GitHub, Stripe) are just too good. The dynamic nature of the language allows for tooling (better-errors, pry-rescue, byebug, etc) that materially increases development velocity.

Other Learnings Community matters

When I first started learning how to program, Kirupa (which still exists, amazingly) was an incredible resource. Random people from the internet answered my basic programming questions. All of my initial freelancing work came from the job board. The Flash/Actionscript tutorials on the site were incredibly helpful. It was a relatively small, tight-knit community that was ready to help.

I feel like we’ve lost that with StackOverflow and googling for random blog posts.

The ElixirForum.com community is awesome and has that same kind, tight-knit, open-to-beginners feel that the forums of the 90s had. I was impressed and enjoyed participating in the community.

Confirmation bias is very real

I already liked Elixir before I dug into it. It looked cool, felt hot, etc. I was looking for reasons to like it as I did this example project.

It was interesting to compare this to my experience with node. I already didn’t like Javascript as a whole and was ready to find reasons I didn’t like node.

I found them, but would I have found just as many frustrating aspects of Elixir if I didn’t have a pre-existing positive bias towards Elixir?

Managing your psychology and biases is hard, but something to be aware of in any project.

Functional programming isn’t complicated

"Functional programming" is an overloaded concept. Languages are touted as "functional programming languages", there are dedicated FP conferences, and fancy terms (like "monads") all make it harder for an outsider to understand what’s going on.

I want to write up a deep-dive on functional programming at some point, but getting started with this style of programming is very easy:

You can program in a functional style in any language. Don’t store state (or store as little as possible) in objects. This forces you to declare all inputs needed for the function as arguments, instead of sourcing variables from an instance or class variable. Functions that don’t depend on external state are deterministic/idempotent by default. In other words, running the function against the same set of inputs yields the same results.

Boom! You are programming in a functional style. There’s more to it, but that’s the core.

Per-language folders for easy code search

Having a set of repositories is very helpful in understanding how various libraries are used in production. I’ve found it super helpful to have a folder with any great open source applications I can find in the language I’m learning. This makes it very easy to grep for various keywords or function names to quickly understand patterns and real-world usage.

For example:

```bash
cd ~/Projects
mkdir elixir
cd elixir
git clone https://github.com/thechangelog/changelog.com

# ripgrep is a faster and much easier to use version of grep
rg -F 'Repo.'
```

Along these lines, grep.app is a great tool for quickly searching a subset of GitHub repositories (can’t wait until GitHub fixes their code search).

Open questions

There’s a bunch of concepts I didn’t get a chance to look into. Here are some of the open questions I’d love to tackle via another learning project:

Processes/GenServer/GenStage. Although I did work with packages that create their own processes, I didn’t work with Gen{Server,Stage} from scratch. Macros / metaprogramming. Testing. Ecto/ORM. Callbacks (what does @behaviour do?). Clusters/Nodes (connecting multiple erlang VMs together to load balance) Functional programming concepts. These were referenced around the edges but I never dug into them in a deep way. Recommended Elixir style guide. I know there’s a built-in formatter/linter, but I wonder if there’s a community-driven opinionated style guide. Background jobs. Deployment. VS Code/language server/development environment optimizations. What’s up with @spec? Is there typing coming to elixir? Supervisor trees. Built-in ETS tables. Looks like a built-in key-value store similar to redis. Resources for learning Elixir & Phoenix General https://www.sihui.io/first-impression-of-elixir/ http://crevalle.io/mistakes-rails-developers-make-in-phoenix-pt-1-background-jobs.html https://dockyard.com/blog/2016/05/02/phoenix-tips-and-tricks http://blog.plataformatec.com.br/2018/04/elixir-processes-and-this-thing-called-otp/ https://www.scopelift.co/blog/2018/3/1/phoenix-on-heroku-our-experience-getting-coinrecapio-deployed https://davidlaprade.github.io/inserting-breakpoints-in-elixir https://github.com/h4cc/awesome-elixir https://github.com/thoughtbot/constable https://thoughtbot.com/services/elixir-phoenix http://digitalfreepen.com/2017/08/16/elixir-in-depth-notes.html https://elixirforum.com/t/elixir-blog-posts/150 https://howistart.org/posts/elixir/1/index.html https://til.hashrocket.com/elixir https://extips.blackode.in https://elixirschool.com/en/ https://gist.github.com/raviwu/2e128666ef7e7325c94753097f48c500 Specific Topics Elixir with a Rubyist: http://joaomdmoura.com/articles/learn-elixir-with-a-rubyist-episode-i Debugging: https://www.youtube.com/watch?v=w4xMarVUZQ4 String interpolation: https://thepugautomatic.com/2016/01/elixir-string-interpolation-for-the-rubyist/ Regex: https://thepugautomatic.com/2016/01/pattern-matching-complex-strings/ Opinions https://journal.dedasys.com/2015/04/23/elixir-vs-erlang-a-question-of-momentum/ https://news.ycombinator.com/item?id=18838115 https://github.com/dwyl/learn-elixir/issues/102 https://news.ycombinator.com/item?id=20357055 http://underjord.io/why-am-i-interested-in-elixir.html https://adrian-philipp.com/post/why-elixir-has-great-potential Videos https://www.youtube.com/channel/UC0l2QTnO1P2iph-86HHilMQ https://www.youtube.com/channel/UCIYiFWyuEytDzyju6uXW40Q https://www.youtube.com/channel/UCKrD_GYN3iDpG_uMmADPzJQ https://www.youtube.com/channel/UC47eUBNO8KBH_V8AfowOWOw https://www.youtube.com/watch?v=srQt1NAHYC0 https://www.youtube.com/watch?v=JvBT4XBdoUE https://www.youtube.com/watch?v=B4rOG9Bc65Q Example Applications

Clone these for easy local grepping.

https://github.com/thechangelog/changelog.com – actively managed https://github.com/rizafahmi/elixirjobs – dead project https://github.com/poanetwork/blockscout – active https://github.com/aviacommerce/avia – looks like a zombie project. No commits in 2+ months. https://github.com/edgurgel/httparrot https://github.com/hashrocket/tilex https://github.com/yodiaditya/gps-monitoring https://github.com/ComeBike/come.bike https://github.com/getsentry/sentry-elixir

Continue Reading

Using Ansible to Deploy Elixir Applications on Dokku

For me, the best (and most fun!) way to learn is to find a problem with a new set of tools you want to learn. I’ve documented my process of learning Ansible below, I hope it’s interesting to others!

Motivation

I built an application with Elixir and Phoenix and deployed it using Gigalixir. Gigalixir worked well, but after a couple of weeks the site shut down due to a lack of updates (I was on the free tier). Since this project is strictly for learning, I figured it would be fun to learn Ansible and save a couple bucks by signing up for a free VPS service.

I initially chose Vultr because they offered $50 of free credit towards a $3.50/month VPS, which should be more than enough for a year. This ended up not working out and I switched to AWS (detailed below).

I have some experience with Ansible-like technologies. Long ago, I used Puppet to manage configuration on a single VPS which hosted a Spree Commerce application. It also had a Solr and MySQL server (this was before managed services were a thing and you had to host things yourself). It was interesting to set up, but a pain to manage. Making changes was always scary and created surprising and hard-to-debug errors. Puppet has a unique DSL and both the client and the server have to have Puppet installed for the configuration to work properly. It felt better than configuring Apache & Ubuntu by hand in the PHP days, but it wasn’t that much better.

I keep hearing about Ansible, let’s learn it and see how things have improved!

What I’m building

Here’s what I’d like to build:

An Ansible configuration that will bootstrap a bare VPS with Dokku. Setup the Dokku application with an SSL certificate using Lets Encrypt. Elixir + Phoenix running using the community buildpacks. Ideally, I don’t want to do any manual configuration on the VPS. I want my entire production setup to be built via Ansible. Learning Ansible

Here’s my "liveblog" of my thinking and learnings as I built my ansible config:

Awhile back, I used Dokku to manage ~5 different microservices on a single (small) AWS VPS (via Lightsail). It worked amazingly well and was very stable. Before I move forward with Dokku, I took a look at the project on GitHub and it’s still (very) active, which is amazing! Let’s use that to manage our Elixir deployment. Ansible is a Python-based replacement for puppet/chef. Looks like it consumes yml files and configures servers via ssh. You only need Ansible installed on the "controller machine". This sounds like I can just install it on my laptop and avoid having to install anything on the target/remote server. This is a huge improvement over Chef/Puppet. MacOS install: sudo easy_install pip && sudo pip install ansible && ansible --version A brew command I ran in the meantime ended up breaking my easy_install version. There was a library conflict. I ended up installing via brew instead and this fixed the issue. Setup a ansible.cfg in your project directory. You’ll also need an inventory file to specify where your servers are. You may need to add your SSH key to the VPS you spun up ssh-copy-id -i ~/.ssh/id_rsa.pub root@123.123.123.123. Alternatively you can specify a SSH key in your inventory. Put ansible_ssh_private_key_file=~/.ssh/yourkey.pem after your IP address. I have ansible all -m ping working. Now to try to whip up a Ansible playbook that will install Dokku. Playbooks are a separate yml file that describes how you want to setup the server. Let’s call ours playbook.yml. We’ll run it using ansible-playbook playbook.yml. An Ansible "role" is a bundle of tasks. You can then layer on additional tasks on top of the role. I’m guessing you can also run multiple roles (confirmed this later on). My main goal is to use https://github.com/dokku/ansible-dokku to bootstrap a server. I cloned this to my local to more easily poke around at the code. It look like the variable defaults are specified in defaults/main.yml At least in this repo, each task contained in the ansible-dokku repo is a separate py file which defines an interface to Ansible using a AnsibleModule A "lookup plugin" can pull data from a URL, file, etc for a variable. This will be handy for setting up SSH keys, etc. Here’s an example: "{{lookup('file', '~/.ssh/id_rsa.pub')}}" Looks like roles don’t auto install when you run Ansible. "Galaxy" is the package registry for roles. You need to run a separate command to install packages. Best way to manage roles is to setup a requirements.yml and then run ansible-galaxy install -r requirements.yml. Docs are straightforward: https://galaxy.ansible.com/docs/using/installing.html Think of "modules" as a library. An abstraction around some common system task so you can call it via yml. A module can contain roles and tasks. You’ll see name everywhere in the yml files. This is optional and is only metadata used for logging & debugging. {{ }} are used for variable substitution. Does not need to be inside a string. You can call lookups from inside the brackets. I’m not a yml expert, but this seems like a custom layer on top of the core yml spec. become: true at the top of your playbook tells Ansible to use sudo for everything. Think of it as root: true. Each task has a default state. You can override the state by adding state=thestate to your task options. Each task defines a method to extract the current state from the system Ansible is operated on. Here’s an example. 
- State is mostly extracted by reading configuration files or running a command to read the status of various systems (it's not as magical as you might expect).
- Ansible has a vault feature which can encrypt an entire file or an inline variable. Rails introduced something similar where it would encrypt your production secrets into a local file so you could edit/manage them all in a single place.
- You can also inline-encrypt a string using ansible-vault encrypt_string the_thing_to_encrypt --name the_yml_key. You can then copy/paste the resulting string into a var.
- Add vault_password_file = ./vault_password to your ansible.cfg and hide the file via .gitignore. This eliminates the need to enter the password each time you deploy via Ansible. You can then store the password in 1Password for safekeeping.
- Encrypted variables need to be stored in vars. I wanted to use encrypted variables for secret definitions passed to dokku config, but I couldn't use the encrypted string directly in the ENV var config. In vars, define your secret app_database_url: !vault |..., then reference the secret in your ENV config: DATABASE_URL: "{{ app_database_url }}".
- Use -vvvv as a CLI option to enable verbose logging. I ran into an issue where a subcommand was hanging, waiting on a reply from stdin; however, verbose logging didn't help me here. I'm guessing the subprocess call didn't redirect output to the parent stdout/stderr, so I couldn't see any helpful debugging output.
- This issue ended up being a bit interesting. ansible-dokku used the Python 3 subprocess module to run dokku commands on the machine. check_call was used, which doesn't redirect stdin or stdout, but subprocess output didn't pipe its way to the Ansible stdout or stdin even after I switched to using run. I'm guessing there's a layer of abstraction in the Ansible library which overrides all process pipes and prevents output from making its way to the user without a specific flag passed to AnsibleModule.
- Alright! I finally have my playbook running properly. Note that most Ansible roles seem to work with Ubuntu but not CentOS, which was the default on the VPS provider I was testing out (Vultr).
- To modify a role that you are using, clone the repo, remove the installed role from ~/.ansible/roles, and symlink your local clone in its place. This will allow you to edit role code locally and test it on a live server (obviously a horrible idea for a real product, fine for a side project).
- If you see a plain killed message in your deployment log, it's probably because the server is running out of memory. Let's add some swap to fix this! There's got to be a role for adding swap memory to a server. There is: geerlingguy.swap. Added that to requirements.yml, added configuration options to my vars, and boom, it works! Nice.
- I tried to add my own task dokku_lets_encrypt to the ansible-dokku role, but I ran into strange permission issues. Also, the development loop was pretty poor: make a change on my local and rerun the change on the server. Not fun. I ended up just giving up and running the letsencrypt setup manually on the server, so I failed in my goal to fully automate the server configuration.
- If you just want to run a single task, use the --tags option: https://stackoverflow.com/questions/23945201/how-to-run-only-one-task-in-ansible-playbook
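To make the moving pieces concrete, here's a minimal sketch of the project layout described in the notes above (one comment header per file). The group name, IP address, and key path are placeholders, not values from the actual project.

```
# ansible.cfg
[defaults]
inventory = ./inventory
vault_password_file = ./vault_password

# inventory
[dokku_servers]
123.123.123.123 ansible_ssh_private_key_file=~/.ssh/yourkey.pem

# requirements.yml -- install with `ansible-galaxy install -r requirements.yml`
- src: dokku_bot.ansible_dokku
- src: geerlingguy.swap
```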

Here’s the template I based my config off of. Here’s the playbook configuration I ended up with, which demonstrates how to configure specific dokku module versions and uses encrypted strings:

```yaml
---
- hosts: all
  become: true
  roles:
    - dokku_bot.ansible_dokku
    - geerlingguy.swap
  vars:
    swap_file_size_mb: '2048'
    dokku_version: 0.21.4
    herokuish_version: 0.5.14
    plugn_version: 0.5.0
    sshcommand_version: 0.11.0
    dokku_users:
      - name: mbianco
        username: mbianco
        ssh_key: "{{lookup('file', '~/.ssh/id_rsa.pub')}}"
    dokku_plugins:
      - name: clone
        url: https://github.com/crisward/dokku-clone.git
      - name: letsencrypt
        url: https://github.com/dokku/dokku-letsencrypt.git
  tasks:
    - name: create app
      dokku_app:
        # change this name in your template!
        app: &appname the_app

    - name: environment configuration
      dokku_config:
        app: *appname
        config:
          MIX_ENV: prod
          DATABASE_URL: "{{ app_database_url }}"
          SECRET_KEY_BASE: "{{ app_secret_key_base }}"
          DOKKU_LETSENCRYPT_EMAIL: hello@domain.com
          # specify port so `domains` can setup the port mapping properly
          PORT: "5000"
      vars:
        # encrypted variables need to be in `vars` and then pulled into `config` via {{ }}
        app_database_url: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          abc123
        app_secret_key_base: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          abc123

    - name: add domain
      dokku_domains:
        app: *appname
        domains:
          - domain.com
          - www.domain.com

    - name: add domain
      dokku_domains:
        app: *appname
        global: True
        domains: []

    # this command doesn't work via ansible, but always works when run locally...
    # https://github.com/dokku/ansible-dokku/pull/49
    # - name: letsencrypt
    #   dokku_lets_encrypt:
    #     app: *appname

# you'll need to `git push` once this is all setup
```
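One YAML detail worth calling out: the playbook defines the app name once with an anchor (&appname) and reuses it with aliases (*appname). A minimal illustration of the same trick, using made-up keys:

```yaml
# define the value once with an anchor...
app_name: &appname the_app

# ...and reuse it anywhere with an alias
create_task:
  app: *appname   # resolves to "the_app"
config_task:
  app: *appname
```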

Here are key commands to manage your servers:

```shell
# can we reach our inventory?
ansible all -m ping

# encrypt secret keys in playbook
ansible-vault encrypt_string 'the_value' --name the_key

# install dependencies
ansible-galaxy install -r requirements.yml --force-with-deps --force

# run playbook
ansible-playbook playbook.yml
```

Deploying Elixir & Phoenix on Dokku

I’ve used dokku for projects in the past, and blogged about some of the edge cases I ran into. It took some fighting to get Elixir + Phoenix running on the Dokku side of things:

- I needed to create a Procfile with an Elixir web worker definition: web: elixir --sname server -S mix phx.server. Things aren't as out-of-the-box as with Rails. I think this is mostly because there are two separate buildpacks required that aren't officially maintained.
- Dokku plugins are just git repos. There's no registry; the best place to find plugins is the Dokku documentation. There's an install command that pulls them from GitHub. The ansible-dokku role handles many common plugins, but you need to add them to your vars => dokku_plugins config to get them to autoinstall.
- dokku clone needs you to add the generated key to GitHub. ssh dokku@45.77.156.135 clone:key to get the public key, then add it as a deploy key in the GitHub repo. It may not be worth it to set this up; it's easier to just git-push deploy manually.
- Dokku (apparently, just like Heroku) allows you to set a .buildpacks file in the root directory. Just add a list of git repo URLs. Use a # to specify an exact git repo SHA to use.
- If you keep messing around with deploys, you may exit the shell while there is a lock on the deploy. dokku apps:unlock to the rescue. This has never happened to me on Heroku, although I have always been much more careful with my production applications. Curious how Heroku handles this.
- If the build is failing, instead of continuing to run builds via git push, you can find the failing build container and jump in: docker ps -a | grep build. The second ID, which is either a short SHA or a string (dokku/yourapp:latest), is what you want to plug into docker run -ti 077581956a92 /bin/bash. From there you can experiment and tinker with the build. Most buildpacks modify the PATH to point to executables like npm, node, etc. that are pulled locally for bundling web assets. Helpful for debugging issues with buildpacks. (These debugging commands are collected into a single block after this list.)
- If you want to jump into a running container: docker exec -it CONTAINER_ID /bin/bash.
- herokuish (the set of scripts which creates the Heroku experience on Dokku) builds things in the /tmp/build directory. https://github.com/gliderlabs/herokuish/blob/master/include/herokuish.bash and https://github.com/gliderlabs/herokuish#paths
- It looks like the cache dir is actually stored in /home/APPNAME/cache. This is linked to the build container during a git push. I ran into issues with the node_modules cache that required some manual debugging.
- dokku run does not enter the same container that's running your app. Use dokku enter app_name process_type the_command for that. If you are generating a sitemap, using dokku run won't work because it doesn't persist the files to the same container that is serving your static assets. Using S3 for static asset hosting would eliminate this problem.
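For convenience, here are the debugging commands from these notes collected into one block. The container ID and app name are placeholders; substitute your own values.

```shell
# find the failed build container, then poke around inside it
docker ps -a | grep build
docker run -ti 077581956a92 /bin/bash

# jump into a container that's already running
docker exec -it CONTAINER_ID /bin/bash

# run a command inside the same container that's serving your app
dokku enter the_app web bash

# clear a stuck deploy lock
dokku apps:unlock the_app
```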

Here’s what my buildpack config looks like:

```
# .buildpacks
https://github.com/HashNuke/heroku-buildpack-elixir.git#1251439227711cf28bbfbafc101f9c9ff7f9345a
https://github.com/gjaldon/heroku-buildpack-phoenix-static.git#b44e094c9da48483af5e221ff11f954a8b85479b

# phoenix_static_buildpack.config
# the phoenix buildpack does not specify recent versions of node & npm, which causes webpack issues
node_version=12.14.1
npm_version=6.14.4

# elixir_buildpack.config
elixir_version=1.10.4
# https://erlang.org/download/otp_versions_tree.html
erlang_version=22.3.4
```

Configuring AWS EC2 using Ansible

Vultr’s free credits ended up expiring after a couple of months (as opposed to a year). I wasn’t thrilled with the service and was curious to learn more about AWS by using additional services in the future, so I decided to move the server over to AWS:

- Looks like Amazon Linux isn't supported by many Ansible roles. Use the Ubuntu image instead. https://github.com/geerlingguy/ansible-role-docker/issues/141
- "Amazon Linux"'s root user is ec2-user; Ubuntu's root user is ubuntu. Amazon Linux is not compatible with many Ansible packages, so use Ubuntu.
- become: true (sudo mode) is required on Amazon.
- The local disk space of EC2 instances is tiny by default. You can expand the local disk, which is an EBS volume, by navigating to the Elastic Block Store section and adjusting the volume. You'll probably need to restart (shutdown -h now).
- I forgot about this: ports for HTTP and HTTPS are not exposed by default. If you run through the one-click EC2 wizard, only SSH will be exposed. Use the longer wizard to generate a "security group" exposing the proper ports.
- You'll also want to set up an Elastic IP. This is an IP that you can assign, and then reassign, to another EC2 instance.
- I've always been annoyed by AWS. It's incredibly powerful, but hard to understand. You have to think of every little configuration option as a separate object with state that needs to be configured just right. Designing infra with code via https://github.com/aws/aws-cdk makes a ton of sense. I bet once you load the entire AWS data model in your head, things make a lot more sense.

Learning Resources

Ansible

https://medium.com/@mitesh_shamra/introduction-to-ansible-e5b56ee76b8c
https://blog.morizyun.com/blog/dokku-isntall-vultr-pass-mini-heroku/index.html
https://www.digitalocean.com/community/tutorials/configuration-management-101-writing-ansible-playbooks
https://lebenplusplus.de/2017/06/09/how-secure-are-ansible-vaults/
https://medium.com/@burakkarakan/what-exactly-is-docker-1dd62e1fde38
https://opensource.com/article/16/12/devops-security-ansible-vault

Dokku

https://www.petekeen.net/introduction-to-heroku-buildpacks
https://github.com/jeffrafter/howto/blob/master/unformatted/elixir-phoenix-dokku.md
https://dokku.github.io/general/automating-dokku-setup

Yaml

Interestingly, there's no great canonical documentation for YAML. There's a spec, but no real docs on the official homepage.

http://lzone.de/cheat-sheet/YAML https://github.com/darvid/trine/wiki/YAML-Primer


My Process for Intentional Learning

Lately, I've been able to carve out dedicated time for learning new skills. What I've learned has been random, from programming languages to how to build a tiny house. I've found a lot of joy in learning new skills and slowly becoming a generalist.

Over the last year, I’ve found you can optimize your "learning time" by thinking through the process of learning before you start. In my experience, picking a learning project, and creating a "learning log" for each skill is hugely helpful.

Identify a Learning Project

Learning in a vacuum doesn’t work for me.

I love reading fiction, but reading about a topic that I have no immediate need to understand makes it much harder to comprehend. When I'm motivated by a problem I'm trying to solve, I can plow through books and other information quickly. Without an immediate need, I'll read the same page many times or fall asleep with the book in my hand.

In other words, learning something Just in Case doesn’t work for me. It has to be Just in Time.

This is why a 'learning project' is really important: a small, useful, and preferably time-bound project that requires new skills to complete. The project is a forcing function for learning new skills. You want a project where leaving it half-done would be painful.

For example, when our second daughter was born, I knew she would need the room in our house that I was using as an office (I work remotely). I could move into a room in our basement, but I loved having a large window in the room and for some reason, I didn’t want to work in a basement. So, I decided to build a tiny house to work in.

I’d never built any physical thing in my life before.

I knew I'd lose motivation once I started it (especially as the Colorado summer heat ramped up), so I ordered a massive truckload of wood, dumped it in my driveway, and built the initial foundation. I knew our new daughter would need my room at the end of the summer, and it would become too cold to make real progress by October.

These factors created enough motivation to force me to finish the project when I didn’t want to. I’m glad I did! By building a mini house I learned most of the handyman skills I’ve been wanting to learn for years—the perfect learning project.

Before jumping into learning something new, take some time picking your learning project.

For instance, let’s say you wanted to learn software programming. You could take a bunch of online courses or start reading random tutorials online. You could spend a bunch of money on a coding bootcamp, or join something like Lambda School.

However, you could also find a simple job on Upwork that feels small enough for you to figure out. This provides context and a specific application for your learnings, plus the extrinsic motivation to finish the work (there's someone on the internet trusting you to get this thing done for their business).

Structure Your Learning

After you’ve picked a project, I’ve found it to be helpful to structure your learning process by asking some questions (here’s a post that roughly follows this structure):

- What's your learning project? Example: build a tiny house or automatically mark RSS articles as read.
- What does success look like? This prevents you from following rabbit holes and forces you to finish the project. Example: build an insulated tiny house (not painted, not drywalled), or a script which marks articles more than two weeks old as read.
- What 'open questions' do you have? What are the gaps in your knowledge that would prevent you from completing the project? Write these down at the top of the document.
- What tools are you missing? This won't be apparent to you at the outset, but as you start learning you'll find friction in your process that you'll want to eliminate. For instance, I found that the hammer I had was hard to use. I noted this down and found that $10 bought me a much better hammer. Or, in the context of programming, your IDE autocomplete may not be working in the language you are learning.
- What are some of the top books, tutorials, YouTube channels, etc. that align most closely with what you are trying to do?
- What completed pieces of work are similar to what you are trying to do? For digital projects, this could be open-source projects or raw asset files for a media project.
- Is there a community (online or otherwise) around the thing you are learning? Documenting the places where friendly people on the internet who are obsessed with what you are learning hang out is super helpful. You'll remember to ask them a question when you get stuck!

With this information in place, I start working on the project. As questions come to mind I write them down in a "learning log"—bullets in a document. If there’s a large piece of knowledge or tool that’s missing I’ll add it to the top of the document and handle it later.

I've found this live-blogging-style learning log to be helpful, even if no one reads it. Writing down questions and problems as they come to mind while I'm learning forces me to clarify and refine my thinking. This often helps me solve a problem quickly. Writing down the question helps prompt my mind to provide better & unique answers.

As a meta-point, writing down this little guide helped me better structure my learning process for my next project!


Learning Clojure by Automating an RSS Reader

I've been working on revamping how I consume information. Most of my information consumption has been moved to RSS feeds, but I can't keep up with the number of articles in my feeds. When I take a look at my reader I tend to get overwhelmed and spend more time than I'd like trying to "catch up" on information I was generally consuming out of curiosity.

Not good.

I want articles to be automatically marked as read after they are a month old to eliminate the feeling of being "behind". This is a perfect little project to learn a programming language that’s looked interesting for a while!

Building a small project in a new language or technology is the best way to learn. While I was building this tool, I documented what questions I was asking, answers to these questions, and what articles and resources I found helpful.

Posts like this have been interesting to me, hopefully this is a fun read for others!

What do I want to build?

I want to build a Clojure script for FeedBin that will:

- Inspect all unread articles
- If the publish date is more than two weeks in the past, mark the article as read
- Automatically run every day

Let’s get started!

Resources

Here are some helpful blogs & tutorials I used while learning:

http://slipset.github.io/posts/Why-Clojure-is-my-favourite-language https://ltriant.github.io/2019/08/13/clojure-learning-functional-design.html https://learnxinyminutes.com/docs/clojure/ https://eli.thegreenplace.net/2017/notes-on-debugging-clojure-code/ https://clojure.org/guides/getting_started

Also, I always try to grab a couple of large open-source repos to look at when I’m learning a new language. Here are some places I searched:

https://github.com/trending/clojure https://clojars.org http://open-source.braveclojure.com

Some repos I found interesting:

https://github.com/metabase/metabase This is probably the largest full-blown open-source Clojure application out there. Most other projects I found were libraries, not applications.
https://github.com/LightTable/LightTable
https://github.com/clojars/clojars-web
https://github.com/dakrone/clj-http

Syntax & Structure

Now that I have some browser tabs open with documentation, let’s start learning!

- How do I install this thing? https://clojure.org/guides/getting_started => brew install clojure/tools/clojure
- Going through the "Learn X in Y" guide, some interesting takeaways:
- Clojure is built on the JVM and uses Java classes for things like arrays.
- Code in Clojure is essentially a list-of-lists. A list is how you execute code: the first element is the method name, followed by arguments separated by spaces. This feels very weird at first, but it's a really powerful concept. Simple Made Easy explains the philosophy behind this a bit.
- "Quoting" (prefacing a list with a single quote) prevents the list from executing. This is helpful for defining a list, or for passing code around as a data structure that can be manipulated later on.
- Sequences (arrays/lists) seem to have some important differences from vectors. I need to understand this a bit more.
- When you define a function it doesn't get a name. You need to assign it (def) to a variable to give it a name. The [] in a function definition is the list of arguments. There are lots of ways to create functions: fn, defn, def ... #() (see the toy snippet after this list).
- Multi-variadic function is a new term for me! It's a function with a variable number of arguments. Looks like you can define different execution paths depending on the arguments, kind of like Elixir's pattern matching.
- [& args] is equivalent to (*args) in Ruby.
- The beginner (me!) can treat ArrayMap and HashMap as the same.
- Keywords == Ruby symbols.
- The language looks to execute from the inside out, and composition of functions is done via spaces, not commas, parens, etc.
- Looks like everything is immutable in Clojure. Everything is a function; so much so that even basic control flow is managed the same way as a standard function. Looks like "STM" is an escape hatch if you need to store state. Similar to Elixir's process state.
- The Clojure community is big on "REPL driven development", but what exactly do they mean? How is that different from binding.pry in a Ruby process to play around with code? Looks like it's not that different. Some nice editor integrations make things a bit cleaner, but it's more or less the same as opening up rails console with pry enabled.
- I've always disliked the ability to alias a module or function to a custom name. It makes it much harder for newcomers to the codebase to navigate what is going on. Looks like this is a pretty common pattern in Clojure; the require at the top of a file can set up custom aliases for all functions.
- "Forms" have been mentioned a couple of times, but I still don't get it. What is a form?
- I've heard that Clojure is a Lisp. What is a "lisp"? https://en.wikipedia.org/wiki/Lisp_(programming_language) There was an original LISP programming language, but "a lisp" is a language patterned after the original LISP. Seems like the unique property of a lisp-style language is that code is essentially a linked list data structure. Since all code is a data structure, you can define really interesting macros to modify your source code. Another property is the parentheses-based syntax.
- It's interesting to look at the different lisp styles available. I feel like the only lisp that is popular today is Clojure. Sounds like immutability is unique to Clojure and isn't a core feature of other lisps.
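A toy snippet, not from the project, illustrating several of the notes above side by side:

```clojure
;; toy example -- none of these names come from the actual feedbin script
(def feed-name "hacker-news")           ; def binds a value to a name
(defn old? [days] (> days 14))          ; defn defines a named function

'(mark-as-read 123)                     ; quoting: this is data, not a function call

(let [ids     [1 2 3]                   ; let creates temporary, scoped bindings
      doubled (map #(* 2 %) ids)]       ; #() is shorthand for an anonymous function
  doubled)                              ; => (2 4 6); ids itself is never mutated
```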

I think I know just enough to start coding.

Coding in Clojure

Here’s the learning process which generated the final source code:

- Let's define the namespace and get a "Hello World" to make sure I have the runtime executing locally without an issue. 184408626bb41b87d53f9b0bb5485a8e9201d8d5
- Ok, now let's outline the logic we'll need to implement. 7e018b05ff8ad925ef2bfe9c56c4a702dce4c3d0
- Now, let's pick an HTTP library and figure out how to add it as a dependency. https://clojars.org looks like the most popular package repository. It doesn't seem like there's any download/popularity indicator that you can sort by. Bummer. Hard to figure out what sort of HTTP library I should use.
- Looks like project.clj is a gemspec-type definition file. Metabase's HTTP library is clj-http. Let's use that. We'll also need to figure out how to set up this dependency file. https://github.com/metabase/metabase/blob/master/project.clj#L63
- https://github.com/technomancy/leiningen is linked in the project.clj files I've seen. It's listed as a dependency manager on the clj-http library: https://clojars.org/clj-http. Let's install it via brew install leiningen.
- lein new feedbin and mv ./feedbin./ ./ to set up the project structure. Looks like lein will help us with dependencies and deployment. b0b4022618abac840af6679f900584d04de510c1
- There's this skip-aot thing in the main: definition which I don't understand. In any case, if I stuff a defn -main in the file for the namespace defined in main, lein run works! 764d7a1e2a537d61b036df4229a2c96671725dd8
- It looks like this ^: syntax is used often. What is it?
- Ok, let's copy our logic outline from the other file we were working on over to src/feedbin/core.clj and try to add our HTTP dependency. Added [clj-http "3.10.0"] to the dependency list in project.clj; lein run seemed to pull down a bunch of files and run successfully.
- Now, let's pull the FeedBin variable from the ENV and store it to a var. Looks like you have to wrap let in parens and include the commands that rely on the var within the scope of the parens. I could see how this would force you to keep methods relatively short. 6f1f8099ffd0ed5f997be93685d18d1c574efb6b
- Let's hit the API, get all unread entries, and store them in a var. Looks like cheshire is a popular JSON decoder; let's use that. (A rough sketch of this request flow follows this list.)
- It looks like let is only for when you want temporary bindings within a specific scope. Otherwise, you should use def to set up a variable. 5b63cd289052d9fcebec2cb2965d598927b0616a
- Convention is - for word separation, not _ or camel case. Let's refactor the getenv to use def. Much better! a6a95a1e4703c07e76ecce32b56b6b0f1903acca
- Time to select entries that are two months old. A debugger is going to be helpful here to poke at the API responses. Looks like clj-debugger is the pry equivalent. I had trouble getting this to work and deep-dived on this a bit:
  - (pst) displays the stacktrace associated with the last exception. This is not dependent on clj-debugger.
  - Looking closer at clj-debugger, it has ~no documentation and hasn't been updated in nearly two years. Is there a better option? Doesn't look like it.
  - (require 'feedbin.core :reload-all) seems like the best way to hot reload the code in a REPL. Then you can rerun (feedbin.core/-main).
  - Ah, figured it out! (break) on its own won't do anything. It needs an input to debug. (break true) works. You need to run this in lein repl for it to work.
- As a side note, I've found the REPL/debugging aspect of learning a new programming language to be really important. Languages that don't have great tooling and accessible documentation around this make it much harder for newcomers to come up to speed. The REPL feedback loop is just so much faster, and in developer tooling, speed matters.
- I was able to extract the published date; now I just need to do some date comparison to figure out which entries are over a month old. ca16f54f66a39753933168c3f8deac636144ca47
- Now to mark the entries as "read" (in Feedbin this is deleting the entries). Should be able to just iterate through the ID list and POST to the delete endpoint. I started running into rate limiting errors as I was testing this.
- # turns a string into a regex, but appears to do much more. Looks like it's shorthand for creating a lambda. https://clojure.org/guides/weird_characters
- macroexpand is an interesting command to help with debugging.
- With the rate limit errors gone, I can finally get this working for good. I tried passing in the article IDs as a comma-separated list in the query string and it didn't work; I need to send this data in as a JSON blob. 166ea49439ed690ff08c8fd987530b170b9bb80e
- Got the delete call working. You can pass a hash directly to clj-http and it'll convert it into JSON. Nice. 63ac8bf1d4fd969326fffa9ad7b50ad1f0a4b56d
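To make the request flow concrete, here's a rough sketch of the unread-fetch and mark-as-read calls using clj-http and cheshire. This is my reconstruction, not the project's code: the Feedbin endpoint paths, JSON field names, and environment variable names are assumptions; see the commits referenced above for the real implementation.

```clojure
(ns feedbin.sketch
  (:require [clj-http.client :as http]
            [cheshire.core :as json]))

;; assumed env var names; the real script pulls its credentials from the ENV too
(def feedbin-auth [(System/getenv "FEEDBIN_USERNAME")
                   (System/getenv "FEEDBIN_PASSWORD")])

(defn unread-entry-ids []
  ;; assumed endpoint: returns a JSON array of unread entry IDs
  (json/parse-string
    (:body (http/get "https://api.feedbin.com/v2/unread_entries.json"
                     {:basic-auth feedbin-auth}))))

(defn mark-as-read! [ids]
  ;; "marking as read" is a DELETE with a JSON body of entry IDs
  (http/delete "https://api.feedbin.com/v2/unread_entries.json"
               {:basic-auth   feedbin-auth
                :content-type :json
                :body         (json/generate-string {:unread_entries ids})}))
```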

Great! We have the script working. Now, let’s deploy it.

Clojure Deployment Using AWS Serverless

I have a friend who is constantly talking about how awesome serverless is (i.e. AWS Lambda). I also remember hearing that you can set up cron-like jobs in AWS that hit a lambda. Let's see if that's the case and if we can get this script working on Lambda.

Some things we’ll need to figure out:

How/where do I specify that an endpoint should be hit every X hours? How do I specify where the entrypoint is for the lambda function? How do we specify environment variables?

Notes

- I jumped into the AWS Lambda dashboard and created a function named "Mark-Feedbin-Entries-As-Read" with Java 11. It looks like the crazy AWS permission structure is generated for me.
- I added the com.amazonaws/aws-lambda-java-core package, and it looks like I need to use gen-class to expose my handler. What is gen-class? It generates a .class file when compiling, which I vaguely remember is a file which is bundled into the .jar executable. Looks like AOT compilation needs to be enabled as well. Still need to understand what AOT is. (A sketch of the handler wiring follows this list.)
- I ran lein uberjar and specified feedbin.core::handler as my handler. Created a test event with "testing" as the input. Used the -standalone jar version that was generated.
- Looks like environment variables can be set up directly in the Lambda GUI.
- "Cron jobs" are set up via CloudWatch Events. What is CloudWatch? It's AWS's monitoring stack. Strange that this is the recommended way to set up cron jobs; I would have thought there was a dedicated service for recurring job schedules.
- "Serverless" (looks like a CDK-like YML configuration syntax for AWS serverless) makes it look easy to deploy a lambda which executes on a schedule, but the blog post doesn't indicate how it's actually managed in AWS.
- Aside: it's interesting that the more you dig into AWS, the more it feels like a programming language. Each of the services is a library, and the interface to configure them is YAML.
- It looks like "Amazon EventBridge" is the new "CloudWatch Events". Looks like we can set up a rule which triggers a lambda function at a particular rate. Neat, you can set up a rule directly within the AWS Lambda GUI. Use an EventBridge trigger with rate(1 day) to trigger the function every day. Really easy!
- I checked on it the next day and it's failing. How can we inspect the request? It's probably failing due to the input data being some sort of JSON object vs the simple string that I tested with. Here's what I found: you can inspect the logs, use CloudTrail to view an event, enable X-Ray tracing, and send failed events to a dead letter queue. I enabled all of this stuff; my end goal was to inspect the event JSON passed to the lambda to determine how to fix it.
- Ah! After a bit more digging, if you find the event in CloudTrail there's a "View event" button that will give you the JSON output. I can then copy the JSON into the test event in the configuration for the lambda and run it there to get helpful debugging information. Feels a bit primitive, but it works. I wonder how you would run the function locally and write integration tests using example AWS JSON?
- Looks like the function signature for my handler is incorrect. When handling events, the handler accepts two arguments [Object com.amazonaws.services.lambda.runtime.Context]. This fixed the issue! 8520e8a319bd5d41a67a01f9517ce4cf559ab381
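For reference, here's a minimal sketch of what that gen-class handler wiring might look like. This is a reconstruction rather than the project's actual code (see the commit referenced above for that):

```clojure
(ns feedbin.core
  ;; exposes a static feedbin.core::handler method for AWS Lambda to call;
  ;; remember to enable AOT compilation in project.clj so the .class file is generated
  (:gen-class
    :methods [^:static [handler [Object com.amazonaws.services.lambda.runtime.Context] void]]))

(defn -main [& args]
  ;; the actual mark-entries-as-read logic lives here
  (println "marking old entries as read"))

(defn -handler
  "Entrypoint invoked by AWS Lambda with the event payload and the Lambda context."
  [event context]
  (-main))
```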

Resources:

https://bernhardwenzel.com/articles/using-clojure-with-aws-lambda/
https://aws.amazon.com/blogs/compute/clojure/
https://thenewstack.io/move-your-cron-jobs-to-serverless-in-3-steps/
https://serverless.com/blog/cron-jobs-on-aws/
https://docs.aws.amazon.com/lambda/latest/dg/with-scheduledevents-example-use-app-spec.html
https://lumigo.io/blog/eventbridge-vs-cloudwatch-events-kinesis-and-sns/
https://docs.aws.amazon.com/eventbridge/latest/userguide/run-lambda-schedule.html
https://d0nkrs.com/post/building-aws-lambda-functions-with-clojure
https://github.com/aws/aws-cdk
https://github.com/jebberjeb/lambda-sample

Open Questions

Here’s a list of questions that I wasn’t able to answer during my learning process:

- How can you parallelize operations in Clojure?
- How easy is deployment?
- How does interop with Java work?
- Is there a rails-like web stack?
- Is there a style guide?
