Building a Crypto Index Bot and Learning Python

A long time ago, I was contracted to build a macOS application using PyObjC. It was a neat little app that controlled the background music at high-end bars around London. That was the last time I used python (early 2.0 days, if I remember correctly). Since then, python has become the language of choice for ML/AI/data science and has grown to be the 2nd most popular language.

I've been wanting to brush up on my python knowledge and explore the language and community. Building a bot to 'buy the index' of cryptocurrency was the perfect learning project, especially since there was a bunch of existing code on GitHub doing similar things.

You can view the final crypto index bot project here. The notes from this learning project are below. These are mainly written for me to map my knowledge in other languages to python. Hopefully, it's also helpful for others looking to get started quickly in the language!

Tooling & Package Management

I work primarily in ruby (and still enjoy the language after years of writing professionally in it). Some of the comparisons below are to the equivalent tooling in ruby-land.

  • pip == bundle
  • Package versions are specified in a requirements.txt file if you are using pip.
  • https://rubygems.org/ = https://pypi.org/
  • There's not really a rake equivalent that's been adopted by the community.
  • Poetry is an alternative to pip that seems to be the most popular choice for new projects.
  • virtualenv = rbenv, but just for packages, not for the core python version, and is specific to each project. Poetry will autogen a virtualenv for you.
  • There are dev and non-dev categories in poetry, but not a test category by default. Here's how to add a dev dependency: poetry add -D pytest
  • If you are using the VS Code terminal, certain extensions will automatically source your virtualenv. I found this annoying and disabled this extension (can't remember which extension was causing me issues).
  • pyproject.toml is an alternative to requirements.txt, but also includes gemspec-like metadata about the package. It looks like poetry update consumes the .toml file and generates a poetry.lock. I'm guessing that other build tools also consume the .toml config and it's not used just for poetry.
  • The python community seems to be into toml configuration. This is used for poetry package specifications and project-specific variables. I don't get it: it's slightly nicer looking than JSON, and while it does support arrays and nested tables, the syntax for deeply nested data gets awkward. Why not just use yaml instead? Or just keep it simple and use JSON?
  • I ran into this issue where poetry was using the global ~/Library/Caches/pypoetry cache directory and I thought this was causing some package installation issues. I don't think that ended up being the issue.
    • poetry debug
    • poetry config -vvv to see what configuration files are being loaded
    • poetry config --list indicated that a global cache directory was being used.
    • Tried upgrading pip, didn't work: python3 -m pip install --upgrade pip
    • I can't remember how I fixed the issue, but these commands were helpful in understanding where poetry puts its various files.
  • If you want to hack on a package locally and use it in your project:
    • vcrpy = { path = "/full/path/to/project", develop = true } in your toml file
    • Note that you cannot use ~ in the path definition
    • After adding this to your pyproject.toml run poetry lock && poetry install
    • This will be easier in poetry 1.2
    • Want to make sure your project is pulling from your locally defined project? You can inspect the path that a module was pulled from via packagename.__file__ i.e. import vcr; print(vcr.__file__)
    • I had trouble with a corrupted poetry env and had to run poetry env use python to pick up my local package definition.
  • Working on a project not using poetry?
    • Create a venv: python -m venv venv && source ./venv/bin/activate
    • If there's a setup.py then run python setup.py install
    • However, you can't install 'extra' dependencies (like development/testing) via setup.py install. It looks like pip install -e '.[dev]' is the way to do that, assuming the project defines a dev extra.
    • It sounds like setup.py and requirements.txt often do not define dev dependencies. You'll probably need to install these manually. Look at the CI definition in the project to determine what dev dependencies need to be installed.
  • There's a .spec file that seems to be used with pyinstaller, a python package, when packaging a python application for distribution. Pyinstaller is primarily aimed at distributing packages for execution locally on someone's computer. This use-case is one of the areas where python shines: there's decent tooling for building a multi-platform desktop application.
  • You'll see readme-like documents written in rst (reStructuredText) instead of md. I have no idea why markdown isn't just used.
  • A 'wheel' is a pre-built package bundle that can be architecture-specific and contain compiled binaries. This is helpful if a python package contains non-python code that needs to be compiled since it eliminates the compile step and reduces the chance of any library compatibility issues (this is a major problem in PHP-land).
  • black looks like the most popular python code formatter.
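
As a quick sanity check for the local-package and virtualenv notes above, here's a minimal sketch. The helper names module_location and in_virtualenv are made up for this example, and json is just standing in for whatever package you're hacking on (vcr in my case).

```python
import json
import sys
from pathlib import Path


def module_location(module) -> Path:
    """Return the file a module was loaded from (hypothetical helper)."""
    return Path(module.__file__).resolve()


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(module_location(json))
    print("inside a virtualenv:", in_virtualenv())
```

If the printed path isn't inside your local checkout, the project is still importing the published copy of the package.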

Language

  • Multiline strings (""") at the beginning of a class or function definition aren't just a python idiom. They are 'docstrings' and get automatically pulled into the autogen'd python documentation.
  • Similar to ruby, CamelCase (with a leading capital) is used for class names, and snake_case is used for function/variable names.
  • Calling a function requires parens, unlike ruby or elixir.
  • Like javascript, return values need to explicitly be defined by return val.
  • Conditionals do not return values, which means you need to assign variables inside the block (unlike the ability to assign a variable to the return value of a block in ruby, a feature that I love).
  • Each folder in a python project is transformed into a package that you can import. The __init__.py file in the folder is automatically imported when you import the folder name.
  • Imports have to be explicitly defined, like javascript, to use any functions outside the set of global/built-in functions.
  • Speaking of built-in functions, python provides a pretty random group of global functions available to you without any imports. I found this confusing: round() is a built-in but ceil() is not.
  • When you import with a leading . it's a relative import: python looks in the current package for matching imports.
  • Import everything in package with from math import *. This is not good practice, but helpful for debugging/hacking.
  • Although you can import individual functions from a package, this is not good practice. Import modules or classes, not individual functions.
  • You have to from package.path import ClassName to pull a classname from a module. You can't import package.path.ClassName
  • None is nil and capitalization matters
  • True and False are the bool values; capitalization matters.
  • Hashes are called dicts in python
  • Arrays are called lists in python
  • You can check the existence of an element in a list with element in list. Super handy!
  • Triple-quoted strings are like heredocs in other languages. They can be used for long comments or multi-line strings.
  • Substring extraction ranges are specified by the_string[0:-1]. If you omit a starting range, 0 is used: the_string[:-1].
  • The traditional boolean operators && and || aren't used. Natural language and and or are what you use instead.
  • Keyword arguments are grouped together using **kwargs in the method definition.
  • You can splat a dict into keyword arguments using function_call(**dict)
  • By default, any argument can be passed by keyword in python. More info.
  • You can lazy-evaluate a comprehension using () instead of []
  • When playing with comprehensions inside of an ipython session opened from a breakpoint(), variable scoping will not act the same as it does in normally executed code. I don't understand the reasons for this, but beware!
  • In addition to list comprehensions, there are dictionary comprehensions. Use {...} for these.
  • When logic gets complex for a list comprehension, you'll need to use a for loop instead (even if you want to do basic log debugging within a comprehension). I miss ruby's multi-line blocks and chained maps.
  • List comprehensions are neat, but there doesn't seem to be a way to do complex data transformations cleanly. I hate having to define an array, append to it, and then return it. The filter/map/etc functions can't be easily chained like ruby or javascript. I wonder what I'm missing here? I've heard of pandas/numpy, maybe this is what those libraries solve?
  • There are strange gaps in the stdlib, especially around manipulating data structures. For instance, there's no dead-simple way to flatten an array-of-arrays. The best I found: import operator; from functools import reduce; reduce(operator.concat, array_of_arrays)
  • Similarly, there's no dedicated method for getting unique values from a list. list(set(the_list)) works but loses ordering; list(dict.fromkeys(the_list)) keeps it.
  • Get all of the string values of an enum: [choice.value for choice in MarketIndexStrategy]
  • By subclassing str and enum.Enum (ex: class MarketIndexStrategy(str, enum.Enum):) you can use == to compare strings to enums.
  • There's no ? ternary operator, instead you can do a one-liner if-else: assignment = result if condition else alternative
  • To enable string interpolation that references variable names you need to use f"string {variable}". Otherwise you'll need to run format on the string to get it interpolated: "string {}".format(variable)
  • Python has built-in tuples (1, 2, 3). I've always found it annoying when languages just have arrays and don't support tuples.
  • Unlike ruby, not all code has a return value. You have to explicitly return from a function and you can't assign the result of a code block to a variable.
  • There's some really neat python packages: natural language processing, pandas, numpy. Python has gained a lot of traction in the deep learning/AI space because of the high-quality packages available.
  • is is NOT the same as ==. is tests if the variable references the same object, not if the objects are equal in value
  • You can't do an inline try/catch. Many bad patterns that ruby and other languages really shouldn't let you do are blocked. In a lot of ways, python is a simpler language that forces you to be more explicit and write simpler code. I like this aspect of the language a lot.
  • Sets are denoted with {1, 2, 3}, the same braces used for dictionaries/hashes. A bare {} is an empty dict; use set() for an empty set.
  • Here's how decorators work:
    • The @decorator on top of a method is like an elixir macro or ruby metaprogramming. It transforms the method beneath the decorator.
    • The @ syntax ("pie" operator) calls the decorator function, passing the function below the decorator as an argument to the decorator function, and reassigning the passed function to the transformed function definition. The decorator function must return a function.
    • There is no special syntax to designate a function as a 'decorator function'. As long as it accepts a function as an argument and returns a function, it can be used as a decorator.
  • Referencing an unspecified key in a dict raises an exception. You need to specify a default: h.get(key, None) to safely grab a value from a dict.
  • An empty array will evaluate to false. You don't need to if len(l) == 0:. Instead you can if not l:.
  • Same goes with empty dicts and sets.
  • Lambdas can only contain a single expression. This is a bummer, and forces you to write code in a different style.
  • := allows you to assign and test a value within a conditional. Interesting that there's a completely separate syntax for 'assign & test'.
  • __init__.py in a folder defines what happens when you import a folder reference.
  • Here's how classes work:
    • class newClass(superClass): for defining a new class
    • __init__ is the magic initialization method
    • self.i_var within __init__ defines a new instance variable for a class. This is a good breakdown of instance and class variables.
    • You can execute code within a class outside of a method definition for class-level variables and logic.
    • New instances of a class are created via newClass().
    • Instance methods of a class are always passed self as the first argument
    • Class variables are available on the instance as well, which is a bit strange.
    • You can use class variables as default values for instance variables. This doesn't seem like a great idea.
    • newClass.__dict__ will give you a breakdown of everything on the class. Kind of like prototype in javascript.
    • Python has multiple inheritance. class newClass(superClass1, superClass2). Inherited classes are searched left-to-right.
    • There are no private variables built into the language, but the convention for indicating a variable is private is using a _ like self._private = value
  • There's a javascript-like async/await pattern (coroutines). I didn't dig into it, but seems very similar to Javascript's pattern.
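
To make the comprehension and stdlib-gap notes above more concrete, here's a small sketch. The sample data is made up, and itertools.chain.from_iterable is simply one stdlib option for flattening, not necessarily the nicest one.

```python
from itertools import chain

array_of_arrays = [[1, 2], [3], [2, 4]]

# List comprehension with nesting and a filter: squares of the even numbers.
even_squares = [n * n for row in array_of_arrays for n in row if n % 2 == 0]

# Generator expression: same syntax with (), but nothing is computed
# until something iterates over it.
lazy_squares = (n * n for n in range(5))

# Dict comprehension: uses {...} with a key: value pair.
name_lengths = {name: len(name) for name in ["btc", "ether"]}

# Flatten one level of nesting without reduce().
flat = list(chain.from_iterable(array_of_arrays))

# Unique values, keeping first-seen order (dicts preserve insertion order).
unique = list(dict.fromkeys(flat))

if __name__ == "__main__":
    print(even_squares, name_lengths, flat, unique, sum(lazy_squares))
```

Once the logic needs more than a filter and a transform, a plain for loop is still the more readable choice.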
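
Several of the smaller idioms from the list above fit in one short sketch. The buy function, the enum member names, and the sample data are all invented for illustration; only the MarketIndexStrategy class name comes from my notes.

```python
import enum


class MarketIndexStrategy(str, enum.Enum):
    # Member names are made up for this example.
    MARKET_CAP = "market_cap"
    EQUAL_WEIGHT = "equal_weight"


def buy(symbol, amount=0, **kwargs):
    """Extra keyword arguments are collected into the kwargs dict."""
    return symbol, amount, kwargs


holdings = []
# One-line if/else, relying on an empty list being falsy.
status = "empty" if not holdings else "has holdings"

prices = {"BTC": 100.0}
eth_price = prices.get("ETH")  # None instead of a KeyError

order = {"symbol": "BTC", "amount": 2, "dry_run": True}
result = buy(**order)  # splat a dict into keyword arguments

# := assigns and tests in the same expression.
if (count := len(order)) > 2:
    message = f"{count} keys"

first = [1, 2]
second = [1, 2]
same_value = first == second   # equal contents
same_object = first is second  # two distinct list objects

strategy_values = [choice.value for choice in MarketIndexStrategy]
matches_string = MarketIndexStrategy.MARKET_CAP == "market_cap"
```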
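
The decorator notes above are easier to follow with a tiny example. This is a minimal sketch: log_calls, add, and multiply are hypothetical names, and functools.wraps is only there so the wrapped function keeps its original name and docstring.

```python
import functools


def log_calls(func):
    """A plain function that takes a function and returns a function."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Record the call, then delegate to the original function.
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = []
    return wrapper


@log_calls
def add(a, b=0):
    return a + b


def multiply(a, b):
    return a * b


# What the @ syntax does under the hood: call the decorator and
# rebind the name to whatever it returns.
multiply = log_calls(multiply)

if __name__ == "__main__":
    print(add(1, b=2), add.calls)
    print(multiply(3, 4), multiply.calls)
```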
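
Here's a rough sketch of the class behavior described above: class vs. instance variables, the _private convention, and left-to-right lookup with multiple inheritance. All of the class names are invented for the example.

```python
class Exchange:
    # Class variable: shared by every instance unless shadowed.
    fee = 0.001

    def __init__(self, name):
        self.name = name    # instance variable
        self._orders = []   # "private" by convention only

    def describe(self):
        return f"{self.name} charges {self.fee:.1%}"


class Loggable:
    def describe(self):
        return "loggable"


class LoggedExchange(Exchange, Loggable):
    """Parent classes are searched left-to-right, so Exchange wins."""


alpha = Exchange("alpha")
beta = Exchange("beta")
# Assigning through the instance creates an instance attribute on beta;
# the class variable and alpha are untouched.
beta.fee = 0.002

if __name__ == "__main__":
    print(alpha.describe(), "|", beta.describe())
    print("fee" in alpha.__dict__, "fee" in beta.__dict__)
    print([cls.__name__ for cls in LoggedExchange.__mro__])
```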

Debugging & Hacking

One of the important aspects of a language for me is the REPL and tinkering/hacking environment. If I can't open up a REPL and interactively write/debug code, I'm a much slower developer. Thus far, ruby has the best interactive development environment that I've encountered:

  • binding.pry to open a repl, and binding.remote_pry when your console isn't running your code directly
  • Automatic breakpoints on unhandled exceptions, in tests or when running the application locally
  • Display code context in terminal when a breakpoint is hit
  • Print and inspect local variables within a breakpoint
  • Navigate up and down the callstack and inspect variables and state within each frame
  • Overwrite/monkeypatch existing runtime code and rerun it with the new implementation within a repl
  • Define new functions within the repl
  • Inspect function implementation within the repl

I'd say that python is the first language that matches ruby's debugging/hacking environment that I've used. It's great, and better than ruby in many ways.

  • inspect is a very helpful stdlib package for poking at an object in a repl and figuring out the method, variables, etc available to it.
  • traceback provides some great tools for inspecting the current stack.
  • How do you drop into an interactive console at any point in your code? There are a couple of ways:
    • import ipdb; ipdb.set_trace() uses the ipython enhanced repl in combination with the built-in debugger. Requires you to install a separate package.
    • There's a breakpoint() builtin that launches the standard pdb debugger. You can configure breakpoint() to use ipdb via export PYTHONBREAKPOINT=ipdb.set_trace.
    • All of the standard pdb functions work with ipdb
    • import code; code.interact(local=dict(globals(), **locals())) can be used without any additional packages installed.
  • bpython is a great improvement to the default python repl. You need to install this within your venv, otherwise the packages within your project's venv won't be available to it: pip install bpython && asdf reshim
  • ipython is a bpython alternative that looks to be better maintained and integrates directly with ipdb.
  • python -m ipdb script.py to automatically open up ipython when an exception is raised when running script.py
  • Some misc ipython tips and tricks:
    • If something is throwing an exception and you want to debug it: from ipdb import launch_ipdb_on_exception; with launch_ipdb_on_exception(): thing_causing_exception()
    • who / whos like whereami
    • %psource or source like show-source
    • pp to pretty print an object
    • ipython --pdb script.py to break on unhandled exceptions
    • Great grab bag of interesting tips
    • %quickref for detailed help
    • exit gets you out of the repl entirely
  • All of the pypi information is pulled from a PKG-INFO file in the root of a package
  • rich-powered tracebacks are neat, especially with locals=True
  • The ruby-like metaprogramming/monkeypatching stuff happens via the __*__ functions which are mostly contained within the base object definitions. For instance, logging.__getattribute__('WARN') is equivalent to logging.WARN
  • You can reload code in a REPL via from importlib import reload; reload(module_name). Super helpful for hacking on a module (definitely not as nice as Elixir's recompile).
  • Monkeypatching in python isn't as clean as ruby, which in some ways is better since monkeypatching is really an antipattern and shouldn't be used often. Making it harder and more ugly helps to dissuade folks from using it. To monkeypatch, you reassign the function/method to another method: ClassName.method_name = new_method. Here's an example.
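
Here's a minimal sketch of the monkeypatching and inspect notes above. PriceFeed and fake_latest are hypothetical stand-ins, not code from the bot.

```python
import inspect


class PriceFeed:
    """Stand-in class for this example."""

    def latest(self, symbol):
        raise RuntimeError("no network access in this sketch")


def fake_latest(self, symbol):
    """Replacement method that returns a canned price."""
    return {"BTC": 100.0}.get(symbol, 0.0)


# Keep a reference to the original so it can be restored later.
original_latest = PriceFeed.latest

# Monkeypatch: reassign the method on the class itself.
PriceFeed.latest = fake_latest
patched_price = PriceFeed().latest("BTC")

# inspect helps when poking at an object in a repl.
signature = inspect.signature(PriceFeed.latest)
function_names = [
    name for name, _ in inspect.getmembers(PriceFeed, inspect.isfunction)
]

# Undo the patch so the rest of the session sees the real method.
PriceFeed.latest = original_latest
```

Because the patch is applied on the class, instances created before the reassignment pick up the new method too.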

Typing

I've become a huge fan of gradual types in dynamic languages. I never use them right away, but once the code hardens and I'm relatively sure I won't need to iterate on the code design, I add some types in to improve self-documentation and make it safer to refactor in the future.

Python has a great gradual type system built-in. Way better than Ruby's.

  • mypy . on the command line to test all python files within a folder.
  • If your project fails to pass mypy, it won't cause any runtime errors by default.
  • There's a VS Code extension. This extension is included in Pylance, which you should probably be using instead, but you need to set the typing mode to 'basic'.
  • Return value types are set with -> before the : at the end of the method definition. Otherwise, typing works very similar to other languages with gradual typing (TypeScript, Ruby, etc).
  • A common pattern is importing the typing module via import typing as t
  • t.Union[str, float] for union ('any of these') types.
  • You can't merge dictionaries if you are using a TypedDict (dict | dict_to_merge). Massive PITA when mutating API data.
  • Verbose types can be assigned to a variable, and that variable can be used in type definitions. Handy way to make your code a bit cleaner.
  • Enums defined with enum.Enum can be types.
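
A short sketch of the typing notes above: type aliases, return annotations, an enum used as a type, and the fact that annotations aren't enforced at runtime. The function names, alias names, and enum members are all made up for illustration.

```python
import enum
import typing as t

# Verbose types assigned to variables and reused in annotations.
Price = t.Union[int, float]
Holdings = t.Dict[str, Price]


class MarketIndexStrategy(str, enum.Enum):
    # Member names are hypothetical.
    MARKET_CAP = "market_cap"
    EQUAL_WEIGHT = "equal_weight"


def portfolio_value(holdings: Holdings, prices: Holdings) -> float:
    """Sum of amount * price for every held symbol."""
    return float(sum(amount * prices[symbol] for symbol, amount in holdings.items()))


def describe(strategy: MarketIndexStrategy, value: Price) -> str:
    """An enum class works as a type annotation like any other class."""
    return f"{strategy.value}: {value:.2f}"


def label(symbol: str) -> str:
    return f"<{symbol}>"


# mypy reports this call as an error, but python itself runs it happily.
untyped_result = label(42)

if __name__ == "__main__":
    value = portfolio_value({"BTC": 2, "ETH": 1.5}, {"BTC": 100, "ETH": 10.0})
    print(describe(MarketIndexStrategy.MARKET_CAP, value), untyped_result)
```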

Testing

  • Like Elixir, there are doctests that execute python within docstrings to ensure they work. Neat!
  • There are built-in test libraries that look comparable to ruby's testunit.
  • pytest is similar to minitest: provides easy plugins, some better standard functionality, and builds on top of unittest. You probably want to use pytest for your testing framework.
  • setup.cfg is parsed by pytest automatically and can change how tests work.
  • conftest.py is another magic file autoloaded by pytest which sets up hooks for various plugins. You can put this in the root of your project, or in test/
  • Test files must follow a naming convention test_*.py or *_test.py. If you don't follow this convention, they won't be picked up by pytest by default.
  • breakpoint()s won't work by default, you need to pass the -s param to pytest
  • Like ruby, there are some great plugins for recording and replaying HTTP requests. Check out pytest-recording and vcrpy.
  • To record HTTP requests, run pytest --record-mode=once
  • If you want to be able to inspect & modify the API responses that are saved, use the VCR configuration option "decode_compressed_response": True
  • There's a mocking library in stdlib, which is comprehensive. I'm not sure why other languages don't do this—everyone needs a mocking library.
  • It looks like you set expectations on a mock after it runs, not before.
  • Here's how mocking works:
    • The @patch decorator is a clean way to manage mocking if you don't have too many methods or objects to mock in a single test.
    • If you add multiple patch decorators to a method, the mocks for those methods are passed in as additional arguments. Decorators are applied bottom-up, so the patch closest to the function definition is the first argument.
    • mock.call_count, mock.mock_calls, mock.mock_calls[0].kwargs are the main attributes you'll want for assertions
  • assert without parens is used in tests. This confused me, until I looked it up in the stdlib docs and realized assert is a language construct, not a method.
  • tox is much more complex than pytest. It's not a replacement for pytest, but seems to run on top of it, adding a bunch of functionality like running against multiple environments and installing additional packages. It feels confusing, almost like GitHub actions running locally. If you want to just run a single test file, you need to specify an environment identifier and test file: tox -epy38-requests -- -x tests/unit/test_persist.py
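
For the doctest note above, here's a minimal sketch. The weights function is a made-up example; the >>> lines in its docstring are what doctest executes and compares against the expected output written beneath them.

```python
import doctest


def weights(market_caps):
    """Return each coin's share of the total market cap.

    >>> weights({"BTC": 75, "ETH": 25})
    {'BTC': 0.75, 'ETH': 0.25}
    """
    total = sum(market_caps.values())
    return {symbol: cap / total for symbol, cap in market_caps.items()}


if __name__ == "__main__":
    # Runs every docstring example found in this module.
    print(doctest.testmod())
```

pytest can also collect these if you pass --doctest-modules.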
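
The mocking notes above are easier to see in a small test. This is a sketch under assumptions: Exchange and buy_evenly are invented for the example, and I'm using mock.patch.object (a variant of the @patch decorator) so the block stays self-contained.

```python
from unittest import mock


class Exchange:
    """Stand-in API client; the real methods would hit the network."""

    def price(self, symbol):
        raise RuntimeError("network call")

    def buy(self, symbol, amount):
        raise RuntimeError("network call")


def buy_evenly(exchange, symbols, budget):
    """Split the budget evenly across symbols and place one order each."""
    per_symbol = budget / len(symbols)
    for symbol in symbols:
        exchange.buy(symbol, amount=per_symbol / exchange.price(symbol))


@mock.patch.object(Exchange, "buy")    # applied last: second argument
@mock.patch.object(Exchange, "price")  # applied first: first argument
def test_buy_evenly(mock_price, mock_buy):
    mock_price.return_value = 50.0

    buy_evenly(Exchange(), ["BTC", "ETH"], budget=200)

    # Expectations are checked after the code under test has run.
    assert mock_price.call_count == 2
    assert mock_buy.call_count == 2
    assert mock_buy.mock_calls[0].args == ("BTC",)
    assert mock_buy.mock_calls[0].kwargs == {"amount": 2.0}
    mock_buy.assert_called_with("ETH", amount=2.0)


if __name__ == "__main__":
    test_buy_evenly()
    print("ok")
```

The patches only live for the duration of the decorated test, so the real methods are back afterwards.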

My thoughts on Python

Overall, I'm impressed with how python has improved over the years. Here are some things I enjoyed:

  • Gradual typing included in the core language
  • Comprehensions are natural to write
  • Syntax simplicity: there are not too many ways to do things, which makes code more straightforward to read.
  • Mature, well-designed libraries
  • Virtual environments out of the box
  • Robust, well-maintained developer tooling (ipdb, ipython, etc) with an advanced REPL
  • Great built-in testing libraries
  • Lots of example code to grep through for usage examples
  • Explicit imports and local-by-default logic (unlike ruby, where it's much easier to modify global state)
  • Easy to understand runtime environment (in comparison to JavaScript & Elixir/BEAM)

The big question is if Django is a good alternative to Rails. I love Rails: it's expansive, well-maintained, thoughtfully designed and constantly improving. It provides a massive increase in development velocity and I haven't found a framework that's as complete as Rails. If Django is close to Rails, I don't see a strong argument against using python over ruby for a web product.

Open Questions

Some questions I didn't have time to answer. If I end up working on this project further, this is a list of questions I'd love to answer:

  • How good is django? Does it compare to Rails, or is it less batteries-included and more similar to phoenix/JS in that sense?
  • Does numpy/pandas solve the data manipulation issue? My biggest gripe with python is the lack of chained data manipulation operators like ruby.
  • How does the ML/AI/data science stuff work? This was one of my primary motivations for brushing up on my python skills and I'd love to deeply explore this.
  • How does async/await work in python?

Learning Resources

General guides:

Monkeypatching:

Open Source Example Code

There are some great, large open source python projects to learn from:

Download these into a folder on your local machine to easily grep through.