Building a Crypto Index Bot and Learning Python
A long time ago, I was contracted to build a macOS application using PyObjC. It was a neat little app that controlled the background music at high-end bars around London. That was the last time I used python (early 2.0 days, if I remember correctly). Since then, python has become the language of choice for ML/AI/data science and has grown to be the 2nd most popular language.
I’ve been wanting to brush up on my python knowledge and explore the language and community. Building a bot to buy a cryptocurrency index was the perfect learning project, especially since there was a bunch of existing code on GitHub doing similar things.
You can view the final crypto index bot project here. The notes from this learning project are below. These are mainly written for me to map my knowledge in other languages to python. Hopefully, it’s also helpful for others looking to get started quickly in the language!
Tooling & Package Management
I work primarily in ruby (and still enjoy the language after years of writing professionally in it). Some of the comparisons below are to the equivalent tooling in ruby-land.
- pip == bundle
- Package versions are specified in a `requirements.txt` file if you are using pip.
- https://rubygems.org/ = https://pypi.org/
- There’s not really a rake equivalent that’s been adopted by the community.
- Poetry is an alternative to pip that seems to be the most popular choice for new projects.
- virtualenv = rbenv, but just for packages, not for the core python version, and is specific to each project. Poetry will autogen a virtualenv for you.
- There are dev and non-dev categories in poetry, but not a test category by default. Here’s how to add a dev dependency: `poetry add -D pytest`
- If you are using the VS Code terminal, certain extensions will automatically source your virtualenv. I found this annoying and disabled this extension (can’t remember which extension was causing me issues).
- `pyproject.toml` is an alternative to `requirements.txt`, but also includes `gemspec`-like metadata about the package. It looks like `poetry update` consumes the `.toml` file and generates a `poetry.lock`. I’m guessing that other build tools also consume the `.toml` config and it’s not used just for `poetry`.
- The python community seems to be into `toml` configuration. This is used for poetry package specifications and project-specific variables. I don’t get it: it’s slightly nicer looking than JSON, and although it does support arrays and nested tables, deeply nested data gets awkward. Why not just use yaml instead? Or just keep it simple and use JSON?
- I ran into this issue where `poetry` was using the global `~/Library/Caches/pypoetry` cache directory and I thought this was causing some package installation issues. I don’t think that ended up being the issue.
  - `poetry debug`
  - `poetry config -vvv` to see what configuration files are being loaded
  - `poetry config --list` indicated that a global cache directory was being used.
  - Tried upgrading pip, didn’t work: `python3 -m pip install --upgrade pip`
  - I can’t remember how I fixed the issue, but these commands were helpful in understanding where poetry puts its various files.
- If you want to hack on a package locally and use it in your project, add `vcrpy = { path = "/full/path/to/project", develop = true }` to your `toml` file
  - Note that you cannot use `~` in the path definition
  - After adding this to your `pyproject.toml` run `poetry lock && poetry install`
  - This will be easier in poetry 1.2
  - Want to make sure your project is pulling from your locally defined project? You can inspect the path that a module was pulled from via `packagename.__file__`, i.e. `import vcr; print(vcr.__file__)`
  - I had trouble with a corrupted poetry env; I had to run `poetry env use python` to pick up my local package definition
- Working on a project not using poetry?
  - Create a venv: `python -m venv venv && source ./venv/bin/activate`
  - If there’s a setup.py then run `python setup.py install`
  - However, you can’t install ‘extra’ dependencies (like development/testing) via `setup.py`. It looks like `pip install -e '.[dev]'` is the way to install them, assuming the project defines a `dev` extra.
  - It sounds like `setup.py` and `requirements.txt` often do not define dev dependencies. You’ll probably need to install these manually. Look at the CI definition in the project to determine what dev dependencies need to be installed.
- There’s a `.spec` file that seems to be used with `pyinstaller`, a python package, when packaging a python application for distribution. Pyinstaller is primarily aimed at distributing packages for execution locally on someone’s computer. This use-case is one of the areas where python shines: there’s decent tooling for building a multi-platform desktop application.
- You’ll see readme-like documents written in `rst` (reStructuredText format) instead of `md`. I have no idea why markdown just isn’t used.
- A ‘wheel’ is a pre-built package bundle that can contain compiled, architecture-specific binaries. This is helpful if a python package contains non-python code that needs to be compiled since it eliminates the compile step and reduces the chance of any library compatibility issues (this is a major problem in PHP-land).
- `black` looks like the most popular python code formatter.
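The `packagename.__file__` trick above is easy to wrap in a tiny helper. This is only a minimal sketch: `module_origin` and `in_virtualenv` are names I made up for illustration, and the stdlib `json` module stands in for a locally developed package like `vcr`.

```python
import importlib
import sys


def module_origin(module_name: str) -> str:
    """Return the file path a module was imported from (hypothetical helper)."""
    module = importlib.import_module(module_name)
    # Modules compiled into the interpreter (like sys) have no __file__.
    return getattr(module, "__file__", None) or "<built-in>"


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # points at the interpreter the venv was created from.
    return sys.prefix != sys.base_prefix
```

Calling `module_origin("vcr")` inside a poetry shell should print a path under your local checkout if the `develop = true` path dependency was picked up, and a `site-packages` path otherwise.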
Language
- Multiline strings (`"""`) at the beginning of a class or function definition aren’t just a python idiom. They are ‘docstrings’ and get automatically pulled into the autogen’d python documentation.
- Similar to ruby, CamelCase (capitalized, like `ClassName`) is used for class names, snake_case is used for function/variable names.
- Calling a function requires parens, unlike ruby or elixir.
- Like javascript, return values need to explicitly be defined by `return val`.
- Conditionals do not return values, which means you need to assign variables inside the block (unlike the ability to assign a variable to the return value of a block in ruby, a feature that I love).
- Each folder in a python project is transformed into a package that you can `import`. The `__init__.py` file in the folder is automatically imported when you `import` the folder name.
- Imports have to be explicitly defined, like javascript, to use any functions outside the set of global/built-in functions.
- Speaking of built-in functions, python provides a pretty random group of global functions available to you without any imports. I found this confusing: `round()` is a built-in but `ceil()` is not.
- When you `import` with a leading `.` it is a relative import that is resolved against the current package (the local directory).
- Import everything in a package with `from math import *`. This is not good practice, but helpful for debugging/hacking.
- Although you can import individual functions from a package, this is not good practice. Import modules or classes, not individual functions.
- You have to `from package.path import ClassName` to pull a class name from a module. You can’t `import package.path.ClassName`
- `None` is `nil` and capitalization matters
- `True` and `False` are the bool values; capitalization matters.
- Hashes are called `dict`s in python
- Arrays are called `list`s in python
- You can check the existence of an element in a list with `element in list`. Super handy!
- Triple-quoted strings are like heredocs in other languages. They can be used for long comments or multi-line strings.
- Substring extraction ranges are specified by `the_string[0:-1]`. If you omit a starting range, `0` is used: `the_string[:-1]`.
- The traditional boolean operators `&&` and `||` aren’t used. Natural language `and` and `or` is what you use instead.
- Keyword arguments are grouped together using `**kwargs` in the method definition.
- You can splat a dict into keyword arguments using `function_call(**dict)`
- Nearly all arguments can be passed as keyword arguments in python: any named parameter can be passed by name unless it is declared positional-only.
- You can lazy-evaluate a comprehension using `()` instead of `[]`
- When playing with comprehensions inside a `breakpoint()` session in ipython, variable scoping will not act the same as it does in normally executing code. I don’t understand the reasons for this, but beware!
- In addition to list comprehensions, there are dictionary comprehensions. Use `{...}` for these.
- When logic gets complex for a list comprehension, you’ll need to use a for loop instead (even if you want to do basic log debugging within a comprehension). I miss ruby’s multi-line blocks and chained maps.
- List comprehensions are neat, but there doesn’t seem to be a way to do complex data transformations cleanly. I hate having to define an array, append to it, and then return it. The filter/map/etc functions can’t be easily chained like ruby or javascript. I wonder what I’m missing here? I’ve heard of pandas/numpy, maybe this is what those libraries solve?
- There are strange gaps in the stdlib, especially around manipulating data structures. For instance, there’s no dead-simple way to flatten an array-of-arrays: `import operator; from functools import reduce; reduce(operator.concat, array_of_arrays)`
- Similarly, there’s no dedicated method to get unique values from a list. `list(set(values))` works but loses ordering; `list(dict.fromkeys(values))` keeps it.
- Get all of the string values of an enum: `[choice.value for choice in MarketIndexStrategy]`
- By subclassing `str` and `enum.Enum` (ex: `class MarketIndexStrategy(str, enum.Enum):`) you can use `==` to compare strings to enums.
- There’s no `?` ternary operator; instead you can do a one-liner if-else: `assignment = result if condition else alternative`
- To enable string interpolation that references variable names you need to use `f"string {variable}"`. Otherwise you’ll need to run `format` on the string to get it interpolated: `"string {}".format(variable)`
- Python has built-in tuples `(1, 2, 3)`. I’ve always found it annoying when languages just have arrays and don’t support tuples.
- Unlike ruby, not all code has a return value. You have to explicitly return from a function and you can’t assign the result of a code block to a variable.
- There’s some really neat python packages: natural language processing, pandas, numpy. Python has gained a lot of traction in the deep learning/AI space because of the high-quality packages available.
- `is` is NOT the same as `==`. `is` tests if the variable references the same object, not if the objects are equal in value
- You can’t do an inline try/catch. Many bad patterns that ruby and other languages really shouldn’t let you do are blocked. In a lot of ways, python is a simpler language that forces you to be more explicit and write simpler code. I like this aspect of the language a lot.
- Sets are denoted with `{}`, which is also used for dictionaries/hashes. An empty `{}` is a dict; use `set()` for an empty set.
- Here’s how decorators work:
  - The `@decorator` on top of a method is like an elixir macro or ruby metaprogramming. It transforms the method beneath the decorator.
  - The `@` syntax ("pie" operator) calls the `decorator` function, passing the function below the decorator as an argument to the `decorator` function, and reassigning the passed function to the transformed function definition. The decorator function must return a function.
  - There is no special syntax to designate a function as a ‘decorator function’. As long as it accepts a function as an argument and returns a function, it can be used as a decorator.
- Referencing an unspecified key in a dict raises an exception. You need to specify a default: `h.get(key, None)` to safely grab a value from a dict.
- An empty array will evaluate to false. You don’t need to `if len(l) == 0:`. Instead you can `if not l:`.
- Same goes with empty dicts and sets.
- Lambdas can only contain a single expression. This is a bummer, and forces you to write code in a different style.
- `:=` allows you to assign and test a value within a conditional. Interesting that there’s a completely separate syntax for ‘assign & test’.
- `__init__.py` in a folder defines what happens when you import a folder reference.
- Here’s how classes work:
  - `class newClass(superClass):` for defining a new class
  - `__init__` is the magic initialization method
  - `self.i_var` within `__init__` defines a new instance variable for a class. This is a good breakdown of instance and class variables.
  - You can execute code within a class outside of a method definition for class-level variables and logic. New instances of a class are created via `newClass()`.
  - Instance methods of a class are always passed `self` as the first argument
  - Class variables are available on the instance as well, which is a bit strange.
  - You can use class variables as default values for instance variables. This doesn’t seem like a great idea.
  - `newClass.__dict__` will give you a breakdown of everything on the class. Kind of like `prototype` in javascript.
  - Python has multiple inheritance. `class newClass(superClass1, superClass2)`. Inherited classes are searched left-to-right.
  - There are not private variables built into the language, but the convention for indicating a variable is private is using a `_` like `self._private = value`
- There’s a javascript-like async/await pattern (coroutines). I didn’t dig into it, but seems very similar to Javascript’s pattern.
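To make the decorator and class notes above concrete, here’s a small sketch. Everything in it (`log_calls`, `add`, `Animal`, `Bird`) is a made-up example rather than code from the bot, so treat it as an illustration of the mechanics only.

```python
import functools


def log_calls(func):
    """A decorator is just a function that takes a function and returns one."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))  # record every call
        return func(*args, **kwargs)

    wrapper.calls = []
    return wrapper


@log_calls  # same as writing: add = log_calls(add)
def add(a, b):
    return a + b


class Animal:
    # Class variable: shared by the class and readable through instances.
    legs = 4

    def __init__(self, name):
        # Instance variables are created by assigning to self in __init__.
        self.name = name
        self._nickname = None  # leading underscore: private by convention only

    def describe(self):
        # self is always passed as the first argument of an instance method.
        return f"{self.name} has {self.legs} legs"


class Bird(Animal):
    legs = 2  # overrides the class variable for Bird and its instances
```

After `add(1, 2)`, the call shows up in `add.calls`, and `Bird("Tweety").describe()` picks up the overridden class variable without `Bird` defining its own `__init__`.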
Debugging & Hacking
One of the important aspects of a language for me is the REPL and tinkering/hacking environment. If I can’t open up a REPL and interactively write/debug code, I’m a much slower developer. Thus far, ruby has the best interactive development environment that I’ve encountered:
- `binding.pry`, and `binding.remote_pry` to open a repl when your console isn’t running your code directly
- Automatic breakpoints on unhandled exceptions, in tests or when running the application locally
- Display code context in terminal when a breakpoint is hit
- Print and inspect local variables within a breakpoint
- Navigate up and down the callstack and inspect variables and state within each frame
- Overwrite/monkeypatch existing runtime code and rerun it with the new implementation within a repl
- Define new functions within the repl
- Inspect function implementation within the repl
I’d say that python is the first language that matches ruby’s debugging/hacking environment that I’ve used. It’s great, and better than ruby in many ways.
- `inspect` is a very helpful stdlib package for poking at an object in a repl and figuring out the methods, variables, etc available to it.
- `traceback` provides some great tools for inspecting the current stack.
- How do you drop an interactive console at any point in your code? There are a couple ways:
  - `import ipdb; ipdb.set_trace()` uses the ipython enhanced repl in combination with the built-in debugger. Requires you to install a separate package.
  - There’s a `breakpoint()` builtin that launches the standard pdb debugger. You can configure `breakpoint()` to use ipdb via `export PYTHONBREAKPOINT=ipdb.set_trace`.
  - All of the standard pdb functions work with ipdb
  - `import code; code.interact(local=dict(globals(), **locals()))` can be used without any additional packages installed.
- bpython is a great improvement to the default python repl. You need to install this within your venv otherwise the packages within your project’s venv won’t be available to it: `pip install bpython && asdf reshim`
- `ipython` is a bpython alternative that looks to be better maintained and integrates directly with `ipdb`.
- `python -m ipdb script.py` to automatically open up ipython when an exception is raised when running `script.py`
- Some misc ipython tips and tricks:
  - If something is throwing an exception and you want to debug it: `from ipdb import launch_ipdb_on_exception`, then wrap the call in `with launch_ipdb_on_exception(): thing_causing_exception()`
  - who / whos in `whereami`
  - `%psource` or `source` like `show-source`
  - `pp` to pretty print an object
  - `ipython --pdb script.py` to break on unhandled exceptions
  - Great grab bag of interesting tips
  - `%quickref` for detailed help
  - `exit` gets you out of the repl entirely
- All of the pypi information is pulled from a `PKG-INFO` file in the root of a package
- `rich`-powered tracebacks are neat, especially with `show_locals=True`
- The ruby-like metaprogramming/monkeypatching stuff happens via the `__*__` functions which are mostly contained within the base `object` definitions. For instance, `logging.__getattribute__('WARN')` is equivalent to `logging.WARN`
- You can reload code in a REPL via `from importlib import reload; reload(module_name)`. Super helpful for hacking on a module (definitely not as nice as Elixir’s `recompile`).
- Monkeypatching in python isn’t as clean as ruby, which in some ways is better since monkeypatching is really an antipattern and shouldn’t be used often. Making it harder and more ugly helps to dissuade folks from using it. To monkeypatch, you reassign the function/method to another method: `ClassName.method_name = new_method`. Here’s an example.
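The monkeypatching and dunder notes above can be sketched in a few lines. `Greeter` and `shouting_greet` are hypothetical names used only for illustration; the `logging` and `inspect` calls are stdlib.

```python
import inspect
import logging


class Greeter:
    def greet(self, name):
        return f"Hello, {name}"


def shouting_greet(self, name):
    # A plain function; it acts as a method once assigned onto the class.
    return f"HELLO, {name.upper()}!"


original_greet = Greeter.greet    # keep a reference so the patch can be undone
Greeter.greet = shouting_greet    # monkeypatch: reassign the method
patched = Greeter().greet("ada")

Greeter.greet = original_greet    # restore the original implementation
restored = Greeter().greet("ada")

# Attribute access goes through dunder methods under the hood.
same_level = logging.__getattribute__("WARN") == logging.WARN

# inspect can list the functions defined on a class.
method_names = [name for name, _ in inspect.getmembers(Greeter, inspect.isfunction)]
```

Keeping a reference to the original method makes the patch reversible, which matters in a long-running repl session where the patched class sticks around.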
Typing
I’ve become a huge fan of gradual types in dynamic languages. I never use them right away, but once the code hardens and I’m relatively sure I won’t need to iterate on the code design, I add some types in to improve self-documentation and make it safer to refactor in the future.
Python has a great gradual type system built-in. Way better than Ruby’s.
- Run `mypy .` on the command line to type-check all python files within a folder.
- If your project fails to pass mypy, it won’t cause any runtime errors by default.
- There’s a VS Code extension. This extension is included in Pylance, which you should probably be using instead, but you need to set the typing mode to ‘basic’.
- Return value types are set with `->` before the `:` at the end of the method definition. Otherwise, typing works very similar to other languages with gradual typing (TypeScript, Ruby, etc).
- A common pattern is importing types via `import typing as t`
- `t.Union[str, float]` for union/any types
- You can’t merge dictionaries if you are using a TypedDict (`dict | dict_to_merge`). Massive PITA when mutating API data.
- Verbose types can be assigned to a variable, and that variable can be used in type definitions. Handy way to make your code a bit cleaner.
- Enums defined with `enum.Enum` can be types.
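Here’s a rough sketch that pulls the typing notes above together. `MarketIndexStrategy` is the enum name mentioned earlier, but its members, the `Holding` TypedDict, the `Price` alias, and both functions are assumptions made up for this example.

```python
import enum
import typing as t

# A verbose type assigned to a variable (a type alias) so it can be reused.
Price = t.Union[str, float]


class MarketIndexStrategy(str, enum.Enum):
    # Hypothetical members; subclassing str allows == comparison with strings.
    MARKET_CAP = "market_cap"
    SQRT_MARKET_CAP = "sqrt_market_cap"


class Holding(t.TypedDict):
    symbol: str
    price: float


def normalize_price(price: Price) -> float:
    """The return type follows `->`, just before the trailing colon."""
    return float(price)


def describe(strategy: MarketIndexStrategy, holding: Holding) -> str:
    # An enum class can be used directly as a parameter type.
    return f"{holding['symbol']} via {strategy.value}"
```

None of these annotations are enforced at runtime; running `mypy .` is what flags a call like `normalize_price(None)`.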
Testing
- Like Elixir, there are doctests that execute python within docstrings to ensure they work. Neat!
- There are built-in test libraries that look comparable to ruby’s testunit.
- `pytest` is similar to `minitest`: provides easy plugins, some better standard functionality, and builds on top of `unittest`. You probably want to use pytest for your testing framework.
- `setup.cfg` is parsed by `pytest` automatically and can change how tests work.
- `conftest.py` is another magic file autoloaded by `pytest` which sets up hooks for various plugins. You can put this in the root of your project, or in `test/`
- Test files must follow a naming convention `test_*.py` or `*_test.py`. If you don’t follow this convention, they won’t be picked up by `pytest` by default.
- `breakpoint()`s won’t work by default, you need to pass the `-s` param to `pytest`
- Like ruby, there are some great plugins for recording and replaying HTTP requests. Check out `pytest-recording` and `vcrpy`.
- To record HTTP requests run `pytest --record-mode=once`
- If you want to be able to inspect & modify the API responses that are saved, use the VCR configuration option `"decode_compressed_response": True`
- There’s a mocking library in stdlib, which is comprehensive. I’m not sure why other languages don’t do this—everyone needs a mocking library.
- It looks like you set expectations on a mock after it runs, not before.
- Here’s how mocking works:
  - The `@patch` decorator is a clean way to manage mocking if you don’t have too many methods or objects to mock in a single test.
  - If you add multiple patch decorators to a method, the mocks for those methods are passed in as additional arguments. Decorators are applied bottom-up, so the patch closest to the function is the first argument.
  - `mock.call_count`, `mock.mock_calls`, `mock.mock_calls[0].kwargs` are the main attributes you’ll want for assertions
- `assert` without parens is used in tests. This confused me, until I looked it up in the stdlib docs and realized `assert` is a language construct not a method.
- `tox` is much more complex than `pytest`. It’s not a replacement for `pytest`, but seems to run on top of it, adding a bunch of functionality like running against multiple environments and installing additional packages. It feels confusing, almost like GitHub Actions running locally. If you want to just run a single test file, you need to specify an environment identifier and test file: `tox -epy38-requests -- -x tests/unit/test_persist.py`
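The mocking notes above are easier to follow with a small test. This is only a sketch: `ensure_dir` is a hypothetical function under test, and the two `os` functions are patched purely to show the argument ordering and the "assert after the call" style.

```python
import os
from unittest import mock


def ensure_dir(path):
    """Hypothetical function under test: create a directory if it is missing."""
    if not os.path.exists(path):
        os.makedirs(path, exist_ok=True)
        return True
    return False


@mock.patch("os.makedirs")      # top decorator -> last mock argument
@mock.patch("os.path.exists")   # bottom decorator is applied first -> first mock argument
def test_ensure_dir_creates_missing_dir(mock_exists, mock_makedirs):
    mock_exists.return_value = False

    assert ensure_dir("/tmp/cache") is True

    # Expectations are checked after the code under test has run.
    assert mock_exists.call_count == 1
    mock_makedirs.assert_called_once_with("/tmp/cache", exist_ok=True)
    assert mock_makedirs.mock_calls[0].kwargs == {"exist_ok": True}
```

Saved as something like `test_ensure_dir.py`, this gets picked up by `pytest` through the naming convention, and the patches are undone automatically when the test function returns.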
My thoughts on Python
Overall, I’m impressed with how python has improved over the years. Here are some things I enjoyed:
- Gradual typing included in the core language
- Comprehensions are natural to write
- Syntax simplicity: there are not too many ways to do things, which makes code more straightforward to read.
- Mature, well-designed libraries
- Virtual environments out of the box
- Robust, well-maintained developer tooling (ipdb, ipython, etc) with an advanced REPL
- Great built-in testing libraries
- Lots of example code to grep through for usage examples
- Explicit imports and local-by-default logic (unlike ruby, where it’s much easier to modify global state)
- Easy to understand runtime environment (in comparison to JavaScript & Elixir/BEAM)
The big question is if Django is a good alternative to Rails. I love Rails: it’s expansive, well-maintained, thoughtfully designed and constantly improving. It provides a massive increase in development velocity and I haven’t found a framework that’s as complete as Rails. If Django is close to Rails, I don’t see a strong argument against using python over ruby for a web product.
Open Questions
Some questions I didn’t have time to answer. If I end up working on this project further, this is a list of questions I’d love to answer:
- How good is django? Does it compare to Rails, or is it less batteries-included and more similar to phoenix/JS in that sense?
- Does numpy/pandas solve the data manipulation issue? My biggest gripe with python is the lack of chained data manipulation operators like ruby.
- How does the ML/AI/data science stuff work? This was one of my primary motivations for brushing up on my python skills and I’d love to deeply explore this.
- How does async/await work in python?
Learning Resources
General guides:
- https://python-patterns.guide/python/module-globals/
- https://book.pythontips.com/en/latest/ternary_operators.html
- https://realpython.com/python-lambda/#anonymous-functions
- https://google.github.io/styleguide/pyguide.html
Monkeypatching:
- https://sharmapacific.in/monkey-patching-in-python/
- https://github.com/ytdl-org/youtube-dl/commit/00fcc17aeeab11ce694699bf183d33a3af75aab6
- https://filippo.io/instance-monkey-patching-in-python/
- https://tryolabs.com/blog/2013/07/05/run-time-method-patching-python/
Open Source Example Code
There are some great, large open source python projects to learn from:
- https://github.com/getsentry/sentry
- https://github.com/arachnys/cabot – open source APM
- https://github.com/vitorfs/bootcamp
- https://github.com/rafalp/Misago
Download these into a folder on your local machine to easily grep through.