Building a SouthWest Price Monitor and Learning Server Side JavaScript

November 30, 2021

Tags: javascript, node • Categories: Learning

Table of Contents

I originally wrote a draft of this post in early 2019. I’m spending some time learning TypeScript, so I wanted to finally get my JavaScript-related posts out of draft. Some notes and learnings here are out of date.

Both sides of our family live out of state. Over the last couple years, we’ve turned them on to credit card hacking to make visiting cheap (free). SouthWest has some awesome point bonuses on credit cards, but you can’t watch for price drops on Kayak and other flight aggregators.

After a bit of digging, I found a basic version of a tool to do just this. It’s a self-hosted bot to watch for flight cost drops so you can book (or rebook for free). I’ve been wanting to dig into server side JavaScript development, and this is the perfect excuse.

Here’s what I’d like to do:

Get the tool running somewhere simple: Heroku, Raspberry Pi, etc
Convert the use of redis to mongodb. Redis isn’t a database, it’s a key-value store. But this project is using it for persistence. Why switch to MongoDB? I’ve been wanting to understand document databases a bit more. Postgres would have been easier for me, but this project is all about learning.
Possibly add the option of searching for the best flight deal on a particular month

Below is a ‘learning log’ of what I discovered along the way. Let’s get started!

Learning JavaScript

As I mentioned in an earlier post, my JavaScript knowledge was very out of date (pre ES6). Some findings and musings below will be obvious to a seasoned JavaScript developer, but to someone more experienced in Ruby/Python/etc they’ll be some interesting tidbits.

Looks like express is the dominant HTTP router + server. It’s equivalent to the routing engine of Rails combined with rack and unicorn. It doesn’t seem like there are strong conventions to how you setup an express-based app. You bring your own ODM/ORM, testing library, etc.
There is a consistent template/folder structure. However, express doesn’t make any assumptions about a database library, although it does support a couple of different templating languages and has a preferred default (pug).
app.use adds additional middleware to the stack. Middleware is simply a function with three arguments. Very similar to rack in ruby-land or plugs in Elixir-land.
There’s a part of me that loves the micro-modularity of the node/npm ecosystem, but the lack of declarative programming like DateTime.now + 1.day starts to feel really messy. The equivalent in node is (new Date()).setDate((new Date()).getDate() + 1);. Another example: there’s no built-in sortBy and sort mutates the original array.
- Some popular packages that solve this (moment, datefuncs, underscore, etc) and the popular choice is to just pull in these packages and use them heavily.
- I always find including many external decencies adds a lot of maintenance risk to your code. These packages can die, cause strange performance issues, cause weird compatibility issues with future iterations of the language, etc. The good news is the JavaScript ecosystem is so massive, the most popular packages have a very low risk of abandonment.
Variable scoping is weird in debugger mode. If the variable isn’t referenced in the function, it’s not available to inspect in the debugger repl. Make sure you reference the variable to inspect/play with it in real time.
Node, express, etc are not billed as full-stack web frameworks like rails. I find this super frustrating: not being able spin up a console (rails console) with your entire app’s environment loaded up is annoying.
- For this particular problem, it looks like the best alternative is to write your own console.js (here’s another guide) with the things you need and startup a repl. The annoying thing here is you need to manually connect to your DB and trigger the REPL after the DB connection is successful.
- Blitz and Redwood are solving these problems, although these didn’t exist when this post was written.
It seems like node inspect + a debugger line doesn’t run the code ‘completely’. For instance, if the code runs past a mongodb.connection line it doesn’t connect. I wonder if this is because the .connection call runs async and doesn’t get a chance to execute before the debugger line is called? Is there a way to instruct the repl to execute anything in the async queue? I found that starting up a vanilla node console and requiring what you needed works better.
There are some interesting utility libraries that convert all methods on an object to be promises (async). http://bluebirdjs.com/docs/api/promise.promisifyall.html
Languages with declarative convenience methods are just so much nicer. args.priceHistory[args.priceHistory.length - 1] is just ugly compared to args.priceHistory.last.
My time at a BigCo has helped me understand the value of typing. I still find the highest velocity developer experience is type-hinting (i.e. types are not required) combined with a linter. This lets you play with code without getting all the details hardened, but still enforces guardrails to avoid a class of production errors.
I’m not seeing the value in the event-loop programming paradigm. I get how it allows you to handle more concurrent connections, but isn’t that something that should be handled by the language or in some lower level abstraction? It’s much easier to reason about code when it runs sequentially. For instance, not having object.save throw an exception right away is really annoying: I need to either use callbacks to act when the code has executed OR use async and await everywhere. I do not understand why this pattern has become so popular.
https://repl.it is very cool. The idea of sending out links with a console running your code is very handy. This is used a lot in the JavaScript community.
It’s fascinating to me how there’s always the 10x-er that becomes a hero of the community. https://github.com/substack has created a ridiculous number of npm packages.
Think about let r = await promise as let r = null; promise.then(rr => r = rr) which is executed synchronously.
Instead of hash.merge(h2) you write Object.assign({}, h2, hash). There are many unintuitive sharp edges to the language, as you learning, just googling "how to do X with JavaScript" is the best way to determine the JavaScript equivalent.
http://jsnice.org is great at parsing obfuscated JS. It tries to rename variables based on the context. Very cool.
... is the splat operator used on objects It’s called the ‘rest’ operator.
constructor is the magic method for class initialization
Looks like function definitions within a class don’t need the function keyword

Puppeteer, Proxies, and Scraping

Part of this project involved scraping information the web. Here’s some tidbits about scraping that I learned:

The node ecosystem is great for web scraping. Puppeteer is a well maintained chrome-controller package and there’s lot of sample code you can leverage to hack things together quickly.
Websites have gotten very good at detecting scrapers. There are some workarounds to try to block bot detection, but if you are using a popular site, you will most likely be detected if you are using the default puppeteer installation.
A common (and easy) detection method is IP address. If you are scraping from an AWS/cloud IP, you’ll be easily blocked. The way around this is a proxy to a residential IP address. Another option is to host your scraper locally on a Raspberry Pi or on your local computer.
https://chrome.browserless.io cool way to test puppeteer scripts
I learned a bit about web proxies. Firstly, there are a bunch of proxy protocols (SOCKS, HTTP with basic auth, etc). Different systems support different type of proxies.

Package Management

You can’t effectively use npm and yarn in the same project. Pick one or the other. Yarn is a more stable, more secure version of npm (but doesn’t have as many features / as much active development)
module.exports lets a file expose constants to others which import the file, similar to python’s import system (but with default exports). I like this compared with ruby’s "everything is global" approach. It allows the other author to explicitly define what it wants other users to access.
Npm will run pre & post scripts simply based on the name of the scripts.
import Section, {SectionGroup} assigns Section to the default export of the file, and imports the SectionGroup explicitly.
If you try to import something that isn’t defined in the module.exports of a file you will not get an error and will instead get an undefined value for that import.

Testing

tape is the test runner that this particular project used. It doesn’t look like it’s possible to run just a single test in a file without changing the test code to use test.only instead of test.
The "Test Anything Protocol" is interesting http://testanything.org. Haven’t run into this before. I like consistent test output across languages.
I do like how tape tests list out the status of each individual assertion. It becomes a bit verbose, but it’s helpful to see what assertions after the failing assertion succeeded or failed.
VS Code + node debugging is very cool when you get it configured. You need to modify your VS Code launch.json in order to get it to work with test files. https://gist.github.com/dchowitz/83bdd807b5fa016775f98065b381ca4e#gistcomment-2204588

Debugging & Hacking

I’m a big fan of REPL driven development and I always put effort into understanding the repl environment in a language to increase development speed. Here are some tips & tricks I learned:

Tab twice (after inputting ob.) in a repl exposes everything that is available on the object under inspection.
node inspect THE_FILE.js allows debugger statements to work. You can also debug remotely with chrome or with VS Code. Visual debugging is the happy path with node development, the CLI experience is poor.
You don’t need to setup variables properly in the node repl. Nice! You can just a = 1 instead of let a = 1
I’ll often copy code into a live console to play around with it, but if it’s defined as const I need to restart the console and make sure I don’t copy the const part of the variable definition. That’s annoying. There’s a lot of sharp edges to the developer ergonomics.
console.dir to output the entire javascript object
Unlike pry you need to explicitly call repl after you hit a breakpoint when running node inspect. Also, debugger causes all promises not to resolve when testing puppeteer. https://github.com/berstend/puppeteer-extra/wiki/How-to-debug-puppeteer
Cool! Navigating to about:inspect in Chrome allows you to inspect a node/puppeteer process.
list is equivalent to whereami. You need to execute it explicitly with params list(5)
_ exists like in ruby, but it doesn’t seem to work in a repl triggered by a debugger statement. _error is a neat feature which keeps the last exception that was thrown.
.help while in a repl will output a list of "dot commands" you can use in the repl.
- I had a lot of trouble getting puppeteer to execute within a script executed with node inspect and paused with debugger. I’m not sure why, but I suspect it has something to do with how promises are resolved in inspect mode.
You can enable await in your node console via --experimental-repl-await. This is really helpful to avoid having to write let r; promise.then(o => r) all of the time.

Mongo & ODMs

You’ll want to install mongo and the compass tool (brew install mongodb-compass) for GUI inspection.
- Running into startup problems? tail -f ~/Library/LaunchAgents/homebrew.mxcl.mongodb-community.plist
- If you had an old version of mongo install long ago, you may need to brew sevices stop mongodb-community && rm -rf /usr/local/var/mongodb && mkdir /usr/local/var/mongodb && brew services start mongodb-community -dv
The connection string defaults to mongodb://localhost:27017
Mongoose looks like a well-liked JavaScript ODM for Mongo.
You can think of each "row" (called a "document") as a JSON blob. You can nest things (arrays, objects, etc) in the blob. The blob is named using a UID, which is like a primary key but alphanumeric. You can do some fancy filtering that’s not possible with SQL and index specific keys on the blob.
Looks like you define classes that map to tables ("schemas") but it doesn’t look like you can easily extend them. You can add individual methods to a class but you can’t extend a mongoose model class.
It looks like a mongoose.connection call creates an event loop. Without closing the event loop, the process will hang. Use process.exit() to kill all event loops.
Relatedly, all mongo DB calls are run async, so you’ll want to await them if you expect results synchronously.
brew install mongodb-compass-community gives you a GUI to explore your mongo DB. Similar to Postico for Postgres.

Open Questions

How are event loops, like the one mongoose uses implemented? Is the node event loop built in Javascript or are there C-level hooks used for performance?
There are lots of gaps in the default REPL experience. Is there an improved repl experience for hacking?
Do Blitz/RedwoodJS/others materially improve the server side JS experience?
What killer features does mongodb have? How does it compare to other document databases? Is there a real reason to use document databases now that most SQL databases have a jsonb column type with an array of json operators built in?