Building a SouthWest Price Monitor and Learning Server Side JavaScript
Tags: javascript, node • Categories: Learning
I originally wrote a draft of this post in early 2019. I’m spending some time learning TypeScript, so I wanted to finally get my JavaScript-related posts out of draft. Some notes and learnings here are out of date.
Both sides of our family live out of state. Over the last couple years, we’ve turned them on to credit card hacking to make visiting cheap (free). SouthWest has some awesome point bonuses on credit cards, but you can’t watch for price drops on Kayak and other flight aggregators.
After a bit of digging, I found a basic version of a tool to do just this. It’s a self-hosted bot to watch for flight cost drops so you can book (or rebook for free). I’ve been wanting to dig into server side JavaScript development, and this is the perfect excuse.
Here’s what I’d like to do:
- Get the tool running somewhere simple: Heroku, Raspberry Pi, etc
- Convert the use of redis to mongodb. Redis isn’t a database, it’s a key-value store. But this project is using it for persistence. Why switch to MongoDB? I’ve been wanting to understand document databases a bit more. Postgres would have been easier for me, but this project is all about learning.
- Possibly add the option of searching for the best flight deal on a particular month
Below is a ‘learning log’ of what I discovered along the way. Let’s get started!
Learning JavaScript
As I mentioned in an earlier post, my JavaScript knowledge was very out of date (pre ES6). Some findings and musings below will be obvious to a seasoned JavaScript developer, but for someone more experienced in Ruby/Python/etc there’ll be some interesting tidbits.
- Looks like `express` is the dominant HTTP router + server. It’s equivalent to the routing engine of Rails combined with rack and unicorn. It doesn’t seem like there are strong conventions to how you setup an express-based app. You bring your own ODM/ORM, testing library, etc.
  - There is a consistent template/folder structure. However, express doesn’t make any assumptions about a database library, although it does support a couple of different templating languages and has a preferred default (pug).
- `app.use` adds additional middleware to the stack. Middleware is simply a function with three arguments (`req`, `res`, `next`). Very similar to rack in ruby-land or plugs in Elixir-land.
- There’s a part of me that loves the micro-modularity of the node/npm ecosystem, but the lack of declarative programming like `DateTime.now + 1.day` starts to feel really messy. The equivalent in node is `(new Date()).setDate((new Date()).getDate() + 1);`. Another example: there’s no built-in `sortBy` and `sort` mutates the original array.
  - There are some popular packages that solve this (moment, datefuncs, underscore, etc) and the popular choice is to just pull in these packages and use them heavily.
  - I always find including many external dependencies adds a lot of maintenance risk to your code. These packages can die, cause strange performance issues, cause weird compatibility issues with future iterations of the language, etc. The good news is the JavaScript ecosystem is so massive, the most popular packages have a very low risk of abandonment.
- Variable scoping is weird in debugger mode. If the variable isn’t referenced in the function, it’s not available to inspect in the debugger repl. Make sure you reference the variable to inspect/play with it in real time.
- Node, express, etc are not billed as full-stack web frameworks like rails. I find this super frustrating: not being able to spin up a console (`rails console`) with your entire app’s environment loaded up is annoying.
  - For this particular problem, it looks like the best alternative is to write your own `console.js` (here’s another guide) with the things you need and startup a repl. The annoying thing here is you need to manually connect to your DB and trigger the REPL after the DB connection is successful.
  - Blitz and Redwood are solving these problems, although these didn’t exist when this post was written.
- It seems like `node inspect` + a `debugger` line doesn’t run the code ‘completely’. For instance, if the code runs past a `mongodb.connection` line it doesn’t connect. I wonder if this is because the `.connection` call runs async and doesn’t get a chance to execute before the `debugger` line is called? Is there a way to instruct the repl to execute anything in the async queue? I found that starting up a vanilla node console and requiring what you needed works better.
- There are some interesting utility libraries that convert all methods on an object to be promises (async). http://bluebirdjs.com/docs/api/promise.promisifyall.html
- Languages with declarative convenience methods are just so much nicer. `args.priceHistory[args.priceHistory.length - 1]` is just ugly compared to `args.priceHistory.last`.
- My time at a BigCo has helped me understand the value of typing. I still find the highest velocity developer experience is type-hinting (i.e. types are not required) combined with a linter. This lets you play with code without getting all the details hardened, but still enforces guardrails to avoid a class of production errors.
- I’m not seeing the value in the event-loop programming paradigm. I get how it allows you to handle more concurrent connections, but isn’t that something that should be handled by the language or in some lower level abstraction? It’s much easier to reason about code when it runs sequentially. For instance, not having `object.save` throw an exception right away is really annoying: I need to either use callbacks to act when the code has executed OR use `async` and `await` everywhere. I do not understand why this pattern has become so popular.
- https://repl.it is very cool. The idea of sending out links with a console running your code is very handy. This is used a lot in the JavaScript community.
- It’s fascinating to me how there’s always the 10x-er that becomes a hero of the community. https://github.com/substack has created a ridiculous number of npm packages.
- Think about `let r = await promise` as `let r = null; promise.then(rr => r = rr)`, except that the lines after it wait for the promise to resolve, so the code reads as if it were executed synchronously.
- Instead of `hash.merge(h2)` you write `Object.assign({}, hash, h2)` (later arguments win, so `h2` overrides `hash` just like in Ruby). There are many unintuitive sharp edges to the language; as you’re learning, just googling "how to do X with JavaScript" is the best way to determine the JavaScript equivalent.
- http://jsnice.org is great at parsing obfuscated JS. It tries to rename variables based on the context. Very cool.
- `...` is the splat operator, and it works on objects too. It’s called ‘spread’ when it expands an object or array and ‘rest’ when it collects the remaining properties or arguments.
- `constructor` is the magic method for class initialization
- Looks like function definitions within a class don’t need the `function` keyword
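The convenience methods I kept missing are small enough to sketch by hand. This is a minimal sketch, not what moment or underscore actually do: `addDays`, `sortBy`, and `last` are hypothetical helper names, and the price data is made up. One detail worth knowing is that `setDate` returns a numeric timestamp rather than a `Date`, so the helper copies the date first and returns the copy.

```javascript
// Hypothetical helpers; none of these names exist in the standard library.

// Returns a new Date `days` later without mutating the input.
function addDays(date, days) {
  const copy = new Date(date.getTime());
  copy.setDate(copy.getDate() + days);
  return copy;
}

// Sorts a copy of the array by a key function; the original stays untouched.
function sortBy(items, keyFn) {
  return [...items].sort((a, b) => {
    const ka = keyFn(a);
    const kb = keyFn(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

// Last element, or undefined for an empty array.
function last(items) {
  return items[items.length - 1];
}

// Made-up sample data for illustration.
const priceHistory = [{ price: 180 }, { price: 120 }, { price: 150 }];
const cheapestFirst = sortBy(priceHistory, (entry) => entry.price);

console.log(last(priceHistory).price); // 150
console.log(cheapestFirst[0].price); // 120
console.log(priceHistory[0].price); // 180 (original order preserved)
console.log(addDays(new Date(2019, 0, 31), 1).getDate()); // 1 (rolls over to February)

// Later sources win in Object.assign, mirroring Ruby's hash.merge(h2).
const merged = Object.assign({}, { a: 1, b: 2 }, { b: 3 });
console.log(merged); // { a: 1, b: 3 }
```

Whether a few hand-written helpers beat pulling in a package is a judgment call; the sketch only shows that the gap is narrow for the simple cases.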
Puppeteer, Proxies, and Scraping
Part of this project involved scraping information from the web. Here are some tidbits about scraping that I learned:
- The node ecosystem is great for web scraping. Puppeteer is a well maintained chrome-controller package and there’s a lot of sample code you can leverage to hack things together quickly.
- Websites have gotten very good at detecting scrapers. There are some workarounds to try to evade bot detection, but if you are scraping a popular site, you will most likely be detected if you are using the default puppeteer installation.
- A common (and easy) detection method is IP address. If you are scraping from an AWS/cloud IP, you’ll be easily blocked. The way around this is a proxy to a residential IP address. Another option is to host your scraper locally on a Raspberry Pi or on your local computer.
- https://chrome.browserless.io is a cool way to test puppeteer scripts
- I learned a bit about web proxies. Firstly, there are a bunch of proxy protocols (SOCKS, HTTP with basic auth, etc). Different systems support different types of proxies.
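To make the proxy point concrete, here is a rough sketch of how a proxy URL could be split into what Chrome needs. Chrome takes the proxy through its `--proxy-server` flag and does not accept credentials there, so they have to be supplied separately. `proxyLaunchOptions` is a hypothetical helper, and the proxy host and credentials are placeholders, not anything from the original project.

```javascript
// Hypothetical helper: split a proxy URL into the pieces Chrome needs.
function proxyLaunchOptions(proxyUrl) {
  const url = new URL(proxyUrl);
  return {
    // Chrome takes the proxy as a command-line flag, without credentials.
    args: [`--proxy-server=${url.protocol}//${url.host}`],
    // Credentials, if present, are handed over separately.
    credentials: url.username
      ? {
          username: decodeURIComponent(url.username),
          password: decodeURIComponent(url.password),
        }
      : null,
  };
}

// Placeholder proxy address for illustration only.
const options = proxyLaunchOptions('http://user:secret@proxy.example.com:8080');
console.log(options.args);
console.log(options.credentials);

// With puppeteer installed, this would be used roughly as:
//   const browser = await puppeteer.launch({ args: options.args });
//   const page = await browser.newPage();
//   if (options.credentials) await page.authenticate(options.credentials);
```

The helper itself only uses the built-in `URL` class, so it can be tried in a plain node console before wiring it up to a browser.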
Package Management
- You can’t effectively use `npm` and `yarn` in the same project. Pick one or the other. Yarn is a more stable, more secure version of npm (but doesn’t have as many features / as much active development)
- `module.exports` lets a file expose constants to others which import the file, similar to python’s import system (but with default exports). I like this compared with ruby’s "everything is global" approach. It allows the module author to explicitly define what they want other users to access.
- Npm will run pre & post scripts simply based on the name of the scripts.
- `import Section, {SectionGroup}` assigns `Section` to the default export of the file, and imports `SectionGroup` explicitly as a named export.
- If you try to import something that isn’t defined in the `module.exports` of a file you will not get an error and will instead get an undefined value for that import.
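The silent `undefined` is easy to reproduce with CommonJS destructuring. In this sketch the built-in `path` module stands in for any module, `notARealExport` is a made-up name, and `requireExport` is a hypothetical guard for when a loud failure is preferable.

```javascript
// `path` stands in for any CommonJS module; `notARealExport` is a made-up name.
const { join, notARealExport } = require('path');

console.log(typeof join); // function
console.log(notARealExport); // undefined, and no error is thrown

// Hypothetical guard that fails loudly instead of handing back undefined.
function requireExport(moduleExports, name) {
  if (!(name in moduleExports)) {
    throw new Error(`Module has no export named "${name}"`);
  }
  return moduleExports[name];
}

const safeJoin = requireExport(require('path'), 'join');
console.log(safeJoin('a', 'b'));
```

As far as I can tell, native ES modules are stricter here: a missing named export fails when the module is loaded, while bundler-based setups may still hand back `undefined`.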
Testing
- `tape` is the test runner that this particular project used. It doesn’t look like it’s possible to run just a single test in a file without changing the test code to use `test.only` instead of `test`.
- The "Test Anything Protocol" is interesting http://testanything.org. Haven’t run into this before. I like consistent test output across languages.
- I do like how tape tests list out the status of each individual assertion. It becomes a bit verbose, but it’s helpful to see which assertions after the failing assertion succeeded or failed.
- VS Code + node debugging is very cool when you get it configured. You need to modify your VS Code `launch.json` in order to get it to work with test files. https://gist.github.com/dchowitz/83bdd807b5fa016775f98065b381ca4e#gistcomment-2204588
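To show what per-assertion output looks like, here is a toy TAP producer. It is not tape and makes no attempt to match tape’s exact output; `createTapReporter` is a hypothetical name, and the second assertion fails on purpose to show that later assertions are still reported.

```javascript
// A toy TAP producer (not tape itself): records one line per assertion.
function createTapReporter() {
  const lines = ['TAP version 13'];
  let count = 0;
  return {
    ok(value, description) {
      count += 1;
      lines.push(`${value ? 'ok' : 'not ok'} ${count} - ${description}`);
    },
    // Appends the plan line ("1..N") and returns the full report.
    finish() {
      lines.push(`1..${count}`);
      return lines.join('\n');
    },
  };
}

const t = createTapReporter();
t.ok(1 + 1 === 2, 'addition works');
// Fails on purpose: sort() sorts in place, so the first element becomes 1.
t.ok([3, 1, 2].sort()[0] === 3, 'sort leaves the array untouched');
t.ok('a'.toUpperCase() === 'A', 'later assertions still run');

const output = t.finish();
console.log(output);
```

Because every line is either `ok` or `not ok` plus a number, any TAP-aware tool can summarize the run regardless of which language produced it.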
Debugging & Hacking
I’m a big fan of REPL driven development and I always put effort into understanding the repl environment in a language to increase development speed. Here are some tips & tricks I learned:
- Tab twice (after inputting `ob.`) in a repl exposes everything that is available on the object under inspection.
- `node inspect THE_FILE.js` allows `debugger` statements to work. You can also debug remotely with chrome or with VS Code. Visual debugging is the happy path with node development, the CLI experience is poor.
- You don’t need to setup variables properly in the node repl. Nice! You can just `a = 1` instead of `let a = 1`
- I’ll often copy code into a live console to play around with it, but if it’s defined as `const` I need to restart the console and make sure I don’t copy the `const` part of the variable definition. That’s annoying. There’s a lot of sharp edges to the developer ergonomics.
- `console.dir` to output the entire javascript object
- Unlike `pry` you need to explicitly call `repl` after you hit a breakpoint when running `node inspect`. Also, `debugger` causes all promises not to resolve when testing puppeteer. https://github.com/berstend/puppeteer-extra/wiki/How-to-debug-puppeteer
- Cool! Navigating to `about:inspect` in Chrome allows you to inspect a node/puppeteer process.
- `list` is equivalent to `whereami`. You need to execute it explicitly with params `list(5)`
- `_` exists like in ruby, but it doesn’t seem to work in a repl triggered by a `debugger` statement.
- `_error` is a neat feature which keeps the last exception that was thrown.
- `.help` while in a repl will output a list of "dot commands" you can use in the repl.
- I had a lot of trouble getting puppeteer to execute within a script executed with `node inspect` and paused with `debugger`. I’m not sure why, but I suspect it has something to do with how promises are resolved in inspect mode.
- You can enable `await` in your node console via `--experimental-repl-await`. This is really helpful to avoid having to write `let r; promise.then(o => r = o)` all of the time.
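The difference between the `then` workaround and `await` is easiest to see side by side. This is a small sketch with an already-resolved promise standing in for a real async call; the value `42` and the `main` function are placeholders.

```javascript
// An already-resolved promise stands in for a real async call.
const promise = Promise.resolve(42);

// Without await: the callback only runs later, so `r` is still unset here.
let r;
promise.then((value) => {
  r = value;
});
console.log(r); // undefined at this point

// With await (inside an async function, or in a repl with await enabled),
// the next line does not run until the promise has resolved.
async function main() {
  const result = await promise;
  return result;
}

main().then((value) => console.log(value)); // 42
```

In a console this is exactly why the `then` trick is tedious: you have to assign, wait a tick, and only then inspect the variable.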
Mongo & ODMs
- You’ll want to install mongo and the compass tool (`brew install mongodb-compass`) for GUI inspection.
  - Running into startup problems? `tail -f ~/Library/LaunchAgents/homebrew.mxcl.mongodb-community.plist`
  - If you had an old version of mongo installed long ago, you may need to `brew services stop mongodb-community && rm -rf /usr/local/var/mongodb && mkdir /usr/local/var/mongodb && brew services start mongodb-community -dv`
- The connection string defaults to `mongodb://localhost:27017`
- Mongoose looks like a well-liked JavaScript ODM for Mongo.
- You can think of each "row" (called a "document") as a JSON blob. You can nest things (arrays, objects, etc) in the blob. The blob is named using a UID, which is like a primary key but alphanumeric. You can do some fancy filtering that’s not possible with SQL and index specific keys on the blob.
- Looks like you define classes that map to tables ("schemas") but it doesn’t look like you can easily extend them. You can add individual methods to a class but you can’t extend a mongoose model class.
- It looks like an open `mongoose.connection` keeps the node event loop alive. Without closing the connection, the process will hang. Use `process.exit()` to force the process to exit.
- Relatedly, all mongo DB calls are run async, so you’ll want to `await` them if you need the results before the next line runs.
- `brew install mongodb-compass-community` gives you a GUI to explore your mongo DB. Similar to Postico for Postgres.
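The hanging process is not specific to mongo: any open handle keeps node running. In this sketch a repeating timer stands in for the database socket, which is an assumption made purely for illustration; `openFakeConnection` is a hypothetical name. With mongoose the analogous cleanup would be `mongoose.connection.close()` or `mongoose.disconnect()`.

```javascript
// Hypothetical stand-in for a database connection: a repeating timer is an
// open handle, much like a socket to mongo would be.
function openFakeConnection() {
  const timer = setInterval(() => {}, 1000);
  let open = true;
  return {
    isOpen: () => open,
    close() {
      clearInterval(timer); // releases the handle so node can exit on its own
      open = false;
    },
  };
}

const connection = openFakeConnection();
console.log(connection.isOpen()); // true

// ... do some work with the connection here ...

// Without this call the script would never exit by itself.
connection.close();
console.log(connection.isOpen()); // false
```

Closing the handle is gentler than `process.exit()`, since pending writes and callbacks still get a chance to finish.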
Open Questions
- How are event loops, like the one `mongoose` uses, implemented? Is the node event loop built in JavaScript or are there C-level hooks used for performance?
- There are lots of gaps in the default REPL experience. Is there an improved repl experience for hacking?
- Do Blitz/RedwoodJS/others materially improve the server side JS experience?
- What killer features does mongodb have? How does it compare to other document databases? Is there a real reason to use document databases now that most SQL databases have a jsonb column type with an array of json operators built in?