Learning Clojure by Automating an RSS Reader

I've been working on revamping how I consume information. I've moved most of my information consumption to RSS feeds, but I can't keep up with the number of articles in them. When I take a look at my reader I tend to get overwhelmed and spend more time than I'd like trying to "catch up" on information I was generally consuming out of curiosity.

Not good.

I want articles to be automatically marked as read after they are a month old to eliminate the feeling of being "behind". This is a perfect little project to learn a programming language that's looked interesting for a while!

Building a small project in a new language or technology is the best way to learn. While I was building this tool, I documented what questions I was asking, answers to these questions, and what articles and resources I found helpful.

Posts like this have been interesting to me, so hopefully this is a fun read for others!

What do I want to build?

I want to build a Clojure script for FeedBin that will:

  • Inspect all unread articles
  • If the publish date is more than a month in the past, mark the article as read
  • Automatically run every day

Let's get started!

Resources

Here are some helpful blogs & tutorials I used while learning:

Also, I always try to grab a couple of large open-source repos to look at when I'm learning a new language. Here are some places I searched:

Some repos I found interesting:

Syntax & Structure

Now that I have some browser tabs open with documentation, let's start learning!

  • How do I install this thing? https://clojure.org/guides/getting_started => brew install clojure/tools/clojure
  • Going through the "Learn X in Y" guide, some interesting takeaways:
    • Clojure is built on the JVM and uses Java classes for things like arrays.
    • Code in Clojure is essentially a list-of-lists. A list is how you execute code: the first element is the function name, followed by the arguments separated by spaces. This feels very weird at first, but it's a really powerful concept. Simple Made Easy explains the philosophy behind this a bit.
    • "Quoting" (prefacing a list with a single quote) prevents the list from being evaluated. This is helpful for defining a list, or for passing code around as a data structure that can be transformed later on.
    • Sequences (arrays/lists) seem to have some important differences from vectors. I need to understand this a bit more.
    • When you define a function with fn it doesn't get a name. You need to assign it (def) to a variable to give it a name.
    • The [] in a function definition is the vector of arguments.
    • There are lots of ways to create functions: fn, defn, def ... #()
    • "Multi-variadic function" is a new term for me! It's a function that accepts a variable number of arguments. Looks like you can define different execution paths depending on the number of arguments, kind of like Elixir's pattern matching.
    • [& args] is equivalent to (*args) in ruby
    • The beginner (me!) can treat ArrayMap and HashMap as the same.
    • Keywords == ruby symbols
    • The language looks to execute from the inside out, and the composition of functions is done via spaces not commas, parens, etc.
    • Looks like everything is immutable in Clojure.
    • Everything is a function. So much so, that even basic control flow is managed the same way as a standard function.
    • Looks like "STM" is an escape hatch if you need to store state. Similar to Elixir's process state.
  • The Clojure community is big on "repl driven development", but what exactly do they mean? How is that different from binding.pry in a ruby process to play around with code?
    • Looks like it's not that different. Some nice editor integrations make things a bit more clean, but more or less the same as opening up rails console with pry enabled.
  • I've always disliked the ability to alias a module or function to a custom name. It makes it much harder for newcomers to the codebase to navigate what is going on. Looks like this is a pretty common pattern in Clojure: the require at the top of a file can set up custom aliases for the namespaces it pulls in.
  • "forms" have been mentioned a couple of times, but I still don't get it. What is a form?
  • I've heard that Clojure is a Lisp. What is a "lisp"? https://en.wikipedia.org/wiki/Lisp_(programming_language)
    • There was an original LISP programming language, but "a lisp" is a language patterned after the original LISP
    • Seems like the unique property of a lisp-style language is that code is essentially a linked list data structure. Since all code is a data structure, you can define really interesting macros to modify your source code.
    • Another property is the parentheses-based syntax.
    • It's interesting to look at the different lisp styles available. I feel like the only lisp that is popular today is Clojure.
    • Sounds like immutability is unique to Clojure and isn't a core feature of other lisps.
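To make a few of these takeaways concrete, here is a minimal sketch that can be pasted into a REPL. Every name in it (as-data, greet, entry, read-count, and so on) is made up for illustration, and it only uses clojure.core plus clojure.string.

```clojure
(require '[clojure.string :as string]) ; alias a namespace to a custom name

;; A quoted list is plain data; an unquoted list is evaluated as a call.
(def as-data '(+ 1 2))          ; a three-element list, not evaluated
(def as-result (eval as-data))  ; evaluating the same list gives 3

;; Several ways to create the same function.
(def add-one-a (fn [x] (+ x 1)))  ; anonymous fn, named via def
(defn add-one-b [x] (+ x 1))      ; defn combines def and fn
(def add-one-c #(+ % 1))          ; #() reader shorthand

;; Different bodies per argument count, plus & to collect "the rest".
(defn greet
  ([] (greet "world"))
  ([who] (str "Hello, " who "!"))
  ([who & others]
   (str (greet who) " And " (string/join ", " others) ".")))

;; Keywords work like Ruby symbols and can look themselves up in a map.
(def entry {:id 1 :title "Old post"})
(def title (:title entry))

;; Data is immutable: assoc returns a new map and leaves entry untouched.
(def renamed (assoc entry :title "New post"))

;; Control flow is just another form that returns a value.
(def verdict (if (pos? (:id entry)) :valid :invalid))

;; A ref is the STM escape hatch for state; changes happen inside dosync.
(def read-count (ref 0))
(dosync (alter read-count inc))
```

Calling (greet) and (greet "a" "b" "c") shows how the argument count picks the body, and comparing entry with renamed shows that the original map never changed.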

I think I know just enough to start coding.

Coding in Clojure

Here's the learning process which generated the final source code:

  • Let's define the namespace and get a "Hello World" to make sure I have the runtime executing locally without an issue. 184408626bb41b87d53f9b0bb5485a8e9201d8d5
  • Ok, now let's outline the logic we'll need to implement. 7e018b05ff8ad925ef2bfe9c56c4a702dce4c3d0
  • Now, let's pick an HTTP library and figure out how to add it as a dependency.
  • https://clojars.org looks like the most popular package repository. It doesn't seem like there's any download/popularity indicator that you can sort by. Bummer. Hard to figure out what sort of HTTP library I should use.
  • Looks like project.clj is a gemspec type definition file. Metabase's http library is clj-http. Let's use that. We'll also need to figure out how to setup this dependency file. https://github.com/metabase/metabase/blob/master/project.clj#L63
  • https://github.com/technomancy/leiningen is linked in the project.clj files I've seen. It's listed as a dependency manager on the clj-http library: https://clojars.org/clj-http. Let's install it via brew install leiningen.
  • lein new feedbin and mv ./feedbin/* ./ to set up the project structure. Looks like lein will help us with dependencies and deployment. b0b4022618abac840af6679f900584d04de510c1
  • There's this skip-aot thing in the :main definition which I don't understand. In any case, if I stuff a defn -main in the file for the namespace defined in :main, lein run works! 764d7a1e2a537d61b036df4229a2c96671725dd8
    • It looks like this ^: syntax is used often. What is it?
  • Ok, let's copy our logic outline from the other file we were working on over to the src/feedbin/core.clj and try to add our HTTP dependency. Added [clj-http "3.10.0"] to the dependency list in project.clj, lein run seemed to pull down a bunch of files and run successfully.
  • Now, let's pull the FeedBin variable from the ENV and store it to a var. Looks like you have to wrap let in parens, and include commands that rely on the var within the scope of the parens. I could see how this would force you to keep methods relatively short. 6f1f8099ffd0ed5f997be93685d18d1c574efb6b
  • Let's hit the API and get all unread entries and store them in a var. Looks like cheshire is a popular JSON decoder, let's use that. It looks like let is only for when you want temporary bindings within a specific scope. Otherwise, you should use def to set up a variable. 5b63cd289052d9fcebec2cb2965d598927b0616a
  • Convention is - for word separation, not _ or camel case.
  • Let's refactor the getenv to use def. Much better! a6a95a1e4703c07e76ecce32b56b6b0f1903acca
  • Time to select entries that are more than a month old. A debugger is going to be helpful here to poke at the API responses. Looks like debugger is the pry equivalent. I had trouble getting this to work and deep-dived on this a bit:
    • (pst) displays the stacktrace associated with the last exception. This is not dependent on clj-debugger
    • Looking closer at clj-debugger it has ~no documentation and hasn't been updated in nearly two years. Is there a better option? Doesn't look like it
    • (require 'feedbin.core :reload-all) seems like the best way to hot reload the code in a repl. Then you can rerun (feedbin.core/-main)
    • Ah, figured it out! (break) on its own won't do anything. It needs an input to debug. (break true) works. You need to run this in lein repl for it to work.
    • As a side note, I've found the REPL/debugging aspect of learning a new programming language to be really important. Languages that don't have great tooling and accessible documentation around this make it much harder for newcomers to come up to speed. The REPL feedback loop is just so much faster and in developer tooling speed matters.
  • I was able to extract the published date, now I just need to do some date comparison to figure out which entries are over a month old. ca16f54f66a39753933168c3f8deac636144ca47
  • Now to mark the entries as "read" (in feedbin this is deleting the entries). Should be able to just iterate through the ID list and POST to the delete endpoint. I started running into rate limiting errors as I was testing this.
  • # turns a string into a regex, but appears to do much more. Looks like it's also a shorthand for creating a lambda. https://clojure.org/guides/weird_characters
  • Macroexpansion (macroexpand) is an interesting tool to help with debugging.
  • With the rate limit errors gone, I can finally get this working for good. I tried passing in the article IDs as a comma-separated list as a query string and it didn't work. I need to send this data in as a JSON blob. 166ea49439ed690ff08c8fd987530b170b9bb80e
  • Got the delete call working. You can pass a hash directly to clj-http and it'll convert it into JSON. Nice. 63ac8bf1d4fd969326fffa9ad7b50ad1f0a4b56d
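The date comparison and the delete payload from the steps above can be sketched without any HTTP at all. This is a rough reconstruction under a few assumptions, not the code from the commits: it assumes each entry has a published field holding an ISO 8601 timestamp, it treats "a month" as 30 days, it assumes the delete endpoint expects the IDs under an unread_entries key, and the names max-age, stale?, stale-ids, and delete-payload are all hypothetical.

```clojure
(import '(java.time Duration Instant))

;; Assumption: "a month" is approximated as 30 days.
(def max-age (Duration/ofDays 30))

(defn stale?
  "True when the entry was published more than max-age before now.
  Assumes :published is an ISO 8601 string such as 2020-01-01T00:00:00.000000Z."
  [^Instant now entry]
  (let [published (Instant/parse (:published entry))
        cutoff    (.minus now max-age)]
    (.isBefore published cutoff)))

(defn stale-ids
  "IDs of every entry that is older than the cutoff."
  [now entries]
  (->> entries
       (filter #(stale? now %))
       (mapv :id)))

(defn delete-payload
  "Map to send as the JSON body (the key name is an assumption)."
  [ids]
  {:unread_entries ids})

;; Sample data standing in for a decoded API response.
(def sample-entries
  [{:id 1 :published "2020-01-01T00:00:00.000000Z"}
   {:id 2 :published "2020-03-01T00:00:00.000000Z"}])

(def sample-now (Instant/parse "2020-03-10T00:00:00Z"))

(def sample-payload
  (delete-payload (stale-ids sample-now sample-entries)))
```

Passing now in as an argument, instead of calling (Instant/now) inside stale?, keeps the filter easy to poke at in the REPL with fixed dates. With clj-http, a map like sample-payload could be sent as :form-params together with :content-type :json.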

Great! We have the script working. Now, let's deploy it.

Clojure Deployment Using AWS Serverless

I have a friend who is constantly talking about how awesome serverless is (i.e. AWS Lambda). I also remember hearing that you can setup cron-like jobs in AWS that hit a lambda. Let's see if that's the case and if we can get this script working on lambda.

Some things we'll need to figure out:

  1. How/where do I specify that an endpoint should be hit every X hours?
  2. How do I specify where the entrypoint is for the lambda function?
  3. How do we specify environment variables?

Notes

  • I jumped into AWS lambda dashboard and created a function named "Mark-Feedbin-Entries-As-Read" with Java 11. It looks like the crazy AWS permission structure is generated for me.
  • I added the com.amazonaws/aws-lambda-java-core package and it looks like I need to run gen-class to expose my handler. What is gen-class? It generates a .class file when compiling, which I vaguely remember is a file which is bundled into the .jar executable. Looks like aot compilation needs to be enabled as well. Still need to understand what aot is.
  • I ran lein uberjar and specified feedbin.core::handler as my handler. Created a test event with "testing" as the input. Used the -standalone jar version that was generated.
  • Looks like environment variables can be setup directly in the Lambda GUI.
  • "Cron jobs" are setup via CloudWatch events. What is CloudWatch? It's AWS's monitoring stack. Strange that this is the recommended way to setup cron jobs. I would have thought there was a dedicated service for recurring job schedules.
  • "Serverless" (looks like a CDK-like YML configuration syntax for AWS serverless) makes it look easy to deploy a lambda which executes on a schedule, but doesn't indicate how it's actually managed in AWS in the blog post.
  • Aside: It's interesting that the more you dig into AWS, the more it feels like a programming language. Each of the services is a library, and the interface to configure them is YAML.
  • It looks like "Amazon EventBridge" is the new "CloudWatch Events". Looks like we can setup a rule which triggers a lambda function at a particular rate.
  • Neat, you can set up a rule directly with the AWS Lambda GUI. Use an EventBridge trigger with rate(1 day) to trigger the function every day. Really easy!
  • I checked on it the next day and it's failing. How can we inspect the request? It's probably failing because the input data is some sort of JSON object rather than the simple string I tested with. Here's what I found: you can inspect the logs, use CloudTrail to view an event, enable X-Ray tracing, and send failed events to a dead letter queue. I enabled all of this stuff: my end goal was to inspect the event JSON passed to the lambda to determine how to fix it.
  • Ah! After a bit more digging, if you find the event in CloudTrail there's a "View event" button that will give you the JSON output. I can then copy the JSON into the test event in the configuration for the lambda and run it there to get helpful debugging information. Feels a bit primitive, but it works. I wonder how you would run the function locally and write integration tests using example AWS JSON?
  • Looks like the function signature for my handler is incorrect. When handling events, the handler accepts two arguments [Object com.amazonaws.services.lambda.runtime.Context]. This fixed the issue! 8520e8a319bd5d41a67a01f9517ce4cf559ab381
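For reference, a handler with that two-argument signature might be exposed roughly like this. Treat it as a sketch rather than the code from the commit: the static method declaration and the String return type are my assumptions, mark-old-entries-as-read! is a hypothetical placeholder for the real work, and the class is only generated when the namespace is AOT compiled with com.amazonaws/aws-lambda-java-core on the classpath.

```clojure
(ns feedbin.core
  (:gen-class
   ;; Exposes a static method so that feedbin.core::handler can be used
   ;; as the Lambda handler. The first argument is the event payload.
   :methods [^:static [handler
                       [Object com.amazonaws.services.lambda.runtime.Context]
                       String]]))

(defn mark-old-entries-as-read!
  "Hypothetical placeholder for the fetch/filter/delete logic.
  Returns the number of entries that were marked as read."
  []
  0)

(defn -handler
  "Implementation of the generated handler method.
  A scheduled rule passes the event in as a JSON object, not a plain string."
  [event context]
  (let [marked (mark-old-entries-as-read!)]
    (str "Marked " marked " entries as read")))

(defn -main
  "Keeps lein run working for local testing."
  [& _args]
  (println (-handler nil nil)))
```

Declaring the first parameter as Object is what lets the same handler accept both the "testing" string from the test event and the JSON object sent by the scheduled rule.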

Open Questions

Here's a list of questions that I wasn't able to answer during my learning process:

  • How can you parallelize operations in Clojure?
  • How easy is deployment?
  • How does interop with Java work?
  • Is there a rails-like web stack?
  • Is there a style guide?