
Categorizing Personal Email Contacts with AI

For years I’ve wanted to send a yearly update email (a digital Christmas card of sorts) to friends, both new and old. One of the problems is that I don’t have a good address book that indicates whether someone is a personal or work contact. I’ve been playing with Datasette, and I thought it would be fun to index all of my past emails and have AI categorize each contact as work, personal, or vendor.

Importing Emails

The first step is importing your emails from Gmail into a local database. The easiest way to do this is to use a combination of Google Takeout and a Datasette plugin (note that it’s easiest to use my fork of the Datasette plugin; the original package is dead)…


Using ChatGPT to Convert LabCorp PDFs into a Google Sheet

For the last couple of years I’ve monitored my food, blood levels, etc. more closely. It’s a topic for another blog post, but it’s been really interesting to watch how key blood levels have changed over time and reacted to changes in my diet and exercise. I use LabCorp for all my blood work, and I’ve been relatively happy with them. However, their online portal does not let you download a CSV or Excel file of your blood work over time; they only offer a PDF download. This makes it hard to track your levels over time and to understand how lifestyle changes are affecting them. Enter ChatGPT. With the latest vision models, you can use it to extract tabular data from the unstructured PDF that LabCorp provides you…


Scraping the web with OpenAI

One of the really interesting LLM use cases is extracting structured data from unstructured data. In the old days (six months ago), extracting structured data from web pages, such as the price of a house on Redfin, required custom XPath or CSS selectors for each website, and those selectors constantly broke as the host changed its page structure. This is why Plaid (and similar competitors) break so often: many of their integrations "screen scrape," which means they need a team of people updating XPath and CSS selectors on various bank sites (TreasuryDirect, for example, is broken constantly). I built an open source database of venture capital firms that used this approach to extract team member information from each firm…
