Automatically Download and Rename Your LabCorp PDFs

August 29, 2024

Tags: automation, health, software • Categories: Learning, Software, Uncategorized


Over the last couple of years, I’ve been gradually trying to improve my understanding of my health. Part of this is understanding my lab work and collecting the data in a form that I can understand and analyze. To do this, I needed to download all of my past LabCorp PDFs. My goal is to funnel these PDFs into my custom GPT, which extracts the results into a nicely formatted CSV that I can then copy and paste into Google Sheets. This lets me easily graph, chart, and otherwise analyze my blood work results.

Given the lack of a straightforward way to download all PDFs from the LabCorp website, I created scripts to automate the downloading and renaming of the PDFs for easier analysis and storage. This process is detailed below.

Download All LabCorp PDFs

First, log in to your LabCorp online account and navigate to the results page.

If you view the network requests in your browser’s developer tools, you’ll notice a call to this API:

https://portal-api.patient.cws.labcorp.com/protected/patients/current/linkedAccounts/results/headers/all

The response contains one entry per lab result, each shaped like this:

{
  "id": 3779450413,
  "accountName": "Some Provider",
  "orderingProviderName": "J Doe",
  "dateOfService": "2023-10-05T21:30:00.000Z",
  "reportDate": "2023-10-06T15:10:00.000Z",
  "dashboardCategory": "recently_viewed",
  "weHealthEligible": false,
  "pid": 17651317,
  "patientName": "Mike Bianco",
  "hasMinimumDataForPdfRetrieval": true,
  "isDetailAvailable": true
}

You can extract the id field from each entry and plug it into a download URL to grab the PDF:

https://portal-api.patient.cws.labcorp.com/protected/patients/current/results/3787290962/pdf
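
To make the id-to-URL mapping concrete, here is a minimal sketch in plain shell. The function name labcorp_pdf_urls is my own, hypothetical choice, and the only thing it assumes is the URL pattern shown above; it does no downloading and no authentication.

```shell
# Hypothetical helper: read result ids from stdin (one per line) and print
# the matching PDF URL for each, using the URL pattern shown above.
labcorp_pdf_urls() {
  local base="https://portal-api.patient.cws.labcorp.com/protected/patients/current/results"
  local id
  while IFS= read -r id; do
    # Skip blank lines so stray whitespace in the input is harmless.
    [ -n "$id" ] || continue
    printf '%s/%s/pdf\n' "$base" "$id"
  done
}
```

With the API response on your clipboard, something like pbpaste | jq -r '.[].id' | labcorp_pdf_urls should print one URL per result, which is handy for checking the list before opening anything.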

If you copy the raw JSON from the API call to your clipboard and then run the following shell script, you’ll download all of the PDFs. It relies on the macOS pbpaste and open commands, plus jq.

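# Read the copied JSON from the clipboard, build one PDF URL per result id,
# and open each URL in Safari.
pbpaste | jq -r 'map(.id | "https://portal-api.patient.cws.labcorp.com/protected/patients/current/results/" + (. | tostring) + "/pdf")[]' | while read -r url; do
  open -a Safari "$url"
  sleep 0.5  # brief pause between requests
done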

Rename PDFs

However, the downloaded files all have unhelpful names. Copy all the PDFs to a dedicated folder and then run this script to extract the collection date from each one and rename the file accordingly.

You’ll need pdftotext installed for this to work. It ships with Poppler, so brew install poppler will provide it.
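
Because the rename step depends on an external tool, a quick pre-flight check can save some confusion: without pdftotext, every file would just report that no date was detected. The sketch below is one way to do that check; require_cmd is a hypothetical helper name, not something from the original script.

```shell
# Hypothetical helper: succeed only if every named command is on PATH.
require_cmd() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      # Report each missing tool on stderr and remember the failure.
      echo "Missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling require_cmd pdftotext before running the rename script (and stopping if it fails) makes a missing dependency obvious up front.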

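# Rename every PDF in a folder to LabCorp-MM.DD.YYYY.pdf, using the
# "Date Collected" value found in the PDF text.
# Requires zsh: ${pdf:h} is a zsh modifier and fails in bash.
rename_files_with_date_collected() {
  local folder_path=$1
  local pdf text date_collected formatted_date base_new_filename new_filename counter

  for pdf in "$folder_path"/*.pdf; do
    # Dump the PDF text to stdout and grab the first collection date.
    text=$(pdftotext "$pdf" -)
    date_collected=$(echo "$text" | grep -Eo 'Date Collected: [0-9]{2}/[0-9]{2}/[0-9]{4}' | head -n 1 | cut -d' ' -f3)

    if [[ -z $date_collected ]]; then
      echo "Error: No date detected in $pdf"
      continue
    fi

    # Slashes cannot appear in a filename, so swap them for dots.
    formatted_date=$(echo "$date_collected" | sed 's/\//./g')
    # ${pdf:h} expands to the directory containing $pdf.
    base_new_filename="${pdf:h}/LabCorp-$formatted_date.pdf"
    new_filename="$base_new_filename"
    counter=1

    # If the name is taken, append -1, -2, ... until a free one is found.
    if [[ -f $new_filename ]]; then
      new_filename="${base_new_filename%.pdf}-$counter.pdf"
      while [[ -f $new_filename ]]; do
        ((counter++))
        new_filename="${base_new_filename%.pdf}-$counter.pdf"
      done
    fi

    mv "$pdf" "$new_filename"
  done
}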

rename_files_with_date_collected ~/path/to/folder

What’s interesting about this script is that it can easily be modified to rename any kind of PDF in a folder based on its text content: swap out the grep pattern and the filename prefix. This is super helpful when mass downloading poorly named PDFs from online services.
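
As a rough sketch of that generalization, the text-matching step can be pulled into a helper that takes the label as a parameter. The name labeled_date_iso is hypothetical, and it assumes the document prints dates as "Label: MM/DD/YYYY". Note one deliberate difference from the script above: it emits YYYY-MM-DD rather than MM.DD.YYYY, so that the resulting filenames sort chronologically.

```shell
# Hypothetical helper: read text on stdin and print the first MM/DD/YYYY date
# that follows "<label>: ", rewritten as YYYY-MM-DD. Prints nothing if there
# is no match. The label is used as a regex, so keep it to plain text.
labeled_date_iso() {
  local label=$1
  grep -Eo "$label: [0-9]{2}/[0-9]{2}/[0-9]{4}" \
    | head -n 1 \
    | sed -E 's#.*([0-9]{2})/([0-9]{2})/([0-9]{4})$#\3-\1-\2#'
}
```

Inside a rename loop, this could be fed from pdftotext "$pdf" - with whatever label the service prints, and an empty result treated as "no date detected", just as in the script above.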
