Automatically Download and Rename Your LabCorp PDFs
Tags: automation, health, software • Categories: Learning, Software, Uncategorized
Over the last couple of years, I’ve incrementally tried to improve my understanding of my health. Part of this is understanding my lab work and collecting the data in a form that I can understand and analyze. To do this, I needed to download all of my past LabCorp PDFs. My goal is to funnel these PDFs into my custom GPT, which extracts the results into a nicely formatted CSV that I can then paste into Google Sheets. This lets me easily graph, chart, and otherwise analyze my blood work results.
Given the lack of a straightforward way to download all PDFs from the LabCorp website, I created scripts to automate the downloading and renaming of the PDFs for easier analysis and storage. This process is detailed below.
Download All LabCorp PDFs
First, log in to your LabCorp online account and navigate to the results page.
If you view the network requests, you’ll notice that the page calls this API endpoint: https://portal-api.patient.cws.labcorp.com/protected/patients/current/linkedAccounts/results/headers/all
The response contains one entry per result, each shaped like this:
{
"id": 3779450413,
"accountName": "Some Provider",
"orderingProviderName": "J Doe",
"dateOfService": "2023-10-05T21:30:00.000Z",
"reportDate": "2023-10-06T15:10:00.000Z",
"dashboardCategory": "recently_viewed",
"weHealthEligible": false,
"pid": 17651317,
"patientName": "Mike Bianco",
"hasMinimumDataForPdfRetrieval": true,
"isDetailAvailable": true
}
Each entry has an id field that you can extract and plug into a download URL to grab the PDF:
https://portal-api.patient.cws.labcorp.com/protected/patients/current/results/3787290962/pdf
If you copy the raw JSON from the API call to your clipboard and then run the following shell script, you’ll download all of the PDFs. The script relies on jq and on the macOS pbpaste and open commands.
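# Read the copied JSON from the clipboard and build one PDF URL per result ID.
pbpaste | jq -r 'map(.id | "https://portal-api.patient.cws.labcorp.com/protected/patients/current/results/" + (. | tostring) + "/pdf")[]' | while read -r url; do
  # Open each URL in Safari, pausing briefly between requests.
  open -a Safari "$url"
  sleep 0.5
done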
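Before opening a Safari tab per result, it can be useful to preview the URLs that would be generated. The sketch below is a dry-run helper and not part of the original script: `ids_to_pdf_urls` is a hypothetical name, and it assumes the result IDs arrive one per line on stdin (for example from `jq -r '.[].id'`, which in turn assumes the copied response is a JSON array).

```shell
# Hypothetical dry-run helper: reads result IDs from stdin, one per line,
# and prints the matching PDF download URL for each without opening anything.
ids_to_pdf_urls() {
  base="https://portal-api.patient.cws.labcorp.com/protected/patients/current/results"
  while read -r id; do
    # Skip blank lines so stray whitespace does not produce a malformed URL.
    [ -n "$id" ] || continue
    printf '%s/%s/pdf\n' "$base" "$id"
  done
}

# Example (assumes the clipboard holds a JSON array of result entries):
#   pbpaste | jq -r '.[].id' | ids_to_pdf_urls
```

If the printed list looks right, the same output can be piped into the `while read` loop that opens each URL.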
Rename PDFs
The downloaded files all have unhelpful names. Copy all the PDFs to a dedicated folder and then run this script to extract the collection date from each one and rename the file accordingly.
You’ll need pdftotext for this to work. It ships with Poppler, so brew install poppler will provide it.
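# Rename each PDF in a folder to LabCorp-<date>.pdf using its "Date Collected" field.
# Written for zsh: ${pdf:h} is a zsh modifier that yields the file's directory.
rename_files_with_date_collected() {
  local folder_path=$1
  local pdf text date_collected formatted_date base_new_filename new_filename counter

  for pdf in "$folder_path"/*.pdf; do
    # Dump the PDF text to stdout and keep the first "Date Collected" match.
    text=$(pdftotext "$pdf" -)
    date_collected=$(echo "$text" | grep -Eo 'Date Collected: [0-9]{2}/[0-9]{2}/[0-9]{4}' | head -n 1 | cut -d' ' -f3)

    if [[ -z $date_collected ]]; then
      echo "Error: No date detected in $pdf"
      continue
    fi

    # Slashes cannot appear in filenames, so replace them with dots.
    formatted_date=$(echo "$date_collected" | sed 's/\//./g')
    base_new_filename="${pdf:h}/LabCorp-$formatted_date.pdf"
    new_filename="$base_new_filename"
    counter=1

    # If the name is taken, append -1, -2, ... until a free one is found.
    while [[ -f $new_filename ]]; do
      new_filename="${base_new_filename%.pdf}-$counter.pdf"
      ((counter++))
    done

    mv "$pdf" "$new_filename"
  done
}

rename_files_with_date_collected ~/path/to/folder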
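One side effect of keeping the date in its original order is that the resulting filenames sort by month rather than chronologically. If sortable names matter, the date could be rearranged to YYYY-MM-DD before renaming. This is a minimal sketch, assuming the matched date is in MM/DD/YYYY order; `to_iso_date` is a hypothetical helper and not part of the original script.

```shell
# Hypothetical helper: converts a date given as MM/DD/YYYY into YYYY-MM-DD
# using only parameter expansion, so it needs no external tools.
to_iso_date() {
  date_in=$1
  month=${date_in%%/*}   # everything before the first slash
  rest=${date_in#*/}     # everything after the first slash (DD/YYYY)
  day=${rest%%/*}        # the part of DD/YYYY before the slash
  year=${rest#*/}        # the part of DD/YYYY after the slash
  printf '%s-%s-%s\n' "$year" "$month" "$day"
}
```

In the rename function, this would take the place of the sed step: `formatted_date=$(to_iso_date "$date_collected")`.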
What’s interesting about this rename script is that it could easily be modified to rename any sort of PDF in a folder based on its text content. This is super helpful when mass downloading poorly named PDFs from online services.
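As a rough illustration of that generalization, the extraction step can be isolated into a helper that takes the pattern as an argument. This is a sketch with hypothetical names (`first_match`, and the invoice pattern in the usage comment), not something taken from the original script.

```shell
# Hypothetical helper: prints the first piece of text on stdin that matches
# the extended regular expression given as $1, or nothing if there is no match.
first_match() {
  grep -Eo -- "$1" | head -n 1
}

# Example (the file name and pattern are made up for illustration):
#   pdftotext invoice.pdf - | first_match 'Invoice #[0-9]+'
```

With the matching isolated like this, adapting the rename loop to another service's PDFs mostly comes down to swapping the pattern and the filename prefix.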