Aggregating Data by Month in Google Sheets
Tags: finance • Categories: Uncategorized
Certain operations that are easy in SQL are hard in google sheets / excel. I ran into one of these: I had a simple problem: I had some time series data (dividend payouts from a financial institution) and I wanted to aggregate the data by month. In SQL, this is a simple GROUP BY
, in sheets it’s not that easy.
Create a separate sheet to aggregate data, then group
- Filter based on the type of transaction
- Look at each date and normalize it to the last day of the month
=ARRAYFORMULA(EOMONTH(B:B, 0))
(this is now column E) - Get a list of unique months (i.e. end-of-month keys)
=UNIQUE(E:E)
(this is now column F) - Filter the normalized list of months by the unique month keys and sum the values
ARRAYFORMULA(SUMIFS(C:C, E:E, F1, A:A,$H$1))
Here’s all of the additional rows, tab separated, so you can copy paste.
=ARRAYFORMULA(EOMONTH(B:B, 0)) =UNIQUE(E:E)= ARRAYFORMULA(SUMIFS(C:C, E:E, F1, A:A,$H$1))
This works, but it’s super messy (adds a bunch of garbage rows only used for data normalization). You could probably write a more complicated query to materialize a dataset with the EOMONTH
data, but it would destroy readability.
There’s got to be a better way!
QUERY
to run to SQL on google sheets
UseThere’s a interesting QUERY
function in google sheets that allows you to write SQL-like syntax to query your sheet. It’s pretty powerful and perfect for this use case.
For instance, here’s how to aggregate dividends by month in a single formula (with nice column labels!):
=query(A:C, "SELECT MONTH(B) + 1, SUM(C) WHERE A = 'Dividend Received' GROUP BY MONTH(B) LABEL MONTH(B) + 1 'Month', SUM(C) 'Dividend Total'")
Have multiple entries like this and want to sum them up, without drag-copying your formula across rows?
=ARRAYFORMULA(IF(E2:E<>"", E2:E + B2:B, ""))
This function applies the formula to each row, like copy-dragging a formula. Nice! The IF
ensures that if no value exists, 0
s won’t be added to blank rows.
Other learnings on google sheets
Here some misc learnings that were new to me:
- You cannot reference columns by name in the google sheet visualization language
- You can’t set conditional formatting via a formula
SUBTOTAL
is the only function which respects data filters, but you can’t apply any additional filters or transformation to the input intoSUBTOTAL
which makes it mostly useless.- Sometimes, double clicking on the bottom right corner doesn’t work to copy a formula across an entire column. Here’s what does.
- Click on the cell you would like to copy the formula from
- CMD+SHIFT+down array
- CMD+D to copy the formula