Finding That One Song You Forgot Through Spotify Streaming History

April 27, 2020

One night I was listening to Spotify Radio while doing homework and, through sheer luck and algorithmic grace, I was handed a set of tracks I absolutely loved. I saved almost every one.

The next day, I was playing through the list, only to realize I had forgotten one! To make matters worse, the tune was stuck in my head, and so while I knew exactly what it sounded like, I had no idea what the title or artist was. The only other relevant information I had was a rough idea of what time I was listening to it (the early morning of February 9th, around 2-3 AM).

I went back through my Spotify history, but I couldn't find it. Sadly, Spotify only stores the last 50 tracks in the History tab.

At this point I thought this song would be lost forever. However...

Spotify Download Your Data

Could GDPR save me?

To comply with GDPR, Spotify lets you download all the data it keeps on you, which happens to include a full year of streaming history (streaming means any song, podcast, or video played through Spotify, not just radio history). The song I want should be in here somewhere!

Sidenote: I still find it incredibly strange that you can watch full, 1080p quality videos on Spotify. Not just music videos but also like...video game let's plays and mini documentaries. Who uses this?

I applied and waited for my link email. It only took a few days. Here are the files it came with:

FamilyPlan.json
Follow.json
Identity.json
Payments.json
Playlist1.json
SearchQueries.json
Userdata.json
YourLibrary.json
StreamingHistory[0-5].json

For info on each, check out Understanding My Data from Spotify. There's a lot of useful data here that isn't just about streaming history; you can also see every song you've saved, every playlist you've made and all the tracks they hold, who you follow, and your search query history.

For today, though, I'm only using streaming history. There are six of these files, and it turns out they go in chronological order: StreamingHistory0's first track was played in February 2019, while StreamingHistory5 begins in January 2020. So StreamingHistory5.json is the one that should cover February 9th, and we finally have a file to work with.

Unfortunately, it's 31,352 lines long!

Filtering with jq

Each piece of play history is represented in JSON:

{
    "endTime" : "2020-01-09 15:15",
    "artistName" : "Noisestorm",
    "trackName" : "Crab Rave",
    "msPlayed" : 161280
}

If we consider that each track takes up 6 lines of the file, then 31,352 / 6 ≈ 5,225 tracks over a period from January 9 (first entry) to February 12 (last entry). 5,225 tracks over a period of 34 days is roughly 154 tracks a day, or about 6 songs an hour, which is pretty accurate (I use Spotify a lot).

5,225 tracks is still a lot, though, and we can quickly cut this down using our good friend jq. jq lets you easily slice and dice through JSON files without needing to write custom scripts. I had never used it before, and I knew this would be the perfect opportunity to try, so I read the manual.
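
Rather than estimating from the line count, the entries can also be counted directly. Here is a minimal Python sketch, assuming the export is a JSON array of objects shaped like the one above. The name `summarize` and all sample entries except the first are made up for illustration.

```python
from datetime import datetime

def summarize(history):
    """Return (track_count, days_spanned, tracks_per_day) for a list of play entries."""
    times = [datetime.strptime(e["endTime"], "%Y-%m-%d %H:%M") for e in history]
    days = (max(times) - min(times)).total_seconds() / 86400
    # Guard against a zero-length span (e.g. a single entry).
    per_day = len(history) / days if days > 0 else float(len(history))
    return len(history), days, per_day

# Hypothetical sample in the same shape as the export (only the first entry is real).
sample = [
    {"endTime": "2020-01-09 15:15", "artistName": "Noisestorm", "trackName": "Crab Rave", "msPlayed": 161280},
    {"endTime": "2020-01-10 15:15", "artistName": "Artist B", "trackName": "Track B", "msPlayed": 1000},
    {"endTime": "2020-01-11 15:15", "artistName": "Artist C", "trackName": "Track C", "msPlayed": 1000},
]

count, days, per_day = summarize(sample)
print(f"{count} tracks over {days:.0f} days, about {per_day:.1f} per day")
```

For the real file, the list passed to `summarize` would come from `json.load` on StreamingHistory5.json instead of the inline sample.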

> cat StreamingHistory5.json | jq 'map(select(.endTime | test("2020-02-09")))'

A few things are going on at once here (and it took me a lot more than one try to get this to work properly). First, some definitions:

For any filter x, map(x) will run that filter for each element of the input array, and return the outputs in a new array.
The function select(foo) produces its input unchanged if foo returns true for that input, and produces no output otherwise.
test(val), test(regex; flags): like match, but does not return match objects, only true or false for whether or not the regex matches the input.

Effectively what we are doing is filtering out any track object that doesn't match our criteria (endTime contains "2020-02-09"). This gives us a new array of tracks. We can count how many using length:

> cat StreamingHistory5.json | jq 'map(select(.endTime | test("2020-02-09"))) | length'
> 297

297! That cuts the list by well over 90%, which is great, but it's still way too many songs to deal with. I did mention that I heard this song during the night, so let's try filtering by the hour as well:

> cat StreamingHistory5.json | jq 'map(select(.endTime | test("2020-02-09 02"))) | length'
> 59

Now we're looking for end times that contain "2020-02-09 02", which limits us to songs that finished playing between 2:00 and 2:59 AM.
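
For comparison, the same filter-and-count idea can be sketched in Python. This is only an illustration of what the jq filter does, not a replacement for it: `filter_by_end_time` is a hypothetical helper, and the sample entries are invented apart from the Primus track that shows up later in this post.

```python
import re

def filter_by_end_time(history, pattern):
    """Keep entries whose endTime matches the regex anywhere, like jq's test()."""
    return [e for e in history if re.search(pattern, e["endTime"])]

# Hypothetical sample entries in the same shape as the export.
sample = [
    {"endTime": "2020-02-08 23:58", "artistName": "Artist A", "trackName": "Track A"},
    {"endTime": "2020-02-09 02:10", "artistName": "Artist B", "trackName": "Track B"},
    {"endTime": "2020-02-09 02:59", "artistName": "Primus", "trackName": "Jerry Was A Race Car Driver"},
    {"endTime": "2020-02-09 14:30", "artistName": "Artist C", "trackName": "Track C"},
]

on_the_9th = filter_by_end_time(sample, "2020-02-09")   # narrow by day
at_2am = filter_by_end_time(sample, "2020-02-09 02")    # narrow by day and hour
print(len(on_the_9th), len(at_2am))
```

The list comprehension plays the role of map(select(...)), and `len` plays the role of jq's length.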

I can listen to 59 songs, but right now I only have an artist name and a track name. It would take a long time to type all of that into Spotify, find a track, play it, and determine if it's the song I want or not.

It would be better if I could just click on the song to play it.

Markdown with jq

I'm not sure if anyone has actually done this before, but I figured the next step would be to convert my JSON object into a Markdown text file. I could have a list of songs, and each song could be wrapped in a hyperlink that opens Spotify.

Unfortunately, this JSON file doesn't give us any track ID or URL - only the artist and track name. We're going to have to use Spotify search to find the song before we can play it back.

From previous endeavors I knew that Spotify had a URI handler for spotify:search:, so you could, for example, click on spotify://search:Another One Bites the Dust to bring up the Spotify window with search results.
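
As a rough sketch of how such a link can be built from a track and artist name, here is a Python version. It assumes the spotify://search: form described above is what the desktop client accepts, and `search_uri` is a name invented for this example. Note that with `safe=""` Python also percent-encodes parentheses, which is a little stricter than strictly necessary but keeps the link safe to embed in Markdown.

```python
from urllib.parse import quote

def search_uri(track_name, artist_name):
    """Build a Spotify search link from a track name and an artist name."""
    query = f"{track_name} {artist_name}"
    # safe="" percent-encodes everything except letters, digits and "_.-~".
    return "spotify://search:" + quote(query, safe="")

print(search_uri("Crab Rave", "Noisestorm"))
print(search_uri("Lies (feat. Luciana)", "KSHMR"))
```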

Using what we know, we can massage the original objects into the format that we want (still using only jq):

> cat StreamingHistory5.json | jq 'map(select((.endTime | test("2020-02-09 02:59")))) | map({time: .endTime, song: (.trackName + " - " + .artistName), link: (@uri "spotify://search:\(.trackName + " " + .artistName)")})'

> [
    {
        "time": "2020-02-09 02:59",
        "song": "Jerry Was A Race Car Driver - Primus",
        "link": "spotify://search:Jerry%20Was%20A%20Race%20Car%20Driver%20Primus"
    }
]

The first map acts as a filter, getting rid of songs that don't match the criteria, while the second map acts as a...map, transforming keys from the old object to the new one.

(I've narrowed the time criteria to 2:59 just for this demo).

One more map later:

map("- \(.time) [\(.song)](\(.link))")

And we get:

[
    "- 2020-02-09 02:59 [Jerry Was A Race Car Driver - Primus](spotify://search:Jerry%20Was%20A%20Race%20Car%20Driver%20Primus)",
    "- 2020-02-09 02:59 [Blitzkrieg Bop - 2016 Remaster - Ramones](spotify://search:Blitzkrieg%20Bop%20-%202016%20Remaster%20Ramones)",
    "- 2020-02-09 02:59 [Giant - Nebula](spotify://search:Giant%20Nebula)",
    "- 2020-02-09 02:59 [Lies (feat. Luciana) - KSHMR](spotify://search:Lies%20(feat.%20Luciana)%20KSHMR)"
]

Note that jq's output still contains the opening and closing brackets of the array, and wraps each line in quotes. If we want to write this to a file, we need to get rid of those with jq's -r option (raw output) and a trailing .[] (which emits each array element as its own output, so every string lands on its own line):

> jq -r 'map(select((.endTime | test("2020-02-09 02:59")))) | map({time: .endTime, song: (.trackName + " - " + .artistName), link: (@uri "spotify://search:\(.trackName + " " + .artistName)")}) | map("- \(.time) [\(.song)](\(.link))") | .[]' StreamingHistory5.json

- 2020-02-09 02:59 [Jerry Was A Race Car Driver - Primus](spotify://search:Jerry%20Was%20A%20Race%20Car%20Driver%20Primus)
- 2020-02-09 02:59 [Blitzkrieg Bop - 2016 Remaster - Ramones](spotify://search:Blitzkrieg%20Bop%20-%202016%20Remaster%20Ramones)
- 2020-02-09 02:59 [Giant - Nebula](spotify://search:Giant%20Nebula)
- 2020-02-09 02:59 [Lies (feat. Luciana) - KSHMR](spotify://search:Lies%20(feat.%20Luciana)%20KSHMR)

Finally, let's give it a header, by simply adding an array containing ["# Songs", " "] to the front of the results:

> jq -r 'map(select((.endTime | test("2020-02-09 02:59")))) | map({time: .endTime, song: (.trackName + " - " + .artistName), link: (@uri "spotify://search:\(.trackName + " " + .artistName)")}) | ["# Songs", " "] + map("- \(.time) [\(.song)](\(.link))") | .[]' StreamingHistory5.json

# Songs

- 2020-02-09 02:59 [Jerry Was A Race Car Driver - Primus](spotify://search:Jerry%20Was%20A%20Race%20Car%20Driver%20Primus)
- 2020-02-09 02:59 [Blitzkrieg Bop - 2016 Remaster - Ramones](spotify://search:Blitzkrieg%20Bop%20-%202016%20Remaster%20Ramones)
- 2020-02-09 02:59 [Giant - Nebula](spotify://search:Giant%20Nebula)
- 2020-02-09 02:59 [Lies (feat. Luciana) - KSHMR](spotify://search:Lies%20(feat.%20Luciana)%20KSHMR)
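
If jq isn't your thing, the transform-and-render steps can be approximated in a few lines of Python. This is a sketch under the same assumptions as the jq version (entries with endTime, trackName and artistName, and the spotify://search: link form). `render_markdown` is a made-up name, and unlike the jq output above it percent-encodes parentheses and uses an empty line rather than a single space after the header.

```python
from urllib.parse import quote

def render_markdown(history, title="Songs"):
    """Render play entries as a Markdown list of clickable Spotify search links."""
    lines = [f"# {title}", ""]  # header followed by a blank line
    for e in history:
        label = f"{e['trackName']} - {e['artistName']}"
        query = quote(f"{e['trackName']} {e['artistName']}", safe="")
        lines.append(f"- {e['endTime']} [{label}](spotify://search:{query})")
    return "\n".join(lines) + "\n"

# Two of the 02:59 entries from the output above.
sample = [
    {"endTime": "2020-02-09 02:59", "artistName": "Primus", "trackName": "Jerry Was A Race Car Driver"},
    {"endTime": "2020-02-09 02:59", "artistName": "Nebula", "trackName": "Giant"},
]

print(render_markdown(sample), end="")
```

Writing the returned string to a file gives the same kind of Markdown document as the jq pipeline.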

And now we redirect the final command's output to a file:

> jq -r 'map(select((.endTime | test("2020-02-09 02:59")))) | map({time: .endTime, song: (.trackName + " - " + .artistName), link: (@uri "spotify://search:\(.trackName + " " + .artistName)")}) | ["# Songs", " "] + map("- \(.time) [\(.song)](\(.link))") | .[]' StreamingHistory5.json > test.md

...presto!

Side by side Markdown and Spotify

(I found the song, by the way. It was Shift by ANG).
