2  Albums, dates and popularity

In this part I’m going to work with the /albums/{id} API section which return all the information Spotify has of an album. I use this data to answer questions like:

Remember that the analysis of the first part was done with a small sample of artist data, so far we have the following partial conclusions:

2.1 API

A small summary of the functionalities that I will use:

GET /albums/{id}/albums?market=US
Parameter Type Description
id (required) string The ID of the album

2.2 Libraries

Import the libraries I will need:

from pandas import json_normalize
import requests
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook_connected"
import base64
from IPython.display import Image, display

def mm(graph):
    graphbytes = graph.encode("ascii")
    base64_bytes = base64.b64encode(graphbytes)
    base64_string = base64_bytes.decode("ascii")
    display(Image(url="https://mermaid.ink/img/" + base64_string))

This is the section of the authentication and generation of the token that was already explained in the first section

CLIENT_ID = '81e800e81ecf4997b5b9fb12efeb3ff2'
CLIENT_SECRET = '0e4364f440f148779d8a9f17976ecf1b'
def get_token():
    url = 'https://accounts.spotify.com/api/token'
    auth_response = requests.post(url, {
        'grant_type': 'client_credentials',
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
    })
    if auth_response.status_code != 200:
        raise Exception('Error getting token')
    else:
        auth_response_data = auth_response.json()
        return auth_response_data['access_token']
access_token = get_token()
header = {
    'Authorization': 'Bearer {token}'.format(token=access_token),
    'accept':'application/json'
}
base_url = 'https://api.spotify.com/v1/'

To obtain a dataframe with the albums of an artist, a few previous steps must first be carried out:

1- Ask the user for the artists. 2- Get the artist ID 3- Extract all the albums of the artist.

Steps one and two were already explained in the first section:

#ask for user input for artist name
artist_name = input('Enter artist name: ')
print('the artist name is: ', artist_name)
#get artist id
artist_id = requests.get(base_url + 'search?q={}&type=artist'.format(artist_name), headers=header).json()['artists']['items'][0]['id']
the artist name is:  Rush

2.3 Workflow

The workflow from when the user enters the name of the artist, the request to get all de details and then saved in the dataframes is represented in the following diagram:

mm("""
graph TD
    A[user_input] -->|Search| B(GET item 0)
    B --> |save artist_id| D{GET request}
    D -->|/artist/artist_id/albums| E[df album list]
    D --> F[df albums details]
    F --> E
    E --> |/artist/album_id/| F
""")

2.4 Albums

In order to obtain the albums of an artist, I will use the /artists/{id}/albums API section. In the next part I use the API to get the info and a simple while loop for pagination. I will use the limit parameter to limit the number of albums per request to 50 and the offset parameter to move through the pages. I will also use the include_groups parameter to only get the albums and not the singles or compilations.

The structure of the json object delivered by /artists/{id}/albums can be found at this link. Link

#extract all albums of artist
r_albums = requests.get(base_url + 'artists/' + artist_id + '/albums?market=US', headers=header, params={'limit':50, 'include_groups':'album'})
r_albums=r_albums.json()
df_albums=json_normalize(r_albums['items'])
#get next page
while r_albums['next']:
    r_albums = requests.get(r_albums['next'], headers=header)
    r_albums=r_albums.json()
    df_albums=df_albums.append(json_normalize(r_albums['items']))
df_albums=df_albums.drop(['album_type',
                          'artists',
                          'href',
                          'images',
                          'release_date_precision',
                          'external_urls.spotify',
                          'uri',
                          'type'],axis=1)
df_albums['Release Date'] = df_albums['release_date'].apply(lambda x: x[:4])
df_albums.head(5)
album_group id name release_date total_tracks Release Date
0 album 2PBaIv21OWCmecNenZionV Moving Pictures (40th Anniversary Super Deluxe) 2022-04-15 26 2022
1 album 06hsxtm7Y1gDM5sNliCD5d Permanent Waves (40th Anniversary) 2020-05-29 18 2020
2 album 5G0G9TLLWr8n1abpY4ihmy Hemispheres (40th Anniversary) 2018-11-16 14 2018
3 album 3bMJQ8LQWi42IAhVPP0M9O A Farewell To Kings (40th Anniversary Deluxe E... 2017-09-01 26 2017
4 album 6q4SMJ8ggxBVrCzPSnDl7c 2112 (40 Anniversary) 2016-12-09 19 2016

Despite having used include_groups, Spotify’s response contains some live albums, anniversary editions and others that generate noise in the data. Since my goal is to get only the studio albums, the first step is to delete the albums I don’t need.

#delete live albums
df_albums=df_albums[~df_albums['name'].str.contains('Live|Anniversary|Remix|Tour|Stage|Retrospective|Chronicles|R30|A Show Of Hands|Closer')]
#loower case all names
df_albums['name']=df_albums['name'].str.lower()
df_albums.reset_index(drop=True, inplace=True)
df_albums.head(5)
album_group id name release_date total_tracks Release Date
0 album 744i0LypfMwHHrKhzsqAx0 clockwork angels 2012-06-11 12 2012
1 album 7hgcHQbB7xYr75qPPulfro snakes & arrows 2007-04-27 13 2007
2 album 0mT6ezOOTIUucAF9csghFE feedback 2004-06-29 8 2004
3 album 5fwkYtHrckROAs4ALRJ2Cz test for echo (2004 remaster) 1996-09-06 11 1996
4 album 6JNHWbFco4bnRP5ybKGriN counterparts (2004 remaster) 1993-09-28 11 1993

2.5 Popularity

In order to get the popularity of every album, I use the /albums/{id} API section and to concatenate in the dataframe.

#get the popularity of every album
df_albums['popularity'] = df_albums['id'].apply(lambda x: requests.get(base_url + 'albums/' + x, headers=header).json()['popularity'])
df_albums.head(5)
album_group id name release_date total_tracks Release Date popularity
0 album 744i0LypfMwHHrKhzsqAx0 clockwork angels 2012-06-11 12 2012 37
1 album 7hgcHQbB7xYr75qPPulfro snakes & arrows 2007-04-27 13 2007 34
2 album 0mT6ezOOTIUucAF9csghFE feedback 2004-06-29 8 2004 28
3 album 5fwkYtHrckROAs4ALRJ2Cz test for echo (2004 remaster) 1996-09-06 11 1996 33
4 album 6JNHWbFco4bnRP5ybKGriN counterparts (2004 remaster) 1993-09-28 11 1993 41

2.6 Graphs

The first graph shows the number of albums per year and the number of tracks per album.

#group by name and release date
df_tracks_and_year=df_albums.groupby(['name']).agg({'total_tracks':'first','Release Date':'first', 'popularity':'first'})
#sort by release date
df_tracks_and_year=df_tracks_and_year.sort_values(by=['Release Date'])
#reset index
df_tracks_and_year.reset_index(drop=True, inplace=True)
#number of albums per year
fig = px.scatter(df_tracks_and_year,
                 x='Release Date',
                 y='total_tracks',
                 color='Release Date',
                 color_continuous_scale='Viridis',
                 marginal_x="histogram",
                 labels={'x':'Release year', 'y':'Number of Tracks'},
                 template="ggplot2" )
fig.update_layout(title='Number of Tracks per Album',
                  width=700,
                  height=600)
fig.show()
  • Does the era affect the amount of tracks that an album had?

R:// Yes! The number of tracks per album has increased over time. The first album had 8 tracks and the last one 12 tracks.

#sotring by release date
df_albums=df_albums.sort_values(by=['Release Date'], ascending=True)
#heatmap of popularity, release date and number of tracks
fig = px.density_heatmap(df_albums,
                         x="Release Date",
                         y="popularity",
                         z="total_tracks",
                         #histfunc="max",
                         color_continuous_scale="Blues",
                         labels={'popularity':'Popularity', 'Release Date':'Release Date', 'total_tracks':'Number of Tracks'},
                         template="ggplot2")
fig.update_layout(title='Heatmap: Release date, popularity and number of tracks per album')
fig.layout['coloraxis']['colorbar']['title'] = 'Number of Tracks'
fig.update_layout(width=700, height=600)
fig.show()
  • The popularity of an album is it affected by the era? R:// Yes! The popularity of an album is affected by the era. The most popular albums were released in the 70s.

  • The number of tracks it’s related to the popularity of the album? R:// Yes! The more tracks an album has, the less popular it is.

2.7 Conclusion

The most popular years of Rush were the 70s, in the beginning of the band. As time increased, the songs became shorter and the albums did not enjoy as much popularity.