This study was conceived when I felt that Bollywood songs were focussed on female body parts rather than on conveying particular emotions, which is supposedly the reason why movies have songs in the first place.



First, I collect a sample of song videos from YouTube. This is probably the step where I should exert maximum care, to prevent sampling bias. But since the experiment can be repeated any number of times, I use several different kinds of sampling.


The songs will be downloaded from YouTube using youtube-dl.
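As a rough illustration of the download step, here is a minimal sketch that builds a youtube-dl command line from a text file of video URLs (one per line). The function names, the file name `songs.txt` and the output naming pattern are my own placeholders, not necessarily what was used for this study; `--batch-file` and `--output` are standard youtube-dl options.

```python
import subprocess


def build_download_command(batch_file, output_template="%(title)s.%(ext)s"):
    """Build a youtube-dl command that downloads every URL listed in batch_file."""
    return [
        "youtube-dl",
        "--batch-file", batch_file,    # text file with one video URL per line
        "--output", output_template,   # naming pattern for the downloaded files
    ]


def download_songs(batch_file):
    """Run youtube-dl on the batch file (hypothetical helper)."""
    # Assumes youtube-dl is installed and on PATH; raises if it exits with an error.
    subprocess.run(build_download_command(batch_file), check=True)
```

Calling `download_songs("songs.txt")` would then fetch every video listed in that file into the current directory.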

Frame picking

To assess the videos objectively, I shall pick random frames from the songs: 10 frames from each song, each taken at a random second given by randint(1, vidlength), where vidlength is the length of the video in seconds, rounded down to the nearest integer.
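The frame-picking rule can be sketched in Python roughly as follows. This is only a sketch under my own assumptions: the function name and the `n_frames` parameter are placeholders, and it only chooses the timestamps; grabbing the actual frame at each timestamp is left to whatever video tool is at hand.

```python
import math
import random


def pick_frame_seconds(duration_seconds, n_frames=10):
    """Return n_frames random timestamps (in whole seconds) within a video."""
    # vidlength is the video length rounded down to a whole second.
    vidlength = math.floor(duration_seconds)
    if vidlength < 1:
        raise ValueError("video must be at least one second long")
    # random.randint includes both endpoints, so each value lies in 1..vidlength.
    # Each draw is independent, so the same second may be picked more than once.
    return [random.randint(1, vidlength) for _ in range(n_frames)]


if __name__ == "__main__":
    # Example: a song that is 213.7 seconds long.
    print(pick_frame_seconds(213.7))
```

Because the draws are independent, a short video can yield repeated frames. Swapping the list comprehension for `random.sample(range(1, vidlength + 1), n_frames)` would avoid repeats, at the cost of requiring at least `n_frames` seconds of video.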


Based on the frames I pick, I could assign some kind of score to the videos. But I will consider this after doing some frame picking.


The 10 frames that were snapped shall be uploaded for each dataset.

Highest-grossing Bollywood films

This dataset is based on the top 10 movies from the List of highest-grossing Bollywood films, and includes all songs from those movies.

Here's the list I used to feed youtube-dl: top10bollywoodgross

Here's the script used to process it all:

