Applied Data Analytics – Problem Solving Task 4
Music streaming services have a lot to answer for. While many applaud the ability to get access to thousands of songs, the instant gratification aspect has led to an increase in genericisation and also a shortening in the length of songs.
In fact, the average length of new songs in 2020 was the shortest it has been since the 1960s. At the same time, the standard deviation in each of the last 4 years has been the lowest of all time, leading to an increasing sameness.
How we can long for the days of music experimentation and the progressive rock of Tubular Bells, Pink Floyd and even Queen’s standard-breaking Bohemian Rhapsody. That is enough of my yearning for the past.
The file Problem Solving Task 4.xlsx contains records on over 110,000 songs downloaded from Spotify between 1963 and 2020, including artist, song name, a popularity score, release year, and duration in seconds and minutes. Most of this data is just for those of you who love music as much as I do and want to muck around with the file. There are other similar files on a website called Kaggle.
Question One
⦁ Using only data from 2017, calculate the average and standard deviation of the song lengths in minutes.
⦁ Assuming the values calculated in part a. are a good estimate of the population mean and standard deviation, and assuming the distribution of song lengths is normal, calculate:
⦁ Probability that any one song in 2017 had a length of more than 7 minutes
⦁ Probability that any one song in 2017 lasted between 3 and 4 minutes.
⦁ The maximum duration for the shortest 20% of songs.
⦁ Construct a frequency distribution for the length of songs in 2017. Make the lower limit of the first class 0 minutes and make the class width for all classes 1 minute. Graph the frequency distribution using a histogram.
⦁ From the distribution in part c. is it reasonable to assume the distribution of song lengths was normal? Why?
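The calculations in Question One can be sketched in Python using only the standard library. This is a rough outline under stated assumptions, not a prescribed method: the function names (`summarise`, `normal_answers`, `frequency_table`) are hypothetical, the 2017 song lengths in minutes are assumed to have already been extracted from the spreadsheet into a plain list, and the sample standard deviation (n - 1 divisor) is assumed to be the intended estimate.

```python
import math
from collections import Counter
from statistics import NormalDist, mean, stdev


def summarise(lengths_min):
    """Return (mean, sample standard deviation) of song lengths in minutes."""
    # stdev() uses the n - 1 divisor; this choice is an assumption.
    return mean(lengths_min), stdev(lengths_min)


def normal_answers(mu, sigma):
    """Normal-model answers, treating mu and sigma as population values."""
    dist = NormalDist(mu, sigma)
    return {
        # P(X > 7): upper tail beyond 7 minutes
        "p_over_7": 1 - dist.cdf(7),
        # P(3 < X < 4): difference of two cumulative probabilities
        "p_3_to_4": dist.cdf(4) - dist.cdf(3),
        # 20th percentile: longest duration among the shortest 20% of songs
        "p20_cutoff": dist.inv_cdf(0.20),
    }


def frequency_table(lengths_min, width=1.0):
    """Count songs per class [k*width, (k+1)*width), starting from 0 minutes."""
    counts = Counter(math.floor(x / width) for x in lengths_min)
    top = max(counts)  # index of the last non-empty class
    return [(k * width, (k + 1) * width, counts.get(k, 0))
            for k in range(top + 1)]
```

Passing the 2017 lengths to `summarise` and feeding the result into `normal_answers` gives the three normal-model quantities. The class counts from `frequency_table` can be plotted as a histogram and compared with the bell shape that the normal assumption implies.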
Question Two
In 2016 the US Scholastic Assessment Test (SAT), which is used to assess students' performance for admissions to US universities, was completely revised. In 2017, across the entire US, the average SAT score was 1060 with a standard deviation of 195 (these can be taken as population values).
⦁ What is the probability any one student taking the SAT scores more than 1500 on the test? What assumption would you have to make to do this calculation?
⦁ Suppose a university randomly sampled 100 students to calculate their average score. Describe the sampling distribution of the sample mean.
⦁ What is the probability that a sample of 100 students had an average score of more than 1100?
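As a hedged sketch of the set-up for Question Two, the snippet below uses Python's standard-library `NormalDist`. It assumes individual SAT scores are normally distributed (the assumption the first part asks you to state) and uses the population values given in the task. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

MU = 1060      # population mean SAT score (from the task)
SIGMA = 195    # population standard deviation (from the task)
N = 100        # sample size in the second and third parts

# Single student: P(X > 1500), assuming scores are normally distributed.
single = NormalDist(MU, SIGMA)
p_over_1500 = 1 - single.cdf(1500)

# Sampling distribution of the mean: centred on MU with standard error
# SIGMA / sqrt(N). With n = 100 the central limit theorem supports treating
# it as approximately normal.
standard_error = SIGMA / math.sqrt(N)
sample_mean = NormalDist(MU, standard_error)
p_mean_over_1100 = 1 - sample_mean.cdf(1100)

print(f"P(X > 1500)    = {p_over_1500:.4f}")
print(f"standard error = {standard_error:.2f}")
print(f"P(mean > 1100) = {p_mean_over_1100:.4f}")
```

The same two steps apply in any tool: standardise against the population standard deviation for a single score, and against the standard error for a sample mean.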
Question Three
According to Headphones Addict (https://headphonesaddict.com/listening-to-music-statistics/), an average person listens to more than 950 hours of music per year. One recent statistic indicated that 60% of all 16 to 24 year olds stream music every day.
Given that 16 to 24 year olds represent typical university students, assume that 60% of university students stream music every day:
⦁ What is the probability that in a sample of 10 university students fewer than 4 streamed music that day?
⦁ What is the probability that in a sample of 100 university students less than 40% streamed music that day (i.e. the sample proportion was less than 0.4)? Make sure you treat this as a proportion question.
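The two parts of Question Three call for different models, which the following sketch tries to make concrete. It assumes the sample of 10 is handled with the exact binomial distribution, and the sample of 100 with the normal approximation to the sampling distribution of the sample proportion. The helper `binom_cdf` and the variable names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

P = 0.6  # assumed proportion of students who stream music daily (from the task)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


# Sample of 10: "fewer than 4" means X <= 3 for a count of students.
p_fewer_than_4 = binom_cdf(3, 10, P)

# Sample of 100: the sample proportion is approximately normal with
# mean P and standard error sqrt(P * (1 - P) / n).
n = 100
se_proportion = sqrt(P * (1 - P) / n)
p_hat_below_04 = NormalDist(P, se_proportion).cdf(0.4)

print(f"P(X < 4)       = {p_fewer_than_4:.4f}")
print(f"standard error = {se_proportion:.4f}")
print(f"P(p_hat < 0.4) = {p_hat_below_04:.6f}")
```

The normal approximation is usually considered reasonable here because both n·p = 60 and n·(1 − p) = 40 are well above the common rule-of-thumb thresholds.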