Curiotaxis

Blu-ray/DVD movie sales prediction

April 26, 2015 | 1 Minute Read

Which Blu-ray/DVD movie titles will top sales in 2015? How many units will be sold?

For the last two weeks I worked on a Blu-ray/DVD sales prediction project. Box office data for about 15,000 titles were scraped from the Box Office Mojo site using Beautiful Soup. In this project, I play a data scientist working for a consulting company who is asked to predict which new movies will have the highest related-product sales in 2015. So I created a Blu-ray/DVD sales prediction algorithm that reflects a movie's popularity and projects sales of its related products.

Overview of my project:

1. Get box office data by web scraping.

2. Get Blu-ray/DVD sales data from opusdata.com (SQL).

3. Clean, join, and explore the data.

4. Popularity Holding Index (PHI).

I created a popularity holding index (PHI), which calculates the change in weekly gross per theater between the average of the first two weeks and the fourth week. If opening-week viewers give positive feedback and spread the word, the weekly gross per theater should still be high in the fourth week.
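As a rough illustration of the idea, here is a minimal Python sketch. The post does not give the exact formula, so this assumes PHI is the ratio of the fourth-week gross per theater to the average gross per theater over the first two weeks. The function name `popularity_holding_index` and all the numbers are made up for the example.

```python
def popularity_holding_index(weekly_gross, weekly_theaters):
    """Assumed PHI: week-4 gross per theater divided by the
    average gross per theater of weeks 1 and 2.

    Both arguments are lists ordered by week (week 1 first).
    """
    if len(weekly_gross) < 4 or len(weekly_theaters) < 4:
        raise ValueError("need at least four weeks of data")

    # Gross per theater for each of the first four weeks
    per_theater = [g / t for g, t in zip(weekly_gross[:4], weekly_theaters[:4])]

    opening_avg = (per_theater[0] + per_theater[1]) / 2
    return per_theater[3] / opening_avg


# Made-up numbers, for illustration only
gross = [40_000_000, 24_000_000, 15_000_000, 9_600_000]
theaters = [4000, 4000, 3750, 3200]
phi = popularity_holding_index(gross, theaters)
print(phi)
```

Under this reading, a PHI close to 1 means the movie is still drawing nearly as much per theater in week 4 as it did at opening, while a value near 0 means interest dropped off quickly.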

5. Linear regression for modeling.

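The two-variable linear regression model uses total first-four-week domestic gross and PHI as predictors.

[Figure: two-variable linear regression model]

The final model is a multivariate linear regression.

[Figure: final multivariate linear regression model]

Using the final model, I predicted the top four Blu-ray/DVD sellers among movies in theaters as of April 26, 2015.

[Figure: predicted top four titles]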
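To make the two-variable model in step 5 concrete, here is a small sketch of an ordinary least squares fit with first-four-week gross and PHI as predictors. It is not the code or the data from this project: the helper names `fit_linear_regression` and `predict`, the units, and every number are assumptions for illustration. It uses only the standard library so it runs anywhere; in practice NumPy or scikit-learn would be the usual choice.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: list of feature lists; an intercept column is added here.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(X[0])

    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]

    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; no unique fit")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]

    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coefs, row):
    """Apply fitted coefficients to one feature row."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Made-up training data: [first-4-week gross in $ millions, PHI]
features = [[100, 0.30], [250, 0.45], [60, 0.60], [180, 0.25], [320, 0.50]]
# Made-up targets: units sold in thousands
units = [1170, 2730, 890, 1950, 3450]

coefs = fit_linear_regression(features, units)
estimate = predict(coefs, [200, 0.40])
print(coefs, estimate)
```

The same function accepts more than two columns per row, which is how a multivariate model with additional predictors could be fit.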

6. matplotlib / D3 visualization (mpld3).

7. Interactive slides (slides.com).

Want to see my presentation slides?

Interactive plot: PHI vs domestic total gross