Subtask 3.4: US vs non-US Cross Analysis
This is the last article of this IMDB case study. In this subtask, you will see how the votes for US and non-US movies vary among US and non-US voters.
1) Create a column named "IFUS" that contains "USA" if the movie's "Country" is USA, and "non-USA" for every other country. The code below does this:
movies['IFUS'] = movies['Country'].apply(lambda x: "USA" if x == 'USA' else "non-USA")
movies[['IFUS', 'Country']].head(20)
2) Analysis for CVotes:
This part shows how the number of CVotes for US and non-US movies varies between US and non-US voters. We will use box plots from the seaborn library, one subplot per group of voters. The code below does this:
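# plt.subplots(1, 2) returns an array of two axes, so each plot needs its own index
fig, ax = plt.subplots(1, 2, figsize=[13, 6])
# Left panel: votes from US voters; right panel: votes from non-US voters
sns.boxplot(data=movies, x='IFUS', y='CVotesUS', ax=ax[0], color='r', hue='IFUS')
sns.boxplot(data=movies, x='IFUS', y='CVotesnUS', ax=ax[1], hue='IFUS', color='b')
ax[0].set_title('US Voters')
ax[1].set_title('Non-US Voters')
plt.show()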
1) The median line is higher for US movies than for non-US movies, for both US voters and non-US voters. In other words, a typical US movie receives more votes than a typical non-US movie from both groups of voters.
2) Outliers are present for US movies but not for non-US movies. Outliers in a box plot are drawn as diamond-shaped markers, as shown in the diagram above.
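To make the two observations above more concrete, here is a small sketch of how the median line and the outlier markers of a box plot are usually derived. It assumes the common 1.5 x IQR whisker rule, which is the default in matplotlib and seaborn. The vote counts are made-up placeholder numbers, not values from the movies dataset, and box_stats is a hypothetical helper name used only for this illustration.

```python
from statistics import median, quantiles


def box_stats(values, whis=1.5):
    """Return the median, quartiles and outliers a box plot would show."""
    # Quartiles with linear interpolation between data points
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Points beyond the whisker limits are drawn as outliers
    low, high = q1 - whis * iqr, q3 + whis * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"median": median(values), "q1": q1, "q3": q3, "outliers": outliers}


# Placeholder vote counts (assumption: NOT taken from the real dataset)
us_movies = [120, 150, 160, 170, 180, 200, 900]
non_us_movies = [60, 70, 80, 90, 100, 110, 120]

us = box_stats(us_movies)
non_us = box_stats(non_us_movies)

print("Median (US, non-US):", us["median"], non_us["median"])
print("Outliers (US, non-US):", us["outliers"], non_us["outliers"])
```

With these placeholder numbers the US group has the higher median and the only outlier, which is the same pattern the two observations describe for the real plots.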
3) Analysis for Votes:
As done above for CVotes, repeat the same task with the Votes columns.
Then start brainstorming and try to find 2-3 inferences/observations from the resulting plots, in the same way as we did for CVotes.
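As a starting point, the CVotes plot can be adapted roughly as sketched below. This is only a sketch under assumptions: the column names VotesUS and VotesnUS are guessed by analogy with CVotesUS and CVotesnUS, so check the actual names in your dataset, and movies_demo is a tiny placeholder frame with made-up values so that the snippet runs on its own. In the case study you would pass the real movies DataFrame instead.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Placeholder data (assumption: made-up values, hypothetical column names)
movies_demo = pd.DataFrame({
    "IFUS": ["USA", "USA", "USA", "non-USA", "non-USA", "non-USA"],
    "VotesUS": [8.1, 7.9, 8.3, 7.8, 8.0, 7.7],
    "VotesnUS": [7.9, 7.8, 8.0, 7.9, 8.1, 7.8],
})

# Same layout as the CVotes plot: one panel per group of voters
fig, ax = plt.subplots(1, 2, figsize=[13, 6])
sns.boxplot(data=movies_demo, x="IFUS", y="VotesUS", ax=ax[0])
sns.boxplot(data=movies_demo, x="IFUS", y="VotesnUS", ax=ax[1])
ax[0].set_title("US Voters")
ax[1].set_title("Non-US Voters")
plt.show()
```

When reading the real plots, compare the median lines, the height of the boxes, and the presence of outliers between the two movie groups.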
Subtask 3.5: Top 1000 Voters Vs Genres
Now, this is the last portion of this article, where we plot a bar graph for Top 1000 Voters Vs Genres. For this, the dataset has a CVotes1000 column, which records the votes cast by the top 1000 voters for each movie.
Firstly, you need to arrange genre_top10 in descending order of the CVotes1000 column.
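One way to do this ordering is with pandas sort_values, sketched below. The genre_top10 frame built here is only a placeholder with made-up numbers so that the snippet runs on its own; it assumes, as the bar plot code does, that the frame has a genres column and a CVotes1000 column. In the case study, apply the sort_values line to your existing genre_top10.

```python
import pandas as pd

# Placeholder stand-in for genre_top10 (assumption: made-up values)
genre_top10 = pd.DataFrame({
    "genres": ["Drama", "Sci-Fi", "Comedy", "Thriller"],
    "CVotes1000": [610, 820, 540, 700],
})

# Sort so that the genre with the highest CVotes1000 comes first
genre_top10 = genre_top10.sort_values(by="CVotes1000", ascending=False)
print(genre_top10)
```

Sorting before plotting means the bars appear from the highest to the lowest value, which makes the ranking of genres easier to read.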
Now, plot a bar graph using seaborn to compare Top 1000 Voters Vs Genres. You can follow the step below:
# Bar plot
plt.figure(figsize=[10, 7])
sns.barplot(y='genres', x='CVotes1000', data=genre_top10, palette='twilight', lw=1, ec='black', hatch='/')
plt.show()