IMDB Movie Assignment: Demographic analysis: Part - 3

Subtask 3.4: US vs non-US Cross Analysis

This is the last article in the IMDB case study. In this subtask, you will see how the votes for US and non-US movies vary between US and non-US voters.

1) Create a column named "IFUS" that contains "USA" if the movie's "Country" is the USA and "non-USA" for every other country. The code below does this:

movies['IFUS']=movies['Country'].apply(lambda x : "USA" if x =='USA' else "non-USA")
movies[['IFUS','Country']].head(20)

OUTPUT: 

[Image: first 20 rows of the IFUS and Country columns]
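
The same labelling can also be done without a row-by-row lambda. The following is a minimal sketch, not part of the original assignment code: it uses a made-up toy frame in place of the real movies DataFrame and NumPy's where function to build the column in one vectorized step, then counts the rows in each group as a sanity check.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real `movies` DataFrame (values are invented).
movies = pd.DataFrame({"Country": ["USA", "UK", "USA", "India", None]})

# Vectorized equivalent of the apply/lambda version:
# "USA" where Country equals "USA", "non-USA" everywhere else.
movies["IFUS"] = np.where(movies["Country"] == "USA", "USA", "non-USA")

# Sanity check: number of movies in each group.
counts = movies["IFUS"].value_counts()
print(counts)
```

As with the lambda version, a missing Country value ends up labelled "non-USA", so it is worth checking for missing values first if that matters for the analysis.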

2) Analysis for CVotes:

This part shows how the number of CVotes for US and non-US movies varies between US and non-US voters. The comparison uses boxplots from the seaborn library. The code below does this:

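# Two side-by-side axes: US voters on the left, non-US voters on the right
fig, ax = plt.subplots(1, 2, figsize=[13, 6])

# Vote counts from US voters, split by US / non-US movies
sns.boxplot(data=movies, x='IFUS',
            y='CVotesUS',
            ax=ax[0],
            color='r',
            hue='IFUS')
# Vote counts from non-US voters, split by US / non-US movies
sns.boxplot(data=movies, x='IFUS',
            y='CVotesnUS',
            ax=ax[1],
            hue='IFUS',
            color='b')

ax[0].set_title('US Voters')
ax[1].set_title('Non-US Voters')

plt.show()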

OUTPUT:

[Image: boxplots of CVotesUS (US Voters) and CVotesnUS (Non-US Voters), grouped by IFUS]

Observations:

1) The median line is higher for US movies than for non-US movies, for both US voters and non-US voters. In other words, a typical US movie receives more votes than a typical non-US movie from either group of voters.

2) Outliers are present for US movies but not for non-US movies. A boxplot draws outliers as diamond-shaped markers, as shown in the diagram above.
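
Both observations can be checked numerically rather than read off the plot. The sketch below is an illustration on invented numbers, not the assignment's data: it computes the per-group medians (the line inside each box) and counts points beyond 1.5 times the interquartile range from the quartiles, which is the default whisker rule for boxplots. The helper count_outliers is a hypothetical name introduced here.

```python
import pandas as pd

# Toy stand-in for `movies`; the numbers are invented for illustration.
movies = pd.DataFrame({
    "IFUS": ["USA"] * 6 + ["non-USA"] * 4,
    "CVotesUS": [100, 120, 130, 140, 150, 900, 40, 50, 60, 70],
    "CVotesnUS": [200, 220, 230, 240, 250, 1500, 90, 100, 110, 120],
})

vote_cols = ["CVotesUS", "CVotesnUS"]

# Median vote count per movie group: the line inside each box.
medians = movies.groupby("IFUS")[vote_cols].median()


def count_outliers(s):
    """Count values beyond 1.5 * IQR from the quartiles (hypothetical helper)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return int(((s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)).sum())


# Number of outlier points per movie group and voter column.
outliers = movies.groupby("IFUS")[vote_cols].agg(count_outliers)

print(medians)
print(outliers)
```

On the real data, replace the toy frame with the movies DataFrame that already has the IFUS column.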

3) Analysis for Votes:

As with CVotes above, the same boxplot analysis is repeated here for the Votes columns.
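
The code for this step is not shown in the article. A minimal sketch follows, assuming the rating columns are named VotesUS and VotesnUS (an assumption patterned on CVotesUS and CVotesnUS; check the dataset's actual column names) and using a made-up toy frame so the block runs on its own. It drops the hue argument, since the x axis already separates the two groups.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy stand-in for `movies`; column names VotesUS / VotesnUS are assumed
# and the ratings are invented for illustration.
movies = pd.DataFrame({
    "IFUS": ["USA", "USA", "USA", "USA", "non-USA", "non-USA", "non-USA"],
    "VotesUS": [7.9, 8.1, 7.6, 8.4, 7.8, 8.0, 7.7],
    "VotesnUS": [7.7, 7.9, 7.5, 8.2, 7.9, 8.1, 7.6],
})

# Same layout as the CVotes analysis: one axis per voter group.
fig, ax = plt.subplots(1, 2, figsize=[13, 6])

sns.boxplot(data=movies, x="IFUS", y="VotesUS", ax=ax[0], color="r")
sns.boxplot(data=movies, x="IFUS", y="VotesnUS", ax=ax[1], color="b")

ax[0].set_title("US Voters")
ax[1].set_title("Non-US Voters")

plt.show()
```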

OUTPUT:

[Image: boxplots of the Votes columns for US and non-US voters]

Now try to come up with two or three inferences or observations from the Votes output above, in the same way as was done for CVotes.

Subtask 3.5: Top 1000 Voters Vs Genres

This is the last portion of the article, where we plot a bar graph of Top 1000 Voters vs Genres. The dataset has a CVotes1000 column for this, which represents the votes from the top 1000 voters for each movie.
First, sort genre_top10 in descending order of the CVotes1000 column:

genre_top10=genre_top10.sort_values(by='CVotes1000', ascending=False)
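
To see what the sort does, here is a small sketch on a made-up frame with the same two columns (the genres and values are invented, not taken from the dataset). Seaborn draws string categories in the order they appear in the data, so sorting the rows first is what puts the genres with the highest CVotes1000 at the top of the bar plot.

```python
import pandas as pd

# Toy stand-in for `genre_top10`; values are invented for illustration.
genre_top10 = pd.DataFrame({
    "genres": ["Drama", "Sci-Fi", "Comedy"],
    "CVotes1000": [510.0, 680.5, 430.2],
})

# Highest CVotes1000 first.
genre_top10 = genre_top10.sort_values(by="CVotes1000", ascending=False)

# Row order after sorting; this is the order the bars would be drawn in.
genre_order = genre_top10["genres"].tolist()
print(genre_order)
```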

Now plot a bar graph with seaborn to compare the top 1000 voters across genres:

# Bar plot
plt.figure(figsize=[10,7])
sns.barplot(y='genres',
            x='CVotes1000',
            data=genre_top10,
            palette='twilight',
            lw=1,
            ec='black',
            hatch='/')
plt.show() 

OUTPUT:

[Image: horizontal bar plot of CVotes1000 by genre]