IMDB Movie Assignment: Start to Analysis: Part - 1

Written by akshita.goel on 11/25/2021 - 10:51

Subtask 2.1: Reduce those Digits!

The dataset has two columns, Gross and budget, whose values are large dollar figures. Numbers of that size are awkward to read and compare during analysis, so we are going to convert both columns from $ to million $.

Use the following code to do the conversion:

movies["Gross"] = movies["Gross"]/1000000
movies["budget"] = movies["budget"]/1000000

movies["Gross"] and movies["budget"] select the Gross and budget columns of the movies data frame. Dividing a column by 1000000 divides every value in that column.
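
As a minimal sketch of the same conversion, here it is on a small made-up data frame. The titles and figures are hypothetical stand-ins for the real IMDB data, which the assignment loads from its own file:

```python
import pandas as pd

# Hypothetical sample rows (values in dollars); not taken from the real dataset.
movies = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Gross": [250_000_000.0, 40_000_000.0, 5_500_000.0],
    "budget": [100_000_000.0, 60_000_000.0, 2_000_000.0],
})

# Convert both columns from dollars to millions of dollars in one step.
movies[["Gross", "budget"]] = movies[["Gross", "budget"]] / 1_000_000

print(movies)
```

Run the conversion only once: executing the same cell again would divide the already converted values a second time.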

Output

$ to million $

Subtask 2.2: Let's Talk Profit!

The figures have now been converted into smaller, more readable values.

Moving ahead in this article, we are going to practice the following five steps:

  1. We are going to create a new column named profit to store the calculated profit.
  2. We are going to learn how to sort the data by a column in ascending or descending order.
  3. We will learn how to select the top rows for analysis.
  4. We are going to visualize budget vs profit using Matplotlib.
  5. Also, we are going to find all entries with negative profit.

1. Calculating Profit

To calculate the profit we use: Profit = Gross - budget

The dataset already has the Gross and budget columns, so to calculate the profit we subtract the budget column from the Gross column as shown below:

movies["Gross"]-movies["budget"]

Then create a new column to store the calculated profit:

movies["profit"]=movies["Gross"]-movies["budget"]

Output
Profit calculated
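
A small sketch of the subtraction on made-up numbers (the rows are hypothetical and already expressed in million $) shows that the operation works row by row:

```python
import pandas as pd

# Hypothetical figures, already expressed in million $.
movies = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Gross": [250.0, 40.0, 5.5],
    "budget": [100.0, 60.0, 2.0],
})

# Element-wise subtraction: each row's budget is subtracted from its gross.
movies["profit"] = movies["Gross"] - movies["budget"]

print(movies[["Title", "profit"]])
```

A movie whose budget exceeds its gross, such as the hypothetical Movie B here, ends up with a negative profit.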

2. Sorting Columns

To sort the rows in ascending or descending order, use sort_values(). Its by argument names the column to sort on. The syntax is:

Syntax

Dataset.sort_values(by="Column_name",ascending=False)

Now, let's apply the above syntax to sort the data by the profit column:

movies.sort_values(by="profit",ascending=False)

NOTE: sort_values() sorts in ascending order by default, so we pass ascending=False to sort in descending order. It returns a new, sorted data frame and leaves movies itself unchanged.

Output
Sort the data
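
The following sketch, on hypothetical rows, illustrates that sort_values() hands back a sorted copy rather than reordering the original data frame (by_profit is just an illustrative variable name):

```python
import pandas as pd

# Hypothetical rows with a precomputed profit column (million $).
movies = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "profit": [150.0, -20.0, 3.5],
})

# sort_values returns a new, sorted data frame; movies keeps its original order.
by_profit = movies.sort_values(by="profit", ascending=False)

print(by_profit["Title"].tolist())  # sorted by profit, highest first
print(movies["Title"].tolist())     # original order
```

To keep the sorted order, assign the result to a variable as done here.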

3. Reading top Rows

To read the top n rows in Pandas we use the .iloc[start position:end position] indexer. The code to read the 10 rows with the highest profit is shown below:

movies.sort_values(by="profit",ascending=False).iloc[:10]

NOTE: the end position is exclusive in .iloc, so .iloc[:10] returns the rows at positions 0 to 9.

Now, let's store the top 10 rows in a variable named top10:

top10=movies.sort_values(by="profit",ascending=False).iloc[:10]

Output
Top10
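
As a sketch on hypothetical data, the same pattern with n = 3 looks like this. The second variant uses the pandas nlargest() method, which is not part of the original walkthrough but gives the same rows in one call:

```python
import pandas as pd

# Hypothetical rows with a precomputed profit column (million $).
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E"],
    "profit": [150.0, -20.0, 3.5, 80.0, 12.0],
})

# Sort by profit, then take positions 0, 1 and 2 (position 3 is excluded).
top3 = movies.sort_values(by="profit", ascending=False).iloc[:3]

# Alternative: let pandas pick the 3 rows with the largest profit directly.
top3_alt = movies.nlargest(3, "profit")

print(top3)
print(top3_alt)
```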

4. Visualizing budget v/s Profit

Visualization means presenting data in graphical form to get some meaning out of it. In Python, we first need to import the Matplotlib plotting module. Follow the syntax for the same:

import matplotlib.pyplot as plt

Since we want to plot the budget column against the profit column, we will use a scatter plot. Follow the code to do the same:

# Create the figure first so that figsize applies to the scatter plot
plt.figure(figsize=[7,4])
plt.scatter(movies.profit, movies.budget)
plt.xlabel("Profit")
plt.ylabel("Budget")
plt.show()
  • plt.figure(figsize=[width, height]) creates the figure and sets its size in inches. It must be called before plotting, otherwise it opens a new, empty figure.
  • movies.profit and movies.budget refer to the profit and budget columns of the movies data frame.
  • plt.scatter is used to plot a scatter plot.
  • xlabel and ylabel are used to give names to the x-axis and y-axis.
  • plt.show() is used to display the final graph.
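
For a self-contained version of the plot, here is a sketch on hypothetical data. It uses the figure and axes objects returned by plt.subplots(), which is one common alternative to the plt.* calls, and it assumes the non-interactive Agg backend so it can run without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows (million $); not taken from the real dataset.
movies = pd.DataFrame({
    "budget": [100.0, 60.0, 2.0, 35.0],
    "profit": [150.0, -20.0, 3.5, 80.0],
})

# Create the figure and axes with the desired size before plotting.
fig, ax = plt.subplots(figsize=(7, 4))
ax.scatter(movies["profit"], movies["budget"])
ax.set_xlabel("Profit (million $)")
ax.set_ylabel("Budget (million $)")

# With an interactive backend, plt.show() would display the figure here.
```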

Output

BUDGET VS PROFIT

5. Finding negative profit

Negative values are values less than zero. So, to find the movies with negative profit we select the rows whose profit is less than 0. Follow the given code for the same:

movies[movies.profit<0]

To store the result of the above operation in a variable named neg_profit:

neg_profit=movies[movies.profit<0]

Output

Rows with negative profit
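
The filter works in two stages, which the following sketch on hypothetical rows makes explicit (is_loss is an illustrative name, not from the original code):

```python
import pandas as pd

# Hypothetical rows with a precomputed profit column (million $).
movies = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "profit": [150.0, -20.0, 3.5],
})

# Stage 1: the comparison yields a boolean Series, one True/False per row.
is_loss = movies["profit"] < 0

# Stage 2: indexing with that Series keeps only the rows marked True.
neg_profit = movies[is_loss]

print(is_loss.tolist())
print(neg_profit)
```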

Related Content
IMDB Movie Assignment: Start to Analysis: Part - 2
IMDB Movie Assignment: Start To Analysis: Part - 3
IMDB Movie Assignment: Problem Statement with Basic Instructions
© Copyright By iVagus Services Pvt. Ltd. 2023. All Rights Reserved.