For the given dataframe, clean the "Installs" column and print its correlation with the other numeric columns of the dataframe (print df.corr()).

You have to do the following:

1. Remove characters like ',' and '+' from the number of installs.
2. Delete rows where the Installs column has irrelevant strings like 'Free'.
3. Convert the column to int type.

You can download the dataframe from here

Sample Output:

            Rating  Installs
Rating    1.000000  0.051355
Installs  0.051355  1.000000


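import pandas as pd

df = pd.read_csv("filename.csv")

# Strip thousands separators and the trailing '+'.
# regex=False makes '+' a literal character instead of a regex quantifier.
df.Installs = df.Installs.str.replace(',', '', regex=False)
df.Installs = df.Installs.str.replace('+', '', regex=False)

# Drop rows whose Installs value is a stray string such as 'Free'.
# .copy() avoids a SettingWithCopyWarning on the assignment below.
df = df[df.Installs != 'Free'].copy()

# Convert to int, then correlate the numeric columns only
# (numeric_only needs pandas 1.5 or later; older versions skip text columns by default).
df.Installs = df.Installs.astype(int)
print(df.corr(numeric_only=True))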
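
The solution above drops only the exact string 'Free'. If the column contained other stray strings, astype(int) would fail. A minimal sketch of a more tolerant variant follows; the helper name clean_installs and the sample rows are hypothetical, made up for illustration and not taken from the exercise's dataset. It assumes pandas 1.5 or later for numeric_only.

```python
import pandas as pd


def clean_installs(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with 'Installs' as int; rows that cannot be parsed are dropped."""
    # Remove ',' and '+' literally (regex=False), working on string values.
    stripped = (
        df["Installs"]
        .astype(str)
        .str.replace(",", "", regex=False)
        .str.replace("+", "", regex=False)
    )
    # errors="coerce" turns anything non-numeric (e.g. 'Free') into NaN.
    numeric = pd.to_numeric(stripped, errors="coerce")
    keep = numeric.notna()
    out = df.loc[keep].copy()
    out["Installs"] = numeric[keep].astype(int)
    return out


# Hypothetical sample rows, only to exercise the helper.
sample = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.1, 3.9, 4.7, 4.5],
    "Installs": ["10,000+", "500,000+", "Free", "1,000,000+"],
})

cleaned = clean_installs(sample)
print(cleaned.corr(numeric_only=True))
```

Coercing with pd.to_numeric means any unexpected string is dropped rather than raising an error, at the cost of silently discarding rows, so it is worth checking how many rows were removed.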
