You have to do the following:
1. Remove characters such as ',' and '+' from the number of installs.
2. Delete rows where the Installs column holds irrelevant strings such as 'Free'.
3. Convert the Installs column to int type, then print the correlation matrix.
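The three steps can be tried on a few hypothetical rows before touching the real file. The values below are made up for illustration and are not taken from the dataset. The sketch passes regex=False because '+' is a regular-expression metacharacter, and it assumes a reasonably recent pandas.

```python
import pandas as pd

# Hypothetical sample rows; the real file has many more rows and columns.
df = pd.DataFrame({
    "Rating": [4.1, 3.9, 4.7, 4.5, 4.0],
    "Installs": ["10,000+", "500,000+", "5,000,000+", "50,000,000+", "Free"],
})

# Step 1: strip ',' and '+' as literal characters (not as regex patterns)
df["Installs"] = (
    df["Installs"]
    .str.replace(",", "", regex=False)
    .str.replace("+", "", regex=False)
)

# Step 2: drop rows that hold a stray string such as 'Free'
df = df[df["Installs"] != "Free"].copy()

# Step 3: every remaining value is now all digits, so the cast succeeds
df["Installs"] = df["Installs"].astype(int)

print(df["Installs"].tolist())  # [10000, 500000, 5000000, 50000000]
```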
You can download the dataframe from here
Sample Output:
            Rating  Installs
Rating    1.000000  0.051355
Installs  0.051355  1.000000
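import pandas as pd

df = pd.read_csv("filename.csv")
# Strip thousands separators and the trailing '+' (literal match, not regex)
df['Installs'] = df['Installs'].str.replace(',', '', regex=False)
df['Installs'] = df['Installs'].str.replace('+', '', regex=False)
# Drop rows whose Installs value is a stray string such as 'Free'
df = df[df['Installs'] != 'Free'].copy()
df['Installs'] = df['Installs'].astype(int)
# Correlate only the numeric columns (required in pandas 2.x)
print(df.corr(numeric_only=True))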
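Filtering on != 'Free' only removes that one known value. If the file may contain other non-numeric entries, a swapped-in alternative is pd.to_numeric with errors="coerce", which turns anything unparseable into NaN so those rows can be dropped in one pass. This is a sketch on hypothetical rows ('Varies' is an invented stand-in for any other stray string), not the exercise's reference solution.

```python
import pandas as pd

# Hypothetical rows; 'Free' and 'Varies' stand in for any stray non-numeric value.
df = pd.DataFrame({
    "Rating": [4.1, 3.9, 4.7, 4.5, 4.0, 4.2],
    "Installs": ["10,000+", "500,000+", "5,000,000+", "50,000,000+", "Free", "Varies"],
})

# Remove ',' and '+' in one regex pass (character class, so '+' is literal here)
cleaned = df["Installs"].str.replace(r"[,+]", "", regex=True)

# errors="coerce" maps anything that still is not a number to NaN
df["Installs"] = pd.to_numeric(cleaned, errors="coerce")

# Drop the rows that failed to parse, then restore an integer dtype
df = df.dropna(subset=["Installs"]).copy()
df["Installs"] = df["Installs"].astype(int)

corr = df.corr(numeric_only=True)
print(corr)
```

The trade-off is that coercion silently discards every unparseable row, so it is worth checking how many rows were dropped before trusting the correlation.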