Suppose you are working for a media services company like Netflix. The company is launching a new show called 'Sacred Games', and you are building a logistic regression model to predict whether a person will like it, based on whether that person liked or disliked some previous shows. You have data for five previous shows, and the only predictors in the model are the dummy variables for these five shows. A value of 1 means that the consumer liked the show, and a value of 0 means that the consumer didn't like it. The following table shows the coefficient values for these five shows that you got after building the logistic regression model.
|Variable Name||Coefficient Value|
Now, you have the data of three consumers Reetesh, Kshitij, and Shruti for these 5 shows indicating whether or not they liked these shows. This is shown in the table below:
Based on this data, which one of these three consumers is most likely to like the new show 'Sacred Games'?
To find the person who is most likely to like the show, you can use log odds. Recall the log odds is given by:
$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n$$
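For reference, solving this relation for P gives the standard logistic (sigmoid) form. The symbol z is shorthand introduced here for the right-hand side; it is not notation from the original text.

$$z = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n, \qquad \frac{P}{1-P} = e^{z}, \qquad P = \frac{1}{1 + e^{-z}}$$

Since P increases strictly with z, comparing the log odds of two consumers is enough to compare their probabilities.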
Here, there are five variables for which the coefficients are given. Hence, the log odds become:
$$\ln\left(\frac{P}{1-P}\right) = 0.47 X_1 - 0.45 X_2 + 0.39 X_3 - 0.23 X_4 + 0.55 X_5$$
Note that β0 has been ignored: it is the same for all three consumers, so it does not change their ranking. Now, using the values of the 5 variables given, you get:
As you can see, Reetesh has the highest log odds. Because the odds, and hence the probability P, increase with the log odds, Reetesh is the most likely of the three to like the new show, 'Sacred Games'.
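The comparison can be sketched in Python as below. This is a minimal illustration, not the original solution: the coefficients are the ones from the log-odds equation in the text, but the 0/1 like/dislike vectors and the names `consumer_a`, `consumer_b`, and `consumer_c` are hypothetical placeholders, since the actual table for Reetesh, Kshitij, and Shruti is not reproduced here. The helper names `log_odds` and `probability` are likewise assumptions made for this sketch.

```python
import math

# Coefficients for X1..X5, taken from the log-odds equation in the text.
COEFFICIENTS = [0.47, -0.45, 0.39, -0.23, 0.55]


def log_odds(liked, coefficients=COEFFICIENTS, intercept=0.0):
    """Return intercept + sum(beta_i * x_i) for a list of 0/1 indicators."""
    if len(liked) != len(coefficients):
        raise ValueError("expected one indicator per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, liked))


def probability(z):
    """Convert log odds z into a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


# HYPOTHETICAL data: these vectors are made up for illustration and are
# NOT the values for Reetesh, Kshitij, and Shruti from the original table.
hypothetical_consumers = {
    "consumer_a": [1, 0, 1, 0, 1],
    "consumer_b": [1, 1, 0, 1, 0],
    "consumer_c": [0, 1, 1, 0, 1],
}

# The intercept is left at 0.0: it shifts every score by the same amount,
# so it cannot change which consumer ranks highest.
scores = {name: log_odds(x) for name, x in hypothetical_consumers.items()}
most_likely = max(scores, key=scores.get)

for name, z in scores.items():
    print(f"{name}: log odds = {z:.2f}")
print("Highest log odds:", most_likely)
```

To reproduce the answer in the text, replace the hypothetical vectors with the three consumers' actual values from the table. Passing a non-zero `intercept` changes each probability but not the ranking.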