# Logistic Regression MCQ


## Logistic Regression: WOE pattern for categorical Variable - Grade

Download the Excel sheet from here. The file contains two sheets. The first sheet contains the observations for two variables, 'Purpose' and 'Default', which are equivalent to 'Grade' and 'Loan status' respectively. The second sheet contains the distribution of good and bad buckets for the 'Grade' variable.

## Suppose someone built a logistic regression model to predict whether a person has a heart disease or not.

All you have from their model is the following table which contains data of 10 patients.

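| Patient ID | Heart Disease | Predicted Probability for Heart Disease | Predicted Label |
|---|---|---|---|
| 1001 | 0 | 0.34 | 0 |
| 1002 | 1 | 0.58 | 1 |
| 1003 | 1 | 0.79 | 1 |
| 1004 | 0 | 0.68 | 1 |
| 1005 | 0 | 0.21 | 0 |
| 1006 | 0 | 0.04 | 0 |
| 1007 | 1 | 0.48 | 0 |
| 1008 | 1 | 0.64 | 1 |
| 1009 | 0 | 0.61 | 1 |
| 1010 | 1 | 0.86 | 1 |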

You want to find the cutoff that was used to predict the classes, but it is not available. Based on the 10 data points in the table, which of the following cutoffs would be a valid cutoff for the model above? (More than one option may be correct.)
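One way to reason about this question is to look for the gap between the highest probability that was labelled 0 and the lowest probability that was labelled 1. The sketch below does that for the 10 patients in the table. It assumes the usual rule "predict 1 when the probability is strictly greater than the cutoff"; the helper name `is_valid_cutoff` is hypothetical and only used for illustration.

```python
# Predicted probabilities and predicted labels copied from the patient table above
# (patients 1001 to 1010, in order).
probabilities = [0.34, 0.58, 0.79, 0.68, 0.21, 0.04, 0.48, 0.64, 0.61, 0.86]
predicted_labels = [0, 1, 1, 1, 0, 0, 0, 1, 1, 1]

pairs = list(zip(probabilities, predicted_labels))

# Highest probability that still received label 0, and
# lowest probability that already received label 1.
highest_negative = max(p for p, label in pairs if label == 0)
lowest_positive = min(p for p, label in pairs if label == 1)


def is_valid_cutoff(cutoff):
    """Return True if the cutoff reproduces every predicted label in the table.

    Assumption: a patient is labelled 1 when probability > cutoff, else 0.
    """
    return all((p > cutoff) == (label == 1) for p, label in pairs)


print("Highest probability labelled 0:", highest_negative)
print("Lowest probability labelled 1:", lowest_positive)
for candidate in (0.45, 0.50, 0.55, 0.60):
    print(candidate, is_valid_cutoff(candidate))
```

Under this assumption, any candidate cutoff that falls between the two printed probabilities reproduces all 10 predicted labels.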

## Which among accuracy, sensitivity, and specificity is the highest for the model below?

| Actual/Predicted | Not Churn | Churn |
|---|---|---|
| Not Churn | 80 | 40 |
| Churn | 30 | 50 |
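The three metrics can be computed directly from the four cells of the confusion matrix. The following is a minimal sketch, assuming that rows are actual classes, columns are predicted classes, and 'Churn' is the positive class; the variable names are illustrative.

```python
# Cells of the churn confusion matrix above.
# Assumption: rows = actual, columns = predicted, 'Churn' = positive class.
tn, fp = 80, 40  # actual Not Churn: predicted Not Churn, predicted Churn
fn, tp = 30, 50  # actual Churn:     predicted Not Churn, predicted Churn

total = tn + fp + fn + tp

accuracy = (tp + tn) / total      # share of all customers predicted correctly
sensitivity = tp / (tp + fn)      # share of actual churners predicted as Churn
specificity = tn / (tn + fp)      # share of actual non-churners predicted as Not Churn

metrics = {
    "accuracy": accuracy,
    "sensitivity": sensitivity,
    "specificity": specificity,
}
highest = max(metrics, key=metrics.get)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print("Highest metric:", highest)
```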

## Suppose you are working for a media services company like Netflix. They're launching a new show called 'Sacred Games' and you are building a logistic regression model.

Suppose you are working for a media services company like Netflix. They're launching a new show called 'Sacred Games' and you are building a logistic regression model which will predict whether a person will like it or not based on whether consumers have liked/disliked some previous shows. You have the data of five of the previous shows and you're just using the dummy variables for these five shows to build the model. If the variable is 1, it means that the consumer liked the show and if the variable is zero, it means that the consumer didn't like the show. The following table shows the values of the coefficients for these five shows that you got after building the logistic regression model.

| Variable Name | Coefficient Value |
|---|---|
| TrueDetective_Liked | 0.47 |
| ModernFamily_Liked | -0.45 |
| Mindhunter_Liked | 0.39 |
| Friends_Liked | -0.23 |
| Narcos_Liked | 0.55 |

Now, you have the data of three consumers Reetesh, Kshitij, and Shruti for these 5 shows indicating whether or not they liked these shows. This is shown in the table below:

| Consumer | TrueDetective_Liked | ModernFamily_Liked | Mindhunter_Liked | Friends_Liked | Narcos_Liked |
|---|---|---|---|---|---|
| Reetesh | 1 | 0 | 0 | 0 | 1 |
| Kshitij | 1 | 1 | 1 | 0 | 1 |
| Shruti | 0 | 1 | 0 | 1 | 1 |

## Based on this data, which one of these three consumers is most likely to like the new show 'Sacred Games'?

To find the person who is most likely to like the show, you can use log odds. Recall the log odds is given by:

$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n$$

Here, there are five variables for which the coefficients are given. Hence, the log odds become:

$$\ln\left(\frac{P}{1-P}\right) = 0.47 X_1 - 0.45 X_2 + 0.39 X_3 - 0.23 X_4 + 0.55 X_5$$

As you can see, the intercept β0 has been ignored since it is the same for all three consumers. Now, using the values of the five variables given, you get:

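$$(\text{Log Odds})_{\text{Reetesh}} = (0.47 \times 1) - (0.45 \times 0) + (0.39 \times 0) - (0.23 \times 0) + (0.55 \times 1) = 1.02$$

$$(\text{Log Odds})_{\text{Kshitij}} = (0.47 \times 1) - (0.45 \times 1) + (0.39 \times 1) - (0.23 \times 0) + (0.55 \times 1) = 0.96$$

$$(\text{Log Odds})_{\text{Shruti}} = (0.47 \times 0) - (0.45 \times 1) + (0.39 \times 0) - (0.23 \times 1) + (0.55 \times 1) = -0.13$$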

As you can see, Reetesh has the highest log odds. Since the odds increase as the log odds increase, his odds of liking the show are also the highest, so he is the most likely to like the new show, Sacred Games.
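The same arithmetic can be checked with a short script. This is a minimal sketch that uses the coefficients and consumer data from the tables above and, like the worked solution, leaves out the intercept because it is the same for everyone; the function name `log_odds_without_intercept` is hypothetical.

```python
# Coefficients from the model table above.
coefficients = {
    "TrueDetective_Liked": 0.47,
    "ModernFamily_Liked": -0.45,
    "Mindhunter_Liked": 0.39,
    "Friends_Liked": -0.23,
    "Narcos_Liked": 0.55,
}

# Dummy variables for each consumer (1 = liked, 0 = did not like).
consumers = {
    "Reetesh": {"TrueDetective_Liked": 1, "ModernFamily_Liked": 0,
                "Mindhunter_Liked": 0, "Friends_Liked": 0, "Narcos_Liked": 1},
    "Kshitij": {"TrueDetective_Liked": 1, "ModernFamily_Liked": 1,
                "Mindhunter_Liked": 1, "Friends_Liked": 0, "Narcos_Liked": 1},
    "Shruti": {"TrueDetective_Liked": 0, "ModernFamily_Liked": 1,
               "Mindhunter_Liked": 0, "Friends_Liked": 1, "Narcos_Liked": 1},
}


def log_odds_without_intercept(likes):
    """Sum of coefficient * dummy value; the shared intercept is omitted."""
    return sum(coefficients[name] * likes[name] for name in coefficients)


scores = {name: log_odds_without_intercept(likes) for name, likes in consumers.items()}
most_likely = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("Most likely to like the show:", most_likely)
```

Because the intercept only shifts every score by the same amount, omitting it does not change the ranking of the three consumers.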

## Suppose you are building a logistic regression model to determine whether a person has diabetes or not.

Following are the values of predicted probabilities of 10 patients.

| Patient | Probability(Diabetes) |
|---|---|
| A | 0.82 |
| B | 0.37 |
| C | 0.04 |
| D | 0.41 |
| E | 0.55 |
| F | 0.62 |
| G | 0.20 |
| H | 0.91 |
| I | 0.74 |
| J | 0.33 |

Suppose you arbitrarily choose a cut-off of 0.4: if the probability is greater than 0.4, you conclude that the patient has diabetes, and if it is less than or equal to 0.4, you conclude that the patient doesn't have diabetes. How many of these patients would be classified as diabetic based on the table above?

The cut-off is given to be 0.4. Hence, for a patient to be classified as diabetic, Probability(Diabetes) needs to be greater than 0.4. As you can see in the table above, there are 6 patients who have Probability(Diabetes) > 0.4. These are:

A: 0.82, D: 0.41, E: 0.55, F: 0.62, H: 0.91, I: 0.74
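The count above can be reproduced by applying the cut-off rule to each patient. This is a small sketch using the probabilities from the table; the variable names are illustrative.

```python
# Predicted probabilities from the table above.
probabilities = {
    "A": 0.82, "B": 0.37, "C": 0.04, "D": 0.41, "E": 0.55,
    "F": 0.62, "G": 0.20, "H": 0.91, "I": 0.74, "J": 0.33,
}

cutoff = 0.4

# A patient is classified as diabetic only when the probability is
# strictly greater than the cut-off, as stated in the question.
diabetic = [patient for patient, p in probabilities.items() if p > cutoff]

print("Classified as diabetic:", diabetic)
print("Count:", len(diabetic))
```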

## From the table below, what will be the accuracy of the model?

| Actual/Predicted | No | Yes |
|---|---|---|
| No | 400 | 100 |
| Yes | 50 | 150 |
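Accuracy is the number of correct predictions (the diagonal of the confusion matrix) divided by the total number of observations. A minimal sketch for the table above, assuming rows are actual classes and columns are predicted classes; the function name `accuracy_from_matrix` is hypothetical.

```python
def accuracy_from_matrix(matrix):
    """Accuracy of a square confusion matrix given as a list of rows.

    Assumption: rows = actual classes, columns = predicted classes,
    listed in the same order, so correct predictions lie on the diagonal.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Confusion matrix from the table above: [actual No, actual Yes].
confusion_matrix = [
    [400, 100],
    [50, 150],
]

accuracy = accuracy_from_matrix(confusion_matrix)
print(f"Accuracy: {accuracy:.4f}")
```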

## Consider the table below. How many of these patients were correctly labelled, i.e. if the patient had lung cancer, it was actually predicted as a 'Yes'?

Matrix

| Actual/Predicted | No | Yes |
|---|---|---|
| No | 400 | 100 |
| Yes | 50 | 150 |

## Suppose you built a logistic regression model to predict whether a patient has lung cancer or not and you get the following confusion matrix as the output.

Confusion Matrix:

| Actual/Predicted | No | Yes |
|---|---|---|
| No | 400 | 100 |
| Yes | 50 | 150 |

## From the confusion matrix below compute the accuracy of the model.

Matrix

| Actual/Predicted | Not Churn | Churn |
|---|---|---|
| Not Churn | 80 | 30 |
| Churn | 20 | 70 |