Logistic Regression Interview Questions


Why can’t we use Mean Square Error (MSE) as a cost function for logistic regression?

In logistic regression, the sigmoid function applies a non-linear transformation to the linear combination of the inputs to obtain probabilities. Applying squared error to this sigmoid output makes the cost function non-convex in the model parameters, so it can have local minima and flat regions. Gradient descent is then not guaranteed to reach the global minimum. For this reason, MSE is not suitable for logistic regression, and cross-entropy (log loss) is used as the cost function instead. Log loss is convex in the parameters. It penalises confident wrong predictions heavily, while confident right predictions incur a loss close to zero. Because the function is convex, gradient descent can converge to the global minimum.
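To make the penalty argument concrete, here is a minimal sketch that compares the two losses for a single instance whose true label is 1. The function names and the probabilities are illustrative assumptions, not part of any particular library.

```python
import math

def squared_error(y, p):
    """Squared error between the label y (0 or 1) and the predicted probability p."""
    return (y - p) ** 2

def log_loss(y, p):
    """Cross-entropy (log loss) for one instance; p must lie strictly between 0 and 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# True label is 1; the predictions go from confident-right to confident-wrong.
for p in (0.9, 0.6, 0.1, 0.01):
    print(f"p={p:<5} squared error={squared_error(1, p):.4f}  log loss={log_loss(1, p):.4f}")
```

Squared error can never exceed 1 for a single instance, whereas log loss grows without bound as a wrong prediction becomes more confident.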

What is the output of a standard MLE program?

The output of a standard MLE program is as follows:

Maximised likelihood value: This is the numerical value obtained by replacing the unknown parameters in the likelihood function with their ML estimates.

Estimated variance-covariance matrix: The diagonal of this matrix consists of the estimated variances of the ML estimates. The off-diagonal elements are the covariances between pairs of ML estimates.

What are the advantages and disadvantages of conditional and unconditional methods of MLE?

Conditional methods do not estimate unwanted (nuisance) parameters, whereas unconditional methods estimate them as well. Unconditional formulas can be developed directly from joint probabilities, which cannot be done with conditional probabilities. If the number of parameters is high relative to the number of instances, the unconditional method can give biased results, while the conditional method remains unbiased in such cases.

What are the different methods of MLE and when is each method preferred?

In the case of logistic regression, there are two approaches to MLE. They are conditional and unconditional methods. Conditional and unconditional methods are algorithms that use different likelihood functions. The unconditional formula employs the joint probability of positives (for example, churn) and negatives (for example, non-churn). The conditional formula is the ratio of the probability of observed data to the probability of all possible configurations.

The unconditional method is preferred if the number of parameters is small relative to the number of instances. If the number of parameters is large relative to the number of instances, conditional MLE is preferred. Statisticians suggest using conditional MLE when in doubt, because it does not suffer from the bias that unconditional MLE can show in that situation.

What is the Maximum Likelihood Estimator (MLE)?

The MLE chooses the values of the unknown parameters (the estimates) that maximise the likelihood function. The method uses calculus: set the derivative of the likelihood function (in practice, the log-likelihood) with respect to each unknown parameter to zero and solve the resulting equations. For a binomial model this is easy, but for a logistic model the equations have no closed-form solution, so computer programs are used to find the MLE iteratively.
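For the easy binomial case, the calculus can be written out in a few lines. This is a sketch under the assumption of k successes observed in n independent trials with success probability p; the symbols k, n and p are introduced here for illustration.

```latex
L(p) = \binom{n}{k} \, p^{k} (1 - p)^{n - k}

\log L(p) = \log \binom{n}{k} + k \log p + (n - k) \log (1 - p)

\frac{d}{dp} \log L(p) = \frac{k}{p} - \frac{n - k}{1 - p} = 0
\quad \Longrightarrow \quad \hat{p} = \frac{k}{n}
```

The estimate is simply the observed proportion of successes.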

(Here’s another approach to answering the question.)

MLE is a statistical approach to estimating the parameters of a mathematical model. MLE and ordinary least squares (OLS) estimation give the same coefficient estimates for linear regression if the errors (and hence the dependent variable, given the independent variables) are assumed to be normally distributed. MLE does not assume anything about the distribution of the independent variables.

What is the formula for calculating odds ratio?

The formula can be given as:

OR_{X_{1}, X_{0}} = e^{\sum_{i=1}^{k} \beta_{i}(X_{1i} - X_{0i}) }

In the formula above, X1 and X0 stand for the two groups for which the odds ratio is calculated. X1i is the value of the i-th attribute for group X1, and X0i is the value of the i-th attribute for group X0. βi is the coefficient of the i-th attribute in the logistic regression model. Note that the baseline β0 is not included in this formula, because it cancels out when the ratio of the two odds is taken.
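A minimal sketch of this formula in Python follows. The coefficient values and the two attribute vectors are hypothetical and chosen only to show the calculation.

```python
import math

def odds_ratio(betas, x1, x0):
    """exp(sum_i beta_i * (x1_i - x0_i)); the intercept beta_0 is not part of betas."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(betas, x1, x0)))

# Hypothetical coefficients for two attributes: (smoker flag, age in years).
betas = [0.7, 0.03]

# Compare a smoker with a non-smoker of the same age.
print(odds_ratio(betas, x1=[1, 50], x0=[0, 50]))  # equals exp(0.7), since only the first attribute differs
```

Because the second attribute is the same in both groups, its term contributes nothing, and the result reduces to the exponential of the first coefficient.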

What is odds ratio?

Odds ratio is the ratio of odds between two groups. For example, let’s assume that you are trying to ascertain the effectiveness of a medicine. You administered this medicine to the ‘intervention’ group and a placebo to the ‘control’ group.

Odds Ratio (OR) = Odds of the Intervention Group / Odds of the Control Group


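  • If the odds ratio is 1, the odds of the outcome are the same in the intervention group and the control group.
  • If the odds ratio is greater than 1, the odds of the outcome are higher in the intervention group. The intervention group is doing better if the outcome is desirable (for example, recovery) and worse if the outcome is adverse (for example, death).
  • If the odds ratio is less than 1, the odds of the outcome are lower in the intervention group, and the interpretation is reversed.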
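The calculation can be sketched from a 2x2 table of counts. The counts below are invented for illustration, with recovery as the outcome.

```python
def odds(events, non_events):
    """Odds of the outcome: number with the outcome divided by number without it."""
    return events / non_events

def odds_ratio_2x2(int_events, int_non_events, ctl_events, ctl_non_events):
    """Odds of the intervention group divided by odds of the control group."""
    return odds(int_events, int_non_events) / odds(ctl_events, ctl_non_events)

# Hypothetical trial: 30 of 100 recover on the medicine, 20 of 100 recover on the placebo.
print(odds_ratio_2x2(30, 70, 20, 80))  # (30/70) / (20/80) = 12/7, about 1.71
```

Here the odds ratio is above 1, so the odds of recovery are higher in the intervention group.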

How to interpret the results of a logistic regression model? Or, what are the meanings of the different betas in a logistic regression model?

β0 is the baseline in a logistic regression model. It is the log odds for an instance when all the attributes (X1,X2,X3,...,Xn) are zero. In practical scenarios, the probability of all the attributes being zero is very low. In another interpretation, β0 is the log odds for an instance when none of the attributes is taken into consideration.

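Each of the other betas is the amount by which the log odds change for a one-unit change in the corresponding attribute, with all other attributes held fixed (control variables). Equivalently, a one-unit increase in an attribute multiplies the odds by e raised to the power of its beta.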
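The following sketch shows both interpretations numerically for a model with one attribute. The coefficient values are hypothetical.

```python
import math

def sigmoid(z):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted coefficients: intercept and one attribute.
beta0, beta1 = -2.0, 0.5

def odds_at(x):
    """Odds of the event when the attribute takes the value x."""
    return math.exp(beta0 + beta1 * x)

# Baseline: log odds are beta0 when the attribute is zero.
baseline_prob = sigmoid(beta0)
print("baseline probability:", baseline_prob)

# A one-unit increase in the attribute multiplies the odds by exp(beta1).
print("odds multiplier per unit:", math.exp(beta1))
print("odds at x=3 over odds at x=2:", odds_at(3) / odds_at(2))
```

The last two printed values agree, which is the statement that beta1 is the change in log odds per unit change in the attribute.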

Why is logistic regression very popular/widely used?

Logistic regression is popular because it converts logits (log odds), which can range from −∞ to +∞, into values between 0 and 1. Since the logistic function outputs the probability of occurrence of an event, the model can be applied to many real-life scenarios. Another reason why logistic regression fares well in comparison to linear regression is that it can handle a categorical dependent variable.