The output of a standard MLE program is as follows:
Maximised likelihood value: This is the numerical value of the likelihood function obtained by substituting the MLE parameter estimates for the unknown parameters.
Estimated variance-covariance matrix: The diagonal elements of this matrix are the estimated variances of the ML estimates, and the off-diagonal elements are the covariances of each pair of ML estimates.
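As a minimal sketch of where these outputs appear in practice, the snippet below fits a logistic regression with statsmodels (the toy data and seed are illustrative assumptions) and prints the maximised log-likelihood and the estimated variance-covariance matrix:

    import numpy as np
    import statsmodels.api as sm

    # Illustrative toy data: an intercept column plus one attribute
    rng = np.random.default_rng(0)
    X = sm.add_constant(rng.normal(size=100))
    y = (rng.random(100) < 1 / (1 + np.exp(-X @ [0.5, 1.2]))).astype(int)

    result = sm.Logit(y, X).fit(disp=0)   # MLE fit

    print(result.llf)            # maximised log-likelihood value
    print(result.cov_params())   # estimated variance-covariance matrix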
In the case of logistic regression, there are two approaches to MLE: the conditional method and the unconditional method. These are algorithms that use different likelihood functions. The unconditional formula employs the joint probability of the positives (for example, churn) and the negatives (for example, non-churn). The conditional formula is the ratio of the probability of the observed data to the probability of all possible configurations of the data.
The difference is that the unconditional method estimates the unwanted (nuisance) parameters as well, whereas the conditional method does not estimate them at all. Unconditional formulas can be developed directly from joint probabilities; this cannot be done with conditional probabilities.
The unconditional method is preferred when the number of parameters is low relative to the number of instances. If the number of parameters is high relative to the number of instances, the unconditional method gives biased results, and conditional MLE should be preferred, since it remains (approximately) unbiased in such cases. Statisticians suggest using conditional MLE when in doubt.
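As an illustrative sketch of the difference (assuming the ConditionalLogit class in statsmodels and made-up matched data), ordinary Logit would have to estimate one nuisance parameter per stratum, whereas ConditionalLogit conditions those parameters away instead of estimating them:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.discrete.conditional_models import ConditionalLogit

    # Made-up matched data: 50 small strata (groups) of 2 instances each
    rng = np.random.default_rng(1)
    groups = np.repeat(np.arange(50), 2)
    x = rng.normal(size=100)
    y = (rng.random(100) < 1 / (1 + np.exp(-x))).astype(int)

    # Unconditional MLE (stratum dummies omitted here for brevity; including one
    # dummy per stratum is exactly what biases the estimates when strata are many)
    uncond = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

    # Conditional MLE: the stratum (nuisance) parameters are never estimated
    cond = ConditionalLogit(y, x[:, None], groups=groups).fit()

    print(uncond.params, cond.params)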
The MLE chooses the set of unknown parameter values (the estimator) that maximises the likelihood function. The standard way to find the MLE is with calculus: set the derivative of the log-likelihood function with respect to each unknown parameter to zero and solve the resulting equations. For a binomial model this is easy, but for a logistic model the equations have no closed-form solution, so computer programs are used to derive the MLE numerically.
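As a minimal numerical sketch of this (the data, seed, and starting values are illustrative assumptions), the MLE for a logistic model can be found by minimising the negative log-likelihood with scipy, which is essentially what such computer programs do:

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative data: an intercept column plus one attribute
    rng = np.random.default_rng(2)
    X = np.column_stack([np.ones(200), rng.normal(size=200)])
    y = (rng.random(200) < 1 / (1 + np.exp(-X @ [-0.5, 1.0]))).astype(int)

    def neg_log_likelihood(beta):
        # Log-likelihood of the logistic model: sum of y*log(p) + (1-y)*log(1-p)
        p = 1 / (1 + np.exp(-X @ beta))
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Setting the derivatives to zero has no closed-form solution here,
    # so the maximisation is done numerically instead
    mle = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
    print(mle.x)   # the MLE parameter estimates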
(Here’s another approach to answering the question.)
MLE is a statistical approach for estimating the parameters of a mathematical model. For linear regression, MLE and ordinary least squares (OLS) estimation give the same results if the dependent variable is assumed to be normally distributed. MLE does not assume anything about the distribution of the independent variables.
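A minimal sketch of this equivalence (with made-up data): fitting a line by ordinary least squares and by maximising the normal log-likelihood recovers the same coefficients:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)
    X = np.column_stack([np.ones(100), rng.normal(size=100)])
    y = X @ np.array([2.0, 3.0]) + rng.normal(size=100)

    # Ordinary least squares estimate
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

    # MLE assuming normally distributed errors: minimise the negative log-likelihood
    def neg_log_likelihood(params):
        beta, log_sigma = params[:2], params[2]
        resid = y - X @ beta
        sigma2 = np.exp(2 * log_sigma)
        return 0.5 * np.sum(resid ** 2 / sigma2 + np.log(2 * np.pi * sigma2))

    beta_mle = minimize(neg_log_likelihood, x0=np.zeros(3)).x[:2]
    print(beta_ols, beta_mle)   # the two estimates coincide up to numerical tolerance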
Odds ratio is the ratio of the odds between two groups. For example, suppose you are trying to ascertain the effectiveness of a medicine. You administered this medicine to the ‘intervention’ group and a placebo to the ‘control’ group.
Odds Ratio (OR) = Odds of the Intervention Group / Odds of the Control Group
Interpretation (taking the event whose odds are measured to be the adverse outcome, for example falling ill):
If the odds ratio = 1, then there is no difference between the intervention group and the control group.
If the odds ratio is greater than 1, then the control group is better than the intervention group.
If the odds ratio is less than 1, then the intervention group is better than the control group.
In a logistic regression model with attributes X1, X2, ..., Xn, the odds ratio between two groups X1 and X0 (two settings of the attributes) is
OR = exp(β1(X11 − X01) + β2(X12 − X02) + ... + βn(X1n − X0n))
In this formula, X1 and X0 stand for the two different groups for which the odds ratio needs to be calculated, X1i stands for the value of attribute ‘i’ in group X1, X0i stands for the value of attribute ‘i’ in group X0, and βi stands for the coefficient of attribute ‘i’ in the logistic regression model. Note that the baseline β0 is not included in this formula: it is common to both groups and cancels out of the ratio.
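As a small worked sketch (the counts are made up), the odds ratio can be computed directly from the outcomes in each group:

    # Made-up counts of the adverse outcome in each group
    intervention = {"outcome": 10, "no_outcome": 90}   # odds = 10/90
    control = {"outcome": 30, "no_outcome": 70}        # odds = 30/70

    odds_intervention = intervention["outcome"] / intervention["no_outcome"]
    odds_control = control["outcome"] / control["no_outcome"]

    print(odds_intervention / odds_control)   # ~0.26 < 1: the intervention group fares better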
β0 is the baseline in a logistic regression model. It is the log odds for an instance when all the attributes (X1,X2,X3,...,Xn) are zero. In practical scenarios, the probability of all the attributes being zero is very low. In another interpretation, β0 is the log odds for an instance when none of the attributes is taken into consideration.
Each of the other betas (β1, β2, ..., βn) is the amount by which the log odds change for a unit change in the corresponding attribute, keeping all the other attributes fixed or unchanged (control variables).
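To make this interpretation concrete, here is a minimal sketch (the data and seed are illustrative assumptions): exponentiating a fitted coefficient turns the change in log odds into an odds ratio per unit change in that attribute:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    X = sm.add_constant(rng.normal(size=(200, 2)))   # β0 plus two attributes
    y = (rng.random(200) < 1 / (1 + np.exp(-X @ [0.2, 0.8, -0.5]))).astype(int)

    result = sm.Logit(y, X).fit(disp=0)

    print(result.params)           # β0 (baseline log odds) and the other betas
    print(np.exp(result.params))   # exp(βi): odds ratio per unit change in attribute i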
Logistic regression is popular because it converts logits (log odds), which can range from −∞ to +∞, into a probability between 0 and 1. As the logistic function outputs the probability of occurrence of an event, it can be applied to many real-life scenarios, and it is for this reason that the logistic regression model is so widely used. Another reason why logistic regression fares well in comparison to linear regression is that it can handle a categorical (for example, binary) response variable, which linear regression cannot.
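A minimal sketch of this conversion: the logistic (sigmoid) function maps any logit to a value strictly between 0 and 1:

    import numpy as np

    def sigmoid(logit):
        # Maps a log-odds value in (−∞, +∞) to a probability in (0, 1)
        return 1 / (1 + np.exp(-logit))

    for logit in [-10.0, -1.0, 0.0, 1.0, 10.0]:
        print(logit, sigmoid(logit))   # 0.0 maps to 0.5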
The most important differences between logistic and linear regression are:
1. The dependent/response variable in linear regression is continuous, whereas in logistic regression it is discrete.
2. The cost function in linear regression minimises the squared error term Sum(Actual(Y) − Predicted(Y))^2, whereas logistic regression uses the maximum likelihood method to maximise the probability of the observed data, as sketched below.
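As a minimal sketch of the two cost functions (the arrays are made-up numbers), linear regression scores predictions by the sum of squared errors, while logistic regression scores them by the log-likelihood that MLE maximises:

    import numpy as np

    y_actual = np.array([1.0, 0.0, 1.0, 1.0])     # observed outcomes
    y_linear = np.array([0.9, 0.2, 0.7, 0.8])     # linear-model predictions
    p_logistic = np.array([0.9, 0.2, 0.7, 0.8])   # logistic-model probabilities

    # Linear regression: minimise Sum(Actual(Y) − Predicted(Y))^2
    sse = np.sum((y_actual - y_linear) ** 2)

    # Logistic regression: maximise the log-likelihood of the observed 0/1 outcomes
    log_likelihood = np.sum(y_actual * np.log(p_logistic)
                            + (1 - y_actual) * np.log(1 - p_logistic))

    print(sse, log_likelihood)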