Machine Learning Interview Questions and Answers (Part-01)

Sanjay Kumar PhD
Nov 25, 2024

1. Explain the bias-variance tradeoff.

Bias: The error introduced by approximating a real-world problem using a simplified model.

  • High Bias (Underfitting): The model cannot capture the patterns in the data.
  • Example: Linear regression on non-linear data.

Variance: The error due to model sensitivity to small changes in the training data.

  • High Variance (Overfitting): The model captures noise along with the data patterns.
  • Example: High-degree polynomial regression overfitting the training set.

Tradeoff:

  • Lowering bias usually means a more flexible model, which raises variance; lowering variance usually means a simpler model, which raises bias.
  • The goal is low total error, and achieving this balance often requires hyperparameter tuning, regularization, or ensemble learning.

Techniques to Balance Bias and Variance:

  • Cross-validation to test generalization.
  • Regularization (L1/L2).
  • Ensemble methods like bagging (reduces variance) or boosting (reduces bias).
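A quick way to see the tradeoff is to vary model complexity and watch cross-validated error. Below is a minimal sketch (assuming scikit-learn and NumPy): a degree-1 polynomial underfits, a degree-15 polynomial overfits, and error is lowest in between.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, (40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, 40)  # noisy non-linear data

for degree in (1, 4, 15):  # underfit, balanced, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"degree={degree:2d}  CV MSE={-scores.mean():.3f}")
```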

2. What is the difference between L1 and L2 regularization?

L1 Regularization (Lasso):

  • Penalty: Adds λ Σ|w| (the sum of absolute weights) to the loss.
  • Effect: Drives some weights to exactly zero, effectively performing feature selection.
  • Use Case: Sparse models where irrelevant features need elimination.

L2 Regularization (Ridge):

  • Penalty: Adds λ Σw² (the sum of squared weights) to the loss.
  • Effect: Penalizes large weights, keeping all features but shrinking their magnitudes.
  • Use Case: When all features contribute but overfitting needs control.
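To see the difference in practice, here is a minimal sketch (assuming scikit-learn) on a synthetic problem where only a few features matter: Lasso zeroes out many weights, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Regression problem where only 5 of 20 features are informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1: drives weights to exactly zero
ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks weights, rarely to zero

print("L1 zero weights:", np.sum(lasso.coef_ == 0))  # many
print("L2 zero weights:", np.sum(ridge.coef_ == 0))  # typically none
```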

3. Explain the concept of cross-validation.

  • Definition: A resampling method for estimating how well a model generalizes to unseen data.

Common Techniques:

  • K-Fold Cross-Validation: Split the data into k folds; train on k−1 folds and test on the remaining fold, rotating until every fold has served as the test set.
  • Leave-One-Out CV: Trains on n−1 samples and tests on the single held-out sample; repeated n times.
  • Stratified K-Fold: Maintains class distributions across folds (useful for imbalanced data).
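A minimal sketch of plain k-fold versus stratified k-fold, assuming scikit-learn:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Plain k-fold vs. stratified k-fold (preserves class ratios per fold)
for cv in (KFold(n_splits=5, shuffle=True, random_state=0),
           StratifiedKFold(n_splits=5, shuffle=True, random_state=0)):
    scores = cross_val_score(model, X, y, cv=cv)
    print(type(cv).__name__, f"mean accuracy = {scores.mean():.3f}")
```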

4. Describe the difference between supervised and unsupervised learning.

Supervised Learning:

  • Uses labeled data.
  • Tasks: Classification (spam detection) and Regression (house price prediction).

Unsupervised Learning:

  • Uses unlabeled data.
  • Tasks: Clustering (customer segmentation), Dimensionality Reduction (PCA).
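A minimal side-by-side sketch, assuming scikit-learn: the classifier learns from the labels y, while KMeans receives only the features and discovers structure on its own.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: the model learns from labels y
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("classification accuracy:", clf.score(X, y))

# Unsupervised: the model sees only X, no labels
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments:", km.labels_[:10])
```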

5. What are the advantages and disadvantages of decision trees?

Advantages:

  • Interpretability.
  • Handles categorical and numerical data.

Disadvantages:

  • Prone to overfitting (high variance).
  • Sensitive to noisy data.
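A small sketch of the overfitting risk, assuming scikit-learn: an unconstrained tree memorizes the training set, while limiting max_depth trades training accuracy for better generalization.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unconstrained tree: fits training data perfectly, generalizes worse
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Depth-limited tree: trades training fit for lower variance
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

for name, tree in (("deep", deep), ("shallow", shallow)):
    print(f"{name:7s} train={tree.score(X_tr, y_tr):.3f} test={tree.score(X_te, y_te):.3f}")
```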

6. Explain the concept of ensemble learning.

Definition: Combines the predictions of several models so that the ensemble performs better than any single model.

Types:

  • Bagging: Reduces variance (e.g., Random Forest).
  • Boosting: Reduces bias (e.g., AdaBoost, XGBoost).
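A minimal sketch, assuming scikit-learn, comparing a bagging ensemble (Random Forest) with a boosting ensemble (AdaBoost) on the same data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

bagging = RandomForestClassifier(n_estimators=100, random_state=0)   # variance reduction
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)      # bias reduction

for name, model in (("RandomForest", bagging), ("AdaBoost", boosting)):
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```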

7. What is a confusion matrix, and how is it used?

Definition: A table that summarizes a classifier's predictions against the actual labels; metrics such as accuracy, precision, and recall are derived from its counts.

  • True Positives (TP): Correctly predicted positives.
  • False Positives (FP): Predicted positive but actually negative.
  • False Negatives (FN): Predicted negative but actually positive.
  • True Negatives (TN): Correctly predicted negatives.
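For example, with scikit-learn (note that confusion_matrix puts actual classes on the rows and predicted classes on the columns):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels {0, 1} the layout is:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```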

8. How do you handle missing data in a dataset?

Techniques:

  • Drop rows/columns with missing values.
  • Impute with mean/median/mode.
  • Predict missing values using machine learning.
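A minimal sketch of the first two techniques, assuming pandas and scikit-learn:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, np.nan, 40, 31], "income": [50, 60, np.nan, 80]})

# Option 1: drop rows containing any missing value
print(df.dropna())

# Option 2: impute each column's missing values with its median
imputer = SimpleImputer(strategy="median")
print(pd.DataFrame(imputer.fit_transform(df), columns=df.columns))
```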

9. What is the difference between bagging and boosting?

Bagging:

  • Trains models independently (in parallel) on bootstrapped samples of the data, then aggregates their predictions by voting or averaging.
  • Example: Random Forest.

Boosting:

  • Trains models sequentially, with each new model focusing on the errors of the previous ones.
  • Example: Gradient Boosting, XGBoost.
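A minimal sketch contrasting the two, assuming scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: independent trees on bootstrap samples, predictions aggregated
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
# Boosting: trees fit sequentially, each correcting the previous ones' errors
boost = GradientBoostingClassifier(n_estimators=50, random_state=0)

for name, model in (("bagging", bag), ("boosting", boost)):
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```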

10. What is a ROC curve, and what does it represent?

  • ROC Curve: Plots the True Positive Rate (TPR) against the False Positive Rate (FPR) as the classification threshold varies.
  • AUC (Area Under the Curve): Measures how well the classifier ranks positives above negatives; 1.0 is perfect, 0.5 is no better than random guessing.
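A minimal sketch, assuming scikit-learn: roc_curve returns the points of the curve, and roc_auc_score summarizes it in a single number.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]  # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_te, scores)  # points on the ROC curve
print("curve points:", len(thresholds))
print("AUC:", round(roc_auc_score(y_te, scores), 3))
```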
