Machine Learning Interview Questions and Answers (Part-01)

Sanjay Kumar PhD
Nov 25, 2024

1. Explain the bias-variance tradeoff.

Bias: The error introduced by approximating a real-world problem using a simplified model.

  • High Bias (Underfitting): The model cannot capture the patterns in the data.
  • Example: Linear regression on non-linear data.

Variance: The error due to model sensitivity to small changes in the training data.

  • High Variance (Overfitting): The model captures noise along with the data patterns.
  • Example: High-degree polynomial regression overfitting the training set.

Tradeoff:

  • Lowering bias usually means a more flexible model, which raises variance; lowering variance usually means a simpler model, which raises bias.
  • The goal is low total error, and achieving this balance often requires hyperparameter tuning, regularization, or ensemble learning.

Techniques to Balance Bias and Variance:

  • Cross-validation to test generalization.
  • Regularization (L1/L2).
  • Ensemble methods like bagging (reduces variance) or boosting (reduces bias).
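A quick way to see the tradeoff is to vary model complexity and watch cross-validated error. Below is a minimal sketch (assuming scikit-learn and NumPy): a degree-1 polynomial underfits, a degree-15 polynomial overfits, and error is lowest in between.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, (40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, 40)  # noisy non-linear data

for degree in (1, 4, 15):  # underfit, balanced, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"degree={degree:2d}  CV MSE={-scores.mean():.3f}")
```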

2. What is the difference between L1 and L2 regularization?

L1 Regularization (Lasso):

  • Penalty: Adds λ Σ|w| (the sum of absolute weights) to the loss.
  • Effect: Drives some weights to exactly zero, effectively performing feature selection.
  • Use Case: Sparse models where irrelevant features need elimination.

L2 Regularization (Ridge):

  • Penalty: Adds λ Σw² (the sum of squared weights) to the loss.
  • Effect: Penalizes large weights, keeping all features but shrinking their magnitudes.
  • Use Case: When all features contribute but overfitting needs control.
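To see the difference in practice, here is a minimal sketch (assuming scikit-learn) on a synthetic problem where only a few features matter: Lasso zeroes out many weights, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Regression problem where only 5 of 20 features are informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1: drives weights to exactly zero
ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks weights, rarely to zero

print("L1 zero weights:", np.sum(lasso.coef_ == 0))  # many
print("L2 zero weights:", np.sum(ridge.coef_ == 0))  # typically none
```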

3. Explain the concept of cross-validation.

  • Definition: A resampling method for estimating how well a model generalizes to unseen data.

Common Techniques:

  • K-Fold Cross-Validation: Split the data into k folds; train on k−1 folds and test on the remaining fold, rotating until every fold has served as the test set.
  • Leave-One-Out CV: Trains on n−1 samples and tests on the single held-out sample; repeated n times.
  • Stratified K-Fold: Maintains class distributions across folds (useful for imbalanced data).
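A minimal sketch of plain k-fold versus stratified k-fold, assuming scikit-learn:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Plain k-fold vs. stratified k-fold (preserves class ratios per fold)
for cv in (KFold(n_splits=5, shuffle=True, random_state=0),
           StratifiedKFold(n_splits=5, shuffle=True, random_state=0)):
    scores = cross_val_score(model, X, y, cv=cv)
    print(type(cv).__name__, f"mean accuracy = {scores.mean():.3f}")
```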

4. Describe the difference between supervised and unsupervised learning.

Supervised Learning:

  • Uses labeled data.
  • Tasks: Classification (spam detection) and Regression (house price prediction).

Unsupervised Learning:

  • Uses unlabeled data.
  • Tasks: Clustering (customer segmentation), Dimensionality Reduction (PCA).
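A minimal side-by-side sketch, assuming scikit-learn: the classifier learns from the labels y, while KMeans receives only the features and discovers structure on its own.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: the model learns from labels y
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("classification accuracy:", clf.score(X, y))

# Unsupervised: the model sees only X, no labels
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments:", km.labels_[:10])
```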

5. What are the advantages and disadvantages of decision trees?

Advantages:

  • Interpretability.
  • Handles categorical and numerical data.

Disadvantages:

  • Prone to overfitting (high variance).
  • Sensitive to noisy data.
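A small sketch of the overfitting risk, assuming scikit-learn: an unconstrained tree memorizes the training set, while limiting max_depth trades training accuracy for better generalization.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unconstrained tree: fits training data perfectly, generalizes worse
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Depth-limited tree: trades training fit for lower variance
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

for name, tree in (("deep", deep), ("shallow", shallow)):
    print(f"{name:7s} train={tree.score(X_tr, y_tr):.3f} test={tree.score(X_te, y_te):.3f}")
```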

6. Explain the concept of ensemble learning.

Definition: Combines the predictions of several models so that the ensemble performs better than any single model.

Types:

  • Bagging: Reduces variance (e.g., Random Forest).
  • Boosting: Reduces bias (e.g., AdaBoost, XGBoost).
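A minimal sketch, assuming scikit-learn, comparing a bagging ensemble (Random Forest) with a boosting ensemble (AdaBoost) on the same data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

bagging = RandomForestClassifier(n_estimators=100, random_state=0)   # variance reduction
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)      # bias reduction

for name, model in (("RandomForest", bagging), ("AdaBoost", boosting)):
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```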

7. What is a confusion matrix, and how is it used?

Definition: A table that summarizes a classifier's predictions against the actual labels; metrics such as accuracy, precision, and recall are derived from its counts.

  • True Positives (TP): Correctly predicted positives.
  • False Positives (FP): Predicted positive but actually negative.
  • False Negatives (FN): Predicted negative but actually positive.
  • True Negatives (TN): Correctly predicted negatives.
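For example, with scikit-learn (note that confusion_matrix puts actual classes on the rows and predicted classes on the columns):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels {0, 1} the layout is:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```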

8. How do you handle missing data in a dataset?

Techniques:

  • Drop rows/columns with missing values.
  • Impute with mean/median/mode.
  • Predict missing values using machine learning.
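A minimal sketch of the first two techniques, assuming pandas and scikit-learn:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, np.nan, 40, 31], "income": [50, 60, np.nan, 80]})

# Option 1: drop rows containing any missing value
print(df.dropna())

# Option 2: impute each column's missing values with its median
imputer = SimpleImputer(strategy="median")
print(pd.DataFrame(imputer.fit_transform(df), columns=df.columns))
```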

9. What is the difference between bagging and boosting?

Bagging:

  • Trains models independently (in parallel) on bootstrapped samples of the data, then aggregates their predictions by voting or averaging.
  • Example: Random Forest.

Boosting:

  • Trains models sequentially, with each new model focusing on the errors of the previous ones.
  • Example: Gradient Boosting, XGBoost.
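A minimal sketch contrasting the two, assuming scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: independent trees on bootstrap samples, predictions aggregated
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
# Boosting: trees fit sequentially, each correcting the previous ones' errors
boost = GradientBoostingClassifier(n_estimators=50, random_state=0)

for name, model in (("bagging", bag), ("boosting", boost)):
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```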

10. What is a ROC curve, and what does it represent?

  • ROC Curve: Plots the True Positive Rate (TPR) against the False Positive Rate (FPR) as the classification threshold varies.
  • AUC (Area Under the Curve): Measures how well the classifier ranks positives above negatives; 1.0 is perfect, 0.5 is no better than random guessing.
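A minimal sketch, assuming scikit-learn: roc_curve returns the points of the curve, and roc_auc_score summarizes it in a single number.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]  # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_te, scores)  # points on the ROC curve
print("curve points:", len(thresholds))
print("AUC:", round(roc_auc_score(y_te, scores), 3))
```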
