Machine Learning Interview Questions and Answers (Part-01)
Nov 25, 2024
1. Explain the bias-variance tradeoff.
Bias: The error introduced by approximating a real-world problem using a simplified model.
- High Bias (Underfitting): The model cannot capture the patterns in the data.
- Example: Linear regression on non-linear data.
Variance: The error due to model sensitivity to small changes in the training data.
- High Variance (Overfitting): The model captures noise along with the data patterns.
- Example: High-degree polynomial regression overfitting the training set.
Tradeoff:
- Increasing model complexity lowers bias but raises variance; simplifying the model does the opposite.
- Ideally, we aim for low bias and low variance.
- Achieving this balance often requires hyperparameter tuning, regularization, or ensemble learning.
Techniques to Balance Bias and Variance:
- Cross-validation to test generalization.
- Regularization (L1/L2).
- Ensemble methods like bagging (reduces variance) or boosting (reduces bias).
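A minimal sketch of the tradeoff using scikit-learn (not from the original post; the synthetic sine data and degree choices are illustrative assumptions). A degree-1 fit underfits (high bias), a degree-15 fit overfits (high variance), and cross-validation exposes both:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic non-linear data: y = sin(x) + noise (illustrative assumption)
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=80)

for degree in (1, 4, 15):  # underfit, balanced, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree={degree:2d}  cross-validated MSE={mse:.3f}")
```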
2. What is the difference between L1 and L2 regularization?
L1 Regularization (Lasso):
- Penalty: Adds the sum of the absolute values of the weights to the loss.
- Effect: Drives some weights to exactly zero, effectively performing feature selection.
- Use Case: Sparse models where irrelevant features need elimination.
L2 Regularization (Ridge):
- Penalty: Adds the sum of the squared weights to the loss.
- Effect: Penalizes large weights, keeping all features but with smaller magnitudes.
- Use Case: When all features contribute, but overfitting needs control.
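A quick illustration of the sparsity difference, sketched with scikit-learn's Lasso and Ridge (the dataset and alpha value are assumptions for demonstration): Lasso zeroes out many of the uninformative features, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 20 features, only 5 informative (illustrative assumption)
X, y = make_regression(n_samples=200, n_features=20,
                       n_informative=5, noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty

print("L1 weights at exactly zero:", np.sum(lasso.coef_ == 0))  # many
print("L2 weights at exactly zero:", np.sum(ridge.coef_ == 0))  # usually none
```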
3. Explain the concept of cross-validation.
- Definition: A resampling method that estimates how well a model generalizes by training and testing it on different subsets of the data.
Common Techniques:
- K-Fold Cross-Validation: Split the data into k folds; train on k−1 folds and test on the remaining one, rotating through all k folds and averaging the scores.
- Leave-One-Out CV: Trains on n−1 samples and tests on the single held-out sample, repeated for every sample.
- Stratified K-Fold: Maintains class distributions across folds (for imbalanced data).
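A short sketch of stratified k-fold CV with scikit-learn (the Iris dataset and logistic regression are stand-ins, not from the original answer):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Stratified 5-fold CV preserves the class distribution in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print("fold accuracies:", scores.round(3))
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```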
4. Describe the difference between supervised and unsupervised learning.
Supervised Learning:
- Uses labeled data.
- Tasks: Classification (spam detection) and Regression (house price prediction).
Unsupervised Learning:
- Uses unlabeled data.
- Tasks: Clustering (customer segmentation), Dimensionality Reduction (PCA).
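The contrast in one sketch (Iris as a stand-in dataset; the model choices are assumptions): the classifier learns from the labels, while K-Means never sees them.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised: the labels y guide the fit
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("classification accuracy:", round(clf.score(X, y), 3))

# Unsupervised: K-Means sees only X and discovers groups on its own
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("first 10 cluster assignments:", km.labels_[:10])
```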
5. What are the advantages and disadvantages of decision trees?
Advantages:
- Interpretability.
- Handles categorical and numerical data.
Disadvantages:
- Prone to overfitting (high variance).
- Sensitive to noisy data.
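A sketch of the overfitting point, assuming scikit-learn and the breast-cancer dataset as stand-ins: an unconstrained tree memorizes the training set, while capping max_depth trades a little bias for less variance.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unconstrained tree: fits the training data perfectly, generalizes worse
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Pruned tree: limiting depth reduces variance at the cost of some bias
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("deep   train/test:",
      round(deep.score(X_tr, y_tr), 3), round(deep.score(X_te, y_te), 3))
print("pruned train/test:",
      round(pruned.score(X_tr, y_tr), 3), round(pruned.score(X_te, y_te), 3))
```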
6. Explain the concept of ensemble learning.
Definition: Combines multiple models to improve accuracy.
Types:
- Bagging: Reduces variance (e.g., Random Forest).
- Boosting: Reduces bias (e.g., AdaBoost, XGBoost).
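A minimal comparison (synthetic data and estimator counts are assumptions): a single tree versus a bagged ensemble (Random Forest) and a boosted ensemble (AdaBoost).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest (bagging)": RandomForestClassifier(n_estimators=100,
                                                      random_state=0),
    "adaboost (boosting)": AdaBoostClassifier(n_estimators=100,
                                              random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:24s} CV accuracy: {acc:.3f}")
```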
7. What is a confusion matrix, and how is it used?
Definition: A table that summarizes a classifier's predictions against the actual labels.
- True Positives (TP): Correctly predicted positives.
- False Positives (FP): Predicted positive but actually negative.
- False Negatives (FN): Predicted negative but actually positive.
- True Negatives (TN): Correctly predicted negatives.
From these four counts you can derive metrics such as accuracy, precision, recall, and F1-score, as the sketch below shows.
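A tiny worked example with scikit-learn (the label vectors are made up for illustration); note that `confusion_matrix` returns rows as actual classes and columns as predicted classes.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # actual labels (assumed example)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # model predictions (assumed example)

# For binary labels [0, 1] the layout is [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print("precision:", tp / (tp + fp), "recall:", tp / (tp + fn))
```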
8. How do you handle missing data in a dataset?
Techniques:
- Drop rows/columns with missing values.
- Impute with mean/median/mode.
- Predict missing values using machine learning.
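A short sketch of the first two options using pandas and scikit-learn (the toy DataFrame is an assumption):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, np.nan, 40, 31],
                   "income": [50_000, 62_000, np.nan, 48_000]})

# Option 1: drop any row containing a missing value
dropped = df.dropna()

# Option 2: impute each column with its median
imputer = SimpleImputer(strategy="median")
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(filled)
```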
9. What is the difference between bagging and boosting?
Bagging:
- Trains models independently on bootstrapped datasets.
- Example: Random Forest.
Boosting:
- Trains models sequentially, correcting errors of previous models.
- Example: Gradient Boosting, XGBoost.
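A side-by-side sketch (synthetic data; estimator counts are assumptions): BaggingClassifier trains its trees independently on bootstrap samples, while GradientBoostingClassifier adds trees sequentially, each fit to the errors of the ensemble so far.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=1)

# Bagging: independent trees on bootstrap samples
# (the default base estimator is a decision tree)
bag = BaggingClassifier(n_estimators=50, random_state=1)

# Boosting: sequential trees, each correcting the current ensemble's errors
boost = GradientBoostingClassifier(n_estimators=50, random_state=1)

print("bagging  CV accuracy:", cross_val_score(bag, X, y, cv=5).mean().round(3))
print("boosting CV accuracy:", cross_val_score(boost, X, y, cv=5).mean().round(3))
```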
10. What is a ROC curve, and what does it represent?
- ROC Curve: Plots the True Positive Rate (TPR) against the False Positive Rate (FPR) across all classification thresholds.
- AUC (Area Under Curve): Measures the classifier's ability to distinguish classes; 1.0 is a perfect classifier, 0.5 is no better than random guessing. See the sketch below.
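A minimal sketch (synthetic data and logistic regression are assumptions): the key point is that `roc_curve` needs scores or probabilities for the positive class, not hard labels.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Probability of the positive class for each test sample
proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

fpr, tpr, thresholds = roc_curve(y_te, proba)  # one point per threshold
print("AUC:", round(roc_auc_score(y_te, proba), 3))  # 0.5 = random, 1.0 = perfect
```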