Data Scientist Interview Questions & Answers (2026)

Top 10 data scientist interview questions covering statistics, ML, Python, SQL, A/B testing, and case studies.

Avg. salary: $110,000 – $200,000
Questions: 10 Q&As

Top hiring companies

Google · Meta · Netflix · Airbnb · Uber · Spotify

Data Scientist interview questions & answers

1. What is the difference between overfitting and underfitting?

Overfitting: the model learns the training data too well, including its noise, and performs poorly on new data (high variance). Underfitting: the model is too simple to capture the underlying pattern (high bias). The bias-variance tradeoff: more complexity reduces bias but increases variance. Fixes for overfitting: regularization (L1/L2), dropout, early stopping, more training data, and cross-validation to catch it early.
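A minimal sketch of the train/test gap, assuming NumPy is available: fitting a degree-9 polynomial to 20 noisy points from a linear signal drives training error below that of a linear fit precisely because it chases the noise. The data and degrees here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2 * x + 1 + rng.normal(0, 0.2, size=x.size)   # linear signal + noise

x_test = np.linspace(0, 1, 50)
y_test = 2 * x_test + 1                            # noise-free truth

def fit_eval(degree):
    coeffs = np.polyfit(x, y, degree)              # least-squares polynomial fit
    mse_train = float(np.mean((np.polyval(coeffs, x) - y) ** 2))
    mse_test = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    return mse_train, mse_test

tr1, te1 = fit_eval(1)   # matches the true complexity
tr9, te9 = fit_eval(9)   # overfits: training error drops as it fits the noise
```

Comparing `tr9` against `te9` is exactly the diagnostic interviewers expect: a large gap between training and test error signals high variance.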

2. Explain the Central Limit Theorem and why it matters.

The CLT states that the distribution of sample means approaches a normal distribution as sample size increases (a common rule of thumb is n ≥ 30), regardless of the population distribution. It justifies using z-tests and t-tests on non-normal data, underpins A/B testing statistics, and means we can make inferences about populations from samples.
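The theorem is easy to see empirically with only the standard library: draw repeated samples from a heavily skewed exponential population (mean 1, sd 1) and watch the sample means cluster around the population mean with spread σ/√n. The sample sizes below are arbitrary choices for the demo.

```python
import math
import random
import statistics

random.seed(42)

n = 50        # observations per sample
reps = 2000   # number of sample means collected

# Population: exponential with rate 1 (mean 1, sd 1) -- clearly non-normal.
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

# CLT prediction: means center near 1.0 with sd ≈ 1/sqrt(50) ≈ 0.141.
center = statistics.fmean(means)
spread = statistics.stdev(means)
```

The 1/√n shrinkage of `spread` is what power analysis for A/B tests is built on.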

3. How would you handle class imbalance in a classification problem?

(1) Resampling — oversample minority (SMOTE) or undersample majority. (2) Class weights — class_weight='balanced' in sklearn. (3) Threshold tuning — move decision threshold below 0.5 to improve recall. (4) Right metric — use F1, precision-recall AUC, not accuracy. (5) BalancedRandomForest or EasyEnsemble.
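Option (3), threshold tuning, can be sketched in plain Python. The scores and labels below are a toy imbalanced dataset invented for illustration: lowering the threshold from 0.5 to 0.3 raises recall at the cost of precision.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall at a given decision threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy scores from an imbalanced problem (4 positives, 6 negatives).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.35, 0.55, 0.3, 0.2, 0.1, 0.05, 0.45]

p_default, r_default = precision_recall(y_true, scores, 0.5)  # recall 0.5
p_low, r_low = precision_recall(y_true, scores, 0.3)          # recall 1.0
```

In practice the threshold is chosen from a precision-recall curve on a validation set, not hard-coded.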

4. What is the difference between precision and recall?

Precision = TP/(TP+FP): of predicted positives, how many are correct? Recall = TP/(TP+FN): of actual positives, how many did you catch? There's a tradeoff. F1 = harmonic mean. Choose: maximize recall when false negatives are costly (cancer screening); maximize precision when false positives are costly (spam filtering).
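The harmonic-mean definition is worth knowing cold; a few lines make it concrete. Counts below are invented: note how F1 sits below the arithmetic mean when precision and recall are unbalanced.

```python
def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

balanced = f1_score(tp=2, fp=1, fn=1)   # precision 2/3, recall 2/3 -> F1 = 2/3
skewed = f1_score(tp=9, fp=1, fn=9)     # precision 0.9, recall 0.5
```

The harmonic mean penalizes the imbalance: `skewed` is well below 0.7, the arithmetic mean of 0.9 and 0.5.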

5. How do you design an A/B test?

(1) Define hypothesis and primary metric. (2) Power analysis for sample size (80% power, 95% confidence, minimum detectable effect). (3) Random user assignment. (4) Run for a full business cycle. (5) Check for novelty effects and sample ratio mismatch. (6) Two-sided t-test or z-test. (7) Check guardrail metrics. (8) Decision requires statistical AND practical significance.
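Step (2) can be sketched with the standard two-proportion approximation, using only the standard library. The 10% baseline and 2-point minimum detectable effect are example numbers, not from the source.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_base, mde, alpha=0.05, power=0.80):
    """Approximate per-arm n for a two-sided two-proportion test."""
    p_b = p_base + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # ≈ 0.84 for 80% power
    variance = p_base * (1 - p_base) + p_b * (1 - p_b)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)

# Detect a 2-point lift on a 10% baseline conversion rate:
n = sample_size_per_arm(0.10, 0.02)  # a few thousand users per arm
```

Smaller effects blow up n quadratically (mde² in the denominator), which is why tiny lifts need enormous traffic.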

6. What is gradient boosting and how does it differ from random forests?

Random forests build trees independently and average predictions (bagging — reduces variance). Gradient boosting builds trees sequentially, each correcting residual errors of the previous (boosting — reduces bias). XGBoost/LightGBM are optimized implementations. Random forests: faster, less prone to overfitting. Gradient boosting: typically higher accuracy with tuning.
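The "each tree corrects the previous one's residuals" idea can be sketched with regression stumps in pure Python. This is a toy 1-D illustration with invented data and a fixed 0.5 learning rate, not how XGBoost/LightGBM are implemented (they add regularization, second-order gradients, and histogram tricks).

```python
def fit_stump(x, residuals):
    """Best single-split regression stump by squared error (x sorted, 1-D)."""
    best = None
    for i in range(1, len(x)):
        thr = (x[i - 1] + x[i]) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda xi: lv if xi <= thr else rv

x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0.0, 0.2, 1.1, 1.3, 2.9, 3.1, 2.2, 2.0]   # non-linear target

pred = [sum(y) / len(y)] * len(x)   # stage 0: predict the global mean
mses = []
for _ in range(5):
    residuals = [yi - pi for yi, pi in zip(y, pred)]
    stump = fit_stump(x, residuals)                 # fit the current errors
    pred = [pi + 0.5 * stump(xi) for pi, xi in zip(pred, x)]  # shrink by lr=0.5
    mses.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))
```

Each stage projects the residuals onto a stump, so training MSE only goes down, illustrating why boosting attacks bias (and why unregularized boosting can overfit).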

7. Write a SQL query to find the top 3 products by revenue in each category.

SELECT category, product, revenue
FROM (
  SELECT category, product, SUM(amount) AS revenue,
         RANK() OVER (PARTITION BY category ORDER BY SUM(amount) DESC) AS rnk
  FROM orders
  JOIN products USING (product_id)
  GROUP BY category, product
) ranked
WHERE rnk <= 3;

Key concepts: the RANK() window function with PARTITION BY, and filtering in an outer query because window functions can't appear in WHERE. Alias the rank column as something like rnk — RANK is a reserved word in some dialects (e.g. MySQL 8+). Swap in DENSE_RANK() or ROW_NUMBER() if ties should be handled differently.
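The query can be sanity-checked with Python's built-in sqlite3 (window functions require SQLite 3.25+). The tables and rows below are invented toy data for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER, product TEXT, category TEXT);
    CREATE TABLE orders   (product_id INTEGER, amount REAL);
    INSERT INTO products VALUES (1,'p1','A'),(2,'p2','A'),(3,'p3','A'),
                                (4,'p4','A'),(5,'q1','B');
    INSERT INTO orders VALUES (1,100),(2,50),(2,30),(3,60),(4,10),(5,40);
""")
rows = conn.execute("""
    SELECT category, product, revenue
    FROM (
        SELECT category, product, SUM(amount) AS revenue,
               RANK() OVER (PARTITION BY category
                            ORDER BY SUM(amount) DESC) AS rnk
        FROM orders JOIN products USING (product_id)
        GROUP BY category, product
    ) ranked
    WHERE rnk <= 3
    ORDER BY category, revenue DESC
""").fetchall()
# Category A keeps its top 3 (p4 is rank 4 and dropped); B keeps its only product.
```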

8. How would you build a recommendation system from scratch?

(1) Collaborative filtering: find users with similar history (user-user or item-item CF); scales via matrix factorization (ALS, SVD); suffers a cold-start problem with new users and items. (2) Content-based: recommend similar items based on item features; avoids item cold-start, though new users still need some preference signal. (3) Hybrid: combine both. For production: candidate generation (FAISS for ANN) + ranking layer (LightGBM with context features).
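A minimal item-item CF sketch in pure Python, on an invented 4-user × 4-item implicit-feedback matrix. Real systems use sparse matrices and ANN indexes; this just shows the "score unseen items by similarity to what the user already has" logic.

```python
import math

# Rows = users, columns = items (1 = interacted).
matrix = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

n_items = len(matrix[0])
items = [[row[j] for row in matrix] for j in range(n_items)]   # item columns
sim = [[cosine(items[i], items[j]) for j in range(n_items)]
       for i in range(n_items)]

def recommend(user_row):
    """Score each unseen item by summed similarity to the user's items."""
    scores = {
        j: sum(sim[j][k] for k, seen in enumerate(user_row) if seen)
        for j, interacted in enumerate(user_row) if not interacted
    }
    return max(scores, key=scores.get)

best = recommend(matrix[0])   # user 0 has items 0 and 1
```

For user 0 the top suggestion is item 2, since item 2 co-occurs with both of their items while item 3 overlaps with only one.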

9. What is p-hacking and how do you avoid it?

Running multiple statistical tests until one reaches p<0.05 by chance. With 20 tests at α=0.05, expect one false positive. Prevention: pre-register hypothesis; Bonferroni correction (divide α by number of tests); use false discovery rate (Benjamini-Hochberg); stop only at pre-calculated sample size; treat initial findings as hypothesis-generating only.
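The arithmetic behind those numbers, assuming independent tests:

```python
alpha, m = 0.05, 20

fwer_naive = 1 - (1 - alpha) ** m        # ≈ 0.64: one false positive is likely
expected_fp = m * alpha                  # 1.0 false positive expected
alpha_bonf = alpha / m                   # Bonferroni-corrected per-test threshold
fwer_bonf = 1 - (1 - alpha_bonf) ** m    # family-wise rate back below 0.05
```

Bonferroni is conservative; Benjamini-Hochberg trades some of that strictness for power by controlling the false discovery rate instead.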

10. How would you explain a machine learning model to a non-technical stakeholder?

Start with output: 'This model predicts which users will cancel in the next 30 days.' Explain inputs and key drivers using SHAP values: 'Users inactive for 14+ days are 3x more likely to churn.' Avoid jargon. Acknowledge uncertainty: 'It's right about 85% of the time.' Focus on what decisions it enables and the cost of errors.

Practice these questions out loud

Reading answers is the first step. Delivering them under pressure — with follow-up questions, time constraints, and a panel evaluating you — is where real prep happens. Preciprocal's AI mock interviews simulate that experience.

Start practicing free →


Ready to turn preparation into offers?

Try Preciprocal free — no credit card required