
data science interview questions and resources

August 17, 2020

As a recent graduate, I spent the better part of this year applying and interviewing for data science jobs. After hundreds of applications, a dozen phone screens and four final rounds of interviews, I have accumulated a significant number of resources and questions that helped me prepare for and land my dream data scientist role. Organized by category, you will find over 50 interview questions along with some resources to help you brush up your skills and crush that interview!

Please note that the questions below are most suitable for entry-level data scientist positions, since that is what I have experience with.

machine learning

Machine learning questions tend to be open-ended, and you will most likely be asked to defend the approaches you took in previous projects or take-home tests. Be prepared to measure the “goodness” of a feature for a company’s product, and make sure to approach it in a scientific and principled way.

  1. Analyze this dataset and give me a model that can predict this response variable.

  2. What could be some issues if the distribution of the test data is significantly different than the distribution of the training data?

  3. What are some ways I can make my model more robust to outliers?

  4. Explain how decision trees work.

  5. How can you make sure that you don’t analyze something that ends up meaningless?

  6. What are some differences you would expect in a model that minimizes squared error, versus a model that minimizes absolute error? In which cases would each error metric be appropriate?

  7. What error metric would you use to evaluate how good a binary classifier is? What if the classes are imbalanced? What if there are more than 2 groups?

  8. What is regularization and where might it be helpful? What is an example of using regularization in a model?

  9. Why might it be preferable to include fewer predictors over many?

  10. What is regularization and why is it useful?

  11. What are the drawbacks of a linear model?

  12. What is the kernel trick?

  13. How can you tell whether a time series is stationary?
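
For question 6, one way to build intuition is to fit the simplest possible model: a single constant. The sketch below assumes a made-up toy dataset with one outlier, and the helper names are hypothetical. It illustrates that the mean minimizes total squared error while the median minimizes total absolute error, which is why absolute error is usually the more outlier-robust choice.

```python
import statistics

# Toy data with one large outlier (values invented for illustration).
data = [1, 2, 3, 4, 100]


def squared_error(values, constant):
    """Total squared error when predicting `constant` for every value."""
    return sum((v - constant) ** 2 for v in values)


def absolute_error(values, constant):
    """Total absolute error when predicting `constant` for every value."""
    return sum(abs(v - constant) for v in values)


mean_fit = statistics.mean(data)      # best constant under squared error
median_fit = statistics.median(data)  # best constant under absolute error

print("mean:", mean_fit, "median:", median_fit)
print("squared error  (mean vs median):",
      squared_error(data, mean_fit), squared_error(data, median_fit))
print("absolute error (mean vs median):",
      absolute_error(data, mean_fit), absolute_error(data, median_fit))
```

The outlier drags the mean far away from most of the points, while the median stays put. In an interview, that is a good starting point for discussing when each metric is appropriate.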

statistical inference

A data scientist interview is incomplete without a test of your knowledge of statistical concepts. The best way to prepare is to go through your course material from school or a related textbook. Below are some questions to check your understanding and identify gaps to fill in later.

  1. What is maximum likelihood estimation? Could there be any case where it doesn’t exist?

  2. What’s the difference between MAP, MOM, and MLE estimators? In which cases would you want to use each?

  3. What is the difference between covariance and correlation?

  4. What is power analysis?

  5. What is a confidence interval and how do you interpret it?

  6. What is the basic idea behind bootstrapping?

  7. What is unbiasedness as a property of an estimator? Is this always a desirable property when performing inference? What about in data analysis or predictive modeling?

  8. What is z-scoring? Why would you do it?

  9. What is the law of large numbers?

  10. What are eigenvalues and eigenvectors?

  11. What is the difference between sensitivity and specificity?

  12. What is a random variable?

  13. What are the drawbacks of locally weighted average?

  14. What are the different correlation measures between continuous and categorical variables?
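
For question 6, it can help to have a small bootstrap in your back pocket. This is a minimal sketch of a percentile bootstrap confidence interval, assuming a made-up sample; the function name and its defaults are my own choices for illustration, not a standard API. The idea is to resample the data with replacement many times, recompute the statistic each time, and read the interval off the spread of those estimates.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles.
    lower_idx = int((alpha / 2) * n_resamples)
    upper_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lower_idx], estimates[upper_idx]


# Sample values invented for illustration.
sample = [2, 4, 4, 5, 7, 9, 10, 12]
lower, upper = bootstrap_ci(sample)
print("sample mean:", statistics.mean(sample))
print("95% bootstrap interval for the mean:", (lower, upper))
```

The percentile method is only one of several bootstrap intervals, so be ready to say when it can mislead, for example with very small samples or heavily skewed statistics.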

programming

Programming tests for data scientists usually involve querying with SQL. This can be done over the phone or onsite. Depending on the team, role and organization, your interviews will vary widely between being heavy on statistics and heavy on software development. To practice programming-related questions, I highly recommend Cracking the Coding Interview.

  1. Write an algorithm that can calculate the square root of a number.

  2. Given a list of numbers, can you return the outliers?

  3. When can parallelism make your algorithms run faster? When could it make your algorithms run slower?

  4. What are the different types of joins? What are the differences between them?

  5. Why might a join on a subquery be slow? How might you speed it up?
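
To give a flavor of what answers to questions 1 and 2 might look like, here is a rough sketch. It assumes Newton's method for the square root (binary search would be another reasonable answer) and the common 1.5 × IQR rule for outliers, which is a convention rather than the only definition. The function names are hypothetical.

```python
import statistics


def newton_sqrt(x, tol=1e-12, max_iter=100):
    """Approximate the square root of a non-negative number."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0  # start at or above the true root
    for _ in range(max_iter):
        # Newton update for f(g) = g^2 - x
        new_guess = 0.5 * (guess + x / guess)
        if abs(new_guess - guess) <= tol * new_guess:
            return new_guess
        guess = new_guess
    return guess


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print(newton_sqrt(2))
print(iqr_outliers([1, 2, 3, 4, 5, 100]))
```

Whatever approach you pick, interviewers will usually follow up on edge cases (negative inputs, empty lists) and on why you chose that definition of an outlier.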

probability

  1. How can you generate a random number between 1 and 7 with only a die?

  2. How can you get a fair coin toss if someone hands you a coin that is weighted to come up heads more often than tails?

  3. You have a 50-50 mixture of two normal distributions with the same standard deviation. How far apart do the means need to be in order for this distribution to be bimodal?

  4. Given draws from a normal distribution with known parameters, how can you simulate draws from a uniform distribution?

  5. A certain couple tells you that they have two children, at least one of whom is a girl. What is the probability that they have two girls?

  6. How many ways can you split 12 people into 3 teams of 4?

  7. On a dating site, users can select 5 out of 24 adjectives to describe themselves. A match is declared between two users if they match on at least 4 adjectives. If Alice and Bob randomly pick adjectives, what is the probability that they form a match?

  8. Let’s say you have a very tall father. On average, what would you expect the height of his son to be? Taller, equal, or shorter? What if you had a very short father?

  9. What’s the expected number of coin flips until you get two heads in a row? What’s the expected number of coin flips until you get two tails in a row?

  10. Let’s say we play a game where I keep flipping a coin until I get heads. If the first time I get heads is on the nth flip, then I pay you 2^(n-1) dollars. How much would you pay me to play this game?

  11. You have a 0.1% chance of picking up a two-headed coin, and a 99.9% chance of picking up a fair coin. You flip your coin and it comes up heads 10 times. What’s the chance that you picked up the fair coin, given the information that you observed?

  12. You have two coins, one of which is fair and comes up heads with probability 1/2, and the other of which is biased and comes up heads with probability 3/4. You randomly pick a coin and flip it twice, and get heads both times. What is the probability that you picked the fair coin?
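
Questions 11 and 12 are both applications of Bayes' rule, so a tiny helper is handy for checking your work. This is a sketch, assuming that in question 11 the coin is flipped exactly 10 times and lands heads every time; the `posterior` function is a hypothetical helper, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' rule: P(hypothesis | data) for each hypothesis."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # P(data), the normalizing constant
    return [j / total for j in joint]


# Question 12: fair coin vs. coin with P(heads) = 3/4, two heads observed.
q12 = posterior(
    priors=[Fraction(1, 2), Fraction(1, 2)],
    likelihoods=[Fraction(1, 2) ** 2, Fraction(3, 4) ** 2],
)
print("Q12 P(fair | HH):", q12[0])

# Question 11: fair coin vs. two-headed coin, ten heads in ten flips.
q11 = posterior(
    priors=[Fraction(999, 1000), Fraction(1, 1000)],
    likelihoods=[Fraction(1, 2) ** 10, Fraction(1)],
)
print("Q11 P(fair | 10 heads):", q11[0], "which is about", float(q11[0]))
```

In the interview itself you will likely do this by hand, but the structure is the same: prior times likelihood for each coin, divided by the sum over both coins.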

communication

  1. Explain to me a technical concept related to the role that you’re interviewing for.

  2. Introduce me to something you’re passionate about.

  3. How would you report a statistical analysis to non-technical staff?

  4. Tell me about a data project that you’ve done with a team. What did you add to the group?

  5. Tell me about a dataset that you’ve analyzed. What techniques did you find helpful and which ones didn’t work?

  6. What’s your favorite algorithm? Can you explain it to me?

  7. How could you help generate public understanding of the importance of using data to generate insights?

parting thoughts

Interviews are hard, but there is a silver lining in that they serve as a forcing function for learning.

Thanks to others sharing what they learned, I was able to fail, learn from it, and then do it over again until I landed a job that I loved. I’m confident that with these resources and the right mindset, you’ll do great too. Good luck!