Company Wise Data Science Interview Questions
Company: Google
Role: Data Scientist
1. Pick any product or app that you really like and describe how you would improve it.
2. How would you find an anomaly in a distribution?
3. How would you go about investigating if a certain trend in a distribution is due to an
anomaly?
4. How would you estimate the impact Uber has on traffic and driving conditions?
5. What metrics would you consider using to track if Uber’s paid advertising strategy to
acquire new customers actually works? How would you then approach figuring out an
ideal customer acquisition cost?
Company: TCS
Role: Data Scientist
1. How many cars are there in Chennai? How would you structure your approach to arrive at that number?
2. Multiple linear regression?
3. OLS vs MLE?
4. R² vs adjusted R²? Which one do we consider during model development?
5. Lift chart, drift chart
6. Sigmoid function in logistic regression
7. ROC: what is it? AUC, and how do the two differ?
8. Linear Regression from Multiple Linear Regression
9. P-value: what is it and what is its significance? What does the P in p-value stand for? What is hypothesis testing? Null hypothesis vs alternate hypothesis?
10. Bias-variance trade-off?
11. Overfitting vs underfitting in machine learning?
12. Estimation of multiple linear regression
13. Difference between forecasting and prediction? Regression vs time series?
14. p, d, q values in ARIMA models
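For the R² vs adjusted R² question, the standard adjustment can be sketched in a few lines. The function name `adjusted_r2` and the sample numbers are illustrative only; the formula is the usual 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Same R^2 with more predictors gives a lower adjusted R^2,
# which is why adjusted R^2 is the usual choice when comparing models.
few_predictors = adjusted_r2(0.80, n_samples=50, n_features=2)
many_predictors = adjusted_r2(0.80, n_samples=50, n_features=10)
print(few_predictors, many_predictors)
```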
Company: Fractal
Role: Data Scientist
Company: Wipro
Role: Data Scientist
Company: Accenture
Role: Data Scientist
1. What is the difference between K-NN and K-Means clustering?
2. How to handle missing data? What imputation techniques can be used?
3. Explain topic modelling in NLP and the various methods of performing it.
4. Explain how you would find and tackle an outlier in the dataset.
5. Follow-up: What about an inlier?
6. Explain backpropagation in a few words, along with its variants.
7. Is interpretability important for a machine learning model? If so, what are ways to achieve interpretability for machine learning models?
8. How would you design a data science pipeline?
9. Explain the bias-variance trade-off. How does it affect the model?
10. What does a statistical test do?
11. How to determine if a coin is biased? Hint: hypothesis testing
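One possible answer to the biased-coin question is an exact two-sided binomial test against the null hypothesis p = 0.5. The sketch below is a minimal stdlib version; the function name and the 60-heads-in-100-flips figures are made up for illustration.

```python
from math import comb


def binomial_two_sided_p(heads: int, flips: int, p: float = 0.5) -> float:
    """Exact two-sided p-value under H0: P(heads) = p.

    Sums the probability of every outcome that is no more likely
    than the observed one.
    """
    pmf = [comb(flips, k) * p**k * (1 - p) ** (flips - k) for k in range(flips + 1)]
    observed = pmf[heads]
    # Small relative tolerance so symmetric outcomes are not lost to rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


p_value = binomial_two_sided_p(heads=60, flips=100)
print(f"p-value = {p_value:.4f}")
```

If the p-value falls below the chosen significance level (commonly 0.05), the null hypothesis of a fair coin is rejected; otherwise there is not enough evidence to call the coin biased.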
1. What is deep learning, and how does it contrast with other machine learning algorithms?
2. When should you use classification over regression?
3. Using Python, how do you find rank, linear and tensor equations for a given array of elements? Explain your approach.
4. What exactly do you know about bias-variance decomposition?
5. What is the best recommendation technique you have learnt, and what type of recommendation technique helps to predict ratings?
6. How can you assess a good logistic model?
7. How do you read the text from an image? Explain.
8. What are all the options to convert speech to text? Explain and name a few available tools to implement the same.
Company: Tata IQ
Role: Data Analyst
Company: Genpact
Role: Data Scientist
1. How would you check if the model is suffering from multicollinearity?
2. What is transfer learning? Steps you would take to perform transfer learning.
3. Why is CNN architecture suitable for image classification? Not an RNN?
4. What are the approaches for solving class imbalance problem?
5. When sampling, what types of biases can be introduced? How to control the biases?
6. Explain concepts of epoch, batch, iteration in machine learning.
7. What type of performance metrics would you choose to evaluate the different
classification models and why?
8. What are some of the types of activation functions and specifically when to use
them?
9. What are the conditions that should be satisfied for a time series to be stationary?
10. What is the difference between Batch and Stochastic Gradient Descent?
11. What is the difference between K-NN and K-Means clustering?
Company: Quantiphi
Role: Machine Learning Engineer
1. What happens when neural nets are too small? What happens when they are large
enough?
2. Why do we need pooling layer in CNN? Common pooling methods?
3. Are ensemble models better than individual models? Why or why not?
4. Use case: Suppose you work for a pen manufacturing company. How would you help the sales team with leads using data analysis?
5. Assume you are given access to a website's Google Analytics data. To increase conversions, how would you perform A/B testing to identify the best page design?
6. How is random forest different from gradient boosting, given that both are tree-based algorithms?
7. Describe the steps involved in creating a neural network.
8. In brief, how would you perform sentiment analysis?
Company: TheMathCompany
Role: Analyst (Data Science)
Rounds:
1. Technical Test (Python, SQL, Statistics) (Coding+MCQ) (90 min).
2. Telephonic interview (10 min).
3. Technical interview (45 min).
4. Fitment interview (25 min).
5. HR interview (30 min).
Company: Cognizant
Role: Data Scientist
1. Conditional Probability
2. Can linear regression be used for classification? If yes, why? If no, why not?
3. Hypothesis Testing. Null and Alternate hypothesis
4. Derivation of Formula for Linear and logistic Regression
5. Why use Decision Trees?
6. PCA Advantages and Disadvantages?
7. What is Naive Bayes Theorem? Multinomial, Bernoulli, Gaussian Naive Bayes.
8. Central Limit Theorem?
9. Scenario based question on when to use which ML model?
10. Over Sampling and Under Sampling
11. Over Fitting and Under Fitting
12. Core Concepts behind Each ML model mentioned in my Resume.
13. Gini index vs entropy
14. How to deal with imbalanced data in classification modelling?
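For the Gini index vs entropy question, both impurity measures can be computed directly from class counts. This is a minimal sketch; the function names and the toy labels are illustrative.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


mixed = ["yes", "yes", "no", "no"]   # evenly split node: maximum impurity for 2 classes
pure = ["yes", "yes", "yes", "yes"]  # pure node: zero impurity
print(gini(mixed), entropy(mixed))
print(gini(pure), entropy(pure))
```

Both measures are zero for a pure node and peak for an even split; Gini avoids the logarithm, so it is slightly cheaper to compute.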
Company: Wipro
Role: Data Scientist
1. What is a Python package, and have you created your own Python package?
2. Explain the time series models you have used.
3. SQL questions: group-wise top 2 salaries for employees, using ROW_NUMBER and PARTITION BY.
4. Pandas: find the numeric and categorical columns. For the numeric columns in the data frame, find the mean of each column and add that mean value to each row of that column.
5. What is gradient descent? What is the learning rate, and why do we need to reduce or increase it? Why is the global minimum reached, and why doesn't it improve when the LR is increased after that point?
6. Two logistic regression models: which one will you choose? One is trained on 70% of the data and the other on 80%. Accuracy is almost the same.
8. What is log-loss and ROC-AUC?
9. Do you know how to use Amazon SageMaker for MLOps?
10. Explain your projects end to end (15-20 mins).
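The top-2-salaries SQL question above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the `employees` table, its `dept` grouping column and the sample rows are assumptions, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, invented for illustration.
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Asha', 'ML', 90), ('Ravi', 'ML', 80), ('Meena', 'ML', 70),
        ('John', 'BI', 60), ('Sara', 'BI', 75);
""")

query = """
    SELECT dept, name, salary
    FROM (
        SELECT dept, name, salary,
               ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
        FROM employees
    )
    WHERE rn <= 2
    ORDER BY dept, salary DESC;
"""
top_two = conn.execute(query).fetchall()
print(top_two)
```

ROW_NUMBER breaks salary ties arbitrarily; if tied salaries should share a rank, DENSE_RANK is the usual alternative.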
Company: Infosys
Role: Data Scientist
1. What makes you feel that you would be suitable for this role, since you come from a
different background?
2. What is an imbalanced data set?
3. What are the factors you will consider in order to predict the population of a city in the
future?
4. Basic statistics questions?
5. What are the approaches for treating the missing values?
6. Evaluation metrics for Classification?
7. Bagging vs Boosting with examples
8. Handling of imbalanced datasets
9. What are your career aspirations?
10. What's the graph of y = |x| - 2?
11. Estimate the number of petrol cars in Delhi.
12. Case study on opening a retail store
13. Order of execution of SQL
Company: Ericsson
Role: Data Scientist
1. How would you check if the model is suffering from multicollinearity?
2. What is transfer learning? Steps you would take to perform transfer learning.
3. Why is CNN architecture suitable for image classification? Not an RNN?
4. What are the approaches for solving class imbalance problem?
5. When sampling, what types of biases can be introduced? How to control the biases?
6. Explain concepts of epoch, batch, iteration in machine learning.
7. What type of performance metrics would you choose to evaluate the different
classification models and why?
8. What are some of the types of activation functions and specifically when to use
them?
9. What is the difference between Batch and Stochastic Gradient Descent?
10. What is the difference between K-NN and K-Means clustering?
11. How to handle missing data? What imputation techniques can be used?
1. Use case: Suppose you work for a pen manufacturing company. How would you help the sales team with leads using data analysis?
2. Interviewers ask scenario- or use-case-based questions to understand the interviewee's thought process and problem-solving skills.
3. Assume you are given access to a website's Google Analytics data. To increase conversions, how would you perform A/B testing to identify the best page design?
4. How is random forest different from gradient boosting, given that both are tree-based algorithms?
5. Describe the steps involved in creating a neural network.
6. LSTM addresses the vanishing gradient problem that RNNs primarily have. How?
7. In brief, how would you perform sentiment analysis?
Company: Axtria
Company: Bridgei2i
Role: Senior Analytics Consultant
Company: Deloitte
Role: Data Scientist
1. How many cars are there in Chennai? How would you structure your approach to arrive at that number?
2. Multiple linear regression?
3. OLS vs MLE?
4. R² vs adjusted R²? Which one do we consider during model development?
5. Lift chart, drift chart
6. Sigmoid function in logistic regression
7. ROC: what is it? AUC, and how do the two differ?
8. Linear Regression from Multiple Linear Regression
9. P-value: what is it and what is its significance? What does the P in p-value stand for? What is hypothesis testing? Null hypothesis vs alternate hypothesis?
10. Bias-variance trade-off?
11. Overfitting vs underfitting in machine learning?
12. Estimation of multiple linear regression
13. Difference between forecasting and prediction? Regression vs time series?
14. p, d, q values in ARIMA models
   1. What will happen if d=0?
   2. What is the meaning of the p, d, q values?
15. Is your data for forecasting uni- or multi-dimensional?
16. How to find the node to start with in a decision tree.
17. Types of decision trees: CART vs C4.5 vs ID3
18. Gini index vs entropy
19. Linear vs logistic regression
20. Decision trees vs random forests
21. Questions on linear regression and how it works
22. Asked to write some SQL queries
23. Asked about past work experience
24. Some questions on inferential statistics (hypothesis testing, sampling techniques)
25. Some questions on Tableau (how to filter, how to add calculated fields, etc.)
26. Why do you use a licensed platform when open-source packages are available?
27. What certifications have you done?
28. What is a Confidence Interval?
29. What are Outliers? How to Detect Outliers?
30. How to Handle Outliers?
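For the outlier-detection questions, one common rule of thumb is the 1.5 x IQR fence. The sketch below assumes that rule; the function name, the multiplier default and the sample data are illustrative, and other approaches (z-scores, isolation forests) are equally valid answers.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]  # made-up sample with one extreme value
print(iqr_outliers(data))  # [95]
```

Once flagged, outliers are typically investigated, capped (winsorized), transformed or removed, depending on whether they are errors or genuine extreme observations.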
Company: L&T Financial Services
Role: Data Scientist
Introduce yourself.
One complex SQL query: there are two tables, Table1(cust_id, Name) and
Table2(cust_id, Transaction_amt).
Write a query to return the names of customers with the 8th highest lifetime purchase.
Achieve the same using Python.
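A rough Python take on the 8th-highest-lifetime-purchase task might look like the sketch below. It assumes the two tables arrive as lists of tuples, and that ties share a rank (dense ranking), so more than one name can be returned; both are assumptions, as is the function name.

```python
from collections import defaultdict


def nth_highest_customers(customers, transactions, n=8):
    """customers: (cust_id, name) pairs; transactions: (cust_id, amount) pairs."""
    totals = defaultdict(float)
    for cust_id, amount in transactions:
        totals[cust_id] += amount  # lifetime purchase per customer

    # Distinct totals in descending order, so ties share one rank.
    distinct_totals = sorted(set(totals.values()), reverse=True)
    if len(distinct_totals) < n:
        return []
    target = distinct_totals[n - 1]

    names = dict(customers)
    return sorted(names[c] for c, total in totals.items() if total == target and c in names)


# Made-up data: customer i spends i * 10 in total, split over two transactions.
customers = [(i, f"C{i}") for i in range(1, 11)]
transactions = [(i, i * 5) for i in range(1, 11)] * 2
print(nth_highest_customers(customers, transactions))  # ['C3']
```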
ML questions:
What's the problem with having multicollinearity in a data set?
If there is a business requirement to keep two correlated features in the model, what would
you do?
How would you deal with a feature that has 4 categories and 20% null values?
Some questions based on my project.
Problem Statement:
Company: Ericsson
Role: Data Scientist
Round 2:
Complete ML technical stack used in the project?
Different activation functions?
How do you handle imbalanced data?
Difference between sigmoid and softmax?
Explain optimizers.
Precision-recall trade-off?
How do you handle false positives?
Explain the LSTM architecture using an example of 2 sentences and how they will be
processed.
Decision tree parameters?
Bagging and boosting?
Explain bagging internals.
Write a program that takes a URL, and give a rough code approach for how you would pass
a payload and make a POST request.
Different modules used in Python?
Another coding problem: checking balanced parentheses.
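The balanced-parentheses problem is usually solved with a stack. A minimal sketch, assuming the three common bracket types count and all other characters are ignored:

```python
def is_balanced(text: str) -> bool:
    """Return True if every bracket in text is closed in the right order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    openers = set(pairs.values())
    stack = []
    for ch in text:
        if ch in openers:
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # leftover openers mean the text is unbalanced


print(is_balanced("{[()]}"))  # True
print(is_balanced("([)]"))    # False
```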
Company: Cerence
Role: NLU Developer
Question 1:
Write a function that takes two strings as inputs and returns true if they are anagrams of
each other and false otherwise
e.g.
(hello, hlleo) --> true
(hello, helo) --> false
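One straightforward approach to Question 1 is to compare character counts. The sketch below assumes the comparison is case-sensitive and that whitespace counts as a character, since the question does not say otherwise.

```python
from collections import Counter


def are_anagrams(a: str, b: str) -> bool:
    """Two strings are anagrams when they contain the same characters with the same counts."""
    return Counter(a) == Counter(b)


print(are_anagrams("hello", "hlleo"))  # True
print(are_anagrams("hello", "helo"))   # False
```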
Question 2:
Write a function that takes an array of strings "A" and an integer "n"
and returns the list of all strings of length "n" from the array "A" that can be constructed
as the concatenation of two strings from the same array "A"
e.g.
A = [dog, tail, sky, or, hotdog, tailor, hot] and n=6
output should be "hotdog" and "tailor"
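Question 2 can be approached by trying every split point of each length-n word and checking both halves against a set. This sketch assumes the same string may be used for both halves, which the question leaves open; the function name is illustrative.

```python
def concatenations_of_length(words, n):
    """Return words of length n that split into two pieces both present in words."""
    word_set = set(words)  # O(1) membership checks
    result = []
    for word in words:
        if len(word) != n:
            continue
        # Try every non-empty prefix/suffix split.
        if any(word[:i] in word_set and word[i:] in word_set for i in range(1, n)):
            result.append(word)
    return result


A = ["dog", "tail", "sky", "or", "hotdog", "tailor", "hot"]
print(concatenations_of_length(A, 6))  # ['hotdog', 'tailor']
```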
Question 3:
Given an array "arr" of numbers and a starting number "x",
find the minimum "x" such that the running sums of "x" and the elements of the array "arr" are never
lower than 1.
e.g.
arr = [-2, 3, 1, -5].
The running sums will be x-2, x-2+3, x-2+3+1 and x-2+3+1-5.
So, the output should be 4.
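For Question 3, the smallest valid "x" follows from the lowest prefix sum of the array. The sketch below also assumes "x" itself must be at least 1, as in the common version of this problem; the question only lists the sums after each element, so treat that as an assumption.

```python
from itertools import accumulate


def min_start_value(arr):
    """Smallest x >= 1 such that x plus every prefix sum of arr is at least 1."""
    lowest = min(accumulate(arr), default=0)  # lowest running sum without x
    return max(1, 1 - lowest)


print(min_start_value([-2, 3, 1, -5]))  # 4
```

Here the prefix sums are -2, 1, 2 and -3, so the lowest is -3 and x must be 1 - (-3) = 4.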
Company: GEOTAB
Python:
1. Is Python a language that follows pass by value, pass by reference, or pass by
object reference?
2. What are lambda functions and how do you use them?
3. Difference between mutable and immutable objects, with examples.
4. What are Python decorators? Why do we use them?
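The first and third Python questions can be illustrated together: arguments are passed by object reference, so mutating a mutable argument is visible to the caller, while rebinding the parameter name is not. The function names below are illustrative.

```python
def append_item(items):
    items.append(99)  # mutates the caller's list in place


def rebind(items):
    items = [0]       # rebinds the local name only; the caller's list is untouched


def add_one(number):
    number += 1       # ints are immutable, so this creates a new object locally
    return number


nums = [1, 2]
append_item(nums)
rebind(nums)
print(nums)           # [1, 2, 99]

count = 5
add_one(count)
print(count)          # 5
```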
SQL:
1. What is the difference between an inner join and a left join?
2. What are window functions?
3. What is the use of GROUP BY?
SQL Round
3 tables given as below:
TRIPS
trip_id
vehicle_id
start_time
stop_time
VEHICLE_MAKE
vehicle_id
make_id
MAKES
make_id
make_name
There is a table which contains vehicle trips. Trips are not necessarily in order.
There is a table which contains vehicle makes. Makes are not necessarily known.
PROBLEM: Write SQL code that provides the number of trips that started on September
1st, 2020 for each vehicle with a KNOWN make.
Order the results by the trip count.
Expected output:
vehicle_id | trip_count
4          | 2
1          | 1
2          | 1
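One way to check a candidate answer to the trips problem is to run it against an in-memory SQLite database from Python. In this sketch the column types, the ISO-8601 text timestamps, the sample rows and the descending sort are all assumptions chosen to reproduce the expected output; the original does not specify them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data: vehicle 3 has no known make, and trip 6 starts on August 31st.
conn.executescript("""
    CREATE TABLE trips (trip_id INTEGER, vehicle_id INTEGER, start_time TEXT, stop_time TEXT);
    CREATE TABLE vehicle_make (vehicle_id INTEGER, make_id INTEGER);
    CREATE TABLE makes (make_id INTEGER, make_name TEXT);
    INSERT INTO trips VALUES
        (1, 4, '2020-09-01 08:00:00', '2020-09-01 08:30:00'),
        (2, 4, '2020-09-01 17:00:00', '2020-09-01 17:20:00'),
        (3, 1, '2020-09-01 09:00:00', '2020-09-01 09:10:00'),
        (4, 2, '2020-09-01 23:50:00', '2020-09-02 00:10:00'),
        (5, 3, '2020-09-01 10:00:00', '2020-09-01 10:15:00'),
        (6, 1, '2020-08-31 23:55:00', '2020-09-01 00:05:00');
    INSERT INTO vehicle_make VALUES (1, 10), (2, 20), (4, 10), (3, NULL);
    INSERT INTO makes VALUES (10, 'Ford'), (20, 'Toyota');
""")

query = """
    SELECT t.vehicle_id, COUNT(*) AS trip_count
    FROM trips AS t
    JOIN vehicle_make AS vm ON vm.vehicle_id = t.vehicle_id
    JOIN makes AS m ON m.make_id = vm.make_id      -- inner joins drop unknown makes
    WHERE t.start_time >= '2020-09-01' AND t.start_time < '2020-09-02'
    GROUP BY t.vehicle_id
    ORDER BY trip_count DESC, t.vehicle_id;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(4, 2), (1, 1), (2, 1)]
```

The half-open date range filters on when a trip started, so a trip that began on August 31st and ended on September 1st is not counted.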
1st round:
Introduction
Current NLP architecture used in my project
How will you identify data drift? Once identified, how would you automate the handling
of data drift?
Data pipeline used
fastText word embeddings vs word2vec
When should we use TF-IDF, and when would prediction-based word embeddings be
advantageous over TF-IDF?
Metrics used to validate our model
In MongoDB, write a query to find employee names from a collection
In Python, write a program to separate 0s and 1s in an array: (0,1,0,1,1,0,1,0)
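For the last question, a simple counting approach works because the array holds only two distinct values. A minimal sketch, assuming "separate" means all 0s first and then all 1s:

```python
def separate_zeros_ones(bits):
    """Return a list with all 0s first, followed by all 1s."""
    zeros = bits.count(0)
    return [0] * zeros + [1] * (len(bits) - zeros)


print(separate_zeros_ones((0, 1, 0, 1, 1, 0, 1, 0)))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Calling `sorted(bits)` gives the same result; the counting version runs in a single linear pass.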
Company: LatentView
Initially, they asked me to explain a project I had done. I explained the
customer prediction case. Then I was asked Python questions while sharing my
screen.
Company: Myntra
Role: Data Analyst
Introduce yourself.
A continuous variable has missing values. How will you decide whether the missing
values should be imputed with the mean or the median?
What is PCA and what does each component mean? Also, what is the maximum value for the
number of components?
What is a test of independence? How do you calculate the chi-square value?
When is precision preferred over recall, or vice versa?
Advantages and disadvantages of random forest over a decision tree?
What is the C hyperparameter in the SVM algorithm, and how does it affect the bias-variance
trade-off?
What are the assumptions of linear regression?
Difference between stemming and lemmatization?
Difference between correlation and regression?
What is a p-value and a confidence interval?
What is multicollinearity and how do you deal with it? What is VIF?
What is the difference between the apply, applymap and map functions in Python (pandas)?
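For the test-of-independence question above, the chi-square statistic sums (observed - expected)² / expected over every cell of a contingency table, where each expected count is row total x column total / grand total. The sketch below uses made-up counts, and the function name is illustrative.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof


# Hypothetical 2x2 table, e.g. group (rows) vs outcome (columns).
table = [[10, 20], [20, 10]]
chi2, dof = chi_square_statistic(table)
print(round(chi2, 3), dof)
```

The statistic is then compared with the chi-square distribution at the given degrees of freedom to obtain a p-value.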
Deloitte Interview:
Introduction
- NLP questions
Sentiment analysis, preprocessing (TF-IDF, BoW), embeddings, stemming,
lemmatization
Libraries I know: nltk, spaCy
- Regression preprocessing
Answered: outliers, missing value imputation, distribution, dummies, multicollinearity, etc.
You have two highly correlated columns; which one will you drop?: "Based on the business
problem, I will decide accordingly."
- Statistical computing:
Type 1 and Type 2 error
Alternate name of Type 1 error (I couldn't give the alternate name, 'false
positive')
What is p-value (explained with the example of linear regression from statsmodels)
- Do you have exposure to time series analysis?: No (he didn't ask anything further and seemed fine
with it)
1. Genpact
2. Tredence Analytics
3. Fractal Analytics
4. Tiger Analytics
5. Bridgei2i
6. Ugam
7. Latent View
8. Brillio
9. Abzooba
10. AbsolutData
11. Gramener
12. BluePi
13. Knowledge Foundry
14. Wipro
15. TCS
16. Accenture
17. Purplle
18. AbsoluteData
19. Hansa CEquity
20. Lymbyc
21. IBM
22. PwC
23. EY
24. KPMG
25. Sibia
26. ZS
27. ZF
28. TechVantage
29. L&T Infotech
30. Cognizant
31. Amazon
32. Microsoft
33. Walmart
34. Philips
35. Ford
36. JP Morgan
37. Deloitte
38. Shell
39. Mu Sigma
40. Postman
41. Altrix
42. HP
43. HCL
44. Dell
45. PayPal
46. Fidelity Investments
47. Rakuten
48. Infosys
49. Flipkart
50. Myntra