Data Scientist Trainee Interview Questions

54,195 data scientist trainee interview questions shared by candidates

Data Scientist

Interviewed at Meta

3.6
Mar 29, 2017

There is a table that tracks every time a user turns a feature on or off, with columns user_id, action ("on" or "off"), date, and time. How many users turned the feature on today? How many users have ever turned the feature on? In a table that tracks the status of every user every day, how would you add today's data to it?
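One way to approach the first two parts is sketched below with Python's built-in sqlite3 module. The table name feature_log, the date format, and the sample rows are assumptions made for illustration; only the column names come from the question. Counting distinct user_id values keeps a user who toggles the feature several times from being counted more than once.

```python
import sqlite3

def count_feature_on(rows, today):
    """Return (users who turned the feature on today, users who ever turned it on)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table name; the columns follow the question.
    conn.execute(
        "CREATE TABLE feature_log (user_id INTEGER, action TEXT, date TEXT, time TEXT)"
    )
    conn.executemany("INSERT INTO feature_log VALUES (?, ?, ?, ?)", rows)
    on_today = conn.execute(
        "SELECT COUNT(DISTINCT user_id) FROM feature_log "
        "WHERE action = 'on' AND date = ?",
        (today,),
    ).fetchone()[0]
    ever_on = conn.execute(
        "SELECT COUNT(DISTINCT user_id) FROM feature_log WHERE action = 'on'"
    ).fetchone()[0]
    conn.close()
    return on_today, ever_on

# Made-up sample data: user 2 turns the feature on twice today.
sample_rows = [
    (1, "on", "2017-03-28", "09:00"),
    (1, "off", "2017-03-29", "10:00"),
    (2, "on", "2017-03-29", "11:00"),
    (2, "on", "2017-03-29", "12:00"),
    (3, "off", "2017-03-29", "13:00"),
]
print(count_feature_on(sample_rows, "2017-03-29"))  # (1, 2)
```

Here "turned the feature on today" is read as "has at least one 'on' action today", regardless of whether the user later turned it off. An interviewer may intend a different reading, so it is worth stating the assumption.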

Data Scientist

Interviewed at Amazon

3.5
Oct 27, 2016

They asked a probability question and two data questions:
1) The probability that an item is at location A is 0.6, and 0.8 at location B. What is the probability that the item would be found on the Amazon website?
2) I have Table 1, with 1 million records and columns ID and AGE, and Table 2, with 100 records and columns ID and SALARY. The interviewer then gave me the following SQL script: SELECT A.ID, A.AGE, B.SALARY FROM TABLE 1 A LEFT JOIN TABLE 2 B ON A.ID = B.ID WHERE B.SALARY > 50000 (he asked me to modify the WHERE line of the query). How many records would be returned?
3) Given a CSV file with ID and Quantity columns, 50 million records, and a size of 2 GB, write a program in any language of your choice to aggregate the Quantity column.
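Parts 2 and 3 can be explored with a small sketch. It uses scaled-down, made-up data (10 and 3 rows instead of 1 million and 100), hypothetical table names table1 and table2, and assumes IDs are unique in each table. The join half contrasts filtering in WHERE with moving the same condition into the ON clause, which is one plausible modification the interviewer had in mind. The CSV half streams the file row by row and sums Quantity, assuming a header row and integer quantities.

```python
import csv
import io
import sqlite3

def join_row_counts():
    """Return row counts for the WHERE-filtered and ON-filtered LEFT JOIN."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, age INTEGER)")
    conn.execute("CREATE TABLE table2 (id INTEGER, salary INTEGER)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)", [(i, 20 + i) for i in range(1, 11)]
    )
    conn.executemany(
        "INSERT INTO table2 VALUES (?, ?)", [(1, 40000), (2, 60000), (3, 70000)]
    )
    # WHERE runs after the join, so unmatched rows (NULL salary) are dropped.
    where_rows = conn.execute(
        "SELECT a.id, a.age, b.salary FROM table1 a "
        "LEFT JOIN table2 b ON a.id = b.id WHERE b.salary > 50000"
    ).fetchall()
    # In ON, the condition only limits which table2 rows can match.
    on_rows = conn.execute(
        "SELECT a.id, a.age, b.salary FROM table1 a "
        "LEFT JOIN table2 b ON a.id = b.id AND b.salary > 50000"
    ).fetchall()
    conn.close()
    return len(where_rows), len(on_rows)

def sum_quantity(lines):
    """Sum the Quantity column one row at a time, without loading the whole file."""
    total = 0
    for row in csv.DictReader(lines):
        total += int(row["Quantity"])
    return total

print(join_row_counts())  # (2, 10)
print(sum_quantity(io.StringIO("ID,Quantity\n1,5\n2,7\n1,3\n")))  # 15
```

Under the same uniqueness assumption, the original query on the full data would return at most 100 rows (only matched rows with a salary above 50000), while the ON version would return all 1 million rows of Table 1. For a real file, sum_quantity could be given a handle opened with open(path, newline=""), so memory use stays flat regardless of file size.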

Data Scientist

Interviewed at Meta

3.6
Apr 23, 2021

Given the following data:

Table: searches
Columns:
date          STRING   date of the search
search_id     INT      the unique identifier of each search
user_id       INT      the unique identifier of the searcher
age_group     STRING   ('<30', '30-50', '50+')
search_query  STRING   the text of the search query
Sample Rows:
date         | search_id | user_id | age_group | search_query
--------------------------------------------------------------------
'2020-01-01' | 101       | 9991    | '<30'     | 'justin bieber'
'2020-01-01' | 102       | 9991    | '<30'     | 'menlo park'
'2020-01-01' | 103       | 5555    | '30-50'   | 'john'
'2020-01-01' | 104       | 1234    | '50+'     | 'funny cats'

Table: search_results
Columns:
date         STRING   date of the search action
search_id    INT      the unique identifier of each search
result_id    INT      the unique identifier of the result
result_type  STRING   (page, event, group, person, post, etc.)
clicked      BOOLEAN  did the user click on the result?
Sample Rows:
date         | search_id | result_id | result_type | clicked
--------------------------------------------------------------------
'2020-01-01' | 101       | 1001      | 'page'      | TRUE
'2020-01-01' | 101       | 1002      | 'event'     | FALSE
'2020-01-01' | 101       | 1003      | 'event'     | FALSE
'2020-01-01' | 101       | 1004      | 'group'     | FALSE

Over the last 7 days, how many users made more than 10 searches?

You notice that the number of users that clicked on a search result about a Facebook Event increased 10% week-over-week. How would you investigate? How do you decide if this is a good thing or a bad thing?

The Events team wants to up-rank Events such that they show up higher in Search. How would you determine if this is a good idea or not?
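The first question (users with more than 10 searches in the last 7 days) might be answered along the following lines, sketched here with Python's sqlite3 module so it can be run end to end. The sample rows, the helper names, and the reading of "last 7 days" as the 7 calendar days ending today (inclusive) are assumptions; the table and column names come from the question. Because date is stored as a 'YYYY-MM-DD' string, plain string comparison orders the dates correctly.

```python
import sqlite3

def heavy_searchers(rows, today):
    """Count users with more than 10 searches in the 7 days ending on `today`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE searches (date TEXT, search_id INTEGER, user_id INTEGER, "
        "age_group TEXT, search_query TEXT)"
    )
    conn.executemany("INSERT INTO searches VALUES (?, ?, ?, ?, ?)", rows)
    count = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT user_id
            FROM searches
            WHERE date > date(:today, '-7 days') AND date <= :today
            GROUP BY user_id
            HAVING COUNT(DISTINCT search_id) > 10
        )
        """,
        {"today": today},
    ).fetchone()[0]
    conn.close()
    return count

def make_rows():
    """Build made-up sample data: (user_id, age_group, date, number of searches)."""
    plan = [
        (9991, "<30", "2020-01-05", 11),   # 11 searches inside the window
        (5555, "30-50", "2020-01-06", 10), # exactly 10, so not "more than 10"
        (1234, "50+", "2020-01-01", 5),    # outside the window for 2020-01-08
        (1234, "50+", "2020-01-07", 7),    # only 7 inside the window
    ]
    rows, search_id = [], 100
    for user_id, age_group, date, n in plan:
        for _ in range(n):
            search_id += 1
            rows.append((date, search_id, user_id, age_group, "query"))
    return rows

print(heavy_searchers(make_rows(), "2020-01-08"))  # 1
```

The HAVING clause filters after grouping, which is why the per-user count cannot be placed in WHERE. The boundary choice (whether a search exactly 7 days ago counts) is worth confirming with the interviewer.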

Data Scientist

Interviewed at Meta

3.6
Jun 8, 2017

The company developed a new feature and ran an A/B test. Here are the results: Comments +5%, Likes -10%, Time spent +1%, all else neutral. How would you decide whether to put the feature into the product based on the A/B test result? Any ideas?

Data Scientist

Interviewed at Meta

3.6
Jul 17, 2016

There are 50 cards in 5 different colors: 10 red, 10 blue, 10 orange, 10 green, and 10 yellow. The cards of each color are numbered from 1 to 10. You pick 2 cards at random. What is the probability that they are neither the same color nor the same number?
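A quick way to check an answer to the card question is to enumerate every pair, as in the sketch below. It assumes the two cards are drawn without replacement and that every pair is equally likely. The counting argument it confirms: after the first card is drawn, 49 cards remain, and 4 colors × 9 numbers = 36 of them differ from it in both color and number, giving 36/49.

```python
from fractions import Fraction
from itertools import combinations, product

# Build the 50-card deck: 5 colors, each numbered 1 to 10.
colors = ["red", "blue", "orange", "green", "yellow"]
deck = list(product(colors, range(1, 11)))

def prob_different_color_and_number(cards):
    """Exact probability that two cards drawn without replacement differ in both color and number."""
    pairs = list(combinations(cards, 2))
    favorable = sum(
        1 for (c1, n1), (c2, n2) in pairs if c1 != c2 and n1 != n2
    )
    return Fraction(favorable, len(pairs))

print(prob_different_color_and_number(deck))  # 36/49
```

Of the 1,225 unordered pairs, 900 are favorable, so the probability is roughly 0.735.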

Glassdoor has 54,195 interview questions and reports from data scientist trainee interviews.