Calculate the median value of a given unsorted array. Find the time complexity of the solution. How can the solution be improved?
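One possible way to approach the median question is sketched below. It is not an official answer from the interview, only an illustration. The baseline sorts the array, which costs O(n log n). A common improvement is quickselect, which finds the k-th smallest element in expected O(n) time (worst case O(n^2) with unlucky pivots). The function names are chosen here for illustration.

```python
import random


def median_by_sorting(values):
    """Baseline: sort a copy, then read the middle. O(n log n) time."""
    if not values:
        raise ValueError("median of an empty array is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quickselect(values, k):
    """Return the k-th smallest element (0-based). Expected O(n) time."""
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows                      # answer is left of the pivot
        elif k < len(lows) + n_equal:
            return pivot                      # answer equals the pivot
        else:
            k -= len(lows) + n_equal          # answer is right of the pivot
            items = highs


def median_by_quickselect(values):
    """Improved version: select the middle element(s) without a full sort."""
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty array is undefined")
    if n % 2 == 1:
        return quickselect(values, n // 2)
    return (quickselect(values, n // 2 - 1) + quickselect(values, n // 2)) / 2
```

For an even-length array both versions return the mean of the two middle elements. The quickselect version here trades extra memory (new lists per step) for readability; an in-place partition would avoid that.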
Data Engineer Interview Questions
Data engineers are IT professionals who are needed in almost every industry. They monitor data trends to determine the best next steps for companies. A critical part of a data engineer's job is to process raw data into usable data by creating data pipelines and building data systems.
Top Data Engineer Interview Questions & How To Answer
Question #1: Can you describe in detail your level of expertise with programming languages?
Question #2: Explain data engineering in your own words.
Question #3: Can you describe your experience working with Apache Hadoop and cloud data management environments?
20,202 data engineer interview questions shared by candidates
Questions on system design, Python and SQL
Multiple-choice questions on pandas
What is the biggest challenge in your previous work?
Build a web application that allows users to learn who represents them in the US House of Representatives.
User flow:
1. User enters their zip code in a validated form field.
2. User clicks the submit button, or hits the Enter key when the input is focused.
3. User is returned a summary of who their representative is, including links to learn more.
Resources: The `/data` folder in this repo contains two datasets: `legislators.json` lists current representatives associated with the states and district numbers they've served in, and `zipcodes-districts.json` lists every US zip code with its associated state and district number.
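The core lookup behind this exercise might look roughly like the sketch below. The question does not show the layout of either JSON file, so the record shapes used here (`zip`, `state`, `district`, `name` keys) are assumptions made for illustration, as are the function names. It covers only validation and lookup, not the web form itself.

```python
import re

# Assumption: a valid input is exactly five digits (no ZIP+4 support).
ZIP_PATTERN = re.compile(r"\d{5}")


def is_valid_zip(text):
    """Return True if the input looks like a five-digit US zip code."""
    return ZIP_PATTERN.fullmatch(text.strip()) is not None


def find_representatives(zip_code, zip_rows, legislators):
    """Return legislators whose (state, district) matches the zip code.

    zip_rows:    hypothetical records like {"zip": ..., "state": ..., "district": ...}
    legislators: hypothetical records like {"name": ..., "state": ..., "district": ...}
    A zip code can span several districts, so a list is returned.
    """
    if not is_valid_zip(zip_code):
        raise ValueError("zip code must be five digits")
    wanted = zip_code.strip()
    districts = {
        (row["state"], row["district"])
        for row in zip_rows
        if row["zip"] == wanted
    }
    return [
        person
        for person in legislators
        if (person["state"], person["district"]) in districts
    ]
```

In a real application the two lists would be loaded once from the files in `/data`, and the zip-to-district mapping would probably be indexed in a dictionary rather than scanned on every request.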
Tell me about yourself.
Basic concepts about Data Engineering
Spark optimizations: what optimizations can be made to the code snippet below?

shoppers_df (customer description DF): 250 MB, 15M records. Schema:
    StructType = StructType(Array(
        StructField("shopper_id", LongType, nullable = True),
        StructField("retailer_id", StringType, nullable = True),
        StructField("shopper_group_id", StringType, nullable = True),
        StructField("join_date", DateType, nullable = True),
        StructField("shopper_type", StringType, nullable = True),
        StructField("gender", StringType, nullable = True)))

sku_df (dimension DF): 15 MB, 90K records.

purchase_df (transactions DF): 50 GB of compressed Parquet files, 5,000,000,000 records. Schema:
    StructType = StructType(Array(
        StructField("shopper_id", LongType, nullable = True),
        StructField("product_id", LongType, nullable = True),
        StructField("pos_id", IntegerType, nullable = True),
        StructField("purchase_date", DateType, nullable = True),
        StructField("units", DoubleType, nullable = True),
        StructField("total_spent", DoubleType, nullable = True)))

Current code:
    products_purchased_df = purchase_df.alias("purchase").join(shoppers_df, on = "shopper_id", how = "left_outer").join(sku_df.alias("sku"), on = "product_id").select(col("purchase.*"), col("sku.*"))

Usage:
    status_df = products_purchased_df.groupBy(["shopper_id", "product_id"]).agg(...)

Task: optimize the join statement.
We will give you a take-home project; you will have to research it and come up with an architecture around it.
Two rounds. Online technical test in multiple-choice question-and-answer format (skip questions that are not relevant). Technical questions on current problems the company faced and how you would solve them.