How to Prepare for a Technical Data Engineer Interview

How to Prepare for a Technical Data Engineer Interview

Sample SQL, Python and Data System Design Questions

·

3 min read

I recently went through a number of interviews for Data Engineering roles. Most of them at the senior, individual contributor level. This is an aggregation of some of the questions I was asked throughout the technical rounds.

The core focus was of course SQL and Python with some data infrastructure and system design questions. Below is a breakdown of the type of technical questions asked as well sample questions and prompts.

  • SQL: 45%
  • Python: 35%
  • Data Systems Design: 20%

SQL

Verbal:

  • Explain the difference between a left join and an inner join
  • What's the difference between Union vs. Union All?
  • Explain what indexing is and it's purpose in a RDBMS
  • When analyzing a slow query, what steps would you take to diagnose the issue?

Written:

Given an order and customer table, answer the following questions:

  • Take both order and customer tables and produce the order_amount for all customer_ids
  • Take the order table and sum the order_amount for each month and display by order_type. Each order_type should be in its own column.
  • Take the order table and generate an average_order_type_amount, by order_type for each row.

Order Table:

customer_idorder_typedate_of_orderorder_amount
1A01/02/2021250
2B02/12/2021500
3C04/08/2021600
4B03/05/2021150
5A01/27/2021230
6A03/14/2021250
1B04/13/2021405

Customer Table:

customer_idname
1Jasper
2Will
3Ben

Python

Verbal:

  • What's the difference between a list and a tuple?
  • Explain what a dictionary is
  • In pseudocode, explain a how to encode a string input to $ if the character repeats or to ^ if the character does not repeat.
    • For instance, temperature would encode to $$^^$$^$^$$
    • t repeats twice and thus encoded to $
  • What is the output of this function?
    mydata = {'foo': 'foo', 'bar': 'bar'}
    def transform(data):
     local_data = data
     local_data['foo'] = 'baz'
     transform(mydata)
     print(mydata['foo'])
    

Written:

  • Given a string as an input, write a function to replace all # hashtags with a - and replace all & ampersands with a +
    • For example, ###&## should return ---+--
  • Find the longest consecutive recurrence of an element in a list. Given this list:

    [A ,C ,D, D, D, B, C, C, A, F, A, D, A, A,]
    
  • Print dictionary in alphabetic order by first name. Expected output Anthony is in IT, Ben is in HR...

    mydictionary = {'Jess': 'IT', 'Will': 'Sales', 'Anthony': 'IT', 'Ben':
    'HR' }
    

Data Systems Design

Prompt Example:

Be prepared to talk about the architectural design of data infrastructure you have worked on, preferably acted as the lead engineer. Prepare architecture diagrams in advance, or you can draw one live as you explain the design.

  • Explain the infrastructure and the principles that led to this architecture
  • Explain the reasoning behind different design decisions and alternative solutions
  • Explain further extensions of the project for future requests
  • What was the overall impact the project had?

Questions:

  • How would you design a scalable pipeline to ingest flat files?
  • How would you design an API client to interact with a microservice?