Top 50 SQL Interview Questions and Answers for Data Engineers

1. What are the differences between `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, and `FULL JOIN`?

Answer:

- `INNER JOIN` returns records with matching values in both tables.

- `LEFT JOIN` returns all records from the left table and matches records from the right table.

- `RIGHT JOIN` returns all records from the right table and matches records from the left table.

- `FULL JOIN` returns all records when there is a match in either left or right table.

2. Explain `GROUP BY` and `HAVING`. How do they differ?

Answer: `GROUP BY` groups rows with the same values in specified columns, while `HAVING` is used to filter records that aggregate results meet a certain condition. Unlike `WHERE`, `HAVING` works with aggregated data.

3. What’s the difference between `WHERE` and `HAVING`?

Answer: `WHERE` filters rows before any aggregation, while `HAVING` filters after aggregation, usually used with `GROUP BY`.

4. What is the primary key?

Answer: A primary key is a unique identifier for records in a table. It ensures each row is unique and cannot contain `NULL` values.

5. What’s the difference between a `UNION` and a `UNION ALL`?

Answer: `UNION` combines results from two queries and removes duplicates, while `UNION ALL` combines results without removing duplicates, making it faster.

6. Explain `DISTINCT` and when to use it.

Answer: `DISTINCT` removes duplicate rows from a result set. It’s useful when you want only unique values.

7. How do you create an index, and why is it important?

Answer: An index improves query performance by providing quick access to rows. It’s created using `CREATE INDEX index_name ON table(column); `.

8. What are clustered and non-clustered indexes?

Answer: A clustered index sorts and stores rows in the table based on key values, while a non-clustered index creates a separate structure for lookups.

9. What’s a foreign key?

Answer: A foreign key is a field in one table that uniquely identifies a row in another table, enforcing referential integrity between tables.

10. What is a subquery, and when would you use it?

Answer: A subquery is a query nested inside another query, often used to perform intermediate calculations or filter data.

11. Explain window functions and give an example.

Answer: Window functions perform calculations across a set of table rows related to the current row. Example: `ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary)` assigns a rank based on salary within each department.

12. What is a self-join?

Answer: A self-join is a regular join where a table joins itself, useful for comparing rows within the same table.

13. How would you delete duplicate records in SQL?

Answer: Using `ROW_NUMBER()`, identify duplicates and delete records where `ROW_NUMBER > 1` for each unique entry.

14. What is normalization, and why is it used?

Answer: Normalization organizes data to minimize redundancy. The process includes dividing tables and using foreign keys to link related data.

15. Explain the different normal forms.

Answer:

- 1NF: Eliminate duplicate columns.

- 2NF: Remove subsets of data that apply to multiple rows.

- 3NF: Remove columns that are not dependent on the primary key.

16. What is denormalization?

Answer: Denormalization is combining tables to improve read performance by reducing the number of joins, often used in data warehouses.

17. What’s the difference between `DELETE`, `TRUNCATE`, and `DROP`?

Answer:

- `DELETE` removes specific rows, can use `WHERE`, and maintains the table structure.

- `TRUNCATE` deletes all rows quickly, but keeps the table.

- `DROP` removes the entire table and its structure.

18. Explain ACID properties in a database.

Answer: ACID stands for Atomicity, Consistency, Isolation, and Durability, ensuring reliable transactions in a database.

19. What’s a stored procedure, and why use it?

Answer: A stored procedure is a saved SQL code block that performs a specific task, improving performance and reusability.

20. How does indexing affect performance?

Answer: Indexing speeds up data retrieval but can slow down write operations. Proper indexing is crucial for balance.

21. What is an aggregate function? Give examples.

Answer: Aggregate functions calculate a single value from multiple rows, like `SUM`, `AVG`, `MAX`, `MIN`, and `COUNT`.

22. Explain the difference between `CHAR` and `VARCHAR`.

Answer: `CHAR` is fixed-length, while `VARCHAR` is variable-length. `VARCHAR` is more efficient for varying-length data.

23. What is a database view, and why use it?

Answer: A view is a virtual table based on a query. It provides simplified or restricted access to data without storing it.

24. What’s a materialized view?

Answer: A materialized view stores the result of a query physically, enabling faster access to frequently be accessed complex queries.

25. How would you find the second-highest salary in a table?

Answer: Use `SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees);`.

26. Explain the concept of `CASE` in SQL.

Answer: `CASE` is a conditional expression for returning different results based on conditions, similar to `IF-THEN-ELSE`.

27. What are transactions, and why are they important?

Answer: Transactions ensure a series of operations are completed successfully. If not, they’re rolled back, ensuring data integrity.

28. How do you prevent SQL injections?

Answer: Use parameterized queries, stored procedures, and validate inputs to avoid SQL injection vulnerabilities.

29. What’s the purpose of `COALESCE`?

Answer: `COALESCE` returns the first non-null value from a list of expressions useful for handling nulls in data.

30. What’s a CTE, and why use it?

Answer: A Common Table Expression (CTE) is a temporary result set within a query, making complex queries easier to read and maintain.

31. Explain `EXISTS` vs. `IN` for subqueries.

Answer: `EXISTS` is typically faster with correlated subqueries, while `IN` works better with static lists or when matching values across tables.

32. How do you calculate cumulative totals in SQL?

Answer: Use a window function like `SUM(column) OVER (ORDER BY column)`, which calculates a running total.

33. What are foreign key constraints?

Answer: Foreign key constraints ensure that the value in one table matches a value in another table, maintaining referential integrity.

34. Explain pivot tables in SQL.

Answer: Pivot tables rotate rows into columns, summarizing data to better analyze patterns, often done with `CASE` statements or `PIVOT`.

35. What’s a JSON data type in SQL, and when to use it?

Answer: JSON data type stores semi-structured JSON objects in SQL tables, ideal for storing flexible schema data.

36. How do you use `RANK()` and `DENSE_RANK()`?

Answer: `RANK()` assigns ranks with gaps, while `DENSE_RANK()` assigns ranks without gaps for ties.

37. What’s the difference between a view and a table?

Answer: A table stores data, while a view is a virtual table created by a query, containing no data itself.

38. What is a surrogate key?

Answer: A surrogate key is a unique identifier for a record, often used in place of a primary key to avoid dependencies on natural keys.

39. What are triggers, and when to use them?

Answer: Triggers automatically execute predefined actions in response to specific events on a table, like inserts or updates.

40. What is `NULL` in SQL?

Answer: `NULL` represents missing or unknown data, often handled with functions like `IS NULL`, `COALESCE`, or `IFNULL`.

41. How does `OFFSET-FETCH` work in pagination?

Answer: `OFFSET-FETCH` skips a set number of rows and then fetches the desired number, useful for paging through large datasets.

42. What’s the purpose of the `MERGE` statement?

Answer: `MERGE` performs insert, update or delete operations in a single statement based on matching records.

43. What’s `ISNULL` vs. `COALESCE`?

Answer: `ISNULL` is specific to two arguments, while `COALESCE` supports multiple arguments, useful for null handling.

44. How do you find duplicate records?

Answer: Use `GROUP BY` and `HAVING COUNT() > 1` to identify duplicates based on specific column(s).

45. How does a `FULL OUTER JOIN` work?

Answer: `FULL OUTER JOIN` returns all records when there’s a match in either left or right table, filling in `NULL` where there’s no match.

46. What’s a `TEMP TABLE`, and why use it?

Answer: A temporary table stores intermediate data temporarily, making complex calculations simpler without storing them in the main database.

47. How do you perform error handling in SQL?

Answer: Use `TRY-CATCH` blocks to handle runtime errors, making debugging easier and enhancing reliability.

48. Explain `LEFT JOIN` vs. `RIGHT JOIN`.

Answer: `LEFT JOIN` returns all records from the left table and matching records from the right, while `RIGHT JOIN` does the opposite.

49. What is indexing?

Answer: Indexing is a performance optimization technique, creating data structures that speed up data retrieval.

50. Explain `AUTO_INCREMENT` in SQL.

Answer: `AUTO_INCREMENT` is used to automatically generate unique values for a primary key field in tables.

follow Satish Mandale for more such contents

Search This Blog

Master Azure Data Engineer

Top 50 SQL Interview Questions and Answers for Data Engineers

Comments

Post a Comment

Popular posts from this blog

A Complete Guide to SnowSQL in Snowflake: Usage, Features, and Best Practices

Mastering DBT (Data Build Tool): A Comprehensive Guide

Understanding Virtual Warehouses in Snowflake: How to Create and Manage Staging in Snowflake