Top 50 SQL Interview Questions and Answers for Data Engineers

 

 1. What are the differences between `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, and `FULL JOIN`?

   Answer:

   - `INNER JOIN` returns records with matching values in both tables.

   - `LEFT JOIN` returns all records from the left table and matches records from the right table.

   - `RIGHT JOIN` returns all records from the right table and matches records from the left table.

   - `FULL JOIN` returns all records when there is a match in either left or right table.

 

 2. Explain `GROUP BY` and `HAVING`. How do they differ?

   Answer: `GROUP BY` groups rows with the same values in specified columns, while `HAVING` is used to filter records that aggregate results meet a certain condition. Unlike `WHERE`, `HAVING` works with aggregated data.

 

 3. What’s the difference between `WHERE` and `HAVING`?

   Answer: `WHERE` filters rows before any aggregation, while `HAVING` filters after aggregation, usually used with `GROUP BY`.

 

 4. What is the primary key?

   Answer: A primary key is a unique identifier for records in a table. It ensures each row is unique and cannot contain `NULL` values.

 

 5. What’s the difference between a `UNION` and a `UNION ALL`?

   Answer: `UNION` combines results from two queries and removes duplicates, while `UNION ALL` combines results without removing duplicates, making it faster.

 

 6. Explain `DISTINCT` and when to use it.

Answer: `DISTINCT` removes duplicate rows from a result set. It’s useful when you want only unique values.

 

 7. How do you create an index, and why is it important?

   Answer: An index improves query performance by providing quick access to rows. It’s created using `CREATE INDEX index_name ON table(column); `.

 

 8. What are clustered and non-clustered indexes?

   Answer: A clustered index sorts and stores rows in the table based on key values, while a non-clustered index creates a separate structure for lookups.

 

 9. What’s a foreign key?

   Answer: A foreign key is a field in one table that uniquely identifies a row in another table, enforcing referential integrity between tables.

 

 10. What is a subquery, and when would you use it?

   Answer: A subquery is a query nested inside another query, often used to perform intermediate calculations or filter data.

 

 11. Explain window functions and give an example.

   Answer: Window functions perform calculations across a set of table rows related to the current row. Example: `ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary)` assigns a rank based on salary within each department.

 

 12. What is a self-join?

   Answer: A self-join is a regular join where a table joins itself, useful for comparing rows within the same table.

 

 13. How would you delete duplicate records in SQL?

   Answer: Using `ROW_NUMBER()`, identify duplicates and delete records where `ROW_NUMBER > 1` for each unique entry.

 

 14. What is normalization, and why is it used?

   Answer: Normalization organizes data to minimize redundancy. The process includes dividing tables and using foreign keys to link related data.

 

 15. Explain the different normal forms.

   Answer:

   - 1NF: Eliminate duplicate columns.

   - 2NF: Remove subsets of data that apply to multiple rows.

   - 3NF: Remove columns that are not dependent on the primary key.

 

 16. What is denormalization?

   Answer: Denormalization is combining tables to improve read performance by reducing the number of joins, often used in data warehouses.

 

 17. What’s the difference between `DELETE`, `TRUNCATE`, and `DROP`?

   Answer:

   - `DELETE` removes specific rows, can use `WHERE`, and maintains the table structure.

   - `TRUNCATE` deletes all rows quickly, but keeps the table.

   - `DROP` removes the entire table and its structure.

 

 18. Explain ACID properties in a database.

   Answer: ACID stands for Atomicity, Consistency, Isolation, and Durability, ensuring reliable transactions in a database.

 

 19. What’s a stored procedure, and why use it?

   Answer: A stored procedure is a saved SQL code block that performs a specific task, improving performance and reusability.

 

 20. How does indexing affect performance?

   Answer: Indexing speeds up data retrieval but can slow down write operations. Proper indexing is crucial for balance.

 

 21. What is an aggregate function? Give examples.

   Answer: Aggregate functions calculate a single value from multiple rows, like `SUM`, `AVG`, `MAX`, `MIN`, and `COUNT`.

 

 22. Explain the difference between `CHAR` and `VARCHAR`.

   Answer: `CHAR` is fixed-length, while `VARCHAR` is variable-length. `VARCHAR` is more efficient for varying-length data.

 

 23. What is a database view, and why use it?

   Answer: A view is a virtual table based on a query. It provides simplified or restricted access to data without storing it.

 

 24. What’s a materialized view?

   Answer: A materialized view stores the result of a query physically, enabling faster access to frequently be accessed complex queries.

 

 25. How would you find the second-highest salary in a table?

   Answer: Use `SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees);`.

 

 26. Explain the concept of `CASE` in SQL.

   Answer: `CASE` is a conditional expression for returning different results based on conditions, similar to `IF-THEN-ELSE`.

 

 27. What are transactions, and why are they important?

   Answer: Transactions ensure a series of operations are completed successfully. If not, they’re rolled back, ensuring data integrity.

 

 28. How do you prevent SQL injections?

   Answer: Use parameterized queries, stored procedures, and validate inputs to avoid SQL injection vulnerabilities.

 

 29. What’s the purpose of `COALESCE`?

   Answer: `COALESCE` returns the first non-null value from a list of expressions useful for handling nulls in data.

 

 30. What’s a CTE, and why use it?

   Answer: A Common Table Expression (CTE) is a temporary result set within a query, making complex queries easier to read and maintain.

 

 31. Explain `EXISTS` vs. `IN` for subqueries.

   Answer: `EXISTS` is typically faster with correlated subqueries, while `IN` works better with static lists or when matching values across tables.

 

 32. How do you calculate cumulative totals in SQL?

   Answer: Use a window function like `SUM(column) OVER (ORDER BY column)`, which calculates a running total.

 

 33. What are foreign key constraints?

   Answer: Foreign key constraints ensure that the value in one table matches a value in another table, maintaining referential integrity.

 

 34. Explain pivot tables in SQL.

   Answer: Pivot tables rotate rows into columns, summarizing data to better analyze patterns, often done with `CASE` statements or `PIVOT`.

 

 35. What’s a JSON data type in SQL, and when to use it?

   Answer: JSON data type stores semi-structured JSON objects in SQL tables, ideal for storing flexible schema data.

 

 36. How do you use `RANK()` and `DENSE_RANK()`?

   Answer: `RANK()` assigns ranks with gaps, while `DENSE_RANK()` assigns ranks without gaps for ties.

 

 37. What’s the difference between a view and a table?

   Answer: A table stores data, while a view is a virtual table created by a query, containing no data itself.

 

 38. What is a surrogate key?

   Answer: A surrogate key is a unique identifier for a record, often used in place of a primary key to avoid dependencies on natural keys.

 

 39. What are triggers, and when to use them?

   Answer: Triggers automatically execute predefined actions in response to specific events on a table, like inserts or updates.

 

 40. What is `NULL` in SQL?

   Answer: `NULL` represents missing or unknown data, often handled with functions like `IS NULL`, `COALESCE`, or `IFNULL`.

 

 41. How does `OFFSET-FETCH` work in pagination?

   Answer: `OFFSET-FETCH` skips a set number of rows and then fetches the desired number, useful for paging through large datasets.

 

 42. What’s the purpose of the `MERGE` statement?

   Answer: `MERGE` performs insert, update or delete operations in a single statement based on matching records.

 

 43. What’s `ISNULL` vs. `COALESCE`?

   Answer: `ISNULL` is specific to two arguments, while `COALESCE` supports multiple arguments, useful for null handling.

 

 44. How do you find duplicate records?

   Answer: Use `GROUP BY` and `HAVING COUNT() > 1` to identify duplicates based on specific column(s).

 

 45. How does a `FULL OUTER JOIN` work?

   Answer: `FULL OUTER JOIN` returns all records when there’s a match in either left or right table, filling in `NULL` where there’s no match.

 

 46. What’s a `TEMP TABLE`, and why use it?

   Answer: A temporary table stores intermediate data temporarily, making complex calculations simpler without storing them in the main database.

 

 47. How do you perform error handling in SQL?

   Answer: Use `TRY-CATCH` blocks to handle runtime errors, making debugging easier and enhancing reliability.

 

 48. Explain `LEFT JOIN` vs. `RIGHT JOIN`.

   Answer: `LEFT JOIN` returns all records from the left table and matching records from the right, while `RIGHT JOIN` does the opposite.

 

 49. What is indexing?

   Answer: Indexing is a performance optimization technique, creating data structures that speed up data retrieval.

 

 50. Explain `AUTO_INCREMENT` in SQL.

   Answer: `AUTO_INCREMENT` is used to automatically generate unique values for a primary key field in tables.


follow Satish Mandale for more such contents 

Comments

Popular posts from this blog

A Complete Guide to SnowSQL in Snowflake: Usage, Features, and Best Practices

Mastering DBT (Data Build Tool): A Comprehensive Guide

Unleashing the Power of Snowpark in Snowflake: A Comprehensive Guide