Top 50 SQL Interview Questions and Answers for Data Engineers
1. What are the differences between `INNER
JOIN`, `LEFT JOIN`, `RIGHT JOIN`, and `FULL JOIN`?
Answer:
- `INNER JOIN` returns records with matching
values in both tables.
- `LEFT JOIN` returns all records from the
left table and matches records from the right table.
- `RIGHT JOIN` returns all records from the
right table and matches records from the left table.
- `FULL JOIN` returns all records when there
is a match in either left or right table.
2. Explain `GROUP BY` and `HAVING`. How do
they differ?
Answer: `GROUP BY` groups rows with the same values in specified
columns, while `HAVING` is used to filter records that aggregate results meet a
certain condition. Unlike `WHERE`, `HAVING` works with aggregated data.
3. What’s the difference between `WHERE` and
`HAVING`?
Answer: `WHERE` filters rows before any aggregation, while
`HAVING` filters after aggregation, usually used with `GROUP BY`.
4. What is the primary key?
Answer: A primary key is a unique identifier for records in a
table. It ensures each row is unique and cannot contain `NULL` values.
5. What’s the difference between a `UNION` and
a `UNION ALL`?
Answer: `UNION` combines results from two queries and removes
duplicates, while `UNION ALL` combines results without removing duplicates,
making it faster.
6. Explain `DISTINCT` and when to use it.
Answer: `DISTINCT` removes duplicate rows from
a result set. It’s useful when you want only unique values.
7. How do you create an index, and why is
it important?
Answer: An index improves query performance by providing quick
access to rows. It’s created using `CREATE INDEX index_name ON table(column); `.
8. What are clustered and non-clustered
indexes?
Answer: A clustered index sorts and stores rows in the table based
on key values, while a non-clustered index creates a separate structure for
lookups.
9. What’s a foreign key?
Answer: A foreign key is a field in one table that uniquely
identifies a row in another table, enforcing referential integrity between
tables.
10. What is a subquery, and when would you use
it?
Answer: A subquery is a query nested inside another query, often
used to perform intermediate calculations or filter data.
11. Explain window functions and give an
example.
Answer: Window functions perform calculations across a set of
table rows related to the current row. Example: `ROW_NUMBER() OVER (PARTITION
BY department ORDER BY salary)` assigns a rank based on salary within each
department.
12. What is a self-join?
Answer: A self-join is a regular join where a table joins itself,
useful for comparing rows within the same table.
13. How would you delete duplicate records in
SQL?
Answer: Using `ROW_NUMBER()`, identify duplicates and delete
records where `ROW_NUMBER > 1` for each unique entry.
14. What is normalization, and why is it
used?
Answer: Normalization organizes data to minimize redundancy. The
process includes dividing tables and using foreign keys to link related data.
15. Explain the different normal forms.
Answer:
- 1NF: Eliminate duplicate columns.
- 2NF: Remove subsets of data that apply to
multiple rows.
- 3NF: Remove columns that are not dependent
on the primary key.
16. What is denormalization?
Answer: Denormalization is combining tables to improve read
performance by reducing the number of joins, often used in data warehouses.
17. What’s the difference between `DELETE`,
`TRUNCATE`, and `DROP`?
Answer:
- `DELETE` removes specific rows, can use
`WHERE`, and maintains the table structure.
- `TRUNCATE` deletes all rows quickly, but
keeps the table.
- `DROP` removes the entire table and its
structure.
18. Explain ACID properties in a database.
Answer: ACID stands for Atomicity, Consistency, Isolation, and
Durability, ensuring reliable transactions in a database.
19. What’s a stored procedure, and why use
it?
Answer: A stored procedure is a saved SQL code block that performs
a specific task, improving performance and reusability.
20. How does indexing affect performance?
Answer: Indexing speeds up data retrieval but can slow down write
operations. Proper indexing is crucial for balance.
21. What is an aggregate function? Give
examples.
Answer: Aggregate functions calculate a single value from multiple
rows, like `SUM`, `AVG`, `MAX`, `MIN`, and `COUNT`.
22. Explain the difference between `CHAR`
and `VARCHAR`.
Answer: `CHAR` is fixed-length, while `VARCHAR` is
variable-length. `VARCHAR` is more efficient for varying-length data.
23. What is a database view, and why use it?
Answer: A view is a virtual table based on a query. It provides simplified
or restricted access to data without storing it.
24. What’s a materialized view?
Answer: A materialized view stores the result of a query
physically, enabling faster access to frequently be accessed complex queries.
25. How would you find the second-highest
salary in a table?
Answer: Use `SELECT MAX(salary) FROM employees WHERE salary <
(SELECT MAX(salary) FROM employees);`.
26. Explain the concept of `CASE` in SQL.
Answer: `CASE` is a conditional expression for returning different
results based on conditions, similar to `IF-THEN-ELSE`.
27. What are transactions, and why are they
important?
Answer: Transactions ensure a series of operations are completed
successfully. If not, they’re rolled back, ensuring data integrity.
28. How do you prevent SQL injections?
Answer: Use parameterized queries, stored procedures, and validate
inputs to avoid SQL injection vulnerabilities.
29. What’s the purpose of `COALESCE`?
Answer: `COALESCE` returns the first non-null value from a list of
expressions useful for handling nulls in data.
30. What’s a CTE, and why use it?
Answer: A Common Table Expression (CTE) is a temporary result set
within a query, making complex queries easier to read and maintain.
31. Explain `EXISTS` vs. `IN` for
subqueries.
Answer: `EXISTS` is typically faster with correlated subqueries,
while `IN` works better with static lists or when matching values across
tables.
32. How do you calculate cumulative totals
in SQL?
Answer: Use a window function like `SUM(column) OVER (ORDER BY
column)`, which calculates a running total.
33. What are foreign key constraints?
Answer: Foreign key constraints ensure that the value in one table
matches a value in another table, maintaining referential integrity.
34. Explain pivot tables in SQL.
Answer: Pivot tables rotate rows into columns, summarizing data to
better analyze patterns, often done with `CASE` statements or `PIVOT`.
35. What’s a JSON data type in SQL, and
when to use it?
Answer: JSON data type stores semi-structured JSON objects in SQL
tables, ideal for storing flexible schema data.
36. How do you use `RANK()` and
`DENSE_RANK()`?
Answer: `RANK()` assigns ranks with gaps, while `DENSE_RANK()`
assigns ranks without gaps for ties.
37. What’s the difference between a view and a
table?
Answer: A table stores data, while a view is a virtual table
created by a query, containing no data itself.
38. What is a surrogate key?
Answer: A surrogate key is a unique identifier for a record, often
used in place of a primary key to avoid dependencies on natural keys.
39. What are triggers, and when to use
them?
Answer: Triggers automatically execute predefined actions in
response to specific events on a table, like inserts or updates.
40. What is `NULL` in SQL?
Answer: `NULL` represents missing or unknown data, often handled
with functions like `IS NULL`, `COALESCE`, or `IFNULL`.
41. How does `OFFSET-FETCH` work in
pagination?
Answer: `OFFSET-FETCH` skips a set number of rows and then fetches
the desired number, useful for paging through large datasets.
42. What’s the purpose of the `MERGE`
statement?
Answer: `MERGE` performs insert, update or delete operations in a
single statement based on matching records.
43. What’s `ISNULL` vs. `COALESCE`?
Answer: `ISNULL` is specific to two arguments, while `COALESCE`
supports multiple arguments, useful for null handling.
44. How do you find duplicate records?
Answer: Use `GROUP BY` and `HAVING COUNT() > 1` to identify
duplicates based on specific column(s).
45. How does a `FULL OUTER JOIN` work?
Answer: `FULL OUTER JOIN` returns all records when there’s a match
in either left or right table, filling in `NULL` where there’s no match.
46. What’s a `TEMP TABLE`, and why use it?
Answer: A temporary table stores intermediate data temporarily,
making complex calculations simpler without storing them in the main database.
47. How do you perform error handling in
SQL?
Answer: Use `TRY-CATCH` blocks to handle runtime errors, making
debugging easier and enhancing reliability.
48. Explain `LEFT JOIN` vs. `RIGHT JOIN`.
Answer: `LEFT JOIN` returns all records from the left table and
matching records from the right, while `RIGHT JOIN` does the opposite.
49. What is indexing?
Answer: Indexing is a performance optimization technique, creating
data structures that speed up data retrieval.
50. Explain `AUTO_INCREMENT` in SQL.
Answer: `AUTO_INCREMENT` is used to automatically generate unique
values for a primary key field in tables.
follow Satish Mandale for more such contents
Comments
Post a Comment