SQL Server Performance: Differences Between HAVING and WHERE in T-SQL
In SQL Server T-SQL, both the HAVING and WHERE clauses are used to filter rows in a SQL query, but they serve different purposes and are used in different parts of a query. Let's explore the differences between the two and provide examples of their performance differences.
Sample Data
You can use this Fiddle to follow along and practice the HAVING and WHERE clauses in T-SQL:
```sql
CREATE TABLE Orders (
    OrderID INT,
    CustomerID INT,
    OrderDate DATE,
    TotalAmount DECIMAL(10, 2),
    PRIMARY KEY (CustomerID, OrderID)
);
INSERT INTO Orders (OrderID, CustomerID, OrderDate, TotalAmount)
VALUES
    (1, 101, '2023-08-01', 150.00),
    (2, 102, '2023-08-02', 75.50),
    (3, 101, '2023-08-03', 200.00),
    (4, 103, '2023-08-04', 50.00),
    (5, 102, '2023-08-05', 120.00),
    (6, 104, '2023-08-06', 300.00),
    (7, 101, '2023-08-07', 75.00),
    (8, 103, '2023-08-08', 250.00),
    (9, 102, '2023-08-09', 180.00),
    (10, 105, '2023-08-10', 90.00);
```
Orders Table
OrderID | CustomerID | OrderDate | TotalAmount |
---|---|---|---|
1 | 101 | 2023-08-01 | 150.00 |
3 | 101 | 2023-08-03 | 200.00 |
7 | 101 | 2023-08-07 | 75.00 |
2 | 102 | 2023-08-02 | 75.50 |
5 | 102 | 2023-08-05 | 120.00 |
9 | 102 | 2023-08-09 | 180.00 |
4 | 103 | 2023-08-04 | 50.00 |
8 | 103 | 2023-08-08 | 250.00 |
6 | 104 | 2023-08-06 | 300.00 |
10 | 105 | 2023-08-10 | 90.00 |
WHERE and HAVING Examples
WHERE Clause
The WHERE clause is used to filter rows before GROUP BY or aggregation is applied. It operates on individual rows and determines which rows are included in the subsequent grouping or aggregation. It's commonly used to filter data based on individual column values.
Example:
Suppose you have a table named Orders with columns CustomerID, OrderID, OrderDate, and TotalAmount. The combination of CustomerID and OrderID is the primary key of the table. You want to retrieve the orders with a total amount greater than $100.
```sql
SELECT CustomerID
    ,OrderID
    ,TotalAmount
FROM Orders
WHERE TotalAmount > 100;
```
Query Result
CustomerID | OrderID | TotalAmount |
---|---|---|
101 | 1 | 150.00 |
101 | 3 | 200.00 |
102 | 5 | 120.00 |
102 | 9 | 180.00 |
103 | 8 | 250.00 |
104 | 6 | 300.00 |
In this query, the WHERE clause filters rows from the Orders table before any grouping or aggregation. It operates directly on individual rows based on the TotalAmount column. If an index exists on TotalAmount, the database engine can use it to seek directly to the qualifying rows instead of scanning the whole table. The query involves no grouping or aggregate functions, which keeps it simple and efficient.
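The sample schema defines only the clustered primary key on (CustomerID, OrderID), so as written there is no index on TotalAmount for the query above to seek on, and SQL Server will scan the clustered index. A minimal sketch of a supporting index follows, assuming the Orders table from the Sample Data section; the index name IX_Orders_TotalAmount is hypothetical. Whether the optimizer actually seeks on it depends on selectivity: here 6 of the 10 rows qualify, so a scan may still be judged cheaper.

```sql
-- Hypothetical index name; the sample Orders table has only its clustered primary key.
-- Keying on TotalAmount lets SQL Server seek to the rows where TotalAmount > 100.
-- CustomerID and OrderID are the clustering key, so every nonclustered index row
-- already carries them, and the SELECT list below is covered without an INCLUDE.
CREATE NONCLUSTERED INDEX IX_Orders_TotalAmount
    ON Orders (TotalAmount);

SELECT CustomerID
    ,OrderID
    ,TotalAmount
FROM Orders
WHERE TotalAmount > 100;
```

On a table this small the plan choice is not very informative; the index matters once the table is large and the predicate selects a small fraction of its rows.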
HAVING Clause
The HAVING clause is used to filter the result of an aggregation after GROUP BY has been applied. It operates on the result of the grouping and aggregation and determines which aggregated groups are included in the final result set. It's used to filter aggregated data based on the results of aggregate functions.
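You could get the same query result using the GROUP BY and HAVING clauses. Because (CustomerID, OrderID) is the primary key, each group contains exactly one row, so SUM(TotalAmount) is simply that order's TotalAmount.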
```sql
SELECT CustomerID
    ,OrderID
    ,SUM(TotalAmount) AS TotalAmount
FROM Orders
GROUP BY CustomerID
    ,OrderID
HAVING SUM(TotalAmount) > 100;
```
Query Result
CustomerID | OrderID | TotalAmount |
---|---|---|
101 | 1 | 150.00 |
101 | 3 | 200.00 |
102 | 5 | 120.00 |
102 | 9 | 180.00 |
103 | 8 | 250.00 |
104 | 6 | 300.00 |
In this query, the HAVING clause filters the results after the GROUP BY operation and aggregation have been performed. The SUM(TotalAmount) aggregate is calculated for each (CustomerID, OrderID) group, and only then are the groups that don't meet the condition filtered out, so a sum is computed for every group, including the ones that are discarded. This extra step requires more computation and may be less efficient, especially when the table contains a large number of rows and the aggregations are complex.
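Grouping by the primary key makes the query above only imitate a row filter. HAVING is genuinely needed when a group spans several rows and the condition is on the aggregate. A sketch against the same sample data, assuming the Orders table from the Sample Data section, that lists customers whose combined order total exceeds $300 (the alias CustomerTotal is illustrative):

```sql
-- One group per customer; SUM adds up all of that customer's orders.
-- HAVING then discards customers whose total is 300.00 or less.
SELECT CustomerID
    ,SUM(TotalAmount) AS CustomerTotal  -- illustrative alias
FROM Orders
GROUP BY CustomerID
HAVING SUM(TotalAmount) > 300;
```

With the sample rows this returns customers 101 (425.00) and 102 (375.50). Customers 103 and 104 total exactly 300.00 and are excluded by the strict `>`. No WHERE clause could express this condition, because no single row holds a customer's total.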
Comparing the two queries, the first query (using the WHERE clause) is generally more efficient because it filters rows directly on the TotalAmount column, which an index could support, before any aggregation occurs. The second query (using the HAVING clause) involves additional computation because it aggregates before filtering.
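Rather than assuming which query is faster, you can measure both. A sketch using SQL Server's SET STATISTICS IO and SET STATISTICS TIME, which report logical reads and CPU/elapsed time in the Messages output; it assumes the Orders table from the Sample Data section. On ten rows the numbers will be nearly identical, so the comparison only becomes meaningful on a larger table.

```sql
-- Report page reads and timing for each statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Query 1: row-level filter with WHERE
SELECT CustomerID
    ,OrderID
    ,TotalAmount
FROM Orders
WHERE TotalAmount > 100;

-- Query 2: the same result via GROUP BY on the primary key and HAVING
SELECT CustomerID
    ,OrderID
    ,SUM(TotalAmount) AS TotalAmount
FROM Orders
GROUP BY CustomerID
    ,OrderID
HAVING SUM(TotalAmount) > 100;

-- Restore the session defaults.
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Compare the "logical reads" lines for the two statements, and turn on Include Actual Execution Plan in SSMS to compare the plans. Because the grouping columns are the primary key, the optimizer may recognize that every group is a single row and simplify the aggregate, so the two plans can end up closer than the reasoning above suggests.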
In summary, when filtering based on non-aggregated column values, it's typically more efficient to use the WHERE clause. When filtering based on aggregated results, the HAVING clause is necessary, but it might involve more processing and be less optimized for direct row-level filtering.
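A common case of filtering on a non-aggregated value is a condition on a grouping column. It is legal in HAVING, but it belongs in WHERE: there it can use the clustered primary key, whose leading column is CustomerID, to read only the matching customer's rows before aggregating. SQL Server's optimizer can often push such a HAVING predicate below the aggregate by itself, so compare the actual plans rather than assuming a difference. The sketch assumes the Orders table from the Sample Data section; the aliases OrderCount and CustomerTotal are illustrative.

```sql
-- Condition on a grouping column written in HAVING:
-- legal, but it states a row filter as a group filter.
SELECT CustomerID
    ,COUNT(*) AS OrderCount            -- illustrative alias
    ,SUM(TotalAmount) AS CustomerTotal -- illustrative alias
FROM Orders
GROUP BY CustomerID
HAVING CustomerID = 101;

-- Same result with the condition in WHERE: only customer 101's rows
-- are read (a seek on the clustered primary key is possible) and aggregated.
SELECT CustomerID
    ,COUNT(*) AS OrderCount
    ,SUM(TotalAmount) AS CustomerTotal
FROM Orders
WHERE CustomerID = 101
GROUP BY CustomerID;
```

Both statements return one row for customer 101 with 3 orders totaling 425.00.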
Performance Differences Between HAVING and WHERE
The performance differences between the HAVING and WHERE clauses are mainly related to the order in which filtering is applied:
- Filtering Order:
  - The WHERE clause filters individual rows before any grouping or aggregation is performed.
  - The HAVING clause filters aggregated results after grouping and aggregation have been applied.
- Optimization:
  - The WHERE clause can often be optimized more efficiently by SQL Server, as it operates on individual rows and can take advantage of indexes on specific columns.
  - The HAVING clause works on aggregated data, which might require more processing and can be less efficient in terms of optimization compared to the WHERE clause.
- Usage:
  - If you're filtering based on individual column values, use the WHERE clause.
  - If you're filtering based on aggregated results or using aggregate functions like SUM, COUNT, AVG, etc., use the HAVING clause.
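The filtering order is easiest to see when both clauses appear in one query. A sketch, assuming the Orders table from the Sample Data section, that counts each customer's orders of at least $100 and keeps customers with two or more of them. The numbered comments mark the logical processing order SQL Server defines for these clauses; the physical execution plan may differ. The aliases LargeOrderCount and LargeOrderTotal are illustrative.

```sql
SELECT CustomerID                             -- 5. SELECT: project the output columns
    ,COUNT(*) AS LargeOrderCount              --    (illustrative alias)
    ,SUM(TotalAmount) AS LargeOrderTotal      --    (illustrative alias)
FROM Orders                                   -- 1. FROM: read the source rows
WHERE TotalAmount >= 100                      -- 2. WHERE: discard individual orders under 100
GROUP BY CustomerID                           -- 3. GROUP BY: one group per remaining customer
HAVING COUNT(*) >= 2;                         -- 4. HAVING: keep groups with at least two such orders
```

With the sample rows this returns customers 101 (2 orders, 350.00) and 102 (2 orders, 300.00). Moving the TotalAmount condition out of WHERE would change the answer: customer 103 has two orders in total but only one of at least $100, so without the WHERE filter it would also pass `HAVING COUNT(*) >= 2`.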
In terms of performance differences, if filtering is done based on individual column values, using the WHERE clause generally provides better performance. However, in cases where filtering requires aggregated results, you need to use the HAVING clause, even though it might not be as optimized as the WHERE clause for individual row filtering.
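SQL Server does not allow the aggregate to go into WHERE as a shortcut: WHERE is evaluated before any groups exist, so a query like the commented-out one below is rejected with error 147 ("An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference"). A sketch of the HAVING rewrite, assuming the Orders table from the Sample Data section, listing customers with at least three orders (the alias OrderCount is illustrative):

```sql
-- Invalid: WHERE runs before grouping, so COUNT(*) has no group to count.
-- SQL Server rejects this statement with error 147, so it is commented out here.
-- SELECT CustomerID, COUNT(*) AS OrderCount
-- FROM Orders
-- WHERE COUNT(*) >= 3
-- GROUP BY CustomerID;

-- Valid: the aggregate condition moves to HAVING.
SELECT CustomerID
    ,COUNT(*) AS OrderCount  -- illustrative alias
FROM Orders
GROUP BY CustomerID
HAVING COUNT(*) >= 3;
```

With the sample rows this returns customers 101 and 102, each with 3 orders.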
Remember that actual performance can depend on various factors such as the complexity of the query, the database schema, indexes, and the amount of data being processed. It's a good practice to use appropriate indexes, write efficient queries, and profile performance when dealing with large datasets.
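Ten rows are too few for any of these differences to show up in timings or plans. One way to profile at a more realistic scale is to load a copy of the table with synthetic rows. A sketch; the table name OrdersLarge, the CTE name Numbers, the 100,000-row count, and the generated values are all arbitrary assumptions, not part of the original example:

```sql
-- Hypothetical table with the same shape as Orders.
CREATE TABLE OrdersLarge (
    OrderID INT NOT NULL,
    CustomerID INT NOT NULL,
    OrderDate DATE NOT NULL,
    TotalAmount DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (CustomerID, OrderID)
);

-- Generate 100,000 sequential numbers by cross joining a system catalog view
-- with itself; TOP stops the row generation early.
WITH Numbers AS (
    SELECT TOP (100000)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO OrdersLarge (OrderID, CustomerID, OrderDate, TotalAmount)
SELECT n                                                  -- unique order numbers
    ,100 + n % 1000                                       -- 1,000 distinct customers
    ,DATEADD(DAY, CAST(n % 365 AS INT), CAST('2023-01-01' AS DATE))
    ,CAST((n * 37) % 50000 / 100.0 AS DECIMAL(10, 2))    -- deterministic amounts, 0.00 to 499.99
FROM Numbers;
```

Afterwards, rerun the WHERE and HAVING queries against OrdersLarge with SET STATISTICS IO ON and compare the reads and actual execution plans.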