OK So, I am doing some digging and peaking around again in SQL Server and came across a database option called Date Correlation Optimization Enabled = False. Honestly, I had no clue what it did, so I took it as a learning opportunity to look into it and do a little research. Who knows, it may actual help me solve one of the many problems I run into day to day for clients.
Syntax
ALTER DATABASE DEMO SET DATE_CORRELATION_OPTIMIZATION ON;
So, What Does It Do?
According to MSDN – The DATE_CORRELATION_OPTIMIZATION database SET option improves the performance of queries that perform an equi-join between two tables whose date or datetime columns are correlated, and which specify a date restriction in the query predicate.
How many of you read what MSDN says and thinks “wuuuuuttt, English please”? I do.
In English
Basically, it uses a foreign key relationship key between tables in SQL Server to enhance performance of date and date time queries when the dates fall within a certain defined range of each other (correlates). Ok that’s cool, but what’s the big deal? The power really comes in for things like reporting, validation, and data warehouses. With this option turned on, SQL Server maintains statistics between correlated columns and creates improved execution plans that reads less data.
Let’s See It in Action
Consider this, all internet orders that are received have a must send out by date (due date) of 10 days after order is received. Therefore, the OrderDate and DueDate are correlated, related to each other.
Here is a query you would normally run.
SELECT * FROM InternetOrders AS i INNER JOIN InternetOrderDetail AS d ON i.InternetOrderID = d.InternetOrderID WHERE i.OrderDate BETWEEN '20170801' AND '20170901';
Without DATE_CORRELATION_OPTIMIZATION turn on the optimizer would create a plan just like anything else, however with it set to ON the optimizer can make more granular execution plans.
Here’s how
With each INSERT, UPDATE and DELETE between these two tables SQL Server is gathering statistics which helps the optimizer infer the query to be more like the one below. This is where the power comes in. The optimizer can better narrow down the records it needs to read and therefor returning faster results.
Here is the way SQL interprets the dates now that correlation is turned on and statistics are being gathered. It based on those statistics it can now infer that each DueDate is exactly 10 days after the OrderDate.
SELECT * FROM InternetOrders AS i, InternetOrderDetail AS d WHERE i.InternetOrderID = d.InternetOrderID AND i.OrderDate BETWEEN '8/1/17' AND '8/15/17' AND d.DueDate BETWEEN CAST('20170801' AS DATETIME) + 10 AND CAST('20170901' AS DATETIME) + 10;
Depending on the number of records in the two tables this can be a VERY significant decrease to execution times.
The Caution
You should not enable DATE_CORRELATION_OPTIMIZATION in update-intensive database environments. SQL Server keeps all the correlation information in statistics form, this means with every INSERT, UPDATE and DELETE you gain additional overhead.
As always, be sure to test it before you use it in production.
Pingback: Date Correlation Optimization – Curated SQL
Thanks for the easy-to-understand explanation.
I’m curious why this feature doesn’t also work with DATETIME2 columns. It is frustrating that Microsoft gave us the more accurate and more space-efficient datatype, but that features like this can’t handle it without converting it back to DATETIME, thus providing a disincentive to use DATETIME2.
I’d appreciate any insight you could offer.