Hmmm… What’s This? Date Correlation Optimization in SQL Server

OK So, I am doing some digging and peaking around again in SQL Server and came across a database option called Date Correlation Optimization Enabled = False. Honestly, I had no clue what it did, so I took it as a learning opportunity to look into it and do a little research. Who knows, it may actual help me solve one of the many problems I run into day to day for clients.

Syntax

ALTER DATABASE DEMO
   SET DATE_CORRELATION_OPTIMIZATION ON;

So, What Does It Do?

According to MSDN – The DATE_CORRELATION_OPTIMIZATION database SET option improves the performance of queries that perform an equi-join between two tables whose date or datetime columns are correlated, and which specify a date restriction in the query predicate.

How many of you read what MSDN says and thinks “wuuuuuttt, English please”? I do.

In English

Basically, it uses a foreign key relationship key between tables in SQL Server to enhance performance of date and date time queries when the dates fall within a certain defined range of each other (correlates). Ok that’s cool, but what’s the big deal? The power really comes in for things like reporting, validation, and data warehouses. With this option turned on, SQL Server maintains statistics between correlated columns and creates improved execution plans that reads less data.

Let’s See It in Action

Consider this, all internet orders that are received have a must send out by date (due date) of 10 days after order is received. Therefore, the OrderDate and DueDate are correlated, related to each other.

Here is a query you would normally run.

SELECT  *
FROM    InternetOrders AS i
        INNER JOIN InternetOrderDetail AS d ON i.InternetOrderID = d.InternetOrderID
WHERE   i.OrderDate BETWEEN '20170801' AND '20170901';


Without DATE_CORRELATION_OPTIMIZATION turn on the optimizer would create a plan just like anything else, however with it set to ON the optimizer can make more granular execution plans.

Here’s how

With each INSERT, UPDATE and DELETE between these two tables SQL Server is gathering statistics which helps the optimizer infer the query to be more like the one below. This is where the power comes in. The optimizer can better narrow down the records it needs to read and therefor returning faster results.

Here is the way SQL interprets the dates now that correlation is turned on and statistics are being gathered. It based on those statistics it can now infer that each DueDate is exactly 10 days after the OrderDate.

SELECT  *
FROM    InternetOrders AS i,
        InternetOrderDetail AS d
WHERE   i.InternetOrderID = d.InternetOrderID
        AND i.OrderDate BETWEEN '8/1/17' AND '8/15/17'
        AND d.DueDate BETWEEN CAST('20170801' AS DATETIME) + 10
                      AND     CAST('20170901' AS DATETIME) + 10;

Depending on the number of records in the two tables this can be a VERY significant decrease to execution times.

The Caution

You should not enable DATE_CORRELATION_OPTIMIZATION in update-intensive database environments. SQL Server keeps all the correlation information in statistics form, this means with every INSERT, UPDATE and DELETE you gain additional overhead.

As always, be sure to test it before you use it in production.

About Monica Rathbun

Monica Rathbun lives in Virginia, is a Microsoft MVP for Data Platform and Microsoft Certified Solutions Expert. She has nearly two decades of experience working with a wide variety of database platforms with a focus on SQL Server and the Microsoft Data Platform. She is a frequent speaker at IT industry conferences on topics including performance tuning and configuration management. She is the Leader of the Hampton Roads SQL Server User Group. She is passionate about SQL Server and the SQL Server community, doing anything she can to give back. Monica can always be found on Twitter (@sqlespresso) handing out helpful tips. You can find Monica blogging at sqlespresso.com 
Bookmark the permalink.

2 Comments

  1. Pingback: Date Correlation Optimization – Curated SQL

  2. Thanks for the easy-to-understand explanation.

    I’m curious why this feature doesn’t also work with DATETIME2 columns. It is frustrating that Microsoft gave us the more accurate and more space-efficient datatype, but that features like this can’t handle it without converting it back to DATETIME, thus providing a disincentive to use DATETIME2.

    I’d appreciate any insight you could offer.

Comments are closed