We know that sorting can be one of the most expensive things in an execution plan as shown below. However, we continue to do ORDER BYs repeatedly. Yes, I 100% agree that there is a need to sort a results set and that this should be done in the procedure for good reason, but my concern is having multiple sorts, erroneous sorts, and the sorts that can be done elsewhere. These are the ones that waste resources and can stifle performance.
Many of us writing procedures tend to write in code blocks. We write the SELECT, JOINS, FROMs and WHERES then immediately follow it up with and ORDER BY as a way to check result sets before moving onto the next block of code. I admit I do this almost every time. But what most developers do not do is remove unneeded ORDER BYs that are not required. This is very costly and can lead to suboptimal performance not only of your procedure but also for TEMPDB as this is where all sorting takes place.
Do you sort in your procedures that are used for data consumers like reports, ETL or an application? If you do, I ask, why are you sorting in the procedure and not in the consumer ? Many report end users will resort the data in Excel, or the report itself gives parameters for custom sorts or the data doesn’t need a sort at all. Why are you wasting resources on the SQL Server side just for it to be nullified? By removing unneeded sorts or performing the sort in the application tier you can have big performance gains. I would rather have a report, ETL process or application take the performance hit then a procedure.
Let’s look at one of the procedures that are available in AdventureWorks2016CPT3 called uspGetOrderTrackingBySalesOrderID. We will run it using the example execution in the code and then remove the ORDER BY, compile and rerun. We will be able to see clearly see the difference.
USE [AdventureWorks2016CTP3] GO /****** Object: StoredProcedure [dbo].[uspGetOrderTrackingBySalesOrderID] Script Date: 1/28/2020 11:31:16 AM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO ALTER PROCEDURE [dbo].[uspGetOrderTrackingBySalesOrderID] @SalesOrderID [int] NULL AS BEGIN /* Example: exec dbo.uspGetOrderTrackingBySalesOrderID 53498 */ SET NOCOUNT ON; SET STATISTICS IO, TIME ON SELECT ot.SalesOrderID, ot.CarrierTrackingNumber, ot.OrderTrackingID, ot.TrackingEventID, te.EventName, ot.EventDetails, ot.EventDateTime FROM Sales.OrderTracking ot, Sales.TrackingEvent te WHERE ot.SalesOrderID = @SalesOrderID AND ot.TrackingEventID = te.TrackingEventID --ORDER BY -- ot.SalesOrderID, -- ot.TrackingEventID; END;
Plan with ORDER BY
Plan without ORDER BY and Query Store graph showing the difference in duration between the two. You can clearly see the performance improvement, and this was just one sort in a very simple procedure.
Take a moment and consider the sorting that happens in your code. I’d ask that when writing store procedures, doing code reviews or performance tuning that you take a second to ask why the sorts are being done in the data tier and if they can be performed elsewhere. You can see get some performance gains not only in your code but in TEMPDB as well when sorting is reigned in.
Pingback: The Cost of Sorting in Stored Procedures – Curated SQL
Interesting post and valid arguments, especially if you are only concerned with the performance of the database. My counter-argument would be that sorting is likely also the most expensive operation in applications and/or reporting tools too, and the database may still be the best place to do that.
As a consultant it is usually my job to figure out where it would be best to take the performance hit, while considering what I have access to and control over…it quickly becomes a loaded situation with many factors to consider.