You might have heard that SQL Server user-defined functions are the devil. That’s because someone got burned. They put a scalar function in a WHERE clause on a big table and fed it a column name. And bad things happened.
In this case, “bad” means slow. Really slow.
In order for queries to perform well, SQL Server needs to be able to find the data quickly, and that usually means using an index. An index is a list of key data from a table, organized to make finding things faster. Just like in a book, instead of scanning every page for a mention of a word, you can use the index, where the author– or her computer program– has done that work for you as a handy reference.
An index is fastest when you can use it directly. If you’re looking for pages which mention “SQL”, you find “SQL” in the index and it tells you the page numbers to check. But what if you’re looking for all words which end in “SQL”, like “T-SQL”? Now can you can’t go right to the part of the index you want. You have to scan through the whole index. You no longer have a single search argument (“SQL”); you have a search expression (“ends in SQL”). The industry term we use is that your search is no longer “sargable” (“Search ARGument ABLE”).
The moral of the story is that if you want an index used most effectively, make sure you’re searching on exactly what’s being indexed. If you’re searching by a date, and you want to find everything in a certain year, don’t use
WHERE Year([DateColumn]) = 2018". Instead, use
WHERE [DateColumn] >= '20180101' AND [DateColumn] < '20190101'". As a human, you can understand these as the same thing, but often the optimizer just sees "arbitrary function on indexed column" and figures it can't use the index the normal way but instead has to slog through the whole thing and run the function to see what it does.
And if the optimizer has this problem with the built-in function "Year", what do you think it's going to do with your scalar "Complex_2000_Line_Monstrosity"? It's going to treat it like the black box it is and run through every entry in the index to test it. And that assumes there's an index. A big user-defined function often needs several different columns' worth of data, some of which isn't in an index. So now you're scanning through the whole table. Grab yourself a cup of joe. You'll be there a while.
You can see right through them.
While scalar functions are opaque to the optimizer, inline functions are transparent. Instead of seeing an inline function as a black box, the query optimizer sees it rather like a parameterized view-- that is, a view which happens to take parameters. Views and inline functions are essentially copied and pasted into the query which calls them, as if they weren't a separate construct at all.
|User-Defined Function Type||Results||Multi-Statement||Opaque|
Now just because your function boundaries fade away in optimizer-land doesn't mean you will find your seek. You don't need a function to destroy saragability. Simply rearranging your criteria logic can handle that on its own. For example, you won't get the benefit of your index on
Column1 if you use
WHERE 1 = CASE WHEN Column1 = @value THEN 1 ELSE 0 END instead of
WHERE Column1 = @value. Again, we humans can see that's the same thing, but SQL Server does not.
But if you use an inline function which directly acts on an indexed field, without an obfuscating calculation, the optimizer will still use an indexed seek. Just how to make it happen feels a little more like art than science.
Orders table with an
OrderID column, an
OrderStatusID column, and an
OrderDate column. The
OrderDate are irrelevant for our examples except to provide some additional order-related data. The key column here is
OrderStatusID. Let's say we have many statuses an order could be in (New, Processing, Shipping, Paid, etc.).
Now let's say we have a business rule which applies in only some statuses. Perhaps an order is handled by a certain department only if it's New or Processing but not when in any other statuses. If you're putting this business logic in the database-- as you should-- your database code will need to understand which statuses are special.
The most performance-efficient way to query the Orders table for the special status orders is to directly list them in each relevant query:
SELECT o.* FROM #Orders o WHERE o.OrderStatusID IN (3,8);
With an index on the
OrderStatusID field, the optimizer would likely use an Index Seek action and find your rows lickety-split. However, you'll probably need the database to understand that concept in many different places. Rather than repeat this list of "special statuses", you decide you want a function to calculate whether a status is special. That way your business logic is centralized:
CREATE FUNCTION IsOrderStatusSpecialScalar ( @OrderStatusID int ) RETURNS bit AS BEGIN RETURN ( CASE WHEN @OrderStatusID IN (3,8) THEN 1 ELSE 0 END );
Now whenever you want to find these special statuses, you just use the function.
SELECT o.* FROM #Orders o WHERE dbo.IsOrderStatusSpecialScalar(o.OrderStatusID) = 1;
Alas, you find a slow index scan because the optimizer doesn't know what's going on inside the function. It simply calls it over and over for each row.
You might think you can rescue your performance by using an inline function.
CREATE FUNCTION IsOrderStatusSpecial ( @OrderStatusID int ) RETURNS TABLE AS RETURN SELECT OrderStatusIsSpecial = CASE WHEN @OrderStatusID IN (3,8) THEN 1 ELSE 0 END;
You have to change the usage a little bit but it's similarly concise.
SELECT o.* FROM #Orders o CROSS APPLY IsOrderStatusSpecial ( o.OrderStatusID ) s WHERE s.OrderStatusIsSpecial = 1;
Unfortunately, you run into the same index scan performance issue. You can't really blame the inline function, though. After all, you get the same poor performance using the function's
WHEN logic directly:
SELECT o.* FROM #Orders o WHERE 1 = CASE WHEN o.OrderStatusID IN (3,8) THEN 1 ELSE 0 END;
So you have to find a way to use an inline function where the optimizer sees a more direct usage of the indexed field. So you create an inline function which doesn't use any calculation to produce its bit.
CREATE FUNCTION IsOrderStatusSpecialLimiter ( @OrderStatusID int ) RETURNS TABLE AS RETURN SELECT OrderIsSpecial = CONVERT(bit,1) WHERE @OrderStatusID IN (3,8);
And you don't even use the bit it makes when you call the function. You let the function's results impact the rows returned directly, via filtration.
SELECT o.* FROM #Orders o CROSS APPLY IsOrderStatusSpecialLimiter ( o.OrderStatusID ) s;
And voila! You have your logic encapsulation cake and you get to eat your index seek performance, too.
But that's not the only way to make this work. And this is where it starts to feel a little more like an art. Instead of having the inline function perform a calculation or even limit rows with a
WHERE clause with an
IN list, you could build a virtual table of special statuses in your inline function.
CREATE FUNCTION GetSpecialOrderStatusIDs() RETURNS TABLE AS RETURN SELECT s.OrderStatusID FROM ( SELECT 3 UNION ALL SELECT 8 ) AS s(OrderStatusID);
You're not even passing anything into this function. In fact, you could even use a plain old view. And you'd use it in a more traditional manner, without that new-fangled
SELECT o.* FROM #Orders o WHERE EXISTS ( SELECT * FROM GetSpecialOrderStatusIDs() s WHERE s.OrderStatusID = o.OrderStatusID );
This is yet another example of using an inline function without sacrificing performance.
Hide and No Seek
So no, user-defined functions are not the devil. Scalar user-defined functions can cause big problems if misused, but generally inline user-defined functions do not cause problems.
The real rule of thumb is not to avoid functions, but rather to avoid adorning your index fields with logic or functions. Because when you hide your intentions from the optimizer with complex syntax, you risk not getting the better performing index seek.