By Riley Major, 2018-09-14
So you’ve wisely decided to put your business logic in your database, but for whatever reason you can’t use inline user defined functions, even though they encapsulate logic without hurting performance. That’s ok, because you can build functions on the fly in a single statement.
Tables on the Fly
You might have heard of Common Table Expressions (CTEs), T-SQL’s tool for building tables on the fly in the single statement. They’re a lot like Derived Tables, in that they’re
SELECT statements which can be used by other portions of an encapsulating
SELECT statement. But unlike Derived Tables, they can be used multiple times (in the same statement). It’s not accurate to think of them as statement-level temporary tables, because they aren’t filled up with data as a discrete step. Except for their recursive call feature, they are tantamount to repeating the contained
SELECT statement multiple times in the main
For example, this
SELECT statement repeats the same Derived Table multiple times.
SELECT a1 = t1.a, a2 = t2.a FROM ( SELECT a = n + 1 FROM @t ) AS t1 CROSS JOIN ( SELECT a = n + 1 FROM @t ) AS t2;
SELECT statement creates a Common Table Expression for that Derived Table’s contents and references the CTE twice.
WITH t AS ( SELECT a = n + 1 FROM @t ) SELECT a1 = t1.a, a2 = t2.a FROM t AS t1 CROSS JOIN t AS t2;
They have the same query plan. And there is no separate step in the query plan for the “population” of the supposed temporary table.
So if you’re going to repeatedly use the same *data* multiple times in a query, the Common Table Expression is your tool of choice. But what if you want to use the same logic?
Pump it up.
Unfortunately, there’s no similar syntactical sugar to create a statement-level user-defined function. There’s no direct way to have the same calculation applied to different columns of data without creating an actual user-defined function or simply repeating the logic. But there is a neat trick which in the right circumstances can accomplish the same goal without costing you any performance.
The goal, as with any function, is not repeating yourself. It’s not just a bunch of extra characters in your code. It’s an accident waiting to happen– two representations of the same logic which are just bound to diverge due to our cursed human imperfections.
But how can you run different columns of data through the same calculation in a query? Turn them into rows!
Calculations are already performed repeatedly for every row in a query. So all you have to do is take your columns and explode them up into multiple rows. In order to apply a calculation to
Column2, first change each row of the data into two more rows with a new, computed
Column3 which has the original row’s data for
Column1 first and the data for
In other words, you need to convert this:
Once you have that vertical list of the columns’ values, you can
APPLY your calculation to it, once, and it will operate on all of the values.
|Column1||Column2||Counter||Column3 – Altered|
All that’s left is to smoosh those vertically-represented values back into a single horizontal row. You could use
PIVOT, but that syntax is a little tricky. Instead, first duplicate the calculated column and then selectively suppress the values you don’t want (by using a
CASE expession). You’ll end up with this:
|Column1||Column2||Counter||Column1 – Altered||Column 2 – Altered|
If you take the
MAX of each of those calculated columns, you end up with a single row for each one, with your final results:
|Column1||Column2||Column1 – Altered||Column 2 – Altered|
And voila! You’ve taken a statement with repetitive code:
SELECT ContactName = c.ContactName, Phone1Formatted = REPLACE(REPLACE(REPLACE(REPLACE(c.Phone1,'(',''),')',''),' ',''),'-',''), Phone2Formatted = REPLACE(REPLACE(REPLACE(REPLACE(c.Phone2,'(',''),')',''),' ',''),'-','') FROM @Contacts AS c;
And eliminated that repetition:
SELECT ContactName = c.ContactName, Phone1Formatted = t.Phone1Formatted, Phone2Formatted = t.Phone2Formatted FROM @Contacts AS c CROSS APPLY ( SELECT Phone1Formatted = MAX(CASE WHEN Exploder.n = 1 THEN PhoneFormatter.PhoneFormatted ELSE '' END), Phone2Formatted = MAX(CASE WHEN Exploder.n = 2 THEN PhoneFormatter.PhoneFormatted ELSE '' END) FROM (SELECT n = 1 UNION ALL SELECT 2) AS Exploder CROSS APPLY ( SELECT Phone = CASE Exploder.n WHEN 1 THEN c.Phone1 WHEN 2 THEN c.Phone2 ELSE NULL END ) AS Chooser CROSS APPLY ( SELECT PhoneFormatted = REPLACE(REPLACE(REPLACE(REPLACE(Chooser.Phone,'(',''),')',''),' ',''),'-','') ) AS PhoneFormatter ) AS t;
Admittedly, there is more overall code in this simplified example, but imagine that the logic was more complex. And also, remember that it’s not about character count, but rather removing duplication because of the potential for different behaviors if the copied code isn’t perfectly maintained.
Unfortunately, unlike with Common Table Expressions, this is not mere syntactic sugar. This is semantically different T-SQL which makes the engine do more work– first, spreading apart the columns into rows before shoving them through your calculation and then squishing them back down again. But in the right circumstances, it might add only trivial overhead, as in this example.
Still, use this technique with care. It’s almost always better to just make actual inline user-defined functions, which often have identical performance with the benefit of code encapsulation, as they do in this example.
Ceci est une pipe
This isn’t the only way to reduce code duplication in your complex T-SQL statements. If you have two calculations which have similar beginnings but differ in the final steps, break them apart into their separate steps so that you can re-use the first steps of the calculation.
SELECT Palindrome1 = SecondaryOperations.Palindrome1, Palindrome2 = SecondaryOperations.Palindrome2 FROM (SELECT d = 'abc') AS RawData CROSS APPLY ( SELECT r = REVERSE(RawData.d) ) AS BaseOperation CROSS APPLY ( SELECT Palindrome1 = RawData.d + BaseOperation.r, Palindrome2 = BaseOperation.r + RawData.d ) AS SecondaryOperations;
It’s helpful to think of each
APPLY as a pipe operation, taking the values from the previous derived table and passing them into the next to be manipulated. Programming T-SQL in this manner (loosely) approximates modern functional programming techniques.
It keeps each step of the logic smaller, so that it’s easier to understand. And you can expose the intermediary columns to help with debugging.
Department of Redundancy Department
You are going to have business logic in your database, whether you like it or not. Use these and other techniques to make sure it’s only in there once.
Photo credit: Water Bird by Evan