Bird shape temporarily formed by cascading water captured in still image.

Functions on the Fly

By Riley Major, 2018-09-14

So you’ve wisely decided to put your business logic in your database, but for whatever reason you can’t use inline user defined functions, even though they encapsulate logic without hurting performance. That’s ok, because you can build functions on the fly in a single statement.

Sort of.

Tables on the Fly

You might have heard of Common Table Expressions (CTEs), T-SQL’s tool for building tables on the fly in the single statement. They’re a lot like Derived Tables, in that they’re SELECT statements which can be used by other portions of an encapsulating SELECT statement. But unlike Derived Tables, they can be used multiple times (in the same statement). It’s not accurate to think of them as statement-level temporary tables, because they aren’t filled up with data as a discrete step. Except for their recursive call feature, they are tantamount to repeating the contained SELECT statement multiple times in the main SELECT statement.

For example, this SELECT statement repeats the same Derived Table multiple times.

SELECT		a1 = t1.a, a2 = t2.a
FROM		(
				SELECT		a = n + 1
				FROM		@t
			) AS t1
CROSS JOIN	(
				SELECT		a = n + 1
				FROM		@t
			) AS t2;

This alternative SELECT statement creates a Common Table Expression for that Derived Table’s contents and references the CTE twice.

WITH t AS
(
	SELECT		a = n + 1
	FROM		@t
)
SELECT		a1 = t1.a, a2 = t2.a
FROM		t AS t1
CROSS JOIN	t AS t2;

Common Table Expression and Derived Tables Exhibit Identical Query Plans

They have the same query plan. And there is no separate step in the query plan for the “population” of the supposed temporary table.

So if you’re going to repeatedly use the same *data* multiple times in a query, the Common Table Expression is your tool of choice. But what if you want to use the same logic?

Pump it up.

Unfortunately, there’s no similar syntactical sugar to create a statement-level user-defined function. There’s no direct way to have the same calculation applied to different columns of data without creating an actual user-defined function or simply repeating the logic. But there is a neat trick which in the right circumstances can accomplish the same goal without costing you any performance.

The goal, as with any function, is not repeating yourself. It’s not just a bunch of extra characters in your code. It’s an accident waiting to happen– two representations of the same logic which are just bound to diverge due to our cursed human imperfections.

But how can you run different columns of data through the same calculation in a query? Turn them into rows!

Calculations are already performed repeatedly for every row in a query. So all you have to do is take your columns and explode them up into multiple rows. In order to apply a calculation to Column1 and Column2, first change each row of the data into two more rows with a new, computed Column3 which has the original row’s data for Column1 first and the data for Column2 second.

In other words, you need to convert this:

Column1 Column2
ABC DEF
GHI JKL

Into this:

Column1 Column2 Counter Column3
ABC DEF 1 ABC
ABC DEF 2 DEF
GHI JKL 1 GHI
GHI JKL 2 JKL

Once you have that vertical list of the columns’ values, you can APPLY your calculation to it, once, and it will operate on all of the values.

Column1 Column2 Counter Column3 – Altered
ABC DEF 1 CBA
ABC DEF 2 FED
GHI JKL 1 IHG
GHI JKL 2 LKJ

All that’s left is to smoosh those vertically-represented values back into a single horizontal row. You could use PIVOT, but that syntax is a little tricky. Instead, first duplicate the calculated column and then selectively suppress the values you don’t want (by using a CASE expession). You’ll end up with this:

Column1 Column2 Counter Column1 – Altered Column 2 – Altered
ABC DEF 1 CBA  
ABC DEF 2   FED
GHI JKL 1 IHG  
GHI JKL 2   LKJ

If you take the MAX of each of those calculated columns, you end up with a single row for each one, with your final results:

Column1 Column2 Column1 – Altered Column 2 – Altered
ABC DEF CBA FED
GHI JKL IHG LKJ

And voila! You’ve taken a statement with repetitive code:

SELECT
	ContactName = c.ContactName,
	Phone1Formatted = REPLACE(REPLACE(REPLACE(REPLACE(c.Phone1,'(',''),')',''),' ',''),'-',''),
	Phone2Formatted = REPLACE(REPLACE(REPLACE(REPLACE(c.Phone2,'(',''),')',''),' ',''),'-','')
FROM		@Contacts AS c;

And eliminated that repetition:

SELECT
	ContactName = c.ContactName,
	Phone1Formatted = t.Phone1Formatted,
	Phone2Formatted = t.Phone2Formatted
FROM		@Contacts AS c
CROSS APPLY	(
				SELECT
					Phone1Formatted = MAX(CASE WHEN Exploder.n = 1 THEN PhoneFormatter.PhoneFormatted ELSE '' END),
					Phone2Formatted = MAX(CASE WHEN Exploder.n = 2 THEN PhoneFormatter.PhoneFormatted ELSE '' END)
				FROM		(SELECT n = 1 UNION ALL SELECT 2) AS Exploder
				CROSS APPLY	(
								SELECT
									Phone = 
										CASE Exploder.n
											WHEN 1 THEN c.Phone1
											WHEN 2 THEN c.Phone2
											ELSE NULL
										END
							) AS Chooser
				CROSS APPLY	(
								SELECT
									PhoneFormatted =
										REPLACE(REPLACE(REPLACE(REPLACE(Chooser.Phone,'(',''),')',''),' ',''),'-','')
							) AS PhoneFormatter
			) AS t;

Admittedly, there is more overall code in this simplified example, but imagine that the logic was more complex. And also, remember that it’s not about character count, but rather removing duplication because of the potential for different behaviors if the copied code isn’t perfectly maintained.

Execution plans showing identical treatment of repeated code and inline function but slightly different treatment of "function on the fly" method.

Unfortunately, unlike with Common Table Expressions, this is not mere syntactic sugar. This is semantically different T-SQL which makes the engine do more work– first, spreading apart the columns into rows before shoving them through your calculation and then squishing them back down again. But in the right circumstances, it might add only trivial overhead, as in this example.

Still, use this technique with care. It’s almost always better to just make actual inline user-defined functions, which often have identical performance with the benefit of code encapsulation, as they do in this example.

Ceci est une pipe

This isn’t the only way to reduce code duplication in your complex T-SQL statements. If you have two calculations which have similar beginnings but differ in the final steps, break them apart into their separate steps so that you can re-use the first steps of the calculation.

SELECT
	Palindrome1 = SecondaryOperations.Palindrome1,
	Palindrome2 = SecondaryOperations.Palindrome2
FROM		(SELECT d = 'abc') AS RawData
CROSS APPLY	(
				SELECT
					r = REVERSE(RawData.d)
			) AS BaseOperation
CROSS APPLY	(
				SELECT
					Palindrome1 = RawData.d + BaseOperation.r,
					Palindrome2 = BaseOperation.r + RawData.d
			) AS SecondaryOperations;

It’s helpful to think of each APPLY as a pipe operation, taking the values from the previous derived table and passing them into the next to be manipulated. Programming T-SQL in this manner (loosely) approximates modern functional programming techniques.

It keeps each step of the logic smaller, so that it’s easier to understand. And you can expose the intermediary columns to help with debugging.

Department of Redundancy Department

You are going to have business logic in your database, whether you like it or not. Use these and other techniques to make sure it’s only in there once.

References

Photo credit: Water Bird by Evan

Leave a Reply

Your email address will not be published. Required fields are marked *