Matt Gordon hosts this month’s T-SQL Tuesday and asks us to write about how we’ve used a new capability of the SQL Server ecosystem to solve an old problem. At first blush, this seems like a good way to show concrete benefits to abstract new concepts. But I wonder if it sets the bar too high. In 2015, Brent Ozar highlighted how few instances use the latest version of SQL Server. Then, 2014 was only a few percentage points above the ancient SQL Server 2000 and the 2005 version had 4 times the presence of 2014. It’s better now, of course, but still, only 8% of installations would have provided the opportunity to use the newfangled solutions Matt is looking for.For example, we only just purchased 2016 at our shop; we’re still running 2008 R2. So I have no real world perspective to offer. That said, I am excited about all of the new and shiny.
We do a lot of development in T-SQL. We also make heavy use of in-line user-defined functions (iUDF). While they’re not perfect, we like them for encapsulating code while remaining transparent to the optimizer. They behave like views, but they take parameters. That enables a pattern separating data gathering logic from business logic– all in the database layer.
For example, consider the calculation of the profit for an order. Traditionally, you might create an OrderWithProfit view including a computed column which subtracted the costs from the charges. Similarly, we would create an iUDF called GetOrderWithProfit. However, rather than have it contain the profit calculation directly, we would instead have it call a separate iUDF called CalcOrderProfit which would accept @Charges and @Costs and return the profit as a column. The parent GetOrderWithProfit would APPLY CalcOrderProfit and because of the optimizer transparency, its query plan would be indistinguishable from the view.
Importantly, the business logic of profitability (charges – costs) is now encapsulated on its own. In the future, if the profitability calculation is changed (say, we need to globally add a 3% sales commission cost), we can adjust only the CalcOrderProfit function. Also, if we had a variety of functions which involved order information and included this profit calculation, all of those would automatically reflect the new rules. (You could also accomplish this example with a calculated column in the table, but that breaks down with more complex calculations.) And finally, we now have a deterministic function which computes business logic and therefore we can independently, automatically test just our business logic. And the tests are much easier to write– simple SELECT statements, rather than looping over stored procedure calls.
For those with a programming background, you might recognize this concept as a rudimentary form of dependency injection, which is a fancy way to describe passing things by parameters. The principle is that the down-level functions and procedures don’t ever reach out of themselves; they are fed everything they need. In addition to simplifying testing, it also allows us to use the logic outside of its original context. For example, in addition to calculating the profit on real orders in our example, we can use the same logic to calculate what the profit *would be* for merely conceptual orders, as would be needed in a quoting system.
This pattern scales with the complexity of the logic– reaping more of its isolation, reuse, and testing benefits– so long as the parameters involved all come from a single row of the outside query (even if they come from a variety of sources all gathered together). But now, imagine if our example profit calculation was based on order detail lines rather than total charges and costs, and that there was a cost discount of 10% if we had more than 2 detail lines. Calculating profit is no longer a matter of order-level information; it now requires multiple rows of data for each order.
One option, if the logic is amenable, is to aggregate the data and then send it into a calculating function. In our example, we would total the charges and costs and count the number of lines, and then pass those aggregates into CalcOrderProfit. However, that bleeds some of the business logic– the concept of aggregating itself– out of the calculation function. And as the logic becomes more complex, more of it seeps out into the aggregating function, which can only be tested with data in tables. For a truly independent profit calculator, it must be fed everything it needs for its calculation, even if that’s an array of information.
As we know from Erland Sommarskog’s treatise on passing data among stored procedures, all of the methods have significant drawbacks. Furthermore, when looking at them from the lens of passing data to an iUDF *on-the-fly* as part of a SELECT statement, all fall away except for XML. It’s the only method which doesn’t require multiple statements and, more importantly, external resources such as tables. Unfortunately, the formation and subsequent shredding of XML brings not only additional code but also significantly hampers performance and destroys query plans. For small data sets, it’s not often a deal breaker, but it doesn’t scale well– with number of iterations or volume of data.
Those performance issues have constrained our use of these patterns in our environment. But SQL Server 2016 is poised to change all of that. I’ve talked about how 2016’s JSON support isn’t as feature-filled as for XML. For example, there’s no native JSON type, whereas XML is stored in a special binary format. I assumed that the special treatment of XML would result in better performance– it wouldn’t have to parse it from scratch each time it was referenced. But I think I underestimated the impact of the additional features required to support XML. After all, XML querying supports FLWOR, a mini-language complete with control of flow statements. The long and short is that for some scenarios, SQL Server’s JSON handling is orders of magnitude faster than XML– 8 ms for raw SQL, 169 ms for JSON, and 46511 ms for XML.
Separating your business logic into stand-alone iUDFs allows (1) code encapsulation (for better isolation and comprehension), (2) easier logic reuse (for faster development and more consistent application behavior), and (3) easier testing and out of context use. For simple logic, there is no performance degradation. For complex logic involving data sets, you could use XML and suffer a performance hit, or upgrade to SQL Server 2016 and use JSON for near native performance.