By Riley Major, 2018-06-29
One of the many uses of tally tables is to blow up some rows. Normally, you make your database reduce rows by joining tables to others and applying criteria. But sometimes you want it to make more rows.
Phone Home
Imagine a simple table containing contact information. It has two places for phone numbers (Phone1
and Phone2
). Let’s say you wanted to split those off into a related phone numbers table so you could support many more numbers for a single contact. As part of that process, you’d need to create two rows for each existing contact– one for each phone number.
One way to get two phone numbers is to handle each phone number as a separate process, combining a list of all contacts and their first number with those same contacts and their second number. So you’d SELECT
everything from the table once for the first phone number and then UNION
it with another SELECT
. This works, but if you have a large list of columns or a complex set of JOIN
s, you end up repeating a lot of code.
Instead, what if you simply, magically double the rows and then pick and choose the columns you wanted from each of the rows? You’d pick the first phone number for the first copy and the second phone number for the second copy. So what’s the magic? The Cartesian product.
Criss-Cross Applesauce
Normally, when you JOIN
two tables, you want to correlate them somehow. If you had contacts in one table and phones in another, you’d want to make sure the phone numbers in your final query correlated to the relevant contact. The contact name might be repeated for each phone number, but it’s not just willy-nilly. A Certesian product is (seemingly) willy-nilly.
If you took a table with “A” and “B” and combined it with another table of “A” and “B”, you’d often want to match those values up. You’d take a query like this:
SELECT * FROM (SELECT 'A' UNION ALL SELECT 'B') AS L(Col) JOIN (SELECT 'A' UNION ALL SELECT 'B') AS R(Col) ON L.Col = R.Col;
And you’d get results like this:
A | A |
B | B |
That’s because you’re correlating the values in one table with the other. What if you removed that correlation? Well, you’d get an error if you just removed the JOIN
condition altogether, so let’s just fake its removal by making it a tautology like 1 = 1
:
SELECT * FROM (SELECT 'A' UNION ALL SELECT 'B') AS L(Col) JOIN (SELECT 'A' UNION ALL SELECT 'B') AS R(Col) ON 1 = 1;
Now there’s nothing to correlate the right table with the left table. So it takes every row in the left table and matches it with every row in the right table and you get this:
A | A |
A | B |
B | A |
B | B |
Instead of faking things out with that tautology, you can use a CROSS JOIN
to achieve the same results:
SELECT * FROM (SELECT 'A' UNION ALL SELECT 'B') AS L(Col) CROSS JOIN (SELECT 'A' UNION ALL SELECT 'B') AS R(Col);
SQL is pretty flexible like that. You can even take this CROSS JOIN
and make it behave like a “regular” INNER JOIN
again by putting the criteria in a WHERE
clause:
SELECT * FROM (SELECT 'A' UNION ALL SELECT 'B') AS L(Col) CROSS JOIN (SELECT 'A' UNION ALL SELECT 'B') AS R(Col) WHERE L.Col = R.Col;
Kaboom!
So you can use a CROSS JOIN
to get a list of everything in one table combined with everything in another table, all willy-nilly. If that “other table” is just a derived table of numbers (or a materialized tally table), you get as many copies of you original table as you have rows in your table of numbers. So if we want 2 copies of our table, we can just cross it with a tally table with two rows:
SELECT * FROM (SELECT 'A') AS C(Col) CROSS JOIN (SELECT 1 UNION ALL SELECT 2) AS TallyTable(n);
That gives us two rows for each row in our our first table:
A | 1 |
B | 2 |
Just in CASE
For our phone number splitting, we don’t just want a copy, of course. We’re copying with a purpose. We want to fiddle with each copy, grabbing a different phone number each time. When we’re looking at the first copy, we want the first phone number (Phone1
); or else we can get the second (Phone2
).
Let’s demonstrate by populating our simple contact table and then use two different methods to gather up a list of contacts with each phone number in a separate row.
DECLARE @Contact table ( ContactName varchar(50), Phone1 varchar(50), Phone2 varchar(50) ); INSERT INTO @Contact (ContactName, Phone1, Phone2) VALUES ('A','555-1212','555-1213'), ('B','555-1214','555-1215'); /* UNION Method */ SELECT ContactName, PhoneNum = 1, Phone = Phone1 FROM @Contact UNION ALL SELECT ContactName, PhoneNum = 2, Phone = Phone2 FROM @Contact; /* Exploding Rows Method */ SELECT ContactName, PhoneNum = t.n, CASE WHEN t.n = 1 THEN Phone1 ELSE Phone2 END FROM @Contact CROSS JOIN (SELECT n = 1 UNION ALL SELECT n = 2) AS t;
Eat that cake!
In this short example, the code explosion method looks about the same size as the UNION
method, but it scales linearly with additional columns to split, whereas the UNION
method would scale geometrically, duplicating the full list of columns– and the full FROM
clause– for each iteration.
But wait, there’s more. In addition to being more concise code, with less error-prone duplication, it also probably performs better. In this example, if you use SET STATISTICS IO ON
, you can see that it uses half as much IO:
(4 rows affected) Table '#ADB44FBF'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. (1 row affected) (4 rows affected) Table '#ADB44FBF'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
And the query plans bear that out.
So you can have your exploding cake and you can eat it, too. Go blow up some queries.