Ramblings of a Mad Computer Guy: Pivot Tables

For the last few weeks I have been building a web site that would display parent companies with the services they offered. The kicker is that I needed to differentiate if that service was provided by the parent company or some sub contracting company.

So I needed a variety of tables in the database to keep track of the data. First I needed a table to keep all of the various companies data in. This table also contained a flag to differentiate what was a parent company.
Next I needed a table to keep the services in. Then I needed a couple of look up tables that matched services to companies, and sub contracting companies with the parent companies. Lots of tables.

My requirement was to display the data in a grid that had the services run across the top, the parent companies run along the left, and a check box in cells where the company could provide the services.
At any time the services could be added to or removed, so I was left with the prospect of building the table from a set of four lists.
I first did this in C#, with several ForEach loops. Easy enough, and was fine for small amounts of data. Then word came down that another team saw the web site and wanted to use it for their parent company/sub contractors. This team had been in existence for nearly 10 years, so they had hundreds of services and thousands of companies. My C# ForEach loop solution would not scale well for this much data. The performance would be unacceptably slow. What to do? Use the power of SQL server and feed the roll up data to the data service. How?

My first problem was, how to take a table and use its rows as columns. There is no foreach loop in TSQL, so how to solve? With the release of SQL 2005 Microsoft gave us a neat little function for just such a problem. It is the PIVOT function:
SELECT <non-pivoted column>,
[first pivoted column] AS <column name>,
[second pivoted column] AS <column name>,
...
[last pivoted column] AS <column name>
FROM
(<SELECT query that produces the data>)
AS <alias for the source query>
PIVOT
(
<aggregation function>(<column being aggregated>)
FOR
[<column that contains the values that will become column headers>]
IN ( [first pivoted column], [second pivoted column],
... [last pivoted column])
) AS <alias for the pivot table>
<optional ORDER BY clause>;

Looks worse than it is. So we start with my services table:

ServiceID	ServiceName
1	Engine
2	Fuel
3	Wheels
4	Wings

Company Table:

CompanyID	CompanyName	HoldingCompany
1	Bob's Holding Company	true
2	Jim's SubContractor	false
3	Matt's SubContractor	false
4	Bruce's Holding Company	true

Service Match Table:

ServiceMatchID	CompanyID	ServiceID
1	1	1
2	4	1
3	2	2
4	3	3

Company Match Table

CompanyMatchID	ParentCompanyID	ContractorCompanyID
1	1	2
2	4	3

Fun right? Anyway what we first need to do is craft the data table that the PIVOT function will use. What we want is to match up our Companies with our Services, while joining on the other tables to make the return human readable.

SELECT CompanyName, Company.CompanyID, ServiceName, Services.ServiceID, HoldingCompany, HoldingCompanySubContractMatch.SubContractID,
CellType =
CASE
WHEN HoldingCompany = true THEN 'M'
WHEN HoldingCompany = false THEN 'S'
END
FROM ServiceCompanyMatch
JOIN Services ON ServiceCompanyMatch.QualID = Service.ServiceID
JOIN Company ON ServiceCompanyMatch.CompanyID = Company.CompanyID
LEFT OUTER JOIN HoldingSubContractMatch ON company.CompanyID = HoldingSubContractMatch.MOAHolderID

This gives us:

CompanyName	CompanyID	ServiceName	ServiceID	HoldingCompany	SubContractID	CellType
Bob's Holding Company	1	Engine	1	true	2	M
Bruce's Holding Company	4	Engine	1	true	NULL	M
Jim's SubContractor	2	Fuel	2	false	NULL	S
Matt's SubContractor	3	Wheels	3	false	NULL	S

Now we have our data table. Now we have to make some decisions. First how do we want our pivot table to look? We want the columns to be the Services, and the rows to be the companies. So we want to pivot on the the ServiceID column, but display the CellType information.
We also want to be very careful with the Services table. These rows can change at any point, so we have to dynamically load these in to the PIVOT statement.
There is no ForEach loop in TSQL so we have to do something else to load all of the columns. To do that we need to use a little trick, called dynamic SQL. We load the function as a varchar in to an EXEC statement. Inside this varchar we load a flattened list of the rows. We use the COALESCE statement to do this. So, in pieces, here is our statement:

DECLARE @ColumnHeader varchar(MAX)
SELECT @ColumnHeader=
COALESCE(
@ColumnHeader + ',[' +Services.ServiceName+']','['+Services.ServiceName+']'
)
FROM Services

First, we declare a varchar to contain the list of the would be columns. Then we use COALESCE to put the rows in to a comma delimited string.

DECLARE @Pivot varchar(MAX)
SET @Pivot = N'
SELECT
*
FROM
(
SELECT CompanyName, Company.CompanyID, QualName, Quals.QualID, MOAHolder, MOAHolderSubContractMatch.SubContractID,
CellType = CASE
WHEN MOAHolder = 1 then ''M''
WHEN MOAHolder = 0 then ''S''
END
FROM QualCompanyMatch
JOIN Quals on QualCompanyMatch.QualID = quals.QualID
JOIN Company on QualCompanyMatch.CompanyID = Company.CompanyID
LEFT OUTER JOIN MOAHolderSubContractMatch on company.CompanyID = MOAHolderSubContractMatch.MOAHolderID
) DataTable

I then declare the varchar that will be executed in the EXEC statement, and set it to the string that follows, thus the opening tick mark after the N.
Then we grab the entire table from the SELECT statement that we figured out above. We call that result set DataTable.

PIVOT
(
MAX(CellType) FOR ServiceName
IN
(
'+@ColumnHeader+'
)
)as PivotTable'
exec (@Pivot)

Here we pivot the columns and load the column names from the varchar variable that we loaded earlier.
In the syntax of PIVOT we need to have an aggregation function over the column that contains the data we want to display in the rows. In my example, the aggregated column is a varchar, so we use the MAX aggregation function. This does nothing to a varchar, so the data remains untouched.
Next we have the FOR statement determining the column that we wish to pivot on. SQL takes the data in the rows of this column, and sets them as the columns in the new result set we are creating.
The IN statement then names the columns, we load in the @ColumnHeader vachar that we created in the first part of the statement.
Finally we name the new result set "PivotTable", and close the tick. We then run the dynamically crated command in the EXEC function.
The result is

CompanyName	CompanyID	ServiceID	HoldingCompany	SubContractID	Engine	Fuel	Wheels	Wings
Bob's Holding Company	1	1	true	2	M	NULL	NULL	NULL
Bruce's Holding Company	4	1	true	3	M	NULL	NULL	NULL
Jim's SubContractor	2	2	false	NULL	NULL	S	NULL	NULL
Matt's SubContractor	3	3	false	NULL	NULL	NULL	S	NULL

So we now have a data source that now can be called as a stored procedure or a view. It contains everything we need to display the information that we want it to.
I am still working on getting the result set to display with the data rolled up in to the parent companies. If I figure it out I will post!

Ramblings of a Mad Computer Guy

Tuesday, December 14, 2010

Pivot Tables

No comments:

Post a Comment