T-SQL Tuesday #43 – Hello, Operator?

In this iteration of T-SQL Tuesday, Rob Farley came up with a really good topic about Plan Operators.  While I do enjoy taking the opportunity during T-SQL Tuesday to tell a personal SQL story and relate it to the topic, I will actually use this opportunity to expand my academic knowledge of SQL Server Plan Operators.  Although I have seen almost all of the plan operators in my own SQL development experience, there has been one in particular that I have really never researched: The Eager Spool.

SQL Server uses several operators to build a query plan and return the result set.  This happens in Query Optimizer and is seen in the Execution Plan immediately after executing a SQL statement.  Like the print spooler in the Windows OS, the Query Optimizer uses a series of spoolers to fetch data from varying records into an intermediate result set in tempdb.  Don’t worry though – the lifetime of the intermediate result set in tempdb only lasts as long as the query is executing, so overhead should be light.

Taking this one step further, the Eager Spool performs an additional operation that allows the intermediate result set to be “primed”, in essence, so as to facilitate a quicker execution of the next operation.  In other words, this spool operator eagerly awaits the next operation so it will fill the intermediate result set in tempdb.  If the next operation turns out to be the final result set, or the result set of the eager spool is not needed (result set needs to be rebinded to another operator), the result set in tempdb is removed.  However, if the next operation is in need of the result set produced by the eager spool in tempdb, the first record in the result set is used to perform the operation and each record thereafter (a process referred to a rewinding the spool).  An added benefit to the rewinding operation is that the result set does not need to be rebinded, thus cutting down on reload time of the result set.

To illustrate this, let’s create a simple table and perform an update using a non-clustered index.  First, our table and index:

CREATE TABLE SalesPeople (
    Region int
    ,Territory int
    ,SalesPerson varchar(200)
);

CREATE NONCLUSTERED INDEX IX_SalesPeople_Territory ON SalesPeople(Territory);

Now, let’s populate it with some sample data:

DECLARE @index int;
SET @index = 20;
WHILE @index < 30
BEGIN
    INSERT INTO SalesPeople(Region, Territory, SalesPerson)
    VALUES (1, @index, 'Oliver');
    SET @index += 1;
END

TSQL Tues #43 Table 01

So far, so good.  Now here is where the magic happens.  By using a query hint in the UPDATE, we force the query optimizer to prime the intermediate output using the eager spool.  This allows the update to happen on a row-by-row basis in tempdb.

UPDATE SalesPeople
SET Territory = 40
FROM SalesPeople WITH(INDEX = IX_SalesPeople_Territory);

TSQL Tues #43 Output 01

TADA!

While the eager spool, and any SQL spool for that matter, may seem impressive enough to use in every query instance, it is worth noting to use query hints sparingly.  Whereas the eager spool may help with large DML operations, it may hurt in other areas of index and schema maintenance.  As a general rule of thumb, just let the query optimizer determine when to spool – it knows best!

One of the key points to remember here is that the Eager Spool is a logical operator.  That is to say it will illustrate the operation to be performed rather than performing the operation itself.  The execution of the logical operator is left to the physical operators of the execution plan.  Personally, I like to think of these operators in terms of electrical circuits: logical operators are to logic gates as physical operators are to transistors.  Really brings the whole picture full circuit, wouldn’t you agree?