Tuesday, May 11, 2010

TOP n, TOP n Percent And TOP n WITH TIES

The TOP option has been available in T-SQL for some time, but enhancements were made in SQL 2005 that allow the use of a variable with TOP along with using TOP with an INSERT, UPDATE, or DELETE statement.  The TOP options is ideal for limiting a data set when trying to obtain a data sample from a table(s), but can also be used to qualify a specific data set.
Consider being requested to show the 6 most expensive list prices of all available products.  This could easily be done with TOP, but there are some internals to understand.  The below code creates a table that is populated with the sample data from the AdventureWorks2008 database Production.Product table.  A clustered index is added to the table on the ProductNumber to demonstrate some of the possible pitfalls and/or misunderstandings of using TOP.

USE AdventureWorks2008;

CREATE TABLE topcost(
Name           NVARCHAR(50),
ProductNumber  NVARCHAR(25),
ListPrice      MONEY);
GO

CREATE CLUSTERED INDEX   cl_lprice
ON topcost(ProductNumber);
GO
INSERT topcost
SELECT Name,
      
ProductNumber,
      
ListPrice

FROM Production.Product
WHERE ProductNumber > 'H'

Based on the request for the 6 most expensive list prices of all products the immediate reaction is to whip up a quick and dirty TOP query.  This appears easy enough, but both the request and knee jerk response need a bit more clarification to assure that the desired results are obtained.  Ignoring the underlying data structure and assuming that sql server knows what is required then the below query would work:

SELECT TOP 6 Name,
      
ProductNumber,
      
ListPrice

FROM topcost

The results would be far from what was requested as the list price is in no apparent order:
LL Mountain Handlebars     HB-M243    44.54
ML Mountain Handlebars    HB-M763    61.92
HL Mountain Handlebars    HB-M918    120.27
LL Road Handlebars           HB-R504    44.54
ML Road Handlebars          HB-R720    61.92
HL Road Handlebars          HB-R956    120.27
The first issue is the fact that the table has a clustered index on the ProductNumber which enforces the physical ordering of the leaf level data.  This means that the data is stored based on the ProductNumber, and as ascending or descending order was not specified in the creation of the clustered index, in ascending order.  To assure that that the 6 most expensive list prices are returned the query must contain an ORDER BY clause qualifying this requirement and as the default behavior of ORDER BY is to return in ascending order the opposite, descending, must be noted:

****The presence of a clustered index does not necessarily assure that the result set will be returned in that order.  If ordering is important than an ORDER BY clause should always be used to assure consistency.

SELECT TOP 6 Name,
      
ProductNumber,
      
ListPrice

FROM topcost
ORDER BY ListPrice DESC

The results are close to what is requested now:

HL Road Rear Wheel                RW-R820        357.06
HL Mountain Rear Wheel          RW-M928       327.215
ML Road Rear Wheel               RW-R762        275.385
Touring Rear Wheel                  RW-T905         245.01
ML Mountain Rear Wheel         RW-M762       236.025
HL Mountain Seat Assembly     SA-M687        196.92

The only possible problem arising now is not visible unless we change the query to include a few more rows into the result set

SELECT TOP 9 Name,
      
ProductNumber,
      
ListPrice

FROM topcost
ORDER BY ListPrice DESC

The results now show that there are three products with a list price of 196.92 before this tie is broken with a lower price:

HL Road Rear Wheel                RW-R820       357.06
HL Mountain Rear Wheel          RW-M928      327.215
ML Road Rear Wheel               RW-R762       275.385
Touring Rear Wheel                  RW-T905       245.01
ML Mountain Rear Wheel         RW-M762     236.025
HL Mountain Seat Assembly     SA-M687      196.92
HL Road Seat Assembly           SA-R522       196.92
HL Touring Seat Assembly       SA-T872       196.92
All-Purpose Bike Stand            ST-1401        159.00

The initial request from for the 6 most expensive products was very specific, but the option of returning the TOP results, including ties, is often overlooked.  To return a complete result set including ties requires minimal changes to the query:

SELECT TOP 6 Name,
     
ProductNumber,
     
ListPrice

FROM topcost
ORDER BY ListPrice DESC

The final result would be:

HL Road Rear Wheel               RW-R820       357.06
HL Mountain Rear Wheel         RW-M928      327.215
ML Road Rear Wheel               RW-R762      275.385
Touring Rear Wheel                  RW-T905       245.01
ML Mountain Rear Wheel         RW-M762     236.025
HL Mountain Seat Assembly     SA-M687      196.92
HL Road Seat Assembly           SA-R522       196.92
HL Touring Seat Assembly       SA-T872       196.92

A total of 8 rows, the extra two are the result of having identical values to the 6th row. as a result of using WITH TIES.

TOP also supports using PERCENT, which returns the requested percent sampling of the result set:

SELECT TOP 5 PERCENT WITH TIES Name,
      
ProductNumber,
      
ListPrice

FROM topcost
ORDER BY ListPrice DESC  

The above query returns 15 rows out of the 261.  13.xx rows would be 5 percent, but as WITH TIES was used the additional row(s) were returned:

HL Road Rear Wheel                RW-R820    357.06
HL Mountain Rear Wheel         RW-M928    327.215
ML Road Rear Wheel              RW-R762     275.385
Touring Rear Wheel                 RW-T905     245.01
ML Mountain Rear Wheel        RW-M762    236.025
HL Mountain Seat Assembly    SA-M687    196.92
HL Road Seat Assembly          SA-R522    196.92
HL Touring Seat Assembly      SA-T872    196.92
All-Purpose Bike Stand           ST-1401    159.00
ML Road Seat Assembly         SA-R430    147.14
ML Touring Seat Assembly     SA-T612    147.14
ML Mountain Seat Assembly    SA-M237    147.14
LL Road Seat Assembly          SA-R127    133.34
LL Mountain Seat Assembly    SA-M198    133.34
LL Touring Seat Assembly      SA-T467    133.34

In SQL Server 2005 enhancements were made to the TOP option including allowing the ability to pass an expression to TOP, there by no longer requiring using dynamic t-sql for such a query:

DECLARE @i BIGINT
SET @i = 6
SELECT TOP (@i) Name,
      
ProductNumber,
      
ListPrice

FROM topcost
ORDER BY ListPrice DESC

The results set is again 8 rows. 

******Unlike passing a static numeric value with TOP a variable must be passed between parenthesis in order to parse. Using parenthesis with numeric values will also parse and may be good practice to assure consistency in writing your queries.

SQL 2005 also introduced the ability to use TOP in INSERT, UPDATE, DELETE.  There are several ways to implement this, the below creates a new table mirroring the topcost table, created above, to exhibit the differences in each technique:

CREATE TABLE trans(
Name           NVARCHAR(50),
ProductNumber  NVARCHAR(25),
ListPrice      MONEY);
GO

INSERT TOP (2) INTO trans  
SELECT Name
      
ProductNumber
      
ListPrice

FROM topcost;
GO

The above use of TOP in the INSERT statement will insert the the first two rows from the result set based on the order the results are returned:
LL Mountain Handlebars    HB-M243    44.54
ML Mountain Handlebars    HB-M763    61.92
If you were again tasked with inserting the top 6 rows based on the most expensive list price then the below query would be more appropriate:

INSERT trans  
SELECT TOP 6 WITH TIES Name,
     
ProductNumber,
     
ListPrice

FROM topcost
ORDER BY ListPrice DESC ;
GO

This will insert the 8 expected rows based on the ListPrice in descending order. 
The below will clean up the tables used for these examples

DROP TABLE topcost;
GO

DROP TABLE trans;
GO

No comments:

Post a Comment