/* */ ROW_NUMBER and other ranking functions in SQL Server |SQL Works

Thursday, October 4, 2012

ROW_NUMBER and other ranking functions in SQL Server

Have you ever needed to select the records in one column based on the contents of another column? It's a pretty common request, but what to do if you need to select from one column based on the max or min values from another column? Lets base this on a table like this:
CREATE TABLE dbo.SalesRecords
 (
 SalesPersonID int NOT NULL,
 SaleDate datetime,
 SaleCategory smallint,
 SaleAmount money
 )

INSERT INTO dbo.SalesRecords VALUES(100, '10/2/2012', 1, 200.00)
INSERT INTO dbo.SalesRecords VALUES(100, '10/1/2012', 2, 200.00)
INSERT INTO dbo.SalesRecords VALUES(100, '9/1/2012', 1, 300.00)
INSERT INTO dbo.SalesRecords VALUES(100, '9/2/2012', 2, 400.00)
INSERT INTO dbo.SalesRecords VALUES(100, '9/15/2012', 2, 100.00)
INSERT INTO dbo.SalesRecords VALUES(100, '9/20/2012', 1, 250.00)
INSERT INTO dbo.SalesRecords VALUES(200, '10/2/2012', 1, 200.00)
INSERT INTO dbo.SalesRecords VALUES(200, '10/1/2012', 2, 200.00)
INSERT INTO dbo.SalesRecords VALUES(200, '9/1/2012', 1, 300.00)
INSERT INTO dbo.SalesRecords VALUES(200, '9/2/2012', 2, 400.00)
INSERT INTO dbo.SalesRecords VALUES(200, '9/15/2012', 2, 100.00)
INSERT INTO dbo.SalesRecords VALUES(200, '9/20/2012', 1, 250.00)
You may be asked to return a list of the date and amount of first sale made in each month by each sales person. We could try to do that like this, by using the MIN function on the date, but this gives us the SaleAmount for each day, not the min for each month.
SELECT SalesPersonID, saleamount, MIN(saledate)
 FROM salesrecords
 WHERE saledate between '9/1/2012' and '9/30/2012'
  GROUP BY SalesPersonID, saleamount, saledate
  HAVING saledate = MIN(saledate)
A more effective way to handle this is using ROW_NUMBER() as follows:
;WITH ROW_NUM as
(
SELECT SalesPersonID, 
       SaleDate, 
       SaleCategory, 
       SaleAmount, 
       ROW_NUMBER() OVER (PARTITION BY SalesPersonID, datepart(MM,saledate) ORDER BY SaleDate ASC ) as 'Sequence'
 FROM SalesRecords
)
If we just select all from the common Table Expression ROW_NUM, the results look like this:

ROW_NUMBER function results image

So you can see the ROW_NUMBER function has created sequential numbers which restart at the beginning of each month, for each SalesPersonID. In the PARTITION clause of your ROW_NUMBER function you may specify one, or multiple columns to define how the sequential numbers are ordered. Since we chose SalesPersonID and the Month part of sale date, the numbers restart at each change in Month for each SalesPersonID. In order to answer our question on the first sale of each month for each SalesPersonID now we only need to run the following against our CTE:

SELECT SalesPersonID, SaleAmount, SaleDate 
FROM ROW_NUM
WHERE sequence = 1
ORDER BY SaleDate DESC

And it returns a list of the SaleAmount of the first sale of each month for each SalesPersonID as shown here:


If you were interested in the last sale of the month instead of the first you could simply change the ORDER BY hint from ASC to DESC in your ROW_NUMBER() function and you would get the opposite results, sequence number 1 would be the last sale instead of the first.

Part 2 coming later...other ranking functions in SQL Server

Part 2 is now up...NTILE() function usage