A database is like a book with no index--you store information in it, but unless you know how to get it out in a particular order, it's not going to be very useful. Ordering is one of the most often used aspects of SQL, yet one of the most under-used parts of the SQL language at the same time. You can often do some really cool things through simple ordering of your data, and present the data in a different way. This article addresses ordering for database queries in any situation, whether it's in a stored procedure, or in inline SQL code in an ASP, ASP.NET, PHP, ColdFusion, or JSP page.
Note: The code shown has been tested in MS SQL Server, but will work in other databases that support the syntax. In many cases, if your database does not support certain keywords, you might be able to find an similar method of doing the same thing using equivalent functionality. MS Access queries are also shown when possible.
You might be able to improve the accuracy of the ordering, such as in this listing of titles:
'A Tale of Two Cities'
'Dreamweaver MS: The Complete Reference'
'The Andromedia Strain'
Obviously, this listing is ordered alphabetically, but it is all wrong because of the insignificant words ('A', 'The') in the example. I'll address correct ordering of titles in this article.
Another area that is often neglected is the ordering of numbers that are stored as varchar, nvarchar, text, or other textual data type values. Take this listing as an example:
ID,Name
'1','Tom'
'10','Jack'
'100','Steve'
'2','Jim'
'20','Frank'
The numbers are ordered alphabetically, which puts them out of order as numbers. This will be addressed in the article.
Forcing an order will be addressed: you can force data to be ordered based on some pre-defined criteria that you set up. There are many uses for this type of ordering, and I will show you a few examples.
Ordering within a subgroup will be addressed. What do I mean by that? Well, consider this list of data:
'10','1','Tom','Muck'
'20','1','ColdFusion'
'20','1','Flash Remoting'
'10','2','Ray','West'
'20','2','Dreamweaver MX'
'20','2','ASP.NET'
'20','2','SQL Server'
'20','2','Content Management'
'10','3','Massimo','Foti'
'20','3','ColdFusion'
'20','3','JavaScript'
'20','3','Dreamweaver MX'
This set of data does not follow the standard relational data model -- it is transactional data. The rows that begin with a '10' are the key rows and the rows that begin with '20' are the data (or line item rows) for each key row. They are linked by the record numbers in the second column. If we want to order the data by the last name of the '10' row, then by the third field in the '20' row, we might have a hard time coming up with one SQL statement that will do the trick.
Database tools are not going to help you build sorts like this, nor are the query building tools of Dreamweaver MX, CF Studio, or other web development programs. They each require hand-coding of your SQL. After having built the query, however, you can easily create a stored procedure out of it or paste it into Dreamweaver MX when creating a recordset.
The proper way to organize a series of book, article, or movie titles involves alphabetizing by the first significant word, or removing certain common words that may be at the beginning of your title. The English words commonly removed from a sort would be: a, an, the. Other languages may vary, but the concept is the same. If your data is not stored in this way, however, you will have a hard time coming up with a sort order that can be easily browsed by your end user.
Enter SQL.
Using Structured Query Language you can carefully structure the way that the title is sorted and displayed. Some variations on SQL do not allow this type of query (Access, for example) but the majority of RDBMS do. There are three main principles involved with this type of sort:
*CASE is not supported in MS Access. Sometimes you can use IIF instead of CASE.
These are three of the basic constructs in SQL coding that will make the job easy.
I will be using the SQL Server Pubs database as an example, but you can use any table in any database that has a title column. Using the Titles table as an example, I can write the following SQL statement:
SELECT title, price
FROM titles
ORDER BY title
That gives me the following results:
Title | Price |
---|---|
But Is It User Friendly? | 22.9500 |
Computer Phobic AND Non-Phobic Individuals: Behavior Variations | 21.5900 |
Cooking with Computers: Surreptitious Balance Sheets | 11.9500 |
Emotional Security: A New Algorithm | 7.9900 |
Fifty Years in Buckingham Palace Kitchens | 11.9500 |
Is Anger the Enemy? | 10.9500 |
Life Without Fear | 7.0000 |
Net Etiquette | NULL |
Onions, Leeks, and Garlic: Cooking Secrets of the Mediterranean | 20.9500 |
Prolonged Data Deprivation: Four Case Studies | 19.9900 |
Secrets of Silicon Valley | 20.0000 |
Silicon Valley Gastronomic Treats | 19.9900 |
Straight Talk About Computers | 19.9900 |
Sushi, Anyone? | 14.9900 |
The Busy Executive's Database Guide | 19.9900 |
The Gourmet Microwave | 2.9900 |
The Psychology of Computer Cooking | NULL |
You Can Combat Computer Stress! | 2.9900 |
You can see the the word "The" is in 3 of the titles, which completely nullify the sort order. Using a CASE statement and string manipulation, we can pull the first several characters off the front of the title and do things if the word is "a", "an", or "the". First, the string manipulation. Different databases will have different syntax, but this works in SQL Server. What you want to do is test the first two characters for "a ", the first three characters for "an ", and the first four characters for "the ", and perform a different action for each of these. You'll pull the word off the beginning of the title and put it at the end, using a preceding comma. You'll then set the new "title" field to an alias named "newtitle". Here is the code:
SELECT NewTitle =
CASE Left(Title, 4)
WHEN 'The '
Then Right(Title, len(Title)-4) + ', The'
WHEN 'An _'
Then Right(Title, len(Title)-2) + ', An'
WHEN 'A __'
Then Right(Title, len(Title)-2) + ', A'
ELSE
Title
END
, price
FROM titles
ORDER BY NewTitle
Basically we are looking at the first 4 characters of the title, which is pulled off the Title column with a LEFT(Title, 4). You can see what follows next. Each variation on insignificant words (the, a, an) is given a separate condition, with the _ character filling in for a single character that can be anything. Finally, the default case is to use the title with no alteration. Lastly, we order by the column alias NewTitle rather than Title. This gives us the results we want:
Title | Price |
---|---|
Busy Executive's Database Guide, The | 19.9900 |
But Is It User Friendly? | 22.9500 |
Computer Phobic AND Non-Phobic Individuals: Behavior Variations | 21.5900 |
Cooking with Computers: Surreptitious Balance Sheets | 11.9500 |
Emotional Security: A New Algorithm | 7.9900 |
Fifty Years in Buckingham Palace Kitchens | 11.9500 |
Gourmet Microwave, The | 2.9900 |
Is Anger the Enemy? | 10.9500 |
Life Without Fear | 7.0000 |
Net Etiquette | NULL |
Onions, Leeks, and Garlic: Cooking Secrets of the Mediterranean | 20.9500 |
Prolonged Data Deprivation: Four Case Studies | 19.9900 |
Psychology of Computer Cooking, The | NULL |
Secrets of Silicon Valley | 20.0000 |
Silicon Valley Gastronomic Treats | 19.9900 |
Straight Talk About Computers | 19.9900 |
Sushi, Anyone? | 14.9900 |
You Can Combat Computer Stress! | 2.9900 |
Picking the first 4 characters allowed us to simplify the CASE statement, rather than picking first 4 for "the ", first 3 for "an ", and first 2 for "a ".
In MS Access you would not be able to code a query easily that accomplished this same thing, because the basic construct of the IIF statement only allows two possible outcomes, whereas CASE gives you an infinite number of options. To create a similar query in MS Access that only checks for the word "The ", the following will work:
SELECT
IIF(Left(Title,4) = 'The ',Right(Title, len(Title)-4) + ', The',
title)
as
newtitle
, price
FROM dbo_titles
ORDER BY
IIF(Left(Title,4) = 'The ',Right(Title, len(Title)-4) + ', The',
title)
Once in a while you will find that you have a database column stored as text when in fact your values are all numbers. If you try to sort the field with a SQL statement like this
SELECT ID, Name from MyTable
ORDER BY ID
you will end up with a sort order like this:
ID | Name |
---|---|
1 | Tom |
10 | Jack |
100 | Steve |
2 | Jim |
20 | Frank |
To be able to sort the ID field numerically, you need to be able to covert the data into numeric data. Again, databases such as MS Access will not support this on-the-fly conversion of data types, but most modern RDBMS systems do. To sort the ID column numerically rather than alphabetically, you can simply CAST the resulting column in your ORDER BY statement as a number:
SELECT ID, Name from MyTable
ORDER BY CAST(ID as int)
Now your sort looks like this:
ID | Name |
---|---|
1 | Tom |
2 | Jim |
10 | Jack |
20 | Frank |
100 | Steve |
To implement a workaround in MS Access, you can use SUM on the ORDER BY clause, but you also have to group your query by all fields:
SELECT ID, Name
FROM MyTable
GROUP BY id, Name
ORDER BY sum(ID);
What if your data contains some numbers and some text characters? Well, unfortunately in basic SQL you are out of luck because there is no way to extract parts of fields easily. The SQL language does not contain regular expressions. You could create a function that removes the numeric characters and puts them into another field, then sorts numerically, but it is not an easy proposition, given the possible varying lengths of the numeric portion of the number.
If you know that the text portion of the data is the same length, however, the SQL can be easily written. Take a look at this sample data:
id | Name | user_id |
---|---|---|
1 | Tom | user1 |
3 | Jess | user10 |
5 | Jim | user11 |
4 | Mike | user12 |
2 | Frank | user2 |
6 | Bruce | user20 |
7 | Clint | user21 |
8 | Tony | user8 |
You can see the user_id field is part numeric and part text. If you wanted to sort on the text portion, but also keep the numeric portion in numeric order, this would be easy in SQL, because the text portion is 4 characters in every row:
SELECT id, Name, user_id
FROM SampleNumeric
ORDER BY CAST(RIGHT(user_id, len(user_id)-4) as int)
In this query we are simply taking the portion of the field that begins at the 5th character by choosing the RIGHT portion of the field using the length of the field minus 4. The MS Access version of the same query would look like this:
SELECT id, Name, user_id
FROM SampleNumeric
GROUP BY id, name, user_id
ORDER BY SUM(RIGHT(user_id, len(user_id)-4));
What if the text portion was varying in length, however? We can do this in SQL Server and other databases that support PATINDEX and REVERSE (or their equivalents) but not MS Access. The following table contains a user_id field that has varying length text followed by number. The user_id field is in alphabetical order:
id | Name | user_id |
---|---|---|
2 | Frank | alexandria2 |
6 | Bruce | alexandria20 |
7 | Clint | alexandria21 |
1 | Tom | buffalo1 |
3 | Jess | buffalo10 |
5 | Jim | buffalo11 |
4 | Mike | buffalo12 |
8 | Tony | buffalo8 |
Now it becomes tricky without regular expressions. The SQL to pull the numeric portion out of the field is a little trickier, but it can be done. The following SQL will sort the data on the numeric portion of user_id only:
SELECT id, Name, user_id
FROM SampleNumeric2
ORDER BY
CAST(
RIGHT(user_id, PATINDEX('%[0-9][^0-9]%', REVERSE(user_id))
) as int)
This gives the result:
id | Name | user_id |
---|---|---|
1 | Tom | buffalo1 |
2 | Frank | alexandria2 |
8 | Tom | buffalo8 |
3 | Jess | buffalo10 |
5 | Jim | buffalo11 |
4 | Mike | buffalo12 |
6 | Bruce | alexandria20 |
7 | Clint | alexandria21 |
Let's examine the SQL: The CAST() function is just like you've seen in the other examples. We will cast the result of the inner expression as an integer. The PATINDEX and REVERSE functions are doing something strange though:
CAST(
RIGHT(user_id, PATINDEX('%[0-9][^0-9]%', REVERSE(user_id))
) as int)
Look at REVERSE first: this reverses the string in the field so that the numbers are first. Because we don't have regular expressions in SQL, it is difficult to pick the first occurrence of a range of characters. After reversing the characters, the PATINDEX function returns the LAST numeric character it finds. This is made possible by the use of the wildcard % in the first character position to find any character, the range of characters to search for in the next position [0-9] (which finds one character), and the range of characters NOT to search for [^0-9] (one character, once again.)
That is good if you need it sorted ONLY numerically, but what if you want it sorted on the numeric AND text portion? The following SQL will do that:
SELECT id, Name, user_id
FROM SampleNumeric2 /**/
ORDER BY
LEFT(user_id, len(user_id)-patindex('%[0-9][^0-9]%',reverse(user_id))),
CAST(
RIGHT(user_id, patindex('%[0-9][^0-9]%',reverse(user_id))
) as int)
The query will return the results intended:
id | Name | user_id |
---|---|---|
2 | Frank | alexandria2 |
6 | Bruce | alexandria20 |
7 | Clint | alexandria21 |
1 | Tom | buffalo1 |
8 | Tony | buffalo8 |
3 | Jess | buffalo10 |
5 | Jim | buffalo11 |
4 | Mike | buffalo12 |
The query is based on the previous query, but includes the exact opposite functionality in the first ORDER BY clause to return the alphabetic portion of the field:
LEFT(user_id, len(user_id)-patindex('%[0-9][^0-9]%',reverse(user_id)))
The number of numeric characters is subtracted from the length of the field to yield a result of ONLY text characters that we know are in the field.
These types of queries only work if you know what kind of data you can expect in the field, such as text characters followed by numeric characters. Any mixing of characters that you can't predict makes a query like this fail.
One thing that you can do easily though is to force the fields that begin with numeric characters to the bottom of your listing, rather than display at the top. If you want to force the rows to the bottom, try this:
SELECT ProductName
FROM Products
ORDER BY ISNUMERIC(left(productname,1))
, productname
Here you are creating a new unnamed column to sort by: the code ISNUMERIC(left(productname, 1)) will return a 1 or a 0 depending on whether the first letter of the field is a number or not. If it is, the return value is 1. If it isn't, the return value is 0. Since you are sorting on that field first, all the 0s will rise to the top. The 1s will fall to the bottom. An MS Access version of this query might look like this:
SELECT ProductName
FROM Products
ORDER BY IIF(left(ProductName,1) < 'a',1,0), ProductName
Just as you can force the numeric values to the bottom of your resultset in the previous example, you can force any specific value to the top or bottom of the results in the same way. For example, the Employee table in the Pubs database contains employee names (the fname and lname fields) with employee id numbers (the emp_id field). The IDs have two different formats, if you examine the data. Some of the emp_id fields contain a letter followed by a dash character and several other letters. Some are made up entirely of characters. To order on emp_id, you would use this SQL:
SELECT emp_id, fname, lname
FROM employee
ORDER BY emp_id,lname, fname
You would get these results:
emp_id | fname | lname |
---|---|---|
A-C71970F | Aria | Cruz |
A-R89858F | Annette | Roulet |
AMD15433F | Ann | Devon |
ARD36773F | Anabela | Domingues |
CFH28514M | Carlos | Hernadez |
CGS88322F | Carine | Schmitt |
DBT39435M | Daniel | Tonini |
DWR65030M | Diego | Roel |
ENL44273F | Elizabeth | Lincoln |
F-C16315M | Francisco | Chang |
GHT50241M | Gary | Thomas |
H-B39728F | Helen | Bennett |
HAN90777M | Helvetius | Nagy |
HAS54740M | Howard | Snyder |
JYL26161F | Janine | Labrune |
KFJ64308F | Karin | Josephs |
KJJ92907F | Karla | Jablonski |
L-B31947F | Lesley | Brown |
LAL21447M | Laurence | Lebihan |
M-L67958F | Maria | Larsson |
M-P91209M | Manuel | Pereira |
M-R38834F | Martine | Rance |
MAP77183M | Miguel | Paolino |
MAS70474F | Margaret | Smith |
MFS52347M | Martin | Sommer |
MGK44605M | Matti | Karttunen |
MJP25939M | Maria | Pontes |
MMS49649F | Mary | Saveley |
PCM98509F | Patricia | McKenna |
PDI47470M | Palle | Ibsen |
PHF38899M | Peter | Franken |
PMA42628M | Paolo | Accorti |
POK93028M | Pirkko | Koskitalo |
PSA89086M | Pedro | Afonso |
PSP68661F | Paula | Parente |
PTC11962M | Philip | Cramer |
PXH22250M | Paul | Henriot |
R-M53550M | Roland | Mendel |
RBM23061F | Rita | Muller |
SKO22412M | Sven | Ottlieb |
TPO55093M | Timothy | O'Rourke |
VPA30890F | Victoria | Ashworth |
Y-L77953M | Yoshi | Latimer |
Suppose you wanted to sort by emp_id, but you wanted all the emp_id fields with the dash character to magically rise to the top. Use string manipulation and the CASE statement once again:
SELECT emp_id, fname, lname
FROM employee
ORDER BY
CASE WHEN LEFT(emp_id, 2) LIKE '%-' THEN
0
ELSE
1
END
, emp_id, lname, fname
The equivalent query in MS Access would be:
SELECT emp_id, fname, lname
FROM Employee
ORDER BY
IIF(left(emp_id,2) LIKE '*-',0,1)
,emp_id, lname, fname;
The Access IIF statement is saying "If the left 2 characters contain any number of characters followed by a - sign, use 0 to sort by, otherwise use 1."
In SQL Server you can also simply use a PATINDEX function :
SELECT emp_id, fname, lname
FROM employee
ORDER BY
patindex('_-%',emp_id) desc
, emp_id, lname, fname
The PATINDEX function returns a 1 or a 0. If the second character is a '-' character, it will return 1. We order the field in DESC order because we want the 1 at the top, and the 0 at the bottom. Notice the emp_id field is still in perfect alphabetical order otherwise. The resulting sort order looks like this:
emp_id | fname | lname |
---|---|---|
A-C71970F | Aria | Cruz |
A-R89858F | Annette | Roulet |
F-C16315M | Francisco | Chang |
H-B39728F | Helen | Bennett |
L-B31947F | Lesley | Brown |
M-L67958F | Maria | Larsson |
M-P91209M | Manuel | Pereira |
M-R38834F | Martine | Rance |
R-M53550M | Roland | Mendel |
Y-L77953M | Yoshi | Latimer |
AMD15433F | Ann | Devon |
ARD36773F | Anabela | Domingues |
CFH28514M | Carlos | Hernadez |
CGS88322F | Carine | Schmitt |
DBT39435M | Daniel | Tonini |
DWR65030M | Diego | Roel |
ENL44273F | Elizabeth | Lincoln |
GHT50241M | Gary | Thomas |
HAN90777M | Helvetius | Nagy |
HAS54740M | Howard | Snyder |
JYL26161F | Janine | Labrune |
KFJ64308F | Karin | Josephs |
KJJ92907F | Karla | Jablonski |
LAL21447M | Laurence | Lebihan |
MAP77183M | Miguel | Paolino |
MAS70474F | Margaret | Smith |
MFS52347M | Martin | Sommer |
MGK44605M | Matti | Karttunen |
MJP25939M | Maria | Pontes |
MMS49649F | Mary | Saveley |
PCM98509F | Patricia | McKenna |
PDI47470M | Palle | Ibsen |
PHF38899M | Peter | Franken |
PMA42628M | Paolo | Accorti |
POK93028M | Pirkko | Koskitalo |
PSA89086M | Pedro | Afonso |
PSP68661F | Paula | Parente |
PTC11962M | Philip | Cramer |
PXH22250M | Paul | Henriot |
RBM23061F | Rita | Muller |
SKO22412M | Sven | Ottlieb |
TPO55093M | Timothy | O'Rourke |
VPA30890F | Victoria | Ashworth |
Another time you might want to force a column to the bottom is if the column contains a NULL. The easiest way to do this is to use the ISNULL function in SQL Server (other databases have similar functions).
SELECT Col001, Col002, Col003, Col004
FROM testtable
ORDER BY col002
,
isnull(col004,'zzzzzzz')
, col001
, col003
Here we are just forcing the column to the bottom with a string of characters that is guaranteed to be last in a sort order.
Another way to do it is like this:
SELECT Col001, Col002, Col003, Col004
FROM testtable
ORDER BY col002
, CASE WHEN col004 IS NULL THEN 1 else 0 end
, col001
, col003
The Microsoft Access approach would be to use the IIF construct, once again:
SELECT Col001, Col002, Col003, Col004
FROM testtable
ORDER BY col002
, IIF(isnull(col004),1,0)
, col001;
, col003;
The last item I want to talk about is perhaps the most complex, as it deals with transactional data. Transactional data does not conform to standard relational database structure. Your field names and field mappings will not describe the data, because you have two different types of rows, identified by a key field. Because the fields do not have the same type of data in each row (and the "20" rows contain one less field), the naming of the fields becomes insignificant. The first column could be named "key" and the second column could be named "recordNumber", but the remaining columns have differing data. For that reason, I'll simply refer to them as col001, col002, col003, and col004. This is how they are named automatically by SQL Server as the data is imported.
Transactional data is typically stored in a text file as an output from another database system, data received from a client for processing by you, or data that has been generated as a report.
'10','1','Tom','Muck'
'20','1','ColdFusion'
'20','1','Flash Remoting'
'10','2','Ray','West'
'20','2','Dreamweaver MX'
'20','2','ASP.NET'
'20','2','SQL Server'
'20','2','Content Management'
'10','3','Massimo','Foti'
'20','3','ColdFusion'
'20','3','JavaScript'
'20','3','Dreamweaver MX'
As you can see, the highlighted fields are the key fields in the data. Data is stored initially in sequential order in a text file, but when imported to SQL, the only way to retrieve the results in any kind of order is by using an ORDER BY clause and ordering by the second column (the record number). However, if you do this, your transactional order will be lost, because there is no line number. We can order on subgroups as long as our original record number (col002) is the column that we want to sort on for the results. Try the following statement:
SELECT Col001, Col002, Col003, Col004
FROM TransactionalTable
ORDER BY col002
, col001
, col003
Your result will look like this:
Col001 | Col002 | Col003 | Col004 |
---|---|---|---|
10 | 1 | Tom | Muck |
20 | 1 | ColdFusion | |
20 | 1 | Flash Remoting | |
10 | 2 | Ray | West |
20 | 2 | ASP.NET | |
20 | 2 | Content Management | |
20 | 2 | Dreamweaver MX | |
20 | 2 | SQL Server | |
10 | 3 | Massimo | Foti |
20 | 3 | ColdFusion | |
20 | 3 | Dreamweaver MX | |
20 | 3 | JavaScript |
The third column in the subgroup of the "20" records is now sorted. If you want to sort by some other field, however, such as col004 (the last name in the "10" rows), you would have a hard time trying to use standard SQL syntax. Sorting on last name is easy -- carrying the "20" records along with that name is the hard part.
It turns out the easiest way to do this is to create another field on the fly to act as the last name field. We'll populate the field with the last name in col004 for the "10" records, and populate the new field with that name for the "20" records. Sound easy? It's not, but once you have the code you can use it for any situation where you have to group records. The SQL is as follows:
SELECT t.col001, t.col002, col003, col004
FROM TransactionalTable t
ORDER BY
CASE when col001 = '10' THEN
col004
ELSE
(SELECT col004 from TransactionalTable where col002 = t.col002 and col001 = '10')
end
, col001, col003
We created the new field on the fly in the ORDER BY clause using CASE. In the "10" rows, we merely use col004 as is. In the "20" rows, we pull the col004 from the matching record in the "10" row. That fills up the field so that the record remains grouped together in the ORDER BY clause. Then we merely sort on col001, which floats the "10" records to the top of each group, and then col003, which orders the items within the "20" group.
In MS Access you can't create this query in one pass, but you can turn it into two queries: one can be a saved query named Transaction1:
SELECT t.col001, t.col002, col003, col004
, (SELECT col004 from TransactionalTable where col002 = t.col002 and col001
= '10') as DUMMY
FROM TransactionalTable t;
This creates a temporary view of the data with the added column holding the last name field (col004). Then you can run a query against this view and order it the way you want:
SELECT col001, col002, col003, col004
FROM Transaction1
ORDER BY dummy, col001, col003;
Ordering your results can be enhanced with a few simple tricks. Many web programmers are content to learn a few SQL keywords, but there are a lot of useful functions in the SQL language that make it easy to return the results that you need, and in any order that you need them.