How Expensive Is SQL ORDER BY
How Expensive Is SQL ORDER BY
5097942
I don't quite understand how a SQL command would sort a large resultset. Is it done in memory on the fly (i.e. when a query is perfomed)?
up vote down Is is going to be faster to sort using ORDER BY in SQL rather than sort say a linked list vote
favorite
of objects containing the results in a language like Java (assuming a fast built-in sort, probably using quicksort)?
sql sorting
Its almost always faster if the database does it. Rook Feb 23 '11 at 22:32 what order of magnitude would you guess? Joda Maki Feb 23 '11 at 22:40 You could just time it yourself. More authoritative than anything here. nate c Feb 23 '11 at 22:53 I wouldn't expect the order of magnitude to be knowable in general. If the database has to materialize the entire result set and sort the data, it's going to be a bit more efficient than the middle tier, but not much (assuming a reasonable middle tier implementation). If the database can choose a query plan that avoids having to do an explicit sort, it's arguably infinitely faster than sorting in the middle tier. Justin Cave Feb 23 '11 at 23:30 feedback
3 Answers
active oldest
5098016
votes
It will almost certainly be more efficient to sort the data in the database. Databases are
designed to deal with large data volumes. And there are various optimizations available up vote down to the database that would not be available to the middle tier. If you plan on writing a vote hyper-efficient sort routine in the middle tier that takes advantage of information that you have about your data that the database doesn't (i.e. farming the data out to a cluster of
accept ed
dozens of middle tier machines so that the sort never spills to disk, taking advantage of the fact that your data is mostly ordered to choose an algorithm that wouldn't normally be particularly efficient), you can probably beat the database's sort speed. But that tends to be rare. Depending on the query, for example, the database optimizer may choose a query plan that returns the data in order without performing a sort. For example, the database knows that the data in an index is sorted so it may choose to do an index scan to return the data in order without ever having to materialize and sort the entire result set. If it does have to materialized the entire result, it only needs the columns you are sorting by and some sort of row identifier (i.e. a ROWID in Oracle) rather than sorting an entire row of data like a naive middle tier implementation is likely to do. For example, if you have a composite index on (col1, col2) and you decide to sort on UPPER(col2), LOWER(col1), the database could read the col1 & col2 values from the index, sort the row identifiers, and then go fetch the data from the table. Of course, the database doesn't have to do this-- the optimizer will take into account the cost of doing a sort against the cost of fetching the data from the table or from the various indexes. The database may well conclude that the most efficient approach is to do a table scan, read the entire row into memory, and sort it. It may conclude that leveraging an index results in more I/O to fetch the data but makes up for it by reducing or eliminating the sort costs.
share|improve this answer edited Feb 23 '11 at 22:43 answered Feb 23 '11 at 22:27
Justin Cave 65.8k64875 can you expand on the part about the only needing columns and a row ID. You mean it will fetch certain columns, sort them, then go back and fetch the full columns based on the sort order? This seems very slow with double fetching each row from disk Joda Maki Feb 23 '11 at 22:34 @Joda - Expanded a bit. I didn't mean to imply that it had to fetch the data multiple times just that the database may have various structures that it could leverage to optimize (or eliminate) the need to do a sort. Justin Cave Feb 23 '11 at 22:44 feedback
5097989
The answer is... it depends. If the ORDER BY part can be done by using an index in the database, then the execution plan for the query will use that index and the results will
come back in the right order straight from the DB. If not, then the database will perform the sorting, but it's likely better at it than you reading all the results into memory (and certainly better than reading the results into a linked list).
share|improve this answer answered Feb 23 '11 at 22:25
Chris Nash 1,388113 I guess I just don't understand what the intermediate data structures typically look like that the DB will use to do its own sorting. Care to shed any light? Joda Maki Feb 23 '11 at 22:38 Tasks like sorting are things that databases are good at, so the data structures are designed to make that efficient - for example, balanced binary trees that are kept updated with every record insert. The index doesn't contain the entire row, just a record ID, primary key, location within the DB. When you ask for a sorted result, it can quickly return those locations in the order you asked for, and then lookup the full rows for the resultset. Chris Nash Feb 23 '11 at 22:45 Only in the smallest data set, will performance be equal between the database and an application language. OMG Ponies Feb 23 '11 at 22:49
5098003
feedback The exact method depends on the product you are using, but normally a fully-featured
up vote down space over time, some work in memory, optimizing for speed. Check the source code of vote the available open source ones, if you are interested in the gory details.
DBMS has multiple sort algorithms at its disposal. Some work on disk, optimizing for
It's unlikely that you are going to get better results by doing the sorting yourself or using some other library, although there can be pathological cases such as some operating system's qsort() having problems with certain data distributions. Try it out if you must, but prefer using a DBMS to manage your data, because that's what they are good at.