0% found this document useful (0 votes)
62 views3 pages

How Expensive Is SQL ORDER BY

The database will likely be faster than sorting in the application code. If an index can be used to retrieve results in order, no sorting is needed. Otherwise, databases are optimized for sorting large datasets using techniques like balanced binary trees, while minimizing disk access. The exact methods depend on the database, but it will generally outperform application-level sorting.

Uploaded by

exbis
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
62 views3 pages

How Expensive Is SQL ORDER BY

The database will likely be faster than sorting in the application code. If an index can be used to retrieve results in order, no sorting is needed. Otherwise, databases are optimized for sorting large datasets using techniques like balanced binary trees, while minimizing disk access. The exact methods depend on the database, but it will generally outperform application-level sorting.

Uploaded by

exbis
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 3

How Expensive is SQL ORDER BY?

5097942

I don't quite understand how a SQL command would sort a large resultset. Is it done in memory on the fly (i.e. when a query is perfomed)?

up vote down Is is going to be faster to sort using ORDER BY in SQL rather than sort say a linked list vote
favorite

of objects containing the results in a language like Java (assuming a fast built-in sort, probably using quicksort)?
sql sorting

share|improve this question

asked Feb 23 '11 at 22:19

Joda Maki 57329

65% accept rate

Its almost always faster if the database does it. Rook Feb 23 '11 at 22:32 what order of magnitude would you guess? Joda Maki Feb 23 '11 at 22:40 You could just time it yourself. More authoritative than anything here. nate c Feb 23 '11 at 22:53 I wouldn't expect the order of magnitude to be knowable in general. If the database has to materialize the entire result set and sort the data, it's going to be a bit more efficient than the middle tier, but not much (assuming a reasonable middle tier implementation). If the database can choose a query plan that avoids having to do an explicit sort, it's arguably infinitely faster than sorting in the middle tier. Justin Cave Feb 23 '11 at 23:30 feedback

3 Answers
active oldest
5098016

votes
It will almost certainly be more efficient to sort the data in the database. Databases are

designed to deal with large data volumes. And there are various optimizations available up vote down to the database that would not be available to the middle tier. If you plan on writing a vote hyper-efficient sort routine in the middle tier that takes advantage of information that you have about your data that the database doesn't (i.e. farming the data out to a cluster of

accept ed

dozens of middle tier machines so that the sort never spills to disk, taking advantage of the fact that your data is mostly ordered to choose an algorithm that wouldn't normally be particularly efficient), you can probably beat the database's sort speed. But that tends to be rare. Depending on the query, for example, the database optimizer may choose a query plan that returns the data in order without performing a sort. For example, the database knows that the data in an index is sorted so it may choose to do an index scan to return the data in order without ever having to materialize and sort the entire result set. If it does have to materialized the entire result, it only needs the columns you are sorting by and some sort of row identifier (i.e. a ROWID in Oracle) rather than sorting an entire row of data like a naive middle tier implementation is likely to do. For example, if you have a composite index on (col1, col2) and you decide to sort on UPPER(col2), LOWER(col1), the database could read the col1 & col2 values from the index, sort the row identifiers, and then go fetch the data from the table. Of course, the database doesn't have to do this-- the optimizer will take into account the cost of doing a sort against the cost of fetching the data from the table or from the various indexes. The database may well conclude that the most efficient approach is to do a table scan, read the entire row into memory, and sort it. It may conclude that leveraging an index results in more I/O to fetch the data but makes up for it by reducing or eliminating the sort costs.
share|improve this answer edited Feb 23 '11 at 22:43 answered Feb 23 '11 at 22:27

Justin Cave 65.8k64875 can you expand on the part about the only needing columns and a row ID. You mean it will fetch certain columns, sort them, then go back and fetch the full columns based on the sort order? This seems very slow with double fetching each row from disk Joda Maki Feb 23 '11 at 22:34 @Joda - Expanded a bit. I didn't mean to imply that it had to fetch the data multiple times just that the database may have various structures that it could leverage to optimize (or eliminate) the need to do a sort. Justin Cave Feb 23 '11 at 22:44 feedback

5097989

The answer is... it depends. If the ORDER BY part can be done by using an index in the database, then the execution plan for the query will use that index and the results will

up vote down vote

come back in the right order straight from the DB. If not, then the database will perform the sorting, but it's likely better at it than you reading all the results into memory (and certainly better than reading the results into a linked list).
share|improve this answer answered Feb 23 '11 at 22:25

Chris Nash 1,388113 I guess I just don't understand what the intermediate data structures typically look like that the DB will use to do its own sorting. Care to shed any light? Joda Maki Feb 23 '11 at 22:38 Tasks like sorting are things that databases are good at, so the data structures are designed to make that efficient - for example, balanced binary trees that are kept updated with every record insert. The index doesn't contain the entire row, just a record ID, primary key, location within the DB. When you ask for a sorted result, it can quickly return those locations in the order you asked for, and then lookup the full rows for the resultset. Chris Nash Feb 23 '11 at 22:45 Only in the smallest data set, will performance be equal between the database and an application language. OMG Ponies Feb 23 '11 at 22:49

5098003

feedback The exact method depends on the product you are using, but normally a fully-featured

up vote down space over time, some work in memory, optimizing for speed. Check the source code of vote the available open source ones, if you are interested in the gory details.

DBMS has multiple sort algorithms at its disposal. Some work on disk, optimizing for

It's unlikely that you are going to get better results by doing the sorting yourself or using some other library, although there can be pathological cases such as some operating system's qsort() having problems with certain data distributions. Try it out if you must, but prefer using a DBMS to manage your data, because that's what they are good at.

You might also like