PL/SQL Bulk Collect
On BULK COLLECT
By Steven Feuerstein
I have started using BULK COLLECT whenever I need to fetch large volumes of data. This
has caused me some trouble with my DBA, however. He is complaining that although my
programs might be running much faster, they are also consuming way too much memory. He
refuses to approve them for a production rollout. What's a programmer to do?
The most important thing to remember when you learn about and start to take advantage of
features such as BULK COLLECT is that there is no free lunch. There is almost always a
trade-off to be made somewhere. The trade-off with BULK COLLECT, like so many other
performance-enhancing features, is "run faster but consume more memory."
Specifically, memory for collections is stored in the program global area (PGA), not the
system global area (SGA). SGA memory is shared by all sessions connected to Oracle
Database, but PGA memory is allocated for each session. Thus, if a program requires 5MB of
memory to populate a collection and there are 100 simultaneous connections, that program
causes the consumption of 500MB of PGA memory, in addition to the memory allocated to
the SGA.
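If you want numbers to bring to your DBA, one way to check how much PGA memory your own session is using, for example right after running a BULK COLLECT program, is to query the V$MYSTAT and V$STATNAME dynamic performance views. This is a sketch that assumes your account has been granted SELECT access on those views. Multiplying the peak figure by the expected number of concurrent sessions gives the kind of rough estimate described above.

```sql
-- Current and peak PGA usage for the session that runs this query.
-- Assumes SELECT access on V$MYSTAT and V$STATNAME.
SELECT sn.name,
       ROUND (ms.value / 1024 / 1024, 1) AS mb
  FROM v$mystat ms
  JOIN v$statname sn
    ON sn.statistic# = ms.statistic#
 WHERE sn.name IN ('session pga memory', 'session pga memory max');
```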
Fortunately, PL/SQL makes it easy for developers to control the amount of memory used in a
BULK COLLECT operation by using the LIMIT clause.
Suppose I need to retrieve all the rows from the employees table and then perform some
compensation analysis on each row. I can use BULK COLLECT as follows:
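PROCEDURE process_all_rows
IS
   TYPE employees_aat
      IS TABLE OF employees%ROWTYPE
      INDEX BY PLS_INTEGER;
   l_employees employees_aat;
BEGIN
   -- Loads every row of the table into the collection in one step
   SELECT *
     BULK COLLECT INTO l_employees
     FROM employees;
   FOR indx IN 1 .. l_employees.COUNT
   LOOP
      analyze_compensation (l_employees (indx));  -- hypothetical per-row analysis routine
   END LOOP;
END process_all_rows;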
Very concise, elegant, and efficient code. If, however, my employees table contains tens of
thousands of rows, each of which contains hundreds of columns, this program can cause
excessive PGA memory consumption.
Consequently, you should avoid this sort of "unlimited" use of BULK COLLECT. Instead,
move the SELECT statement into an explicit cursor declaration and then use a simple loop to
fetch many, but not all, rows from the table with each execution of the loop body, as shown
in Listing 1.
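PROCEDURE process_all_rows (limit_in IN PLS_INTEGER)
IS
   CURSOR employees_cur IS SELECT * FROM employees;
   TYPE employees_aat IS TABLE OF employees_cur%ROWTYPE
      INDEX BY PLS_INTEGER;
   l_employees employees_aat;
BEGIN
   OPEN employees_cur;
   LOOP
      FETCH employees_cur
         BULK COLLECT INTO l_employees LIMIT limit_in;
      EXIT WHEN l_employees.COUNT = 0;  -- stop only when a fetch returns no rows
      FOR indx IN 1 .. l_employees.COUNT LOOP
         analyze_compensation (l_employees (indx));  -- hypothetical per-row analysis routine
      END LOOP;
   END LOOP;
   CLOSE employees_cur;
END process_all_rows;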
The process_all_rows procedure in Listing 1 requests that up to limit_in rows be fetched with
each execution of the FETCH statement. Each BULK COLLECT fetch replaces the previous
contents of the collection, so PL/SQL reuses the same (at most) limit_in elements, and thus the
same memory, on every pass through the loop. Even if my table grows in size, the PGA
consumption will remain stable.
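To see that each fetch overwrites the collection rather than appending to it, you can run a small anonymous block against a generated row source. This sketch assumes Oracle Database 10g or later and SERVEROUTPUT enabled; it uses DUAL with CONNECT BY LEVEL to produce 227 numbered rows so that no table is needed, and the names nums_cur, nums_aat, and l_nums are purely illustrative.

```sql
SET SERVEROUTPUT ON
DECLARE
   -- 227 generated rows numbered 1..227 (illustrative data, no table required)
   CURSOR nums_cur IS
      SELECT LEVEL AS n FROM dual CONNECT BY LEVEL <= 227 ORDER BY n;
   TYPE nums_aat IS TABLE OF nums_cur%ROWTYPE INDEX BY PLS_INTEGER;
   l_nums nums_aat;
BEGIN
   OPEN nums_cur;
   LOOP
      FETCH nums_cur BULK COLLECT INTO l_nums LIMIT 100;
      EXIT WHEN l_nums.COUNT = 0;
      -- The fetch replaced the old contents: indexes restart at 1
      -- and COUNT never exceeds the LIMIT value.
      DBMS_OUTPUT.PUT_LINE ('COUNT = ' || l_nums.COUNT
         || ', rows ' || l_nums (1).n || ' to ' || l_nums (l_nums.LAST).n);
   END LOOP;
   CLOSE nums_cur;
END;
/
```

This should print three lines: COUNT = 100 for rows 1 to 100, COUNT = 100 for rows 101 to 200, and COUNT = 27 for rows 201 to 227.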
How do you decide what number to use in the LIMIT clause? Theoretically, you will want to
figure out how much memory you can afford to consume in the PGA and then adjust the limit
to be as close to that amount as possible.
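One way to compare limits empirically is to time the same full fetch with several different LIMIT values. The block below is a rough harness in that spirit, not the author's test_diff_limits.sql script; it assumes Oracle Database 10g or later (for DBMS_UTILITY.GET_CPU_TIME) and SERVEROUTPUT enabled, queries ALL_SOURCE only because every account can read it, and uses an arbitrary, illustrative list of limits.

```sql
SET SERVEROUTPUT ON
DECLARE
   -- Illustrative set of LIMIT values to compare
   TYPE limits_t IS TABLE OF PLS_INTEGER;
   l_limits limits_t := limits_t (1, 5, 25, 100, 1000, 10000);
   CURSOR source_cur IS SELECT * FROM all_source;
   TYPE source_aat IS TABLE OF source_cur%ROWTYPE INDEX BY PLS_INTEGER;
   l_source source_aat;
   l_start  NUMBER;
BEGIN
   FOR i IN 1 .. l_limits.COUNT LOOP
      l_start := DBMS_UTILITY.GET_CPU_TIME;
      -- Fetch every row of ALL_SOURCE, l_limits (i) rows at a time
      OPEN source_cur;
      LOOP
         FETCH source_cur BULK COLLECT INTO l_source LIMIT l_limits (i);
         EXIT WHEN l_source.COUNT = 0;
      END LOOP;
      CLOSE source_cur;
      -- GET_CPU_TIME reports CPU time in hundredths of a second
      DBMS_OUTPUT.PUT_LINE ('Limit ' || l_limits (i) || ': '
         || (DBMS_UTILITY.GET_CPU_TIME - l_start) || ' cs CPU');
   END LOOP;
END;
/
```

The harness measures CPU time for this session only, so run it more than once and on a quiet system before drawing conclusions.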
From tests I (and others) have performed, however, it appears that you will see roughly the
same performance no matter what value you choose for the limit, as long as it is at least 25.
The test_diff_limits.sql script, included with the sample code for this column, demonstrates
this behavior, using the ALL_SOURCE data dictionary view on an Oracle Database 11g
instance. Here are the results I saw (in hundredths of seconds) when fetching all the rows (a
total of 470,000):
I was very happy to learn that Oracle Database 10g will automatically optimize my cursor
FOR loops to perform at speeds comparable to BULK COLLECT. Unfortunately, my
company is still running on Oracle9i Database, so I have started converting my cursor FOR
loops to BULK COLLECTs. I have run into a problem: I am using a LIMIT of 100, and my
query retrieves a total of 227 rows, but my program processes only 200 of them. [The program
is shown in Listing 2.] What am I doing wrong?
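PROCEDURE process_all_rows
IS
   CURSOR table_with_227_rows_cur
   IS
      SELECT * FROM table_with_227_rows;
   TYPE table_with_227_rows_aat IS
      TABLE OF table_with_227_rows_cur%ROWTYPE
      INDEX BY PLS_INTEGER;
   l_table_with_227_rows table_with_227_rows_aat;
BEGIN
   OPEN table_with_227_rows_cur;
   LOOP
      FETCH table_with_227_rows_cur
         BULK COLLECT INTO l_table_with_227_rows LIMIT 100;
      EXIT WHEN table_with_227_rows_cur%NOTFOUND;  -- cause of the missing rows
      FOR indx IN 1 .. l_table_with_227_rows.COUNT
      LOOP
         analyze_compensation (l_table_with_227_rows (indx));  -- hypothetical per-row routine
      END LOOP;
   END LOOP;
   CLOSE table_with_227_rows_cur;
END process_all_rows;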
You came so close to a completely correct conversion from your cursor FOR loop to BULK
COLLECT! Your only mistake was that you didn't give up the habit of using the
%NOTFOUND cursor attribute in your EXIT WHEN clause.
The statement
EXIT WHEN table_with_227_rows_cur%NOTFOUND;
makes perfect sense when you are fetching your data one row at a time. With BULK
COLLECT, however, that line of code can result in incomplete data processing, precisely as
you described.
Let's examine what is happening when you run your program and why those last 27 rows are
left out. After opening the cursor and entering the loop, here is what occurs:
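1. The FETCH statement retrieves the first 100 rows.
2. table_with_227_rows_cur%NOTFOUND returns FALSE, so the loop continues and the rows are
processed.
3. The FETCH statement retrieves the next 100 rows.
4. table_with_227_rows_cur%NOTFOUND returns FALSE, and the rows are processed.
5. The FETCH statement retrieves the remaining 27 rows.
6. Because fewer rows than the LIMIT of 100 were fetched, table_with_227_rows_cur%NOTFOUND
returns TRUE, and the loop terminates before those 27 rows are processed.
When you use BULK COLLECT with LIMIT, do not rely on cursor attributes to decide when to
exit the loop; check the contents of the collection instead. To make sure that your program
processes all 227 rows, replace
EXIT WHEN table_with_227_rows_cur%NOTFOUND;
with
EXIT WHEN l_table_with_227_rows.COUNT = 0;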
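To observe this behavior directly, the following block fetches 227 generated rows with a LIMIT of 100 and prints the collection's COUNT alongside %NOTFOUND after each fetch. It is a sketch that assumes Oracle Database 10g or later and SERVEROUTPUT enabled; the DUAL-based row source stands in for table_with_227_rows, and the names are illustrative.

```sql
SET SERVEROUTPUT ON
DECLARE
   -- 227 generated rows standing in for table_with_227_rows
   CURSOR nums_cur IS SELECT LEVEL AS n FROM dual CONNECT BY LEVEL <= 227;
   TYPE nums_aat IS TABLE OF nums_cur%ROWTYPE INDEX BY PLS_INTEGER;
   l_nums nums_aat;
BEGIN
   OPEN nums_cur;
   LOOP
      FETCH nums_cur BULK COLLECT INTO l_nums LIMIT 100;
      -- %NOTFOUND is TRUE whenever fewer than LIMIT rows came back,
      -- even when the collection still holds rows to process.
      DBMS_OUTPUT.PUT_LINE ('COUNT = ' || l_nums.COUNT || ', %NOTFOUND = '
         || CASE WHEN nums_cur%NOTFOUND THEN 'TRUE' ELSE 'FALSE' END);
      -- Exit based on the collection, not the cursor attribute
      EXIT WHEN l_nums.COUNT = 0;
   END LOOP;
   CLOSE nums_cur;
END;
/
```

The output should show COUNT = 100 with %NOTFOUND = FALSE twice, then COUNT = 27 with %NOTFOUND = TRUE, then COUNT = 0 with %NOTFOUND = TRUE. The third line is the crux: 27 rows arrived, yet %NOTFOUND is already TRUE, which is why an EXIT WHEN on the cursor attribute, placed before the processing code, drops those rows.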
Generally, you should keep all of the following in mind when working with BULK
COLLECT: