0% found this document useful (0 votes)
69 views5 pages

EXISTS Conditions: An Condition Tests For Existence of Rows in A Subquery

The document describes an efficient method for deleting duplicate rows from a table using a PL/SQL stored procedure. The procedure works by: 1. Selecting duplicate rows into a cursor, sorted by the duplicate key columns 2. Looping through the cursor rows, comparing the current row's key to the previous, and deleting any rows with matching keys except the first 3. This allows controlling which duplicate row is kept for each group, and performs the deletion more efficiently than a NOT IN clause by avoiding multiple table scans. On a test table with 500k rows and 45k duplicates, the stored procedure completed the deletion much faster than an alternative SQL method.

Uploaded by

yalamandu4358
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
69 views5 pages

EXISTS Conditions: An Condition Tests For Existence of Rows in A Subquery

The document describes an efficient method for deleting duplicate rows from a table using a PL/SQL stored procedure. The procedure works by: 1. Selecting duplicate rows into a cursor, sorted by the duplicate key columns 2. Looping through the cursor rows, comparing the current row's key to the previous, and deleting any rows with matching keys except the first 3. This allows controlling which duplicate row is kept for each group, and performs the deletion more efficiently than a NOT IN clause by avoiding multiple table scans. On a test table with 500k rows and 45k duplicates, the stored procedure completed the deletion much faster than an alternative SQL method.

Uploaded by

yalamandu4358
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
You are on page 1/ 5

EXISTS Conditions

An EXISTS condition tests for existence of rows in a subquery.

exists_condition::=

Text description of exists_condition

Table 5-10 shows the EXISTS condition.

Table 5-10 EXISTS Conditions

Condition Operation Example


EXISTS TRUE if a subquery returns at least one SELECT department_id
FROM departments d
row. WHERE EXISTS
(SELECT * FROM employees
e
WHERE d.department_id
= e.department_id);
Description: With reference to your tip for the week 08/04/2002 submitted by Madan Patil, this is another
way of deleting duplicate rows from a table. The difference being the time it takes to delete the duplicate
rows with this method is many times faster than the earlier method.

Let us take a table containing 3 columns, then we can use the following command to delete the duplicate
rows from the table.

delete from where rowid in (


SELECT rowid FROM
group by rowid,col1,col2,col3
minus
SELECT min(rowid) FROM

group by col1,col2,col3);

To show the difference let's consider a table:

Table: EMP
EMPNO NUMBER
ENAME VARCHAR2(20)
JOB VARCHAR2(20)

CREATE TABLE EMP (


EMPNO NUMBER,
ENAME VARCHAR2(20),
JOB VARCHAR2(20)
);
/
begin
for i in 1..20 loop
insert into emp values (1,'xx','clerk');
end loop;

commit;
end;
/
begin
for i in 1..20 loop
insert into emp values (2,'yy','accountant');
end loop;
commit;
end;
/
begin

for i in 1..20000 loop


insert into emp values (3,'zz','manager');
end loop;
commit;
end;
/
begin
for i in 1..10000 loop
insert into emp values (4,'ab','accountant');
end loop;
commit;
end;

Using the previous method as in your TIP for the Week 08/04/2002

------------------------------------------------------------------------
SQL> select count(*) from emp;

COUNT(*)
----------
30040

SQL> set timing on;


SQL> DELETE FROM EMP E
2 WHERE E.ROWID > ANY (SELECT ROWID
3 FROM EMP M
4 WHERE M.EMPNO = E.EMPNO
5 AND M.ENAME = E.ENAME
6 AND M.JOB = E.JOB );

30036 rows deleted.

Elapsed: 00:03:207.48

SQL> select count(*) from emp;

COUNT(*)
----------
4

Elapsed: 00:00:00.10

Using the NEW suggested method:

-------------------------------------------
SQL> select count(*) from emp;

COUNT(*)
----------
30040
SQL> delete from emp where rowid in (
2 SELECT rowid FROM emp
3 group by rowid,empno,ename,job
4 minus
5 SELECT min(rowid) FROM emp
6 group by empno,ename,job);
30036 rows deleted.

Elapsed: 00:00:02.94

SQL> select count(*) from emp;

COUNT(*)
----------
4

Elapsed: 00:00:00.10
--------------------------------------------------------------------

As we can see the difference is multifold to achieve the same result. This is because the new method uses
the set operator to compute the list of duplicate rows. The bigger the table the better you can appreciate the
difference.

>>> MIN() allows you to select one row per group—duplicates and non-duplicates—so that you
get a list of all the rows you want to keep:

SELECT MIN(ID) AS ID, LastName, FirstName


FROM Customers
GROUP BY LastName, FirstName;
Listing 5 shows the output of the above code.

Now you just need to delete rows that are not in this list, using the last query as a subquery inside
an antijoin (the NOT IN clause):

DELETE FROM Customers


WHERE ID NOT IN
(SELECT MIN(ID)
FROM Customers
GROUP BY LastName, FirstName);
However, an antijoin query with the NOT IN clause is inefficient to make this work. In our case
two (!) full table scans need to be performed to resolve this SQL statement. That leads to
substantial performance loss for big data sets. For performance testing I created the Customers
data set with 500,000 rows and 45,000 duplicates (9 percent of the total). The above command
ran for more than one hour with no results—except that it exhausted my patience—so I killed the
process.

Another disadvantage of this syntax is that you can't control which row per group of duplicates
you can keep in the database.

A PL/SQL Solution: Deleting Duplicate Data with a Stored Procedure


Let me give you an example of a PL/SQL stored procedure, called DeleteDuplicate (see Listing
6), that cleans up duplicates. The algorithm for this procedure is pretty straightforward:

1. It selects the duplicate data in the cursor, sorted by duplicate key (LastName, FirstName
in our case), as shown in Listing 4.

2. It opens the cursor and fetches each row, one by one, in a loop.

3. It compares the duplicate key value with the previously fetched one.
4. If this is a first fetch, or the value is different, then that's the first row in a new group so it
skips it and fetches the next row. Otherwise, it's a duplicate row within the same group,
so it deletes it.

Let's run the stored procedure and check it against the Customers data:
BEGIN
DeleteDuplicates;
END;
/

SELECT LastName, FirstName, COUNT(*)


FROM Customers
GROUP BY LastName, FirstName
HAVING COUNT(*) > 1;
The last SELECT statement returns no rows, so the duplicates are gone.

The main job of extracting duplicates in this procedure is done by a SQL statement, which is
defined in the csr_Duplicates cursor. The PL/SQL procedural code is used only to implement the
logic of deleting all rows in the group except the first one. Could it all be done by one SQL
statement?

You might also like