I got a request on how to improve performance for table update with 33,617,240 rows running for 2.5 hours as shown:
SET a.status =
(CASE WHEN a.dt1 IS NOT NULL THEN ‘PROCESSED’ WHEN a.dt2 IS NULL THEN NULL END);
32935613 rows updated.
As written, 32,935,613 of 33,617,240 rows updated.
This is a good demonstration on how filtering early and understanding the data is a HUGE benefit.
If status is already ‘PROCESSED’, then is it necessary to update?
select count(*) from a where dt1 IS NOT NULL and status<>’PROCESSED’;
If status is already NULL, then is it necessary to update?
select count(*) from a where dt2 IS NULL and status IS NOT NULL;
The answer to both questions is NO and we don’t want to peform blanket update.
The SQL can be written as:
SELECT /*+ PARALLEL(a,4) FULL(a) */
WHEN dt1 IS NOT NULL THEN ‘PROCESSED’
WHEN dt2 IS NULL THEN NULL
WHERE (dt1 IS NOT NULL AND status <> ‘PROCESSED’) OR (dt2 IS NULL AND status IS NOT NULL)
14604 rows updated.
The result is only 14,604 versus 32,935,613 rows updated.
Huge difference in performnace.
Without hints: 47s direct path read + 28s CPU
With hints: 17s direct path read + 13s CPU