ORA-01555 Snapshot Too Old — The Root Cause Analysis Your Team Isn’t Doing- Let us discuss this error again!

RA-01555 is one of the most mishandled errors in Oracle production support. The fix is almost always “increase undo retention” — and teams apply it, the error goes away for a while, and eventually it comes back. This cycle repeats because the team is treating the symptom. The error comes back because the root cause — which is never just “undo retention is too low” — was never properly investigated.

Let me show you how to actually diagnose this.


What’s Actually Happening

Oracle’s read consistency mechanism guarantees that a query sees a consistent snapshot of data as of its start time — even if other transactions modify that data while the query runs. To reconstruct the old version of a changed block, Oracle uses undo data.

ORA-01555 fires when Oracle needs to reconstruct a block version from undo, but the undo data has been overwritten. The undo segment was reused before the query finished.

Three things must be true simultaneously for ORA-01555 to occur:

  1. A long-running query is reading a block
  2. That block has been modified by another transaction while the query runs
  3. The undo data for that modification has been overwritten

All three. Increasing undo retention affects condition 3, but conditions 1 and 2 are just as important.


Identifying the Long-Running Query

The first thing I capture when ORA-01555 is reported is the query that failed. The error message includes information:

ORA-01555: snapshot too old: rollback segment number 12
with name "_SYSSMU12_3749283649$" too small

But more useful is querying the session and SQL at the time of the error:

sql

-- If the session is still active:
SELECT s.sid, s.serial#, s.username, s.sql_id,
s.last_call_et elapsed_seconds,
sq.sql_text
FROM v$session s
JOIN v$sql sq ON s.sql_id = sq.sql_id
WHERE s.status = 'ACTIVE'
ORDER BY s.last_call_et DESC;

A query that’s been running for 3 hours on a table with high DML activity is always going to be vulnerable to ORA-01555. The right question is: why is this query taking 3 hours? That’s the real problem.


Measuring Actual Undo Retention vs Configured Retention

sql

-- How long is Oracle actually retaining undo?
SELECT begin_time, end_time,
undoblks,
txnconcurrency,
maxquerylen, -- longest query duration in this period (seconds)
maxqueryid, -- SQL_ID of that longest query
ssolderrcnt, -- ORA-01555 errors in this period
nospaceerrcnt -- "out of undo space" errors
FROM v$undostat
ORDER BY begin_time DESC;

MAXQUERYLEN is critically important. It tells you the actual maximum query duration in each 10-minute window. Compare this to your UNDO_RETENTION setting. If MAXQUERYLEN consistently exceeds UNDO_RETENTION, you’re at risk.

SSOLDERRCNT directly counts ORA-01555 errors by time window. If you see errors concentrated at a specific time — say, every night between 2 AM and 4 AM — that tells you when your longest queries run and exactly when they conflict with DML workloads.


The Undo Tablespace Sizing Issue

Here’s a subtlety that many people miss: UNDO_RETENTION is a request, not a guarantee. Oracle will try to retain undo for the configured duration, but if the undo tablespace runs out of space, Oracle overwrites the oldest undo regardless of the retention setting.

sql

-- Check undo tablespace usage
SELECT tablespace_name,
SUM(bytes)/1024/1024 total_mb,
SUM(CASE WHEN status = 'ACTIVE' THEN bytes ELSE 0 END)/1024/1024 active_mb,
SUM(CASE WHEN status = 'UNEXPIRED' THEN bytes ELSE 0 END)/1024/1024 unexpired_mb,
SUM(CASE WHEN status = 'EXPIRED' THEN bytes ELSE 0 END)/1024/1024 expired_mb
FROM dba_undo_extents
GROUP BY tablespace_name;

If ACTIVE_MB + UNEXPIRED_MBTOTAL_MB, your undo tablespace is full and Oracle is overwriting unexpired undo. Increasing UNDO_RETENTION without increasing undo tablespace size accomplishes nothing.

Use this query to size your undo tablespace correctly:

sql

-- Calculate required undo tablespace size
-- Based on: undo generation rate × desired retention
SELECT d.undo_retention,
u.undoblks * 8192 / 1024 / 1024 undo_mb_per_10min,
(u.undoblks * 8192 / 1024 / 1024) * (d.undo_retention / 600) required_mb
FROM (SELECT TO_NUMBER(value) undo_retention FROM v$parameter WHERE name = 'undo_retention') d,
(SELECT AVG(undoblks) undoblks FROM v$undostat WHERE begin_time > SYSDATE - 7) u;

This calculates the required undo tablespace based on your actual undo generation rate and desired retention.


The Application-Layer Root Cause

After all the database-level investigation, the root cause of ORA-01555 is almost always one of two application patterns:

Pattern 1: Reports or batch jobs that run long queries on tables with concurrent heavy DML

The report runs for 2 hours. During those 2 hours, the OLTP application is doing thousands of updates to the same tables. The undo for those updates gets overwritten before the report finishes reading old block versions.

Fix: run long reports during low-DML windows. Or move reporting to a standby database (Active Data Guard). Or redesign the report to work with smaller time windows. Or implement a proper data warehouse and stop running analytical reports on the OLTP database.

Pattern 2: DBLINK queries that hold a distributed transaction open

sql

-- This can cause ORA-01555 on the remote database
SELECT * FROM remote_table@dblink
WHERE some_column = some_value;

When you query through a database link, Oracle on the remote side maintains a read consistent snapshot for the duration of the remote query. If the local query is slow (network issues, local processing), the remote undo must be retained for the entire duration. This is completely invisible to the remote DBA who sees ORA-01555 without any local long-running queries.

Check your application for database link queries and their duration.



Yorum bırakın