There’s a critical distinction between running backups and having a recovery strategy. Many production databases have RMAN jobs that run nightly, complete with exit code 0, and the DBA who set them up three years ago is confident that the database is protected. Until the day they actually need to restore. I’ve seen this scenario enough times to be emphatic: if you haven’t tested your recovery from backup in the last six months, you don’t have a recovery strategy. You have a backup job.
Understanding What RMAN Actually Backs Up
RMAN backs up datafiles, control files, archived redo logs, and optionally the SPFILE. What it does not automatically back up:
- Oracle software binaries (your ORACLE_HOME)
- External tables pointing to OS files
- OS-level configuration files (/etc/oratab, listener.ora, tnsnames.ora)
- ORACLE_HOME patches (only the inventory records which patches were applied)
- Wallet files (for TDE-encrypted databases)
This last one is the silent killer. If you’re running TDE and you restore your database but the wallet is gone or corrupted, your datafiles are encrypted and unreadable. Many DBAs back up the database meticulously and forget the wallet entirely.
```bash
# Find your wallet location (run in SQL*Plus):
# SQL> SELECT wrl_parameter FROM v$encryption_wallet;

# Include the wallet in your backup procedure.
# At minimum, copy the wallet directory to a separate secure location
# after any key rotation.
cp -r /etc/oracle/wallets/ backup_$(date +%Y%m%d)/
```
Backup Validation — The Command Most DBAs Skip
RMAN’s VALIDATE command checks backup pieces for physical corruption without actually restoring data. CROSSCHECK verifies that the backup pieces still exist in their configured location. Both are essential and both are frequently absent from backup jobs.
```bash
# Validate recent backups for corruption
RMAN> VALIDATE BACKUPSET ALL;

# More targeted - validate the last 7 days
RMAN> VALIDATE BACKUP COMPLETED AFTER 'SYSDATE-7';

# Crosscheck that backups still exist where RMAN thinks they are
RMAN> CROSSCHECK BACKUP;
RMAN> CROSSCHECK ARCHIVELOG ALL;

# Delete expired (missing) backup records
RMAN> DELETE EXPIRED BACKUP;
```
If VALIDATE finds corruptions, your backup is unusable. You need to know this before you need to recover, not during.
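One way to make that knowledge automatic is to run VALIDATE from cron and scan the log, rather than trusting the exit code alone. This is a minimal sketch: `scan_validate_log`, the log filename, and `page_the_oncall_dba` are illustrative names, and the actual `rman` invocation is shown commented out because it requires a live target database.

```shell
# Scan an RMAN VALIDATE log and treat any reported corruption as a hard
# failure, independent of the rman exit code. Pure text processing, so
# it can be exercised without a database.
scan_validate_log() {
    # Non-zero if the VALIDATE output mentions corruption or an
    # ORA-19xxx backup/recovery error (pattern is deliberately broad).
    if grep -qiE 'corrupt|ORA-19' "$1"; then
        echo "ALERT: VALIDATE reported problems in $1"
        return 1
    fi
    echo "OK: no corruption reported in $1"
}

# In the real job (hypothetical wiring):
# rman target / log=validate.log <<'EOF'
# VALIDATE BACKUPSET ALL;
# EOF
# scan_validate_log validate.log || page_the_oncall_dba
```

The point of the broad pattern is to fail noisily on anything suspicious; tune it down only after you have seen what your clean VALIDATE logs look like.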
Recovery Time Objective — Have You Measured It?
RTO (Recovery Time Objective) is the maximum acceptable time to restore service after a failure. Most organizations have an RTO defined in their disaster recovery policy. Most DBAs have never actually measured how long their recovery takes.
For a database restore from backup, recovery time depends on:
- Database size
- Backup location (local disk, NFS, Object Storage, tape)
- Network bandwidth to backup storage
- Number of archived logs to apply after datafile restore
- Whether you’re restoring to the same server or a new one
I recommend performing a timed restore drill on a non-production environment annually. Use production-representative database sizes. Restore from your actual backup location. Measure every phase:
- Time to restore datafiles from backup: ___
- Time to recover (apply archived logs): ___
- Time to open database and validate: ___
- Time to validate application functionality: ___
- Total RTO: ___
Compare this to your documented RTO. If your RTO is 2 hours and your drill shows 6 hours, you have a gap that needs to be addressed — either by improving your backup/recovery infrastructure, implementing Data Guard as a faster recovery path, or renegotiating your RTO with the business.
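A rough harness for the drill is to wrap each phase in timestamps so the blanks above fill themselves in. A sketch, assuming a POSIX shell on the restore host; the rman heredocs are placeholders for your actual restore and recover scripts, shown commented out.

```shell
# Timed restore drill skeleton -- substitute your real restore/recover
# commands for the commented placeholders.
t0=$(date +%s)

# Phase 1: restore datafiles from backup
# rman target / <<'EOF'
# RESTORE DATABASE;
# EOF
t1=$(date +%s)

# Phase 2: apply archived logs and open
# rman target / <<'EOF'
# RECOVER DATABASE;
# ALTER DATABASE OPEN RESETLOGS;
# EOF
t2=$(date +%s)

echo "restore phase:  $(( t1 - t0 ))s"
echo "recover phase:  $(( t2 - t1 ))s"
echo "total measured: $(( t2 - t0 ))s"
```

Record the totals from each annual drill; the trend over time tells you whether database growth is quietly eating your RTO margin.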
RMAN on OCI — Object Storage Integration
On OCI, storing RMAN backups directly in Object Storage is cost-effective (much cheaper than Block Volume) and provides geographic resilience. The OCI Object Storage service acts as an RMAN media management layer through the OCI RMAN connector.
```bash
# Configure RMAN to use OCI Object Storage
CONFIGURE CHANNEL DEVICE TYPE SBT PARMS='SBT_LIBRARY=/opt/oracle/dcs/commonstore/pkgrepos/oss/odbcs/libopc.so, ENV=(OPC_PFILE=/home/oracle/opc_config.cfg)';
CONFIGURE DEFAULT DEVICE TYPE TO SBT;
BACKUP DATABASE PLUS ARCHIVELOG;
```
The opc_config.cfg file contains your OCI Object Storage credentials. Keep this file secured (permissions 600) — it contains credentials that can read and write your backup storage.
For backup retention, Object Storage lifecycle policies can automatically tier older backups to Archive Storage (significantly cheaper) and delete backups older than your retention period. Configure this at the storage tier rather than only relying on RMAN retention policies — defense in depth.
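As a sketch of what those rules might look like: the bucket name and day counts below are illustrative, and the `oci` CLI invocation is shown commented out since it requires configured credentials. Align the DELETE rule with your RMAN retention window so the two layers never disagree about which pieces still exist.

```shell
# Example lifecycle rules for an RMAN backup bucket (values are
# illustrative -- match them to your own retention policy).
cat > lifecycle.json <<'EOF'
[
  {"name": "tier-to-archive", "action": "ARCHIVE",
   "timeAmount": 31, "timeUnit": "DAYS", "isEnabled": true},
  {"name": "delete-expired", "action": "DELETE",
   "timeAmount": 90, "timeUnit": "DAYS", "isEnabled": true}
]
EOF

# Apply with the OCI CLI (needs configured credentials):
# oci os object-lifecycle-policy put \
#     --bucket-name rman-backups --items file://lifecycle.json
```

If the storage tier deletes pieces on its own schedule, a regularly scheduled CROSSCHECK plus DELETE EXPIRED keeps the RMAN catalog honest about what is actually still in the bucket.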
Handling Block Corruptions in Production
When RMAN detects a corrupt block during backup, it’s easy to dismiss if the backup still completes. Don’t. A reported corrupt block in production is a critical finding.
```sql
-- Check for known corrupt blocks
SELECT * FROM v$database_block_corruption;

-- RMAN can repair corrupt blocks if you have a valid backup
RMAN> BLOCKRECOVER CORRUPTION LIST;

-- Or repair specific blocks
RMAN> BLOCKRECOVER DATAFILE 5 BLOCK 1234;
```
BLOCKRECOVER is one of RMAN’s most powerful features — it restores only the specific corrupt blocks from backup rather than restoring entire datafiles. For large datafiles with isolated corruption, this is dramatically faster than a full datafile restore.
After recovering corrupt blocks, investigate the root cause. Corrupt blocks don’t appear randomly. Common causes: storage hardware issues, SAN/NAS firmware bugs, OS-level I/O errors, or Oracle bugs. Check your OS-level storage logs and your Oracle alert log. If you see corrupt blocks across multiple datafiles, investigate your storage infrastructure immediately.
The Backup Job That Lies to You
Here’s a scenario I’ve investigated twice in my career. An RMAN backup job runs nightly. Completes successfully. Exit code 0. Nobody notices that three months earlier, the archivelog backup started failing silently because the archive log destination filled up and Oracle stopped archiving. By that point the database was, in effect, running unprotected (the archiver had died). RMAN backs up datafiles happily, but without archived logs, point-in-time recovery is impossible. The backup job says SUCCESS every night.
```sql
-- Verify archivelog mode
ARCHIVE LOG LIST;

-- Check archiver status
SELECT dest_id, dest_name, status, target, archiver, schedule
FROM v$archive_dest_status
WHERE status != 'INACTIVE';

-- Check for recent archiving activity
SELECT sequence#, first_time, completion_time, archived
FROM v$log_history
WHERE first_time > SYSDATE - 1
ORDER BY sequence# DESC;
```
Build a backup validation job that doesn’t just check RMAN exit code. Check that archived logs are being generated and backed up. Check that your backup pieces are not all going to the same disk as your datafiles. Check that Object Storage backups are actually landing in the bucket.
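A sketch of one such check, for the "are archived logs still being generated?" question. `check_archive_count` is pure shell so the threshold logic can be tested without a database; the sqlplus query that would feed it in production is shown commented out, and `mail_dba_team` is a placeholder for your alerting hook.

```shell
# Validate evidence of archiving, not just the RMAN exit code.
check_archive_count() {
    # Alert if fewer than $2 archived logs were produced in the last
    # 24 hours ($1 = observed count, $2 = minimum expected).
    count=$1; min=$2
    if [ "$count" -lt "$min" ]; then
        echo "ALERT: only $count archived logs in 24h (expected >= $min)"
        return 1
    fi
    echo "OK: $count archived logs in 24h"
}

# In the real job (hypothetical wiring):
# count=$(sqlplus -s / as sysdba <<'EOF'
# SET HEADING OFF FEEDBACK OFF
# SELECT COUNT(*) FROM v$log_history WHERE first_time > SYSDATE - 1;
# EOF
# )
# check_archive_count "$count" 10 || mail_dba_team
```

Pick the minimum from your database's actual redo generation rate; a threshold of zero only catches total failure, while a realistic floor also catches the archiver limping along.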
A backup strategy that only checks the job exit code is not a backup strategy.
