I have faced with that issue on our 12.1.3 EBS system on AIX system with 2 node 11gR2 RAC databases.
During the full backup operation, I noticed that RMAN log mention below error message at the backup log file.
RMAN-03009: failure of backup command on ORA_DISK_3 channel at 03/29/2012 22:19:19 ORA-19566: exceeded limit of 0 corrupt blocks for file +ORADATA/PROD/datafile/apps_ts_xxcustom_data.308.752148073 RMAN-03009: failure of backup command on ORA_DISK_2 channel at 03/29/2012 21:42:32 ORA-19566: exceeded limit of 0 corrupt blocks for file +ORADATA/PROD/datafile/apps_ts_xxcustom_data.395.741949465 RMAN-03009: failure of backup command on ORA_DISK_8 channel at 03/29/2012 22:18:24 ORA-19566: exceeded limit of 0 corrupt blocks for file +ORADATA/PROD/datafile/apps_ts_xxcustom_data.583.753881375
We have below error at alert.log:
Hex dump of (file 440, block 1106642) in trace file /oracle11g/PROD/db/diag/rdbms/PROD/PROD1/trace/PROD1_ora_45744742.trc Corrupt block relative dba: 0x6e10e2d2 (file 440, block 1106642) Bad check value found during backing up datafile Data in bad block: type: 6 format: 2 rdba: 0x6e10e2d2 last change scn: 0x08fc.fd857452 seq: 0x2 flg: 0x04 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0x74520602 check value in block header: 0x937c computed block checksum: 0x3a7 Reread of blocknum=1106642, file=+ORADATA/PROD/datafile/apps_ts_xxcustom_data.716.771786733. found same corrupt data Reread of blocknum=1106642, file=+ORADATA/PROD/datafile/apps_ts_xxcustom_data.716.771786733. found same corrupt data Reread of blocknum=1106642, file=+ORADATA/PROD/datafile/apps_ts_xxcustom_data.716.771786733. found same corrupt data Reread of blocknum=1106642, file=+ORADATA/PROD/datafile/apps_ts_xxcustom_data.716.771786733. found same corrupt data Reread of blocknum=1106642, file=+ORADATA/PROD/datafile/apps_ts_xxcustom_data.716.771786733. found same corrupt data
After some search on Metalink and I followed Intermittent Ora-19566 When Using RMAN Backup [ID 549256.1] and then We first started to work with our storage team to can identified Issue is related with storage or not.
I noticed that issue is not with storage level. So We decided to go with SR1.
During SR session I followed below steps to identified why I faced this corruption and how we can solve an issue in a short period.
Firstly While corruption happens Oracle started to fill in V$DATABASE_BLOCK_CORRUPTION dynamic view. So I query below view
select * from V$DATABASE_BLOCK_CORRUPTION ;
INST_ID | FILE# | BLOCK# | BLOCKS | CORRUPTION_CHANGE# | CORRUPTION_TYPE |
1 | 71 | 1.340.426 | 1 | 0 | CHECKSUM |
1 | 81 | 1.338.102 | 1 | 0 | CHECKSUM |
1 | 101 | 1.335.514 | 1 | 0 | CHECKSUM |
1 | 139 | 1.332.486 | 1 | 0 | CHECKSUM |
1 | 192 | 1.074.210 | 1 | 0 | CHECKSUM |
1 | 234 | 980.706 | 1 | 0 | CHECKSUM |
1 | 284 | 974.490 | 1 | 0 | CHECKSUM |
1 | 307 | 1.060.358 | 1 | 0 | CHECKSUM |
1 | 309 | 1.047.338 | 1 | 0 | CHECKSUM |
1 | 334 | 926.887 | 1 | 0 | CHECKSUM |
1 | 354 | 882.790 | 1 | 0 | CHECKSUM |
1 | 413 | 1.205.042 | 1 | 0 | CHECKSUM |
1 | 440 | 1.106.642 | 1 | 0 | CHECKSUM |
1 | 740 | 1.243.698 | 1 | 0 | CHECKSUM |
I was trying to check if the blocks are on free space or occupied by any object.
Even if the blocks were occupied by any object and if that object was dropped the block would still be in corrupted formatted unless Oracle reuses it.
If the blocks are indeed in free space then we use a document to manually format the block.
So the output of these queries is important.
SELECT e.owner, e.segment_type, e.segment_name, e.partition_name, c.file# , greatest(e.block_id, c.block#) corr_start_block# , least(e.block_id+e.blocks-1, c.block#+c.blocks-1) corr_end_block# , least(e.block_id+e.blocks-1, c.block#+c.blocks-1) - greatest(e.block_id, c.block#) + 1 blocks_corrupted , null description FROM dba_extents e, v$database_block_corruption c WHERE e.file_id = c.file# AND e.block_id <= c.block# + c.blocks - 1 AND e.block_id + e.blocks - 1 >= c.block#; SELECT s.owner, s.segment_type, s.segment_name, s.partition_name, c.file# , header_block corr_start_block# , header_block corr_end_block# , 1 blocks_corrupted , 'Segment Header' description FROM dba_segments s, v$database_block_corruption c WHERE s.header_file = c.file# AND s.header_block between c.block# and c.block# + c.blocks - 1; SELECT null owner, null segment_type, null segment_name, null partition_name, c.file# , greatest(f.block_id, c.block#) corr_start_block# , least(f.block_id+f.blocks-1, c.block#+c.blocks-1) corr_end_block# , least(f.block_id+f.blocks-1, c.block#+c.blocks-1) - greatest(f.block_id, c.block#) + 1 blocks_corrupted , 'Free Block' description FROM dba_free_space f, v$database_block_corruption c WHERE f.file_id = c.file# AND f.block_id <= c.block# + c.blocks - 1 AND f.block_id + f.blocks - 1 >= c.block# order by file#, corr_start_block#;
After using upper queries We run below queries:
SQL>Select file#,block# from v$database_block_corruption ;
From rman:
RMAN > Validate datafile <file no> block <Block no> ; PS: You can use below sql's for your issues select 'Validate datafile '|| file# || ' block '||block# ||';' FROM v$database_block_corruption; select 'Select * from dba_free_space where file_id='|| file# || ' and '||block# ||' between block_id and block_id + blocks -1;' FROM v$database_block_corruption;
So our issue we just replaced the file no with file# and block no with block# ;
The above will run the validate only on that file and block and not entire database.
So we just started ti run those queries our first corrupted block
We just run for single file and output as below
rman connect target / ( Do not connect catalog)
RMAN > Validate datafile 71 block 1340426 ; Validate datafile 71 block 1340426; List of Datafiles ================= File Status Marked Corrupt Empty Blocks Blocks Examined High SCN ---- ------ -------------- ------------ --------------- ---------- 71 FAILED 0 0 1 0 File Name: +ORADATA/PROD/datafile/apps_ts_xxcustom_data.365.739618975 Block Type Blocks Failing Blocks Processed ---------- -------------- ---------------- Data 0 0 Index 1 1 Other 0 0
So this indicates the block is still corrupted.
Its showing failing block for index.
I run the following:
SQL>Select file#,block# from v$database_block_corruption where file#=71 ; SQL>Select * from dba_free_space where file_id=71 and <block# value from above query> between block_id and block_id + blocks -1 ;
If this returns row then the block is in free space.
Query return:
APPS_TS_XXCUSTOM_DATA 71 1337856 24903680 3040 71
The good news is this block is in free space. Which mean when the index has dropped the block was still in the corrupted state but since it was not reused by any other segment oracle still believe’s it to be corrupt.
Once this block is reused oracle would autoformat this block.
So We can do a block recovery. By using the backups taken to tape or disk
As I mention previously We should first use below queries for other corrupted blocks
SQL>Select file#,block# from v$database_block_corruption ;
From rman
RMAN > Validate datafile <file no> block <Block no> ;
So replace the file no with file# and block no with block# ;
Once done
Select * from dba_free_space where file_id=71 and <block# value from above query> between block_id and block_id + blocks -1 ;
For example
V$database_block_corruption
2 81 1.338.102 1 0 CHECKSUM
2 101 1.335.514 1 0 CHECKSUM
Rman> backup validate datafile 81 block 1338102 ; SQL>Select * from from dba_free_space where file_id=81 and 1338102 between block_id and block_id + blocks -1 ;
So corrupt blocks are not occupied by any objects. So My production database activities would not be affected.
However, since the blocks have still not been reused rman still considers them as corrupted. So what We have to do?
We should now go with the rman block recovery command.
Please note block recover command would restore archivelogs required for block recovery to disk for doing a block recovery of the affected block.
You can do the following for block recover:
A single command will recovery all blocks affected in v$database_block_corruption ;
Rman> run { allocate channel c1 device type disk ; allocate channel c2 device type disk ; blockrecover corruption list ; }
Or Alternate command is
Rman>recover corruption list ;
In these steps, We faced first block corruption recovery took 1 hour 06 min., second block corruption took almost 1 hour. We started to thing the recovery process could be more than 10 hours. But After 2th recover other block corruption took less than 5-10 min.
So the expected behavior is while recovering corruption list process end, dynamic view v$database_block_corruption will be updated. But We see that after rman complete process this view still show us there are corruption block available.
While We started to run validate command again view became empty
rman target / nocatalog
RMAN> Validate datafile 71 block 1340426;
So what is the cause of this corruption? What is a suggestion to prevent this error?
By default db_block_checksum is true.
When dbwr writes blocks to disk it does a xor of bytes within the blocks. Once the block is written to disk no oracle process modifies the blocks unless its re-read to buffer.
Before the block is read back to buffer the server process does a recomputation of the checksum value by doing xor of the bytes within the block.
This should match with the value stored in the block by dbwr process as no oracle process has modified the blocks.
If there is a mismatch in this value it clearly indicates some OS/hardware issue which has overwritten some contents of the blocks when they were on disk.
Also, any corruption by Oracle would be through redo changes. However, we see rman block recovery has gone fine.
Clearly indicating the redo changes vectors were good.You can set db_block_checking=typical …. however, these parameters would have some performance impact around 2 to 10%.
Or set DB_ULTRA_SAFE
Refer
New Parameter DB_ULTRA_SAFE introduce In 11g [ID 465130.1]
These parameters would help detect corruption at earlies in memory and prevent on occasion corrupted block from written to disk.
Again these might also have some performance impact around 2-10% depending on your application
References:
New Parameter DB_ULTRA_SAFE introduce In 11g [ID 465130.1]
Known Corruption issues caused by 3rd party Software Provider [ID 1323649.1]
Intermittent Ora-19566 When Using RMAN Backup [ID 549256.1]
Ext/Pub RMAN 9i Block-Level Media Recovery – Concept & [ID 144911.1 ]
Bir Cevap Yazın