Case study: log file sequential read
Symptoms
Here's problem's description given by client:
- third party software developers were rebuilding (REBUILD ONLINE) some indexes in the database;
- suddenly, the whole system became much slower; main database wait event observed was "log file sequential read";
- IO workload for storage partition holding online redo was so significant that all users of that storage device noticed performance problems - and it was hi-end storage system, main storage device for that enterprise;
- problem disappeared when Oracle instance had been bounced.
Diagnostics
As far as "log file sequential read" isn't the most
frequently seen wait event, I've looked up Oracle documentation for it's
description. It says: "Waiting for the read from this log file to
return. This is used to read redo records from the log file". Obvious
but useless, as for me, as there's no information why may Oracle
read online redo. Google and Metalink gave me a couple of link
describing ARCH and Logminer performance problems, but they were not the
case.
However, there was an interesting message in alert.log: "Some
indexes or index [sub]partitions of table SOMESCHEMA.SOMETABLE have
been marked unusable" followed by number of "Errors in file..." records
without actual error messages. All indexes on SOMESCHEMA.SOMETABLE were
valid at the time of investigation. But in trace files mentioned in
alert.log I saw, inter alia, message "oer 8102.2 - obj# 55780", i.e.
"ORA-8102: index key not found" for index with object_id = 55780.
After that it was easy to:
- find a bug #7329252 "Index corruption after rebuild index ONLINE" saying "If there is a heavy concurrency on a table during online index rebuild, the index can be corrupt missing some keys." Fixed in 10.2.0.4.4 and 11.2.0.1;
- understand that "log file sequential read" wait event was caused by sessions that had found corrupted block and tried to dump corresponding redo blocks. By the way, documentation hints that "log file sequential read" is used when dumping redo in event parameters description: "log# - The relative sequence number of the logfiles within a log group (used only when dumping the logfiles)";
- understand that initial information "problem disappeared when Oracle instance had been bounced" was incorrect. In fact, software developers just rebuilt corrupted index after RDBMS reboot and concealed that fact from client
Key findings:
- now I know at least three situations when "log file sequential read" can be observed: when ARCH as reading file to archive it; when Logmine is working and when some session is trying to create dump of redo;
- again, REBUID ONLINE is not so ONLINE as it seems - at least, in Oracle 10g.