Thursday, August 23, 2012

Wait Event : log file sequential read

Case study: log file sequential read

Recently one of my clients asked me to analyze his database problem post factum. At the time of request the problem itself was already solved of "self-solved", so the task was to explain it's causes and prevent similar issues in future.

Symptoms

Here's problem's description given by client:

  • third party software developers were rebuilding (REBUILD ONLINE) some indexes in the database;
  • suddenly, the whole system became much slower; main database wait event observed was "log file sequential read";
  • IO workload for storage partition holding online redo was so significant that all users of that storage device noticed performance problems - and it was hi-end storage system, main storage device for that enterprise;
  • problem disappeared when Oracle instance had been bounced.

Diagnostics

As far as "log file sequential read" isn't the most frequently seen wait event, I've looked up Oracle documentation for it's description. It says: "Waiting for the read from this log file to return. This is used to read redo records from the log file". Obvious but useless, as for me, as there's no information why may Oracle read online redo. Google and Metalink gave me a couple of link describing ARCH and Logminer performance problems, but they were not the case.

However, there was an interesting message in alert.log: "Some indexes or index [sub]partitions of table SOMESCHEMA.SOMETABLE have been marked unusable" followed by number of "Errors in file..." records without actual error messages. All indexes on SOMESCHEMA.SOMETABLE were valid at the time of investigation. But in trace files mentioned in alert.log I saw, inter alia, message "oer 8102.2 - obj# 55780", i.e. "ORA-8102: index key not found" for index with object_id = 55780.

After that it was easy to:
  • find a bug #7329252 "Index corruption after rebuild index ONLINE" saying "If there is a heavy concurrency on a table during online index rebuild, the index can be corrupt missing some keys." Fixed in 10.2.0.4.4 and 11.2.0.1;
  • understand that "log file sequential read" wait event was caused by sessions that had found corrupted block and tried to dump corresponding redo blocks. By the way, documentation hints that "log file sequential read" is used when dumping redo in event parameters description: "log# - The relative sequence number of the logfiles within a log group (used only when dumping the logfiles)";
  • understand that initial information "problem disappeared when Oracle instance had been bounced" was incorrect. In fact, software developers just rebuilt corrupted index after RDBMS reboot and concealed that fact from client

Key findings:
  • now I know at least three situations when "log file sequential read" can be observed: when ARCH as reading file to archive it; when Logmine is working and when some session is trying to create dump of redo;
  • again, REBUID ONLINE is not so ONLINE as it seems - at least, in Oracle 10g.
ref : http://ua-dba.blogspot.com/2010/09/case-study-log-file-sequential-read.html