better now than never – buffer busy wait challenges and “data block class”
I have a RAC instance with a 30million row+ OLTP table that is not Partitioned (this is our most used table and since we have a Standard edition license partitioning right now is not an option until we get budget for Enterprise edition ).
The underlying DATA tablespace for this table is not on ASSM and has a single freelist group. I got called about 3 weeks back for slow performance at certain times during the day. Obviously the first thing I do is jump on Spotlight do my general checks and see my wait events. There was no doubt that my biggest wait event was “Buffer Busy waits”. While I used my performance monitoring tool to get down to this – the below sql will also list your top wait events.
select event,time_waited,
round(time_waited*100/SUM (time_waited) OVER(),2) wait_pct
from(
select event,time_waited
from v$system_event
where event NOT IN
('Null event',
'client message',
'rdbms ipc reply',
'smon timer',
'rdbms ipc message',
'PX Idle Wait',
'PL/SQL lock timer',
'file open',
'pmon timer',
'WMON goes to sleep',
'virtual circuit status',
'dispatcher timer',
'SQL*Net message from client',
'parallel query dequeue wait',
'pipe get'
) UNION
(select name,value from v$sysstat
where name like 'CPU uses when call started'))
order by wait_pct desc
We had this issue 2 days consecutively in the week(and have had similar behavior over the last few months) . We also collect statspack data frequently and the below query was run to identify the buffer busy segments in question for the time interval we had performance problems.
acknowledging source of sql : Author:Tim Gorman (SageLogix, Inc)
spool sp_buffer_busy_waits
clear breaks computes
break on day skip 1 on object_type on report
select yyyymmdd sort0,
daily_ranking sort1,
day,
object_type,
owner,
object_name,
buffer_busy_waits
from (select to_char(ss.snap_time, 'YYYYMMDD') yyyymmdd,
to_char(ss.snap_time, 'DD-MON') day,
o.object_type,
o.owner,
o.object_name,
sum(s.buffer_busy_waits) buffer_busy_waits,
rank () over (partition by to_char(ss.snap_time, 'YYYYMMDD') order by sum(s.buffer_busy_waits) desc) daily_ranking
from (select dbid,
instance_number,
dataobj#,
obj#,
snap_id,
nvl(decode(greatest(buffer_busy_waits,
nvl(lag(buffer_busy_waits) over (partition by dbid, instance_number, dataobj#, obj# order by snap_id),0)),
buffer_busy_waits,
buffer_busy_waits - lag(buffer_busy_waits) over (partition by dbid, instance_number, dataobj#, obj# order by snap_id),
buffer_busy_waits), 0) buffer_busy_waits
from stats$seg_stat) s,
stats$seg_stat_obj o,
stats$snapshot ss
where o.dataobj# = s.dataobj#
and o.obj# = s.obj#
and o.dbid = s.dbid
and ss.snap_id = s.snap_id
and ss.dbid = s.dbid
and ss.instance_number = s.instance_number
and ss.snap_time between (sysdate - &&V_NBR_DAYS) and sysdate
group by to_char(ss.snap_time, 'YYYYMMDD'),
to_char(ss.snap_time, 'DD-MON'),
o.object_type,
o.owner,
o.object_name
order by yyyymmdd, buffer_busy_waits)
where daily_ranking <= 10 order by sort0, sort1; There results showed that my biggest table had the highest buffer busy waits (note it was only on the table object and not on the index objects; the index tablespace had ASSM on them) I knew my options were the below:
(i)alter this object for multiple free list groups (this calls for an object reorg, we all know that reorgs require time, appropriate approvals and planning for it to be implemented in Production).
(ii)recreate the DATA/INDEX tablespace for this object with ASSM and let Oracle manage your freelist contention.
(we are on version 9i, as a DBA you wish your underlying structure are set correctly for a huge application. But then that’s not a realistic world ; most likely you have to pickup the work that was done in the past and get them all together)
(iii)alter storage for the object to add more freelists (this can be done with a simple alter table statement).
Obviously the patch I could submit rightaway to change management was for (iii). Since we have added more freelists; I have not had calls on performance issues – but given that I was on RAC and a single freelist group ....the issue had to be addressed
I mustered all the courage to let Management know that we may have to Reorg our biggest table and given that I will need to reorg the table I rather go with ASSM
While the reorg will need some planning and preparation it can be done. “Better now than never.” My experience with Quest Central for Live Reorg has been fairly good
Below are some quick / important steps to get your Live Reorg rolling
Before you begin install QSA (quest server agent) on your database server. I prefer the manuall install – Install from the GUI keeps failing at the login prompt even if you have rexec enabled.
1.Step1 connect to Space manager(part of Quest Central – Version 6.0.1.289) as an account that has dba privileges.
2) On your Tolls Bar Reorganize > Reorg Manager
3)Find the object you want to reorganize and say Next
4)Choose the Option (i) Optimize for speed (ii) Relocate Tablespace (create a new tablespace with SEGMENT SPACE MANAGEMENT AUTO)
5)Choose default scripting options;default tablespace storage options; Object Options
make sure to check for NoLogging
6)Store the Reorg as job schedule
(Details on the Reorg will be a separate on its own)
While I am in the process of testing my reorg in preprod and had no performance issues after I added multiple freelists to the heavily used table in PRODUCTION, I am still left with one question, will a freelist contention not manifect itself as a “freelist class” in v$waitstat when you are on RAC and have a single freelist group? I have read some posts saying that in all probablility it may manifest itself as a “datablock class” until you add multiple freelist groups.(I will need to run more test to verify this)
October 30th, 2006 - Shakir Sadikali's blog pointed out buffer busy waits as a result of disk write caching disabled..."I did not have to check our batteries" we are on NFS and noac "nocaching is a separate issue that we are dealing with for I/O performance issues"