Degraded DB performance with significant waits for “reliable messages” & “enq: KO – fast object checkpoint”

Dear Readers,

Trust you are doing well!!!

In this post, I will discussing about an interesting issue I faced recently on one of the database setups, wherein we received incident about generic performance issue. This database was recently created for reporting purpose.

Though no specific errors seen in database, application job were taking significant time to complete.

Looking at active sessions on database, I was able to see most of database sessions were waiting for following event. Final blocker for these sessions was mostly CKPT process.

  • reliable message
  • enq: KO – fast object checkpoint
  • enq: RO – fast object reuse

Content from HA trace during issue time, which clearly shows CKPT is blocking 90+ sessions. Wait chain is ‘enq: RO – fast object reuse'<=’reliable message’


Chain 1:

Oracle session identified by:
{
            instance: 1 (test.test1)
               os id: 10355
          process id: 161, oracle@test_server1 (ARCE)
          session id: 189
    session serial #: 39695
}
is waiting for 'reliable message' with wait info:
{
                  p1: 'channel context'=0xb7eae7a60
                  p2: 'channel handle'=0xb5e325cb0
                  p3: 'broadcast message'=0xb5eba7d68
        time in wait: 0.295771 sec
       timeout after: never
             wait id: 10434563
            blocking: 0 sessions
      current sql_id: 0
         current sql: <none>
         short stack: ksedsts()+964<-ksdxfstk()+44<-ksdxcb()+2112<-sspuser()+400<-__sighndlr()+12<-call_user_handler()+1028<-sigacthandler()+224<-_portfs()+8<-port_get()+28<-sskgpwwait()+188<-skgpwwait()+192<-ksliwat()+1896<-kslwaitctx()+156<-ksrpubwait_ctx()+1248<-krsr_execute_cic()+748<-kcrrwkx()+1088<-kcrrwk()+564<-ksb_act_run_int()+136<-ksb_act_run()+136<-ksbabs()+1516<-ksbrdp()+1472<-opirip()+704<-opidrv()+612<-sou2o()+136<-opimai_real()+164<-ssthrdmain()+488<-main()+316<-_start()+272
        wait history:
          * time between current wait and wait #1: 0.000192 sec
          1.       event: 'rdbms ipc message'
             time waited: 0.006441 sec
                 wait id: 10434562         p1: 'timeout'=0x14ea
          * time between wait #1 and #2: 0.000058 sec
          2.       event: 'rdbms ipc message'
             time waited: 0.000275 sec
                 wait id: 10434561         p1: 'timeout'=0x14ea
          * time between wait #2 and #3: 0.000197 sec
          3.       event: 'latch free'
             time waited: 0.000888 sec
                 wait id: 10434560         p1: 'address'=0x38002d240
                                           p2: 'number'=0x169
                                           p3: 'why'=0x0
}
and is blocked by=>

 Oracle session identified by:
{
instance: 1 (test.test1)
os id: 7167
process id: 111, oracle@test_server1 (CKPT)
session id: 20869
session serial #: 54287
}
which is waiting for 'enq: RO - fast object reuse' with wait info:
{
p1: 'name|mode'=0x524f0002
p2: '2'=0x20374
p3: '0'=0x1
time in wait: 0.000000 sec
timeout after: never
wait id: 60861795
blocking: 93 sessions
current sql_id: 0
current sql:
short stack: ksedsts()+964<-ksdxfstk()+44<-ksdxcb()+2112<-sspuser()+400<-__sighndlr()+12<-call_user_handler()+1028<-sigacthandler()+224<-ksl_post_queue_done_int()+32<-ksqgtlctx()+9228<-ksqgelctx()+1040<-kcbo_add_ckpt()+2892<-kcbstc1()+168<-kcbo_subch()+520<-ksb_act_run_int()+136<-ksb_act_run()+136<-ksbcti()+176<-ksbabs()+1724<-ksbrdp()+1472<-opirip()+704<-opidrv()+612<-sou2o()+136<-opimai_real()+164<-ssthrdmain()+488<-main()+316<-_start()+272
wait history:
* time between current wait and wait #1: 0.000222 sec
1. event: 'enq: RO - fast object reuse'
time waited: 0.000684 sec
wait id: 60861794 p1: 'name|mode'=0x524f0006
p2: '2'=0x20374
p3: '0'=0x2
* time between wait #1 and #2: 0.000188 sec
2. event: 'ges inquiry response'
time waited: 0.000786 sec
wait id: 60861793 p1: 'type|mode|where'=0x524f0000
p2: 'id1'=0x20374
p3: 'id2'=0x1
* time between wait #2 and #3: 0.000844 sec
3. event: 'enq: RO - fast object reuse'
time waited: 0.000855 sec
wait id: 60861792 p1: 'name|mode'=0x524f0006
p2: '2'=0x212ce
p3: '0'=0x2
}
and may or may not be blocked by another session.

Chain 1 Signature: 'enq: RO - fast object reuse'<='reliable message'

After doing analysis for couple of days, most of these queries were going for FTS & doing direct path reads & eventually blocked by CKPT, which was causing overall degration.

But why CKPT is blocking all my sessions doing direct path reads.

As per Oracle documentation, the session is waiting for a direct read to complete. A direct read is a physical I/O from a data file that bypasses the buffer cache and reads the data block directly into process-private memory. Direct path reads require a checkpoint.

So this explains overall situation my database is currently in.

Now even bigger question is ==> what is causing all my FTS queries doing direct path reads and not scattered read.

I found database is having one interesting hidden parameter set.

_serial_direct_read = TRUE

As value for _serial_direct_read is set to TRUE, Oracle was forcing every query going for FTS waits for CKPT.

I changed _serial_direct_read to AUTO, which lets Oracle to decide on whether is wants to choose direct path read, scattered read or smart scan. Post changing this parameter, we do not see any sessions waiting for “enq: KO – fast object checkpoint”.

Hope u will find this post very useful!!!

Cheers

Regards,
Adityanath

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s