Recently one of my clients faced an issue after upgrading the Exadata image on a DB server: the image status was showing as failure. I reviewed all the patchmgr logs but didn't see anything unusual.
[root@testserver1 ]# imageinfo
Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
Image kernel version: 4.1.12-94.8.4.el6uek
Image version: 18.1.5.0.0.180506
Image activated: 2018-05-29 18:03:57 +0200
Image status: failure ============================> Issue
System partition on device: /dev/mapper/VGExaDb-LVDbSys1
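If you need to check the image status on several nodes, it can help to pull the value out of the imageinfo output with a small script. The snippet below is only a sketch: it assumes the output has been captured as text (the sample is copied from the output above, without my annotation arrow), and the function name parse_imageinfo is my own, not part of any Oracle tool.

```python
def parse_imageinfo(text):
    """Turn the 'Key: value' lines of captured imageinfo output into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so values such as 18:03:57 stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Sample taken from the imageinfo output shown in this post.
IMAGEINFO_SAMPLE = """\
Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
Image kernel version: 4.1.12-94.8.4.el6uek
Image version: 18.1.5.0.0.180506
Image activated: 2018-05-29 18:03:57 +0200
Image status: failure
System partition on device: /dev/mapper/VGExaDb-LVDbSys1
"""

if __name__ == "__main__":
    info = parse_imageinfo(IMAGEINFO_SAMPLE)
    print(info["Image version"], "->", info["Image status"])
```

Anything other than success in the "Image status" field is a reason to look at the validation logs next.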
I asked the customer to run the validations manually, as below:
/opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
The customer shared the output of the command:
[root@testserver1 ]# /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Logging started to /var/log/cellos/validations.log
Command line is /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Run validation ipmisettings - PASSED
Run validation misceachboot - FAILED ============================> Issue
Check log in /var/log/cellos/validations/misceachboot.log
Run validation biosbootorder - PASSED
Run validation oswatcher - PASSED
Run validation checkdeveachboot - PASSED
Run validation checkconfigs - BACKGROUND RUN
Run validation saveconfig - BACKGROUND RUN
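To spot the failed validation quickly in a long run, the output can be filtered with a few lines of Python. This is a rough sketch that assumes the "Run validation NAME - STATUS" line format seen above and only the three statuses that appear in this post (PASSED, FAILED, BACKGROUND RUN). The function names are mine.

```python
import re

# Matches lines such as "Run validation misceachboot - FAILED".
# Assumption: only the three statuses seen in this post occur.
RESULT_LINE = re.compile(
    r"^Run validation (\S+) - (PASSED|FAILED|BACKGROUND RUN)\b"
)


def parse_validations(text):
    """Map each validation name to its reported status."""
    results = {}
    for line in text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            results[match.group(1)] = match.group(2)
    return results


def failed_validations(text):
    """Return the names of validations reported as FAILED, in output order."""
    return [name for name, status in parse_validations(text).items()
            if status == "FAILED"]


# Sample taken from the vldrun.pl output shown in this post.
VLDRUN_SAMPLE = """\
Logging started to /var/log/cellos/validations.log
Run validation ipmisettings - PASSED
Run validation misceachboot - FAILED
Check log in /var/log/cellos/validations/misceachboot.log
Run validation biosbootorder - PASSED
Run validation oswatcher - PASSED
Run validation checkdeveachboot - PASSED
Run validation checkconfigs - BACKGROUND RUN
Run validation saveconfig - BACKGROUND RUN
"""

if __name__ == "__main__":
    for name in failed_validations(VLDRUN_SAMPLE):
        print("Failed validation:", name)
```

Each failed validation has its own log under /var/log/cellos/validations/, as the "Check log in" line of the output points out.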
After checking misceachboot.log, I found the error below:
-bash-4.4$ cat misceachboot.log | grep -i error
BIOS is Already Pause On Error on Adapter 0.
[1527609678][2018-05-29 18:03:53 +0200][ERROR][0-0][/opt/oracle.cellos/image_functions][image_functions_check_configured_services][] Validation check ERROR - NOT RUNNING for service: dbserverd
BIOS is Already Pause On Error on Adapter 0.
[1527678371][2018-05-30 13:06:56 +0200][ERROR][0-0][/opt/oracle.cellos/image_functions][image_functions_check_configured_services][] Validation check ERROR - NOT RUNNING for service: dbserverd
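A plain grep -i error also matches harmless lines such as "BIOS is Already Pause On Error", so a slightly narrower filter can be handy. The sketch below assumes the bracketed [ERROR] line layout shown above and simply collects the service names reported as NOT RUNNING. The function name is my own.

```python
import re

# Only bracketed [ERROR] entries that report a service as NOT RUNNING.
NOT_RUNNING = re.compile(r"\[ERROR\].*NOT RUNNING for service: (\S+)")


def services_not_running(log_text):
    """Return each service reported as NOT RUNNING, once, in log order."""
    services = []
    for line in log_text.splitlines():
        match = NOT_RUNNING.search(line)
        if match and match.group(1) not in services:
            services.append(match.group(1))
    return services


# Sample lines taken from the misceachboot.log excerpt in this post.
MISC_LOG_SAMPLE = """\
BIOS is Already Pause On Error on Adapter 0.
[1527609678][2018-05-29 18:03:53 +0200][ERROR][0-0][/opt/oracle.cellos/image_functions][image_functions_check_configured_services][] Validation check ERROR - NOT RUNNING for service: dbserverd
BIOS is Already Pause On Error on Adapter 0.
[1527678371][2018-05-30 13:06:56 +0200][ERROR][0-0][/opt/oracle.cellos/image_functions][image_functions_check_configured_services][] Validation check ERROR - NOT RUNNING for service: dbserverd
"""

if __name__ == "__main__":
    print(services_not_running(MISC_LOG_SAMPLE))
```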
This shows that something went wrong with the dbserverd service.
I asked him to check the status of the dbserverd service and to manually stop and start it on the affected server:
1. service dbserverd status
2. service dbserverd stop
3. service dbserverd start
[root@testserver1 ]# service dbserverd status
rsStatus: running
msStatus: stopped ============================> Issue
[root@testserver1 ]# service dbserverd stop
Stopping the RS and MS services...
The SHUTDOWN of services was successful.
[root@testserver1 ]# service dbserverd start
Starting the RS services...
Getting the state of RS services... running
Starting MS services...
DBM-01513: DBMCLI request to Restart Server (RS) has timed out.
The STARTUP of MS services was not successful. Error: Unknown Error
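The status output is easy to check by eye, but if you collect it from many DB nodes a small parser saves time. This sketch assumes the "rsStatus: ..." and "msStatus: ..." lines shown above and treats anything other than running as a problem. The function names are my own.

```python
def parse_service_status(text):
    """Map rsStatus/msStatus (any '...Status' key) to its first status word."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key.endswith("Status"):
            # Keep only the first word so trailing notes do not interfere.
            words = value.split()
            status[key] = words[0] if words else ""
    return status


def services_not_up(text):
    """Return the status keys whose value is anything other than 'running'."""
    return [key for key, value in parse_service_status(text).items()
            if value != "running"]


# Sample taken from the 'service dbserverd status' output in this post.
STATUS_SAMPLE = """\
rsStatus: running
msStatus: stopped
"""

if __name__ == "__main__":
    print(services_not_up(STATUS_SAMPLE))
```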
This confirmed that the issue was with the MS service. I asked the customer to restart the DB server, but that didn't resolve the issue.
I then asked the customer to reconfigure the MS service as follows and check whether it helped:
1. SSH to the node as root
2. Shut down the running RS and MS
DBMCLI>ALTER DBSERVER SHUTDOWN SERVICES ALL
Then list any remaining processes with ps -ef | grep "dbserver.*dbms" and kill them all.
3. Re-deploy MS:
/opt/oracle/dbserver/dbms/deploy/scripts/unix/setup_dynamicDeploy DB -D
4. Restart RS and MS
DBMCLI>ALTER DBSERVER STARTUP SERVICES ALL
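One small catch in step 2: ps -ef | grep "dbserver.*dbms" also lists the grep process itself, and you don't want to try killing that. The sketch below picks the PIDs out of captured ps -ef text and skips the grep line. It assumes the usual eight-column ps -ef layout, and the sample process lines are made up purely for illustration (they are not real Exadata process names). The function name is my own.

```python
import re

# Same pattern as used with grep in step 2.
DBMS_PATTERN = re.compile(r"dbserver.*dbms")


def matching_pids(ps_output):
    """Return PIDs whose command matches the pattern, skipping grep itself."""
    pids = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or anything that is not a process row
        command = fields[7]
        if command.split()[0].endswith("grep"):
            continue  # the grep process from the pipeline
        if DBMS_PATTERN.search(command):
            pids.append(int(fields[1]))
    return pids


# Hypothetical ps -ef output, for illustration only.
PS_SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root      4321     1  0 13:00 ?        00:00:05 /opt/oracle/dbserver/dbms/bin/example_ms
root      4400  4300  0 13:01 pts/0    00:00:00 grep dbserver.*dbms
root       100     1  0 12:00 ?        00:00:01 /usr/sbin/sshd
"""

if __name__ == "__main__":
    print(matching_pids(PS_SAMPLE))
```

Review the list before killing anything, and only do so after the services have been shut down through DBMCLI as in step 2.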
This action plan resolved the issue:
[root@testserver1 ]# dbmcli
DBMCLI: Release - Production on Wed May 30 16:05:13 CEST 2018
Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.
DBMCLI> ALTER DBSERVER STARTUP SERVICES ALL
Starting the RS and MS services...
Getting the state of RS services... running
Starting MS services...
The STARTUP of MS services was successful.
DBMCLI> exit
quitting
[root@testserver1 ]# service dbserverd status
rsStatus: running
msStatus: running ============================> Resolved
[root@testserver1 ]#
Then rerun the validations to check that they now pass:
[root@testserver1 ]# /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Logging started to /var/log/cellos/validations.log
Command line is /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Run validation ipmisettings - PASSED
Run validation misceachboot - PASSED ============================> Resolved
Check log in /var/log/cellos/validations/misceachboot.log
Run validation biosbootorder - PASSED
Run validation oswatcher - PASSED
Run validation checkdeveachboot - PASSED
Run validation checkconfigs - BACKGROUND RUN
Run validation saveconfig - BACKGROUND RUN
Now check the image status:
[root@testserver1 ]# imageinfo
Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
Image kernel version: 4.1.12-94.8.4.el6uek
Image version: 18.1.5.0.0.180506
Image activated: 2018-05-29 18:03:57 +0200
Image status: success ============================> Resolved
System partition on device: /dev/mapper/VGExaDb-LVDbSys1
Sometimes imageinfo can still show the status as failure even after the validations pass. In that case you can mark the image status as success manually, after checking with Oracle Support 🙂
I hope you find this post useful 🙂
Cheers
Regards,
Adityanath
Categories: Administration, Exadata, Upgrade