./roothas.sh -postpatch OR root.sh failing with CLSRSC-400: A system reboot is required to continue installing.

Recently I was doing a fresh Grid Infrastructure (GI) 12.2 install on one of our UAT boxes, where I ran into a strange issue.

Both “root.sh” & “./roothas.sh -postpatch” were exiting with the below error/warning:

CLSRSC-400: A system reboot is required to continue installing.


test-server01:/u01/app/12.2.0.1/grid/bin # cd $ORACLE_HOME/crs/install
test-server01:/u01/app/12.2.0.1/grid/crs/install # ./roothas.sh -postpatch
Using configuration parameter file: /u01/app/12.2.0.1/grid/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/crsdata/test-server01/crsconfig/hapatch_2019-05-27_02-22-25PM.log
2019/05/27 14:22:30 CLSRSC-329: Replacing Clusterware entries in file '/etc/inittab'
2019/05/27 14:23:18 CLSRSC-400: A system reboot is required to continue installing.

The simple instruction given by the above warning was to reboot the machine & retry. I asked the server admin to reboot the machine, but a subsequent rerun of the command failed with the same error.


test-server01:/u01/app/12.2.0.1/grid/crs/install # ./roothas.sh -postpatch
Using configuration parameter file: /u01/app/12.2.0.1/grid/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/crsdata/test-server01/crsconfig/hapatch_2019-05-27_02-37-02PM.log
2019/05/27 14:37:07 CLSRSC-329: Replacing Clusterware entries in file '/etc/inittab'
2019/05/27 14:37:52 CLSRSC-400: A system reboot is required to continue installing.

I checked the associated log file to get more details: /u01/app/crsdata/test-server01/crsconfig/hapatch_2019-05-27_02-37-02PM.log


> ACFS-9428: Failed to load ADVM/ACFS drivers. A system reboot is recommended.
> ACFS-9310: ADVM/ACFS installation failed.
> ACFS-9178: Return code = USM_REBOOT_RECOMMENDED
2019-05-27 14:37:41: ACFS drivers cannot be installed, and reboot may resolve this
2019-05-27 14:37:52: Command output:
> CLSRSC-400: A system reboot is required to continue installing.
>End Command output
2019-05-27 14:37:52: CLSRSC-400: A system reboot is required to continue installing.
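
For reference, one simple way to pull just these ACFS and CLSRSC messages out of the crsconfig session log is a grep like the one below (log path taken from this run):

grep -iE 'ACFS|CLSRSC' /u01/app/crsdata/test-server01/crsconfig/hapatch_2019-05-27_02-37-02PM.log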

So this was definitely due to an issue with the ACFS drivers.

I found the below MOS documents related to my issue, but none of them exactly matched my situation or operating system.

While Manually Installing a Patch ‘rootcrs.sh -patch’ Fails with – CLSRSC-400: A system reboot is required to continue installing. (Doc ID 2360097.1)

ALERT: root.sh Fails With “CLSRSC-400” While Installing GI 12.2.0.1 on RHEL or OL with RedHat Compatible Kernel (RHCK) 7.3 (Doc ID 2284463.1)

In our environment we don’t use the ACFS file system, so the driver failure was not a real problem in my case & we can always install the ACFS drivers explicitly if we need them in the future.
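
If ACFS is ever needed later, the driver state can be checked from the GI home and the drivers installed with the acfsroot utility. A minimal sketch, run as root and assuming the GI home path from this install (the exact location of acfsroot can vary between bin and lib depending on the version):

export GRID_HOME=/u01/app/12.2.0.1/grid

# Check whether the ADVM/ACFS drivers are supported and installed on this kernel
$GRID_HOME/bin/acfsdriverstate supported
$GRID_HOME/bin/acfsdriverstate installed

# Install the drivers later if/when required
$GRID_HOME/bin/acfsroot install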

After reading all the logs, I found that /u01/app/12.2.0.1/grid/lib/acfstoolsdriver.sh was being called for the ACFS driver installation.

I changed the following code from

# Now run command with all arguments!
exec ${RUNTHIS} $@

to

# Now run command with all arguments!
#exec ${RUNTHIS} $@
exit 0
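
For the record, this is roughly how I applied the change; the backup file name is just my own convention so the original wrapper can be restored later if needed:

# Back up the wrapper script before editing it
cp /u01/app/12.2.0.1/grid/lib/acfstoolsdriver.sh /u01/app/12.2.0.1/grid/lib/acfstoolsdriver.sh.orig

# Comment out the 'exec ${RUNTHIS} $@' line and add 'exit 0' in its place
vi /u01/app/12.2.0.1/grid/lib/acfstoolsdriver.sh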

After changing the above code, “./roothas.sh -postpatch” completed without errors/warnings & I was able to complete the GI installation successfully.

Note: This workaround is only applicable when ACFS is not being used in the environment, so implement it with the forewarning that there is an implied risk which one must accept. 😉

Hope you will find this post very useful.

Cheers

Regards,
Adityanath


DST_UPGRADE_STATE = DATAPUMP(1) causing issue in Oracle DB upgrade.

Yesterday, I was busy upgrading my UAT database from 12.1.0.2 to 12.2.0.1. As a prerequisite, when I ran preupgrade.jar from the 12.1 RDBMS home, it gave me the below warning:


-- CHECK/FIXUP name: pending_dst_session
--
-- The call to run_fixup below will test whether
-- the following issue originally identified by
-- the preupgrade tool is still present
-- and if so, it will attempt to perform the action
-- necessary to resolve it.
--
-- ORIGINAL PREUPGRADE ISSUE:
-- + Complete any pending DST update operation before starting the database
-- upgrade.
--
-- There is an unfinished DST update operation in the database. It's
-- current state is: DATAPUMP(1)
--
-- There must not be any Daylight Savings Time (DST) update operations
-- pending in the database before starting the upgrade process.
-- Refer to My Oracle Support Note 1509653.1 for more information.
--
fixup_result := dbms_preup.run_fixup('pending_dst_session');

I queried DATABASE_PROPERTIES to get the current DST_UPGRADE_STATE; the output was as shown below:


SQL> SELECT PROPERTY_NAME, SUBSTR(property_value, 1, 30) value
FROM DATABASE_PROPERTIES
WHERE PROPERTY_NAME LIKE 'DST_%'
ORDER BY PROPERTY_NAME;
PROPERTY_NAME VALUE
-------------------------------------------------------------------------------------------------------------------------------- ---------
DST_PRIMARY_TT_VERSION 18
DST_SECONDARY_TT_VERSION 14
DST_UPGRADE_STATE DATAPUMP(1)

I followed the below MOS notes to resolve the issue, but all attempts ended without any luck.

Updating the RDBMS DST version in 12c Release 1 (12.1.0.1 and up) using DBMS_DST (Doc ID 1509653.1)
How To Cleanup Orphaned DataPump Jobs In DBA_DATAPUMP_JOBS ? (Doc ID 336014.1)

Then I decided to ignore this error & proceeded with the DB upgrade. The DB was successfully upgraded, but I once again got stuck at the step of upgrading the DST version from 18 to 26.

It was not allowing me to upgrade the DST version from 18 to 26, as DST_UPGRADE_STATE was not NONE.

Then I googled it & found the below steps to resolve it:


1. ALTER SESSION SET EVENTS '30090 TRACE NAME CONTEXT FOREVER, LEVEL 32';
2. exec dbms_dst.unload_secondary;
3. ALTER SESSION SET EVENTS '30090 TRACE NAME CONTEXT OFF';


I checked DST_UPGRADE_STATE after implementing the above steps.


SQL> SELECT PROPERTY_NAME, SUBSTR(property_value, 1, 30) value
FROM DATABASE_PROPERTIES
WHERE PROPERTY_NAME LIKE 'DST_%'
ORDER BY PROPERTY_NAME;

PROPERTY_NAME VALUE
-------------------------------------------------------------------------------------------------------------------------------- ---------
DST_PRIMARY_TT_VERSION 18
DST_SECONDARY_TT_VERSION 14
DST_UPGRADE_STATE NONE

Now my database was ready for the DST upgrade, as DST_UPGRADE_STATE was NONE. 🙂
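
For completeness, the DST upgrade itself then follows the standard DBMS_DST flow from MOS note 1509653.1. A minimal sketch run as SYS (the full note also includes pre-checks such as DBMS_DST.FIND_AFFECTED_TABLES, and this assumes the DSTv26 time zone files are already present in the 12.2 home):

-- BEGIN_UPGRADE requires the database to be started in UPGRADE mode
shutdown immediate
startup upgrade
exec dbms_dst.begin_upgrade(26);

-- Restart normally, then upgrade user data containing TSTZ values
shutdown immediate
startup
set serveroutput on
declare
  l_failures pls_integer;
begin
  dbms_dst.upgrade_database(l_failures);
  dbms_output.put_line('Failures during upgrade_database: ' || l_failures);
  dbms_dst.end_upgrade(l_failures);
  dbms_output.put_line('Failures during end_upgrade: ' || l_failures);
end;
/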

Hope you will find this post very useful.

Cheers

Regards,
Adityanath

INVALID JServer JAVA Virtual Machine in Oracle RDBMS Database 12.1.0.2.

Recently I was busy upgrading our DEV database from 12.1 to 12.2 & found that the JServer JAVA Virtual Machine registry component was in an INVALID state.


COMP_NAME COMP_ID VERSION STATUS
----------------------------------- ------------------------------ ------------------------------ -----------
Oracle Application Express APEX 4.2.5.00.08 VALID
OWB OWB 11.2.0.3.0 VALID
OLAP Catalog AMD 11.2.0.4.0 OPTION OFF
Spatial SDO 12.1.0.2.0 VALID
Oracle Multimedia ORDIM 12.1.0.2.0 VALID
Oracle XML Database XDB 12.1.0.2.0 VALID
Oracle Text CONTEXT 12.1.0.2.0 VALID
Oracle Workspace Manager OWM 12.1.0.2.0 VALID
Oracle Database Catalog Views CATALOG 12.1.0.2.0 VALID
Oracle Database Packages and Types CATPROC 12.1.0.2.0 VALID
JServer JAVA Virtual Machine JAVAVM 12.1.0.2.0 INVALID ====> Issue
Oracle XDK XML 12.1.0.2.0 VALID
Oracle Database Java Packages CATJAVA 12.1.0.2.0 VALID
OLAP Analytic Workspace APS 12.1.0.2.0 VALID
Oracle OLAP API XOQ 12.1.0.2.0 VALID
Oracle Real Application Clusters RAC 12.1.0.2.0 OPTION OFF

As a prerequisite of the upgrade, I had to rectify this before attempting it.

As a first step, I tried running utlrp.sql, but the component was still in an INVALID state.
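
For reference, the recompile-and-recheck step boils down to something like the following, run as SYS:

-- Recompile invalid objects, then re-check the JAVAVM registry entry
@?/rdbms/admin/utlrp.sql
select comp_id, comp_name, status from dba_registry where comp_id = 'JAVAVM';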

I even checked the status of all objects in the database with object_type like '%JAVA%'.


SYS@TESTDB:TESTDB> select owner, status, count(*) from all_objects where object_type like '%JAVA%' group by owner, status;

OWNER STATUS COUNT(*)
-------------------------------------------------------------------------------------------------------------------------------- ------- ----------
SYS VALID 29238
MDSYS VALID 650
ORDSYS VALID 2589

So now the only option I had left was to reinstall the Java packages inside the database.

Please find below the steps for the same (a consolidated script version follows the list):

  1. alter system set java_jit_enabled = FALSE;
  2. alter system set "_system_trig_enabled"=FALSE;
  3. alter system set job_queue_processes=0;
  4. create or replace java system;
  5. alter system set java_jit_enabled = true;
  6. alter system set "_system_trig_enabled"=TRUE;
  7. alter system set JOB_QUEUE_PROCESSES=1000;
  8. @?/rdbms/admin/utlrp.sql
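
Put together, the same steps can be run as a single script as SYS; a sketch, ideally run during a maintenance window with the application quiesced, since the Java system classes are being recreated:

-- Disable JIT, system triggers and the job queue while Java is rebuilt
alter system set java_jit_enabled = FALSE;
alter system set "_system_trig_enabled" = FALSE;
alter system set job_queue_processes = 0;

-- Reinstall the Java system classes
create or replace java system
/

-- Restore the original settings and recompile invalid objects
alter system set java_jit_enabled = TRUE;
alter system set "_system_trig_enabled" = TRUE;
alter system set job_queue_processes = 1000;
@?/rdbms/admin/utlrp.sql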

After applying the above steps, the JServer JAVA Virtual Machine component became VALID. 🙂


COMP_NAME COMP_ID VERSION STATUS
----------------------------------- ------------------------------ ------------------------------ -----------
Oracle Application Express APEX 4.2.5.00.08 VALID
OWB OWB 11.2.0.3.0 VALID
OLAP Catalog AMD 11.2.0.4.0 OPTION OFF
Spatial SDO 12.1.0.2.0 VALID
Oracle Multimedia ORDIM 12.1.0.2.0 VALID
Oracle XML Database XDB 12.1.0.2.0 VALID
Oracle Text CONTEXT 12.1.0.2.0 VALID
Oracle Workspace Manager OWM 12.1.0.2.0 VALID
Oracle Database Catalog Views CATALOG 12.1.0.2.0 VALID
Oracle Database Packages and Types CATPROC 12.1.0.2.0 VALID
JServer JAVA Virtual Machine JAVAVM 12.1.0.2.0 VALID ====> Fixed
Oracle XDK XML 12.1.0.2.0 VALID
Oracle Database Java Packages CATJAVA 12.1.0.2.0 VALID
OLAP Analytic Workspace APS 12.1.0.2.0 VALID
Oracle OLAP API XOQ 12.1.0.2.0 VALID
Oracle Real Application Clusters RAC 12.1.0.2.0 OPTION OFF

Hope you will find this post very useful. 🙂

Cheers

Regards,
Adityanath

ORA-06598: insufficient INHERIT PRIVILEGES privilege

A few days ago I observed that, all of a sudden, one of the application-related cron jobs had started failing with the following error.

ORA-06598: insufficient INHERIT PRIVILEGES privilege

This job was intended to drop temporary tables in the application schema. We had written a shell script in which the SYS user executes a procedure owned by the application schema.

The only thing that had changed at the DB end was that the DB had been upgraded from 11g to 12c.

After investigating the error further, I found this was due to a new 12c security feature.

Before Oracle Database 12c, an invoker's rights PL/SQL unit (code/package/procedure) always ran with the privileges of its invoker. If the invoker had higher privileges than the unit's owner, the code might perform operations unintended by, or forbidden to, its owner, which is a clear security gap.

For example, user A creates a new package and it is executed by users with higher privileges, such as SYS. User A knows that SYS runs this package regularly, so user A could replace the contents of the package with malicious code at any time and do anything in the database, knowing the code will be run by SYS sooner or later.

In 12c this behavior can be controlled using the INHERIT PRIVILEGES and INHERIT ANY PRIVILEGES privileges.

See the following link for more details.

INHERIT PRIVILEGES and INHERIT ANY PRIVILEGES Privileges

As of Oracle Database 12c, an invoker's rights PL/SQL unit (code/package/procedure) can run with the privileges of its invoker only if its owner has either the INHERIT PRIVILEGES privilege on the invoker or the INHERIT ANY PRIVILEGES privilege.

I was able to resolve the issue after issuing the below command:


SQL> grant inherit privileges on user sys to <application schema>;

Grant succeeded.
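
To double-check who already holds this grant, the INHERIT PRIVILEGES object grants are visible in DBA_TAB_PRIVS; after the grant above, the application schema should show up as a grantee on SYS with a query along these lines:

-- List INHERIT PRIVILEGES grants (run as a privileged user)
select grantee, owner, table_name
from   dba_tab_privs
where  privilege = 'INHERIT PRIVILEGES';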

Hope you will find this post very useful. 🙂

Cheers

Regards,
Adityanath

CELL-02630: There is a communication error between Management Server and Cell Server caused by a mismatch of security keys. Check that both servers have access to and use the same $OSSCONF/cellmskey.ora file.

I have a customer (IHAC) on Exadata image version 11.2.3.3.0.131014.1 who faced the below issue.

Any command in CellCLI was failing with error CELL-02630.

CELL-02630: There is a communication error between Management Server and Cell Server caused by a mismatch of security keys. Check that both servers have access to and use the same $OSSCONF/cellmskey.ora file.

If you read the error description, it refers to a communication error between the CELLSRV & MS processes due to a mismatch in security keys.

I checked the current status of the cell services & all were up & running.


[root@test01celadm01 ~]# service celld status
rsStatus: running
msStatus: running
cellsrvStatus: running

MS always creates the key file cellmskey.ora on startup if it does not exist, but in our case it was not present (not sure if someone had deleted it manually).
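
A quick way to confirm this on the cell is simply to look for the file that the error message points at ($OSSCONF is the cell configuration directory referenced in the CELL-02630 text; if the variable is not set in your shell, a find under /opt/oracle does the job):

# Check whether the MS security key file exists on the cell
ls -l $OSSCONF/cellmskey.ora
find /opt/oracle -name cellmskey.ora 2>/dev/null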

I asked the customer to restart the MS process & check if it helped. After restarting the MS process, CellCLI commands started working as expected. 🙂


CellCLI> alter cell restart services ms

Restarting MS services...
The RESTART of MS services was successful.

CellCLI> list celldisk
CD_00_test01celadm01 normal
CD_01_test01celadm01 normal
CD_02_test01celadm01 normal
CD_03_test01celadm01 normal
CD_04_test01celadm01 normal
CD_05_test01celadm01 normal
CD_06_test01celadm01 normal
CD_07_test01celadm01 normal
CD_08_test01celadm01 normal
CD_09_test01celadm01 normal
CD_10_test01celadm01 normal
CD_11_test01celadm01 normal
FD_00_test01celadm01 normal
FD_01_test01celadm01 normal
FD_02_test01celadm01 normal
FD_03_test01celadm01 normal
FD_04_test01celadm01 normal
FD_05_test01celadm01 normal
FD_06_test01celadm01 normal
FD_07_test01celadm01 normal
FD_08_test01celadm01 normal
FD_09_test01celadm01 normal
FD_10_test01celadm01 normal
FD_11_test01celadm01 normal
FD_12_test01celadm01 normal
FD_13_test01celadm01 normal
FD_14_test01celadm01 normal
FD_15_test01celadm01 normal

Hope you will find this post very useful. 🙂

Cheers

Regards,
Adityanath

New Exadata install getting Warning:Flash Cache size is not consistent for all storage nodes in the cluster.

Recently my customer faced the following issue, wherein, after completing an X7-2 Exadata install, the flash cache was showing a different size on one of the cell nodes than on the other cells.

Everything went well with the OneCommand install until step 15, which ended with this warning:

Warning:Flash Cache size is not consistent for all storage nodes in the cluster. Flash Cache on [celadm06.test.local] does not match with the Flash Cache size on the cell celadm01.test.local in cluser /u01/app/12.2.0.1/grid

We checked the flash cache size using the dcli command:


[root@celadm01 linux-x64]# dcli -g cell_group -l root cellcli -e "list flashcache detail" | grep size
celadm01: size: 23.28692626953125T
celadm02: size: 23.28692626953125T
celadm03: size: 23.28692626953125T
celadm04: size: 23.28692626953125T
celadm05: size: 23.28692626953125T
celadm06: size: 23.28680419921875T ==================> Smaller flashcache than other cells
celadm07: size: 23.28692626953125T

All Flash disks were in a normal state and there was no hardware failure reported.
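
For reference, the flash disk status can be confirmed on the affected cell with a CellCLI listing along these lines:

# On the affected cell: confirm all flash disks report a normal status
cellcli -e "list physicaldisk where diskType=FlashDisk attributes name, status"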

After investigating further through the sundiag report, I found the below mismatch.


name: FD_00_celadm06
comment: 
creationTime: 2018-07-22T14:11:18+00:00
deviceName: /dev/md310
devicePartition: /dev/md310
diskType: FlashDisk
errorCount: 0
freeSpace: 0 =================================================>>>>>>>>>>>>>>>>>>>>>>>>>> freeSpace is 0
id: ***********
physicalDisk: ***********
size: 5.8218994140625T
status: normal

name: FD_01_celadm06
comment: 
creationTime: 2018-07-22T14:11:18+00:00
deviceName: /dev/md304
devicePartition: /dev/md304
diskType: FlashDisk
errorCount: 0
freeSpace: 0 =================================================>>>>>>>>>>>>>>>>>>>>>>>>>> freeSpace is 0
id: ***********
physicalDisk: ***********
size: 5.8218994140625T
status: normal

name: FD_02_celadm06
comment: 
creationTime: 2018-07-22T14:11:18+00:00
deviceName: /dev/md305
devicePartition: /dev/md305
diskType: FlashDisk
errorCount: 0
freeSpace: 0 =================================================>>>>>>>>>>>>>>>>>>>>>>>>>> freeSpace is 0
id: ***********
physicalDisk: ***********
size: 5.8218994140625T
status: normal

name: FD_03_celadm06
comment: 
creationTime: 2018-07-23T19:31:59+00:00
deviceName: /dev/md306
devicePartition: /dev/md306
diskType: FlashDisk
errorCount: 0
freeSpace: 160M =================================================>>>>>>>>>>>>>>>>>>>>>>> freeSpace 160M is not released
id: ***********
physicalDisk: ***********
size: 5.8218994140625T
status: normal

So I found the culprit. 🙂 The mismatch in flash cache size was caused by freeSpace not being released on one of the flash disks (FD_03_celadm06), as we can see in the logs.

I asked the customer to recreate the flash cache using the following procedure.


1) Check to make sure at least one mirror copy of the extents is available.

CellCLI> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
- If all grid disks report 'YES' for asmdeactivationoutcome, continue to step 2.

2) Manually flush the flashcache:
# cellcli -e alter flashcache all flush

In a second window, check the status of the flash cache flush.
The following command should return "working" for each flash disk on each cell while the cache is being flushed and "completed" when it is finished.
# cellcli -e "LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror" | grep FD

3) Drop Flashlog:
# cellcli -e drop flashlog all

4) Drop flashcache:
# cellcli -e drop flashcache all

5) Recreate flashlog:
# cellcli -e create flashlog all

6) Recreate flashcache:
# cellcli -e create flashcache all

7) Finally, check the flash cache size to see if it is now the correct size:
# cellcli -e list flashcache detail | grep size


The issue was resolved after dropping and recreating the flashlog and flashcache on that particular cell node. 🙂
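
The fix can be confirmed by re-running the same dcli check from earlier in this post; all cells should now report the same flash cache size:

# Re-check flash cache size across all cells
dcli -g cell_group -l root cellcli -e "list flashcache detail" | grep size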

Hope you will find this post very useful. 🙂

Cheers

Regards,
Adityanath

Exadata image 18.1.5 in status failure due to Validation check ERROR – NOT RUNNING for service: dbserverd

Recently one of my clients faced an issue after upgrading the Exadata image on a DB server: the image was showing its status as failure. I reviewed all the patchmgr logs but didn’t see anything weird.


root@testserver1 ]# imageinfo
Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
Image kernel version: 4.1.12-94.8.4.el6uek
Image version: 18.1.5.0.0.180506 
Image activated: 2018-05-29 18:03:57 +0200
Image status: failure ============================> Issue
System partition on device: /dev/mapper/VGExaDb-LVDbSys1

I asked the customer to run the validations manually as below:

/opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all

The customer shared the output of the command as below:


[root@testserver1 ]# /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Logging started to /var/log/cellos/validations.log
Command line is /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Run validation ipmisettings - PASSED
Run validation misceachboot - FAILED   ============================> Issue
Check log in /var/log/cellos/validations/misceachboot.log
Run validation biosbootorder - PASSED
Run validation oswatcher - PASSED
Run validation checkdeveachboot - PASSED
Run validation checkconfigs - BACKGROUND RUN
Run validation saveconfig - BACKGROUND RUN

After checking misceachboot.log, I found the below errors:


-bash-4.4$ cat misceachboot.log | grep -i error
BIOS is Already Pause On Error on Adapter 0.
[1527609678][2018-05-29 18:03:53 +0200][ERROR][0-0][/opt/oracle.cellos/image_functions][image_functions_check_configured_services][] Validation check ERROR - NOT RUNNING for service: dbserverd
BIOS is Already Pause On Error on Adapter 0.
[1527678371][2018-05-30 13:06:56 +0200][ERROR][0-0][/opt/oracle.cellos/image_functions][image_functions_check_configured_services][] Validation check ERROR - NOT RUNNING for service: dbserverd

This showed that something had gone wrong with the dbserverd service.

I asked him to check the status of the dbserverd service & to manually stop & start it on the affected server:

1. service dbserverd status

2. service dbserverd stop

3. service dbserverd start


[root@testserver1 ]# service dbserverd status
rsStatus: running
msStatus: stopped       ============================> Issue

[root@testserver1 ]# service dbserverd stop
Stopping the RS and MS services...
The SHUTDOWN of services was successful.

[root@testserver1 ]# service dbserverd start
Starting the RS services...
Getting the state of RS services... running
Starting MS services...
DBM-01513: DBMCLI request to Restart Server (RS) has timed out.
The STARTUP of MS services was not successful. Error: Unknown Error

This confirmed the issue was with the MS service. I asked the customer to restart the DB server, but it didn’t resolve the issue.

Next, I asked the customer to reconfigure the MS service as given below & check if it helped:


1. ssh to the node as root

2. Shut down the running RS and MS:

DBMCLI>ALTER DBSERVER SHUTDOWN SERVICES ALL

Then check for any leftover processes with ps -ef | grep "dbserver.*dbms" and kill them all.

3. re-deploy MS:
/opt/oracle/dbserver/dbms/deploy/scripts/unix/setup_dynamicDeploy DB -D

4. Restart RS and MS
DBMCLI>ALTER DBSERVER STARTUP SERVICES ALL


This action plan resolved the issue:


[root@testserver1 ]# dbmcli
DBMCLI: Release - Production on Wed May 30 16:05:13 CEST 2018

Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.

DBMCLI> ALTER DBSERVER STARTUP SERVICES ALL
Starting the RS and MS services...
Getting the state of RS services... running
Starting MS services...
The STARTUP of MS services was successful.

DBMCLI> exit
quitting

[root@testserver1 ]# service dbserverd status
rsStatus: running
msStatus: running    ============================> Resolved 
[root@testserver1 ]#

Then we reran the validations to check whether they passed:


[root@testserver1 ]# /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Logging started to /var/log/cellos/validations.log
Command line is /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Run validation ipmisettings - PASSED
Run validation misceachboot - PASSED ============================> Resolved 
Check log in /var/log/cellos/validations/misceachboot.log
Run validation biosbootorder - PASSED
Run validation oswatcher - PASSED
Run validation checkdeveachboot - PASSED
Run validation checkconfigs - BACKGROUND RUN
Run validation saveconfig - BACKGROUND RUN

Finally, we checked the image status:


[root@testserver1 ]# imageinfo
Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
Image kernel version: 4.1.12-94.8.4.el6uek
Image version: 18.1.5.0.0.180506 
Image activated: 2018-05-29 18:03:57 +0200
Image status: success ============================> Resolved
System partition on device: /dev/mapper/VGExaDb-LVDbSys1

Sometimes this can still show the status as failure, in which case you can mark the image status as success manually after checking with Oracle Support. 🙂

Hope you will find this post very useful. 🙂

Cheers

Regards,
Adityanath