OEM agent version 13c installation on AIX fails with OUI-10039:Unable to access the inventory /u01/app/oraInventory on this system.

Hello Readers,

A few days ago, I was installing OEM agent version 13.3.0.0.0 on my DEV box running AIX 7.2, using the silent install method with agentDeploy.sh.

I had previously installed it successfully on multiple machines without any issues, but this one failed with the error below:

java.io.IOException: OUI-10039:Unable to access the inventory /u01/app/oraInventory on this system. Please ensure you have the proper permissions to read/write/search the inventory.

I checked whether there was a permissions issue on /u01/app/oraInventory and found that the directory did not exist at all, so the error itself was expected. But why was Oracle looking for the inventory in the wrong location?

When I dug into the associated log file, I found more details about these errors.


2019-10-07 12:20:48,465 WARNING [34] oracle.sysman.oii.oiip.oiipg.OiipgPropertyLoader - The inventory pointer location /var/opt/oracle/oraInst.loc is either not readable or does not exist
2019-10-07 12:20:48,475 INFO [34] oracle.sysman.nextgen.utils.NextGenInventoryUtil - Setting default inventory location to: '/u01/app/oraInventory'
2019-10-07 12:20:48,475 WARNING [34] oracle.sysman.oii.oiip.oiipg.OiipgPropertyLoader - The inventory pointer location /var/opt/oracle/oraInst.loc is either not readable or does not exist
2019-10-07 12:20:48,475 WARNING [34] oracle.sysman.oii.oiip.oiipg.OiipgPropertyLoader - The inventory pointer location /var/opt/oracle/oraInst.loc is either not readable or does not exist
2019-10-07 12:20:48,477 SEVERE [34] oracle.sysman.oii.oiii.OiiiInstallAreaControl - OUI-10039:Unable to access the inventory /u01/app/oraInventory on this system. Please ensure you have the proper permissions to read/write/search the inventory.
2019-10-07 12:20:48,477 SEVERE [34] oracle.sysman.nextgen.impl.NextGenInstallerImpl - java.io.IOException: OUI-10039:Unable to access the inventory /u01/app/oraInventory on this system. Please ensure you have the proper permissions to read/write/search the inventory.

So basically, Oracle checks for oraInst.loc under /var/opt/oracle, and if it does not find one, it sets the default inventory location to '/u01/app/oraInventory'.
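
You can quickly confirm what the installer will see with a check like the one below (a minimal sketch, run as the agent install owner; it simply looks at both candidate pointer locations):

# Does the pointer exist where the installer looks, and where does the real one live?
ls -l /var/opt/oracle/oraInst.loc /etc/oraInst.loc 2>/dev/null
cat /etc/oraInst.loc 2>/dev/null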

I feel this is something of a bug in the agent software, as the location of oraInst.loc on AIX is '/etc', not '/var/opt/oracle'.

Now there are two questions: first, how to resolve this, and second, why my other installations succeeded.

The other installations succeeded because Oracle did find oraInventory at its default location. So whenever oraInventory is located at '/u01/app/oraInventory', you won't face this issue.

Now, how to resolve this? You can always create a soft link to oraInst.loc under "/var/opt/oracle".

Steps are given below:


Log in as root:
1. mkdir -p /var/opt/oracle/
2. cd /var/opt/oracle/
3. ln -s /etc/oraInst.loc oraInst.loc
4. ls -lrt oraInst.loc
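
After creating the link, it is worth confirming that the pointer actually resolves to a readable inventory before retrying agentDeploy.sh. A minimal sketch (the grep/cut parsing below is just one way to pull inventory_loc out of the pointer file):

cat /var/opt/oracle/oraInst.loc
ls -ld "$(grep '^inventory_loc' /etc/oraInst.loc | cut -d= -f2)"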

Once you perform the above steps, you will be able to install the OEM agent successfully.

Hope you will find this post useful. 🙂

Cheers

Regards,
Adityanath

 


./roothas.sh -postpatch OR root.sh failing with CLSRSC-400: A system reboot is required to continue installing.

Recently I was doing a fresh Grid Infrastructure (GI) 12.2 install on one of our UAT boxes, where I faced a strange issue.

Both "root.sh" and "./roothas.sh -postpatch" were exiting with the error/warning below:

CLSRSC-400: A system reboot is required to continue installing.


test-server01:/u01/app/12.2.0.1/grid/bin # cd $ORACLE_HOME/crs/install
test-server01:/u01/app/12.2.0.1/grid/crs/install # ./roothas.sh -postpatch
Using configuration parameter file: /u01/app/12.2.0.1/grid/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/crsdata/test-server01/crsconfig/hapatch_2019-05-27_02-22-25PM.log
2019/05/27 14:22:30 CLSRSC-329: Replacing Clusterware entries in file '/etc/inittab'
2019/05/27 14:23:18 CLSRSC-400: A system reboot is required to continue installing.

The simple instruction given by the above warning was to reboot the machine and retry. I asked the server admin to reboot the machine, but a subsequent rerun of the command failed with the same error.


test-server01:/u01/app/12.2.0.1/grid/crs/install # ./roothas.sh -postpatch
Using configuration parameter file: /u01/app/12.2.0.1/grid/crs/install/crsconfig_params
The log of current session can be found at:
/u01/app/crsdata/test-server01/crsconfig/hapatch_2019-05-27_02-37-02PM.log
2019/05/27 14:37:07 CLSRSC-329: Replacing Clusterware entries in file '/etc/inittab'
2019/05/27 14:37:52 CLSRSC-400: A system reboot is required to continue installing.

I checked the associated log file for more details: /u01/app/crsdata/test-server01/crsconfig/hapatch_2019-05-27_02-37-02PM.log


> ACFS-9428: Failed to load ADVM/ACFS drivers. A system reboot is recommended.
> ACFS-9310: ADVM/ACFS installation failed.
> ACFS-9178: Return code = USM_REBOOT_RECOMMENDED
2019-05-27 14:37:41: ACFS drivers cannot be installed, and reboot may resolve this
2019-05-27 14:37:52: Command output:
> CLSRSC-400: A system reboot is required to continue installing.
>End Command output
2019-05-27 14:37:52: CLSRSC-400: A system reboot is required to continue installing.

So this was definitely due to an issue with the ACFS drivers.

I found the MOS documents below related to my issue, but nothing exactly matched my situation or operating system.

While Manually Installing a Patch ‘rootcrs.sh -patch’ Fails with – CLSRSC-400: A system reboot is required to continue installing. (Doc ID 2360097.1)

ALERT: root.sh Fails With “CLSRSC-400” While Installing GI 12.2.0.1 on RHEL or OL with RedHat Compatible Kernel (RHCK) 7.3 (Doc ID 2284463.1)

In our environment we do not use the ACFS file system, so this was not a real problem in my case, and we can always install the ACFS drivers explicitly if we need them in the future.
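
Before deciding ACFS is not needed, it is easy to double-check that nothing on the box actually uses ACFS/ADVM. A minimal sketch for a Linux host (both commands should return nothing if ACFS is genuinely unused):

# No ACFS/ADVM kernel modules loaded and no ACFS file systems mounted
lsmod | grep -E 'oracleacfs|oracleadvm|oracleoks'
mount | grep -i acfs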

After reading all the logs, I found that /u01/app/12.2.0.1/grid/lib/acfstoolsdriver.sh was being called for the ACFS driver installation.

I changed the following code in that script from:

# Now run command with all arguments!
exec ${RUNTHIS} $@

to

# Now run command with all arguments!
#exec ${RUNTHIS} $@
exit 0
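
Since this edits a script shipped inside the Grid home, take a copy first so the original behaviour can be restored later (a minimal sketch; the path assumes the grid home shown above):

cd /u01/app/12.2.0.1/grid/lib
cp -p acfstoolsdriver.sh acfstoolsdriver.sh.orig_$(date +%Y%m%d)
vi acfstoolsdriver.sh    # comment out the exec line and add "exit 0" as shown above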

After changing the above code, "./roothas.sh -postpatch" completed without errors or warnings, and I was able to complete the GI installation successfully.

Note: This workaround is only applicable when ACFS is not being used in the environment, and it should be implemented with the forewarning that there is implied risk which one must accept. 😉

Hope you will find this post useful.

Cheers

Regards,
Adityanath

 

 

 

ORA-06598: insufficient INHERIT PRIVILEGES privilege

A few days ago, I observed that one of the application-related cron jobs had suddenly started failing with the following error.

ORA-06598: insufficient INHERIT PRIVILEGES privilege

This job was intended to drop temporary tables in the application schema. We had written a shell script in which the SYS user executes a procedure owned by the application schema.

The only thing that had changed at the DB end was that the database had been upgraded from 11g to 12c.

After investigating the error further, I found it was due to a new 12c security feature.

Before Oracle Database 12c, an invoker's rights PL/SQL unit (code/package/procedure) always ran with the privileges of its invoker. If its invoker had higher privileges than its owner, the code might perform operations unintended by, or forbidden to, its owner. Here we can see the security gap.

For example, user A creates a new package and it is executed by users with higher privileges, like SYS. User A knows that SYS uses this package regularly, so user A could replace the contents of this package with malicious code at any time and do anything in the database, knowing the code will be run by SYS sooner or later.

In 12c this behavior can be controlled using the INHERIT PRIVILEGES and INHERIT ANY PRIVILEGES privileges.

See the following link for more details.

INHERIT PRIVILEGES and INHERIT ANY PRIVILEGES Privileges

As of Oracle Database 12c, an invoker's rights PL/SQL unit (code/package/procedure) can run with the privileges of its invoker only if its owner has either the INHERIT PRIVILEGES privilege on the invoker or the INHERIT ANY PRIVILEGES privilege.
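
If you want to confirm whether the application schema already holds this privilege on SYS, a query along the lines below should show it. This is a minimal sketch: I believe these grants are exposed through DBA_TAB_PRIVS (with the granted-on user appearing as the "table"), and APPUSER is just a placeholder for the application schema name:

sqlplus -s / as sysdba <<'EOF'
SET LINESIZE 200
COLUMN grantee FORMAT A20
COLUMN table_name FORMAT A20
-- INHERIT PRIVILEGES grants on a user appear as object grants where the object is the user
SELECT grantee, owner, table_name, privilege
  FROM dba_tab_privs
 WHERE privilege = 'INHERIT PRIVILEGES'
   AND grantee   = 'APPUSER';
EOF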

I was able to resolve the issue by issuing the command below:


SQL> grant inherit privileges on user sys to <application schema>;

Grant succeeded.

Hope you will find this post useful 🙂

Cheers

Regards,
Adityanath

CELL-02630: There is a communication error between Management Server and Cell Server caused by a mismatch of security keys. Check that both servers have access to and use the same $OSSCONF/cellmskey.ora file.

I have a customer (IHAC) on Exadata image version 11.2.3.3.0.131014.1 who faced the issue below.

Any command in CellCLI was failing with error CELL-02630.

CELL-02630: There is a communication error between Management Server and Cell Server caused by a mismatch of security keys. Check that both servers have access to and use the same $OSSCONF/cellmskey.ora file.

If you read the error description, it refers to a communication error between the CELLSRV and MS processes, due to a mismatch in security keys.

I checked the current status of the cell services, and all were up and running.


[root@test01celadm01 ~]# service celld status
rsStatus: running
msStatus: running
cellsrvStatus: running

MS always creates the key file cellmskey.ora on startup if it does not exist, but in our case it was not present (not sure if someone had deleted it manually).
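
A quick way to check for the key file on the affected cell (a minimal sketch; it assumes $OSSCONF is set in the cell environment, as referenced by the error message itself):

# Look for the security key file referenced in the CELL-02630 error
ls -l $OSSCONF/cellmskey.ora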

I asked the customer to restart the MS process and check whether it helped. After restarting MS, CellCLI commands started working as expected. 🙂


CellCLI> alter cell restart services ms

Restarting MS services...
The RESTART of MS services was successful.

CellCLI> list celldisk
CD_00_test01celadm01 normal
CD_01_test01celadm01 normal
CD_02_test01celadm01 normal
CD_03_test01celadm01 normal
CD_04_test01celadm01 normal
CD_05_test01celadm01 normal
CD_06_test01celadm01 normal
CD_07_test01celadm01 normal
CD_08_test01celadm01 normal
CD_09_test01celadm01 normal
CD_10_test01celadm01 normal
CD_11_test01celadm01 normal
FD_00_test01celadm01 normal
FD_01_test01celadm01 normal
FD_02_test01celadm01 normal
FD_03_test01celadm01 normal
FD_04_test01celadm01 normal
FD_05_test01celadm01 normal
FD_06_test01celadm01 normal
FD_07_test01celadm01 normal
FD_08_test01celadm01 normal
FD_09_test01celadm01 normal
FD_10_test01celadm01 normal
FD_11_test01celadm01 normal
FD_12_test01celadm01 normal
FD_13_test01celadm01 normal
FD_14_test01celadm01 normal
FD_15_test01celadm01 normal

Hope you will find this post useful 🙂

Cheers

Regards,
Adityanath

New Exadata install getting Warning:Flash Cache size is not consistent for all storage nodes in the cluster.

Recently my customer faced the following issue: after completing an X7-2 Exadata install, the flash cache on one cell node was showing a different size than on the other cells.

Everything went well with the onecommand install until step 15, which produced this warning:

Warning:Flash Cache size is not consistent for all storage nodes in the cluster. Flash Cache on [celadm06.test.local] does not match with the Flash Cache size on the cell celadm01.test.local in cluster /u01/app/12.2.0.1/grid

We checked the flash cache size using the dcli command:


[root@celadm01 linux-x64]# dcli -g cell_group -l root cellcli -e "list flashcache detail" | grep size
celadm01: size: 23.28692626953125T
celadm02: size: 23.28692626953125T
celadm03: size: 23.28692626953125T
celadm04: size: 23.28692626953125T
celadm05: size: 23.28692626953125T
celadm06: size: 23.28680419921875T ==================> Smaller flashcache than other cells
celadm07: size: 23.28692626953125T

All Flash disks were in a normal state and there was no hardware failure reported.

After investigating further through the sundiag report, I found the mismatch below.


name: FD_00_celadm06
comment: 
creationTime: 2018-07-22T14:11:18+00:00
deviceName: /dev/md310
devicePartition: /dev/md310
diskType: FlashDisk
errorCount: 0
freeSpace: 0 =================================================>>>>>>>>>>>>>>>>>>>>>>>>>> freeSpace is 0
id: ***********
physicalDisk: ***********
size: 5.8218994140625T
status: normal

name: FD_01_celadm06
comment: 
creationTime: 2018-07-22T14:11:18+00:00
deviceName: /dev/md304
devicePartition: /dev/md304
diskType: FlashDisk
errorCount: 0
freeSpace: 0 =================================================>>>>>>>>>>>>>>>>>>>>>>>>>> freeSpace is 0
id: ***********
physicalDisk: ***********
size: 5.8218994140625T
status: normal

name: FD_02_celadm06
comment: 
creationTime: 2018-07-22T14:11:18+00:00
deviceName: /dev/md305
devicePartition: /dev/md305
diskType: FlashDisk
errorCount: 0
freeSpace: 0 =================================================>>>>>>>>>>>>>>>>>>>>>>>>>> freeSpace is 0
id: ***********
physicalDisk: ***********
size: 5.8218994140625T
status: normal

name: FD_03_celadm06
comment: 
creationTime: 2018-07-23T19:31:59+00:00
deviceName: /dev/md306
devicePartition: /dev/md306
diskType: FlashDisk
errorCount: 0
freeSpace: 160M =================================================>>>>>>>>>>>>>>>>>>>>>>> freeSpace 160M is not released
id: ***********
physicalDisk: ***********
size: 5.8218994140625T
status: normal

So I found the culprit. 🙂 The mismatch in flash cache size was caused by freeSpace not being released on one of the flash disks (FD_03_celadm06), as we can see in the logs.
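
A quicker way to spot the odd flash disk directly on the affected cell is to list just the freeSpace attribute (a minimal sketch; the attribute names are the ones visible in the sundiag output above):

cellcli -e "list celldisk attributes name, freeSpace where diskType = 'FlashDisk'"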

I asked the customer to recreate the flash cache using the following procedure.


1) Check to make sure at least one mirror copy of the extents is available.

CellCLI> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
– If asmdeactivationoutcome reports 'Yes' for all grid disks, continue to step 2

2) Manually flush the flashcache:
# cellcli -e alter flashcache all flush

In a second window, check the status of the flashcache flush.
The following command should return "working" for each flash disk on each cell while the cache is being flushed and "completed" when it is finished.
# cellcli -e "LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror" | grep FD

3) Drop Flashlog:
# cellcli -e drop flashlog all

4) Drop flashcache:
# cellcli -e drop flashcache all

5) Recreate flashlog:
# cellcli -e create flashlog all

6) Recreate flashcache:
# cellcli -e create flashcache all

7) Finally, check the flashcache size to see if it is now at the correct size:
# cellcli -e list flashcache detail | grep size
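
Once the flash cache has been recreated, the sizes can be compared across all cells again with the same dcli check used earlier (assuming the same cell_group file as above):

dcli -g cell_group -l root cellcli -e "list flashcache detail" | grep size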


The issue was resolved after dropping and recreating the flashlog and flashcache on the affected cell node. 🙂

Hope you will find this post useful 🙂

Cheers

Regards,
Adityanath

Exadata image 18.1.5 in status failure due to Validation check ERROR – NOT RUNNING for service: dbserverd

Recently one of my clients faced an issue after upgrading the Exadata image on a DB server: the image was showing its status as failure. I reviewed all the patchmgr logs but did not see anything unusual.


root@testserver1 ]# imageinfo
Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
Image kernel version: 4.1.12-94.8.4.el6uek
Image version: 18.1.5.0.0.180506 
Image activated: 2018-05-29 18:03:57 +0200
Image status: failure ============================> Issue
System partition on device: /dev/mapper/VGExaDb-LVDbSys1

I asked the customer to run the validations manually, as below:

/opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all

The customer shared the output of the command:


[root@testserver1 ]# /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Logging started to /var/log/cellos/validations.log
Command line is /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Run validation ipmisettings - PASSED
Run validation misceachboot - FAILED   ============================> Issue
Check log in /var/log/cellos/validations/misceachboot.log
Run validation biosbootorder - PASSED
Run validation oswatcher - PASSED
Run validation checkdeveachboot - PASSED
Run validation checkconfigs - BACKGROUND RUN
Run validation saveconfig - BACKGROUND RUN

Checking misceachboot.log, I found the error below:


-bash-4.4$ cat misceachboot.log | grep -i error
BIOS is Already Pause On Error on Adapter 0.
[1527609678][2018-05-29 18:03:53 +0200][ERROR][0-0][/opt/oracle.cellos/image_functions][image_functions_check_configured_services][] Validation check ERROR - NOT RUNNING for service: dbserverd
BIOS is Already Pause On Error on Adapter 0.
[1527678371][2018-05-30 13:06:56 +0200][ERROR][0-0][/opt/oracle.cellos/image_functions][image_functions_check_configured_services][] Validation check ERROR - NOT RUNNING for service: dbserverd

This showed that something had gone wrong with the dbserverd service.

I asked him to check the status of the dbserverd services and to manually stop and start them on the affected server.

1. service dbserverd status

2. service dbserverd stop

3. service dbserverd start


[root@testserver1 ]# service dbserverd status
rsStatus: running
msStatus: stopped       ============================> Issue

[root@testserver1 ]# service dbserverd stop
Stopping the RS and MS services...
The SHUTDOWN of services was successful.

[root@testserver1 ]# service dbserverd start
Starting the RS services...
Getting the state of RS services... running
Starting MS services...
DBM-01513: DBMCLI request to Restart Server (RS) has timed out.
The STARTUP of MS services was not successful. Error: Unknown Error

This confirmed the issue was with the MS services. I asked the customer to restart the DB server, but it did not resolve the issue.

Next I asked the customer to reconfigure the MS services as given below and check whether it helped:


1. ssh to the node as root

2. Shut down the running RS and MS

DBMCLI>ALTER DBSERVER SHUTDOWN SERVICES ALL

Check for any leftover processes with ps -ef | grep "dbserver.*dbms" and kill them all (see the sketch after these steps).

3. re-deploy MS:
/opt/oracle/dbserver/dbms/deploy/scripts/unix/setup_dynamicDeploy DB -D

4. Restart RS and MS
DBMCLI>ALTER DBSERVER STARTUP SERVICES ALL
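
For step 2, a small sketch for cleaning up any leftover processes after the shutdown (the kill -9 is only for processes that refuse to exit):

# List, then kill, any remaining dbserver/dbms processes
ps -ef | grep "dbserver.*dbms" | grep -v grep
ps -ef | grep "dbserver.*dbms" | grep -v grep | awk '{print $2}' | xargs -r kill -9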


This action plan resolved the issue:


[root@testserver1 ]# dbmcli
DBMCLI: Release - Production on Wed May 30 16:05:13 CEST 2018

Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.

DBMCLI> ALTER DBSERVER STARTUP SERVICES ALL
Starting the RS and MS services...
Getting the state of RS services... running
Starting MS services...
The STARTUP of MS services was successful.

DBMCLI> exit
quitting

[root@testserver1 ]# service dbserverd status
rsStatus: running
msStatus: running    ============================> Resolved 
[root@testserver1 ]#

Then we reran the validations to confirm they now pass:


[root@testserver1 ]# /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Logging started to /var/log/cellos/validations.log
Command line is /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Run validation ipmisettings - PASSED
Run validation misceachboot - PASSED ============================> Resolved 
Check log in /var/log/cellos/validations/misceachboot.log
Run validation biosbootorder - PASSED
Run validation oswatcher - PASSED
Run validation checkdeveachboot - PASSED
Run validation checkconfigs - BACKGROUND RUN
Run validation saveconfig - BACKGROUND RUN

Finally, check the image status:


[root@testserver1 ]# imageinfo
Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
Image kernel version: 4.1.12-94.8.4.el6uek
Image version: 18.1.5.0.0.180506 
Image activated: 2018-05-29 18:03:57 +0200
Image status: success ============================> Resolved
System partition on device: /dev/mapper/VGExaDb-LVDbSys1

Sometimes this can still show the status as failure, in which case you can mark the image status as success manually after checking with Oracle Support. 🙂

Hope you will find this post useful 🙂

Cheers

Regards,
Adityanath

Telnet command fails with telnet: /lib64/libc.so.6: version `GLIBC_2.15' not found (required by telnet)

Yesterday one of my customers had an issue running the telnet command on an Exadata server. It was failing with the error below:


[root@extestserver ~]# telnet
telnet: /lib64/libc.so.6: version `GLIBC_2.15' not found (required by telnet)

I asked him to provide details such as the telnet version and the kernel/image version running on the server:


[root@extestserver ~]# imageinfo
Kernel version: 2.6.39-400.294.4.el6uek.x86_64 #1 SMP Tue Mar 14 18:42:17 PDT 2017 x86_64
Image kernel version: 2.6.39-400.294.4.el6uek
Image version: 12.1.2.3.5.170418
Image activated: 2017-08-25 11:55:37 +0300
Image status: success
System partition on device: /dev/mapper/VGExaDb-LVDbSys1

[root@extestserver ~]# rpm -qa |grep telnet
telnet-1.2-166.4.1.x86_64

From the error it was obvious that a telnet RPM was installed on the server, but it was not compatible with the glibc version installed on the server.
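
You can confirm the mismatch quickly by comparing the installed glibc package with the symbol versions libc actually exports (a minimal sketch; on OEL6 glibc is 2.12, so GLIBC_2.15 will be missing):

rpm -q glibc                               # installed glibc package
strings /lib64/libc.so.6 | grep '^GLIBC_2\.1'   # GLIBC_2.1x symbol versions libc provides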

The installed telnet RPM is a version that is not available in Oracle's public yum repository, so the customer must have installed it from outside Oracle's public yum repository.

From Oracle's public yum repository, I found that the supported telnet version for OEL6 is telnet-0.17-48.el6.x86_64.rpm.

So I asked the customer to uninstall the telnet currently installed on the server and install the supported one.


rpm -e telnet-1.2-166.4.1.x86_64
rpm -ivh telnet-0.17-48.el6.x86_64.rpm
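
After swapping the packages, a quick sanity check (the host and port below are placeholders; any reachable service will do):

rpm -q telnet             # should now show telnet-0.17-48.el6
telnet localhost 22       # should start without the GLIBC error; press Ctrl+] then type quit to exit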

Telnet started working as expected. 🙂

Hope you will find this post useful. 🙂

Cheers

Regards,
Adityanath