CELL-02630: There is a communication error between Management Server and Cell Server caused by a mismatch of security keys. Check that both servers have access to and use the same $OSSCONF/cellmskey.ora file.

IHAC who is on Exadata image version: 11.2.3.3.0.131014.1, faced below issue.

Any command on cellcli was failing with error: CELL-02630.

CELL-02630: There is a communication error between Management Server and Cell Server caused by a mismatch of security keys. Check that both servers have access to and use the same $OSSCONF/cellmskey.ora file.

If you read error description, it refers to communication error between CELLSRV & MS processes, due to mismatch in security keys.

I did check current status of CELL process & all were up & running.


[root@test01celadm01 ~]# service celld status
rsStatus: running
msStatus: running
cellsrvStatus: running

MS always creates key file ==> cellmskey.ora on startup if it does not exist. But in our case it was not present. (Not sure if someone deleted it manually)

I asked customer to restart MS process & check if it helps. After restarting MS process, CELLCLI commands started working as expected 🙂


CellCLI> alter cell restart services ms

Restarting MS services...
The RESTART of MS services was successful.

CellCLI> list celldisk
CD_00_test01celadm01 normal
CD_01_test01celadm01 normal
CD_02_test01celadm01 normal
CD_03_test01celadm01 normal
CD_04_test01celadm01 normal
CD_05_test01celadm01 normal
CD_06_test01celadm01 normal
CD_07_test01celadm01 normal
CD_08_test01celadm01 normal
CD_09_test01celadm01 normal
CD_10_test01celadm01 normal
CD_11_test01celadm01 normal
FD_00_test01celadm01 normal
FD_01_test01celadm01 normal
FD_02_test01celadm01 normal
FD_03_test01celadm01 normal
FD_04_test01celadm01 normal
FD_05_test01celadm01 normal
FD_06_test01celadm01 normal
FD_07_test01celadm01 normal
FD_08_test01celadm01 normal
FD_09_test01celadm01 normal
FD_10_test01celadm01 normal
FD_11_test01celadm01 normal
FD_12_test01celadm01 normal
FD_13_test01celadm01 normal
FD_14_test01celadm01 normal
FD_15_test01celadm01 normal

Hope so u will find this post very useful 🙂

Cheers

Regards,
Adityanath

Advertisements

New Exadata install getting Warning:Flash Cache size is not consistent for all storage nodes in the cluster.

Recently my customer faced the following issue, wherein after completing the X7-2 Exadata Install, Flash cache was showing different size in one of the cell node than other cells.

Everything went well with onecommand install until step 15 which had this warning:

Warning:Flash Cache size is not consistent for all storage nodes in the cluster. Flash Cache on [celadm06.test.local] does not match with the Flash Cache size on the cell celadm01.test.local in cluser /u01/app/12.2.0.1/grid

We checked flashcache size using dcli command:


[root@celadm01 linux-x64]# dcli -g cell_group -l root cellcli -e "list flashcache detail" | grep size
celadm01: size: 23.28692626953125T
celadm02: size: 23.28692626953125T
celadm03: size: 23.28692626953125T
celadm04: size: 23.28692626953125T
celadm05: size: 23.28692626953125T
celadm06: size: 23.28680419921875T ==================> Smaller flashcache than other cells
celadm07: size: 23.28692626953125T

All Flash disks were in a normal state and there was no hardware failure reported.

After investigating furter through sundiag report, I found below mismatch.


name: FD_00_celadm06
comment: 
creationTime: 2018-07-22T14:11:18+00:00
deviceName: /dev/md310
devicePartition: /dev/md310
diskType: FlashDisk
errorCount: 0
freeSpace: 0 =================================================>>>>>>>>>>>>>>>>>>>>>>>>>> freeSpace is 0
id: ***********
physicalDisk: ***********
size: 5.8218994140625T
status: normal

name: FD_01_celadm06
comment: 
creationTime: 2018-07-22T14:11:18+00:00
deviceName: /dev/md304
devicePartition: /dev/md304
diskType: FlashDisk
errorCount: 0
freeSpace: 0 =================================================>>>>>>>>>>>>>>>>>>>>>>>>>> freeSpace is 0
id: ***********
physicalDisk: ***********
size: 5.8218994140625T
status: normal

name: FD_02_celadm06
comment: 
creationTime: 2018-07-22T14:11:18+00:00
deviceName: /dev/md305
devicePartition: /dev/md305
diskType: FlashDisk
errorCount: 0
freeSpace: 0 =================================================>>>>>>>>>>>>>>>>>>>>>>>>>> freeSpace is 0
id: ***********
physicalDisk: ***********
size: 5.8218994140625T
status: normal

name: FD_03_celadm06
comment: 
creationTime: 2018-07-23T19:31:59+00:00
deviceName: /dev/md306
devicePartition: /dev/md306
diskType: FlashDisk
errorCount: 0
freeSpace: 160M =================================================>>>>>>>>>>>>>>>>>>>>>>> freeSpace 160M is not released
id: ***********
physicalDisk: ***********
size: 5.8218994140625T
status: normal

So I found the culprit 🙂 The mismatch in flash cache size was caused by freeSpace not being released on one of the flash disks (FD_03_celadm06) as we can see in the logs.

I did ask customer to recreate flashcache using following procedure.


1) Check to make sure at least one mirror copy of the extents is available.

CellCLI> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
– If reporting ‘YES’ continue to step #2

2) Manually flush the flashcache:
# cellcli -e alter flashcache all flush

In a 2nd window… Check status of flashcach flush.
The following command should return “working” for each flash disk on each cell while the cache is being flushed and “completed” when it is finished.
# cellcli -e \”LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror \”” | grep FD

3) Drop Flashlog:
# cellcli -e drop flashlog all

4) Drop flashcache:
# cellcli -e drop flashcache all

5) Recreate flashlog:
# cellcli -e create flashlog all

6) Recreate flashcache:
# cellcli -e create flashcache all

7) Finally check the flashcache size to see if it’s now at correct size:
# cellcli -e list flashcache detail | grep size


Issue was resolved after dropping and recreating the flashlog and flashcache on particular cellnode. 🙂

Hope so u will find this post very useful 🙂

Cheers

Regards,
Adityanath

Exadata image 18.1.5 in status failure due to Validation check ERROR – NOT RUNNING for service: dbserverd

Recently one of my client faced issue after upgrading Exadata image in DB server, image was showing its status as failure. I did review all patchmgr logs but didn’t see anything weird.


root@testserver1 ]# imageinfo
Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
Image kernel version: 4.1.12-94.8.4.el6uek
Image version: 18.1.5.0.0.180506 
Image activated: 2018-05-29 18:03:57 +0200
Image status: failure ============================> Issue
System partition on device: /dev/mapper/VGExaDb-LVDbSys1

I asked customer to run validations manually as below:

/opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all

Customer shared o/p of the command as below:


[root@testserver1 ]# /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Logging started to /var/log/cellos/validations.log
Command line is /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Run validation ipmisettings - PASSED
Run validation misceachboot - FAILED   ============================> Issue
Check log in /var/log/cellos/validations/misceachboot.log
Run validation biosbootorder - PASSED
Run validation oswatcher - PASSED
Run validation checkdeveachboot - PASSED
Run validation checkconfigs - BACKGROUND RUN
Run validation saveconfig - BACKGROUND RUN

After checking in misceachboot.log, I found below error:


-bash-4.4$ cat misceachboot.log | grep -i error
BIOS is Already Pause On Error on Adapter 0.
[1527609678][2018-05-29 18:03:53 +0200][ERROR][0-0][/opt/oracle.cellos/image_functions][image_functions_check_configured_services][] Validation check ERROR - NOT RUNNING for service: dbserverd
BIOS is Already Pause On Error on Adapter 0.
[1527678371][2018-05-30 13:06:56 +0200][ERROR][0-0][/opt/oracle.cellos/image_functions][image_functions_check_configured_services][] Validation check ERROR - NOT RUNNING for service: dbserverd

This shows something went wrong with service: dbserverd.

I asked him to check status of dbserverd services & to manually stop & start dbserverd services on affected server.

1. service dbserverd status

2. service dbserverd stop

3. service dbserverd start


[root@testserver1 ]# service dbserverd status
rsStatus: running
msStatus: stopped       ============================> Issue

[root@testserver1 ]# service dbserverd stop
Stopping the RS and MS services...
The SHUTDOWN of services was successful.

[root@testserver1 ]# service dbserverd start
Starting the RS services...
Getting the state of RS services... running
Starting MS services...
DBM-01513: DBMCLI request to Restart Server (RS) has timed out.
The STARTUP of MS services was not successful. Error: Unknown Error

This confirmed issue was with MS services. I asked customer to restart DB server but it didn’t resolve the issue.

Now I asked customer to reconfigure MS services as given below & check if it helps:


1. ssh to the node as root

2. Shutdown running RS and MS

DBMCLI>ALTER DBSERVER SHUTDOWN SERVICES ALL

see all the pids by “ps -ef | grep “dbserver.*dbms”, just kill them all.

3. re-deploy MS:
/opt/oracle/dbserver/dbms/deploy/scripts/unix/setup_dynamicDeploy DB -D

4. Restart RS and MS
DBMCLI>ALTER DBSERVER STARTUP SERVICES ALL


& this action plan resolved the issue:


[root@testserver1 ]# dbmcli
DBMCLI: Release - Production on Wed May 30 16:05:13 CEST 2018

Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.

DBMCLI> ALTER DBSERVER STARTUP SERVICES ALL
Starting the RS and MS services...
Getting the state of RS services... running
Starting MS services...
The STARTUP of MS services was successful.

DBMCLI> exit
quitting

[root@testserver1 ]# service dbserverd status
rsStatus: running
msStatus: running    ============================> Resolved 
[root@testserver1 ]#

Then we need to rerun validations to check if it is successful now:


[root@testserver1 ]# /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Logging started to /var/log/cellos/validations.log
Command line is /opt/oracle.cellos/validations/bin/vldrun.pl -quiet -all
Run validation ipmisettings - PASSED
Run validation misceachboot - PASSED ============================> Resolved 
Check log in /var/log/cellos/validations/misceachboot.log
Run validation biosbootorder - PASSED
Run validation oswatcher - PASSED
Run validation checkdeveachboot - PASSED
Run validation checkconfigs - BACKGROUND RUN
Run validation saveconfig - BACKGROUND RUN

Now you need to check image status:


[root@testserver1 ]# imageinfo
Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
Image kernel version: 4.1.12-94.8.4.el6uek
Image version: 18.1.5.0.0.180506 
Image activated: 2018-05-29 18:03:57 +0200
Image status: success ============================> Resolved
System partition on device: /dev/mapper/VGExaDb-LVDbSys1

Sometimes this can still show status as failure where you can mark image status as success manually after checking with Oracle Support 🙂

Hope so u will find this post very useful 🙂

Cheers

Regards,
Adityanath

Telnet command fails with telnet: /lib64/libc.so.6: version `GLIBC_2.15′ not found (required by telnet)

Yesterday one of my customer had issue with running telnet command on exadata server. It was failing with below error:


[root@extestserver ~]# telnet
telnet: /lib64/libc.so.6: version `GLIBC_2.15' not found (required by telnet)

I asked him to provide me details like telnet version & kernel version running on the server:


[root@extestserver ~]# imageinfo
Kernel version: 2.6.39-400.294.4.el6uek.x86_64 #1 SMP Tue Mar 14 18:42:17 PDT 2017 x86_64
Image kernel version: 2.6.39-400.294.4.el6uek
Image version: 12.1.2.3.5.170418
Image activated: 2017-08-25 11:55:37 +0300
Image status: success
System partition on device: /dev/mapper/VGExaDb-LVDbSys1

[root@extestserver ~]# rpm -qa |grep telnet
telnet-1.2-166.4.1.x86_64

From error, obviously telnet rpm was installed on the server but it was not compatible with GLIBC rpm installed on the server.

The installed telnet rpms output shows the version of telnet which is not available in Oracle’s public yum repository So customer had installed it from outside oracle’s yum public repository.

From Oracle’s public yum repository, I found, supported telnet version on OEL6 is telnet-0.17-48.el6.x86_64.rpm.

So I asked customer to uninstall current telnet installed on the server & install supported one.


rpm -e telnet-1.2-166.4.1.x86_64
rpm -ivh telnet-0.17-48.el6.x86_64.rpm

Telnet started working as expected 🙂

Hope so u will find this post very useful:-)

Cheers

Regards,
Adityanath

 

 

 

 

Oracle Histograms Simplified: Part 2

In my previous post Oracle Histograms Simplified: Part 1, I focused on what is histogram & why it is used. In this post I will explain different ways to create & drop histograms.

HISTOGRAM CREATION:

Creation of histogram is always depends on parameter method_opt that you specify during gathering statstics.

Typical syntax for method_opt is as follows:

First column part of clause has different options as given below:


1. ALL COULMNS :

As name suggests oracle considers all columns on table for histogram creation.

EXEC DBMS_STATS.GATHER_TABLE_STATS (OWNNAME=>'<OWNER>’,TABNAME=>'<TABLE_NAME>’,METHOD_OPT => ‘FOR ALL COLUMNS SIZE AUTO’,CASCADE = TRUE);


2. ALL INDEXED COLUMNS :

Oracle will consider only columns with indexes for histogram creation.

EXEC DBMS_STATS.GATHER_TABLE_STATS (OWNNAME=>'<OWNER>',TABNAME=>'<TABLE_NAME>',METHOD_OPT => 'FOR ALL INDEXED COLUMNS SIZE AUTO',CASCADE = TRUE);

3. ALL HIDDEN COLUMNS :

Oracle will cosider only hidden columns for histogram creation.

EXEC DBMS_STATS.GATHER_TABLE_STATS (OWNNAME=>'<OWNER>',TABNAME=>'<TABLE_NAME>',METHOD_OPT => 'FOR ALL HIDDEN COLUMNS SIZE AUTO',CASCADE = TRUE);

4. COLUMNS <COLUMN NAMES SEPERATED BY COMMA> :

Oracle will consider only given columns for histogram creation.

EXEC DBMS_STATS.GATHER_TABLE_STATS (OWNNAME=>'<OWNER>',TABNAME=>'<TABLE_NAME>',METHOD_OPT => 'FOR COLUMNS COLUMN1,COLUMN2 SIZE AUTO',CASCADE = TRUE);

Moving to next portion of syntax, which lets you controls histogram creation and its type.

Size clause has different options as given below:


1. INTEGER:

You can specify number of buckets. It must be in range of 1 to 254. 1 denotes no histogram with base column statistics (i.e. minimal information required for the Optimizer) while 254 is highest allowable value.

EXEC DBMS_STATS.GATHER_TABLE_STATS (OWNNAME=>'<OWNER>',TABNAME=>'<TABLE_NAME>',METHOD_OPT => 'FOR ALL COLUMNS SIZE 10',CASCADE = TRUE);

2. AUTO:

This most commonly used size in real time scenarios. The basic process is as follows:

  • You run DBMS_STATS for a table with the METHOD_OPT parameter set to the default SIZE AUTO.
  • You query the table.
  • The database keeps track of the predicates in the preceding query and updates the data dictionary table SYS.COL_USAGE$.
  • Then you run DBMS_STATS again, DBMS_STATS takes into consideration data in SYS.COL_USAGE$ to determine which columns require histograms based on queries ran on table.

This method has following drawbacks:

  • If you gather stats with auto option & do not query table at all, then Oracle will not create any histogram on column.
  • If queries are constantly being changed in particular environment, so even data in table almost the same, can cause changes in execution plans for the queries.
EXEC DBMS_STATS.GATHER_TABLE_STATS (OWNNAME=>'<OWNER>',TABNAME=>'<TABLE_NAME>',METHOD_OPT => 'FOR ALL COLUMNS SIZE AUTO',CASCADE = TRUE);

3. REPEAT :

It will just collect histograms on columns which already have histograms.

EXEC DBMS_STATS.GATHER_TABLE_STATS (OWNNAME=>'<OWNER>',TABNAME=>'<TABLE_NAME>',METHOD_OPT => 'FOR ALL COLUMNS SIZE REPEAT',CASCADE = TRUE);

4. SKEWONLY :

Oracle collects histograms on the columns with only skewed data.

EXEC DBMS_STATS.GATHER_TABLE_STATS (OWNNAME=>'<OWNER>',TABNAME=>'<TABLE_NAME>',METHOD_OPT => 'FOR ALL COLUMNS SIZE SKEWONLY',CASCADE = TRUE);

HISTOGRAM DELETION:

There are different methods for dropping histograms on table columns:


To completely delete all types of column statistics plus histograms for a column including minimal statistics:

EXEC DBMS_STATS.DELETE_COLUMN_STATS(OWNNAME=>'<OWNER>',TABNAME=>'<TABLE_NAME>','<COLUMN_NAME>');

This is generally not recommended as the optimizer needs at least the minimal (size 1) information for creating optimal execution plans.


To delete histograms from a column without touching base column statistics i.e. minimal information required for the Optimizer (This is recommended method for pre 11g databases).

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'<OWNER>',TABNAME=>'<TABLE_NAME>',METHOD_OPT=>'FOR COLUMNS <COLUMN_NAME> SIZE 1');

From Oracle 11g onwards, there is new syntax to delete the histogram for a column while still leaving behind the base column statistics:

EXEC DBMS_STATS.DELETE_COLUMN_STATS(OWNNAME=>'<OWNER>',TABNAME=>'<TABLE_NAME>',COLNAME=>'<COLUMN_NAME>',COL_STAT_TYPE=>'HISTOGRAM');

Additionally we can delete histograms for a partition:

EXEC DBMS_STATS.DELETE_COLUMN_STATS('<OWNER>',TABNAME=>'<TABLE_NAME>',COLNAME=>'<COLUMN_NAME>',PARTNAME=>'<PARTITION_NAME>', COL_STAT_TYPE=>'HISTOGRAM');

and to delete column histograms for the table and all its partitions:

 EXEC DBMS_STATS.DELETE_COLUMN_STATS(OWNNAME=>'<OWNER>',TABNAME=>'<TABLE_NAME>',COLNAME=>'<COLUMN_NAME>',CASCADE_PARTS=>TRUE,COL_STAT_TYPE=>'HISTOGRAM');

Hope so u will find this post very useful:-)

Visit Oracle Histograms Simplified: Part 1, in which I focussed on what is histogram & why it is used.

Cheers

Regards,
Adityanath

Oracle Histograms Simplified: Part 1

Oracle Histogram is very nice feature to help CBO to make right decision while constructing execution plan. But many of DBA’s do not use it, assuming it is complex concept to use. In next few posts, I will try to simplify it, so that we can achieve maximum benefit from it.

What it Histogram & what is its purpose:

A histogram is a special type of column statistic that provides more detailed information about the data distribution in a table column. In simple words, it is a feature in CBO and it helps optimizer to determine how data are distributed/skewed within column. Predicates with skewed data are good candidates for histogram creation.

Basically CBO assumes uniform data distribution of data inside column across the distinct values. For the columns that contain skewed data, histogram helps to generate more accurate execution plans.

Below example will explain you significance of histograms:


I have a table ADI.SALES_ORDER with 100000 rows with index SALES_ORDER_N1 on column GENERATED.

GENERATED column has 4 distinct values: A, B, Y, N.  CBO will assume uniform data distribution across these values. So any query with column GENERATED as predicate will go for FTS, as CBO will assume it has to select 25% data from table.

But in reality data under column generated in skewed as given below:

I gathered fresh statistics without histogram on table as shown below:

Now will compare execution plan with different values for predicate:

As expected this query is going for FTS, as it has to select approx. 80% of data.

But I will expect query to pick index in case of GENERATED=’N’ as it will just select less than 1% of rows from table.

But NO its still going for FTS 😦 & it is due to CBO’s default behavior.

Now here Histogram should come into picture:

Now I will once again compare plans:

As expected following query is going for FTS, as it has to select approx. 80% of data.

But following query has picked up index due to creation of histogram. 🙂

In this post, we focused on what is histogram & why it is used.

Hope so u will find this post very useful:-)

Stay tuned for Oracle Histograms Simplified: Part 2, in which I will explain what are different ways to create & drop histogram on particular column.

Cheers

Regards,
Adityanath

Drop empty tablespace fails with ORA-14404 or ORA-144047 partitioned table contains partitions/subpartitions in a different tablespace

Recently I was performing database reorg operation on one of pre-prod environment with traditional way of table/lob movement followed by index rebuild. After moving all segments from current tablespace to new, I was ready to drop original one.


SQL> select count(0) from dba_segments where tablespace_name='MY_TABLESPACE';
  COUNT(0)
----------         
0

And it errored out!!!


SQL> DROP TABLESPACE MY_TABLESPACE INCLUDING CONTENTS AND DATAFILES;
DROP TABLESPACE MY_TABLESPACE INCLUDING CONTENTS AND DATAFILES
*ERROR at line 1:
ORA-14404: partitioned table contains partitions in a different tablespace

As first step, I tried purging dba_recyclebin, but it didn’t help.

After searching on metalink, I found following note with similar issue: ORA-14404 or ORA-14407 When Trying to Drop a Tablespace (Doc ID 1674989.1)

which talks about two causes of this issue:

1. An attempt was made to drop a tablespace which contains tables whose partitions or subpartitions are not completely contained in this tablespace. 

OR

2. This can happen due to deferred segment creation introduced in 11.2.

First cause was matching my case:

I tried finding out tables whose partitions were spread across other partitions along with tablespace MY_TABLESPACE & moved to it to new tablespace as well:


SELECT    'alter table '|| table_owner|| '.'|| table_name|| ' move partition '|| partition_name|| '   tablespace MY_TABLESPACE_NEW;'  FROM (SELECT DISTINCT table_name, partition_name          FROM dba_tab_partitions         WHERE tablespace_name = 'MY_TABLESPACE') a,       dba_tab_partitions b WHERE b.tablespace_name <> 'MY_TABLESPACE'       AND a.table_name = b.table_name;

After moving all affected table partitions, I once again tried dropping tablespace: MY_TABLESPACE. It once again failed with same error: ORA-14404.
Then I found same issue with some of index partitions as well, for which I used following command:


select distinct 'alter index ' || index_owner || '.' || a.index_name || ' rebuild partition ' || a.partition_name || ' tablespace MY_TABLESPACE_NEW;'from ( select distinct index_name,partition_namefrom dba_ind_partitionswhere tablespace_name = 'MY_TABLESPACE' ) a, dba_ind_partitions bwhere b.tablespace_name <> 'MY_TABLESPACE' and a.index_name = b.index_name;

Once indexes were rebuild, I was able to drop tablespace MY_TABLESPACE without any issues 🙂

Note: You may also get ORA-14407 is some cases, wherein sub-partitions are the issue, for which you will use above commands just with sub-partitions views.

Just to elaborate seconds possible cause of this error: deferred segment creation:


The Oracle Database 11.2 includes a new space allocation method. When you create a non-partitioned heap table, the table segment creation is deferred to the first row insert. This functionality is enabled by default with the initialization parameter DEFERRED_SEGMENT_CREATION  set to TRUE.So in simple words, if you have set this parameter to value TRUE, particular table will be populated in dba_tables once created, but will not be populated in dba_segments unless we insert a row to it. So in the case of defered segments, you will not be able to drop tablespace.


Hope so u will find this post very useful:-)

Cheers

Regards,
Adityanath