Tuesday, January 12, 2021

How to Change Database Name Using Control file trace



Scenario :- The Unix team took a snapshot of the current QA server and created a new server named QAtest.

On this new server, I have to open the database under the new name QRT (the old name is PL0).

Let's log in on the new server.

export ORACLE_SID=PL0

 $ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Jan 12 22:27:38 2021

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select name from v$database;
NAME
---------
PL0
SQL> select open_mode,name from V$database;
OPEN_MODE            NAME
-------------------- ---------    =====> DB name PL0
MOUNTED              PL0


  • Take a backup of the existing redo log files, which are needed for recovery. ======>

$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Tue Jan 12 22:24:04 2021
Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select member from v$logfile;
MEMBER
--------------------------------------------------------------------------------
/oracle/PL0/origlogB/log_g4m1.dbf
/oracle/PL0/mirrlogB/log_g4m2.dbf
/oracle/PL0/origlogA/log_g3m1.dbf
/oracle/PL0/mirrlogA/log_g3m2.dbf
/oracle/PL0/origlogB/log_g2m1.dbf
/oracle/PL0/mirrlogB/log_g2m2.dbf
/oracle/PL0/origlogA/log_g1m1.dbf
/oracle/PL0/mirrlogA/log_g1m2.dbf
8 rows selected.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options


$ cp /oracle/PL0/origlogB/log_g4m1.dbf /oracle/PL0/origlogB/log_g4m1.dbf_bkp
$ cp /oracle/PL0/mirrlogB/log_g4m2.dbf /oracle/PL0/mirrlogB/log_g4m2.dbf_bkp
$ cp /oracle/PL0/origlogA/log_g3m1.dbf /oracle/PL0/origlogA/log_g3m1.dbf_bkp
$ cp /oracle/PL0/mirrlogA/log_g3m2.dbf /oracle/PL0/mirrlogA/log_g3m2.dbf_bkp
$ cp /oracle/PL0/origlogB/log_g2m1.dbf /oracle/PL0/origlogB/log_g2m1.dbf_bkp
$ cp /oracle/PL0/mirrlogB/log_g2m2.dbf /oracle/PL0/mirrlogB/log_g2m2.dbf_bkp
$ cp /oracle/PL0/origlogA/log_g1m1.dbf /oracle/PL0/origlogA/log_g1m1.dbf_bkp
$ cp /oracle/PL0/mirrlogA/log_g1m2.dbf /oracle/PL0/mirrlogA/log_g1m2.dbf_bkp
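
Copying each member by hand is error-prone. A small loop can do the same job; this is a sketch using throwaway /tmp paths, and the members.txt file name is an assumption (e.g. spooled output of `select member from v$logfile`):

```shell
#!/bin/sh
# Demo setup: throwaway files standing in for redo log members.
mkdir -p /tmp/redo_demo
: > /tmp/redo_demo/log_g1m1.dbf
: > /tmp/redo_demo/log_g1m2.dbf
printf '%s\n' /tmp/redo_demo/log_g1m1.dbf /tmp/redo_demo/log_g1m2.dbf > /tmp/redo_demo/members.txt

# Back up every member listed in members.txt with a _bkp suffix.
while read -r member; do
  [ -f "$member" ] && cp "$member" "${member}_bkp"
done < /tmp/redo_demo/members.txt
```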



  • Take a controlfile backup as trace; the trace file is written under:

/oracle/diag/rdbms/qrt/QRT/trace/

SQL> startup Mount
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.

Total System Global Area 5.3447E+10 bytes
Fixed Size                  2265864 bytes
Variable Size            1.7717E+10 bytes
Database Buffers         3.5702E+10 bytes
Redo Buffers               26480640 bytes
Database mounted.
SQL> alter database backup controlfile to trace;

Database altered.

SQL> select * from V$diag_info;

   
   INST_ID NAME
---------- ----------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
         1 Diag Cdump
/oracle/diag/rdbms/qrt/QRT/cdump

         1 Health Monitor
/oracle/diag/rdbms/qrt/QRT/hm

         1 Default Trace File
/oracle/diag/rdbms/qrt/QRT/trace/QRT_ora_12755.trc


Edit the controlfile trace: keep the CREATE CONTROLFILE statement, change REUSE to SET, and set the new database name QRT. Save the edited statement as a script (new_con.sql, run below).

STARTUP NOMOUNT
CREATE CONTROLFILE SET DATABASE "QRT" RESETLOGS  ARCHIVELOG
    MAXLOGFILES 255
    MAXLOGMEMBERS 2
    MAXDATAFILES 3000
    MAXINSTANCES 2
    MAXLOGHISTORY 22495
LOGFILE
  GROUP 1 (
    '/oracle/QRT/origlogA/log_g1m1.dbf',
    '/oracle/QRT/mirrlogA/log_g1m2.dbf'
  ) SIZE 250M BLOCKSIZE 512,
  GROUP 2 (
    '/oracle/QRT/origlogB/log_g2m1.dbf',
    '/oracle/QRT/mirrlogB/log_g2m2.dbf'
  ) SIZE 250M BLOCKSIZE 512,
  GROUP 3 (
    '/oracle/QRT/origlogA/log_g3m1.dbf',
    '/oracle/QRT/mirrlogA/log_g3m2.dbf'
  ) SIZE 250M BLOCKSIZE 512,
  GROUP 4 (
    '/oracle/QRT/origlogB/log_g4m1.dbf',
    '/oracle/QRT/mirrlogB/log_g4m2.dbf'
  ) SIZE 250M BLOCKSIZE 512
-- STANDBY LOGFILE
DATAFILE
  '/oracle/QRT/sapdata1/system_1/system.data1',
  '/oracle/QRT/sapdata1/undo_1/undo.data1',
  '/oracle/QRT/sapdata2/undo_2/undo.data2',
  '/oracle/QRT/sapdata1/undo_3/undo.data3',
  '/oracle/QRT/sapdata2/undo_4/undo.data4',
  '/oracle/QRT/sapdata1/sysaux_1/sysaux.data1',
  '/oracle/QRT/sapdata2/dat_1/dat.data1',
  '/oracle/QRT/sapdata2/dat_2/dat.data2',
  '/oracle/QRT/sapdata5/dat_3/dat.data3',
  '/oracle/QRT/sapdata3/dat_4/dat.data4'
CHARACTER SET UTF8
;
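
The two manual edits (REUSE → SET with the new name, and the old paths rewritten) can be sketched with sed. This is only an illustration: the real trace contains both NORESETLOGS and RESETLOGS variants and needs a manual review; the trace.trc/new_con file names here are assumptions:

```shell
#!/bin/sh
# Demo input: two representative lines from a controlfile trace.
printf 'CREATE CONTROLFILE REUSE DATABASE "PL0" RESETLOGS  ARCHIVELOG\n' > /tmp/trace_demo.trc
printf "  '/oracle/PL0/origlogA/log_g1m1.dbf',\n" >> /tmp/trace_demo.trc

# REUSE -> SET with the new name; old datafile/logfile paths rewritten.
sed -e 's/CREATE CONTROLFILE REUSE DATABASE "PL0"/CREATE CONTROLFILE SET DATABASE "QRT"/' \
    -e 's|/oracle/PL0/|/oracle/QRT/|g' /tmp/trace_demo.trc > /tmp/new_con_demo.sql
cat /tmp/new_con_demo.sql
```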

  • Shut down the database instance.

$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Jan 12 22:33:57 2021

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> shut abort
ORACLE instance shut down.

  • Rename (back up) the existing controlfiles listed in the pfile ====>
dp3:oraQRT 71> grep con initQRT.ora
*.control_file_record_keep_time=30
*.control_files='/oracle/QRT/origlogA/cntrl/cntrlQRT.dbf','/oracle/QRT/origlogB/cntrl/cntrlQRT.dbf','/oracle/QRT/sapdata1/cntrlrlQRT.dbf'
dp3:oraQRT 72>


$ cd /oracle/QRT/origlogA/cntrl/
$ mv cntrlQRT.dbf cntrlQRT.dbf_bkp

$ mv /oracle/QRT/origlogB/cntrl/cntrlQRT.dbf /oracle/QRT/origlogB/cntrl/cntrlQRT.dbf_bkp
$ mv /oracle/QRT/sapdata1/cntrlrlQRT.dbf /oracle/QRT/sapdata1/cntrlrlQRT.dbf_bkp
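
The renames can also be driven straight from the control_files line in the pfile, so no path is missed. A sketch using throwaway /tmp paths (the pfile line format matches the grep output above):

```shell
#!/bin/sh
# Demo setup: two throwaway controlfiles and a pfile naming them.
mkdir -p /tmp/ctl_demo
: > /tmp/ctl_demo/cntrlQRT1.dbf
: > /tmp/ctl_demo/cntrlQRT2.dbf
echo "*.control_files='/tmp/ctl_demo/cntrlQRT1.dbf','/tmp/ctl_demo/cntrlQRT2.dbf'" > /tmp/ctl_demo/initQRT.ora

# Parse the control_files value, strip quotes, rename each file with _bkp.
grep "^\*\.control_files=" /tmp/ctl_demo/initQRT.ora |
  sed "s/^\*\.control_files=//; s/'//g" |
  tr ',' '\n' |
  while read -r f; do
    [ -f "$f" ] && mv "$f" "${f}_bkp"
  done
```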

Now start up the database under the new name QRT.

export ORACLE_SID=QRT
$ ps -ef|grep pmon
oraQRT     9987   8763  0 22:36 pts/2    00:00:00 grep pmon
$
$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Tue Jan 12 22:36:56 2021
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> @new_con.sql
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Total System Global Area 5.3447E+10 bytes
Fixed Size                  2265864 bytes
Variable Size            1.7717E+10 bytes
Database Buffers         3.5702E+10 bytes
Redo Buffers               26480640 bytes
Control file created.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/oracle/QRT/sapdata1/system_1/system.data1'

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open

SQL> recover database using backup controlfile until cancel;
ORA-00279: change 31328916885 generated at 01/09/2021 12:53:52 needed for
thread 1
ORA-00289: suggestion : /oracle/QRT/oraarch/QRTarch1_14978_1036372415.dbf
ORA-00280: change 31328916885 for thread 1 is in sequence #14978

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/oracle/PL0/mirrlogB/log_g2m2.dbf_bkp
Log applied.
Media recovery complete.
SQL> alter database open resetlogs;


SQL> show parameter db_name
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_name                              string      QRT
SQL>
SQL>
SQL> shut immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Total System Global Area 5.3447E+10 bytes
Fixed Size                  2265864 bytes
Variable Size            1.7717E+10 bytes
Database Buffers         3.5702E+10 bytes
Redo Buffers               26480640 bytes
Database mounted.
Database opened.

SQL> select name from V$database;

NAME
---------
QRT

===============================EOD===============================

Monday, January 11, 2021

CRS STACK :- Series 3

                                              Cluster Ready Service STACK.


Below are the Components of CRS Stack.

  • Cluster Synchronization Service.
  • Oracle ASM.
  • Cluster Time Synchronization Service 
  • Event Management
  • GRID NAMING Service.
  • Oracle Agent.
  • Oracle Notification Service.
  • Oracle Root Agent.


Event Management:- 


This background process publishes events to all the members of the cluster. On Linux/Unix, the process name is evmd.bin, and on Windows, it is evmd.exe. 

[root@rac1 ~]# ps -ef|grep evmd
grid      2317     1  1 06:46 ?        00:00:55 /u002/app/oracle/product/12.1.0/grid/bin/evmd.bin
grid      2426  2317  0 06:46 ?        00:00:15 /u002/app/oracle/product/12.1.0/grid/bin/evmlogger.bin -o /u002/app/oracle/product/12.1.0/grid/log/[HOSTNAME]/evmd/evmlogger.info -l /u002/app/oracle/product/12.1.0/grid/log/[HOSTNAME]/evmd/evmlogger.log

The EVMD process runs as the grid user.
It spawns the child process evmlogger and generates an event when something happens in the cluster.
It scans the callout directory and invokes the callout scripts found there.
EVMD restarts automatically on failure, and its failure does not cause a node reboot.
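
As an illustration of a callout: scripts placed in the clusterware callout directory (typically $GRID_HOME/racg/usrco/) are invoked with the event payload as arguments. A minimal sketch that just appends each event to a log file (the log path here is an assumption):

```shell
#!/bin/sh
# Minimal FAN callout sketch: log the event payload with a timestamp.
# In a real deployment this file would live under $GRID_HOME/racg/usrco/
# and be executable; /tmp/fan_events.log is a demo path.
LOGFILE=${FAN_LOGFILE:-/tmp/fan_events.log}
echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOGFILE"
```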

[root@rac1 evmd]# ps -ef|grep evmd.bin
grid      2317     1  1 07:36 ?        00:01:10 /u002/app/oracle/product/12.1.0/grid/bin/evmd.bin
root      9175  5073  0 09:04 pts/0    00:00:00 grep --color=auto evmd.bin
[root@rac1 evmd]# kill -9 2317
[root@rac1 evmd]#  ps -ef|grep evmd.bin
grid     10510     1  3 09:08 ?        00:00:00 /u002/app/oracle/product/12.1.0/grid/bin/evmd.bin
root     10528  5073  0 09:08 pts/0    00:00:00 grep --color=auto evmd.bin
[root@rac1 evmd]#

The EVMD process receives FAN events posted by clients and distributes them to the clients that subscribed to them.

Consider a two-node cluster RAC1, RAC2: if RAC2 leaves the cluster, the OCSSD process on RAC1 posts a leave FAN event, and EVMD publishes and distributes this message to the CRSD process on RAC1.
The CRSD process always communicates with EVMD because it subscribes to the EVMD daemon for FAN events.



GRID NAMING Service:-

This process provides name resolution within the cluster.
From version 12cR1, one GNS can serve multiple clusters, in contrast to the earlier single-cluster version.
GNS is used to simplify network management.
If GNS is used, there is no need to manually set the SCAN and VIP addresses.
It also eliminates the need to configure VIP names and SCAN names in a Domain Name Server (DNS), as GNS provides the name resolution for the cluster.

When GNS configuration is selected, GNS takes over DNS responsibility for the cluster nodes. The prerequisites are:

1. Allocate one static IP address on public network for the GNS VIP.
2. Setup GNS as a delegate for the cluster's sub-domain on the DNS server.
3. Validate that there is a DHCP server that is addressable by the cluster nodes.

When installing RAC, there are two options:

1. Configure GNS and let it handle name resolution in the cluster subdomain.
2. Do not select GNS, and configure each node name and SCAN name with IP addresses defined in DNS.


Oracle ASM:-

From 11g, ASM is an important part of RAC; it provides disk and volume management for Oracle Clusterware and the Oracle database.

Below are the ASM processes:

[root@rac1 evmd]# ps -ef|grep asm
grid      3149     1  0 07:37 ?        00:00:00 asm_pmon_+ASM1
grid      3151     1  0 07:37 ?        00:00:02 asm_psp0_+ASM1
grid      3154     1  2 07:37 ?        00:03:32 asm_vktm_+ASM1
grid      3158     1  0 07:37 ?        00:00:06 asm_gen0_+ASM1
grid      3160     1  0 07:37 ?        00:00:00 asm_mman_+ASM1
grid      3164     1  0 07:37 ?        00:00:10 asm_diag_+ASM1
grid      3166     1  0 07:37 ?        00:00:00 asm_ping_+ASM1
grid      3168     1  0 07:37 ?        00:00:23 asm_dia0_+ASM1
grid      3170     1  0 07:37 ?        00:00:27 asm_lmon_+ASM1
grid      3172     1  0 07:37 ?        00:00:13 asm_lmd0_+ASM1
grid      3174     1  0 07:37 ?        00:00:41 asm_lms0_+ASM1
grid      3178     1  0 07:37 ?        00:00:15 asm_lmhb_+ASM1
grid      3180     1  0 07:37 ?        00:00:00 asm_lck1_+ASM1
grid      3182     1  0 07:37 ?        00:00:00 asm_dbw0_+ASM1
grid      3184     1  0 07:37 ?        00:00:00 asm_lgwr_+ASM1
grid      3186     1  0 07:37 ?        00:00:00 asm_ckpt_+ASM1
grid      3188     1  0 07:37 ?        00:00:00 asm_smon_+ASM1
grid      3190     1  0 07:37 ?        00:00:00 asm_lreg_+ASM1
grid      3192     1  0 07:37 ?        00:00:00 asm_pxmn_+ASM1
grid      3194     1  0 07:37 ?        00:00:01 asm_rbal_+ASM1
grid      3196     1  0 07:37 ?        00:00:04 asm_gmon_+ASM1
grid      3198     1  0 07:37 ?        00:00:00 asm_mmon_+ASM1
grid      3201     1  0 07:37 ?        00:00:02 asm_mmnl_+ASM1
grid      3203     1  0 07:37 ?        00:00:02 asm_lck0_+ASM1
grid      3205     1  0 07:37 ?        00:00:00 asm_gcr0_+ASM1
grid      3328     1  0 07:38 ?        00:00:00 asm_asmb_+ASM1
grid      3330     1  0 07:38 ?        00:00:00 oracle+ASM1_asmb_+asm1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
grid      3889     1  0 07:38 ?        00:00:00 mdb_asmb_-MGMTDB
grid      3893     1  0 07:38 ?        00:00:00 oracle+ASM1_asmb_-mgmtdb (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle    8443     1  0 09:03 ?        00:00:00 ora_asmb_TEST1
grid      8458     1  0 09:03 ?        00:00:00 oracle+ASM1_asmb_test1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))


Cluster Time Synchronization Service :-


CTSS was introduced in 11g and handles time synchronization among the nodes in the cluster.

You use either NTP or CTSS to synchronize time among the cluster nodes.

If the NTP service is disabled, CTSS provides the time synchronization service.

[root@rac1 evmd]# ps -ef|grep ctss

root      2943     1  0 07:37 ?   00:01:23 /u002/app/oracle/product/12.1.0/grid/bin/octssd.bin reboot


Oracle Notification Service:-

ONS is installed and configured during the Clusterware installation.

When the state of a resource in the cluster changes, the ONS processes communicate with each other across the nodes and publish a message about the resource change.

CRS works closely with the ONS process: HA event information is triggered by CRS and shared with the ONS process, which publishes the HA event to the middleware tier.

The ONS process plays a very critical role in sharing FAN events with clients.

To use this setup, we have to build some logic so the application can receive HA event information quickly.

HA events are used for multiple things, for example to detect failures quickly in the cluster and publish them as FAN (Fast Application Notification) events. These FAN events are published by ONS.


Oracle Agent:-

[root@rac1 evmd]# ps -ef|grep oraagent

grid 2301     1  1 07:36 ?   00:01:5 /u002/app/oracle/product/12.1.0/grid/bin/oraagent.bin
grid      3424     1  1 07:38 ?        00:02:52 /u002/app/oracle/product/12.1.0/grid/bin/oraagent.bin
oracle    8299     1  0 09:03 ?        00:00:44 /u002/app/oracle/product/12.1.0/grid/bin/oraagent.bin

The oraagent runs as the grid and oracle users and handles the OHASD-managed resources owned by those users.
It runs server callout scripts when a FAN event occurs.



Oracle Root Agent.:-

The orarootagent is the agent responsible for managing all root-owned ohasd resources.
It also helps the CRSD process manage root-owned resources, for example the network and VIP resources.




==============================------EOD----==============================




Friday, January 8, 2021

System Tablespace Growing Fast



Current size is 44GB in my Prod Database.

select sum(bytes)/1024/1024/1024  "GB", TABLESPACE_NAME from dba_data_files group by TABLESPACE_NAME order by 1;


                      GB    TABLESPACE_NAME
------------------------ ------------------------------
                       44     SYSTEM


Free Size is 500 MB only.

SQL> select sum(bytes)/1024/1024 from dba_free_space
  2  where TABLESPACE_NAME='SYSTEM';

SUM(BYTES)/1024/1024
--------------------
             561.625


Now let's see what is consuming the most space.

SQL> column segment_name format a30
set numf 999,999,999,999,999,999

select * from
(select segment_name, bytes from dba_segments where tablespace_name = 'SYSTEM' order by bytes desc)
where rownum < 26;

SEGMENT_NAME                                      BYTES
------------------------------ ------------------------
AUD$                                     32,827,768,832  =====> 32 GB too Much space Consume.
SOURCE$                                   3,489,660,928
IDL_UB1$                                  1,902,116,864
I_SOURCE1                                 1,412,431,872
IDL_UB2$                                    752,877,568
ARGUMENT$                                   738,197,504
C_OBJ#                                      574,619,648
I_ARGUMENT1                                 478,150,656
I_ARGUMENT2                                 377,487,360
VIEW$                                       370,147,328
C_TOID_VERSION#                             291,504,128

SEGMENT_NAME                                      BYTES
------------------------------ ------------------------
I_COL2                                      270,532,608
I_COL1                                      234,881,024
IDL_CHAR$                                   218,103,808
I_COL3                                      142,606,336
DEPENDENCY$                                 125,829,120
IDL_SB4$                                    117,440,512
ACCESS$                                     109,051,904
I_DEPENDENCY2                               102,760,448
C_OBJ#_INTCOL#                              100,663,296
KOTAD$                                       87,031,808
HIST_HEAD$                                   83,886,080

SEGMENT_NAME                                      BYTES
------------------------------ ------------------------
I_DEPENDENCY1                                83,886,080
OBJ$                                         57,671,680
I_ACCESS1                                    55,574,528

25 rows selected.


Let's reduce the size.

Create a table tmpAud holding the last 60 days of audit rows:
 
SQL> sho user
USER is "SYS"
SQL>

create table tmpAud as
select * from sys.aud$
where to_date(trunc(ntimestamp#)) > trunc(sysdate-60);

Table created.

Cleanup and coalesce aud$

SQL>  truncate table sys.aud$;

Table truncated.

-- Repopulate
SQL> insert into sys.aud$ select * from tmpAud;
4407589 rows created.
SQL> commit;
Commit complete.

SQL> drop table tmpAud;
Table dropped.
SQL>
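
The cleanup above can be wrapped in a small generator script so the retention window is a parameter. This is a sketch: the file name is an assumption and the predicate is simplified to a plain timestamp comparison; review the generated script before running it as SYS:

```shell
#!/bin/sh
# Generate the AUD$ cleanup script with a configurable retention window
# (60 days, matching the post). /tmp/aud_cleanup.sql is a demo path.
DAYS_TO_KEEP=60
cat > /tmp/aud_cleanup.sql <<EOF
create table tmpAud as
  select * from sys.aud\$
  where ntimestamp# > trunc(sysdate - ${DAYS_TO_KEEP});
truncate table sys.aud\$;
insert into sys.aud\$ select * from tmpAud;
commit;
drop table tmpAud;
EOF
cat /tmp/aud_cleanup.sql
```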

Now let's re-check the space usage.

SQL> column segment_name format a30
set numf 999,999,999,999,999,999

select * from
(select segment_name, bytes from dba_segments where tablespace_name = 'SYSTEM' order by bytes desc)
where rownum < 26;

SEGMENT_NAME                                      BYTES
------------------------------ ------------------------
SOURCE$                                   3,489,660,928
IDL_UB1$                                  1,902,116,864
I_SOURCE1                                 1,412,431,872
AUD$                                        897,581,056  ===> Reduced from 32 GB to ~900 MB.



SQL> 
select sum(bytes)/1024/1024/1024 from SM$TS_FREE
where TABLESPACE_NAME='SYSTEM';

SUM(BYTES)/1024/1024/1024
-------------------------
               30.7945557  ====> 30 GB space free. We are good


                                                 ===EOD==

Sunday, January 3, 2021

Cluster Synchronization Service ===> series 2

                        

                          Cluster Synchronization Service.


                                         CRS STACK COMPONENTS

  • Cluster Ready Service.
  • Cluster Synchronization Service.
  • Oracle ASM.
  • Cluster Time Synchronization Service 
  • Event Management
  • GRID NAMING Service.
  • Oracle Agent.
  • Oracle Notification Service.
  • Oracle Root Agent. 


 Cluster Synchronization Service:-

OCSSD (the CSS daemon) :- This process is spawned by the cssdagent process. It runs in both clusterware and non-clusterware environments.

OCSSD can also evict a node after escalation of a member kill from a client (such as a database LMON process). It is a multi-threaded process that runs at an elevated priority and runs as the oracle/grid user.

This service manages and monitors the node membership in the cluster and updates the node status information in Voting Disk. 

It also updates and manages the cluster configuration and node membership when a node is added to or removed from the cluster.

The cssdagent process monitors the cluster and provides I/O fencing.

This service runs as the ocssd.bin process on Linux/Unix and OracleOHService (ocssd.exe) on Windows.

It provides synchronization services among the nodes.

OCSSD runs as the grid user:

grid     20834     1  1 Jan02 ?    00:10:32 /u002/app/oracle/product/12.1.0/grid/bin/ocssd.bin


OCSSD configures and maintains the cluster using the node cluster service and the group membership service. Internally, OCSSD uses two kinds of heartbeat services: network heartbeat and disk heartbeat.

Network heartbeat :- Ensures the nodes are reachable within the cluster.

Disk heartbeat :- Ensures no split brain occurs in the cluster; each node periodically casts its vote in the voting disk.

OCSSD provides a lock service, i.e. a cluster-wide serialization locking function, and it uses a FIFO mechanism to manage the locks.

OCR data is also updated by the OCSSD process.

OCSSD's primary job is internode health monitoring and RDBMS instance endpoint discovery.

OCSSD process log location:

/u002/app/gridbase/diag/crs/rac2/crs/trace


If you kill ocssd.bin, or it fails, the server reboots to avoid a split-brain situation, so it is a very critical process.

[root@rac2 ~]# ps -ef|grep ocssd

grid     20834     1  1 Jan02 ?        00:10:32 /u002/app/oracle/product/12.1.0/grid/bin/ocssd.bin

root     22419 20010  0 16:05 pts/0    00:00:00 grep --color=auto ocssd

[root@rac2 ~]# kill -9 20834

[root@rac2 ~]#

My node got rebooted.


[root@rac2 ~]# uptime

 02:52:06 up 1 min,  1 user,  load average: 1.09, 0.49, 0.19 ===> Node rebooted & started 1 min before.

You can see that rac2 was evicted from the cluster in the alert.log of Node1.

[root@rac1 trace]# tail -100f alert.log

2021-01-04 04:03:45.630 [OCSSD(10356)]CRS-1612: Network communication with node rac2 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.620 seconds

2021-01-04 04:03:53.634 [OCSSD(10356)]CRS-1611: Network communication with node rac2 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 6.620 seconds

2021-01-04 04:03:57.635 [OCSSD(10356)]CRS-1610: Network communication with node rac2 (2) missing for 90% of timeout interval.  Removal of this node from cluster in 2.620 seconds

2021-01-04 04:04:00.257 [OCSSD(10356)]CRS-1632: Node rac2 is being removed from the cluster in cluster incarnation 503751816

2021-01-04 04:04:00.271 [OCSSD(10356)]CRS-1601: CSSD Reconfiguration complete. Active nodes are rac1 .

2021-01-04 04:04:00.282 [CRSD(9324)]CRS-5504: Node down event reported for node 'rac2'.

2021-01-04 04:04:07.255 [CRSD(9324)]CRS-2773: Server 'rac2' has been removed from pool 'Generic'.

2021-01-04 04:04:07.255 [CRSD(9324)]CRS-2773: Server 'rac2' has been removed from pool 'ora.RACDB'.

2021-01-04 04:04:07.256 [CRSD(9324)]CRS-2773: Server 'rac2' has been removed from pool 'ora.TEST'.


This particular eviction happened because we hit the network timeout. CSSD exited and the cssdagent took action to evict. The cssdagent knows the information in the error message from local heartbeats made by CSSD.

If no message is in the evicted node's clusterware alert log, check the lastgasp logs on the local node and/or the clusterware alert logs of other nodes. 
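
A quick way to scan an alert log for the eviction-related messages shown above (CRS-1610/1611/1612/1632 and the CRS-1601 reconfiguration). The demo uses a throwaway file; the real log lives under the node's clusterware trace directory:

```shell
#!/bin/sh
# Pull eviction-related messages out of a clusterware alert log.
find_evictions() {
  grep -E 'CRS-16(10|11|12|32)|CRS-1601' "$1"
}

# Demo against a throwaway file built from the excerpt above.
cat > /tmp/demo_alert.log <<'EOF'
2021-01-04 04:04:00.257 [OCSSD(10356)]CRS-1632: Node rac2 is being removed from the cluster in cluster incarnation 503751816
2021-01-04 04:04:07.255 [CRSD(9324)]CRS-2773: Server 'rac2' has been removed from pool 'Generic'.
EOF
find_evictions /tmp/demo_alert.log   # prints only the CRS-1632 line
```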


Startup sequence:

INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent --> ocssd --> ocssd.bin.

OCSSD :- This process is spawned by the cssdagent process.OCSSD's primary job is internode health monitoring and RDBMS instance endpoint discovery. One Thread Monitor network heartbeat. One thread monitor the Disk heartbeat.

CSSDAGENT - This process is spawned by OHASD and is responsible for spawning the OCSSD process, monitoring for node hangs (via oprocd functionality), and monitoring to the OCSSD process for hangs (via oclsomon functionality), and monitoring vendor clusterware (via vmon functionality).  

Responsible for spawning  ocssd.bin.

Monitor node & CSSD process for hangs.

This is a multi-threaded process that runs at an elevated priority and runs as the root user.

Startup sequence:

INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent 


CSSDMONITOR - This proccess also monitors for node hangs (via oprocd functionality), monitors the OCSSD process for hangs (via oclsomon functionality), and monitors vendor clusterware (via vmon functionality). 

This is a multi-threaded process that runs at an elevated priority and runs as the root user.

This Provide redundancy of CSSD Monitoring.

 Monitor node & CSSD process for hangs.

Startup sequence:

INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdmonitor

=================================EOD================================

Reference:-
Troubleshooting Clusterware Node Evictions (Reboots) (Doc ID 1050693.1)

Saturday, January 2, 2021

CRSD Cluster Ready Service ====>Series: 1

                                   CRSD =========>Cluster Ready Service.               

           From 11g, Oracle Clusterware consists of two stacks.

OHASD STACK :- The High Availability Services stack, at the lower end of the clusterware processes, consisting of a set of background processes.

CRS STACK :- A separate stack, also consisting of a set of background processes, mainly handled by CRSD.

These two stacks consist of background processes that help run the cluster resources. With the help of the OHASD stack, the CRS stack starts/stops and provides its services to each resource in the cluster.

                         CRS STACK COMPONENTS

Cluster Ready Service.

Cluster Synchronization Service.

Oracle ASM.

Cluster Time Synchronization Service 

Event Management

GRID NAMING Service.

Oracle Agent.

Oracle Notification Service.

Oracle Root Agent. 


Every process in the CRS stack is critical. Let's explore the CRS stack first.


Cluster Ready Service.


[root@rac2 ~]# ps -ef|grep -v grep | grep crsd

root     32449     1  1 04:00 ?        00:02:27 /u002/app/oracle/product/12.1.0/grid/bin/crsd.bin reboot

[root@rac2 ~]#

Some Observation :- 

1) The CRSD process always runs as the root operating system user.

2) The CRSD process always runs from a different Oracle Home, the GRID_HOME.

3) CRSD manages the cluster resources based on the configuration information stored in the OCR.

4) The CRSD process generates an event when a resource's status changes.

5) CRSD starts/stops/monitors/manages cluster resources: Oracle instances/databases, listeners, VIPs, ASM instances, and resource failover operations.

6) CRSD is also responsible for automatically restarting failed components.

7) CRSD is responsible for updating the status of cluster resources in the OCR.

8) The public interface, private interface, and virtual IP (VIP) must all be up and running for CRS to work. Each node's interfaces should be pingable. Without these network interfaces, CRS cannot be installed.

  

CRSD Log file location:

 /u002/app/gridbase/diag/crs/rac2/crs/trace/crsd_oraagent_oracle.trc

To find the master node in the cluster, grep the crsd trace file:

cd /u002/app/gridbase/diag/crs/rac2/crs/trace

[root@rac2 trace]# grep "OCR MASTER" crsd.trc


What if the <node>.pid file associated with the CRSD process is missing?

cd /u002/app/oracle/product/12.1.0/grid/crs/init
ls -ltr
total 60
-rw-r--r--. 1 root root 4361 Sep 28 08:09 oka
-rw-r--r--. 1 root root 7250 Sep 28 08:09 ohasd.sles
-rw-r--r--. 1 root root 7160 Sep 28 08:09 ohasd
-rw-r--r--. 1 root root 9126 Sep 28 08:09 init.ohasd
-rw-r--r--. 1 root root 6159 Sep 28 08:09 afd.sles
-rw-r--r--. 1 root root 5905 Sep 28 08:09 afd
-rw-r--r--. 1 root root    0 Sep 28 08:13 rac2
-rw-r--r--. 1 root root    6 Dec 19 03:35 rac2.pid-bkp
-rw-r--r--  1 root root    5 Dec 19 10:14 rac2.pid ========> this Node Pid file for CRSD
[root@rac2 init]#

Let's remove the rac2.pid file and restart the cluster services from node 2.


[root@rac2 init]# rm rac2.pid
rm: remove regular file 'rac2.pid'? yes
[root@rac2 init]#

[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl start crs


[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services =======> the familiar error
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@rac2 init]#


CRS-4535: Cannot communicate with Cluster Ready Services.

Error: PROC-32: Cluster Ready Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]


Check the trace and log files below for further troubleshooting.


/u002/app/gridbase/diag/crs/rac2/crs/trace/ohasd_orarootagent_root.trc


2020-12-19 22:46:08.316548 :CLSDYNAM:1366206208: [ora.crsd]{0:0:2} [check] PID FILE doesn't exist.
2020-12-19 22:46:08.316557 :CLSDYNAM:1366206208: [ora.crsd]{0:0:2} [check] PID  from /u002/app/oracle/product/12.1.0/grid/crs/init/rac2.pid
2020-12-19 22:46:08.316663 :  CLSDMC:1366206208: Connecting to ipc://rac2_DBG_CRSD
2020-12-19 22:46:08.316773 :  CLSDMC:1366206208: Error: gipcWait for gipcConnect - ret_gipcreqinfo=gipcretConnectionRefused, type_gipcreqinfo=gipcreqtypeConnect
2020-12-19 22:46:08.316815 :CLSDYNAM:1366206208: [ora.crsd]{0:0:2} [check] ClsdmClient::sendMessage clsdmc_send error rmsg:0 ecode:-7 errbuf:
2020-12-19 22:46:08.316840 :CLSDYNAM:1366206208: [ora.crsd]{0:0:2} [check] Calling PID check for daemon

You can see the straightforward message about the missing CRSD process pid file.

Messages or Syslog from all nodes from the time of the problem:

  • Linux: /var/log/messages  ==============>
  • Sun: /var/adm/messages
  • HP-UX: /var/adm/syslog/syslog.log
  • IBM: /bin/errpt -a > messages.out
[root@rac2 trace]# grep CRSD alert.log =====> Checking CRSD status

2021-01-02 22:45:52.543 [CRSD(14337)]CRS-1012: The OCR service started on node rac2.
2021-01-02 22:45:54.451 [CRSD(14337)]CRS-1201: CRSD started on node rac2.
2021-01-02 22:51:26.657 [CRSD(17255)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 17255
[root@rac2 trace]# date
Sat Jan  2 22:52:43 IST 2021
[root@rac2 trace]# ps -ef|grep 17255
root     17657  2664  0 22:52 pts/0    00:00:00 grep --color=auto 17255
[root@rac2 trace]#

Still the process has not started.


Without the CRSD process, none of the dependent processes start, e.g. the database instances, ASM, and the listeners.

Even the stat command below does not work:

[root@rac2 trace]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[root@rac2 trace]#

[root@rac2 trace]# ps -ef|grep pmon

root      3322 25238  0 02:35 pts/2    00:00:00 grep --color=auto pmon

[root@rac2 trace]# ps -ef|grep tns

root        22     2  0 Dec18 ?        00:00:00 [netns]

root      3339 25238  0 02:35 pts/2    00:00:00 grep --color=auto tns

[root@rac2 trace]#



[root@rac2 trace]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.crf
      1        ONLINE  OFFLINE                               STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ctssd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE rac2                     STABLE
ora.gipcd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gpnpd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mdnsd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.storage
      1        ONLINE  OFFLINE                               STABLE
--------------------------------------------------------------------------------

Now let's resolve this issue.

The fix is simply to recreate the <node>.pid file (here rac2.pid) under /u002/app/oracle/product/12.1.0/grid/crs/init:

[root@rac2 init]# touch rac2.pid

[root@rac2 init]#

[root@rac2 init]# ls -ltr

total 64

-rw-r--r--. 1 root root 4361 Sep 28 08:09 oka

-rw-r--r--. 1 root root 7250 Sep 28 08:09 ohasd.sles

-rw-r--r--. 1 root root 7160 Sep 28 08:09 ohasd

-rw-r--r--. 1 root root 9126 Sep 28 08:09 init.ohasd

-rw-r--r--. 1 root root 6159 Sep 28 08:09 afd.sles

-rw-r--r--. 1 root root 5905 Sep 28 08:09 afd

-rw-r--r--. 1 root root    0 Sep 28 08:13 rac2

-rw-r--r--. 1 root root    6 Dec 19 03:35 rac2.pid-bkp

-rw-r--r--  1 root root    5 Dec 19 10:14 rac2.pid_bkp_1

-rw-r--r--  1 root root    6 Jan  2 22:45 rac2.pid_bkp

-rw-r--r--  1 root root    0 Jan  2 22:53 rac2.pid


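The step above can be sketched as a small script: back up any existing <node>.pid and recreate it empty, so the clusterware can repopulate it on the next start. The recreate_pid_file helper is hypothetical; the directory and node name in the comment are the ones from this post.

```shell
#!/bin/sh
# Hypothetical helper sketching the fix: back up an existing <node>.pid
# and recreate it empty, so the clusterware can rewrite it on restart.
recreate_pid_file() {
    dir=$1; node=$2                               # init directory, node name
    if [ -f "$dir/$node.pid" ]; then
        cp "$dir/$node.pid" "$dir/$node.pid_bkp"  # keep the old copy
    fi
    : > "$dir/$node.pid"                          # create/truncate to empty
}

# On the node from this post:
#   recreate_pid_file /u002/app/oracle/product/12.1.0/grid/crs/init rac2
```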

Now restart the OHASD stack and the cluster processes:

[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl stop crs -f

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac2'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac2'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac2'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac2' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac2'
CRS-2677: Stop of 'ora.gpnpd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.cssd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac2'
CRS-2677: Stop of 'ora.gipcd' on 'rac2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac2' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@rac2 init]# ls -ltr

total 68

-rw-r--r--. 1 root root 4361 Sep 28 08:09 oka

-rw-r--r--. 1 root root 7250 Sep 28 08:09 ohasd.sles

-rw-r--r--. 1 root root 7160 Sep 28 08:09 ohasd

-rw-r--r--. 1 root root 9126 Sep 28 08:09 init.ohasd

-rw-r--r--. 1 root root 6159 Sep 28 08:09 afd.sles

-rw-r--r--. 1 root root 5905 Sep 28 08:09 afd

-rw-r--r--. 1 root root    0 Sep 28 08:13 rac2

-rw-r--r--. 1 root root    6 Dec 19 03:35 rac2.pid-bkp

-rw-r--r--  1 root root    5 Dec 19 10:14 rac2.pid_bkp_1

-rw-r--r--  1 root root    6 Jan  2 22:45 rac2.pid_bkp

-rw-r--r--  1 root root    6 Jan  2 23:02 rac2.pid   <== a process ID has been written into the file

[root@rac2 init]#



[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl start crs

[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

[root@rac2 init]#


[root@rac2 init]# ps -ef|grep pmon

grid     21444     1  0 23:03 ?        00:00:00 asm_pmon_+ASM2

oracle   21586     1  0 23:03 ?        00:00:00 ora_pmon_TEST2

root     23606  4013  0 23:08 pts/1    00:00:00 grep --color=auto pmon

[root@rac2 init]# ps -ef|grep tns

root        22     2  0 22:13 ?        00:00:00 [netns]

grid     21269     1  0 23:03 ?        00:00:00 /u002/app/oracle/product/12.1.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit

grid     21304     1  0 23:03 ?        00:00:00 /u002/app/oracle/product/12.1.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit

grid     21380     1  0 23:03 ?        00:00:00 /u002/app/oracle/product/12.1.0/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit

root     23626  4013  0 23:08 pts/1    00:00:00 grep --color=auto tns

[root@rac2 init]#


We can also confirm that the CRSD process itself is running:

[root@rac2 init]# ps -p 21098 -o cmd
CMD
/u002/app/oracle/product/12.1.0/grid/bin/crsd.bin reboot

[root@rac2 init]#
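As a final sanity check, the PID recorded in rac2.pid should belong to a live process. A minimal sketch — check_pid_file is a hypothetical helper, and note that "kill -0" only tests process existence/permission, it sends no signal:

```shell
#!/bin/sh
# Hypothetical helper: verify that the PID recorded in <node>.pid
# belongs to a running process (kill -0 sends no signal; it only
# checks whether the process exists and is signalable).
check_pid_file() {
    pid=$(cat "$1")
    if kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is running"
    else
        echo "stale pid file (pid $pid not running)"
    fi
}

# Example:
#   check_pid_file /u002/app/oracle/product/12.1.0/grid/crs/init/rac2.pid
```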

The cluster alert log confirms the restart:

2021-01-02 23:02:44.955 [OCSSD(20834)]CRS-1601: CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
2021-01-02 23:02:47.671 [ORAROOTAGENT(21014)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 21014
2021-01-02 23:02:47.713 [OCTSSD(21027)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 21027
2021-01-02 23:02:48.861 [OCTSSD(21027)]CRS-2407: The new Cluster Time Synchronization Service reference node is host rac1.
2021-01-02 23:02:48.865 [OCTSSD(21027)]CRS-2401: The Cluster Time Synchronization Service started on host rac2.
2021-01-02 23:02:48.831 [OCTSSD(21027)]CRS-2408: The clock on host rac2 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
2021-01-02 23:02:57.657 [OSYSMOND(21091)]CRS-8500: Oracle Clusterware OSYSMOND process is starting with operating system process ID 21091
2021-01-02 23:02:58.684 [CRSD(21098)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 21098
2021-01-02 23:03:01.625 [CRSD(21098)]CRS-1012: The OCR service started on node rac2.
2021-01-02 23:03:01.676 [CRSD(21098)]CRS-1201: CRSD started on node rac2.


=================================EOD=================================

Other possible reasons for the infamous message below are:

CRS-4535: Cannot communicate with Cluster Ready Services.

  • The OCR is inaccessible.
  • The ocr.loc content does not match the other cluster nodes.
  • The clocks differ between cluster nodes (time synchronization issue).
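For the ocr.loc mismatch case, a quick check is to diff the local copy against one fetched from another node. A sketch under assumed standard paths — compare_ocr_loc is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: compare the local ocr.loc with a copy taken
# from another cluster node; a difference can leave CRSD unable to
# locate the OCR and produce CRS-4535.
compare_ocr_loc() {
    # $1 = local ocr.loc, $2 = copy fetched from the other node
    if diff -q "$1" "$2" >/dev/null 2>&1; then
        echo "ocr.loc matches"
    else
        echo "ocr.loc MISMATCH - reconcile before restarting CRS"
    fi
}

# On a live cluster, for example:
#   scp rac1:/etc/oracle/ocr.loc /tmp/ocr.loc.rac1
#   compare_ocr_loc /etc/oracle/ocr.loc /tmp/ocr.loc.rac1
```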