Saturday, January 2, 2021

CRSD (Cluster Ready Services): Series 1

           Starting with Oracle 11g Release 2, Oracle Clusterware consists of two stacks:

OHASD stack: the Oracle High Availability Services stack. It sits at the lower end of the Clusterware process hierarchy and consists of a set of background processes.

CRS stack: a separate stack, also made up of a set of background processes and mainly handled by CRSD.

As mentioned, both stacks consist of background processes that together run the cluster resources. The OHASD stack starts and stops the CRS stack, which in turn provides its services to each resource in the cluster.

                         CRS STACK COMPONENTS

Cluster Ready Services (CRSD)

Cluster Synchronization Services (CSSD)

Oracle ASM

Cluster Time Synchronization Service (CTSSD)

Event Management (EVMD)

Grid Naming Service (GNS)

Oracle Agent (oraagent)

Oracle Notification Service (ONS)

Oracle Root Agent (orarootagent)
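To see which of these components are actually running on a node, a process listing is usually enough. The sketch below looks for the daemon binaries in `ps` output. The binary names (crsd.bin, ocssd.bin, octssd.bin, evmd.bin, oraagent.bin, orarootagent.bin) are assumptions based on a typical 12.1 Linux install, and `crs_daemons_running` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which CRS-stack daemons appear in a process list.
# Assumption: binary names as typically seen on 12.1 Linux; verify them
# against your own GRID_HOME/bin before relying on this.
crs_daemons_running() {
  local listing pair name bin
  listing=$(cat)                      # expects `ps -eo args=` lines on stdin
  for pair in \
      "CRSD:crsd.bin" \
      "CSSD:ocssd.bin" \
      "CTSSD:octssd.bin" \
      "EVMD:evmd.bin" \
      "ORAAGENT:oraagent.bin" \
      "ORAROOTAGENT:orarootagent.bin"; do
    name=${pair%%:*}
    bin=${pair#*:}
    # Fixed-string match on "/<binary>" so crsd.bin does not match ocssd.bin
    if grep -qF "/${bin}" <<<"$listing"; then
      echo "$name up"
    else
      echo "$name down"
    fi
  done
}
```

Usage: `ps -eo args= | crs_daemons_running`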


Every process in the CRS stack is critical. Let's explore the CRS stack first.


Cluster Ready Services (CRSD)


[root@rac2 ~]# ps -ef|grep -v grep | grep crsd

root     32449     1  1 04:00 ?        00:02:27 /u002/app/oracle/product/12.1.0/grid/bin/crsd.bin reboot

[root@rac2 ~]#

Some observations:

1) The CRSD process always runs as the root operating system user.

2) The CRSD process always runs from a separate Oracle home, the Grid Infrastructure home (GRID_HOME), not from the database home.

3) CRSD manages cluster resources based on the configuration information stored in the OCR.

4) CRSD generates events when a resource's status changes.

5) CRSD starts, stops, monitors, and manages cluster resources such as database instances, listeners, VIPs, and ASM instances, and it performs resource failover operations.

6) CRSD is also responsible for automatically restarting failed components.

7) CRSD is responsible for updating the status of cluster resources in the OCR.

8) The public interface, private interface, and virtual IP (VIP) must all be up and running for CRS to work, and each node's interfaces should be pingable. Without these network interfaces, CRS cannot be installed.
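Observations 1 and 2 can be checked quickly from `ps`. This is a minimal sketch, assuming bash and the GRID_HOME path used in this post; `check_crsd_process` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm crsd.bin runs as root from GRID_HOME.
# Assumption: GRID_HOME default is the path used in this post.
GRID_HOME=${GRID_HOME:-/u002/app/oracle/product/12.1.0/grid}

check_crsd_process() {
  local user cmd rest found=0
  # Expects `ps -eo user=,args=` lines on stdin: user, binary, arguments
  while read -r user cmd rest; do
    case "$cmd" in
      */crsd.bin)
        found=1
        echo "owner=$user binary=$cmd"
        [ "$user" = "root" ] || echo "WARNING: crsd.bin is not running as root"
        case "$cmd" in
          "$GRID_HOME"/bin/crsd.bin) echo "OK: running from GRID_HOME" ;;
          *) echo "WARNING: crsd.bin is not under $GRID_HOME" ;;
        esac
        ;;
    esac
  done
  [ "$found" -eq 1 ] || { echo "crsd.bin not found"; return 1; }
}
```

Usage: `ps -eo user=,args= | check_crsd_process`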

  

CRSD trace files are in the Clusterware trace directory, for example the trace of CRSD's oraagent:

 /u002/app/gridbase/diag/crs/rac2/crs/trace/crsd_oraagent_oracle.trc

To find the OCR master node in the cluster, search the crsd.trc file:

cd /u002/app/gridbase/diag/crs/rac2/crs/trace

[root@rac2 trace]# grep "OCR MASTER" crsd.trc
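If the trace has rotated, the match may sit in an older file. A small sketch, assuming rotated CRSD traces also match `crsd*.trc` and that each trace line starts with a sortable timestamp (as in the excerpts in this post); `latest_ocr_master` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: newest "OCR MASTER" line across the CRSD trace files.
# Assumptions: trace dir from this post; lines start with a sortable timestamp.
TRACE_DIR=${TRACE_DIR:-/u002/app/gridbase/diag/crs/rac2/crs/trace}

latest_ocr_master() {
  local dir=${1:-$TRACE_DIR}
  # -h drops file names; sort by the leading timestamp, keep the newest hit
  grep -h "OCR MASTER" "$dir"/crsd*.trc 2>/dev/null | sort | tail -n 1
}
```

Usage: `latest_ocr_master` (or pass another trace directory as the first argument).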


Scenario: the CRSD <node>.pid file is missing

cd /u002/app/oracle/product/12.1.0/grid/crs/init
ls -ltr

total 60
-rw-r--r--. 1 root root 4361 Sep 28 08:09 oka
-rw-r--r--. 1 root root 7250 Sep 28 08:09 ohasd.sles
-rw-r--r--. 1 root root 7160 Sep 28 08:09 ohasd
-rw-r--r--. 1 root root 9126 Sep 28 08:09 init.ohasd
-rw-r--r--. 1 root root 6159 Sep 28 08:09 afd.sles
-rw-r--r--. 1 root root 5905 Sep 28 08:09 afd
-rw-r--r--. 1 root root    0 Sep 28 08:13 rac2
-rw-r--r--. 1 root root    6 Dec 19 03:35 rac2.pid-bkp
-rw-r--r--  1 root root    5 Dec 19 10:14 rac2.pid   <======== this node's PID file for CRSD
[root@rac2 init]#
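Before touching anything, it helps to know whether the PID file is missing, empty, or points at a process that no longer exists. A hedged sketch, assuming Linux (`/proc`), the init directory from this post, and that the node name equals the short hostname; `crsd_pid_file_status` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the CRSD <node>.pid file.
# Assumptions: Linux (/proc), init dir from this post, node = short hostname.
INIT_DIR=${INIT_DIR:-/u002/app/oracle/product/12.1.0/grid/crs/init}

crsd_pid_file_status() {
  local dir=${1:-$INIT_DIR} node=${2:-$(hostname -s)}
  local f="$dir/$node.pid" pid
  if [ ! -e "$f" ]; then
    echo "MISSING $f"          # the failure reproduced in this post
    return 1
  fi
  pid=$(tr -d '[:space:]' < "$f")
  if [ -z "$pid" ]; then
    echo "EMPTY $f"            # in this post, CRSD filled it in on restart
    return 0
  fi
  case "$pid" in
    *[!0-9]*) echo "INVALID $pid"; return 1 ;;
  esac
  if [ -d "/proc/$pid" ]; then
    echo "RUNNING $pid"
  else
    echo "STALE $pid"          # file exists but the process is gone
  fi
}
```

Usage: `crsd_pid_file_status /u002/app/oracle/product/12.1.0/grid/crs/init rac2`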

Let's remove the rac2.pid file and restart the cluster services on node 2 (rac2).


[root@rac2 init]# rm rac2.pid
rm: remove regular file ‘rac2.pid’? yes
[root@rac2 init]#

[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl start crs


[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services   <======= the well-known error
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@rac2 init]#
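When scripting this check, any line that does not say "is online" points to a component that is down. A minimal sketch; `crs_health` is a hypothetical helper that returns non-zero if anything is not online.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag every `crsctl check crs` line that is not "is online".
# Exit status is 0 only when every non-blank line reports online.
crs_health() {
  awk '
    NF == 0      { next }                 # ignore blank lines
    /is online/  { next }                 # e.g. CRS-4537 ... is online
    { bad = 1; print "DOWN: " $0 }        # e.g. CRS-4535 Cannot communicate ...
    END { exit bad }
  '
}
```

Usage: `/u002/app/oracle/product/12.1.0/grid/bin/crsctl check crs | crs_health`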


CRS-4535: Cannot communicate with Cluster Ready Services.

Error: PROC-32: Cluster Ready Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]


Check the trace and log files below for further troubleshooting.


/u002/app/gridbase/diag/crs/rac2/crs/trace/ohasd_orarootagent_root.trc


2020-12-19 22:46:08.316548 :CLSDYNAM:1366206208: [ora.crsd]{0:0:2} [check] PID FILE doesn't exist.
2020-12-19 22:46:08.316557 :CLSDYNAM:1366206208: [ora.crsd]{0:0:2} [check] PID  from /u002/app/oracle/product/12.1.0/grid/crs/init/rac2.pid
2020-12-19 22:46:08.316663 :  CLSDMC:1366206208: Connecting to ipc://rac2_DBG_CRSD
2020-12-19 22:46:08.316773 :  CLSDMC:1366206208: Error: gipcWait for gipcConnect - ret_gipcreqinfo=gipcretConnectionRefused, type_gipcreqinfo=gipcreqtypeConnect
2020-12-19 22:46:08.316815 :CLSDYNAM:1366206208: [ora.crsd]{0:0:2} [check] ClsdmClient::sendMessage clsdmc_send error rmsg:0 ecode:-7 errbuf:
2020-12-19 22:46:08.316840 :CLSDYNAM:1366206208: [ora.crsd]{0:0:2} [check] Calling PID check for daemon

The trace states the problem plainly: the CRSD PID file does not exist.
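The same trace also names the exact file the agent is looking for. The sketch below extracts it; the sed pattern is based only on the two trace lines shown above, so treat it as an assumption for other versions. `expected_crsd_pid_file` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PID file path that orarootagent checks for
# ora.crsd, taken from ohasd_orarootagent_root.trc.
# Assumption: line format matches the excerpt in this post.
expected_crsd_pid_file() {
  local trc=${1:?usage: expected_crsd_pid_file <trace file>}
  # Keep only "... [ora.crsd] ... [check] PID  from <path>" lines, print <path>
  sed -n 's/.*\[ora\.crsd].*\[check] PID  *from  *//p' "$trc" | tail -n 1
}
```

Usage: `expected_crsd_pid_file /u002/app/gridbase/diag/crs/rac2/crs/trace/ohasd_orarootagent_root.trc`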

Also check the OS messages or syslog on all nodes around the time of the problem:

  • Linux: /var/log/messages
  • Sun: /var/adm/messages
  • HP-UX: /var/adm/syslog/syslog.log
  • IBM: /bin/errpt -a > messages.out
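On mixed-platform estates, the list above can be encoded as a lookup keyed on `uname -s`. A sketch only; `syslog_source` is a hypothetical name and the mapping is exactly the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: where to look for OS messages, per the list above.
# An optional argument overrides `uname -s` (handy for testing).
syslog_source() {
  case "${1:-$(uname -s)}" in
    Linux) echo "/var/log/messages" ;;
    SunOS) echo "/var/adm/messages" ;;
    HP-UX) echo "/var/adm/syslog/syslog.log" ;;
    AIX)   echo "/bin/errpt -a > messages.out" ;;   # AIX uses errpt, not a file
    *)     echo "unknown platform: ${1:-$(uname -s)}"; return 1 ;;
  esac
}
```

Usage on Linux: `grep -i crs "$(syslog_source)"`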

Check the CRSD status in the Clusterware alert log (alert.log, in the same trace directory):

[root@rac2 trace]# grep CRSD alert.log

2021-01-02 22:45:52.543 [CRSD(14337)]CRS-1012: The OCR service started on node rac2.
2021-01-02 22:45:54.451 [CRSD(14337)]CRS-1201: CRSD started on node rac2.
2021-01-02 22:51:26.657 [CRSD(17255)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 17255
[root@rac2 trace]# date
Sat Jan  2 22:52:43 IST 2021
[root@rac2 trace]# ps -ef|grep 17255
root     17657  2664  0 22:52 pts/0    00:00:00 grep --color=auto 17255
[root@rac2 trace]#
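The same check can be scripted: take the PID from the latest CRS-8500 CRSD line in alert.log and test whether that process still exists. A sketch assuming Linux `/proc` and the message text shown above; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: is the CRSD PID from the latest CRS-8500 line alive?
# Assumption: Linux (/proc) and the CRS-8500 message text shown in this post.
last_crsd_start_pid() {
  grep 'CRS-8500: Oracle Clusterware CRSD process is starting' "${1:?alert.log path}" \
    | tail -n 1 | awk '{print $NF}'      # last field is the OS process ID
}

crsd_alive_from_alert() {
  local pid
  pid=$(last_crsd_start_pid "$1")
  [ -n "$pid" ] || { echo "no CRSD start entry found"; return 2; }
  if [ -d "/proc/$pid" ]; then
    echo "CRSD $pid running"
  else
    echo "CRSD $pid not running"
    return 1
  fi
}
```

Usage: `crsd_alive_from_alert /u002/app/gridbase/diag/crs/rac2/crs/trace/alert.log`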

CRSD (PID 17255) is still not running: ps finds only the grep command itself.


Without CRSD, none of the resources it manages start, for example the database instance, ASM, and the listeners.

Even the following status command fails:

[root@rac2 trace]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[root@rac2 trace]#

[root@rac2 trace]# ps -ef|grep pmon
root      3322 25238  0 02:35 pts/2    00:00:00 grep --color=auto pmon
[root@rac2 trace]# ps -ef|grep tns
root        22     2  0 Dec18 ?        00:00:00 [netns]
root      3339 25238  0 02:35 pts/2    00:00:00 grep --color=auto tns
[root@rac2 trace]#



[root@rac2 trace]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.crf
      1        ONLINE  OFFLINE                               STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ctssd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE rac2                     STABLE
ora.gipcd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gpnpd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mdnsd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.storage
      1        ONLINE  OFFLINE                               STABLE
--------------------------------------------------------------------------------
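In a longer listing it is easy to miss which resources are down. This sketch prints resources whose target is ONLINE but whose state is OFFLINE; it assumes the `-init` column layout shown above, and `offline_but_wanted` is a hypothetical name. For the output above it would list ora.asm, ora.crf, ora.crsd, ora.ctssd, and ora.storage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resources with Target ONLINE but State OFFLINE in
# `crsctl stat res -t -init` output (layout as shown in this post).
offline_but_wanted() {
  awk '
    /^ora\./ { res = $1; next }                     # resource name line
    res != "" && $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 == "OFFLINE" { print res }
  '
}
```

Usage: `/u002/app/oracle/product/12.1.0/grid/bin/crsctl stat res -t -init | offline_but_wanted`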

Note that ora.crsd has target ONLINE but state OFFLINE. Now let's resolve the issue.

Simply create an empty <node>.pid file, here rac2.pid, under /u002/app/oracle/product/12.1.0/grid/crs/init:

[root@rac2 init]# touch rac2.pid
[root@rac2 init]# ls -ltr
total 64
-rw-r--r--. 1 root root 4361 Sep 28 08:09 oka
-rw-r--r--. 1 root root 7250 Sep 28 08:09 ohasd.sles
-rw-r--r--. 1 root root 7160 Sep 28 08:09 ohasd
-rw-r--r--. 1 root root 9126 Sep 28 08:09 init.ohasd
-rw-r--r--. 1 root root 6159 Sep 28 08:09 afd.sles
-rw-r--r--. 1 root root 5905 Sep 28 08:09 afd
-rw-r--r--. 1 root root    0 Sep 28 08:13 rac2
-rw-r--r--. 1 root root    6 Dec 19 03:35 rac2.pid-bkp
-rw-r--r--  1 root root    5 Dec 19 10:14 rac2.pid_bkp_1
-rw-r--r--  1 root root    6 Jan  2 22:45 rac2.pid_bkp
-rw-r--r--  1 root root    0 Jan  2 22:53 rac2.pid
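The fix can be wrapped so it never overwrites an existing PID file. A hedged sketch, assuming it runs as root on the affected node and that the node name matches `hostname -s`; `recreate_crsd_pid_file` is hypothetical, and the 0644 root:root mode simply mirrors the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recreate an empty <node>.pid only when it is missing.
# Assumptions: run as root on the affected node; node = short hostname;
# 0644 root:root mirrors the directory listing in this post.
INIT_DIR=${INIT_DIR:-/u002/app/oracle/product/12.1.0/grid/crs/init}

recreate_crsd_pid_file() {
  local dir=${1:-$INIT_DIR} node=${2:-$(hostname -s)}
  local f="$dir/$node.pid"
  if [ -e "$f" ]; then
    echo "exists: $f"          # never overwrite a PID file CRSD may be using
    return 0
  fi
  touch "$f" && chmod 0644 "$f" || return 1
  # chown requires root; skipped when run as a normal user (e.g. testing)
  if [ "$(id -u)" -eq 0 ]; then
    chown root:root "$f" || return 1
  fi
  echo "created: $f"
}
```

Usage: `recreate_crsd_pid_file /u002/app/oracle/product/12.1.0/grid/crs/init rac2`, then restart the stack as below.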



Now restart OHASD and the cluster processes:

[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl stop crs -f

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac2'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac2'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac2'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac2' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac2'
CRS-2677: Stop of 'ora.gpnpd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.cssd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac2'
CRS-2677: Stop of 'ora.gipcd' on 'rac2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac2' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@rac2 init]# ls -ltr
total 68
-rw-r--r--. 1 root root 4361 Sep 28 08:09 oka
-rw-r--r--. 1 root root 7250 Sep 28 08:09 ohasd.sles
-rw-r--r--. 1 root root 7160 Sep 28 08:09 ohasd
-rw-r--r--. 1 root root 9126 Sep 28 08:09 init.ohasd
-rw-r--r--. 1 root root 6159 Sep 28 08:09 afd.sles
-rw-r--r--. 1 root root 5905 Sep 28 08:09 afd
-rw-r--r--. 1 root root    0 Sep 28 08:13 rac2
-rw-r--r--. 1 root root    6 Dec 19 03:35 rac2.pid-bkp
-rw-r--r--  1 root root    5 Dec 19 10:14 rac2.pid_bkp_1
-rw-r--r--  1 root root    6 Jan  2 22:45 rac2.pid_bkp
-rw-r--r--  1 root root    6 Jan  2 23:02 rac2.pid   <======= CRSD has written its process ID to the file
[root@rac2 init]#



[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl start crs
[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@rac2 init]#
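Right after `crsctl start crs`, the stack can take a while to come up. A sketch that polls `crsctl check crs` until every line reports online or a timeout expires; the timeout and interval defaults are arbitrary, the CRSCTL default is the path from this post, and `wait_for_crs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll `crsctl check crs` until all components are online.
# Assumption: CRSCTL default is the path used in this post; override as needed.
CRSCTL=${CRSCTL:-/u002/app/oracle/product/12.1.0/grid/bin/crsctl}

wait_for_crs() {
  local timeout=${1:-300} interval=${2:-10} waited=0 out
  while :; do
    out=$("$CRSCTL" check crs 2>&1)
    # Online only if there is output and no non-blank line lacks "is online"
    if [ -n "$out" ] && ! grep -v 'is online' <<<"$out" | grep -q .; then
      echo "CRS stack is online after ${waited}s"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s:"
      echo "$out"
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

Usage: `wait_for_crs 600 15` waits up to ten minutes, checking every 15 seconds.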


[root@rac2 init]# ps -ef|grep pmon
grid     21444     1  0 23:03 ?        00:00:00 asm_pmon_+ASM2
oracle   21586     1  0 23:03 ?        00:00:00 ora_pmon_TEST2
root     23606  4013  0 23:08 pts/1    00:00:00 grep --color=auto pmon
[root@rac2 init]# ps -ef|grep tns
root        22     2  0 22:13 ?        00:00:00 [netns]
grid     21269     1  0 23:03 ?        00:00:00 /u002/app/oracle/product/12.1.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
grid     21304     1  0 23:03 ?        00:00:00 /u002/app/oracle/product/12.1.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid     21380     1  0 23:03 ?        00:00:00 /u002/app/oracle/product/12.1.0/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit
root     23626  4013  0 23:08 pts/1    00:00:00 grep --color=auto tns
[root@rac2 init]#
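To confirm the instances from a script rather than by eye, the PMON process names can be parsed. A sketch assuming the `asm_pmon_<SID>` and `ora_pmon_<SID>` naming shown above; `pmon_instances` is hypothetical. For this node it would print +ASM2 and TEST2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: instance names that have a PMON process.
# Expects `ps -eo args=` lines; naming asm_pmon_<SID> / ora_pmon_<SID>.
pmon_instances() {
  awk '{
    n = split($1, a, "_pmon_")
    if (n == 2 && (a[1] == "asm" || a[1] == "ora")) print a[2]
  }'
}
```

Usage: `ps -eo args= | pmon_instances`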


The ASM instance, the database instance, and the listeners are running again. The CRSD process has started as well:

[root@rac2 init]# ps -p 21098 -o cmd
CMD
/u002/app/oracle/product/12.1.0/grid/bin/crsd.bin reboot

[root@rac2 init]#

The Clusterware alert log confirms the restart:

2021-01-02 23:02:44.955 [OCSSD(20834)]CRS-1601: CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
2021-01-02 23:02:47.671 [ORAROOTAGENT(21014)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 21014
2021-01-02 23:02:47.713 [OCTSSD(21027)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 21027
2021-01-02 23:02:48.861 [OCTSSD(21027)]CRS-2407: The new Cluster Time Synchronization Service reference node is host rac1.
2021-01-02 23:02:48.865 [OCTSSD(21027)]CRS-2401: The Cluster Time Synchronization Service started on host rac2.
2021-01-02 23:02:48.831 [OCTSSD(21027)]CRS-2408: The clock on host rac2 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
2021-01-02 23:02:57.657 [OSYSMOND(21091)]CRS-8500: Oracle Clusterware OSYSMOND process is starting with operating system process ID 21091
2021-01-02 23:02:58.684 [CRSD(21098)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 21098
2021-01-02 23:03:01.625 [CRSD(21098)]CRS-1012: The OCR service started on node rac2.
2021-01-02 23:03:01.676 [CRSD(21098)]CRS-1201: CRSD started on node rac2.


=================================EOD=================================

Other common reasons for the infamous message:

CRS-4535: Cannot communicate with Cluster Ready Services.

  • The OCR is inaccessible.
  • The ocr.loc contents do not match those on the other cluster nodes.
  • The time differs between the cluster nodes.
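For the ocr.loc bullet, comparing checksums between nodes is a quick first check. A sketch assuming the Linux default location /etc/oracle/ocr.loc and ssh access between nodes; `same_ocr_loc` is hypothetical. For the other bullets, `ocrcheck` (OCR accessibility) and `crsctl check ctss` (time synchronization) are the usual starting points.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two copies of ocr.loc by checksum.
# Assumption: the remote copy has already been fetched (e.g. over ssh).
same_ocr_loc() {
  local a b
  a=$(cksum < "${1:?first file}") || return 2    # "CRC size", no file name
  b=$(cksum < "${2:?second file}") || return 2
  if [ "$a" = "$b" ]; then
    echo "ocr.loc identical"
  else
    echo "ocr.loc differs"
    return 1
  fi
}
```

Usage:
ssh rac1 cat /etc/oracle/ocr.loc > /tmp/ocr.loc.rac1
same_ocr_loc /etc/oracle/ocr.loc /tmp/ocr.loc.rac1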
