CRSD =========> Cluster Ready Services Daemon
Starting with 11g Release 2, Oracle Clusterware consists of two stacks.
OHASD STACK :- the Oracle High Availability Services stack. It sits at the lower end of the Clusterware process hierarchy and consists of a set of background processes.
CRS STACK :- a separate stack, also made up of a set of background processes, and mainly handled by CRSD.
As mentioned, these two stacks consist of background processes that run the cluster resources. The OHASD stack starts and stops the CRS stack, and the CRS stack in turn provides its services to each resource in the cluster.
CRS STACK COMPONENTS
Cluster Ready Services (crsd)
Cluster Synchronization Services (ocssd)
Oracle ASM
Cluster Time Synchronization Service (octssd)
Event Management (evmd)
Grid Naming Service (gnsd)
Oracle Agent (oraagent)
Oracle Notification Service (ons)
Oracle Root Agent (orarootagent)
Every process in the CRS stack is critical. Let's explore the CRS stack, starting with CRSD.
Cluster Ready Services (CRSD)
[root@rac2 ~]# ps -ef|grep -v grep | grep crsd
root 32449 1 1 04:00 ? 00:02:27 /u002/app/oracle/product/12.1.0/grid/bin/crsd.bin reboot
[root@rac2 ~]#
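The `ps -ef | grep -v grep | grep crsd` pipeline above can also be turned into a small reusable check. The sketch below is an assumption of mine, not an Oracle tool: the function name `crsd_owner_check` is hypothetical, and it reads `ps -eo user=,pid=,args=` output on stdin, prints the PID and owner of every `crsd.bin`, and succeeds only if CRSD is found and runs as root.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): find crsd.bin in "ps -eo user=,pid=,args="
# output and check that it runs as root. Reads the ps lines on stdin,
# prints "<pid> <user>" for every crsd.bin process, and returns 0 only if
# at least one was found and all of them are owned by root.
# Example (on a cluster node):
#   ps -eo user=,pid=,args= | crsd_owner_check
crsd_owner_check() {
  awk '
    # $3 is the executable: either a full path ending in /crsd.bin
    # or a bare crsd.bin.
    $3 == "crsd.bin" || $3 ~ /\/crsd\.bin$/ {
      found = 1
      print $2, $1
      if ($1 != "root") bad = 1
    }
    END {
      if (found && !bad) exit 0
      exit 1
    }
  '
}
```

Matching on the executable field rather than grepping the whole line avoids the `grep -v grep` trick and ignores unrelated processes whose arguments merely mention crsd.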
Some observations:
1) The CRSD process always runs as the root operating system user.
2) The CRSD process always runs from a separate Oracle home, the Grid Infrastructure home (GRID_HOME).
3) CRSD manages the cluster resources based on the configuration information stored in the OCR.
4) CRSD generates events when a resource's status changes.
5) CRSD starts, stops, monitors and manages cluster resources such as database instances, listeners, VIPs and ASM instances, and performs resource failover operations.
6) CRSD is also responsible for automatically restarting failed components.
7) CRSD is responsible for updating the status of cluster resources in the OCR.
8) The public and private interfaces must be up and running for CRS to work, and each node must be pingable over them; without these network interfaces CRS cannot be installed. The virtual IPs (VIPs) are brought online by Clusterware itself, so before installation the VIP addresses should be resolvable but not yet in use.
CRSD trace file location (12c ADR layout):
/u002/app/gridbase/diag/crs/rac2/crs/trace/crsd.trc
The agents spawned by CRSD write to the same directory, e.g. crsd_oraagent_oracle.trc.
To find the OCR master node in the cluster, search crsd.trc:
cd /u002/app/gridbase/diag/crs/rac2/crs/trace
[root@rac2 trace]# grep "OCR MASTER" crsd.trc
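If the master has changed over time, the trace can hold several `OCR MASTER` lines, and usually only the newest one matters. A minimal sketch, assuming the trace path shown above; the function name `latest_ocr_master_line` is hypothetical and it simply keeps the last match.

```shell
# Sketch (hypothetical helper): print the most recent line containing
# "OCR MASTER" from a CRSD trace file.
# Returns 1 if there is no such line, 2 if the file is unreadable.
# Example (path from this post):
#   latest_ocr_master_line /u002/app/gridbase/diag/crs/rac2/crs/trace/crsd.trc
latest_ocr_master_line() {
  local trc=$1 line
  if [ ! -r "$trc" ]; then
    echo "cannot read $trc" >&2
    return 2
  fi
  # Keep only the newest match; grep alone would print every occurrence.
  line=$(grep -F "OCR MASTER" "$trc" | tail -n 1)
  [ -n "$line" ] || return 1
  printf '%s\n' "$line"
}
```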
Scenario: the CRSD <node>.pid file is missing
cd /u002/app/oracle/product/12.1.0/grid/crs/init
[root@rac2 init]# ls -ltr
total 60
-rw-r--r--. 1 root root 4361 Sep 28 08:09 oka
-rw-r--r--. 1 root root 7250 Sep 28 08:09 ohasd.sles
-rw-r--r--. 1 root root 7160 Sep 28 08:09 ohasd
-rw-r--r--. 1 root root 9126 Sep 28 08:09 init.ohasd
-rw-r--r--. 1 root root 6159 Sep 28 08:09 afd.sles
-rw-r--r--. 1 root root 5905 Sep 28 08:09 afd
-rw-r--r--. 1 root root 0 Sep 28 08:13 rac2
-rw-r--r--. 1 root root 6 Dec 19 03:35 rac2.pid-bkp
-rw-r--r-- 1 root root 5 Dec 19 10:14 rac2.pid ========> this node's CRSD PID file
[root@rac2 init]#
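Before touching anything, it helps to classify the state of `<node>.pid`: missing, empty, pointing at a dead PID, or pointing at a live one. A minimal sketch, assuming the file holds only the CRSD PID as in the listing above; `crsd_pid_file_status` is a hypothetical name, and `kill -0` is used only to test that the PID exists (run it as root, since for other users it also fails when the process belongs to someone else).

```shell
# Sketch (hypothetical helper): inspect <init_dir>/<node>.pid.
# Prints one of: MISSING, EMPTY, STALE <pid>, OK <pid>.
# Returns 0 only for OK.
# Example:
#   crsd_pid_file_status /u002/app/oracle/product/12.1.0/grid/crs/init rac2
crsd_pid_file_status() {
  local init_dir=$1 node=$2
  local f="$init_dir/$node.pid" pid
  if [ ! -e "$f" ]; then
    echo "MISSING"; return 1
  fi
  # The file holds only the PID; strip whitespace and the trailing newline.
  pid=$(tr -d '[:space:]' < "$f")
  if [ -z "$pid" ]; then
    echo "EMPTY"; return 1
  fi
  # kill -0 sends no signal; it only checks that the PID exists
  # (and, for non-root callers, that we are allowed to signal it).
  if kill -0 "$pid" 2>/dev/null; then
    echo "OK $pid"; return 0
  fi
  echo "STALE $pid"; return 1
}
```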
Let's remove the rac2.pid file and restart the cluster services on node 2 (rac2).
[root@rac2 init]# rm rac2.pid
rm: remove regular file ‘rac2.pid’? yes
[root@rac2 init]#
[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl start crs
[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services =======> the infamous error
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@rac2 init]#
CRS-4535: Cannot communicate with Cluster Ready Services.
Error: PROC-32: Cluster Ready Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]
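When scripting this check, the `crsctl check crs` output can be judged by its message codes. A minimal sketch, assuming the four "online" codes seen in this post (CRS-4638, CRS-4537, CRS-4529, CRS-4533) are the ones to require; `crs_stack_healthy` is a hypothetical name. It echoes any other CRS-nnnn line (such as CRS-4535) and fails unless all four are present.

```shell
# Sketch (hypothetical helper): decide whether "crsctl check crs" output
# (read on stdin) shows a healthy stack. Requires the four "online" codes
# seen in this post: CRS-4638 (OHAS), CRS-4537 (CRS), CRS-4529 (CSS),
# CRS-4533 (EVM). Echoes every other CRS-nnnn line, e.g. CRS-4535.
# Example: crsctl check crs | crs_stack_healthy || echo "stack not fully up"
crs_stack_healthy() {
  local out code rc=0
  out=$(cat)
  for code in CRS-4638 CRS-4537 CRS-4529 CRS-4533; do
    grep -q "^$code:" <<<"$out" || rc=1
  done
  # Show the problem lines to the caller; "|| true" keeps an empty result
  # from looking like a failure.
  grep -E '^CRS-[0-9]+:' <<<"$out" \
    | grep -vE '^CRS-(4638|4537|4529|4533):' || true
  return "$rc"
}
```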
Check the following trace and log files for further troubleshooting:
/u002/app/gridbase/diag/crs/rac2/crs/trace/ohasd_orarootagent_root.trc
2020-12-19 22:46:08.316548 :CLSDYNAM:1366206208: [ora.crsd]{0:0:2} [check] PID FILE doesn't exist.
2020-12-19 22:46:08.316557 :CLSDYNAM:1366206208: [ora.crsd]{0:0:2} [check] PID from /u002/app/oracle/product/12.1.0/grid/crs/init/rac2.pid
2020-12-19 22:46:08.316663 : CLSDMC:1366206208: Connecting to ipc://rac2_DBG_CRSD
2020-12-19 22:46:08.316773 : CLSDMC:1366206208: Error: gipcWait for gipcConnect - ret_gipcreqinfo=gipcretConnectionRefused, type_gipcreqinfo=gipcreqtypeConnect
2020-12-19 22:46:08.316815 :CLSDYNAM:1366206208: [ora.crsd]{0:0:2} [check] ClsdmClient::sendMessage clsdmc_send error rmsg:0 ecode:-7 errbuf:
2020-12-19 22:46:08.316840 :CLSDYNAM:1366206208: [ora.crsd]{0:0:2} [check] Calling PID check for daemon
The trace states plainly that the CRSD PID file does not exist.
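The same message can be searched for directly. A minimal sketch, assuming the trace line format in the excerpt above (an `[ora.crsd]` tag followed by `PID FILE doesn't exist.`); `crsd_pidfile_missing_in_trace` is a hypothetical name, and it prints the newest matching line.

```shell
# Sketch (hypothetical helper): find the newest ora.crsd
# "PID FILE doesn't exist" check message in the OHASD root agent trace.
# The message text is taken from the trace excerpt in this post.
# Example:
#   crsd_pidfile_missing_in_trace \
#     /u002/app/gridbase/diag/crs/rac2/crs/trace/ohasd_orarootagent_root.trc
crsd_pidfile_missing_in_trace() {
  local trc=$1 line
  line=$(grep -F "[ora.crsd]" "$trc" | grep -F "PID FILE doesn't exist" | tail -n 1)
  [ -n "$line" ] || return 1
  printf '%s\n' "$line"
}
```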
Also check the OS messages or syslog from all nodes around the time of the problem:
- Linux: /var/log/messages
- Sun: /var/adm/messages
- HP-UX: /var/adm/syslog/syslog.log
- IBM AIX: /bin/errpt -a > messages.out
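On a mixed estate, the list above can be encoded once so a collection script picks the right source per host. A minimal sketch; `os_messages_source` is a hypothetical name, and it assumes the usual `uname -s` values for these platforms (Linux, SunOS, HP-UX, AIX).

```shell
# Sketch (hypothetical helper): print where OS messages live, per the list
# in this post. Pass a platform name (as printed by "uname -s") or let it
# default to the current host. For AIX it prints the errpt command to run.
os_messages_source() {
  local os=${1:-$(uname -s)}
  case "$os" in
    Linux) echo "/var/log/messages" ;;
    SunOS) echo "/var/adm/messages" ;;
    HP-UX) echo "/var/adm/syslog/syslog.log" ;;
    AIX)   echo "run: /bin/errpt -a > messages.out" ;;
    *)     echo "unknown platform: $os" >&2; return 1 ;;
  esac
}
```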
[root@rac2 trace]# grep CRSD alert.log =====> Checking CRSD status
2021-01-02 22:45:52.543 [CRSD(14337)]CRS-1012: The OCR service started on node rac2.
2021-01-02 22:45:54.451 [CRSD(14337)]CRS-1201: CRSD started on node rac2.
2021-01-02 22:51:26.657 [CRSD(17255)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 17255
[root@rac2 trace]# date
Sat Jan 2 22:52:43 IST 2021
[root@rac2 trace]# ps -ef|grep 17255
root 17657 2664 0 22:52 pts/0 00:00:00 grep --color=auto 17255
[root@rac2 trace]#
CRSD still has not started: the PID 17255 logged in alert.log no longer exists.
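The manual steps above (grep alert.log, then `ps` the PID) can be combined. A minimal sketch, assuming the CRS-8500 line format shown in the grep output above; `latest_crsd_pid` and `crsd_pid_alive` are hypothetical names, and `kill -0` only tests that the PID exists (run as root).

```shell
# Sketch (hypothetical helpers): take the PID from the newest
# "[CRSD(<pid>)]CRS-8500" line in the Clusterware alert.log and report
# whether that process is still running.
latest_crsd_pid() {
  local alert=$1
  grep -F "CRS-8500" "$alert" \
    | sed -n 's/.*\[CRSD(\([0-9][0-9]*\))].*/\1/p' \
    | tail -n 1
}

crsd_pid_alive() {
  local pid
  pid=$(latest_crsd_pid "$1")
  if [ -z "$pid" ]; then
    echo "no CRSD start message found"; return 2
  fi
  if kill -0 "$pid" 2>/dev/null; then
    echo "CRSD $pid running"
  else
    echo "CRSD $pid not running"; return 1
  fi
}
# Example (run from the trace directory used above):
#   crsd_pid_alive alert.log
```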
Without CRSD, none of the resources it manages are started, e.g. database instances, ASM and listeners.
Even the following crsctl stat command fails:
[root@rac2 trace]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[root@rac2 trace]#
[root@rac2 trace]# ps -ef|grep pmon
root 3322 25238 0 02:35 pts/2 00:00:00 grep --color=auto pmon
[root@rac2 trace]# ps -ef|grep tns
root 22 2 0 Dec18 ? 00:00:00 [netns]
root 3339 25238 0 02:35 pts/2 00:00:00 grep --color=auto tns
[root@rac2 trace]#
[root@rac2 trace]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.crf
      1        ONLINE  OFFLINE                               STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2                     STABLE
ora.ctssd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE rac2                     STABLE
ora.gipcd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.gpnpd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.mdnsd
      1        ONLINE  ONLINE       rac2                     STABLE
ora.storage
      1        ONLINE  OFFLINE                               STABLE
--------------------------------------------------------------------------------
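In a long `-init` listing it is easy to miss which resources are lagging. A minimal sketch, assuming the layout shown above (an `ora.*` name line followed by an instance line `<n> <target> <state> [server] <details>`); `init_resources_not_online` is a hypothetical name. It lists resources whose Target is ONLINE but whose State is not, so ora.diskmon (Target OFFLINE) is skipped.

```shell
# Sketch (hypothetical helper): from "crsctl stat res -t -init" output on
# stdin, print "<resource> <state>" for every resource whose Target is
# ONLINE but whose State is not ONLINE. Assumes the layout shown in this
# post: a resource name line (ora.*) followed by one or more instance
# lines "<n> <target> <state> [server] <details>".
# Example: crsctl stat res -t -init | init_resources_not_online
init_resources_not_online() {
  awk '
    /^[ \t]*ora\./ { name = $1; next }
    name != "" && $1 ~ /^[0-9]+$/ {
      if ($2 == "ONLINE" && $3 != "ONLINE") print name, $3
    }
  '
}
```

For the listing above this would report ora.asm, ora.crf, ora.crsd, ora.ctssd, ora.evmd (INTERMEDIATE) and ora.storage.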
Now let's resolve the issue.
Simply create an empty <node>.pid file (here rac2.pid) under /u002/app/oracle/product/12.1.0/grid/crs/init:
[root@rac2 init]# touch rac2.pid
[root@rac2 init]# ls -ltr
total 64
-rw-r--r--. 1 root root 4361 Sep 28 08:09 oka
-rw-r--r--. 1 root root 7250 Sep 28 08:09 ohasd.sles
-rw-r--r--. 1 root root 7160 Sep 28 08:09 ohasd
-rw-r--r--. 1 root root 9126 Sep 28 08:09 init.ohasd
-rw-r--r--. 1 root root 6159 Sep 28 08:09 afd.sles
-rw-r--r--. 1 root root 5905 Sep 28 08:09 afd
-rw-r--r--. 1 root root 0 Sep 28 08:13 rac2
-rw-r--r--. 1 root root 6 Dec 19 03:35 rac2.pid-bkp
-rw-r--r-- 1 root root 5 Dec 19 10:14 rac2.pid_bkp_1
-rw-r--r-- 1 root root 6 Jan 2 22:45 rac2.pid_bkp
-rw-r--r-- 1 root root 0 Jan 2 22:53 rac2.pid
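The fix itself can be made idempotent so it never overwrites a PID file that is already there. A minimal sketch; `ensure_crsd_pid_file` is a hypothetical name, and it assumes the owner and mode seen in the listing (root:root, `-rw-r--r--`) and that the node name defaults to the short host name.

```shell
# Sketch (hypothetical helper) of the fix above: create an empty
# <node>.pid in the given init directory only if it is missing, with the
# owner and mode seen in the listing (root:root, -rw-r--r--).
# The default node name (short host name) is an assumption.
# Example (as root, then restart the stack):
#   ensure_crsd_pid_file /u002/app/oracle/product/12.1.0/grid/crs/init rac2
ensure_crsd_pid_file() {
  local init_dir=$1 node=${2:-$(hostname -s)}
  local f="$init_dir/$node.pid"
  if [ -e "$f" ]; then
    echo "exists: $f"
    return 0
  fi
  : > "$f" || return 1
  chmod 644 "$f"
  # chown needs root; in the real fix this runs as root anyway.
  if [ "$(id -u)" -eq 0 ]; then
    chown root:root "$f"
  fi
  echo "created: $f"
}
```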
Now restart OHASD and the cluster processes:
[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac2'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac2'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac2'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac2' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac2'
CRS-2677: Stop of 'ora.gpnpd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.cssd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac2'
CRS-2677: Stop of 'ora.gipcd' on 'rac2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@rac2 init]# ls -ltr
total 68
-rw-r--r--. 1 root root 4361 Sep 28 08:09 oka
-rw-r--r--. 1 root root 7250 Sep 28 08:09 ohasd.sles
-rw-r--r--. 1 root root 7160 Sep 28 08:09 ohasd
-rw-r--r--. 1 root root 9126 Sep 28 08:09 init.ohasd
-rw-r--r--. 1 root root 6159 Sep 28 08:09 afd.sles
-rw-r--r--. 1 root root 5905 Sep 28 08:09 afd
-rw-r--r--. 1 root root 0 Sep 28 08:13 rac2
-rw-r--r--. 1 root root 6 Dec 19 03:35 rac2.pid-bkp
-rw-r--r-- 1 root root 5 Dec 19 10:14 rac2.pid_bkp_1
-rw-r--r-- 1 root root 6 Jan 2 22:45 rac2.pid_bkp
-rw-r--r-- 1 root root 6 Jan 2 23:02 rac2.pid =======>CRSD process id has been added.
[root@rac2 init]#
[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl start crs
[root@rac2 init]# /u002/app/oracle/product/12.1.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@rac2 init]#
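The stack takes a while to come up after `crsctl start crs`, so a script may need to poll rather than check once. A minimal sketch; `wait_for_crsd_online` is a hypothetical name, the crsctl path defaults to the Grid home used in this post, and the timeout and poll interval are arbitrary choices, not Oracle recommendations. It waits for the CRS-4537 line shown above.

```shell
# Sketch (hypothetical helper): poll "crsctl check crs" until Cluster
# Ready Services reports online (CRS-4537) or a timeout expires.
# Arguments: crsctl path, timeout seconds, poll interval seconds.
# The defaults are arbitrary choices for illustration.
wait_for_crsd_online() {
  local crsctl=${1:-/u002/app/oracle/product/12.1.0/grid/bin/crsctl}
  local timeout=${2:-600} interval=${3:-10} waited=0 out
  while [ "$waited" -lt "$timeout" ]; do
    # Capture first so a non-zero crsctl exit status does not end the loop.
    out=$("$crsctl" check crs 2>/dev/null) || true
    if grep -q '^CRS-4537:' <<<"$out"; then
      echo "CRSD online after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  echo "CRSD not online after ${timeout}s" >&2
  return 1
}
```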
[root@rac2 init]# ps -ef|grep pmon
grid 21444 1 0 23:03 ? 00:00:00 asm_pmon_+ASM2
oracle 21586 1 0 23:03 ? 00:00:00 ora_pmon_TEST2
root 23606 4013 0 23:08 pts/1 00:00:00 grep --color=auto pmon
[root@rac2 init]# ps -ef|grep tns
root 22 2 0 22:13 ? 00:00:00 [netns]
grid 21269 1 0 23:03 ? 00:00:00 /u002/app/oracle/product/12.1.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
grid 21304 1 0 23:03 ? 00:00:00 /u002/app/oracle/product/12.1.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid 21380 1 0 23:03 ? 00:00:00 /u002/app/oracle/product/12.1.0/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit
root 23626 4013 0 23:08 pts/1 00:00:00 grep --color=auto tns
[root@rac2 init]#
We can see that CRSD has started. The alert.log entries below show the new CRSD PID (21098), and ps confirms it is crsd.bin:
[root@rac2 init]# ps -p 21098 -o cmd
CMD
/u002/app/oracle/product/12.1.0/grid/bin/crsd.bin reboot
[root@rac2 init]#
2021-01-02 23:02:44.955 [OCSSD(20834)]CRS-1601: CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
2021-01-02 23:02:47.671 [ORAROOTAGENT(21014)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 21014
2021-01-02 23:02:47.713 [OCTSSD(21027)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 21027
2021-01-02 23:02:48.861 [OCTSSD(21027)]CRS-2407: The new Cluster Time Synchronization Service reference node is host rac1.
2021-01-02 23:02:48.865 [OCTSSD(21027)]CRS-2401: The Cluster Time Synchronization Service started on host rac2.
2021-01-02 23:02:48.831 [OCTSSD(21027)]CRS-2408: The clock on host rac2 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
2021-01-02 23:02:57.657 [OSYSMOND(21091)]CRS-8500: Oracle Clusterware OSYSMOND process is starting with operating system process ID 21091
2021-01-02 23:02:58.684 [CRSD(21098)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 21098
2021-01-02 23:03:01.625 [CRSD(21098)]CRS-1012: The OCR service started on node rac2.
2021-01-02 23:03:01.676 [CRSD(21098)]CRS-1201: CRSD started on node rac2.
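The `ps -p 21098 -o cmd` check can be tied back to the PID file, confirming that `<node>.pid` really points at `crsd.bin`. A minimal sketch, Linux-only because it reads `/proc/<pid>/cmdline` instead of calling ps; `pid_file_matches_cmd` is a hypothetical name.

```shell
# Sketch (hypothetical helper, Linux only): confirm that the PID recorded
# in a pid file belongs to a process whose command line contains the
# expected program (crsd.bin by default), like "ps -p <pid> -o cmd".
# Example:
#   pid_file_matches_cmd /u002/app/oracle/product/12.1.0/grid/crs/init/rac2.pid
pid_file_matches_cmd() {
  local pidfile=$1 want=${2:-crsd.bin} pid cmd
  pid=$(tr -d '[:space:]' < "$pidfile") || return 2
  [ -n "$pid" ] || { echo "empty pid file"; return 1; }
  [ -r "/proc/$pid/cmdline" ] || { echo "PID $pid not running"; return 1; }
  # cmdline is NUL-separated; turn the separators into spaces.
  cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
  case "$cmd" in
    *"$want"*) echo "PID $pid: $cmd"; return 0 ;;
    *) echo "PID $pid is not $want: $cmd"; return 1 ;;
  esac
}
```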
=================================EOD=================================
Other reasons for the infamous message below:
CRS-4535: Cannot communicate with Cluster Ready Services.
- The OCR is inaccessible.
- The ocr.loc content does not match the other cluster nodes.
- The system time differs between cluster nodes.
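The last two causes can be checked quickly from one node. A minimal sketch under stated assumptions, none of which come from this post: ocr.loc lives in /etc/oracle on Linux, the remote copy and remote time are fetched with passwordless ssh, and the helper names `ocr_loc_matches` and `clock_skew_seconds` are hypothetical.

```shell
# Sketch (hypothetical helpers). Assumptions: ocr.loc is in /etc/oracle
# (Linux) and remote data is fetched with passwordless ssh, e.g.
#   ssh rac1 cat /etc/oracle/ocr.loc > /tmp/ocr.loc.rac1
#   ssh rac1 date +%s

# Return 0 if two ocr.loc files are identical; print a unified diff otherwise.
ocr_loc_matches() {
  diff -u "$1" "$2"
}

# Print the absolute difference, in seconds, between two epoch times.
clock_skew_seconds() {
  local a=$1 b=$2 d
  d=$(( a - b ))
  if [ "$d" -lt 0 ]; then
    d=$(( -d ))
  fi
  echo "$d"
}
# Example:
#   ocr_loc_matches /etc/oracle/ocr.loc /tmp/ocr.loc.rac1
#   clock_skew_seconds "$(date +%s)" "$(ssh rac1 date +%s)"
```

The skew figure includes ssh round-trip latency, so treat small values as noise.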