Tuesday, July 12, 2011

Add node fails due to time synchronization issue with NTPD and CTSSD

Today, while checking the status of a four-node Oracle 11gR2 RAC environment, I noticed something was wrong with time synchronization between the cluster nodes. Even though I had our system administrator configure NTPD for the environment, the Cluster Verification Utility (CVU) failed with NTPD errors and showed the Cluster Time Synchronization Services Daemon (CTSSD) in Observer mode when I ran a clock synchronization check:

[oracle@rac1 ~]$ cluvfy comp clocksync -n all

Verifying Clock Synchronization across the cluster nodes

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
CTSS resource check passed

Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed

Check CTSS state started...
CTSS is in Observer state. Switching over to clock synchronization checks using NTP

Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
NTP Configuration file check passed

Checking daemon liveness...
Liveness check passed for "ntpd"
Check for NTP daemon or service alive passed on all nodes

NTP daemon slewing option check failed on some nodes
Check failed on nodes:
PRVF-5436 : The NTP daemon running on one or more nodes lacks the slewing option "-x"
Clock synchronization check using Network Time Protocol(NTP) failed

PRVF-9652 : Cluster Time Synchronization Services check failed

Verification of Clock Synchronization across the cluster nodes was unsuccessful on all the specified nodes.

Aha! So with NTP configured, CTSSD stays in Observer mode and leaves time synchronization to ntpd, and the CVU then insists that ntpd run with the slewing option (-x). Without it, the clock synchronization check fails.

The solution is to shut down the Oracle RAC database environment, ASM, and the clusterware, and then restart ntpd with the -x option on the Oracle RAC cluster nodes and the app tier server host.

Before doing this, I had verified that ntpd was running. However, it had been started in default mode, without the -x option, as shown below:

[root@rac1 ~]# service ntpd status
ntpd (pid 24396) is running...
[root@rac1 ~]# ps -ef|grep ntpd
root 15495 8369 0 09:52 pts/1 00:00:00 grep ntpd
ntp 24396 1 0 Jul11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
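The same check can be scripted so it is easy to repeat on every node. This is only a minimal sketch: the function name has_slew_option is my own and not part of any Oracle or NTP tooling, and it assumes the ntpd command line is passed in as a single string.

```shell
#!/bin/sh
# has_slew_option (hypothetical helper): succeeds if the given ntpd
# command line contains the slewing flag -x as a separate argument.
has_slew_option() {
    # Pad with spaces so -x also matches at the start or end of the line.
    case " $1 " in
        *" -x "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example use on a node (assumes Linux procps ps):
#   has_slew_option "$(ps -o args= -C ntpd)" || echo "ntpd is running without -x"
```

For the process listing above, the function returns non-zero because the command line ends in -g and has no -x.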

You can check whether the -x flag has been set by examining the /etc/sysconfig/ntpd file:

[root@rac1 ~]# grep OPTIONS /etc/sysconfig/ntpd
OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid"
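Given an OPTIONS line like the one above, adding the flag could be sketched as follows. The helper name add_slew_option is hypothetical, and the sketch assumes the OPTIONS value is enclosed in double quotes as it is here. It keeps a .bak copy of the file and does nothing if -x is already present.

```shell
#!/bin/sh
# add_slew_option (hypothetical helper): insert -x into the OPTIONS line
# of an ntpd sysconfig file, e.g. /etc/sysconfig/ntpd.
add_slew_option() {
    conf="$1"
    # Leave the file alone if -x is already one of the options.
    if grep -Eq '^OPTIONS=.*[" ]-x([ "]|$)' "$conf"; then
        return 0
    fi
    # Back up the file, then put -x right after the opening quote.
    cp "$conf" "$conf.bak" &&
    sed 's/^OPTIONS="/OPTIONS="-x /' "$conf.bak" > "$conf"
}

# Example use as root on each node, with the cluster stack already down:
#   add_slew_option /etc/sysconfig/ntpd
#   service ntpd restart
```

Applied to the file above, this should produce OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid". After restarting ntpd, rerunning ps -ef|grep ntpd should show -x in the command line.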

I found the following two My Oracle Support (http://support.oracle.com) notes useful while resolving this time synchronization issue on Oracle 11gR2 RAC:

MOS 1054006.1 - CTSSD Runs in Observer Mode Even Though No Time Sync Software is Running
MOS 1056693.1 - How to Configure NTP or Windows Time to Resolve CLUVFY Error PRVF-5436 PRVF-9652
