PeopleSoft

Production Cluster Part Two

April 10, 2007

After we reinstalled Clusterware, it turned out that the original problem, Node (Blade) 5 not joining the cluster, was still there.

By this point we had read through every log file and examined every core dump and trace file. Next, we started comparing the OS configuration of all the boxes, including kernel parameters, against the blades we had used earlier to build the four-node QA cluster.
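
Comparing kernel parameters across nodes by eye is error-prone. Here is a minimal sketch of automating the comparison in Python. It assumes Linux-style `sysctl -a` output (one "key = value" per line) has been captured from each node into a string. The function names and sample values are hypothetical; this is not the procedure we actually ran.

```python
from typing import Optional


def parse_sysctl(text: str) -> dict[str, str]:
    """Parse `sysctl -a`-style output ("key = value" per line) into a dict."""
    params: dict[str, str] = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or non-parameter lines
        key, _, value = line.partition("=")
        # Collapse internal whitespace so tab- and space-separated values compare equal.
        params[key.strip()] = " ".join(value.split())
    return params


def diff_params(
    reference: dict[str, str], candidate: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {key: (reference_value, candidate_value)} for every mismatch.

    None marks a parameter that is missing on one side.
    """
    diffs: dict[str, tuple[Optional[str], Optional[str]]] = {}
    for key in sorted(reference.keys() | candidate.keys()):
        ref, cand = reference.get(key), candidate.get(key)
        if ref != cand:
            diffs[key] = (ref, cand)
    return diffs


# Hypothetical captures from a working QA blade and from Node 5.
qa_blade = parse_sysctl("kernel.shmmax = 4294967295\nnet.core.rmem_max = 262144\n")
node5 = parse_sysctl("kernel.shmmax = 4294967295\nnet.core.rmem_max = 131072\n")
print(diff_params(qa_blade, node5))  # {'net.core.rmem_max': ('262144', '131072')}
```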

I read through just about every Metalink note on reported Clusterware problems and their resolutions, and recalled one installation that had a problem with its network adapters. I then focused on the network using traceroute and ping, while the UNIX administrator installed some trace tools and began validating the switch configurations.

The problem turned out to be that another host on the network had been assigned the same IP address as the cluster private interconnect for Node (Blade) 5. We requested and received a new IP address, reconfigured the network adapter and attempted to start the Clusterware processes on Node 5. This time the processes came up and stayed up.
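
A duplicate address like this is easy to miss, especially when the conflicting host is not part of the cluster. The following minimal Python sketch checks for addresses claimed more than once. It assumes you can assemble an inventory of hosts, interfaces and addresses. The `inventory` structure, host names and addresses are all hypothetical.

```python
import ipaddress
from collections import defaultdict


def find_duplicate_ips(inventory: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Map each IP address claimed by more than one host:interface to its claimants."""
    claims: dict[str, list[str]] = defaultdict(list)
    for host, interfaces in inventory.items():
        for iface, addr in interfaces.items():
            # Validate the address and normalise its text form (matters mainly for IPv6).
            ip = str(ipaddress.ip_address(addr))
            claims[ip].append(f"{host}:{iface}")
    return {ip: owners for ip, owners in claims.items() if len(owners) > 1}


# Hypothetical inventory: "otherhost" has been given Node 5's interconnect address.
inventory = {
    "node4": {"eth1": "192.168.10.4"},
    "node5": {"eth1": "192.168.10.5"},
    "otherhost": {"eth0": "192.168.10.5"},
}
print(find_duplicate_ips(inventory))  # {'192.168.10.5': ['node5:eth1', 'otherhost:eth0']}
```

This only catches conflicts with hosts that appear in the inventory. A host nobody has recorded, as in our case, still has to be tracked down with network-level tools.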

This raises a more important question: how did an address assigned to the private interconnect adapter, whose traffic should never leave its subnet, collide with a public adapter on another node? We have not yet investigated this, but we plan to begin at the start of next week, on April 16th.
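
One check that could be part of that investigation is to confirm three things: the interconnect subnet is a private range, every interconnect address falls inside it, and no public address overlaps it. Here is a minimal Python sketch that assumes the subnet and address lists are supplied by hand. The function, its inputs and the sample addresses are hypothetical.

```python
import ipaddress


def check_interconnect(subnet: str, interconnect_ips: list[str], public_ips: list[str]) -> list[str]:
    """Return human-readable problems with a proposed private interconnect layout."""
    # ip_network() raises ValueError if host bits are set, e.g. "192.168.10.5/24".
    net = ipaddress.ip_network(subnet)
    problems: list[str] = []
    if not net.is_private:
        problems.append(f"{net} is not a private address range")
    for addr in interconnect_ips:
        if ipaddress.ip_address(addr) not in net:
            problems.append(f"interconnect address {addr} is outside {net}")
    for addr in public_ips:
        # A public address inside the interconnect subnet is exactly the kind of overlap we hit.
        if ipaddress.ip_address(addr) in net:
            problems.append(f"public address {addr} overlaps interconnect subnet {net}")
    return problems


print(check_interconnect("192.168.10.0/24", ["192.168.10.4", "192.168.10.5"], ["192.168.10.5"]))
# ['public address 192.168.10.5 overlaps interconnect subnet 192.168.10.0/24']
```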

Categories: Clusterware · Oracle