Troubleshooting: CRS fails to start

Symptoms

All cluster nodes are up, but CRS fails to start on one or more nodes.

Solution

  1. Run the flashgrid-cluster command to confirm that all nodes are up and that no node is shown as Inaccessible.
  2. Kill all ohasd.bin reboot processes on the database nodes where CRS fails to start:

    ps -ef | grep "ohasd.bin reboot" | grep -v grep | awk '{print $2}' | xargs kill -9

  3. Start CRS:

    crsctl start crs -wait
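
Steps 2 and 3 above can be sketched as a small script. This is only an illustration, not an official FlashGrid tool: the function names ohasd_reboot_pids and recover_crs are hypothetical, and the script assumes the standard "ps -ef" column layout, where the PID is the second field. Step 1 still has to be checked by hand before running it.

```shell
#!/usr/bin/env bash
# Sketch: recover CRS on a node where it failed to start.
# Assumption: run as root on the affected node, after step 1 has passed.

# Read `ps -ef` output on stdin and print the PID (second column) of every
# "ohasd.bin reboot" process. The escaped dot keeps the awk process itself
# from matching, so no separate `grep -v` is needed.
ohasd_reboot_pids() {
    awk '/ohasd\.bin reboot/ {print $2}'
}

# Kill any stuck "ohasd.bin reboot" processes, then start CRS.
recover_crs() {
    local pids
    pids=$(ps -ef | ohasd_reboot_pids)
    if [ -n "$pids" ]; then
        echo "Killing ohasd.bin reboot processes:" $pids
        # $pids is left unquoted on purpose so that each PID is a separate argument.
        kill -9 $pids
    else
        echo "No ohasd.bin reboot processes found"
    fi
    crsctl start crs -wait
}
```

To use it, source the script and call recover_crs on each node where CRS fails to start. Unlike the one-line pipeline, it does not call kill at all when no matching process exists.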

Cause of the problem

CRS may get stuck in a failed state, unable to open OCR, if one or more disks containing voting files are offline while CRS is starting. This is likely to happen in either of the following scenarios:

  • CRS is started manually after the entire cluster was down, while one of the other nodes is still down.
  • Another node is rebooted or stopped while CRS is starting on one of the nodes.

How to prevent the problem

  • Do not start CRS manually while any other node is down. If CRS autostart is enabled, CRS will start automatically after all nodes are up.
  • Always use the flashgrid-node reboot command to reboot a node.
  • Do not reboot more than one node at a time. Reboot a node only after all other nodes are up, CRS is running, all disks are online, and there is no active Resync.
  • If you need to reboot the entire cluster, do not reboot the nodes simultaneously. Instead, stop all nodes and then start all nodes.
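
Some of the rules above can be partly automated with a guard that runs before a single-node reboot. The following is a rough sketch under stated assumptions, not a complete safety check: the function names are hypothetical, it assumes that flashgrid-cluster prints the literal word "Inaccessible" for an unreachable node, and that "crsctl check crs" prints message CRS-4537 when Cluster Ready Services is online on the local node. It does not verify that all disks are online or that no Resync is active, so those still need to be confirmed in the flashgrid-cluster output by hand.

```shell
#!/usr/bin/env bash
# Sketch: refuse to reboot this node unless basic cluster checks pass.

# Succeeds only if the flashgrid-cluster output on stdin mentions no
# Inaccessible node (assumed to appear as the literal word "Inaccessible").
all_nodes_accessible() {
    ! grep -qi 'inaccessible'
}

# Succeeds only if the `crsctl check crs` output on stdin reports
# Cluster Ready Services online (assumed marker: message code CRS-4537).
crs_online() {
    grep -q 'CRS-4537'
}

safe_reboot() {
    local cluster_status crs_status
    # Capture the output first so that a missing or failing command is not
    # mistaken for a healthy cluster.
    if ! cluster_status=$(flashgrid-cluster); then
        echo "flashgrid-cluster failed; not rebooting" >&2
        return 1
    fi
    if ! printf '%s\n' "$cluster_status" | all_nodes_accessible; then
        echo "A node is Inaccessible; not rebooting" >&2
        return 1
    fi
    crs_status=$(crsctl check crs)
    if ! printf '%s\n' "$crs_status" | crs_online; then
        echo "CRS is not online on this node; not rebooting" >&2
        return 1
    fi
    flashgrid-node reboot
}
```

Source the script and call safe_reboot on the node to be rebooted. The CRS check covers only the local node, so the state of CRS on the other nodes has to be confirmed separately.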