Troubleshooting: Cluster initialization problems in AWS/Azure


Cannot SSH to the first node of the cluster

Symptoms

The SSH connection to the first node fails.

Solution

  1. Double-check the IP address of the first node.
  2. If the connection is refused, wait 2 minutes after the VM has launched and try again.
  3. If authentication fails, make sure you connect as user fg (fg@<IP address>) and use the private key that corresponds to the public key you provided when creating the cluster.
  4. In the security group settings, check that TCP port 22 is open for inbound connections from the system running the SSH client.
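The checks above can be partly automated from the client machine. The following is a minimal sketch, assuming Python 3 on the system running the SSH client. The function name diagnose_ssh_port is hypothetical and not part of any FlashGrid tool. It only tests TCP reachability of port 22: "refused" usually means the host answered but sshd is not listening yet (step 2), while "timeout" usually means packets are being dropped, for example by a security group rule (step 4). Authentication problems (step 3) cannot be detected this way.

```python
import socket


def diagnose_ssh_port(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify TCP reachability of the SSH port on the first node (hypothetical helper)."""
    try:
        # Open and immediately close a plain TCP connection; no SSH handshake is done.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host replied with a reset: the VM may still be booting, sshd not up yet.
        return "refused"
    except socket.timeout:
        # No reply at all: typically a firewall or security group dropping inbound TCP 22.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. an unresolvable name or no route to host.
        return f"error: {exc}"
```

For example, diagnose_ssh_port("203.0.113.10") returns one of "open", "refused", "timeout", or an "error: ..." string; the address shown is a placeholder from the documentation range.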

"Cluster initialization failed" message

Symptoms

After logging in to the first node, you see the message "Cluster initialization failed".

Solution

  1. On the first node, open the /tmp/fg_setup.log file and check for an error message at the end of the file.
  2. If the DNS server check failed, verify that the DNS server IP address you provided in the Cloud Provisioning tool is correct. A DNS server is required for downloading the Oracle installation files. You can check whether DNS works by running nslookup flashgrid.io on the first node.
  3. If some other error is reported, contact FlashGrid support.
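Steps 1 and 2 above can be sketched as a small script to run on the first node. This is a minimal illustration, assuming Python 3 is available there; the function names and the ERROR_KEYWORDS list are hypothetical, and the actual wording of messages in /tmp/fg_setup.log may differ. dns_resolves uses the system resolver (socket.getaddrinfo), so it is only a rough equivalent of nslookup flashgrid.io: it may also consult /etc/hosts rather than querying the DNS server directly.

```python
import socket
from collections import deque
from pathlib import Path
from typing import List, Optional

# Log location taken from the troubleshooting steps.
LOG_PATH = Path("/tmp/fg_setup.log")

# Hypothetical keywords used to spot error lines; adjust to the real log wording.
ERROR_KEYWORDS = ("error", "failed")


def log_tail(path: Path = LOG_PATH, count: int = 20) -> List[str]:
    """Return the last `count` lines of the setup log."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        # A deque with maxlen keeps only the most recent lines while streaming the file.
        return [line.rstrip("\n") for line in deque(fh, maxlen=count)]


def last_error(lines: List[str]) -> Optional[str]:
    """Return the last line that looks like an error, or None if there is none."""
    for line in reversed(lines):
        lowered = line.lower()
        if any(word in lowered for word in ERROR_KEYWORDS):
            return line
    return None


def dns_resolves(name: str = "flashgrid.io") -> bool:
    """Rough stand-in for `nslookup flashgrid.io`: True if the name resolves."""
    try:
        socket.getaddrinfo(name, None)
    except socket.gaierror:
        return False
    return True
```

Typical use on the first node would be print(last_error(log_tail())) followed by print(dns_resolves()); if the DNS check prints False, re-check the DNS server IP address entered in the Cloud Provisioning tool.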

Cluster initialization not completing

Symptoms

More than 2 hours have passed since the cluster was created, but when you log in to the first node you still see the "Cluster initialization in progress" message.

Solution

On the first node, open the /tmp/fg_setup.log file and scroll to the bottom. If you see repeated SSH errors when trying to connect to another node, the security group settings may be incorrect. If you are creating the cluster in an existing VPC/VNet, check that the corresponding security group has UDP ports 4801, 4802, and 4803 open between the group members.
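Spotting "repeated SSH errors" can be done by counting matching lines in the tail of the log. The sketch below is only an illustration under stated assumptions: the regular expression SSH_ERROR_RE and the threshold of 3 are hypothetical guesses at what such log lines look like, not a documented FlashGrid log format. The UDP port list comes from the text above.

```python
import re
from typing import Iterable

# Ports that must be open between security group members (from the text above).
REQUIRED_UDP_PORTS = (4801, 4802, 4803)

# Hypothetical pattern: a line mentioning ssh followed by a common failure phrase.
SSH_ERROR_RE = re.compile(
    r"ssh.*(refused|timed out|no route to host|permission denied|error)",
    re.IGNORECASE,
)


def count_ssh_errors(lines: Iterable[str]) -> int:
    """Count log lines that look like SSH connection failures."""
    return sum(1 for line in lines if SSH_ERROR_RE.search(line))


def suggest(lines: Iterable[str], threshold: int = 3) -> str:
    """Turn the error count into a hint about the security group (threshold is arbitrary)."""
    n = count_ssh_errors(lines)
    if n >= threshold:
        ports = ", ".join(str(p) for p in REQUIRED_UDP_PORTS)
        return f"{n} SSH errors: check that UDP ports {ports} are open between group members"
    return "no repeated SSH errors"
```

The lines passed in would normally be the last few dozen lines of /tmp/fg_setup.log, read on the first node.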