FlashGrid Storage Fabric Maintenance Tasks (On-Premises Only)


Rebooting one node

Note: Do not use this procedure if you need to restart the entire cluster. Instead, see Restarting the entire cluster below.

To reboot one node in a running cluster

  1. Make sure that no other nodes are offline or re-syncing. All disk groups must have zero offline disks and Resync = No.

    # flashgrid-cluster

  2. If the node is a database node, stop all local database instances running on the node.
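
    For example, with a Clusterware-managed database, the local instance can be stopped using srvctl (MYDB and rac2 are placeholder names):

    $ srvctl stop instance -db MYDB -node rac2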

  3. Reboot the node using the flashgrid-node command. It gracefully takes the corresponding failure group offline.

    # flashgrid-node reboot

  4. After the node boots up, wait until re-synchronization completes for all disk groups before rebooting or powering off any other node.

Restarting the entire cluster

In some cases it may be desirable to restart all nodes of the cluster simultaneously instead of rebooting them one by one.

Note: Do not reboot all nodes simultaneously using the reboot or flashgrid-node reboot commands. Doing so may cause CRS to fail to start if one node goes down while CRS is already starting on another node.

To restart the entire cluster

  1. Stop all running databases.
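
    For example, using srvctl (MYDB is a placeholder database name):

    $ srvctl stop database -db MYDB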

  2. Stop Oracle cluster services on all nodes.

    # crsctl stop cluster -all

  3. Power all nodes off.
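
    For example, with CRS already stopped in the previous step, each node can be powered off at the OS level (run on every node):

    # poweroff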

  4. Start all nodes.

Powering off one node

To power off one node in a running cluster

  1. Make sure that no other nodes are offline or re-syncing. All disk groups must have zero offline disks and Resync = No.

    # flashgrid-cluster

  2. If the node is a database node, stop all local database instances running on the node.

  3. Power off the node using the flashgrid-node command. It gracefully takes the corresponding failure group offline.

    # flashgrid-node poweroff

  4. After powering the node back on, wait until re-synchronization completes for all disk groups before rebooting or powering off any other node.

Shutting down the entire cluster

To shut the entire cluster down

  1. Stop all running databases.

  2. Stop Oracle cluster services on all nodes.

    # crsctl stop cluster -all

  3. Power all nodes off.

Adding SSDs

To add new hot-plug SSDs in a running cluster

  1. Plug in new SSDs.

  2. Use the flashgrid-cluster drives command to determine the FlashGrid names of the new SSDs, e.g. rac2.newserialnumber:
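
    # flashgrid-cluster drives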

  3. Run flashgrid-create-dg to create a new disk group with the new SSDs, or use flashgrid-dg add-disks to add the new SSDs to an existing ASM disk group. Example:

    $ flashgrid-dg add-disks -G MYDG -d /dev/flashgrid/rac1.newserialnumber1 /dev/flashgrid/rac2.newserialnumber2


Re-adding a lost disk

ASM will drop a disk from a disk group if the disk stays offline for longer than the disk repair time. If the disk was taken offline because of an intermittent problem, for example a network problem, and it is in physically good condition, then you can re-add it to the disk group. The force option must be used when re-adding such a disk because it already contains ASM metadata.
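
The repair time is controlled by the disk_repair_time disk group attribute. As a sketch, it can be checked and, if needed, raised ahead of planned maintenance (MYDG and the 8.5h value are placeholders):

SQL> select value from v$asm_attribute where name = 'disk_repair_time'
     and group_number = (select group_number from v$asm_diskgroup where name = 'MYDG');
SQL> alter diskgroup MYDG set attribute 'disk_repair_time' = '8.5h';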

Example of re-adding a regular disk:

$ flashgrid-dg add-disks -G MYDG -d /dev/flashgrid/rac2.newserialnumber -f

Example of re-adding a quorum disk:

$ flashgrid-dg add-disks -G MYDG -q /dev/flashgrid/racq.quorumdiskname -f


Replacing a failed SSD (hot-plug 2.5”)

To replace a failed SSD in a running cluster

  1. Use the flashgrid-cluster drives command to determine the following information about the failed SSD (a SQL cross-check example follows this list):
  • FlashGrid name of the SSD, e.g. rac2.failedserialnumber
  • ASM name of the SSD, e.g. RAC2$FAILEDSERIALNUMBER
  • slot number where the SSD is installed
  • whether the ASM disk is online, offline, or dropped (ASMStatus=N/A)
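
    The ASM-side status can also be cross-checked from SQL*Plus with a generic ASM query (not FlashGrid-specific):

    SQL> select name, mode_status, header_status from v$asm_disk;
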
  2. Drop the failed SSD from the ASM disk group if it has not been dropped yet. Examples:

    a. If the failing ASM disk is still online:

    SQL> alter diskgroup MYDG drop disk RAC2$FAILEDSERIALNUMBER rebalance wait;

    b. If the failed ASM disk is offline, but has not been dropped by ASM:

    SQL> alter diskgroup MYDG drop disk RAC2$FAILEDSERIALNUMBER force;

  3. Physically remove the failed SSD.

  4. Plug in a new SSD in the same drive slot.

  5. Use the flashgrid-node command to determine its FlashGrid name, e.g. rac2.newserialnumber

  6. Add the new SSD to the ASM disk group that the failed SSD was in. Example:

    $ flashgrid-dg add-disks -G MYDG -d /dev/flashgrid/rac2.newserialnumber

If you have to re-add the same SSD that was used before, or add a different SSD that already has ASM metadata on it, then you must add it using the force option -f. Example:

$ flashgrid-dg add-disks -G MYDG -d /dev/flashgrid/rac2.newserialnumber -f


Replacing a failed SSD (add-in card)

To replace a failed SSD in a running cluster

  1. Use the flashgrid-cluster drives command to determine the following information about the failed SSD:
  • FlashGrid name of the SSD, e.g. rac2.failedserialnumber
  • ASM name of the SSD, e.g. RAC2$FAILEDSERIALNUMBER
  • slot number where the SSD is installed
  • whether the ASM disk is online, offline, or dropped (ASMStatus=N/A)
  2. Drop the failed SSD from the ASM disk group if it has not been dropped yet. Examples:

    a. If the failing ASM disk is still online:

    SQL> alter diskgroup MYDG drop disk RAC2$FAILEDSERIALNUMBER rebalance wait;

    b. If the failed ASM disk is offline, but has not been dropped by ASM:

    SQL> alter diskgroup MYDG drop disk RAC2$FAILEDSERIALNUMBER force;

  3. Use the flashgrid-node utility to power off the node where the failed SSD is located:

    # flashgrid-node poweroff

  4. Physically remove the failed SSD.

  5. Plug in a new SSD in the same PCIe slot.

  6. Power on the node.

  7. Use the flashgrid-node command to determine the FlashGrid name of the new SSD, e.g. rac2.newserialnumber

  8. Add the new SSD to the ASM disk group that the failed SSD was in. Example:

    $ flashgrid-dg add-disks -G MYDG -d /dev/flashgrid/rac2.newserialnumber

If you have to re-add the same SSD that was used before, or add a different SSD that already has ASM metadata on it, then you must use the force option -f. Example:

$ flashgrid-dg add-disks -G MYDG -d /dev/flashgrid/rac2.newserialnumber -f


Removing SSDs

To remove hot-plug SSDs from a running cluster

  1. Use the flashgrid-cluster drives command to determine the following information about the SSDs that will be removed:
  • ASM name of the SSD, e.g. RAC2$SERIALNUMBER
  • Slot numbers where the SSDs are installed
  2. If the SSDs are members of an ASM disk group, drop them from the disk group. Example:

    SQL> alter diskgroup MYDG
    drop disk RAC1$SERIALNUMBER1
    drop disk RAC2$SERIALNUMBER2
    rebalance wait;

  3. Prepare the SSDs for removal. Example:

    [root@rac1 ~] # flashgrid-node stop-target /dev/flashgrid/rac1.serialnumber1
    [root@rac2 ~] # flashgrid-node stop-target /dev/flashgrid/rac2.serialnumber2

  4. Physically remove the SSDs.

Replacing a failed server

To replace a failed server with new hardware in a running cluster

  1. Move all SSDs from the failed server to a new server. Make sure to install the SSDs in the same slots where they were installed before.

  2. Make sure the host name or alias of the new server is the same as it was on the failed server.
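
    For example, on a systemd-based OS the host name can be set with hostnamectl (rac2 is a placeholder):

    # hostnamectl set-hostname rac2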

  3. On any other node of the cluster, run flashgrid-ca

  4. Select ‘Modify existing configuration’ and press Next

  5. Press Next

  6. Select ‘SAVE & APPLY changes without restarting FlashGrid’

  7. Press Save

Adding a node to a cluster

Note: Contact FlashGrid support if you need to add a new node to a cluster that has the Extended Cluster feature enabled in Grid Infrastructure 12.2.

To add a new node to an existing FlashGrid cluster

  1. Prepare the new node for adding it to the cluster. See the following sections of this guide:
  • Installing and Configuring OS
  • Installing FlashGrid Software
  • Configuring Storage Network
  • Creating LVM Volumes for Quorum and GRID Disks
  2. Run flashgrid-ca on any node that is already a member of the cluster

  3. Select ‘Modify existing configuration’ and press Next

  4. Select ‘Cluster nodes’ and press Next

  5. Enter the short host name of the new node and select its role from the drop-down list

  6. Press Next

  7. Select ‘SAVE & APPLY changes without restarting FlashGrid’

  8. Press Save

Removing a node from a cluster

To remove a node from a FlashGrid cluster

  1. Stop the FlashGrid services on the node that you want to remove:

    # flashgrid-node stop

  2. Run flashgrid-ca on any node of the cluster

  3. Select ‘Modify existing configuration’ and press Next

  4. Select ‘Cluster nodes’ and press Next

  5. Enter the short host name of the node and select its role from the drop-down list

  6. Press Next

  7. Select ‘SAVE & APPLY changes without restarting FlashGrid’

  8. Press Save

Updating FlashGrid software

The following procedure applies to minor updates. Minor updates are those that have the same first two numbers in the version number, for example, from 18.1.30 to 18.1.50. However, an update from 18.1 to 18.9 is considered major and may require a different procedure. Contact FlashGrid support for assistance if you need to perform a major version update.
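
The currently installed version can be checked by querying the RPM database, for example:

# rpm -q flashgrid-sf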

To update the FlashGrid Storage Fabric and/or FlashGrid Cloud Area Network software on a running cluster, repeat the following steps on each node, one node at a time

  1. Make sure that no other nodes are offline or re-syncing. All disk groups must have zero offline disks and Resync = No.

    # flashgrid-cluster

  2. If the node is a database node:

    a. Stop all local database instances running on the node.

    b. Stop Oracle CRS:

    # crsctl stop crs

  3. Stop the FlashGrid Storage Fabric services:

    # flashgrid-node stop

  4. Stop the FlashGrid Cloud Area Network service, if installed:

    # systemctl stop flashgrid-clan

  5. Update the flashgrid-sf and flashgrid-clan (if installed) RPMs using the yum or rpm tool.
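
    For example, if the updated packages are available in a configured yum repository:

    # yum update flashgrid-sf flashgrid-clan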

  6. Start the FlashGrid Cloud Area Network service, if installed:

    # systemctl start flashgrid-clan

  7. Start the FlashGrid Storage Fabric service:

    # flashgrid-node start

  8. If the node has ASM installed on it, start the Oracle services:

    # systemctl start oracle-ohasd    
    # crsctl start crs -wait
  9. Wait until all disks are back online and resyncing operations complete on all disk groups before updating the next node. All disk groups must have zero offline disks and Resync = No.

    # flashgrid-cluster

To update FlashGrid Diagnostics

  1. Update the flashgrid-diags RPM using the yum or rpm tool.
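
    For example, using yum:

    # yum update flashgrid-diags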

  2. Restart the flashgrid-node-monitor service:

    # systemctl restart flashgrid-node-monitor


Updating OS

FlashGrid recommends applying only those OS updates that include security patches. This minimizes the risk of compatibility issues introduced by the updates.

To update the OS on a running cluster, repeat the following steps on each node, one node at a time

  1. Install OS updates:

    # yum --security update-minimal

  2. Follow the steps in Rebooting one node above.