SIOS Protection Suite For Linux Network Attached Storage Recovery Kit v9.2.2
March 2018
This document and the information herein is the property of SIOS Technology Corp. (previously known as
SteelEye® Technology, Inc.) and all unauthorized use and reproduction is prohibited. SIOS Technology
Corp. makes no warranties with respect to the contents of this document and reserves the right to revise this
publication and make changes to the products described herein without prior notification. It is the policy of
SIOS Technology Corp. to improve products as new technology, components and software become
available. SIOS Technology Corp., therefore, reserves the right to change specifications without prior notice.
LifeKeeper, SteelEye and SteelEye DataKeeper are registered trademarks of SIOS Technology Corp.
Other brand and product names used herein are for identification purposes only and may be trademarks of
their respective companies.
To maintain the quality of our publications, we welcome your comments on the accuracy, clarity,
organization, and value of this document.
Copyright © 2018
By SIOS Technology Corp.
San Mateo, CA U.S.A.
All rights reserved
Table of Contents
Chapter 1: Introduction
NAS Recovery Kit Technical Documentation
Document Contents
Chapter 2: Requirements
Hardware Requirements
Software Requirements
Chapter 3: Overview
LifeKeeper for Linux NAS Recovery Kit
Configuration Examples
Chapter 6: Troubleshooting
Error Messages
Chapter 1: Introduction
Document Contents
This guide contains the following topics:
• Documentation and References. Provides a list of LifeKeeper for Linux documentation and where to
find it.
• Requirements. A description of the hardware and software necessary to properly set up, install, and
operate the NAS Recovery Kit. Refer to the SIOS Protection Suite Installation Guide for specific
instructions on how to install or remove LifeKeeper for Linux software.
• Configuring the LifeKeeper for Linux NAS Recovery Kit. A description of the procedures required to
properly configure the NAS Recovery Kit.
• LifeKeeper Configuration Tasks. A description of the tasks for creating and managing your NAS
resource hierarchies using the LifeKeeper GUI.
• Troubleshooting. A list of LifeKeeper for Linux error messages, including a description of each.
This documentation, along with documentation associated with optional LifeKeeper Recovery Kits, is
available on the SIOS Technology Corp. website at:
http://docs.us.sios.com
Hardware Requirements
• Servers - LifeKeeper for Linux supported servers configured in accordance with the requirements
described in the SIOS Protection Suite Installation Guide and the SPS for Linux Release Notes.
• IP Network Interface Cards - Each server requires at least one Ethernet TCP/IP-supported network
interface card. Remember, however, that a LifeKeeper cluster requires two communication paths;
two separate LAN-based communication paths using dual independent subnets are recommended for
heartbeats, and at least one of these should be configured as a private network. Using a combination of
TCP and TTY heartbeats is also supported.
Software Requirements
• TCP/IP software - Each server in your LifeKeeper configuration requires TCP/IP software.
• LifeKeeper software - It is imperative that you install the same version of the LifeKeeper for Linux
software and apply the same versions of the LifeKeeper for Linux software patches to each server in your
cluster.
• LifeKeeper for Linux NAS Recovery Kit - The NAS Recovery Kit is provided on the SPS Installation
Image File (sps.img). It is packaged, installed and removed via the Red Hat Package Manager, rpm.
The following rpm file is supplied on the SPS Installation Image File (sps.img):
steeleye-lkNAS
• Linux software - Each server in your cluster must have the util-linux package installed and
configured prior to configuring LifeKeeper and the LifeKeeper NAS Recovery Kit. The NAS Recovery
Kit requires version 2-9u or later of the util-linux package to ensure proper functionality.
Please see SIOS Protection Suite Installation Guide for specific instructions on the installation and
removal of the LifeKeeper for Linux software.
The NAS Recovery Kit enables the creation of LifeKeeper resource hierarchies on LifeKeeper protected
servers or clients that have imported (mounted) an exported Network File System (NFS) from either a
Network Attached Storage device or an NFS server in the cluster. When a failure is detected on the node in
the cluster where the exported file system is mounted, the NAS Recovery Kit initiates a failover to the
predetermined backup node.
Therefore, once the exported file system is mounted on a LifeKeeper server or client, it can be used as
additional storage for LifeKeeper hierarchies.
When you elect to use an exported file system as a storage medium, LifeKeeper does not require you to
protect the server where the file system is exported. However, to achieve a greater degree of availability,
users are encouraged to use the LifeKeeper for Linux NFS Server Recovery Kit to protect the server from
failure where the file system is exported.
Resource hierarchies for the NAS Recovery Kit are created using the currently existing File System Recovery
Kit available with the LifeKeeper Core product (steeleye-lk package).
While the NAS Recovery Kit delivers several advantages, the two most significant advantages are the
elimination of the need for costly shared-storage devices and the capability to have multi-node cluster
configurations.
• The NAS Recovery Kit does not provide protection for your Network Attached Storage device. The
objective of this kit is to expand LifeKeeper storage options into the Network Attached Storage arena.
• The NAS Recovery Kit does not permit the NFS file system to be mounted more than once on different
mount points. Attempts to create hierarchies when the file system is found in the /etc/fstab file multiple
times will fail.
• File systems to be protected by the NAS Recovery Kit should be mounted using the IP address rather
than the host name (for example, 100.99.100.9/dir instead of server1/dir). This will avoid potential DNS
or host file lookup problems. Mounting via host name will result in a “bad mount” being detected, after
which LifeKeeper will unmount and re-mount the file system using the IP address. The unmount
process could kill processes that are currently using the mount point.
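The two mount restrictions above can be checked before a hierarchy is created. The following is a minimal sketch, not part of the NAS Recovery Kit: the function names are hypothetical, it assumes standard whitespace-separated fstab fields (device, mount point, type), and it only recognizes IPv4-style `host:/path` device strings.

```python
import ipaddress
from collections import Counter

NFS_TYPES = {"nfs", "nfs4"}  # assumption: file system types treated as NFS


def parse_nfs_entries(fstab_text):
    """Return (device, mount_point) pairs for NFS entries in fstab-style text."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            entries.append((fields[0], fields[1]))
    return entries


def uses_ip_address(device):
    """True if the host part of 'host:/path' is a literal IP address."""
    host = device.split(":", 1)[0]
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def check_nfs_entries(fstab_text):
    """List entries that conflict with the restrictions described above."""
    entries = parse_nfs_entries(fstab_text)
    problems = []
    for device, mount_point in entries:
        if not uses_ip_address(device):
            problems.append(
                f"{device} on {mount_point}: mounted by host name, not IP address"
            )
    for device, count in Counter(device for device, _ in entries).items():
        if count > 1:
            problems.append(f"{device}: listed {count} times in fstab")
    return problems
```

Passing the contents of /etc/fstab to `check_nfs_entries` returns an empty list when no entry is mounted by host name and no exported file system appears more than once.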
Configuration Considerations
The following should be considered before operating the LifeKeeper for Linux NAS Recovery Kit:
1. Install the NAS Recovery Kit on the server(s) in your cluster configuration where you wish to mount
your exported file systems and where you will extend your NAS resource hierarchy. You can export
your file system from either an NFS server, which may be protected by LifeKeeper (this is the
recommended configuration), or from a Network Attached Storage device.
2. To ensure proper execution of this kit, it is highly recommended that you mount your exported NFS file
system using the server’s IP address in place of the server name and that you perform your mount
operation before you place your file system under LifeKeeper protection. Additionally, if you are
mounting a file system that is currently protected by the LifeKeeper for Linux NFS Server Recovery
Kit, we strongly suggest that the IP address used to create the NFS Server hierarchy be used to mount
the file system on the LifeKeeper NAS server. Use the NFS mount option “intr” to ensure that
LifeKeeper can interrupt operations being performed on the file system. Failure to use this option can
result in a LifeKeeper failure.
3. To eliminate the possibility of split-brain related problems (i.e. more than one node in the cluster has a
hierarchy In Service Protected (ISP)), we highly recommend that you establish one of the
communication paths between nodes in the cluster on the same network used to access the exported
file system. Failure to comply with this recommendation can result in multiple nodes bringing the
hierarchy ISP (split-brain) when a communication path failure occurs. To recover from a split-brain
scenario, take all but one of the ISP hierarchies out of service. This will ensure that only one node has
access to the exported file system.
4. The built-in file system recovery kit used to build NAS hierarchies cannot detect and remove
processes not protected by LifeKeeper that are using the mounted file system in a fail over condition.
Therefore, it is highly recommended that only LifeKeeper protected processes use the NAS protected
file system.
5. The LKNFSTIMEOUT tunable represents the timeout in seconds the NAS Recovery Kit will use when
attempting to determine the status of an NFS mounted file system. The default value for this tunable is
2 minutes (120 seconds). The LKNFSSYSCALLTO tunable represents the timeout in seconds the NAS
Recovery Kit will use for alarms to interrupt system calls when attempting to determine the status of a
mount point. Use the formula below to determine the value for this tunable:
(3 x LKNFSSYSCALLTO) + 5 must be less than the value of LKNFSTIMEOUT.
6. The LKNASERROR tunable controls the actions the NAS Recovery Kit takes when access to the
NAS device fails. The tunable has two values, switch and halt, with switch being the default. If the
value is set to switch and access fails, the NAS Recovery Kit will initiate a transfer of the resource
hierarchy to a backup server when the failure is detected. The attempt to transfer the resource
hierarchy to the backup server can hang if any of the resources sitting above the NAS resource attempt
to access anything on the NAS file system. To avoid this problem the tunable value can be set to halt,
which will immediately halt the system when an access failure is detected. This action will force a
failover of all resource hierarchies to the backup server.
7. STONITH devices or the Quorum/Witness package should be used so that a machine failure (all
comm paths are down) does not result in a split brain where all the NAS resources are in service on all
nodes in the cluster. This condition can lead to data corruption. More details on the Quorum/Witness
package can be found in the SIOS Protection Suite Technical Documentation.
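The tunable rules in items 5 and 6 above can be expressed as a small consistency check. This is an illustrative sketch only: it assumes the tunables are available as KEY=value text, the helper names are hypothetical, and because this guide does not state a default for LKNFSSYSCALLTO, the formula is checked only when that tunable is set.

```python
VALID_NASERROR = {"switch", "halt"}  # the two values described in item 6
DEFAULT_LKNFSTIMEOUT = 120           # 2 minutes, per item 5


def parse_tunables(text):
    """Parse KEY=value lines (assumed format), ignoring blanks and comments."""
    tunables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        tunables[key.strip()] = value.strip()
    return tunables


def check_nas_tunables(tunables):
    """Return a list of problems found in the NAS-related tunables."""
    problems = []
    timeout = int(tunables.get("LKNFSTIMEOUT", DEFAULT_LKNFSTIMEOUT))
    if "LKNFSSYSCALLTO" in tunables:
        syscall_to = int(tunables["LKNFSSYSCALLTO"])
        # Item 5: (3 x LKNFSSYSCALLTO) + 5 must be less than LKNFSTIMEOUT
        if 3 * syscall_to + 5 >= timeout:
            problems.append(
                f"3 x {syscall_to} + 5 = {3 * syscall_to + 5} "
                f"is not less than LKNFSTIMEOUT ({timeout})"
            )
    action = tunables.get("LKNASERROR", "switch")  # switch is the default
    if action not in VALID_NASERROR:
        problems.append(f"LKNASERROR must be switch or halt, not {action!r}")
    return problems
```

For example, with LKNFSTIMEOUT at 120, an LKNFSSYSCALLTO of 30 satisfies the formula (95 is less than 120), while a value of 40 does not (125 is not less than 120).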
Configuration Examples
A few examples of what happens during a failover using the LifeKeeper for Linux NAS Recovery Kit are
provided below.
In this configuration, Server 1 is considered active because it is running the NAS Recovery Kit software and
has imported (mounted) the file system from the NAS device. Server 2 does other processing. If Server 1
fails, Server 2 gains access to the file system and uses the LifeKeeper secondary hierarchy to make it
available to clients.
Configuration Notes:
• Server 2 should not access files and directories on the NAS device while Server 1 is active.
Note: In an active/standby configuration, Server 2 might be running the NAS Recovery Kit, but does not have
any other NAS resources under LifeKeeper protection.
An active/active configuration consists of two or more systems actively running the NAS Recovery Kit
software and importing file systems from NAS device(s).
Configuration Notes:
• Initially, Server 1 imports a file system and Server 2 imports a different file system. In a switchover
situation, one system can import both file systems.
The following tasks are available for configuring the LifeKeeper for Linux NAS Recovery Kit:
• Extend a Resource Hierarchy - Extends a NAS resource hierarchy from the primary server to the
backup server.
• Unextend a Resource Hierarchy - Unextends (removes) a NAS resource hierarchy from a single server
in the LifeKeeper cluster.
• Create Dependency - Creates a child dependency between an existing resource hierarchy and another
resource instance and propagates the dependency changes to all applicable servers in the cluster.
• Delete Dependency - Deletes a resource dependency and propagates the dependency changes to all
applicable servers in the cluster.
Note: Throughout the rest of this section, configuration tasks are performed using the Edit menu. You can
also perform most of these tasks in the following ways:
1. From the toolbar, right-click on a global resource in the left pane of the status display.
2. Right-click on a resource instance in the right pane of the status display.
Note: Using the right-click method allows you to avoid entering information that is required when using
the Edit menu.
2. The “Select Recovery Kit” dialog appears. Select the File System option from the drop down list.
Simply put, a NAS Resource Hierarchy is a File System Hierarchy created using an NFS mounted file
system.
CAUTION: If you click the Cancel button at any time during the sequence of creating your
hierarchy, LifeKeeper will cancel the entire creation process.
3. The “Switchback Type” dialog appears. The switchback type determines how the NAS resource will
be switched back to the primary server when it becomes in-service (active) on the backup server
following a failover. Switchback types are either intelligent or automatic. Intelligent switchback requires
administrative intervention to switch the resource back to the primary server, while automatic
switchback occurs as soon as the primary server is back on line and reestablishes LifeKeeper
communication paths.
4. The “Server” dialog appears. Select the name of the server where the NAS resource will be created
(typically this is your primary server). All servers in your cluster are included in the drop down list box.
5. Select the Mount Point path to be protected by the NAS (File System) Resource Hierarchy. All “local”
(i.e. file systems using shared storage) and NFS mounted file systems are listed. Select the NFS
mounted file system from the drop down list box.
6. The Root Tag dialog is automatically populated with a unique name for the resource instance on the
target server (i.e. the server selected above). You may accept the default or enter a unique tag consisting
of letters, numbers and the following special characters: -, _, ., or /.
8. An information box appears announcing the successful creation of your NAS resource hierarchy. You
must Extend the hierarchy to another server in your cluster in order to place it under LifeKeeper
protection.
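The Root Tag rule in step 6 (letters, numbers, and the characters -, _, ., and /, with the tag unique on the target server) can be sketched as a simple check. This is an illustration rather than LifeKeeper's own validation: the function name is hypothetical, and "letters" is assumed here to mean ASCII letters.

```python
import re

# Assumption: ASCII letters and digits plus the four special characters from step 6
TAG_PATTERN = re.compile(r"[A-Za-z0-9_./-]+")


def is_valid_tag(tag, existing_tags=()):
    """True if the tag uses only permitted characters and is not already in use."""
    return bool(TAG_PATTERN.fullmatch(tag)) and tag not in existing_tags
```

A tag such as `/mnt/nas-data_1.0` passes this check, while a tag containing a space or an asterisk, an empty tag, or a tag already present in `existing_tags` does not.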
1. From the LifeKeeper GUI menu, select Edit, then Resource. From the drop down menu, select Delete
Resource Hierarchy.
2. Select the name of the Target Server where you will be deleting your NAS resource hierarchy.
Note: If you selected the Delete Resource task by right-clicking from either the left pane on a global
resource or the right pane on an individual resource instance, this dialog will not appear.
3. Select the Hierarchy to Delete. Identify the resource hierarchy you wish to delete, and highlight it.
Note: If you selected the Delete Resource task by right-clicking from either the left pane on a global
resource or the right pane on an individual resource instance, this dialog will not appear.
4. An information box appears confirming your selection of the target server and the hierarchy you have
selected to delete.
5. An information box appears confirming that the NAS resource instance was deleted successfully.
1. When you successfully create your NAS resource hierarchy you will have an opportunity to select
Continue which will allow you to proceed with extending your resource hierarchy to your backup
server.
2. Right-click on an unextended hierarchy in either the left or right pane on the LifeKeeper GUI.
3. Select the “Extend Resource Hierarchy” task from the LifeKeeper GUI by selecting Edit, Resource,
Extend Resource Hierarchy from the drop down menu. This sequence of selections will launch the
Extend Resource Hierarchy wizard. The Accept Defaults button that is available for the Extend
Resource Hierarchy option is intended for the user who is familiar with the LifeKeeper Extend
Resource Hierarchy defaults and wants to quickly extend a LifeKeeper resource hierarchy without
being prompted for input or confirmation. Users who prefer to extend a LifeKeeper resource hierarchy
using the interactive, step-by-step interface of the GUI dialogs should use the Next button.
a. The first dialog box to appear will ask you to select the Template Server where your
NAS resource hierarchy is currently in service. Remember that the Template Server
you select now and the Tag to Extend that you select in the next dialog box represent an
in-service (activated) resource hierarchy. An error message will appear if you select a
resource tag that is not in service on the template server you have selected. The drop
down box in this dialog provides the names of all the servers in your cluster.
Note: If you are entering the Extend Resource Hierarchy task by continuing from the
creation of a NAS resource hierarchy, this dialog box will not appear because the wizard
has already identified the template server in the create stage. This is also the case when
you right-click on either the NAS resource icon in the left pane or right-click on the NAS
resource box in the right pane of the GUI window and choose Extend Resource
Hierarchy.
CAUTION: If you click the Cancel button at any time during the sequence of extending
your hierarchy, LifeKeeper will cancel the extend hierarchy process. However, if you
have already extended the resource to another server, that instance will continue to be in
effect until you specifically unextend it.
b. Select the Tag to Extend. This is the name of the NAS instance you wish to extend
from the template server to the target server. The wizard will list in the drop down box all
of the resources that you have created on the template server.
Note: Once again, if you are entering the Extend Resource Hierarchy task immediately
following the creation of a NAS hierarchy, this dialog box will not appear because the
wizard has already identified the tag name of your resource in the create stage. This is
also the case when you right-click on either the NAS resource icon in the left pane or on
the NAS resource box in the right pane of the GUI window and choose Extend Resource
Hierarchy.
c. Select the Target Server where you will extend your NAS resource hierarchy.
d. The Switchback Type dialog appears. The switchback type determines how the NAS
resource will be switched back to the primary server when it becomes in service (active)
on the backup server following a failover. Switchback types are either intelligent or
automatic. Intelligent switchback requires administrative intervention to switch the resource
back to the primary server while automatic switchback occurs as soon as the primary
server is back on line and reestablishes LifeKeeper communication paths.
e. Select or enter a Template Priority. This is the priority for the NAS hierarchy on the
server where it is currently in service. Any unused priority value from 1 to 999 is valid,
where a lower number means a higher priority (1=highest). The extend process will reject
any priority for this hierarchy that is already in use by another system. The default value
is recommended.
Note: This selection will appear only for the initial extend of the hierarchy.
f. Select or enter the Target Priority. This is the priority for the new extended NAS
hierarchy relative to equivalent hierarchies on other servers. Any unused priority value from 1
to 999 is valid, indicating a server’s priority in the cascading failover sequence for the
resource. A lower number means a higher priority (1=highest). Note that LifeKeeper
assigns the number “1” to the server on which the hierarchy is created by default. The
priorities need not be consecutive, but no two servers can have the same priority for a given
resource.
g. An information box appears explaining that LifeKeeper has successfully checked your
environment and that all requirements for extending this resource have been met. If there
are requirements that have not been met, LifeKeeper will disable the Next button, and
enable the Back button.
Click Finish to confirm the successful extension of your NAS resource instance.
Note: Be sure to test the functionality of the new instance on both servers.
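The priority rules from steps e and f (a value from 1 to 999, unique per resource, lower number meaning higher priority) can be illustrated with a short sketch. The function names and the server-to-priority mapping are hypothetical and are not part of any LifeKeeper interface.

```python
def is_valid_priority(candidate, priorities_in_use):
    """True if candidate is an unused integer priority in the range 1 to 999."""
    if not isinstance(candidate, int) or not 1 <= candidate <= 999:
        return False
    return candidate not in priorities_in_use


def failover_order(priorities):
    """Order servers for cascading failover; the lowest number comes first.

    priorities maps a server name to its priority for one resource.
    """
    return sorted(priorities, key=priorities.get)
```

With the hierarchy created on server1 (priority 1) and extended to server3 at priority 5 and server2 at priority 10, `failover_order` lists server1, then server3, then server2. The priorities need not be consecutive.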
2. Select the Target Server where you want to unextend the NAS resource. It cannot be the server where
the resource is currently in service (active).
Note: If you selected the Unextend task by right-clicking from either the left pane on a global resource
or the right pane on an individual resource instance, this dialog will not appear.
Note: If you selected the Unextend task by right-clicking from either the left pane on a global resource
or the right pane on an individual resource instance, this dialog will not appear.
4. An information box appears confirming the target server and the NAS resource hierarchy you have
chosen to unextend.
Click Unextend.
5. Another information box appears confirming that the NAS resource was unextended successfully.
If you execute the Out of Service request, the resource hierarchy is taken out-of-service without bringing it in-
service on the other server.
2. Re-mount the file system via IP address rather than host name.
Error Messages
This section provides a list of messages that you may encounter while creating and extending an SPS NAS
resource hierarchy or removing and restoring a resource. Where appropriate, it provides an additional
explanation of the cause of an error and necessary action to resolve the error condition.
Messages from other SPS components are also possible. In these cases, please refer to the Message
Catalog (located on our Technical Documentation site under “Search for an Error Code”) which provides a
listing of all error codes, including operational, administrative and GUI, that may be encountered while using
SIOS Protection Suite for Linux and, where appropriate, provides additional explanation of the cause of the
error code and necessary action to resolve the issue. This full listing may be searched for any error code
received, or you may go directly to one of the individual Message Catalogs for the appropriate SPS
component.
Error Number   Error Message
107012         Error: Exported file system <NFS exported file system name> cannot be accessed on
               <server name>.
               Possible causes:
               - The LifeKeeper node is not in the exported system list on the NFS server, or,
               - The exported system list has contradictory entries that are not displayed by the
                 showmount command (i.e. if the exported system list exports a file system to both
                 the world and to specific systems, showmount will report only the specific systems).
               Action: Fix the exported file system access problem and re-extend the hierarchy.
107013         Error: Mount authorization check for "172.25.113.25:/export" on "fred" appears to be hung.
               Exiting.