High Availability with VMware Metro Storage Cluster: features and recommendations

Written by present | Feb 4, 2015 5:10:00 PM

The demand for highly available yet geographically dispersed environments is increasing in Canada, driven by the growing need to maintain uninterrupted access to the most critical services and data. Another driver is the cost of a service failure, even a momentary one.

To define a customized and well-balanced business continuity strategy, companies must consider a wide range of parameters. In particular, they should determine their Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO), the key indicators that answer two fundamental questions:

• How much data is it acceptable to lose (RPO) per service or service chain?

• How quickly must the service or service chain be restored (RTO)?
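As an illustration of the two questions above, here is a minimal Python sketch that compares one incident against its objectives. The function name, the timestamps and the 15-minute and 1-hour targets are all hypothetical and chosen only for the example.

```python
from datetime import datetime, timedelta

def meets_objectives(last_copy, failure, restored, rpo, rto):
    """Return (data_loss, downtime, rpo_met, rto_met) for one incident."""
    data_loss = failure - last_copy   # data written since the last usable copy
    downtime = restored - failure     # time during which the service was down
    return data_loss, downtime, data_loss <= rpo, downtime <= rto

# Hypothetical incident: last replicated copy at 10:00, failure at 10:20,
# service restored at 11:05. Targets (assumed): RPO 15 minutes, RTO 1 hour.
loss, down, rpo_ok, rto_ok = meets_objectives(
    last_copy=datetime(2015, 2, 4, 10, 0),
    failure=datetime(2015, 2, 4, 10, 20),
    restored=datetime(2015, 2, 4, 11, 5),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=1),
)
print(loss, down, rpo_ok, rto_ok)  # 0:20:00 0:45:00 False True
```

In this example the service came back within the RTO, but 20 minutes of data were lost against a 15-minute RPO, so the incident misses its recovery point objective.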

The most critical applications, those with RPO and RTO values closest to zero, pose the most demanding challenges, such as:

• Eliminating single points of failure;

• Minimizing infrastructure costs;

• Managing high availability simply;

• Managing failures and load balancing automatically;

• Providing a continuous service even with a site failure.

Do the features of VMware Metro Storage Cluster allow enterprises to achieve these levels of availability and performance? Here are our answers and recommendations.

Features of VMware Metro Storage Cluster (VMware vMSC)

Stretched Cluster

For extended high availability, as opposed to disaster recovery, companies must distribute their vSphere clusters across two sites instead of the usual single site.

The technology used is called vMSC (VMware Metro Storage Cluster), also known as a Stretched Cluster.

VMware vMSC gives businesses the benefits of a local high availability cluster, such as:

• vMotion and DRS (VM migration and dynamic allocation of VMs between hosts without interruption of service);

• High Availability (automatic restart of VMs in the event of a host failure);

• Fault Tolerance (continuous availability of applications in the event of a host failure).

The cluster is spread over two geographic sites. It is important to note that this configuration, unlike VMware SRM, uses a single vCenter instance.

Synchronous replication

VMware vMSC uses synchronous replication to write the data on the two remote storage devices simultaneously. However, this type of synchronous replication differs from the traditional synchronous replication that creates a primary/secondary relationship between the two storage units.

With VMware Metro Storage Cluster there is no secondary: both storage units act as primaries, so data can be accessed from either side of the cluster in real time.
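The idea of a synchronous, active/active mirror can be sketched with a toy in-memory model in Python. This is not how any particular storage array is implemented. The `StorageSite` and `MirroredVolume` classes are hypothetical and exist only to show that a write is acknowledged once both sites hold the data, and that either site can then serve it.

```python
class StorageSite:
    """Toy in-memory stand-in for the storage unit at one site (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True

    def commit(self, lba, data):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.blocks[lba] = data


class MirroredVolume:
    """One logical volume mirrored synchronously across two sites."""

    def __init__(self, site_a, site_b):
        self.sites = (site_a, site_b)

    def write(self, lba, data):
        # Synchronous: the write is acknowledged only after BOTH sites
        # have committed it. If one site is down, no acknowledgement is
        # returned (a real array would also suspend or resync the mirror).
        for site in self.sites:
            site.commit(lba, data)
        return "ack"

    def read(self, lba, via):
        # Active/active: the same data can be read through either site.
        return via.blocks[lba]


a, b = StorageSite("site-a"), StorageSite("site-b")
vol = MirroredVolume(a, b)
vol.write(0, b"payload")
print(vol.read(0, via=a) == vol.read(0, via=b))  # True
```

Because every acknowledged write already exists at both sites, losing one site loses no acknowledged data, which is where the near-zero RPO comes from.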

Required infrastructure

With such a demand on the flow of data, telecommunication links between the two sites must be sized specifically and meet strict criteria.

The most important requirements are:

• A stretched, active/active storage architecture with synchronous mirroring;

• Layer 2 network connectivity extended between the two sites;

• Latency (round-trip time, or RTT) and maximum distance between sites;

• Bandwidth;

• A quorum witness at a third site or in the cloud;

• Only one vCenter.
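The latency and bandwidth criteria above can be checked early with some simple arithmetic. The Python sketch below assumes that light travels roughly 200 km per millisecond in optical fibre. It treats the maximum RTT, the peak write rate and the 30% headroom as placeholder inputs, to be replaced with the limits published by your storage vendor and VMware. The function names are hypothetical.

```python
FIBER_KM_PER_MS = 200.0  # approximate speed of light in optical fibre

def min_rtt_ms(fiber_km):
    """Lower bound on round-trip time from propagation delay alone."""
    return 2 * fiber_km / FIBER_KM_PER_MS

def link_ok(measured_rtt_ms, link_gbps, max_rtt_ms, peak_write_gbps, headroom=1.3):
    """True if the inter-site link meets both the latency and bandwidth targets.

    max_rtt_ms and headroom are assumptions supplied by the caller,
    not values taken from this article.
    """
    latency_ok = measured_rtt_ms <= max_rtt_ms
    bandwidth_ok = link_gbps >= peak_write_gbps * headroom
    return latency_ok and bandwidth_ok

print(min_rtt_ms(100))  # 1.0 ms for 100 km of fibre, before any equipment delay
print(link_ok(3.2, 10.0, max_rtt_ms=5.0, peak_write_gbps=4.0))  # True
```

The propagation figure is only a floor. Switches, routers and the storage replication layer add to it, so the RTT should always be measured on the real link.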

Benefits

Here are some of the benefits associated with such an approach.

• RPO and RTO values close to zero;

• The ability to migrate VMs between sites without service interruption;

• No need to change IP addresses when a VM moves to the other site;

• Automatic and immediate handling of storage failures;

• Transparency for users in the event of site failover.

Recommendations

Evaluate whether the vMSC technology is suitable for your needs

Do you need a recovery solution or an extended high-availability solution? Which approach is compatible with your service level agreements (SLAs)?

We recommend establishing two scenarios, one based on vMSC and the other on VMware Site Recovery Manager (SRM).

• Option 1: two production data centers in active/active mode with extended storage and networking.

• Option 2: two data centers in active/passive mode, one for production and one for testing and development. If the production site fails, SRM recovers the VMs at the secondary site according to a predefined recovery plan. Many alternative tools exist, although with less orchestration, such as Veeam Backup & Replication or Zerto Virtual Replication.

Our specialists can assist you in determining the most appropriate scenarios and their respective ROIs.

Identify your requirements and the limits of the solution

In addition to the requirements above, the following points should be considered:

• As with a local cluster, the solution has only one vCenter. If it fails, both sites are affected;

• DRS and HA do not have site awareness;

• vMSC is an extended high availability solution and, as such, does not include orchestrated recovery procedures for unplanned outages and cannot repair data corruption, since corrupted data is mirrored synchronously to both sites.
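Because DRS and HA are not site-aware, it can help to record where each VM should normally run and to check placements against that record. In vSphere, such preferences are typically expressed with VM-Host affinity rules. That is an assumption drawn from general practice, not something this article prescribes. The Python sketch below uses a purely hypothetical inventory and function name, and does not call any VMware API.

```python
def misplaced_vms(vm_host, host_site, vm_preferred_site):
    """Return the VMs currently running outside their preferred site."""
    return sorted(
        vm
        for vm, host in vm_host.items()
        # A VM with no recorded preference is treated as correctly placed.
        if host_site[host] != vm_preferred_site.get(vm, host_site[host])
    )

# Hypothetical inventory for illustration only.
host_site = {"esx-a1": "site-a", "esx-a2": "site-a", "esx-b1": "site-b"}
vm_host = {"db01": "esx-a1", "web01": "esx-b1", "app01": "esx-b1"}
vm_preferred_site = {"db01": "site-a", "web01": "site-a", "app01": "site-b"}

print(misplaced_vms(vm_host, host_site, vm_preferred_site))  # ['web01']
```

A report like this shows which VMs have drifted to the other site, for example after an HA restart, and may be generating unnecessary inter-site storage traffic.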

Eliminate doubts with structured testing methodology

The testing phase during the implementation of your high availability environment with VMware vMSC is a critical step. The various failure scenarios must be identified and tested before going into production. Everything must be documented and executed according to the test plan.

Experience shows that this phase is often neglected or forgotten for lack of time and resources.

Surround yourself with experts in this field

Ask our experts to help improve your infrastructure project while still developing the skills and independence of your team.
