Part 2 - Disaster Recovery with SRM and vSphere Replication
In the previous article we went through the installation and configuration of the SRM and vSphere infrastructure. The time has now come to actually run some tests and fail over some VMs.
In the simple scenario I expect everything to go smoothly, but a few things concern me at this point, since the protected environment I'm ultimately failing over isn't so simple. It has multiple dvSwitches, VLANs, a load balancer, a firewall, LDAP and a set of test drivers to verify the integrity of the system, and administrators need access to it in a sensible way, so there are a few things that need to be resolved.
Protected Setup
Test Failover
The small test
In this scenario I do a test failover of a single machine and verify that it starts and that I can log into it.
The steps are:
- Set up the machine running RHEL 5.5.
- Install VMware Tools
- Kick off the vSphere Replication
- Create a Protection Group and ... no, that didn't work. Fortunately this is a simple one to fix: I had forgotten to unmount the VMware ISO from the DVD/CD drive. I removed it and then the Protection Group configuration accepted the machine.
- Create the Recovery Plan
As for the Recovery Plan, it's just a matter of clicking through the wizard.
- Do "Test"
The machine starts nicely and, as expected, it can't reach anything outside its test bubble, nor anything in it. This is obviously a problem if you actually want to do some serious testing of the services that you failed over.
- Do "Cleanup" to remove the test failover.
The big question now is how to get access to the servers from an external place.
Bursting the Test Bubble
The first step is to create a new machine at the recovery site and install VMware Tools. In my case I create a CentOS 6.3 machine (bubblebridge). The intention is to let this new machine act as a bridge between the real world and the test bubble.
Now the machine is created and I can log in through the console. Now we need to do some configuration of the machine to make it useful.
- As root
cp /etc/sysconfig/network-scripts/ifcfg-lo /etc/sysconfig/network-scripts/ifcfg-eth0
vi /etc/sysconfig/network-scripts/ifcfg-eth0
- Make it look like this, filling in your own values for IPADDR, GATEWAY and NETMASK:
DEVICE=eth0
BOOTPROTO=static
NM_CONTROLLED=yes
ONBOOT=yes
TYPE=Ethernet
IPADDR=
GATEWAY=
NETMASK=255.255.255.X
DNS1=
DNS2=8.8.8.8 # Google DNS
- Change network config by vi /etc/sysconfig/network and make it look like:
NETWORKING=yes
HOSTNAME=bubblebridge
GATEWAY=
- At this point it should be possible to ping google.com
- CentOS 6+ comes with NetworkManager, which I don't agree with, so I chose to stop it. As root, stop NetworkManager and remove it from startup:
service NetworkManager stop
chkconfig NetworkManager off
chkconfig network on
- I'd like to ssh into the machine, so let's get that up and running too. Install the packages needed:
yum install openssh-server openssh-clients perl
Turn on ssh:
chkconfig sshd on
service sshd start
- Now you should be able to ssh bubblebridge@
Ok - so far so good.
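If you would rather script the interface file than edit it by hand, something along these lines might work. This is only a sketch, not the procedure I ran: render_ifcfg is a hypothetical helper that prints a static ifcfg file to stdout, it sets NM_CONTROLLED=no on the assumption that NetworkManager has been switched off as described above, and the addresses in the usage comment are placeholders from the documentation range because the real values are left blank in this article.

```shell
#!/bin/sh
# Sketch only: render_ifcfg is a hypothetical helper, not part of CentOS.
# It prints a static ifcfg file for one interface to stdout.
# Arguments: device, IP address, netmask, optional gateway.
render_ifcfg() {
    dev=$1
    ip=$2
    mask=$3
    gw=${4:-}
    printf 'DEVICE=%s\n' "$dev"
    printf 'BOOTPROTO=static\n'
    # Assumption: NetworkManager is disabled, so it should not manage the NIC.
    printf 'NM_CONTROLLED=no\n'
    printf 'ONBOOT=yes\n'
    printf 'TYPE=Ethernet\n'
    printf 'IPADDR=%s\n' "$ip"
    printf 'NETMASK=%s\n' "$mask"
    # Only write a gateway line when one was passed in.
    if [ -n "$gw" ]; then
        printf 'GATEWAY=%s\n' "$gw"
    fi
}

# Example (placeholder addresses, run as root on bubblebridge):
#   render_ifcfg eth0 192.0.2.10 255.255.255.0 192.0.2.1 \
#       > /etc/sysconfig/network-scripts/ifcfg-eth0
#   service network restart
```

Writing to stdout keeps the helper harmless until you redirect it into the real file, which makes it easy to review the result first.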
Now let's do a new Test failover using the Simply Single Recovery Plan (that's the recovery plan created with only one machine in it). When the machine has failed over, the only way to talk to it is through the console. Looking at the virtual machine's summary, you see that it has a network attached to it.
To get access to the machine I add a NIC to the bubblebridge. In the picture above I use the virtual switch created by the failover. Note that this will cause issues further down the stream: when cleaning up the test bubble you will get errors such as "Remove virtual switch... The resource srmvs-recovery... is in use". This happens because the bubblebridge machine isn't part of the test setup, and if it is associated with the test network (which can only be done once you have done a test failover, since the network won't be there otherwise) it will block the removal of the resources.
To mitigate that, I check in the protected site which dvSwitch and VLAN the machine is connected to and create the same setup at the recovery site (I use the suffix -R to denote the recovery network). Don't forget that you have to go to SRM and do the Network Mappings in order for it to take effect.
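With more than a handful of networks it can help to write the mappings down before clicking through SRM. The sketch below only illustrates the -R naming convention: print_network_mappings is a hypothetical helper, and the port group names in the usage comment are made up.

```shell
#!/bin/sh
# Sketch only: print_network_mappings is a hypothetical helper.
# It reads protected-site network names on stdin, one per line, and prints
# each one next to its recovery-site counterpart (same name plus "-R").
print_network_mappings() {
    while IFS= read -r name; do
        # Skip blank lines.
        [ -n "$name" ] || continue
        printf '%s -> %s-R\n' "$name" "$name"
    done
}

# Example (made-up port group names):
#   printf 'dvPG-App\ndvPG-DB\n' | print_network_mappings
```

The output is just a checklist; the mappings themselves still have to be entered under Network Mappings in SRM.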
Edit settings and press the "Add..." button
Select Ethernet Adapter
In the Network Type dialog under Network Connection select the network label that the failed over test machine has. And then click through to finish. After completion the bubblebridge machine should have two network cards associated.
Now it's time to ssh into bubblebridge again. When you reach the machine you can run ifconfig and discover that you now have two NICs, eth0 and eth1, with the same address. That was not what was intended, so let's fix that. Switch to root and do the following:
- As root
cp /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-eth1
cp /etc/sysconfig/network-scripts/ifcfg-eth1 /etc/sysconfig/network-scripts/ifcfg-eth1.664
vi /etc/sysconfig/network-scripts/ifcfg-eth1
- And make ifcfg-eth1 look like
DEVICE=eth1
BOOTPROTO=static
NM_CONTROLLED=no
ONBOOT=yes
TYPE=Ethernet
IPADDR=
NETMASK=255.255.255.0
- Now you should be able to ping the test server and access it if it's configured to accept ssh. In my case it seems that the server doesn't have sshd running, so let's clean up. Fix the issue and do a new test failover and everything should work perfectly.
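The duplicate address that ifconfig revealed can also be caught from the config files before restarting the network. The following is a minimal sketch under a few assumptions: find_duplicate_ipaddr is a hypothetical helper, it only looks at unquoted IPADDR= lines, and it treats every ifcfg-* file in the directory as live configuration, so leftover copies will be reported too.

```shell
#!/bin/sh
# Sketch only: find_duplicate_ipaddr is a hypothetical helper.
# It scans the ifcfg-* files in a directory and prints every IPADDR value
# that appears in more than one place.
find_duplicate_ipaddr() {
    dir=${1:-/etc/sysconfig/network-scripts}
    # Pull out the IPADDR values, drop empty ones, keep only the repeats.
    cat "$dir"/ifcfg-* 2>/dev/null \
        | sed -n 's/^IPADDR=//p' \
        | grep -v '^$' \
        | sort \
        | uniq -d
}

# Example (run on bubblebridge):
#   find_duplicate_ipaddr /etc/sysconfig/network-scripts
```

An empty result means no two files share an address; any line printed is an address that needs to be changed in one of them.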
Current State of the Setup
Recover Test Setup
Rudimentary access to the test bubble is established.