Replacing the Nutanix Witness VM on an existing Metro Availability Protection Domain

Last week, I had to change an existing Metro Availability Protection Domain between two Nutanix sites. The change was to replace the existing Witness VM with a new one deployed on another VMware ESXi host. I completed this change during business hours without causing any interruptions. This blog post provides the detailed steps that I followed to implement this change.

Deploy new Witness VM

The first step was to deploy a new Witness VM at the 4th location, where a new standalone VMware ESXi host had been configured. This ESXi host had been added to the existing VMware vCenter Server instance for ease of management within the entire VMware vSphere environment.

Step by step

    1. Download the “5.16-witness_vm.ova” file from https://portal.nutanix.com
    2. From within VMware vCenter Server, right-click on the ESXi host and select “Deploy OVF Template”
    3. Follow the steps in the wizard to have the new Witness VM created based on the .ova file, selected datastore and network on the ESXi host
    4. Turn on the Witness VM
    5. Using the VM remote console in vCenter Server or directly on the ESXi host, log on to the Witness VM using the default “nutanix” credentials
    6. Set the static IP address by modifying the “ifcfg-eth0” file
      • sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0
    7. Update/Add the following NETMASK, IPADDR, BOOTPROTO and GATEWAY entries
      • NETMASK="xxx.xxx.xxx.xxx"
      • IPADDR="xxx.xxx.xxx.xxx"
      • BOOTPROTO="none"
      • GATEWAY="xxx.xxx.xxx.xxx"
    8. Restart the Witness VM
      • sudo reboot
    9. The Witness VM should now have a static IP address, allowing you to use PuTTY or another SSH terminal client to connect and log in
    10. Create the Witness cluster using the (familiar) cluster create command and use the static IP address for the “vm_ip_address” variable below
      • cluster -s vm_ip_address --cluster_function_list=witness_vm create
    11. The final configuration step is to change the default “admin” password using the same SSH session and the following command
      • passwd
    12. Before moving on, perform network connectivity tests to ensure that the new Witness VM can reach the Nutanix Controller VMs on both Site 1 & 2 and vice versa
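
Steps 6, 7 and 12 above are easy to get wrong by a single digit, so here is a minimal Python sketch of how the values could be sanity-checked before editing “ifcfg-eth0” and how reachability could be probed afterwards. It is not part of the procedure I followed. The function names are hypothetical, and the TCP port mentioned in the usage note (9440, the Prism port) is an assumption that you should verify against the Nutanix documentation for your AOS version.

```python
import ipaddress
import socket


def render_ifcfg_entries(ipaddr: str, netmask: str, gateway: str) -> str:
    """Return the four ifcfg-eth0 entries from step 7 after basic checks.

    Hypothetical helper: it only validates and formats the values,
    it does not touch any file on the Witness VM.
    """
    # strict=False derives the network from a host address plus netmask.
    network = ipaddress.ip_network(f"{ipaddr}/{netmask}", strict=False)
    host = ipaddress.ip_address(ipaddr)

    # On ordinary subnets the first and last addresses are not usable hosts.
    if network.prefixlen < 31 and host in (
        network.network_address,
        network.broadcast_address,
    ):
        raise ValueError(f"{ipaddr} is the network or broadcast address of {network}")

    # A gateway outside the subnet would leave the VM unreachable after reboot.
    if ipaddress.ip_address(gateway) not in network:
        raise ValueError(f"gateway {gateway} is outside {network}")

    return "\n".join(
        [
            f'NETMASK="{netmask}"',
            f'IPADDR="{ipaddr}"',
            'BOOTPROTO="none"',
            f'GATEWAY="{gateway}"',
        ]
    )


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable hosts.
        return False
```

For example, `render_ifcfg_entries("192.0.2.10", "255.255.255.0", "192.0.2.1")` returns the four lines to place in the file (the addresses are documentation placeholders). After the reboot, calling `tcp_reachable(cvm_ip, 9440)` from a machine on the Witness network for each Controller VM address gives a quick pass/fail view of step 12, under the port assumption stated above.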

Changing Metro Availability Protection Domains

The next step was to change the active Metro Availability Protection Domains between Site 1 & 2. It goes without saying that these steps need to be performed very carefully and in the correct order to avoid any disruption to production workloads.

Step by step

    1. Using Nutanix Prism Element on the cluster annotated as Site 1, navigate to “Data Protection > Metro Availability” to list the Active (and, if available, Passive) Protection Domains
    2. Select the “Active” Protection Domain and click “Update” below the listing
      • Do NOT select a “Passive” Protection Domain
    3. Change the “Failure Handling” mode from “Witness” to either “Automatic” or “Manual” and click on “Save”
    4. Back in the Metro Availability Protection Domain listing, the Failure Handling mode will take a little while to change from Witness to either Manual or Automatic
    5. When the change is done, select the same Active Protection Domain again, but this time click “Disable” below the listing
      • This will cause the Metro Availability synchronization to be disabled
      • Prism will give you an alert stating that you need to ensure that all VMs that use the Storage Container related to this Protection Domain are located on (or otherwise migrated to) Site 1
    6. In my case, Site 1 & 2 were set up as Active-Active Metro Availability Protection Domains, resulting in one Active Protection Domain on each Site
      • Therefore, I performed steps 1 – 5 on Site 2 as well for the Active Metro Availability Protection Domain on that side
    7. Next step is to unregister the existing Witness VM on both Site 1 & 2 clusters by clicking on the gear icon in Nutanix Prism and selecting the “Configure Witness” option
    8. Click on “Unregister” to remove the existing Witness from the cluster
    9. Now it is time to introduce the new Witness VM by registering it on both Site 1 & 2 clusters
      • Using the new Witness VM IP address and the “admin” account (with the password changed earlier), register this new Witness on the same “Configure Witness” page in Prism Element on the Site 1 & 2 clusters
    10. The last step is to utilize the new Witness VM by selecting the Active Metro Availability Protection Domain on Site 1, changing the Failure Handling back to “Witness” and clicking on “Enable” on the Protection Domain
      • When clicking on “Enable” to restart the synchronization between Site 1 & 2, Prism will give you an alert stating that data on Site 2 (Passive side) will be overwritten, which is normal in that Active > Passive setup
      • Synchronization can take a long time depending on the amount of data that changed on the Storage Container on Site 1 after the Protection Domain was disabled in step 5
    11. Repeat step 10 on Site 2 on the other Active Metro Availability Protection Domain (if applicable to you as well)
    12. When all Protection Domains are back “in-sync” using “Witness” as Failure Handling, you can perform a final check by navigating to the new Witness VM IP address using a web browser
      • Here you will see a listing of all Protection Domains that you have configured on this Witness using the steps above
      • In my environment, this listing showed that I actually had to deal with 4 clusters with 4 Metro Availability Protection Domains
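
Because the order of these steps matters so much, it can help to write the sequence down as a checklist before touching Prism. The following is a minimal Python sketch of that ordering as I understand it from the steps above. It is a planning aid only: it does not call any Nutanix API, and the function and constant names are hypothetical.

```python
from typing import List, Tuple

# Action labels, kept as constants so a checklist can be compared reliably.
SET_MANUAL = "set Failure Handling to Manual or Automatic"
DISABLE = "disable the Active Protection Domain"
UNREGISTER = "unregister the old Witness"
REGISTER = "register the new Witness"
REENABLE = "set Failure Handling back to Witness and enable"

Step = Tuple[str, str]  # (site name, action)


def witness_swap_plan(sites: List[str]) -> List[Step]:
    """Return the ordered checklist for swapping the Witness.

    `sites` lists every cluster that holds an Active Protection Domain,
    for example ["Site 1", "Site 2"] in an Active-Active setup.
    """
    plan: List[Step] = []

    # Phase 1 (steps 1 to 6): take every Active Protection Domain off
    # the old Witness and disable it, one site at a time.
    for site in sites:
        plan.append((site, SET_MANUAL))
        plan.append((site, DISABLE))

    # Phase 2 (steps 7 to 9): swap the Witness registration on all
    # clusters only after every Protection Domain has been disabled.
    for site in sites:
        plan.append((site, UNREGISTER))
    for site in sites:
        plan.append((site, REGISTER))

    # Phase 3 (steps 10 and 11): hand each Protection Domain to the
    # new Witness and restart synchronization.
    for site in sites:
        plan.append((site, REENABLE))

    return plan


if __name__ == "__main__":
    for number, (site, action) in enumerate(
        witness_swap_plan(["Site 1", "Site 2"]), start=1
    ):
        print(f"{number:2d}. {site}: {action}")
```

Keeping the phases as separate loops makes the key constraint visible: no cluster has its Witness unregistered until every Active Protection Domain has been disabled, and no Protection Domain is switched back to “Witness” until the new Witness is registered everywhere.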

To clean up, I powered off and deleted the old Witness VM through VMware vCenter Server.
