My Nutanix Field Deployment Process
I have been doing field deployments of Nutanix nodes for a couple of years now. My very first deployment was back in 2018, which was successful but did not go very smoothly. Practice makes perfect though, and these deployments have now become a walk in the park. It’s all good fun and time has really flown by. I thought it was about time to share my way of working with you, which hopefully helps you on your own path as a Nutanix engineer.
“Preparation is the key to success”. I couldn’t agree more, as the same applies to Nutanix field deployments.
There are certain key steps involved prior to actually doing the Nutanix field deployment as I will outline further down below. Those steps include requirements gathering, sizing, site surveys, designing, and ordering the actual hardware from the vendor. I will not detail all of these steps but only those which are directly related to deploying Nutanix nodes at the datacenter location or customer server room.
Cluster Deployment Questionnaire
Before each new deployment, I work with the customer to fill in the “Cluster Deployment Questionnaire”, which contains all information required to proceed with the Nutanix node deployments. This questionnaire must be filled in completely and covers the following subjects:
- Customer Information
- Data Center Facility Checklist
- Requirements on Software and Hypervisor
- Nutanix Cluster Setup
- Network (IP) Information
- Clusters Details
- Physical Networking Port Configurations
- Physical Node Placements
- Existing VMware vSphere Configuration
- Advanced Configurations
- Prism Central
- Data Protection Domains
- Witness VM
- Alert & Notifications Information
The best method for retrieving all of the above required information is to host a workshop with the customer, attended by all involved system & network administrators. The primary goal of such a workshop is to gather requirements, which will serve as input for creating the Technical Design documentation and the Cluster Deployment Questionnaire. Additionally, the workshop is a way of introducing the Nutanix Enterprise Cloud platform to the attendees, ensuring that they gain knowledge of their upcoming new Hyperconverged Infrastructure solution.
From my personal experience, my first workshop in 2018 was quite a challenge and I had to rely on my more experienced colleagues to answer the more technically challenging questions. Fortunately, it became easier after each successful workshop because many customer situations are quite similar, with the same types of challenges. I am now actually focused on the more challenging environments where customers have very specific needs, requiring me to utilize the full extent of the functionality and products in the Nutanix Enterprise Cloud platform.
Another key preparation step is to perform a customer Site Survey, which greatly helps to determine any deployment issues upfront. These issues could be related to insufficient power in one or more server racks where the new Nutanix nodes are to be installed. Also, sufficient Top of Rack switch ports need to be confirmed, ensuring each new node can be properly connected via 1GbE, 10GbE, etc. following the customer requirements. Do take pictures of the datacenter or server room situation, which can then be reviewed by others if required. Note that the Site Survey information is to be added to the Cluster Deployment Questionnaire in the Data Center Facility Checklist area. It is also always good practice to check the route from the loading area to the actual server racks and look for any obstacles that need to be removed. You do not want to find yourself in a situation where a simple doorstep stops you in your tracks with a cart full of new Nutanix nodes. That trip from the loading area, where the nodes are delivered and unpacked, to the server racks needs to be as short and smooth as possible.
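The power and switch port checks from the Site Survey can be written down as a small calculation. The following is only a minimal sketch: the `RackSurvey` fields, the function name and all figures are hypothetical and would have to come from your own survey notes and the vendor's node specifications.

```python
from dataclasses import dataclass


@dataclass
class RackSurvey:
    """Site Survey figures for one server rack (illustrative values only)."""
    name: str
    spare_power_watts: int
    free_tor_ports: int


def survey_issues(rack, node_count, watts_per_node, ports_per_node):
    """Return a list of readable issues; an empty list means no blocker was found."""
    issues = []
    if node_count * watts_per_node > rack.spare_power_watts:
        issues.append(f"{rack.name}: not enough power")
    if node_count * ports_per_node > rack.free_tor_ports:
        issues.append(f"{rack.name}: not enough Top of Rack switch ports")
    return issues
```

Running this per rack before ordering cables and scheduling the installation day gives you a short list of items to raise with the customer.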
Server DOA Checks
The final preparation step is to precheck the new Nutanix nodes before these are actually dispatched to the customer office or datacenter location. In my case, any new Nutanix node is first delivered to the PQR HQ in Utrecht (The Netherlands), where it is checked by my System Engineering colleagues for any Dead On Arrival (DOA) components. DOA checks basically involve checking for components that are ‘broken’ (not working) at delivery. These could include faulty RAM modules, storage disks, NICs, etc. In some rare scenarios you could also encounter a node that doesn’t power on at all. In case of a DOA component or node, the vendor is immediately contacted to provide replacement parts a.s.a.p.
At this stage, checks are also done on the installed BIOS and firmware versions, ensuring that these are up to date and match the hardware compatibility list (HCL) provided by Nutanix. This is more applicable when you are dealing with nodes that are not Nutanix NX (Supermicro) appliances, such as Dell or HPE (DL series).
When all preparation steps have been completed, it is time to dispatch the new Nutanix nodes to their “Final Destination” being the datacenter or customer office location. At PQR, we have a Logistics team that handles this work with great care. Once arrived at the datacenter or customer office location, it is time for the fun to begin! 🙂
With “fun” I mean unboxing and installing the new nodes into their respective server rack(s). This is definitely a team effort and should not be done alone, for your own safety and that of the precious Nutanix nodes. I am always accompanied by one of my PQR System Engineering colleagues. Most of the time, a customer system administrator is also present on these installation days. I can’t blame them because it is always a treat to see the new Nutanix nodes being installed!
Before doing anything else, I check the server room and specifically the server rack, ensuring that there is sufficient space and power available. If not, then you can do only part of the deployment work or sometimes nothing at all. In these scenarios, it is better to leave the servers inside their boxes for the best protection whilst any server rack issue(s) are being resolved. I also double check the route from the unboxing (loading area) location to those server racks. Usually, the Site Survey was done some time ago and things could have changed in the meantime. Any obstacles on the route need to be removed.
Before unboxing, I always cross check the number and contents of the boxes with the shipping details forms. These forms are attached to each wooden pallet containing one or more servers. I also have a copy of these details in my email inbox.
In case you have not experienced this before, you will be surprised how much heavy duty packing material is used for each node. Of course this is needed for shipping these nodes globally using air freight transport.
Whilst unboxing, double check that you are not accidentally throwing away important accessories. These could be tucked away inside packing materials. You do not want to find yourself inside a trash dumpster looking for a much needed bag of screws. 😉
Also, check that the Power Distribution Units in the server racks contain enough power outlets of the correct type (Schuko or C14). It has happened to me that only Schuko power outlets were available in a server rack whereas only C14 power cables were provided in the Nutanix boxes. In this case, I had to temporarily improvise by extending power cables and borrowing power from the adjacent server rack. Of course, this was just for a couple of hours to perform the Nutanix Foundation process, which is an activity I will elaborate on further down below. Special note: extending cables is not a permanent solution and does not comply with any datacenter policy out there. 😉
Following the regulations in the particular data center or customer location, I always clean up after myself by disposing of the packing materials in the correct manner. This ensures a safe working environment whilst also not blocking other people doing their work in the same location. And again, double check that the boxes do not contain any important accessories!
Moving Nodes to the Server Rack
Always use proper carts when moving Nutanix nodes from the unpacking area to the server room. I do not stack the nodes too high on a single cart as they are not meant to be stacked in this way: the combined weight presses down on the bottom node, and a tall stack can also slide whilst moving, for example when going up or down ramps. Have your colleague walk with you along the route, as he/she can open doors whilst also looking out for any movement of the nodes on the cart.
When stacking nodes on a cart, I double check that I have added the correct server rails and other accessories (e.g. front bezels & power cables) to each cart. I don’t want to worry about things like searching for the correct server rails or screws when working in the noisy and hot/cold server room later on. Everything needs to be readily available and organized.
You want the trip to go as smoothly as possible because you do not want any surprises along the way with a cart containing servers, which are easily more expensive than my car. Safety first! 🙂
Installing Nodes in the Server Rack
Before lifting the heavy nodes, some work inside the server rack needs to be done first, such as clearing the space where the new nodes are to be mounted. With clearing, I mean carefully moving any ethernet or power cables from other servers possibly already mounted in the rack. There needs to be enough room for the servers to be slid in later on without any existing cables being damaged during that process. You could be in a situation with a ‘messy’ server rack where cables are running everywhere. It is quite possible to overlook just one cable that could get stuck or damaged when mounting the new Nutanix nodes. You can then expect a call from the customer informing you of a Production issue or even an outage, which needs to be avoided at all costs!
The next step is to attach the server rails to the rack using the screws that came in the Nutanix boxes.
It has happened to me one time that I forgot my Phillips screwdriver, which was very much a facepalm moment. Luckily, the data center personnel helped me out that time, otherwise a trip to the local hardware store would have been required. Read on if you want to know what tools and stuff I take along when doing Nutanix field deployments.
When I am 100% certain of the server rack placements then I continue attaching all other server rails in one go. It greatly helps if you do this job with two persons as I always do. One person at each side (front/back) of the server rack.
Next is to attach the rack rail mounts on each node (left and right) and please check that you are mounting it in the correct manner. These mounts can easily get stuck when not done properly. In case you are having trouble, check the documentation in the Nutanix box. In each box there is a document with installation instructions. 😉
Next is to lift (with two persons) the nodes and slide these into the server rails. The mounts on the sides of the nodes and the server rack rails will lock onto one another. The node can then be slid back into the rack and should fit “like a glove”.
In case you have used the incorrect screws to fasten the rails to the server rack then you will notice that the node will not completely slide back into the server rack. Yes, it has happened to me… 😛
I always handle the nodes with great care when lifting and sliding them into the rack. It is quite easy to bend a server chassis component, which in turn leads to not being able to mount the server properly in the rack.
When you feel too much resistance when sliding back a node, do not use more strength as it can lead to bent rails. Just pull the node back out and double check your rack and server rail mounts.
Regarding rack placement of Nutanix nodes, it is absolutely fine to place these right on top of each other without having a couple of “U” rackspace in between. These nodes (as with other rack servers) are built to be stacked: they suck in cold air at the front and push it out at the back.
I did come across a customer once who persisted in having some “U” rackspace between each node. No problem but do remember that you will lose some amount of rackspace in that server rack when doing so. This could become a problem when you want to add more nodes to your Nutanix cluster in the future.
When this work is done, the physical server installation has been completed. This is a nice moment to step back and look at the cool new stack of Nutanix nodes.
Next up: Nutanix Foundation!
My preference is to perform the Nutanix Foundation process (imaging new nodes) after installing the nodes in the server racks. Others prefer to perform this activity before the nodes are shipped to the datacenter location. You could perform this imaging work after having DOA checked the nodes. This method does save time at the datacenter location. Also, when dealing with possible issues when using Nutanix Foundation, you do not have to troubleshoot those issues in the noisy and warm/cold environment that is the datacenter server room.
However, I enjoy the datacenter or customer server rooms and do not have any problems with doing troubleshooting in that environment. I like it! 🙂
Regarding Nutanix Foundation, there are several approaches available for performing this process. These include, but are not limited to, using a Windows/macOS applet or the standalone Foundation virtual machine. I always use the latter as I have used this method starting from day 1 when I imaged my first Nutanix node. Yes, that brings back some good old memories…
Anyway, the Nutanix Foundation standalone VM runs on Oracle VirtualBox, which is installed on my Apple MacBook Pro. It has a small footprint and does not require that many resources: 2 GB RAM, 2 vCPUs & 30 GB storage (SSD preferred) and you are good to go.
When imaging the nodes using Nutanix Foundation, I am always using my trusty Netgear switch and ethernet cables. I have a special bag with everything that I need to perform this activity at a datacenter location. It contains:
- Netgear 16-port Switch
- 20+ Cat5e Ethernet Cables
- 6 Outlet Powerstrip
- 4x Ubiquiti RJ45 / SFP Transceivers
- C14 to Schuko Adapter Cable
- Phillips Screwdriver
- Velcro Cable Ties
- Ear Protection
Do ensure that you have a bag similar to mine when you are off to perform your Nutanix Foundation work. In my view, all the above-listed tools are mandatory and have helped me out when working with specific nodes or facing certain issues at the datacenter location.
Before starting Nutanix Foundation you will need 2 cables per to-be-imaged node connected to your switch and one cable from your laptop/MacBook to that same switch. In my situation, I can hook up a maximum of 7 nodes at one time on my 16-port switch. However, I have also managed to daisy chain two temporary switches allowing me to Foundation 17 nodes in one go. Check out my old blog post on that one.
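The port arithmetic behind that limit is simple enough to write down. This is just a sketch of my own counting rule (two cables per node plus one for the laptop); the function name and parameters are made up for illustration.

```python
def max_nodes_per_batch(switch_ports, cables_per_node=2, laptop_ports=1):
    """Number of nodes that fit on one flat switch in a single Foundation batch."""
    usable_ports = switch_ports - laptop_ports
    if usable_ports < 0:
        raise ValueError("the switch has fewer ports than the laptop needs")
    # Every node needs all of its cables connected, so round down.
    return usable_ports // cables_per_node


print(max_nodes_per_batch(16))  # 7
```

With a 16-port switch, one port goes to the laptop, which leaves 15 ports and therefore room for 7 nodes, with one port left unused.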
I will not go into further details of Nutanix Foundation as it deserves a separate blog post to do it proper justice. However, I do want to share some more Field Installation tips when using Nutanix Foundation in the field.
Whenever possible, use the Foundation Preconfiguration tool available at my.nutanix.com. This tool allows you to enter all configuration details (IP addresses, Nutanix AOS, Hypervisor details, cluster settings, etc.) beforehand and export these settings into a JSON file. This information can then be double checked with the customer ensuring that there are no errors in IP addresses before you are off to the datacenter. It has happened to me that I had to redo a Foundation process for certain nodes because the customer made mistakes when providing the IP addresses. Foundation Preconfiguration ensures that the customer checks the configuration one more time reducing the risk of mistakes in the earlier mentioned Cluster Deployment Questionnaire.
An additional benefit is that this saves some time at the datacenter location because you can import the JSON file into Nutanix Foundation and all configuration fields will be filled out automatically! You just have to double check the settings and start the process.
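Before sending the configuration to the customer for review, a quick sanity check on the IP addresses can catch the obvious mistakes. The sketch below does NOT use the real Foundation Preconfiguration JSON layout: the keys `subnet`, `nodes`, `name`, `ipmi_ip`, `hypervisor_ip` and `cvm_ip` are hypothetical, and it assumes all addresses live in a single subnet, which will not be true for every customer.

```python
import ipaddress
import json


def find_ip_problems(config_text):
    """Return readable problems found in a (hypothetical) node IP plan."""
    config = json.loads(config_text)
    network = ipaddress.ip_network(config["subnet"])
    problems = []
    seen = {}  # address -> label of the first field that used it
    for node in config["nodes"]:
        for role in ("ipmi_ip", "hypervisor_ip", "cvm_ip"):
            raw = node[role]
            label = f"{node['name']}.{role}"
            try:
                address = ipaddress.ip_address(raw)
            except ValueError:
                problems.append(f"{label}: '{raw}' is not a valid IP address")
                continue
            if address not in network:
                problems.append(f"{label}: {address} is outside {network}")
            if address in seen:
                problems.append(f"{label}: {address} is already used by {seen[address]}")
            else:
                seen[address] = label
    return problems
```

An empty list does not prove that the addresses are the right ones, only that they are well-formed, unique and inside the expected range. The customer still needs to confirm them.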
Always make sure to use the latest version of Nutanix Foundation, which is available on portal.nutanix.com. A colleague of mine was once unable to image certain nodes because the Foundation process kept failing at a certain point. Upon researching, this turned out to be a known bug in Nutanix Foundation that was already fixed in a newer release, which was available on the Nutanix Portal. After switching to the latest version, the Foundation process was smooth sailing again for my colleague.
As I have mentioned in another blog post, last year Nutanix made security changes on new nodes coming from the factory. The default IPMI admin password on NX nodes has changed. Make sure to use the correct password for each node in the Nutanix Foundation configuration screen; it is the last screen before starting the Foundation process.
Unfortunately, it does happen sometimes that Nutanix Foundation fails for one node whilst other ones in the same batch have been completed successfully. In these cases, turn off the particular node, remove the power and wait for 1 minute. Thereafter, redo the Nutanix Foundation process for that node only, using a brand new Nutanix Foundation batch. Do not use the same one. In most cases, this will fix the issue and the node will be imaged properly.
On your laptop or MacBook, do not run any programs other than Nutanix Foundation as they may interfere with the process. If your hands are itching to check your social media channels or emails, use your smartphone instead. 😉
Check your cables and temporary switch before going to the datacenter. Do not use any ethernet cables that have broken clips at the connections because any hiccup on the connection between Nutanix Foundation and the nodes could result in a failure. Also, ensure that your temporary switch does not have any VLAN or other configuration present. It needs to be a “flat” unconfigured switch.
When the Foundation process is done, do not start disconnecting all ethernet cables immediately. Using your Nutanix Foundation VM (in my case) you can double check that you can reach all imaged Nutanix nodes using the IPMI, CVM and AHV or ESXi IP addresses. When all is fine, you can make some screenshots of the completed Foundation process and your double checks, so that you can confirm to the customer that the process has been completed.
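If you have many nodes, this double check can be scripted. The following is a rough sketch and not part of Nutanix Foundation: the function names are my own, and probing TCP port 22 is an assumption that may not hold for every address type (a plain ping works just as well).

```python
import socket


def tcp_probe(ip, port=22, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_addresses(nodes, probe=tcp_probe):
    """Return (node name, role, ip) for every address the probe could not reach."""
    failures = []
    for name, addresses in nodes.items():
        for role, ip in addresses.items():
            if not probe(ip):
                failures.append((name, role, ip))
    return failures
```

Calling `unreachable_addresses({"node-1": {"ipmi": "10.0.0.11", "hypervisor": "10.0.0.21", "cvm": "10.0.0.31"}})` (example addresses only) returns an empty list when everything answers. The `probe` argument makes it easy to swap in a different check.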
Top of Rack Switches / Cable Management
My next step is Top of Rack Switch cabling after having completed the Nutanix Foundation process. All that mess we made earlier with the temporary switch and cables needs to be cleaned up and put away in my bag. Yes, it can be quite a mess and especially when you are imaging many nodes at the same time… 😛
From a practical point of view, you can park this work for another day as it can take up lots of time when you are dealing with many nodes. Cable Management needs to be done properly and should not be a rush job. Doing a bad job here not only looks unprofessional but can also lead to heat buildup in the server racks and to severe problems when you have to relocate, add or remove servers in the same rack in the future.
Always make sure to have sufficient spare 10GbE cables available because it does happen that certain cables out of a brand new batch are faulty. Problems with seating a cable firmly in a port also happen due to faulty cable clips. By the way, in some rare cases a NIC on the node is faulty in that it does not hold the cable in place. In these cases, you will need to ask the hardware vendor to provide a replacement NIC. However, this check is part of the earlier mentioned Server DOA Checks, so it should not come up at this stage in the process.
The exact method of cabling each node differs per customer and is based on their requirements, which have been captured in the Cluster Deployment Questionnaire. Usually, each node is connected via two 10GbE cables to two separate ToR switches. Keep a close eye on the length of cable required for each ToR switch connection. Do not use overly long cables for connections that go to a ToR switch in the same server rack. Otherwise, you will have to cable manage too many cables, and a server rack only has so much room for this purpose.
Please use velcro cable ties for cable management instead of zip ties. Zip ties… just don’t. They can damage the cables and are also a pain to remove when you have to add more cables to the same bundle when adding new nodes.
Do not forget to label each cable allowing you to easily track each connection in the setup at a later point in time. This definitely helps when having to troubleshoot possible issues at the site.
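A consistent labeling scheme is easiest to keep when the labels are generated rather than typed by hand. The sketch below assumes the usual layout of two cables per node to two separate ToR switches; the switch names, the `nic0`/`nic1` naming and the "same port number on both switches" rule are hypothetical and should follow whatever is agreed in the Cluster Deployment Questionnaire.

```python
def cable_labels(node_names, switch_names=("tor-a", "tor-b"), first_port=1):
    """Build one label per cable: each node gets one cable to every switch."""
    labels = []
    for offset, node in enumerate(node_names):
        port = first_port + offset  # same port number on each switch
        for nic_index, switch in enumerate(switch_names):
            labels.append(f"{node}/nic{nic_index} <-> {switch}/port{port}")
    return labels


for label in cable_labels(["node-1", "node-2"]):
    print(label)
```

The printed list can go straight into a label printer and into the as-built documentation, so both always match.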
In certain cases, customers want to use different colors for their IPMI (iLO or iDRAC) cables to easily see the difference with the other 10GbE data cables. It all comes down to requirements from the customers and, again, should be captured in the Cluster Deployment Questionnaire.
Check with the customer that all IPMI (iLO or iDRAC) IP addresses are reachable from their office network. However, do note that it is very well possible that the 10GbE connections are not working yet because of post-configuration steps (LACP and VLAN settings) that need to be done first to enable these connections.
When all of the above is done then you have successfully completed a Nutanix Field Deployment! I always take a good moment to enjoy this accomplishment by looking at the nodes spinning away with their blue disk LEDs flashing. Do not forget to put on those nice Nutanix front bezels and lock the server rack doors. Leaving everything tidy is a key part of being a professional in your field.
Next steps are to perform the required post-configuration of the newly deployed Nutanix nodes, which includes the setup of one or more Nutanix clusters. Nutanix Foundation does allow you to also create a Nutanix cluster as part of the process, but I always perform this step afterwards in the comfort of an office or working-from-home environment. Any additional virtual switch configuration (e.g. LAG/LACP) is also a key activity to be performed next. But more on this in other blog posts. 😉
Nutanix Training & Certification
Nutanix offers a great training and certification track, which enables you to perform all (and more) of the above mentioned Nutanix Field Deployment work:
- Nutanix Certified Services Core Infrastructure Professional
Check out my other blog post where I highlight that training, which I attended as an NCS boot camp earlier in 2020.
In case you are interested in other Nutanix training and certification possibilities, have a look at https://www.nutanixuniversity.com.
Thank you for reading this blog post and please do let me know your thoughts in the comments below. 🙂