VSAN 6.5 to 6.6 Upgrade Issues with CLOMD Liveness

Before attempting any upgrade in a production environment, I always test the process and functionality in a lab first. With this in mind I wanted to test the upgrade from VSAN 6.5 to 6.6 in my home lab, and unfortunately I initially didn’t have a whole lot of success. I’ve now fixed all the issues. In case anyone else runs into the same problems, I’d like the resolution to be readily available. I haven’t had the time to identify the root cause, but I have resolved the issues.

First, let me be clear: this is on UNSUPPORTED hardware. These issues may never occur in a fully supported and compliant production environment, and I have not seen these VSAN upgrade issues in a fully supported environment. However, we all tend to run our labs on unsupported hardware, so I’m sure I won’t be the only one who comes across them. If other people do, the resolution is pretty simple. I have seen the same issues three times in three separate (unsupported) environments.

The upgrade was from VSAN 6.5 to VSAN 6.6. VSAN isn’t a standalone product but is built into vSphere, so the upgrade is as simple as upgrading ESXi. I was running ESXi 6.5.0 (Build 4887370) and upgraded to ESXi 6.5.0 (Build 5310538).

It has been a long time (and I mean a LONG time) since I have seen an ESXi purple screen. But soon after I upgraded my environment to ESXi 6.5 (Build 5310538), my hosts started purple screening. I had to take a screenshot because this is such a rare sight. It only happened once, and since the fixes below were applied it has never happened again.

[Screenshot: ESXi purple screen]

The VSAN upgrade process is very straightforward:

  • Upgrade vCenter Server
  • Upgrade ESXi hosts
  • Upgrade the disk format version

Immediately after the upgrade I started receiving vMotion alerts, and my VMs wouldn’t migrate between hosts. There didn’t appear to be any vMotion configuration issues, and vMotion had been working perfectly before the upgrade. I tested connectivity with vmkping between the hosts’ vMotion vmkernel IPs, and it failed. There was no network connectivity between the hosts on the vMotion vmkernel port!

The vMotion fix:
I found that simply deleting the existing vMotion vmkernel adapter and re-creating it with exactly the same configuration fixed the issue. I had to do this on every host in the cluster, and after that vMotion started working again.
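If you’d rather script the delete-and-recreate than click through it on every host, here is a minimal Python sketch. It only builds the esxcli commands for you to run from each host’s SSH session. The adapter name, port group and IP addresses are hypothetical examples. It assumes the vMotion vmkernel sits on a standard vSwitch and the default TCP/IP stack. Check each command against “--help” on your build before running it, and run it over the management network, never over the adapter you are removing.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class VmkConfig:
    """Settings recorded from the existing adapter before deleting it (example values only)."""
    name: str        # e.g. "vmk1" (hypothetical)
    portgroup: str   # standard vSwitch port group the adapter is attached to
    ipv4: str
    netmask: str
    mtu: int = 1500


def recreate_vmotion_vmk(cfg: VmkConfig) -> list[str]:
    """Build (but do not run) esxcli commands that delete the vMotion vmkernel
    adapter and re-create it with identical settings on the default TCP/IP stack."""
    iface = f"--interface-name={cfg.name}"
    pg = shlex.quote(cfg.portgroup)  # port group names may contain spaces
    return [
        f"esxcli network ip interface remove {iface}",
        f"esxcli network ip interface add {iface} --portgroup-name={pg} --mtu={cfg.mtu}",
        f"esxcli network ip interface ipv4 set {iface} --ipv4={cfg.ipv4} "
        f"--netmask={cfg.netmask} --type=static",
        f"esxcli network ip interface tag add {iface} --tagname=VMotion",
    ]


def vmkping_checks(cfg: VmkConfig, peer_ips: list[str]) -> list[str]:
    """Ping each peer host's vMotion IP out of this specific vmkernel adapter."""
    return [f"vmkping -I {cfg.name} {ip}" for ip in peer_ips]


if __name__ == "__main__":
    example = VmkConfig("vmk1", "vMotion", "192.168.20.11", "255.255.255.0")
    for cmd in recreate_vmotion_vmk(example) + vmkping_checks(example, ["192.168.20.12"]):
        print(cmd)
```

Running the vmkping checks before and after the change gives you the same connectivity test described above, from the exact adapter you re-created.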

CLOMD Liveness

This brings me to the next, far more critical issue: CLOMD Liveness. After resolving the vMotion alerts, I ran a quick VSAN health check and found that my hosts were now reporting a “CLOMD Liveness” issue. This is concerning because CLOMD (Cluster Level Object Manager Daemon) is a key component of VSAN. CLOMD runs on every ESXi host in a VSAN cluster. It is responsible for creating new objects, for communication between hosts during data moves and evacuations, and for repairing existing VSAN objects. Put simply, without it you cannot create any new objects on VSAN.

[Screenshot: VSAN health check reporting CLOMD Liveness]

If you want to see this for yourself (in a test environment only!), SSH to an ESXi host and stop the CLOMD daemon by running “/etc/init.d/clomd stop”. Then try to create a new object or run the VM creation proactive VSAN test. You will get the error “Cannot complete file creation operation”. Start the daemon again afterwards with “/etc/init.d/clomd start”.

[Screenshot: “Cannot complete file creation operation” error]

And the output from the proactive VSAN test is “Failed to create object. A CLOM is not attached. This could indicate that the clomd daemon is not running”.

[Screenshot: proactive VSAN test failure]

If CLOMD isn’t running, you’re not at risk of losing any data. It just means that new objects can’t be created. Even so, I would treat getting it running again as critical.

A CLOMD Liveness failure can occur for a number of reasons. The VMware KB article is here: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2109873

To check whether the CLOMD daemon is running, execute the following command on each host:

/etc/init.d/clomd status

The results showed that CLOMD was not running. Even after I restarted the service, it stopped running again a short time later.
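CLOMD kept stopping a short time after it was restarted, so a single status check taken straight after a restart can look healthy. The Python sketch below polls the status over SSH for a while after a restart. It assumes root SSH with key-based login is enabled on the host. It also assumes the status output contains “is running” or “not running”; check what your build actually prints and adjust is_running to match.

```python
from __future__ import annotations

import subprocess
import time
from typing import Callable


def ssh_clomd_status(host: str) -> str:
    """Run the status check on one host over SSH (assumes root SSH with key-based login)."""
    result = subprocess.run(
        ["ssh", f"root@{host}", "/etc/init.d/clomd status"],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout + result.stderr


def is_running(status_output: str) -> bool:
    # ASSUMPTION: the init script reports "... is running" or "... is not running";
    # adjust these strings to whatever your build actually prints.
    text = status_output.lower()
    return "is running" in text and "not running" not in text


def watch_clomd(host: str, checks: int = 10, interval_s: float = 30.0,
                get_status: Callable[[str], str] = ssh_clomd_status,
                sleep: Callable[[float], None] = time.sleep) -> int | None:
    """Poll CLOMD after a restart. Returns the index of the first check that found it
    stopped, or None if it stayed up for about (checks - 1) * interval_s seconds."""
    for n in range(checks):
        if not is_running(get_status(host)):
            return n
        if n < checks - 1:
            sleep(interval_s)
    return None
```

For example, watch_clomd("esx01.lab.local") (a hypothetical host name) returns None only if CLOMD stays up for roughly the whole polling window. A number means it dropped again, which is the symptom described above.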

[Screenshot: clomd status output showing the daemon not running]

The VSAN CLOMD Liveness fix:
Having learned from the vMotion vmkernel issue, I immediately tried deleting and re-creating the VSAN vmkernel on each host, and this fixed the issue. However, this is a little more difficult than the vMotion fix. The moment you delete a host’s VSAN vmkernel, that host is partitioned from the cluster, so you need to be careful how you do it.

Place the host in Maintenance Mode first! We aren’t going to lose any data, so you don’t need to evacuate all data. However, I would recommend at least selecting “Ensure data accessibility from other hosts”. “No Data Migration” is generally only recommended if you are shutting down all nodes in the VSAN cluster, or possibly for a non-intrusive action such as a quick reboot.

Once the host is in Maintenance Mode, delete the existing VSAN vmkernel and re-create it with the same settings. I would then reboot the host for good measure. Once the host is back up, exit Maintenance Mode and move on to the next host.
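The per-host procedure above can also be expressed as a command plan. The following is a minimal Python sketch that only builds the esxcli commands and does not run them. It assumes the VSAN vmkernel is on a standard vSwitch and that your ESXi 6.5 build supports these esxcli namespaces; confirm with “esxcli vsan network --help”. The adapter name, port group and addresses are hypothetical, so replace them with the values recorded from your host. Run the commands from the host’s own shell over the management network, because the VSAN vmkernel disappears part-way through.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class VsanVmk:
    """Settings recorded from the existing VSAN vmkernel (hypothetical example values)."""
    name: str        # e.g. "vmk2"
    portgroup: str
    ipv4: str
    netmask: str
    mtu: int = 1500


def vsan_vmk_fix_plan(vmk: VsanVmk) -> dict[str, list[str]]:
    """Build (not run) the per-host sequence: 'before_reboot' from the host's shell,
    'after_reboot' once the host is back up."""
    i = f"--interface-name={vmk.name}"
    return {
        "before_reboot": [
            # "Ensure data accessibility from other hosts", i.e. no full evacuation
            "esxcli system maintenanceMode set --enable=true --vsanmode=ensureObjectAccessibility",
            f"esxcli vsan network remove {i}",   # host is partitioned from here on
            f"esxcli network ip interface remove {i}",
            f"esxcli network ip interface add {i} --portgroup-name={shlex.quote(vmk.portgroup)} --mtu={vmk.mtu}",
            f"esxcli network ip interface ipv4 set {i} --ipv4={vmk.ipv4} --netmask={vmk.netmask} --type=static",
            f"esxcli vsan network ipv4 add {i}",  # re-tag the new adapter for VSAN traffic
            f"esxcli system shutdown reboot --reason={shlex.quote('Re-created VSAN vmkernel')}",
        ],
        "after_reboot": [
            "/etc/init.d/clomd status",
            "esxcli system maintenanceMode set --enable=false",
        ],
    }
```

Keeping the two phases separate mirrors the manual process: nothing after the reboot should run until the host is back and CLOMD reports as running.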

Again, I stress that I have only seen this issue on unsupported hardware.

My VSAN Upgrade Process

  1. Upgrade vCenter
  2. Upgrade each ESXi server
  3. Upgrade the disk format version
  4. Run a VSAN health check!
  5. If you have a CLOMD issue, then for each ESXi host in the VSAN cluster:
    1. Place the host in Maintenance Mode
    2. Delete and re-create the vMotion vmkernel
    3. Delete and re-create the VSAN vmkernel
    4. Reboot the ESXi host
    5. Exit Maintenance Mode
    6. Move on to the next host
  6. Run a VSAN health check again
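Step 5 is deliberately sequential: only one host at a time should be in Maintenance Mode with its VSAN vmkernel missing. Here is a minimal Python sketch of that loop. The health check and the per-host fix are passed in as functions; these are hypothetical hooks that you would wire to your own tooling. Re-checking each host before moving on to the next is my own safety addition rather than a step from the list.

```python
from __future__ import annotations

from typing import Callable, Iterable


def rolling_remediation(hosts: Iterable[str],
                        needs_fix: Callable[[str], bool],
                        remediate: Callable[[str], None],
                        healthy: Callable[[str], bool]) -> list[str]:
    """Apply the per-host CLOMD fix strictly one host at a time.

    needs_fix  -> result of the health check for this host (step 4)
    remediate  -> Maintenance Mode, re-create vmkernels, reboot, exit MM (step 5)
    healthy    -> re-check this host before touching the next one (assumed extra step)
    """
    fixed: list[str] = []
    for host in hosts:
        if not needs_fix(host):
            continue
        remediate(host)
        if not healthy(host):
            # Stop here so a second host is never taken down while this one is unhealthy.
            raise RuntimeError(f"{host} is still unhealthy; not moving on to the next host")
        fixed.append(host)
    return fixed
```

Raising instead of continuing is the important design choice here. A half-fixed cluster with two partitioned hosts is far worse than one host waiting for attention.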

Other posts you might be interested in:

SuperMicro VSAN HCIBench

SSL Certificate Tool (CertGenVVD)

Single Node SuperMicro Home Lab

SuperMicro vs Intel NUC

Single Node SuperMicro Home Lab

Building a home lab can be an expensive endeavour, so if there’s a much cheaper and easier option that still achieves the same outcome, then why not take it? Who needs all that physical hardware when you can build your entire lab environment on a single SuperMicro server? The SuperMicro E200-8D and E300-8D are both micro servers that are ideal for this type of home lab build. Have a look at my previous article on this topic (SuperMicro vs Intel NUC), where I explain why the SuperMicro is such a great option. These micro servers take up next to no space, consume minimal power and support up to 128GB of RAM.

Thanks to a colleague of mine, Dale Shaw (@Shawski500), who has loaned me his SuperMicro E200-8D server with 128GB RAM, I am able to show the process of building a home lab on a single server.

Home Lab Concept

OK, so the concept here is pretty simple. Take a single server with 128GB RAM and build 4 nested ESXi hosts with 32GB RAM each that share the resources of the single physical host. Why 4 nested ESXi hosts? Not only does the RAM split nicely into 32GB per host, but this also allows you to build two 2-node clusters in your environment (i.e. management and compute clusters).
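The RAM split is simple division, and a tiny Python sketch makes the arithmetic explicit. The reserve_gb parameter is my own addition, not part of this build, for anyone who wants to hold some RAM back for the physical host instead of sharing all of it.

```python
def ram_per_nested_host(physical_gb: int, nested_hosts: int, reserve_gb: int = 0) -> float:
    """GB of RAM to give each nested ESXi VM when the physical RAM is split evenly."""
    if nested_hosts < 1 or reserve_gb >= physical_gb:
        raise ValueError("need at least one nested host and some RAM left to share")
    return (physical_gb - reserve_gb) / nested_hosts


print(ram_per_nested_host(128, 4))                # 32.0, the 32GB per host used in this lab
print(ram_per_nested_host(128, 4, reserve_gb=8))  # 30.0 if 8GB is held back
```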
In perfect timing, William Lam (virtuallyGhetto) has just published two new blog posts that we can leverage for our home lab build.

Utilising one or both of the above capabilities, we can simplify our home lab build. If you haven’t tried it, this is a great opportunity to try out Project USB to SDDC to kick-start your home lab build. William has already tried it on the SuperMicro E200-8D, and the SDDC environment was up and running without any effort. If we can do it on the floor of the Melbourne Convention Centre, then you can do it at home!

[Photo: MelbourneVMUG]

As is often the case with a home lab build, the idea is to manually install all of the components in order to learn how they work, break things, fix them and make it your own. So this article provides the details you need to build your home lab using the ESXi virtual appliances that William offers. How you then choose to build your actual lab environment is up to you.

What You’ll Need

Let’s get started with the essentials. Here is what you’ll need to build your new lab:

  • Server with sufficient RAM and CPU (I’m using a SuperMicro E200-8D with 128GB RAM)
  • Local disk or NAS for storage
  • ESXi 6.5d ISO
  • ESXi 6.5d virtual appliance
  • Virtual router (pfSense or similar)
  • Nested VM for AD, DNS, DHCP, CA…etc

Building the Physical ESXi host

This is where all the critical configuration is, so don’t rush into building the nested ESXi hosts straight away. The first step is to prep your server (BIOS and IPMI Updates, Network configuration, BIOS settings and all the normal stuff) then install ESXi to it. I won’t go into any details around this process as you should be familiar with installing ESXi 🙂

Here is my physical ESXi host. Just to confirm, it is a SuperMicro E200-8D with 6 CPUs and 128GB RAM. For this demonstration I’m using local SSD storage rather than my NAS. I will configure VSAN within the nested environment on top of the single underlying 1TB SSD, with the NVMe as the cache tier.

[Screenshot: physical ESXi host summary]

Network

The networking configuration on the physical ESXi host is important to get right. If it isn’t right, your nested ESXi hosts won’t be able to communicate with each other. A massive benefit of the SuperMicro servers is that they have multiple NICs, so I can run separate vSwitches for my nested environment. I’ve built a separate vSwitch called “Nested ESXi” and assigned it my 10GbE NICs. The physical ESXi management network is on its own vSwitch, the default “vSwitch0”, which is assigned two 1GbE NICs.

[Screenshot: physical host vSwitch configuration]

On the “Nested ESXi” vSwitch I have created a single port group, also called “Nested ESXi”. The Nested ESXi switch and port group need the following configuration:

  • Allow Promiscuous Mode.
  • Allow Forged Transmits.
  • Allow MAC Changes.
  • VLAN 4095, which is a “trunk” port group and will allow you to run multiple VLANs in your nested lab.
  • MTU needs to be set to Jumbo Frames if you are going to use NSX in your nested lab.
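The same vSwitch and port group settings can be applied from the physical host’s shell with esxcli instead of the UI. Here is a minimal Python sketch that only builds the commands. The uplink names (vmnic2, vmnic3) are hypothetical, so substitute your 10GbE NICs. Pass an MTU (for example 9000) only if you plan to run NSX.

```python
from __future__ import annotations

import shlex


def nested_vswitch_commands(uplinks: list[str], name: str = "Nested ESXi",
                            mtu: int | None = None) -> list[str]:
    """Build esxcli commands for a standard vSwitch plus a same-named port group set up
    for nested ESXi: promiscuous mode, forged transmits and MAC changes allowed, VLAN 4095."""
    vs = f"--vswitch-name={shlex.quote(name)}"
    pg = f"--portgroup-name={shlex.quote(name)}"
    cmds = [f"esxcli network vswitch standard add {vs}"]
    cmds += [f"esxcli network vswitch standard uplink add --uplink-name={u} {vs}" for u in uplinks]
    if mtu is not None:  # jumbo frames, only needed for NSX
        cmds.append(f"esxcli network vswitch standard set {vs} --mtu={mtu}")
    cmds += [
        f"esxcli network vswitch standard policy security set {vs} "
        "--allow-promiscuous=true --allow-forged-transmits=true --allow-mac-change=true",
        f"esxcli network vswitch standard portgroup add {pg} {vs}",
        # 4095 makes the port group a trunk, passing every VLAN tag to the nested hosts
        f"esxcli network vswitch standard portgroup set {pg} --vlan-id=4095",
    ]
    return cmds


for cmd in nested_vswitch_commands(["vmnic2", "vmnic3"]):
    print(cmd)
```

The security policy is set at the vSwitch level, so the port group inherits it unless you override it there.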

[Screenshot: Nested ESXi port group security and VLAN settings]

Storage

The next step is to configure the storage. In my case I am going to run a nested VSAN lab and the SuperMicro E200-8D is fitted with a 350GB NVMe and a 1TB SSD, so I need to create local datastores for each of these storage tiers.

[Screenshot: local NVMe and SSD datastores]

Nested ESXi Hosts

Now that the underlying networking and storage are configured, we can start deploying the nested ESXi hosts. You can deploy as many or as few nested hosts as you like. Thanks to the nested ESXi appliances this is now an extremely simple process: just deploy the OVA file 4 times to build 4 nested ESXi hosts. During each deployment you will need to enter the host’s management configuration, so at this point you need to decide on your Management VLAN ID and the host IP addresses. At this early stage DNS isn’t critical. If you’ve already decided on your DNS server IP address, though, enter all the details during the deployment.

The VLAN ID will likely be 0 or blank. Because the physical port group is configured as VLAN 4095 (a trunk port group), you can use multiple VLANs in your nested environment, and you can use either a Management VLAN or no VLAN. Once the nested ESXi hosts are configured, we will deploy a virtual router configured with our nested home lab VLANs, and we can then configure VLANs for VSAN, vMotion, nested management and so on. For now, all we need is for the ESXi hosts to communicate with each other without routing to any other VLANs. Just make sure they’re all configured on the same network and are accessible from your home network. Don’t power on the nested ESXi hosts yet.
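Before deploying, it helps to write down the management addressing for all four hosts. Here is a minimal Python sketch that allocates consecutive addresses on your home network. The subnet, starting offset and host names are hypothetical. VLAN 0 reflects the untagged management network described above.

```python
from __future__ import annotations

import ipaddress


def nested_host_plan(subnet: str, first_offset: int, count: int = 4,
                     name_prefix: str = "nested-esxi", vlan_id: int = 0) -> list[dict[str, str]]:
    """Allocate consecutive management IPs for the nested ESXi hosts on one subnet."""
    net = ipaddress.ip_network(subnet)
    plan = []
    for n in range(count):
        ip = net.network_address + first_offset + n
        # Refuse the network/broadcast addresses and anything outside the subnet
        if ip not in net or ip in (net.network_address, net.broadcast_address):
            raise ValueError(f"{ip} is not a usable host address in {net}")
        plan.append({
            "hostname": f"{name_prefix}{n + 1:02d}",
            "ip": str(ip),
            "netmask": str(net.netmask),
            "vlan": str(vlan_id),  # 0 = untagged management network
        })
    return plan


for host in nested_host_plan("192.168.1.0/24", first_offset=51):
    print(host)
```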

Now that the nested ESXi hosts are deployed, we need to configure their CPU, RAM and storage before powering them on. You should now have 4 virtual ESXi hosts on your physical ESXi server.

[Screenshot: 4 nested ESXi VMs on the physical host]

  • Each nested ESXi host will be deployed with 2 NICs. Check that both of these are connected to the “Nested ESXi” port group and set to “connected”.
  • If you are going to run VSAN in your nested home lab like I am, configure each nested ESXi host with 3 hard disks of suitable sizes.
    • Hard Disk 1 shouldn’t be modified, as this is where ESXi is installed.
    • Hard Disk 2 is configured as the read/write cache and is placed on the “Local NVMe” datastore.
    • Hard Disk 3 is your VSAN capacity disk and should be as large as you can afford. Place it on the “Local SSD” datastore.
  • The CPU should be set to use all of the available CPU cores.
  • The RAM is set to the shared amount, in my case 32GB.
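With four nested hosts it is easy to miss a setting on one VM, and a checklist in code can catch that. Here is a minimal Python sketch. The NestedHostSpec structure is hypothetical; you would fill it in by hand from each VM’s settings or from whatever tooling you use. The datastore names and the 32GB share match this lab.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NestedHostSpec:
    """Hypothetical summary of one nested ESXi VM's settings."""
    name: str
    nics: list[tuple[str, bool]]      # (port group, connected) per NIC
    disk_datastores: dict[int, str]   # hard disk number -> backing datastore
    vcpus: int
    ram_gb: int


def check_spec(spec: NestedHostSpec, physical_cores: int, ram_share_gb: int = 32) -> list[str]:
    """Return human-readable problems; an empty list means the VM matches the checklist."""
    problems = []
    if len(spec.nics) != 2:
        problems.append(f"{spec.name}: expected 2 NICs, found {len(spec.nics)}")
    for n, (portgroup, connected) in enumerate(spec.nics, start=1):
        if portgroup != "Nested ESXi" or not connected:
            problems.append(f"{spec.name}: NIC {n} must be connected to 'Nested ESXi'")
    expected = {2: "Local NVMe", 3: "Local SSD"}  # cache disk, capacity disk
    for disk, datastore in expected.items():
        if spec.disk_datastores.get(disk) != datastore:
            problems.append(f"{spec.name}: Hard Disk {disk} should be on '{datastore}'")
    if spec.vcpus != physical_cores:
        problems.append(f"{spec.name}: set vCPUs to all {physical_cores} cores")
    if spec.ram_gb != ram_share_gb:
        problems.append(f"{spec.name}: set RAM to {ram_share_gb}GB")
    return problems
```

Hard Disk 1 is deliberately not checked, since it should be left exactly as the appliance deployed it.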

[Screenshot: nested ESXi VM hardware settings]

Configure all of your nested ESXi hosts in the same way, and then power them all on.

Accessing Your Nested Lab

There are a number of ways in which you can configure access to your new lab and this entirely depends on what you have available to you. You have deployed your nested ESXi hosts on your physical home network, so you can now connect to each of the ESXi hosts and configure them to suit your new lab environment.

The next challenge is building out the VMs and networking within your nested lab. You could simply deploy all of your VMs to your physical home network, on the same subnet as your ESXi management. This will work, but it’s not really what I’d build a home lab for. I’ve configured this nested home lab to use a trunk port group so that I can run multiple VLANs in it. I want to be able to deploy and use NSX and VSAN, both of which require VLAN IDs and communication between ESXi hosts. To use VLAN IDs within your nested lab and route between them, you’re going to need a nested virtual router. There are many options out there, but for simplicity’s sake I have used pfSense. It is downloaded as an ISO file, and when you boot a VM from the ISO it builds the virtual router for you.

Here is a quick overview of my pfSense configuration for this lab. The WAN network is the untagged native network, and the LAN networks are the nested VLANs. If you want to do something similar, let me know and I’ll try to put together a more detailed “next steps” follow-up covering nested networking configuration, vCenter deployment, VSAN configuration and NSX.
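Each nested VLAN becomes a LAN interface on pfSense and needs a non-overlapping subnet and a gateway address. Here is a minimal Python sketch that validates such a plan. The VLAN IDs, subnets and the convention of using the first usable address as the gateway are all my assumptions, not the author’s actual configuration.

```python
from __future__ import annotations

import ipaddress


def pfsense_lan_gateways(plan: dict[str, tuple[int, str]]) -> dict[str, tuple[int, str]]:
    """Validate a nested VLAN plan and pick a gateway for each pfSense LAN (VLAN)
    interface. Using the first usable address as the gateway is an assumed convention."""
    seen_nets: dict[str, ipaddress.IPv4Network | ipaddress.IPv6Network] = {}
    seen_vlans: set[int] = set()
    gateways: dict[str, tuple[int, str]] = {}
    for name, (vlan, cidr) in plan.items():
        if not 1 <= vlan <= 4094:
            raise ValueError(f"{name}: VLAN {vlan} must be 1-4094 (4095 is the physical trunk)")
        if vlan in seen_vlans:
            raise ValueError(f"{name}: VLAN {vlan} is already used")
        net = ipaddress.ip_network(cidr)
        for other, other_net in seen_nets.items():
            if net.version == other_net.version and net.overlaps(other_net):
                raise ValueError(f"{name} ({net}) overlaps {other} ({other_net})")
        seen_vlans.add(vlan)
        seen_nets[name] = net
        gateways[name] = (vlan, str(next(net.hosts())))
    return gateways


print(pfsense_lan_gateways({
    "Nested Management": (10, "10.0.10.0/24"),   # hypothetical VLANs and subnets
    "vMotion": (20, "10.0.20.0/24"),
    "VSAN": (30, "10.0.30.0/24"),
}))
```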

You now have 4 ESXi hosts running on a single SuperMicro server that you can use to build your home lab however you like. Here is what my new lab environment looks like.

[Screenshot: the finished nested lab environment]



SSL Certificate Tool (CertGenVVD)


Generating CA-signed SSL certificates for all of the various VMware products is one part of a new implementation that I never look forward to. I’ve done it many times in my years at VMware, but I still avoid it if possible. Not surprisingly, a lot of customers also reach out to VMware for support when renewing certificates. Even just generating the certificates is a time-consuming and error-prone process.

  • First you have to write out the config files for all of the certificates.
  • Then generate a .csr file for each of those certificates.
  • Submit the .csr and get a CA signed SSL certificate back.
  • Download the root and intermediate certificates.
  • Create SSL chains from the root, intermediate and SSL certificates. This is where one of the most common mistakes occurs: mixing up the order of the certificates in the chain.
  • Using OpenSSL, create the .pem, .p7b or .pfx files that the specific product you’re implementing requires.
  • And then you can start to install the SSL Certificates for each product.
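Chain ordering is where things most often go wrong, so here is a minimal Python sketch that concatenates PEM files in the order TLS servers normally present them. The server certificate comes first, then each intermediate (each one the issuer of the certificate before it), then the root. It only checks PEM framing, not signatures. Some VMware products expect a different combination, such as the root supplied separately, so check each product’s documentation for the exact format it wants.

```python
from __future__ import annotations

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def build_chain(leaf_pem: str, intermediates: list[str], root_pem: str) -> str:
    """Concatenate PEM certificates leaf first, then intermediates (each one the issuer
    of the certificate before it), then the root. Checks PEM framing only, not signatures."""
    parts = [leaf_pem, *intermediates, root_pem]
    cleaned = []
    for n, pem in enumerate(parts):
        text = pem.strip()
        if not (text.startswith(BEGIN) and text.endswith(END)) or text.count(BEGIN) != 1:
            raise ValueError(f"part {n} is not exactly one PEM certificate")
        cleaned.append(text)
    return "\n".join(cleaned) + "\n"
```

Typical use is to read the signed certificate, the intermediate and the root from disk and write build_chain(...) out as the product’s chain file. Passing the pieces as separate arguments makes the order explicit instead of relying on the order in which files happen to be concatenated.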

If you haven’t done this process hundreds of times, it’s quite a time-consuming task, and if you get it wrong it takes a lot of time to resolve the issues. It’s just one of those things that I don’t think anyone really enjoys doing. Until now, that is. I’ve spent the last week playing with the VVD CertGen tool and I’ve actually enjoyed it. So much so that I’ve even written a PowerShell script to make the process even easier, and I’d like to share my work with the community.

First of all, I am no PowerShell expert and of course I can’t take responsibility for anything that happens with this script. The VMware CertGen tool does all the hard work, my script simply takes the input from a .csv file and then creates all of the config files which are then input into the CertGen tool. I’ve wrapped it all up into a simple process that anyone can use. The CertGen tool outputs CA Signed SSL Certificates for all of the products and automatically creates the various different certificate formats that each product requires. All that is left to do is upload the SSL certificate to the product.

CertGen Tool and Scripts

The first thing you need to do is review VMware KB article KB2146215 on the CertGen tool, which provides the instructions for using it. I will cover the simple steps. The KB article, however, details the prerequisites and configuration of the CA servers, the supported platforms and product compatibility. It also explains use cases beyond what I’ll cover here. This blog article covers using my script to automatically generate the configuration files from a .csv, plus a simplified set of instructions for using the CertGen tool.

At the bottom of the KB Article, in the attachments section, download the CertGenVVD zip file.

[Screenshot: CertGenVVD zip file in the KB article attachments]

Extract the zip file to a location that is easy to access from the command line, such as C:\Temp. The zip file contains the “ConfigFiles” folder, a “default.txt” file and the “CertGenVVD-3.0.ps1” script file.

Open the “ConfigFiles” folder and delete all of the existing config files, or you can delete the entire folder, the script will just re-create the folder anyway. Normally you would use these files to manually update the configuration details for each of your products. We don’t need to do this because we will use a csv file and then build all of these files using the script. You can also delete the “default.txt” file as we won’t need this.

Download my Certificate Config Tool which will include the csv configuration file “CertConfig.csv” and the “CertConfig.ps1” script. Extract this zip file to the same location as the CertGen Tool. You should now have a file structure that looks like this.

[Screenshot: Cert Tool folder structure]

I have provided the above instructions so that you can download the most up-to-date version of the CertGenVVD tool and use it with my script. If you would prefer a simpler approach, you can download the pre-configured Cert Tool zip file, which contains my configuration scripts plus the CertGenVVD-3.0.ps1 script in a pre-configured directory. Just download the zip file and extract it to a directory such as C:\Temp.

Cert Tool Package Download

Creating the SSL Certificates

I originally created this spreadsheet for use with the VMware Validated Design (VVD) Configuration Workbook, and its values are linked to the configuration cells within the VVD workbook. When using the VVD Deployment Tool, the certificate configuration is entirely automated, from generating the configuration files all the way to implementing the certificates for each product. I have simply exported the spreadsheet as a csv file and shared it as-is so that it can be used more widely outside of the VVD process.

Update the Cert Config csv

The first step, therefore, is to update the values within the csv file. I have pre-populated it with the configuration details I used in a test lab so that you can see how it works.

[Screenshot: CertConfig.csv sample values]

  • Every row with a “Name” on it relates to an individual certificate that will be created
  • If the DNS1 column contains an “n/a” then the certificate for that row will be skipped. I have included certificates for a number of fake hosts in the configuration csv that you can leave as n/a or delete the row if you don’t need them.
  • Some products require additional SANs (Subject Alternative Names), so each DNS column holds an additional SAN for the certificate. If you don’t require additional names, leave the cells blank.
  • The domain name must be populated because the PowerShell script handles the short DNS name separately. It combines the short DNS name and the domain name to create the FQDN.
  • Some products require the IP address. You can populate that here or leave it blank for the products that you only want to have a DNS record and not locked to an IP address.
  • The FileName column is the name of the configuration file that gets created. The name and folder structure of the Signed Certificates is created by the CertGenVVD Tool and is based on the Common Name inside the certificate (the FQDN).
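To make the rules above concrete, here is a minimal Python sketch of how one csv row turns into a certificate definition. It is not CertConfig.ps1, which is PowerShell. The exact column headers (Name, DNS1, DNS2 and so on, Domain, IP, FileName) are my assumption based on the description above, so check them against the header row of your CertConfig.csv.

```python
from __future__ import annotations

import csv
import io


def load_cert_rows(csv_text: str) -> list[dict[str, object]]:
    """Turn CertConfig-style rows into certificate definitions (hypothetical headers)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    dns_cols = [c for c in (reader.fieldnames or []) if c.upper().startswith("DNS")]
    certs = []
    for row in reader:
        name = (row.get("Name") or "").strip()
        if not name:
            continue  # only rows with a Name describe a certificate
        if (row.get("DNS1") or "").strip().lower() == "n/a":
            continue  # "n/a" in DNS1 means this certificate is skipped
        domain = (row.get("Domain") or "").strip()
        if not domain:
            raise ValueError(f"{name}: a domain name is required to build the FQDN")
        # Short DNS name + domain -> FQDN; blank DNS cells simply add no SAN
        fqdns = [f"{row[c].strip()}.{domain}" for c in dns_cols if (row.get(c) or "").strip()]
        if not fqdns:
            raise ValueError(f"{name}: DNS1 is empty")
        ip = (row.get("IP") or "").strip()
        certs.append({
            "file": (row.get("FileName") or "").strip(),
            "common_name": fqdns[0],
            "sans": fqdns + ([ip] if ip else []),
        })
    return certs
```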

Once the csv file is complete save it with the same filename “CertConfig.csv” in the same directory as the “CertConfig.ps1” file. The script expects this file to be in the same folder as the script, as does the CertGenVVD script.

Prepare the Microsoft CA Server

To use a Microsoft Certificate Authority server, you must ensure that it meets the prerequisites that the CertGenVVD script requires. This is fairly simple to do if you have administrator rights on the CA.

As part of the Certificate Authority services, you must ensure that the following additional role services are installed and configured:

  • Certification Authority Web Enrollment
  • Certificate Enrollment Web Service

You will also need a certificate template that is used to sign the certificates. Open your CA server settings, expand the folder structure, right-click “Certificate Templates” and select “Manage”. Right-click the “Web Server” template and select “Duplicate Template”. I create a VMware-specific template with the following configuration.

  • Template Name – VMware.
  • Compatibility of Windows Server 2003 and upwards.
  • In the Subject Name tab, make sure “Supply in the request” is selected.
  • In the Extensions tab.
    • Delete all the application policies.
    • In Key Usage select “Signature is proof of origin (nonrepudiation)”.

[Screenshot: VMware certificate template settings]

Close the Certificate Templates console and add the new VMware template to the CA. Right-click the “Certificate Templates” folder, select “New” and then “Certificate Template to Issue”. Find the “VMware” template and click OK.

Prepare the Operating System

On the Windows machine that you intend to run the scripts from, you will need to install OpenSSL and Java. Without these installed, the CertGenVVD script will not work.

You should download the most up-to-date versions online. For ease of use, however, I am using the following versions, which are bundled with the VVD Deployment Tool:

Win32 OpenSSL
Java 8u60

Download and install OpenSSL and Java. Once they are installed, you will need to add them to your environment PATH. To do this, right-click on your computer, go to “Properties” and then “Advanced System Settings”. In the “Advanced” tab, click “Environment Variables”.

[Screenshot: Environment Variables dialog]

Create a new system variable called JAVA_HOME and set it to the path of the Java installation folder.

[Screenshot: JAVA_HOME system variable]

Scroll down through the “System Variables” and find “Path”. Edit the Path variable and add the OpenSSL and Java bin folder paths to the end of it, using a semicolon “;” as the separator.
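Once the variables are set, open a new command prompt so that it picks them up. A quick check at this point saves a confusing failure later. Here is a minimal Python sketch. It assumes the checks that matter are that JAVA_HOME points at an existing folder and that openssl and java can be found on PATH. It is not part of CertGenVVD’s own validation.

```python
from __future__ import annotations

import os
import shutil
from typing import Mapping


def missing_prereqs(env: Mapping[str, str] = os.environ) -> list[str]:
    """List problems that would stop CertGenVVD finding Java or OpenSSL (assumed checks)."""
    problems = []
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append(f"JAVA_HOME points to a missing folder: {java_home}")
    path = env.get("PATH", "")
    for tool in ("openssl", "java"):
        # shutil.which honours PATHEXT on Windows, so "openssl" also finds openssl.exe
        if shutil.which(tool, path=path) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems


print(missing_prereqs() or "OpenSSL and Java look ready")
```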

Execute the CertConfig Script

  1. Change Directory to the location of the CertConfig.ps1 script. In my case this is C:\Temp\CertTool
  2. Execute the “CertConfig.ps1” script
  3. Answer the default configuration questions:
    1. Organisation
    2. OU
    3. Location
    4. State
    5. Country
    6. Key Size (Default is set to 2048)
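These answers map onto standard X.509 subject attributes, and “Country” is the two-letter ISO code (AU, US and so on), not the full country name. Here is a minimal Python sketch of that mapping, rendered as an OpenSSL “-subj” string purely for illustration. It is not the format CertConfig.ps1 writes into its config files.

```python
from __future__ import annotations


def openssl_subject(fqdn: str, org: str, ou: str, locality: str, state: str,
                    country: str) -> str:
    """Build an OpenSSL -subj string from the prompted answers (illustration only)."""
    if len(country) != 2 or not country.isalpha():
        raise ValueError("Country must be a two-letter code, e.g. AU")
    fields = [("C", country.upper()), ("ST", state), ("L", locality),
              ("O", org), ("OU", ou), ("CN", fqdn)]
    for key, value in fields:
        if "/" in value:  # '/' separates attributes in the -subj format
            raise ValueError(f"{key} contains '/', which would break the -subj string")
    return "".join(f"/{key}={value}" for key, value in fields)


print(openssl_subject("vc01.labrat.local", "LabRat", "IT", "Melbourne", "VIC", "AU"))
```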

[Screenshot: CertConfig.ps1 configuration prompts]

That’s it! You will now see a new folder called “ConfigFiles” within the Cert Tool directory, fully populated with the configuration files for each of your certificates.

Execute the CertGenVVD Script

  1. Set the execution policy to RemoteSigned with the following command:
    Set-ExecutionPolicy RemoteSigned
  2. Do a test run of the CertGenVVD script by first running it with the -validate parameter. This checks that everything is configured correctly and ready to issue the CA-signed certificates.
    ./CertGenVVD-3.0.ps1 -validate
  3. Execute the “CertGenVVD-3.0.ps1” script with the required parameters (as defined in KB article KB2146215):
    ./CertGenVVD-3.0.ps1 -MSCASigned -attrib "CertificateTemplate:VMware" -config "labrat.local\labrat-CA" -username labrat\Administrator -password VMware1!

The -attrib parameter references the CA server’s certificate template that will be used to sign these certificates. You created this template when preparing the CA server.

The -config parameter is the name of your CA Server.
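If you drive the tool from another script, building the arguments as a list avoids the quoting problems that come with typing CA names and passwords into a shell. Here is a minimal Python sketch that mirrors the two invocations above. The powershell.exe “-ExecutionPolicy” and “-File” parameters are standard, and the CertGenVVD parameters are the ones shown in the steps. The certgen_args helper itself is hypothetical.

```python
from __future__ import annotations


def certgen_args(template: str, ca_config: str, username: str, password: str,
                 validate: bool = False, script: str = "CertGenVVD-3.0.ps1") -> list[str]:
    """Argument list for running CertGenVVD from another program (e.g. subprocess.run).
    -ExecutionPolicy here applies only to this one PowerShell process."""
    base = ["powershell.exe", "-ExecutionPolicy", "RemoteSigned", "-File", script]
    if validate:
        return base + ["-validate"]  # step 2: test run
    return base + [
        "-MSCASigned",
        "-attrib", f"CertificateTemplate:{template}",  # template created on the CA
        "-config", ca_config,                          # the CA server, e.g. labrat.local\labrat-CA
        "-username", username,
        "-password", password,
    ]
```

For example, subprocess.run(certgen_args("VMware", r"labrat.local\labrat-CA", r"labrat\Administrator", password), cwd=r"C:\Temp\CertTool", check=True) on the Windows machine. The script will still prompt for the p12/pem password in that console.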

You will be asked to enter a password for the p12/pem certificates. This is required.

[Screenshot: p12/pem password prompt]

It only takes a minute, and the script does all the rest of the work. When it finishes, you will be presented with a list of the certificates that were generated, which are located in a new directory called “SignedByMSCACerts”.

