Sometimes it happens that you miss certain features on a Nutanix Enterprise Cloud based on AHV. One of those features is the ability to browse files inside a Storage Container using a File Explorer type of GUI. Recently, this happened to me when I needed to delete several unneeded Storage Containers but was unable to do so due to an error message stating: “Container <x> contains multiple vDisk(s) not marked for removal”.
Before trying to delete any Storage Container (‘SC’) via the Prism GUI, I did the following:
- Confirmed that no VMs had vDisks on the involved SCs
  - I did this in Prism via "VM > Table > Select VM > Update" and then checking the various vDisks and associated SCs
- Checked that there were no image files (e.g. ISOs) stored on the involved SCs
  - I did this also in Prism via "Gear Icon > Image Configuration" and then checking the various images and associated SCs
During my above checks, I did delete a few unneeded VMs and images, which brought the "Used Capacity" down on a few of the involved SCs. However, this did not happen for all SCs; some seemed stuck at a certain amount of GiBs, or even MiBs in some cases.
Thereafter, I reviewed the configured Protection Domains (Async DR) and confirmed that there were no unnecessary local snapshots present. Because there was no Production workload deployed on this Nutanix AHV cluster, I went ahead and deleted all local snapshots from all Protection Domains. Additionally, I deleted all active schedules, preventing new snapshots from being created for the time being. Unfortunately, this also did not result in the SCs becoming empty. "I would very much like a GUI-based File Explorer now." is what went through my head at that point in time.
Nevertheless, whatever the Prism GUI lacks in functionality is almost always available via ncli commands. So, I executed the following steps:
- Connect via SSH to a CVM inside the same cluster
- List all SCs via "ncli ctr list" and note down the short ID after the double colon (::) in the "Id" field of the involved SC
Id : 0005a3b8-6b49-5a63-56fc-ac1f6b3bb9db::6742
Uuid : a7485f51-63a8-4b69-9aea-c3083f49807c
Name : SRV011
Storage Pool Id : 0005a3b8-6b49-5a63-56fc-ac1f6b3bb9db::9
Storage Pool Uuid : 93624aa5-de1b-4faf-bed4-d6efe2bd4a7d
Free Space (Logical) : 46.01 TiB (50,593,557,299,200 bytes)
Used Space (Logical) : 0 bytes
Allowed Max Capacity : 46.01 TiB (50,593,557,299,200 bytes)
Used by other Containers : 5.38 TiB (5,912,803,643,392 bytes)
Explicit Reservation : 0 bytes
Thick Provisioned : 0 bytes
Replication Factor : 2
Oplog Replication Factor : 2
NFS Whitelist Inherited : true
Container NFS Whitelist :
VStore Name(s) : SRV011
Random I/O Pri Order : SSD-PCIe, SSD-SATA, DAS-SATA
Sequential I/O Pri Order : SSD-PCIe, SSD-SATA, DAS-SATA
Compression : on
Compression Delay : 0 mins
Fingerprint On Write : off
On-Disk Dedup : off
Erasure Code : off
Software Encryption : off
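If you have many SCs, picking the short ID out of the output by hand gets tedious. As a sketch (the helper name "get_ctr_id" is my own, and it assumes the "Field : value" layout shown above, with the "Id" line preceding the "Name" line), the part after the "::" can be extracted with awk:

```shell
# Hypothetical helper: read 'ncli ctr list' output on stdin and print the
# short ID (the part after '::') for the container with the given name.
# Assumes the 'Field : value' layout shown above.
get_ctr_id() {
  awk -v name="$1" '
    $1 == "Id"   { id = $NF; sub(/.*::/, "", id) }  # remember latest Id
    $1 == "Name" && $NF == name { print id; exit }  # match container name
  '
}

# Usage on a CVM (example): ncli ctr list | get_ctr_id SRV011
```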
- List all files (vDisks) in that SC via "vdisk_config_printer --container_id=<short ID>"
nutanix@NTNX-x-A-CVM:10.0.1.27:~$ vdisk_config_printer --container_id=6742
vdisk_id: 27297806
vdisk_name: "NFS:4611686018454686421"
vdisk_size: 4398046511104
container_id: 6742
to_remove: true
creation_time_usecs: 1594737785715515
mutability_state: kImmutable
vdisk_creator_loc: 8
vdisk_creator_loc: 27229315
vdisk_creator_loc: 2039232
nfs_file_name: "a75a66e5-9081-4dc5-949e-c48440296bec"
may_be_parent: true
originating_cluster_id: xxxxxxxxxxxx
originating_cluster_incarnation_id: xxxxxxxxxxxx
originating_vdisk_id: 23637936
vdisk_uuid: "3415d5ee-0c25-4a47-96c6-28c499fc6b28"
chain_id: "85cffefd-d2a4-439a-8cdb-0d102c96e12a"
last_modification_time_usecs: 1594759382806740

vdisk_id: 28672520
vdisk_name: "NFS:4611686018456061604"
parent_vdisk_id: 27859255
vdisk_size: 4398046511104
container_id: 6742
creation_time_usecs: 1594808111371359
mutability_state: kImmutableSnapshot
closest_named_ancestor: "NFS:4611686018455307570"
avoid_vblock_copy_when_leaf: true
vdisk_creator_loc: 7
vdisk_creator_loc: 27229202
vdisk_creator_loc: 41124027
nfs_file_name: "a75a66e5-9081-4dc5-949e-c48440296bec"
may_be_parent: true
parent_nfs_file_name_hint: "a75a66e5-9081-4dc5-949e-c48440296bec"
never_hosted: false
has_complete_data: true
vdisk_uuid: "a52b57b7-e797-4c49-a9fc-be5cb0626ff9"
chain_id: "85cffefd-d2a4-439a-8cdb-0d102c96e12a"
vdisk_snapshot_uuid: "49eb540a-a1c5-4dfe-9ba3-bc3b8286a79d"
last_modification_time_usecs: 1594808141902078
- Make note of which vDisks do not have the “to_remove: true” parameter assigned
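The manual inspection in this step can also be scripted. Below is a sketch (the function name "vdisks_not_marked" is mine) that prints the vdisk_id of every record lacking "to_remove: true", assuming the blank-line-separated record format that vdisk_config_printer emits:

```shell
# Hypothetical filter: print the vdisk_id of each vDisk record that is NOT
# marked for removal. Records are treated as blank-line-separated blocks
# (awk paragraph mode), as in the vdisk_config_printer output above.
vdisks_not_marked() {
  awk -v RS='' '
    !/to_remove: true/ {
      for (i = 1; i <= NF; i++)
        if ($i == "vdisk_id:") { print $(i + 1); break }
    }
  '
}

# Usage on a CVM (example):
# vdisk_config_printer --container_id=6742 | vdisks_not_marked
```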
In my case, most of the involved SCs had just a few vDisks not marked for removal. Apparently, these were zombie snapshot files left behind by Async DR Protection Domains that were no longer relevant. Using the same SSH session as above, I executed the following steps to mark those vDisks for removal and subsequently delete the entire SC:
- For each noted vDisk, change the disk configuration via "edit_vdisk_config --vdisk_id=<vdisk_id> --editor=vim"
- This action requires you to add a new line "to_remove: true" after the "container_id: 6742" line (in my example above)
nutanix@NTNX-x-A-CVM:10.0.1.27:~$ edit_vdisk_config --vdisk_id=28672520 --editor=vim
VDisk config update is successful
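For the record, the interactive vim edit boils down to a one-line text change. Expressed as sed purely for illustration (I still performed the edit through edit_vdisk_config, since the config must be written back via that tool; "mark_for_removal" is my own name):

```shell
# Illustration of the edit itself: append 'to_remove: true' right after the
# 'container_id: ...' line of a vDisk config (GNU sed 'a' command).
mark_for_removal() {
  sed '/^container_id: /a to_remove: true'
}
```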
- When all noted vDisks are changed, proceed and delete the SC via “ncli container remove name=SRV011 ignore-small-files=true force=true”
nutanix@NTNX-x-A-CVM:10.0.1.27:~$ ncli container remove name=SRV011 ignore-small-files=true force=true
Storage container deleted successfully
The above approach resolved my inability to delete those SCs. All but one, that is. Apparently, that single remaining SC had 80+ zombie snapshot files… As time was not on my side, I needed another approach to accomplish the same end result as above.
I ended up whitelisting an old VMware ESXi host on that particular SC, enabling me to mount that SC as an NFS Datastore. After that, it was a matter of deleting all files using the Datastore browser functionality, unmounting the SC, and deleting the SC via ncli (as above). I executed the following steps:
- In Nutanix Prism, navigate to “Storage > Table” and select the particular SC
- Click on "Update" and enter the IP address of the involved ESXi host (in IP/netmask form) in the "Filesystem Whitelist" area
- Log on to the ESXi host and create a new NFS Datastore
  - In "Datastore name", enter a name for the Datastore as visible on the ESXi host (e.g. 'Nutanix-SRV012')
  - In "Folder", enter the SC name in Nutanix (case sensitive) preceded by a "/" (e.g. '/SRV012')
  - In "Server", enter the Nutanix CVM IP or Nutanix Cluster VIP (e.g. '10.0.1.27')
- With the Datastore created, select it on the ESXi host and click on "Files" to enter the browser
- Now, select all files and click on “Delete” to empty the Datastore
- Right-click the Datastore and click on “Unmount Datastore”
- Connect via SSH to a CVM inside the Nutanix cluster (e.g. '10.0.1.27')
- Proceed and delete the SC via ‘ncli container remove name=SRV012 ignore-small-files=true force=true’
nutanix@NTNX-x-A-CVM:10.0.1.27:~$ ncli container remove name=SRV012 ignore-small-files=true force=true
Storage container deleted successfully
- That’s it! This approach saved me a lot of time compared to the first approach.
Note that the second approach did not actually remove the files (vDisks) from the SC when I performed the deletion using the Datastore browser. Upon checking the SC contents with the same command as in the first approach, I noticed that those 80+ zombie snapshot files now simply had the "to_remove: true" parameter properly set.
Also note that I performed all the above steps on a Nutanix environment which did not have any Production workloads running. In case you encounter the same issue when trying to delete an SC in a Production environment, always check with Nutanix Support to ensure the four-eyes principle.