Hi everyone,
I am setting up a vSphere 5.5 environment in the lab that is to use stateless caching to USB. I am under the gun to get this into production yesterday.
The hardware is a Cisco UCS B200-M3 blade with B200M3.2.2.2.0.04282014643 firmware and a Cisco-provided 4GB USB disk. There are 4 Fibre Channel LUNs presented, but these are meant to be data only.
I have been able to validate that the USB disk works by manually installing ESXi to it, and booting from it.
The short version:
We have the vCenter/Auto Deploy/DNS/DHCP/TFTP infrastructure validated and working. Auto Deploy rules are working. Applying the Host Profile with "Enable stateless caching to a USB disk on the host" fails with an error that the cache does not meet specification and that the host needs to be rebooted. Rebooting once or many times does not resolve the error. The USB disk is blank; the host will not boot from it. I tried switching the Host Profile setting to "Enable stateless caching on the host" with the first argument being "usb", selected to overwrite any existing VMFS volumes, and selected to ignore any SSD devices. The same error occurs.
The long version:
I have spent a fair bit of time troubleshooting this, and believe that I found the root cause: the USB device is being claimed for passthrough when it should be left alone for ESXi to mount and use.
Here's the story:
/var/log # lsusb
Bus 02 Device 04: ID 0624:0402 Avocent Corp.
Bus 02 Device 03: ID 13fe:3100 Kingston Technology Company Inc. 2/4 GB stick
Bus 02 Device 02: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 02 Device 01: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 01 Device 02: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 01 Device 01: ID 1d6b:0002 Linux Foundation 2.0 root hub
* note: the USB stick is a Cisco part number made by UNIGEN, not Kingston
/var/log # dmesg | grep 13fe
2014-09-05T18:07:23.336Z cpu8:33604)<6>usb 2-1.3: New USB device found, idVendor=13fe, idProduct=3100
2014-09-05T18:07:26.950Z cpu6:33693)<6>usb 2-1.3: Vendor: 0x13fe, Product: 0x3100, Revision: 0x0100
/var/log # dmesg | grep 2-1.3
2014-09-05T18:07:23.216Z cpu8:33604)<6>usb 2-1.3: new high speed USB device number 3 using ehci_hcd
2014-09-05T18:07:23.336Z cpu8:33604)<6>usb 2-1.3: New USB device found, idVendor=13fe, idProduct=3100
2014-09-05T18:07:23.336Z cpu8:33604)<6>usb 2-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
2014-09-05T18:07:23.336Z cpu8:33604)<6>usb 2-1.3: Product: PSE4000S3
2014-09-05T18:07:23.336Z cpu8:33604)<6>usb 2-1.3: Manufacturer: UNIGEN
2014-09-05T18:07:23.336Z cpu8:33604)<6>usb 2-1.3: SerialNumber: 40E11B0086921B5A
2014-09-05T18:07:23.336Z cpu8:33604)<6>usb 2-1.3: usbfs: registered usb0203
2014-09-05T18:07:26.950Z cpu6:33693)<6>usb 2-1.3: Vendor: 0x13fe, Product: 0x3100, Revision: 0x0100
2014-09-05T18:07:26.950Z cpu6:33693)<6>usb 2-1.3: Interface Subclass: 0x06, Protocol: 0x50
2014-09-05T18:07:27.237Z cpu6:33693)<6>usb-storage 2-1.3:1.0: interface is claimed by usb-storage
2014-09-05T18:07:27.237Z cpu6:33693)<6>usb 2-1.3: device is not available for passthrough
2014-09-05T18:08:42.445Z cpu18:33546)<6>usb-storage 2-1.3:1.0: unclaiming vmhba32
2014-09-05T18:08:42.445Z cpu18:33546)<6>usb 2-1.3: device is available for passthrough
It appears that the USB device is being unclaimed and made available for passthrough to VMs. This is not the desired behaviour.
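For anyone who wants to pull the same timeline out of their own dmesg output, here is a minimal Python sketch. It only parses the four claim/unclaim lines quoted above; the regular expression and the `claim_events` helper are my own illustration (not a VMware tool), and the line format is assumed to match what is shown here.

```python
import re
from datetime import datetime

# Lines copied from the dmesg output above.
DMESG = """\
2014-09-05T18:07:27.237Z cpu6:33693)<6>usb-storage 2-1.3:1.0: interface is claimed by usb-storage
2014-09-05T18:07:27.237Z cpu6:33693)<6>usb 2-1.3: device is not available for passthrough
2014-09-05T18:08:42.445Z cpu18:33546)<6>usb-storage 2-1.3:1.0: unclaiming vmhba32
2014-09-05T18:08:42.445Z cpu18:33546)<6>usb 2-1.3: device is available for passthrough
"""

# timestamp, cpu:world, log level, driver, device/interface, message
LINE = re.compile(r"^(\S+)Z cpu\d+:\d+\)<\d+>(\S+) (\S+): (.*)$")


def claim_events(text, port="2-1.3"):
    """Return (timestamp, message) pairs for claim/passthrough events on a USB port."""
    events = []
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        stamp, _driver, device, message = match.groups()
        if not device.startswith(port):
            continue
        if "claim" in message or "passthrough" in message:
            when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")
            events.append((when, message))
    return events


events = claim_events(DMESG)
for when, message in events:
    print(when.time(), message)

# Seconds between usb-storage claiming the stick and releasing it again.
gap = (events[-1][0] - events[0][0]).total_seconds()
print(f"claimed for {gap:.3f} s before being released")
```

On the lines above this reports that usb-storage held the device for roughly 75 seconds before it was unclaimed.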
To find out what my USB device is called, I ran:
esxcli storage core device list | less (output shortened for clarity)
mpx.vmhba32:C0:T0:L0
Display Name: Local USB Direct-Access (mpx.vmhba32:C0:T0:L0)
Vendor: UNIGEN
Model: PSE4000S3
Based on my understanding, there is a way to prevent a device from being made available for passthrough by marking it perennially reserved. This can be done in the host profile, which I've done.
Also, I turned off the USB arbitrator service and tried to reapply the Host Profile to no avail:
/etc/init.d/usbarbitrator stop
So I kept on digging for the reason why ESXi is not writing the cache to USB.
Looking at syslog.log, here's what I found. (results redacted and shortened)
- 2014-09-05T17:47:46Z 2014-09-05 17:47:46,938 Host Profiles[40280]: INFO: Now caching to disk...^@ <-- this is good!
2014-09-05T17:47:55Z HostProfileManager: [2014-09-05 17:47:55,380 root INFO] Scanning mpx.vmhba32:C0:T0:L0 for any installs ...^@
2014-09-05T17:47:55Z HostProfileManager: [2014-09-05 17:47:55,715 root INFO] gpt
487 255 63 7831552
1 2048 7829503 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
^@
2014-09-05T17:47:55Z HostProfileManager: [2014-09-05 17:47:55,942 root INFO] ^@
- 2014-09-05T17:48:01Z HostProfileManager: [2014-09-05 17:48:01,945 root INFO] Found nothing on mpx.vmhba32:C0:T0:L0.^@ <-- this is good!
2014-09-05T17:48:02Z HostProfileManager: [2014-09-05 17:48:02,279 root INFO] gpt
487 255 63 7831552
1 2048 7829503 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
^@
2014-09-05T17:48:02Z HostProfileManager: [2014-09-05 17:48:02,280 root INFO] Fresh install. Using GPT^@
2014-09-05T17:48:02Z HostProfileManager: [2014-09-05 17:48:02,280 root INFO] Using the standard, minimum partition layout.^@
2014-09-05T17:48:02Z HostProfileManager: [2014-09-05 17:48:02,280 root INFO] Checking USB device...^@
2014-09-05T17:48:02Z HostProfileManager: [2014-09-05 17:48:02,807 root INFO] gpt
0 0 0 0
1 64 8191 C12A7328F81F11D2BA4B00A0C93EC93B 128
5 8224 520191 EBD0A0A2B9E5443387C068B6B72699C7 0
6 520224 1032191 EBD0A0A2B9E5443387C068B6B72699C7 0
7 1032224 1257471 9D27538040AD11DBBF97000C2911D1B8 0
8 1257504 1843199 EBD0A0A2B9E5443387C068B6B72699C7 0
^@ <-- this is good!
2014-09-05T17:48:02Z HostProfileManager: [2014-09-05 17:48:02,807 root INFO] Preparing Visor volumes on disk /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0...^@ <-- this is good!
2014-09-05T17:48:03Z HostProfileManager: [2014-09-05 17:48:03,112 root INFO] stderr: create fs deviceName:'/vmfs/devices/disks/mpx.vmhba32:C0:T0:L0:8', fsShortName:'vfat', fsName:'(null)'
deviceFullPath:/dev/disks/mpx.vmhba32:C0:T0:L0:8 deviceFile:mpx.vmhba32:C0:T0:L0:8
Checking if remote hosts are using this device as a valid file system. This may take a few seconds...
Creating vfat file system on "mpx.vmhba32:C0:T0:L0:8" with blockSize 1048576 and volume label "none".
Filesystem was created but mount failed on device "mpx.vmhba32:C0:T0:L0:8".: Not found. ^@ <-- ERROR!
Going back to try and find out what happened to my USB storage, I found:
esxcli storage core device list | less
mpx.vmhba32:C0:T0:L0
Display Name: Local USB Direct-Access (mpx.vmhba32:C0:T0:L0)
Has Settable Display Name: false
Size: 0
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path:
Vendor: UNIGEN
Model: PSE4000S3
Revision: PMAP
SCSI Level: 2
Is Pseudo: false
Status: dead timeout
Is RDM Capable: false
Is Local: true
Is Removable: true
Is SSD: false
Is Offline: false
Is Perennially Reserved: false
Queue Full Sample Size: 0
Queue Full Threshold: 0
Thin Provisioning Status: unknown
Attached Filters:
VAAI Status: unsupported
Other UIDs: vml.0000000000766d68626133323a303a30
Is Local SAS Device: false
Is Boot USB Device: false
No of outstanding IOs with competing worlds: 32
The USB disk is in "dead timeout" status with a size of 0, and the "Perennially Reserved" setting from the Host Profile had no effect (it still shows false).
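If you want to check the same fields on several hosts, a rough Python sketch like the one below can turn the "esxcli storage core device list" text into a dictionary. The `parse_device` and `problems` helpers are hypothetical names of mine, and the parser assumes one device per input and the "Key: Value" layout shown above.

```python
# A subset of the esxcli output quoted above.
DEVICE_LIST = """\
mpx.vmhba32:C0:T0:L0
Display Name: Local USB Direct-Access (mpx.vmhba32:C0:T0:L0)
Size: 0
Devfs Path:
Vendor: UNIGEN
Model: PSE4000S3
Status: dead timeout
Is Removable: true
Is Perennially Reserved: false
Is Boot USB Device: false
"""


def parse_device(text):
    """Parse a single-device 'Key: Value' listing into a dict."""
    device = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ": " in line:
            key, value = line.split(": ", 1)
            device[key] = value.strip()
        elif line.endswith(":"):
            device[line[:-1]] = ""  # field present but empty, e.g. Devfs Path
        else:
            device["Name"] = line  # the device identifier header line
    return device


def problems(device):
    """List the symptoms discussed in this post that the device shows."""
    found = []
    if device.get("Status") != "on":
        found.append("status is %r" % device.get("Status"))
    if device.get("Size") == "0":
        found.append("size is 0")
    if device.get("Is Perennially Reserved") != "true":
        found.append("not perennially reserved")
    return found


dev = parse_device(DEVICE_LIST)
print(dev["Name"], problems(dev))
```

The comparison against a Status of "on" assumes that is what a healthy device reports; adjust it if your hosts show something different.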
However, I was able to prove that the USB device worked fine, at least for a while:
esxcli storage core device stats get | less
mpx.vmhba32:C0:T0:L0
Device: mpx.vmhba32:C0:T0:L0
Successful Commands: 471
Blocks Read: 7455
Blocks Written: 0
Read Operations: 309
Write Operations: 0
Reserve Operations: 0
Reservation Conflicts: 0
Failed Commands: 85
Failed Blocks Read: 0
Failed Blocks Written: 0
Failed Read Operations: 0
Failed Write Operations: 0
Failed Reserve Operations: 0
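To put a number on "worked fine, at least for a while", here is a small Python sketch that computes the share of failed commands from the counters above. It assumes that "Successful Commands" and "Failed Commands" are disjoint counts, which I have not confirmed in the documentation; `parse_stats` is just an illustrative helper.

```python
# Counters copied from the stats output above.
STATS_TEXT = """\
Device: mpx.vmhba32:C0:T0:L0
Successful Commands: 471
Blocks Read: 7455
Blocks Written: 0
Read Operations: 309
Write Operations: 0
Failed Commands: 85
Failed Read Operations: 0
Failed Write Operations: 0
"""


def parse_stats(text):
    """Return the integer counters from a 'Name: value' listing."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


stats = parse_stats(STATS_TEXT)

# Assumption: total commands = successful + failed.
total = stats["Successful Commands"] + stats["Failed Commands"]
failed_share = stats["Failed Commands"] / total

print(f"{stats['Failed Commands']} of {total} commands failed ({failed_share:.1%})")
print("writes recorded:", stats["Write Operations"])
```

Under that assumption, about 15% of the commands issued to the device failed, and the counters show no write operations at all.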
Looking at the VMkernel log, I see this:
less /var/log/vmkernel.log
2014-09-05T19:03:50.242Z cpu9:39820 opID=9efa2c3c)World: 14296: VC opID hostd-8b72 maps to vmkernel opID 9efa2c3c
2014-09-05T19:03:55.446Z cpu9:39820 opID=252dfa49)World: 14296: VC opID 58F84D94-00000680-7b-4e maps to vmkernel opID 252dfa49
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)ScsiPath: 5151: DeletePath : adapter=vmhba32, channel=0, target=0, lun=0
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)ScsiDevice: 3612: Can't unregister device mpx.vmhba32:C0:T0:L0 because it is in use. OpenCount:1 InternalOpenCount:0 RefCount:2 FilterCount:0
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)ScsiDevice: 3623: Device mpx.vmhba32:C0:T0:L0 was in use by worldId 0
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)WARNING: NMP: nmpUnclaimPath:1502: NMP device "mpx.vmhba32:C0:T0:L0" quiesce state change failed: Busy
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)WARNING: ScsiPath: 3708: Path vmhba32:C0:T0:L0 is being removed
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)WARNING: ScsiPath: 3914: Failed to issue command 0x0 (cmdSN 0x0) on path vmhba32:C0:T0:L0: No connection
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)ScsiPath: 4874: Path vmhba32:C0:T0:L0 could not be unclaimed from plugin, status Busy. Continue path unclaiming
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)WARNING: ScsiScan: 1758: Could not delete path vmhba32:C0:T0:L0
I confirmed that mpx.vmhba32:C0:T0:L0 is truly unavailable by trying to read from it.
/dev/disks # ls -l ./mpx*
-rw------- 1 root root 4009754624 Sep 5 19:42 ./mpx.vmhba32:C0:T0:L0
-rw------- 1 root root 4161536 Sep 5 19:42 ./mpx.vmhba32:C0:T0:L0:1
-rw------- 1 root root 262127616 Sep 5 19:42 ./mpx.vmhba32:C0:T0:L0:5
-rw------- 1 root root 262127616 Sep 5 19:42 ./mpx.vmhba32:C0:T0:L0:6
-rw------- 1 root root 115326976 Sep 5 19:42 ./mpx.vmhba32:C0:T0:L0:7
-rw------- 1 root root 299876352 Sep 5 19:42 ./mpx.vmhba32:C0:T0:L0:8
/dev/disks # cat ./mpx.vmhba32\:C0\:T0\:L0\:1
cat: read error: Input/output error
(Yes, I know that a healthy device would just return a bunch of garbage, but it should not return an error.)
So it appears that the USB stick is recognized, partially configured (partitions are written), but fails at some point before ESXi is able to mount it to write the cache to it.
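One way to back up the "partitions are written" part is to check that the partition table from the "Checking USB device..." log entry agrees with the sizes that ls -l reports in /dev/disks. Here is a minimal Python sketch of that arithmetic; it assumes 512-byte sectors, and the dictionaries are simply hand-copied from the output above.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors

# Partition number -> (first sector, last sector), from the syslog entry above.
PARTITIONS = {
    1: (64, 8191),
    5: (8224, 520191),
    6: (520224, 1032191),
    7: (1032224, 1257471),
    8: (1257504, 1843199),
}

# Partition number -> size in bytes, from `ls -l` in /dev/disks.
LS_SIZES = {
    1: 4161536,
    5: 262127616,
    6: 262127616,
    7: 115326976,
    8: 299876352,
}


def partition_bytes(first, last, sector_bytes=SECTOR_BYTES):
    """Size of a partition given its inclusive first and last sectors."""
    return (last - first + 1) * sector_bytes


# Partitions whose computed size disagrees with the device node size.
mismatches = {
    number
    for number, (first, last) in PARTITIONS.items()
    if partition_bytes(first, last) != LS_SIZES[number]
}

for number, (first, last) in sorted(PARTITIONS.items()):
    print(number, partition_bytes(first, last), LS_SIZES[number])
print("mismatches:", sorted(mismatches))
```

All five sizes line up, so the new layout did reach the device nodes; the failure comes afterwards, when the vfat filesystem on partition 8 is mounted.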
I was hoping to reset the USB bus by disabling and re-enabling the ESXi kernel USB and USB-storage modules, but that didn't seem to work - it was a bit of a long shot.
esxcli system module set --enabled=false --module=usb
esxcli system module set --enabled=true --module=usb
esxcli system module set --enabled=false --module=usb-storage
esxcli system module set --enabled=true --module=usb-storage
Has anyone else seen this behaviour? Any help would be greatly appreciated.
Thanks,
Michal