Channel: VMware Communities : All Content - ESXi

Stateless caching to USB problems with ESXi 5.5U1 on Cisco UCS


Hi everyone,

 

I am setting up a vSphere 5.5 environment in the lab that is to use stateless caching to USB.  I am under the gun to get this into production yesterday.

 

The hardware is a Cisco UCS B200-M3 blade running firmware B200M3.2.2.2.0.04282014643, with a Cisco-provided 4 GB USB disk. Four Fibre Channel LUNs are also presented, but these are intended for data only.

 

I have been able to validate that the USB disk works by manually installing ESXi to it, and booting from it.

 

The short version:

 

We have the vCenter/Auto Deploy/DNS/DHCP/TFTP infrastructure validated and working, and the Auto Deploy rules are working. Applying the Host Profile with "Enable stateless caching to a USB disk on the host" fails with an error stating that the cache does not meet specification and that the host needs to be rebooted. Rebooting once or many times does not resolve the error. The USB disk stays blank, and the host will not boot from it. I also tried switching the Host Profile setting to "Enable stateless caching on the host" with "usb" as the first argument, with the options to overwrite any existing VMFS volumes and to ignore any SSD devices selected. The same thing happens.

 

The long version:

 

I have spent a fair bit of time troubleshooting this and believe I have found the root cause: the USB disk is being made available for passthrough when it should be left to ESXi to mount and use.

 

Here's the story:

 

/var/log # lsusb
Bus 02 Device 04: ID 0624:0402 Avocent Corp.
Bus 02 Device 03: ID 13fe:3100 Kingston Technology Company Inc. 2/4 GB stick
Bus 02 Device 02: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 02 Device 01: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 01 Device 02: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 01 Device 01: ID 1d6b:0002 Linux Foundation 2.0 root hub

 

* Note: the USB stick is a Cisco part made by UNIGEN, not Kingston; lsusb takes the name from the 13fe vendor ID.

 

/var/log # dmesg | grep 13fe
2014-09-05T18:07:23.336Z cpu8:33604)<6>usb 2-1.3: New USB device found, idVendor=13fe, idProduct=3100
2014-09-05T18:07:26.950Z cpu6:33693)<6>usb 2-1.3: Vendor: 0x13fe, Product: 0x3100, Revision: 0x0100

 

 

/var/log # dmesg | grep 2-1.3
2014-09-05T18:07:23.216Z cpu8:33604)<6>usb 2-1.3: new high speed USB device number 3 using ehci_hcd
2014-09-05T18:07:23.336Z cpu8:33604)<6>usb 2-1.3: New USB device found, idVendor=13fe, idProduct=3100
2014-09-05T18:07:23.336Z cpu8:33604)<6>usb 2-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
2014-09-05T18:07:23.336Z cpu8:33604)<6>usb 2-1.3: Product: PSE4000S3
2014-09-05T18:07:23.336Z cpu8:33604)<6>usb 2-1.3: Manufacturer: UNIGEN
2014-09-05T18:07:23.336Z cpu8:33604)<6>usb 2-1.3: SerialNumber: 40E11B0086921B5A
2014-09-05T18:07:23.336Z cpu8:33604)<6>usb 2-1.3: usbfs: registered usb0203
2014-09-05T18:07:26.950Z cpu6:33693)<6>usb 2-1.3: Vendor: 0x13fe, Product: 0x3100, Revision: 0x0100
2014-09-05T18:07:26.950Z cpu6:33693)<6>usb 2-1.3: Interface Subclass: 0x06, Protocol: 0x50
2014-09-05T18:07:27.237Z cpu6:33693)<6>usb-storage 2-1.3:1.0: interface is claimed by usb-storage
2014-09-05T18:07:27.237Z cpu6:33693)<6>usb 2-1.3: device is not available for passthrough
2014-09-05T18:08:42.445Z cpu18:33546)<6>usb-storage 2-1.3:1.0: unclaiming vmhba32
2014-09-05T18:08:42.445Z cpu18:33546)<6>usb 2-1.3: device is available for passthrough

 

It appears that the USB device is being unclaimed and made available for passthrough to VMs.  This is not the desired behaviour.
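Because the claim and passthrough messages can flip back and forth during a boot, it may help to check which state was logged last for the stick. A minimal sketch, assuming the dmesg line format shown above (`usb 2-1.3: device is [not] available for passthrough`); `passthrough_state` is a hypothetical helper name, not an ESXi command:

```sh
#!/bin/sh
# Report the LAST passthrough state logged for one USB bus address
# (e.g. "2-1.3"), reading dmesg-style lines on stdin.
# Prints "available", "not-available", or "unknown" if nothing matched.
passthrough_state() {
    addr="$1"
    awk -v a="usb $addr:" '
        index($0, a) {
            # Check the "not" form first; the two phrases do not overlap.
            if (index($0, "device is not available for passthrough")) s = "not-available"
            else if (index($0, "device is available for passthrough")) s = "available"
        }
        END { print (s == "" ? "unknown" : s) }
    '
}
```

For example, `dmesg | passthrough_state 2-1.3` should print `available` for the log above, since the 18:08:42 message comes after the 18:07:27 one.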

 

To find out what my USB device is called, I ran:

esxcli storage core device list | less  (output shortened for clarity)

mpx.vmhba32:C0:T0:L0
  Display Name: Local USB Direct-Access (mpx.vmhba32:C0:T0:L0)
  Vendor: UNIGEN
  Model: PSE4000S3
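The `mpx` identifier can also be pulled out of that listing directly instead of scrolling through `less`. A sketch, assuming the device identifier sits on an unindented line and the USB device's Display Name begins with "Local USB" as above; `usb_device_ids` is a hypothetical helper:

```sh
#!/bin/sh
# Print the identifier of every device whose Display Name starts with
# "Local USB", reading `esxcli storage core device list` output on stdin.
usb_device_ids() {
    awk '
        /^[^ \t]/ { dev = $1; next }             # unindented line = device identifier
        /Display Name: Local USB/ { print dev }  # field line belonging to that device
    '
}
```

On this host, `esxcli storage core device list | usb_device_ids` would be expected to print `mpx.vmhba32:C0:T0:L0`.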

 

As I understand it, a device can be kept from being made available for passthrough by marking it perennially reserved. This can be done in the Host Profile, which I've done:

 

[Screenshot: perennially reserved setting in the Host Profile (pernially_reserved.png)]

 

I also stopped the USB arbitrator service and tried to reapply the Host Profile, to no avail:

/etc/init.d/usbarbitrator stop
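`/etc/init.d/usbarbitrator stop` only lasts until the next reboot. A commonly used way to also keep the arbitrator from starting at boot is `chkconfig usbarbitrator off`; I believe that works on ESXi 5.x, but treat it as an assumption, and on a host that boots statelessly from Auto Deploy the change may not survive a reboot anyway. The sketch below wraps both commands with a dry-run switch so they can be reviewed first (`run`, `disable_usbarbitrator`, and `DRY_RUN` are hypothetical names):

```sh
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the USB arbitrator now, then (assumed) disable it at boot.
disable_usbarbitrator() {
    run /etc/init.d/usbarbitrator stop || return 1
    run chkconfig usbarbitrator off
}
```

Setting `DRY_RUN=1` before calling `disable_usbarbitrator` prints the two commands without executing them.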

 

 

So I kept on digging for the reason why ESXi is not writing the cache to USB.

 

Here's what I found in syslog.log (output redacted and shortened):

 

2014-09-05T17:47:46Z 2014-09-05 17:47:46,938 Host Profiles[40280]: INFO: Now caching to disk...^@ <-- this is good!
2014-09-05T17:47:55Z HostProfileManager: [2014-09-05 17:47:55,380 root     INFO] Scanning mpx.vmhba32:C0:T0:L0 for any installs ...^@
2014-09-05T17:47:55Z HostProfileManager: [2014-09-05 17:47:55,715 root     INFO] gpt
487 255 63 7831552
1 2048 7829503 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
^@
2014-09-05T17:47:55Z HostProfileManager: [2014-09-05 17:47:55,942 root     INFO] ^@
2014-09-05T17:48:01Z HostProfileManager: [2014-09-05 17:48:01,945 root     INFO]   Found nothing on mpx.vmhba32:C0:T0:L0.^@ <-- this is good!
2014-09-05T17:48:02Z HostProfileManager: [2014-09-05 17:48:02,279 root     INFO] gpt
487 255 63 7831552
1 2048 7829503 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
^@
2014-09-05T17:48:02Z HostProfileManager: [2014-09-05 17:48:02,280 root     INFO] Fresh install.  Using GPT^@
2014-09-05T17:48:02Z HostProfileManager: [2014-09-05 17:48:02,280 root     INFO]   Using the standard, minimum partition layout.^@
2014-09-05T17:48:02Z HostProfileManager: [2014-09-05 17:48:02,280 root     INFO] Checking USB device...^@
2014-09-05T17:48:02Z HostProfileManager: [2014-09-05 17:48:02,807 root     INFO] gpt
0 0 0 0
1 64 8191 C12A7328F81F11D2BA4B00A0C93EC93B 128
5 8224 520191 EBD0A0A2B9E5443387C068B6B72699C7 0
6 520224 1032191 EBD0A0A2B9E5443387C068B6B72699C7 0
7 1032224 1257471 9D27538040AD11DBBF97000C2911D1B8 0
8 1257504 1843199 EBD0A0A2B9E5443387C068B6B72699C7 0
^@ <-- this is good!
2014-09-05T17:48:02Z HostProfileManager: [2014-09-05 17:48:02,807 root     INFO] Preparing Visor volumes on disk /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0...^@ <-- this is good!
2014-09-05T17:48:03Z HostProfileManager: [2014-09-05 17:48:03,112 root     INFO] stderr: create fs deviceName:'/vmfs/devices/disks/mpx.vmhba32:C0:T0:L0:8', fsShortName:'vfat', fsName:'(null)'
deviceFullPath:/dev/disks/mpx.vmhba32:C0:T0:L0:8 deviceFile:mpx.vmhba32:C0:T0:L0:8
Checking if remote hosts are using this device as a valid file system. This may take a few seconds...
Creating vfat file system on "mpx.vmhba32:C0:T0:L0:8" with blockSize 1048576 and volume label "none".
Filesystem was created but mount failed on device "mpx.vmhba32:C0:T0:L0:8".: Not found. ^@  <-- ERROR!
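Since syslog.log is busy, a quick way to pull out only the lines that name the USB disk, plus the mount failure, is a fixed-string grep. A sketch, assuming the default ESXi 5.5 log location /var/log/syslog.log; `caching_trace` is a hypothetical helper:

```sh
#!/bin/sh
# Print (with line numbers) every log line that mentions the given device
# or contains "mount failed". Log file defaults to /var/log/syslog.log.
caching_trace() {
    dev="$1"
    log="${2:-/var/log/syslog.log}"
    # -F: treat patterns as fixed strings, so the dots in the mpx name are literal
    grep -n -F -e "$dev" -e 'mount failed' "$log"
}
```

`caching_trace mpx.vmhba32:C0:T0:L0` shows how far the cache preparation got before the vfat mount failed, without the surrounding noise.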

 

 

Going back to try and find out what happened to my USB storage, I found:

 

esxcli storage core device list | less
   mpx.vmhba32:C0:T0:L0
   Display Name: Local USB Direct-Access (mpx.vmhba32:C0:T0:L0)
   Has Settable Display Name: false
   Size: 0
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path:
   Vendor: UNIGEN
   Model: PSE4000S3
   Revision: PMAP
   SCSI Level: 2
   Is Pseudo: false
   Status: dead timeout
   Is RDM Capable: false
   Is Local: true
   Is Removable: true
   Is SSD: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: unknown
   Attached Filters:
   VAAI Status: unsupported
   Other UIDs: vml.0000000000766d68626133323a303a30
   Is Local SAS Device: false
   Is Boot USB Device: false
   No of outstanding IOs with competing worlds: 32

 

 

The USB disk is in "dead timeout" status, and the perennially reserved setting from the Host Profile had no effect (Is Perennially Reserved: false).
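To keep an eye on these two fields without reading the whole listing, a single field for one device can be extracted. A sketch, assuming the `Field: value` layout shown above with the device identifier on its own unindented line (as in unedited esxcli output); `device_field` is a hypothetical helper:

```sh
#!/bin/sh
# Print the value of one field (e.g. "Status") for one device, reading
# `esxcli storage core device list` output on stdin.
device_field() {
    awk -v dev="$1" -v field="$2" '
        /^[^ \t]/ { cur = $1; next }     # unindented line = device identifier
        cur == dev {
            line = $0
            sub(/^[ \t]+/, "", line)     # drop the indentation
            # Match only when the line STARTS with "<field>: "
            if (index(line, field ": ") == 1)
                print substr(line, length(field) + 3)
        }
    '
}
```

For example, `esxcli storage core device list | device_field mpx.vmhba32:C0:T0:L0 Status` would print `dead timeout` here, and passing `'Is Perennially Reserved'` shows whether the Host Profile setting reached the host.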

 

However, I was able to prove that the USB device worked fine, at least for a while:

 

esxcli storage core device stats get | less
   mpx.vmhba32:C0:T0:L0
   Device: mpx.vmhba32:C0:T0:L0
   Successful Commands: 471
   Blocks Read: 7455
   Blocks Written: 0
   Read Operations: 309
   Write Operations: 0
   Reserve Operations: 0
   Reservation Conflicts: 0
   Failed Commands: 85
   Failed Blocks Read: 0
   Failed Blocks Written: 0
   Failed Read Operations: 0
   Failed Write Operations: 0
   Failed Reserve Operations: 0

 

Looking at the VMkernel log, I see this:

less vmkernel.log

2014-09-05T19:03:50.242Z cpu9:39820 opID=9efa2c3c)World: 14296: VC opID hostd-8b72 maps to vmkernel opID 9efa2c3c
2014-09-05T19:03:55.446Z cpu9:39820 opID=252dfa49)World: 14296: VC opID 58F84D94-00000680-7b-4e maps to vmkernel opID 252dfa49
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)ScsiPath: 5151: DeletePath : adapter=vmhba32, channel=0, target=0, lun=0
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)ScsiDevice: 3612: Can't unregister device mpx.vmhba32:C0:T0:L0 because it is in use.  OpenCount:1 InternalOpenCount:0 RefCount:2 FilterCount:0
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)ScsiDevice: 3623: Device mpx.vmhba32:C0:T0:L0 was in use by worldId 0
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)WARNING: NMP: nmpUnclaimPath:1502: NMP device "mpx.vmhba32:C0:T0:L0" quiesce state change failed: Busy
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)WARNING: ScsiPath: 3708: Path vmhba32:C0:T0:L0 is being removed
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)WARNING: ScsiPath: 3914: Failed to issue command 0x0 (cmdSN 0x0) on path vmhba32:C0:T0:L0: No connection
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)ScsiPath: 4874: Path vmhba32:C0:T0:L0 could not be unclaimed from plugin, status Busy. Continue path unclaiming
2014-09-05T19:03:55.475Z cpu9:33047 opID=252dfa49)WARNING: ScsiScan: 1758: Could not delete path vmhba32:C0:T0:L0
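Each of these lines carries an `opID=...` tag, and the `World:` line maps the vCenter opID `58F84D94-00000680-7b-4e` to vmkernel opID `252dfa49`, so the DeletePath sequence appears to have been started by an operation coming from vCenter. Collecting every line with that tag is a simple filter. A sketch, assuming the default location /var/log/vmkernel.log; `trace_opid` is a hypothetical helper:

```sh
#!/bin/sh
# Print every vmkernel.log line tagged with the given vmkernel opID.
# The "World: ... maps to vmkernel opID" line carries the same tag,
# so the vCenter opID it came from is included in the output.
trace_opid() {
    opid="$1"
    log="${2:-/var/log/vmkernel.log}"
    grep -F "opID=$opid)" "$log"
}
```

Running `trace_opid 252dfa49` and then searching the vCenter side for `58F84D94-00000680-7b-4e` may show which task (for example, the Host Profile apply) triggered the path removal.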


I confirmed that mpx.vmhba32:C0:T0:L0 is truly unavailable by trying to read from it.

 

/dev/disks # ls -l ./mpx*
-rw-------    1 root     root     4009754624 Sep  5 19:42 ./mpx.vmhba32:C0:T0:L0
-rw-------    1 root     root       4161536 Sep  5 19:42 ./mpx.vmhba32:C0:T0:L0:1
-rw-------    1 root     root     262127616 Sep  5 19:42 ./mpx.vmhba32:C0:T0:L0:5
-rw-------    1 root     root     262127616 Sep  5 19:42 ./mpx.vmhba32:C0:T0:L0:6
-rw-------    1 root     root     115326976 Sep  5 19:42 ./mpx.vmhba32:C0:T0:L0:7
-rw-------    1 root     root     299876352 Sep  5 19:42 ./mpx.vmhba32:C0:T0:L0:8

/dev/disks # cat ./mpx.vmhba32\:C0\:T0\:L0\:1
cat: read error: Input/output error

(Yes, I know this would normally print a bunch of garbage, but it should not return an error.)
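A bounded read with `dd` makes the same test without dumping data to the terminal, and its exit status can be checked in a script. A sketch, assuming the BusyBox `dd` on the host accepts `bs`/`count` as usual; `read_test` is a hypothetical helper:

```sh
#!/bin/sh
# Try to read the first 1 MiB of a device, partition, or file and discard it.
# Returns dd's exit status: 0 if the read succeeded, non-zero on I/O error.
read_test() {
    dd if="$1" of=/dev/null bs=1048576 count=1 2>/dev/null
}
```

For example, `read_test /dev/disks/mpx.vmhba32:C0:T0:L0:1 && echo readable || echo 'read failed'` should report a failure while the device is in dead timeout.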

 

So it appears that the USB stick is recognized, partially configured (partitions are written), but fails at some point before ESXi is able to mount it to write the cache to it.
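The partition table that did get written matches what the caching code planned. Converting the start/end sectors from the syslog GPT dump to bytes, assuming 512-byte sectors, gives exactly the sizes listed under /dev/disks; for example partition 8 is (1843199 - 1257504 + 1) x 512 = 299876352 bytes, and the whole disk is 7831552 x 512 = 4009754624 bytes. A small sketch of the arithmetic (`part_bytes` is a hypothetical helper):

```sh
#!/bin/sh
# Size in bytes of a partition from its first and last sector (inclusive),
# assuming 512-byte sectors.
part_bytes() {
    echo $(( ($2 - $1 + 1) * 512 ))
}

# Entries copied from the "Checking USB device..." GPT dump:
# partition number, first sector, last sector.
while read -r num start end; do
    echo "partition $num: $(part_bytes "$start" "$end") bytes"
done <<'EOF'
1 64 8191
5 8224 520191
6 520224 1032191
7 1032224 1257471
8 1257504 1843199
EOF
```

This prints 4161536, 262127616, 262127616, 115326976 and 299876352 bytes for partitions 1, 5, 6, 7 and 8, the same values as the `ls -l` output, which suggests the failure happened after partitioning, when the new vfat volume was mounted.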

 

I was hoping to reset the USB bus by disabling and re-enabling the ESXi kernel usb and usb-storage modules, but that didn't seem to work; it was a bit of a long shot.

 

esxcli system module set --enabled=false --module=usb
esxcli system module set --enabled=true --module=usb

esxcli system module set --enabled=false --module=usb-storage
esxcli system module set --enabled=true --module=usb-storage

 

 

Has anyone else seen this behaviour??  Any help would be greatly appreciated.

 

Thanks,

 

Michal

