Hi All,
On one of our subsites we've got an ESXi host connected to a central vCenter server. The location has a main router with a failover router in case the main connection fails.
Main Router: 192.168.66.254
Backup router: 192.168.66.253
Core switch: 192.168.66.250
On the ESXi host, the default gateway is set to the core switch IP, 192.168.66.250 (just like every other device in the network).
Our main link failed during the night and the backup link took over. Everything (PCs, printers, virtual servers, etc.) worked as expected on the backup link, but the ESXi host was unreachable in vCenter. I could ping the ESXi host from my laptop, but not from the console on the vCenter server.
I logged on with SSH to the ESXi host and ran:
esxcli network ip route ipv4 list
Network         Netmask          Gateway         Interface  Source
--------------  ---------------  --------------  ---------  ------
default         0.0.0.0          192.168.66.250  vmk0       MANUAL
192.168.27.12   255.255.255.255  192.168.66.254  vmk0       MANUAL
192.168.28.2    255.255.255.255  192.168.66.254  vmk0       MANUAL
192.168.17.154  255.255.255.255  192.168.66.253  vmk0       MANUAL
I noticed that the routes for the DNS server (192.168.27.12) and vCenter (192.168.28.2) pointed at the main router instead of the core switch, and that my PC (192.168.17.154) was listed with the backup router as its gateway. I removed those routes with the route remove command, and the ESXi host became visible in vCenter again.
I double-checked the settings and everything was fine: the default gateway was indeed set to the core switch.
Now that the main link is back up and the failover link has stopped working, guess what: the ESXi host was once again disconnected from vCenter. I tried to ping it from the vCenter server and from my laptop, and got no replies on either.
I logged on to another computer, and from there I was able to ping and SSH into the ESXi host:
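For anyone wanting to repeat the cleanup, here is a rough sketch of how the stray routes could be picked out of the listing. It only prints the remove commands, it does not run them. `stale_routes` is a made-up helper name, and the `-n <ip>/32 -g <gateway>` form of the remove command is my assumption about the syntax, so check it against `esxcli network ip route ipv4 remove --help` on your build first.

```shell
#!/bin/sh
# Sketch only: read a route listing on stdin and print (not run) a remove
# command for every host route (netmask 255.255.255.255) whose gateway
# differs from the default gateway. stale_routes is a hypothetical name.
stale_routes() {
  awk '
    # Remember the gateway of the default route.
    $1 == "default" { gw = $3; next }
    # Collect host routes; header and separator lines never match this.
    $2 == "255.255.255.255" { host[++n] = $1; via[n] = $3 }
    END {
      for (i = 1; i <= n; i++)
        if (via[i] != gw)
          # Assumed flag syntax: -n network in CIDR form, -g gateway.
          printf "esxcli network ip route ipv4 remove -n %s/32 -g %s\n", host[i], via[i]
    }
  '
}

# Intended use on the host (assumption, review the output before running it):
#   esxcli network ip route ipv4 list | stale_routes
```

Printing the commands rather than executing them leaves room to review each route before it is removed.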
esxcli network ip route ipv4 list
Network         Netmask          Gateway         Interface  Source
--------------  ---------------  --------------  ---------  ------
default         0.0.0.0          192.168.66.250  vmk0       MANUAL
192.168.27.12   255.255.255.255  192.168.66.253  vmk0       MANUAL
192.168.28.2    255.255.255.255  192.168.66.253  vmk0       MANUAL
192.168.17.154  255.255.255.255  192.168.66.253  vmk0       MANUAL
192.168.17.179  255.255.255.255  192.168.66.254  vmk0       MANUAL
So once again I manually removed the routes and everything was functioning again.
I simulated the failover in our test lab and saw exactly the same behaviour. It looks like ESXi takes on a gateway for a destination and then never releases it.
Any thoughts on this odd behaviour? I expected ESXi to fail over just like any other PC, printer or server in our network would, but clearly it doesn't.
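To capture exactly when the extra routes show up during a lab failover, something like the following could be left running in an SSH session. It is a sketch, not something I have run on the host: `watch_routes` is a made-up helper, and the interval and count are arbitrary.

```shell
#!/bin/sh
# Sketch only: run a listing command repeatedly and print its output,
# with a timestamp, each time the output changes.
# Usage: watch_routes "<command>" <interval-seconds> <iterations>
watch_routes() {
  cmd=$1; interval=$2; count=$3
  prev=""
  i=0
  while [ "$i" -lt "$count" ]; do
    cur=$($cmd)                      # unquoted on purpose: split into words
    if [ "$cur" != "$prev" ]; then
      echo "--- $(date '+%H:%M:%S') route table changed"
      echo "$cur"
      prev=$cur
    fi
    i=$((i + 1))
    sleep "$interval"
  done
}

# Intended use on the host (assumption): poll every 5 seconds for 10 minutes.
#   watch_routes "esxcli network ip route ipv4 list" 5 120
```

Comparing the timestamps with the moment the link is pulled should show whether the routes appear at failover or only once traffic from a given address arrives.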
Thanks for your input!