During a failover from one node to its partner, VMs residing on NFS datastores or LUNs connected to hypervisors (ESXi, Red Hat, Hyper-V) can experience short disk timeouts, sometimes resulting in paused VMs, I/O freezes, or application crashes.
The following are some reasons why guest OS tunings are required:
- To help improve error handling and interoperability during storage controller failover events.
- To improve recovery times following a storage controller failover event.
Note: Increasing the SCSI timeout value is intended to alleviate slow failover and transient storage conditions. It is not designed to mitigate prolonged underlying storage conditions such as APD (All Paths Down) or PDL (Permanent Device Loss).
Guest OS type | Recommended settings (timeouts in seconds) |
Windows | disk timeout = 60, up to 190 |
Linux | disk timeout = 60, up to 190 |
Solaris | disk timeout = 60, up to 190; busy retry = 300; not ready retry = 300; reset retry = 30; max. throttle = 32; min. throttle = 8 |
- For Windows: edit the registry value (a REG_DWORD, in seconds):
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\TimeOutValue
- For Linux:
In ESXi: if VMware Tools is installed, the timeout is set automatically to 180 seconds, which should be enough in most cases.
For other hypervisors, the timeout can be set using sysfs:
echo 180 > /sys/block/sdc/device/timeout
This command sets a timeout of 180 seconds for /dev/sdc only, so it must be run for every disk. A value written through sysfs does not persist across reboots.
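Applying the setting to every disk can be scripted. The following is a minimal sketch, assuming the relevant disks all appear as sd* under /sys/block. The function name set_disk_timeouts and its optional second argument (an alternate sysfs root, handy for a dry run against a copy of the tree) are illustrative only and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: write the same SCSI command timeout to every sd* disk.
# $1 = timeout in seconds, $2 = sysfs root (assumed default: /sys)
set_disk_timeouts() {
    secs="$1"
    sysfs_root="${2:-/sys}"
    for f in "$sysfs_root"/block/sd*/device/timeout; do
        [ -e "$f" ] || continue        # skip if the glob matched nothing
        echo "$secs" > "$f"
        echo "$f -> $(cat "$f")"       # show the value now in effect
    done
}

# Usage (as root):
#   set_disk_timeouts 180
```

The loop only touches sd* devices, so loop devices and other non-SCSI block devices are left alone. Because sysfs values are lost at reboot, the function would have to be called again at boot time.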
- For Solaris: edit /etc/system and add
set sd:sd_io_time=0xb4
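The value in /etc/system above is written in hexadecimal: 0xb4 is 180 seconds. As a small sketch for deriving the line for a different timeout, the helper below (the name sd_io_time_line is hypothetical) formats a number of seconds the same way:

```shell
#!/bin/sh
# Hypothetical helper: print the /etc/system line for a timeout given in seconds.
sd_io_time_line() {
    printf 'set sd:sd_io_time=0x%x\n' "$1"
}

sd_io_time_line 180   # prints: set sd:sd_io_time=0xb4
sd_io_time_line 190   # prints: set sd:sd_io_time=0xbe
```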
For the other settings, edit /kernel/drv/sd.conf and add entries like the following to the sd-config-list property:
"NETAPP LUN ","physical-block-size:4096,retries-busy:300,retries-timeout:16,retries-notready:300,retries-reset:30,throttle-max:32,throttle-min:8",
"VMware Virtual ","physical-block-size:4096,retries-busy:300,retries-timeout:16,retries-notready:300,retries-reset:30,throttle-max:32,throttle-min:8";
NOTE: Changing the disk timeout has no performance impact on the VMs or the guest OS.
To discover iSCSI LUNs on Solaris:
oracle:# svcadm enable network/iscsi/initiator
oracle:# iscsiadm add discovery-address 172.21.201.233:3260
oracle:# iscsiadm modify discovery --sendtargets enable
oracle:# devfsadm -i iscsi
oracle:# format
The new disks (LUNs) should now be listed.
To check the current disk timeout on Solaris, system-wide:
root@solaris:/# echo "sd_io_time::print" | mdb -k
0x3c
root@solaris:/#
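mdb prints the value in hexadecimal. As a quick sketch for reading it in seconds, the helper below (hex_to_secs is a hypothetical name) relies on the shell's printf accepting a 0x-prefixed argument:

```shell
#!/bin/sh
# Hypothetical helper: convert a 0x-prefixed hex value from mdb into decimal seconds.
hex_to_secs() {
    printf '%d\n' "$1"
}

hex_to_secs 0x3c   # prints: 60
```

A result of 60 means sd_io_time is still at 60 seconds, not the 180 seconds that 0xb4 would give.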
Per disk:
root@solaris:/# echo "::walk sd_state | ::grep '.!=0' | ::sd_state" | mdb -k | egrep "^un|un_retry_count|un_cmd_timeout"
un: ffffc100000bc040
un_retry_count = 0x5
un_cmd_timeout = 0x3c
un: ffffc1000290f980
un_retry_count = 0x5
un_cmd_timeout = 0x3c
un: ffffc10004a4f2c0
un_retry_count = 0x3
un_cmd_timeout = 0x3c
un: ffffc10004a4f900
un_retry_count = 0x3
un_cmd_timeout = 0x3c
un: ffffc10004a4e640
un_retry_count = 0x3
un_cmd_timeout = 0x3c
un: ffffc10015370c80
un_retry_count = 0x3
un_cmd_timeout = 0x3c
un: ffffc1001536d940
un_retry_count = 0x3
un_cmd_timeout = 0x3c
un: ffffc1001536c680
un_retry_count = 0x3
un_cmd_timeout = 0x3c
un: ffffc1001536b340
un_retry_count = 0x3
un_cmd_timeout = 0x3c
un: ffffc1001536a080
un_retry_count = 0x3
un_cmd_timeout = 0x3c
root@solaris:/#
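With many disks, the per-disk listing is easier to read once each unit is paired with its timeout in seconds. The following is a rough sketch, assuming the filtered output has the shape shown above ("un:" lines followed by "un_cmd_timeout = 0x..." lines). The function name report_sd_timeouts, the 180-second default threshold, and the LOW/ok labels are illustrative choices, not mdb features.

```shell
#!/bin/sh
# Hypothetical helper: read the filtered "::sd_state" output on stdin and print
# "<unit address> <timeout in seconds> <LOW|ok>" for each disk.
# $1 = minimum acceptable timeout in seconds (assumed default: 180)
report_sd_timeouts() {
    min="${1:-180}"
    awk -v min="$min" '
        # Portable hex-to-decimal conversion (no reliance on gawk strtonum).
        function hex2dec(h,    i, v) {
            v = 0
            h = tolower(h)
            for (i = 1; i <= length(h); i++)
                v = v * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
            return v
        }
        /^un:/ { unit = $2 }                  # remember the current unit address
        /un_cmd_timeout/ {
            hex = $NF
            sub(/^0x/, "", hex)
            secs = hex2dec(hex)
            printf "%s %d %s\n", unit, secs, (secs < min ? "LOW" : "ok")
        }'
}

# Usage on the Solaris guest:
#   echo "::walk sd_state | ::grep '.!=0' | ::sd_state" | mdb -k \
#     | egrep "^un|un_retry_count|un_cmd_timeout" | report_sd_timeouts 180
```

For the listing above, every disk would be reported at 60 seconds and flagged LOW against a 180-second threshold.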