Disk Timeouts for SAN storage

During a failover from one storage node to its partner, VMs residing on NFS datastores or LUNs connected to hypervisors (ESXi, Red Hat, Hyper-V) can experience short disk timeouts, sometimes resulting in paused VMs, I/O freezes, or application crashes.

Guest OS tuning is required for the following reasons:

To improve error handling and interoperability during storage controller failover events.
To improve recovery times following a storage controller failover event.

Note: Increasing the SCSI timeout value alleviates slow failover and transient storage conditions. It is not designed to mitigate prolonged underlying storage conditions such as APD (All Paths Down) or PDL (Permanent Device Loss).

Guest OS type	Recommended settings
Windows	disk timeout = 60 seconds, up to 190
Linux	disk timeout = 60 seconds, up to 190
Solaris	disk timeout = 60 seconds, up to 190; busy retry = 300; not ready retry = 300; reset retry = 30; max. throttle = 32; min. throttle = 8

For Windows, edit the registry value:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\TimeOutValue

For Linux:

On ESXi, if VMware Tools is installed, the timeout is set automatically to 180 seconds. That should be enough in most cases.

For other hypervisors, the timeout can be set using sysfs:

echo 180 > /sys/block/sdc/device/timeout

This command sets a timeout of 180 seconds for /dev/sdc only. It must be run for every disk.
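Running the command for every disk can be scripted. The following is a minimal sketch, assuming the disks show up as sd* under /sys/block; the function names and the optional sysfs_root parameter are hypothetical and exist only so the functions can be tried against a scratch directory first.

```shell
#!/bin/sh
# Sketch: inspect and set the SCSI command timeout for all sd* disks.
# sysfs_root defaults to /sys (the override is an assumption of this sketch).

# Print "<disk> <timeout>" for every sd* disk that exposes a timeout file.
show_disk_timeouts() {
    sysfs_root="${1:-/sys}"
    for f in "$sysfs_root"/block/sd*/device/timeout; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # .../block/sdX/device/timeout -> sdX
        disk=$(basename "$(dirname "$(dirname "$f")")")
        printf '%s %s\n' "$disk" "$(cat "$f")"
    done
}

# Write the given timeout (in seconds) to every sd* disk.
set_disk_timeouts() {
    timeout="$1"
    sysfs_root="${2:-/sys}"
    for f in "$sysfs_root"/block/sd*/device/timeout; do
        [ -e "$f" ] || continue
        echo "$timeout" > "$f"
    done
}
```

As root, `set_disk_timeouts 180` followed by `show_disk_timeouts` applies and then lists the values. Values written through sysfs apply to the running system only, so they have to be set again after a reboot or when a new disk is added.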

For Solaris, edit /etc/system and add:

set sd:sd_io_time=0xb4
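The value in /etc/system is written in hexadecimal: 0xb4 is 180 seconds, and 0x3c is 60 seconds. A small sketch for converting in both directions with the standard printf utility; the two function names are hypothetical helpers, not part of Solaris.

```shell
#!/bin/sh
# Convert a timeout in seconds to the hex form used in /etc/system.
seconds_to_hex() {
    printf '0x%x\n' "$1"
}

# Convert a hex value (for example from mdb output) back to seconds.
hex_to_seconds() {
    printf '%d\n' "$1"
}

seconds_to_hex 180   # prints 0xb4
hex_to_seconds 0x3c  # prints 60
```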

For the other settings, edit /kernel/drv/sd.conf and add the following entries to the sd-config-list property:

"NETAPP LUN ","physical-block-size:4096,retries-busy:300,retries-timeout:16,retries-notready:300,retries-reset:30,throttle-max:32,throttle-min:8",
"VMware Virtual ","physical-block-size:4096,retries-busy:300,retries-timeout:16,retries-notready:300,retries-reset:30,throttle-max:32,throttle-min:8";

NOTE: Changing the disk timeout has no performance impact on the VMs or guest OS.

To discover iSCSI LUNs on Solaris:

oracle:# svcadm enable network/iscsi/initiator

oracle:# iscsiadm add discovery-address 172.21.201.233:3260

oracle:# iscsiadm modify discovery --sendtargets enable

oracle:# devfsadm -i iscsi

oracle:# format

The new disks (LUNs) should now be listed.

To check the current timeout on Solaris, system wide:

root@solaris:/# echo "sd_io_time::print" | mdb -k
0x3c
root@solaris:/#

Per disk:

    un_retry_count = 0x5
    un_cmd_timeout = 0x3c
un: ffffc1000290f980
    un_retry_count = 0x5
    un_cmd_timeout = 0x3c
un: ffffc10004a4f2c0
    un_retry_count = 0x3
    un_cmd_timeout = 0x3c
un: ffffc10004a4f900
    un_retry_count = 0x3
    un_cmd_timeout = 0x3c
un: ffffc10004a4e640
    un_retry_count = 0x3
    un_cmd_timeout = 0x3c
un: ffffc10015370c80
    un_retry_count = 0x3
    un_cmd_timeout = 0x3c
un: ffffc1001536d940
    un_retry_count = 0x3
    un_cmd_timeout = 0x3c
un: ffffc1001536c680
    un_retry_count = 0x3
    un_cmd_timeout = 0x3c
un: ffffc1001536b340
    un_retry_count = 0x3
    un_cmd_timeout = 0x3c
un: ffffc1001536a080
    un_retry_count = 0x3
    un_cmd_timeout = 0x3c
root@solaris:/#
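The per-disk values above are in hexadecimal (0x3c is 60 seconds). The following sketch turns such a listing into one line per disk with the timeout in decimal seconds. It assumes the "un: <address>" and "un_cmd_timeout = 0x.." layout shown above; the function name is hypothetical.

```shell
#!/bin/sh
# Read an mdb per-disk listing on stdin and print "<un address> <seconds>".
# Entries that appear before any "un:" line are reported as "(unknown)".
report_un_timeouts() {
    un="(unknown)"
    while read -r field1 field2 field3; do
        case "$field1" in
            un:)            un="$field2" ;;                       # new disk entry
            un_cmd_timeout) printf '%s %d\n' "$un" "$field3" ;;   # hex -> decimal
        esac
    done
}
```

Saving the listing to a file and running `report_un_timeouts < listing.txt` makes it easy to spot disks that still use the 60 second default.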
