Skip to the content.

以 scsi 为例子分析 blk 层超时机制

直接释放已经提交的 request 是不可以的,如果设备忽然又醒过来如何办? 所以正确的操作是,首先 abort 之前的命令,然后重试,如果 abort 失败, 那么等待,如果再次超时,那么 reset 设备。

超时机制基本流程

如果要注入 io-timeout-fail 的话,那么 nvme_try_complete_req 中会直接跳过 blk_mq_complete_request_remote ,导致 永远不会成功。

blk_mq_timeout_work 每 blk_mq_tag_set::timeout 运行一次,检查每一个 request 的 timeout 时间

如果想要测试 scsi ,可以尝试分析:

static enum scsi_timeout_action virtscsi_eh_timed_out(struct scsi_cmnd *scmnd)
{

	if (time_after(jiffies, scmnd->jiffies_at_alloc +
				(90* 2) * HZ)) {
		return SCSI_EH_NOT_HANDLED;
	}

	return SCSI_EH_RESET_TIMER;
}

有趣的是:

struct request {
  // ...

	/* Time that this request was allocated for this IO. */
	u64 start_time_ns;
	/* Time that I/O was submitted to the device. */
	u64 io_start_time_ns;

  // ...
}

io_start_time_ns 没有被利用起来监控硬件长时间不返回的情况,只是用于 cgroup

五个超时时间配置

更多参考

从 scsi_eh_stu 到 scsi_eh_host_reset 都是调用 Fine-grained 的 hook 的,其实内部都是存在超时的, 例如 megasas_task_abort -> megasas_task_abort_fusion -> megasas_issue_tm 中设置的是 50S 的时间。

eh_deadline 的实现就是靠 scsi_eh_ready_devs 每一级的检测中检查 scsi_host_eh_past_deadline()

如何修改一个卡的超时时间设置

有的驱动可以调整

也可以通过 sysfs:

超时的核心流程

如果使用 timeout 注入,使用 virtio scsi ,将 virtscsi_eh_timed_out 的返回值直接修改 SCSI_EH_NOT_HANDLED

0 启动
30 scsi_timeout 中执行 scsi_abort_command ,无论 abort 是否成功,scsi_timeout 会首先立刻返回成功,这里是启动一个异步进程来执行
60 在 scsi_timeout 中执行 scsi_abort_command 执行失败, 通过 scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED 知道该进程已经被 eh 过一次
90 在 eh 处理存在超时,第二次执行 sd_eh_action ,其中 scsi_device_set_state(sdev, SDEV_OFFLINE);

中断返回的检测过程

[ ] 驱动在错误流程中的处理

scsi_try_host_reset virtscsi_device_reset virtscsi_abort virtscsi_commit_rqs

void scsi_eh_ready_devs(struct Scsi_Host *shost,
			struct list_head *work_q,
			struct list_head *done_q)
{
	if (!scsi_eh_stu(shost, work_q, done_q))
		if (!scsi_eh_bus_device_reset(shost, work_q, done_q))
			if (!scsi_eh_target_reset(shost, work_q, done_q))
				if (!scsi_eh_bus_reset(shost, work_q, done_q))
					if (!scsi_eh_host_reset(shost, work_q, done_q))
						scsi_eh_offline_sdevs(work_q,
								      done_q);
}

scsi 错误处理机制的漏洞

如果返回成功,但是没有一个 bit 成功

验证犯法 1:

什么时候会自动 offline 盘

一共两个位置来 offline 盘,第一个是 sd_eh_action 中

路径 1:

路径 2:

这些函数调用地方是穿插到核心路径中的:

void scsi_eh_ready_devs(struct Scsi_Host *shost,
			struct list_head *work_q,
			struct list_head *done_q)
{
	if (!scsi_eh_stu(shost, work_q, done_q))                // <----- 路径 1
		if (!scsi_eh_bus_device_reset(shost, work_q, done_q)) // <----- 路径 1
			if (!scsi_eh_target_reset(shost, work_q, done_q))   // <----- 路径 1
				if (!scsi_eh_bus_reset(shost, work_q, done_q))    // <----- 路径 1
					if (!scsi_eh_host_reset(shost, work_q, done_q)) // <----- 路径 1
						scsi_eh_offline_sdevs(work_q,                 // <---------- 路径 2
								      done_q);
}

如果去小心的控制那个 reset 成功,那个 reset 失败,最后可以实现总是不去 offline disk 的

[ 6907.605910] scsi host1: Waking error handler thread
[ 6907.608289] scsi host1: scsi_eh_1: waking up 0/14/14
[ 6907.609279] scsi host1: scsi_eh_prt_fail_stats: cmds failed: 0, cancel: 14
[ 6907.609958] scsi host1: Total of 14 commands on 1 devices require eh work
[ 6907.610561] sd 1:0:1:0: scsi_eh_1: Sending BDR
[ 6907.611141] sd 1:0:1:0: scsi_eh_1: BDR failed

[ 6907.611693] scsi host1: scsi_eh_1: Sending target reset to target 1                  <---- 配置 scsi_eh_target_reset 失败了
[ 6907.612245] scsi host1: scsi_eh_1: Target reset failed target: 1

[ 6907.612780] scsi host1: scsi_eh_1: Sending BRST chan: 0
[ 6907.613304] sd 1:0:1:0: [sdb] tag#1734 scsi_try_bus_reset: Snd Bus RST               <---- bus 也失败了
[ 6907.613808] scsi host1: scsi_eh_1: BRST failed chan: 0

[ 6907.614310] scsi host1: scsi_eh_1: Sending HRST
[ 6907.614782] scsi host1: Snd Host RST
[ 6907.615254] sd 1:0:1:0: [sdb] tag#1734 OCR is requested due to IO timeout!!
[ 6907.615718] sd 1:0:1:0: [sdb] tag#1734 SCSI host state: 5  FW outstanding: 0
[ 6907.616191] sd 1:0:1:0: [sdb] tag#1734 scmd: (0x00000000b095e7b3)  retries: 0x0  allowed: 0x5
[ 6907.616669] sd 1:0:1:0: [sdb] tag#1734 CDB: Write(10) 2a 00 0e dc 64 00 00 00 20 00
[ 6907.617142] megaraid_sas 0000:31:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[ 6907.617653] megaraid_sas 0000:31:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
[ 6917.816878] sd 1:0:1:0: [sdb] tag#1734 scsi_eh_done result: 0                        <---- host rset

[ 6917.817475] sd 1:0:1:0: [sdb] tag#1734 scsi_send_eh_cmnd timeleft: 10000             <---- scsi_eh_host_reset 中
[ 6917.818273] sd 1:0:1:0: [sdb] tag#1734 scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
[ 6917.818987] sd 1:0:1:0: [sdb] tag#1734 scsi_eh_done result: 0
[ 6917.820191] sd 1:0:1:0: [sdb] tag#1734 scsi_send_eh_cmnd timeleft: 9999
[ 6917.820970] sd 1:0:1:0: [sdb] tag#1734 scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
[ 6917.821588] sd 1:0:1:0: [sdb] tag#1734 scsi_eh_tur return: 2002

[ 6917.822171] sd 1:0:1:0: [sdb] tag#1734 scsi_eh_1: flush retry cmd
[ 6917.822750] sd 1:0:1:0: [sdb] tag#1732 scsi_eh_1: flush retry cmd
[ 6917.829086] sd 1:0:1:0: [sdb] tag#1755 scsi_eh_1: flush retry cmd
[ 6917.829573] scsi host1: waking up host to restart
[ 6917.830148] scsi host1: scsi_eh_1: sleeping

[ ] eh thread 的不同机制

  1. eh thread 真的需要等待所有的 io 都出现问题的时候才会启动吗?

scsi_send_eh_cmnd : 专门发送错误处理的命令的同步函数

调用者:

raid 的 failfast 如何作用于 allowed maxretry 机制

/*
 * These flags should really be called "NO_RETRY" rather than
 * "FAILFAST" because they don't make any promise about time lapse,
 * only about the number of retries, which will be zero.
 * REQ_FAILFAST_DRIVER is not included because
 * Commit: 4a27446f3e39 ("[SCSI] modify scsi to handle new fail fast flags.")
 * seems to suggest that the errors it avoids retrying should usually
 * be retried.
 */
#define	MD_FAILFAST	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT)

在 scsi_noretry_cmd 中对于 REQ_FAILFAST_DEV 和 REQ_FAILFAST_TRANSPORT 进行判断

错误处理中如何阻碍其他进程

参考 scsi_block_when_processing_errors

进入到 eh 的标志: scsi_eh_scmd_add

fault injection 对于 scsi 错误处理命令无效

static void scsi_done_internal(struct scsi_cmnd *cmd, bool complete_directly)
{
	struct request *req = scsi_cmd_to_rq(cmd);

	switch (cmd->submitter) {
	case SUBMITTED_BY_BLOCK_LAYER:
		break;
	case SUBMITTED_BY_SCSI_ERROR_HANDLER:
		return scsi_eh_done(cmd);          // <-- tur 命令直接成功返回
	case SUBMITTED_BY_SCSI_RESET_IOCTL:
		return;
	}

	if (unlikely(blk_should_fake_timeout(scsi_cmd_to_rq(cmd)->q))) // <--- 普通命令才会接受
		return;
	if (unlikely(test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state)))
		return;
	trace_scsi_dispatch_cmd_done(cmd);

	if (complete_directly)
		blk_mq_complete_request_direct(req, scsi_complete);
	else
		blk_mq_complete_request(req);
}

Scsi_Host::host_blocked

如何起作用:

static inline bool scsi_host_is_busy(struct Scsi_Host *shost)
{
	if (atomic_read(&shost->host_blocked) > 0) // <-
		return true;
	if (shost->host_self_blocked) // <--- 如果 HBA 驱动自己设置自己为 block
		return true;
	return false;
}

在函数 scsi_set_blocked 中时候设置 host_blocked ,而 scsi_set_blocked 的调用位置为 - __scsi_queue_insert : 在 requeue 的时候 - scsi_queue_rq : 如果 dispatch 失败

scsi_set_blocked 会将 Scsi_Host::host_blocked 设置为 Scsi_Host::max_host_blocked ,后者一般默认初始化为:

#define SCSI_DEFAULT_HOST_BLOCKED	7

重新插入队列的 request 会获取一个新的队列吗

关键参考

When the target returns a Check Condition in response to a command, the initiator usually then issues a SCSI Request Sense command in order to obtain more information. 如果接受到 Check Condition ,那么就发送一个

[ ] TUR 命令是否可以 retry

据说,不会 retry , 因为 blk_rq_is_passthrough

测试了一下,似乎并不会,下面的这个 #242 似乎被连续发送了很多次

[   73.766810] FAULT_INJECTION: forcing a failure.
               name fail_io_timeout, interval 0, probability 100, space 0, times -1
[   76.836431] sd 0:0:3:4: [sda] tag#242 Done: TIMEOUT_ERROR Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=15s
[   76.836764] sd 0:0:3:4: [sda] tag#242 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[   76.836994] sd 0:0:3:4: [sda] tag#242 scsi host busy 1 failed 0
[   76.837159] sd 0:0:3:4: [sda] tag#242 previous abort failed
[   76.845477] scsi host0: Waking error handler thread
[   76.845613] scsi host0: scsi_eh_0: waking up 0/1/1
[   76.845756] scsi host0: scsi_eh_prt_fail_stats: cmds failed: 0, cancel: 1
[   76.845938] scsi host0: Total of 1 commands on 1 devices require eh work
[   76.846118] sd 0:0:3:4: scsi_eh_0: Sending BDR
[   76.846240] sd 0:0:3:4: device reset
[   76.846339] sd 0:0:3:4: scsi_eh_0: BDR failed
[   76.846459] scsi host0: scsi_eh_0: Sending target reset to target 3
[   76.846635] scsi host0: scsi_eh_0: Target reset failed target: 3
[   76.846807] scsi host0: scsi_eh_0: Sending BRST chan: 0
[   76.846951] sd 0:0:3:4: [sda] tag#242 scsi_try_bus_reset: Snd Bus RST
[   76.847124] scsi host0: scsi_eh_0: BRST failed chan: 0
[   76.847264] scsi host0: scsi_eh_0: Sending HRST
[   76.847390] scsi host0: Snd Host RST
[   76.847496] sd 0:0:3:4: [sda] tag#242 Send: scmd 0x00000000a41a6d52
[   76.847670] sd 0:0:3:4: [sda] tag#242 CDB: Test Unit Ready 00 00 00 00 00 00
[   76.848000] sd 0:0:3:4: [sda] tag#242 scsi_eh_done result: 0
[   76.848243] sd 0:0:3:4: [sda] tag#242 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=15s
[   76.848511] sd 0:0:3:4: [sda] tag#242 CDB: Test Unit Ready 00 00 00 00 00 00
[   76.848709] sd 0:0:3:4: [sda] tag#242 scsi host busy 1 failed 1
[   76.848874] sd 0:0:3:4: [sda] tag#242 scsi_send_eh_cmnd timeleft: 10000
[   76.849054] sd 0:0:3:4: [sda] tag#242 scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
[   76.849270] sd 0:0:3:4: [sda] tag#242 scsi_eh_tur return: 2002
[   76.849429] sd 0:0:3:4: [sda] tag#242 scsi_eh_0: flush retry cmd
[   76.849596] sd 0:0:3:4: [sda] tag#242 Inserting command 00000000a41a6d52 into mlqueue
[   76.849816] scsi host0: waking up host to restart
[   76.849947] scsi host0: scsi_eh_0: sleeping
[   76.857545] sd 0:0:3:4: unblocking device at zero depth
[   76.857714] sd 0:0:3:4: [sda] tag#242 Send: scmd 0x00000000a41a6d52
[   76.857883] sd 0:0:3:4: [sda] tag#242 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00

到底可以重试多少次

static bool scsi_cmd_retry_allowed(struct scsi_cmnd *cmd)
{
	if (cmd->allowed == SCSI_CMD_RETRIES_NO_LIMIT)
		return true;

	return ++cmd->retries <= cmd->allowed;
}

这里就是可能导致超时是 5 倍的时间,一般来说,不会允许无线超时的:

static bool scsi_cmd_runtime_exceeced(struct scsi_cmnd *cmd)
{
	struct request *req = scsi_cmd_to_rq(cmd);
	unsigned long wait_for;

	if (cmd->allowed == SCSI_CMD_RETRIES_NO_LIMIT)
		return false;

	wait_for = (cmd->allowed + 1) * req->timeout;
	if (time_before(cmd->jiffies_at_alloc + wait_for, jiffies)) {
		scmd_printk(KERN_ERR, cmd, "timing out command, waited %lus\n",
			    wait_for/HZ);
		return true;
	}
	return false;
}

scsi_eh_get_sense() 中看到了 SCSI_CMD_RETRIES_NO_LIMIT

本站所有文章转发 CSDN 将按侵权追究法律责任,其它情况随意。