linux/drivers/scsi
Michal Rábek 0e16776542 scsi: sg: Fix occasional bogus elapsed time that exceeds timeout
A race condition was found in sg_proc_debug_helper(). It was observed on
a system using an IBM LTO-9 SAS Tape Drive (ULTRIUM-TD9) and monitoring
/proc/scsi/sg/debug every second. A very large elapsed time would
sometimes appear. This is caused by two race conditions.

We reproduced the issue with an IBM ULTRIUM-HH9 tape drive on an x86_64
architecture. A patched kernel was built, and the race condition could
not be observed anymore after the application of this patch. A
reproducer C program utilising the scsi_debug module was also built by
Changhui Zhong and can be viewed here:

https://github.com/MichaelRabek/linux-tests/blob/master/drivers/scsi/sg/sg_race_trigger.c

The first race happens between the reading of hp->duration in
sg_proc_debug_helper() and request completion in sg_rq_end_io().  The
hp->duration member variable may hold either of two types of
information:

 #1 - The start time of the request. This value is present while
      the request is not yet finished.

 #2 - The total execution time of the request (end_time - start_time).

If sg_proc_debug_helper() executes *after* the value of hp->duration was
changed from #1 to #2, but *before* srp->done is set to 1 in
sg_rq_end_io(), a fresh timestamp is taken in the else branch, and the
elapsed time (value type #2) is subtracted from a timestamp, which
cannot yield a valid elapsed time (which is a type #2 value as well).

To fix this issue, the value of hp->duration must change under the
protection of the sfp->rq_list_lock in sg_rq_end_io().  Since
sg_proc_debug_helper() takes this read lock, the change to srp->done and
srp->header.duration will happen atomically from the perspective of
sg_proc_debug_helper() and the race condition is thus eliminated.

The second race condition happens between sg_proc_debug_helper() and
sg_new_write(). Even though hp->duration is set to the current time
stamp in sg_add_request() under the write lock's protection, it gets
overwritten by a call to get_sg_io_hdr(), which calls copy_from_user()
to copy struct sg_io_hdr from userspace into kernel space. hp->duration
is set to the start time again in sg_common_write(). If
sg_proc_debug_helper() is called between these two calls, an arbitrary
value set by userspace (usually zero) is used to compute the elapsed
time.

To fix this issue, hp->duration must be set to the current timestamp
again after get_sg_io_hdr() returns successfully. A small race window
still exists between get_sg_io_hdr() and setting hp->duration, but this
window is only a few instructions wide and does not result in observable
issues in practice, as confirmed by testing.

Additionally, we fix the format specifier from %d to %u for printing
unsigned int values in sg_proc_debug_helper().

Signed-off-by: Michal Rábek <mrabek@redhat.com>
Suggested-by: Tomas Henzl <thenzl@redhat.com>
Tested-by: Changhui Zhong <czhong@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Link: https://patch.msgid.link/20251212160900.64924-1-mrabek@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-16 21:24:28 -05:00
..
aacraid
aic7xxx
aic94xx
arcmsr
arm
be2iscsi
bfa
bnx2fc
bnx2i
csiostor
cxgbi
device_handler
elx
esas2r
fcoe
fnic
hisi_sas
ibmvscsi
ibmvscsi_tgt
isci
libfc
libsas scsi: libsas: Add rollback handling when an error occurs 2025-12-08 22:08:31 -05:00
lpfc
megaraid
mpi3mr scsi: mpi3mr: Read missing IOCFacts flag for reply queue full overflow 2025-12-16 21:19:26 -05:00
mpt3sas
mvsas
pcmcia
pm8001
qedf
qedi
qla2xxx SCSI misc on 20251214 2025-12-14 15:35:35 +12:00
qla4xxx SCSI misc on 20251214 2025-12-14 15:35:35 +12:00
smartpqi
snic
sym53c8xx_2
.gitignore
3w-9xxx.c
3w-9xxx.h
3w-sas.c
3w-sas.h
3w-xxxx.c
3w-xxxx.h
53c700.c
53c700.h
53c700.scr
53c700_d.h_shipped
BusLogic.c
BusLogic.h
FlashPoint.c
Kconfig
Makefile
NCR5380.c
NCR5380.h
a100u2w.c
a100u2w.h
a2091.c
a2091.h
a3000.c
a3000.h
a4000t.c
advansys.c
aha152x.c
aha152x.h
aha1542.c
aha1542.h
aha1740.c
aha1740.h
am53c974.c
atari_scsi.c
atp870u.c
atp870u.h
bvme6000_scsi.c
ch.c
constants.c
dc395x.c
dc395x.h
dmx3191d.c
esp_scsi.c
esp_scsi.h
fdomain.c
fdomain.h
fdomain_isa.c
fdomain_pci.c
g_NCR5380.c
gvp11.c
gvp11.h
hosts.c
hpsa.c
hpsa.h
hpsa_cmd.h
hptiop.c
hptiop.h
imm.c
imm.h
initio.c
initio.h
ipr.c SCSI misc on 20251214 2025-12-14 15:35:35 +12:00
ipr.h
ips.c
ips.h
iscsi_boot_sysfs.c
iscsi_tcp.c
iscsi_tcp.h
jazz_esp.c
lasi700.c
libiscsi.c
libiscsi_tcp.c
mac53c94.c
mac53c94.h
mac_esp.c
mac_scsi.c
megaraid.c
megaraid.h
mesh.c
mesh.h
mvme16x_scsi.c
mvme147.c
mvme147.h
mvumi.c
mvumi.h
myrb.c
myrb.h
myrs.c
myrs.h
ncr53c8xx.c
ncr53c8xx.h
nsp32.c
nsp32.h
nsp32_debug.c
nsp32_io.h
pmcraid.c
pmcraid.h
ppa.c
ppa.h
ps3rom.c
qla1280.c
qla1280.h
qlogicfas.c
qlogicfas408.c
qlogicfas408.h
qlogicpti.c
qlogicpti.h
raid_class.c
script_asm.pl
scsi.c
scsi_bsg.c
scsi_common.c
scsi_debug.c scsi: scsi_debug: Fix atomic write enable module param description 2025-12-16 21:09:44 -05:00
scsi_debugfs.c
scsi_debugfs.h
scsi_devinfo.c
scsi_dh.c scsi: device_handler: Return error pointer in scsi_dh_attached_handler_name() 2025-12-08 22:04:38 -05:00
scsi_error.c
scsi_ioctl.c
scsi_lib.c
scsi_lib_dma.c
scsi_lib_test.c
scsi_logging.c
scsi_logging.h
scsi_netlink.c
scsi_pm.c
scsi_priv.h
scsi_proc.c
scsi_proto_test.c
scsi_sas_internal.h
scsi_scan.c
scsi_sysctl.c
scsi_sysfs.c
scsi_trace.c
scsi_transport_api.h
scsi_transport_fc.c
scsi_transport_iscsi.c
scsi_transport_sas.c
scsi_transport_spi.c
scsi_transport_srp.c
scsicam.c
sd.c block-6.19-20251208 2025-12-09 08:53:24 +09:00
sd.h
sd_dif.c
sd_trace.h
sd_zbc.c
sense_codes.h
ses.c
sg.c scsi: sg: Fix occasional bogus elapsed time that exceeds timeout 2025-12-16 21:24:28 -05:00
sgiwd93.c
sim710.c
sni_53c710.c
sr.c
sr.h
sr_ioctl.c
sr_vendor.c
st.c
st.h
st_options.h
stex.c
storvsc_drv.c
sun3_scsi.c
sun3_scsi_vme.c
sun3x_esp.c
sun_esp.c
virtio_scsi.c
vmw_pvscsi.c
vmw_pvscsi.h
wd33c93.c
wd33c93.h
wd719x.c
wd719x.h
xen-scsifront.c
zalon.c
zorro7xx.c
zorro_esp.c