In nvmet_check_ctrl_status() cmd can be derived from nvmet_req. Remove
the local variable cmd in the nvmet_check_ctrl_status() and function
parameter cmd for nvmet_check_ctrl_status(). Derive the cmd value from
req parameter in the nvmet_check_ctrl_status().
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Instead of updating the error log page in the caller of the
nvmet_alloc_ctrt() update the error log page in the nvmet_alloc_ctrl().
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In the function nvmet_alloc_ctrl() we assign status value before we
call nvmet_fine_get_subsys() to:
status = NVME_SC_CONNECT_INVALID_PARAM | NVME_SC_DNR;
After we successfully find the subsystem we again set the status value
to:
status = NVME_SC_CONNECT_INVALID_PARAM | NVME_SC_DNR;
Remove the duplicate status assignment value.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
From the base spec, Figure 78:
"Controller Configuration, these fields are defined as parameters to
configure an "I/O Controller (IOC)" and not to configure a "Discovery
Controller (DC).
...
If the controller does not support I/O queues, then this field shall
be read-only with a value of 0h
Just perform this check for I/O controllers.
Fixes: a07b4970f4 ("nvmet: add a generic NVMe target")
Reported-by: Belanger, Martin <Martin.Belanger@dell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In case we have already established connection to nvmf target, it
shouldn't be allowed to change the model_number. E.g. if someone will
identify ctrl and get model_number of "my_model" later on will change
the model_numbel via configfs to "my_new_model" this will break the NVMe
specification for "Get Log Page – Persistent Event Log" that refers to
Model Number as: "This field contains the same value as reported in the
Model Number field of the Identify Controller data structure, bytes
63:24."
Although it doesn't mentioned explicitly that this field can't be
changed, we can assume it.
So allow setting this field only once: using configfs or in the first
identify ctrl operation.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The function nvmet_parse_io_cmd() returns value from
nvmet_file_parse_io_cmd() or nvmet_bdev_parse_io_cmd() based on which
backend is set for the request. Remove the else and just return the
value from nvmet_bdev_parse_io_cmd().
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Just like what we have to get the passthru ctrl from the req, add an
helper to get the subsystem associated with the nvmet_req() instead
of open coding the chain of structures.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In the NVMeOF block device backend, file backend, and passthru backend
we reject and report the commands if opcode is not handled.
Add an helper and use it in block device backend to keep the code
and error message uniform.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The six callers of nvmet_find_namespace() duplicate the error log page
update and status setting code for each call on failure.
All callers are nvmet requests based functions, so we can pass req
to the nvmet_find_namesapce() & derive ctrl from req, that'll allow us
to update the error log page in nvmet_find_namespace(). Now that we
pass the request we can also get rid of the local variable in
nvmet_find_namespace() and use the req->ns and return the error code.
Replace the ctrl parameter with nvmet_req for nvmet_find_namespace(),
centralize the error log page update for non allocated namesapces, and
return uniform error for non-allocated namespace.
The nvmet_find_namespace() takes nsid parameter which is from NVMe
commands structures such as get_log_page, identify, rw and common. All
these commands have same offset for the nsid field.
Derive nsid from req->cmd->common.nsid) & remove the extra parameter
from the nvmet_find_namespace().
Lastly now we associate the ns to the req parameter that we pass to the
nvmet_find_namespace(), rename nvmet_find_namespace() to
nvmet_req_find_ns().
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
remove unused cqs from nvmet_ctrl struct
this will reduce the allocated memory.
Signed-off-by: Amit <amit.engel@dell.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When connecting a controller with a zero kato value using the following
command line
nvme connect -t tcp -n NQN -a ADDR -s PORT --keep-alive-tmo=0
the warning below can be reproduced:
WARNING: CPU: 1 PID: 241 at kernel/workqueue.c:1627 __queue_delayed_work+0x6d/0x90
with trace:
mod_delayed_work_on+0x59/0x90
nvmet_update_cc+0xee/0x100 [nvmet]
nvmet_execute_prop_set+0x72/0x80 [nvmet]
nvmet_tcp_try_recv_pdu+0x2f7/0x770 [nvmet_tcp]
nvmet_tcp_io_work+0x63f/0xb2d [nvmet_tcp]
...
This is caused by queuing up an uninitialized work. Althrough the
keep-alive timer is disabled during allocating the controller (fixed in
0d3b6a8d21), ka_work still has a chance to run (called by
nvmet_start_ctrl).
Fixes: 0d3b6a8d21 ("nvmet: Disable keep-alive timer when kato is cleared to 0h")
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
A user may modify the kato by a set features cmd. To properly deal
with races or a kato value of 0 (no keep alive enabled) change
nvmet_set_feat_kato to first disable the timer, then set the value
and then re-enable the timer.
Signed-off-by: Amit Engel <amit.engel@dell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9CwtMQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpsehEAC4ReB53LLbZxqcmoA2RNs9yz1I4DM2PU6z
C+NSGGEnAFHQAhLbfCAzxbtQa6x/m64zoLd+8zHZNAeanJXarszcgSuqhXQFlEfX
7Jz/vdXGdu7Q4zgkLuO3FxleDoPoUC5qOSFHWYtMu6KvHLOkmc9DvdSUsFMDSThX
6RsoaQY2gDOD/pwtm8Cqmy89nLZdFoyxadXyk/lzxLodjeRZOwoVc+YM8YWmrXZ0
mKEEuO4uBWxUUmoyAwUABNqWWAkwTDEhrYCiiG81DkAa1Cu0mRXodN0xycr72cLZ
Ik2OlnTLCE6B0UXsBu2c0+qXGArWsvDyhEEkwF+O+Ump4IBIr72EmgZb+o2nnkXo
Uu4X/r0qeQ6XD+vBTHcE6oPUjJhV6uEXXon5aesE+vh277ILmHgMyjJKaSiJcY/E
efM5SuPRq2kuROKWLKiLJnpuJ/9ZTU/4nk4k1pOlWWOVGLHien0sWBBzQ+iWr6mm
eRl5EkI3JoahqIrNFz0+qF3DwKPVfu+B02/EzA8OXoYHIRV9KMS5eWX5hK12aZ3i
4AT3xuAanfcNs4qBAScOfHQxQu9U5Z7Mu4JQJ58xdsJd+UWBnbznUmSLob9KKk+c
X8AvAcYhb684F87VCmaCzDlIPMb46OYxLBgI6sz7L0xdc7i8TCeeEDbQCN1HixZ3
SNtKzalNXA==
=fAwK
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.9-2020-08-23' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request from Sagi:
- nvme completion rework from Christoph and Chao that mostly came
from a bit of divergence of how we classify errors related to
pathing/retry etc.
- nvmet passthru fixes from Chaitanya
- minor nvmet fixes from Amit and I
- mpath round-robin path selection fix from Martin
- ignore noiob for zoned devices from Keith
- minor nvme-fc fix from Tianjia"
- BFQ cgroup leak fix (Dmitry)
- block layer MAINTAINERS addition (Geert)
- fix null_blk FUA checking (Hou)
- get_max_io_size() size fix (Keith)
- fix block page_is_mergeable() for compound pages (Matthew)
- discard granularity fixes (Ming)
- IO scheduler ordering fix (Ming)
- misc fixes
* tag 'io_uring-5.9-2020-08-23' of git://git.kernel.dk/linux-block: (31 commits)
null_blk: fix passing of REQ_FUA flag in null_handle_rq
nvmet: Disable keep-alive timer when kato is cleared to 0h
nvme: redirect commands on dying queue
nvme: just check the status code type in nvme_is_path_error
nvme: refactor command completion
nvme: rename and document nvme_end_request
nvme: skip noiob for zoned devices
nvme-pci: fix PRP pool size
nvme-pci: Use u32 for nvme_dev.q_depth and nvme_queue.q_depth
nvme: Use spin_lock_irq() when taking the ctrl->lock
nvmet: call blk_mq_free_request() directly
nvmet: fix oops in pt cmd execution
nvmet: add ns tear down label for pt-cmd handling
nvme: multipath: round-robin: eliminate "fallback" variable
nvme: multipath: round-robin: fix single non-optimized path case
nvme-fc: Fix wrong return value in __nvme_fc_init_request()
nvmet-passthru: Reject commands with non-sgl flags set
nvmet: fix a memory leak
blkcg: fix memleak for iolatency
MAINTAINERS: Add missing header files to BLOCK LAYER section
...
Based on nvme spec, when keep alive timeout is set to zero
the keep-alive timer should be disabled.
Signed-off-by: Amit Engel <amit.engel@dell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch adds helper functions which are used in the NVMeOF configfs
when the user is configuring the passthru subsystem. Here we ensure
that only one subsys is assigned to each nvme_ctrl by using an xarray
on the cntlid.
The subsystem's version number is overridden by the passed through
controller's version. However, if that version is less than 1.2.1,
then we bump the advertised version to that and print a warning
in dmesg.
Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Add passthru command handling capability for the NVMeOF target and
export passthru APIs which are used to integrate passthru
code with nvmet-core.
The new file passthru.c handles passthru cmd parsing and execution.
In the passthru mode, we create a block layer request from the nvmet
request and map the data on to the block layer request.
Admin commands and features are on an allow list as there are a number
of each that don't make too much sense with passthrough. We use an
allow list such that new commands can be considered before being blindly
passed through. In both cases, vendor specific commands are always
allowed.
We also reject reservation IO commands as the underlying device cannot
differentiate between multiple hosts behind a fabric.
Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Replace has_keyed_sgls and metadata_support booleans with a flags member
that will be used for adding more features in the future.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Just use bd_disk->queue instead.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In function nvmet_async_event_process() we only process AENs iff
there is an open slot on the ctrl->async_event_cmds[] && aen
event list posted by the target is not empty. This keeps host
posted AEN outstanding if target generated AEN list is empty.
We do cleanup the target generated entries from the aen list in
nvmet_ctrl_free()-> nvmet_async_events_free() but we don't
process AEN posted by the host. This leads to following problem :-
When processing admin sq at the time of nvmet_sq_destroy() holds
an extra percpu reference(atomic value = 1), so in the following code
path after switching to atomic rcu, release function (nvmet_sq_free())
is not getting called which blocks the sq->free_done in
nvmet_sq_destroy() :-
nvmet_sq_destroy()
percpu_ref_kill_and_confirm()
- __percpu_ref_switch_mode()
-- __percpu_ref_switch_to_atomic()
--- call_rcu() -> percpu_ref_switch_to_atomic_rcu()
---- /* calls switch callback */
- percpu_ref_put()
-- percpu_ref_put_many(ref, 1)
--- else if (unlikely(atomic_long_sub_and_test(nr, &ref->count)))
---- ref->release(ref); <---- Not called.
This results in indefinite hang:-
void nvmet_sq_destroy(struct nvmet_sq *sq)
...
if (ctrl && ctrl->sqs && ctrl->sqs[0] == sq) {
nvmet_async_events_process(ctrl, status);
percpu_ref_put(&sq->ref);
}
percpu_ref_kill_and_confirm(&sq->ref, nvmet_confirm_sq);
wait_for_completion(&sq->confirm_done);
wait_for_completion(&sq->free_done); <-- Hang here
Which breaks the further disconnect sequence. This problem seems to be
introduced after commit 64f5e9cdd7 ("nvmet: fix memory leak when
removing namespaces and controllers concurrently").
This patch processes ctrl->async_event_cmds[] in the admin sq destroy()
context irrespetive of aen_list. Also we get rid of the controller's
aen_list processing in the nvmet_sq_destroy() context and just ignore
ctrl->aen_list.
This results in nvmet_async_events_process() being called from workqueue
context so we adjust the code accordingly.
Fixes: 64f5e9cdd7 ("nvmet: fix memory leak when removing namespaces and controllers concurrently ")
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Based-on-a-patch-by: Christoph Hellwig <hch@infradead.org>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
When removing a namespace, we add an NS_CHANGE async event, however if
the controller admin queue is removed after the event was added but not
yet processed, we won't free the aens, resulting in the below memory
leak [1].
Fix that by moving nvmet_async_event_free to the final controller
release after it is detached from subsys->ctrls ensuring no async
events are added, and modify it to simply remove all pending aens.
--
$ cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff888c1af2c000 (size 32):
comm "nvmetcli", pid 5164, jiffies 4295220864 (age 6829.924s)
hex dump (first 32 bytes):
28 01 82 3b 8b 88 ff ff 28 01 82 3b 8b 88 ff ff (..;....(..;....
02 00 04 65 76 65 6e 74 5f 66 69 6c 65 00 00 00 ...event_file...
backtrace:
[<00000000217ae580>] nvmet_add_async_event+0x57/0x290 [nvmet]
[<0000000012aa2ea9>] nvmet_ns_changed+0x206/0x300 [nvmet]
[<00000000bb3fd52e>] nvmet_ns_disable+0x367/0x4f0 [nvmet]
[<00000000e91ca9ec>] nvmet_ns_free+0x15/0x180 [nvmet]
[<00000000a15deb52>] config_item_release+0xf1/0x1c0
[<000000007e148432>] configfs_rmdir+0x555/0x7c0
[<00000000f4506ea6>] vfs_rmdir+0x142/0x3c0
[<0000000000acaaf0>] do_rmdir+0x2b2/0x340
[<0000000034d1aa52>] do_syscall_64+0xa5/0x4d0
[<00000000211f13bc>] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
Fixes: a07b4970f4 ("nvmet: add a generic NVMe target")
Reported-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Allocate the metadata SGL buffers and set metadata fields for the
request. Then create a block IO request for the metadata from the
protection SG list.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Expose the namespace metadata format when PI is enabled. The user needs
to enable the capability per subsystem and per port. The other metadata
properties are taken from the namespace/bdev.
Usage example:
echo 1 > /config/nvmet/subsystems/${NAME}/attr_pi_enable
echo 1 > /config/nvmet/ports/${PORT_NUM}/param_pi_enable
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The function doesn't check only the data length, because the transfer
length includes also the metadata length in some cases. This is
preparation for adding metadata (T10-PI) support.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The newly added function nvmet_ns_revalidate() does update the ns size
in the identify namespace in-core target data structure when host issues
id-ns command. This can lead to host having inconsistencies between size
of the namespace present in the id-ns command result and size of the
corresponding block device until host scans the namespaces explicitly.
To avoid this scenario generate AEN if old size is not same as the new
one in nvmet_ns_revalidate().
This will allow automatic AEN generation when host calls id-ns command
and also allows target to install userspace rules so that it can trigger
nvmet_ns_revalidate() (using configfs interface with the help of next
patch) resulting in appropriate AEN generation when underlying namespace
size change is detected.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch adds a wrapper helper to indicate size change in the bdev &
file-backed namespace when revalidating ns. This helper is needed in
order to minimize code repetition in the next patch for configfs.c and
existing admin-cmd.c.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This adds a new tracepoint for the target to trace async event. This is
helpful in debugging and comparing host and target side async events
especially when host is connected to different targets on different
machines and now that we rely on userspace components to generate AEN.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch adds a new target subsys attribute which allows user to
optionally specify model name which then used in the
nvmet_execute_identify_ctrl() to fill up the nvme_id_ctrl structure.
The default value for the model is set to "Linux" for backward
compatibility.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Mark Ruijter <MRuijter@onestopsystems.com>
[chaitanya.kulkarni@wdc.com
*Use macro for default model, coding style fixes.
*Use RCU for accessing model in for configfs and in
nvmet_execute_identify_ctrl().
]
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
This patch adds a new target subsys attribute which allows user to
optionally specify target controller IDs which then used in the
nvmet_execute_identify_ctrl() to fill up the nvme_id_ctrl structure.
For example, when using a cluster setup with two nodes, with a dual
ported NVMe drive and exporting the drive from both the nodes,
The connection to the host fails due to the same controller ID and
results in the following error message:-
"nvme nvmeX: Duplicate cntlid XXX with nvmeX, rejecting"
With this patch now user can partition the controller IDs for each
subsystem by setting up the cntlid_min and cntlid_max. These values
will be used at the time of the controller ID creation. By partitioning
the ctrl-ids for each subsystem results in the unique ctrl-id space
which avoids the collision.
When new attribute is not specified target will fall back to original
cntlid calculation method.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
All async events are enqueued via nvmet_add_async_event() which
updates the ctrl->async_event_cmds[] array and additionally an struct
nvmet_async_event is added to the ctrl->async_events list.
Under normal operations the nvmet_async_event_work() updates again
the ctrl->async_event_cmds and removes the corresponding struct
nvmet_async_event from the list again. Though nvmet_sq_destroy() could
be called which calls nvmet_async_events_free() which only updates the
ctrl->async_event_cmds[] array.
Add new functions nvmet_async_events_process() and
nvmet_async_events_free() to process async events, update an array and
the list.
When we destroy submission queue after clearing the aen present on
the ctrl->async list we also loop over ctrl->async_event_cmds[] for
any requests posted by the host for which we don't have the AEN in
the ctrl->async_events list by calling nvmet_async_event_process()
and nvmet_async_events_free().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
[chaitanya.kulkarni@wdc.com
* Loop over and clear out outstanding requests
* Update changelog
]
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
The host is allowed to pass the controller an sgl describing a buffer
that is larger than the dsm payload itself, allow it when executing
dsm.
Reported-by: Dakshaja Uppalapati <dakshaja@chelsio.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>,
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
ctrl->subsys->namespaces and subsys->namespaces are traversed with
list_for_each_entry_rcu outside an RCU read-side critical section but
under the protection of ctrl->subsys->lock and subsys->lock respectively.
Hence, add the corresponding lockdep expression to the list traversal
primitive to silence false-positive lockdep warnings, and harden RCU
lists.
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Move the special cases for fabrics commands and the discovery controller
to nvmet_parse_admin_cmd in preparation for adding passthrough support.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that nvmet_req_execute does nothing, open code it.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, update changelog]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of storing the expected length and checking it when it's
executed, just check the length inside the command themselves.
A new helper, nvmet_check_data_len() is created to help with this
check.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, udpate changelog]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The call to sgl_alloc shouldn't fail so add this simple optimization to
the fast path.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Presently, nvmet_file_flush() always returns a call to
errno_to_nvme_status() but that helper doesn't take into account the
case when errno=0. So nvmet_file_flush() always returns an error code.
All other callers of errno_to_nvme_status() check for success before
calling it.
To fix this, ensure errno_to_nvme_status() returns success if the
errno is zero. This should prevent future mistakes like this from
happening.
Fixes: c6aa3542e0 ("nvmet: add error log support for file backend")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
When a port is removed through configfs, any connected controllers
are still active and can still send commands. This causes a
use-after-free bug which is detected by KASAN for any admin command
that dereferences req->port (like in nvmet_execute_identify_ctrl).
To fix this, disconnect all active controllers when a subsystem is
removed from a port. This ensures there are no active controllers
when the port is eventually removed.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by : Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
This patch introduces target-side request tracing. As Christoph
suggested, the trace would not be in a core or module to avoid
disadvantages like cache miss:
http://lists.infradead.org/pipermail/linux-nvme/2019-June/024721.html
The target-side trace code is entirely based on the Johannes's trace code
from the host side. It has lots of codes duplicated, but it would be
better than having advantages mentioned above.
It also traces not only fabrics commands, but also nvme normal commands.
Once the codes to be shared gets bigger, then we can make it common as
suggsted.
This also removed the create_sq and create_cq trace parsing functions
because it will be done by the connect fabrics command.
Example:
echo 1 > /sys/kernel/debug/tracing/event/nvmet/nvmet_req_init/enable
echo 1 > /sys/kernel/debug/tracing/event/nvmet/nvmet_req_complete/enable
cat /sys/kernel/debug/tracing/trace
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[hch: fixed the symbol namespace and a an endianess conversion]
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch introduces a nvme_is_fabrics() inline function to check
whether or not the given command structure is for fabrics.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Some transports, such as FC-NVME, support discovery controller change
events without the use of a persistent discovery controller. FC receives
events via RSCN from the FC Fabric Controller or subsystem FC port.
This patch adds a nvmet transport op that is called whenever a
discovery change event occurs in the nvmet layer.
To facilitate the callback without adding another layer to cross into
core.c to reference the transport ops, the port structure snapshots
the transport ops when the port is enabled and clears them when disabled.
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Build breaks:
drivers/nvme/target/core.c: In function 'nvmet_req_alloc_sgl':
drivers/nvme/target/core.c:939:12: error: implicit declaration of \
function 'sgl_alloc'; did you mean 'bio_alloc'? \
[-Werror=implicit-function-declaration]
req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &req->sg_cnt);
^~~~~~~~~
bio_alloc
drivers/nvme/target/core.c:939:10: warning: assignment makes pointer \
from integer without a cast [-Wint-conversion]
req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &req->sg_cnt);
^
drivers/nvme/target/core.c: In function 'nvmet_req_free_sgl':
drivers/nvme/target/core.c:952:3: error: implicit declaration of \
function 'sgl_free'; did you mean 'ida_free'? [-Werror=implicit-function-declaration]
sgl_free(req->sg);
^~~~~~~~
ida_free
Cause:
1. missing include to <linux/scatterlist.h>
2. SGL_ALLOC needs to be enabled
Therefore adding the missing include, as well as Kconfig dependency.
Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
nvmet_subsys_alloc() returns its pointer or NULL if it fails. We can
see three different steps in this function:
1. memory allocation
2. argument check
3. memory allocation for string
But now the callers of this function do not seem to handle case 2 by
returning -ENOMEM only even if it fails with an invalid parameter.
This patch specifies error codes so that caller can pass it to its own
caller.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Use NVMe namings for improving code readability.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by : Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
we need to make sure that subsystem lock is taken during ctrl's list
traversing. nvmet_ns_changed function is not static and can be used from
various callers simultaneously.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In case we create N namespaces while N < NVMET_MAX_NAMESPACES, we can
perform "echo 1 > <nsid>/enable" as much as we want. In case N ==
NVMET_MAX_NAMESPACES we fail. Make sure we have the same flow for any N.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In case we fail to enable p2pmem on the current namespace, disable the
backing store device before exiting.
Cc: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Update license to use SPDX-License-Identifier instead of verbose license
text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
This patch defines a new macro NVMET_NO_ERROR_LOC to represent the
default error location value in the nvme-error-log-page.
This is a pure cleanup patch and it does not change any functionality.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Currently the u16 req->error_loc is being compared to -1 which
will always be false. Fix this by casting -1 to u16 to fix this.
Detected by clang:
warning: result of comparison of constant -1 with expression of
type 'u16' (aka 'unsigned short') is always false
[-Wtautological-constant-out-of-range-compare]
Fixes: 76574f37bf ("nvmet: add interface to update error-log page")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch adds support for the file backend to populate the
error log entries. Here we map the errno to the NVMe status codes.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch adds the support to maintain error log page for the
nvmet-core.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch adds nvmet_req based interface to the nvmet-core so that
we can update the error log page. We update error log page in
the request completion path when status is not set to NVME_SC_SUCCESS.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch adds necessary fields in the target data structures to
support error log page. For a target controller, we add a new error log
field to maintain the error log, at any given point we maintain error
entries equal to NVMET_ERROR_LOG_SLOTS for each controller. In the
following patch, we also update the error log page entry in the I/O
completion path so we introduce a spinlock for synchronization of the
log.
For nvmet_req, we add a new field error_loc to hold the location of
the error in the command when the actual error occurs for each request
and a starting LBA if applicable.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch adds unlikely in the nvmet request completion path for the
status check in the low level function __nvmet_req_complete.
This is helpful in the scenario where host and target connection is
working smoothly.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Technical proposal 8005 "fabrics SQ flow control" introduces a mode
where a host and controller agree to omit sq_head pointer updates
when sending nvme completions.
In case the host indicated desire to operate in this mode (connect attribute)
the controller will return back a connect completion with sq_head value
of 0xffff as indication that it will omit sq_head pointer updates.
This mode saves us an atomic update in the I/O path.
Reviewed-by: Hannes Reinecke <hare@suse.com>
[hch: suggested better implementation]
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add functions to find connections requesting Discovery Change events
and send a notification to hosts that maintain an explicit persistent
connection and have and active Asynchronous Event Request pending.
Only Hosts that have access to the Subsystem effected by the change
will receive notifications of Discovery Change event.
Call these functions each time there is a configfs change that effects
the Discover Log Pages.
Set the OAES field in the Identify Controller response to advertise the
support for Asynchronous Event Notifications.
Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Phil Cayton <phil.cayton@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It is perfectly valid that a host connects to a discovery subsystem
and gets an empty discovery log page since no subsystems are
provisioned to it. No reason to disallow connecting to the discovery
subsystem all together.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Phil Cayton <phil.cayton@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Per change to specification allowing Discovery controllers to have
explicit persistent connections, remove restriction on Discovery
controllers allowing kato on connect.
Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Functions nvmet_aen_disabled and nvmet_clear_aen were using
values not bit numbers ie 1 << 9 not 9 for bit function clear_bit
and test_and_set_bit.
Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Phil Cayton <phil.cayton@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move nvmet_aen_disabled and nvmet_clear_aen in preparation for other types
of controllers to use, initially the discovery controller.
Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A controller that supports traffic based keep-alive can restart the keep
alive timer even when no keep-alive was not received in the kato period
as long as other admin or I/O commands were received. For each command
set ctrl->cmd_seen to true, and when keep-alive timer expires, if any
commands were seen, resched ka_work instead of escalating to a fatal
error.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Even without CONFIG_P2PDMA this results in a error print:
nvmet: no peer-to-peer memory is available that's supported by rxe0 and /dev/nullb0
Fixes: c6925093d0 ("nvmet: Optionally use PCI P2P memory")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlvPV7IUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vyaUg//WnCaRIu2oKOp8c/bplZJDW5eT10d
oYAN9qeyptU9RYrg4KBNbZL9UKGFTk3AoN5AUjrk8njxc/dY2ra/79esOvZyyYQy
qLXBvrXKg3yZnlNlnyBneGSnUVwv/kl2hZS+kmYby2YOa8AH/mhU0FIFvsnfRK2I
XvwABFm2ZYvXCqh3e5HXaHhOsR88NQ9In0AXVC7zHGqv1r/bMVn2YzPZHL/zzMrF
mS79tdBTH+shSvchH9zvfgIs+UEKvvjEJsG2liwMkcQaV41i5dZjSKTdJ3EaD/Y2
BreLxXRnRYGUkBqfcon16Yx+P6VCefDRLa+RhwYO3dxFF2N4ZpblbkIdBATwKLjL
npiGc6R8yFjTmZU0/7olMyMCm7igIBmDvWPcsKEE8R4PezwoQv6YKHBMwEaflIbl
Rv4IUqjJzmQPaA0KkRoAVgAKHxldaNqno/6G1FR2gwz+fr68p5WSYFlQ3axhvTjc
bBMJpB/fbp9WmpGJieTt6iMOI6V1pnCVjibM5ZON59WCFfytHGGpbYW05gtZEod4
d/3yRuU53JRSj3jQAQuF1B6qYhyxvv5YEtAQqIFeHaPZ67nL6agw09hE+TlXjWbE
rTQRShflQ+ydnzIfKicFgy6/53D5hq7iH2l7HwJVXbXRQ104T5DB/XHUUTr+UWQn
/Nkhov32/n6GjxQ=
=58I4
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- Fix ASPM link_state teardown on removal (Lukas Wunner)
- Fix misleading _OSC ASPM message (Sinan Kaya)
- Make _OSC optional for PCI (Sinan Kaya)
- Don't initialize ASPM link state when ACPI_FADT_NO_ASPM is set
(Patrick Talbert)
- Remove x86 and arm64 node-local allocation for host bridge structures
(Punit Agrawal)
- Pay attention to device-specific _PXM node values (Jonathan Cameron)
- Support new Immediate Readiness bit (Felipe Balbi)
- Differentiate between pciehp surprise and safe removal (Lukas Wunner)
- Remove unnecessary pciehp includes (Lukas Wunner)
- Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner)
- Tolerate PCIe Slot Presence Detect being hardwired to zero to
workaround broken hardware, e.g., the Wilocity switch/wireless device
(Lukas Wunner)
- Unify pciehp controller & slot structs (Lukas Wunner)
- Constify hotplug_slot_ops (Lukas Wunner)
- Drop hotplug_slot_info (Lukas Wunner)
- Embed hotplug_slot struct into users instead of allocating it
separately (Lukas Wunner)
- Initialize PCIe port service drivers directly instead of relying on
initcall ordering (Keith Busch)
- Restore PCI config state after a slot reset (Keith Busch)
- Save/restore DPC config state along with other PCI config state
(Keith Busch)
- Reference count devices during AER handling to avoid race issue with
concurrent hot removal (Keith Busch)
- If an Upstream Port reports ERR_FATAL, don't try to read the Port's
config space because it is probably unreachable (Keith Busch)
- During error handling, use slot-specific reset instead of secondary
bus reset to avoid link up/down issues on hotplug ports (Keith Busch)
- Restore previous AER/DPC handling that does not remove and
re-enumerate devices on ERR_FATAL (Keith Busch)
- Notify all drivers that may be affected by error recovery resets
(Keith Busch)
- Always generate error recovery uevents, even if a driver doesn't have
error callbacks (Keith Busch)
- Make PCIe link active reporting detection generic (Keith Busch)
- Support D3cold in PCIe hierarchies during system sleep and runtime,
including hotplug and Thunderbolt ports (Mika Westerberg)
- Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots
are empty or occupied (Jon Derrick)
- Remove duplicated include from pci/pcie/err.c and unused variable
from cpqphp (YueHaibing)
- Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza
Pawandeep)
- Uninline PCI bus accessors for better ftracing (Keith Busch)
- Remove unused AER Root Port .error_resume method (Keith Busch)
- Use kfifo in AER instead of a local version (Keith Busch)
- Use threaded IRQ in AER bottom half (Keith Busch)
- Use managed resources in AER core (Keith Busch)
- Reuse pcie_port_find_device() for AER injection (Keith Busch)
- Abstract AER interrupt handling to disconnect error injection (Keith
Busch)
- Refactor AER injection callbacks to simplify future improvments
(Keith Busch)
- Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski)
- Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko)
- Add switch fall-through annotations (Gustavo A. R. Silva)
- Remove unused Switchtec quirk variable (Joshua Abraham)
- Fix pci.c kernel-doc warning (Randy Dunlap)
- Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig)
- Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng)
- Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid
useless dmesg errors (Logan Gunthorpe)
- Update Switchtec NTB documentation (Wesley Yung)
- Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz)
- Avoid panic when drivers enable MSI/MSI-X twice (Tonghao Zhang)
- Add PCI support for peer-to-peer DMA (Logan Gunthorpe)
- Add sysfs group for PCI peer-to-peer memory statistics (Logan
Gunthorpe)
- Add PCI peer-to-peer DMA scatterlist mapping interface (Logan
Gunthorpe)
- Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan
Gunthorpe)
- Add PCI peer-to-peer DMA driver writer's documentation (Logan
Gunthorpe)
- Add block layer flag to indicate driver support for PCI peer-to-peer
DMA (Logan Gunthorpe)
- Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P
memory (Logan Gunthorpe)
- Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan
Gunthorpe)
- Add nvme-pci support for PCI peer-to-peer memory in requests (Logan
Gunthorpe)
- Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise,
Christoph Hellwig, Logan Gunthorpe)
- Cache VF config space size to optimize enumeration of many VFs
(KarimAllah Ahmed)
- Remove unnecessary <linux/pci-ats.h> include (Bjorn Helgaas)
- Fix VMD AERSID quirk Device ID matching (Jon Derrick)
- Fix Cadence PHY handling during probe (Alan Douglas)
- Signal Cadence Endpoint interrupts via AXI region 0 instead of last
region (Alan Douglas)
- Write Cadence Endpoint MSI interrupts with 32 bits of data (Alan
Douglas)
- Remove redundant controller tests for "device_type == pci" (Rob
Herring)
- Document R-Car E3 (R8A77990) bindings (Tho Vu)
- Add device tree support for R-Car r8a7744 (Biju Das)
- Drop unused mvebu PCIe capability code (Thomas Petazzoni)
- Add shared PCI bridge emulation code (Thomas Petazzoni)
- Convert mvebu to use shared PCI bridge emulation (Thomas Petazzoni)
- Add aardvark Root Port emulation (Thomas Petazzoni)
- Support 100MHz/200MHz refclocks for i.MX6 (Lucas Stach)
- Add initial power management for i.MX7 (Leonard Crestez)
- Add PME_Turn_Off support for i.MX7 (Leonard Crestez)
- Fix qcom runtime power management error handling (Bjorn Andersson)
- Update TI dra7xx unaligned access errata workaround for host mode as
well as endpoint mode (Vignesh R)
- Fix kirin section mismatch warning (Nathan Chancellor)
- Remove iproc PAXC slot check to allow VF support (Jitendra Bhivare)
- Quirk Keystone K2G to limit MRRS to 256 (Kishon Vijay Abraham I)
- Update Keystone to use MRRS quirk for host bridge instead of open
coding (Kishon Vijay Abraham I)
- Refactor Keystone link establishment (Kishon Vijay Abraham I)
- Simplify and speed up Keystone link training (Kishon Vijay Abraham I)
- Remove unused Keystone host_init argument (Kishon Vijay Abraham I)
- Merge Keystone driver files into one (Kishon Vijay Abraham I)
- Remove redundant Keystone platform_set_drvdata() (Kishon Vijay
Abraham I)
- Rename Keystone functions for uniformity (Kishon Vijay Abraham I)
- Add Keystone device control module DT binding (Kishon Vijay Abraham
I)
- Use SYSCON API to get Keystone control module device IDs (Kishon
Vijay Abraham I)
- Clean up Keystone PHY handling (Kishon Vijay Abraham I)
- Use runtime PM APIs to enable Keystone clock (Kishon Vijay Abraham I)
- Clean up Keystone config space access checks (Kishon Vijay Abraham I)
- Get Keystone outbound window count from DT (Kishon Vijay Abraham I)
- Clean up Keystone outbound window configuration (Kishon Vijay Abraham
I)
- Clean up Keystone DBI setup (Kishon Vijay Abraham I)
- Clean up Keystone ks_pcie_link_up() (Kishon Vijay Abraham I)
- Fix Keystone IRQ status checking (Kishon Vijay Abraham I)
- Add debug messages for all Keystone errors (Kishon Vijay Abraham I)
- Clean up Keystone includes and macros (Kishon Vijay Abraham I)
- Fix Mediatek unchecked return value from devm_pci_remap_iospace()
(Gustavo A. R. Silva)
- Fix Mediatek endpoint/port matching logic (Honghui Zhang)
- Change Mediatek Root Port Class Code to PCI_CLASS_BRIDGE_PCI (Honghui
Zhang)
- Remove redundant Mediatek PM domain check (Honghui Zhang)
- Convert Mediatek to pci_host_probe() (Honghui Zhang)
- Fix Mediatek MSI enablement (Honghui Zhang)
- Add Mediatek system PM support for MT2712 and MT7622 (Honghui Zhang)
- Add Mediatek loadable module support (Honghui Zhang)
- Detach VMD resources after stopping root bus to prevent orphan
resources (Jon Derrick)
- Convert pcitest build process to that used by other tools (iio, perf,
etc) (Gustavo Pimentel)
* tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
PCI/AER: Refactor error injection fallbacks
PCI/AER: Abstract AER interrupt handling
PCI/AER: Reuse existing pcie_port_find_device() interface
PCI/AER: Use managed resource allocations
PCI: pcie: Remove redundant 'default n' from Kconfig
PCI: aardvark: Implement emulated root PCI bridge config space
PCI: mvebu: Convert to PCI emulated bridge config space
PCI: mvebu: Drop unused PCI express capability code
PCI: Introduce PCI bridge emulated config space common logic
PCI: vmd: Detach resources after stopping root bus
nvmet: Optionally use PCI P2P memory
nvmet: Introduce helper functions to allocate and free request SGLs
nvme-pci: Add support for P2P memory in requests
nvme-pci: Use PCI p2pmem subsystem to manage the CMB
IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]()
block: Add PCI P2P flag for request queue
PCI/P2PDMA: Add P2P DMA driver writer's documentation
docs-rst: Add a new directory for PCI documentation
PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers
PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
...
Create a configfs attribute in each nvme-fabrics namespace to enable P2P
memory use. The attribute may be enabled (with a boolean) or a specific
P2P device may be given (with the device's PCI name).
When enabled, the namespace will ensure the underlying block device
supports P2P and is compatible with any specified P2P device. If no device
was specified it will ensure there is compatible P2P memory somewhere in
the system. Enabling a namespace with P2P memory will fail with EINVAL
(and an appropriate dmesg error) if any of these conditions are not met.
Once a controller is set up on a specific port, the P2P device to use for
each namespace will be found and stored in a radix tree by namespace ID.
When memory is allocated for a request, the tree is used to look up the P2P
device to allocate memory against. If no device is in the tree (because no
appropriate device was found), or if allocation of P2P memory fails, fall
back to using regular memory.
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
[hch: partial rewrite of the initial code]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add helpers to allocate and free the SGL in a struct nvmet_req:
int nvmet_req_alloc_sgl(struct nvmet_req *req)
void nvmet_req_free_sgl(struct nvmet_req *req)
This will be expanded in a future patch to implement peer-to-peer memory
DMAs and should be common with all target drivers.
The new helpers are used in nvmet-rdma. Seeing we use req.transfer_len as
the length of the SGL it is set earlier and cleared on any error. It also
seems to be unnecessary to accumulate the length as the map_sgl functions
should only ever be called once per request.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
strncmp() stops comparing when either the end of one of the first two arguments
is reached or when 'n' characters have been compared, whichever comes first.
That means that strncmp(s1, s2, n) is equivalent to strcmp(s1, s2) if n exceeds
the length of s1 or the length of s2. Since that is the case in
nvmet_find_get_subsys(), change strncmp() into strcmp(). This patch avoids that
the following warning is reported by smatch:
drivers/nvme/target/core.c:940:1 nvmet_find_get_subsys() error: strncmp() '"nqn.2014-08.org.nvmexpress.discovery"' too small (37 vs 223)
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAltwvasQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpv65EACTq5gSLnJBI6ZPr1RAHruVDnjfzO2Veitl
tUtjm0XfWmnEiwQ3dYvnyhy99xbyaG3900d9BClCTlH6xaUdSiQkDpcKG/R2F36J
5mZitYukQcpFAQJWF8YKsTTE7JPl4VglCIDqYiC4+C3rOSVi8lrKn2qp4J4MMCFn
thRg3jCcq7c5s9Eigsop1pXWQSasubkXfk55Krcp4oybKYpYRKXXf74Mj14QAbwJ
QHN3VisyAUWoBRg7UQZo1Npe2oPk6bbnJypnjf8M0M2EnlvddEkIlHob91sodka8
6p4APOEu5cbyXOBCAQsw/koff14mb8aEadqeQA68WvXfIdX9ZjfxCX0OoC3sBEXk
yqJhZ0C980AM13zIBD8ejv4uasGcPca8W+47mE5P8sRiI++5kBsFWDZPCtUBna0X
2Kh24NsmEya9XRR5vsB84dsIPQ3tLMkxg/IgQRVDaSnfJz0c/+zm54xDyKRaFT4l
5iERk2WSkm9+8jNfVmWG0edrv6nRAXjpGwFfOCPh6/LCSCi4xQRULYN7sVzsX8ZK
FRjt24HftBI8mJbh4BtweJvg+ppVe1gAk3IO3HvxAQhv29Hz+uvFYe9kL+3N8LJA
Qosr9n9O4+wKYizJcDnw+5iPqCHfAwOm9th4pyedR+R7SmNcP3yNC8AbbheNBiF5
Zolos5H+JA==
=b9ib
-----END PGP SIGNATURE-----
Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
"First pull request for this merge window, there will also be a
followup request with some stragglers.
This pull request contains:
- Fix for a thundering heard issue in the wbt block code (Anchal
Agarwal)
- A few NVMe pull requests:
* Improved tracepoints (Keith)
* Larger inline data support for RDMA (Steve Wise)
* RDMA setup/teardown fixes (Sagi)
* Effects log suppor for NVMe target (Chaitanya Kulkarni)
* Buffered IO suppor for NVMe target (Chaitanya Kulkarni)
* TP4004 (ANA) support (Christoph)
* Various NVMe fixes
- Block io-latency controller support. Much needed support for
properly containing block devices. (Josef)
- Series improving how we handle sense information on the stack
(Kees)
- Lightnvm fixes and updates/improvements (Mathias/Javier et al)
- Zoned device support for null_blk (Matias)
- AIX partition fixes (Mauricio Faria de Oliveira)
- DIF checksum code made generic (Max Gurtovoy)
- Add support for discard in iostats (Michael Callahan / Tejun)
- Set of updates for BFQ (Paolo)
- Removal of async write support for bsg (Christoph)
- Bio page dirtying and clone fixups (Christoph)
- Set of bcache fix/changes (via Coly)
- Series improving blk-mq queue setup/teardown speed (Ming)
- Series improving merging performance on blk-mq (Ming)
- Lots of other fixes and cleanups from a slew of folks"
* tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits)
blkcg: Make blkg_root_lookup() work for queues in bypass mode
bcache: fix error setting writeback_rate through sysfs interface
null_blk: add lock drop/acquire annotation
Blk-throttle: reduce tail io latency when iops limit is enforced
block: paride: pd: mark expected switch fall-throughs
block: Ensure that a request queue is dissociated from the cgroup controller
block: Introduce blk_exit_queue()
blkcg: Introduce blkg_root_lookup()
block: Remove two superfluous #include directives
blk-mq: count the hctx as active before allocating tag
block: bvec_nr_vecs() returns value for wrong slab
bcache: trivial - remove tailing backslash in macro BTREE_FLAG
bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section
bcache: set max writeback rate when I/O request is idle
bcache: add code comments for bset.c
bcache: fix mistaken comments in request.c
bcache: fix mistaken code comments in bcache.h
bcache: add a comment in super.c
bcache: avoid unncessary cache prefetch bch_btree_node_get()
bcache: display rate debug parameters to 0 when writeback is not running
...
This patch implements the Namespace Write Protect feature described in
"NVMe TP 4005a Namespace Write Protect". In this version, we implement
No Write Protect and Write Protect states for target ns which can be
toggled by set-features commands from the host side.
For write-protect state transition, we need to flush the ns specified
as a part of command so we also add helpers for carrying out synchronous
flush operations.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[hch: fixed an incorrect endianess conversion, minor cleanups]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Allow creating non-default ANA groups (group ID > 1). Groups are created
either by assigning the group ID to a namespace, or by creating a configfs
group object under a specific port. All namespaces assigned to a group
that doesn't have a configfs object for a given port are marked as
inaccessible.
Allow changing the ANA state on a per-port basis by creating an
ana_groups directory under each port, and another directory with an
ana_state file in it. The default ANA group 1 directory is created
automatically for each port.
For all changes in ANA configuration the ANA change AEN is sent. We only
keep a global changecount instead of additional per-group changecounts to
keep the implementation as simple as possible.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Add support for Asynchronous Namespace Access as specified in NVMe 1.3
TP 4004.
Just add a default ANA group 1 that is optimized on all ports. This is
(and will remain) the default assignment for any namespace not epxlicitly
assigned to another ANA group. The ANA state can be manually changed
through the configfs interface, including the change state.
Includes fixes and improvements from Hannes Reinecke.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
TP 4004 introduces a new 'Maximum Number of Allocated Namespaces' field
in the Identify controller data to help the host size resources. Put
an upper limit on the supported namespaces to be able to support this
value as supporting 32-bits worth of namespaces would lead to very
large buffers. The limit is completely arbitrary at this point.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
This will be needed for the ANA AEN code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
We only need to check for a file-backed namespace if
nvmet_bdev_ns_enable() returns -ENOTBLK. For any other error
it's pointless as the open() error will remain the same.
Fixes: d5eff33e ("nvmet: add simple file backed ns support")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The patch enables inline data sizes using up to 4 recv sges, and capping
the size at 16KB or at least 1 page size. So on a 4K page system, up to
16KB is supported, and for a 64K page system 1 page of 64KB is supported.
We avoid > 0 order page allocations for the inline buffers by using
multiple recv sges, one for each page. If the device cannot support
the configured inline data size due to lack of enough recv sges, then
log a warning and reduce the inline size.
Add a new configfs port attribute, called param_inline_data_size,
to allow configuring the size of inline data for a given nvmf port.
The maximum size allowed is still enforced by nvmet-rdma with
NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is now max(16KB, PAGE_SIZE).
And the default size, if not specified via configfs, is still PAGE_SIZE.
This preserves the existing behavior, but allows larger inline sizes
for small page systems. If the configured inline data size exceeds
NVMET_RDMA_MAX_INLINE_DATA_SIZE, a warning is logged and the size is
reduced. If param_inline_data_size is set to 0, then inline data is
disabled for that nvmf port.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Add a new "buffered_io" attribute, which disabled direct I/O and thus
enables page cache based caching when enabled. The attribute can only
be changed when the namespace is disabled as the file has to be reopend
for the change to take effect.
The possibly blocking read/write are deferred to a newly introduced
global workqueue.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Controllers that are not yet enabled should not really enforce keep alive
timeouts, but we still want to track a timeout and cleanup in case a host
died before it enabled the controller. Hence, simply reset the keep
alive timer when the controller is enabled.
Suggested-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Per section 5.2 of the NVMe 1.3 spec:
"When the controller posts a completion queue entry for an outstanding
Asynchronous Event Request command and thus reports an asynchronous
event, subsequent events of that event type are automatically masked by
the controller until the host clears that event. An event is cleared by
reading the log page associated with that event using the Get Log Page
command (see section 5.14)."
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
AEN configuration via the 'Get Features' and 'Set Features' admin
command is mandatory, so we should be implemeting handling for it.
Signed-off-by: Hannes Reinecke <hare@suse.com>
[hch: use WRITE_ONCE, check for invalid values]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Just keep a per-controller buffer of changed namespaces and copy it out
in the get log page implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Zeroes the SGL in the payload.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
This patch adds simple file backed namespace support for NVMeOF target.
The new file io-cmd-file.c is responsible for handling the code for I/O
commands when ns is file backed. Also, we introduce mempools based slow
path using sync I/Os for file backed ns to ensure forward progress under
reclaim.
The old block device based implementation is moved to io-cmd-bdev.c and
use a "nvmet_bdev_" symbol prefix. The enable/disable calls are also
move into the respective files.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[hch: updated changelog, fixed double req->ns lookup in bdev case]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Remove the duplicate NULL initialization for req->ns. req->ns is always
initialized to NULL in nvmet_req_init(), so there is no need to reset
it later on failures unless we have previously assigned a value to it.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
"nvmet_check_ctrl_status()" is called from admin-cmd.c along
with io-cmd.c, make the error message more generic.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
PSDT field section according to NVM_Express-1.3:
"This field specifies whether PRPs or SGLs are used for any data
transfer associated with the command. PRPs shall be used for all
Admin commands for NVMe over PCIe. SGLs shall be used for all Admin
and I/O commands for NVMe over Fabrics. This field shall be set to
01b for NVMe over Fabrics 1.0 implementations.
Suggested-by: Idan Burstein <idanb@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
nvmet_req_init looked up a namespace and took a reference on it (unless it
failed prior to that). If the request is uninitialized (in error cases) we
need to remove that reference in case it was taken, otherwise we leak
namespace reference when calling nvme_req_uninit.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Make it symmetric to nvmet_alloc_ctrl().
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Remove the allocated id on error.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull core block layer updates from Jens Axboe:
"This is the main pull request for block storage for 4.15-rc1.
Nothing out of the ordinary in here, and no API changes or anything
like that. Just various new features for drivers, core changes, etc.
In particular, this pull request contains:
- A patch series from Bart, closing the whole on blk/scsi-mq queue
quescing.
- A series from Christoph, building towards hidden gendisks (for
multipath) and ability to move bio chains around.
- NVMe
- Support for native multipath for NVMe (Christoph).
- Userspace notifications for AENs (Keith).
- Command side-effects support (Keith).
- SGL support (Chaitanya Kulkarni)
- FC fixes and improvements (James Smart)
- Lots of fixes and tweaks (Various)
- bcache
- New maintainer (Michael Lyle)
- Writeback control improvements (Michael)
- Various fixes (Coly, Elena, Eric, Liang, et al)
- lightnvm updates, mostly centered around the pblk interface
(Javier, Hans, and Rakesh).
- Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)
- Writeback series that fix the much discussed hundreds of millions
of sync-all units. This goes all the way, as discussed previously
(me).
- Fix for missing wakeup on writeback timer adjustments (Yafang
Shao).
- Fix laptop mode on blk-mq (me).
- {mq,name} tupple lookup for IO schedulers, allowing us to have
alias names. This means you can use 'deadline' on both !mq and on
mq (where it's called mq-deadline). (me).
- blktrace race fix, oopsing on sg load (me).
- blk-mq optimizations (me).
- Obscure waitqueue race fix for kyber (Omar).
- NBD fixes (Josef).
- Disable writeback throttling by default on bfq, like we do on cfq
(Luca Miccio).
- Series from Ming that enable us to treat flush requests on blk-mq
like any other request. This is a really nice cleanup.
- Series from Ming that improves merging on blk-mq with schedulers,
getting us closer to flipping the switch on scsi-mq again.
- BFQ updates (Paolo).
- blk-mq atomic flags memory ordering fixes (Peter Z).
- Loop cgroup support (Shaohua).
- Lots of minor fixes from lots of different folks, both for core and
driver code"
* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
nvme: fix visibility of "uuid" ns attribute
blk-mq: fixup some comment typos and lengths
ide: ide-atapi: fix compile error with defining macro DEBUG
blk-mq: improve tag waiting setup for non-shared tags
brd: remove unused brd_mutex
blk-mq: only run the hardware queue if IO is pending
block: avoid null pointer dereference on null disk
fs: guard_bio_eod() needs to consider partitions
xtensa/simdisk: fix compile error
nvme: expose subsys attribute to sysfs
nvme: create 'slaves' and 'holders' entries for hidden controllers
block: create 'slaves' and 'holders' entries for hidden gendisks
nvme: also expose the namespace identification sysfs files for mpath nodes
nvme: implement multipath access to nvme subsystems
nvme: track shared namespaces
nvme: introduce a nvme_ns_ids structure
nvme: track subsystems
block, nvme: Introduce blk_mq_req_flags_t
block, scsi: Make SCSI quiesce and resume work reliably
block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
...
Currently the NVMe target stores the expexted data length in req->data_len
and uses that for data transfer decisions, but that does not take the
actual transfer length in the SGLs into account. So this adds a new
transfer_len field, into which the transport drivers store the actual
transfer length. We then check the two match before actually executing
the command.
The FC transport driver already had such a field, which is removed in
favour of the common one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In case we disable namespaces which has the nsid like
subsystem max_nsid we need to search for the next largest nsid
in this subsystem. If the subsystem don't has more namespaces
we set it to 0, else we take nsid from the last namespace in
namespaces list because the list is sorted while inserting.
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
[hch: slight refactor]
Signed-off-by: Christoph Hellwig <hch@lst.de>
In testing target io in read write mix, we did indeed get into cases where
sqhd didn't update properly and slowly missed enough updates to shutdown
the queue.
Protect the updating sqhd by using cmpxchg, and for that turn the sqhd
field into a u32 so that cmpxchg works on it for all architectures.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fix bug in sqhd patch.
It wasn't the sq that was at risk. In the case where the admin queue
connect command fails, the sq->size field is not set. Therefore, this
becomes a divide by zero error.
Add a quick check to bypass under this failure condition.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
To support sqhd, for initiators that are following the spec and
paying attention to sqhd vs their sqtail values:
- add sqhd to struct nvmet_sq
- initialize sqhd to 0 in nvmet_sq_setup
- rather than propagate the 0's-based qsize value from the connect message
which requires a +1 in every sqhd update, and as nothing else references
it, convert to 1's-based value in nvmt_sq/cq_setup() calls.
- validate connect message sqsize being non-zero per spec.
- updated assign sqhd for every completion that goes back.
Also remove handling the NULL sq case in __nvmet_req_complete, as it can't
happen with the current code.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>