Commit Graph

418 Commits

Author SHA1 Message Date
Thomas Gleixner 8fa7292fee treewide: Switch/rename to timer_delete[_sync]()
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with coccinelle plus manual fixups where necessary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-05 10:30:12 +02:00
Colin Ian King 8a9b76b7d9 scsi: lpfc: Fix spelling mistake 'Toplogy' -> 'Topology'
There is a spelling mistake in a lpfc_printf_log message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20250227225046.660865-1-colin.i.king@gmail.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-03-03 22:11:47 -05:00
Easwar Hariharan a131f20804 scsi: lpfc: Convert timeouts to secs_to_jiffies()
Commit b35108a51c ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies().  As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication

This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:

@depends on patch@
expression E;
@@

-msecs_to_jiffies(E * 1000)
+secs_to_jiffies(E)

-msecs_to_jiffies(E * MSEC_PER_SEC)
+secs_to_jiffies(E)

While here, convert some timeouts that are denominated in seconds
manually.

[mkp: Fix compilation error]

Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-2-a43967e36c88@linux.microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-03-03 22:11:47 -05:00
Justin Tee ef12deb6ce scsi: lpfc: Copyright updates for 14.4.0.8 patches
Update copyrights to 2025 for files modified in the 14.4.0.8 patch set.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20250131000524.163662-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-03 21:53:23 -05:00
Justin Tee 56c3d809b7 scsi: lpfc: Handle duplicate D_IDs in ndlp search-by D_ID routine
After a port swap between separate fabrics, there may be multiple nodes in
the vport's fc_nodes list with the same fabric well known address.
Duplication is temporary and eventually resolves itself after dev_loss_tmo
expires, but nameserver queries may still occur before dev_loss_tmo.  This
possibly results in returning stale fabric ndlp objects.  Fix by adding an
nlp_state check to ensure the ndlp search routine returns the correct newer
allocated ndlp fabric object.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20250131000524.163662-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-03 21:53:22 -05:00
Justin Tee 23ed628977 scsi: lpfc: Ignore ndlp rport mismatch in dev_loss_tmo callbk
With repeated port swaps between separate fabrics, there can be multiple
registrations for fabric well known address 0xfffffe.  This can cause ndlp
reference confusion due to the usage of a single ndlp ptr that stores the
rport object in fc_rport struct private storage during transport
registration.  Subsequent registrations update the ndlp->rport field with
the newer rport, so when transport layer triggers dev_loss_tmo for the
earlier registered rport the ndlp->rport private storage is referencing the
newer rport instead of the older rport in dev_loss_tmo callbk.

Because the older ndlp->rport object is already cleaned up elsewhere in
driver code during the time of fabric swap, check that the rport provided
in dev_loss_tmo callbk actually matches the rport stored in the LLDD's
ndlp->rport field.  Otherwise, skip dev_loss_tmo work on a stale rport.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20250131000524.163662-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-03 21:53:22 -05:00
Justin Tee 3f8f9f16f8 scsi: lpfc: Change lpfc_nodelist save_flags member into a bitmask
In attempt to reduce the amount of unnecessary ndlp->lock acquisitions in
the lpfc driver, change save_flags into an unsigned long bitmask and use
clear_bit/test_bit bitwise atomic APIs instead of reliance on ndlp->lock
for synchronization.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20241212233309.71356-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-01-02 15:01:22 -05:00
Justin Tee 92b99f1a73 scsi: lpfc: Change lpfc_nodelist nlp_flag member into a bitmask
In attempt to reduce the amount of unnecessary ndlp->lock acquisitions
in the lpfc driver, change nlpa_flag into an unsigned long bitmask and
use clear_bit/test_bit bitwise atomic APIs instead of reliance on
ndlp->lock for synchronization.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20241031223219.152342-10-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-11-02 21:45:24 -04:00
Justin Tee 32566a6f1a scsi: lpfc: Remove NLP_RELEASE_RPI flag from nodelist structure
An RPI is tightly bound to an NDLP structure and is freed only upon
release of an NDLP object.  As such, there should be no logic that frees
an RPI outside of the lpfc_nlp_release() routine.  In order to reinforce
the original design usage of RPIs, remove the NLP_RELEASE_RPI flag and
related logic.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20241031223219.152342-9-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-11-02 21:45:24 -04:00
Justin Tee 4281f44ea8 scsi: lpfc: Prevent NDLP reference count underflow in dev_loss_tmo callback
Current dev_loss_tmo handling checks whether there has been a previous
call to unregister with SCSI transport.  If so, the NDLP kref count is
decremented a second time in dev_loss_tmo as the final kref release.
However, this can sometimes result in a reference count underflow if
there is also a race to unregister with NVMe transport as well.  Add a
check for NVMe transport registration before decrementing the final
kref.  If NVMe transport is still registered, then the NVMe transport
unregistration is designated as the final kref decrement.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20241031223219.152342-8-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-11-02 21:45:24 -04:00
Justin Tee 4c113ac05b scsi: lpfc: Check devloss callbk done flag for potential stale NDLP ptrs
Should an rport remain in the NOTPRESENT state it is possible that
stgt_delete_work is scheduled after dev_loss_tmo_callbk.  In such cases,
dev_loss_tmo_callbk would have cleaned up the NDLP object resulting in
stale ndlp pointers in lpfc_terminate_rport_io().

Check for the DEVLOSS_CALLBK_DONE flag to know if dev_loss_tmo_callbk
has been called.  This is a more reliable way to avoid dereferencing
stale NDLP pointers.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20241031223219.152342-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-11-02 21:45:23 -04:00
Justin Tee d1a2ef63fc scsi: lpfc: Fix kref imbalance on fabric ndlps from dev_loss_tmo handler
With a FLOGI outstanding and loss of physical link connection to the fabric
for the duration of dev_loss_tmo, there is a fabric ndlp kref imbalance
that decrements the kref and sets the NLP_IN_RECOV_POST_DEV_LOSS flag at
the same time.

The issue is that when the FLOGI completion routine executes, the fabric
ndlp could already be freed because of the final kref put from the
dev_loss_tmo handler.  Fix by early returning before the ndlp kref put if
the ndlp is deemed a candidate for NLP_IN_RECOV_POST_DEV_LOSS in the FLOGI
completion routine.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240912232447.45607-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-09-12 21:21:19 -04:00
Justin Tee b5c18c9dd1 scsi: lpfc: Fix unsolicited FLOGI kref imbalance when in direct attached topology
In direct attached topology, certain target vendors that are quick to issue
FLOGI followed by a cable pull for more than dev_loss_tmo may result in a
kref imbalance for the remote port ndlp object.

Add an nlp_get when the defer_flogi_acc flag is set.  This is expected to
balance the nlp_put in the defer_flogi_acc clause in the
lpfc_issue_els_flogi() routine.  Because we need to retain the ndlp ptr,
reorganize all of the defer_flogi_acc information into one
lpfc_defer_flogi_acc struct.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240726231512.92867-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-08-02 21:48:05 -04:00
Justin Tee 2be1d4f119 scsi: lpfc: Validate hdwq pointers before dereferencing in reset/errata paths
When the HBA is undergoing a reset or is handling an errata event, NULL ptr
dereference crashes may occur in routines such as
lpfc_sli_flush_io_rings(), lpfc_dev_loss_tmo_callbk(), or
lpfc_abort_handler().

Add NULL ptr checks before dereferencing hdwq pointers that may have been
freed due to operations colliding with a reset or errata event handler.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240726231512.92867-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-08-02 21:48:05 -04:00
Justin Tee 15e21dc6d6 scsi: lpfc: Fix handling of fully recovered fabric node in dev_loss callbk
In rare cases when a fabric node is recovered after a link bounce and
before dev_loss_tmo callbk is reached, the driver may leave the fabric node
in an inconsistent state with the NLP_IN_DEV_LOSS flag perpetually set.

In lpfc_dev_loss_tmo_callbk, a check is added for a recovered fabric node.
If the node is recovered, then don't queue the lpfc_dev_loss_tmo_handler
work. In lpfc_dev_loss_tmo_handler, the path taken for the recovered fabric
nodes is updated to clear the NLP_IN_DEV_LOSS flag.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240628172011.25921-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04 23:24:51 -04:00
Justin Tee 9609385dd9 scsi: lpfc: Allow DEVICE_RECOVERY mode after RSCN receipt if in PRLI_ISSUE state
Certain vendor specific targets initially register with the fabric as an
initiator function first and then re-register as a target function
afterwards.

The timing of the target function re-registration can cause a race
condition such that the driver is stuck assuming the remote port as an
initiator function and never discovers the target's hosted LUNs.

Expand the nlp_state qualifier to also include NLP_STE_PRLI_ISSUE because
the state means that PRLI was issued but we have not quite reached
MAPPED_NODE state yet.  If we received an RSCN in the PRLI_ISSUE state,
then we should restart discovery again by going into DEVICE_RECOVERY.

Fixes: dded1dc31a ("scsi: lpfc: Modify when a node should be put in device recovery mode during RSCN")
Cc: <stable@vger.kernel.org> # v6.6+
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240628172011.25921-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04 23:24:51 -04:00
Justin Tee e780c9423b scsi: lpfc: Change lpfc_hba hba_flag member into a bitmask
In attempt to reduce the amount of unnecessary phba->hbalock acquisitions
in the lpfc driver, change hba_flag into an unsigned long bitmask and use
clear_bit/test_bit bitwise atomic APIs instead of reliance on phba->hbalock
for synchronization.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240429221547.6842-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-05-06 21:53:58 -04:00
Justin Tee 115d137aa9 scsi: lpfc: Define lpfc_dmabuf type for ctx_buf ptr
In LPFC_MBOXQ_t, the ctx_buf ptr shouldn't be defined as a generic void
*ptr.  It is named ctx_buf and it should only be used as an lpfc_dmabuf
*ptr.  Due to the void* declaration, there have been abuses of ctx_buf for
things not related to lpfc_dmabuf.

So, set the ptr type for *ctx_buf as lpfc_dmabuf.  Remove all type casts on
ctx_buf because it is no longer a void *ptr.  Convert the abuse of ctx_buf
for something not related to lpfc_dmabuf to use the void *context3 ptr.

A particular abuse of the ctx_buf warranted a new void *ext_buf ptr.
However, the usage of this new void *ext_buf is not generic.  It is
intended to only hold virtual addresses for extended mailbox commands.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240305200503.57317-10-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-03-10 18:56:44 -04:00
Justin Tee 18f7fe44bc scsi: lpfc: Define lpfc_nodelist type for ctx_ndlp ptr
In LPFC_MBOXQ_t data structure, the ctx_ndlp ptr shouldn't be defined as a
generic void *ptr.  It is named ctx_ndlp and it should only be used as an
lpfc_nodelist *ptr.  Due to the void* declaration, there have been abuses
of ctx_ndlp for things not related to ndlp.

So, set the ptr type for *ctx_ndlp as lpfc_nodelist.  Remove all type casts
on ctx_ndlp because it is no longer a void *ptr.  Convert the abuse of
ctx_ndlp for things not related to ndlps to use the void *context3 ptr.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240305200503.57317-9-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-03-10 18:56:44 -04:00
Justin Tee ded20192df scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up()
lpfc_worker_wake_up() calls the lpfc_work_done() routine, which takes the
hbalock.  Thus, lpfc_worker_wake_up() should not be called while holding the
hbalock to avoid potential deadlock.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240305200503.57317-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-03-10 18:56:43 -04:00
Justin Tee ea4044e4dd scsi: lpfc: Copyright updates for 14.4.0.0 patches
Update copyrights to 2024 for files modified in the 14.4.0.0 patch set.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240131185112.149731-18-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-05 20:51:36 -05:00
Justin Tee e39811bec6 scsi: lpfc: Change lpfc_vport load_flag member into a bitmask
In attempt to reduce the amount of unnecessary shost_lock acquisitions in
the lpfc driver, change load_flag into an unsigned long bitmask and use
clear_bit/test_bit bitwise atomic APIs instead of reliance on shost_lock
for synchronization.

Also, correct the test for FC_UNLOADING in lpfc_ct_handle_mibreq, which
incorrectly tests vport->fc_flag rather than vport->load_flag.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240131185112.149731-16-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-05 20:51:36 -05:00
Justin Tee a645b8c1f5 scsi: lpfc: Change lpfc_vport fc_flag member into a bitmask
In attempt to reduce the amount of unnecessary shost_lock acquisitions in
the lpfc driver, change fc_flag into an unsigned long bitmask and use
clear_bit/test_bit bitwise atomic APIs instead of reliance on shost_lock
for synchronization.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240131185112.149731-15-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-05 20:51:36 -05:00
Justin Tee 9bb36777d0 scsi: lpfc: Protect vport fc_nodes list with an explicit spin lock
In attempt to reduce the amount of unnecessary shost_lock acquisitions in
the lpfc driver, replace shost_lock with an explicit fc_nodes_list_lock
spinlock when accessing vport->fc_nodes lists.  Although vport memory
region is owned by shost->hostdata, it is driver private memory and an
explicit fc_nodes list lock for fc_nodes list mutations is more appropriate
than locking the entire shost.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240131185112.149731-14-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-05 20:51:36 -05:00
Justin Tee 0dfd9cbc18 scsi: lpfc: Change nlp state statistic counters into atomic_t
There is no reason to use the shost_lock to synchronize an LLDD statistics
counter.  Convert all the nlp state statistic counters into atomic_t.
Corresponding zeroing, increments, and reads are converted to atomic
versions.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240131185112.149731-13-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-05 20:51:35 -05:00
Justin Tee a801d57a11 scsi: lpfc: Remove NLP_RCV_PLOGI early return during RSCN processing for ndlps
Upon first RSCN receipt of a target server's remote port that is initially
acting as an initiator function, the driver marks the ndlp->nlp_type as an
initiator role.  Then later, when processing an RSCN for a target function
role switch, that ndlp remote port is permanently stuck as an initiator
role and can never transition to be discovered as an updated target role
function.

Remove the NLP_RCV_PLOGI early return if statement clause so that the
NLP_NPR_2B_DISC flag gets set.  This allows for role change detections.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240131185112.149731-7-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-05 20:51:35 -05:00
Justin Tee e07ac2d2aa scsi: lpfc: Eliminate unnecessary relocking in lpfc_check_nlp_post_devloss()
In lpfc_check_nlp_post_devloss(), retaking of the ndlp lock in the if
statement is useless because the very next line unlocks. Simply return to
avoid relocking.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231031191224.150862-5-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-11-15 09:52:57 -05:00
Martin K. Petersen af46076d66 Merge patch series "lpfc: Update lpfc to revision 14.2.0.15"
Justin Tee <justintee8345@gmail.com> says:

Update lpfc to revision 14.2.0.15

This patch set contains error handling fixes, ELS bug fixes, and
logging improvements.

The patches were cut against Martin's 6.7/scsi-queue tree.

Link: https://lore.kernel.org/r/20231009161812.97232-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-10-13 17:00:47 -04:00
Justin Tee 41c831bbb0 scsi: lpfc: Introduce LOG_NODE_VERBOSE messaging flag
The preexisting LOG_NODE message flag frequently spams a subset of the same
log messages during normal FC driver operations.  When analyzing driver
logs, this sometimes leads to difficulty in troubleshooting.

Because LOG_IP log message flag is unused, convert it to a new
LOG_NODE_VERBOSE flag.  The LOG_NODE_VERBOSE shall specifically be used for
diagnosing issues that require precise ndlp tracking detail.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231009161812.97232-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-10-13 16:58:27 -04:00
Justin Tee dae40be7a1 scsi: lpfc: Prevent use-after-free during rmmod with mapped NVMe rports
During rmmod, when dev_loss_tmo callback is called, an ndlp kref count is
decremented twice.  Once for SCSI transport registration and second to
remove the initial node allocation kref.  If there is also an NVMe
transport registration, another reference count decrement is expected in
lpfc_nvme_unregister_port().

Race conditions between the NVMe transport remoteport_delete and
dev_loss_tmo callbacks sometimes results in premature ndlp object release
resulting in use-after-free issues.

Fix by not dropping the ndlp object in dev_loss_tmo callback with an
outstanding NVMe transport registration.  Inversely, mark the final
NLP_DROPPED flag in lpfc_nvme_unregister_port when rmmod flag is set.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230908211923.37603-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-13 20:51:16 -04:00
Justin Tee 9c3034968e scsi: lpfc: Early return after marking final NLP_DROPPED flag in dev_loss_tmo
When a dev_loss_tmo event occurs, an ndlp lock is taken before checking
nlp_flag for NLP_DROPPED.  There is an attempt to restore the ndlp lock
when exiting the if statement, but the nlp_put kref could be the final
decrement causing a use-after-free memory access on a released ndlp object.

Instead of trying to reacquire the ndlp lock after checking nlp_flag, just
return after calling nlp_put.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230908211852.37576-1-justintee8345@gmail.com
Reviewed-by: "Ewan D. Milne" <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-13 20:49:34 -04:00
James Bottomley e03843a0f0 Merge branch 'fixes' into misc 2023-09-02 08:25:19 +01:00
Justin Tee dded1dc31a scsi: lpfc: Modify when a node should be put in device recovery mode during RSCN
Only nodes whose state is at least past a PLOGI issue and strictly less
than a PRLI issue should be put into device recovery mode upon RSCN
receipt.  Previously, the allowance of LOGO and PRLI completion states did
not make sense because those nodes should be allowed to flow through and
marked as NPort dissappeared as is normally done.  A follow up RSCN GID_FT
would recover those nodes in such cases.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230804195546.157839-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-08-07 21:25:47 -04:00
Justin Tee 90cec07f53 scsi: lpfc: Revise ndlp kref handling for dev_loss_tmo_callbk and lpfc_drop_node
The ndlp kref count implementation in lpfc_dev_loss_tmo_callbk() removes
the initial node reference when a vport is unloading.  When lpfc_cleanup()
sends a DEVICE_RM event and is in NPR state, the driver calls
lpfc_drop_node().  Subsequently, lpfc_drop_node() also removes an ndlp kref
thinking it is the initial reference.  This unintentionally introduces an
extra kref decrement on the ndlp object.

Fix by using the NLP_DROPPED node flag in lpfc_dev_loss_tmo_callbk() and
lpfc_drop_node() to coordinate the removal of the initial node reference.

In lpfc_dev_loss_tmo_callbk(), remove the SCSI transport reference provided
the node is registered in the dev_loss context because the driver cannot
call the SCSI transport in dev_loss context or afterwards.  And, have
lpfc_drop_node() not remove a reference if another thread is acting or has
already acted on it.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Justin Tee 377d7abadd scsi: lpfc: Qualify ndlp discovery state when processing RSCN
Conditionalize when to put an ndlp into recovery mode when processing
RSCNs.  As long as an ndlp state is beyond a PLOGI issue and has been
mapped to a transport layer before, the ndlp qualifies to be put into
recovery mode.  Otherwise, treat the ndlp rport normally through the
discovery engine.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230712180522.112722-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23 16:17:07 -04:00
Tuo Li 0e881c0a4b scsi: lpfc: Fix a possible data race in lpfc_unregister_fcf_rescan()
The variable phba->fcf.fcf_flag is often protected by the lock
phba->hbalock() when is accessed. Here is an example in
lpfc_unregister_fcf_rescan():

  spin_lock_irq(&phba->hbalock);
  phba->fcf.fcf_flag |= FCF_INIT_DISC;
  spin_unlock_irq(&phba->hbalock);

However, in the same function, phba->fcf.fcf_flag is assigned with 0
without holding the lock, and thus can cause a data race:

  phba->fcf.fcf_flag = 0;

To fix this possible data race, a lock and unlock pair is added when
accessing the variable phba->fcf.fcf_flag.

Reported-by: BassCheck <bass@buaa.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Link: https://lore.kernel.org/r/20230630024748.1035993-1-islituo@gmail.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-05 21:14:22 -04:00
Martin K. Petersen 21be4d0344 Merge patch series "lpfc: Update lpfc to revision 14.2.0.13"
Justin Tee <justintee8345@gmail.com> says:

Update lpfc to revision 14.2.0.13

This patch set contains discovery bug fixes, firmware logging
improvements, clean up of CQ handling, and statistics collection
enhancements.

The patches were cut against Martin's 6.5/scsi-queue tree.

Link: https://lore.kernel.org/r/20230523183206.7728-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:15:07 -04:00
Justin Tee 73ded37869 scsi: lpfc: Account for fabric domain ctlr device loss recovery
Pre-existing device loss recovery logic via the NLP_IN_RECOV_POST_DEV_LOSS
flag only handled Fabric Port Login, Fabric Controller, Management, and
Name Server addresses.

Fabric domain controllers fall under the same category for usage of the
NLP_IN_RECOV_POST_DEV_LOSS flag.  Add a default case statement to mark an
ndlp for device loss recovery.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-4-justintee8345@gmail.com
Acked-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:14:20 -04:00
Justin Tee fd57a687d4 scsi: lpfc: Clear NLP_IN_DEV_LOSS flag if already in rediscovery
In dev_loss_tmo callback routine, we early return if the ndlp is in a state
of rediscovery.  This occurs when a target proactively PLOGIs or PRLIs
after an RSCN before the dev_loss_tmo callback routine is scheduled to run.
Move clear of the NLP_IN_DEV_LOSS flag before the ndlp state check in such
cases.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:14:20 -04:00
Justin Tee a4157aaf0f scsi: lpfc: Fix use-after-free rport memory access in lpfc_register_remote_port()
Due to a target port D_ID swap, it is possible for the
lpfc_register_remote_port() routine to touch post mortem fc_rport memory
when trying to access fc_rport->dd_data.

The D_ID swap causes a simultaneous call to lpfc_unregister_remote_port(),
where fc_remote_port_delete() reclaims fc_rport memory.

Remove the fc_rport->dd_data->pnode NULL assignment because the following
line reassigns ndlp->rport with an fc_rport object from
fc_remote_port_add() anyways.  The pnode nullification is superfluous.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230523183206.7728-2-justintee8345@gmail.com
Acked-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 18:14:19 -04:00
Azeem Shaikh 73be26b12d scsi: lpfc: Replace all non-returning strlcpy() with strscpy()
strlcpy() reads the entire source buffer first.  This read may exceed the
destination size limit.  This is both inefficient and can lead to linear
read overflows if a source string is not NUL-terminated [1].  In an effort
to remove strlcpy() completely [2], replace strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89

Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Link: https://lore.kernel.org/r/20230530155745.343032-1-azeemshaikh38@gmail.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-31 17:59:06 -04:00
Justin Tee 97f975823f scsi: lpfc: Fix double free in lpfc_cmpl_els_logo_acc() caused by lpfc_nlp_not_used()
Smatch detected a double free path because lpfc_nlp_not_used() releases an
ndlp object before reaching lpfc_nlp_put() at the end of
lpfc_cmpl_els_logo_acc().

Remove the outdated lpfc_nlp_not_used() routine.  In
lpfc_mbx_cmpl_ns_reg_login(), replace the call with lpfc_nlp_put().  In
lpfc_cmpl_els_logo_acc(), replace the call with lpfc_unreg_rpi() and keep
the lpfc_nlp_put() at the end of the routine.  If ndlp's rpi was
registered, then lpfc_unreg_rpi()'s completion routine performs the final
ndlp clean up after lpfc_nlp_put() is called from lpfc_cmpl_els_logo_acc().
Otherwise if ndlp has no rpi registered, the lpfc_nlp_put() at the end of
lpfc_cmpl_els_logo_acc() is the final ndlp clean up.

Fixes: 4430f7fd09 ("scsi: lpfc: Rework locations of ndlp reference taking")
Cc: <stable@vger.kernel.org> # v5.11+
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/all/Y3OefhyyJNKH%2Fiaf@kili/
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230417191558.83100-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-08 07:16:04 -04:00
Justin Tee 796876fdae scsi: lpfc: Revise lpfc_error_lost_link() reason code evaluation logic
Extended status reason code errors should mask off the IOERR_PARAM_MASK
before checking strict equalities for IOERR values.

Update the lpfc_error_lost_link() routine as such.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230301231626.9621-9-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-09 21:21:45 -05:00
Justin Tee 1d0f9fea5d scsi: lpfc: Defer issuing new PLOGI if received RSCN before completing REG_LOGIN
When mapped to a target with multiple virtual ports, a link bounce
sometimes results in unsuccessful rediscovery of all of the target's
virtual ports.  This is because a succession of repeat RSCNs for the
virtual target ports leaves ndlps in the REG_LOGIN state with the
NLP_REG_LOGIN_SEND flag set.  With NLP_REG_LOGIN_SEND set, during the next
PLOGI, the driver will UNREG_RPI.  When UNREG_RPI is processed, the driver
can be in the middle of PRLI_ISSUE or MAPPED state resulting in an illegal
state transition by the discovery engine and stalling.

Fix by calling the discovery state machine with DEVICE_RECOVERY event
during RSCN processing.  This will set the NLP_IGNR_REG_CMPL bit and
prevent the old REG_LOGIN state from advancing.  Then for the new PLOGI
issue, add the check for the NLP_IGNR_REG_CMPL bit to delay issuing the new
PLOGI until the queued REG_LOGIN and UNREG_LOGIN have been processed.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230301231626.9621-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-09 21:21:44 -05:00
Bo Liu 442336a5a9 scsi: lpfc: Fix double word in comments
Remove the repeated word "the" in comments.

[mkp: fixed additional typos in the changed lines]

Link: https://lore.kernel.org/r/20230217083046.4090-1-liubo03@inspur.com
Signed-off-by: Bo Liu <liubo03@inspur.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-02-21 22:00:51 -05:00
Justin Tee 191b5a3877 scsi: lpfc: Copyright updates for 14.2.0.10 patches
Update copyrights to 2023 for files modified in the 14.2.0.10 patch set.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-01-12 00:03:15 -05:00
Justin Tee ecdf4ddf4e scsi: lpfc: Remove duplicate ndlp kref decrement in lpfc_cleanup_rpis()
With faulty cables in PT2PT topology, an unintentional ndlp double kref
decrement can occur.

If a FLOGI request is outstanding before the link goes down, the missing
FLOGI_ACC causes an F_Port ndlp to remain in the UNUSED state.  During link
down, lpfc_cleanup_rpis() is called and decrements an ndlp kref.
Additionally, when the driver later decides to abort the FLOGI, the FLOGI
completion handler decrements the ndlp kref a second time.

Remove duplicate clean up logic in lpfc_cleanup_rpis() because the updated
FLOGI completion handler already handles the ndlp kref decrement.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-01-12 00:03:14 -05:00
Justin Tee 97f256913c scsi: lpfc: Fix crash involving race between FLOGI timeout and devloss handler
When a FLOGI completes with a sequence timeout error, a freed kref ptr
dereference crash can occur due to a timing race involving ndlp referencing
in lpfc_dev_loss_tmo_callbk.

Fix by ensuring the driver accounts for an outstanding FLOGI when dev_loss
is active.  Also, don't remove the HBA_FLOGI_OUTSTANDING flag when the
FLOGI is retried to allow the driver to handle the reference counts
correctly in lpfc_dev_loss_tmo_handler.

Reported-by: Dietmar Hahn <dietmar.hahn@fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.hahn@fujitsu.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20221116011921.105995-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-11-17 18:18:42 +00:00
Jason A. Donenfeld 7e3cf0843f treewide: use get_random_{u8,u16}() when possible, part 1
Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value,
simply use the get_random_{u8,u16}() functions, which are faster than
wasting the additional bytes from a 32-bit value. This was done
mechanically with this coccinelle script:

@@
expression E;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u16;
typedef __be16;
typedef __le16;
typedef u8;
@@
(
- (get_random_u32() & 0xffff)
+ get_random_u16()
|
- (get_random_u32() & 0xff)
+ get_random_u8()
|
- (get_random_u32() % 65536)
+ get_random_u16()
|
- (get_random_u32() % 256)
+ get_random_u8()
|
- (get_random_u32() >> 16)
+ get_random_u16()
|
- (get_random_u32() >> 24)
+ get_random_u8()
|
- (u16)get_random_u32()
+ get_random_u16()
|
- (u8)get_random_u32()
+ get_random_u8()
|
- (__be16)get_random_u32()
+ (__be16)get_random_u16()
|
- (__le16)get_random_u32()
+ (__le16)get_random_u16()
|
- prandom_u32_max(65536)
+ get_random_u16()
|
- prandom_u32_max(256)
+ get_random_u8()
|
- E->inet_id = get_random_u32()
+ E->inet_id = get_random_u16()
)

@@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u16;
identifier v;
@@
- u16 v = get_random_u32();
+ u16 v = get_random_u16();

@@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u8;
identifier v;
@@
- u8 v = get_random_u32();
+ u8 v = get_random_u8();

@@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u16;
u16 v;
@@
-  v = get_random_u32();
+  v = get_random_u16();

@@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u8;
u8 v;
@@
-  v = get_random_u32();
+  v = get_random_u8();

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

        ((T)get_random_u32()@p & (LITERAL))

// Examine limits
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
        value = int(literal, 16)
elif literal[0] in '123456789':
        value = int(literal, 10)
if value is None:
        print("I don't know how to handle %s" % (literal))
        cocci.include_match(False)
elif value < 256:
        coccinelle.RESULT = cocci.make_ident("get_random_u8")
elif value < 65536:
        coccinelle.RESULT = cocci.make_ident("get_random_u16")
else:
        print("Skipping large mask of %s" % (literal))
        cocci.include_match(False)

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
identifier add_one.RESULT;
identifier FUNC;
@@

-       (FUNC()@p & (LITERAL))
+       (RESULT() & LITERAL)

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11 17:42:58 -06:00
James Smart a4de8356b6 scsi: lpfc: Fix various issues reported by tools
This patch fixes below Smatch reported issues:

 1. lpfc_hbadisc.c:3020 lpfc_mbx_cmpl_fcf_rr_read_fcf_rec()
    error: uninitialized symbol 'vlan_id'.

 2. lpfc_hbadisc.c:3121 lpfc_mbx_cmpl_read_fcf_rec()
    error: uninitialized symbol 'vlan_id'.

 3. lpfc_init.c:335 lpfc_dump_wakeup_param_cmpl()
    warn: always true condition '(prg->dist < 4) => (0-3 < 4)'

 4. lpfc_init.c:2419 lpfc_parse_vpd()
    warn: inconsistent indenting.

 5. lpfc_init.c:13248 lpfc_sli4_enable_msi()
    warn: 'phba->pcidev->irq' 2147483648 can't fit into 65535
    'eqhdl->irq'

 6. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get()
    error: uninitialized symbol 'ext_cnt'

 7. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get()
    error: uninitialized symbol 'ext_size'

 8. lpfc_vmid.c:248 lpfc_vmid_get_appid()
    warn: sleeping in atomic context.

 9. lpfc_init.c:8342 lpfc_sli4_driver_resource_setup()
    warn: missing error code 'rc'.

10. lpfc_init.c:13573 lpfc_sli4_hba_unset()
    warn: variable dereferenced before check 'phba->pport' (see
    line 13546)

11. lpfc_auth.c:1923 lpfc_auth_handle_dhchap_reply()
    error: double free of 'hash_value'

Fixes:

 1. Initialize vlan_id to LPFC_FCOE_NULL_VID.

 2. Initialize vlan_id to LPFC_FCOE_NULL_VID.

 3. prg->dist is a 2 bit field. Its value can only be between 0-3.
    Remove redundent check 'if (prg->dist < 4)'.

 4. Fix inconsistent indenting.  Moved logic into helper function
    lpfc_fill_vpd().

 5. Define 'eqhdl->irq' as int value as pci_irq_vector() returns int.
    Also, check for return value of pci_irq_vector() and log message in
    case of failure.

 6. Initialize 'ext_cnt' to 0.

 7. Initialize 'ext_size' to 0.

 8. Use alloc_percpu_gfp() with GFP_ATOMIC flag.

 9. 'rc' was not updated when dma_pool_create() fails.  Update 'rc =
     -ENOMEM' when dma_pool_create() fails before calling goto statement.

10. Add check for 'phba->pport' in lpfc_cpuhp_remove().

11. Initialize 'hash_value' to NULL, same like 'aug_chal' variable.

Link: https://lore.kernel.org/r/20220911221505.117655-13-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-15 22:18:28 -04:00