Commit Graph

358 Commits

Author SHA1 Message Date
David Howells e7f40f4a70 rxrpc: Remove local->defrag_sem
We no longer need local->defrag_sem as all DATA packet transmission is now
done from one thread, so remove it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-31 16:38:35 +00:00
David Howells f21e93485b rxrpc: Simplify ACK handling
Now that general ACK transmission is done from the same thread as incoming
DATA packet wrangling, there's no possibility that the SACK table will be
being updated by the latter whilst the former is trying to copy it to an
ACK.

This means that we can safely rotate the SACK table whilst updating it
without having to take a lock, rather than keeping all the bits inside it
in fixed place and copying and then rotating it in the transmitter.

Therefore, simplify SACK handing by keeping track of starting point in the
ring and rotate slots down as we consume them.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-31 16:38:35 +00:00
David Howells 5bbf953382 rxrpc: De-atomic call->ackr_window and call->ackr_nr_unacked
call->ackr_window doesn't need to be atomic as ACK generation and ACK
transmission are now done in the same thread, so drop the atomic64 handling
and split it into two separate members.

Similarly, call->ackr_nr_unacked doesn't need to be atomic now either.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-31 16:38:26 +00:00
David Howells af094824f2 rxrpc: Allow a delay to be injected into packet reception
If CONFIG_AF_RXRPC_DEBUG_RX_DELAY=y, then a delay is injected between
packets and errors being received and them being made available to the
processing code, thereby allowing the RTT to be artificially increased.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-31 16:38:09 +00:00
David Howells 223f59016f rxrpc: Convert call->recvmsg_lock to a spinlock
Convert call->recvmsg_lock to a spinlock as it's only ever write-locked.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-31 16:38:07 +00:00
David Howells 42f229c350 rxrpc: Fix incoming call setup race
An incoming call can race with rxrpc socket destruction, leading to a
leaked call.  This may result in an oops when the call timer eventually
expires:

   BUG: kernel NULL pointer dereference, address: 0000000000000874
   RIP: 0010:_raw_spin_lock_irqsave+0x2a/0x50
   Call Trace:
    <IRQ>
    try_to_wake_up+0x59/0x550
    ? __local_bh_enable_ip+0x37/0x80
    ? rxrpc_poke_call+0x52/0x110 [rxrpc]
    ? rxrpc_poke_call+0x110/0x110 [rxrpc]
    ? rxrpc_poke_call+0x110/0x110 [rxrpc]
    call_timer_fn+0x24/0x120

with a warning in the kernel log looking something like:

   rxrpc: Call 00000000ba5e571a still in use (1,SvAwtACK,1061d,0)!

incurred during rmmod of rxrpc.  The 1061d is the call flags:

   RECVMSG_READ_ALL, RX_HEARD, BEGAN_RX_TIMER, RX_LAST, EXPOSED,
   IS_SERVICE, RELEASED

but no DISCONNECTED flag (0x800), so it's an incoming (service) call and
it's still connected.

The race appears to be that:

 (1) rxrpc_new_incoming_call() consults the service struct, checks sk_state
     and allocates a call - then pauses, possibly for an interrupt.

 (2) rxrpc_release_sock() sets RXRPC_CLOSE, nulls the service pointer,
     discards the prealloc and releases all calls attached to the socket.

 (3) rxrpc_new_incoming_call() resumes, launching the new call, including
     its timer and attaching it to the socket.

Fix this by read-locking local->services_lock to access the AF_RXRPC socket
providing the service rather than RCU in rxrpc_new_incoming_call().
There's no real need to use RCU here as local->services_lock is only
write-locked by the socket side in two places: when binding and when
shutting down.

Fixes: 5e6ef4f101 ("rxrpc: Make the I/O thread take over the call and local processor work")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
2023-01-07 09:30:26 +00:00
David Howells 9d35d880e0 rxrpc: Move client call connection to the I/O thread
Move the connection setup of client calls to the I/O thread so that a whole
load of locking and barrierage can be eliminated.  This necessitates the
app thread waiting for connection to complete before it can begin
encrypting data.

This also completes the fix for a race that exists between call connection
and call disconnection whereby the data transmission code adds the call to
the peer error distribution list after the call has been disconnected (say
by the rxrpc socket getting closed).

The fix is to complete the process of moving call connection, data
transmission and call disconnection into the I/O thread and thus forcibly
serialising them.

Note that the issue may predate the overhaul to an I/O thread model that
were included in the merge window for v6.2, but the timing is very much
changed by the change given below.

Fixes: cf37b59875 ("rxrpc: Move DATA transmission into call processor work item")
Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:33 +00:00
David Howells 0d6bf319bc rxrpc: Move the client conn cache management to the I/O thread
Move the management of the client connection cache to the I/O thread rather
than managing it from the namespace as an aggregate across all the local
endpoints within the namespace.

This will allow a load of locking to be got rid of in a future patch as
only the I/O thread will be looking at the this.

The downside is that the total number of cached connections on the system
can get higher because the limit is now per-local rather than per-netns.
We can, however, keep the number of client conns in use across the entire
netfs and use that to reduce the expiration time of idle connection.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:33 +00:00
David Howells 96b4059f43 rxrpc: Remove call->state_lock
All the setters of call->state are now in the I/O thread and thus the state
lock is now unnecessary.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:33 +00:00
David Howells 93368b6bd5 rxrpc: Move call state changes from recvmsg to I/O thread
Move the call state changes that are made in rxrpc_recvmsg() to the I/O
thread.  This means that, thenceforth, only the I/O thread does this and
the call state lock can be removed.

This requires the Rx phase to be ended when the last packet is received,
not when it is processed.

Since this now changes the rxrpc call state to SUCCEEDED before we've
consumed all the data from it, rxrpc_kernel_check_life() mustn't say the
call is dead until the recvmsg queue is empty (unless the call has failed).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:33 +00:00
David Howells d41b3f5b96 rxrpc: Wrap accesses to get call state to put the barrier in one place
Wrap accesses to get the state of a call from outside of the I/O thread in
a single place so that the barrier needed to order wrt the error code and
abort code is in just that place.

Also use a barrier when setting the call state and again when reading the
call state such that the auxiliary completion info (error code, abort code)
can be read without taking a read lock on the call state lock.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:32 +00:00
David Howells 0b9bb322f1 rxrpc: Split out the call state changing functions into their own file
Split out the functions that change the state of an rxrpc call into their
own file.  The idea being to remove anything to do with changing the state
of a call directly from the rxrpc sendmsg() and recvmsg() paths and have
all that done in the I/O thread only, with the ultimate aim of removing the
state lock entirely.  Moving the code out of sendmsg.c and recvmsg.c makes
that easier to manage.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:32 +00:00
David Howells 1bab27af6b rxrpc: Set up a connection bundle from a call, not rxrpc_conn_parameters
Use the information now stored in struct rxrpc_call to configure the
connection bundle and thence the connection, rather than using the
rxrpc_conn_parameters struct.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:32 +00:00
David Howells 2953d3b8d8 rxrpc: Offload the completion of service conn security to the I/O thread
Offload the completion of the challenge/response cycle on a service
connection to the I/O thread.  After the RESPONSE packet has been
successfully decrypted and verified by the work queue, offloading the
changing of the call states to the I/O thread makes iteration over the
conn's channel list simpler.

Do this by marking the RESPONSE skbuff and putting it onto the receive
queue for the I/O thread to collect.  We put it on the front of the queue
as we've already received the packet for it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:32 +00:00
David Howells f06cb29189 rxrpc: Make the set of connection IDs per local endpoint
Make the set of connection IDs per local endpoint so that endpoints don't
cause each other's connections to get dismissed.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:32 +00:00
David Howells 57af281e53 rxrpc: Tidy up abort generation infrastructure
Tidy up the abort generation infrastructure in the following ways:

 (1) Create an enum and string mapping table to list the reasons an abort
     might be generated in tracing.

 (2) Replace the 3-char string with the values from (1) in the places that
     use that to log the abort source.  This gets rid of a memcpy() in the
     tracepoint.

 (3) Subsume the rxrpc_rx_eproto tracepoint with the rxrpc_abort tracepoint
     and use values from (1) to indicate the trace reason.

 (4) Always make a call to an abort function at the point of the abort
     rather than stashing the values into variables and using goto to get
     to a place where it reported.  The C optimiser will collapse the calls
     together as appropriate.  The abort functions return a value that can
     be returned directly if appropriate.

Note that this extends into afs also at the points where that generates an
abort.  To aid with this, the afs sources need to #define
RXRPC_TRACE_ONLY_DEFINE_ENUMS before including the rxrpc tracing header
because they don't have access to the rxrpc internal structures that some
of the tracepoints make use of.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:32 +00:00
David Howells a00ce28b17 rxrpc: Clean up connection abort
Clean up connection abort, using the connection state_lock to gate access
to change that state, and use an rxrpc_call_completion value to indicate
the difference between local and remote aborts as these can be pasted
directly into the call state.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:32 +00:00
David Howells f2cce89a07 rxrpc: Implement a mechanism to send an event notification to a connection
Provide a means by which an event notification can be sent to a connection
through such that the I/O thread can pick it up and handle it rather than
doing it in a separate workqueue.

This is then used to move the deferred final ACK of a call into the I/O
thread rather than a separate work queue as part of the drive to do all
transmission from the I/O thread.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:31 +00:00
David Howells a343b174b4 rxrpc: Only set/transmit aborts in the I/O thread
Only set the abort call completion state in the I/O thread and only
transmit ABORT packets from there.  rxrpc_abort_call() can then be made to
actually send the packet.

Further, ABORT packets should only be sent if the call has been exposed to
the network (ie. at least one attempted DATA transmission has occurred for
it).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:31 +00:00
David Howells 30df927b93 rxrpc: Separate call retransmission from other conn events
Call the rxrpc_conn_retransmit_call() directly from rxrpc_input_packet()
rather than calling it via connection event handling.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:31 +00:00
David Howells 8a758d98db rxrpc: Stash the network namespace pointer in rxrpc_local
Stash the network namespace pointer in the rxrpc_local struct in addition
to a pointer to the rxrpc-specific net namespace info.  Use this to remove
some places where the socket is passed as a parameter.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-01-06 09:43:31 +00:00
Stephen Rothwell b2ba00c2a5 rxrpc: replace zero-lenth array with DECLARE_FLEX_ARRAY() helper
0-length arrays are deprecated, and cause problems with bounds checking.
Replace with a flexible array:

In file included from include/linux/string.h:253,
                 from include/linux/bitmap.h:11,
                 from include/linux/cpumask.h:12,
                 from arch/x86/include/asm/paravirt.h:17,
                 from arch/x86/include/asm/cpuid.h:62,
                 from arch/x86/include/asm/processor.h:19,
                 from arch/x86/include/asm/cpufeature.h:5,
                 from arch/x86/include/asm/thread_info.h:53,
                 from include/linux/thread_info.h:60,
                 from arch/x86/include/asm/preempt.h:9,
                 from include/linux/preempt.h:78,
                 from include/linux/percpu.h:6,
                 from include/linux/prandom.h:13,
                 from include/linux/random.h:153,
                 from include/linux/net.h:18,
                 from net/rxrpc/output.c:10:
In function 'fortify_memcpy_chk',
    inlined from 'rxrpc_fill_out_ack' at net/rxrpc/output.c:158:2:
include/linux/fortify-string.h:520:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()?  [-Werror=attribute-warning]
  520 |                         __write_overflow_field(p_size_field, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Link: https://lore.kernel.org/linux-next/20230105132535.0d65378f@canb.auug.org.au/
Cc: David Howells <dhowells@redhat.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-afs@lists.infradead.org
Cc: netdev@vger.kernel.org
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Kees Cook <keescook@chromium.org>
2023-01-05 12:08:29 -08:00
David Howells 31d35a02ad rxrpc: Fix the return value of rxrpc_new_incoming_call()
Dan Carpenter sayeth[1]:

  The patch 5e6ef4f1017c: "rxrpc: Make the I/O thread take over the
  call and local processor work" from Jan 23, 2020, leads to the
  following Smatch static checker warning:

	net/rxrpc/io_thread.c:283 rxrpc_input_packet()
	warn: bool is not less than zero.

Fix this (for now) by changing rxrpc_new_incoming_call() to return an int
with 0 or error code rather than bool.  Note that the actual return value
of rxrpc_input_packet() is currently ignored.  I have a separate patch to
clean that up.

Fixes: 5e6ef4f101 ("rxrpc: Make the I/O thread take over the call and local processor work")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-December/006123.html [1]
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19 09:51:31 +00:00
David Howells 608aecd16a rxrpc: Fix locking issues in rxrpc_put_peer_locked()
Now that rxrpc_put_local() may call kthread_stop(), it can't be called
under spinlock as it might sleep.  This can cause a problem in the peer
keepalive code in rxrpc as it tries to avoid dropping the peer_hash_lock
from the point it needs to re-add peer->keepalive_link to going round the
loop again in rxrpc_peer_keepalive_dispatch().

Fix this by just dropping the lock when we don't need it and accepting that
we'll have to take it again.  This code is only called about every 20s for
each peer, so not very often.

This allows rxrpc_put_peer_unlocked() to be removed also.

If triggered, this bug produces an oops like the following, as reproduced
by a syzbot reproducer for a different oops[1]:

BUG: sleeping function called from invalid context at kernel/sched/completion.c:101
...
RCU nest depth: 0, expected: 0
3 locks held by kworker/u9:0/50:
 #0: ffff88810e74a138 ((wq_completion)krxrpcd){+.+.}-{0:0}, at: process_one_work+0x294/0x636
 #1: ffff8881013a7e20 ((work_completion)(&rxnet->peer_keepalive_work)){+.+.}-{0:0}, at: process_one_work+0x294/0x636
 #2: ffff88817d366390 (&rxnet->peer_hash_lock){+.+.}-{2:2}, at: rxrpc_peer_keepalive_dispatch+0x2bd/0x35f
...
Call Trace:
 <TASK>
 dump_stack_lvl+0x4c/0x5f
 __might_resched+0x2cf/0x2f2
 __wait_for_common+0x87/0x1e8
 kthread_stop+0x14d/0x255
 rxrpc_peer_keepalive_dispatch+0x333/0x35f
 rxrpc_peer_keepalive_worker+0x2e9/0x449
 process_one_work+0x3c1/0x636
 worker_thread+0x25f/0x359
 kthread+0x1a6/0x1b5
 ret_from_fork+0x1f/0x30

Fixes: a275da62e8 ("rxrpc: Create a per-local endpoint receive queue and I/O thread")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/0000000000002b4a9f05ef2b616f@google.com/ [1]
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19 09:51:31 +00:00
David Howells 8fbcc83334 rxrpc: Fix I/O thread startup getting skipped
When starting a kthread, the __kthread_create_on_node() function, as called
from kthread_run(), waits for a completion to indicate that the task_struct
(or failure state) of the new kernel thread is available before continuing.

This does not wait, however, for the thread function to be invoked and,
indeed, will skip it if kthread_stop() gets called before it gets there.

If this happens, though, kthread_run() will have returned successfully,
indicating that the thread was started and returning the task_struct
pointer.  The actual error indication is returned by kthread_stop().

Note that this is ambiguous, as the caller cannot tell whether the -EINTR
error code came from kthread() or from the thread function.

This was encountered in the new rxrpc I/O thread, where if the system is
being pounded hard by, say, syzbot, the check of KTHREAD_SHOULD_STOP can be
delayed long enough for kthread_stop() to get called when rxrpc releases a
socket - and this causes an oops because the I/O thread function doesn't
get started and thus doesn't remove the rxrpc_local struct from the
local_endpoints list.

Fix this by using a completion to wait for the thread to actually enter
rxrpc_io_thread().  This makes sure the thread can't be prematurely
stopped and makes sure the relied-upon cleanup is done.

Fixes: a275da62e8 ("rxrpc: Create a per-local endpoint receive queue and I/O thread")
Reported-by: syzbot+3538a6a72efa8b059c38@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Hillf Danton <hdanton@sina.com>
Link: https://lore.kernel.org/r/000000000000229f1505ef2b6159@google.com/
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19 09:51:31 +00:00
David Howells b0346843b1 rxrpc: Transmit ACKs at the point of generation
For ACKs generated inside the I/O thread, transmit the ACK at the point of
generation.  Where the ACK is generated outside of the I/O thread, it's
offloaded to the I/O thread to transmit it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:43 +00:00
David Howells a2cf3264f3 rxrpc: Fold __rxrpc_unuse_local() into rxrpc_unuse_local()
Fold __rxrpc_unuse_local() into rxrpc_unuse_local() as the latter is now
the only user of the former.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:43 +00:00
David Howells 5086d9a9df rxrpc: Move the cwnd degradation after transmitting packets
When we've gone for >1RTT without transmitting a packet, we should reduce
the ssthresh and cut the cwnd by half (as suggested in RFC2861 sec 3.1).

However, we may receive ACK packets in a batch and the first of these may
cut the cwnd, preventing further transmission, and each subsequent one cuts
the cwnd yet further, reducing it to the floor and killing performance.

Fix this by moving the cwnd reset to after doing the transmission and
resetting the base time such that we don't cut the cwnd by half again for
at least another RTT.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:43 +00:00
David Howells 32cf8edb07 rxrpc: Trace/count transmission underflows and cwnd resets
Add a tracepoint to log when a cwnd reset occurs due to lack of
transmission on a call.

Add stat counters to count transmission underflows (ie. when we have tx
window space, but sendmsg doesn't manage to keep up), cwnd resets and
transmission failures.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:42 +00:00
David Howells 5e6ef4f101 rxrpc: Make the I/O thread take over the call and local processor work
Move the functions from the call->processor and local->processor work items
into the domain of the I/O thread.

The call event processor, now called from the I/O thread, then takes over
the job of cranking the call state machine, processing incoming packets and
transmitting DATA, ACK and ABORT packets.  In a future patch,
rxrpc_send_ACK() will transmit the ACK on the spot rather than queuing it
for later transmission.

The call event processor becomes purely received-skb driven.  It only
transmits things in response to events.  We use "pokes" to queue a dummy
skb to make it do things like start/resume transmitting data.  Timer expiry
also results in pokes.

The connection event processor, becomes similar, though crypto events, such
as dealing with CHALLENGE and RESPONSE packets is offloaded to a work item
to avoid doing crypto in the I/O thread.

The local event processor is removed and VERSION response packets are
generated directly from the packet parser.  Similarly, ABORTs generated in
response to protocol errors will be transmitted immediately rather than
being pushed onto a queue for later transmission.

Changes:
========
ver #2)
 - Fix a couple of introduced lock context imbalances.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:42 +00:00
David Howells 393a2a2007 rxrpc: Extract the peer address from an incoming packet earlier
Extract the peer address from an incoming packet earlier, at the beginning
of rxrpc_input_packet() and thence pass a pointer to it to various
functions that use it as part of the lookup rather than doing it on several
separate paths.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:42 +00:00
David Howells cd21effb05 rxrpc: Reduce the use of RCU in packet input
Shrink the region of rxrpc_input_packet() that is covered by the RCU read
lock so that it only covers the connection and call lookup.  This means
that the bits now outside of that can call sleepable functions such as
kmalloc and sendmsg.

Also take a ref on the conn or call we're going to use before we drop the
RCU read lock.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:41 +00:00
David Howells cf37b59875 rxrpc: Move DATA transmission into call processor work item
Move DATA transmission into the call processor work item.  In a future
patch, this will be called from the I/O thread rather than being itsown
work item.

This will allow DATA transmission to be driven directly by incoming ACKs,
pokes and timers as those are processed.

The Tx queue is also split: The queue of packets prepared by sendmsg is now
places in call->tx_sendmsg and the packet dispatcher decants the packets
into call->tx_buffer as space becomes available in the transmission
window.  This allows sendmsg to run ahead of the available space to try and
prevent an underflow in transmission.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:41 +00:00
David Howells f3441d4125 rxrpc: Copy client call parameters into rxrpc_call earlier
Copy client call parameters into rxrpc_call earlier so that that can be
used to convey them to the connection code - which can then be offloaded to
the I/O thread.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:41 +00:00
David Howells 15f661dc95 rxrpc: Implement a mechanism to send an event notification to a call
Provide a means by which an event notification can be sent to a call such
that the I/O thread can process it rather than it being done in a separate
workqueue.  This will allow a lot of locking to be removed.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:41 +00:00
David Howells 4041a8ff65 rxrpc: Remove call->input_lock
Remove call->input_lock as it was only necessary to serialise access to the
state stored in the rxrpc_call struct by simultaneous softirq handlers
presenting received packets.  They now dump the packets in a queue and a
single process-context handler now processes them.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:40 +00:00
David Howells ff7348254e rxrpc: Move error processing into the local endpoint I/O thread
Move the processing of error packets into the local endpoint I/O thread,
leaving the handover from UDP to merely transfer them into the local
endpoint queue.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:40 +00:00
David Howells 446b3e1452 rxrpc: Move packet reception processing into I/O thread
Split the packet input handler to make the softirq side just dump the
received packet into the local endpoint receive queue and then call the
remainder of the input function from the I/O thread.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:40 +00:00
David Howells a275da62e8 rxrpc: Create a per-local endpoint receive queue and I/O thread
Create a per-local receive queue to which, in a future patch, all incoming
packets will be directed and an I/O thread that will process those packets
and perform all transmission of packets.

Destruction of the local endpoint is also moved from the local processor
work item (which will be absorbed) to the thread.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:40 +00:00
David Howells 96b2d69b43 rxrpc: Split the receive code
Split the code that handles packet reception in softirq mode as a prelude
to moving all the packet processing beyond routing to the appropriate call
and setting up of a new call out into process context.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:40 +00:00
David Howells 3cec055c56 rxrpc: Don't hold a ref for connection workqueue
Currently, rxrpc gives the connection's work item a ref on the connection
when it queues it - and this is called from the timer expiration function.
The problem comes when queue_work() fails (ie. the work item is already
queued): the timer routine must put the ref - but this may cause the
cleanup code to run.

This has the unfortunate effect that the cleanup code may then be run in
softirq context - which means that any spinlocks it might need to touch
have to be guarded to disable softirqs (ie. they need a "_bh" suffix).

 (1) Don't give a ref to the work item.

 (2) Simplify handling of service connections by adding a separate active
     count so that the refcount isn't also used for this.

 (3) Connection destruction for both client and service connections can
     then be cleaned up by putting rxrpc_put_connection() out of line and
     making a tidy progression through the destruction code (offloaded to a
     workqueue if put from softirq or processor function context).  The RCU
     part of the cleanup then only deals with the freeing at the end.

 (4) Make rxrpc_queue_conn() return immediately if it sees the active count
     is -1 rather then queuing the connection.

 (5) Make sure that the cleanup routine waits for the work item to
     complete.

 (6) Stash the rxrpc_net pointer in the conn struct so that the rcu free
     routine can use it, even if the local endpoint has been freed.

Unfortunately, neither the timer nor the work item can simply get around
the problem by just using refcount_inc_not_zero() as the waits would still
have to be done, and there would still be the possibility of having to put
the ref in the expiration function.

Note the connection work item is mostly going to go away with the main
event work being transferred to the I/O thread, so the wait in (6) will
become obsolete.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:40 +00:00
David Howells 3feda9d69c rxrpc: Don't hold a ref for call timer or workqueue
Currently, rxrpc gives the call timer a ref on the call when it starts it
and this is passed along to the workqueue by the timer expiration function.
The problem comes when queue_work() fails (ie. the work item is already
queued): the timer routine must put the ref - but this may cause the
cleanup code to run.

This has the unfortunate effect that the cleanup code may then be run in
softirq context - which means that any spinlocks it might need to touch
have to be guarded to disable softirqs (ie. they need a "_bh" suffix).

Fix this by:

 (1) Don't give a ref to the timer.

 (2) Making the expiration function not do anything if the refcount is 0.
     Note that this is more of an optimisation.

 (3) Make sure that the cleanup routine waits for timer to complete.

However, this has a consequence that timer cannot give a ref to the work
item.  Therefore the following fixes are also necessary:

 (4) Don't give a ref to the work item.

 (5) Make the work item return asap if it sees the ref count is 0.

 (6) Make sure that the cleanup routine waits for the work item to
     complete.

Unfortunately, neither the timer nor the work item can simply get around
the problem by just using refcount_inc_not_zero() as the waits would still
have to be done, and there would still be the possibility of having to put
the ref in the expiration function.

Note the call work item is going to go away with the work being transferred
to the I/O thread, so the wait in (6) will become obsolete.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:39 +00:00
David Howells fa3492abb6 rxrpc: Trace rxrpc_bundle refcount
Add a tracepoint for the rxrpc_bundle refcounting.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:39 +00:00
David Howells cb0fc0c972 rxrpc: trace: Don't use __builtin_return_address for rxrpc_call tracing
In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_call tracepoint

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:39 +00:00
David Howells 7fa25105b2 rxrpc: trace: Don't use __builtin_return_address for rxrpc_conn tracing
In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_conn tracepoint

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:39 +00:00
David Howells 47c810a798 rxrpc: trace: Don't use __builtin_return_address for rxrpc_peer tracing
In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_peer tracepoint

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:38 +00:00
David Howells 0fde882fc9 rxrpc: trace: Don't use __builtin_return_address for rxrpc_local tracing
In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_local tracepoint

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:38 +00:00
David Howells 2cc800863c rxrpc: Drop rxrpc_conn_parameters from rxrpc_connection and rxrpc_bundle
Remove the rxrpc_conn_parameters struct from the rxrpc_connection and
rxrpc_bundle structs and emplace the members directly.  These are going to
get filled in from the rxrpc_call struct in future.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:38 +00:00
David Howells e969c92ce5 rxrpc: Remove the [_k]net() debugging macros
Remove the _net() and knet() debugging macros in favour of tracepoints.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:38 +00:00
David Howells 2ebdb26e6a rxrpc: Remove the [k_]proto() debugging macros
Remove the kproto() and _proto() debugging macros in preference to using
tracepoints for this.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-12-01 13:36:38 +00:00
Jakub Kicinski f2bb566f5c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
tools/lib/bpf/ringbuf.c
  927cbb478a ("libbpf: Handle size overflow for ringbuf mmap")
  b486d19a0a ("libbpf: checkpatch: Fixed code alignments in ringbuf.c")
https://lore.kernel.org/all/20221121122707.44d1446a@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-29 13:04:52 -08:00
David Howells 3bcd6c7eaa rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975]
After rxrpc_unbundle_conn() has removed a connection from a bundle, it
checks to see if there are any conns with available channels and, if not,
removes and attempts to destroy the bundle.

Whilst it does check after grabbing client_bundles_lock that there are no
connections attached, this races with rxrpc_look_up_bundle() retrieving the
bundle, but not attaching a connection for the connection to be attached
later.

There is therefore a window in which the bundle can get destroyed before we
manage to attach a new connection to it.

Fix this by adding an "active" counter to struct rxrpc_bundle:

 (1) rxrpc_connect_call() obtains an active count by prepping/looking up a
     bundle and ditches it before returning.

 (2) If, during rxrpc_connect_call(), a connection is added to the bundle,
     this obtains an active count, which is held until the connection is
     discarded.

 (3) rxrpc_deactivate_bundle() is created to drop an active count on a
     bundle and destroy it when the active count reaches 0.  The active
     count is checked inside client_bundles_lock() to prevent a race with
     rxrpc_look_up_bundle().

 (4) rxrpc_unbundle_conn() then calls rxrpc_deactivate_bundle().

Fixes: 245500d853 ("rxrpc: Rewrite the client connection manager")
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-15975
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: zdi-disclosures@trendmicro.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18 12:05:44 +00:00
David Howells 30d95efe06 rxrpc: Allocate an skcipher each time needed rather than reusing
In the rxkad security class, allocate the skcipher used to do packet
encryption and decription rather than allocating one up front and reusing
it for each packet.  Reusing the skcipher precludes doing crypto in
parallel.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells 1fc4fa2ac9 rxrpc: Fix congestion management
rxrpc has a problem in its congestion management in that it saves the
congestion window size (cwnd) from one call to another, but if this is 0 at
the time is saved, then the next call may not actually manage to ever
transmit anything.

To this end:

 (1) Don't save cwnd between calls, but rather reset back down to the
     initial cwnd and re-enter slow-start if data transmission is idle for
     more than an RTT.

 (2) Preserve ssthresh instead, as that is a handy estimate of pipe
     capacity.  Knowing roughly when to stop slow start and enter
     congestion avoidance can reduce the tendency to overshoot and drop
     larger amounts of packets when probing.

In future, cwind growth also needs to be constrained when the window isn't
being filled due to being application limited.

Reported-by: Simon Wilkinson <sxw@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells 6869ddb87d rxrpc: Remove the rxtx ring
The Rx/Tx ring is no longer used, so remove it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells d57a3a1516 rxrpc: Save last ACK's SACK table rather than marking txbufs
Improve the tracking of which packets need to be transmitted by saving the
last ACK packet that we receive that has a populated soft-ACK table rather
than marking packets.  Then we can step through the soft-ACK table and look
at the packets we've transmitted beyond that to determine which packets we
might want to retransmit.

We also look at the highest serial number that has been acked to try and
guess which packets we've transmitted the peer is likely to have seen.  If
necessary, we send a ping to retrieve that number.

One downside that might be a problem is that we can't then compare the
previous acked/unacked state so easily in rxrpc_input_soft_acks() - which
is a potential problem for the slow-start algorithm.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells 4e76bd406d rxrpc: Remove call->lock
call->lock is no longer necessary, so remove it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells a4ea4c4776 rxrpc: Don't use a ring buffer for call Tx queue
Change the way the Tx queueing works to make the following ends easier to
achieve:

 (1) The filling of packets, the encryption of packets and the transmission
     of packets can be handled in parallel by separate threads, rather than
     rxrpc_sendmsg() allocating, filling, encrypting and transmitting each
     packet before moving onto the next one.

 (2) Get rid of the fixed-size ring which sets a hard limit on the number
     of packets that can be retained in the ring.  This allows the number
     of packets to increase without having to allocate a very large ring or
     having variable-sized rings.

     [Note: the downside of this is that it's then less efficient to locate
     a packet for retransmission as we then have to step through a list and
     examine each buffer in the list.]

 (3) Allow the filler/encrypter to run ahead of the transmission window.

 (4) Make it easier to do zero copy UDP from the packet buffers.

 (5) Make it easier to do zero copy from userspace to the packet buffers -
     and thence to UDP (only if for unauthenticated connections).

To that end, the following changes are made:

 (1) Use the new rxrpc_txbuf struct instead of sk_buff for keeping packets
     to be transmitted in.  This allows them to be placed on multiple
     queues simultaneously.  An sk_buff isn't really necessary as it's
     never passed on to lower-level networking code.

 (2) Keep the transmissable packets in a linked list on the call struct
     rather than in a ring.  As a consequence, the annotation buffer isn't
     used either; rather a flag is set on the packet to indicate ackedness.

 (3) Use the RXRPC_CALL_TX_LAST flag to indicate that the last packet to be
     transmitted has been queued.  Add RXRPC_CALL_TX_ALL_ACKED to indicate
     that all packets up to and including the last got hard acked.

 (4) Wire headers are now stored in the txbuf rather than being concocted
     on the stack and they're stored immediately before the data, thereby
     allowing zerocopy of a single span.

 (5) Don't bother with instant-resend on transmission failure; rather,
     leave it for a timer or an ACK packet to trigger.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells 5d7edbc923 rxrpc: Get rid of the Rx ring
Get rid of the Rx ring and replace it with a pair of queues instead.  One
queue gets the packets that are in-sequence and are ready for processing by
recvmsg(); the other queue gets the out-of-sequence packets for addition to
the first queue as the holes get filled.

The annotation ring is removed and replaced with a SACK table.  The SACK
table has the bits set that correspond exactly to the sequence number of
the packet being acked.  The SACK ring is copied when an ACK packet is
being assembled and rotated so that the first ACK is in byte 0.

Flow control handling is altered so that packets that are moved to the
in-sequence queue are hard-ACK'd even before they're consumed - and then
the Rx window size in the ACK packet (rsize) is shrunk down to compensate
(even going to 0 if the window is full).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells d4d02d8bb5 rxrpc: Clone received jumbo subpackets and queue separately
Split up received jumbo packets into separate skbuffs by cloning the
original skbuff for each subpacket and setting the offset and length of the
data in that subpacket in the skbuff's private data.  The subpackets are
then placed on the recvmsg queue separately.  The security class then gets
to revise the offset and length to remove its metadata.

If we fail to clone a packet, we just drop it and let the peer resend it.
The original packet gets used for the final subpacket.

This should make it easier to handle parallel decryption of the subpackets.
It also simplifies the handling of lost or misordered packets in the
queuing/buffering loop as the possibility of overlapping jumbo packets no
longer needs to be considered.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells 530403d9ba rxrpc: Clean up ACK handling
Clean up the rxrpc_propose_ACK() function.  If deferred PING ACK proposal
is split out, it's only really needed for deferred DELAY ACKs.  All other
ACKs, bar terminal IDLE ACK are sent immediately.  The deferred IDLE ACK
submission can be handled by conversion of a DELAY ACK into an IDLE ACK if
there's nothing to be SACK'd.

Also, because there's a delay between an ACK being generated and being
transmitted, it's possible that other ACKs of the same type will be
generated during that interval.  Apart from the ACK time and the serial
number responded to, most of the ACK body, including window and SACK
parameters, are not filled out till the point of transmission - so we can
avoid generating a new ACK if there's one pending that will cover the SACK
data we need to convey.

Therefore, don't propose a new DELAY or IDLE ACK for a call if there's one
already pending.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells 72f0c6fb05 rxrpc: Allocate ACK records at proposal and queue for transmission
Allocate rxrpc_txbuf records for ACKs and put onto a queue for the
transmitter thread to dispatch.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells 02a1935640 rxrpc: Define rxrpc_txbuf struct to carry data to be transmitted
Define a struct, rxrpc_txbuf, to carry data to be transmitted instead of a
socket buffer so that it can be placed onto multiple queues at once.  This
also allows the data buffer to be in the same allocation as the internal
data.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells a11e6ff961 rxrpc: Remove call->tx_phase
Remove call->tx_phase as it's only ever set.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells 27f699ccb8 rxrpc: Remove the flags from the rxrpc_skb tracepoint
Remove the flags from the rxrpc_skb tracepoint as we're no longer going to
be using this for the transmission buffers and so marking which are
transmission buffers isn't going to be necessary.

Note that this also remove the rxrpc skb flag that indicates if this is a
transmission buffer and so the count is not updated for the moment.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells b6c66c4324 rxrpc: Use the core ICMP/ICMP6 parsers
Make rxrpc_encap_rcv_err() pass the ICMP/ICMP6 skbuff to ip_icmp_error() or
ipv6_icmp_error() as appropriate to do the parsing rather than trying to do
it in rxrpc.

This pushes an error report onto the UDP socket's error queue and calls
->sk_error_report() from which point rxrpc can pick it up.

It would be preferable to steal the packet directly from ip*_icmp_error()
rather than letting it get queued, but this is probably good enough.

Also note that __udp4_lib_err() calls sk_error_report() twice in some
cases.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:28 +00:00
David Howells 42fb06b391 net: Change the udp encap_err_rcv to allow use of {ip,ipv6}_icmp_error()
Change the udp encap_err_rcv signature to match ip_icmp_error() and
ipv6_icmp_error() so that those can be used from the called function and
export them.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
2022-11-08 16:42:28 +00:00
David Howells f7fa52421f rxrpc: Record stats for why the REQUEST-ACK flag is being set
Record stats for why the REQUEST-ACK flag is being set.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:15 +00:00
David Howells f2a676d100 rxrpc: Record statistics about ACK types
Record statistics about the different types of ACKs that have been
transmitted and received and the number of ACKs that have been filled out
and transmitted or that have been skipped.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:15 +00:00
David Howells b015424695 rxrpc: Add stats procfile and DATA packet stats
Add a procfile, /proc/net/rxrpc/stats, to display some statistics about
what rxrpc has been doing.  Writing a blank line to the stats file will
clear the increment-only counters.  Allocated resource counters don't get
cleared.

Add some counters to count various things about DATA packets, including the
number created, transmitted and retransmitted and the number received, the
number of ACK-requests markings and the number of jumbo packets received.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:15 +00:00
David Howells 589a0c1e0a rxrpc: Track highest acked serial
Keep track of the highest DATA serial number that has been acked by the
peer for future purposes.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2022-11-08 16:42:15 +00:00
Gaosheng Cui 9621e74f39 rxrpc: remove rxrpc_max_call_lifetime declaration
rxrpc_max_call_lifetime has been removed since
commit a158bdd324 ("rxrpc: Fix call timeouts"),
so remove it.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Link: https://lore.kernel.org/r/20220909064042.1149404-1-cuigaosheng1@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-19 17:58:47 -07:00
David Howells ac56a0b48d rxrpc: Fix ICMP/ICMP6 error handling
Because rxrpc pretends to be a tunnel on top of a UDP/UDP6 socket, allowing
it to siphon off UDP packets early in the handling of received UDP packets
thereby avoiding the packet going through the UDP receive queue, it doesn't
get ICMP packets through the UDP ->sk_error_report() callback.  In fact, it
doesn't appear that there's any usable option for getting hold of ICMP
packets.

Fix this by adding a new UDP encap hook to distribute error messages for
UDP tunnels.  If the hook is set, then the tunnel driver will be able to
see ICMP packets.  The hook provides the offset into the packet of the UDP
header of the original packet that caused the notification.

An alternative would be to call the ->error_handler() hook - but that
requires that the skbuff be cloned (as ip_icmp_error() or ipv6_cmp_error()
do, though isn't really necessary or desirable in rxrpc's case is we want
to parse them there and then, not queue them).

Changes
=======
ver #3)
 - Fixed an uninitialised variable.

ver #2)
 - Fixed some missing CONFIG_AF_RXRPC_IPV6 conditionals.

Fixes: 5271953cad ("rxrpc: Use the UDP encap_rcv hook")
Signed-off-by: David Howells <dhowells@redhat.com>
2022-09-01 11:42:12 +01:00
Jakub Kicinski 677fb75253 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/cadence/macb_main.c
  5cebb40bc9 ("net: macb: Fix PTP one step sync support")
  138badbc21 ("net: macb: use NAPI for TX completion path")
https://lore.kernel.org/all/20220523111021.31489367@canb.auug.org.au/

net/smc/af_smc.c
  75c1edf23b ("net/smc: postpone sk_refcnt increment in connect()")
  3aba103006 ("net/smc: align the connect behaviour with TCP")
https://lore.kernel.org/all/20220524114408.4bf1af38@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-23 21:19:17 -07:00
David Howells 9a3dedcf18 rxrpc: Fix decision on when to generate an IDLE ACK
Fix the decision on when to generate an IDLE ACK by keeping a count of the
number of packets we've received, but not yet soft-ACK'd, and the number of
packets we've processed, but not yet hard-ACK'd, rather than trying to keep
track of which DATA sequence numbers correspond to those points.

We then generate an ACK when either counter exceeds 2.  The counters are
both cleared when we transcribe the information into any sort of ACK packet
for transmission.  IDLE and DELAY ACKs are skipped if both counters are 0
(ie. no change).

Fixes: 805b21b929 ("rxrpc: Send an ACK after every few DATA packets we receive")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 21:30:53 +01:00
David Howells 81524b6312 rxrpc: Don't let ack.previousPacket regress
The previousPacket field in the rx ACK packet should never go backwards -
it's now the highest DATA sequence number received, not the last on
received (it used to be used for out of sequence detection).

Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 21:30:53 +01:00
David Howells 8940ba3cfe rxrpc: Fix overlapping ACK accounting
Fix accidental overlapping of Rx-phase ACK accounting with Tx-phase ACK
accounting through variables shared between the two.  call->acks_* members
refer to ACKs received in the Tx phase and call->ackr_* members to ACKs
sent/to be sent during the Rx phase.

Fixes: 1a2391c30c ("rxrpc: Fix detection of out of order acks")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 21:30:53 +01:00
David Howells ad25f5cb39 rxrpc: Fix locking issue
There's a locking issue with the per-netns list of calls in rxrpc.  The
pieces of code that add and remove a call from the list use write_lock()
and the calls procfile uses read_lock() to access it.  However, the timer
callback function may trigger a removal by trying to queue a call for
processing and finding that it's already queued - at which point it has a
spare refcount that it has to do something with.  Unfortunately, if it puts
the call and this reduces the refcount to 0, the call will be removed from
the list.  Unfortunately, since the _bh variants of the locking functions
aren't used, this can deadlock.

================================
WARNING: inconsistent lock state
5.18.0-rc3-build4+ #10 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes:
ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b
{SOFTIRQ-ON-W} state was registered at:
...
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&rxnet->call_lock);
  <Interrupt>
    lock(&rxnet->call_lock);

 *** DEADLOCK ***

1 lock held by ksoftirqd/2/25:
 #0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d

Changes
=======
ver #2)
 - Changed to using list_next_rcu() rather than rcu_dereference() directly.

Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 21:03:01 +01:00
David Howells a05754295e rxrpc: Use refcount_t rather than atomic_t
Move to using refcount_t rather than atomic_t for refcounts in rxrpc.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 21:03:01 +01:00
David Howells 33912c2639 rxrpc: Allow list of in-use local UDP endpoints to be viewed in /proc
Allow the list of in-use local UDP endpoints in the current network
namespace to be viewed in /proc.

To aid with this, the endpoint list is converted to an hlist and RCU-safe
manipulation is used so that the list can be read with only the RCU
read lock held.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 21:03:01 +01:00
David Howells 4a7f62f919 rxrpc: Fix call timer start racing with call destruction
The rxrpc_call struct has a timer used to handle various timed events
relating to a call.  This timer can get started from the packet input
routines that are run in softirq mode with just the RCU read lock held.
Unfortunately, because only the RCU read lock is held - and neither ref or
other lock is taken - the call can start getting destroyed at the same time
a packet comes in addressed to that call.  This causes the timer - which
was already stopped - to get restarted.  Later, the timer dispatch code may
then oops if the timer got deallocated first.

Fix this by trying to take a ref on the rxrpc_call struct and, if
successful, passing that ref along to the timer.  If the timer was already
running, the ref is discarded.

The timer completion routine can then pass the ref along to the call's work
item when it queues it.  If the timer or work item where already
queued/running, the extra ref is discarded.

Fixes: a158bdd324 ("rxrpc: Fix call timeouts")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005073.html
Link: https://lore.kernel.org/r/164865115696.2943015.11097991776647323586.stgit@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-31 12:25:25 +02:00
David Howells d7d775b1ff rxrpc: Ask the security class how much space to allow in a packet
Ask the security class how much header and trailer space to allow for when
allocating a packet, given how much data is remaining.

This will allow the rxgk security class to stick both a trailer in as well
as a header as appropriate in the future.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-11-23 19:53:11 +00:00
David Howells 521bb3049c rxrpc: Organise connection security to use a union
Organise the security information in the rxrpc_connection struct to use a
union to allow for different data for different security classes.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-11-23 18:09:30 +00:00
David Howells f4bdf3d683 rxrpc: Don't reserve security header in Tx DATA skbuff
Insert the security header into the skbuff representing a DATA packet to be
transmitted rather than using skb_reserve() when the packet is allocated.
This makes it easier to apply crypto that spans the security header and the
data, particularly in the upcoming RxGK class where we have a common
encrypt-and-checksum function that is used in a number of circumstances.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-11-23 18:09:30 +00:00
David Howells 8d47a43c48 rxrpc: Merge prime_packet_security into init_connection_security
Merge the ->prime_packet_security() into the ->init_connection_security()
hook as they're always called together.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-11-23 18:09:30 +00:00
David Howells d5953f6543 rxrpc: Allow security classes to give more info on server keys
Allow a security class to give more information on an rxrpc_s-type key when
it is viewed in /proc/keys.  This will allow the upcoming RxGK security
class to show the enctype name here.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-11-23 18:09:29 +00:00
David Howells 12da59fcab rxrpc: Hand server key parsing off to the security class
Hand responsibility for parsing a server key off to the security class.  We
can determine which class from the description.  This is necessary as rxgk
server keys have different lookup requirements and different content
requirements (dependent on crypto type) to those of rxkad server keys.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-11-23 18:09:29 +00:00
David Howells ca7fb10059 rxrpc: Split the server key type (rxrpc_s) into its own file
Split the server private key type (rxrpc_s) out into its own file rather
than mingling it with the authentication/client key type (rxrpc) since they
don't really bear any relation.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-11-23 18:09:29 +00:00
David Howells ec832bd06d rxrpc: Don't retain the server key in the connection
Don't retain a pointer to the server key in the connection, but rather get
it on demand when the server has to deal with a response packet.

This is necessary to implement RxGK (GSSAPI-mediated transport class),
where we can't know which key we'll need until we've challenged the client
and got back the response.

This also means that we don't need to do a key search in the accept path in
softirq mode.

Also, whilst we're at it, allow the security class to ask for a kvno and
encoding-type variant of a server key as RxGK needs different keys for
different encoding types.  Keys of this type have an extra bit in the
description:

	"<service-id>:<security-index>:<kvno>:<enctype>"

Signed-off-by: David Howells <dhowells@redhat.com>
2020-11-23 18:09:29 +00:00
David Howells 41057ebde0 rxrpc: Support keys with multiple authentication tokens
rxrpc-type keys can have multiple tokens attached for different security
classes.  Currently, rxrpc always picks the first one, whether or not the
security class it indicates is supported.

Add preliminary support for choosing which security class will be used
(this will need to be directed from a higher layer) and go through the
tokens to find one that's supported.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-11-23 18:09:29 +00:00
David Howells ddc7834af8 rxrpc: Fix loss of final ack on shutdown
Fix the loss of transmission of a call's final ack when a socket gets shut
down.  This means that the server will retransmit the last data packet or
send a ping ack and then get an ICMP indicating the port got closed.  The
server will then view this as a failure.

Fixes: 3136ef49a1 ("rxrpc: Delay terminal ACK transmission on a client call")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-15 13:28:00 +01:00
Jakub Kicinski 9d49aea13f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Small conflict around locking in rxrpc_process_event() -
channel_lock moved to bundle in next, while state lock
needs _bh() from net.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-08 15:44:50 -07:00
David Howells 2d914c1bf0 rxrpc: Fix accept on a connection that need securing
When a new incoming call arrives at an userspace rxrpc socket on a new
connection that has a security class set, the code currently pushes it onto
the accept queue to hold a ref on it for the socket.  This doesn't work,
however, as recvmsg() pops it off, notices that it's in the SERVER_SECURING
state and discards the ref.  This means that the call runs out of refs too
early and the kernel oopses.

By contrast, a kernel rxrpc socket manually pre-charges the incoming call
pool with calls that already have user call IDs assigned, so they are ref'd
by the call tree on the socket.

Change the mode of operation for userspace rxrpc server sockets to work
like this too.  Although this is a UAPI change, server sockets aren't
currently functional.

Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-05 16:35:57 +01:00
David Howells 8806245a3e rxrpc: Fix rxrpc_bundle::alloc_error to be signed
The alloc_error field in the rxrpc_bundle struct should be signed as it has
negative error codes assigned to it.  Checks directly on it may then fail,
and may produce a warning like this:

	net/rxrpc/conn_client.c:662 rxrpc_wait_for_channel()
	warn: 'bundle->alloc_error' is unsigned

Fixes: 245500d853 ("rxrpc: Rewrite the client connection manager")
Reported-by Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2020-09-14 16:18:59 +01:00
David Howells 245500d853 rxrpc: Rewrite the client connection manager
Rewrite the rxrpc client connection manager so that it can support multiple
connections for a given security key to a peer.  The following changes are
made:

 (1) For each open socket, the code currently maintains an rbtree with the
     connections placed into it, keyed by communications parameters.  This
     is tricky to maintain as connections can be culled from the tree or
     replaced within it.  Connections can require replacement for a number
     of reasons, e.g. their IDs span too great a range for the IDR data
     type to represent efficiently, the call ID numbers on that conn would
     overflow or the conn got aborted.

     This is changed so that there's now a connection bundle object placed
     in the tree, keyed on the same parameters.  The bundle, however, does
     not need to be replaced.

 (2) An rxrpc_bundle object can now manage the available channels for a set
     of parallel connections.  The lock that manages this is moved there
     from the rxrpc_connection struct (channel_lock).

 (3) There'a a dummy bundle for all incoming connections to share so that
     they have a channel_lock too.  It might be better to give each
     incoming connection its own bundle.  This bundle is not needed to
     manage which channels incoming calls are made on because that's the
     solely at whim of the client.

 (4) The restrictions on how many client connections are around are
     removed.  Instead, a previous patch limits the number of client calls
     that can be allocated.  Ordinarily, client connections are reaped
     after 2 minutes on the idle queue, but when more than a certain number
     of connections are in existence, the reaper starts reaping them after
     2s of idleness instead to get the numbers back down.

     It could also be made such that new call allocations are forced to
     wait until the number of outstanding connections subsides.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-09-08 21:11:43 +01:00
David Howells b7a7d67408 rxrpc: Impose a maximum number of client calls
Impose a maximum on the number of client rxrpc calls that are allowed
simultaneously.  This will be in lieu of a maximum number of client
connections as this is easier to administed as, unlike connections, calls
aren't reusable (to be changed in a subsequent patch)..

This doesn't affect the limits on service calls and connections.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-09-08 21:10:45 +01:00
David Howells 4700c4d80b rxrpc: Fix loss of RTT samples due to interposed ACK
The Rx protocol has a mechanism to help generate RTT samples that works by
a client transmitting a REQUESTED-type ACK when it receives a DATA packet
that has the REQUEST_ACK flag set.

The peer, however, may interpose other ACKs before transmitting the
REQUESTED-ACK, as can be seen in the following trace excerpt:

 rxrpc_tx_data: c=00000044 DATA d0b5ece8:00000001 00000001 q=00000001 fl=07
 rxrpc_rx_ack: c=00000044 00000001 PNG r=00000000 f=00000002 p=00000000 n=0
 rxrpc_rx_ack: c=00000044 00000002 REQ r=00000001 f=00000002 p=00000001 n=0
 ...

DATA packet 1 (q=xx) has REQUEST_ACK set (bit 1 of fl=xx).  The incoming
ping (labelled PNG) hard-acks the request DATA packet (f=xx exceeds the
sequence number of the DATA packet), causing it to be discarded from the Tx
ring.  The ACK that was requested (labelled REQ, r=xx references the serial
of the DATA packet) comes after the ping, but the sk_buff holding the
timestamp has gone and the RTT sample is lost.

This is particularly noticeable on RPC calls used to probe the service
offered by the peer.  A lot of peers end up with an unknown RTT because we
only ever sent a single RPC.  This confuses the server rotation algorithm.

Fix this by caching the information about the outgoing packet in RTT
calculations in the rxrpc_call struct rather than looking in the Tx ring.

A four-deep buffer is maintained and both REQUEST_ACK-flagged DATA and
PING-ACK transmissions are recorded in there.  When the appropriate
response ACK is received, the buffer is checked for a match and, if found,
an RTT sample is recorded.

If a received ACK refers to a packet with a later serial number than an
entry in the cache, that entry is presumed lost and the entry is made
available to record a new transmission.

ACKs types other than REQUESTED-type and PING-type cause any matching
sample to be cancelled as they don't necessarily represent a useful
measurement.

If there's no space in the buffer on ping/data transmission, the sample
base is discarded.

Fixes: 50235c4b5a ("rxrpc: Obtain RTT data by requesting ACKs on DATA packets")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-08-20 17:59:27 +01:00
Christoph Hellwig a7b75c5a8c net: pass a sockptr_t into ->setsockopt
Rework the remaining setsockopt code to pass a sockptr_t instead of a
plain user pointer.  This removes the last remaining set_fs(KERNEL_DS)
outside of architecture specific code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154]
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24 15:41:54 -07:00
David Howells 3067bf8c59 rxrpc: Move the call completion handling out of line
Move the handling of call completion out of line so that the next patch can
add more code in that area.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
2020-06-05 13:36:35 +01:00
David Howells c410bf0193 rxrpc: Fix the excessive initial retransmission timeout
rxrpc currently uses a fixed 4s retransmission timeout until the RTT is
sufficiently sampled.  This can cause problems with some fileservers with
calls to the cache manager in the afs filesystem being dropped from the
fileserver because a packet goes missing and the retransmission timeout is
greater than the call expiry timeout.

Fix this by:

 (1) Copying the RTT/RTO calculation code from Linux's TCP implementation
     and altering it to fit rxrpc.

 (2) Altering the various users of the RTT to make use of the new SRTT
     value.

 (3) Replacing the use of rxrpc_resend_timeout to use the calculated RTO
     value instead (which is needed in jiffies), along with a backoff.

Notes:

 (1) rxrpc provides RTT samples by matching the serial numbers on outgoing
     DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets
     against the reference serial number in incoming REQUESTED ACK and
     PING-RESPONSE ACK packets.

 (2) Each packet that is transmitted on an rxrpc connection gets a new
     per-connection serial number, even for retransmissions, so an ACK can
     be cross-referenced to a specific trigger packet.  This allows RTT
     information to be drawn from retransmitted DATA packets also.

 (3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than
     on an rxrpc_call because many RPC calls won't live long enough to
     generate more than one sample.

 (4) The calculated SRTT value is in units of 8ths of a microsecond rather
     than nanoseconds.

The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers.

Fixes: 17926a7932 ([AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both"")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-11 16:42:28 +01:00