mirror of https://github.com/torvalds/linux.git
435 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
58adf9dcb1 |
net: dsa: let drivers state that they need VLAN filtering while standalone
As explained in commit
|
|
|
|
06cfb2df7e |
net: dsa: don't advertise 'rx-vlan-filter' when not needed
There have been multiple independent reports about dsa_slave_vlan_rx_add_vid being called (and consequently calling the drivers' .port_vlan_add) when it isn't needed, and sometimes (not always) causing problems in the process. Case 1: mv88e6xxx_port_vlan_prepare is stubborn and only accepts VLANs on bridged ports. That is understandably so, because standalone mv88e6xxx ports are VLAN-unaware, and VTU entries are said to be a scarce resource. Otherwise said, the following fails lamentably on mv88e6xxx: ip link add br0 type bridge vlan_filtering 1 ip link set lan3 master br0 ip link add link lan10 name lan10.1 type vlan id 1 [485256.724147] mv88e6085 d0032004.mdio-mii:12: p10: hw VLAN 1 already used by port 3 in br0 RTNETLINK answers: Operation not supported This has become a worse issue since commit |
|
|
|
67b5fb5db7 |
net: dsa: properly fall back to software bridging
If the driver does not implement .port_bridge_{join,leave}, then we must
fall back to standalone operation on that port, and trigger the error
path of dsa_port_bridge_join. This sets dp->bridge_dev = NULL.
In turn, having a non-NULL dp->bridge_dev when there is no offloading
support makes the following things go wrong:
- dsa_default_offload_fwd_mark make the wrong decision in setting
skb->offload_fwd_mark. It should set skb->offload_fwd_mark = 0 for
ports that don't offload the bridge, which should instruct the bridge
to forward in software. But this does not happen, dp->bridge_dev is
incorrectly set to point to the bridge, so the bridge is told that
packets have been forwarded in hardware, which they haven't.
- switchdev objects (MDBs, VLANs) should not be offloaded by ports that
don't offload the bridge. Standalone ports should behave as packet-in,
packet-out and the bridge should not be able to manipulate the pvid of
the port, or tag stripping on egress, or ingress filtering. This
should already work fine because dsa_slave_port_obj_add has:
case SWITCHDEV_OBJ_ID_PORT_VLAN:
if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev))
return -EOPNOTSUPP;
err = dsa_slave_vlan_add(dev, obj, extack);
but since dsa_port_offloads_bridge_port works based on dp->bridge_dev,
this is again sabotaging us.
All the above work in case the port has an unoffloaded LAG interface, so
this is well exercised code, we should apply it for plain unoffloaded
bridge ports too.
Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
f4083a752a |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h |
|
|
|
c35b57ceff |
net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge
The blamed commit added a new field to struct switchdev_notifier_fdb_info,
but did not make sure that all call paths set it to something valid.
For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE
notifier, and since the 'is_local' flag is not set, it contains junk
from the stack, so the bridge might interpret those notifications as
being for local FDB entries when that was not intended.
To avoid that now and in the future, zero-initialize all
switchdev_notifier_fdb_info structures created by drivers such that all
newly added fields to not need to touch drivers again.
Fixes:
|
|
|
|
39f3210154 |
net: dsa: don't fast age standalone ports
DSA drives the procedure to flush dynamic FDB entries from a port based
on the change of STP state: whenever we go from a state where address
learning is enabled (LEARNING, FORWARDING) to a state where it isn't
(LISTENING, BLOCKING, DISABLED), we need to flush the existing dynamic
entries.
However, there are cases when this is not needed. Internally, when a
DSA switch interface is not under a bridge, DSA still keeps it in the
"FORWARDING" STP state. And when that interface joins a bridge, the
bridge will meticulously iterate that port through all STP states,
starting with BLOCKING and ending with FORWARDING. Because there is a
state transition from the standalone version of FORWARDING into the
temporary BLOCKING bridge port state, DSA calls the fast age procedure.
Since commit
|
|
|
|
c73c57081b |
net: dsa: don't disable multicast flooding to the CPU even without an IGMP querier
Commit |
|
|
|
a76053707d |
dev_ioctl: split out ndo_eth_ioctl
Most users of ndo_do_ioctl are ethernet drivers that implement the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP. Separate these from the few drivers that use ndo_do_ioctl to implement SIOCBOND, SIOCBR and SIOCWANDEV commands. This is a purely cosmetic change intended to help readers find their way through the implementation. Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
5af84df962 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts are simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
4e51bf44a0 |
net: bridge: move the switchdev object replay helpers to "push" mode
Starting with commit |
|
|
|
21cf377a9c |
net: dsa: ensure linearized SKBs in case of tail taggers
The function skb_put() that is used by tail taggers to make room for the DSA tag must only be called for linearized SKBS. However in case that the slave device inherited features like NETIF_F_HW_SG or NETIF_F_FRAGLIST the SKB passed to the slaves transmit function may not be linearized. Avoid those SKBs by clearing the NETIF_F_HW_SG and NETIF_F_FRAGLIST flags for tail taggers. Furthermore since the tagging protocol can be changed at runtime move the code for setting up the slaves features into dsa_slave_setup_tagger(). Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
b94dc99c0d |
net: dsa: use switchdev_handle_fdb_{add,del}_to_device
Using the new fan-out helper for FDB entries installed on the software bridge, we can install host addresses with the proper refcount on the CPU port, such that this case: ip link set swp0 master br0 ip link set swp1 master br0 ip link set swp2 master br0 ip link set swp3 master br0 ip link set br0 address 00:01:02:03:04:05 ip link set swp3 nomaster works properly and the br0 address remains installed as a host entry with refcount 3 instead of getting deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
c6451cda10 |
net: switchdev: introduce helper for checking dynamically learned FDB entries
It is a bit difficult to understand what DSA checks when it tries to avoid installing dynamically learned addresses on foreign interfaces as local host addresses, so create a generic switchdev helper that can be reused and is generally more readable. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
4bed397c3e |
net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
When
(a) "dev" is a bridge port which the DSA switch tree offloads, but is
otherwise not a dsa slave (such as a LAG netdev), or
(b) "dev" is the bridge net device itself
then strange things happen to the dev_hold/dev_put pair:
dsa_schedule_work() will still be called with a DSA port that offloads
that netdev, but dev_hold() will be called on the non-DSA netdev.
Then the "if" condition in dsa_slave_switchdev_event_work() does not
pass, because "dev" is not a DSA netdev, so dev_put() is not called.
This results in the simple fact that we have a reference counting
mismatch on the "dev" net device.
This can be seen when we add support for host addresses installed on the
bridge net device.
ip link add br1 type bridge
ip link set br1 address 00:01:02:03:04:05
ip link set swp0 master br1
ip link del br1
[ 968.512278] unregister_netdevice: waiting for br1 to become free. Usage count = 5
It seems foolish to do penny pinching and not add the net_device pointer
in the dsa_switchdev_event_work structure, so let's finally do that.
As an added bonus, when we start offloading local entries pointing
towards the bridge, these will now properly appear as 'offloaded' in
'bridge fdb' (this was not possible before, because 'dev' was assumed to
only be a DSA net device):
00:01:02:03:04:05 dev br0 vlan 1 offload master br0 permanent
00:01:02:03:04:05 dev br0 offload master br0 permanent
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
81a619f787 |
net: dsa: include fdb entries pointing to bridge in the host fdb list
The bridge supports a legacy way of adding local (non-forwarded) FDB
entries, which works on an individual port basis:
bridge fdb add dev swp0 00:01:02:03:04:05 master local
As well as a new way, added by Roopa Prabhu in commit
|
|
|
|
10fae4ac89 |
net: dsa: include bridge addresses which are local in the host fdb list
The bridge automatically creates local (not forwarded) fdb entries pointing towards physical ports with their interface MAC addresses. For switchdev, the significance of these fdb entries is the exact opposite of that of non-local entries: instead of sending these frame outwards, we must send them inwards (towards the host). NOTE: The bridge's own MAC address is also "local". If that address is not shared with any port, the bridge's MAC is not be added by this functionality - but the following commit takes care of that case. NOTE 2: We mark these addresses as host-filtered regardless of the value of ds->assisted_learning_on_cpu_port. This is because, as opposed to the speculative logic done for dynamic address learning on foreign interfaces, the local FDB entries are rather fixed, so there isn't any risk of them migrating from one bridge port to another. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
3068d466a6 |
net: dsa: sync static FDB entries on foreign interfaces to hardware
DSA is able to install FDB entries towards the CPU port for addresses which were dynamically learnt by the software bridge on foreign interfaces that are in the same bridge with a DSA switch interface. Since this behavior is opportunistic, it is guarded by the "assisted_learning_on_cpu_port" property which can be enabled by drivers and is not done automatically (since certain switches may support address learning of packets coming from the CPU port). But if those FDB entries added on the foreign interfaces are static (added by the user) instead of dynamically learnt, currently DSA does not do anything (and arguably it should). Because static FDB entries are not supposed to move on their own, there is no downside in reusing the "assisted_learning_on_cpu_port" logic to sync static FDB entries to the DSA CPU port unconditionally, even if assisted_learning_on_cpu_port is not requested by the driver. For example, this situation: br0 / \ swp0 dummy0 $ bridge fdb add 02:00:de:ad:00:01 dev dummy0 vlan 1 master static Results in DSA adding an entry in the hardware FDB, pointing this address towards the CPU port. The same is true for entries added to the bridge itself, e.g: $ bridge fdb add 02:00:de:ad:00:01 dev br0 vlan 1 self local (except that right now, DSA still ignores 'local' FDB entries, this will be changed in a later patch) Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
3dc80afc50 |
net: dsa: introduce a separate cross-chip notifier type for host FDBs
DSA treats some bridge FDB entries by trapping them to the CPU port. Currently, the only class of such entries are FDB addresses learnt by the software bridge on a foreign interface. However there are many more to be added: - FDB entries with the is_local flag (for termination) added by the bridge on the user ports (typically containing the MAC address of the bridge port) - FDB entries pointing towards the bridge net device (for termination). Typically these contain the MAC address of the bridge net device. - Static FDB entries installed on a foreign interface that is in the same bridge with a DSA user port. The reason why a separate cross-chip notifier for host FDBs is justified compared to normal FDBs is the same as in the case of host MDBs: the cross-chip notifier matching function in switch.c should avoid installing these entries on routing ports that route towards the targeted switch, but not towards the CPU. This is required in order to have proper support for H-like multi-chip topologies. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
b8e997c490 |
net: dsa: introduce a separate cross-chip notifier type for host MDBs
Commit
|
|
|
|
b117e1e8a8 |
net: dsa: delete dsa_legacy_fdb_add and dsa_legacy_fdb_del
We want to add reference counting for FDB entries in cross-chip topologies, and in order for that to have any chance of working and not be unbalanced (leading to entries which are never deleted), we need to ensure that higher layers are sane, because if they aren't, it's garbage in, garbage out. For example, if we add a bridge FDB entry twice, the bridge properly errors out: $ bridge fdb add dev swp0 00:01:02:03:04:07 master static $ bridge fdb add dev swp0 00:01:02:03:04:07 master static RTNETLINK answers: File exists However, the same thing cannot be said about the bridge bypass operations: $ bridge fdb add dev swp0 00:01:02:03:04:07 $ bridge fdb add dev swp0 00:01:02:03:04:07 $ bridge fdb add dev swp0 00:01:02:03:04:07 $ bridge fdb add dev swp0 00:01:02:03:04:07 $ echo $? 0 But one 'bridge fdb del' is enough to remove the entry, no matter how many times it was added. The bridge bypass operations are impossible to maintain in these circumstances and lack of support for reference counting the cross-chip notifiers is holding us back from making further progress, so just drop support for them. The only way left for users to install static bridge FDB entries is the proper one, using the "master static" flags. With this change, rtnl_fdb_add() falls back to calling ndo_dflt_fdb_add() which uses the duplicate-exclusive variant of dev_uc_add(): dev_uc_add_excl(). Because DSA does not (yet) declare IFF_UNICAST_FLT, this results in us going to promiscuous mode: $ bridge fdb add dev swp0 00:01:02:03:04:05 [ 28.206743] device swp0 entered promiscuous mode $ bridge fdb add dev swp0 00:01:02:03:04:05 RTNETLINK answers: File exists So even if it does not completely fail, there is at least some indication that it is behaving differently from before, and closer to user space expectations, I would argue (the lack of a "local|static" specifier defaults to "local", or "host-only", so dev_uc_add() is a reasonable default implementation). If the generic implementation of .ndo_fdb_add provided by Vlad Yasevich is a proof of anything, it only proves that the implementation provided by DSA was always wrong, by not looking at "ndm->ndm_state & NUD_NOARP" (the "static" flag which means that the FDB entry points outwards) and "ndm->ndm_state & NUD_PERMANENT" (the "local" flag which means that the FDB entry points towards the host). It all used to mean the same thing to DSA. Update the documentation so that the users are not confused about what's going on. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
7491894532 |
net: dsa: replay a deletion of switchdev objects for ports leaving a bridged LAG
When a DSA switch port leaves a bonding interface that is under a bridge, there might be dangling switchdev objects on that port left behind, because the bridge is not aware that its lower interface (the bond) changed state in any way. Call the bridge replay helpers with adding=false before changing dp->bridge_dev to NULL, because we need to simulate to dsa_slave_port_obj_del() that these notifications were emitted by the bridge. We add this hook to the NETDEV_PRECHANGEUPPER event handler, because we are calling into switchdev (and the __switchdev_handle_port_obj_del fanout helpers expect the upper/lower adjacency lists to still be valid) and PRECHANGEUPPER is the last moment in time when they still are. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
4ede74e73b |
net: dsa: refactor the prechangeupper sanity checks into a dedicated function
We need to add more logic to the DSA NETDEV_PRECHANGEUPPER event handler, more exactly we need to request an unsync of switchdev objects. In order to fit more code, refactor the existing logic into a helper. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
0d2cfbd41c |
net: bridge: ignore switchdev events for LAG ports which didn't request replay
There is a slight inconvenience in the switchdev replay helpers added recently, and this is when: ip link add br0 type bridge ip link add bond0 type bond ip link set bond0 master br0 bridge vlan add dev bond0 vid 100 ip link set swp0 master bond0 ip link set swp1 master bond0 Since the underlying driver (currently only DSA) asks for a replay of VLANs when swp0 and swp1 join the LAG because it is bridged, what will happen is that DSA will try to react twice on the VLAN event for swp0. This is not really a huge problem right now, because most drivers accept duplicates since the bridge itself does, but it will become a problem when we add support for replaying switchdev object deletions. Let's fix this by adding a blank void *ctx in the replay helpers, which will be passed on by the bridge in the switchdev notifications. If the context is NULL, everything is the same as before. But if the context is populated with a valid pointer, the underlying switchdev driver (currently DSA) can use the pointer to 'see through' the bridge port (which in the example above is bond0) and 'know' that the event is only for a particular physical port offloading that bridge port, and not for all of them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
69bfac968a |
net: switchdev: add a context void pointer to struct switchdev_notifier_info
In the case where the driver asks for a replay of a certain type of event (port object or attribute) for a bridge port that is a LAG, it may do so because this port has just joined the LAG. But there might already be other switchdev ports in that LAG, and it is preferable that those preexisting switchdev ports do not act upon the replayed event. The solution is to add a context to switchdev events, which is NULL most of the time (when the bridge layer initiates the call) but which can be set to a value controlled by the switchdev driver when a replay is requested. The driver can then check the context to figure out if all ports within the LAG should act upon the switchdev event, or just the ones that match the context. We have to modify all switchdev_handle_* helper functions as well as the prototypes in the drivers that use these helpers too, because these helpers hide the underlying struct switchdev_notifier_info from us and there is no way to retrieve the context otherwise. The context structure will be populated and used in later patches. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
88faba20e2 |
net: dsa: targeted MTU notifiers should only match on one port
dsa_slave_change_mtu() calls dsa_port_mtu_change() twice:
- it sends a cross-chip notifier with the MTU of the CPU port which is
used to update the DSA links.
- it sends one targeted MTU notifier which is supposed to only match the
user port on which we are changing the MTU. The "propagate_upstream"
variable is used here to bypass the cross-chip notifier system from
switch.c
But due to a mistake, the second, targeted notifier matches not only on
the user port, but also on the DSA link which is a member of the same
switch, if that exists.
And because the DSA links of the entire dst were programmed in a
previous round to the largest_mtu via a "propagate_upstream == true"
notification, then the dsa_port_mtu_change(propagate_upstream == false)
call that is immediately upcoming will break the MTU on the one DSA link
which is chip-wise local to the dp whose MTU is changing right now.
Example given this daisy chain topology:
sw0p0 sw0p1 sw0p2 sw0p3 sw0p4
[ cpu ] [ user ] [ user ] [ dsa ] [ user ]
[ x ] [ ] [ ] [ x ] [ ]
|
+---------+
|
sw1p0 sw1p1 sw1p2 sw1p3 sw1p4
[ user ] [ user ] [ user ] [ dsa ] [ dsa ]
[ ] [ ] [ ] [ ] [ x ]
ip link set sw0p1 mtu 9000
ip link set sw1p1 mtu 9000 # at this stage, sw0p1 and sw1p1 can talk
# to one another using jumbo frames
ip link set sw0p2 mtu 1500 # this programs the sw0p3 DSA link first to
# the largest_mtu of 9000, then reprograms it to
# 1500 with the "propagate_upstream == false"
# notifier, breaking communication between
# sw0p1 and sw1p1
To escape from this situation, make the targeted match really match on a
single port - the user port, and rename the "propagate_upstream"
variable to "targeted_match" to clarify the intention and avoid future
issues.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
4e4ab79500 |
net: dsa: calculate the largest_mtu across all ports in the tree
If we have a cross-chip topology like this:
sw0p0 sw0p1 sw0p2 sw0p3 sw0p4
[ cpu ] [ user ] [ user ] [ dsa ] [ user ]
|
+---------+
|
sw1p0 sw1p1 sw1p2 sw1p3 sw1p4
[ user ] [ user ] [ user ] [ dsa ] [ dsa ]
and we issue the following commands:
1. ip link set sw0p1 mtu 1700
2. ip link set sw1p1 mtu 1600
we notice the following happening:
Command 1. emits a non-targeted MTU notifier for the CPU port (sw0p0)
with the largest_mtu calculated across switch 0, of 1700. This matches
sw0p0, sw0p3 and sw1p4 (all CPU ports and DSA links).
Then, it emits a targeted MTU notifier for the user port (sw0p1), again
with MTU 1700 (this doesn't matter).
Command 2. emits a non-targeted MTU notifier for the CPU port (sw0p0)
with the largest_mtu calculated across switch 1, of 1600. This matches
the same group of ports as above, and decreases the MTU for the CPU port
and the DSA links from 1700 to 1600.
As a result, the sw0p1 user port can no longer communicate with its CPU
port at MTU 1700.
To address this, we should calculate the largest_mtu across all switches
that may share a CPU port, and only emit MTU notifiers with that value.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
c916e8e1ea |
net: dsa: dsa_slave_phy_connect(): extend phy's flags with port specific phy flags
The current get_phy_flags() is only processed when we connect to a PHY via a designed phy-handle property via phylink_of_phy_connect(), but if we fallback on the internal MDIO bus created by a switch and take the dsa_slave_phy_connect() path then we would not be processing that flag and using it at PHY connection time. Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
4e50025129 |
net: dsa: generalize overhead for taggers that use both headers and trailers
Some really really weird switches just couldn't decide whether to use a normal or a tail tagger, so they just did both. This creates problems for DSA, because we only have the concept of an 'overhead' which can be applied to the headroom or to the tailroom of the skb (like for example during the central TX reallocation procedure), depending on the value of bool tail_tag, but not to both. We need to generalize DSA to cater for these odd switches by transforming the 'overhead / tail_tag' pair into 'needed_headroom / needed_tailroom'. The DSA master's MTU is increased to account for both. The flow dissector code is modified such that it only calls the DSA adjustment callback if the tagger has a non-zero header length. Taggers are trivially modified to declare either needed_headroom or needed_tailroom, based on the tail_tag value that they currently declare. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
b94cbc909f |
net: dsa: fix error code getting shifted with 4 in dsa_slave_get_sset_count
DSA implements a bunch of 'standardized' ethtool statistics counters,
namely tx_packets, tx_bytes, rx_packets, rx_bytes. So whatever the
hardware driver returns in .get_sset_count(), we need to add 4 to that.
That is ok, except that .get_sset_count() can return a negative error
code, for example:
b53_get_sset_count
-> phy_ethtool_get_sset_count
-> return -EIO
-EIO is -5, and with 4 added to it, it becomes -1, aka -EPERM. One can
imagine that certain error codes may even become positive, although
based on code inspection I did not see instances of that.
Check the error code first, if it is negative return it as-is.
Based on a similar patch for dsa_master_get_strings from Dan Carpenter:
https://patchwork.kernel.org/project/netdevbpf/patch/YJaSe3RPgn7gKxZv@mwanda/
Fixes:
|
|
|
|
c4b364ce12 |
net: dsa: free skb->cb usage in core driver
Free skb->cb usage in core driver and let device drivers decide to use or not. The reason having a DSA_SKB_CB(skb)->clone was because dsa_skb_tx_timestamp() which may set the clone pointer was called before p->xmit() which would use the clone if any, and the device driver has no way to initialize the clone pointer. This patch just put memset(skb->cb, 0, sizeof(skb->cb)) at beginning of dsa_slave_xmit(). Some new features in the future, like one-step timestamp may need more bytes of skb->cb to use in dsa_skb_tx_timestamp(), and p->xmit(). Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
5c5416f5d4 |
net: dsa: no longer clone skb in core driver
It was a waste to clone skb directly in dsa_skb_tx_timestamp(). For one-step timestamping, a clone was not needed. For any failure of port_txtstamp (this may usually happen), the skb clone had to be freed. So this patch moves skb cloning for tx timestamp out of dsa core, and let drivers clone skb in port_txtstamp if they really need. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
cf536ea3c7 |
net: dsa: no longer identify PTP packet in core driver
Move ptp_classify_raw out of dsa core driver for handling tx timestamp request. Let device drivers do this if they want. Not all drivers want to limit tx timestamping for only PTP packet. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
cfd12c06cd |
net: dsa: check tx timestamp request in core driver
Check tx timestamp request in core driver at very beginning of dsa_skb_tx_timestamp(), so that most skbs not requiring tx timestamp just return. And drop such checking in device drivers. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
a71acad90a |
net: dsa: enable selftest support for all switches by default
Most of generic selftest should be able to work with probably all ethernet controllers. The DSA switches are not exception, so enable it by default at least for DSA. This patch was tested with SJA1105 and AR9331. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
2c4eca3ef7 |
net: bridge: switchdev: include local flag in FDB notifications
As explained in bugfix commit
|
|
|
|
83216e3988 |
of: net: pass the dst buffer to of_get_mac_address()
of_get_mac_address() returns a "const void*" pointer to a MAC address.
Lately, support to fetch the MAC address by an NVMEM provider was added.
But this will only work with platform devices. It will not work with
PCI devices (e.g. of an integrated root complex) and esp. not with DSA
ports.
There is an of_* variant of the nvmem binding which works without
devices. The returned data of a nvmem_cell_read() has to be freed after
use. On the other hand the return of_get_mac_address() points to some
static data without a lifetime. The trick for now, was to allocate a
device resource managed buffer which is then returned. This will only
work if we have an actual device.
Change it, so that the caller of of_get_mac_address() has to supply a
buffer where the MAC address is written to. Unfortunately, this will
touch all drivers which use the of_get_mac_address().
Usually the code looks like:
const char *addr;
addr = of_get_mac_address(np);
if (!IS_ERR(addr))
ether_addr_copy(ndev->dev_addr, addr);
This can then be simply rewritten as:
of_get_mac_address(np, ndev->dev_addr);
Sometimes is_valid_ether_addr() is used to test the MAC address.
of_get_mac_address() already makes sure, it just returns a valid MAC
address. Thus we can just test its return code. But we have to be
careful if there are still other sources for the MAC address before the
of_get_mac_address(). In this case we have to keep the
is_valid_ether_addr() call.
The following coccinelle patch was used to convert common cases to the
new style. Afterwards, I've manually gone over the drivers and fixed the
return code variable: either used a new one or if one was already
available use that. Mansour Moufid, thanks for that coccinelle patch!
<spml>
@a@
identifier x;
expression y, z;
@@
- x = of_get_mac_address(y);
+ x = of_get_mac_address(y, z);
<...
- ether_addr_copy(z, x);
...>
@@
identifier a.x;
@@
- if (<+... x ...+>) {}
@@
identifier a.x;
@@
if (<+... x ...+>) {
...
}
- else {}
@@
identifier a.x;
expression e;
@@
- if (<+... x ...+>@e)
- {}
- else
+ if (!(e))
{...}
@@
expression x, y, z;
@@
- x = of_get_mac_address(y, z);
+ of_get_mac_address(y, z);
... when != x
</spml>
All drivers, except drivers/net/ethernet/aeroflex/greth.c, were
compile-time tested.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
3fb24a43c9 |
dsa: slave: add support for TC_SETUP_FT
The dsa infrastructure provides a well-defined hierarchy of devices, pass up the call to set up the flow block to the master device. From the software dataplane, the netfilter infrastructure uses the dsa slave devices to refer to the input and output device for the given skbuff. Similarly, the flowtable definition in the ruleset refers to the dsa slave port devices. This patch adds the glue code to call ndo_setup_tc with TC_SETUP_FT with the master device via the dsa slave devices. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
0994d492a1 |
net: dsa: resolve forwarding path for dsa slave ports
Add .ndo_fill_forward_path for dsa slave port devices Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
010e269f91 |
net: dsa: sync up switchdev objects and port attributes when joining the bridge
If we join an already-created bridge port, such as a bond master interface, then we can miss the initial switchdev notifications emitted by the bridge for this port, while it wasn't offloaded by anybody. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
2afc526ab3 |
net: dsa: pass extack to dsa_port_{bridge,lag}_join
This is a pretty noisy change that was broken out of the larger change for replaying switchdev attributes and objects at bridge join time, which is when these extack objects are actually used. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
03cbb87054 |
net: dsa: fix switchdev objects on bridge master mistakenly being applied on ports
Tobias reports that after the blamed patch, VLAN objects being added to
a bridge device are being added to all slave ports instead (swp2, swp3).
ip link add br0 type bridge vlan_filtering 1
ip link set swp2 master br0
ip link set swp3 master br0
bridge vlan add dev br0 vid 100 self
This is because the fix was too broad: we made dsa_port_offloads_netdev
say "yes, I offload the br0 bridge" for all slave ports, but we didn't
add the checks whether the switchdev object was in fact meant for the
physical port or for the bridge itself. So we are reacting on events in
a way in which we shouldn't.
The reason why the fix was too broad is because the question itself,
"does this DSA port offload this netdev", was too broad in the first
place. The solution is to disambiguate the question and separate it into
two different functions, one to be called for each switchdev attribute /
object that has an orig_dev == net_bridge (dsa_port_offloads_bridge),
and the other for orig_dev == net_bridge_port (*_offloads_bridge_port).
In the case of VLAN objects on the bridge interface, this solves the
problem because we know that VLAN objects are per bridge port and not
per bridge. And when orig_dev is equal to the net_bridge, we offload it
as a bridge, but not as a bridge port; that's how we are able to skip
reacting on those events. Note that this is compatible with future plans
to have explicit offloading of VLAN objects on the bridge interface as a
bridge port (in DSA, this signifies that we should add that VLAN towards
the CPU port).
Fixes:
|
|
|
|
c595c4330d |
net: dsa: add MRP support
Add support for offloading MRP in HW. Currently implement the switchdev calls 'SWITCHDEV_OBJ_ID_MRP', 'SWITCHDEV_OBJ_ID_RING_ROLE_MRP', to allow to create MRP instances and to set the role of these instances. Add DSA_NOTIFIER_MRP_ADD/DEL and DSA_NOTIFIER_MRP_ADD/DEL_RING_ROLE which calls to .port_mrp_add/del and .port_mrp_add/del_ring_role in the DSA driver for the switch. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
89153ed6eb |
net: dsa: propagate extack to .port_vlan_filtering
Some drivers can't dynamically change the VLAN filtering option, or impose some restrictions, it would be nice to propagate this info through netlink instead of printing it to a kernel log that might never be read. Also netlink extack includes the module that emitted the message, which means that it's easier to figure out which ones are driver-generated errors as opposed to command misuse. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
31046a5fd9 |
net: dsa: propagate extack to .port_vlan_add
Allow drivers to communicate their restrictions to user space directly, instead of printing to the kernel log. Where the conversion would have been lossy and things like VLAN ID could no longer be conveyed (due to the lack of support for printf format specifier in netlink extack), I chose to keep the messages in full form to the kernel log only, and leave it up to individual driver maintainers to move more messages to extack. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
a8b659e7ff |
net: dsa: act as passthrough for bridge port flags
There are multiple ways in which a PORT_BRIDGE_FLAGS attribute can be expressed by the bridge through switchdev, and not all of them can be emulated by DSA mid-layer API at the same time. One possible configuration is when the bridge offloads the port flags using a mask that has a single bit set - therefore only one feature should change. However, DSA currently groups together unicast and multicast flooding in the .port_egress_floods method, which limits our options when we try to add support for turning off broadcast flooding: do we extend .port_egress_floods with a third parameter which b53 and mv88e6xxx will ignore? But that means that the DSA layer, which currently implements the PRE_BRIDGE_FLAGS attribute all by itself, will see that .port_egress_floods is implemented, and will report that all 3 types of flooding are supported - not necessarily true. Another configuration is when the user specifies more than one flag at the same time, in the same netlink message. If we were to create one individual function per offloadable bridge port flag, we would limit the expressiveness of the switch driver of refusing certain combinations of flag values. For example, a switch may not have an explicit knob for flooding of unknown multicast, just for flooding in general. In that case, the only correct thing to do is to allow changes to BR_FLOOD and BR_MCAST_FLOOD in tandem, and never allow mismatched values. But having a separate .port_set_unicast_flood and .port_set_multicast_flood would not allow the driver to possibly reject that. Also, DSA doesn't consider it necessary to inform the driver that a SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it just calls .port_egress_floods for the CPU port. When we'll add support for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real problem because the flood settings will need to be held statefully in the DSA middle layer, otherwise changing the mrouter port attribute will impact the flooding attribute. And that's _assuming_ that the underlying hardware doesn't have anything else to do when a multicast router attaches to a port than flood unknown traffic to it. If it does, there will need to be a dedicated .port_set_mrouter anyway. So we need to let the DSA drivers see the exact form that the bridge passes this switchdev attribute in, otherwise we are standing in the way. Therefore we also need to use this form of language when communicating to the driver that it needs to configure its initial (before bridge join) and final (after bridge leave) port flags. The b53 and mv88e6xxx drivers are converted to the passthrough API and their implementation of .port_egress_floods is split into two: a function that configures unicast flooding and another for multicast. The mv88e6xxx implementation is quite hairy, and it turns out that the implementations of unknown unicast flooding are actually the same for 6185 and for 6352: behind the confusing names actually lie two individual bits: NO_UNKNOWN_MC -> FLOOD_UC = 0x4 = BIT(2) NO_UNKNOWN_UC -> FLOOD_MC = 0x8 = BIT(3) so there was no reason to entangle them in the first place. Whereas the 6185 writes to MV88E6185_PORT_CTL0_FORWARD_UNKNOWN of PORT_CTL0, which has the exact same bit index. I have left the implementations separate though, for the only reason that the names are different enough to confuse me, since I am not able to double-check with a user manual. The multicast flooding setting for 6185 is in a different register than for 6352 though. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
4c08c586ff |
net: switchdev: propagate extack to port attributes
When a struct switchdev_attr is notified through switchdev, there is no way to report informational messages, unlike for struct switchdev_obj. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
|
|
|
18596f504a |
net: dsa: add support for offloading HSR
Add support for offloading of HSR/PRP (IEC 62439-3) tag insertion
tag removal, duplicate generation and forwarding on DSA switches.
Add DSA_NOTIFIER_HSR_JOIN and DSA_NOTIFIER_HSR_LEAVE which trigger calls
to .port_hsr_join and .port_hsr_leave in the DSA driver for the switch.
The DSA switch driver should then set netdev feature flags for the
HSR/PRP operation that it offloads.
NETIF_F_HW_HSR_TAG_INS
NETIF_F_HW_HSR_TAG_RM
NETIF_F_HW_HSR_FWD
NETIF_F_HW_HSR_DUP
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
a324d3d48f |
net: dsa: make assisted_learning_on_cpu_port bypass offloaded LAG interfaces
Given the following topology, and focusing only on Box A:
Box A
+----------------------------------+
| Board 1 br0 |
| +---------+ |
| / \ |
| | | |
| | bond0 |
| | +-----+ |
|192.168.1.1 | / \ |
| eno0 swp0 swp1 swp2 |
+---|--------|-------|-------|-----+
| | | |
+--------+ | |
Cable | |
Cable| |Cable
Cable | |
+--------+ | |
| | | |
+---|--------|-------|-------|-----+
| eno0 swp0 swp1 swp2 |
|192.168.1.2 | \ / |
| | +-----+ |
| | bond0 |
| | | |
| \ / |
| +---------+ |
| Board 2 br0 |
+----------------------------------+
Box B
The assisted_learning_on_cpu_port logic will see that swp0 is bridged
with a "foreign interface" (bond0) and will therefore install all
addresses learnt by the software bridge towards bond0 (including the
address of eno0 on Box B) as static addresses towards the CPU port.
But that's not what we want - bond0 is not really a "foreign interface"
but one we can offload including L2 forwarding from/towards it. So we
need to refine our logic for assisted learning such that, whenever we
see an address learnt on a non-DSA interface, we search through the tree
for any port that offloads that non-DSA interface.
Some confusion might arise as to why we search through the whole tree
instead of just the local switch returned by dsa_slave_dev_lower_find.
Or a different angle of the same confusion: why does
dsa_slave_dev_lower_find(br_dev) return a single dp that's under br_dev
instead of the whole list of bridged DSA ports?
To answer the second question, it should be enough to install the static
FDB entry on the CPU port of a single switch in the tree, because
dsa_port_fdb_add uses DSA_NOTIFIER_FDB_ADD which ensures that all other
switches in the tree get notified of that address, and add the entry
themselves using dsa_towards_port().
This should help understand the answer to the first question: the port
returned by dsa_slave_dev_lower_find may not be on the same switch as
the ports that offload the LAG. Nonetheless, if the driver implements
.crosschip_lag_join and .crosschip_bridge_join as mv88e6xxx does, there
still isn't any reason for trapping addresses learnt on the remote LAG
towards the CPU, and we should prevent that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
c0a8a9c274 |
net: dsa: automatically bring user ports down when master goes down
This is not fixing any actual bug that I know of, but having a DSA interface that is up even when its lower (master) interface is down is one of those things that just do not sound right. Yes, DSA checks if the master is up before actually bringing the user interface up, but nobody prevents bringing the master interface down immediately afterwards... Then the user ports would attempt dev_queue_xmit on an interface that is down, and wonder what's wrong. This patch prevents that from happening. NETDEV_GOING_DOWN is the notification emitted _before_ the master actually goes down, and we are protected by the rtnl_mutex, so all is well. For those of you reading this because you were doing switch testing such as latency measurements for autonomously forwarded traffic, and you needed a controlled environment with no extra packets sent by the network stack, this patch breaks that, because now the user ports go down too, which may shut down the PHY etc. But please don't do it like that, just do instead: tc qdisc add dev eno2 clsact tc filter add dev eno2 egress flower action drop Tested with two cascaded DSA switches: $ ip link set eno2 down sja1105 spi2.0 sw0p2: Link is Down mscc_felix 0000:00:00.5 swp0: Link is Down fsl_enetc 0000:00:00.2 eno2: Link is Down Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
|
|
|
9d5ef190e5 |
net: dsa: automatically bring up DSA master when opening user port
DSA wants the master interface to be open before the user port is due to historical reasons. The promiscuity of interfaces that are down used to have issues, as referenced Lennert Buytenhek in commit |