linux/net/ipv4
Gilad Naaman 227adfb2b1 net: Set true network header for ECN decapsulation
In cases where the header straight after the tunnel header was
another ethernet header (TEB), instead of the network header,
the ECN decapsulation code would treat the ethernet header as if
it was an IP header, resulting in mishandling and possible
wrong drops or corruption of the IP header.

In this case, ECT(1) is sent, so IP_ECN_decapsulate tries to copy it to the
inner IPv4 header, and correct its checksum.

The offset of the ECT bits in an IPv4 header corresponds to the
lower 2 bits of the second octet of the destination MAC address
in the ethernet header.
The IPv4 checksum corresponds to end of the source address.

In order to reproduce:

    $ ip netns add A
    $ ip netns add B
    $ ip -n A link add _v0 type veth peer name _v1 netns B
    $ ip -n A link set _v0 up
    $ ip -n A addr add dev _v0 10.254.3.1/24
    $ ip -n A route add default dev _v0 scope global
    $ ip -n B link set _v1 up
    $ ip -n B addr add dev _v1 10.254.1.6/24
    $ ip -n B route add default dev _v1 scope global
    $ ip -n B link add gre1 type gretap local 10.254.1.6 remote 10.254.3.1 key 0x49000000
    $ ip -n B link set gre1 up

    # Now send an IPv4/GRE/Eth/IPv4 frame where the outer header has ECT(1),
    # and the inner header has no ECT bits set:

    $ cat send_pkt.py
        #!/usr/bin/env python3
        from scapy.all import *

        pkt = IP(b'E\x01\x00\xa7\x00\x00\x00\x00@/`%\n\xfe\x03\x01\n\xfe\x01\x06 \x00eXI\x00'
                 b'\x00\x00\x18\xbe\x92\xa0\xee&\x18\xb0\x92\xa0l&\x08\x00E\x00\x00}\x8b\x85'
                 b'@\x00\x01\x01\xe4\xf2\x82\x82\x82\x01\x82\x82\x82\x02\x08\x00d\x11\xa6\xeb'
                 b'3\x1e\x1e\\xf3\\xf7`\x00\x00\x00\x00ZN\x00\x00\x00\x00\x00\x00\x10\x11\x12'
                 b'\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./01234'
                 b'56789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ')

        send(pkt)
    $ sudo ip netns exec B tcpdump -neqlllvi gre1 icmp & ; sleep 1
    $ sudo ip netns exec A python3 send_pkt.py

In the original packet, the source/destinatio MAC addresses are
dst=18:be:92:a0:ee:26 src=18:b0:92:a0:6c:26

In the received packet, they are
dst=18:bd:92:a0:ee:26 src=18:b0:92:a0:6c:27

Thanks to Lahav Schlesinger <lschlesinger@drivenets.com> and Isaac Garzon <isaac@speed.io>
for helping me pinpoint the origin.

Fixes: b723748750 ("tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040")
Cc: David S. Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Gilad Naaman <gnaaman@drivenets.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23 16:38:57 +01:00
..
bpfilter
netfilter
Kconfig
Makefile
af_inet.c
ah4.c Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
arp.c
bpf_tcp_ca.c
cipso_ipv4.c
datagram.c
devinet.c
esp4.c Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
esp4_offload.c
fib_frontend.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
fib_lookup.h
fib_notifier.c
fib_rules.c
fib_semantics.c
fib_trie.c
fou.c
gre_demux.c
gre_offload.c
icmp.c
igmp.c
inet_connection_sock.c
inet_diag.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
inet_fragment.c
inet_hashtables.c
inet_timewait_sock.c
inetpeer.c
ip_forward.c
ip_fragment.c
ip_gre.c
ip_input.c
ip_options.c
ip_output.c
ip_sockglue.c
ip_tunnel.c net: Set true network header for ECN decapsulation 2021-07-23 16:38:57 +01:00
ip_tunnel_core.c
ip_vti.c
ipcomp.c Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
ipconfig.c
ipip.c
ipmr.c ipmr: Fix indentation issue 2021-07-07 20:52:25 -07:00
ipmr_base.c
metrics.c
netfilter.c
netlink.c
nexthop.c
ping.c net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
proc.c
protocol.c
raw.c net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
raw_diag.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
syncookies.c
sysctl_net_ipv4.c
tcp.c tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path 2021-07-09 11:25:24 -07:00
tcp_bbr.c
tcp_bic.c
tcp_bpf.c bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc stats 2021-07-15 19:54:22 +02:00
tcp_cdg.c
tcp_cong.c
tcp_cubic.c
tcp_dctcp.c
tcp_dctcp.h
tcp_diag.c
tcp_fastopen.c tcp: disable TFO blackhole logic by default 2021-07-21 22:50:31 -07:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c mptcp: avoid processing packet if a subflow reset 2021-07-09 18:38:53 -07:00
tcp_ipv4.c tcp: disable TFO blackhole logic by default 2021-07-21 22:50:31 -07:00
tcp_lp.c
tcp_metrics.c
tcp_minisocks.c
tcp_nv.c
tcp_offload.c
tcp_output.c ipv6: tcp: drop silly ICMPv6 packet too big messages 2021-07-08 12:27:08 -07:00
tcp_rate.c
tcp_recovery.c
tcp_scalable.c
tcp_timer.c net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
tcp_ulp.c
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c tcp_yeah: check struct yeah size at compile time 2021-06-29 11:54:36 -07:00
tunnel4.c
udp.c udp: check encap socket in __udp_lib_err 2021-07-21 08:49:31 -07:00
udp_bpf.c bpf, sockmap, udp: sk_prot needs inuse_idx set for proc stats 2021-07-15 19:54:36 +02:00
udp_diag.c net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
udp_impl.h
udp_offload.c udp: properly flush normal packet at GRO time 2021-07-02 16:19:14 -07:00
udp_tunnel_core.c
udp_tunnel_nic.c
udp_tunnel_stub.c
udplite.c
xfrm4_input.c
xfrm4_output.c
xfrm4_policy.c
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c