Merge branch 'make-tc-bpf-helpers-preserve-skb-metadata'

Jakub Sitnicki says:

====================
Make TC BPF helpers preserve skb metadata

Changes in v4:
- Fix copy-paste bug in check_metadata() test helper (AI review)
- Add "out of scope" section (at the bottom)
- Link to v3: https://lore.kernel.org/r/20251026-skb-meta-rx-path-v3-0-37cceebb95d3@cloudflare.com

Changes in v3:
- Use the already existing BPF_STREAM_STDERR const in tests (Martin)
- Unclone skb head on bpf_dynptr_write to skb metadata (patch 3) (Martin)
- Swap order of patches 1 & 2 to refer to skb_postpush_data_move() in docs
- Mention in skb_data_move() docs how to move just the metadata
- Note in pskb_expand_head() docs to move metadata after skb_push() (Jakub)
- Link to v2: https://lore.kernel.org/r/20251019-skb-meta-rx-path-v2-0-f9a58f3eb6d6@cloudflare.com

Changes in v2:
- Tweak WARN_ON_ONCE check in skb_data_move() (patch 2)
- Convert all tests to verify skb metadata in BPF (patches 9-10)
- Add test coverage for modified BPF helpers (patches 12-15)
- Link to RFCv1: https://lore.kernel.org/r/20250929-skb-meta-rx-path-v1-0-de700a7ab1cb@cloudflare.com

This patch set continues our work [1] to allow BPF programs and user-space
applications to attach multiple bytes of metadata to packets via the
XDP/skb metadata area.

The focus of this patch set is to ensure that skb metadata remains intact
when packets pass through a chain of TC BPF programs that call helpers
which operate on the skb head.

Currently, several helpers that either adjust the skb->data pointer or
reallocate skb->head do not preserve metadata at its expected location,
that is, immediately in front of the MAC header. These are:

- bpf_skb_adjust_room
- bpf_skb_change_head
- bpf_skb_change_proto
- bpf_skb_change_tail
- bpf_skb_vlan_pop
- bpf_skb_vlan_push

In TC BPF context, metadata must be moved whenever skb->data changes to
keep the skb->data_meta pointer valid. I don't see any way around
it. Creative ideas on how to avoid that would be very welcome.
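
To make the invariant concrete, here is a minimal sketch (illustration
only, not part of the series; skb_data_move() below is the real
implementation) of what a move after skb_push() has to do:

        /* Sketch, for illustration only: after skb_push(skb, len),
         * metadata that ended exactly at the old skb->data must be
         * relocated so it ends at the new skb->data again, or the
         * skb->data_meta pointer goes stale.
         */
        static void meta_move_after_push(struct sk_buff *skb, unsigned int len)
        {
                u8 meta_len = skb_metadata_len(skb);
                u8 *meta = skb->data + len - meta_len;  /* old location */

                memmove(meta - len, meta, meta_len);    /* ends at new skb->data */
        }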

With that in mind, we can patch the helpers in at least two different ways:

1. Integrate metadata move into header move

   Replace the existing memmove, which follows skb_push/pull, with a helper
   that moves both headers and metadata in a single call. This avoids an
   extra memmove but reduces transparency.

        skb_pull(skb, len);
-       memmove(skb->data, skb->data - len, n);
+       skb_postpull_data_move(skb, len, n);
        skb->mac_header += len;

        skb_push(skb, len)
-       memmove(skb->data, skb->data + len, n);
+       skb_postpush_data_move(skb, len, n);
        skb->mac_header -= len;

2. Move metadata separately

   Add a dedicated metadata move after the header move. This is more
   explicit but costs an additional memmove.

        skb_pull(skb, len);
        memmove(skb->data, skb->data - len, n);
+       skb_metadata_postpull_move(skb, len);
        skb->mac_header += len;

        skb_push(skb, len)
+       skb_metadata_postpush_move(skb, len);
        memmove(skb->data, skb->data + len, n);
        skb->mac_header -= len;

This patch set implements option (1), expecting that "you can have just one
memmove" will be the most obvious feedback, while readability is a somewhat
subjective matter of taste, which I don't claim to have ;-)
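
From the point of view of a TC BPF program, the end result is that
ctx->data_meta stays usable across helper calls. A minimal sketch of the
observable behavior (META_SIZE is an assumed constant here; the selftests
at the end of the series perform the same kind of check):

        SEC("tc")
        int meta_after_adjust(struct __sk_buff *ctx)
        {
                __u8 *meta, *data;

                /* Moves skb->data by growing a hole after the MAC header */
                if (bpf_skb_adjust_room(ctx, 1, BPF_ADJ_ROOM_MAC, 0))
                        return TC_ACT_SHOT;

                meta = (void *)(unsigned long)ctx->data_meta;
                data = (void *)(unsigned long)ctx->data;

                /* With this series applied, metadata still precedes data */
                if (meta + META_SIZE > data)
                        return TC_ACT_SHOT;

                return TC_ACT_OK;
        }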

The structure of the patch set is as follows:

- patches 1-4 prepare the ground for safe-proofing the BPF helpers
- patches 5-9 modify the BPF helpers to preserve skb metadata
- patches 10-11 prepare the ground for metadata tests with BPF helper calls
- patches 12-16 adapt and expand tests to cover the changes made

Out of scope for this series:
- safe-proofing tunnel & tagging devices - VLAN, GRE, ...
  (next in line, in development preview at [2])
- metadata access after packet forwarding
  (to do after Rx path - once metadata reliably reaches sk_filter)

Thanks,
-jkbs

[1] https://lore.kernel.org/all/20250814-skb-metadata-thru-dynptr-v7-0-8a39e636e0fb@cloudflare.com/
[2] https://github.com/jsitnicki/linux/commits/skb-meta/safeproof-netdevs/
====================

Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-0-5ceb08a9b37b@cloudflare.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
commit abd0c0f6aa (Martin KaFai Lau, 2025-11-10 10:52:33 -08:00)
8 files changed, 475 insertions(+), 183 deletions(-)

include/linux/filter.h

@@ -1781,6 +1781,8 @@ int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len);
void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len);
void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off,
void *buf, unsigned long len, bool flush);
int __bpf_skb_meta_store_bytes(struct sk_buff *skb, u32 offset,
const void *from, u32 len, u64 flags);
void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset);
#else /* CONFIG_NET */
static inline int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset,
@@ -1817,6 +1819,13 @@ static inline void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, voi
{
}
static inline int __bpf_skb_meta_store_bytes(struct sk_buff *skb, u32 offset,
const void *from, u32 len,
u64 flags)
{
return -EOPNOTSUPP;
}
static inline void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset)
{
return ERR_PTR(-EOPNOTSUPP);

include/linux/if_vlan.h

@@ -355,16 +355,17 @@ static inline int __vlan_insert_inner_tag(struct sk_buff *skb,
__be16 vlan_proto, u16 vlan_tci,
unsigned int mac_len)
{
+	const u8 meta_len = mac_len > ETH_TLEN ? skb_metadata_len(skb) : 0;
	struct vlan_ethhdr *veth;

-	if (skb_cow_head(skb, VLAN_HLEN) < 0)
+	if (skb_cow_head(skb, meta_len + VLAN_HLEN) < 0)
		return -ENOMEM;

	skb_push(skb, VLAN_HLEN);

	/* Move the mac header sans proto to the beginning of the new header. */
	if (likely(mac_len > ETH_TLEN))
-		memmove(skb->data, skb->data + VLAN_HLEN, mac_len - ETH_TLEN);
+		skb_postpush_data_move(skb, VLAN_HLEN, mac_len - ETH_TLEN);
if (skb_mac_header_was_set(skb))
skb->mac_header -= VLAN_HLEN;
@@ -731,18 +732,16 @@ static inline void vlan_set_encap_proto(struct sk_buff *skb,
*
* Expects the skb to contain a VLAN tag in the payload, and to have skb->data
* pointing at the MAC header.
- *
- * Returns: a new pointer to skb->data, or NULL on failure to pull.
 */
-static inline void *vlan_remove_tag(struct sk_buff *skb, u16 *vlan_tci)
+static inline void vlan_remove_tag(struct sk_buff *skb, u16 *vlan_tci)
{
struct vlan_hdr *vhdr = (struct vlan_hdr *)(skb->data + ETH_HLEN);
*vlan_tci = ntohs(vhdr->h_vlan_TCI);
-	memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN);
	vlan_set_encap_proto(skb, vhdr);
-	return __skb_pull(skb, VLAN_HLEN);
+	__skb_pull(skb, VLAN_HLEN);
+	skb_postpull_data_move(skb, VLAN_HLEN, 2 * ETH_ALEN);
}
/**

include/linux/skbuff.h

@@ -4564,6 +4564,81 @@ static inline void skb_metadata_clear(struct sk_buff *skb)
skb_metadata_set(skb, 0);
}
/**
* skb_data_move - Move packet data and metadata after skb_push() or skb_pull().
* @skb: packet to operate on
* @len: number of bytes pushed or pulled from &sk_buff->data
* @n: number of bytes to memmove() from pre-push/pull &sk_buff->data
*
* Moves @n bytes of packet data, can be zero, and all bytes of skb metadata.
*
* Assumes metadata is located immediately before &sk_buff->data prior to the
* push/pull, and that sufficient headroom exists to hold it after an
* skb_push(). Otherwise, metadata is cleared and a one-time warning is issued.
*
* Prefer skb_postpull_data_move() or skb_postpush_data_move() to calling this
* helper directly.
*/
static inline void skb_data_move(struct sk_buff *skb, const int len,
const unsigned int n)
{
const u8 meta_len = skb_metadata_len(skb);
u8 *meta, *meta_end;
if (!len || (!n && !meta_len))
return;
if (!meta_len)
goto no_metadata;
meta_end = skb_metadata_end(skb);
meta = meta_end - meta_len;
if (WARN_ON_ONCE(meta_end + len != skb->data ||
meta_len > skb_headroom(skb))) {
skb_metadata_clear(skb);
goto no_metadata;
}
memmove(meta + len, meta, meta_len + n);
return;
no_metadata:
memmove(skb->data, skb->data - len, n);
}
/**
* skb_postpull_data_move - Move packet data and metadata after skb_pull().
* @skb: packet to operate on
* @len: number of bytes pulled from &sk_buff->data
* @n: number of bytes to memmove() from pre-pull &sk_buff->data
*
* See skb_data_move() for details.
*/
static inline void skb_postpull_data_move(struct sk_buff *skb,
const unsigned int len,
const unsigned int n)
{
DEBUG_NET_WARN_ON_ONCE(len > INT_MAX);
skb_data_move(skb, len, n);
}
/**
* skb_postpush_data_move - Move packet data and metadata after skb_push().
* @skb: packet to operate on
* @len: number of bytes pushed onto &sk_buff->data
* @n: number of bytes to memmove() from pre-push &sk_buff->data
*
* See skb_data_move() for details.
*/
static inline void skb_postpush_data_move(struct sk_buff *skb,
const unsigned int len,
const unsigned int n)
{
DEBUG_NET_WARN_ON_ONCE(len > INT_MAX);
skb_data_move(skb, -len, n);
}
struct sk_buff *skb_clone_sk(struct sk_buff *skb);
#ifdef CONFIG_NETWORK_PHY_TIMESTAMPING

kernel/bpf/helpers.c

@@ -1842,10 +1842,8 @@ int __bpf_dynptr_write(const struct bpf_dynptr_kern *dst, u32 offset, void *src,
return -EINVAL;
return __bpf_xdp_store_bytes(dst->data, dst->offset + offset, src, len);
case BPF_DYNPTR_TYPE_SKB_META:
-		if (flags)
-			return -EINVAL;
-		memmove(bpf_skb_meta_pointer(dst->data, dst->offset + offset), src, len);
-		return 0;
+		return __bpf_skb_meta_store_bytes(dst->data, dst->offset + offset, src,
+						  len, flags);
default:
WARN_ONCE(true, "bpf_dynptr_write: unknown dynptr type %d\n", type);
return -EFAULT;

net/core/filter.c

@@ -3253,11 +3253,11 @@ static void bpf_skb_change_protocol(struct sk_buff *skb, u16 proto)
static int bpf_skb_generic_push(struct sk_buff *skb, u32 off, u32 len)
{
-	/* Caller already did skb_cow() with len as headroom,
+	/* Caller already did skb_cow() with meta_len+len as headroom,
	 * so no need to do it here.
	 */
	skb_push(skb, len);
-	memmove(skb->data, skb->data + len, off);
+	skb_postpush_data_move(skb, len, off);
memset(skb->data + off, 0, len);
/* No skb_postpush_rcsum(skb, skb->data + off, len)
@@ -3281,7 +3281,7 @@ static int bpf_skb_generic_pop(struct sk_buff *skb, u32 off, u32 len)
old_data = skb->data;
__skb_pull(skb, len);
skb_postpull_rcsum(skb, old_data + off, len);
-	memmove(skb->data, old_data, off);
+	skb_postpull_data_move(skb, len, off);
return 0;
}
@@ -3326,10 +3326,11 @@ static int bpf_skb_net_hdr_pop(struct sk_buff *skb, u32 off, u32 len)
static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
{
const u32 len_diff = sizeof(struct ipv6hdr) - sizeof(struct iphdr);
+	const u8 meta_len = skb_metadata_len(skb);
	u32 off = skb_mac_header_len(skb);
	int ret;

-	ret = skb_cow(skb, len_diff);
+	ret = skb_cow(skb, meta_len + len_diff);
if (unlikely(ret < 0))
return ret;
@@ -3489,6 +3490,7 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
u8 inner_mac_len = flags >> BPF_ADJ_ROOM_ENCAP_L2_SHIFT;
bool encap = flags & BPF_F_ADJ_ROOM_ENCAP_L3_MASK;
u16 mac_len = 0, inner_net = 0, inner_trans = 0;
const u8 meta_len = skb_metadata_len(skb);
unsigned int gso_type = SKB_GSO_DODGY;
int ret;
@@ -3499,7 +3501,7 @@
return -ENOTSUPP;
}
-	ret = skb_cow_head(skb, len_diff);
+	ret = skb_cow_head(skb, meta_len + len_diff);
if (unlikely(ret < 0))
return ret;
@@ -3873,6 +3875,7 @@ static const struct bpf_func_proto sk_skb_change_tail_proto = {
static inline int __bpf_skb_change_head(struct sk_buff *skb, u32 head_room,
u64 flags)
{
const u8 meta_len = skb_metadata_len(skb);
u32 max_len = BPF_SKB_MAX_LEN;
u32 new_len = skb->len + head_room;
int ret;
@@ -3882,7 +3885,7 @@ static inline int __bpf_skb_change_head(struct sk_buff *skb, u32 head_room,
new_len < skb->len))
return -EINVAL;
-	ret = skb_cow(skb, head_room);
+	ret = skb_cow(skb, meta_len + head_room);
if (likely(!ret)) {
/* Idea for this helper is that we currently only
* allow to expand on mac header. This means that
@@ -3894,6 +3897,7 @@ static inline int __bpf_skb_change_head(struct sk_buff *skb, u32 head_room,
* for redirection into L2 device.
*/
__skb_push(skb, head_room);
skb_postpush_data_move(skb, head_room, 0);
memset(skb->data, 0, head_room);
skb_reset_mac_header(skb);
skb_reset_mac_len(skb);
@@ -12102,6 +12106,18 @@ void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset)
return skb_metadata_end(skb) - skb_metadata_len(skb) + offset;
}
int __bpf_skb_meta_store_bytes(struct sk_buff *skb, u32 offset,
const void *from, u32 len, u64 flags)
{
if (unlikely(flags))
return -EINVAL;
if (unlikely(bpf_try_make_writable(skb, 0)))
return -EFAULT;
memmove(bpf_skb_meta_pointer(skb, offset), from, len);
return 0;
}
__bpf_kfunc_start_defs();
__bpf_kfunc int bpf_dynptr_from_skb(struct __sk_buff *s, u64 flags,
struct bpf_dynptr *ptr__uninit)
@@ -12129,9 +12145,6 @@ __bpf_kfunc int bpf_dynptr_from_skb(struct __sk_buff *s, u64 flags,
* XDP context with bpf_xdp_adjust_meta(). Serves as an alternative to
* &__sk_buff->data_meta.
*
- * If passed @skb_ is a clone which shares the data with the original, the
- * dynptr will be read-only. This limitation may be lifted in the future.
- *
* Return:
* * %0 - dynptr ready to use
* * %-EINVAL - invalid flags, dynptr set to null
@@ -12149,9 +12162,6 @@ __bpf_kfunc int bpf_dynptr_from_skb_meta(struct __sk_buff *skb_, u64 flags,
bpf_dynptr_init(ptr, skb, BPF_DYNPTR_TYPE_SKB_META, 0, skb_metadata_len(skb));
-	if (skb_cloned(skb))
-		bpf_dynptr_set_rdonly(ptr);
return 0;
}

net/core/skbuff.c

@@ -2234,6 +2234,10 @@ EXPORT_SYMBOL(__pskb_copy_fclone);
*
* All the pointers pointing into skb header may change and must be
* reloaded after call to this function.
*
* Note: If you skb_push() the start of the buffer after reallocating the
* header, call skb_postpush_data_move() first to move the metadata out of
* the way before writing to &sk_buff->data.
*/
int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
@@ -2305,8 +2309,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
skb->nohdr = 0;
atomic_set(&skb_shinfo(skb)->dataref, 1);
-	skb_metadata_clear(skb);
/* It is not generally safe to change skb->truesize.
* For the moment, we really care of rx path, or
* when skb is orphaned (not attached to a socket).

tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c

@@ -124,10 +124,10 @@ static int send_test_packet(int ifindex)
int n, sock = -1;
__u8 packet[sizeof(struct ethhdr) + TEST_PAYLOAD_LEN];
-	/* The ethernet header is not relevant for this test and doesn't need to
-	 * be meaningful.
-	 */
-	struct ethhdr eth = { 0 };
+	/* We use the Ethernet header only to identify the test packet */
+	struct ethhdr eth = {
+		.h_source = { 0x12, 0x34, 0xDE, 0xAD, 0xBE, 0xEF },
+	};
memcpy(packet, &eth, sizeof(eth));
memcpy(packet + sizeof(eth), test_payload, TEST_PAYLOAD_LEN);
@@ -160,8 +160,16 @@
__u8 packet[sizeof(struct ethhdr) + TEST_PAYLOAD_LEN];
int n;
-	/* The ethernet header doesn't need to be valid for this test */
-	memset(packet, 0, sizeof(struct ethhdr));
+	/* The Ethernet header is mostly not relevant. We use it to identify the
+	 * test packet and some BPF helpers we exercise expect to operate on
+	 * Ethernet frames carrying IP packets. Pretend that's the case.
+	 */
+	struct ethhdr eth = {
+		.h_source = { 0x12, 0x34, 0xDE, 0xAD, 0xBE, 0xEF },
+		.h_proto = htons(ETH_P_IP),
+	};
+
+	memcpy(packet, &eth, sizeof(eth));
memcpy(packet + sizeof(struct ethhdr), test_payload, TEST_PAYLOAD_LEN);
n = write(tap_fd, packet, sizeof(packet));
@@ -171,31 +179,19 @@
return 0;
}
-static void assert_test_result(const struct bpf_map *result_map)
+static void dump_err_stream(const struct bpf_program *prog)
{
-	int err;
-	__u32 map_key = 0;
-	__u8 map_value[TEST_PAYLOAD_LEN];
-
-	err = bpf_map__lookup_elem(result_map, &map_key, sizeof(map_key),
-				   &map_value, TEST_PAYLOAD_LEN, BPF_ANY);
-	if (!ASSERT_OK(err, "lookup test_result"))
-		return;
-
-	ASSERT_MEMEQ(&map_value, &test_payload, TEST_PAYLOAD_LEN,
-		     "test_result map contains test payload");
-}
-
-static bool clear_test_result(struct bpf_map *result_map)
-{
-	const __u8 v[sizeof(test_payload)] = {};
-	const __u32 k = 0;
-	int err;
-
-	err = bpf_map__update_elem(result_map, &k, sizeof(k), v, sizeof(v), BPF_ANY);
-	ASSERT_OK(err, "update test_result");
-
-	return err == 0;
+	char buf[512];
+	int ret;
+
+	ret = 0;
+	do {
+		ret = bpf_prog_stream_read(bpf_program__fd(prog),
+					   BPF_STREAM_STDERR, buf, sizeof(buf),
+					   NULL);
+		if (ret > 0)
+			fwrite(buf, sizeof(buf[0]), ret, stderr);
+	} while (ret > 0);
}
void test_xdp_context_veth(void)
@@ -270,11 +266,14 @@ void test_xdp_context_veth(void)
if (!ASSERT_GE(tx_ifindex, 0, "if_nametoindex tx"))
goto close;
+	skel->bss->test_pass = false;
ret = send_test_packet(tx_ifindex);
if (!ASSERT_OK(ret, "send_test_packet"))
goto close;
-	assert_test_result(skel->maps.test_result);
+	if (!ASSERT_TRUE(skel->bss->test_pass, "test_pass"))
+		dump_err_stream(tc_prog);
close:
close_netns(nstoken);
@@ -286,7 +285,7 @@ void test_xdp_context_veth(void)
static void test_tuntap(struct bpf_program *xdp_prog,
struct bpf_program *tc_prio_1_prog,
struct bpf_program *tc_prio_2_prog,
-			struct bpf_map *result_map)
+			bool *test_pass)
{
LIBBPF_OPTS(bpf_tc_hook, tc_hook, .attach_point = BPF_TC_INGRESS);
LIBBPF_OPTS(bpf_tc_opts, tc_opts, .handle = 1, .priority = 1);
@@ -295,8 +294,7 @@ static void test_tuntap(struct bpf_program *xdp_prog,
int tap_ifindex;
int ret;
-	if (!clear_test_result(result_map))
-		return;
+	*test_pass = false;
ns = netns_new(TAP_NETNS, true);
if (!ASSERT_OK_PTR(ns, "create and open ns"))
@@ -340,7 +338,8 @@ static void test_tuntap(struct bpf_program *xdp_prog,
if (!ASSERT_OK(ret, "write_test_packet"))
goto close;
-	assert_test_result(result_map);
+	if (!ASSERT_TRUE(*test_pass, "test_pass"))
+		dump_err_stream(tc_prio_2_prog ? : tc_prio_1_prog);
close:
if (tap_fd >= 0)
@@ -411,7 +410,8 @@ static void test_tuntap_mirred(struct bpf_program *xdp_prog,
if (!ASSERT_OK(ret, "write_test_packet"))
goto close;
ASSERT_TRUE(*test_pass, "test_pass");
if (!ASSERT_TRUE(*test_pass, "test_pass"))
dump_err_stream(tc_prog);
close:
if (tap_fd >= 0)
@@ -431,61 +431,82 @@ void test_xdp_context_tuntap(void)
		test_tuntap(skel->progs.ing_xdp,
			    skel->progs.ing_cls,
			    NULL, /* tc prio 2 */
-			    skel->maps.test_result);
+			    &skel->bss->test_pass);
	if (test__start_subtest("dynptr_read"))
		test_tuntap(skel->progs.ing_xdp,
			    skel->progs.ing_cls_dynptr_read,
			    NULL, /* tc prio 2 */
-			    skel->maps.test_result);
+			    &skel->bss->test_pass);
	if (test__start_subtest("dynptr_slice"))
		test_tuntap(skel->progs.ing_xdp,
			    skel->progs.ing_cls_dynptr_slice,
			    NULL, /* tc prio 2 */
-			    skel->maps.test_result);
+			    &skel->bss->test_pass);
	if (test__start_subtest("dynptr_write"))
		test_tuntap(skel->progs.ing_xdp_zalloc_meta,
			    skel->progs.ing_cls_dynptr_write,
			    skel->progs.ing_cls_dynptr_read,
-			    skel->maps.test_result);
+			    &skel->bss->test_pass);
	if (test__start_subtest("dynptr_slice_rdwr"))
		test_tuntap(skel->progs.ing_xdp_zalloc_meta,
			    skel->progs.ing_cls_dynptr_slice_rdwr,
			    skel->progs.ing_cls_dynptr_slice,
-			    skel->maps.test_result);
+			    &skel->bss->test_pass);
	if (test__start_subtest("dynptr_offset"))
		test_tuntap(skel->progs.ing_xdp_zalloc_meta,
			    skel->progs.ing_cls_dynptr_offset_wr,
			    skel->progs.ing_cls_dynptr_offset_rd,
-			    skel->maps.test_result);
+			    &skel->bss->test_pass);
	if (test__start_subtest("dynptr_offset_oob"))
		test_tuntap(skel->progs.ing_xdp,
			    skel->progs.ing_cls_dynptr_offset_oob,
			    skel->progs.ing_cls,
-			    skel->maps.test_result);
-	if (test__start_subtest("clone_data_meta_empty_on_data_write"))
+			    &skel->bss->test_pass);
+	if (test__start_subtest("clone_data_meta_survives_data_write"))
		test_tuntap_mirred(skel->progs.ing_xdp,
-				   skel->progs.clone_data_meta_empty_on_data_write,
+				   skel->progs.clone_data_meta_survives_data_write,
				   &skel->bss->test_pass);
-	if (test__start_subtest("clone_data_meta_empty_on_meta_write"))
+	if (test__start_subtest("clone_data_meta_survives_meta_write"))
		test_tuntap_mirred(skel->progs.ing_xdp,
-				   skel->progs.clone_data_meta_empty_on_meta_write,
+				   skel->progs.clone_data_meta_survives_meta_write,
				   &skel->bss->test_pass);
-	if (test__start_subtest("clone_dynptr_empty_on_data_slice_write"))
+	if (test__start_subtest("clone_meta_dynptr_survives_data_slice_write"))
		test_tuntap_mirred(skel->progs.ing_xdp,
-				   skel->progs.clone_dynptr_empty_on_data_slice_write,
+				   skel->progs.clone_meta_dynptr_survives_data_slice_write,
				   &skel->bss->test_pass);
-	if (test__start_subtest("clone_dynptr_empty_on_meta_slice_write"))
+	if (test__start_subtest("clone_meta_dynptr_survives_meta_slice_write"))
		test_tuntap_mirred(skel->progs.ing_xdp,
-				   skel->progs.clone_dynptr_empty_on_meta_slice_write,
+				   skel->progs.clone_meta_dynptr_survives_meta_slice_write,
				   &skel->bss->test_pass);
-	if (test__start_subtest("clone_dynptr_rdonly_before_data_dynptr_write"))
+	if (test__start_subtest("clone_meta_dynptr_rw_before_data_dynptr_write"))
		test_tuntap_mirred(skel->progs.ing_xdp,
-				   skel->progs.clone_dynptr_rdonly_before_data_dynptr_write,
+				   skel->progs.clone_meta_dynptr_rw_before_data_dynptr_write,
				   &skel->bss->test_pass);
-	if (test__start_subtest("clone_dynptr_rdonly_before_meta_dynptr_write"))
+	if (test__start_subtest("clone_meta_dynptr_rw_before_meta_dynptr_write"))
		test_tuntap_mirred(skel->progs.ing_xdp,
-				   skel->progs.clone_dynptr_rdonly_before_meta_dynptr_write,
+				   skel->progs.clone_meta_dynptr_rw_before_meta_dynptr_write,
				   &skel->bss->test_pass);
+
+	/* Tests for BPF helpers which touch headroom */
+	if (test__start_subtest("helper_skb_vlan_push_pop"))
+		test_tuntap(skel->progs.ing_xdp,
+			    skel->progs.helper_skb_vlan_push_pop,
+			    NULL, /* tc prio 2 */
+			    &skel->bss->test_pass);
+	if (test__start_subtest("helper_skb_adjust_room"))
+		test_tuntap(skel->progs.ing_xdp,
+			    skel->progs.helper_skb_adjust_room,
+			    NULL, /* tc prio 2 */
+			    &skel->bss->test_pass);
+	if (test__start_subtest("helper_skb_change_head_tail"))
+		test_tuntap(skel->progs.ing_xdp,
+			    skel->progs.helper_skb_change_head_tail,
+			    NULL, /* tc prio 2 */
+			    &skel->bss->test_pass);
+	if (test__start_subtest("helper_skb_change_proto"))
+		test_tuntap(skel->progs.ing_xdp,
+			    skel->progs.helper_skb_change_proto,
+			    NULL, /* tc prio 2 */
+			    &skel->bss->test_pass);
test_xdp_meta__destroy(skel);
}

tools/testing/selftests/bpf/progs/test_xdp_meta.c

@@ -4,6 +4,7 @@
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
+#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>
#include "bpf_kfuncs.h"
@@ -11,37 +12,72 @@
#define ctx_ptr(ctx, mem) (void *)(unsigned long)ctx->mem
-/* Demonstrates how metadata can be passed from an XDP program to a TC program
- * using bpf_xdp_adjust_meta.
- * For the sake of testing the metadata support in drivers, the XDP program uses
- * a fixed-size payload after the Ethernet header as metadata. The TC program
- * copies the metadata it receives into a map so it can be checked from
- * userspace.
+/* Demonstrate passing metadata from XDP to TC using bpf_xdp_adjust_meta.
+ *
+ * The XDP program extracts a fixed-size payload following the Ethernet header
+ * and stores it as packet metadata to test the driver's metadata support. The
+ * TC program then verifies if the passed metadata is correct.
 */
-struct {
-	__uint(type, BPF_MAP_TYPE_ARRAY);
-	__uint(max_entries, 1);
-	__type(key, __u32);
-	__uint(value_size, META_SIZE);
-} test_result SEC(".maps");
+bool test_pass;
static const __u8 smac_want[ETH_ALEN] = {
0x12, 0x34, 0xDE, 0xAD, 0xBE, 0xEF,
};
static const __u8 meta_want[META_SIZE] = {
0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18,
0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28,
0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38,
};
static bool check_smac(const struct ethhdr *eth)
{
return !__builtin_memcmp(eth->h_source, smac_want, ETH_ALEN);
}
static bool check_metadata(const char *file, int line, __u8 *meta_have)
{
if (!__builtin_memcmp(meta_have, meta_want, META_SIZE))
return true;
bpf_stream_printk(BPF_STREAM_STDERR,
"FAIL:%s:%d: metadata mismatch\n"
" have:\n %pI6\n %pI6\n"
" want:\n %pI6\n %pI6\n",
file, line,
&meta_have[0x00], &meta_have[0x10],
&meta_want[0x00], &meta_want[0x10]);
return false;
}
#define check_metadata(meta_have) check_metadata(__FILE__, __LINE__, meta_have)
static bool check_skb_metadata(const char *file, int line, struct __sk_buff *skb)
{
__u8 *data_meta = ctx_ptr(skb, data_meta);
__u8 *data = ctx_ptr(skb, data);
return data_meta + META_SIZE <= data && (check_metadata)(file, line, data_meta);
}
#define check_skb_metadata(skb) check_skb_metadata(__FILE__, __LINE__, skb)
SEC("tc")
int ing_cls(struct __sk_buff *ctx)
{
-	__u8 *data, *data_meta;
-	__u32 key = 0;
+	__u8 *meta_have = ctx_ptr(ctx, data_meta);
+	__u8 *data = ctx_ptr(ctx, data);

-	data_meta = ctx_ptr(ctx, data_meta);
-	data = ctx_ptr(ctx, data);
-
-	if (data_meta + META_SIZE > data)
-		return TC_ACT_SHOT;
+	if (meta_have + META_SIZE > data)
+		goto out;

-	bpf_map_update_elem(&test_result, &key, data_meta, BPF_ANY);
+	if (!check_metadata(meta_have))
+		goto out;
+
+	test_pass = true;
+out:
return TC_ACT_SHOT;
}
@@ -49,17 +85,17 @@ int ing_cls(struct __sk_buff *ctx)
SEC("tc")
int ing_cls_dynptr_read(struct __sk_buff *ctx)
{
+	__u8 meta_have[META_SIZE];
	struct bpf_dynptr meta;
-	const __u32 zero = 0;
-	__u8 *dst;
-
-	dst = bpf_map_lookup_elem(&test_result, &zero);
-	if (!dst)
-		return TC_ACT_SHOT;

	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
-	bpf_dynptr_read(dst, META_SIZE, &meta, 0, 0);
+	bpf_dynptr_read(meta_have, META_SIZE, &meta, 0, 0);
+
+	if (!check_metadata(meta_have))
+		goto out;
+
+	test_pass = true;
+out:
return TC_ACT_SHOT;
}
@@ -86,20 +122,18 @@
int ing_cls_dynptr_slice(struct __sk_buff *ctx)
{
	struct bpf_dynptr meta;
-	const __u32 zero = 0;
-	__u8 *dst, *src;
-
-	dst = bpf_map_lookup_elem(&test_result, &zero);
-	if (!dst)
-		return TC_ACT_SHOT;
+	__u8 *meta_have;

	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
-	src = bpf_dynptr_slice(&meta, 0, NULL, META_SIZE);
-	if (!src)
-		return TC_ACT_SHOT;
+	meta_have = bpf_dynptr_slice(&meta, 0, NULL, META_SIZE);
+	if (!meta_have)
+		goto out;

-	__builtin_memcpy(dst, src, META_SIZE);
+	if (!check_metadata(meta_have))
+		goto out;
+
+	test_pass = true;
+out:
return TC_ACT_SHOT;
}
@@ -129,14 +163,12 @@ int ing_cls_dynptr_slice_rdwr(struct __sk_buff *ctx)
SEC("tc")
int ing_cls_dynptr_offset_rd(struct __sk_buff *ctx)
{
-	struct bpf_dynptr meta;
	const __u32 chunk_len = META_SIZE / 4;
-	const __u32 zero = 0;
+	__u8 meta_have[META_SIZE];
+	struct bpf_dynptr meta;
	__u8 *dst, *src;

-	dst = bpf_map_lookup_elem(&test_result, &zero);
-	if (!dst)
-		return TC_ACT_SHOT;
+	dst = meta_have;
/* 1. Regular read */
bpf_dynptr_from_skb_meta(ctx, 0, &meta);
@@ -155,9 +187,14 @@ int ing_cls_dynptr_offset_rd(struct __sk_buff *ctx)
/* 4. Read from a slice starting at an offset */
src = bpf_dynptr_slice(&meta, 2 * chunk_len, NULL, chunk_len);
if (!src)
-		return TC_ACT_SHOT;
+		goto out;
__builtin_memcpy(dst, src, chunk_len);
+	if (!check_metadata(meta_have))
+		goto out;
+
+	test_pass = true;
+out:
return TC_ACT_SHOT;
}
@@ -254,7 +291,7 @@ int ing_xdp_zalloc_meta(struct xdp_md *ctx)
/* Drop any non-test packets */
if (eth + 1 > ctx_ptr(ctx, data_end))
return XDP_DROP;
-	if (eth->h_proto != 0)
+	if (!check_smac(eth))
return XDP_DROP;
ret = bpf_xdp_adjust_meta(ctx, -META_SIZE);
@@ -294,9 +331,9 @@ int ing_xdp(struct xdp_md *ctx)
/* The Linux networking stack may send other packets on the test
* interface that interfere with the test. Just drop them.
-	 * The test packets can be recognized by their ethertype of zero.
+	 * The test packets can be recognized by their source MAC address.
	 */
-	if (eth->h_proto != 0)
+	if (!check_smac(eth))
return XDP_DROP;
__builtin_memcpy(data_meta, payload, META_SIZE);
@@ -304,22 +341,25 @@
}
/*
- * Check that skb->data_meta..skb->data is empty if prog writes to packet
- * _payload_ using packet pointers. Applies only to cloned skbs.
+ * Check that, when operating on a cloned packet, skb->data_meta..skb->data is
+ * kept intact if prog writes to packet _payload_ using packet pointers.
*/
SEC("tc")
-int clone_data_meta_empty_on_data_write(struct __sk_buff *ctx)
+int clone_data_meta_survives_data_write(struct __sk_buff *ctx)
{
+	__u8 *meta_have = ctx_ptr(ctx, data_meta);
struct ethhdr *eth = ctx_ptr(ctx, data);
if (eth + 1 > ctx_ptr(ctx, data_end))
goto out;
/* Ignore non-test packets */
-	if (eth->h_proto != 0)
+	if (!check_smac(eth))
goto out;
-	/* Expect no metadata */
-	if (ctx->data_meta != ctx->data)
+	if (meta_have + META_SIZE > eth)
		goto out;

+	if (!check_metadata(meta_have))
+		goto out;
/* Packet write to trigger unclone in prologue */
@@ -331,40 +371,44 @@ int clone_data_meta_empty_on_data_write(struct __sk_buff *ctx)
}
/*
- * Check that skb->data_meta..skb->data is empty if prog writes to packet
- * _metadata_ using packet pointers. Applies only to cloned skbs.
+ * Check that, when operating on a cloned packet, skb->data_meta..skb->data is
+ * kept intact if prog writes to packet _metadata_ using packet pointers.
*/
SEC("tc")
-int clone_data_meta_empty_on_meta_write(struct __sk_buff *ctx)
+int clone_data_meta_survives_meta_write(struct __sk_buff *ctx)
{
+	__u8 *meta_have = ctx_ptr(ctx, data_meta);
	struct ethhdr *eth = ctx_ptr(ctx, data);
-	__u8 *md = ctx_ptr(ctx, data_meta);
if (eth + 1 > ctx_ptr(ctx, data_end))
goto out;
/* Ignore non-test packets */
-	if (eth->h_proto != 0)
+	if (!check_smac(eth))
goto out;
-	if (md + 1 > ctx_ptr(ctx, data)) {
-		/* Expect no metadata */
-		test_pass = true;
-	} else {
-		/* Metadata write to trigger unclone in prologue */
-		*md = 42;
-	}
+	if (meta_have + META_SIZE > eth)
+		goto out;
+
+	if (!check_metadata(meta_have))
+		goto out;
+
+	/* Metadata write to trigger unclone in prologue */
+	*meta_have = 42;
+
+	test_pass = true;
out:
return TC_ACT_SHOT;
}
/*
- * Check that skb_meta dynptr is writable but empty if prog writes to packet
- * _payload_ using a dynptr slice. Applies only to cloned skbs.
+ * Check that, when operating on a cloned packet, metadata remains intact if
+ * prog creates a r/w slice to packet _payload_.
*/
SEC("tc")
-int clone_dynptr_empty_on_data_slice_write(struct __sk_buff *ctx)
+int clone_meta_dynptr_survives_data_slice_write(struct __sk_buff *ctx)
{
	struct bpf_dynptr data, meta;
+	__u8 meta_have[META_SIZE];
struct ethhdr *eth;
bpf_dynptr_from_skb(ctx, 0, &data);
@@ -372,51 +416,45 @@ int clone_dynptr_empty_on_data_slice_write(struct __sk_buff *ctx)
if (!eth)
goto out;
/* Ignore non-test packets */
-	if (eth->h_proto != 0)
+	if (!check_smac(eth))
goto out;
-	/* Expect no metadata */
	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
-	if (bpf_dynptr_is_rdonly(&meta) || bpf_dynptr_size(&meta) > 0)
+	bpf_dynptr_read(meta_have, META_SIZE, &meta, 0, 0);
+	if (!check_metadata(meta_have))
goto out;
/* Packet write to trigger unclone in prologue */
eth->h_proto = 42;
test_pass = true;
out:
return TC_ACT_SHOT;
}
/*
- * Check that skb_meta dynptr is writable but empty if prog writes to packet
- * _metadata_ using a dynptr slice. Applies only to cloned skbs.
+ * Check that, when operating on a cloned packet, metadata remains intact if
+ * prog creates an r/w slice to packet _metadata_.
*/
SEC("tc")
-int clone_dynptr_empty_on_meta_slice_write(struct __sk_buff *ctx)
+int clone_meta_dynptr_survives_meta_slice_write(struct __sk_buff *ctx)
{
	struct bpf_dynptr data, meta;
	const struct ethhdr *eth;
-	__u8 *md;
+	__u8 *meta_have;
bpf_dynptr_from_skb(ctx, 0, &data);
eth = bpf_dynptr_slice(&data, 0, NULL, sizeof(*eth));
if (!eth)
goto out;
/* Ignore non-test packets */
-	if (eth->h_proto != 0)
+	if (!check_smac(eth))
goto out;
-	/* Expect no metadata */
	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
-	if (bpf_dynptr_is_rdonly(&meta) || bpf_dynptr_size(&meta) > 0)
+	meta_have = bpf_dynptr_slice_rdwr(&meta, 0, NULL, META_SIZE);
+	if (!meta_have)
goto out;
/* Metadata write to trigger unclone in prologue */
-	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
-	md = bpf_dynptr_slice_rdwr(&meta, 0, NULL, sizeof(*md));
-	if (md)
-		*md = 42;
+	if (!check_metadata(meta_have))
+		goto out;
test_pass = true;
out:
@@ -424,34 +462,40 @@ int clone_dynptr_empty_on_meta_slice_write(struct __sk_buff *ctx)
}
/*
- * Check that skb_meta dynptr is read-only before prog writes to packet payload
- * using dynptr_write helper. Applies only to cloned skbs.
+ * Check that, when operating on a cloned packet, skb_meta dynptr is read-write
+ * before prog writes to packet _payload_ using dynptr_write helper and metadata
+ * remains intact before and after the write.
*/
SEC("tc")
-int clone_dynptr_rdonly_before_data_dynptr_write(struct __sk_buff *ctx)
+int clone_meta_dynptr_rw_before_data_dynptr_write(struct __sk_buff *ctx)
{
	struct bpf_dynptr data, meta;
+	__u8 meta_have[META_SIZE];
	const struct ethhdr *eth;
+	int err;
bpf_dynptr_from_skb(ctx, 0, &data);
eth = bpf_dynptr_slice(&data, 0, NULL, sizeof(*eth));
if (!eth)
goto out;
/* Ignore non-test packets */
-	if (eth->h_proto != 0)
+	if (!check_smac(eth))
goto out;
-	/* Expect read-only metadata before unclone */
+	/* Expect read-write metadata before unclone */
	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
-	if (!bpf_dynptr_is_rdonly(&meta) || bpf_dynptr_size(&meta) != META_SIZE)
+	if (bpf_dynptr_is_rdonly(&meta))
		goto out;

+	err = bpf_dynptr_read(meta_have, META_SIZE, &meta, 0, 0);
+	if (err || !check_metadata(meta_have))
+		goto out;
/* Helper write to payload will unclone the packet */
bpf_dynptr_write(&data, offsetof(struct ethhdr, h_proto), "x", 1, 0);
-	/* Expect no metadata after unclone */
	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
-	if (bpf_dynptr_is_rdonly(&meta) || bpf_dynptr_size(&meta) != 0)
+	err = bpf_dynptr_read(meta_have, META_SIZE, &meta, 0, 0);
+	if (err || !check_metadata(meta_have))
goto out;
test_pass = true;
@@ -460,31 +504,165 @@ int clone_dynptr_rdonly_before_data_dynptr_write(struct __sk_buff *ctx)
}
/*
- * Check that skb_meta dynptr is read-only if prog writes to packet
- * metadata using dynptr_write helper. Applies only to cloned skbs.
+ * Check that, when operating on a cloned packet, skb_meta dynptr is read-write
+ * before prog writes to packet _metadata_ using dynptr_write helper and
+ * metadata remains intact before and after the write.
*/
SEC("tc")
-int clone_dynptr_rdonly_before_meta_dynptr_write(struct __sk_buff *ctx)
+int clone_meta_dynptr_rw_before_meta_dynptr_write(struct __sk_buff *ctx)
{
	struct bpf_dynptr data, meta;
+	__u8 meta_have[META_SIZE];
	const struct ethhdr *eth;
+	int err;
bpf_dynptr_from_skb(ctx, 0, &data);
eth = bpf_dynptr_slice(&data, 0, NULL, sizeof(*eth));
if (!eth)
goto out;
/* Ignore non-test packets */
-	if (eth->h_proto != 0)
+	if (!check_smac(eth))
goto out;
-	/* Expect read-only metadata */
+	/* Expect read-write metadata before unclone */
	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
-	if (!bpf_dynptr_is_rdonly(&meta) || bpf_dynptr_size(&meta) != META_SIZE)
+	if (bpf_dynptr_is_rdonly(&meta))
		goto out;

-	/* Metadata write. Expect failure. */
-	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
-	if (bpf_dynptr_write(&meta, 0, "x", 1, 0) != -EINVAL)
+	err = bpf_dynptr_read(meta_have, META_SIZE, &meta, 0, 0);
+	if (err || !check_metadata(meta_have))
		goto out;

+	/* Helper write to metadata will unclone the packet */
+	bpf_dynptr_write(&meta, 0, &meta_have[0], 1, 0);
+
+	err = bpf_dynptr_read(meta_have, META_SIZE, &meta, 0, 0);
+	if (err || !check_metadata(meta_have))
+		goto out;
test_pass = true;
out:
return TC_ACT_SHOT;
}
SEC("tc")
int helper_skb_vlan_push_pop(struct __sk_buff *ctx)
{
int err;
/* bpf_skb_vlan_push assumes HW offload for primary VLAN tag. Only
* secondary tag push triggers an actual MAC header modification.
*/
err = bpf_skb_vlan_push(ctx, 0, 42);
if (err)
goto out;
err = bpf_skb_vlan_push(ctx, 0, 207);
if (err)
goto out;
if (!check_skb_metadata(ctx))
goto out;
err = bpf_skb_vlan_pop(ctx);
if (err)
goto out;
err = bpf_skb_vlan_pop(ctx);
if (err)
goto out;
if (!check_skb_metadata(ctx))
goto out;
test_pass = true;
out:
return TC_ACT_SHOT;
}
SEC("tc")
int helper_skb_adjust_room(struct __sk_buff *ctx)
{
int err;
/* Grow a 1 byte hole after the MAC header */
err = bpf_skb_adjust_room(ctx, 1, BPF_ADJ_ROOM_MAC, 0);
if (err)
goto out;
if (!check_skb_metadata(ctx))
goto out;
/* Shrink a 1 byte hole after the MAC header */
err = bpf_skb_adjust_room(ctx, -1, BPF_ADJ_ROOM_MAC, 0);
if (err)
goto out;
if (!check_skb_metadata(ctx))
goto out;
/* Grow a 256 byte hole to trigger head reallocation */
err = bpf_skb_adjust_room(ctx, 256, BPF_ADJ_ROOM_MAC, 0);
if (err)
goto out;
if (!check_skb_metadata(ctx))
goto out;
test_pass = true;
out:
return TC_ACT_SHOT;
}
SEC("tc")
int helper_skb_change_head_tail(struct __sk_buff *ctx)
{
int err;
	/* Reserve 1 extra byte in the front for packet data */
err = bpf_skb_change_head(ctx, 1, 0);
if (err)
goto out;
if (!check_skb_metadata(ctx))
goto out;
/* Reserve 256 extra bytes in the front to trigger head reallocation */
err = bpf_skb_change_head(ctx, 256, 0);
if (err)
goto out;
if (!check_skb_metadata(ctx))
goto out;
/* Reserve 4k extra bytes in the back to trigger head reallocation */
err = bpf_skb_change_tail(ctx, ctx->len + 4096, 0);
if (err)
goto out;
if (!check_skb_metadata(ctx))
goto out;
test_pass = true;
out:
return TC_ACT_SHOT;
}
SEC("tc")
int helper_skb_change_proto(struct __sk_buff *ctx)
{
int err;
err = bpf_skb_change_proto(ctx, bpf_htons(ETH_P_IPV6), 0);
if (err)
goto out;
if (!check_skb_metadata(ctx))
goto out;
err = bpf_skb_change_proto(ctx, bpf_htons(ETH_P_IP), 0);
if (err)
goto out;
if (!check_skb_metadata(ctx))
goto out;
test_pass = true;