wkz

/* You are not expected to understand this */

Marvell LinkStreet LAG Issues

May 17, 2022

While running some kselftest-like tests on a system made up of three Marvell LinkStreet chips, I uncovered a series of issues related to offloaded link aggregates (LAGs) that are spread across multiple chips.

The system has the following layout:

                    .--0-1-2-3-4--.
              .-----a     sw1     |
        .-0-1-4-.   '--5-6-7-8-9--'   sw0: 6353 (Agate)
CPU +---6  sw0  |                     sw1: 6097 (Opal+)
        '-2-3-5-'   .--0-1-2-3-4--.   sw2: 6097 (Opal+)
              '-----a     sw2     |
                    '--5-6-7-8-9--'

Software-wise, the system runs NetBox with a 5.18 Linux kernel, using the mv88e6xxx driver to control the switch chips.

Inconsistent Hashing

After creating a LAG consisting of sw0p1 and sw1p1, I ran a test in which multicast was to be flooded out through the LAG. During that test, I observed that some groups were correctly flooded, while others were not.

If a packet was forced out of the LAG port (by injecting a FROM_CPU packet on the CPU port), it was correctly received on the other side, so the problem had to be in the forwarding plane of the switches.

Previously, I had run the same test with sw1p0 and sw2p4 as the LAG ports, without any issues. Those two ports sit on chips of the same model (both Opal+), whereas sw0p1 and sw1p1 span an Agate and an Opal+. Guessing that the issue might have something to do with asymmetric hashing, I generated some test packets with nemesis:

for i in $(seq 0 255); do
	nemesis ethernet -c 1 -d eth0 -M 01:00:de:ad:00:$(printf '%2.2x' $i) \
		-T 0xbbbb -P <(echo testing $i)
done
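Turning a receiver capture into per-group numbers amounts to counting how often each of the 256 destination addresses shows up. The sketch below assumes the DAs have already been extracted from the capture, one per line and in lowercase like the ones the loop above generates (the extraction step, e.g. from `tcpdump -e` output, is not shown). `summarize_copies` is a hypothetical helper, not part of any tool.

```shell
#!/usr/bin/env bash
# summarize_copies (hypothetical helper): reads received destination MACs,
# one per line and in lowercase, on stdin and prints "<groups>\t<copies>"
# for 0, 1 and 2 (or more) copies across the 256 test groups.
summarize_copies() {
	local -A seen=()
	local -a hist=(0 0 0)
	local mac i n

	# Count how many times each destination MAC was received.
	while read -r mac; do
		[ -n "$mac" ] || continue
		seen[$mac]=$(( ${seen[$mac]:-0} + 1 ))
	done

	# Walk the same 256 groups that the sender loop generated.
	for i in $(seq 0 255); do
		mac=$(printf '01:00:de:ad:00:%2.2x' "$i")
		n=${seen[$mac]:-0}
		if (( n > 2 )); then n=2; fi   # lump 3+ copies in with 2
		hist[n]=$(( hist[n] + 1 ))
	done

	printf '%s\t%s\n' "${hist[0]}" 0 "${hist[1]}" 1 "${hist[2]}" 2
}
```

Counting against the full list of sent groups, rather than only what was received, is what makes the 0-copy groups visible at all.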

On the receiver, I observed the following results:

Groups  Copies  Interpretation
    64       0  Both switches assume that the designated port is on the other switch
   128       1  The switches are in agreement
    64       2  Both switches determine that their local port is the designated one
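The 64/128/64 split is what you would expect if each switch picks the designated LAG member with its own, unrelated hash: a switch only egresses the flooded frame on its local member when its own hash selects that member, so the number of copies is the sum of two independent coin flips (25/50/25 percent of 256). The toy model below reproduces the table. `hash_sw0` and `hash_sw1` are made-up stand-ins, not the real Agate and Opal+ hash functions. Giving both switches the same function, which is effectively what the XOR fallback described below does, always yields exactly one copy.

```shell
#!/usr/bin/env bash
# Toy model of flooding through a LAG whose two members (sw0p1, sw1p1) sit
# on different chips. hash_sw0/hash_sw1 are hypothetical stand-ins, NOT the
# real Agate and Opal+ hash functions.

# Each "hash" maps the last DA byte to a LAG member: 0 = sw0p1, 1 = sw1p1.
hash_sw0() { echo $(( $1 & 1 )); }
hash_sw1() { echo $(( ($1 >> 1) & 1 )); }

# Both switches see the flooded frame, but each one only egresses it on its
# own local member, and only if its own hash selects that member.
copies_for() {   # $1=last DA byte, $2=name of the hash function sw1 uses
	local n=0
	if [ "$(hash_sw0 "$1")" -eq 0 ]; then n=$(( n + 1 )); fi
	if [ "$("$2" "$1")" -eq 1 ]; then n=$(( n + 1 )); fi
	echo "$n"
}

# Count how many of the 256 groups end up with 0, 1 or 2 copies.
tally() {   # $1=name of the hash function sw1 uses
	local -a count=(0 0 0)
	local b c
	for b in $(seq 0 255); do
		c=$(copies_for "$b" "$1")
		count[c]=$(( count[c] + 1 ))
	done
	echo "0:${count[0]} 1:${count[1]} 2:${count[2]}"
}

tally hash_sw1   # different hashes: prints 0:64 1:128 2:64
tally hash_sw0   # same hash on both: prints 0:0 1:256 2:0
```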

Disabling the hashing (by clearing Global1:Reg7:Bit11) causes the devices to fall back to a simple XOR-based port selection. In this mode of operation, sending the same test packets results in exactly one copy of each of the 256 packets being received.

Conclusion: The Agate uses a different hash function than the one on the Opal+.

DSA Tag Trunk Bit Override

Because of the way mv88e6xxx handles port isolation (full disclosure: this is my fault), frames assigned to VID 0 that ingress on DSA ports are trapped to the CPU using the VTU policy feature.

This avoids a whole slew of issues where intermediate switches may be confused by looking up the DA in the ATU. Unfortunately it also means that the original DSA tag, which looked something like this…

FORWARD dev:2 port:0 trunk:yes

…is rewritten to:

TO_CPU  dev:2 port:0 code:policy-trap

Since there is no trunk bit in the TO_CPU tag, the original source information is lost: the CPU can no longer tell that port 0 refers to a LAG rather than to physical port 0 on device 2.
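The loss is visible in the tag layout itself. The sketch below decodes the first three bytes of a 4-byte Marvell DSA tag, following the bit positions that Linux's `net/dsa/tag_dsa.c` uses as far as I read it; `decode_dsa` is a hypothetical helper that takes the tag bytes in hex. In a FORWARD tag, bit 2 of the second byte is the trunk flag, and when it is set the port field holds a LAG ID. In a TO_CPU tag, that same bit is part of the CPU code, so there is simply no place to carry the trunk flag.

```shell
#!/usr/bin/env bash
# decode_dsa (hypothetical helper): decodes the first three bytes of a
# 4-byte Marvell DSA tag, given as hex, using the tag_dsa.c bit positions.
decode_dsa() {
	local b0=$(( 16#$1 )) b1=$(( 16#$2 )) b2=$(( 16#$3 ))
	local cmd=$(( b0 >> 6 ))             # 0=TO_CPU 1=FROM_CPU 2=TO_SNIFFER 3=FORWARD
	local dev=$(( b0 & 0x1f ))           # source device
	local port=$(( (b1 >> 3) & 0x1f ))   # source port, or LAG ID if trunk is set

	case $cmd in
	3)	# FORWARD: bit 2 of byte 1 is the trunk flag.
		if (( b1 & 0x04 )); then
			echo "FORWARD dev:$dev port:$port trunk:yes"
		else
			echo "FORWARD dev:$dev port:$port trunk:no"
		fi
		;;
	0)	# TO_CPU: bits 2:1 of byte 1 and bit 4 of byte 2 form the CPU code,
		# so there is no room left for a trunk flag.
		echo "TO_CPU dev:$dev port:$port code:$(( (b1 & 0x06) | ((b2 >> 4) & 1) ))"
		;;
	*)
		echo "cmd:$cmd dev:$dev port:$port"
		;;
	esac
}

decode_dsa c2 04 00 00   # prints FORWARD dev:2 port:0 trunk:yes
decode_dsa 02 02 10 00   # prints TO_CPU dev:2 port:0 code:3
```

Both tags report dev:2 port:0, but only the FORWARD one says that port 0 is a LAG.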

Fortunately, this is only an issue for LAGs in standalone mode: as soon as a LAG is added to a bridge, no packets will ever be assigned to VID 0. If a standalone LAG is required, you can create one using a mode that cannot be offloaded (e.g. balance-rr for bond interfaces). In that case, the DSA layer in the kernel will fall back to a software LAG and everything will work as expected.
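For example, a standalone software LAG over the two ports from the hashing test could be created with iproute2 roughly as follows. This is a sketch that assumes the ports are exposed as netdevs named sw0p1 and sw1p1; bond0 is an arbitrary name.

```shell
# balance-rr cannot be offloaded, so DSA keeps this bond in software.
ip link add bond0 type bond mode balance-rr

# Take each port down before enslaving it to the bond.
ip link set dev sw0p1 down
ip link set dev sw0p1 master bond0
ip link set dev sw1p1 down
ip link set dev sw1p1 master bond0

ip link set dev bond0 up
```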

Conclusion: The VTU policy feature can’t coexist with offloaded standalone LAGs.

ATU Trunk Bit Inheritance

When mv88e6xxx adds a static FDB entry to the ATU, it will reuse any existing ATU entry for the DA in question (most likely because this makes the code reusable for MDB operations).

Unfortunately, the Trunk bit of the ATU entry is not cleared before the entry is written back with the new port information.

Example: We start with the following bridge setup:

  br0
  / \
 / lag0  lag1
'   /\    /\
0  1  2  3  4
   |  '--'  |
   '--------'

Then we send a packet from lag1, which is physically looped back to lag0, where its source address is learned by the ATU. Since lag0 is an offloaded LAG, the learned entry has the trunk bit set.

Now let’s say that we swap the roles of the LAGs, i.e. we connect lag1 to the bridge and keep lag0 as a standalone interface.
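The two configurations could be built with iproute2 along these lines. This is a sketch under assumptions: p0 through p4 are hypothetical netdev names for the ports in the diagram, and both LAGs use balance-xor, which hashes on layer 2 by default and so should be eligible for offloading.

```shell
# Sketch only: p0-p4 are hypothetical netdev names for the ports in the diagram.
ip link add br0 type bridge
ip link add lag0 type bond mode balance-xor
ip link add lag1 type bond mode balance-xor

# lag0 = p1 + p2, lag1 = p3 + p4 (each port is taken down before enslaving).
for p in p1 p2; do ip link set dev "$p" down; ip link set dev "$p" master lag0; done
for p in p3 p4; do ip link set dev "$p" down; ip link set dev "$p" master lag1; done

# First configuration: p0 and lag0 are bridged, lag1 is standalone.
ip link set dev p0 master br0
ip link set dev lag0 master br0
for dev in br0 p0 lag0 lag1; do ip link set dev "$dev" up; done

# (Send a frame from lag1 here; it loops back into lag0 and is learned.)

# Swapped roles: lag1 is bridged, lag0 is standalone.
ip link set dev lag0 nomaster
ip link set dev lag1 master br0
```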

At this point, the bridge will want to add a static entry for the MAC address of lag1, pointing towards the CPU port. The driver will then find the existing dynamic entry from the previous configuration, override the port and state, and then write it back to the ATU. But since the trunk bit is not cleared, you end up with a static entry pointing towards a non-existent LAG 0x400 (for the common case where the CPU port is 11).

Conclusion: The trunk bit must always be cleared when updating an existing ATU entry.
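The difference between the buggy and the fixed update can be shown with a toy model of the ATU data word. The layout used here (trunk flag in bit 15, port vector or LAG ID in bits 14:4, entry state in bits 3:0) is an assumption modeled on the mv88e6xxx ATU data register, and the helper names and the state values 0x7 (dynamic) and 0xe (static) are illustrative only. The point is that a read-modify-write which masks out only the port vector and the state carries the trunk flag over from the learned entry.

```shell
#!/usr/bin/env bash
# Toy model of an ATU data word (assumed layout):
#   bit 15    trunk flag
#   bits 14:4 port vector, or LAG ID when the trunk flag is set
#   bits 3:0  entry state (values used below are illustrative)
TRUNK=$(( 0x8000 ))
PORTVEC_MASK=$(( 0x7ff0 ))
STATE_MASK=$(( 0x000f ))

# Pack a port vector and a state into their fields.
pack() {   # $1=port vector, $2=state
	echo $(( (($1 << 4) & PORTVEC_MASK) | ($2 & STATE_MASK) ))
}

# Buggy update: keep everything except port vector and state, so a trunk
# flag inherited from a learned entry survives the rewrite.
atu_update_buggy() {   # $1=old word, $2=new port vector, $3=new state
	local old=$1 new
	new=$(pack "$2" "$3")
	printf '0x%04x\n' $(( (old & ~(PORTVEC_MASK | STATE_MASK)) | new ))
}

# Fixed update: clear the trunk flag as well before writing back.
atu_update_fixed() {   # same arguments as atu_update_buggy
	local old=$1 new
	new=$(pack "$2" "$3")
	printf '0x%04x\n' $(( (old & ~(TRUNK | PORTVEC_MASK | STATE_MASK)) | new ))
}

# Entry learned on lag0 (trunk flag set, LAG ID 0, dynamic state 0x7),
# then overwritten with port vector 0x400 and static state 0xe.
atu_update_buggy 0x8007 0x400 0xe   # prints 0xc00e: still a trunk entry
atu_update_fixed 0x8007 0x400 0xe   # prints 0x400e: plain port vector
```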
