ECN_PFC_DCQCN_RoCE
5 points that explain why ECN and PFC are used together to build lossless
Ethernet networks in AI infrastructure:
1- ECN and PFC can each manage congestion very well on their own. Working
together, they can be even more effective.
2- ECN can react first to mitigate congestion. If
ECN does not react fast enough and buffer utilization continues to build up,
PFC behaves as a fail-safe and prevents traffic drops. This is the most
efficient way to manage congestion and build lossless Ethernet networks.
3- This collaborative process, in which PFC and ECN manage congestion
together, is called Data Center Quantized Congestion Notification (DCQCN)
and was developed for RoCE (RDMA over Converged Ethernet) networks.
4- By working together, PFC and ECN provide
efficient end-to-end congestion management. When the system is experiencing
minor congestion where buffer usage is moderate, WRED with ECN manages the
congestion seamlessly.
5- For both ECN (driven by WRED) and PFC to work as described, you should
set appropriate thresholds. In the following example, the WRED minimum and
maximum thresholds are set at lower buffer utilization so that ECN mitigates
congestion first, and the PFC threshold is set higher as a safety net that
acts after ECN. Both ECN and PFC operate on a no-drop queue, providing
lossless transport.
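The threshold ordering in point 5 can be sketched in a few lines of Python. This is only an illustrative model, not a switch configuration. The names `QueueThresholds`, `ecn_mark_probability`, and `congestion_action` are hypothetical, and so are the threshold values in kilobytes. The linear ramp from 0 to 1 between the WRED minimum and maximum is a simplifying assumption. Real platforms expose their own marking-probability settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueThresholds:
    """Hypothetical per-queue thresholds, in kilobytes of buffer used."""
    wred_min_kb: int   # below this depth, packets are forwarded unmarked
    wred_max_kb: int   # at or above this depth, every packet is ECN-marked
    pfc_xoff_kb: int   # at or above this depth, PFC pauses the sender

    def __post_init__(self):
        # ECN must get a chance to act before PFC, so PFC sits highest.
        if not (0 <= self.wred_min_kb < self.wred_max_kb < self.pfc_xoff_kb):
            raise ValueError("expected wred_min < wred_max < pfc_xoff")


def ecn_mark_probability(depth_kb: float, t: QueueThresholds) -> float:
    """Assumed linear ramp: 0 at the WRED minimum, 1 at the WRED maximum."""
    if depth_kb <= t.wred_min_kb:
        return 0.0
    if depth_kb >= t.wred_max_kb:
        return 1.0
    return (depth_kb - t.wred_min_kb) / (t.wred_max_kb - t.wred_min_kb)


def congestion_action(depth_kb: float, t: QueueThresholds) -> str:
    """Return which mechanism is active at the given queue depth."""
    if depth_kb >= t.pfc_xoff_kb:
        return "pfc_pause"   # fail-safe: stop upstream traffic, no drops
    if depth_kb > t.wred_min_kb:
        return "ecn_mark"    # first line of defense: signal the endpoints
    return "forward"         # no congestion signal needed


# Hypothetical values chosen only to show the ordering.
thresholds = QueueThresholds(wred_min_kb=150, wred_max_kb=3000, pfc_xoff_kb=4000)
for depth in (100, 1575, 3500, 4000):
    print(depth, congestion_action(depth, thresholds),
          ecn_mark_probability(depth, thresholds))
```

The validation in `__post_init__` captures the design choice from the text: if the PFC threshold were at or below the WRED range, PFC would pause traffic before ECN had any opportunity to slow the senders.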
In the example below, both Host A and Host B send traffic to Host X. Because
the leaf uplinks provide enough bandwidth for all traffic to reach Leaf X,
the congestion point is the outgoing interface toward Host X.
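To see why the queue toward Host X fills, here is a rough back-of-the-envelope sketch in Python. The link speeds are assumptions for illustration (the text does not state them), and the function names are hypothetical. The model also assumes the senders transmit at a constant rate and do not yet react to any ECN feedback.

```python
import math

# 1 Gbps = 125 bytes per microsecond = 0.125 KB per microsecond (1 KB = 1000 bytes).
KB_PER_US_PER_GBPS = 0.125


def queue_depth_kb(sender_rates_gbps, egress_gbps, elapsed_us):
    """Queue depth on the egress port after elapsed_us of sustained traffic."""
    excess_gbps = max(0.0, sum(sender_rates_gbps) - egress_gbps)
    return excess_gbps * KB_PER_US_PER_GBPS * elapsed_us


def time_to_reach_kb(sender_rates_gbps, egress_gbps, threshold_kb):
    """Microseconds until the queue reaches threshold_kb (inf if it never does)."""
    excess_gbps = max(0.0, sum(sender_rates_gbps) - egress_gbps)
    if excess_gbps == 0.0:
        return math.inf
    return threshold_kb / (excess_gbps * KB_PER_US_PER_GBPS)


# Assumed scenario: Host A and Host B each send at 100 Gbps,
# and the port toward Host X is also 100 Gbps.
senders = [100, 100]
print(queue_depth_kb(senders, 100, elapsed_us=100))      # depth after 100 us
print(time_to_reach_kb(senders, 100, threshold_kb=150))  # time to hit a 150 KB threshold
```

Under these assumed rates the egress queue grows by 12.5 KB every microsecond, which shows why a low ECN threshold has to act early and why PFC is kept as a backstop for bursts that outrun the ECN feedback loop.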