Abstract: The increasing scale of deep learning models requires widespread adoption of distributed training using the Ring-AllReduce (RAR) architecture. However, existing multi-path transmission ...