Cornell Collective Communication Library

Collective Communication for Distributed ML (NSF Award #2435852)

We aim to significantly improve collective communication by building software tools and algorithms designed for the heterogeneous interconnects found in modern cloud-based accelerator systems. The project will measure how bandwidth and latency vary between accelerators, accounting for complications such as proprietary interconnect technologies and hidden network paths within data centers. These measurements will then be used to design collective communication strategies optimized for cloud deployments.

Principal Investigators