Publication
SC 2021
Conference paper

FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers

Abstract

Federated learning (FL) involves training a model across massive numbers of distributed devices while keeping the training data localized and private. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, along with new challenges such as the straggler problem and the communication bottleneck. To address these challenges, we present FedAT, a novel federated learning system with asynchronous tiers. FedAT synergistically combines synchronous intra-tier training with asynchronous cross-tier training. By bridging synchronous and asynchronous training through tiering, FedAT minimizes the straggler effect while improving test accuracy. FedAT further improves accuracy with a weighted aggregation heuristic that balances training across clients, and it compresses both uplink and downlink communications with an efficient compression algorithm to minimize communication cost. Results show that, compared to state-of-the-art FL methods, FedAT improves prediction performance by up to 21.09% and reduces communication cost by up to 8.5x.
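To make the cross-tier step concrete, below is a minimal Python sketch of tiered, weighted aggregation in the spirit of the approach described above. The Tier structure, the aggregate_cross_tier function, and the specific inverse-update-frequency weighting are illustrative assumptions, not the paper's exact formulation.

from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Tier:
    model: np.ndarray   # flattened model from synchronous intra-tier training
    num_updates: int    # how many times this tier has reported so far


def aggregate_cross_tier(tiers: List[Tier]) -> np.ndarray:
    """Asynchronously merge per-tier models into a global model.

    Slower tiers report less often, so weighting each tier inversely to
    its update count is one heuristic for keeping its contribution from
    being drowned out by faster tiers. (Assumed heuristic, for illustration.)
    """
    total = sum(t.num_updates for t in tiers)
    # Tiers with fewer updates (stragglers) receive larger weights.
    weights = np.array([(total - t.num_updates) / total for t in tiers])
    weights /= weights.sum()
    return sum(w * t.model for w, t in zip(weights, tiers))


# Example: three tiers; the slowest tier has reported only twice.
tiers = [Tier(model=np.full(4, float(i)), num_updates=n)
         for i, n in zip((1, 2, 3), (10, 6, 2))]
global_model = aggregate_cross_tier(tiers)

In the full system, a merge like this would be triggered whenever any tier finishes a synchronous intra-tier round, rather than being computed once over a fixed list.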

Date

14 Nov 2021
