FedVG: Gradient-Guided Aggregation for Enhanced Federated Learning

1West Virginia University,   2University of Aberdeen
CVPR 2026 (Findings)

Abstract

Federated Learning (FL) enables collaborative model training across multiple clients without sharing their private data. However, data heterogeneity across clients leads to client drift, which degrades the overall generalization performance of the model. This effect is further compounded by overemphasis on poorly performing clients. To address this problem, we propose FedVG, a novel gradient-based federated aggregation framework that leverages a global validation set to guide the optimization process. Such a global validation set can be established using readily available public datasets, ensuring accessibility and consistency across clients without compromising privacy. In contrast to conventional approaches that prioritize client dataset volume, FedVG assesses the generalization ability of client models by measuring the magnitude of validation gradients across layers. Specifically, we compute layerwise gradient norms to derive a client-specific score that reflects how much each client needs to adjust for improved generalization on the global validation set, thereby enabling more informed and adaptive federated aggregation. Extensive experiments on both natural and medical image benchmark datasets, across diverse model architectures, demonstrate that FedVG consistently improves performance, particularly in highly heterogeneous settings. Moreover, FedVG is modular and can be seamlessly integrated with various state-of-the-art FL algorithms, often further improving their results.

Overview

Method Diagram

An overview of our proposed FedVG framework. Locally trained client models, which converge to different points, are sent to the server. A global validation set is used to compute their validation gradients and obtain client scores $s_k$, which weight the client models in the sum that forms the global model. In the loss landscape on the right, individual client updates are shown as dotted lines and aggregated models as solid lines.
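The aggregation step described above can be sketched in a few lines of PyTorch. This is a hedged illustration, not the authors' released code: the exact scoring function is an assumption (here, clients whose models produce smaller layerwise validation-gradient norms, i.e., that need less adjustment, receive larger softmax weights, with a hypothetical `temperature` knob), and `validation_grad_score` / `fedvg_aggregate` are illustrative names.

```python
import torch
import torch.nn.functional as F

def validation_grad_score(model, val_loader, device="cpu"):
    """Mean layerwise gradient norm of `model` on the global validation set.

    Illustrative stand-in for the paper's layerwise gradient-norm score.
    """
    model.to(device).train()
    model.zero_grad()
    for x, y in val_loader:
        loss = F.cross_entropy(model(x.to(device)), y.to(device))
        loss.backward()  # accumulate gradients over the validation batches
    norms = [p.grad.norm() for p in model.parameters() if p.grad is not None]
    return torch.stack(norms).mean()

def fedvg_aggregate(client_models, val_loader, temperature=1.0):
    """Weighted average of client weights using validation-gradient scores.

    Assumption: smaller validation gradients (less adjustment needed on the
    global validation set) map to larger aggregation weights s_k.
    """
    grads = torch.stack(
        [validation_grad_score(m, val_loader) for m in client_models]
    )
    weights = torch.softmax(-grads / temperature, dim=0)  # client scores s_k
    with torch.no_grad():
        global_state = {
            key: sum(w * m.state_dict()[key].float()
                     for w, m in zip(weights, client_models))
            for key in client_models[0].state_dict()
        }
    return global_state, weights
```

The returned `global_state` can be loaded into the server model with `load_state_dict`, and `weights` sums to one by construction of the softmax.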

Modular Integration with Baselines

FedVG’s modular design allows it to enhance existing federated learning algorithms by incorporating validation-gradient-based weighting during server-side aggregation. Experiments show that integrating FedVG with baselines like FedAvg, FedProx, and Elastic consistently improves performance, particularly under high heterogeneity, without modifying client-side updates. These results highlight FedVG’s versatility as a complementary module that can boost a wide range of FL strategies.
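The modularity claim boils down to the server loop below: the client-side routine is whatever the baseline prescribes, and only the aggregation callback changes. This is a minimal sketch under assumed interfaces (`local_update`, `aggregate`, and `server_round` are hypothetical names, not the paper's API).

```python
import copy
import torch

def server_round(global_model, client_loaders, local_update, aggregate):
    """One federated round with a pluggable server-side aggregator.

    `local_update` is the baseline's unmodified client routine (FedAvg,
    FedProx, Elastic, ...); `aggregate` maps client models to a state dict,
    e.g. FedVG's validation-gradient weighting in place of dataset-size
    weighting.
    """
    client_models = [
        local_update(copy.deepcopy(global_model), loader)
        for loader in client_loaders
    ]
    global_model.load_state_dict(aggregate(client_models))
    return global_model
```

Because only `aggregate` is swapped, the same loop runs any baseline with or without FedVG-style weighting.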

FedVG Integration Matrix

Ablation on Global Validation Set

1. External Public Dataset as Global Validation Set

We evaluated FedVG on CIFAR-10 using external validation sets drawn from STL-10 and CIFAR-100, selected for their overlapping classes and similar imaging context. Small validation sets were sampled (3,650 images from CIFAR-100 and 4,950 from STL-10) to test robustness under distribution shift. Across high-heterogeneity settings ($\alpha = 0.1, 0.05$), FedVG consistently outperformed baseline methods, achieving accuracies of up to 61.1% and demonstrating that it maintains strong performance even when validated on data outside the training distribution.

FedVG Integration Matrix


2. Impact of Imbalanced Global Validation Set

We evaluate FedVG under class-imbalanced global validation sets to simulate real-world settings where client data distributions are unknown. Even as the class-imbalance ratio decreases (i.e., the validation set becomes more skewed), FedVG consistently outperforms FedAvg, demonstrating robustness to the composition of the validation set and effectiveness under realistic, imbalanced scenarios.

BibTeX

Coming Soon!

Acknowledgement

This material is based upon work supported by the National Science Foundation (NSF) under Grant No. 2119654. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF. We also recognize the computational resources provided by the WVU Research Computing Dolly Sods HPC cluster, which is funded in part by NSF OAC-2117575.