Code for the following paper:
Fatih Ilhan, Gong Su and Ling Liu, "ScaleFL: Resource-Adaptive Federated Learning with Heterogeneous Clients," IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, Jun. 18-22, 2023.
Federated learning (FL) is an attractive distributed learning paradigm supporting real-time continuous learning and client privacy by default. In most FL approaches, all edge clients are assumed to have sufficient computation capabilities to participate in the learning of a deep neural network (DNN) model. However, in real-life applications, some clients may have severely limited resources and can only train a much smaller local model. This paper presents ScaleFL, a novel FL approach with two distinctive mechanisms to handle resource heterogeneity and provide an equitable FL framework for all clients. First, ScaleFL adaptively scales down the DNN model along width and depth dimensions by leveraging early exits to find the best-fit models for resource-aware local training on distributed clients. In this way, ScaleFL provides an efficient balance of preserving basic and complex features in local model splits with various sizes for joint training while enabling fast inference for model deployment. Second, ScaleFL utilizes self-distillation among exit predictions during training to improve aggregation through knowledge transfer among subnetworks. We conduct extensive experiments on benchmark CV (CIFAR-10/100, ImageNet) and NLP datasets (SST-2, AgNews). We demonstrate that ScaleFL outperforms existing representative heterogeneous FL approaches in terms of global/local model performance and provides inference efficiency, with up to 2x latency reduction and 4x model size reduction at a negligible performance drop below 2%.
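The self-distillation mechanism mentioned above can be illustrated with a minimal NumPy sketch. This is only an illustration of the general idea (the final exit's prediction acts as a soft teacher for the earlier exits), not the repository's implementation; the function names, temperature, and weighting here are arbitrary.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the last axis
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_distillation_loss(exit_logits, T=3.0):
    """KL divergence from the final exit's softened prediction to each
    earlier exit, averaged over exits.

    exit_logits: list of (batch, num_classes) arrays, ordered from the
    earliest exit to the final exit.
    """
    teacher = softmax(exit_logits[-1], T)
    losses = []
    for logits in exit_logits[:-1]:
        student = softmax(logits, T)
        kl = (teacher * (np.log(teacher + 1e-12)
                         - np.log(student + 1e-12))).sum(axis=-1)
        losses.append(kl.mean())
    # T^2 rescaling is the usual convention in distillation losses
    return float(np.mean(losses)) * T * T

# Toy example: three exits, batch of 2, 4 classes
rng = np.random.default_rng(0)
logits = [rng.normal(size=(2, 4)) for _ in range(3)]
loss = self_distillation_loss(logits)
```

When all exits agree exactly, the loss is zero; it grows as earlier exits diverge from the final exit, which is the signal used to transfer knowledge into the smaller subnetworks.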
- Python 3.7+
- PyTorch (tested with 1.12)
- HuggingFace transformers and datasets
- Matplotlib, NumPy, Pandas (for report generation)
You can recreate the full environment with environment.yml:
conda env create -f environment.yml
conda activate scalefl

The scripts/ directory holds preset runs. Examples:
bash scripts/CIFAR10.sh
bash scripts/CIFAR100.sh
bash scripts/ImageNet.sh
bash scripts/SST2.sh
bash scripts/AgNews.sh

Each script configures data paths, architecture, rounds, and sampling rates for the corresponding dataset. Use the full_* variants for full-scale runs.
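A preset script is essentially a thin launcher around main.py. The sketch below is hypothetical (the actual scripts/CIFAR10.sh may set different values); the flags are taken from the CIFAR-10 example later in this README.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a preset launcher like scripts/CIFAR10.sh;
# check the real script for the exact flags and values it uses.
set -e
DATA_ROOT="${DATA_ROOT:-./data}"   # override via environment variable
python main.py \
  --data-root "$DATA_ROOT" \
  --data cifar10 \
  --arch resnet110_4 \
  --use-valid
```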
Invoke main.py with your dataset, architecture, and scaling choices:
# ResNet110 on CIFAR10
python main.py --data-root <data-root> --data cifar10 --arch resnet110_4 --use-valid
# MSDNet24 on CIFAR100
python main.py --data-root <data-root> --data cifar100 --arch msdnet24_4 --use-valid \
--ee_locs 15 18 21 --vertical_scale_ratios 0.65 0.7 0.85 1
# EffNetB4 on ImageNet
python main.py --data-root <data-root> --data imagenet --arch effnetb4_4 --use-valid \
--num_rounds 90 --num_clients 50 --sample_rate 0.2 --vertical_scale_ratios 0.65 0.65 0.82 1
# BERT on AgNews
python main.py --data-root <data-root> --data ag_news --arch bert_4 --use-valid \
--ee_locs 4 6 9 --KD_gamma 0.05 --num_rounds 100 --num_clients 50 \
--sample_rate 0.2 --vertical_scale_ratios 0.4 0.55 0.75 1

All training, inference, and model parameters are centralized in config.py.
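The --vertical_scale_ratios and --ee_locs flags control how the global model is scaled down along width and depth for each client group. The sketch below illustrates the general idea only; the repository's actual splitting logic lives in the model definitions, and the function and variable names here are made up.

```python
# Illustrative sketch: derive per-group subnetwork shapes from width
# ratios and early-exit locations like those in the examples above.
def subnetwork_shapes(base_widths, ee_locs, scale_ratios):
    """base_widths: channel counts of the full model's blocks.
    ee_locs: block index where each smaller group's exit sits; the
    last group uses the full depth.
    scale_ratios: width multiplier per group (last is 1.0).
    Returns one list of scaled channel counts per client group."""
    depths = list(ee_locs) + [len(base_widths)]
    shapes = []
    for depth, ratio in zip(depths, scale_ratios):
        shapes.append([max(1, int(w * ratio)) for w in base_widths[:depth]])
    return shapes

# Toy 24-block model with the MSDNet24 settings from the example above
base = [16] * 24
shapes = subnetwork_shapes(base, ee_locs=[15, 18, 21],
                           scale_ratios=[0.65, 0.7, 0.85, 1.0])
# smallest group: 15 blocks at width 10; largest: all 24 blocks at width 16
```

Each client group then trains only its subnetwork locally, and the server aggregates the overlapping parameters across groups.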
During training, StatsLogger writes CSVs under report/ inside each run directory (e.g., exp_outputs/<DATASET>/<RUN>/report). To build a PDF summary from these assets:
python utils/generate_report.py <run_dir> --output <optional_output_path>
# Default output: <run_dir>/report/report.pdf

The report includes client group mappings, parameter breakdowns, partitioning visuals, and per-client accuracy curves. Individual per-client PNG plots are also saved to <run_dir>/report/per_client_accuracy_curves/.
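You can also inspect the logged CSVs directly instead of building the full PDF. The snippet below uses a toy stand-in DataFrame because the actual CSV column names may differ; check the files under report/ in your run directory.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Toy stand-in for a StatsLogger CSV; real column names may differ.
df = pd.DataFrame({
    "round":    [1, 2, 3, 1, 2, 3],
    "client":   [0, 0, 0, 1, 1, 1],
    "accuracy": [0.42, 0.55, 0.61, 0.40, 0.52, 0.60],
})

fig, ax = plt.subplots()
for client, grp in df.groupby("client"):
    ax.plot(grp["round"], grp["accuracy"], label=f"client {client}")
ax.set_xlabel("round")
ax.set_ylabel("accuracy")
ax.legend()
fig.savefig("per_client_accuracy.png")
```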
- main.py, train.py, predict.py: entry points for training and evaluation
- config.py, args.py: configuration and CLI flags
- models/: model definitions (ResNet, MSDNet, EfficientNet, BERT)
- data_tools/: data loading and sampling utilities
- utils/: logging, reporting, and helper functions
- scripts/: ready-made experiment launchers
- exp_outputs/: sample output structure with reports and checkpoints
If you use ScaleFL_API in your research, please cite the CVPR 2023 paper above.