Getting Started

Let’s start our Bagua journey!

Migrate from your existing single GPU training code

To use Bagua, you need to make the following changes to your training code:

First, import bagua:

import bagua.torch_api as bagua

Then initialize Bagua's process group:

torch.cuda.set_device(bagua.get_local_rank())
bagua.init_process_group()
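
After initialization, helpers such as bagua.get_rank(), bagua.get_local_rank(), and bagua.get_world_size() report each process's position in the job. As an optional sanity check (not required by Bagua), you can print them once per process:

# optional: confirm each process sees its expected rank and world size
print(f"rank {bagua.get_rank()} / {bagua.get_world_size()}, local rank {bagua.get_local_rank()}")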

Then, use torch's distributed sampler for your data loader:

train_dataset = ...
test_dataset = ...

train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset,
    num_replicas=bagua.get_world_size(), rank=bagua.get_rank())

train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=(train_sampler is None),
    sampler=train_sampler,
)

test_loader = torch.utils.data.DataLoader(test_dataset, ...)
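
Since each process now sees only its own shard of the training data, it is standard PyTorch practice to call set_epoch on the sampler at the start of every epoch so that shuffling differs across epochs. A minimal sketch (num_epochs is a placeholder for your own epoch count):

for epoch in range(num_epochs):
    # re-seed the sampler so each epoch uses a different shuffling
    train_sampler.set_epoch(epoch)
    for data, target in train_loader:
        ...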

Finally, wrap your model and optimizer with Bagua by adding one line of code to your original script:

# define your model and optimizer
model = ...
model = model.cuda()
optimizer = ...

# select your Bagua algorithm to use
from bagua.torch_api.algorithms import gradient_allreduce

# wrap your model and optimizer with Bagua
model = model.with_bagua(
    [optimizer], gradient_allreduce.GradientAllReduceAlgorithm()
)
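
After wrapping, the training loop itself stays the same as in your single-GPU script; with the gradient allreduce algorithm selected above, gradient synchronization happens automatically during backward, so no extra communication calls are needed. A minimal sketch of one training step (loss_fn, data, and target stand in for your own loss function and batch):

output = model(data.cuda())
loss = loss_fn(output, target.cuda())

optimizer.zero_grad()
loss.backward()   # gradients are averaged across processes during backward
optimizer.step()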

More examples can be found here.

Launch job

Bagua has a built-in tool, bagua.distributed.launch, for launching jobs; its usage is similar to PyTorch's torch.distributed.launch.

The following sections describe how to start distributed training.

Single node multi-process training

python -m bagua.distributed.launch --nproc_per_node=8 \
  your_training_script.py (--arg1 --arg2 ...)

Multi-node multi-process training (e.g. two nodes)

Node 1 (IP: 192.168.1.1, with a free port 1234):

python -m bagua.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=0 --master_addr="192.168.1.1" --master_port=1234  your_training_script.py (--arg1 --arg2 ...)

Node 2:

python -m bagua.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=1 --master_addr="192.168.1.1" --master_port=1234 your_training_script.py (--arg1 --arg2 ...)

Tips:

If you need to do some preprocessing, you can include it in a bash script and launch the job by adding --no_python to your command.

python -m bagua.distributed.launch --no_python --nproc_per_node=8 bash your_bash_script.sh
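
With --no_python, the launcher executes your script directly instead of prefixing the command with python, so the script itself is responsible for starting Python. As a rough, hypothetical sketch, your_bash_script.sh could run its preprocessing and then hand any remaining arguments to your training script (prepare_data.py is a made-up name for your own preprocessing step):

#!/bin/bash
set -e

# hypothetical preprocessing step (replace with your own)
python prepare_data.py

# launch the actual training script, forwarding extra arguments
python your_training_script.py "$@"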