DeepSpeed Configuration

Configurations

Training Setup

class deepspeed.config.TrainingConfig(**kwargs)[source]

Top-level configuration for all aspects of training with DeepSpeed.

batch = None

Batch configuration, see BatchConfig

fp16 = None

FP16 training, see FP16Config

class deepspeed.config.BatchConfig(**kwargs)[source]

Batch size related parameters.

train_batch_size = None

The effective training batch size.

This is the number of data samples that leads to one step of model update. train_batch_size is the product of the batch size that a single GPU processes in one forward/backward pass (train_micro_batch_size_per_gpu), the number of gradient accumulation steps (gradient_accumulation_steps), and the number of GPUs.
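As a quick arithmetic check of the relationship described above, the following sketch multiplies the three quantities. The numeric values and the variable names micro_batch_per_gpu and num_gpus are chosen only for illustration and are not taken from DeepSpeed.

```python
# Hypothetical values, chosen only to illustrate the relationship.
micro_batch_per_gpu = 4          # samples one GPU processes per forward/backward pass
gradient_accumulation_steps = 8  # passes accumulated before one model update
num_gpus = 16                    # number of data-parallel devices

# The effective batch size is the product of the three quantities.
train_batch_size = micro_batch_per_gpu * gradient_accumulation_steps * num_gpus
print(train_batch_size)  # 512
```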

train_micro_batch_size_per_gpu = None

The batch size to be processed per device each forward/backward step.

When specified, gradient_accumulation_steps is automatically calculated using train_batch_size and the number of devices. Should not be concurrently specified with gradient_accumulation_steps.

gradient_accumulation_steps = None

The number of training steps to accumulate gradients before averaging and applying them.

This feature can improve scalability because gradients are communicated between devices less frequently. It also makes it possible to train with a larger effective batch size per GPU than fits in a single forward/backward pass. When specified, train_micro_batch_size_per_gpu is automatically calculated using train_batch_size and the number of GPUs. Should not be specified together with train_micro_batch_size_per_gpu.
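To make the "accumulate, then average" idea concrete, here is a minimal pure-Python sketch. It assumes a toy scalar model y = w * x with a mean squared error loss and equal-sized micro-batches; the helper grad and the data are hypothetical and unrelated to DeepSpeed's internals. Under those assumptions, averaging the accumulated micro-batch gradients gives the same result as computing the gradient over the full batch at once.

```python
def grad(w, xs, ys):
    """Gradient of the mean squared error for the scalar model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Toy data and weight (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

gradient_accumulation_steps = 2
micro = len(xs) // gradient_accumulation_steps  # samples per micro-batch

# Accumulate one gradient per micro-batch without updating the weight.
accumulated = 0.0
for step in range(gradient_accumulation_steps):
    lo, hi = step * micro, (step + 1) * micro
    accumulated += grad(w, xs[lo:hi], ys[lo:hi])

# Average the accumulated gradients before applying them.
averaged = accumulated / gradient_accumulation_steps

# Reference: the gradient computed over the whole batch in one pass.
full = grad(w, xs, ys)
```

Because the weight is only updated after the averaging step, the micro-batches behave like one larger batch while each pass still fits the memory budget of a single small batch.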

resolve()[source]

Complete the batch configuration, provided at least two of train_batch_size, train_micro_batch_size_per_gpu, and gradient_accumulation_steps are specified.
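The kind of inference resolve() performs can be sketched as follows. This is not DeepSpeed's implementation: resolve_batch, its parameter names, and the explicit world_size argument are assumptions made for illustration, built only on the product relationship between the three batch parameters and the number of devices.

```python
def resolve_batch(world_size, train_batch_size=None, micro_batch=None, grad_accum=None):
    """Fill in the one missing batch parameter from the other two.

    Hypothetical helper for illustration; not part of the DeepSpeed API.
    """
    given = sum(v is not None for v in (train_batch_size, micro_batch, grad_accum))
    if given < 2:
        raise ValueError("need at least two of the three batch parameters")

    if train_batch_size is None:
        train_batch_size = micro_batch * grad_accum * world_size
    elif micro_batch is None:
        micro_batch = train_batch_size // (grad_accum * world_size)
    elif grad_accum is None:
        grad_accum = train_batch_size // (micro_batch * world_size)

    # Reject combinations that do not multiply out exactly,
    # including the case where all three were given but disagree.
    if train_batch_size != micro_batch * grad_accum * world_size:
        raise ValueError("batch parameters are inconsistent")
    return train_batch_size, micro_batch, grad_accum


# Usage: 8 devices, effective batch of 256, 4 samples per device per pass.
resolved = resolve_batch(8, train_batch_size=256, micro_batch=4)
```

Here resolved carries 8 gradient accumulation steps, since 4 * 8 * 8 = 256.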

is_valid()[source]

Resolve any missing configurations and determine if the configuration is valid.

Returns: Whether the config and all sub-configs are valid.
Return type: bool

Training Optimizations

class deepspeed.config.FP16Config(**kwargs)[source]

FP16 configuration.

enabled = None

Enable/disable FP16

clip = None

Gradient clipping

Extending Configurations

class deepspeed.config.Config(**kwargs)[source]

Base class for DeepSpeed configurations.

Config is a struct-like class intended for subclassing. Instances are initialized from keyword arguments, and values can be read either as attributes or by key:

>>> c = Config(verbose=True)
>>> c.verbose
True
>>> c['verbose']
True

They can also be initialized from an existing dictionary with from_dict:

>>> myconf = {'verbose' : True}
>>> c = Config.from_dict(myconf)
>>> c.verbose
True

Configurations should be subclassed to group arguments by topic.
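A subclass that groups arguments by topic might look like the sketch below. To keep the example self-contained, it defines a small stand-in Config (a dict with attribute access and a from_dict constructor) that mimics only the behavior shown in the examples above; it is not the actual base class. LoggingConfig and its log_interval field are hypothetical names introduced for illustration.

```python
class Config(dict):
    """Stand-in for the documented base class: a dict with attribute access."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    @classmethod
    def from_dict(cls, values):
        return cls(**values)


class LoggingConfig(Config):
    """Hypothetical topic-specific config that groups logging arguments."""

    def __init__(self, **kwargs):
        # Set defaults first, then apply user-supplied overrides.
        super().__init__(verbose=False, log_interval=100)
        self.update(kwargs)


log_conf = LoggingConfig.from_dict({'verbose': True})
print(log_conf.verbose)          # True
print(log_conf['log_interval'])  # 100
```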

resolve()[source]

Infer any missing arguments, if possible.

This is useful for configs such as BatchConfig, in which only a subset of arguments is required to complete a valid config.

is_valid()[source]

Resolve any missing configurations and determine if the configuration is valid.

Returns: Whether the config and all sub-configs are valid.
Return type: bool