Training API

deepspeed.initialize() returns a training engine of type DeepSpeedEngine as the first element of its return tuple. This engine is used to drive training:

for step, batch in enumerate(data_loader):
    # forward propagation
    loss = model_engine(batch)

    # backward propagation
    model_engine.backward(loss)

    # weight update
    model_engine.step()

Forward Propagation

deepspeed.DeepSpeedEngine.forward(self, *inputs, **kwargs)

Execute forward propagation.

Parameters:
  • *inputs – Variable-length input list.
  • **kwargs – Variable-length keyword arguments.
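Calling the engine object itself, as in the loop above, dispatches to forward(); positional and keyword arguments are passed through to the wrapped model. A minimal sketch (the input names input_ids and attention_mask are hypothetical):

# inputs are forwarded to the wrapped model's forward()
loss = model_engine(input_ids, attention_mask=attention_mask)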

Backward Propagation

deepspeed.DeepSpeedEngine.backward(self, loss, allreduce_gradients=True, release_loss=False)

Execute a backward pass on the loss.

Parameters:
  • loss – Torch tensor on which to execute backward propagation
  • allreduce_gradients – If False, gradient averaging is skipped; see the sketch below. Defaults to True.
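The default call replaces loss.backward() and applies any configured loss scaling internally. Below is a sketch of skipping the built-in gradient averaging, assuming an outer framework performs its own reduction (an assumption, not a documented recipe):

loss = model_engine(batch)
# skip built-in gradient averaging; reduction assumed to happen elsewhere
model_engine.backward(loss, allreduce_gradients=False)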

Optimizer Step

deepspeed.DeepSpeedEngine.step(self, lr_kwargs=None)

Execute the weight update step after forward and backward propagation on an effective training batch.
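As in the loop above, step() is called on every micro-batch. A minimal sketch; note that with gradient accumulation configured, the engine performs the actual optimizer update only at accumulation boundaries (see the next section):

loss = model_engine(batch)
model_engine.backward(loss)
# weights (and any client LR scheduler) advance only at accumulation boundaries
model_engine.step()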

Gradient Accumulation

deepspeed.DeepSpeedEngine.is_gradient_accumulation_boundary(self)

Query whether the current micro-batch is at the boundary of gradient accumulation, and thus will trigger gradient reductions and an optimizer step.

Returns: True if the current step is a gradient accumulation boundary, otherwise False.
Return type: bool
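One use is gating work that should only run when the weights actually change, such as logging. A sketch, assuming a hypothetical log_metrics helper; the boundary is queried before step(), which advances the micro-step counter:

for step, batch in enumerate(data_loader):
    loss = model_engine(batch)
    model_engine.backward(loss)
    # True on the micro-batch whose step() performs the weight update
    if model_engine.is_gradient_accumulation_boundary():
        log_metrics(step, loss.item())  # log_metrics is hypothetical
    model_engine.step()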