DeepSpeed

Model Setup

  • Training Setup
    • Argument Parsing
    • Training Initialization
    • Distributed Initialization

Configuration

  • DeepSpeed Configuration
    • Configurations
    • Extending Configurations

Training API

  • Training API
    • Forward Propagation
    • Backward Propagation
    • Optimizer Step
    • Gradient Accumulation

Checkpointing API

  • Model Checkpointing
    • Loading Training Checkpoints
    • Saving Training Checkpoints
  • Activation Checkpointing
    • Configuring Activation Checkpointing
    • Using Activation Checkpointing
    • Configuring and Checkpointing Random Seeds

Transformer Kernel API

  • Transformer Kernels
    • DeepSpeed Transformer Config
    • DeepSpeed Transformer Layer

Pipeline Parallelism

  • Pipeline Parallelism
    • Model Specification
    • Training
    • Extending Pipeline Parallelism

Indices and Tables

  • Index
  • Module Index
  • Search Page

© Copyright 2020, Microsoft Revision 67e5563c.
