ocp.v1.training.errors module

Errors encountered during training.

StepAlreadyExistsError

class orbax.checkpoint.experimental.v1.training.errors.StepAlreadyExistsError

Raised when a step intended for saving already exists.

This error is raised when a save is requested at a step number that already holds a checkpoint. It prevents an existing checkpoint from being overwritten unintentionally.
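The overwrite check can be illustrated with a minimal, self-contained sketch. It is not Orbax's implementation: it assumes one subdirectory per step, named by the step number, and `save_step` and the local `StepAlreadyExistsError` stand-in (carrying the `step` and `path` attributes listed for this class) are hypothetical.

```python
import pathlib
import tempfile


class StepAlreadyExistsError(Exception):
  """Hypothetical stand-in exposing the documented `step` and `path` attributes."""

  def __init__(self, step: int, path: str):
    super().__init__(f"Step {step} already exists in {path}.")
    self.step = step
    self.path = path


def save_step(root: pathlib.Path, step: int, payload: str) -> pathlib.Path:
  """Writes `payload` to `root/<step>/data.txt`, refusing to reuse an existing step."""
  step_dir = root / str(step)
  try:
    # exist_ok=False folds the existence check and the creation into one call.
    step_dir.mkdir(parents=True, exist_ok=False)
  except FileExistsError:
    raise StepAlreadyExistsError(step, str(root)) from None
  (step_dir / "data.txt").write_text(payload)
  return step_dir


with tempfile.TemporaryDirectory() as tmp:
  root = pathlib.Path(tmp)
  save_step(root, 100, "first save")
  try:
    save_step(root, 100, "second save")  # same step: rejected, first save kept
  except StepAlreadyExistsError as e:
    print(f"Step {e.step} already exists in {e.path}.")
```

Creating the directory with `exist_ok=False` and catching `FileExistsError` avoids a check-then-create race when two writers target the same step.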

Example Usage:

This error is typically handled during the saving process:

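from orbax.checkpoint.experimental.v1.training.errors import StepAlreadyExistsError

try:
  # `checkpoint_manager` stands for whatever object performs the save.
  checkpoint_manager.save(step=100)
except StepAlreadyExistsError:
  # Raised instead of overwriting the checkpoint already stored at step 100.
  print("Checkpoint already exists at step 100.")

step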

The step number that already exists.

Type:

int

path

The path to the directory where the step already exists.

Type:

str
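Because the error carries `step` and `path`, a caller can choose a recovery policy without parsing the message. The sketch below assumes, hypothetically, that the policy is to retry at the next unused step; `save`, `save_at_next_free_step`, the in-memory set standing in for saved checkpoints, and the `/tmp/ckpt` directory are all illustrative, not Orbax APIs.

```python
class StepAlreadyExistsError(Exception):
  """Hypothetical stand-in exposing the documented `step` and `path` attributes."""

  def __init__(self, step: int, path: str):
    super().__init__(f"Step {step} already exists in {path}.")
    self.step = step
    self.path = path


def save(saved_steps: set[int], step: int, directory: str = "/tmp/ckpt") -> None:
  """Hypothetical saver: records `step` in a set instead of writing files."""
  if step in saved_steps:
    raise StepAlreadyExistsError(step, directory)
  saved_steps.add(step)


def save_at_next_free_step(saved_steps: set[int], step: int) -> int:
  """Retries at step + 1, step + 2, ... until a save succeeds; returns the step used."""
  while True:
    try:
      save(saved_steps, step)
      return step
    except StepAlreadyExistsError as e:
      # Read the structured attributes instead of parsing the error message.
      print(f"Step {e.step} is taken in {e.path}; trying {e.step + 1}.")
      step = e.step + 1


saved = {100, 101}
print(save_at_next_free_step(saved, 100))  # 102
```

Retrying at a new step, deleting the old checkpoint, or stopping are all valid policies; raising instead of overwriting simply makes the caller pick one explicitly.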

StepNotFoundError

class orbax.checkpoint.experimental.v1.training.errors.StepNotFoundError

Raised when a requested step is not found.

This error is raised when a restoration operation requests a training step that does not exist in the checkpoint directory. It signals that a checkpoint for that step was never saved or has since been deleted.
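The lookup behind this error can be sketched in a hypothetical setting: one subdirectory per step, named by the step number, with `path` taken to be the checkpoint root directory. `find_step_dir` and the local `StepNotFoundError` stand-in are illustrative names, not Orbax APIs.

```python
import pathlib
import tempfile


class StepNotFoundError(Exception):
  """Hypothetical stand-in exposing the documented `step` and `path` attributes."""

  def __init__(self, step: int, path: str):
    super().__init__(f"Step {step} not found in {path}.")
    self.step = step
    self.path = path


def find_step_dir(root: pathlib.Path, step: int) -> pathlib.Path:
  """Returns the directory for `step`, raising if it was never saved or was deleted."""
  step_dir = root / str(step)
  if not step_dir.is_dir():
    raise StepNotFoundError(step, str(root))
  return step_dir


with tempfile.TemporaryDirectory() as tmp:
  root = pathlib.Path(tmp)
  (root / "100").mkdir()
  print(find_step_dir(root, 100).name)  # 100
  try:
    find_step_dir(root, 200)
  except StepNotFoundError as e:
    print(f"Step {e.step} not found in {e.path}.")
```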

Example Usage:

This error is typically handled during the restoration process:

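from orbax.checkpoint.experimental.v1.training.errors import StepNotFoundError

try:
  checkpoint_manager.restore(step=200)
except StepNotFoundError:
  latest = checkpoint_manager.latest_step()
  # Assumes latest_step() returns None when no checkpoint has been saved.
  if latest is not None:
    print(f"Step 200 not found. Restoring latest available step ({latest}).")
    checkpoint_manager.restore(step=latest)

step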

The step number that was requested but not found.

Type:

int

path

The path where the step was expected to be found.

Type:

str
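The fallback in the `StepNotFoundError` example depends on finding the latest step that does exist. A minimal sketch, again assuming one integer-named subdirectory per step; `latest_step` and `resolve_step` here are hypothetical helpers, not the methods of any Orbax class.

```python
import pathlib
import tempfile
from typing import Optional


def latest_step(root: pathlib.Path) -> Optional[int]:
  """Returns the largest step saved under `root`, or None if there are none."""
  steps = [int(p.name) for p in root.iterdir() if p.is_dir() and p.name.isdecimal()]
  return max(steps, default=None)


def resolve_step(root: pathlib.Path, requested: int) -> Optional[int]:
  """Returns `requested` if that step exists, otherwise the latest saved step."""
  if (root / str(requested)).is_dir():
    return requested
  return latest_step(root)


with tempfile.TemporaryDirectory() as tmp:
  root = pathlib.Path(tmp)
  for step in (50, 100, 150):
    (root / str(step)).mkdir()
  print(resolve_step(root, 200))  # 150
  print(resolve_step(root, 100))  # 100
```

Comparing parsed integers rather than directory names matters: as strings, `"50"` sorts after `"150"`.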