ocp.v1.training.errors module
Errors encountered during training.
StepAlreadyExistsError
- class orbax.checkpoint.experimental.v1.training.errors.StepAlreadyExistsError
Raised when a step intended for saving already exists.
This error is raised when a save targets a step number that already has a checkpoint. It prevents existing checkpoints from being overwritten accidentally.
- Example Usage:
This error is typically handled during the saving process:
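    from orbax.checkpoint.experimental.v1.training.errors import StepAlreadyExistsError

    # `checkpoint_manager` is assumed to be an already constructed object with a `save` method.
    try:
        checkpoint_manager.save(step=100)
    except StepAlreadyExistsError:
        print("Checkpoint already exists at step 100.")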
- step
The step number that already exists.
- Type: int
- path
The path to the directory where the step already exists.
- Type: str
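The overwrite guard described above can be sketched without Orbax installed. In the sketch below, the `StepAlreadyExistsError` class is a local stand-in, not the Orbax class: its base class and constructor signature are assumptions, and only the `step` and `path` attributes come from this page. The helpers `save_step` and `save_or_skip` and the `data.bin` file name are hypothetical and exist only for illustration.

```python
import os
import tempfile


class StepAlreadyExistsError(Exception):
    """Local stand-in (not the Orbax class) exposing `step` and `path`."""

    def __init__(self, step: int, path: str):
        super().__init__(f"Step {step} already exists at {path}")
        self.step = step
        self.path = path


def save_step(root: str, step: int, payload: bytes) -> str:
    """Hypothetical saver: one directory per step, never overwritten."""
    step_dir = os.path.join(root, str(step))
    if os.path.exists(step_dir):
        # Refuse to overwrite; report which step and where it lives.
        raise StepAlreadyExistsError(step, step_dir)
    os.makedirs(step_dir)
    with open(os.path.join(step_dir, "data.bin"), "wb") as f:
        f.write(payload)
    return step_dir


def save_or_skip(root: str, step: int, payload: bytes) -> bool:
    """Returns True if a checkpoint was written, False if the step existed."""
    try:
        save_step(root, step, payload)
        return True
    except StepAlreadyExistsError as e:
        print(f"Skipping save: step {e.step} already exists at {e.path}")
        return False


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    save_or_skip(root, 100, b"first")   # writes the checkpoint
    save_or_skip(root, 100, b"second")  # skipped, original is kept
```

Catching the error and skipping is one possible policy. A training loop could equally let the exception propagate so that a duplicate save fails loudly.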
StepNotFoundError
- class orbax.checkpoint.experimental.v1.training.errors.StepNotFoundError
Raised when a requested step is not found.
This error is raised when a restore operation requests a training step that does not exist in the checkpoint directory. It signals that the requested checkpoint was never saved or has since been deleted.
- Example Usage:
This error is typically handled during the restoration process:
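    from orbax.checkpoint.experimental.v1.training.errors import StepNotFoundError

    # `checkpoint_manager` is assumed to be an already constructed object with `restore` and `latest_step` methods.
    try:
        checkpoint_manager.restore(step=200)
    except StepNotFoundError:
        print("Step 200 not found. Restoring latest available step.")
        checkpoint_manager.restore(step=checkpoint_manager.latest_step())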
- step
The step number that was requested but not found.
- Type: int
- path
The path where the step was expected to be found.
- Type: str
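The fallback pattern in the example usage can be sketched in a self-contained form. As before, the `StepNotFoundError` class below is a local stand-in rather than the Orbax class, with an assumed base class and constructor. Only the `step` and `path` attributes are taken from this page. The helpers `list_steps`, `restore_step`, and `restore_or_latest` and the `data.bin` file name are hypothetical.

```python
import os
import tempfile


class StepNotFoundError(Exception):
    """Local stand-in (not the Orbax class) exposing `step` and `path`."""

    def __init__(self, step: int, path: str):
        super().__init__(f"Step {step} not found at {path}")
        self.step = step
        self.path = path


def list_steps(root: str) -> list[int]:
    """Returns the step numbers present under `root`, in ascending order."""
    return sorted(int(name) for name in os.listdir(root) if name.isdigit())


def restore_step(root: str, step: int) -> bytes:
    """Hypothetical restore: reads the payload stored for one step."""
    step_dir = os.path.join(root, str(step))
    if not os.path.isdir(step_dir):
        raise StepNotFoundError(step, step_dir)
    with open(os.path.join(step_dir, "data.bin"), "rb") as f:
        return f.read()


def restore_or_latest(root: str, step: int) -> tuple[int, bytes]:
    """Restores `step`, falling back to the latest available step."""
    try:
        return step, restore_step(root, step)
    except StepNotFoundError as e:
        steps = list_steps(root)
        if not steps:
            raise  # nothing to fall back to, so surface the original error
        latest = steps[-1]
        print(f"Step {e.step} not found at {e.path}. Restoring step {latest}.")
        return latest, restore_step(root, latest)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    for s in (100, 150):
        os.makedirs(os.path.join(root, str(s)))
        with open(os.path.join(root, str(s), "data.bin"), "wb") as f:
            f.write(str(s).encode())
    print(restore_or_latest(root, 200))
```

Returning the step that was actually restored, alongside the payload, lets the caller detect that a fallback happened and adjust the training step counter accordingly.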