ocp.v1.training.preservation_policies module#

Defines policies for when a checkpoint is preserved.

PreservationPolicy#

class orbax.checkpoint.experimental.v1.training.preservation_policies.PreservationPolicy(*args, **kwargs)[source]#

Bases: Protocol

A policy that defines when checkpoints should be preserved.

should_preserve(checkpoints, *, context)[source]#

Indicates which checkpoints should be preserved.

Return type:

Sequence[bool]
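As a rough illustration of the protocol's shape, the sketch below defines a toy policy with the same method signature. `FakeCheckpoint`, `PolicyLike`, and `EvenStepsOnly` are hypothetical stand-ins written for this example and are not part of the library. The sketch also assumes that the returned sequence holds one flag per input checkpoint, in the same order.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass(frozen=True)
class FakeCheckpoint:
    """Hypothetical stand-in for a checkpoint record; only `step` is modeled."""

    step: int


class PolicyLike(Protocol):
    """Mirrors the documented method shape; not the library's own class."""

    def should_preserve(
        self, checkpoints: Sequence[FakeCheckpoint], *, context: object
    ) -> Sequence[bool]:
        ...


class EvenStepsOnly:
    """Illustrative policy: preserve checkpoints whose step number is even."""

    def should_preserve(
        self, checkpoints: Sequence[FakeCheckpoint], *, context: object
    ) -> List[bool]:
        # Assumption: one flag per input checkpoint, in input order.
        return [ckpt.step % 2 == 0 for ckpt in checkpoints]


policy: PolicyLike = EvenStepsOnly()
flags = policy.should_preserve(
    [FakeCheckpoint(step) for step in (1, 2, 3, 4)], context=None
)
print(flags)  # [False, True, False, True]
```

Because the base class is a Protocol, any object with a matching `should_preserve` method satisfies it structurally; no inheritance is needed in this sketch.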

PreserveAll#

class orbax.checkpoint.experimental.v1.training.preservation_policies.PreserveAll[source]#

Preserves all checkpoints.

should_preserve(checkpoints, *, context)[source]#

Indicates which checkpoints should be preserved.

Return type:

Sequence[bool]

LatestN#

class orbax.checkpoint.experimental.v1.training.preservation_policies.LatestN(n=None)[source]#

Preserves the last n checkpoints. Preserves all checkpoints if n is None.

should_preserve(checkpoints, *, context)[source]#

Indicates which checkpoints should be preserved.

Return type:

Sequence[bool]
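The "last n" rule can be sketched in plain Python as follows. This is a minimal illustration and not the library's implementation: `latest_n_flags` is a hypothetical helper, checkpoints are represented by bare step numbers, and the input is assumed to be ordered from oldest to newest.

```python
from typing import List, Optional, Sequence


def latest_n_flags(steps: Sequence[int], n: Optional[int]) -> List[bool]:
    """Flags the last `n` entries of `steps`; flags everything if `n` is None.

    Assumes `steps` is ordered from oldest to newest.
    """
    if n is None:
        return [True] * len(steps)
    # Index of the first entry that falls inside the "last n" window.
    cutoff = max(len(steps) - n, 0)
    return [i >= cutoff for i in range(len(steps))]


print(latest_n_flags([10, 20, 30, 40], 2))     # [False, False, True, True]
print(latest_n_flags([10, 20, 30, 40], None))  # [True, True, True, True]
```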

EveryNSeconds#

class orbax.checkpoint.experimental.v1.training.preservation_policies.EveryNSeconds(interval_secs)[source]#

Ensures that a checkpoint is preserved once at least interval_secs seconds have elapsed since the last preserved checkpoint.

should_preserve(checkpoints, *, context)[source]#

Indicates which checkpoints should be preserved.

Return type:

Sequence[bool]

EveryNSteps#

class orbax.checkpoint.experimental.v1.training.preservation_policies.EveryNSteps(interval_steps, exact_interval=True, max_to_keep=None)[source]#

Preserves checkpoints after at least N steps.

should_preserve(checkpoints, *, context)[source]#

Indicates which checkpoints should be preserved.

Return type:

Sequence[bool]

CustomSteps#

class orbax.checkpoint.experimental.v1.training.preservation_policies.CustomSteps(steps)[source]#

Preserves checkpoints at the given steps.

should_preserve(checkpoints, *, context)[source]#

Indicates which checkpoints should be preserved.

Return type:

Sequence[bool]
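Preserving checkpoints at given steps amounts to a membership test. The sketch below shows that idea only; `custom_steps_flags` is a hypothetical helper, and checkpoints are again represented by bare step numbers rather than the library's checkpoint objects.

```python
from typing import Iterable, List, Sequence


def custom_steps_flags(
    checkpoint_steps: Sequence[int], steps_to_keep: Iterable[int]
) -> List[bool]:
    """Flags each checkpoint whose step appears in `steps_to_keep`."""
    keep = set(steps_to_keep)  # set lookup keeps each test O(1)
    return [step in keep for step in checkpoint_steps]


print(custom_steps_flags([100, 200, 300, 400], [200, 400, 999]))
# [False, True, False, True]
```

Steps listed in `steps_to_keep` that have no matching checkpoint (999 above) simply have no effect in this sketch.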

BestN#

class orbax.checkpoint.experimental.v1.training.preservation_policies.BestN(*, get_metric_fn, reverse=False, n=None, keep_checkpoints_without_metrics=True)[source]#

A policy that preserves the best checkpoints, as ranked by get_metric_fn.

get_metric_fn:

A function that accepts a nested tree of metrics and returns the scalar value used to rank checkpoints.

reverse:

If False (default), checkpoints are sorted in ascending order of the value returned by get_metric_fn. If True, checkpoints are sorted in descending order. This matches the semantics of the reverse argument of the built-in sorted() function.

n:

The number of checkpoints to preserve. If None, all checkpoints are preserved. If 0, no checkpoints are preserved.

should_preserve(checkpoints, *, context)[source]#

Indicates which checkpoints should be preserved.

Return type:

Sequence[bool]
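To make the ordering semantics concrete, the sketch below ranks a few checkpoints with a metric function and the built-in sorted(). It only illustrates how get_metric_fn and reverse interact. It does not show which end of the ordering is kept, since the text above does not say. `rank_steps`, `get_loss`, and the metrics layout are hypothetical and chosen for illustration.

```python
from typing import Any, Callable, Dict, List


def rank_steps(
    metrics_by_step: Dict[int, Any],
    get_metric_fn: Callable[[Any], float],
    reverse: bool = False,
) -> List[int]:
    """Returns step numbers ordered by the scalar extracted from their metrics."""
    return sorted(
        metrics_by_step,
        key=lambda step: get_metric_fn(metrics_by_step[step]),
        reverse=reverse,
    )


# Hypothetical nested metrics trees, keyed by step.
metrics_by_step = {
    100: {"eval": {"loss": 0.9}},
    200: {"eval": {"loss": 0.4}},
    300: {"eval": {"loss": 0.6}},
}


def get_loss(metrics: Any) -> float:
    """Extracts one scalar from the nested metrics tree."""
    return metrics["eval"]["loss"]


print(rank_steps(metrics_by_step, get_loss))                # [200, 300, 100]
print(rank_steps(metrics_by_step, get_loss, reverse=True))  # [100, 300, 200]
```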

LatestDuration#

class orbax.checkpoint.experimental.v1.training.preservation_policies.LatestDuration(duration)[source]#

Preserves checkpoints that are newer than the given duration.

E.g. retain checkpoints within the last 24 hours:

import datetime
LatestDuration(datetime.timedelta(hours=24))

should_preserve(checkpoints, *, context)[source]#

Indicates which checkpoints should be preserved.

Return type:

Sequence[bool]

EveryNStepsClosest#

class orbax.checkpoint.experimental.v1.training.preservation_policies.EveryNStepsClosest(interval_steps, max_to_keep=None)[source]#

Preserves checkpoints at steps closest to absolute multiples of N.

This policy maps each checkpoint to its closest nominal target step on a grid defined by interval_steps (i.e. k * interval_steps). For each nominal target, the closest available checkpoint is preserved.

This avoids the error accumulation/drift that can occur with EveryNSteps(exact_interval=False) when checkpoints are irregular.

The last checkpoint is always preserved for final model state and efficient recovery.

should_preserve(checkpoints, *, context)[source]#

Indicates which checkpoints should be preserved.

Return type:

Sequence[bool]
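The grid-mapping rule described for this class can be sketched as below. This is an illustration under stated assumptions and not the library's code: `closest_to_grid_flags` is a hypothetical helper, checkpoints are bare step numbers sorted in ascending order, ties between two grid points round up, when two checkpoints are equally close to a target the earlier one wins, and max_to_keep is ignored.

```python
from typing import Dict, List, Sequence


def closest_to_grid_flags(steps: Sequence[int], interval_steps: int) -> List[bool]:
    """Flags, per grid target k * interval_steps, the closest checkpoint.

    Assumes `steps` is sorted ascending and `interval_steps` is positive.
    The last checkpoint is always flagged.
    """
    if not steps:
        return []
    best_for_target: Dict[int, int] = {}  # grid target -> index of closest step
    for i, step in enumerate(steps):
        # Nearest multiple of interval_steps; exact ties round up (assumption).
        target = ((step + interval_steps // 2) // interval_steps) * interval_steps
        best = best_for_target.get(target)
        if best is None or abs(step - target) < abs(steps[best] - target):
            best_for_target[target] = i
    keep = set(best_for_target.values())
    keep.add(len(steps) - 1)  # the last checkpoint is always preserved
    return [i in keep for i in range(len(steps))]


print(closest_to_grid_flags([95, 110, 190, 290, 340], 100))
# [True, False, True, True, True]
```

In this run, steps 95 and 110 both map to target 100 and 95 wins. Step 340 maps to target 300 but loses to 290, and is still kept because it is the last checkpoint.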

AnyPreservationPolicy#

class orbax.checkpoint.experimental.v1.training.preservation_policies.AnyPreservationPolicy(policies)[source]#

Applies multiple preservation policies and preserves if any policy preserves.

should_preserve(checkpoints, *, context)[source]#

Indicates which checkpoints should be preserved.

Return type:

Sequence[bool]
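Combining policies so that a checkpoint is preserved if any policy preserves it is an element-wise logical OR over the per-policy flag lists. The sketch below shows that combination with plain functions standing in for policies. `any_policy_flags`, `keep_multiples_of_100`, and `keep_last` are hypothetical names used only here.

```python
from typing import Callable, List, Sequence

# A stand-in "policy": maps checkpoint steps to one flag per checkpoint.
FlagFn = Callable[[Sequence[int]], Sequence[bool]]


def any_policy_flags(steps: Sequence[int], flag_fns: Sequence[FlagFn]) -> List[bool]:
    """Flags a checkpoint if at least one of `flag_fns` flags it."""
    results = [fn(steps) for fn in flag_fns]
    return [any(result[i] for result in results) for i in range(len(steps))]


def keep_multiples_of_100(steps: Sequence[int]) -> List[bool]:
    return [step % 100 == 0 for step in steps]


def keep_last(steps: Sequence[int]) -> List[bool]:
    return [i == len(steps) - 1 for i in range(len(steps))]


print(any_policy_flags([100, 150, 200, 250], [keep_multiples_of_100, keep_last]))
# [True, False, True, True]
```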

PreservationContext#

class orbax.checkpoint.experimental.v1.training.preservation_policies.PreservationContext[source]#

Additional properties for making a preservation decision.

__eq__(other)#

Return self==value.

__hash__ = None#

__init__()#