ocp.v1.multihost module
Multihost functionalities.
- async orbax.checkpoint.experimental.v1.multihost.sync_global_processes(key, *, operation_id, timeout=None, processes=None, record_event_name='/jax/checkpoint/sync_global_devices_duration_sec')
Barrier to sync concurrent processes.
NOTE: The barrier name must be unique, i.e. no process should wait on the same barrier name multiple times.
- Parameters:
  - key (str) – Barrier name. Must be unique.
  - operation_id (str) – The barrier name will be prefixed with the operation id.
  - timeout (Optional[int]) – Timeout in seconds.
  - processes (Optional[Collection[int]]) – If None, expects to wait across all processes and devices. Otherwise, creates a barrier only across devices associated with the given processes.
  - record_event_name (str) – The name of the event to record the duration of the synchronization.
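
Because a barrier name may not be reused, callers need some way to produce a fresh key for every sync point. The sketch below shows one possible approach and is not part of Orbax: `BarrierNamer`, `sync_step`, and `_stand_in_sync` are hypothetical names introduced for illustration. The barrier function is passed in as `sync_fn`, so the example runs without a multi-host setup. It assumes every participating process makes the same sequence of calls, so that all of them derive the same key for the same sync point.

```python
import asyncio
import itertools
from typing import Awaitable, Callable, Collection, Optional


class BarrierNamer:
    """Hypothetical helper: returns a fresh barrier key on every call."""

    def __init__(self, prefix: str):
        self._prefix = prefix
        self._counter = itertools.count()

    def next_key(self) -> str:
        # Deterministic numbering: processes that call this in the same
        # order obtain the same key for the same sync point.
        return f"{self._prefix}_{next(self._counter)}"


async def sync_step(
    sync_fn: Callable[..., Awaitable[None]],
    namer: BarrierNamer,
    *,
    operation_id: str,
    timeout: Optional[int] = None,
    processes: Optional[Collection[int]] = None,
) -> str:
    """Waits on a barrier under a key that has not been used before."""
    key = namer.next_key()
    # Mirrors the documented call shape: `key` is positional, the rest
    # are keyword-only arguments.
    await sync_fn(
        key,
        operation_id=operation_id,
        timeout=timeout,
        processes=processes,
    )
    return key


async def _stand_in_sync(key, *, operation_id, timeout=None, processes=None):
    # Stand-in for the real barrier (an assumption for this sketch):
    # it only yields control and does not synchronize anything.
    await asyncio.sleep(0)


if __name__ == "__main__":
    namer = BarrierNamer("after_save")
    used = asyncio.run(
        sync_step(_stand_in_sync, namer, operation_id="op-1", timeout=60)
    )
    print(used)
```

In a real multi-host job, one would presumably pass `sync_global_processes` itself as `sync_fn` in place of the stand-in. Since the function is declared `async`, it has to be awaited from a coroutine or driven by an event loop.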