netket.callbacks.AutoSlurmRequeue

class netket.callbacks.AutoSlurmRequeue

Bases: AbstractCallback

A callback that automatically requeues a Slurm job if it is about to run out of time.

This callback should be used together with a form of checkpointing to ensure that the job can be requeued without losing progress.
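The decision the callback makes can be pictured with a small standard-library sketch. It does not reproduce NetKet's implementation: the helper names `should_requeue` and `requeue_allowed`, the way the job end time is obtained, and the `restart_count` input are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def should_requeue(now: datetime, job_end: datetime, before: timedelta) -> bool:
    """Return True once the remaining walltime is at most `before`."""
    return job_end - now <= before


def requeue_allowed(restart_count: int, max_requeue_count: int) -> bool:
    """Return True while the job has been requeued fewer times than allowed."""
    return restart_count < max_requeue_count


# Hypothetical job ending at 12:00, checked with the default 5-minute margin.
job_end = datetime(2024, 1, 1, 12, 0, 0)
margin = timedelta(seconds=300)

early = should_requeue(datetime(2024, 1, 1, 11, 30, 0), job_end, margin)
late = should_requeue(datetime(2024, 1, 1, 11, 56, 0), job_end, margin)
print(early, late)  # False True
```

Because the job restarts from scratch after a requeue unless state was saved, a check like this is only useful when a checkpoint is written before the job is handed back to the scheduler.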

__init__(before=datetime.timedelta(seconds=300), max_requeue_count=3)

Initialize the auto-requeue callback.

Parameters:

before (timedelta) – How long before the job's time limit the callback checks whether to requeue (default: 5 minutes). Pass a timedelta object or a number of seconds. The value should be at least as long as one iteration takes to run.
max_requeue_count (int) – Maximum number of times the job should be requeued (default: 3).
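Since `before` may be given as a timedelta or as a number of seconds, the following sketch shows one way such a value could be normalized and compared against the duration of an iteration. The helpers `as_timedelta` and `margin_covers_iteration` are hypothetical and are not part of NetKet.

```python
from datetime import timedelta
from typing import Union

Duration = Union[timedelta, int, float]


def as_timedelta(value: Duration) -> timedelta:
    """Accept either a timedelta or a plain number of seconds."""
    if isinstance(value, timedelta):
        return value
    return timedelta(seconds=value)


def margin_covers_iteration(before: Duration, iteration_time: Duration) -> bool:
    """Check that the margin is at least as long as one iteration."""
    return as_timedelta(before) >= as_timedelta(iteration_time)


print(as_timedelta(300) == timedelta(minutes=5))  # True
print(margin_covers_iteration(300, 420))          # False: iterations take 7 minutes
```

If an iteration takes longer than the margin, the time limit can be reached between two checks, which is why the margin should cover at least one full iteration.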

Attributes

callback_order

An integer representing the order in which this callback should be called.

Lower numbers are called first, and higher numbers are called later.

This can be redefined in subclasses to change the order in which callbacks are called. (Default: 0 for all callbacks, 10 for loggers.)
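The ordering rule can be illustrated with plain Python. The three classes below are hypothetical stand-ins, and the use of a stable sort (so that callbacks with equal order keep their original relative position) is an assumption about how ties might be handled, not a statement about NetKet's driver.

```python
class Observables:
    callback_order = 0


class Requeue:
    callback_order = 0


class Logger:
    callback_order = 10


callbacks = [Logger(), Observables(), Requeue()]

# sorted() is stable: entries with the same callback_order keep their input order.
ordered = sorted(callbacks, key=lambda cb: cb.callback_order)
names = [type(cb).__name__ for cb in ordered]
print(names)  # ['Observables', 'Requeue', 'Logger']
```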

before: timedelta

max_requeue_count: int

Methods

before_parameter_update(step, log_data, driver)

Called after all update logic has been computed and the step has been accepted, but before the driver applies the parameter update.

At this point:

The loss and its gradient have been computed by compute_loss_and_update().
The step has been accepted (not rejected by on_compute_update_end()).
driver.step_count still refers to the current step — it has not yet been incremented.
The variational state parameters have not yet changed.

This is the right place to estimate additional observables, add data to log_data, or take a snapshot of the state for logging. Callbacks with a lower callback_order run first, so observables callbacks (order 0) are guaranteed to populate log_data before logger callbacks (order 10) read it.
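A minimal sketch of this hand-off, using hypothetical stand-in classes rather than real NetKet callbacks, and a plain dict in place of the driver's `log_data`:

```python
class StepParityObservable:
    """Hypothetical observables callback (order 0): adds an entry to log_data."""

    callback_order = 0

    def before_parameter_update(self, step, log_data, driver):
        log_data["step_is_even"] = step % 2 == 0


class ListLogger:
    """Hypothetical logger callback (order 10): snapshots log_data."""

    callback_order = 10

    def __init__(self):
        self.records = []

    def before_parameter_update(self, step, log_data, driver):
        self.records.append(dict(log_data))


logger = ListLogger()
# The logger is listed first on purpose: sorting by callback_order fixes the order.
callbacks = sorted([logger, StepParityObservable()], key=lambda cb: cb.callback_order)

log_data = {"loss": -1.5}
for cb in callbacks:
    cb.before_parameter_update(step=4, log_data=log_data, driver=None)

print(logger.records)  # [{'loss': -1.5, 'step_is_even': True}]
```

The logger sees the extra entry only because the observable ran first. With the order reversed, the snapshot would contain the loss alone.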

on_compute_update_end(step, log_data, driver)

Callback called at the end of the compute update phase, after computing the loss and its gradient.

This is called before the parameters are updated, so it can be used to implement custom logic for rejecting a step based on the computed loss or gradient.

Return type: bool
Returns: A boolean indicating whether to reject the step (i.e. repeat it with the same parameters). If it returns None, it is treated as False.
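The return convention can be sketched as follows. The callback classes, the float-valued `log_data["loss"]` entry, and the rule that a single rejecting callback rejects the step are assumptions for illustration only. The source states only that None is treated as False.

```python
import math


class RejectNonFiniteLoss:
    """Hypothetical callback: reject the step when the loss is NaN or infinite."""

    def on_compute_update_end(self, step, log_data, driver):
        return not math.isfinite(log_data["loss"])


class SilentCallback:
    """Hypothetical callback that returns nothing, i.e. None."""

    def on_compute_update_end(self, step, log_data, driver):
        return None


def step_rejected(callbacks, step, log_data, driver=None):
    # Call every hook first so none is skipped, then coerce None to False.
    results = [cb.on_compute_update_end(step, log_data, driver) for cb in callbacks]
    return any(bool(result) for result in results)


callbacks = [SilentCallback(), RejectNonFiniteLoss()]
print(step_rejected(callbacks, 0, {"loss": -1.0}))          # False
print(step_rejected(callbacks, 1, {"loss": float("nan")}))  # True
```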

on_compute_update_start(step, log_data, driver)

on_run_end(step, driver)

on_run_error(step, error, driver)

on_run_start(step, driver)

on_step_end(step, log_data, driver)

on_step_start(step, log_data, driver)

replace(**kwargs)

Replace the values of the fields of the object with the values of the keyword arguments. If the object is a dataclass, dataclasses.replace will be used. Otherwise, a new object will be created with the same type as the original object.

Return type:

TypeVar(P, bound=Pytree)

Parameters:

self (P)
kwargs (Any)
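For the dataclass case, the behaviour described above matches the standard library's `dataclasses.replace`, which returns a new object and leaves the original untouched. The `RequeueSettings` class below is a hypothetical stand-in that mirrors the two fields of this callback. It is not the NetKet class itself.

```python
from dataclasses import dataclass, replace
from datetime import timedelta


@dataclass(frozen=True)
class RequeueSettings:
    """Hypothetical stand-in with the same two fields as the callback."""

    before: timedelta = timedelta(seconds=300)
    max_requeue_count: int = 3


base = RequeueSettings()
patient = replace(base, max_requeue_count=10)

print(base.max_requeue_count, patient.max_requeue_count)  # 3 10
print(patient.before == base.before)                      # True
```

Fields that are not named in the keyword arguments are carried over unchanged, which makes this pattern convenient for deriving a variant of an existing configuration.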
