lamb_update(ws, dcdws, lr, mw_tm1, vw_tm1, step, beta1=0.9, beta2=0.999, epsilon=1e-07, max_trust_ratio=10, decay_lambda=0, inplace=True, stop_gradients=True)
Update the weights ws of a function, given the derivatives of a cost c with respect to ws, [dc/dw for w in ws], by applying the LAMB method.
ws (container of variables) – Weights of the function to be updated.
dcdws (container of arrays) – Derivatives of the cost c with respect to the weights ws, [dc/dw for w in ws].
lr (float or container of layer-wise rates) – Learning rate(s), the rate(s) at which the weights should be updated relative to the gradient.
mw_tm1 (container of arrays) – Running average of the gradients, from the previous time-step.
vw_tm1 (container of arrays) – Running average of the second moments of the gradients, from the previous time-step.
step (int) – The current training step.
beta1 (float) – Forgetting factor for the running average of the gradients. Default is 0.9.
beta2 (float) – Forgetting factor for the running average of the second moments of the gradients. Default is 0.999.
epsilon (float) – Small constant added to the divisor during the Adam update, preventing division by zero. Default is 1e-07.
max_trust_ratio (float, optional) – The maximum value for the trust ratio. Default is 10.
decay_lambda (float) – The factor used for weight decay. Default is zero.
inplace (bool, optional) – Whether to perform the operation inplace, for backends which support inplace variable updates and handle gradients behind the scenes, such as PyTorch. If the update step should form part of a computation graph (i.e. higher order optimization), then this should be set to False. Default is True.
stop_gradients (bool, optional) – Whether to stop the gradients of the variables after each gradient step. Default is True.
The new function weights ws_new, following the LAMB updates.
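To make the roles of the parameters above more concrete, the following is a minimal NumPy sketch of one LAMB step for a single weight array, following the commonly described formulation (Adam moment estimates, then a layer-wise trust ratio clipped at max_trust_ratio). It is an illustration under stated assumptions, not the library's implementation: the function name `lamb_step_sketch` is hypothetical, the exact placement of epsilon and of the weight-decay term may differ from what lamb_update does internally, and step is assumed to start at 1.

```python
import numpy as np


def lamb_step_sketch(w, dcdw, lr, mw_tm1, vw_tm1, step,
                     beta1=0.9, beta2=0.999, epsilon=1e-7,
                     max_trust_ratio=10.0, decay_lambda=0.0):
    """Hypothetical single-array LAMB step (a sketch, not the library code)."""
    # Running averages of the gradient and of its element-wise square.
    mw = beta1 * mw_tm1 + (1.0 - beta1) * dcdw
    vw = beta2 * vw_tm1 + (1.0 - beta2) * dcdw ** 2

    # Bias correction; assumes step >= 1 so the denominators are non-zero.
    mw_hat = mw / (1.0 - beta1 ** step)
    vw_hat = vw / (1.0 - beta2 ** step)

    # Adam direction, with epsilon guarding against division by zero.
    adam_dir = mw_hat / (np.sqrt(vw_hat) + epsilon)

    # Assumption: weight decay is added to the update direction.
    update = adam_dir + decay_lambda * w

    # Trust ratio: weight norm over update norm, clipped from above.
    r1 = np.linalg.norm(w)
    r2 = np.linalg.norm(update)
    trust_ratio = min(r1 / (r2 + epsilon), max_trust_ratio)

    # Scale the learning rate by the trust ratio and take the step.
    w_new = w - lr * trust_ratio * update
    return w_new, mw, vw


# Usage: one step on a toy weight vector, starting from zero moments.
w = np.array([3.0, 4.0])
dcdw = np.array([1.0, 1.0])
w_new, mw, vw = lamb_step_sketch(w, dcdw, lr=0.1,
                                 mw_tm1=np.zeros(2), vw_tm1=np.zeros(2),
                                 step=1)
```

For a container of weights, the same step would be applied to each array independently, which is what makes the trust ratio layer-wise. The returned mw and vw would be passed back in as mw_tm1 and vw_tm1 on the next step.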