laplace.curvature
#
Classes:
- CurvatureInterface – Interface to access curvature for a model and corresponding likelihood.
- GGNInterface – Generalized Gauss-Newton or Fisher Curvature Interface.
- EFInterface – Interface for Empirical Fisher as Hessian approximation.
- AsdlInterface – Interface for asdfghjkl backend.
- AsdlGGN – Implementation of the GGNInterface using asdfghjkl.
- AsdlEF – Implementation of the EFInterface using asdfghjkl.
- AsdlHessian
- BackPackInterface – Interface for Backpack backend.
- BackPackGGN – Implementation of the GGNInterface using Backpack.
- BackPackEF – Implementation of EFInterface using Backpack.
- CurvlinopsInterface – Interface for Curvlinops backend. https://github.com/f-dangel/curvlinops
- CurvlinopsGGN – Implementation of the GGNInterface using Curvlinops.
- CurvlinopsEF – Implementation of EFInterface using Curvlinops.
- CurvlinopsHessian – Implementation of the full Hessian using Curvlinops.
CurvatureInterface
#
CurvatureInterface(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')
Interface to access curvature for a model and corresponding likelihood.
A concrete CurvatureInterface must inherit from this base class and implement the necessary functions jacobians, full, kron, and diag.
The interface might be extended in the future to account for other curvature
structures, for example, a block-diagonal one.
Parameters:
- model (torch.nn.Module or `laplace.utils.feature_extractor.FeatureExtractor`) – torch model (neural network)
- likelihood ({'classification', 'regression'}, default: 'classification')
- last_layer (bool, default: False) – only consider curvature of last layer
- subnetwork_indices (LongTensor, default: None) – indices of the vectorized model parameters that define the subnetwork to apply the Laplace approximation over
- dict_key_x (str, default: 'input_ids') – the dictionary key under which the input tensor x is stored. Only has effect when the model takes a MutableMapping as the input. Useful for Huggingface LLM models.
- dict_key_y (str, default: 'labels') – the dictionary key under which the target tensor y is stored. Only has effect when the model takes a MutableMapping as the input. Useful for Huggingface LLM models.
Attributes:
- lossfunc (MSELoss or CrossEntropyLoss)
- factor (float) – conversion factor between torch losses and base likelihoods. For example, \(\frac{1}{2}\) to get to \(\mathcal{N}(f, 1)\) from MSELoss (see the sketch below).
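To make the conversion factor concrete, here is a minimal plain-PyTorch sketch (not part of the interface; the tensors are made up) showing that half of a sum-reduced MSELoss equals the Gaussian negative log-likelihood with unit variance, up to an additive constant:

```python
import torch

f = torch.tensor([0.7, -1.2])  # model outputs
y = torch.tensor([1.0, -1.0])  # targets

mse = torch.nn.MSELoss(reduction="sum")(f, y)  # sum of squared errors
gauss_nll = 0.5 * ((f - y) ** 2).sum()         # -log N(y | f, 1), dropping the constant term

assert torch.isclose(0.5 * mse, gauss_nll)     # factor = 1/2 converts the torch loss
```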
Methods:
- jacobians – Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\), via torch.func.
- last_layer_jacobians – Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter.
- gradients – Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\).
- full – Compute a dense curvature (approximation) in the form of a \(P \times P\) matrix.
- kron – Compute a Kronecker factored curvature approximation (such as KFAC).
- diag – Compute a diagonal Hessian approximation to \(H\), represented as a vector.
Source code in laplace/curvature/curvature.py
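As a usage sketch (the toy regression model, data, and the choice of the CurvlinopsGGN backend documented further below are illustrative assumptions, and laplace-torch with its curvlinops dependency is assumed to be installed), a curvature interface is constructed from a model and a likelihood string and then queried for curvature quantities:

```python
import torch
import torch.nn as nn
from laplace.curvature import CurvlinopsGGN  # one concrete CurvatureInterface

# toy regression model and data, assumed for illustration
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 2))
x = torch.randn(8, 3)   # (batch, input_shape)
y = torch.randn(8, 2)   # (batch, outputs)

curv = CurvlinopsGGN(model, likelihood="regression")

# dense P x P curvature approximation (here: GGN) plus the converted loss
loss, H = curv.full(x, y)
print(loss.item(), H.shape)  # H is (parameters, parameters)
```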
jacobians
#
jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\), via torch.func.
Parameters:
- x (Tensor) – input data (batch, input_shape) on compatible device with model.
- enable_backprop (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x
Returns:
- Js (Tensor) – Jacobians (batch, parameters, outputs)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
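Continuing the sketch from the class description above (same model, x, and curv; shapes follow the documentation here):

```python
# Per-sample Jacobians of the model outputs w.r.t. all parameters.
Js, f = curv.jacobians(x)
print(Js.shape)  # documented as (batch, parameters, outputs)
print(f.shape)   # (batch, outputs)

# With enable_backprop=True, Js and f stay attached to the autograd graph of x,
# so further derivatives w.r.t. the input remain possible.
Js_bp, f_bp = curv.jacobians(x, enable_backprop=True)
```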
last_layer_jacobians
#
last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at current last-layer parameter \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- Js (Tensor) – Jacobians (batch, outputs, last-layer-parameters)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
gradients
#
Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\).
Parameters:
Returns:
- Gs (Tensor) – gradients (batch, parameters)
- loss (Tensor)
Source code in laplace/curvature/curvature.py
full
#
Compute a dense curvature (approximation) in the form of a \(P \times P\) matrix \(H\) with respect to parameters \(\theta \in \mathbb{R}^P\).
Parameters:
Returns:
- loss (Tensor)
- H (Tensor) – Hessian approximation (parameters, parameters)
Source code in laplace/curvature/curvature.py
kron
#
kron(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, N: int, **kwargs: dict[str, Any]) -> tuple[Tensor, Kron]
Compute a Kronecker factored curvature approximation (such as KFAC). The approximation to \(H\) takes the form of two Kronecker factors \(Q, H\), i.e., \(H \approx Q \otimes H\) for each Module in the neural network permitting such curvature. \(Q\) is quadratic in the input-dimension of a module \(p_{in} \times p_{in}\) and \(H\) in the output-dimension \(p_{out} \times p_{out}\).
Parameters:
- x (Tensor) – input data (batch, input_shape)
- y (Tensor) – labels (batch, label_shape)
- N (int) – total number of data points
Returns:
- loss (Tensor)
- H (`laplace.utils.matrix.Kron`) – Kronecker factored Hessian approximation.
Source code in laplace/curvature/curvature.py
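To make the factor shapes concrete, here is a small plain-PyTorch sketch (independent of any backend, with made-up factors) of the Kronecker-factored block belonging to the weight of a single nn.Linear(p_in, p_out) module; the exact factor ordering and scaling are backend details:

```python
import torch

p_in, p_out = 4, 3            # input/output dimension of one Linear module
A = torch.randn(p_in, p_in)
Q = A @ A.T                   # factor quadratic in the input dimension, (p_in, p_in)
B = torch.randn(p_out, p_out)
H_out = B @ B.T               # factor quadratic in the output dimension, (p_out, p_out)

block = torch.kron(Q, H_out)  # dense curvature block for this module's weight
print(block.shape)            # (p_in * p_out, p_in * p_out)
```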
diag
#
Compute a diagonal Hessian approximation to \(H\), represented as a vector of the dimensionality of parameters \(\theta\).
Parameters:
Returns:
- loss (Tensor)
- H (Tensor) – vector representing the diagonal of H
Source code in laplace/curvature/curvature.py
GGNInterface
#
GGNInterface(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels', stochastic: bool = False, num_samples: int = 1)
Bases: CurvatureInterface
Generalized Gauss-Newton or Fisher Curvature Interface.
The GGN is equal to the Fisher information for the available likelihoods.
In addition to CurvatureInterface, methods for Jacobians are required by subclasses.
Parameters:
- model (torch.nn.Module or `laplace.utils.feature_extractor.FeatureExtractor`) – torch model (neural network)
- likelihood ({'classification', 'regression'}, default: 'classification')
- last_layer (bool, default: False) – only consider curvature of last layer
- subnetwork_indices (Tensor, default: None) – indices of the vectorized model parameters that define the subnetwork to apply the Laplace approximation over
- dict_key_x (str, default: 'input_ids') – the dictionary key under which the input tensor x is stored. Only has effect when the model takes a MutableMapping as the input. Useful for Huggingface LLM models.
- dict_key_y (str, default: 'labels') – the dictionary key under which the target tensor y is stored. Only has effect when the model takes a MutableMapping as the input. Useful for Huggingface LLM models.
- stochastic (bool, default: False) – Fisher if stochastic else GGN
- num_samples (int, default: 1) – number of samples used to approximate the stochastic Fisher
Methods:
- jacobians – Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\), via torch.func.
- last_layer_jacobians – Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter.
- gradients – Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\).
- kron – Compute a Kronecker factored curvature approximation (such as KFAC).
- full – Compute the full GGN \(P \times P\) matrix as Hessian approximation.
Source code in laplace/curvature/curvature.py
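The GGN can be written directly in terms of the Jacobians: \(H_\textrm{GGN} = \sum_n J_n^\top \Lambda(f_n) J_n\), where \(\Lambda\) is the Hessian of the negative log-likelihood with respect to the model output (the identity for a unit-variance Gaussian likelihood). A minimal plain-PyTorch sketch for that regression case, with toy Jacobians assumed to be laid out as (batch, outputs, parameters):

```python
import torch

torch.manual_seed(0)
batch, outputs, params = 8, 2, 10
Js = torch.randn(batch, outputs, params)  # toy per-sample Jacobians (assumed layout)

# Unit-variance Gaussian likelihood: Lambda is the identity, so the GGN is the
# sum over the batch of J_n^T J_n.
H_ggn = torch.einsum("nop,noq->pq", Js, Js)
print(H_ggn.shape)                        # (params, params), symmetric and PSD
```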
jacobians
#
jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\), via torch.func.
Parameters:
- x (Tensor) – input data (batch, input_shape) on compatible device with model.
- enable_backprop (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x
Returns:
- Js (Tensor) – Jacobians (batch, parameters, outputs)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
last_layer_jacobians
#
last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at current last-layer parameter \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- Js (Tensor) – Jacobians (batch, outputs, last-layer-parameters)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
gradients
#
Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\).
Parameters:
Returns:
- Gs (Tensor) – gradients (batch, parameters)
- loss (Tensor)
Source code in laplace/curvature/curvature.py
kron
#
kron(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, N: int, **kwargs: dict[str, Any]) -> tuple[Tensor, Kron]
Compute a Kronecker factored curvature approximation (such as KFAC). The approximation to \(H\) takes the form of two Kronecker factors \(Q, H\), i.e., \(H \approx Q \otimes H\) for each Module in the neural network permitting such curvature. \(Q\) is quadratic in the input-dimension of a module \(p_{in} \times p_{in}\) and \(H\) in the output-dimension \(p_{out} \times p_{out}\).
Parameters:
- x (Tensor) – input data (batch, input_shape)
- y (Tensor) – labels (batch, label_shape)
- N (int) – total number of data points
Returns:
- loss (Tensor)
- H (`laplace.utils.matrix.Kron`) – Kronecker factored Hessian approximation.
Source code in laplace/curvature/curvature.py
_get_mc_functional_fisher
#
Approximate the Fisher's middle matrix (expected outer product of the functional gradient)
using MC integral with self.num_samples many samples.
Source code in laplace/curvature/curvature.py
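As a rough illustration of what this Monte Carlo estimate does for the categorical likelihood (a plain-PyTorch sketch with made-up logits, not the backend's actual implementation): labels are sampled from the model's own predictive distribution, and the outer products of the resulting functional gradients are averaged.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
f = torch.randn(4, 3)          # logits for a batch of 4 with 3 classes
n_mc = 10                      # plays the role of self.num_samples

p = F.softmax(f, dim=-1)
middle = torch.zeros(4, 3, 3)  # per-sample estimate of the Fisher's middle matrix
for _ in range(n_mc):
    y_s = torch.multinomial(p, num_samples=1).squeeze(-1)  # y ~ Cat(p)
    g = p - F.one_hot(y_s, num_classes=3).float()          # grad of -log p(y|f) w.r.t. f
    middle += torch.einsum("ni,nj->nij", g, g) / n_mc

print(middle.shape)            # (batch, outputs, outputs); converges to diag(p) - p p^T
```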
full
#
full(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, **kwargs: dict[str, Any]) -> tuple[Tensor, Tensor]
Compute the full GGN \(P \times P\) matrix as Hessian approximation \(H_{ggn}\) with respect to parameters \(\theta \in \mathbb{R}^P\). For last-layer, reduced to \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- loss (Tensor)
- H (Tensor) – GGN (parameters, parameters)
Source code in laplace/curvature/curvature.py
EFInterface
#
EFInterface(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')
Bases: CurvatureInterface
Interface for Empirical Fisher as Hessian approximation.
In addition to CurvatureInterface, methods for gradients are required by subclasses.
Parameters:
- model (torch.nn.Module or `laplace.utils.feature_extractor.FeatureExtractor`) – torch model (neural network)
- likelihood ({'classification', 'regression'}, default: 'classification')
- last_layer (bool, default: False) – only consider curvature of last layer
- subnetwork_indices (Tensor, default: None) – indices of the vectorized model parameters that define the subnetwork to apply the Laplace approximation over
- dict_key_x (str, default: 'input_ids') – the dictionary key under which the input tensor x is stored. Only has effect when the model takes a MutableMapping as the input. Useful for Huggingface LLM models.
- dict_key_y (str, default: 'labels') – the dictionary key under which the target tensor y is stored. Only has effect when the model takes a MutableMapping as the input. Useful for Huggingface LLM models.
Attributes:
- lossfunc (MSELoss or CrossEntropyLoss)
- factor (float) – conversion factor between torch losses and base likelihoods. For example, \(\frac{1}{2}\) to get to \(\mathcal{N}(f, 1)\) from MSELoss.
Methods:
- jacobians – Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\), via torch.func.
- last_layer_jacobians – Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter.
- gradients – Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\).
- kron – Compute a Kronecker factored curvature approximation (such as KFAC).
- full – Compute the full EF \(P \times P\) matrix as Hessian approximation.
Source code in laplace/curvature/curvature.py
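The empirical Fisher is simply the sum of outer products of per-sample loss gradients, \(H_\textrm{EF} = \sum_n g_n g_n^\top\) with \(g_n = \nabla_\theta \ell(f(x_n;\theta), y_n)\). A minimal plain-PyTorch sketch with toy per-sample gradients Gs of shape (batch, parameters), the layout documented for gradients:

```python
import torch

torch.manual_seed(0)
batch, params = 8, 10
Gs = torch.randn(batch, params)           # toy per-sample gradients

H_ef = torch.einsum("np,nq->pq", Gs, Gs)  # sum_n g_n g_n^T
print(H_ef.shape)                         # (params, params)
```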
jacobians
#
jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\), via torch.func.
Parameters:
- x (Tensor) – input data (batch, input_shape) on compatible device with model.
- enable_backprop (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x
Returns:
- Js (Tensor) – Jacobians (batch, parameters, outputs)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
last_layer_jacobians
#
last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at current last-layer parameter \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- Js (Tensor) – Jacobians (batch, outputs, last-layer-parameters)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
gradients
#
Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\).
Parameters:
Returns:
- Gs (Tensor) – gradients (batch, parameters)
- loss (Tensor)
Source code in laplace/curvature/curvature.py
kron
#
kron(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, N: int, **kwargs: dict[str, Any]) -> tuple[Tensor, Kron]
Compute a Kronecker factored curvature approximation (such as KFAC). The approximation to \(H\) takes the form of two Kronecker factors \(Q, H\), i.e., \(H \approx Q \otimes H\) for each Module in the neural network permitting such curvature. \(Q\) is quadratic in the input-dimension of a module \(p_{in} \times p_{in}\) and \(H\) in the output-dimension \(p_{out} \times p_{out}\).
Parameters:
- x (Tensor) – input data (batch, input_shape)
- y (Tensor) – labels (batch, label_shape)
- N (int) – total number of data points
Returns:
- loss (Tensor)
- H (`laplace.utils.matrix.Kron`) – Kronecker factored Hessian approximation.
Source code in laplace/curvature/curvature.py
full
#
full(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, **kwargs: dict[str, Any]) -> tuple[Tensor, Tensor]
Compute the full EF \(P \times P\) matrix as Hessian approximation \(H_{ef}\) with respect to parameters \(\theta \in \mathbb{R}^P\). For last-layer, reduced to \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- loss (Tensor)
- H_ef (Tensor) – EF (parameters, parameters)
Source code in laplace/curvature/curvature.py
AsdlInterface
#
AsdlInterface(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')
Bases: CurvatureInterface
Interface for asdfghjkl backend.
Methods:
- last_layer_jacobians – Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter.
- full – Compute a dense curvature (approximation) in the form of a \(P \times P\) matrix.
- jacobians – Compute Jacobians \(\nabla_\theta f(x;\theta)\) at current parameter \(\theta\) using asdfghjkl.
- gradients – Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\) using asdfghjkl.
Source code in laplace/curvature/asdl.py
last_layer_jacobians
#
last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at current last-layer parameter \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- Js (Tensor) – Jacobians (batch, outputs, last-layer-parameters)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
full
#
Compute a dense curvature (approximation) in the form of a \(P \times P\) matrix \(H\) with respect to parameters \(\theta \in \mathbb{R}^P\).
Parameters:
Returns:
- loss (Tensor)
- H (Tensor) – Hessian approximation (parameters, parameters)
Source code in laplace/curvature/curvature.py
jacobians
#
jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_\theta f(x;\theta)\) at current parameter \(\theta\) using asdfghjkl's gradient per output dimension.
Parameters:
- x (Tensor or MutableMapping, e.g. dict or UserDict) – input data (batch, input_shape) on compatible device with model if torch.Tensor. If MutableMapping, then it at least contains self.dict_key_x. The latter is specific for reward modeling.
- enable_backprop (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x
Returns:
- Js (Tensor) – Jacobians (batch, parameters, outputs)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/asdl.py
gradients
#
Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\) using asdfghjkl's backend.
Parameters:
Returns:
- loss (Tensor)
- Gs (Tensor) – gradients (batch, parameters)
Source code in laplace/curvature/asdl.py
_get_batch_size
#
_get_batch_size(x: Tensor | MutableMapping[str, Tensor | Any]) -> int | None
ASDL assumes that all leading dimensions are the batch size by default (batch_size = None). Here, we want to specify that only the first dimension is the actual batch size. This is the case for LLMs.
Source code in laplace/curvature/asdl.py
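For illustration, the MutableMapping case is a Huggingface-style batch: a dict whose dict_key_x entry ('input_ids' by default) is a tensor whose first dimension is the batch size, which is exactly the dimension _get_batch_size reports. A hedged sketch follows; the wrapper model and its forward signature are assumptions for illustration, not part of this module.

```python
import torch
import torch.nn as nn

class ToySequenceClassifier(nn.Module):
    """Stand-in for a Huggingface-style model that consumes a dict batch."""

    def __init__(self, vocab_size: int = 100, n_classes: int = 3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, 8)
        self.head = nn.Linear(8, n_classes)

    def forward(self, batch):
        h = self.emb(batch["input_ids"]).mean(dim=1)  # pool over the sequence dimension
        return self.head(h)                           # (batch, n_classes)

batch = {
    "input_ids": torch.randint(0, 100, (4, 12)),  # (batch, seq_len): only dim 0 is the batch size
    "labels": torch.randint(0, 3, (4,)),
}
```

A backend constructed with dict_key_x='input_ids' and dict_key_y='labels' (the defaults) would read the inputs and targets from exactly these keys.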
AsdlGGN
#
AsdlGGN(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels', stochastic: bool = False)
Bases: AsdlInterface, GGNInterface
Implementation of the GGNInterface using asdfghjkl.
Methods:
- jacobians – Compute Jacobians \(\nabla_\theta f(x;\theta)\) at current parameter \(\theta\) using asdfghjkl.
- last_layer_jacobians – Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter.
- gradients – Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\) using asdfghjkl.
- full – Compute the full GGN \(P \times P\) matrix as Hessian approximation.
Source code in laplace/curvature/asdl.py
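A minimal usage sketch (the toy classifier and data are made up, the asdfghjkl dependency used by this backend is assumed to be installed, and kron is taken from the interface documented above):

```python
import torch
import torch.nn as nn
from laplace.curvature import AsdlGGN

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(5, 20), nn.ReLU(), nn.Linear(20, 3))
x = torch.randn(16, 5)
y = torch.randint(0, 3, (16,))

curv = AsdlGGN(model, likelihood="classification", stochastic=False)

# Kronecker-factored (KFAC-style) GGN; N is the total number of data points.
loss, H_kron = curv.kron(x, y, N=16)
print(loss.item(), type(H_kron).__name__)  # H_kron is a laplace.utils.matrix.Kron
```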
jacobians
#
jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_\theta f(x;\theta)\) at current parameter \(\theta\) using asdfghjkl's gradient per output dimension.
Parameters:
- x (Tensor or MutableMapping, e.g. dict or UserDict) – input data (batch, input_shape) on compatible device with model if torch.Tensor. If MutableMapping, then it at least contains self.dict_key_x. The latter is specific for reward modeling.
- enable_backprop (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x
Returns:
- Js (Tensor) – Jacobians (batch, parameters, outputs)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/asdl.py
last_layer_jacobians
#
last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at current last-layer parameter \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- Js (Tensor) – Jacobians (batch, outputs, last-layer-parameters)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
gradients
#
Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\) using asdfghjkl's backend.
Parameters:
Returns:
- loss (Tensor)
- Gs (Tensor) – gradients (batch, parameters)
Source code in laplace/curvature/asdl.py
full
#
full(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, **kwargs: dict[str, Any]) -> tuple[Tensor, Tensor]
Compute the full GGN \(P \times P\) matrix as Hessian approximation \(H_{ggn}\) with respect to parameters \(\theta \in \mathbb{R}^P\). For last-layer, reduced to \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- loss (Tensor)
- H (Tensor) – GGN (parameters, parameters)
Source code in laplace/curvature/curvature.py
_get_mc_functional_fisher
#
Approximate the Fisher's middle matrix (expected outer product of the functional gradient)
using MC integral with self.num_samples many samples.
Source code in laplace/curvature/curvature.py
_get_batch_size
#
_get_batch_size(x: Tensor | MutableMapping[str, Tensor | Any]) -> int | None
ASDL assumes that all leading dimensions are the batch size by default (batch_size = None). Here, we want to specify that only the first dimension is the actual batch size. This is the case for LLMs.
Source code in laplace/curvature/asdl.py
AsdlEF
#
AsdlEF(model: Module, likelihood: Likelihood | str, last_layer: bool = False, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')
Bases: AsdlInterface, EFInterface
Implementation of the EFInterface using asdfghjkl.
Methods:
- jacobians – Compute Jacobians \(\nabla_\theta f(x;\theta)\) at current parameter \(\theta\) using asdfghjkl.
- last_layer_jacobians – Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter.
- gradients – Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\) using asdfghjkl.
- full – Compute the full EF \(P \times P\) matrix as Hessian approximation.
Source code in laplace/curvature/asdl.py
jacobians
#
jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_\theta f(x;\theta)\) at current parameter \(\theta\) using asdfghjkl's gradient per output dimension.
Parameters:
- x (Tensor or MutableMapping, e.g. dict or UserDict) – input data (batch, input_shape) on compatible device with model if torch.Tensor. If MutableMapping, then it at least contains self.dict_key_x. The latter is specific for reward modeling.
- enable_backprop (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x
Returns:
- Js (Tensor) – Jacobians (batch, parameters, outputs)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/asdl.py
last_layer_jacobians
#
last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at current last-layer parameter \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- Js (Tensor) – Jacobians (batch, outputs, last-layer-parameters)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
gradients
#
Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\) using asdfghjkl's backend.
Parameters:
Returns:
- loss (Tensor)
- Gs (Tensor) – gradients (batch, parameters)
Source code in laplace/curvature/asdl.py
full
#
full(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, **kwargs: dict[str, Any]) -> tuple[Tensor, Tensor]
Compute the full EF \(P \times P\) matrix as Hessian approximation \(H_{ef}\) with respect to parameters \(\theta \in \mathbb{R}^P\). For last-layer, reduced to \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- loss (Tensor)
- H_ef (Tensor) – EF (parameters, parameters)
Source code in laplace/curvature/curvature.py
_get_batch_size
#
_get_batch_size(x: Tensor | MutableMapping[str, Tensor | Any]) -> int | None
ASDL assumes that all leading dimensions are the batch size by default (batch_size = None). Here, we want to specify that only the first dimension is the actual batch size. This is the case for LLMs.
Source code in laplace/curvature/asdl.py
AsdlHessian
#
AsdlHessian(model: Module, likelihood: Likelihood | str, last_layer: bool = False, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')
Bases: AsdlInterface
Methods:
- jacobians – Compute Jacobians \(\nabla_\theta f(x;\theta)\) at current parameter \(\theta\) using asdfghjkl.
- last_layer_jacobians – Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter.
- gradients – Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\) using asdfghjkl.
Source code in laplace/curvature/asdl.py
jacobians
#
jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_\theta f(x;\theta)\) at current parameter \(\theta\) using asdfghjkl's gradient per output dimension.
Parameters:
- x (Tensor or MutableMapping, e.g. dict or UserDict) – input data (batch, input_shape) on compatible device with model if torch.Tensor. If MutableMapping, then it at least contains self.dict_key_x. The latter is specific for reward modeling.
- enable_backprop (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x
Returns:
- Js (Tensor) – Jacobians (batch, parameters, outputs)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/asdl.py
last_layer_jacobians
#
last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at current last-layer parameter \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- Js (Tensor) – Jacobians (batch, outputs, last-layer-parameters)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
gradients
#
Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\) using asdfghjkl's backend.
Parameters:
Returns:
- loss (Tensor)
- Gs (Tensor) – gradients (batch, parameters)
Source code in laplace/curvature/asdl.py
_get_batch_size
#
_get_batch_size(x: Tensor | MutableMapping[str, Tensor | Any]) -> int | None
ASDL assumes that all leading dimensions are the batch size by default (batch_size = None). Here, we want to specify that only the first dimension is the actual batch size. This is the case for LLMs.
Source code in laplace/curvature/asdl.py
BackPackInterface
#
BackPackInterface(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')
Bases: CurvatureInterface
Interface for Backpack backend.
Methods:
- last_layer_jacobians – Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter.
- full – Compute a dense curvature (approximation) in the form of a \(P \times P\) matrix.
- kron – Compute a Kronecker factored curvature approximation (such as KFAC).
- diag – Compute a diagonal Hessian approximation to \(H\), represented as a vector.
- jacobians – Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\) using BackPACK's BatchGrad.
- gradients – Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\) using BackPACK's BatchGrad.
Source code in laplace/curvature/backpack.py
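A minimal usage sketch (toy model and data; the backpack-for-pytorch dependency used by this backend is assumed to be installed, and the concrete subclass BackPackGGN documented below is used):

```python
import torch
import torch.nn as nn
from laplace.curvature import BackPackGGN

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 10), nn.Tanh(), nn.Linear(10, 2))
x = torch.randn(8, 4)
y = torch.randn(8, 2)

curv = BackPackGGN(model, likelihood="regression", stochastic=False)

# Diagonal GGN approximation: one curvature entry per parameter.
loss, H_diag = curv.diag(x, y)
print(H_diag.shape)  # (parameters,)
```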
last_layer_jacobians
#
last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at current last-layer parameter \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- Js (Tensor) – Jacobians (batch, outputs, last-layer-parameters)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
full
#
Compute a dense curvature (approximation) in the form of a \(P \times P\) matrix \(H\) with respect to parameters \(\theta \in \mathbb{R}^P\).
Parameters:
Returns:
- loss (Tensor)
- H (Tensor) – Hessian approximation (parameters, parameters)
Source code in laplace/curvature/curvature.py
kron
#
kron(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, N: int, **kwargs: dict[str, Any]) -> tuple[Tensor, Kron]
Compute a Kronecker factored curvature approximation (such as KFAC). The approximation to \(H\) takes the form of two Kronecker factors \(Q, H\), i.e., \(H \approx Q \otimes H\) for each Module in the neural network permitting such curvature. \(Q\) is quadratic in the input-dimension of a module \(p_{in} \times p_{in}\) and \(H\) in the output-dimension \(p_{out} \times p_{out}\).
Parameters:
- x (Tensor) – input data (batch, input_shape)
- y (Tensor) – labels (batch, label_shape)
- N (int) – total number of data points
Returns:
- loss (Tensor)
- H (`laplace.utils.matrix.Kron`) – Kronecker factored Hessian approximation.
Source code in laplace/curvature/curvature.py
diag
#
Compute a diagonal Hessian approximation to \(H\), represented as a vector of the dimensionality of parameters \(\theta\).
Parameters:
Returns:
- loss (Tensor)
- H (Tensor) – vector representing the diagonal of H
Source code in laplace/curvature/curvature.py
jacobians
#
jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\) using backpack's BatchGrad per output dimension. Note that BackPACK doesn't play well with torch.func, so this method has to be overridden.
Parameters:
- x (Tensor) – input data (batch, input_shape) on compatible device with model.
- enable_backprop (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x
Returns:
- Js (Tensor) – Jacobians (batch, parameters, outputs)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/backpack.py
gradients
#
Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\) using BackPACK's BatchGrad. Note that BackPACK doesn't play well with torch.func, so this method has to be overridden.
Parameters:
Returns:
- Gs (Tensor) – gradients (batch, parameters)
- loss (Tensor)
Source code in laplace/curvature/backpack.py
BackPackGGN
#
BackPackGGN(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels', stochastic: bool = False)
Bases: BackPackInterface, GGNInterface
Implementation of the GGNInterface using Backpack.
Methods:
- jacobians – Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\) using BackPACK's BatchGrad.
- last_layer_jacobians – Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter.
- gradients – Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\) using BackPACK's BatchGrad.
- full – Compute the full GGN \(P \times P\) matrix as Hessian approximation.
Source code in laplace/curvature/backpack.py
jacobians
#
jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\) using backpack's BatchGrad per output dimension. Note that BackPACK doesn't play well with torch.func, so this method has to be overridden.
Parameters:
- x (Tensor) – input data (batch, input_shape) on compatible device with model.
- enable_backprop (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x
Returns:
- Js (Tensor) – Jacobians (batch, parameters, outputs)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/backpack.py
last_layer_jacobians
#
last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at current last-layer parameter \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- Js (Tensor) – Jacobians (batch, outputs, last-layer-parameters)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
gradients
#
Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\) using BackPACK's BatchGrad. Note that BackPACK doesn't play well with torch.func, so this method has to be overridden.
Parameters:
Returns:
- Gs (Tensor) – gradients (batch, parameters)
- loss (Tensor)
Source code in laplace/curvature/backpack.py
full
#
full(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, **kwargs: dict[str, Any]) -> tuple[Tensor, Tensor]
Compute the full GGN \(P \times P\) matrix as Hessian approximation \(H_{ggn}\) with respect to parameters \(\theta \in \mathbb{R}^P\). For last-layer, reduced to \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- loss (Tensor)
- H (Tensor) – GGN (parameters, parameters)
Source code in laplace/curvature/curvature.py
_get_mc_functional_fisher
#
Approximate the Fisher's middle matrix (expected outer product of the functional gradient)
using MC integral with self.num_samples many samples.
Source code in laplace/curvature/curvature.py
BackPackEF
#
BackPackEF(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')
Bases: BackPackInterface, EFInterface
Implementation of EFInterface using Backpack.
Methods:
- jacobians – Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\) using BackPACK's BatchGrad.
- last_layer_jacobians – Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter.
- gradients – Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\) using BackPACK's BatchGrad.
- full – Compute the full EF \(P \times P\) matrix as Hessian approximation.
Source code in laplace/curvature/backpack.py
jacobians
#
jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\) using backpack's BatchGrad per output dimension. Note that BackPACK doesn't play well with torch.func, so this method has to be overridden.
Parameters:
- x (Tensor) – input data (batch, input_shape) on compatible device with model.
- enable_backprop (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x
Returns:
- Js (Tensor) – Jacobians (batch, parameters, outputs)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/backpack.py
last_layer_jacobians
#
last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at current last-layer parameter \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- Js (Tensor) – Jacobians (batch, outputs, last-layer-parameters)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
gradients
#
Compute gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\) using BackPACK's BatchGrad. Note that BackPACK doesn't play well with torch.func, so this method has to be overridden.
Parameters:
Returns:
- Gs (Tensor) – gradients (batch, parameters)
- loss (Tensor)
Source code in laplace/curvature/backpack.py
full
#
full(x: Tensor | MutableMapping[str, Tensor | Any], y: Tensor, **kwargs: dict[str, Any]) -> tuple[Tensor, Tensor]
Compute the full EF \(P \times P\) matrix as Hessian approximation \(H_{ef}\) with respect to parameters \(\theta \in \mathbb{R}^P\). For last-layer, reduced to \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- loss (Tensor)
- H_ef (Tensor) – EF (parameters, parameters)
Source code in laplace/curvature/curvature.py
CurvlinopsInterface
#
CurvlinopsInterface(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')
Bases: CurvatureInterface
Interface for Curvlinops backend. https://github.com/f-dangel/curvlinops
Methods:
- jacobians – Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\), via torch.func.
- last_layer_jacobians – Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter.
- gradients – Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\).
- diag – Compute a diagonal Hessian approximation to \(H\), represented as a vector.
Source code in laplace/curvature/curvlinops.py
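A minimal usage sketch for this backend (toy classifier and data; the curvlinops package is assumed to be installed, and the dense full method inherited from the interface is used via the CurvlinopsEF subclass documented below):

```python
import torch
import torch.nn as nn
from laplace.curvature import CurvlinopsEF

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(6, 12), nn.ReLU(), nn.Linear(12, 4))
x = torch.randn(32, 6)
y = torch.randint(0, 4, (32,))

curv = CurvlinopsEF(model, likelihood="classification")

# Dense empirical-Fisher approximation of the Hessian.
loss, H_ef = curv.full(x, y)
print(loss.item(), H_ef.shape)  # (parameters, parameters)
```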
jacobians
#
jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\), via torch.func.
Parameters:
- x (Tensor) – input data (batch, input_shape) on compatible device with model.
- enable_backprop (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x
Returns:
- Js (Tensor) – Jacobians (batch, parameters, outputs)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
last_layer_jacobians
#
last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at current last-layer parameter \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- Js (Tensor) – Jacobians (batch, outputs, last-layer-parameters)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
gradients
#
Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\).
Parameters:
Returns:
- Gs (Tensor) – gradients (batch, parameters)
- loss (Tensor)
Source code in laplace/curvature/curvature.py
diag
#
Compute a diagonal Hessian approximation to \(H\), represented as a vector of the dimensionality of parameters \(\theta\).
Parameters:
Returns:
- loss (Tensor)
- H (Tensor) – vector representing the diagonal of H
Source code in laplace/curvature/curvature.py
CurvlinopsGGN
#
CurvlinopsGGN(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels', stochastic: bool = False)
Bases: CurvlinopsInterface, GGNInterface
Implementation of the GGNInterface using Curvlinops.
Methods:
- jacobians – Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\), via torch.func.
- last_layer_jacobians – Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter.
- gradients – Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\).
Source code in laplace/curvature/curvlinops.py
jacobians
#
jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\), via torch.func.
Parameters:
- x (Tensor) – input data (batch, input_shape) on compatible device with model.
- enable_backprop (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x
Returns:
- Js (Tensor) – Jacobians (batch, parameters, outputs)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
last_layer_jacobians
#
last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at current last-layer parameter \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- Js (Tensor) – Jacobians (batch, outputs, last-layer-parameters)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
gradients
#
Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\).
Parameters:
Returns:
- Gs (Tensor) – gradients (batch, parameters)
- loss (Tensor)
Source code in laplace/curvature/curvature.py
_get_mc_functional_fisher
#
Approximate the Fisher's middle matrix (expected outer product of the functional gradient)
using MC integral with self.num_samples many samples.
Source code in laplace/curvature/curvature.py
CurvlinopsEF
#
CurvlinopsEF(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')
Bases: CurvlinopsInterface, EFInterface
Implementation of EFInterface using Curvlinops.
Methods:
- jacobians – Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\), via torch.func.
- last_layer_jacobians – Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter.
- gradients – Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\).
Source code in laplace/curvature/curvlinops.py
jacobians
#
jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\), via torch.func.
Parameters:
- x (Tensor) – input data (batch, input_shape) on compatible device with model.
- enable_backprop (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x
Returns:
- Js (Tensor) – Jacobians (batch, parameters, outputs)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
last_layer_jacobians
#
last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at current last-layer parameter \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- Js (Tensor) – Jacobians (batch, outputs, last-layer-parameters)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
gradients
#
Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\).
Parameters:
Returns:
- Gs (Tensor) – gradients (batch, parameters)
- loss (Tensor)
Source code in laplace/curvature/curvature.py
CurvlinopsHessian
#
CurvlinopsHessian(model: Module, likelihood: Likelihood | str, last_layer: bool = False, subnetwork_indices: LongTensor | None = None, dict_key_x: str = 'input_ids', dict_key_y: str = 'labels')
Bases: CurvlinopsInterface
Implementation of the full Hessian using Curvlinops.
Methods:
- jacobians – Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\), via torch.func.
- last_layer_jacobians – Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at the current last-layer parameter.
- gradients – Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\).
- diag – Compute a diagonal Hessian approximation to \(H\), represented as a vector.
Source code in laplace/curvature/curvlinops.py
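A minimal usage sketch (toy regression model and data; the curvlinops package is assumed to be installed, and full is inherited from the interface). Unlike the GGN and EF variants, this backend targets the full Hessian of the loss, which is not guaranteed to be positive semi-definite:

```python
import torch
import torch.nn as nn
from laplace.curvature import CurvlinopsHessian

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 8), nn.Tanh(), nn.Linear(8, 1))
x = torch.randn(16, 3)
y = torch.randn(16, 1)

curv = CurvlinopsHessian(model, likelihood="regression")

loss, H = curv.full(x, y)  # dense (parameters, parameters) Hessian
print(loss.item(), H.shape)
```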
jacobians
#
jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta} f(x;\theta)\) at current parameter \(\theta\), via torch.func.
Parameters:
- x (Tensor) – input data (batch, input_shape) on compatible device with model.
- enable_backprop (bool, default: False) – whether to enable backprop through the Js and f w.r.t. x
Returns:
- Js (Tensor) – Jacobians (batch, parameters, outputs)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
last_layer_jacobians
#
last_layer_jacobians(x: Tensor | MutableMapping[str, Tensor | Any], enable_backprop: bool = False) -> tuple[Tensor, Tensor]
Compute Jacobians \(\nabla_{\theta_\textrm{last}} f(x;\theta_\textrm{last})\) only at current last-layer parameter \(\theta_{\textrm{last}}\).
Parameters:
Returns:
- Js (Tensor) – Jacobians (batch, outputs, last-layer-parameters)
- f (Tensor) – output function (batch, outputs)
Source code in laplace/curvature/curvature.py
gradients
#
Compute batch gradients \(\nabla_\theta \ell(f(x;\theta), y)\) at current parameter \(\theta\).
Parameters:
Returns:
- Gs (Tensor) – gradients (batch, parameters)
- loss (Tensor)
Source code in laplace/curvature/curvature.py
diag
#
Compute a diagonal Hessian approximation to \(H\), represented as a vector of the dimensionality of parameters \(\theta\).
Parameters:
Returns:
- loss (Tensor)
- H (Tensor) – vector representing the diagonal of H