clawdia.dictionaries#
Main module for managing all SDL models.
This module serves as the central interface for handling dictionary models included in the CLAWDIA pipeline. It provides classes and functions to load, save, and manage different types of dictionary models used in Sparse Dictionary Learning (SDL). Support is included for both SPAMS-based dictionaries and Low-Rank Shared Dictionary Learning (LRSDL) models, ensuring compatibility and ease of use.
- class clawdia.dictionaries.DictionaryLRSDL(lambd=0.01, lambd2=0.01, eta=0.0001, k=10, k0=5, updateX_iters=100, updateD_iters=100)[source]#
Bases: LRSDL
Interface for the Low-Rank Shared Dictionary Learning class.
Notes
The authors of Dictol didn’t provide a seed parameter for the random initialization of the dictionary. If reproducibility is important, one must set NumPy’s global seed before calling LRSDL.__init__().
References
[1]Vu, T. H.; Monga, V. (2017). Fast low-rank shared dictionary learning for image classification, IEEE Transactions on Image Processing, 26(11), 5160–5175. (https://doi.org/10.1109/TIP.2017.2729885)
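Since the initial dictionary is drawn from NumPy’s global random state, seeding must happen before the constructor runs. The `DictionaryLRSDL` call in the comment is illustrative; the snippet only demonstrates the global-seed behaviour itself:

```python
import numpy as np

# Set the GLOBAL seed BEFORE constructing the model, e.g.:
#   np.random.seed(42)
#   dico = DictionaryLRSDL(lambd=0.01, k=10, k0=5)
# Re-seeding reproduces the same draws, hence the same initial dictionary:
np.random.seed(42)
a = np.random.rand(4)
np.random.seed(42)
b = np.random.rand(4)
print(np.array_equal(a, b))  # True
```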
- Attributes:
- t_train : float
Training time in seconds.
- lambd : float
See self.__init__() for details.
- lambd2 : float
See self.__init__() for details.
- eta : float
See self.__init__() for details.
- D : ndarray
Class-specific dictionary.
- X : ndarray
Class-specific coefficient vector of the training set given when calling self.fit().
- Y : ndarray
Class-specific target vector (the training set) given when calling self.fit().
- k : int
See self.__init__() for details.
- k0 : int
See self.__init__() for details.
- updateX_iters : int
See self.__init__() for details.
- updateD_iters : int
See self.__init__() for details.
- D_range : list[int]
Auxiliary list containing the range of indices of each class in D.
- D0 : ndarray
Shared dictionary.
- Y_range : list[int]
Auxiliary list containing the range of indices of each class in Y. Derived directly from ‘train_label’, equivalent to the ‘y_true’ labels. Example: given train_label = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1], then Y_range = [0, 4, 10]. The first value is always 0, marking the start of the first class; the last value is the total number of samples, so the list has one more entry than the number of classes.
- X0 : ndarray
Shared coefficient vector of the training set given when calling self.fit().
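Under the assumption that Y_range stores cumulative sample indices per class block (a convention worth verifying against Dictol), it can be computed with plain NumPy:

```python
import numpy as np

def class_ranges(train_label):
    """Cumulative start indices of each contiguous class block (Y_range)."""
    counts = np.bincount(np.asarray(train_label))
    return np.concatenate(([0], np.cumsum(counts))).tolist()

print(class_ranges([0, 0, 0, 0, 1, 1, 1, 1, 1, 1]))  # [0, 4, 10]
```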
Methods
fit(X, *, y_true, l_atoms, iterations[, ...]) : Train the LRSDL dictionary.
predict(X, *[, threshold, offset, with_losses]) : Predict the class of each window in X.
save(file) : Save the dictionary to a file.
evaluate
loss
- __init__(lambd=0.01, lambd2=0.01, eta=0.0001, k=10, k0=5, updateX_iters=100, updateD_iters=100)[source]#
Initialize the LRSDL dictionary.
This method sets up the parameters required for training class-specific and shared dictionaries. These dictionaries are used to represent data with sparsity and low-rank properties, which can be regularized by the parameters defined below.
- Parameters:
- lambd : float
Regularization parameter for the sparsity term:
\[\lambda \|X\|_1\]
This encourages sparsity in the class-specific dictionary, similar to the LASSO regularization term.
- lambd2 : float
Regularization parameter for the reconstruction term:
\[\frac{\lambda_2}{2} \|X^0 - M^0\|^2\]
This ensures that the shared vector (used to select shared atoms) is sparse and close to the mean shared vector, ensuring consistency across all \(X^0\).
- eta : float
Regularization parameter for the low-rank term:
\[\eta \| D^0 \|_*\]
Here, \(\|\cdot\|_*\) is the nuclear norm, which enforces the shared dictionary to have low rank.
- k : int
Number of class-specific atoms for each class. The total number of atoms in the class-specific dictionary is given by \(k \times C\), where \(C\) is the number of classes.
- k0 : int
Total number of shared atoms. A value of \(k_0 = 0\) indicates that no shared dictionary is used.
- updateX_iters, updateD_iters : int
These parameters are passed to the parent class LRSDL.__init__(). However, they are suspected to be dummy parameters because no usage of them could be found in the original implementation. They are retained here for compatibility but appear to have no functional effect in this class.
Warning
The updateX_iters and updateD_iters parameters are inherited from the parent class LRSDL, but they appear to be unused in this implementation. Consider verifying their relevance before relying on them.
Notes
The parameters lambd, lambd2, and eta control the sparsity and low-rank properties of the dictionaries.
Setting k0 = 0 disables the shared dictionary.
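As a quick sanity check of the sizing described above (a sketch; atom counts follow directly from the k and k0 definitions):

```python
# Sizing of the LRSDL dictionaries for C classes:
C, k, k0 = 3, 10, 5
n_class_atoms = k * C    # atoms in the class-specific dictionary D
n_shared_atoms = k0      # atoms in the shared dictionary D0 (0 disables it)
print(n_class_atoms, n_shared_atoms)  # 30 5
```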
- evaluate(data, label)#
- fit(X, *, y_true, l_atoms, iterations, step=None, threshold=0, random_seed=None, verbose=False, show_after=5)[source]#
Train the LRSDL dictionary.
This method trains the dictionary using the provided data and allows for several configuration options:
- Split the input data X into sliding windows of length l_atoms.
- Use the entire input as a single window.
- Discard training windows whose L2-norm is below a specified threshold.
The splitting behavior depends on the step parameter. If step is None, the entire input is treated as a single window. Otherwise, overlapping patches of size l_atoms are created with the specified step size.
- Parameters:
- X : ndarray of shape (n_samples, n_features)
Training samples. The number of features must be equal to or greater than the dictionary’s atom size.
- y_true : ndarray of shape (n_samples,)
Labels corresponding to the samples in X. The length of y_true must equal the number of samples in X.
- l_atoms : int
Length of the dictionary’s atoms.
- iterations : int
Number of training iterations.
- step : int, optional
The step size for splitting input samples into patches of length l_atoms. If not specified, it is set to l_atoms, so that all available information is extracted from X without any repetition (overlap).
- threshold : float, optional
L2-norm threshold (relative to the maximum L2-norm in each strain). Training windows with L2-norm below this value will be discarded. Default is 0 (only null arrays are discarded).
- random_seed : int, optional
Random seed for reproducibility. Default is None.
- verbose : bool, optional
If True, print verbose output during training. Default is False.
- show_after : int, optional
If verbose is True, progress will be displayed every show_after iterations. Default is 5.
- Returns:
- None
The method trains the dictionary in place.
Notes
Per-class sufficiency is now checked on the effective number of training windows (after patching and thresholding), not on the raw number of input strains per class.
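The window extraction and thresholding behaviour described above can be sketched with plain NumPy (a stand-alone illustration, not clawdia’s internal code; the relative-threshold interpretation follows the parameter description):

```python
import numpy as np

def make_windows(x, l_atoms, step=None, threshold=0.0):
    """Split a 1-D signal into training windows, dropping low-energy ones."""
    if step is None:
        return x[np.newaxis, :]                      # whole signal, one window
    starts = range(0, len(x) - l_atoms + 1, step)
    windows = np.stack([x[s:s + l_atoms] for s in starts])
    norms = np.linalg.norm(windows, axis=1)
    # relative L2 threshold; with threshold=0 only all-zero windows are dropped
    return windows[norms > threshold * norms.max()]
```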
- loss()#
- predict(X, *, threshold=0, offset=0, with_losses=False)[source]#
Predict the class of each window in X.
The class of a window is the class of the closest codeword to that window in the dictionary.
- Parameters:
- X : 2d-array, shape=(n_signals, n_samples)
Input signals, each with at least as many samples as the atoms.
- threshold : float, optional
Loss threshold ABOVE which signals will be marked as the “unknown” class, which corresponds to the label value -1. Zero by default, meaning all signals will be classified.
- offset : int, optional
Index i0 at which to crop the input signals X. The end index i1 will be offset + l_atoms. By default 0.
- with_losses : bool, optional
If True, return a tuple with the class predictions and the corresponding losses.
- Returns:
- y_pred : 1d-array, shape=(n_signals,)
Class predictions for each input signal.
- losses : 1d-array, shape=(n_signals,), optional
Losses of the closest codewords to each input signal. Only returned if with_losses=True.
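The thresholding rule above (losses above the threshold map to the “unknown” label -1) can be sketched as follows; the loss and prediction values are hypothetical:

```python
import numpy as np

# Hypothetical per-signal losses and raw class predictions:
losses = np.array([0.10, 0.92, 0.35])
y_pred = np.array([0, 1, 2])

threshold = 0.5  # losses ABOVE this are relabelled as "unknown" (-1)
if threshold > 0:
    y_pred = np.where(losses > threshold, -1, y_pred)
print(y_pred)  # [ 0 -1  2]
```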
- class clawdia.dictionaries.DictionarySpams(dict_init=None, model=None, signal_pool=None, a_length=None, d_size=None, wave_pos=None, patch_min=1, l2_normed=True, allow_allzeros=False, random_state=None, ignore_completeness=False, lambda1=None, batch_size=64, n_iter=None, n_train=None, trained=False, mode_traindl=0, modeD_traindl=0, mode_lasso=2, identifier='')[source]#
Bases: object
Sparse Dictionary Learning (SDL) model for waveform denoising via SPAMS.
This class provides an object-oriented implementation of a Sparse Dictionary Learning model, designed for the denoising and reconstruction of waveforms. At its core, it utilizes the trainDL function for dictionary learning and the lasso function for sparse coding from the SPAMS-python library [1].
It extends these core functionalities to arbitrarily long signals and minibatch processing for large datasets. Additionally, the class includes various utilities for signal preprocessing, composite models of denoising (such as iterative reconstruction), and the ability to easily save and load the dictionary’s state.
References
[1]SPAMS (for python), (http://spams-devel.gforge.inria.fr/). Last accessed in October 2018.
- Attributes:
- dict_init : ndarray
Atoms of the initial dictionary. Remains unaltered after training.
- components : ndarray
Atoms of the current (trained) dictionary.
- model : tuple
SPAMS’ trainDL model components in the form (A, B, iter).
- d_size : int
Number of atoms in the dictionary (dictionary size).
- a_length : int
Length of each atom in the dictionary (patch size).
- lambda1 : float
Regularization parameter for training the dictionary.
- batch_size : int
Batch size used in mini-batch training.
- n_iter : int
Number of iterations performed during training.
- t_train : float
Total training time in seconds.
- trained : bool
Indicates whether the dictionary has been trained.
- n_train : int
Number of patches used during training.
- mode_traindl : int
Training mode for SPAMS’ trainDL function.
- modeD_traindl : int
Dictionary mode for SPAMS’ trainDL function.
- mode_lasso : int
Mode for SPAMS’ lasso function.
- identifier : str
Optional identifier or note for distinguishing the dictionary.
Methods
copy() : Return a copy of the dictionary.
reconstruct(signal, sc_lambda[, step, ...]) : Reconstruct a signal as a sparse combination of dictionary atoms.
reconstruct_batch(signals, sc_lambda[, out, ...]) : TODO
reconstruct_iterative(signals[, sc_lambda, ...]) : Reconstruct multiple signals using iterative residual subtraction.
reconstruct_loss_optimised(strain, *, reference) : Find the best reconstruction of a signal w.r.t. a reference.
reconstruct_margin_constrained(signal, *, ...) : TODO
reconstruct_minibatch(signals, *, sc_lambda) : TODO
reset() : Reset the dictionary to its initial (untrained) state.
save(file) : Save the current state of the DictionarySpams object to a file.
train(patches[, lambda1, n_iter, ...]) : Train the dictionary.
- __init__(dict_init=None, model=None, signal_pool=None, a_length=None, d_size=None, wave_pos=None, patch_min=1, l2_normed=True, allow_allzeros=False, random_state=None, ignore_completeness=False, lambda1=None, batch_size=64, n_iter=None, n_train=None, trained=False, mode_traindl=0, modeD_traindl=0, mode_lasso=2, identifier='')[source]#
Initialize the dictionary.
There are two ways to initialize the dictionary:
- By directly providing the initial dictionary with dict_init.
- By providing a collection of signals (signal_pool) from which atoms are randomly extracted to form the initial dictionary.
If the second option is used, a_length and d_size must be explicitly specified to define the size of the dictionary. Additional optional parameters provide more control over this process.
- Parameters:
- dict_init : ndarray of shape (d_size, a_length), optional
Atoms of the initial dictionary. If None, signal_pool must be provided.
- model : dict, optional
SPAMS’ trainDL model components as a dictionary with elements {A, B, iter}. Must be provided if continuing training from a previous state.
- signal_pool : ndarray of shape (n_signals, n_samples), optional
A collection of signals from which atoms are extracted to form the initial dictionary. Ignored if dict_init is provided.
- a_length : int, optional
Length of each atom in the dictionary (patch size). Required if signal_pool is provided.
- d_size : int, optional
Number of atoms in the dictionary. Required if signal_pool is provided.
- wave_pos : array-like of shape (n_signals, 2), optional
Positions of waveforms within signal_pool to extract atoms from. If None, the entire array is used.
- patch_min : int, default=1
Minimum number of samples for each extracted patch. Ignored if wave_pos is None.
- l2_normed : bool, default=True
If True, normalize extracted atoms to their L2 norm.
- allow_allzeros : bool, default=False
By default, random atoms with all zeros are excluded from the initial dictionary. If allow_allzeros=True, they are allowed.
- random_state : int, optional
Seed for random sampling from signal_pool.
- ignore_completeness : bool, optional, default=False
If False, the dictionary must be overcomplete (d_size > a_length).
- lambda1 : float, optional
Regularization parameter for training.
- batch_size : int, default=64
Batch size used during training.
- n_iter : int, optional
Total number of iterations for training. If None, this must be set when calling the train method.
- n_train : int, optional
Number of patches used for training. Informational only.
- trained : bool, default=False
Indicates whether the dictionary is already trained.
- mode_traindl : int, default=0
Training mode for SPAMS’ trainDL function. See SPAMS documentation.
- modeD_traindl : int, default=0
Dictionary mode for SPAMS’ trainDL function. See SPAMS documentation.
- mode_lasso : int, default=2
Mode for SPAMS’ lasso function. See SPAMS documentation.
- identifier : str, optional
A note or label for identifying the dictionary.
Notes
This method initializes the dictionary but does not train it. Use the train method for training.
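The signal_pool initialization path can be sketched with NumPy alone. This mirrors the described behaviour (random patch extraction, l2_normed=True, overcompleteness d_size > a_length); clawdia’s actual sampling logic may differ in detail:

```python
import numpy as np

rng = np.random.default_rng(0)
signal_pool = rng.standard_normal((20, 256))   # (n_signals, n_samples)
a_length, d_size = 64, 128                     # overcomplete: d_size > a_length

# Draw d_size random patches of length a_length and L2-normalise each one,
# mirroring the l2_normed=True behaviour described above.
rows = rng.integers(0, signal_pool.shape[0], size=d_size)
starts = rng.integers(0, signal_pool.shape[1] - a_length + 1, size=d_size)
dict_init = np.stack([signal_pool[r, s:s + a_length]
                      for r, s in zip(rows, starts)])
dict_init /= np.linalg.norm(dict_init, axis=1, keepdims=True)
print(dict_init.shape)  # (128, 64)
```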
- copy()[source]#
Return a copy of the dictionary.
Returns a new instance of the same dictionary with the same values and state.
- Returns:
- dico_copy : DictionarySpams
A copy of the current dictionary.
- reconstruct(signal, sc_lambda, step=1, normed=True, with_code=False, **kwargs)[source]#
Reconstruct a signal as a sparse combination of dictionary atoms.
- Parameters:
- signal : ndarray
Sample to be reconstructed.
- sc_lambda : float
Regularization parameter of the sparse coding transformation.
- step : int, 1 by default
Sample interval between each patch extracted from signal. Determines the number of patches to be extracted.
- normed : bool, True by default
Normalize the result to the maximum absolute value.
- with_code : bool, False by default
If True, also return the coefficients array.
- **kwargs
Passed directly to the external learning function.
- Returns:
- signal_rec : array
Reconstructed signal.
- code : array (a_length, d_size), optional
Transformed data, encoded as a sparse combination of atoms. Returned when ‘with_code’ is True.
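Patch-wise sparse coding produces overlapping reconstructed windows that must be stitched back into one signal. A common approach is averaging the overlaps; whether clawdia averages or sums is an assumption here:

```python
import numpy as np

def overlap_add(patches_rec, signal_len, step):
    """Average overlapping reconstructed patches back into one signal."""
    a_length = patches_rec.shape[1]
    out = np.zeros(signal_len)
    counts = np.zeros(signal_len)
    for i, patch in enumerate(patches_rec):
        s = i * step
        out[s:s + a_length] += patch
        counts[s:s + a_length] += 1
    return out / np.maximum(counts, 1)   # avoid division by zero at gaps
```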
- reconstruct_batch(signals, sc_lambda, out=None, step=1, normed=True, verbose=True, **kwargs)[source]#
TODO
Reconstruct multiple signals, each one as a sparse combination of dictionary atoms.
WARNING: Only viable for small ‘signals’ sets; it is very memory-expensive (all patches are stored in a single array in memory).
WARNING: ‘out’ is deprecated; it is kept for backwards compatibility but will be ignored if given.
- reconstruct_iterative(signals, sc_lambda=0.01, step=1, batchsize=64, max_iter=100, threshold=0.001, normed=True, full_output=False, verbose=True, kwargs_lasso={})[source]#
Reconstruct multiple signals using iterative residual subtraction.
This method reconstructs each signal by iteratively updating and accumulating reconstructions. In the first iteration, the original input signal is reconstructed and then subtracted from itself to obtain the initial residual. In each subsequent iteration, a new reconstruction is generated from the current residual and subtracted from it, producing an updated residual for the next iteration, while also being added to the cumulative reconstruction. The process repeats until the Euclidean norm of the difference between consecutive residuals falls below a specified threshold, which sets the convergence criterion.
NOTE: In contrast with the usual procedure, the windows into which each signal is split are not normalized. This is needed to enhance the dictionary discrimination. Otherwise, the residuals are amplified at each iteration, the algorithm takes longer to converge, and some ad-hoc tests showed it also degrades the resulting waveform shape.
- Parameters:
- signals : ndarray
Input signals to be reconstructed, with each signal along the first dimension.
- sc_lambda : float, optional
Sparsity control parameter for reconstruction.
- step : int, optional
Step size for the reconstruction.
- batchsize : int, optional
Number of signals processed in each minibatch.
- max_iter : int, optional
Maximum number of iterations before stopping.
- threshold : float, optional
Convergence threshold based on the relative change in residuals.
- normed : bool, optional
If True, the reconstructed signals are normalized after convergence.
- full_output : bool, optional
If True, returns additional output values (residuals and iteration counts).
- verbose : bool, optional
If True, prints progress information at each iteration.
- kwargs_lasso : dict, optional
Additional arguments for the Lasso reconstruction method.
- Returns:
- ndarray or tuple
The final reconstructed signals. If full_output is True, also returns the residuals and the number of iterations per signal.
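The iterative residual-subtraction loop described above can be sketched generically. The `reconstruct` callable is a stand-in for the sparse-coding step; the stopping rule follows the description (norm of the difference between consecutive residuals below a threshold):

```python
import numpy as np

def iterative_reconstruct(signal, reconstruct, max_iter=100, threshold=1e-3):
    """Accumulate reconstructions of successive residuals until convergence."""
    residual = np.asarray(signal, dtype=float).copy()
    total = np.zeros_like(residual)
    for n_iter in range(1, max_iter + 1):
        rec = reconstruct(residual)       # stand-in for the sparse-coding step
        total += rec
        new_residual = residual - rec
        # stop when consecutive residuals barely differ
        if np.linalg.norm(new_residual - residual) < threshold:
            residual = new_residual
            break
        residual = new_residual
    return total, residual, n_iter
```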
- reconstruct_loss_optimised(strain, *, reference, step=1, limits=None, loss_func='match', normed=True, kwargs_minimize={'bounds': (-2, 1), 'method': 'bounded', 'options': {'maxiter': 100, 'xatol': 0.04}}, kwargs_lasso={}, verbose=False)[source]#
Find the best reconstruction of a signal w.r.t. a reference.
Find the lambda which produces a reconstruction of the input ‘strain’ closest to the given ‘reference’, according to a chosen loss function: Match, Overlap, SSIM, or a custom one.
The minimisation is performed by SciPy’s ‘minimize_scalar’, with options specified through kwargs_minimize.
- Parameters:
- strain: ndarray
Input strain to be reconstructed (and optimized).
- reference: ndarray
Reference strain which to compare the reconstruction to.
- step: int, optional
Separation in samples between each window into which the input strain is split up to be reconstructed by the dictionary. Defaults to 1.
- limits: array-like, optional
Indices delimiting the region over which to compute the loss between the reconstruction and the reference strain.
- loss_func: str | callable, optional
If a str, can be ‘match’ (default), ‘overlap’ or ‘ssim’. In each case, their pseudo-distance is used. Refer to their documentation in ‘clawdia.estimators’ for more details. If a callable, it must be a symmetric function of 2 arguments, to which the ‘reference’ signal and the denoised signal will be passed. It must return a distance-like score between 0 (best) and 1 (worst) to guide the minimisation algorithm.
- normed: bool, optional
If True, returns the signal normed to its maximum absolute amplitude.
- kwargs_minimize: dict
Passed to SciPy’s minimize_scalar(**kwargs_minimize). Bracket or boundary values must be passed as np.log10(bounds).
- kwargs_lasso: dict, optional
Passed to Python-Spams’ lasso(**kwargs_lasso).
- verbose: bool, optional
Set the maximum verbosity (‘disp’: 3) to SciPy’s minimize_scalar and print info about the minimization results. False by default.
- Returns:
- rec: ndarray
Optimum reconstruction found.
- l_opt: float
Optimum value for lambda.
- loss: float
Final loss value between the optimized reconstruction and the reference, e.g. dOverlap = (1 - Overlap)/2 or DSSIM = (1 - SSIM)/2.
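The optimisation loop can be sketched with SciPy’s minimize_scalar and the default kwargs_minimize shown above. The loss function here is a toy stand-in for "reconstruct the strain with this lambda and measure the loss against the reference"; the real method calls the dictionary instead:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def loss_at(log10_lambda):
    # Toy loss with its minimum at lambda = 0.1; a stand-in for
    # reconstructing with 10**log10_lambda and comparing to the reference.
    return (log10_lambda + 1.0) ** 2

res = minimize_scalar(loss_at, bounds=(-2, 1), method='bounded',
                      options={'maxiter': 100, 'xatol': 0.04})
l_opt = 10.0 ** res.x   # the search runs in log10(lambda) space
```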
- reconstruct_margin_constrained(signal: ndarray[Any, dtype[_ScalarType_co]], *, margin: int | tuple | list | ndarray[Any, dtype[_ScalarType_co]], lambda_lims: tuple | list, step: int = 1, normed=True, full_output=False, kwargs_bisect={}, kwargs_lasso={}) tuple[ndarray[Any, dtype[_ScalarType_co]], ndarray[Any, dtype[_ScalarType_co]], ndarray[Any, dtype[_ScalarType_co]]] | ndarray[Any, dtype[_ScalarType_co]][source]#
TODO
- reconstruct_minibatch(signals, *, sc_lambda, step=1, batchsize=4, normed=True, normed_windows=True, verbose=True, **kwargs)[source]#
TODO
Reconstruct multiple signals, each one as a sparse combination of dictionary atoms. Minibatch version.
- save(file)[source]#
Save the current state of the DictionarySpams object to a file.
This method saves all attributes of the object as a .npz file. If the object has not been trained, certain attributes (lambda1, n_train, and t_train) are removed to avoid potential issues when reloading the state.
- Parameters:
- file : str or file-like object
The file path or file object where the state of the object will be saved. If a string is provided, it specifies the path to the .npz file. If a file-like object is given, it must be writable in binary mode.
- train(patches, lambda1=None, n_iter=None, warm_start=False, verbose=False, threads=-1, **kwargs)[source]#
Train the dictionary.
Train the dictionary with the given patches.
This also allows a warm start using the previous components as initial dictionary, but only if the lambda1 parameter is the same. It can be thought of as adding more iterations to the training. Hence, providing different patches is discouraged and untested.
- Parameters:
- patches : 2d-array (signals, samples)
Training patches.
- lambda1 : float, optional
Regularization parameter of the learning algorithm. Not needed if already specified at initialization.
- n_iter : int, optional
Total number of iterations to perform. If a negative number is provided, the computation will run for the corresponding number of seconds. For instance, n_iter = -5 trains the dictionary for 5 seconds.
- warm_start : bool
If True, use the previous components as the initial dictionary. It can be thought of as adding more iterations to the training. Providing different patches is discouraged and untested.
- verbose : bool, optional
If True, print the iterations (might not be shown in real time).
- threads : int, optional
Number of threads to use during training, see [1].
- **kwargs
Passed directly to ‘spams.trainDL’, see [1].
See also
clawdia.lib.extract_patches : Useful for generating the training patches.