aim2dat.ml.transformers
Scikit-learn transformer classes that extract features from crystals or molecules.
Module Contents
Classes
StructureACSFTransformer – Extract ACSF descriptor as defined in doi:10.1063/1.3553717. This transformer class is based on the implementations of the dscribe Python package.
StructureChemOrderTransformer – Extract Warren Cowley like order parameters for each element as defined in doi:10.1103/PhysRevB.96.024104.
StructureCompositionTransformer – Extract fractional concentrations of elements or kinds.
StructureCoordinationTransformer – Extract coordination numbers and distances between elements or kinds.
StructureDensityTransformer – Extract density of each element or kind.
StructureFFPrintTransformer – Extract the F-fingerprint for each element-pair as defined in doi:10.1103/PhysRevB.96.024104.
StructureMBTRTransformer – Extract MBTR descriptor as defined in doi:10.1088/2632-2153/aca005. This transformer class is based on the implementations of the dscribe Python package.
StructureMatrixTransformer – Extract features based on interaction matrices as defined in doi:10.1002/qua.24917.
Extract the partial radial distribution function for each element-pair.
Extract SOAP descriptor as defined in doi:10.1103/PhysRevB.87.184115.
- class aim2dat.ml.transformers.StructureACSFTransformer(r_cut=7.5, g2_params=None, g3_params=None, g4_params=None, g5_params=None, elements=None, periodic=False, sparse=False, dscribe_n_jobs=1, dscribe_only_physical_cores=False, n_procs=1, chunksize=50, verbose=True)
Bases: _BaseDscribeTransformer
Extract ACSF descriptor as defined in doi:10.1063/1.3553717. This transformer class is based on the implementations of the dscribe Python package.
- Variables:
- r_cut : float
Cutoff value.
- g2_params : np.array
List of pairs of eta and R_s values for the G^2 functions.
- g3_params : np.array
List of kappa values for the G^3 functions.
- g4_params : np.array
List of triplets of eta, zeta and lambda values for G^4 functions.
- g5_params : np.array
List of triplets of eta, zeta and lambda values for G^5 functions.
- elements : list
List of atomic numbers or chemical symbols.
- periodic : bool
Whether to consider periodic boundary conditions.
- sparse : bool
Whether to return a sparse matrix or a dense array.
- dscribe_n_jobs : int
Number of jobs used by dscribe to calculate the descriptor.
- dscribe_only_physical_cores : bool
Whether to only use physical cores.
- n_procs : int (optional)
Number of parallel processes.
- chunksize : int (optional)
Number of structures handed to each process at once.
- verbose : bool (optional)
Whether to print a progress bar.
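To make the role of the (eta, R_s) pairs in g2_params more concrete, here is a minimal pure-Python sketch of the radial G^2 symmetry function from doi:10.1063/1.3553717, G^2 = sum_j exp(-eta (R_ij - R_s)^2) f_c(R_ij), with the cosine cutoff function f_c. It is not the dscribe implementation; the helper names and the neighbour distances are assumptions made for illustration only.

```python
import math


def cosine_cutoff(r, r_cut):
    """Cosine cutoff: 1 at r = 0, smoothly decaying to 0 at r = r_cut."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def g2(distances, eta, r_s, r_cut=7.5):
    """Radial G^2 value of one central atom, given its neighbour distances."""
    return sum(
        math.exp(-eta * (r - r_s) ** 2) * cosine_cutoff(r, r_cut)
        for r in distances
    )


# Hypothetical neighbour distances (in angstrom) around one central atom.
neighbour_distances = [1.0, 1.5, 8.0]
# One (eta, R_s) pair, laid out like a single entry of g2_params.
eta, r_s = 1.0, 1.0
value = g2(neighbour_distances, eta, r_s)
print(value)
```

The neighbour at 8.0 lies beyond r_cut=7.5 and therefore contributes nothing; a larger eta narrows the Gaussian around R_s.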
Overview
precomputed_properties – Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
add_precomputed_properties(parameters, structure_operations) – Add precomputed properties.
clear_precomputed_properties() – Clear all precomputed properties.
fit(X, y) – Fit function that determines the number of features.
fit_transform(X, y, **fit_params) – Fit to data, then transform it.
get_feature_names_out(input_features) – Get feature names.
get_metadata_routing() – Get metadata routing of this object.
get_params(deep) – Get parameters for this estimator.
precompute_parameter_space(param_grid, X) – Precompute and store structural properties to be reused later, e.g. for a grid search.
set_output(*, transform) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform structures to features.
- property precomputed_properties
Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
- Type:
list
- add_precomputed_properties(parameters, structure_operations)
Add precomputed properties.
- Parameters:
parameters (dict) – Dictionary of input parameters.
structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.
- clear_precomputed_properties()
Clear all precomputed properties.
- fit(X, y=None)
Fit function that determines the number of features.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names_out(input_features=None)
Get feature names.
- get_metadata_routing()
Get metadata routing of this object.
Please check the scikit-learn User Guide on how the routing mechanism works.
- Returns:
routing (MetadataRequest) – A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- precompute_parameter_space(param_grid, X)
Precompute and store structural properties to be reused later, e.g. for a grid search.
- Parameters:
param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.
X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.
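A param_grid given as a dictionary or a list of dictionaries describes a set of parameter combinations. The sketch below expands such a grid with itertools.product. It assumes that each key maps to a list of candidate values, as in scikit-learn's grid search; the source does not spell out this layout, and expand_param_grid is a hypothetical helper that is not part of aim2dat.

```python
from itertools import product


def expand_param_grid(param_grid):
    """Return every parameter combination described by a grid (hypothetical helper)."""
    # Accept a single dictionary or a list of dictionaries.
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    combinations = []
    for grid in grids:
        names = sorted(grid)
        for values in product(*(grid[name] for name in names)):
            combinations.append(dict(zip(names, values)))
    return combinations


# Candidate values for two constructor parameters of StructureACSFTransformer.
param_grid = {"r_cut": [5.0, 7.5], "periodic": [False, True]}
combos = expand_param_grid(param_grid)
for combo in combos:
    print(combo)
```

Each resulting dictionary corresponds to one parameter set for which structural properties could be computed once and reused.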
- set_output(*, transform=None)
Set output container.
See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.
- Parameters:
transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
New in scikit-learn version 1.4: “polars” option was added.
- Returns:
self (estimator instance) – Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
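The <component>__<parameter> naming can be illustrated with a small sketch that separates top-level keys from nested ones. This is not scikit-learn's code, only a simplified picture of the convention; split_nested_params and the step name "features" are hypothetical.

```python
def split_nested_params(params):
    """Separate plain keys from '<component>__<parameter>' keys (hypothetical helper)."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split at the first double underscore only.
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


# "features" is an assumed name of a pipeline step holding a transformer.
own, nested = split_nested_params(
    {"verbose": False, "features__r_cut": 6.0, "features__periodic": True}
)
print(own)
print(nested)
```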
- class aim2dat.ml.transformers.StructureChemOrderTransformer(r_max=15.0, max_shells=3, n_procs=1, chunksize=50, verbose=True)
Bases: _BaseStructureTransformer
Extract Warren Cowley like order parameters for each element as defined in doi:10.1103/PhysRevB.96.024104.
- Variables:
- r_max : float (optional)
Cut-off value for the maximum distance between two atoms in angstrom.
- max_shells : int (optional)
Number of neighbour shells that are evaluated.
- n_procs : int (optional)
Number of parallel processes.
- chunksize : int (optional)
Number of structures handed to each process at once.
- verbose : bool (optional)
Whether to print a progress bar.
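For orientation, the classic Warren-Cowley short-range order parameter for one neighbour shell is alpha_j = 1 - p_j / c_j, where p_j is the fraction of neighbours of type j and c_j is the overall concentration of type j. The sketch below evaluates this textbook form; the "Warren Cowley like" parameters of the cited reference may be defined differently, and the function name and input numbers are illustrative assumptions.

```python
def warren_cowley(neighbour_counts, concentrations):
    """Classic Warren-Cowley parameters for one central atom type and one shell."""
    n_total = sum(neighbour_counts.values())
    return {
        element: 1.0 - (neighbour_counts.get(element, 0) / n_total) / conc
        for element, conc in concentrations.items()
    }


# Hypothetical equiatomic A-B alloy: an A atom has 8 first-shell neighbours, 6 of them B.
alpha = warren_cowley({"A": 2, "B": 6}, {"A": 0.5, "B": 0.5})
print(alpha)
```

Negative values indicate a preference for unlike neighbours, positive values indicate clustering, and zero corresponds to a random arrangement.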
Overview
precomputed_properties – Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
add_precomputed_properties(parameters, structure_operations) – Add precomputed properties.
clear_precomputed_properties() – Clear all precomputed properties.
fit(X, y) – Fit function that determines the number of features.
fit_transform(X, y, **fit_params) – Fit to data, then transform it.
get_feature_names_out(input_features) – Get feature names.
get_metadata_routing() – Get metadata routing of this object.
get_params(deep) – Get parameters for this estimator.
precompute_parameter_space(param_grid, X) – Precompute and store structural properties to be reused later, e.g. for a grid search.
set_output(*, transform) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform structures to features.
- property precomputed_properties
Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
- Type:
list
- add_precomputed_properties(parameters, structure_operations)
Add precomputed properties.
- Parameters:
parameters (dict) – Dictionary of input parameters.
structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.
- clear_precomputed_properties()
Clear all precomputed properties.
- fit(X, y=None)
Fit function that determines the number of features.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names_out(input_features=None)
Get feature names.
- get_metadata_routing()
Get metadata routing of this object.
Please check the scikit-learn User Guide on how the routing mechanism works.
- Returns:
routing (MetadataRequest) – A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- precompute_parameter_space(param_grid, X)
Precompute and store structural properties to be reused later, e.g. for a grid search.
- Parameters:
param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.
X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.
- set_output(*, transform=None)
Set output container.
See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.
- Parameters:
transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
New in scikit-learn version 1.4: “polars” option was added.
- Returns:
self (estimator instance) – Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- class aim2dat.ml.transformers.StructureCompositionTransformer(distinguish_kinds=False, n_procs=1, chunksize=50, verbose=True)
Bases: _BaseStructureTransformer
Extract fractional concentrations of elements or kinds.
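- Variables:
- distinguish_kinds : bool (optional)
Whether different kinds should be distinguished e.g. Ni0 and Ni1 would be considered as different elements if True.
- n_procs : int (optional)
Number of parallel processes.
- chunksize : int (optional)
Number of structures handed to each process at once.
- verbose : bool (optional)
Whether to print a progress bar.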
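As a plain illustration of what a fractional concentration is, the following sketch counts the species of a structure and divides by the number of atoms. It only mirrors the idea of the feature, not the aim2dat implementation; the function name and the example compositions are assumptions.

```python
from collections import Counter


def fractional_concentrations(species):
    """Map each species label to its share of all atoms in the structure."""
    counts = Counter(species)
    n_atoms = len(species)
    return {label: count / n_atoms for label, count in sorted(counts.items())}


# Element symbols of a hypothetical five-atom unit cell.
fractions = fractional_concentrations(["Cs", "Pb", "I", "I", "I"])
print(fractions)

# Passing kind labels instead of element symbols keeps Ni0 and Ni1 apart.
kind_fractions = fractional_concentrations(["Ni0", "Ni1", "O", "O"])
print(kind_fractions)
```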
Overview
precomputed_properties – Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
add_precomputed_properties(parameters, structure_operations) – Add precomputed properties.
clear_precomputed_properties() – Clear all precomputed properties.
fit(X, y) – Fit function that determines the number of features.
fit_transform(X, y, **fit_params) – Fit to data, then transform it.
get_feature_names_out(input_features) – Get feature names.
get_metadata_routing() – Get metadata routing of this object.
get_params(deep) – Get parameters for this estimator.
precompute_parameter_space(param_grid, X) – Precompute and store structural properties to be reused later, e.g. for a grid search.
set_output(*, transform) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform structures to features.
- property precomputed_properties
Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
- Type:
list
- add_precomputed_properties(parameters, structure_operations)
Add precomputed properties.
- Parameters:
parameters (dict) – Dictionary of input parameters.
structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.
- clear_precomputed_properties()
Clear all precomputed properties.
- fit(X, y=None)
Fit function that determines the number of features.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names_out(input_features=None)
Get feature names.
- get_metadata_routing()
Get metadata routing of this object.
Please check the scikit-learn User Guide on how the routing mechanism works.
- Returns:
routing (MetadataRequest) – A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- precompute_parameter_space(param_grid, X)
Precompute and store structural properties to be reused later, e.g. for a grid search.
- Parameters:
param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.
X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.
- set_output(*, transform=None)
Set output container.
See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.
- Parameters:
transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
New in scikit-learn version 1.4: “polars” option was added.
- Returns:
self (estimator instance) – Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- class aim2dat.ml.transformers.StructureCoordinationTransformer(r_max=15.0, method='minimum_distance', min_dist_delta=0.1, n_nearest_neighbours=5, econ_tolerance=0.5, econ_conv_threshold=0.001, voronoi_weight_type='rel_solid_angle', voronoi_weight_threshold=0.5, feature_types=('nrs_avg', 'nrs_stdev', 'nrs_max', 'nrs_min', 'distance_avg', 'distance_stdev', 'distance_max', 'distance_min'), n_procs=1, chunksize=50, verbose=True)
Bases: _BaseStructureTransformer
Extract coordination numbers and distances between elements or kinds.
- Variables:
- r_max : float (optional)
Cut-off value for the maximum distance between two atoms in angstrom.
- method : str (optional)
Method used to calculate the coordination environment. The default value is 'minimum_distance'.
- min_dist_delta : float (optional)
Tolerance parameter that defines the relative distance from the nearest neighbour atom for the 'minimum_distance' method.
- n_nearest_neighbours : int (optional)
Number of neighbours that are considered coordinated for the 'n_neighbours' method.
- econ_tolerance : float (optional)
Tolerance parameter for the econ method.
- econ_conv_threshold : float (optional)
Convergence threshold for the econ method.
- okeeffe_weight_threshold : float (optional)
Threshold parameter to distinguish indirect and direct neighbour atoms for the 'okeeffe' method.
- feature_types : tuple or str (optional)
Tuple of features that are extracted. Supported options are: 'nrs_avg', 'nrs_stdev', 'nrs_max', 'nrs_min', 'distance_avg', 'distance_stdev', 'distance_max' and 'distance_min'.
- n_procs : int (optional)
Number of parallel processes.
- chunksize : int (optional)
Number of structures handed to each process at once.
- verbose : bool (optional)
Whether to print a progress bar.
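One plausible reading of the 'minimum_distance' method is sketched below: every atom within r_max whose distance does not exceed (1 + min_dist_delta) times the nearest-neighbour distance counts as coordinated. This threshold formula is an assumption based on the parameter description, not a confirmed detail of aim2dat, and the function name and distances are made up for illustration.

```python
def minimum_distance_neighbours(distances, min_dist_delta=0.1, r_max=15.0):
    """Pick neighbour indices close to the nearest-neighbour distance (assumed rule)."""
    # Discard everything beyond the cut-off radius first.
    within_cutoff = {idx: d for idx, d in distances.items() if d <= r_max}
    if not within_cutoff:
        return []
    d_min = min(within_cutoff.values())
    threshold = (1.0 + min_dist_delta) * d_min
    return sorted(idx for idx, d in within_cutoff.items() if d <= threshold)


# Hypothetical distances (angstrom) from one central atom to atoms 1..5.
distances = {1: 2.00, 2: 2.05, 3: 2.15, 4: 2.90, 5: 16.0}
neighbours = minimum_distance_neighbours(distances)
print(neighbours, "coordination number:", len(neighbours))
```

Features such as 'nrs_avg' or 'distance_max' would then be statistics over the coordination numbers and neighbour distances collected this way.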
Overview
feature_types – Feature types that are included.
precomputed_properties – Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
add_precomputed_properties(parameters, structure_operations) – Add precomputed properties.
clear_precomputed_properties() – Clear all precomputed properties.
fit(X, y) – Fit function that determines the number of features.
fit_transform(X, y, **fit_params) – Fit to data, then transform it.
get_feature_names_out(input_features) – Get feature names.
get_metadata_routing() – Get metadata routing of this object.
get_params(deep) – Get parameters for this estimator.
precompute_parameter_space(param_grid, X) – Precompute and store structural properties to be reused later, e.g. for a grid search.
set_output(*, transform) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform structures to features.
- property precomputed_properties
Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
- Type:
list
- add_precomputed_properties(parameters, structure_operations)
Add precomputed properties.
- Parameters:
parameters (dict) – Dictionary of input parameters.
structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.
- clear_precomputed_properties()
Clear all precomputed properties.
- fit(X, y=None)
Fit function that determines the number of features.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names_out(input_features=None)
Get feature names.
- get_metadata_routing()
Get metadata routing of this object.
Please check the scikit-learn User Guide on how the routing mechanism works.
- Returns:
routing (MetadataRequest) – A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- precompute_parameter_space(param_grid, X)
Precompute and store structural properties to be reused later, e.g. for a grid search.
- Parameters:
param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.
X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.
- set_output(*, transform=None)
Set output container.
See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.
- Parameters:
transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
New in scikit-learn version 1.4: “polars” option was added.
- Returns:
self (estimator instance) – Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- class aim2dat.ml.transformers.StructureDensityTransformer(distinguish_kinds=False, n_procs=1, chunksize=50, verbose=True)
Bases: _BaseStructureTransformer
Extract density of each element or kind.
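- Variables:
- distinguish_kinds : bool (optional)
Whether different kinds should be distinguished e.g. Ni0 and Ni1 would be considered as different elements if True.
- n_procs : int (optional)
Number of parallel processes.
- chunksize : int (optional)
Number of structures handed to each process at once.
- verbose : bool (optional)
Whether to print a progress bar.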
Overview
precomputed_properties – Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
add_precomputed_properties(parameters, structure_operations) – Add precomputed properties.
clear_precomputed_properties() – Clear all precomputed properties.
fit(X, y) – Fit function that determines the number of features.
fit_transform(X, y, **fit_params) – Fit to data, then transform it.
get_feature_names_out(input_features) – Get feature names.
get_metadata_routing() – Get metadata routing of this object.
get_params(deep) – Get parameters for this estimator.
precompute_parameter_space(param_grid, X) – Precompute and store structural properties to be reused later, e.g. for a grid search.
set_output(*, transform) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform structures to features.
- property precomputed_properties
Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
- Type:
list
- add_precomputed_properties(parameters, structure_operations)
Add precomputed properties.
- Parameters:
parameters (dict) – Dictionary of input parameters.
structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.
- clear_precomputed_properties()
Clear all precomputed properties.
- fit(X, y=None)
Fit function that determines the number of features.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names_out(input_features=None)
Get feature names.
- get_metadata_routing()
Get metadata routing of this object.
Please check the scikit-learn User Guide on how the routing mechanism works.
- Returns:
routing (MetadataRequest) – A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- precompute_parameter_space(param_grid, X)
Precompute and store structural properties to be reused later, e.g. for a grid search.
- Parameters:
param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.
X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.
- set_output(*, transform=None)
Set output container.
See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.
- Parameters:
transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
New in scikit-learn version 1.4: “polars” option was added.
- Returns:
self (estimator instance) – Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- class aim2dat.ml.transformers.StructureFFPrintTransformer(r_max=15.0, delta_bin=0.005, sigma=10.0, distinguish_kinds=False, add_header=False, use_weights=False, n_procs=1, chunksize=50, verbose=True)
Bases: _BaseStructureTransformer
Extract the F-fingerprint for each element-pair as defined in doi:10.1103/PhysRevB.96.024104.
- Variables:
- r_max : float (optional)
Cut-off value for the maximum distance between two atoms in angstrom.
- delta_bin : float (optional)
Bin size to discretize the function in angstrom.
- sigma : float (optional)
Smearing parameter for the Gaussian function.
- distinguish_kinds : bool (optional)
Whether different kinds should be distinguished e.g. Ni0 and Ni1 would be considered as different elements if True.
- add_header : bool
Add leading entries that describe the weights and composition for the ffprint kernels.
- n_procs : int (optional)
Number of parallel processes.
- chunksize : int (optional)
Number of structures handed to each process at once.
- verbose : bool (optional)
Whether to print a progress bar.
Overview
precomputed_properties – Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
add_precomputed_properties(parameters, structure_operations) – Add precomputed properties.
clear_precomputed_properties() – Clear all precomputed properties.
fit(X, y) – Fit function that determines the number of features.
fit_transform(X, y, **fit_params) – Fit to data, then transform it.
get_feature_names_out(input_features) – Get feature names.
get_metadata_routing() – Get metadata routing of this object.
get_params(deep) – Get parameters for this estimator.
precompute_parameter_space(param_grid, X) – Precompute and store structural properties to be reused later, e.g. for a grid search.
set_output(*, transform) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform structures to features.
- property precomputed_properties
Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
- Type:
list
- add_precomputed_properties(parameters, structure_operations)
Add precomputed properties.
- Parameters:
parameters (dict) – Dictionary of input parameters.
structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.
- clear_precomputed_properties()
Clear all precomputed properties.
- fit(X, y=None)
Fit function that determines the number of features.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names_out(input_features=None)
Get feature names.
- get_metadata_routing()
Get metadata routing of this object.
Please check the scikit-learn User Guide on how the routing mechanism works.
- Returns:
routing (MetadataRequest) – A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- precompute_parameter_space(param_grid, X)
Precompute and store structural properties to be reused later, e.g. for a grid search.
- Parameters:
param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.
X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.
- set_output(*, transform=None)
Set output container.
See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.
- Parameters:
transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
New in scikit-learn version 1.4: “polars” option was added.
- Returns:
self (estimator instance) – Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- class aim2dat.ml.transformers.StructureMBTRTransformer(geometry={'function': 'inverse_distance'}, grid={'min': 0, 'max': 1, 'n': 100, 'sigma': 0.1}, weighting={'function': 'exp', 'scale': 1.0, 'threshold': 0.001}, normalize_gaussians=True, normalization='l2', elements=None, periodic=False, sparse=False, dscribe_n_jobs=1, dscribe_only_physical_cores=False, n_procs=1, chunksize=50, verbose=True)
Bases: _BaseDscribeTransformer
Extract MBTR descriptor as defined in doi:10.1088/2632-2153/aca005. This transformer class is based on the implementations of the dscribe Python package.
- Variables:
- geometry : dict
Set up the geometry function.
- grid : dict
Set up the discretization grid.
- weighting : dict
Set up the weighting function and its parameters.
- normalize_gaussians : bool
Whether to normalize the Gaussians to an area of 1.
- normalization : str
Method for normalizing. Supported options are 'none', 'l2', 'n_atoms', 'valle_oganov'.
- elements : list
List of atomic numbers or chemical symbols.
- periodic : bool
Whether to consider periodic boundary conditions.
- sparse : bool
Whether to return a sparse matrix or a dense array.
- dscribe_n_jobs : int
Number of jobs used by dscribe to calculate the descriptor.
- dscribe_only_physical_cores : bool
Whether to only use physical cores.
- n_procs : int (optional)
Number of parallel processes.
- chunksize : int (optional)
Number of structures handed to each process at once.
- verbose : bool (optional)
Whether to print a progress bar.
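The interplay of the geometry, grid and weighting dictionaries can be pictured with a simplified two-body sketch: each pair distance r is mapped to 1/r ('inverse_distance'), weighted by exp(-scale * r), dropped if the weight falls below the threshold, and smeared onto the grid with a Gaussian of width sigma. This is a rough pure-Python approximation under those assumptions, not the dscribe implementation; the function name and the distances are hypothetical, and element resolution and the final normalization step are left out.

```python
import math


def mbtr_k2_sketch(pair_distances, grid, weighting, normalize_gaussians=True):
    """Simplified two-body spectrum on a regular grid (illustration only)."""
    n_points = grid["n"]
    step = (grid["max"] - grid["min"]) / (n_points - 1)
    xs = [grid["min"] + i * step for i in range(n_points)]
    sigma = grid["sigma"]
    # With normalization each Gaussian integrates to an area of 1.
    prefactor = 1.0 / (sigma * math.sqrt(2.0 * math.pi)) if normalize_gaussians else 1.0
    spectrum = [0.0] * n_points
    for r in pair_distances:
        weight = math.exp(-weighting["scale"] * r)
        if weight < weighting["threshold"]:
            continue  # contribution considered negligible
        centre = 1.0 / r  # 'inverse_distance' geometry function
        for i, x in enumerate(xs):
            spectrum[i] += weight * prefactor * math.exp(-0.5 * ((x - centre) / sigma) ** 2)
    return xs, spectrum


# Default grid and weighting values from the class signature.
grid = {"min": 0, "max": 1, "n": 100, "sigma": 0.1}
weighting = {"function": "exp", "scale": 1.0, "threshold": 0.001}
# Hypothetical pair distances in angstrom; the pair at 10.0 falls below the threshold.
xs, spectrum = mbtr_k2_sketch([4.0, 4.0, 10.0], grid, weighting)
print(xs[spectrum.index(max(spectrum))])
```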
Overview
precomputed_properties – Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
add_precomputed_properties(parameters, structure_operations) – Add precomputed properties.
clear_precomputed_properties() – Clear all precomputed properties.
fit(X, y) – Fit function that determines the number of features.
fit_transform(X, y, **fit_params) – Fit to data, then transform it.
get_feature_names_out(input_features) – Get feature names.
get_metadata_routing() – Get metadata routing of this object.
get_params(deep) – Get parameters for this estimator.
precompute_parameter_space(param_grid, X) – Precompute and store structural properties to be reused later, e.g. for a grid search.
set_output(*, transform) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform structures to features.
- property precomputed_properties
Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
- Type:
list
- add_precomputed_properties(parameters, structure_operations)
Add precomputed properties.
- Parameters:
parameters (dict) – Dictionary of input parameters.
structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.
- clear_precomputed_properties()
Clear all precomputed properties.
- fit(X, y=None)
Fit function that determines the number of features.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names_out(input_features=None)
Get feature names.
- get_metadata_routing()
Get metadata routing of this object.
Please check the scikit-learn User Guide on how the routing mechanism works.
- Returns:
routing (MetadataRequest) – A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- precompute_parameter_space(param_grid, X)
Precompute and store structural properties to be reused later, e.g. for a grid search.
- Parameters:
param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.
X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.
- set_output(*, transform=None)
Set output container.
See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.
- Parameters:
transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
New in scikit-learn version 1.4: “polars” option was added.
- Returns:
self (estimator instance) – Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- class aim2dat.ml.transformers.StructureMatrixTransformer(matrix_type='coulomb', n_atoms_max=None, enforce_real=False, permutation='eigenspectrum', sigma=None, seed=None, sparse=False, ewald_accuracy=1e-05, ewald_w=1, ewald_r_cut=None, ewald_g_cut=None, ewald_a=None, dscribe_n_jobs=1, dscribe_only_physical_cores=False, n_procs=1, chunksize=50, verbose=True)
Bases: _BaseStructureTransformer
Extract features based on interaction matrices as defined in doi:10.1002/qua.24917. This transformer class is based on the implementations of the dscribe Python package.
- Variables:¶
- matrix_type : str¶
Matrix type. Supported options are 'coulomb', 'ewald_sum' or 'sine'.
- permutation : str¶
Defines the output format. Options are: 'none', 'sorted_l2', 'eigenspectrum' or 'random'.
- sigma : float¶
Standard deviation of the Gaussian distributed noise when using 'random' for permutation.
- seed : int¶
Seed for the random numbers in case 'random' is chosen for the permutation attribute.
- sparse : bool¶
Whether to return a sparse matrix or a dense 1D array.
- ewald_accuracy : float¶
Accuracy threshold for the Ewald sum.
- ewald_w : int¶
Weight parameter.
- ewald_r_cut : float or None¶
Real space cutoff parameter.
- ewald_g_cut : float or None¶
Reciprocal space cutoff parameter.
- ewald_a : float or None¶
Parameter controlling the width of the Gaussian functions.
- dscribe_n_jobs : int¶
Number of jobs used by dscribe to calculate the interaction matrix.
- dscribe_only_physical_cores : bool¶
Whether to only use physical cores.
- n_procs : int (optional)¶
Number of parallel processes.
- chunksize : int (optional)¶
Number of structures handed to each process at once.
- verbose : bool (optional)¶
Whether to print a progress bar.
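To make the 'coulomb' matrix type and the 'sorted_l2' permutation more concrete, here is a minimal pure-Python sketch for a small molecule. It uses the usual Coulomb matrix definition (0.5 Z_i^2.4 on the diagonal, Z_i Z_j / |R_i - R_j| off the diagonal) and assumes that 'sorted_l2' orders rows and columns by descending row norm. Unit conventions, padding to n_atoms_max and flattening are left out, and the function names are hypothetical, not the dscribe or aim2dat API.

```python
import math


def coulomb_matrix(numbers, positions):
    """Coulomb matrix from atomic numbers and Cartesian positions."""
    n = len(numbers)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * numbers[i] ** 2.4
            else:
                distance = math.dist(positions[i], positions[j])
                matrix[i][j] = numbers[i] * numbers[j] / distance
    return matrix


def sort_by_l2_norm(matrix):
    """Reorder rows and columns by descending L2 norm of the rows."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: norms[i], reverse=True)
    return [[matrix[i][j] for j in order] for i in order]


# Water-like toy geometry (H, O, H); the coordinates are illustrative only.
numbers = [1, 8, 1]
positions = [(0.76, 0.59, 0.0), (0.0, 0.0, 0.0), (-0.76, 0.59, 0.0)]
cm = coulomb_matrix(numbers, positions)
cm_sorted = sort_by_l2_norm(cm)
for row in cm_sorted:
    print([round(value, 3) for value in row])
```

Sorting removes the dependence on the input atom order, which is the purpose of the permutation option; the 'eigenspectrum' option achieves the same goal by keeping only the eigenvalues of the matrix.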
Overview
Properties:
precomputed_properties – Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
Methods:
add_precomputed_properties(parameters, structure_operations) – Add precomputed properties.
clear_precomputed_properties() – Clear all precomputed properties.
fit(X, y) – Fit function that determines the number of features.
fit_transform(X, y, **fit_params) – Fit to data, then transform it.
get_feature_names_out(input_features) – Get feature names.
get_metadata_routing() – Get metadata routing of this object.
get_params(deep) – Get parameters for this estimator.
precompute_parameter_space(param_grid, X) – Precompute and store structural properties to be reused later e.g. for a grid search.
set_output(*, transform) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform structures to features.
- property precomputed_properties¶
Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
- Type:¶
list
- add_precomputed_properties(parameters, structure_operations)¶
Add precomputed properties.
- Parameters:¶
parameters (dict) – Dictionary of input parameters.
structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.
- clear_precomputed_properties()¶
Clear all precomputed properties.
- fit(X, y=None)¶
Fit function that determines the number of features.
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:¶
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:¶
X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names_out(input_features=None)¶
Get feature names.
- get_metadata_routing()¶
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:¶
routing (MetadataRequest) – A MetadataRequest encapsulating routing information.
- get_params(deep=True)¶
Get parameters for this estimator.
- precompute_parameter_space(param_grid, X)¶
Precompute and store structural properties to be reused later e.g. for a grid search.
- Parameters:¶
param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.
X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.
- set_output(*, transform=None)¶
Set output container.
See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.
- Parameters:¶
transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
New in scikit-learn version 1.4: “polars” option was added.
- Returns:¶
self (estimator instance) – Estimator instance.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- class aim2dat.ml.transformers.StructurePRDFTransformer(r_max=15.0, delta_bin=0.005, distinguish_kinds=False, n_procs=1, chunksize=50, verbose=True)¶
Bases: _BaseStructureTransformer
Extract the partial radial distribution function for each element-pair as defined in doi:10.1103/PhysRevB.89.205118.
- Variables:¶
- r_max : float (optional)¶
Cut-off value for the maximum distance between two atoms in angstrom.
- delta_bin : float (optional)¶
Bin size to discretize the function in angstrom.
- distinguish_kinds : bool (optional)¶
Whether different kinds should be distinguished, e.g. Ni0 and Ni1 would be considered as different elements if True.
- n_procs : int (optional)¶
Number of parallel processes.
- chunksize : int (optional)¶
Number of structures handed to each process at once.
- verbose : bool (optional)¶
Whether to print a progress bar.
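The roles of r_max and delta_bin can be illustrated with the pair-distance binning that underlies a partial radial distribution function. The sketch below is a simplification for a finite cluster: periodic images and the normalization by shell volume and density are omitted, and `pair_distance_histogram` is a hypothetical helper, not the aim2dat implementation.

```python
import math
from itertools import combinations


def pair_distance_histogram(elements, positions, r_max=15.0, delta_bin=0.005):
    """Count pair distances below r_max per unordered element pair."""
    n_bins = int(round(r_max / delta_bin))
    histograms = {}
    for i, j in combinations(range(len(elements)), 2):
        distance = math.dist(positions[i], positions[j])
        if distance >= r_max:
            continue
        # Treat (A, B) and (B, A) as the same element pair.
        pair = tuple(sorted((elements[i], elements[j])))
        bins = histograms.setdefault(pair, [0] * n_bins)
        bins[min(int(distance / delta_bin), n_bins - 1)] += 1
    return histograms


# Toy cluster; coordinates in angstrom are illustrative only.
elements = ["Ni", "Ni", "O"]
positions = [(0.0, 0.0, 0.0), (2.2, 0.0, 0.0), (0.0, 1.7, 0.0)]
hist = pair_distance_histogram(elements, positions, r_max=5.0, delta_bin=0.5)
for pair, bins in sorted(hist.items()):
    print(pair, bins)
```

In this sketch the number of features per element pair is r_max / delta_bin, so the default values correspond to about 3000 bins per pair. Passing kind labels such as "Ni0" and "Ni1" instead of element symbols would yield separate histograms per kind, analogous to distinguish_kinds=True.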
Overview
Properties:
precomputed_properties – Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
Methods:
add_precomputed_properties(parameters, structure_operations) – Add precomputed properties.
clear_precomputed_properties() – Clear all precomputed properties.
fit(X, y) – Fit function that determines the number of features.
fit_transform(X, y, **fit_params) – Fit to data, then transform it.
get_feature_names_out(input_features) – Get feature names.
get_metadata_routing() – Get metadata routing of this object.
get_params(deep) – Get parameters for this estimator.
precompute_parameter_space(param_grid, X) – Precompute and store structural properties to be reused later e.g. for a grid search.
set_output(*, transform) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform structures to features.
- property precomputed_properties¶
Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
- Type:¶
list
- add_precomputed_properties(parameters, structure_operations)¶
Add precomputed properties.
- Parameters:¶
parameters (dict) – Dictionary of input parameters.
structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.
- clear_precomputed_properties()¶
Clear all precomputed properties.
- fit(X, y=None)¶
Fit function that determines the number of features.
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:¶
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:¶
X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names_out(input_features=None)¶
Get feature names.
- get_metadata_routing()¶
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:¶
routing (MetadataRequest) – A MetadataRequest encapsulating routing information.
- get_params(deep=True)¶
Get parameters for this estimator.
- precompute_parameter_space(param_grid, X)¶
Precompute and store structural properties to be reused later e.g. for a grid search.
- Parameters:¶
param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.
X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.
- set_output(*, transform=None)¶
Set output container.
See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.
- Parameters:¶
transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
New in scikit-learn version 1.4: “polars” option was added.
- Returns:¶
self (estimator instance) – Estimator instance.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- class aim2dat.ml.transformers.StructureSOAPTransformer(r_cut=7.5, n_max=8, l_max=6, sigma=1.0, rbf='gto', weighting=None, compression={'mode': 'off', 'species_weighting': None}, average='off', elements=None, periodic=False, sparse=False, dscribe_n_jobs=1, dscribe_only_physical_cores=False, n_procs=1, chunksize=50, verbose=True)¶
Bases: _BaseDscribeTransformer
Extract SOAP descriptor as defined in doi:10.1103/PhysRevB.87.184115. This transformer class is based on the implementations of the dscribe python package.
- Variables:¶
- r_cut : float¶
Cutoff value.
- n_max : int¶
The number of radial basis functions.
- l_max : int¶
The maximum degree of spherical harmonics.
- sigma : float¶
The standard deviation of the Gaussians used to expand the atomic density.
- rbf : str¶
The radial basis functions to use. Supported options are: 'gto' or 'polynomial'.
- weighting : dict¶
Contains the options which control the weighting of the atomic density.
- compression : dict¶
Feature compression options.
- average : str¶
The averaging mode over the centers of interest. Supported options are: 'off', 'inner' or 'outer'.
- elements : list¶
List of atomic numbers or chemical symbols.
- periodic : bool¶
Whether to consider periodic boundary conditions.
- sparse : bool¶
Whether to return a sparse matrix or a dense array.
- dscribe_n_jobs : int¶
Number of jobs used by dscribe to calculate the descriptor.
- dscribe_only_physical_cores : bool¶
Whether to only use physical cores.
- n_procs : int (optional)¶
Number of parallel processes.
- chunksize : int (optional)¶
Number of structures handed to each process at once.
- verbose : bool (optional)¶
Whether to print a progress bar.
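The parameters n_max, l_max and elements together determine the size of the SOAP vector per center. The sketch below assumes the uncompressed power-spectrum layout (compression mode 'off') as documented for dscribe; `soap_feature_count` is a hypothetical helper, and the count for a given installation is best checked against the descriptor itself.

```python
def soap_feature_count(n_elements, n_max=8, l_max=6):
    """Number of SOAP features per center, assuming compression mode 'off'.

    Counts the unique pairs of (element, radial basis) channels for each
    angular degree l = 0 .. l_max.
    """
    n_channels = n_elements * n_max
    return n_channels * (n_channels + 1) // 2 * (l_max + 1)


# Default n_max and l_max for one and for two chemical elements.
count_single = soap_feature_count(1)
count_binary = soap_feature_count(2)
print(count_single, count_binary)
```

Under this assumption the feature count grows roughly quadratically with the number of elements and with n_max, which is the main reason to consider the compression options for chemically diverse data sets.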
Overview
Properties:
precomputed_properties – Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
Methods:
add_precomputed_properties(parameters, structure_operations) – Add precomputed properties.
clear_precomputed_properties() – Clear all precomputed properties.
fit(X, y) – Fit function that determines the number of features.
fit_transform(X, y, **fit_params) – Fit to data, then transform it.
get_feature_names_out(input_features) – Get feature names.
get_metadata_routing() – Get metadata routing of this object.
get_params(deep) – Get parameters for this estimator.
precompute_parameter_space(param_grid, X) – Precompute and store structural properties to be reused later e.g. for a grid search.
set_output(*, transform) – Set output container.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Transform structures to features.
- property precomputed_properties¶
Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.
- Type:¶
list
- add_precomputed_properties(parameters, structure_operations)¶
Add precomputed properties.
- Parameters:¶
parameters (dict) – Dictionary of input parameters.
structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.
- clear_precomputed_properties()¶
Clear all precomputed properties.
- fit(X, y=None)¶
Fit function that determines the number of features.
- fit_transform(X, y=None, **fit_params)¶
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:¶
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:¶
X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.
- get_feature_names_out(input_features=None)¶
Get feature names.
- get_metadata_routing()¶
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:¶
routing (MetadataRequest) – A MetadataRequest encapsulating routing information.
- get_params(deep=True)¶
Get parameters for this estimator.
- precompute_parameter_space(param_grid, X)¶
Precompute and store structural properties to be reused later e.g. for a grid search.
- Parameters:¶
param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.
X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.
- set_output(*, transform=None)¶
Set output container.
See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.
- Parameters:¶
transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
New in scikit-learn version 1.4: “polars” option was added.
- Returns:¶
self (estimator instance) – Estimator instance.
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.