aim2dat.ml.transformers

scikit-learn transformer classes that extract features from crystals or molecules.

Module Contents

Classes

StructureACSFTransformer

Extract ACSF descriptor as defined in doi:10.1063/1.3553717. This transformer class is based on the implementations of the dscribe Python package.

StructureChemOrderTransformer

Extract Warren-Cowley-like order parameters for each element as defined in doi:10.1103/PhysRevB.96.024104.

StructureCompositionTransformer

Extract fractional concentrations of elements or kinds.

StructureCoordinationTransformer

Extract coordination numbers and distances between elements or kinds.

StructureDensityTransformer

Extract density of each element or kind.

StructureFFPrintTransformer

Extract the F-fingerprint for each element-pair as defined in doi:10.1103/PhysRevB.96.024104.

StructureMBTRTransformer

Extract MBTR descriptor as defined in doi:10.1088/2632-2153/aca005. This transformer class is based on the implementations of the dscribe Python package.

StructureMatrixTransformer

Extract features based on interaction matrices as defined in doi:10.1002/qua.24917.

StructurePRDFTransformer

Extract the partial radial distribution function for each element-pair as defined in

StructureSOAPTransformer

Extract SOAP descriptor as defined in doi:10.1103/PhysRevB.87.184115. This transformer

class aim2dat.ml.transformers.StructureACSFTransformer(r_cut=7.5, g2_params=None, g3_params=None, g4_params=None, g5_params=None, elements=None, periodic=False, sparse=False, dscribe_n_jobs=1, dscribe_only_physical_cores=False, n_procs=1, chunksize=50, verbose=True)[source]

Bases: _BaseDscribeTransformer

Extract ACSF descriptor as defined in doi:10.1063/1.3553717. This transformer class is based on the implementations of the dscribe Python package.

Variables:
r_cut : float

Cutoff value.

g2_params : np.array

List of pairs of eta and R_s values for the G^2 functions.

g3_params : np.array

List of kappa values for the G^3 functions.

g4_params : np.array

List of triplets of eta, zeta and lambda values for G^4 functions.

g5_params : np.array

List of triplets of eta, zeta and lambda values for G^5 functions.

elements : list

List of atomic numbers or chemical symbols.

periodic : bool

Whether to consider periodic boundary conditions.

sparse : bool

Whether to return a sparse matrix or a dense array.

dscribe_n_jobs : int

Number of jobs used by dscribe to calculate the interaction matrix.

dscribe_only_physical_cores : bool

Whether to only use physical cores.

n_procs : int (optional)

Number of parallel processes.

chunksize : int (optional)

Number of structures handed to each process at once.

verbose : bool (optional)

Whether to print a progress bar.
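
As an illustration of what one (eta, R_s) pair in g2_params controls, the following is a minimal sketch of the radial G^2 symmetry function with the usual cosine cutoff. It is written in plain Python for readability and is not the dscribe implementation; the function names and the neighbour distances are hypothetical.

```python
import math


def cutoff(r, r_cut):
    """Cosine cutoff function: decays smoothly to zero at r_cut."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def g2(distances, eta, r_s, r_cut):
    """Radial symmetry function G^2 for one central atom."""
    return sum(
        math.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_cut) for r in distances
    )


# Hypothetical neighbour distances (in angstrom) around one atom.
neighbour_distances = [1.0, 1.5, 8.0]
# One (eta, R_s) pair, i.e. one row of g2_params.
value = g2(neighbour_distances, eta=1.0, r_s=1.0, r_cut=7.5)
```

The neighbour at 8.0 angstrom lies beyond r_cut=7.5 and therefore does not contribute to the sum.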

Overview

Properties

precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Methods

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y)

Fit function that determines the number of features.

fit_transform(X, y, **fit_params)

Fit to data, then transform it.

get_feature_names_out(input_features)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

get_params(deep)

Get parameters for this estimator.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

set_output(*, transform)

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform structures to features.

property precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Type:

list

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

Parameters:
  • parameters (dict) – Dictionary of input parameters.

  • structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y=None)

Fit function that determines the number of features.

Parameters:
  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

  • y (list (optional)) – List of target properties.

Returns:

self – Transformer object.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names_out(input_features=None)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

Parameters:
  • param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.

  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.
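
To make the idea of a parameter space concrete, here is a small sketch of how a grid of parameter values can be expanded into individual parameter sets. It assumes that each key of param_grid maps to a list of candidate values, as in a scikit-learn grid search; this layout and the helper name expand_param_grid are assumptions, not part of aim2dat.

```python
from itertools import product


def expand_param_grid(param_grid):
    """Expand a dict (or list of dicts) of value lists into parameter sets."""
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    combinations = []
    for grid in grids:
        keys = sorted(grid)
        # Cartesian product over the candidate values of every key.
        for values in product(*(grid[key] for key in keys)):
            combinations.append(dict(zip(keys, values)))
    return combinations


param_sets = expand_param_grid({"r_cut": [5.0, 7.5], "periodic": [True, False]})
```

Each resulting dictionary corresponds to one parameter combination for which structural properties could be computed once and reused.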

set_output(*, transform=None)

Set output container.

See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self (estimator instance) – Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.
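
The <component>__<parameter> naming can be illustrated with a short sketch that splits a parameter name at the first double underscore. The helper split_param_name and the step name "acsf" are hypothetical and only serve to show the convention.

```python
def split_param_name(name):
    """Split '<component>__<parameter>' into its two parts."""
    component, delimiter, parameter = name.partition("__")
    if not delimiter:
        # No double underscore: the parameter belongs to the estimator itself.
        return None, name
    return component, parameter


direct = split_param_name("r_cut")
nested = split_param_name("acsf__r_cut")
```

In a pipeline with a step named "acsf", the second form would address the r_cut parameter of that step.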

transform(X)

Transform structures to features.

Parameters:

X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

Returns:

numpy.array – Nested array of features.

class aim2dat.ml.transformers.StructureChemOrderTransformer(r_max=15.0, max_shells=3, n_procs=1, chunksize=50, verbose=True)[source]

Bases: _BaseStructureTransformer

Extract Warren-Cowley-like order parameters for each element as defined in doi:10.1103/PhysRevB.96.024104.

Variables:
r_max : float (optional)

Cut-off value for the maximum distance between two atoms in angstrom.

max_shells : int (optional)

Number of neighbour shells that are evaluated.

n_procs : int (optional)

Number of parallel processes.

chunksize : int (optional)

Number of structures handed to each process at once.

verbose : bool (optional)

Whether to print a progress bar.
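
For orientation, the sketch below evaluates the textbook Warren-Cowley parameter, 1 - P_j / c_j, for a single neighbour shell, where P_j is the fraction of neighbours of element j and c_j its overall concentration. The "Warren-Cowley-like" parameters of the cited paper may differ in detail; the function name and the example shell are hypothetical.

```python
def warren_cowley(neighbour_elements, concentrations, element):
    """Return 1 - P_j / c_j for the neighbours found in one shell."""
    fraction = neighbour_elements.count(element) / len(neighbour_elements)
    return 1.0 - fraction / concentrations[element]


# Hypothetical first shell of an A atom in an equiatomic AB compound.
shell = ["B", "B", "B", "A"]
alpha_ab = warren_cowley(shell, {"A": 0.5, "B": 0.5}, "B")
```

A negative value indicates that unlike neighbours are preferred in this shell, zero corresponds to a random distribution, and positive values indicate clustering of like atoms.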

Overview

Properties

precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Methods

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y)

Fit function that determines the number of features.

fit_transform(X, y, **fit_params)

Fit to data, then transform it.

get_feature_names_out(input_features)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

get_params(deep)

Get parameters for this estimator.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

set_output(*, transform)

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform structures to features.

property precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Type:

list

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

Parameters:
  • parameters (dict) – Dictionary of input parameters.

  • structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y=None)

Fit function that determines the number of features.

Parameters:
  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

  • y (list (optional)) – List of target properties.

Returns:

self – Transformer object.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names_out(input_features=None)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

Parameters:
  • param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.

  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

set_output(*, transform=None)

Set output container.

See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self (estimator instance) – Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.

transform(X)

Transform structures to features.

Parameters:

X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

Returns:

numpy.array – Nested array of features.

class aim2dat.ml.transformers.StructureCompositionTransformer(distinguish_kinds=False, n_procs=1, chunksize=50, verbose=True)[source]

Bases: _BaseStructureTransformer

Extract fractional concentrations of elements or kinds.

Variables:
distinguish_kinds : bool (optional)

Whether to use kinds instead of elements.

n_procs : int (optional)

Number of parallel processes.

chunksize : int (optional)

Number of structures handed to each process at once.

verbose : bool (optional)

Whether to print a progress bar.
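
The effect of distinguish_kinds can be sketched with plain Python: fractional concentrations are the site counts of each label divided by the total number of sites. The helper below is a hypothetical illustration and does not use aim2dat; the site labels are made up.

```python
from collections import Counter


def fractional_concentrations(labels):
    """Return the fraction of sites carrying each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical site labels; with kinds, "Ni0" and "Ni1" stay separate.
by_element = fractional_concentrations(["Ni", "Ni", "O", "O"])
by_kind = fractional_concentrations(["Ni0", "Ni1", "O", "O"])
```

With element labels the two Ni sites are merged into one feature, whereas with kind labels each kind receives its own fraction.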

Overview

Properties

precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Methods

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y)

Fit function that determines the number of features.

fit_transform(X, y, **fit_params)

Fit to data, then transform it.

get_feature_names_out(input_features)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

get_params(deep)

Get parameters for this estimator.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

set_output(*, transform)

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform structures to features.

property precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Type:

list

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

Parameters:
  • parameters (dict) – Dictionary of input parameters.

  • structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y=None)

Fit function that determines the number of features.

Parameters:
  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

  • y (list (optional)) – List of target properties.

Returns:

self – Transformer object.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names_out(input_features=None)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

Parameters:
  • param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.

  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

set_output(*, transform=None)

Set output container.

See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self (estimator instance) – Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.

transform(X)

Transform structures to features.

Parameters:

X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

Returns:

numpy.array – Nested array of features.

class aim2dat.ml.transformers.StructureCoordinationTransformer(r_max=15.0, method='minimum_distance', min_dist_delta=0.1, n_nearest_neighbours=5, econ_tolerance=0.5, econ_conv_threshold=0.001, voronoi_weight_type='rel_solid_angle', voronoi_weight_threshold=0.5, feature_types=('nrs_avg', 'nrs_stdev', 'nrs_max', 'nrs_min', 'distance_avg', 'distance_stdev', 'distance_max', 'distance_min'), n_procs=1, chunksize=50, verbose=True)[source]

Bases: _BaseStructureTransformer

Extract coordination numbers and distances between elements or kinds.

Variables:
r_max : float (optional)

Cut-off value for the maximum distance between two atoms in angstrom.

method : str (optional)

Method used to calculate the coordination environment. The default value is 'minimum_distance'.

min_dist_delta : float (optional)

Tolerance parameter that defines the relative distance from the nearest neighbour atom for the 'minimum_distance' method.

n_nearest_neighbours : int (optional)

Number of neighbours that are considered coordinated for the 'n_neighbours' method.

econ_tolerance : float (optional)

Tolerance parameter for the econ method.

econ_conv_threshold : float (optional)

Convergence threshold for the econ method.

okeeffe_weight_threshold : float (optional)

Threshold parameter to distinguish indirect and direct neighbour atoms for the 'okeeffe' method.

feature_types : tuple or str (optional)

Tuple of features that are extracted. Supported options are: 'nrs_avg', 'nrs_stdev', 'nrs_max', 'nrs_min', 'distance_avg', 'distance_stdev', 'distance_max' and 'distance_min'.

n_procs : int (optional)

Number of parallel processes.

chunksize : int (optional)

Number of structures handed to each process at once.

verbose : bool (optional)

Whether to print a progress bar.
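
The names in feature_types suggest that per-site values are reduced to an average, a standard deviation, a maximum and a minimum. The sketch below shows such a reduction for coordination numbers ('nrs'). Whether aim2dat uses the population or the sample standard deviation, and how sites are grouped, is not stated here, so the choice of pstdev and the grouping are assumptions.

```python
from statistics import mean, pstdev


def aggregate(values, prefix):
    """Reduce per-site values to the four summary statistics."""
    return {
        f"{prefix}_avg": mean(values),
        f"{prefix}_stdev": pstdev(values),  # population stdev (assumption)
        f"{prefix}_max": max(values),
        f"{prefix}_min": min(values),
    }


# Hypothetical coordination numbers of four sites of one element pair.
features = aggregate([4, 4, 6, 6], "nrs")
```

The same reduction applied to neighbour distances would yield the 'distance_*' features.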

Overview

Properties

feature_types

Feature types that are included.

precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Methods

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y)

Fit function that determines the number of features.

fit_transform(X, y, **fit_params)

Fit to data, then transform it.

get_feature_names_out(input_features)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

get_params(deep)

Get parameters for this estimator.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

set_output(*, transform)

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform structures to features.

property feature_types

Feature types that are included.

Type:

tuple or str

property precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Type:

list

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

Parameters:
  • parameters (dict) – Dictionary of input parameters.

  • structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y=None)

Fit function that determines the number of features.

Parameters:
  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

  • y (list (optional)) – List of target properties.

Returns:

self – Transformer object.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names_out(input_features=None)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

Parameters:
  • param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.

  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

set_output(*, transform=None)

Set output container.

See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self (estimator instance) – Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.

transform(X)

Transform structures to features.

Parameters:

X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

Returns:

numpy.array – Nested array of features.

class aim2dat.ml.transformers.StructureDensityTransformer(distinguish_kinds=False, n_procs=1, chunksize=50, verbose=True)[source]

Bases: _BaseStructureTransformer

Extract density of each element or kind.

Variables:
distinguish_kinds : bool (optional)

Whether to use kinds instead of elements.

n_procs : int (optional)

Number of parallel processes.

chunksize : int (optional)

Number of structures handed to each process at once.

verbose : bool (optional)

Whether to print a progress bar.
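
The interplay of n_procs and chunksize can be pictured as splitting the list of structures into consecutive chunks that are handed to the worker processes one at a time. The sketch below only shows the splitting step; how aim2dat distributes the chunks internally is not documented here, and the helper name make_chunks is hypothetical.

```python
def make_chunks(items, chunksize):
    """Split a list into consecutive chunks of at most chunksize items."""
    return [items[i : i + chunksize] for i in range(0, len(items), chunksize)]


# 120 placeholder structures with the default chunksize of 50.
chunks = make_chunks(list(range(120)), chunksize=50)
sizes = [len(chunk) for chunk in chunks]
```

Larger chunks reduce the communication overhead between processes, while smaller chunks tend to balance the load more evenly.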

Overview

Properties

precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Methods

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y)

Fit function that determines the number of features.

fit_transform(X, y, **fit_params)

Fit to data, then transform it.

get_feature_names_out(input_features)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

get_params(deep)

Get parameters for this estimator.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

set_output(*, transform)

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform structures to features.

property precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Type:

list

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

Parameters:
  • parameters (dict) – Dictionary of input parameters.

  • structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y=None)

Fit function that determines the number of features.

Parameters:
  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

  • y (list (optional)) – List of target properties.

Returns:

self – Transformer object.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names_out(input_features=None)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

Parameters:
  • param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.

  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

set_output(*, transform=None)

Set output container.

See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self (estimator instance) – Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.

transform(X)

Transform structures to features.

Parameters:

X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

Returns:

numpy.array – Nested array of features.

class aim2dat.ml.transformers.StructureFFPrintTransformer(r_max=15.0, delta_bin=0.005, sigma=10.0, distinguish_kinds=False, add_header=False, use_weights=False, n_procs=1, chunksize=50, verbose=True)[source]

Bases: _BaseStructureTransformer

Extract the F-fingerprint for each element-pair as defined in doi:10.1103/PhysRevB.96.024104.

Variables:
r_max : float (optional)

Cut-off value for the maximum distance between two atoms in angstrom.

delta_bin : float (optional)

Bin size used to discretize the function in angstrom.

sigma : float (optional)

Smearing parameter for the Gaussian function.

distinguish_kinds : bool (optional)

Whether different kinds should be distinguished, e.g. Ni0 and Ni1 would be considered different elements if True.

add_header : bool

Add leading entries that describe the weights and composition for the ffprint kernels.

n_procs : int (optional)

Number of parallel processes.

chunksize : int (optional)

Number of structures handed to each process at once.

verbose : bool (optional)

Whether to print a progress bar.

Overview

Properties

precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Methods

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y)

Fit function that determines the number of features.

fit_transform(X, y, **fit_params)

Fit to data, then transform it.

get_feature_names_out(input_features)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

get_params(deep)

Get parameters for this estimator.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

set_output(*, transform)

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform structures to features.

property precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Type:

list

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

Parameters:
  • parameters (dict) – Dictionary of input parameters.

  • structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y=None)

Fit function that determines the number of features.

Parameters:
  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

  • y (list (optional)) – List of target properties.

Returns:

self – Transformer object.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new (ndarray array of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names_out(input_features=None)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

Parameters:
  • param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.

  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

set_output(*, transform=None)

Set output container.

See the scikit-learn example “Introducing the set_output API” for an example on how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self (estimator instance) – Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.
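
The <component>__<parameter> naming can be illustrated with a short, library-free sketch. It only shows how such keys may be split and grouped; it is not scikit-learn's or aim2dat's implementation, and the step name prdf as well as the helper split_nested_params are hypothetical.

```python
def split_nested_params(params):
    """Group keyword arguments by the component they address.

    Keys without "__" belong to the top-level object (stored under None);
    keys of the form "<component>__<parameter>" are grouped by component.
    """
    grouped = {}
    for key, value in params.items():
        # partition splits on the first "__" only, so a deeper key such as
        # "a__b__c" is handed to component "a" as parameter "b__c".
        component, delimiter, sub_key = key.partition("__")
        if delimiter:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault(None, {})[key] = value
    return grouped


# "prdf" is a hypothetical step name inside a nested object such as a Pipeline.
routed = split_nested_params(
    {"verbose": False, "prdf__r_max": 10.0, "prdf__delta_bin": 0.01}
)
```

Here routed maps None to the top-level parameters and "prdf" to the parameters meant for that step, which is the grouping a nested object needs before forwarding values to its components.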

transform(X)

Transform structures to features.

Parameters:

X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

Returns:

numpy.array – Nested array of features.

class aim2dat.ml.transformers.StructureMBTRTransformer(geometry={'function': 'inverse_distance'}, grid={'min': 0, 'max': 1, 'n': 100, 'sigma': 0.1}, weighting={'function': 'exp', 'scale': 1.0, 'threshold': 0.001}, normalize_gaussians=True, normalization='l2', elements=None, periodic=False, sparse=False, dscribe_n_jobs=1, dscribe_only_physical_cores=False, n_procs=1, chunksize=50, verbose=True)[source]

Bases: _BaseDscribeTransformer

Extract MBTR descriptor as defined in doi:10.1088/2632-2153/aca005. This transformer class is based on the implementations of the dscribe python package.

Variables:
geometry : dict

Setup the geometry function.

grid : dict

Setup the discretization grid.

weighting : dict

Setup the weighting function and its parameters.

normalize_gaussians : bool

Whether to normalize the gaussians to an area of 1.

normalization : str

Method for normalizing. Supported options are 'none', 'l2', 'n_atoms', 'valle_oganov'.

elements : list

List of atomic numbers or chemical symbols.

periodic : bool

Whether to consider periodic boundary conditions.

sparse : bool

Whether to return a sparse matrix or a dense array.

dscribe_n_jobs : int

Number of jobs used by dscribe to calculate the descriptor.

dscribe_only_physical_cores : bool

Whether to only use physical cores.

n_procs : int (optional)

Number of parallel processes.

chunksize : int (optional)

Number of structures handed to each process at once.

verbose : bool (optional)

Whether to print a progress bar.
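
To make the default geometry, grid and weighting dictionaries more concrete, the following sketch places one inverse-distance value on the discretization grid and derives the distance at which an exponential weight falls below the threshold. It is an illustration under assumptions, not the dscribe implementation: the grid is assumed to consist of n evenly spaced points including both end points, the weight is assumed to be exp(-scale * distance), and all function names are hypothetical.

```python
import math


def grid_points(grid):
    """Return n evenly spaced points from grid['min'] to grid['max']."""
    step = (grid["max"] - grid["min"]) / (grid["n"] - 1)
    return [grid["min"] + i * step for i in range(grid["n"])]


def broaden(value, grid):
    """Evaluate a unit-area Gaussian of width grid['sigma'] centred at value."""
    sigma = grid["sigma"]
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [
        norm * math.exp(-((x - value) ** 2) / (2.0 * sigma**2))
        for x in grid_points(grid)
    ]


def exp_weight_cutoff(weighting):
    """Distance beyond which exp(-scale * d) drops below the threshold."""
    return -math.log(weighting["threshold"]) / weighting["scale"]


grid = {"min": 0, "max": 1, "n": 100, "sigma": 0.1}
weighting = {"function": "exp", "scale": 1.0, "threshold": 0.001}

distance = 4.0  # hypothetical interatomic distance
spectrum = broaden(1.0 / distance, grid)  # 'inverse_distance' geometry value
peak_index = spectrum.index(max(spectrum))
cutoff = exp_weight_cutoff(weighting)
```

With these defaults the spectrum has 100 entries and peaks at the grid point closest to 0.25. Under the stated weighting assumption, pairs further apart than roughly 6.9 distance units would fall below the threshold.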

Overview

Properties

precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Methods

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y)

Fit function that determines the number of features.

fit_transform(X, y, **fit_params)

Fit to data, then transform it.

get_feature_names_out(input_features)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

get_params(deep)

Get parameters for this estimator.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

set_output(*, transform)

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform structures to features.

property precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Type:

list

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

Parameters:
  • parameters (dict) – Dictionary of input parameters.

  • structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.

clear_precomputed_properties()

Clear all precomputed properties.
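
The bookkeeping described by precomputed_properties, add_precomputed_properties and clear_precomputed_properties can be pictured as a list of (parameters, object) tuples. The class below is a hypothetical stand-in written for illustration only; the lookup method find is an assumption and not part of the documented interface, and a plain string takes the place of a StructureOperations object.

```python
class PrecomputedStore:
    """Hypothetical stand-in for the precomputed-properties bookkeeping."""

    def __init__(self):
        self._entries = []

    @property
    def precomputed_properties(self):
        # Return a copy so callers cannot mutate the internal list.
        return list(self._entries)

    def add_precomputed_properties(self, parameters, structure_operations):
        # Copy the parameter dict so later changes do not alter the entry.
        self._entries.append((dict(parameters), structure_operations))

    def clear_precomputed_properties(self):
        self._entries = []

    def find(self, parameters):
        """Return the stored object whose parameters match, else None."""
        for stored_parameters, structure_operations in self._entries:
            if stored_parameters == parameters:
                return structure_operations
        return None


store = PrecomputedStore()
store.add_precomputed_properties({"periodic": False}, "placeholder object")
hit = store.find({"periodic": False})
miss = store.find({"periodic": True})
store.clear_precomputed_properties()
```

Matching on the full parameter dictionary means that a stored result is only reused when every input parameter agrees, which is the conservative choice for a cache of this kind.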

fit(X, y=None)

Fit function that determines the number of features.

Parameters:
  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

  • y (list (optional)) – List of target properties.

Returns:

self – Transformer object.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names_out(input_features=None)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

Parameters:
  • param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.

  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.
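
A param_grid given as a dictionary or a list of dictionaries can be expanded into individual parameter settings as sketched below. This mirrors the usual grid-search convention of mapping each parameter name to a list of candidate values; how aim2dat expands the grid internally is not stated here, so the helper expand_param_grid is an assumption for illustration.

```python
from itertools import product


def expand_param_grid(param_grid):
    """Expand a dict (or list of dicts) of value lists into single settings."""
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    combinations = []
    for grid in grids:
        names = sorted(grid)  # fixed order keeps the result reproducible
        for values in product(*(grid[name] for name in names)):
            combinations.append(dict(zip(names, values)))
    return combinations


param_grid = [
    {"periodic": [False, True], "normalization": ["l2", "n_atoms"]},
    {"periodic": [True], "normalization": ["none"]},
]
settings = expand_param_grid(param_grid)
```

The first dictionary contributes four settings and the second one a single setting. Each setting is a candidate for which structural properties could be precomputed once and reused.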

set_output(*, transform=None)

Set output container.

See the scikit-learn gallery example “Introducing the set_output API” for how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self (estimator instance) – Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.

transform(X)

Transform structures to features.

Parameters:

X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

Returns:

numpy.array – Nested array of features.

class aim2dat.ml.transformers.StructureMatrixTransformer(matrix_type='coulomb', n_atoms_max=None, enforce_real=False, permutation='eigenspectrum', sigma=None, seed=None, sparse=False, ewald_accuracy=1e-05, ewald_w=1, ewald_r_cut=None, ewald_g_cut=None, ewald_a=None, dscribe_n_jobs=1, dscribe_only_physical_cores=False, n_procs=1, chunksize=50, verbose=True)[source]

Bases: _BaseStructureTransformer

Extract features based on interaction matrices as defined in doi:10.1002/qua.24917. This transformer class is based on the implementations of the dscribe python package.

Variables:
matrix_type : str

Matrix type. Supported options are 'coulomb', 'ewald_sum' or 'sine'.

permutation : str

Defines the output format. Options are: 'none', 'sorted_l2', 'eigenspectrum' or 'random'.

sigma : float

Standard deviation of the Gaussian-distributed noise when using 'random' for permutation.

seed : int

Seed for the random numbers in case 'random' is chosen for the permutation attribute.

sparse : bool

Whether to return a sparse matrix or a dense 1D array.

ewald_accuracy : float

Accuracy threshold for the Ewald sum.

ewald_w : int

Weight parameter.

ewald_r_cut : float or None

Real space cutoff parameter.

ewald_g_cut : float or None

Reciprocal space cutoff parameter.

ewald_a : float or None

Parameter controlling the width of the Gaussian functions.

dscribe_n_jobs : int

Number of jobs used by dscribe to calculate the interaction matrix.

dscribe_only_physical_cores : bool

Whether to only use physical cores.

n_procs : int (optional)

Number of parallel processes.

chunksize : int (optional)

Number of structures handed to each process at once.

verbose : bool (optional)

Whether to print a progress bar.
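
For the 'coulomb' matrix type with the 'sorted_l2' permutation, the underlying idea can be sketched without any dependencies. The sketch uses the common Coulomb-matrix definition (diagonal 0.5 * Z^2.4, off-diagonal Z_i * Z_j / distance) and plain nested lists. It does not reproduce the dscribe implementation, omits any zero-padding to a fixed number of atoms, and uses illustrative coordinates; the function names are hypothetical.

```python
import math


def coulomb_matrix(atomic_numbers, positions):
    """Build the Coulomb matrix of one structure as nested lists."""
    n_atoms = len(atomic_numbers)
    matrix = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms):
        for j in range(n_atoms):
            if i == j:
                matrix[i][j] = 0.5 * atomic_numbers[i] ** 2.4
            else:
                distance = math.dist(positions[i], positions[j])
                matrix[i][j] = atomic_numbers[i] * atomic_numbers[j] / distance
    return matrix


def sort_by_row_norm(matrix):
    """Reorder rows and columns by decreasing L2 norm of the rows."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: norms[i], reverse=True)
    return [[matrix[i][j] for j in order] for i in order]


# Hypothetical water-like geometry (H, O, H); coordinates are illustrative.
numbers = [1, 8, 1]
coords = [(0.76, 0.59, 0.0), (0.0, 0.0, 0.0), (-0.76, 0.59, 0.0)]
sorted_matrix = sort_by_row_norm(coulomb_matrix(numbers, coords))
features = [value for row in sorted_matrix for value in row]
```

Sorting by row norm moves the oxygen atom to the first row regardless of the input order, which is what makes the flattened feature vector insensitive to how the atoms are listed.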

Overview

Properties

precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Methods

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y)

Fit function that determines the number of features.

fit_transform(X, y, **fit_params)

Fit to data, then transform it.

get_feature_names_out(input_features)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

get_params(deep)

Get parameters for this estimator.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

set_output(*, transform)

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform structures to features.

property precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Type:

list

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

Parameters:
  • parameters (dict) – Dictionary of input parameters.

  • structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y=None)

Fit function that determines the number of features.

Parameters:
  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

  • y (list (optional)) – List of target properties.

Returns:

self – Transformer object.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names_out(input_features=None)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

Parameters:
  • param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.

  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

set_output(*, transform=None)

Set output container.

See the scikit-learn gallery example “Introducing the set_output API” for how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self (estimator instance) – Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.

transform(X)

Transform structures to features.

Parameters:

X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

Returns:

numpy.array – Nested array of features.

class aim2dat.ml.transformers.StructurePRDFTransformer(r_max=15.0, delta_bin=0.005, distinguish_kinds=False, n_procs=1, chunksize=50, verbose=True)[source]

Bases: _BaseStructureTransformer

Extract the partial radial distribution function for each element-pair as defined in doi:10.1103/PhysRevB.89.205118.

Variables:
r_max : float (optional)

Cut-off value for the maximum distance between two atoms in angstrom.

delta_bin : float (optional)

Bin size in angstrom used to discretize the function.

distinguish_kinds : bool (optional)

Whether different kinds should be distinguished, e.g. Ni0 and Ni1 would be considered as different elements if True.

n_procs : int (optional)

Number of parallel processes.

chunksize : int (optional)

Number of structures handed to each process at once.

verbose : bool (optional)

Whether to print a progress bar.
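
The roles of r_max and delta_bin can be illustrated by binning a handful of pair distances. The sketch only shows the discretization step and deliberately leaves out the normalization of the partial radial distribution function defined in the cited reference; the helper names and the example distances are hypothetical.

```python
def n_bins(r_max, delta_bin):
    """Number of histogram bins covering distances from 0 to r_max."""
    return round(r_max / delta_bin)


def bin_pair_distances(distances, r_max=15.0, delta_bin=0.005):
    """Count pair distances per bin; distances at or beyond r_max are dropped."""
    counts = [0] * n_bins(r_max, delta_bin)
    for distance in distances:
        index = int(distance / delta_bin)
        if 0 <= index < len(counts):
            counts[index] += 1
    return counts


# Hypothetical distances (in angstrom) for one element pair.
histogram = bin_pair_distances([2.502, 2.503, 3.871, 16.2])
```

With the default values the histogram has 3000 bins, so a smaller r_max or a larger delta_bin directly reduces the number of values produced per element pair.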

Overview

Properties

precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Methods

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y)

Fit function that determines the number of features.

fit_transform(X, y, **fit_params)

Fit to data, then transform it.

get_feature_names_out(input_features)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

get_params(deep)

Get parameters for this estimator.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

set_output(*, transform)

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform structures to features.

property precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Type:

list

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

Parameters:
  • parameters (dict) – Dictionary of input parameters.

  • structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y=None)

Fit function that determines the number of features.

Parameters:
  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

  • y (list (optional)) – List of target properties.

Returns:

self – Transformer object.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names_out(input_features=None)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

Parameters:
  • param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.

  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

set_output(*, transform=None)

Set output container.

See the scikit-learn gallery example “Introducing the set_output API” for how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self (estimator instance) – Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.

transform(X)

Transform structures to features.

Parameters:

X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

Returns:

numpy.array – Nested array of features.

class aim2dat.ml.transformers.StructureSOAPTransformer(r_cut=7.5, n_max=8, l_max=6, sigma=1.0, rbf='gto', weighting=None, compression={'mode': 'off', 'species_weighting': None}, average='off', elements=None, periodic=False, sparse=False, dscribe_n_jobs=1, dscribe_only_physical_cores=False, n_procs=1, chunksize=50, verbose=True)[source]

Bases: _BaseDscribeTransformer

Extract SOAP descriptor as defined in doi:10.1103/PhysRevB.87.184115. This transformer class is based on the implementations of the dscribe python package.

Variables:
r_cut : float

Cutoff value.

n_max : int

The number of radial basis functions.

l_max : int

The maximum degree of spherical harmonics.

sigma : float

The standard deviation of the gaussians.

rbf : str

The radial basis functions to use. Supported options are: 'gto' or 'polynomial'.

weighting : dict

Contains the options which control the weighting of the atomic density.

compression : dict

Feature compression options.

average : str

The averaging mode over the centers of interest. Supported options are: 'off', 'inner' or 'outer'.

elements : list

List of atomic numbers or chemical symbols.

periodic : bool

Whether to consider periodic boundary conditions.

sparse : bool

Whether to return a sparse matrix or a dense array.

dscribe_n_jobs : int

Number of jobs used by dscribe to calculate the descriptor.

dscribe_only_physical_cores : bool

Whether to only use physical cores.

n_procs : int (optional)

Number of parallel processes.

chunksize : int (optional)

Number of structures handed to each process at once.

verbose : bool (optional)

Whether to print a progress bar.
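
How n_max, l_max and the number of elements drive the descriptor size can be estimated by counting the unique components of the SOAP power spectrum. The sketch assumes that compression is switched off and counts unordered pairs of (element, radial function) channels for every degree l. Whether the transformer returns exactly this number of columns depends on options such as average and is not stated here; the function name is hypothetical.

```python
def soap_power_spectrum_size(n_elements, n_max=8, l_max=6):
    """Count unique power-spectrum components without compression."""
    n_channels = n_elements * n_max  # one channel per (element, radial index)
    n_pairs = n_channels * (n_channels + 1) // 2  # unordered channel pairs
    return n_pairs * (l_max + 1)  # degrees l = 0, ..., l_max


size_two_elements = soap_power_spectrum_size(2)
size_one_element = soap_power_spectrum_size(1)
```

The count grows quadratically with both n_max and the number of elements but only linearly with l_max, which is worth keeping in mind when choosing values for a parameter search.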

Overview

Properties

precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Methods

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y)

Fit function that determines the number of features.

fit_transform(X, y, **fit_params)

Fit to data, then transform it.

get_feature_names_out(input_features)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

get_params(deep)

Get parameters for this estimator.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

set_output(*, transform)

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform structures to features.

property precomputed_properties

Precomputed properties given as list of tuples consisting of input parameters and StructureOperations object.

Type:

list

add_precomputed_properties(parameters, structure_operations)

Add precomputed properties.

Parameters:
  • parameters (dict) – Dictionary of input parameters.

  • structure_operations (StructureOperations) – StructureOperations object storing the properties according to the input parameters.

clear_precomputed_properties()

Clear all precomputed properties.

fit(X, y=None)

Fit function that determines the number of features.

Parameters:
  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

  • y (list (optional)) – List of target properties.

Returns:

self – Transformer object.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.

get_feature_names_out(input_features=None)

Get feature names.

get_metadata_routing()

Get metadata routing of this object.

Please check the scikit-learn User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

precompute_parameter_space(param_grid, X)

Precompute and store structural properties to be reused later e.g. for a grid search.

Parameters:
  • param_grid (list or dict) – Dictionary or list of dictionaries of input parameters.

  • X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

set_output(*, transform=None)

Set output container.

See the scikit-learn gallery example “Introducing the set_output API” for how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

New in version 1.4: “polars” option was added.

Returns:

self (estimator instance) – Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.

transform(X)

Transform structures to features.

Parameters:

X (list or aim2dat.strct.StructureCollection) – List of structures or StructureCollection.

Returns:

numpy.array – Nested array of features.