Handling multiple structures at once
To facilitate the work with larger sets of structures, e.g. for high-throughput studies, this package includes the StructureCollection and the StructureOperations classes.
The StructureCollection class
The StructureCollection acts as a data container for larger numbers of molecules or crystals:
[1]:
from aim2dat.strct import StructureCollection
strct_c = StructureCollection()
The structures can be added to the object via the different append* functions of the object:
[2]:
from ase.spacegroup import crystal
import aiida
from aiida.orm import StructureData
from aim2dat.strct import Structure
structure_dict = {
"label": "Benzene",
"elements": ["C"] * 6 + ["H"] * 6,
"pbc": False,
"positions": [
[-0.7040, -1.2194, -0.0000],
[0.7040, -1.2194, -0.0000],
[-1.4081, -0.0000, -0.0000],
[1.4081, 0.0000, 0.0000],
[-0.7040, 1.2194, 0.0000],
[0.7040, 1.2194, -0.0000],
[-1.2152, -2.1048, -0.0000],
[1.2152, -2.1048, 0.0000],
[-2.4304, -0.0000, 0.0000],
[2.4304, 0.0000, 0.0000],
[-1.2152, 2.1048, -0.0000],
[1.2152, 2.1048, 0.0000],
],
}
strct_c.append(**structure_dict)
structure = Structure(
elements=["O", "H", "H"],
positions=[[0.0, 0.0, 0.119], [0.0, 0.763, -0.477], [0.0, -0.763, -0.477]],
pbc=False,
)
strct_c.append_structure(structure, label="Water")
a = 4.066 * 2.0
GaAs_conv = crystal(
("Ga", "As"),
basis=((0.0, 0.0, 0.0), (0.75, 0.75, 0.75)),
spacegroup=216,
cellpar=[a, a, a, 90, 90, 90],
primitive_cell=False,
)
GaAs_conv.info = {}
strct_c.append_from_ase_atoms("GaAs", GaAs_conv)
aiida.load_profile("tests")
unit_cell = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
structure = StructureData(cell=unit_cell)
structure.label = "Li"
structure.append_atom(position=(0.0, 0.0, 0.0), symbols="Li")
structure.append_atom(position=(1.5, 1.5, 1.5), symbols="Li")
strct_c.append_from_aiida_structuredata(structure)
Alternatively, a list of dictionaries can be passed upon initialization of the object:
[3]:
strct_c2 = StructureCollection(structures=[structure_dict])
A summary of the object is given by its string representation:
[4]:
print(strct_c)
----------------------------------------------------------------------
------------------------ Structure Collection ------------------------
----------------------------------------------------------------------
- Number of structures: 4
- Elements: As-C-Ga-H-Li-O
Structures
- Benzene C6H6 [False False False]
- Water OH2 [False False False]
- GaAs Ga4As4 [True True True ]
- Li Li2 [True True True ]
----------------------------------------------------------------------
Additionally, a pandas data frame can be created based on the object’s content:
[5]:
strct_c2.to_pandas_df()
[5]:
| | label | structure | el_conc_C | el_conc_H | nr_atoms | nr_atoms_C | nr_atoms_H |
|---|---|---|---|---|---|---|---|
| 0 | Benzene | [(C, (-0.704, -1.2194, -0.0)), (C, (0.704, -1.... | 0.5 | 0.5 | 12 | 6 | 6 |
The StructureCollection object combines features of the Python list and dictionary types and stores each structure as a Structure object in an internal list.
Each added structure is therefore assigned an index (integer) and a label (string), either of which can be used to retrieve it.
The label is stored in the label property of the Structure object, whereas the index is the structure's position in the internal list of the StructureCollection object and is determined by the order of the append* function calls.
A structure can be retrieved via the get_structure function or via square brackets, using either its label or its index:
[6]:
print(strct_c[1])
----------------------------------------------------------------------
-------------------------- Structure: Water --------------------------
----------------------------------------------------------------------
Formula: OH2
PBC: [False False False]
Sites
- O None [ 0.0000 0.0000 0.1190]
- H None [ 0.0000 0.7630 -0.4770]
- H None [ 0.0000 -0.7630 -0.4770]
----------------------------------------------------------------------
[7]:
print(strct_c["Water"])
----------------------------------------------------------------------
-------------------------- Structure: Water --------------------------
----------------------------------------------------------------------
Formula: OH2
PBC: [False False False]
Sites
- O None [ 0.0000 0.0000 0.1190]
- H None [ 0.0000 0.7630 -0.4770]
- H None [ 0.0000 -0.7630 -0.4770]
----------------------------------------------------------------------
[8]:
print(strct_c.get_structure(3))
----------------------------------------------------------------------
--------------------------- Structure: Li ----------------------------
----------------------------------------------------------------------
Formula: Li2
PBC: [True True True]
Cell
Vectors: - [ 3.0000 0.0000 0.0000]
- [ 0.0000 3.0000 0.0000]
- [ 0.0000 0.0000 3.0000]
Lengths: [ 3.0000 3.0000 3.0000]
Angles: [ 90.0000 90.0000 90.0000]
Volume: 27.0000
Sites
- Li None [ 0.0000 0.0000 0.0000] [ 0.0000 0.0000 0.0000]
- Li None [ 1.5000 1.5000 1.5000] [ 0.5000 0.5000 0.5000]
----------------------------------------------------------------------
As with a list, the index function returns the index of a structure, a structure can be deleted via del, and the pop function is implemented as well.
All labels of the structures are returned via the labels property:
[9]:
del strct_c["Benzene"]
strct_c.labels
[9]:
['Water', 'GaAs', 'Li']
Two structure collection objects can be merged into one using +:
[10]:
print(strct_c + strct_c2)
----------------------------------------------------------------------
------------------------ Structure Collection ------------------------
----------------------------------------------------------------------
- Number of structures: 4
- Elements: As-C-Ga-H-Li-O
Structures
- Water OH2 [False False False]
- GaAs Ga4As4 [True True True ]
- Li Li2 [True True True ]
- Benzene C6H6 [False False False]
----------------------------------------------------------------------
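The list-like and dictionary-like behaviour described above can be illustrated with a small stand-alone sketch. MiniCollection is a hypothetical toy class written for this example, not the aim2dat implementation; structures are reduced to plain dictionaries with a label, and rejecting duplicate labels is an assumption of the sketch:

```python
class MiniCollection:
    """Hypothetical list/dict hybrid (illustration only, not aim2dat code)."""

    def __init__(self, structures=None):
        self._structures = []  # insertion order defines the integer index
        for strct in structures or []:
            self.append(strct)

    def append(self, structure):
        # Assumption of this sketch: labels must be unique to act as keys.
        if structure["label"] in self.labels:
            raise ValueError(f"Label '{structure['label']}' already exists.")
        self._structures.append(dict(structure))

    @property
    def labels(self):
        return [strct["label"] for strct in self._structures]

    def index(self, label):
        return self.labels.index(label)

    def _to_index(self, key):
        # Integers are positions, anything else is treated as a label.
        return key if isinstance(key, int) else self.index(key)

    def __getitem__(self, key):
        return self._structures[self._to_index(key)]

    def __delitem__(self, key):
        del self._structures[self._to_index(key)]

    def __len__(self):
        return len(self._structures)

    def __add__(self, other):
        # Merging keeps the order: entries of self first, then those of other.
        return MiniCollection(self._structures + other._structures)


coll = MiniCollection(
    [
        {"label": "Benzene", "formula": "C6H6"},
        {"label": "Water", "formula": "OH2"},
        {"label": "GaAs", "formula": "Ga4As4"},
    ]
)
same_structure = coll[1] is coll["Water"]  # index and label reach the same entry
del coll["Benzene"]  # the indices of the remaining entries shift down by one
merged = coll + MiniCollection([{"label": "Benzene", "formula": "C6H6"}])
print(merged.labels)
```

Keeping a single internal list and resolving labels to positions on access means that the order of the append calls is the only source of truth for the index, which matches the behaviour described in the text.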
There are two ways to store all structures contained in the StructureCollection object: they can be written to an HDF5 file or to an AiiDA database using the functions to_file and to_aiida_db, respectively. The structures can be retrieved using the corresponding from_file and from_aiida_db functions.
[11]:
strct_c.to_aiida_db(group_label="test")
strct_c = StructureCollection.from_aiida_db(group_label="test")
Storing data as group `test` in the AiiDA database.
[12]:
strct_c.to_file("test.h5")
strct_c = StructureCollection.from_file("test.h5")
Analysis and manipulation of multiple structures via the StructureOperations class
The StructureOperations class offers the same structural analysis and manipulation methods as the Structure class, but provides a more convenient interface for applying them to multiple structures at once.
A StructureOperations object requires a StructureCollection object upon initialization, which it uses as internal storage for the original structures as well as for new structures created by the manipulation methods:
[13]:
from aim2dat.strct import StructureOperations
strct_c += strct_c2
strct_op = StructureOperations(structures=strct_c, verbose=False)
print(strct_op.structures)
----------------------------------------------------------------------
------------------------ Structure Collection ------------------------
----------------------------------------------------------------------
- Number of structures: 4
- Elements: As-C-Ga-H-Li-O
Structures
- Li Li2 [True True True ]
- GaAs Ga4As4 [True True True ]
- Water OH2 [False False False]
- Benzene C6H6 [False False False]
----------------------------------------------------------------------
There are two additional properties to be set:
- verbose expects a boolean variable; if set to True, a progress bar is shown.
- output_format expects a string and specifies the output format for the analysis methods.
A list of all supported options is returned via the supported_output_formats property:
[14]:
strct_op.supported_output_formats
[14]:
['dict', 'DataFrame']
All methods of the class are parallelized. Two properties control the parallelization, both expecting a positive integer:
- n_procs sets the number of processes used.
- chunksize defines the number of tasks assigned to each process at once.
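How the two settings interact can be sketched with Python's standard library. The example is only an analogy, not the package's implementation: it uses multiprocessing.pool.ThreadPool, a thread-based stand-in that offers the same map interface as a process pool, and count_atoms and analyse_all are hypothetical helper names.

```python
from multiprocessing.pool import ThreadPool


def count_atoms(element_counts):
    """Toy per-structure task: sum the atom counts of one structure."""
    return sum(element_counts.values())


def analyse_all(structures, n_procs=2, chunksize=2):
    """Apply count_atoms to every structure using n_procs workers.

    The tasks are handed to the workers in batches of `chunksize`.
    """
    with ThreadPool(processes=n_procs) as pool:
        # map blocks until all tasks are done and preserves the input order.
        return pool.map(count_atoms, structures, chunksize=chunksize)


structures = [{"C": 6, "H": 6}, {"O": 1, "H": 2}, {"Ga": 4, "As": 4}, {"Li": 2}]
results = analyse_all(structures, n_procs=2, chunksize=2)
print(results)
```

As a rule of thumb, a larger chunk size reduces the scheduling overhead for many cheap tasks, while a smaller one balances the load better when individual tasks differ strongly in cost.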
As mentioned before, the StructureOperations class offers the same analysis and manipulation methods as the Structure class, and they can be listed in the same way:
[15]:
print("Analysis methods: ", strct_op.list_analysis_methods())
print("Manipulation methods: ", strct_op.list_manipulation_methods())
Analysis methods: ['calc_point_group', 'calc_space_group', 'calc_distance', 'calc_angle', 'calc_dihedral_angle', 'calc_voronoi_tessellation', 'calc_coordination', 'calc_ffingerprint']
The analysis and manipulation methods work the same way as for the Structure object. In addition, the structure(s) in the StructureCollection on which a method is applied are selected by a key or a list/tuple of keys.
If a single key is given, either as an integer index or as a structure label, the output is the same as for the Structure object.
For example, the distance between two atoms can be calculated in one line via either the StructureOperations or the StructureCollection object:
[16]:
print("One structure: ", strct_op[["Benzene"]].calc_distance(2, 3))
print("One structure: ", strct_c["Benzene"].calc_distance(site_index1=2, site_index2=3))
One structure: {'Benzene': 2.8162}
One structure: 2.8162
Note
It is important to note that the StructureOperations class behaves differently for strct_op["Benzene"].calc_distance(2, 3) and strct_op[["Benzene"]].calc_distance(2, 3). In the latter case, the input is given as a list, and as such, the output is consistent with the use case of multiple structures described below.
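The difference between the two call styles can be mimicked in plain Python. The sketch below is an illustration under simplifying assumptions, not the library's code: it only handles non-periodic structures, and the calc_distance helper and the hand-written positions dictionary are hypothetical. A single key yields a bare number, whereas a list of keys yields a dictionary keyed by label.

```python
import math

# Cartesian positions of the first sites, copied from the structures above.
positions = {
    "Benzene": [
        (-0.7040, -1.2194, 0.0),
        (0.7040, -1.2194, 0.0),
        (-1.4081, 0.0, 0.0),
        (1.4081, 0.0, 0.0),
    ],
    "Water": [(0.0, 0.0, 0.119), (0.0, 0.763, -0.477), (0.0, -0.763, -0.477)],
}


def calc_distance(key, site_index1, site_index2):
    """Return a float for a single key and a dict for a list/tuple of keys."""

    def distance_for(label):
        sites = positions[label]
        return math.dist(sites[site_index1], sites[site_index2])

    if isinstance(key, (list, tuple)):
        return {label: distance_for(label) for label in key}
    return distance_for(key)


single = calc_distance("Benzene", 2, 3)  # bare number
as_list = calc_distance(["Benzene"], 2, 3)  # dictionary with one entry
multiple = calc_distance(["Benzene", "Water"], 0, 1)  # one entry per key
print(single, as_list, multiple)
```

Returning a dictionary whenever a list is passed, even a list of length one, keeps the output type predictable for code that loops over a variable number of structures.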
The advantage of the StructureOperations class becomes apparent when several structures are analysed at once, e.g.:
[17]:
print(strct_op.structures)
----------------------------------------------------------------------
------------------------ Structure Collection ------------------------
----------------------------------------------------------------------
- Number of structures: 4
- Elements: As-C-Ga-H-Li-O
Structures
- Li Li2 [True True True ]
- GaAs Ga4As4 [True True True ]
- Water OH2 [False False False]
- Benzene C6H6 [False False False]
----------------------------------------------------------------------
[18]:
print("Multiple structures: ", strct_op[0,1].calc_distance(0, 1))
Multiple structures: {'Li': 2.598076211353316, 'GaAs': 5.750192323029818}
If the output_format is changed to 'DataFrame', a pandas data frame is returned that uses the structure labels as its index; the results are stored in a column named after the called method:
[19]:
strct_op.output_format = "DataFrame"
strct_op[0, 1].calc_distance(0, 1)
[19]:
| | <function calc_distance at 0x7f34c70afec0> |
|---|---|
| Li | 2.598076 |
| GaAs | 5.750192 |
As for the structural manipulation methods, the output for a single key is once again the same as for the Structure object.
The only difference is that, if append_to_coll is set to True, the new structure created by the manipulation method is also added to the StructureCollection object:
[20]:
subst_structure = strct_op["GaAs"].substitute_elements(("Ga", "Al"), change_label=True)
print(subst_structure)
print(strct_op.structures.labels)
----------------------------------------------------------------------
--------------------- Structure: GaAs_subst-GaAl ---------------------
----------------------------------------------------------------------
Formula: Al4As4
PBC: [True True True]
Cell
Vectors: - [ 8.0987 0.0000 0.0000]
- [ 0.0000 8.0987 0.0000]
- [ 0.0000 0.0000 8.0987]
Lengths: [ 8.0987 8.0987 8.0987]
Angles: [ 90.0000 90.0000 90.0000]
Volume: 531.1797
Sites
- Al None [ 0.0000 0.0000 0.0000] [ 0.0000 0.0000 0.0000]
- Al None [ 0.0000 4.0493 4.0493] [ 0.0000 0.5000 0.5000]
- Al None [ 4.0493 0.0000 4.0493] [ 0.5000 0.0000 0.5000]
- Al None [ 4.0493 4.0493 0.0000] [ 0.5000 0.5000 0.0000]
- As None [ 6.0740 6.0740 6.0740] [ 0.7500 0.7500 0.7500]
- As None [ 2.0247 2.0247 6.0740] [ 0.2500 0.2500 0.7500]
- As None [ 2.0247 6.0740 2.0247] [ 0.2500 0.7500 0.2500]
- As None [ 6.0740 2.0247 2.0247] [ 0.7500 0.2500 0.2500]
----------------------------------------------------------------------
['Li', 'GaAs', 'Water', 'Benzene']
For a list/tuple of keys, a StructureCollection containing the structures identified by the keys is returned instead of a single Structure:
[21]:
subst_structures = strct_op[strct_op.structures.labels].substitute_elements(
("Al", "Ga"), change_label=False
)
print(subst_structures)
----------------------------------------------------------------------
------------------------ Structure Collection ------------------------
----------------------------------------------------------------------
- Number of structures: 4
- Elements: As-C-Ga-H-Li-O
Structures
- Li Li2 [True True True ]
- GaAs Ga4As4 [True True True ]
- Water OH2 [False False False]
- Benzene C6H6 [False False False]
----------------------------------------------------------------------
It is important to note that in this case, all structures are returned regardless of whether they are actually changed by the method or not.
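This behaviour can be illustrated with a stand-alone sketch. The substitute_elements function below is a hypothetical simplification that acts on plain lists of element symbols and is not the library's method; it shows that every selected structure appears in the result and that the changed ones have to be identified separately.

```python
def substitute_elements(elements, old, new):
    """Return a copy of the element list with `old` replaced by `new`."""
    return [new if el == old else el for el in elements]


def substitute_in_all(collection, old, new):
    """Apply the substitution to every structure and report what changed."""
    result = {
        label: substitute_elements(elements, old, new)
        for label, elements in collection.items()
    }
    changed = [label for label in collection if result[label] != collection[label]]
    return result, changed


collection = {
    "Li": ["Li", "Li"],
    "GaAs": ["Ga"] * 4 + ["As"] * 4,
    "Water": ["O", "H", "H"],
}

# No structure contains Al, so nothing changes, yet all structures are returned.
result_al, changed_al = substitute_in_all(collection, "Al", "Ga")

# Only GaAs contains Ga, so only this structure is actually modified.
result_ga, changed_ga = substitute_in_all(collection, "Ga", "Al")
print(list(result_al), changed_al, changed_ga)
```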
External analysis and manipulation methods can be applied via the perform_analysis and perform_manipulation functions, respectively.
In this case, the external function and its keyword arguments need to be passed.
[22]:
from aim2dat.strct.ext_analysis import calc_prdf
output = strct_op["Benzene"].perform_analysis(calc_prdf, {"r_max": 7.5})
The pipeline implementation of the StructureOperations class
The pipeline property predefines a series of structure manipulation methods which are applied consecutively to the structure pool. Each step is represented by a tuple of up to three entries: the first entry defines the manipulation function, the second (optional) entry contains the input parameters as a dictionary, and the third (optional) entry defines how often the function is executed. The pipeline is run via the run_pipeline method, which returns the manipulated structures:
[23]:
strct_op.pipeline = [("scale_unit_cell", {"scaling_factors": 1.1}, (0, 1)), ("substitute_elements", {"elements": ("Al", "Ga")})]
strcts = strct_op.run_pipeline()
print(strcts)
----------------------------------------------------------------------
------------------------ Structure Collection ------------------------
----------------------------------------------------------------------
- Number of structures: 8
- Elements: As-C-Ga-H-Li-O
Structures
- Lix0 Li2 [True True True ]
- GaAsx0 Ga4As4 [True True True ]
- Waterx0 OH2 [False False False]
- Benzenex0 C6H6 [False False False]
- Lix1 Li2 [True True True ]
- GaAsx1 Ga4As4 [True True True ]
- Waterx1 OH2 [False False False]
- Benzenex1 C6H6 [False False False]
----------------------------------------------------------------------
Note that the first step is skipped for the first four structures while it is employed once on the latter structures.
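The logic of such a pipeline can be sketched in a few lines of plain Python. This is one possible reading of the behaviour shown above, not the library's implementation: structures are plain dictionaries, the functions scale_unit_cell, substitute_elements and run_pipeline are hypothetical stand-ins, and the label suffix x0/x1 is modelled on the printed output.

```python
def scale_unit_cell(structure, scaling_factors):
    """Toy manipulation: scale the (cubic) cell length."""
    scaled = structure["cell_length"] * scaling_factors
    return {**structure, "cell_length": scaled}


def substitute_elements(structure, elements):
    """Toy manipulation: replace elements[0] by elements[1]."""
    old, new = elements
    replaced = [new if el == old else el for el in structure["elements"]]
    return {**structure, "elements": replaced}


def run_pipeline(structures, pipeline):
    """Apply each step to the whole pool, one step after the other."""
    pool = [dict(strct) for strct in structures]
    for step in pipeline:
        func = step[0]
        kwargs = step[1] if len(step) > 1 else {}
        repetitions = step[2] if len(step) > 2 else None
        if repetitions is None:
            # Without a third entry the step is applied once to every structure.
            pool = [func(strct, **kwargs) for strct in pool]
            continue
        new_pool = []
        for n_rep in repetitions:
            # One variant of each structure per repetition count.
            for strct in pool:
                variant = dict(strct)
                for _ in range(n_rep):
                    variant = func(variant, **kwargs)
                variant["label"] = f"{strct['label']}x{n_rep}"
                new_pool.append(variant)
        pool = new_pool
    return pool


structures = [
    {"label": "Li", "cell_length": 3.0, "elements": ["Li", "Li"]},
    {"label": "GaAs", "cell_length": 8.132, "elements": ["Ga", "As"]},
]
pipeline = [
    (scale_unit_cell, {"scaling_factors": 1.1}, (0, 1)),
    (substitute_elements, {"elements": ("Al", "Ga")}),
]
output = run_pipeline(structures, pipeline)
print([strct["label"] for strct in output])
```

With the repetition entry (0, 1), the pool doubles in size: the x0 variants are left unscaled and the x1 variants are scaled once, which mirrors the eight structures in the output above.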
Comparing structures via the StructureOperations class
Another handy feature of the class is its set of comparison methods, which compare two structures or the sites of a structure:
There are also methods to filter out duplicate structures or to find equivalent sites based on the comparison methods: