Handling multiple structures at once
To facilitate work with larger sets of structures, e.g. for high-throughput studies, this package includes the StructureCollection and the StructureOperations classes.
The StructureCollection class
The StructureCollection acts as a data container for larger numbers of molecules or crystals:
[1]:
from aim2dat.strct import StructureCollection
strct_c = StructureCollection()
Structures can be added to the object via its different append* functions:
[2]:
from ase.spacegroup import crystal
import aiida
from aiida.orm import StructureData
from aim2dat.strct import Structure

# Append a structure from keyword arguments.
structure_dict = {
    "label": "Benzene",
    "elements": ["C"] * 6 + ["H"] * 6,
    "pbc": False,
    "positions": [
        [-0.7040, -1.2194, -0.0000],
        [0.7040, -1.2194, -0.0000],
        [-1.4081, -0.0000, -0.0000],
        [1.4081, 0.0000, 0.0000],
        [-0.7040, 1.2194, 0.0000],
        [0.7040, 1.2194, -0.0000],
        [-1.2152, -2.1048, -0.0000],
        [1.2152, -2.1048, 0.0000],
        [-2.4304, -0.0000, 0.0000],
        [2.4304, 0.0000, 0.0000],
        [-1.2152, 2.1048, -0.0000],
        [1.2152, 2.1048, 0.0000],
    ],
}
strct_c.append(**structure_dict)

# Append an existing Structure object.
structure = Structure(
    elements=["O", "H", "H"],
    positions=[[0.0, 0.0, 0.119], [0.0, 0.763, -0.477], [0.0, -0.763, -0.477]],
    pbc=False,
)
strct_c.append_structure(structure, label="Water")

# Append from an ASE Atoms object (conventional GaAs cell).
a = 4.066 * 2.0
GaAs_conv = crystal(
    ("Ga", "As"),
    basis=((0.0, 0.0, 0.0), (0.75, 0.75, 0.75)),
    spacegroup=216,
    cellpar=[a, a, a, 90, 90, 90],
    primitive_cell=False,
)
strct_c.append_from_ase_atoms("GaAs", GaAs_conv)

# Append from an AiiDA StructureData node (uses the AiiDA profile "tests").
aiida.load_profile("tests")
unit_cell = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
structure = StructureData(cell=unit_cell)
structure.label = "Li"
structure.append_atom(position=(0.0, 0.0, 0.0), symbols="Li")
structure.append_atom(position=(1.5, 1.5, 1.5), symbols="Li")
strct_c.append_from_aiida_structuredata(structure)
Alternatively, a list of dictionaries can be passed upon initialization of the object:
[3]:
strct_c2 = StructureCollection(structures=[structure_dict])
A summary of the object is given by its string representation:
[4]:
print(strct_c)
----------------------------------------------------------------------
------------------------ Structure Collection ------------------------
----------------------------------------------------------------------
- Number of structures: 4
- Elements: As-C-Ga-H-Li-O
Structures
- Benzene C6H6 [False False False]
- Water OH2 [False False False]
- GaAs Ga4As4 [True True True ]
- Li Li2 [True True True ]
----------------------------------------------------------------------
Additionally, a pandas data frame can be created based on the object’s content:
[5]:
strct_c2.create_pandas_df()
[5]:
 | label | el_conc_C | el_conc_H | nr_atoms | nr_atoms_C | nr_atoms_H |
---|---|---|---|---|---|---|
0 | Benzene | 0.5 | 0.5 | 12 | 6 | 6 |
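The column names suggest that the data frame holds per-element atom counts and concentrations. A minimal sketch of how such a row could be derived from a list of element symbols is given below. The helper `composition_row` is hypothetical and not part of aim2dat, and reading `el_conc_*` as the fraction of atoms of an element is an assumption based on the output above.

```python
from collections import Counter


def composition_row(label, elements):
    """Build one table row from a list of element symbols (illustrative only)."""
    counts = Counter(elements)
    nr_atoms = len(elements)
    row = {"label": label, "nr_atoms": nr_atoms}
    for element in sorted(counts):
        row[f"nr_atoms_{element}"] = counts[element]
        # Assumed meaning of el_conc_*: fraction of atoms of this element.
        row[f"el_conc_{element}"] = counts[element] / nr_atoms
    return row


benzene_row = composition_row("Benzene", ["C"] * 6 + ["H"] * 6)
print(benzene_row)
```

For benzene this gives 12 atoms with equal shares of C and H, in line with the row shown above.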
The StructureCollection object combines features of the Python list and dictionary types and stores each structure as a Structure object in an internal list.
As such, each added structure is assigned an index (integer) and a label (string), either of which can be used to obtain the structure.
The label is stored in the label property of the Structure object, whereas the index is given by the position in the internal list of the StructureCollection object and is defined by the order of the append* function calls.
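The combination of list-like and dictionary-like access described here can be sketched with a small toy container. `MiniCollection` is a hypothetical class written for illustration only; it is not how aim2dat implements StructureCollection, and it assumes that labels are unique.

```python
class MiniCollection:
    """Toy container addressing items by position or by label (not aim2dat code)."""

    def __init__(self):
        # List of (label, payload) tuples; the insertion order defines the index.
        self._items = []

    def append(self, label, payload):
        # Assumes that labels are unique; no check is made here.
        self._items.append((label, payload))

    @property
    def labels(self):
        return [label for label, _ in self._items]

    def index(self, label):
        return self.labels.index(label)

    def __getitem__(self, key):
        # Strings are treated as labels, integers as positions.
        if isinstance(key, str):
            key = self.index(key)
        return self._items[key][1]

    def __delitem__(self, key):
        if isinstance(key, str):
            key = self.index(key)
        del self._items[key]


coll = MiniCollection()
coll.append("Benzene", "C6H6")
coll.append("Water", "OH2")
same_item = coll[1] is coll["Water"]  # both keys address the same entry
del coll["Benzene"]
remaining = coll.labels
```

Deleting an entry shifts the indices of all later entries, while their labels stay the same.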
A structure can be obtained via the get_structure function or via square brackets, using either its label or its index:
[6]:
print(strct_c[1])
----------------------------------------------------------------------
-------------------------- Structure: Water --------------------------
----------------------------------------------------------------------
Formula: OH2
PBC: [False False False]
Sites
- O None [ 0.0000 0.0000 0.1190]
- H None [ 0.0000 0.7630 -0.4770]
- H None [ 0.0000 -0.7630 -0.4770]
----------------------------------------------------------------------
[7]:
print(strct_c["Water"])
----------------------------------------------------------------------
-------------------------- Structure: Water --------------------------
----------------------------------------------------------------------
Formula: OH2
PBC: [False False False]
Sites
- O None [ 0.0000 0.0000 0.1190]
- H None [ 0.0000 0.7630 -0.4770]
- H None [ 0.0000 -0.7630 -0.4770]
----------------------------------------------------------------------
[8]:
print(strct_c.get_structure(3))
----------------------------------------------------------------------
--------------------------- Structure: Li ----------------------------
----------------------------------------------------------------------
Formula: Li2
PBC: [True True True]
Cell
Vectors: - [ 3.0000 0.0000 0.0000]
- [ 0.0000 3.0000 0.0000]
- [ 0.0000 0.0000 3.0000]
Lengths: [ 3.0000 3.0000 3.0000]
Angles: [ 90.0000 90.0000 90.0000]
Volume: 27.0000
Sites
- Li None [ 0.0000 0.0000 0.0000] [ 0.0000 0.0000 0.0000]
- Li None [ 1.5000 1.5000 1.5000] [ 0.5000 0.5000 0.5000]
----------------------------------------------------------------------
As for a list, the index of a structure is returned by the index function, a structure can be deleted via del, and the pop function is implemented as well.
The labels of all structures are returned via the labels property:
[9]:
del strct_c["Benzene"]
strct_c.labels
[9]:
['Water', 'GaAs', 'Li']
Two structure collection objects can be merged into one using +:
[10]:
print(strct_c + strct_c2)
----------------------------------------------------------------------
------------------------ Structure Collection ------------------------
----------------------------------------------------------------------
- Number of structures: 4
- Elements: As-C-Ga-H-Li-O
Structures
- Water OH2 [False False False]
- GaAs Ga4As4 [True True True ]
- Li Li2 [True True True ]
- Benzene C6H6 [False False False]
----------------------------------------------------------------------
There are two ways to store all structures contained in the StructureCollection object: the structures can be written to an HDF5 file or to an AiiDA database using the functions store_in_hdf5_file and store_in_aiida_db, respectively. The structures can be retrieved using the corresponding import_from_hdf5_file and import_from_aiida_db functions.
[11]:
strct_c.store_in_aiida_db(group_label="test")
strct_c = StructureCollection()
strct_c.import_from_aiida_db(group_label="test")
Storing data as group `test` in the AiiDA database.
[12]:
strct_c.store_in_hdf5_file("test.h5")
strct_c = StructureCollection()
strct_c.import_from_hdf5_file("test.h5")
Analysis and manipulation of multiple structures via the StructureOperations class
The StructureOperations class offers the same structural analysis and manipulation methods as the Structure class but provides a more convenient interface to apply them to multiple structures at once.
A StructureOperations object requires a StructureCollection object upon initialization, which it uses as internal storage for the original structures as well as for the structures newly created by the manipulation methods:
[13]:
from aim2dat.strct import StructureOperations
strct_c += strct_c2
strct_op = StructureOperations(structures=strct_c, verbose=False)
There are three additional properties to be set:
verbose expects a boolean variable; if set to True, a progress bar is shown.
append_to_coll expects a boolean variable and defines whether newly manipulated structures should be appended to the StructureCollection stored in the structures property.
output_format expects a string and specifies the output format for the analysis methods.
A list of all supported options is returned via the supported_output_formats property:
[14]:
strct_op.supported_output_formats
[14]:
['dict', 'DataFrame']
All methods of the class are parallelized. Two properties control the parallelization, both expecting a positive integer:
n_procs sets the number of processes used.
chunksize defines the number of tasks assigned to each process at once.
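To illustrate what a chunk size means in this context, the sketch below groups a list of tasks into chunks that would each be handed to a process as one unit. This mirrors the idea behind the chunksize argument of Python's multiprocessing.Pool.map; whether aim2dat distributes its tasks in exactly this way is an assumption. The helper `split_into_chunks` and the task names are hypothetical.

```python
def split_into_chunks(tasks, chunksize):
    """Group tasks into consecutive chunks of at most `chunksize` items."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer.")
    return [tasks[i : i + chunksize] for i in range(0, len(tasks), chunksize)]


# Five placeholder tasks, e.g. one per structure in a collection.
tasks = ["s0", "s1", "s2", "s3", "s4"]
chunks = split_into_chunks(tasks, chunksize=2)
print(chunks)
```

Larger chunks reduce the communication overhead between processes, whereas smaller chunks tend to balance the load better when individual tasks differ in run time.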
As mentioned before, the StructureOperations class provides the same analysis and manipulation methods as the Structure class. They can be listed via the same properties:
[15]:
print("Analysis methods: ", strct_op.analysis_methods)
print("Manipulation methods: ", strct_op.manipulation_methods)
Analysis methods: ['determine_point_group', 'determine_space_group', 'calculate_distance', 'calculate_angle', 'calculate_dihedral_angle', 'calculate_voronoi_tessellation', 'calculate_coordination', 'calculate_ffingerprint']
Manipulation methods: ['delete_atoms', 'scale_unit_cell', 'substitute_elements']
The analysis and manipulation methods work the same way as for the Structure object. In addition, a key or a list/tuple of keys can be given in square brackets to select the structure(s) in the StructureCollection to which the method is applied.
If a single key is given as an integer or as a structure label, the output is the same as for the Structure object.
For example, the distance between two atoms can be calculated in one line, either via the StructureOperations object or via a Structure taken from the StructureCollection object:
[16]:
print("One structure: ", strct_op[["Benzene"]].calculate_distance(2, 3))
print("One structure: ", strct_c["Benzene"].calculate_distance(site_index1=2, site_index2=3))
One structure: {'Benzene': 2.8162}
One structure: 2.8162
Note
The StructureOperations class behaves differently for strct_op["Benzene"].calculate_distance(2, 3) and strct_op[["Benzene"]].calculate_distance(2, 3). In the latter case the key is given as a list, so the output is a dictionary, consistent with the use case of multiple structures described below.
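The difference between a single key and a list of keys can be sketched with a small stand-alone function. The function `distance` and the `positions` dictionary are hypothetical and only mimic the behaviour described in this note; the sketch ignores periodic boundary conditions and therefore only applies to the non-periodic molecules used here. Only the first four benzene sites are included.

```python
import math

# Cartesian positions copied from the examples above (first four benzene sites only).
positions = {
    "Benzene": [
        [-0.7040, -1.2194, 0.0],
        [0.7040, -1.2194, 0.0],
        [-1.4081, 0.0, 0.0],
        [1.4081, 0.0, 0.0],
    ],
    "Water": [[0.0, 0.0, 0.119], [0.0, 0.763, -0.477], [0.0, -0.763, -0.477]],
}


def distance(key, site_index1, site_index2):
    """Return a bare value for a single key and a dict for a list/tuple of keys."""

    def _single(label):
        sites = positions[label]
        return math.dist(sites[site_index1], sites[site_index2])

    if isinstance(key, (list, tuple)):
        return {label: _single(label) for label in key}
    return _single(key)


single = distance("Benzene", 2, 3)
wrapped = distance(["Benzene"], 2, 3)
print(single, wrapped)
```

Both calls evaluate the same distance; only the container around the result differs.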
The advantage of the StructureOperations class comes into play when several structures are analysed at once, e.g.:
[17]:
print("Multiple structures: ", strct_op[0,1].calculate_distance(0, 1))
Multiple structures: {'Li': 2.598076211353316, 'GaAs': 5.750192323029818}
If the output_format is changed to 'DataFrame', a pandas data frame is returned that uses the structure labels as indices and stores the results in a column named after the called method:
[18]:
strct_op.output_format = "DataFrame"
strct_op[0, 1].calculate_distance(0, 1)
[18]:
<function calculate_distance at 0x7f230f18b2e0> | |
---|---|
Li | 2.598076 |
GaAs | 5.750192 |
As for the structural manipulation methods, the output for a single key is, once again, the same as for the Structure object.
The only difference is that, if append_to_coll is set to True, the new structure is also added to the StructureCollection object:
[19]:
subst_structure = strct_op["GaAs"].substitute_elements(("Ga", "Al"), change_label=True)
print(subst_structure)
print(strct_op.structures.labels)
----------------------------------------------------------------------
--------------------- Structure: GaAs_subst-GaAl ---------------------
----------------------------------------------------------------------
Formula: Al4As4
PBC: [True True True]
Cell
Vectors: - [ 8.0987 0.0000 0.0000]
- [ 0.0000 8.0987 0.0000]
- [ 0.0000 0.0000 8.0987]
Lengths: [ 8.0987 8.0987 8.0987]
Angles: [ 90.0000 90.0000 90.0000]
Volume: 531.1797
Sites
- Al None [ 0.0000 0.0000 0.0000] [ 0.0000 0.0000 0.0000]
- Al None [ 0.0000 4.0493 4.0493] [ 0.0000 0.5000 0.5000]
- Al None [ 4.0493 0.0000 4.0493] [ 0.5000 0.0000 0.5000]
- Al None [ 4.0493 4.0493 0.0000] [ 0.5000 0.5000 0.0000]
- As None [ 6.0740 6.0740 6.0740] [ 0.7500 0.7500 0.7500]
- As None [ 2.0247 2.0247 6.0740] [ 0.2500 0.2500 0.7500]
- As None [ 2.0247 6.0740 2.0247] [ 0.2500 0.7500 0.2500]
- As None [ 6.0740 2.0247 2.0247] [ 0.7500 0.2500 0.2500]
----------------------------------------------------------------------
['Li', 'GaAs', 'Water', 'Benzene']
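If append_to_coll is set to True and change_label is set to True, the newly created Structure is added to structures under its new label. In the output above the list of labels is unchanged, which suggests that append_to_coll was not enabled for this run.
If change_label is set to False, the new structure keeps the original label and, with append_to_coll enabled, overwrites the original structure in the collection: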
[20]:
subst_structure = strct_op["GaAs"].substitute_elements(("Ga", "Al"), change_label=False)
print(subst_structure)
print(strct_op.structures.labels)
print(strct_op.structures["GaAs"])
----------------------------------------------------------------------
-------------------------- Structure: GaAs ---------------------------
----------------------------------------------------------------------
Formula: Al4As4
PBC: [True True True]
Cell
Vectors: - [ 8.0987 0.0000 0.0000]
- [ 0.0000 8.0987 0.0000]
- [ 0.0000 0.0000 8.0987]
Lengths: [ 8.0987 8.0987 8.0987]
Angles: [ 90.0000 90.0000 90.0000]
Volume: 531.1797
Sites
- Al None [ 0.0000 0.0000 0.0000] [ 0.0000 0.0000 0.0000]
- Al None [ 0.0000 4.0493 4.0493] [ 0.0000 0.5000 0.5000]
- Al None [ 4.0493 0.0000 4.0493] [ 0.5000 0.0000 0.5000]
- Al None [ 4.0493 4.0493 0.0000] [ 0.5000 0.5000 0.0000]
- As None [ 6.0740 6.0740 6.0740] [ 0.7500 0.7500 0.7500]
- As None [ 2.0247 2.0247 6.0740] [ 0.2500 0.2500 0.7500]
- As None [ 2.0247 6.0740 2.0247] [ 0.2500 0.7500 0.2500]
- As None [ 6.0740 2.0247 2.0247] [ 0.7500 0.2500 0.2500]
----------------------------------------------------------------------
['Li', 'GaAs', 'Water', 'Benzene']
----------------------------------------------------------------------
-------------------------- Structure: GaAs ---------------------------
----------------------------------------------------------------------
Formula: Ga4As4
PBC: [True True True]
Cell
Vectors: - [ 8.1320 0.0000 0.0000]
- [ 0.0000 8.1320 0.0000]
- [ 0.0000 0.0000 8.1320]
Lengths: [ 8.1320 8.1320 8.1320]
Angles: [ 90.0000 90.0000 90.0000]
Volume: 537.7645
Sites
- Ga None [ 0.0000 0.0000 0.0000] [ 0.0000 0.0000 0.0000]
- Ga None [ 0.0000 4.0660 4.0660] [ 0.0000 0.5000 0.5000]
- Ga None [ 4.0660 0.0000 4.0660] [ 0.5000 0.0000 0.5000]
- Ga None [ 4.0660 4.0660 0.0000] [ 0.5000 0.5000 0.0000]
- As None [ 6.0990 6.0990 6.0990] [ 0.7500 0.7500 0.7500]
- As None [ 2.0330 2.0330 6.0990] [ 0.2500 0.2500 0.7500]
- As None [ 2.0330 6.0990 2.0330] [ 0.2500 0.7500 0.2500]
- As None [ 6.0990 2.0330 2.0330] [ 0.7500 0.2500 0.2500]
----------------------------------------------------------------------
For a list/tuple of keys, a StructureCollection containing the structures identified by the keys is returned instead of a Structure:
[21]:
subst_structures = strct_op[strct_op.structures.labels].substitute_elements(
    ("Al", "Ga"), change_label=False
)
print(subst_structures)
----------------------------------------------------------------------
------------------------ Structure Collection ------------------------
----------------------------------------------------------------------
- Number of structures: 4
- Elements: As-C-Ga-H-Li-O
Structures
- Li Li2 [True True True ]
- GaAs Ga4As4 [True True True ]
- Water OH2 [False False False]
- Benzene C6H6 [False False False]
----------------------------------------------------------------------
It is important to note that in this case, all structures are returned regardless of whether they are actually changed by the method or not.
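This behaviour can be illustrated with a simplified stand-in that works on plain lists of element symbols. The function `substitute` and the `structures` dictionary are hypothetical; the sketch only swaps element symbols and does not reproduce any other change made by the library method (the outputs above, for instance, also show a different unit cell after the substitution).

```python
def substitute(elements, old, new):
    """Return a new element list with every `old` symbol replaced by `new`."""
    return [new if element == old else element for element in elements]


structures = {
    "GaAs": ["Ga"] * 4 + ["As"] * 4,
    "Water": ["O", "H", "H"],
}

# Every structure is returned, including those that contain no "Ga" atoms.
results = {
    label: substitute(elements, "Ga", "Al") for label, elements in structures.items()
}
unchanged = [label for label in structures if results[label] == structures[label]]
print(results["GaAs"], unchanged)
```

Comparing the returned structures with the originals, as done for `unchanged`, is one way to find out which of them were actually modified.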
External analysis and manipulation methods can be used via the perform_analysis and perform_manipulation functions, respectively.
In this case the analysis function and a dictionary of its keyword arguments need to be passed:
[22]:
from aim2dat.strct.ext_analysis import calculate_prdf
output = strct_op["Benzene"].perform_analysis(calculate_prdf, {"r_max": 7.5})
Comparing structures via the StructureOperations class
Another handy feature of the class is its set of methods to compare two structures or the sites of a structure:
compare_structures_via_ffingerprint
compare_structures_via_comp_sym
compare_structures_via_direct_comp
compare_sites_via_coordination
compare_sites_via_ffingerprint
In addition, the following methods filter out duplicate structures or find equivalent sites based on the comparison methods:
find_duplicates_via_ffingerprint
find_duplicates_via_comp_sym
find_duplicates_via_direct_comp
find_eq_sites_via_coordination
find_eq_sites_via_ffingerprint
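The general idea of a duplicate search can be sketched with a deliberately simple criterion: two entries count as duplicates if they share the same reduced composition. This is a stand-in for illustration only. The criteria actually used by the methods listed above (fingerprints, composition and symmetry, direct comparison) are not described here, and the function names and the `GaAs_prim` entry below are hypothetical.

```python
from collections import Counter
from functools import reduce
from math import gcd


def reduced_composition(elements):
    """Return a hashable, reduced element count, e.g. Ga4As4 -> (('As', 1), ('Ga', 1))."""
    counts = Counter(elements)
    divisor = reduce(gcd, counts.values())
    return tuple(sorted((el, n // divisor) for el, n in counts.items()))


def find_duplicates_by_composition(structures):
    """Return (duplicate_label, first_seen_label) pairs; composition only, no geometry."""
    seen = {}
    duplicates = []
    for label, elements in structures.items():
        key = reduced_composition(elements)
        if key in seen:
            duplicates.append((label, seen[key]))
        else:
            seen[key] = label
    return duplicates


candidates = {
    "GaAs": ["Ga"] * 4 + ["As"] * 4,
    "GaAs_prim": ["Ga", "As"],  # hypothetical second entry for illustration
    "Li": ["Li", "Li"],
}
pairs = find_duplicates_by_composition(candidates)
print(pairs)
```

A composition check alone cannot distinguish polymorphs, which is why comparisons that also take geometry or symmetry into account are needed in practice.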
Related API instances¶
StructureOperations