Handling multiple structures at once

To facilitate working with larger sets of structures, e.g., for high-throughput studies, this package includes the StructureCollection and StructureOperations classes.

The StructureCollection class

The StructureCollection acts as a data container for larger numbers of molecules or crystals:

[1]:
from aim2dat.strct import StructureCollection

strct_c = StructureCollection()

Structures can be added via the object's different append* functions:

[2]:
from ase.spacegroup import crystal
import aiida
from aiida.orm import StructureData
from aim2dat.strct import Structure

structure_dict = {
    "label": "Benzene",
    "elements": ["C"] * 6 + ["H"] * 6,
    "pbc": False,
    "positions": [
        [-0.7040, -1.2194, -0.0000],
        [0.7040, -1.2194, -0.0000],
        [-1.4081, -0.0000, -0.0000],
        [1.4081, 0.0000, 0.0000],
        [-0.7040, 1.2194, 0.0000],
        [0.7040, 1.2194, -0.0000],
        [-1.2152, -2.1048, -0.0000],
        [1.2152, -2.1048, 0.0000],
        [-2.4304, -0.0000, 0.0000],
        [2.4304, 0.0000, 0.0000],
        [-1.2152, 2.1048, -0.0000],
        [1.2152, 2.1048, 0.0000],
    ],
}
strct_c.append(**structure_dict)

structure = Structure(
    elements=["O", "H", "H"],
    positions=[[0.0, 0.0, 0.119], [0.0, 0.763, -0.477], [0.0, -0.763, -0.477]],
    pbc=False,
)
strct_c.append_structure(structure, label="Water")

a = 4.066 * 2.0
GaAs_conv = crystal(
    ("Ga", "As"),
    basis=((0.0, 0.0, 0.0), (0.75, 0.75, 0.75)),
    spacegroup=216,
    cellpar=[a, a, a, 90, 90, 90],
    primitive_cell=False,
)
strct_c.append_from_ase_atoms("GaAs", GaAs_conv)

aiida.load_profile("tests")
unit_cell = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
structure = StructureData(cell=unit_cell)
structure.label = "Li"
structure.append_atom(position=(0.0, 0.0, 0.0), symbols="Li")
structure.append_atom(position=(1.5, 1.5, 1.5), symbols="Li")
strct_c.append_from_aiida_structuredata(structure)

Alternatively, a list of dictionaries can be passed upon initialization of the object:

[3]:
strct_c2 = StructureCollection(structures=[structure_dict])

A summary of the object is given by its string representation:

[4]:
print(strct_c)
----------------------------------------------------------------------
------------------------ Structure Collection ------------------------
----------------------------------------------------------------------

 - Number of structures: 4
 - Elements: As-C-Ga-H-Li-O

                              Structures
  - Benzene             C6H6                [False False False]
  - Water               OH2                 [False False False]
  - GaAs                Ga4As4              [True  True  True ]
  - Li                  Li2                 [True  True  True ]
----------------------------------------------------------------------

Additionally, a pandas data frame can be created based on the object’s content:

[5]:
strct_c2.create_pandas_df()
[5]:
     label  el_conc_C  el_conc_H  nr_atoms  nr_atoms_C  nr_atoms_H
0  Benzene        0.5        0.5        12           6           6
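
The columns of this data frame can be reproduced by hand from the list of elements. The following is a minimal sketch using only the standard library; it is not the package's implementation. The helper name composition_row is hypothetical, and the column names are copied from the output above.

```python
from collections import Counter


def composition_row(label, elements):
    """Build one table row with atom counts and element concentrations."""
    counts = Counter(elements)
    nr_atoms = len(elements)
    row = {"label": label, "nr_atoms": nr_atoms}
    for element in sorted(counts):
        # Concentration is the fraction of atoms belonging to this element.
        row[f"el_conc_{element}"] = counts[element] / nr_atoms
        row[f"nr_atoms_{element}"] = counts[element]
    return row


row = composition_row("Benzene", ["C"] * 6 + ["H"] * 6)
print(row)
```

For benzene this gives 12 atoms in total and a concentration of 0.5 for both carbon and hydrogen, in line with the table.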

The StructureCollection object combines features of the Python list and dictionary types and stores each structure as a Structure object in a list. Each added structure is therefore assigned an index (integer) and a label (string), either of which can be used to obtain the structure. The label is stored in the label property of the Structure object, whereas the index is the position in the internal list of the StructureCollection object and is defined by the order of the append* function calls.

A structure can be obtained via the get_structure function or via square brackets, using its label or index:

[6]:
print(strct_c[1])
----------------------------------------------------------------------
-------------------------- Structure: Water --------------------------
----------------------------------------------------------------------

 Formula: OH2
 PBC: [False False False]

                                Sites
  - O  None  [  0.0000   0.0000   0.1190]
  - H  None  [  0.0000   0.7630  -0.4770]
  - H  None  [  0.0000  -0.7630  -0.4770]
----------------------------------------------------------------------
[7]:
print(strct_c["Water"])
----------------------------------------------------------------------
-------------------------- Structure: Water --------------------------
----------------------------------------------------------------------

 Formula: OH2
 PBC: [False False False]

                                Sites
  - O  None  [  0.0000   0.0000   0.1190]
  - H  None  [  0.0000   0.7630  -0.4770]
  - H  None  [  0.0000  -0.7630  -0.4770]
----------------------------------------------------------------------
[8]:
print(strct_c.get_structure(3))
----------------------------------------------------------------------
--------------------------- Structure: Li ----------------------------
----------------------------------------------------------------------

 Formula: Li2
 PBC: [True True True]

                                 Cell
 Vectors: - [  3.0000   0.0000   0.0000]
          - [  0.0000   3.0000   0.0000]
          - [  0.0000   0.0000   3.0000]
 Lengths: [  3.0000   3.0000   3.0000]
 Angles: [ 90.0000  90.0000  90.0000]
 Volume: 27.0000

                                Sites
  - Li None  [  0.0000   0.0000   0.0000] [  0.0000   0.0000   0.0000]
  - Li None  [  1.5000   1.5000   1.5000] [  0.5000   0.5000   0.5000]
----------------------------------------------------------------------

Similar to a list, the index of a structure is returned by the index function, a structure can be deleted via del, and the pop function is implemented as well. The labels of all structures are returned via the labels property:

[9]:
del strct_c["Benzene"]
strct_c.labels
[9]:
['Water', 'GaAs', 'Li']

Two structure collection objects can be merged into one using +:

[10]:
print(strct_c + strct_c2)
----------------------------------------------------------------------
------------------------ Structure Collection ------------------------
----------------------------------------------------------------------

 - Number of structures: 4
 - Elements: As-C-Ga-H-Li-O

                              Structures
  - Water               OH2                 [False False False]
  - GaAs                Ga4As4              [True  True  True ]
  - Li                  Li2                 [True  True  True ]
  - Benzene             C6H6                [False False False]
----------------------------------------------------------------------
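
The list-like and dictionary-like behaviour shown in this section (lookup by index or label, del, labels, and merging with +) can be mimicked by a small toy container. This is a sketch for illustration only: the class ToyCollection is hypothetical, it stores plain formula strings instead of Structure objects, and it makes no claim about how StructureCollection is implemented internally.

```python
class ToyCollection:
    """Toy container addressing items by position (int) or by label (str)."""

    def __init__(self, items=None):
        # Each item is a (label, payload) pair; the list order defines the index.
        self._items = list(items or [])

    @property
    def labels(self):
        return [label for label, _ in self._items]

    def index(self, label):
        return self.labels.index(label)

    def _position(self, key):
        # Integers are used directly, strings are resolved via the labels.
        return key if isinstance(key, int) else self.index(key)

    def __getitem__(self, key):
        return self._items[self._position(key)][1]

    def __delitem__(self, key):
        del self._items[self._position(key)]

    def __add__(self, other):
        return ToyCollection(self._items + other._items)


coll = ToyCollection(
    [("Benzene", "C6H6"), ("Water", "OH2"), ("GaAs", "Ga4As4"), ("Li", "Li2")]
)
assert coll[1] == coll["Water"]
del coll["Benzene"]
merged = coll + ToyCollection([("Benzene", "C6H6")])
print(merged.labels)  # ['Water', 'GaAs', 'Li', 'Benzene']
```

The toy version does not check for duplicate labels, which a real container would have to handle in some way.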

There are two ways to store all structures contained in the StructureCollection object: they can be written to an HDF5 file or to an AiiDA database using the functions store_in_hdf5_file and store_in_aiida_db, respectively. The structures can be retrieved using the corresponding import_from_hdf5_file and import_from_aiida_db functions.

[11]:
strct_c.store_in_aiida_db(group_label="test")
strct_c = StructureCollection()
strct_c.import_from_aiida_db(group_label="test")
Storing data as group `test` in the AiiDA database.
[12]:
strct_c.store_in_hdf5_file("test.h5")
strct_c = StructureCollection()
strct_c.import_from_hdf5_file("test.h5")

Analysis and manipulation of multiple structures via the StructureOperations class

The StructureOperations class provides the same structural analysis and manipulation methods as the Structure class, with a more convenient interface for applying them to multiple structures at once.

The StructureOperations object requires a StructureCollection object upon initialization. The class uses it as internal storage for the original structures as well as for the structures newly created by the manipulation methods:

[13]:
from aim2dat.strct import StructureOperations

strct_c += strct_c2
strct_op = StructureOperations(structures=strct_c, verbose=False)

There are three additional properties to be set:

  • verbose expects a boolean variable; if set to True, a progress bar is shown.

  • append_to_coll expects a boolean variable and defines whether new manipulated structures should be appended to the StructureCollection stored in the structures property.

  • output_format expects a string and specifies the output format for the analysis methods.

A list of all supported output formats is returned via the supported_output_formats property:

[14]:
strct_op.supported_output_formats
[14]:
['dict', 'DataFrame']

All methods of the class are parallelized. Two properties control the parallelization, both expecting a positive integer:

  • n_procs sets the number of used processes.

  • chunksize defines the number of tasks assigned to each process at once.
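
To make the role of chunksize more concrete, the sketch below groups a list of tasks into batches that would each be handed to a process in one go. The function split_into_chunks is hypothetical and only illustrates the bookkeeping; it is not how the package distributes work. The standard library's multiprocessing.Pool.map accepts a chunksize argument with a comparable meaning.

```python
def split_into_chunks(tasks, chunksize):
    """Group tasks into consecutive batches of at most `chunksize` items."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    return [tasks[i : i + chunksize] for i in range(0, len(tasks), chunksize)]


labels = ["Water", "GaAs", "Li", "Benzene"]
chunks = split_into_chunks(labels, chunksize=3)
print(chunks)  # [['Water', 'GaAs', 'Li'], ['Benzene']]
```

Larger chunks reduce the overhead of handing out work, while smaller chunks tend to balance the load better when individual tasks differ in cost.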

As mentioned before, the StructureOperations class offers the same analysis and manipulation methods as the Structure class, and they can be listed via the same properties:

[15]:
print("Analysis methods: ", strct_op.analysis_methods)
print("Manipulation methods: ", strct_op.manipulation_methods)
Analysis methods:  ['determine_point_group', 'determine_space_group', 'calculate_distance', 'calculate_angle', 'calculate_dihedral_angle', 'calculate_voronoi_tessellation', 'calculate_coordination', 'calculate_ffingerprint']
Manipulation methods:  ['delete_atoms', 'scale_unit_cell', 'substitute_elements']

The analysis and manipulation methods work the same way as for the Structure object. The difference is that the StructureOperations object is first indexed with a key or a list/tuple of keys, and the method is then applied to the structure(s) in the StructureCollection identified by the key(s). If a single key is given, either as an integer index or as the structure label, the output is the same as for the Structure. For example, the distance between two atoms can be calculated in one line via either the StructureOperations or the StructureCollection object:

[16]:
print("One structure: ", strct_op[["Benzene"]].calculate_distance(2, 3))
print("One structure: ", strct_c["Benzene"].calculate_distance(site_index1=2, site_index2=3))
One structure:  {'Benzene': 2.8162}
One structure:  2.8162

Note

It is important to note that the StructureOperations class behaves differently for strct_op["Benzene"].calculate_distance(2, 3) and strct_op[["Benzene"]].calculate_distance(2, 3). In the latter case, the input is given as a list, and as such, the output is consistent with the use case of multiple structures described below.

The advantage of the StructureOperations class becomes apparent when several structures are analysed at once, e.g.:

[17]:
print("Multiple structures: ", strct_op[0, 1].calculate_distance(0, 1))
Multiple structures:  {'Li': 2.598076211353316, 'GaAs': 5.750192323029818}

If the output_format is changed to 'DataFrame', a pandas data frame is returned that uses the structure labels as indices and stores the results in a column named after the called method:

[18]:
strct_op.output_format = "DataFrame"
strct_op[0, 1].calculate_distance(0, 1)
[18]:
<function calculate_distance at 0x7f230f18b2e0>
Li 2.598076
GaAs 5.750192

As for the structural manipulation methods, once again, the output for a single key is the same as for the Structure. The only difference is that, if append_to_coll is set to True, the new structure created by the manipulation method is also added to the StructureCollection object:

[19]:
subst_structure = strct_op["GaAs"].substitute_elements(("Ga", "Al"), change_label=True)
print(subst_structure)
print(strct_op.structures.labels)
----------------------------------------------------------------------
--------------------- Structure: GaAs_subst-GaAl ---------------------
----------------------------------------------------------------------

 Formula: Al4As4
 PBC: [True True True]

                                 Cell
 Vectors: - [  8.0987   0.0000   0.0000]
          - [  0.0000   8.0987   0.0000]
          - [  0.0000   0.0000   8.0987]
 Lengths: [  8.0987   8.0987   8.0987]
 Angles: [ 90.0000  90.0000  90.0000]
 Volume: 531.1797

                                Sites
  - Al None  [  0.0000   0.0000   0.0000] [  0.0000   0.0000   0.0000]
  - Al None  [  0.0000   4.0493   4.0493] [  0.0000   0.5000   0.5000]
  - Al None  [  4.0493   0.0000   4.0493] [  0.5000   0.0000   0.5000]
  - Al None  [  4.0493   4.0493   0.0000] [  0.5000   0.5000   0.0000]
  - As None  [  6.0740   6.0740   6.0740] [  0.7500   0.7500   0.7500]
  - As None  [  2.0247   2.0247   6.0740] [  0.2500   0.2500   0.7500]
  - As None  [  2.0247   6.0740   2.0247] [  0.2500   0.7500   0.2500]
  - As None  [  6.0740   2.0247   2.0247] [  0.7500   0.2500   0.2500]
----------------------------------------------------------------------
['Li', 'GaAs', 'Water', 'Benzene']

If append_to_coll is set to True and change_label is set to True, the newly created Structure is added to structures under its new label. If change_label is set to False, the original structure is overwritten instead:

[20]:
subst_structure = strct_op["GaAs"].substitute_elements(("Ga", "Al"), change_label=False)
print(subst_structure)
print(strct_op.structures.labels)
print(strct_op.structures["GaAs"])
----------------------------------------------------------------------
-------------------------- Structure: GaAs ---------------------------
----------------------------------------------------------------------

 Formula: Al4As4
 PBC: [True True True]

                                 Cell
 Vectors: - [  8.0987   0.0000   0.0000]
          - [  0.0000   8.0987   0.0000]
          - [  0.0000   0.0000   8.0987]
 Lengths: [  8.0987   8.0987   8.0987]
 Angles: [ 90.0000  90.0000  90.0000]
 Volume: 531.1797

                                Sites
  - Al None  [  0.0000   0.0000   0.0000] [  0.0000   0.0000   0.0000]
  - Al None  [  0.0000   4.0493   4.0493] [  0.0000   0.5000   0.5000]
  - Al None  [  4.0493   0.0000   4.0493] [  0.5000   0.0000   0.5000]
  - Al None  [  4.0493   4.0493   0.0000] [  0.5000   0.5000   0.0000]
  - As None  [  6.0740   6.0740   6.0740] [  0.7500   0.7500   0.7500]
  - As None  [  2.0247   2.0247   6.0740] [  0.2500   0.2500   0.7500]
  - As None  [  2.0247   6.0740   2.0247] [  0.2500   0.7500   0.2500]
  - As None  [  6.0740   2.0247   2.0247] [  0.7500   0.2500   0.2500]
----------------------------------------------------------------------
['Li', 'GaAs', 'Water', 'Benzene']
----------------------------------------------------------------------
-------------------------- Structure: GaAs ---------------------------
----------------------------------------------------------------------

 Formula: Ga4As4
 PBC: [True True True]

                                 Cell
 Vectors: - [  8.1320   0.0000   0.0000]
          - [  0.0000   8.1320   0.0000]
          - [  0.0000   0.0000   8.1320]
 Lengths: [  8.1320   8.1320   8.1320]
 Angles: [ 90.0000  90.0000  90.0000]
 Volume: 537.7645

                                Sites
  - Ga None  [  0.0000   0.0000   0.0000] [  0.0000   0.0000   0.0000]
  - Ga None  [  0.0000   4.0660   4.0660] [  0.0000   0.5000   0.5000]
  - Ga None  [  4.0660   0.0000   4.0660] [  0.5000   0.0000   0.5000]
  - Ga None  [  4.0660   4.0660   0.0000] [  0.5000   0.5000   0.0000]
  - As None  [  6.0990   6.0990   6.0990] [  0.7500   0.7500   0.7500]
  - As None  [  2.0330   2.0330   6.0990] [  0.2500   0.2500   0.7500]
  - As None  [  2.0330   6.0990   2.0330] [  0.2500   0.7500   0.2500]
  - As None  [  6.0990   2.0330   2.0330] [  0.7500   0.2500   0.2500]
----------------------------------------------------------------------

For a list/tuple of keys, a StructureCollection is returned instead of a Structure. It contains the structures identified by the keys:

[21]:
subst_structures = strct_op[strct_op.structures.labels].substitute_elements(
    ("Al", "Ga"), change_label=False
)
print(subst_structures)
----------------------------------------------------------------------
------------------------ Structure Collection ------------------------
----------------------------------------------------------------------

 - Number of structures: 4
 - Elements: As-C-Ga-H-Li-O

                              Structures
  - Li                  Li2                 [True  True  True ]
  - GaAs                Ga4As4              [True  True  True ]
  - Water               OH2                 [False False False]
  - Benzene             C6H6                [False False False]
----------------------------------------------------------------------

It is important to note that in this case, all structures are returned regardless of whether they are actually changed by the method or not.
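
This behaviour can be pictured with a simplified stand-in that works on plain element lists. The function below is a hypothetical sketch, not the package's substitute_elements: it maps every label to a new element list and returns all entries, including those in which the element to be replaced does not occur.

```python
def substitute_elements(structures, old, new):
    """Replace element `old` by `new` in every entry of a label -> elements dict."""
    return {
        label: [new if element == old else element for element in elements]
        for label, elements in structures.items()
    }


structures = {
    "Li": ["Li", "Li"],
    "GaAs": ["Ga"] * 4 + ["As"] * 4,
    "Water": ["O", "H", "H"],
}

# No structure contains Al, so nothing changes, yet all three are returned.
unchanged = substitute_elements(structures, "Al", "Ga")
print(list(unchanged))  # ['Li', 'GaAs', 'Water']

# Replacing Ga by Al only affects the GaAs entry.
substituted = substitute_elements(structures, "Ga", "Al")
print(substituted["GaAs"])
```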

External analysis and manipulation methods can be used via the implemented perform_analysis and perform_manipulation functions, respectively. In this case the analysis function and its keyword arguments need to be passed.

[22]:
from aim2dat.strct.ext_analysis import calculate_prdf

output = strct_op["Benzene"].perform_analysis(calculate_prdf, {"r_max": 7.5})
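
The pattern of passing a function together with its keyword arguments can be sketched in plain Python. Everything in the block below is an assumption made for illustration: the free function perform_analysis, the example analysis count_close_pairs, and the convention that the structure is passed as the first argument are hypothetical and may differ from what the package expects of external methods.

```python
import math
from itertools import combinations


def perform_analysis(structure, method, kwargs=None):
    """Call an external analysis function on a structure with keyword arguments."""
    return method(structure, **(kwargs or {}))


def count_close_pairs(structure, r_max=1.0):
    """Hypothetical analysis: count atom pairs that are closer than r_max."""
    pairs = combinations(structure["positions"], 2)
    return sum(1 for pos1, pos2 in pairs if math.dist(pos1, pos2) < r_max)


# Water positions taken from the example at the top of this page.
water = {"positions": [(0.0, 0.0, 0.119), (0.0, 0.763, -0.477), (0.0, -0.763, -0.477)]}
output = perform_analysis(water, count_close_pairs, {"r_max": 1.0})
print(output)  # 2
```

With r_max set to 1.0 only the two O-H pairs (about 0.97 apart) are counted; the H-H pair at 1.526 is excluded.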

Comparing structures via the StructureOperations class

Another handy feature of the class is its set of methods for comparing two structures or the sites of a structure:

  • compare_structures_via_ffingerprint

  • compare_structures_via_comp_sym

  • compare_structures_via_direct_comp

  • compare_sites_via_coordination

  • compare_sites_via_ffingerprint

The class also provides methods to filter out duplicate structures or find equivalent sites based on the comparison methods:

  • find_duplicates_via_ffingerprint

  • find_duplicates_via_comp_sym

  • find_duplicates_via_direct_comp

  • find_eq_sites_via_coordination

  • find_eq_sites_via_ffingerprint
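
How a duplicate search can be built on top of a pairwise comparison is sketched below in generic terms. This is not the package's algorithm: the helpers reduced_composition and find_duplicates, the labels GaAs_conv and GaAs_prim, and the returned list of label pairs are all hypothetical, and comparing reduced compositions alone is a much weaker criterion than the comparison methods listed above.

```python
from collections import Counter
from functools import reduce
from math import gcd


def reduced_composition(elements):
    """Element counts divided by their greatest common divisor."""
    counts = Counter(elements)
    divisor = reduce(gcd, counts.values())
    return {element: number // divisor for element, number in counts.items()}


def find_duplicates(structures, compare):
    """Return (duplicate_label, kept_label) pairs based on a pairwise comparison."""
    kept, duplicates = [], []
    for label, elements in structures.items():
        # Compare against all structures kept so far; the first match wins.
        match = next((k for k in kept if compare(structures[k], elements)), None)
        if match is None:
            kept.append(label)
        else:
            duplicates.append((label, match))
    return duplicates


def same_composition(elements1, elements2):
    return reduced_composition(elements1) == reduced_composition(elements2)


structures = {
    "GaAs_conv": ["Ga"] * 4 + ["As"] * 4,
    "GaAs_prim": ["Ga", "As"],
    "Water": ["O", "H", "H"],
}
duplicates = find_duplicates(structures, same_composition)
print(duplicates)  # [('GaAs_prim', 'GaAs_conv')]
```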