pyxtal.db module
Database class
- pyxtal.db.call_opt_single(p)[source]
Optimize a single structure and log the result.
- Parameters:
p (tuple) – A tuple where the first element is an identifier (id), and the remaining elements are the arguments to pass to opt_single.
- Returns:
- A tuple (id, xtal, eng) where:
id (int): The identifier of the structure.
xtal: The optimized structure.
eng (float): The energy of the opt_structure, or None if it failed.
- Return type:
tuple
- Behavior:
This function calls opt_single to perform the optimization of the structure associated with the given id.
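The tuple-in, tuple-out shape described above is convenient for worker pools, which pass a single argument to each task. A minimal sketch of that wrapper pattern follows, assuming the optimizer returns (xtal, energy, status) as documented for opt_single. The names `fake_opt_single` and `call_opt_single_sketch` are hypothetical stand-ins, not the library's code.

```python
def fake_opt_single(id, xtal, calc, *args):
    """Hypothetical stand-in for opt_single: returns (xtal, energy, status)."""
    if xtal is None:
        return None, None, False
    return xtal.upper(), -1.5 * len(xtal), True


def call_opt_single_sketch(p):
    """Unpack (id, *args), run the optimizer, and return (id, xtal, eng)."""
    struc_id = p[0]
    xtal, eng, status = fake_opt_single(*p)
    # A failed optimization is reported with an energy of None.
    return struc_id, xtal, (eng if status else None)


ok = call_opt_single_sketch((7, "sio2", "GULP"))
failed = call_opt_single_sketch((8, None, "GULP"))
```

Here strings stand in for crystal objects so the sketch runs without any external package.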
- class pyxtal.db.database(db_name)[source]
Bases: object
This is a database class to process crystal data.
- Parameters:
db_name – *.db format from ase database
- class pyxtal.db.database_topology(db_name, rank=0, size=1, ltol=0.05, stol=0.05, atol=3, log_file='db.log')[source]
Bases: object
This is a database class to process atomic crystal data.
- Parameters:
db_name (str) – *.db format from ase database
rank (int) – default 0
size (int) – default 1
ltol (float) – lattice tolerance
stol (float) – site tolerance
atol (float) – angle tolerance
log_file (str) – log_file
- add_strucs_from_db(db_file, check=False, id_min=0, id_max=None, tol=0.001, freq=50, use_relaxed=None, sort=None, max_count=None, criteria=None, min_atoms=0, max_atoms=250, ignore_check='vasp_energy', same_number=False)[source]
Add new structures from another database file.
- Parameters:
db_file (str) – Path to the source database file
check (bool) – Whether to check if structure already exists before adding
id_min (int) – Starting ID to import from source database. Default is 0
id_max (int) – Ending ID to import from source database. Default is None
tol (float) – Tolerance in Angstroms for symmetry detection. Default is 1e-3
freq (int) – Print progress message every N structures. Default is 50
use_relaxed (str) – Relaxed structure to use (‘ff_relaxed’ or ‘vasp_relaxed’). Default is None, which uses the unrelaxed structures
sort (str) – Key used to sort the structures, e.g. ‘mace_energy’. Default is None, which sorts by row.id
max_count (int) – Maximum number of structures to add. Default is None, which adds all structures
criteria (dict) – Criteria used to check whether a structure is valid
- check_new_structure(xtal, eng=None, same_group=False, same_number=False, d_tol=0.2, e_tol=0.01, max_atoms=250, min_atoms=0, return_id=False)[source]
Check if the input crystal structure already exists in the database.
- Parameters:
xtal – PyXtal object representing the crystal structure to check
eng (float, optional) – Energy of the structure to compare
same_group (bool) – Whether to only compare structures with same space group
d_tol (float) – Tolerance for density comparison
e_tol (float) – Tolerance for energy comparison
max_atoms (int) – maximum number of atoms for checking
min_atoms (int) – minimum number of atoms for checking
- Returns:
True if structure is new/unique, False if it matches an existing structure
- Return type:
bool
Note
- Compares structures based on:
Space group number (if same_group=True)
Density (within d_tol)
Energy (within e_tol if provided)
Structure similarity via pymatgen.analysis.structure_matcher
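The comparison order in the note can be read as a chain of cheap filters followed by one expensive match. The sketch below illustrates that idea under stated assumptions: records are plain dicts with hypothetical keys `spg`, `density`, and `energy`, and `structures_match` is a placeholder for the geometric comparison rather than the pymatgen call.

```python
def structures_match(a, b):
    """Placeholder for an expensive geometric comparison; always True here."""
    return True


def is_new_candidate(cand, existing, d_tol=0.2, e_tol=0.01, same_group=False):
    """Return True if no existing record survives every filter."""
    for ref in existing:
        # Optionally restrict the comparison to the same space group.
        if same_group and ref["spg"] != cand["spg"]:
            continue
        # Cheap density filter.
        if abs(ref["density"] - cand["density"]) > d_tol:
            continue
        # Energy filter, applied only when both energies are known.
        if cand.get("energy") is not None and ref.get("energy") is not None:
            if abs(ref["energy"] - cand["energy"]) > e_tol:
                continue
        # Only records passing the filters reach the expensive match.
        if structures_match(cand, ref):
            return False
    return True


db = [{"spg": 227, "density": 3.51, "energy": -9.10}]
dup = is_new_candidate({"spg": 227, "density": 3.50, "energy": -9.105}, db)
far = is_new_candidate({"spg": 227, "density": 2.20, "energy": -9.10}, db)
other_spg = is_new_candidate(
    {"spg": 194, "density": 3.50, "energy": -9.105}, db, same_group=True
)
```

Ordering the checks from cheapest to most expensive keeps the number of full structure comparisons small.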
- check_overlap(reference_db, etol=0.002, verbose=True)[source]
Check the overlap with a reference database.
- Parameters:
reference_db (str) – Path to the reference database file
etol (float, optional) – Energy tolerance for identifying identical structures. Default is 2e-3.
verbose (bool, optional) – Whether to print detailed overlap information. Default is True.
- Returns:
List of overlapping structures, where each entry contains: (id, pearson_symbol, dof, topology, ff_energy)
- Return type:
list
Note
- Two structures are considered overlapping if they have:
Same topology
Same topology detail
Force field energies within etol of each other
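The three overlap conditions can be sketched with plain dictionaries. The field names below mirror the returned tuple (id, pearson_symbol, dof, topology, ff_energy) plus an assumed `topology_detail` key; how the library stores these values is not shown in this reference, so treat the record layout as hypothetical.

```python
def find_overlap(rows, ref_rows, etol=2e-3):
    """Collect rows that match a reference row on topology and energy."""
    overlaps = []
    for row in rows:
        for ref in ref_rows:
            same_topology = row["topology"] == ref["topology"]
            same_detail = row["topology_detail"] == ref["topology_detail"]
            close_energy = abs(row["ff_energy"] - ref["ff_energy"]) < etol
            if same_topology and same_detail and close_energy:
                overlaps.append(
                    (row["id"], row["pearson_symbol"], row["dof"],
                     row["topology"], row["ff_energy"])
                )
                break  # one match is enough for this row
    return overlaps


rows = [
    {"id": 1, "pearson_symbol": "cF8", "dof": 1, "topology": "dia",
     "topology_detail": 0, "ff_energy": -9.100},
    {"id": 2, "pearson_symbol": "hP4", "dof": 2, "topology": "lon",
     "topology_detail": 0, "ff_energy": -9.050},
]
ref_rows = [
    {"topology": "dia", "topology_detail": 0, "ff_energy": -9.101},
    {"topology": "lon", "topology_detail": 0, "ff_energy": -8.900},
]
overlap = find_overlap(rows, ref_rows)
```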
- clean_structures(ids=(None, None), dtol=0.002, etol=0.001, criteria=None, eng_key='mace_energy')[source]
Clean up the database by removing duplicate structures. The following criteria are checked:
same number of atoms
same density
same energy
- Parameters:
dtol (float) – tolerance of density
etol (float) – tolerance of energy
criteria (dict) – including
- clean_structures_pmg(ids=(None, None), min_id=None, dtol=0.05, criteria=None)[source]
Clean up the database by removing duplicate structures based on density and pymatgen matcher.
This method checks for duplicates by comparing structure density within a tolerance and using pymatgen’s StructureMatcher. It can also filter structures based on various criteria like coordination numbers, energies, topology, etc.
- Parameters:
ids (tuple, optional) – Range of IDs (min, max) to process. Defaults to (None, None).
min_id (int, optional) – Minimum ID to consider. Structures with lower IDs won’t be deleted. Defaults to None.
dtol (float, optional) – Density tolerance for comparing structures. Defaults to 5e-2.
criteria (dict, optional) – Dictionary of filtering criteria. Defaults to None.
- Supported criteria keys:
‘CN’: Dict of required coordination numbers per element
‘cutoff’: Float, cutoff distance for connectivity
‘MAX_energy’: Float, maximum allowed energy
‘MAX_similarity’: Float, maximum allowed similarity value
‘BAD_topology’: List of forbidden topology types
‘BAD_dimension’: List of forbidden dimensionality values
Example criteria: {'CN': {'C': 3}, 'BAD_dimension': [0, 2]}
- Returns:
None. Modifies database in place by deleting duplicate/invalid structures.
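To make the criteria dictionary concrete, here is a minimal sketch of how some of the keys could be applied to a record. It assumes that 'MAX_energy' is an upper bound and that 'BAD_topology' and 'BAD_dimension' are exclusion lists, as their descriptions suggest. The record keys and the `passes_criteria` helper are hypothetical, and the geometry-dependent keys ('CN', 'cutoff') and 'MAX_similarity' are left out.

```python
def passes_criteria(record, criteria):
    """Return True if the record violates none of the supplied criteria."""
    if criteria is None:
        return True
    # Assumed semantics: energy must not exceed MAX_energy.
    if "MAX_energy" in criteria and record["energy"] > criteria["MAX_energy"]:
        return False
    # Assumed semantics: listed topologies and dimensions are rejected.
    if record["topology"] in criteria.get("BAD_topology", []):
        return False
    if record["dimension"] in criteria.get("BAD_dimension", []):
        return False
    return True


criteria = {"MAX_energy": -8.0, "BAD_dimension": [0, 2]}
good = {"energy": -9.1, "topology": "dia", "dimension": 3}
layered = {"energy": -9.0, "topology": "hcb", "dimension": 2}
high_energy = {"energy": -7.5, "topology": "dia", "dimension": 3}
```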
- clean_structures_spg_topology(dim=None)[source]
Clean up the db by removing duplicate structures based on their properties.
- Parameters:
dim (int, optional) – Filter structures by dimension. Only keep structures with this dimension if specified. Defaults to None.
- The function removes structures that have identical:
Number of atoms
Space group
Topology
Wyckoff positions (wps)
- export_structures(fmt='vasp', folder='mof_out', criteria=None, sort_by='similarity', overwrite=True, cutoff=None, use_relaxed=None)[source]
Export structures from database according to given criteria.
- Parameters:
fmt (str) – Output format (vasp or cif)
folder (str) – Path to output folder
criteria (dict) – Dictionary of validity criteria
sort_by (str) – Attribute to sort structures by
overwrite (bool) – Whether to remove existing output folder
cutoff (int) – Maximum number of structures to export
use_relaxed (str, optional) – e.g., ff_relaxed or vasp_relaxed
- get_db_unique(db_name=None, prec=3, key='ff_energy', max_N_atoms=64)[source]
Get a database file containing only unique structures based on topology and energy.
- Parameters:
db_name (str, optional) – Filename for the new database. If None, will use original name with '_unique' suffix.
prec (int, optional) – Precision for rounding energy values. Default is 3.
key (str, optional) – Energy attribute name to use for filtering. Default is 'ff_energy'.
max_N_atoms (int, optional) – Maximum n_atoms for pmg match. Default is 64.
- Returns:
Number of unique structures in the new database.
- Return type:
int
Note
- Two structures are considered identical if they have:
Same density value (within precision)
Same energy value (within precision)
Pymatgen match
When duplicates are found, the structure with lower DOF is kept.
- get_db_unique_topology(db_name=None, prec=3, update_topology=True, key='ff_energy')[source]
Get a database file containing only unique structures based on topology and energy.
- Parameters:
db_name (str, optional) – Filename for the new database. If None, will use original name with '_unique' suffix.
prec (int, optional) – Precision for rounding energy values. Default is 3.
update_topology (bool, optional) – Whether to update topology before filtering. Default is True.
key (str, optional) – Energy attribute name to use for filtering. Default is 'ff_energy'.
- Returns:
Number of unique structures in the new database.
- Return type:
int
Note
- Two structures are considered identical if they have:
Same topology
Same topology detail
Same energy value (within precision)
When duplicates are found, the structure with lower DOF is kept.
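The rule in the note amounts to grouping rows by a rounded signature and keeping the lowest-DOF member of each group. A minimal sketch, assuming rows are plain dicts with hypothetical keys `topology`, `topology_detail`, `dof`, and the chosen energy key:

```python
def unique_by_topology(rows, prec=3, key="ff_energy"):
    """Keep one row per (topology, detail, rounded energy), preferring low DOF."""
    best = {}
    for row in rows:
        signature = (row["topology"], row["topology_detail"], round(row[key], prec))
        kept = best.get(signature)
        # Replace the stored row only if the new one has fewer degrees of freedom.
        if kept is None or row["dof"] < kept["dof"]:
            best[signature] = row
    return list(best.values())


rows = [
    {"id": 1, "topology": "dia", "topology_detail": 0, "dof": 3, "ff_energy": -9.1002},
    {"id": 2, "topology": "dia", "topology_detail": 0, "dof": 1, "ff_energy": -9.1004},
    {"id": 3, "topology": "lon", "topology_detail": 0, "dof": 2, "ff_energy": -9.0500},
]
unique = unique_by_topology(rows)
```

Rounding to `prec` decimals is a simple way to bucket energies, though two values that straddle a rounding boundary land in different buckets even when they are very close.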
- get_properties(prop)[source]
Retrieve a list of specific property values from the database rows.
- Parameters:
prop (str) – The property name to retrieve (e.g., ‘ff_energy’)
- Returns:
- A list of property values for rows that have the specified property.
If a row does not contain the property, it is ignored.
- Return type:
list
- Raises:
Warning – If no rows in the database contain the specified property.
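The behavior described above (skip rows lacking the property, and warn when nothing is found) can be sketched as follows. Rows are modeled as plain dicts, and the use of `warnings.warn` is an assumption about how the warning is surfaced; `collect_property` is a hypothetical name.

```python
import warnings


def collect_property(rows, prop):
    """Gather prop from every row that has it; warn if no row does."""
    values = [row[prop] for row in rows if prop in row]
    if not values:
        warnings.warn(f"No rows contain the property '{prop}'")
    return values


rows = [{"id": 1, "ff_energy": -9.1}, {"id": 2}, {"id": 3, "ff_energy": -8.7}]
energies = collect_property(rows, "ff_energy")
```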
- get_pyxtal(id, use_relaxed=None, tol=0.0001)[source]
Get the pyxtal object for a given row id. If use_relaxed is set, the pyxtal is built from the corresponding relaxed structure (e.g., ff_relaxed).
- Parameters:
id (int) – row id
use_relaxed (str) – ‘ff_relaxed’, ‘vasp_relaxed’
- plot_histogram(prop, ax=None, filename=None, xlim=None, nbins=20)[source]
Plot the histogram of a specified row property.
- Parameters:
prop (str) – The name of the property to plot (e.g., ‘ff_energy’).
ax (matplotlib.axes.Axes, optional) – Pre-existing axis to plot on. If None, a new ax will be created.
filename (str, optional) – Path to save the plot (e.g., ‘plot.png’). If None, the plot will not be saved.
xlim (tuple, optional) – Limits for the x-axis (e.g., (0, 10)). If None, the x-axis will scale automatically.
nbins (int, optional) – Number of bins for the histogram. Default is 20.
- Returns:
The axis object with the histogram plotted.
- Return type:
matplotlib.axes.Axes
- print_info(excluded_ids=None, cutoff=100)[source]
Print a summary of the database based on the calculated energy. Mostly used to quickly view the most interesting low-energy structures.
Todo: show vasp_energy if available
- Parameters:
excluded_ids (list) – list of unwanted row ids
cutoff (int) – the cutoff value for the print
- select_xtal(ids, N_atoms=(None, None), overwrite=False, attribute=None, use_relaxed=None)[source]
Lazy extraction of selected xtals from the database.
- Parameters:
ids (tuple) – Minimum and maximum row IDs to extract, e.g. (1, 10)
N_atoms (tuple) – Minimum and maximum number of atoms to extract, e.g. (2, 100)
overwrite (bool) – Whether to overwrite existing entries
attribute (str) – Attribute name to check for extraction
use_relaxed (str) – Type of relaxed structure to use (‘ff_relaxed’ or ‘vasp_relaxed’)
- Yields:
tuple – (id, xtal) where id is the row ID and xtal is the corresponding pyxtal object
- select_xtals(ids, N_atoms=(None, None), overwrite=False, attribute=None, use_relaxed=None)[source]
Extract xtals based on attribute name.
- Parameters:
ids (tuple) – Minimum and maximum row IDs to extract, e.g. (1, 10)
N_atoms (tuple) – Minimum and maximum number of atoms to extract, e.g. (2, 100)
overwrite (bool) – Whether to overwrite existing entries
attribute (str) – Attribute name to check for extraction
use_relaxed (str) – Type of relaxed structure to use (‘ff_relaxed’ or ‘vasp_relaxed’)
- Returns:
- (ids, xtals) where ids is a list of row IDs and xtals is a list of
corresponding pyxtal objects
- Return type:
tuple
- update_db_description()[source]
Update database description using robocrys.
Uses robocrystallographer (https://github.com/hackingmaterials/robocrystallographer) to generate natural language descriptions of crystal structures.
- For each row in the database that doesn’t have a description:
Converts ASE atoms to pymatgen structure
Uses StructureCondenser to analyze bonding/connectivity
Uses StructureDescriber to generate text description
Updates the database row with the description
Note
Use it with caution, as it may take a long time to run.
- update_row_energy(calculator='GULP', ids=(None, None), N_atoms=(None, None), ncpu=1, criteria=None, symmetrize=False, overwrite=False, write_freq=100, ff_lib='reaxff', steps=250, fmax=0.1, use_relaxed=None, cmd=None, calc_folder=None, skf_dir=None)[source]
Update the row energy in the database for a given calculator.
- Parameters:
calculator (str) – ‘GULP’, ‘MACE’, ‘VASP’, ‘DFTB’
ids (tuple) – A tuple specifying row IDs to update (e.g., (0, 100)).
ncpu (int) – number of parallel processes
criteria (dict, optional) – Criteria when selecting structures.
symmetrize (bool) – symmetrize the structure before calculation
overwrite (bool) – overwrite the existing energy attributes.
write_freq (int) – frequency to update db for ncpu=1
ff_lib (str) – Force field to use for GULP (‘reaxff’ by default).
steps (int) – Number of optimization steps for DFTB (default is 250).
fmax (float) – Force tolerance for MACE (default is 0.1)
use_relaxed (str, optional) – Use relaxed structures (e.g. ‘ff_relaxed’)
cmd (str, optional) – Command for VASP calculations
calc_folder (str, optional) – calc_folder for GULP/VASP calculations
skf_dir (str, optional) – Directory for DFTB potential files
- Functionality:
Using the selected calculator, it updates the energy rows of the database. If ncpu > 1, run in parallel; otherwise in serial.
- Calculator Options:
‘GULP’: Uses a force field (e.g., ‘reaxff’).
‘MACE’: Uses the MACE calculator.
‘DFTB’: Uses DFTB+ with symmetrization options.
‘VASP’: Uses VASP, with a specified command (cmd).
- update_row_energy_mproc(ncpu, generator, args, args_up)[source]
Perform parallel row energy updates by optimizing atomic structures.
- Parameters:
ncpu (int) – Number of CPUs to use for parallel processing.
generator (generator) – Yields tuples of (id, xtal), where id (int) is the unique identifier for the structure and xtal (object) is a pyxtal instance.
args (list) – Additional arguments passed to call_opt_single, typically a calculator or potential parameters.
args_up (list) – Additional arguments for function _update_db.
- Functionality:
This function distributes the structures across multiple CPUs using multiprocessing.Pool. It creates chunks (based on ncpu) and processes them in parallel by calling call_opt_single. Successful results are periodically written to the database. The function also prints memory usage after each database update.
- Parallelization Process:
The Pool is initialized with ncpu processes.
Structures are divided into chunks with the chunkify function.
Each chunk is processed by call_opt_single via the pool.
Successful results are periodically written to the database.
The pool is closed and joined after processing is complete.
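The steps above can be sketched as a chunk-then-map loop. This is an illustration under stated assumptions rather than the library's implementation: `ThreadPool` is used as a stand-in because it shares the `multiprocessing.Pool` interface and runs without a `__main__` guard, the chunk size of `ncpu * 4` is arbitrary, and `optimize_one` and `run_parallel` are hypothetical names.

```python
from itertools import islice
from multiprocessing.pool import ThreadPool


def chunkify(generator, chunk_size):
    """Yield lists of up to chunk_size items from a generator."""
    iterator = iter(generator)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def optimize_one(p):
    """Hypothetical worker: returns (id, xtal, eng), with eng None on failure."""
    struc_id, xtal = p[0], p[1]
    eng = None if xtal is None else -float(len(xtal))
    return struc_id, xtal, eng


def run_parallel(ncpu, generator, args, write):
    """Process (id, xtal) pairs chunk by chunk and write successes per chunk."""
    pool = ThreadPool(processes=ncpu)
    try:
        for chunk in chunkify(generator, ncpu * 4):
            tasks = [(struc_id, xtal, *args) for struc_id, xtal in chunk]
            results = [
                r for r in pool.imap_unordered(optimize_one, tasks)
                if r[2] is not None  # keep only successful optimizations
            ]
            write(results)  # one database write per chunk
    finally:
        pool.close()
        pool.join()


written = []
structures = ((i, s) for i, s in enumerate(["aa", None, "bbbb", "c"]))
run_parallel(2, structures, ["GULP"], written.extend)
```

Consuming the generator in chunks keeps memory bounded, since only one chunk of structures is materialized at a time.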
- update_row_energy_serial(generator, write_freq, args, args_up)[source]
Perform a serial update of row energies
- Parameters:
generator (generator) – Yields tuples of (id, xtal), where id (int) is the unique identifier for the structure and xtal (object) is a pyxtal instance.
write_freq (int) – Frequency to update the database.
args (list) – Additional arguments to the function opt_single.
args_up (list) – Additional arguments for function _update_db.
- Functionality:
It iterates over structures provided by generator, optimizes them using opt_single, and collects results that have converged (status == True). Once the number of results reaches write_freq, it updates the database.
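The batching described above can be sketched as a loop that buffers successful results and flushes them every write_freq entries. The final flush of a partial batch is an assumption added so that no results are lost. `fake_optimize` and `run_serial` are hypothetical names, and `write` stands in for the database update.

```python
def fake_optimize(struc_id, xtal):
    """Hypothetical optimizer: returns (xtal, energy, status)."""
    ok = xtal is not None
    return xtal, (-1.0 * struc_id if ok else None), ok


def run_serial(generator, write_freq, optimize, write):
    """Optimize structures one by one and write results in batches."""
    results = []
    for struc_id, xtal in generator:
        new_xtal, eng, status = optimize(struc_id, xtal)
        if status:
            results.append((struc_id, new_xtal, eng))
        if len(results) >= write_freq:
            write(results)
            results = []
    if results:
        write(results)  # flush the final partial batch (assumed behavior)


batches = []
items = [(0, "a"), (1, "b"), (2, None), (3, "c"), (4, "d"), (5, "e")]
run_serial(iter(items), 2, fake_optimize, batches.append)
batch_ids = [[r[0] for r in batch] for batch in batches]
```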
- update_row_topology(StructureType='Auto', overwrite=True, prefix=None, ref_dim=3, timeout=60)[source]
Update row topology using CrystalNets.jl via subprocess (faster than juliacall).
- Parameters:
StructureType (str) – Type of structure to analyze. Options are ‘Zeolite’ (for zeolite structures), ‘MOF’ (for metal-organic frameworks), and ‘Auto’ (for automatic detection).
overwrite (bool) – Whether to overwrite existing topology attributes.
prefix (str) – Prefix for temporary CIF files.
ref_dim (int) – Reference dimensionality to compare against.
timeout (int) – Timeout in seconds for each Julia call. Default is 60.
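The per-call timeout can be illustrated with the standard subprocess module. In this sketch the Python interpreter stands in for the Julia executable, since the actual CrystalNets.jl command line is not given in this reference; `run_with_timeout` is a hypothetical helper.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout=60):
    """Run an external command; return stdout, or None on timeout or failure."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return None
    return done.stdout.strip() if done.returncode == 0 else None


# The interpreter is only a placeholder for the external topology program.
fast = run_with_timeout([sys.executable, "-c", "print('dia')"], timeout=30)
slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)
```

Returning None on timeout lets the caller skip a problematic structure and continue with the next row.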
- pyxtal.db.dftb_opt_single(id, xtal, skf_dir, steps, symmetrize, criteria, kresol=0.05)[source]
Single DFTB optimization for a given atomic xtal
- Parameters:
id (int) – id of the given xtal
xtal – pyxtal instance
skf_dir (str) – path of skf files
steps (int) – number of relaxation steps
criteria (dict) – Criteria to check the validity of the structure
- pyxtal.db.gulp_opt_single(id, xtal, ff_lib, path, criteria)[source]
Perform a single GULP optimization for a given crystal structure.
- Parameters:
id (int) – Identifier for the current structure.
xtal – PyXtal instance representing the crystal to be optimized.
ff_lib (str) – Force field library for GULP, e.g., ‘reaxff’, ‘tersoff’.
path (str) – Path to the folder where the calculation is stored.
criteria (dict) – Dictionary to check the validity of the opt_structure.
- Returns:
xtal: Optimized PyXtal instance.
eng (float): Energy of the optimized structure.
status (bool): Whether the optimization process is successful.
- Return type:
tuple
- Behavior:
This function performs a GULP optimization using the force field. After the optimization, it checks the validity of the structure and attempts to remove the calculation folder if it is empty.
- pyxtal.db.mace_opt_single(id, xtal, step, fmax, criteria)[source]
Perform a single MACE optimization for a given atomic crystal structure.
- Parameters:
id (int) – Identifier for the current structure.
xtal – PyXtal instance representing the crystal structure.
step (int) – Maximum number of relaxation steps. Default is 250.
fmax (float) – fmax for relaxation
criteria (dict) – Dictionary to check the validity of the optimized structure.
- Returns:
xtal: Optimized PyXtal instance (or None if optimization failed).
eng (float): Energy/atom of the opt_structure (or None if it failed).
status (bool): Whether the optimization was successful.
- Return type:
tuple
- pyxtal.db.make_db_from_CSD(dbname, codes)[source]
make database from CSD codes
- Parameters:
dbname – db file name
codes – a list of CSD codes
- pyxtal.db.make_entry_from_CSD(code)[source]
make entry dictionary from CSD codes
- Parameters:
code – a list of CSD codes
- pyxtal.db.make_entry_from_CSD_web(code, number, smiles, name=None)[source]
make entry dictionary from the CSD web (https://www.ccdc.cam.ac.uk/structures)
- Parameters:
code – CSD style letter entry
number – ccdc number
smiles – the corresponding molecular smiles
name – name of the compound
- pyxtal.db.make_entry_from_pyxtal(xtal)[source]
Generate an entry dictionary from a PyXtal object, assuming the SMILES and CCDC number information is provided.
- Parameters:
xtal – PyXtal object (must contain the SMILES (xtal.tag[“smiles”]) and CCDC number)
- Returns:
- (ase_atoms, entry_dict, None)
ase_atoms: ASE Atoms object converted from the PyXtal structure.
entry_dict (dict): A dictionary containing information
None: Placeholder for future use (currently returns None).
- Return type:
tuple
- Structure of entry_dict:
“csd_code” (str): CSD code (if available) for the crystal structure.
“mol_smi” (str): SMILES representation of the molecule.
“ccdc_number” (str): CCDC identifier number.
“space_group” (str): Space group symbol of the crystal.
“spg_num” (int): Space group number.
“Z” (int): Number of molecules in the unit cell.
“Zprime” (float): Z’ value of the crystal.
“url” (str): URL link to the CCDC database entry for the crystal.
“mol_formula” (str): Molecular formula of the structure.
“mol_weight” (float): Molecular weight of the structure.
“mol_name” (str): Name of the molecule, typically the CSD code.
“l_type” (str): Lattice type of the structure.
Returns None if the PyXtal structure is invalid (i.e., xtal.valid is False).
Example
entry = make_entry_from_pyxtal(xtal_instance)
ase_atoms, entry_dict, _ = entry
Notes
The CCDC link is generated using the structure’s CCDC number.
- pyxtal.db.opt_single(id, xtal, calc, *args)[source]
Optimize a structure using the specified calculator.
- Parameters:
id (int) – Identifier of the structure to be optimized.
xtal – Crystal structure object to be optimized.
calc (str) – The calculator to use (‘GULP’, ‘DFTB’, ‘VASP’, ‘MACE’).
*args – Additional arguments to pass to the calculator function.
- Returns:
- The result of the optimization, which typically includes:
xtal: The optimized structure.
energy (float): The energy of the optimized structure.
status (bool): Whether the optimization was successful.
- Return type:
tuple
- Raises:
ValueError – If an unsupported calculator is specified.
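Dispatching on the calculator name and rejecting unknown names can be sketched with a lookup table. The stub functions below are hypothetical placeholders for the per-calculator routines (gulp_opt_single, mace_opt_single, and so on), and the table-based dispatch is one possible design, not necessarily the one the library uses.

```python
def gulp_stub(id, xtal, *args):
    """Hypothetical GULP routine: returns (xtal, energy, status)."""
    return xtal, -1.0, True


def mace_stub(id, xtal, *args):
    """Hypothetical MACE routine: returns (xtal, energy, status)."""
    return xtal, -2.0, True


CALCULATORS = {"GULP": gulp_stub, "MACE": mace_stub}


def opt_single_sketch(id, xtal, calc, *args):
    """Route the structure to the routine registered for calc."""
    try:
        func = CALCULATORS[calc]
    except KeyError:
        raise ValueError(f"Unsupported calculator: {calc}") from None
    return func(id, xtal, *args)


result = opt_single_sketch(1, "x", "MACE", 250, 0.1)
```

A table keeps the supported names in one place, so adding a calculator means registering one more entry.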