pyxtal.db module
Database class
- pyxtal.db.call_opt_single(p)[source]
Optimize a single structure and log the result.
- Parameters:
p (tuple) – A tuple where the first element is an identifier (id), and the remaining elements are the arguments to pass to opt_single.
- Returns:
- A tuple (id, xtal, eng) where:
id (int): The identifier of the structure.
xtal: The optimized structure.
eng (float): The energy of the opt_structure, or None if it failed.
- Return type:
tuple
- Behavior:
This function calls opt_single to perform the optimization of the structure associated with the given id.
- class pyxtal.db.database(db_name)[source]
Bases:
object
This is a database class to process crystal data.
- Parameters:
db_name – *.db format from ase database
- class pyxtal.db.database_topology(db_name, rank=0, size=1, ltol=0.05, stol=0.05, atol=3, log_file='db.log')[source]
Bases:
object
This is a database class to process atomic crystal data
- Parameters:
db_name (str) – *.db format from ase database
rank (int) – default 0
size (int) – default 1
ltol (float) – lattice tolerance
stol (float) – site tolerance
atol (float) – angle tolerance
log_file (str) – log_file
- add_strucs_from_db(db_file, check=False, tol=0.1, freq=50)[source]
Add new structures from the given db_file
- Parameters:
db_file (str) – database path
tol (float) – tolerance in angstrom for symmetry detection
freq (int) – print frequency
- check_new_structure(xtal, same_group=True)[source]
Check if the input xtal already exists in the db
- Parameters:
xtal – pyxtal object
same_group (bool) – keep the same group or not
- check_overlap(reference_db, etol=0.002, verbose=True)[source]
Check the overlap w.r.t the reference database
- Parameters:
reference_db (str) – path of reference database
etol (float) – energy tolerence to distinguish the identical structure
verbose (bool) – whether or not print out details
- clean_structures(ids=(None, None), dtol=0.002, etol=0.001, criteria=None)[source]
Clean up the db by removing the duplicate structures Here we check the follow criteria - same number of atoms - same density - same energy
- Parameters:
dtol (float) – tolerance of density
etol (float) – tolerance of energy
criteria (dict) – including
- clean_structures_pmg(ids=(None, None), min_id=None, dtol=0.05, criteria=None)[source]
Clean up the db by removing the duplicate structures. Here we check the follow criteria same density and pymatgen matcher. The criteria should look like the following,
- {‘CN’: {‘C’: 3},
‘cutoff’: 1.8, ‘MAX_energy’: -8.00, #’MAX_similarity’: 0.2, ‘BAD_topology’: [‘hcb’], ‘BAD_dimension’: [0, 2],
}
- Parameters:
dtol (float) – tolerance of density
criteria (dict) – including
- clean_structures_spg_topology(dim=None)[source]
Clean up the db by removing the duplicate structures Here we check the follow criteria
same number of atoms
same space group
same topology
same wps
- Parameters:
dim (int) – wanted dimension
- export_structures(fmt='vasp', folder='mof_out', criteria=None, sort_by='similarity', overwrite=True, cutoff=None, use_relaxed=None)[source]
export structures from database according to the given criterion
- Parameters:
fmt (str) – ‘vasp’ or ‘cif’
folder (str) – ‘path of output folders’
criteria (dict) – check the validity with dict
sort_by (str) – sort by which attribute
overwrite (bool) – remove the existing folder
cutoff (int) – the maximum number of structures for export
- get_db_unique(db_name=None, prec=3)[source]
Get a db file with only unique structures with the following identical attributes: (topology, ff_energy)
- Parameters:
db_name (str) – filename for the new db
prec (int) – ff_energy precision for the round number
- get_properties(prop)[source]
Retrieve a list of specific property values from the database rows.
- Parameters:
prop (str) – The property name to retrieve (e.g., ‘ff_energy’)
- Returns:
- A list of property values for rows that have the specified property.
If a row does not contain the property, it is ignored.
- Return type:
list
- Raises:
Warning – If no rows in the database contain the specified property.
- get_pyxtal(id, use_relaxed=None)[source]
Get pyxtal based on row_id, if use_relaxed, get pyxtal from ff_relaxed
- Parameters:
id (int) – row id
use_relaxed (str) – ‘ff_relaxed’, ‘vasp_relaxed’
- plot_histogram(prop, ax=None, filename=None, xlim=None, nbins=20)[source]
Plot the histogram of a specified row property.
- Parameters:
prop (str) – The name of the property to plot (e.g., ‘ff_energy’).
ax (matplotlib.axes.Axes, optional) – Pre-existing axis to plot on. If None, a new ax will be created.
filename (str, optional) – Path to save the plot (e.g., ‘plot.png’). If None, the plot will not be saved.
xlim (tuple, optional) – Limits for the x-axis (e.g., (0, 10)). If None, the x-axis will scale automatically.
nbins (int, optional) – Number of bins for the histogram. Default is 20.
- Returns:
The axis object with the histogram plotted.
- Return type:
matplotlib.axes.Axes
- print_info(excluded_ids=None, cutoff=100)[source]
Print out the summary of the database based on the calculated energy Mostly used to quickly view the most interesting low-energy structures. Todo: show vasp_energy if available
- Parameters:
excluded_ids (list) – list of unwanted row ids
cutoff (int) – the cutoff value for the print
- select_xtal(ids, N_atoms=(None, None), overwrite=False, attribute=None, use_relaxed=None)[source]
Lazy extraction of select xtals
- Parameters:
ids
N_atoms
overwrite
atttribute
use_relaxed (str) – ‘ff_relaxed’ or ‘vasp_relaxed’
- select_xtals(ids, N_atoms=(None, None), overwrite=False, attribute=None, use_relaxed=None)[source]
Extract xtals based on attribute name. Mostly called by update_row_energy
- Parameters:
ids
N_atoms
overwrite
atttribute
use_relaxed (str) – ‘ff_relaxed’ or ‘vasp_relaxed’
- update_db_description()[source]
update db description based on robocrys Call robocrys: https://github.com/hackingmaterials/robocrystallographer
- update_row_energy(calculator='GULP', ids=(None, None), N_atoms=(None, None), ncpu=1, criteria=None, symmetrize=False, overwrite=False, write_freq=100, ff_lib='reaxff', steps=250, use_relaxed=None, cmd=None, calc_folder=None, skf_dir=None)[source]
Update the row energy in the database for a given calculator.
- Parameters:
calculator (str) – ‘GULP’, ‘MACE’, ‘VASP’, ‘DFTB’
ids (tuple) – A tuple specifying row IDs to update (e.g., (0, 100)).
ncpu (int) – number of parallel processes
criteria (dict, optional) – Criteria when selecting structures.
symmetrize (bool) – symmetrize the structure before calculation
overwrite (bool) – overwrite the existing energy attributes.
write_freq (int) – frequency to update db for ncpu=1
ff_lib (str) – Force field to use for GULP (‘reaxff’ by default).
steps (int) – Number of optimization steps for DFTB (default is 250).
use_relaxed (str, optional) – Use relaxed structures (e.g. ‘ff_relaxed’)
cmd (str, optional) – Command for VASP calculations
calc_folder (str, optional) – calc_folder for GULP/VASP calculations
skf_dir (str, optional) – Directory for DFTB potential files
- Functionality:
Using the selected calculator, it updates the energy rows of the database. If ncpu > 1, run in parallel; otherwise in serial.
- Calculator Options:
‘GULP’: Uses a force field (e.g., ‘reaxff’).
‘MACE’: Uses the MACE calculator.
‘DFTB’: Uses DFTB+ with symmetrization options.
‘VASP’: Uses VASP, with a specified command (cmd).
- update_row_energy_mproc(ncpu, generator, args, args_up)[source]
Perform parallel row energy updates by optimizing atomic structures.
- Parameters:
ncpu (int) – Number of CPUs to use for parallel processing.
generator (generator) – yielding tuples of (id, xtal), where: - id (int): Unique identifier for the structure. - xtal (object): pyxtal instance.
args (list) – Additional arguments passed to call_opt_single. - Typically includes a calculator or potential parameters.
args_up (list) – Additional arguments for function _update_db.
- Functionality:
This function distributes the structures across multiple CPUs using multiprocessing.Pool. It creates chunks (based on ncpu), and process them in parallel by calling call_opt_single. Successful results are periodically written to the database. The function also prints memory usage after each database update.
- Parallelization Process:
The Pool is initialized with ncpu processes.
Structures are divided into chunks with the chunkify function.
Each chunk is processed by call_opt_single via the pool.
Successful results are periodically written to the database.
The pool is closed and joined after processing is complete.
- update_row_energy_serial(generator, write_freq, args, args_up)[source]
Perform a serial update of row energies
- Parameters:
generator (generator) – Yielding tuples of (id, xtal), where: - id (int): Unique identifier for the structure. - xtal (object): pyxtal instance.
write_freq (int) – Frequency to update the database.
args (list) – Additional arguments to the function opt_single.
args_up (list) – Additional arguments for function _update_db.
- Functionality:
It iterates over structures provided by generator, optimizes them using opt_single, and collects results that have converged (status == True). Once the number of results reaches write_freq, it updates the database.
- update_row_topology(StructureType='Auto', overwrite=True, prefix=None, ref_dim=3)[source]
Update row topology base on the CrystalNets.jl
- Parameters:
StructureType (str) – ‘Zeolite’, ‘MOF’ or ‘Auto’
overwrite (bool) – remove the existing attributes
prefix (str) – prefix for tmp cif file
ref_dim (int) – desired dimension
- pyxtal.db.dftb_opt_single(id, xtal, skf_dir, steps, symmetrize, criteria, kresol=0.05)[source]
Single DFTB optimization for a given atomic xtal
- Parameters:
id (int) – id of the give xtal
xtal – pyxtal instance
skf_dir (str) – path of skf files
steps (int) – number of relaxation steps
criteria (dicts) – to check if the structure
- pyxtal.db.gulp_opt_single(id, xtal, ff_lib, path, criteria)[source]
Perform a single GULP optimization for a given crystal structure.
- Parameters:
id (int) – Identifier for the current structure.
xtal – PyXtal instance representing the crystal to be optimized.
ff_lib (str) – Force field library for GULP, e.g., ‘reaxff’, ‘tersoff’.
path (str) – Path to the folder where the calculation is stored.
criteria (dict) – Dictionary to check the validity of the opt_structure.
- Returns:
xtal: Optimized PyXtal instance.
eng (float): Energy of the optimized structure.
status (bool): Whether the optimization process is successful.
- Return type:
tuple
- Behavior:
This function performs a GULP optimization using the force field. After the optimization, it checks the validity of the structure and attempts to remove the calculation folder if it is empty.
- pyxtal.db.mace_opt_single(id, xtal, step, criteria)[source]
Perform a single MACE optimization for a given atomic crystal structure.
- Parameters:
id (int) – Identifier for the current structure.
xtal – PyXtal instance representing the crystal structure.
step (int) – Maximum number of relaxation steps. Default is 250.
criteria (dict) – Dictionary to check the validity of the optimized structure.
- Returns:
xtal: Optimized PyXtal instance (or None if optimization failed).
eng (float): Energy/atom of the opt_structure (or None if it failed).
status (bool): Whether the optimization was successful.
- Return type:
tuple
- pyxtal.db.make_db_from_CSD(dbname, codes)[source]
make database from CSD codes
- Parameters:
dbname – db file name
codes – a list of CSD codes
- pyxtal.db.make_entry_from_CSD(code)[source]
make entry dictionary from CSD codes
- Parameters:
code – a list of CSD codes
- pyxtal.db.make_entry_from_CSD_web(code, number, smiles, name=None)[source]
make enetry dictionary from csd web https://www.ccdc.cam.ac.uk/structures
- Parameters:
code – CSD style letter entry
number – ccdc number
smiles – the corresponding molecular smiles
name – name of the compound
- pyxtal.db.make_entry_from_pyxtal(xtal)[source]
Generate an entry dictionary from a PyXtal object, assuming the SMILES and CCDC number information is provided.
- Parameters:
xtal – PyXtal object (must contain the SMILES (xtal.tag[“smiles”])
number (and CCDC)
- Returns:
- (ase_atoms, entry_dict, None)
ase_atoms: ASE Atoms object converted from the PyXtal structure.
entry_dict (dict): A dictionary containing information
None: Placeholder for future use (currently returns None).
- Return type:
tuple
- Structure of entry_dict:
“csd_code” (str): CSD code (if available) for the crystal structure.
“mol_smi” (str): SMILES representation of the molecule.
“ccdc_number” (str): CCDC identifier number.
“space_group” (str): Space group symbol of the crystal.
“spg_num” (int): Space group number.
“Z” (int): Number of molecules in the unit cell.
“Zprime” (float): Z’ value of the crystal.
“url” (str): URL link to the CCDC database entry for the crystal.
“mol_formula” (str): Molecular formula of the structure.
“mol_weight” (float): Molecular weight of the structure.
“mol_name” (str): Name of the molecule, typically the CSD code.
“l_type” (str): Lattice type of the structure.
Returns None if the PyXtal structure is invalid (i.e., xtal.valid is False).
Example
entry = make_entry_from_pyxtal(xtal_instance) ase_atoms, entry_dict, _ = entry
Notes
The CCDC link is generated using the structure’s CCDC number.
- pyxtal.db.opt_single(id, xtal, calc, *args)[source]
Optimize a structure using the specified calculator.
- Parameters:
id (int) – Identifier of the structure to be optimized.
xtal – Crystal structure object to be optimized.
calc (str) – The calculator to use (‘GULP’, ‘DFTB’, ‘VASP’, ‘MACE’).
*args – Additional arguments to pass to the calculator function.
- Returns:
- The result of the optimization, which typically includes:
xtal: The optimized structure.
energy (float): The energy of the optimized structure.
status (bool): Whether the optimization was successful.
- Return type:
tuple
- Raises:
ValueError – If an unsupported calculator is specified.