pyxtal.db module

Database class

pyxtal.db.call_opt_single(p)[source]

Optimize a single structure and log the result.

Parameters:

p (tuple) – A tuple where the first element is an identifier (id), and the remaining elements are the arguments to pass to opt_single.

Returns:

A tuple (id, xtal, eng) where:
  • id (int): The identifier of the structure.

  • xtal: The optimized structure.

  • eng (float): The energy of the opt_structure, or None if it failed.

Return type:

tuple

Behavior:

This function calls opt_single to perform the optimization of the structure associated with the given id.

class pyxtal.db.database(db_name)[source]

Bases: object

This is a database class to process crystal data.

Parameters:

db_name*.db format from ase database

add(entry)[source]
add_from_code(code)[source]
check_status(show=False)[source]

Check the current status of each entry

compute(row, work_dir, skf_dir)[source]
copy(db_name, csd_codes)[source]

copy the entries to another db

Parameters:
  • db_name – db file name

  • csd_codes – list of codes

get_all_codes(group=None)[source]

Get all codes

get_pyxtal(code)[source]
get_row(code)[source]
get_row_info(id=None, code=None)[source]
process_kvp(kvp)[source]
vacuum()[source]
view(row_info)[source]

print the summary of benchmark results

Parameters:

row – row object

class pyxtal.db.database_topology(db_name, rank=0, size=1, ltol=0.05, stol=0.05, atol=3, log_file='db.log')[source]

Bases: object

This is a database class to process atomic crystal data

Parameters:
  • db_name (str) – *.db format from ase database

  • rank (int) – default 0

  • size (int) – default 1

  • ltol (float) – lattice tolerance

  • stol (float) – site tolerance

  • atol (float) – angle tolerance

  • log_file (str) – log_file

add_strucs_from_db(db_file, check=False, tol=0.1, freq=50)[source]

Add new structures from the given db_file

Parameters:
  • db_file (str) – database path

  • tol (float) – tolerance in angstrom for symmetry detection

  • freq (int) – print frequency

add_xtal(xtal, kvp)[source]

Add new xtal to the given db

check_new_structure(xtal, same_group=True)[source]

Check if the input xtal already exists in the db

Parameters:
  • xtal – pyxtal object

  • same_group (bool) – keep the same group or not

check_overlap(reference_db, etol=0.002, verbose=True)[source]

Check the overlap w.r.t the reference database

Parameters:
  • reference_db (str) – path of reference database

  • etol (float) – energy tolerence to distinguish the identical structure

  • verbose (bool) – whether or not print out details

clean_structures(ids=(None, None), dtol=0.002, etol=0.001, criteria=None)[source]

Clean up the db by removing the duplicate structures Here we check the follow criteria - same number of atoms - same density - same energy

Parameters:
  • dtol (float) – tolerance of density

  • etol (float) – tolerance of energy

  • criteria (dict) – including

clean_structures_pmg(ids=(None, None), min_id=None, dtol=0.05, criteria=None)[source]

Clean up the db by removing the duplicate structures. Here we check the follow criteria same density and pymatgen matcher. The criteria should look like the following,

{‘CN’: {‘C’: 3},

‘cutoff’: 1.8, ‘MAX_energy’: -8.00, #’MAX_similarity’: 0.2, ‘BAD_topology’: [‘hcb’], ‘BAD_dimension’: [0, 2],

}

Parameters:
  • dtol (float) – tolerance of density

  • criteria (dict) – including

clean_structures_spg_topology(dim=None)[source]

Clean up the db by removing the duplicate structures Here we check the follow criteria

  • same number of atoms

  • same space group

  • same topology

  • same wps

Parameters:

dim (int) – wanted dimension

export_structures(fmt='vasp', folder='mof_out', criteria=None, sort_by='similarity', overwrite=True, cutoff=None, use_relaxed=None)[source]

export structures from database according to the given criterion

Parameters:
  • fmt (str) – ‘vasp’ or ‘cif’

  • folder (str) – ‘path of output folders’

  • criteria (dict) – check the validity with dict

  • sort_by (str) – sort by which attribute

  • overwrite (bool) – remove the existing folder

  • cutoff (int) – the maximum number of structures for export

get_all_xtals(include_energy=False)[source]

Get all pyxtal instances from the current db

get_db_unique(db_name=None, prec=3)[source]

Get a db file with only unique structures with the following identical attributes: (topology, ff_energy)

Parameters:
  • db_name (str) – filename for the new db

  • prec (int) – ff_energy precision for the round number

get_label(i)[source]
get_max_id()[source]

Get the maximum row id

get_properties(prop)[source]

Retrieve a list of specific property values from the database rows.

Parameters:

prop (str) – The property name to retrieve (e.g., ‘ff_energy’)

Returns:

A list of property values for rows that have the specified property.

If a row does not contain the property, it is ignored.

Return type:

list

Raises:

Warning – If no rows in the database contain the specified property.

get_pyxtal(id, use_relaxed=None)[source]

Get pyxtal based on row_id, if use_relaxed, get pyxtal from ff_relaxed

Parameters:
  • id (int) – row id

  • use_relaxed (str) – ‘ff_relaxed’, ‘vasp_relaxed’

plot_histogram(prop, ax=None, filename=None, xlim=None, nbins=20)[source]

Plot the histogram of a specified row property.

Parameters:
  • prop (str) – The name of the property to plot (e.g., ‘ff_energy’).

  • ax (matplotlib.axes.Axes, optional) – Pre-existing axis to plot on. If None, a new ax will be created.

  • filename (str, optional) – Path to save the plot (e.g., ‘plot.png’). If None, the plot will not be saved.

  • xlim (tuple, optional) – Limits for the x-axis (e.g., (0, 10)). If None, the x-axis will scale automatically.

  • nbins (int, optional) – Number of bins for the histogram. Default is 20.

Returns:

The axis object with the histogram plotted.

Return type:

matplotlib.axes.Axes

print_info(excluded_ids=None, cutoff=100)[source]

Print out the summary of the database based on the calculated energy Mostly used to quickly view the most interesting low-energy structures. Todo: show vasp_energy if available

Parameters:
  • excluded_ids (list) – list of unwanted row ids

  • cutoff (int) – the cutoff value for the print

print_memory_usage()[source]
select_xtal(ids, N_atoms=(None, None), overwrite=False, attribute=None, use_relaxed=None)[source]

Lazy extraction of select xtals

Parameters:
  • ids

  • N_atoms

  • overwrite

  • atttribute

  • use_relaxed (str) – ‘ff_relaxed’ or ‘vasp_relaxed’

select_xtals(ids, N_atoms=(None, None), overwrite=False, attribute=None, use_relaxed=None)[source]

Extract xtals based on attribute name. Mostly called by update_row_energy

Parameters:
  • ids

  • N_atoms

  • overwrite

  • atttribute

  • use_relaxed (str) – ‘ff_relaxed’ or ‘vasp_relaxed’

update_db_description()[source]

update db description based on robocrys Call robocrys: https://github.com/hackingmaterials/robocrystallographer

update_row_energy(calculator='GULP', ids=(None, None), N_atoms=(None, None), ncpu=1, criteria=None, symmetrize=False, overwrite=False, write_freq=100, ff_lib='reaxff', steps=250, use_relaxed=None, cmd=None, calc_folder=None, skf_dir=None)[source]

Update the row energy in the database for a given calculator.

Parameters:
  • calculator (str) – ‘GULP’, ‘MACE’, ‘VASP’, ‘DFTB’

  • ids (tuple) – A tuple specifying row IDs to update (e.g., (0, 100)).

  • ncpu (int) – number of parallel processes

  • criteria (dict, optional) – Criteria when selecting structures.

  • symmetrize (bool) – symmetrize the structure before calculation

  • overwrite (bool) – overwrite the existing energy attributes.

  • write_freq (int) – frequency to update db for ncpu=1

  • ff_lib (str) – Force field to use for GULP (‘reaxff’ by default).

  • steps (int) – Number of optimization steps for DFTB (default is 250).

  • use_relaxed (str, optional) – Use relaxed structures (e.g. ‘ff_relaxed’)

  • cmd (str, optional) – Command for VASP calculations

  • calc_folder (str, optional) – calc_folder for GULP/VASP calculations

  • skf_dir (str, optional) – Directory for DFTB potential files

Functionality:

Using the selected calculator, it updates the energy rows of the database. If ncpu > 1, run in parallel; otherwise in serial.

Calculator Options:
  • ‘GULP’: Uses a force field (e.g., ‘reaxff’).

  • ‘MACE’: Uses the MACE calculator.

  • ‘DFTB’: Uses DFTB+ with symmetrization options.

  • ‘VASP’: Uses VASP, with a specified command (cmd).

update_row_energy_mproc(ncpu, generator, args, args_up)[source]

Perform parallel row energy updates by optimizing atomic structures.

Parameters:
  • ncpu (int) – Number of CPUs to use for parallel processing.

  • generator (generator) – yielding tuples of (id, xtal), where: - id (int): Unique identifier for the structure. - xtal (object): pyxtal instance.

  • args (list) – Additional arguments passed to call_opt_single. - Typically includes a calculator or potential parameters.

  • args_up (list) – Additional arguments for function _update_db.

Functionality:

This function distributes the structures across multiple CPUs using multiprocessing.Pool. It creates chunks (based on ncpu), and process them in parallel by calling call_opt_single. Successful results are periodically written to the database. The function also prints memory usage after each database update.

Parallelization Process:
  • The Pool is initialized with ncpu processes.

  • Structures are divided into chunks with the chunkify function.

  • Each chunk is processed by call_opt_single via the pool.

  • Successful results are periodically written to the database.

  • The pool is closed and joined after processing is complete.

update_row_energy_serial(generator, write_freq, args, args_up)[source]

Perform a serial update of row energies

Parameters:
  • generator (generator) – Yielding tuples of (id, xtal), where: - id (int): Unique identifier for the structure. - xtal (object): pyxtal instance.

  • write_freq (int) – Frequency to update the database.

  • args (list) – Additional arguments to the function opt_single.

  • args_up (list) – Additional arguments for function _update_db.

Functionality:

It iterates over structures provided by generator, optimizes them using opt_single, and collects results that have converged (status == True). Once the number of results reaches write_freq, it updates the database.

update_row_topology(StructureType='Auto', overwrite=True, prefix=None, ref_dim=3)[source]

Update row topology base on the CrystalNets.jl

Parameters:
  • StructureType (str) – ‘Zeolite’, ‘MOF’ or ‘Auto’

  • overwrite (bool) – remove the existing attributes

  • prefix (str) – prefix for tmp cif file

  • ref_dim (int) – desired dimension

vacuum()[source]
pyxtal.db.dftb_opt_single(id, xtal, skf_dir, steps, symmetrize, criteria, kresol=0.05)[source]

Single DFTB optimization for a given atomic xtal

Parameters:
  • id (int) – id of the give xtal

  • xtal – pyxtal instance

  • skf_dir (str) – path of skf files

  • steps (int) – number of relaxation steps

  • criteria (dicts) – to check if the structure

pyxtal.db.gulp_opt_single(id, xtal, ff_lib, path, criteria)[source]

Perform a single GULP optimization for a given crystal structure.

Parameters:
  • id (int) – Identifier for the current structure.

  • xtal – PyXtal instance representing the crystal to be optimized.

  • ff_lib (str) – Force field library for GULP, e.g., ‘reaxff’, ‘tersoff’.

  • path (str) – Path to the folder where the calculation is stored.

  • criteria (dict) – Dictionary to check the validity of the opt_structure.

Returns:

  • xtal: Optimized PyXtal instance.

  • eng (float): Energy of the optimized structure.

  • status (bool): Whether the optimization process is successful.

Return type:

tuple

Behavior:

This function performs a GULP optimization using the force field. After the optimization, it checks the validity of the structure and attempts to remove the calculation folder if it is empty.

pyxtal.db.mace_opt_single(id, xtal, step, criteria)[source]

Perform a single MACE optimization for a given atomic crystal structure.

Parameters:
  • id (int) – Identifier for the current structure.

  • xtal – PyXtal instance representing the crystal structure.

  • step (int) – Maximum number of relaxation steps. Default is 250.

  • criteria (dict) – Dictionary to check the validity of the optimized structure.

Returns:

  • xtal: Optimized PyXtal instance (or None if optimization failed).

  • eng (float): Energy/atom of the opt_structure (or None if it failed).

  • status (bool): Whether the optimization was successful.

Return type:

tuple

pyxtal.db.make_db_from_CSD(dbname, codes)[source]

make database from CSD codes

Parameters:
  • dbname – db file name

  • codes – a list of CSD codes

pyxtal.db.make_entry_from_CSD(code)[source]

make entry dictionary from CSD codes

Parameters:

code – a list of CSD codes

pyxtal.db.make_entry_from_CSD_web(code, number, smiles, name=None)[source]

make enetry dictionary from csd web https://www.ccdc.cam.ac.uk/structures

Parameters:
  • code – CSD style letter entry

  • number – ccdc number

  • smiles – the corresponding molecular smiles

  • name – name of the compound

pyxtal.db.make_entry_from_pyxtal(xtal)[source]

Generate an entry dictionary from a PyXtal object, assuming the SMILES and CCDC number information is provided.

Parameters:
  • xtal – PyXtal object (must contain the SMILES (xtal.tag[“smiles”])

  • number (and CCDC)

Returns:

(ase_atoms, entry_dict, None)
  • ase_atoms: ASE Atoms object converted from the PyXtal structure.

  • entry_dict (dict): A dictionary containing information

  • None: Placeholder for future use (currently returns None).

Return type:

tuple

Structure of entry_dict:
  • “csd_code” (str): CSD code (if available) for the crystal structure.

  • “mol_smi” (str): SMILES representation of the molecule.

  • “ccdc_number” (str): CCDC identifier number.

  • “space_group” (str): Space group symbol of the crystal.

  • “spg_num” (int): Space group number.

  • “Z” (int): Number of molecules in the unit cell.

  • “Zprime” (float): Z’ value of the crystal.

  • “url” (str): URL link to the CCDC database entry for the crystal.

  • “mol_formula” (str): Molecular formula of the structure.

  • “mol_weight” (float): Molecular weight of the structure.

  • “mol_name” (str): Name of the molecule, typically the CSD code.

  • “l_type” (str): Lattice type of the structure.

Returns None if the PyXtal structure is invalid (i.e., xtal.valid is False).

Example

entry = make_entry_from_pyxtal(xtal_instance) ase_atoms, entry_dict, _ = entry

Notes

  • The CCDC link is generated using the structure’s CCDC number.

pyxtal.db.opt_single(id, xtal, calc, *args)[source]

Optimize a structure using the specified calculator.

Parameters:
  • id (int) – Identifier of the structure to be optimized.

  • xtal – Crystal structure object to be optimized.

  • calc (str) – The calculator to use (‘GULP’, ‘DFTB’, ‘VASP’, ‘MACE’).

  • *args – Additional arguments to pass to the calculator function.

Returns:

The result of the optimization, which typically includes:
  • xtal: The optimized structure.

  • energy (float): The energy of the optimized structure.

  • status (bool): Whether the optimization was successful.

Return type:

tuple

Raises:

ValueError – If an unsupported calculator is specified.

pyxtal.db.process_xtal(id, xtal, eng, criteria)[source]
pyxtal.db.setup_worker_logger(log_file)[source]

Set up the logger for each worker process.

pyxtal.db.vasp_opt_single(id, xtal, path, cmd, criteria)[source]

Single VASP optimization for a given atomic xtal

Parameters:
  • id (int) – id of the give xtal

  • xtal – pyxtal instance

  • path – calculation folder

  • cmd – vasp command

  • criteria (dicts) – to check if the structure