pyxtal.db module

Database class

pyxtal.db.call_opt_single(p)[source]

Optimize a single structure and log the result.

Parameters:

p (tuple) – A tuple where the first element is an identifier (id), and the remaining elements are the arguments to pass to opt_single.

Returns:

A tuple (id, xtal, eng) where:
  • id (int): The identifier of the structure.

  • xtal: The optimized structure.

  • eng (float): The energy of the opt_structure, or None if it failed.

Return type:

tuple

Behavior:

This function calls opt_single to perform the optimization of the structure associated with the given id.
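
The tuple convention above can be sketched as follows. This is a minimal illustration and not the pyxtal source: fake_opt_single is a hypothetical stand-in for opt_single, the whole tuple p is assumed to be forwarded to it, and the logging step is omitted.

```python
def fake_opt_single(myid, xtal, calc, *args):
    """Hypothetical stand-in for opt_single; returns (xtal, energy, status)."""
    if calc != "GULP":
        return None, None, False
    return xtal, -1.23, True


def call_opt_single_sketch(p):
    """Unpack p = (id, xtal, calc, ...) and return (id, xtal, eng)."""
    myid = p[0]
    # Assumption: the full tuple, id included, is passed on to the optimizer.
    xtal, eng, status = fake_opt_single(*p)
    if not status:
        eng = None  # a failed run is reported with eng=None
    return myid, xtal, eng


result = call_opt_single_sketch((7, "xtal-placeholder", "GULP"))
failed = call_opt_single_sketch((8, "xtal-placeholder", "UNKNOWN"))
```

Returning the id alongside the result lets a caller match results to rows even when they arrive out of order.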

class pyxtal.db.database(db_name)[source]

Bases: object

This is a database class to process crystal data.

Parameters:

db_name – *.db format from ase database

add(entry)[source]
add_from_code(code)[source]
check_status(show=False)[source]

Check the current status of each entry

compute(row, work_dir, skf_dir)[source]
copy(db_name, csd_codes)[source]

copy the entries to another db

Parameters:
  • db_name – db file name

  • csd_codes – list of codes

get_all_codes(group=None)[source]

Get all codes

get_pyxtal(code)[source]
get_row(code)[source]
get_row_info(id=None, code=None)[source]
process_kvp(kvp)[source]
vacuum()[source]
view(row_info)[source]

print the summary of benchmark results

Parameters:

row – row object

class pyxtal.db.database_topology(db_name, rank=0, size=1, ltol=0.05, stol=0.05, atol=3, log_file='db.log')[source]

Bases: object

This is a database class to process atomic crystal data

Parameters:
  • db_name (str) – *.db format from ase database

  • rank (int) – default 0

  • size (int) – default 1

  • ltol (float) – lattice tolerance

  • stol (float) – site tolerance

  • atol (float) – angle tolerance

  • log_file (str) – log_file

add_strucs_from_db(db_file, check=False, id_min=0, id_max=None, tol=0.001, freq=50, use_relaxed=None, sort=None, max_count=None, criteria=None, min_atoms=0, max_atoms=250, ignore_check='vasp_energy', same_number=False)[source]

Add new structures from another database file.

Parameters:
  • db_file (str) – Path to the source database file

  • check (bool) – Whether to check if structure already exists before adding

  • id_min (int) – Starting ID to import from source database. Default is 0

  • id_max (int) – Ending ID to import from source database. Default is None

  • tol (float) – Tolerance in Angstroms for symmetry detection. Default is 1e-3

  • freq (int) – Print progress message every N structures. Default is 50

  • use_relaxed (str) – Relaxed structure to use: ‘ff_relaxed’ or ‘vasp_relaxed’. Default is None, which uses the unrelaxed structures

  • sort (str) – Key used to sort the structures, e.g. ‘mace_energy’. Default is None, which uses row.id

  • max_count (int) – Maximum number of structures to add. Default is None, which adds all structures

  • criteria (dict) – Criteria used to check whether a structure is valid

add_xtal(xtal, kvp={})[source]

Add new xtal to the given db

check_new_structure(xtal, eng=None, same_group=False, same_number=False, d_tol=0.2, e_tol=0.01, max_atoms=250, min_atoms=0, return_id=False)[source]

Check if the input crystal structure already exists in the database.

Parameters:
  • xtal – PyXtal object representing the crystal structure to check

  • eng (float, optional) – Energy of the structure to compare

  • same_group (bool) – Whether to only compare structures with same space group

  • d_tol (float) – Tolerance for density comparison

  • e_tol (float) – Tolerance for energy comparison

  • max_atoms (int) – maximum number of atoms for checking

  • min_atoms (int) – minimum number of atoms for checking

Returns:

True if structure is new/unique, False if it matches an existing structure

Return type:

bool

Note

Compares structures based on:
  • Space group number (if same_group=True)

  • Density (within d_tol)

  • Energy (within e_tol if provided)

  • Structure similarity via pymatgen.analysis.structure_matcher
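
The cheap comparisons listed in the note can be sketched as a pre-screen that runs before any expensive structure matching. This is an illustration under stated assumptions, not the library's implementation: rows are plain dicts with hypothetical keys 'spg', 'density' and 'energy', the numbers are made up, and the final pymatgen comparison is left out.

```python
def is_candidate_duplicate(new, old, d_tol=0.2, e_tol=0.01, same_group=False):
    """Return True if `old` cannot be ruled out as a duplicate of `new`.

    Both arguments are dicts with hypothetical keys 'spg', 'density'
    and 'energy' (energy may be None when it is not known).
    """
    if same_group and new["spg"] != old["spg"]:
        return False
    if abs(new["density"] - old["density"]) > d_tol:
        return False
    if new["energy"] is not None and old["energy"] is not None:
        if abs(new["energy"] - old["energy"]) > e_tol:
            return False
    return True  # a full structure comparison would still be needed


# Made-up values for illustration only.
existing = {"spg": 227, "density": 3.51, "energy": -9.100}
close = {"spg": 227, "density": 3.50, "energy": -9.095}
far = {"spg": 194, "density": 2.26, "energy": -9.220}
no_energy = {"spg": 194, "density": 3.50, "energy": None}
```

Ordering the checks from cheapest to most expensive keeps the structure matcher from being called on pairs that differ obviously in density or energy.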

check_overlap(reference_db, etol=0.002, verbose=True)[source]

Check the overlap with a reference database.

Parameters:
  • reference_db (str) – Path to the reference database file

  • etol (float, optional) – Energy tolerance for identifying identical structures. Default is 2e-3.

  • verbose (bool, optional) – Whether to print detailed overlap information. Default is True.

Returns:

List of overlapping structures, where each entry contains: (id, pearson_symbol, dof, topology, ff_energy)

Return type:

list

Note

Two structures are considered overlapping if they have:
  • Same topology

  • Same topology detail

  • Force field energies within etol of each other
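
The overlap rule in the note can be sketched as a lookup keyed on topology. This is a minimal sketch assuming rows are plain dicts with hypothetical keys 'id', 'topology', 'topology_detail' and 'ff_energy'; the placeholder values are made up, and the returned list holds only ids rather than the full tuples described above.

```python
from collections import defaultdict


def find_overlap(rows, ref_rows, etol=2e-3):
    """Return the ids of `rows` entries that match an entry in `ref_rows`."""
    # Index reference energies by (topology, topology_detail).
    by_topology = defaultdict(list)
    for ref in ref_rows:
        key = (ref["topology"], ref["topology_detail"])
        by_topology[key].append(ref["ff_energy"])

    overlaps = []
    for row in rows:
        key = (row["topology"], row["topology_detail"])
        energies = by_topology.get(key, [])
        if any(abs(row["ff_energy"] - e) < etol for e in energies):
            overlaps.append(row["id"])
    return overlaps


# Placeholder data for illustration only.
rows = [
    {"id": 1, "topology": "dia", "topology_detail": "d1", "ff_energy": -8.000},
    {"id": 2, "topology": "dia", "topology_detail": "d1", "ff_energy": -7.900},
    {"id": 3, "topology": "lon", "topology_detail": "d2", "ff_energy": -8.0005},
]
reference = [
    {"topology": "dia", "topology_detail": "d1", "ff_energy": -8.001},
]
overlap_ids = find_overlap(rows, reference)
```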

clean_structures(ids=(None, None), dtol=0.002, etol=0.001, criteria=None, eng_key='mace_energy')[source]

Clean up the db by removing duplicate structures. Here we check the following criteria:

  • same number of atoms

  • same density

  • same energy

Parameters:
  • dtol (float) – tolerance of density

  • etol (float) – tolerance of energy

  • criteria (dict) – including

clean_structures_pmg(ids=(None, None), min_id=None, dtol=0.05, criteria=None)[source]

Clean up the database by removing duplicate structures based on density and pymatgen matcher.

This method checks for duplicates by comparing structure density within a tolerance and using pymatgen’s StructureMatcher. It can also filter structures based on various criteria like coordination numbers, energies, topology, etc.

Parameters:
  • ids (tuple, optional) – Range of IDs (min, max) to process. Defaults to (None, None).

  • min_id (int, optional) – Minimum ID to consider. Structures with lower IDs won’t be deleted. Defaults to None.

  • dtol (float, optional) – Density tolerance for comparing structures. Defaults to 5e-2.

  • criteria (dict, optional) – Dictionary of filtering criteria. Defaults to None.

Supported criteria keys:
  • ‘CN’: Dict of required coordination numbers per element

  • ‘cutoff’: Float, cutoff distance for connectivity

  • ‘MAX_energy’: Float, maximum allowed energy

  • ‘MAX_similarity’: Float, maximum allowed similarity value

  • ‘BAD_topology’: List of forbidden topology types

  • ‘BAD_dimension’: List of forbidden dimensionality values

Example criteria: {‘CN’: {‘C’: 3}, ‘BAD_dimension’: [0, 2]}

Returns:

None. Modifies database in place by deleting duplicate/invalid structures.

clean_structures_spg_topology(dim=None)[source]

Clean up the db by removing duplicate structures based on their properties.

Parameters:

dim (int, optional) – Filter structures by dimension. Only keep structures with this dimension if specified. Defaults to None.

The function removes structures that have identical:
  • Number of atoms

  • Space group

  • Topology

  • Wyckoff positions (wps)

export_structures(fmt='vasp', folder='mof_out', criteria=None, sort_by='similarity', overwrite=True, cutoff=None, use_relaxed=None)[source]

Export structures from database according to given criteria.

Parameters:
  • fmt (str) – Output format (vasp or cif)

  • folder (str) – Path to output folder

  • criteria (dict) – Dictionary of validity criteria

  • sort_by (str) – Attribute to sort structures by

  • overwrite (bool) – Whether to remove existing output folder

  • cutoff (int) – Maximum number of structures to export

  • use_relaxed (str, optional) – e.g., ff_relaxed or vasp_relaxed

get_all_xtals(include_energy=False)[source]

Get all pyxtal instances from the current db

get_db_unique(db_name=None, prec=3, key='ff_energy', max_N_atoms=64)[source]

Get a database file containing only unique structures based on topology and energy.

Parameters:
  • db_name (str, optional) – Filename for the new database. If None, will use original name with '_unique' suffix.

  • prec (int, optional) – Precision for rounding energy values. Default is 3.

  • key (str, optional) – Energy attribute name to use for filtering. Default is 'ff_energy'.

  • max_N_atoms (int, optional) – Maximum n_atoms for pmg match. Default is 64.

Returns:

Number of unique structures in the new database.

Return type:

int

Note

Two structures are considered identical if they have:
  • Same density value (within precision)

  • Same energy value (within precision)

  • Pymatgen match

When duplicates are found, the structure with lower DOF is kept.

get_db_unique_topology(db_name=None, prec=3, update_topology=True, key='ff_energy')[source]

Get a database file containing only unique structures based on topology and energy.

Parameters:
  • db_name (str, optional) – Filename for the new database. If None, will use original name with '_unique' suffix.

  • prec (int, optional) – Precision for rounding energy values. Default is 3.

  • update_topology (bool, optional) – Whether to update topology before filtering. Default is True.

  • key (str, optional) – Energy attribute name to use for filtering. Default is 'ff_energy'.

Returns:

Number of unique structures in the new database.

Return type:

int

Note

Two structures are considered identical if they have:
  • Same topology

  • Same topology detail

  • Same energy value (within precision)

When duplicates are found, the structure with lower DOF is kept.
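
The identity rule and the lower-DOF tie-break can be sketched with a dictionary keyed on topology and rounded energy. This is an illustration only: rows are plain dicts with hypothetical keys 'id', 'topology', 'topology_detail', 'ff_energy' and 'dof', and the values are made up.

```python
def unique_by_topology(rows, prec=3, key="ff_energy"):
    """Keep one row per (topology, topology_detail, rounded energy)."""
    best = {}
    for row in rows:
        signature = (
            row["topology"],
            row["topology_detail"],
            round(row[key], prec),
        )
        # On a collision, prefer the row with fewer degrees of freedom.
        if signature not in best or row["dof"] < best[signature]["dof"]:
            best[signature] = row
    return list(best.values())


# Placeholder data for illustration only.
rows = [
    {"id": 1, "topology": "dia", "topology_detail": "d1", "ff_energy": -8.0001, "dof": 5},
    {"id": 2, "topology": "dia", "topology_detail": "d1", "ff_energy": -8.0004, "dof": 2},
    {"id": 3, "topology": "dia", "topology_detail": "d1", "ff_energy": -7.5000, "dof": 1},
]
unique_ids = [row["id"] for row in unique_by_topology(rows)]
```

Rounding makes the comparison a single dictionary lookup, at the cost that two energies lying on either side of a rounding boundary are treated as different even when they are very close.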

get_label(i)[source]
get_max_id()[source]

Get the maximum row id

get_properties(prop)[source]

Retrieve a list of specific property values from the database rows.

Parameters:

prop (str) – The property name to retrieve (e.g., ‘ff_energy’)

Returns:

A list of property values for rows that have the specified property.

If a row does not contain the property, it is ignored.

Return type:

list

Raises:

Warning – If no rows in the database contain the specified property.
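
The skip-missing behavior can be sketched as below. Rows are modeled as plain dicts, and the warning is assumed to be issued through warnings.warn rather than raised as an exception; the source does not say which.

```python
import warnings


def collect_property(rows, prop):
    """Collect `prop` from the rows that define it; warn if none do."""
    values = [row[prop] for row in rows if prop in row]
    if not values:
        warnings.warn(f"No rows contain the property '{prop}'")
    return values


# Placeholder rows: the second one has no energy and is skipped.
rows = [{"id": 1, "ff_energy": -8.0}, {"id": 2}, {"id": 3, "ff_energy": -7.5}]
energies = collect_property(rows, "ff_energy")
```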

get_pyxtal(id, use_relaxed=None, tol=0.0001)[source]

Get the pyxtal object for a given row id. If use_relaxed is set, the structure is read from the corresponding relaxed entry (e.g. ff_relaxed).

Parameters:
  • id (int) – row id

  • use_relaxed (str) – ‘ff_relaxed’, ‘vasp_relaxed’

get_row(id)[source]
plot_histogram(prop, ax=None, filename=None, xlim=None, nbins=20)[source]

Plot the histogram of a specified row property.

Parameters:
  • prop (str) – The name of the property to plot (e.g., ‘ff_energy’).

  • ax (matplotlib.axes.Axes, optional) – Pre-existing axis to plot on. If None, a new ax will be created.

  • filename (str, optional) – Path to save the plot (e.g., ‘plot.png’). If None, the plot will not be saved.

  • xlim (tuple, optional) – Limits for the x-axis (e.g., (0, 10)). If None, the x-axis will scale automatically.

  • nbins (int, optional) – Number of bins for the histogram. Default is 20.

Returns:

The axis object with the histogram plotted.

Return type:

matplotlib.axes.Axes

print_info(excluded_ids=None, cutoff=100)[source]

Print a summary of the database based on the calculated energy. Mostly used to quickly view the most interesting low-energy structures.

Todo: show vasp_energy if available

Parameters:
  • excluded_ids (list) – list of unwanted row ids

  • cutoff (int) – the cutoff value for the print

print_memory_usage()[source]
select_xtal(ids, N_atoms=(None, None), overwrite=False, attribute=None, use_relaxed=None)[source]

Lazy extraction of selected xtals from the database.

Parameters:
  • ids (tuple) – Minimum and maximum row IDs to extract, e.g. (1, 10)

  • N_atoms (tuple) – Minimum and maximum number of atoms to extract, e.g. (2, 100)

  • overwrite (bool) – Whether to overwrite existing entries

  • attribute (str) – Attribute name to check for extraction

  • use_relaxed (str) – Type of relaxed structure to use (‘ff_relaxed’ or ‘vasp_relaxed’)

Yields:

tuple – (id, xtal) where id is the row ID and xtal is the corresponding pyxtal object

select_xtals(ids, N_atoms=(None, None), overwrite=False, attribute=None, use_relaxed=None)[source]

Extract xtals based on attribute name.

Parameters:
  • ids (tuple) – Minimum and maximum row IDs to extract, e.g. (1, 10)

  • N_atoms (tuple) – Minimum and maximum number of atoms to extract, e.g. (2, 100)

  • overwrite (bool) – Whether to overwrite existing entries

  • attribute (str) – Attribute name to check for extraction

  • use_relaxed (str) – Type of relaxed structure to use (‘ff_relaxed’ or ‘vasp_relaxed’)

Returns:

(ids, xtals) where ids is a list of row IDs and xtals is a list of corresponding pyxtal objects

Return type:

tuple
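
The difference between the lazy select_xtal and the list-returning select_xtals can be sketched with a generator and a wrapper that drains it. This is a simplified illustration: rows are plain dicts with hypothetical keys 'id' and 'natoms', both range bounds are assumed to be inclusive, and the overwrite and attribute filters are not modeled.

```python
def select_lazy(rows, ids=(None, None), n_atoms=(None, None)):
    """Yield (id, row) pairs one at a time, in the spirit of select_xtal."""
    id_min, id_max = ids
    n_min, n_max = n_atoms
    for row in rows:
        if id_min is not None and row["id"] < id_min:
            continue
        if id_max is not None and row["id"] > id_max:
            continue
        if n_min is not None and row["natoms"] < n_min:
            continue
        if n_max is not None and row["natoms"] > n_max:
            continue
        yield row["id"], row


def select_eager(rows, ids=(None, None), n_atoms=(None, None)):
    """Return (ids, items) as two lists, in the spirit of select_xtals."""
    pairs = list(select_lazy(rows, ids, n_atoms))
    return [i for i, _ in pairs], [item for _, item in pairs]


rows = [{"id": i, "natoms": 4 * i} for i in range(1, 6)]
lazy = select_lazy(rows, ids=(2, 4))
first = next(lazy)  # only one row has been examined past the filter so far
picked_ids, picked = select_eager(rows, ids=(2, 5), n_atoms=(None, 16))
```

The lazy form keeps only one structure in memory at a time, which matters when each item is an expensive object built from a database row.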

update_db_description()[source]

Update database description using robocrys.

Uses robocrystallographer (https://github.com/hackingmaterials/robocrystallographer) to generate natural language descriptions of crystal structures.

For each row in the database that doesn’t have a description:
  1. Converts ASE atoms to pymatgen structure

  2. Uses StructureCondenser to analyze bonding/connectivity

  3. Uses StructureDescriber to generate text description

  4. Updates the database row with the description

Note

Use it with caution, as it may take a long time to run.

update_row_energy(calculator='GULP', ids=(None, None), N_atoms=(None, None), ncpu=1, criteria=None, symmetrize=False, overwrite=False, write_freq=100, ff_lib='reaxff', steps=250, fmax=0.1, use_relaxed=None, cmd=None, calc_folder=None, skf_dir=None)[source]

Update the row energy in the database for a given calculator.

Parameters:
  • calculator (str) – ‘GULP’, ‘MACE’, ‘VASP’, ‘DFTB’

  • ids (tuple) – A tuple specifying row IDs to update (e.g., (0, 100)).

  • ncpu (int) – number of parallel processes

  • criteria (dict, optional) – Criteria when selecting structures.

  • symmetrize (bool) – symmetrize the structure before calculation

  • overwrite (bool) – overwrite the existing energy attributes.

  • write_freq (int) – frequency to update db for ncpu=1

  • ff_lib (str) – Force field to use for GULP (‘reaxff’ by default).

  • steps (int) – Number of optimization steps for DFTB (default is 250).

  • fmax (float) – force tolerance for MACE (default is 0.1)

  • use_relaxed (str, optional) – Use relaxed structures (e.g. ‘ff_relaxed’)

  • cmd (str, optional) – Command for VASP calculations

  • calc_folder (str, optional) – calc_folder for GULP/VASP calculations

  • skf_dir (str, optional) – Directory for DFTB potential files

Functionality:

Using the selected calculator, it updates the energy rows of the database. If ncpu > 1, run in parallel; otherwise in serial.

Calculator Options:
  • ‘GULP’: Uses a force field (e.g., ‘reaxff’).

  • ‘MACE’: Uses the MACE calculator.

  • ‘DFTB’: Uses DFTB+ with symmetrization options.

  • ‘VASP’: Uses VASP, with a specified command (cmd).

update_row_energy_mproc(ncpu, generator, args, args_up)[source]

Perform parallel row energy updates by optimizing atomic structures.

Parameters:
  • ncpu (int) – Number of CPUs to use for parallel processing.

  • generator (generator) – Generator yielding tuples of (id, xtal), where id (int) is the unique identifier for the structure and xtal (object) is a pyxtal instance.

  • args (list) – Additional arguments passed to call_opt_single, typically a calculator or potential parameters.

  • args_up (list) – Additional arguments for function _update_db.

Functionality:

This function distributes the structures across multiple CPUs using multiprocessing.Pool. It creates chunks (based on ncpu) and processes them in parallel by calling call_opt_single. Successful results are periodically written to the database. The function also prints memory usage after each database update.

Parallelization Process:
  • The Pool is initialized with ncpu processes.

  • Structures are divided into chunks with the chunkify function.

  • Each chunk is processed by call_opt_single via the pool.

  • Successful results are periodically written to the database.

  • The pool is closed and joined after processing is complete.
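
The chunk-and-pool pattern listed above can be sketched as follows. Several things here are assumptions rather than the library's code: chunkify is a hypothetical helper whose real signature is not shown in this reference, optimize_one stands in for call_opt_single, and multiprocessing.pool.ThreadPool is swapped in for multiprocessing.Pool so the sketch runs in any context (a process pool needs picklable module-level functions and a main guard, but exposes the same imap_unordered, close and join methods).

```python
from multiprocessing.pool import ThreadPool


def chunkify(items, n):
    """Split a list into at most n contiguous chunks (hypothetical helper)."""
    size = max(1, -(-len(items) // n))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def optimize_one(p):
    """Stand-in for call_opt_single: returns (id, xtal, eng)."""
    myid, xtal = p
    eng = None if xtal is None else -float(myid)  # None marks a failure
    return myid, xtal, eng


def optimize_chunk(chunk):
    """Process one chunk sequentially inside a worker."""
    return [optimize_one(p) for p in chunk]


def run_parallel(pairs, ncpu, write):
    """Process chunks in a pool and pass successful results to `write`."""
    pool = ThreadPool(ncpu)
    try:
        for results in pool.imap_unordered(optimize_chunk, chunkify(pairs, ncpu)):
            write([r for r in results if r[2] is not None])
    finally:
        pool.close()
        pool.join()


written = []
pairs = [(1, "a"), (2, None), (3, "c"), (4, "d"), (5, "e")]
run_parallel(pairs, ncpu=2, write=written.extend)
```

Writing from the parent as each chunk completes keeps database access in a single process, so the workers never contend for the file.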

update_row_energy_serial(generator, write_freq, args, args_up)[source]

Perform a serial update of row energies

Parameters:
  • generator (generator) – Generator yielding tuples of (id, xtal), where id (int) is the unique identifier for the structure and xtal (object) is a pyxtal instance.

  • write_freq (int) – Frequency to update the database.

  • args (list) – Additional arguments to the function opt_single.

  • args_up (list) – Additional arguments for function _update_db.

Functionality:

It iterates over structures provided by generator, optimizes them using opt_single, and collects results that have converged (status == True). Once the number of results reaches write_freq, it updates the database.
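
The batching described above can be sketched as a loop that flushes every write_freq successful results. The optimize and write callables are hypothetical stand-ins for opt_single and the database update, and the final flush of a partial batch is an assumption about how leftovers are handled.

```python
def run_serial(generator, write_freq, optimize, write):
    """Optimize structures one by one and flush every `write_freq` successes."""
    batch = []
    for myid, xtal in generator:
        new_xtal, eng, status = optimize(myid, xtal)
        if status:  # keep converged results only
            batch.append((myid, new_xtal, eng))
        if len(batch) >= write_freq:
            write(batch)
            batch = []
    if batch:  # assumption: leftovers are written at the end
        write(batch)


def fake_optimize(myid, xtal):
    """Stand-in optimizer: odd ids converge, even ids fail."""
    return xtal, -1.0 * myid, myid % 2 == 1


flushed = []
run_serial(((i, f"x{i}") for i in range(1, 6)), 2, fake_optimize, flushed.append)
flushed_ids = [[r[0] for r in batch] for batch in flushed]
```

A larger write_freq means fewer database writes but more results lost if the run is interrupted between flushes.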

update_row_topology(StructureType='Auto', overwrite=True, prefix=None, ref_dim=3, timeout=60)[source]

Update row topology using CrystalNets.jl via subprocess (faster than juliacall).

Parameters:
  • StructureType (str) – Type of structure to analyze. Options are ‘Zeolite’ (zeolite structures), ‘MOF’ (metal-organic frameworks), and ‘Auto’ (automatic detection).

  • overwrite (bool) – Whether to overwrite existing topology attributes.

  • prefix (str) – Prefix for temporary CIF files.

  • ref_dim (int) – Reference dimensionality to compare against.

  • timeout (int) – Timeout in seconds for each Julia call. Default is 60.
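
A per-call timeout around an external process can be sketched with subprocess.run. The command used to launch CrystalNets.jl is not given in this reference, so a Python one-liner stands in for the Julia call here, and run_with_timeout is a hypothetical helper.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout=60):
    """Run an external command; return its stdout, or None on timeout/failure."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child before raising
    if done.returncode != 0:
        return None
    return done.stdout.strip()


# Stand-in commands; a real call would launch Julia on a temporary CIF file.
fast = run_with_timeout([sys.executable, "-c", "print('done')"], timeout=30)
slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)
```

Treating a timeout as a missing result rather than an error lets a long batch continue past a structure that stalls the analysis.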

vacuum()[source]
pyxtal.db.dftb_opt_single(id, xtal, skf_dir, steps, symmetrize, criteria, kresol=0.05)[source]

Single DFTB optimization for a given atomic xtal

Parameters:
  • id (int) – id of the given xtal

  • xtal – pyxtal instance

  • skf_dir (str) – path of skf files

  • steps (int) – number of relaxation steps

  • criteria (dict) – criteria to check if the structure is valid

pyxtal.db.gulp_opt_single(id, xtal, ff_lib, path, criteria)[source]

Perform a single GULP optimization for a given crystal structure.

Parameters:
  • id (int) – Identifier for the current structure.

  • xtal – PyXtal instance representing the crystal to be optimized.

  • ff_lib (str) – Force field library for GULP, e.g., ‘reaxff’, ‘tersoff’.

  • path (str) – Path to the folder where the calculation is stored.

  • criteria (dict) – Dictionary to check the validity of the opt_structure.

Returns:

  • xtal: Optimized PyXtal instance.

  • eng (float): Energy of the optimized structure.

  • status (bool): Whether the optimization process is successful.

Return type:

tuple

Behavior:

This function performs a GULP optimization using the force field. After the optimization, it checks the validity of the structure and attempts to remove the calculation folder if it is empty.

pyxtal.db.mace_opt_single(id, xtal, step, fmax, criteria)[source]

Perform a single MACE optimization for a given atomic crystal structure.

Parameters:
  • id (int) – Identifier for the current structure.

  • xtal – PyXtal instance representing the crystal structure.

  • step (int) – Maximum number of relaxation steps. Default is 250.

  • fmax (float) – fmax for relaxation

  • criteria (dict) – Dictionary to check the validity of the optimized structure.

Returns:

  • xtal: Optimized PyXtal instance (or None if optimization failed).

  • eng (float): Energy/atom of the opt_structure (or None if it failed).

  • status (bool): Whether the optimization was successful.

Return type:

tuple

pyxtal.db.make_db_from_CSD(dbname, codes)[source]

make database from CSD codes

Parameters:
  • dbname – db file name

  • codes – a list of CSD codes

pyxtal.db.make_entry_from_CSD(code)[source]

make entry dictionary from CSD codes

Parameters:

code – a CSD code

pyxtal.db.make_entry_from_CSD_web(code, number, smiles, name=None)[source]

make entry dictionary from the CSD web (https://www.ccdc.cam.ac.uk/structures)

Parameters:
  • code – CSD style letter entry

  • number – ccdc number

  • smiles – the corresponding molecular smiles

  • name – name of the compound

pyxtal.db.make_entry_from_pyxtal(xtal)[source]

Generate an entry dictionary from a PyXtal object, assuming the SMILES and CCDC number information is provided.

Parameters:
  • xtal – PyXtal object (must contain the SMILES (xtal.tag[“smiles”]) and CCDC number)

Returns:

(ase_atoms, entry_dict, None)
  • ase_atoms: ASE Atoms object converted from the PyXtal structure.

  • entry_dict (dict): A dictionary containing information

  • None: Placeholder for future use (currently returns None).

Return type:

tuple

Structure of entry_dict:
  • “csd_code” (str): CSD code (if available) for the crystal structure.

  • “mol_smi” (str): SMILES representation of the molecule.

  • “ccdc_number” (str): CCDC identifier number.

  • “space_group” (str): Space group symbol of the crystal.

  • “spg_num” (int): Space group number.

  • “Z” (int): Number of molecules in the unit cell.

  • “Zprime” (float): Z’ value of the crystal.

  • “url” (str): URL link to the CCDC database entry for the crystal.

  • “mol_formula” (str): Molecular formula of the structure.

  • “mol_weight” (float): Molecular weight of the structure.

  • “mol_name” (str): Name of the molecule, typically the CSD code.

  • “l_type” (str): Lattice type of the structure.

Returns None if the PyXtal structure is invalid (i.e., xtal.valid is False).

Example

entry = make_entry_from_pyxtal(xtal_instance)
ase_atoms, entry_dict, _ = entry

Notes

  • The CCDC link is generated using the structure’s CCDC number.

pyxtal.db.opt_single(id, xtal, calc, *args)[source]

Optimize a structure using the specified calculator.

Parameters:
  • id (int) – Identifier of the structure to be optimized.

  • xtal – Crystal structure object to be optimized.

  • calc (str) – The calculator to use (‘GULP’, ‘DFTB’, ‘VASP’, ‘MACE’).

  • *args – Additional arguments to pass to the calculator function.

Returns:

The result of the optimization, which typically includes:
  • xtal: The optimized structure.

  • energy (float): The energy of the optimized structure.

  • status (bool): Whether the optimization was successful.

Return type:

tuple

Raises:

ValueError – If an unsupported calculator is specified.
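
The dispatch-by-name behavior can be sketched with a lookup table. The stubs below are hypothetical stand-ins for gulp_opt_single, dftb_opt_single, vasp_opt_single and mace_opt_single, with made-up energies; only the routing and the ValueError for unknown names are illustrated.

```python
def make_stub(energy):
    """Build a hypothetical calculator stub returning (xtal, energy, status)."""
    def stub(myid, xtal, *args):
        return xtal, energy, True
    return stub


# Stand-ins for the per-calculator optimization functions.
CALCULATORS = {
    "GULP": make_stub(-1.0),
    "DFTB": make_stub(-2.0),
    "VASP": make_stub(-3.0),
    "MACE": make_stub(-4.0),
}


def opt_single_sketch(myid, xtal, calc, *args):
    """Dispatch to the function registered for `calc`."""
    if calc not in CALCULATORS:
        raise ValueError(f"Unsupported calculator: {calc}")
    return CALCULATORS[calc](myid, xtal, *args)


mace_result = opt_single_sketch(1, "xtal-placeholder", "MACE", 250, 0.1)
```

Because the extra positional arguments are forwarded untouched, each calculator can take its own parameter list without the dispatcher knowing about it.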

pyxtal.db.process_xtal(id, xtal, eng, criteria)[source]
pyxtal.db.setup_worker_logger(log_file)[source]

Set up the logger for each worker process.

pyxtal.db.vasp_opt_single(id, xtal, path, cmd, criteria)[source]

Single VASP optimization for a given atomic xtal

Parameters:
  • id (int) – id of the given xtal

  • xtal – pyxtal instance

  • path – calculation folder

  • cmd – vasp command

  • criteria (dict) – criteria to check if the structure is valid