Database Interoperability
So far, we have seen that any species can be created and manipulated, whether or not it exists. Creating an H₂O⁺⁴ molecule, for instance, would not be a problem. You will admit, however, that such a species is a little strange.
This is why ChemistryLab relies on existing databases, in particular Cemdata18 and PSI-Nagra-12-07. Cemdata18 is a chemical thermodynamic database for hydrated Portland cements and alkali-activated materials. PSI-Nagra-12-07 is likewise a chemical thermodynamic database. Both are stored in the ThermoFun format. ThermoFun is a universal open-source client that delivers thermodynamic properties of substances and reactions at the temperature and pressure of interest. The data are stored in JSON files.
Loading species from a database
The simplest way to load species from a ThermoFun-compatible JSON file is build_species, which reads the file and directly returns a Vector{Species} with compiled thermodynamic functions:
using ChemistryLab
all_species = build_species("../../../data/cemdata18-merged.json")

Each species already carries its molar mass and temperature-dependent thermodynamic functions (Cp⁰, ΔₐH⁰, ΔₐS⁰, ΔₐG⁰, logK⁰) as SymbolicFuncs and NumericFuncs.
If you need the raw DataFrames (e.g. to inspect metadata or filter on database columns), the lower-level function read_thermofun_database is still available and returns three DataFrames (df_elements, df_substances, df_reactions):
df_elements, df_substances, df_reactions = read_thermofun_database("../../../data/cemdata18-merged.json")

build_species(df_substances) can then be called on the filtered DataFrame.
Filtering species with speciation
In practice, only a small subset of the database is relevant to a given problem. speciation filters a species list down to the species whose atoms all appear among the atoms of a set of seed species:
# Keep only species that can form from the calcite / water system
species_calcite = speciation(all_species, split("Cal H2O@");
                             aggregate_state=[AS_AQUEOUS],
                             exclude_species=split("H2@ O2@ CH4@"))
dict_species_calcite = Dict(symbol(s) => s for s in species_calcite)

Dict{String, Species{Int64}} with 11 entries:
"H+" => H+ {H+} [H+ ◆ H⁺]
"OH-" => OH- {OH-} [OH- ◆ OH⁻]
"Ca(HCO3)+" => Ca(HCO3)+ {CaHCO3+} [Ca(HCO3)+ ◆ Ca(HCO₃)⁺]
"Cal" => Cal {Calcite} [CaCO3 ◆ CaCO₃]
"Ca+2" => Ca+2 {Ca+2} [Ca+2 ◆ Ca²⁺]
"CO2@" => CO2@ {CO2 aq} [CO2@ ◆ CO₂@]
"HCO3-" => HCO3- {HCO3-} [HCO3- ◆ HCO₃⁻]
"CaOH+" => CaOH+ {CaOH+} [Ca(OH)+ ◆ Ca(OH)⁺]
"H2O@" => H2O@ {H2O l} [H2O@ ◆ H₂O@]
"CO3-2" => CO3-2 {CO3-2} [CO3-2 ◆ CO₃²⁻]
"Ca(CO3)@" => Ca(CO3)@ {CaCO3 aq} [CaCO3@ ◆ CaCO₃@]

The aggregate_state keyword restricts the results to aqueous species. The seed species themselves (Cal and H2O@) are always kept, because they are passed through include_species internally. For example, the properties of Ca(HCO₃)⁺ can then be read as:
dict_species_calcite["Ca(HCO3)+"]

Species{Int64}
name: CaHCO3+
symbol: Ca(HCO3)+
formula: Ca(HCO3)+ ◆ Ca(HCO₃)⁺
atoms: Ca => 1, H => 1, C => 1, O => 3
charge: 1
aggregate_state: AS_AQUEOUS
class: SC_AQSOLUTE
properties: M = 0.10109399996506409 kg mol⁻¹
Tref = 298.15 K
Pref = 100000.0 m⁻¹ kg s⁻²
Cp⁰ = NumericFunc [m² kg s⁻² K⁻¹ mol⁻¹] ◆ vars=(T, P) ◆ T=298.15 K, P=100000.0 m⁻¹ kg s⁻²
ΔₐH⁰ = NumericFunc [m² kg s⁻² mol⁻¹] ◆ vars=(T, P) ◆ T=298.15 K, P=100000.0 m⁻¹ kg s⁻²
S⁰ = NumericFunc [m² kg s⁻² K⁻¹ mol⁻¹] ◆ vars=(T, P) ◆ T=298.15 K, P=100000.0 m⁻¹ kg s⁻²
ΔₐG⁰ = NumericFunc [m² kg s⁻² mol⁻¹] ◆ vars=(T, P) ◆ T=298.15 K, P=100000.0 m⁻¹ kg s⁻²
V⁰ = NumericFunc [m³ mol⁻¹] ◆ vars=(T, P) ◆ T=298.15 K, P=100000.0 m⁻¹ kg s⁻²
Cp⁰_Tref = 233.6999206543 m² kg s⁻² K⁻¹ mol⁻¹
ΔₐH⁰_Tref = -1.231942e6 m² kg s⁻² mol⁻¹
S⁰_Tref = 66.944000244141 m² kg s⁻² K⁻¹ mol⁻¹
ΔₐG⁰_Tref = -1.146041e6 m² kg s⁻² mol⁻¹
V⁰_Tref = 1.3329811096191e-5 m³ mol⁻¹

speciation signatures
speciation accepts seed arguments in three forms:
| Seed argument | Description |
|---|---|
| Vector{Symbol} | Explicit list of atom symbols |
| Vector{<:AbstractSpecies} | Species objects; the union of their atoms defines the space |
| Vector{<:AbstractString} | Species symbol strings, looked up in species_list |
Common keyword arguments:
| Keyword | Default | Description |
|---|---|---|
| aggregate_state | all states | restrict to [AS_AQUEOUS], [AS_CRYSTAL], etc. |
| class | all classes | restrict to [SC_AQSOLUTE], etc. |
| exclude_species | [] | species (or symbols) to always exclude |
| include_species | [] | species to always include regardless of composition |
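
The selection rule and the three seed forms described above can be illustrated with a small, self-contained sketch in plain Julia. It does not call ChemistryLab: ToySpecies, toy_speciation and the keywords states, exclude_symbols and include_symbols are hypothetical stand-ins invented for this example, and the real implementation may differ. The sketch only assumes what the text states: a species is kept when its atoms are a subset of the seed atoms, exclusions and inclusions override that rule, and seed species are always kept.

```julia
# Toy stand-in for a species record (hypothetical; the real Species type is richer).
struct ToySpecies
    symbol::String
    atoms::Dict{Symbol,Int}
    state::Symbol              # :aqueous or :crystal (hypothetical labels)
end

# Form 1: an explicit list of atom symbols defines the composition space.
function toy_speciation(list::Vector{ToySpecies}, atoms::Vector{Symbol};
                        states=nothing, exclude_symbols=String[],
                        include_symbols=String[])
    allowed = Set(atoms)
    keep(s) = s.symbol in include_symbols ||
              (!(s.symbol in exclude_symbols) &&
               issubset(keys(s.atoms), allowed) &&
               (states === nothing || s.state in states))
    return filter(keep, list)
end

# Form 2: seed species objects; the union of their atoms defines the space.
function toy_speciation(list::Vector{ToySpecies}, seeds::Vector{ToySpecies};
                        include_symbols=String[], kwargs...)
    atoms = Symbol[]
    for s in seeds
        append!(atoms, keys(s.atoms))
    end
    # Mirror the documented behaviour: seed species are always kept.
    always = vcat(String[s.symbol for s in seeds], include_symbols)
    return toy_speciation(list, unique(atoms); include_symbols=always, kwargs...)
end

# Form 3: seed symbols given as strings, looked up in the species list itself.
function toy_speciation(list::Vector{ToySpecies},
                        seeds::Vector{<:AbstractString}; kwargs...)
    found = filter(s -> s.symbol in seeds, list)
    return toy_speciation(list, found; kwargs...)
end

# A tiny hand-written "database" (compositions only, no thermodynamic data).
db = [
    ToySpecies("Cal",         Dict(:Ca => 1, :C => 1, :O => 3), :crystal),
    ToySpecies("H2O@",        Dict(:H => 2, :O => 1),           :aqueous),
    ToySpecies("Ca+2",        Dict(:Ca => 1),                   :aqueous),
    ToySpecies("HCO3-",       Dict(:H => 1, :C => 1, :O => 3),  :aqueous),
    ToySpecies("H2@",         Dict(:H => 2),                    :aqueous),
    ToySpecies("Na+",         Dict(:Na => 1),                   :aqueous),
    ToySpecies("Portlandite", Dict(:Ca => 1, :O => 2, :H => 2), :crystal),
]

kept = toy_speciation(db, split("Cal H2O@");
                      states=[:aqueous], exclude_symbols=split("H2@"))
kept_symbols = [s.symbol for s in kept]
println(kept_symbols)
```

In this toy run, Na+ is dropped because sodium is not among the seed atoms, Portlandite is dropped by the state filter, H2@ is dropped by the exclusion list, and Cal survives the aqueous-only filter solely because it is a seed.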
Primary species extraction
Primary species can also be retrieved from the Cemdata18 database. They form a minimal subset of species such that every other species can be written as a linear combination of them.
df_primaries = extract_primary_species("../../../data/CEMDATA18-31-03-2022-phaseVol.dat")
show(df_primaries, allcols=true, allrows=true)
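
The linear-combination property that defines primary species can be checked by hand on a small system. The sketch below is independent of ChemistryLab and says nothing about how extract_primary_species works internally. The primary set (Ca+2, CO3-2, H+, H2O@) is an assumption chosen for illustration, and the names A, secondaries and decompose are hypothetical. Each species is written as a vector of element counts plus charge, and the coefficients on the primaries are obtained by solving a linear system.

```julia
using LinearAlgebra

# Hypothetical primary set for the calcite/water system (chosen for illustration).
primaries = ["Ca+2", "CO3-2", "H+", "H2O@"]

# Each column is one primary species; rows are Ca, C, O, H and electric charge.
A = Float64[
    1  0  0  0
    0  1  0  0
    0  3  0  1
    0  0  1  2
    2 -2  1  0
]

# Full column rank means no primary is a combination of the others.
independent = rank(A) == length(primaries)

# Compositions of some secondary species, in the same row order as A.
secondaries = [
    "Ca(HCO3)+" => Float64[1, 1, 3, 1, 1],
    "CaOH+"     => Float64[1, 0, 1, 1, 1],
    "OH-"       => Float64[0, 0, 1, 1, -1],
]

# Solve A * x = b; x holds the stoichiometric coefficients on the primaries.
decompose(b) = round.(Int, A \ b)

for (name, b) in secondaries
    x = decompose(b)
    terms = ["$(c) $(p)" for (c, p) in zip(x, primaries) if c != 0]
    println(name, " = ", join(terms, " + "))
end
```

A negative coefficient means the primary is consumed rather than produced: OH- is H2O@ minus H+, for example. Removing any column from A would leave some of these species inexpressible, which is what "minimal" means here.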