Database Interoperability

So far, we have looked at how to create and manipulate arbitrary species, whether or not they exist. If we wanted to create an H₂O⁴⁺ species, nothing would stop us; you will admit, however, that it is a little strange.

This is why ChemistryLab relies on existing databases, in particular Cemdata18 and PSI-Nagra-12-07. Cemdata18 is a chemical thermodynamic database for hydrated Portland cements and alkali-activated materials. PSI-Nagra-12-07 is the PSI/Nagra chemical thermodynamic database. Both databases are used in the ThermoFun format. ThermoFun is a universal open-source client that delivers thermodynamic properties of substances and reactions at the temperature and pressure of interest, and in this format the data are stored in JSON files.

Loading species from a database

The simplest way to load species from a ThermoFun-compatible JSON file is build_species, which reads the file and directly returns a Vector{Species} with compiled thermodynamic functions:

using ChemistryLab
all_species = build_species("../../../data/cemdata18-merged.json")

Each species already carries its molar mass and temperature-dependent thermodynamic functions (Cp⁰, ΔₐH⁰, ΔₐS⁰, ΔₐG⁰, logK⁰) as SymbolicFuncs and NumericFuncs.
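The logK⁰ function is tied to the standard Gibbs energy of reaction by the textbook relation logK⁰ = -ΔᵣG⁰ / (RT ln 10). The short sketch below only illustrates this conversion in plain Julia and does not use ChemistryLab; the helper names logK and ΔrG_from_logK are hypothetical, and R is the CODATA molar gas constant in J mol⁻¹ K⁻¹.

```julia
# Molar gas constant (CODATA exact value), J mol⁻¹ K⁻¹
const R = 8.31446261815324

# Hypothetical helper: log10 of the equilibrium constant
# from a standard reaction Gibbs energy ΔrG (J/mol) at temperature T (K).
logK(ΔrG, T) = -ΔrG / (R * T * log(10))

# Hypothetical helper: the inverse relation, ΔrG (J/mol) from log10 K.
ΔrG_from_logK(lgK, T) = -lgK * R * T * log(10)

g = ΔrG_from_logK(1.0, 298.15)   # ≈ -5708 J/mol at 25 °C
println(logK(g, 298.15))          # ≈ 1.0
```

At 25 °C, one unit of logK⁰ therefore corresponds to roughly 5.7 kJ mol⁻¹ of ΔᵣG⁰.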

Low-level access

If you need the raw DataFrames (e.g. to inspect metadata or filter on database columns), the lower-level function read_thermofun_database is still available and returns three DataFrames (df_elements, df_substances, df_reactions):

df_elements, df_substances, df_reactions = read_thermofun_database("../../../data/cemdata18-merged.json")

Once df_substances has been filtered, build_species(df_substances) turns it into a Vector{Species}.

Filtering species with speciation

In practice, only a small subset of the database is relevant to a given problem. speciation keeps only the species whose atoms all appear among the atoms of a set of seed species:

# Keep only species that can form from the calcite / water system
species_calcite = speciation(all_species, split("Cal H2O@");
                             aggregate_state=[AS_AQUEOUS],
                             exclude_species=split("H2@ O2@ CH4@"))
dict_species_calcite = Dict(symbol(s) => s for s in species_calcite)
Dict{String, Species{Int64}} with 11 entries:
  "H+"        => H+ {H+} [H+ ◆ H⁺]
  "OH-"       => OH- {OH-} [OH- ◆ OH⁻]
  "Ca(HCO3)+" => Ca(HCO3)+ {CaHCO3+} [Ca(HCO3)+ ◆ Ca(HCO₃)⁺]
  "Cal"       => Cal {Calcite} [CaCO3 ◆ CaCO₃]
  "Ca+2"      => Ca+2 {Ca+2} [Ca+2 ◆ Ca²⁺]
  "CO2@"      => CO2@ {CO2  aq} [CO2@ ◆ CO₂@]
  "HCO3-"     => HCO3- {HCO3-} [HCO3- ◆ HCO₃⁻]
  "CaOH+"     => CaOH+ {CaOH+} [Ca(OH)+ ◆ Ca(OH)⁺]
  "H2O@"      => H2O@ {H2O  l} [H2O@ ◆ H₂O@]
  "CO3-2"     => CO3-2 {CO3-2} [CO3-2 ◆ CO₃²⁻]
  "Ca(CO3)@"  => Ca(CO3)@ {CaCO3  aq} [CaCO3@ ◆ CaCO₃@]

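The aggregate_state keyword restricts the result to aqueous species, and exclude_species removes H2@, O2@ and CH4@. The seed species themselves (Cal and H2O@) are always kept, because speciation passes them through include_species internally; this is why the crystalline Cal appears in the list. The properties of a single species, for example Ca(HCO₃)⁺, can then be inspected with: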

dict_species_calcite["Ca(HCO3)+"]
Species{Int64}
           name: CaHCO3+
         symbol: Ca(HCO3)+
        formula: Ca(HCO3)+ ◆ Ca(HCO₃)⁺
          atoms: Ca => 1, H => 1, C => 1, O => 3
         charge: 1
aggregate_state: AS_AQUEOUS
          class: SC_AQSOLUTE
     properties: M = 0.10109399996506409 kg mol⁻¹
                 Tref = 298.15 K
                 Pref = 100000.0 m⁻¹ kg s⁻²
                 Cp⁰ = NumericFunc [m² kg s⁻² K⁻¹ mol⁻¹] ◆ vars=(T, P) ◆ T=298.15 K, P=100000.0 m⁻¹ kg s⁻²
                 ΔₐH⁰ = NumericFunc [m² kg s⁻² mol⁻¹] ◆ vars=(T, P) ◆ T=298.15 K, P=100000.0 m⁻¹ kg s⁻²
                 S⁰ = NumericFunc [m² kg s⁻² K⁻¹ mol⁻¹] ◆ vars=(T, P) ◆ T=298.15 K, P=100000.0 m⁻¹ kg s⁻²
                 ΔₐG⁰ = NumericFunc [m² kg s⁻² mol⁻¹] ◆ vars=(T, P) ◆ T=298.15 K, P=100000.0 m⁻¹ kg s⁻²
                 V⁰ = NumericFunc [m³ mol⁻¹] ◆ vars=(T, P) ◆ T=298.15 K, P=100000.0 m⁻¹ kg s⁻²
                 Cp⁰_Tref = 233.6999206543 m² kg s⁻² K⁻¹ mol⁻¹
                 ΔₐH⁰_Tref = -1.231942e6 m² kg s⁻² mol⁻¹
                 S⁰_Tref = 66.944000244141 m² kg s⁻² K⁻¹ mol⁻¹
                 ΔₐG⁰_Tref = -1.146041e6 m² kg s⁻² mol⁻¹
                 V⁰_Tref = 1.3329811096191e-5 m³ mol⁻¹
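As a quick consistency check, the molar mass M ≈ 0.101094 kg mol⁻¹ reported above follows directly from the atoms line (Ca 1, H 1, C 1, O 3). The sketch below recomputes it in plain Julia; the ATOMIC_MASS table holds rounded IUPAC conventional atomic weights and is an assumption of this example, not ChemistryLab's internal element data.

```julia
# Assumed atomic weights in g/mol (rounded IUPAC conventional values)
const ATOMIC_MASS = Dict(:Ca => 40.078, :H => 1.008, :C => 12.011, :O => 15.999)

# Molar mass in kg/mol from an atom-count table (element => count)
molar_mass(atoms) = sum(ATOMIC_MASS[el] * n for (el, n) in atoms) / 1000

# Atom counts of Ca(HCO3)+, as listed in the display above
M = molar_mass(Dict(:Ca => 1, :H => 1, :C => 1, :O => 3))
println(M)   # ≈ 0.101094 kg/mol
```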

speciation signatures

speciation accepts seed arguments in three forms:

| Seed argument | Description |
|---|---|
| Vector{Symbol} | explicit list of atom symbols |
| Vector{<:AbstractSpecies} | species objects; the union of their atoms defines the allowed set |
| Vector{<:AbstractString} | species symbol strings, looked up in species_list |

Common keyword arguments:

| Keyword | Default | Description |
|---|---|---|
| aggregate_state | all states | restrict to [AS_AQUEOUS], [AS_CRYSTAL], etc. |
| class | all classes | restrict to [SC_AQSOLUTE], etc. |
| exclude_species | [] | species (or symbols) to always exclude |
| include_species | [] | species to always include, regardless of composition |
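To make the selection rule concrete, here is a minimal plain-Julia sketch of the filtering described above. It assumes a hypothetical ToySpecies record (an atom-count dictionary plus a state symbol) and a hypothetical toy_speciation function; it is not ChemistryLab's implementation. It only reproduces the atom-subset test, the aggregate-state restriction, the exclusion list and the fact that seeds are always kept (class filtering and the other seed forms are left out).

```julia
# Hypothetical, much simpler stand-in for ChemistryLab's Species type
struct ToySpecies
    symbol::String
    atoms::Dict{Symbol,Int}
    state::Symbol              # e.g. :aqueous, :crystal
end

"""
Keep the species of `pool` whose atoms all belong to the atoms of the seeds,
optionally restricted to `states`, minus the symbols in `exclude`.
Seeds are always kept.
"""
function toy_speciation(pool::Vector{ToySpecies}, seeds::Vector{String};
                        states = nothing, exclude = String[])
    bysym = Dict(s.symbol => s for s in pool)
    seed_species = [bysym[x] for x in seeds]
    # Union of the atoms found in the seed species
    allowed = Set{Symbol}()
    for s in seed_species
        union!(allowed, keys(s.atoms))
    end
    keep = filter(pool) do s
        all(a -> a in allowed, keys(s.atoms)) &&
            (states === nothing || s.state in states) &&
            !(s.symbol in exclude)
    end
    # Seeds first, then the filtered species, without duplicates
    return unique(s -> s.symbol, vcat(seed_species, keep))
end

pool = [
    ToySpecies("H2O@",        Dict(:H => 2, :O => 1),           :aqueous),
    ToySpecies("Cal",         Dict(:Ca => 1, :C => 1, :O => 3), :crystal),
    ToySpecies("H+",          Dict(:H => 1),                    :aqueous),
    ToySpecies("OH-",         Dict(:O => 1, :H => 1),           :aqueous),
    ToySpecies("Ca+2",        Dict(:Ca => 1),                   :aqueous),
    ToySpecies("HCO3-",       Dict(:H => 1, :C => 1, :O => 3),  :aqueous),
    ToySpecies("H2@",         Dict(:H => 2),                    :aqueous),
    ToySpecies("Portlandite", Dict(:Ca => 1, :O => 2, :H => 2), :crystal),
    ToySpecies("SiO2@",       Dict(:Si => 1, :O => 2),          :aqueous),
]

result = toy_speciation(pool, ["Cal", "H2O@"]; states = [:aqueous], exclude = ["H2@"])
symbols = [s.symbol for s in result]
println(symbols)   # ["Cal", "H2O@", "H+", "OH-", "Ca+2", "HCO3-"]
```

With Cal and H2O@ as seeds, the allowed atoms are Ca, C, O and H: SiO2@ is rejected for its Si, Portlandite for its crystalline state and H2@ through the exclusion list, while Cal survives only because it is a seed.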

Primary species extraction

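Primary species can also be extracted from the Cemdata18 database. Primary species form a minimal subset of species such that every other species can be expressed as a linear combination of them. Here, extract_primary_species reads the Cemdata18 .dat file and returns the primaries as a DataFrame: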

df_primaries = extract_primary_species("../../../data/CEMDATA18-31-03-2022-phaseVol.dat")
show(df_primaries, allcols=true, allrows=true)
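The idea behind primary species can be illustrated with linear algebra: write each species as a row of a composition matrix (atom counts plus charge) and keep a maximal set of linearly independent rows. The sketch below does this greedily with LinearAlgebra.rank on a hand-written matrix for the calcite/water system. The matrix, the candidate order and the helper greedy_primaries are assumptions for illustration only; extract_primary_species may select its primaries differently.

```julia
using LinearAlgebra

# Hand-written composition matrix; columns are Ca, C, O, H, charge.
# Rows are candidate species, scanned in this order.
species = ["H2O@", "H+", "Ca+2", "CO3-2", "OH-", "HCO3-", "CaOH+", "Cal"]
A = [0 0 1 2  0;    # H2O@
     0 0 0 1  1;    # H+
     1 0 0 0  2;    # Ca+2
     0 1 3 0 -2;    # CO3-2
     0 0 1 1 -1;    # OH-
     0 1 3 1 -1;    # HCO3-
     1 0 1 1  1;    # CaOH+
     1 1 3 0  0]    # Cal (CaCO3)

"""
Hypothetical helper: scan the rows of `A` in order and keep a row only if it
increases the rank, which yields a maximal linearly independent subset.
"""
function greedy_primaries(names, A)
    kept = Int[]
    for i in axes(A, 1)
        candidate = vcat(kept, i)
        if rank(A[candidate, :]) == length(candidate)
            push!(kept, i)
        end
    end
    return names[kept]
end

primaries = greedy_primaries(species, A)
println(primaries)   # ["H2O@", "H+", "Ca+2", "CO3-2"]
```

Every rejected species is a combination of the kept ones, for example OH- = H2O@ - H+ and Cal = Ca+2 + CO3-2. A different candidate order would give a different, equally valid, set of primaries.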