1  Introduction

Machine learning (ML) shows promise to accelerate the rate of scientific progress.[Jablonka et al. (2020); Butler et al. (2018); Yano et al. (2022); Yao et al. (2022); De Luna et al. (2017); Wang et al. (2023)] Recent progress in the field has demonstrated, for example, the ability of ML models to make predictions for multiscale systems,[Charalambous et al. (2024); W. Yang et al. (2020); Deringer et al. (2021)] to perform experiments by interacting with laboratory equipment,[Boiko et al. (2023); Coley et al. (2019)] to autonomously collect data from scientific literature,[Schilling-Wilhelmi et al. (2025); W. Zhang et al. (2024); Dagdelen et al. (2024)] and to make predictions with high accuracy.[Jablonka et al. (2024); Jablonka et al. (2023); Jung, Jung, and Cole (2024); Rupp et al. (2012); Keith et al. (2021); J. Wu et al. (2024)]

However, the diversity and scale of chemical data create a unique challenge for applying ML to the chemical sciences. This diversity manifests across temporal, spatial, and representational dimensions. Temporally, chemical processes span femtosecond-scale spectroscopic events to year-long stability studies of pharmaceuticals or batteries, demanding data sampled at resolutions tailored to each time regime. Spatially, systems range from the atomic to the industrial scale, requiring models that bridge molecular behavior to macroscopic properties. Representationally, even a single observation (e.g., a 13C-NMR spectrum) can be encoded in chemically equivalent formats: a string,[Alberts et al. (2024)] vector,[Mirza and Jablonka (2024)] or image.[Alberts et al. (2024)] Yet such representations are not computationally equivalent and have been empirically shown to produce different model outputs.[Atz et al. (2024); Alampara et al. (2025); J.-N. Wu et al. (2024); Skinnider (2024)]
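To make the representational point concrete, the following minimal sketch (not from the original text; the vocabulary and helper name are illustrative, and no cheminformatics toolkit is assumed) encodes the same molecule, ethanol, as a string and as a toy count vector, showing that chemically equivalent encodings are computationally distinct objects:

```python
# Illustrative sketch: the same molecule in different representations.
# The vocabulary and helper below are hypothetical stand-ins for real
# encoders (learned embeddings, fingerprints, rendered depictions).

ethanol_smiles = "CCO"  # string representation (SMILES)

# Toy "vector" representation: character counts over a fixed vocabulary.
VOCAB = ["C", "O", "N", "(", ")", "=", "#"]

def char_count_vector(smiles: str) -> list[int]:
    """Map a SMILES string to a fixed-length count vector."""
    return [smiles.count(token) for token in VOCAB]

ethanol_vector = char_count_vector(ethanol_smiles)
print(ethanol_vector)  # -> [2, 1, 0, 0, 0, 0, 0]

# An "image" representation would be a 2D depiction rendered as a pixel
# grid; here we only note its shape, since rendering needs a toolkit.
image_shape = (224, 224, 3)
```

A model consuming the string, the vector, or the image sees three very different inputs for the same chemical fact, which is one reason the choice of representation measurably changes model outputs.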

Additionally, ML for chemistry is challenged by what one can term “hidden variables”: experimental parameters that remain largely unaccounted for (e.g., because their importance is unknown or they are difficult to control) but can have a significant impact on experimental outcomes. One example is seasonal variation in ambient laboratory conditions, which is typically not controlled for and, if communicated at all, only in private accounts.[Nega et al. (2021)] In addition, chemistry is believed to rely on a large amount of tacit knowledge, i.e., knowledge that cannot be readily verbalized.[Taber (2014); Polanyi (2009)] Tacit chemical knowledge includes the subtle nuances of experimental procedures, troubleshooting techniques, and the ability to anticipate potential problems based on experience.

These factors—the diversity, scale, and tacit nature of chemical knowledge—clearly indicate that the full complexity of chemistry cannot be captured using standard approaches with bespoke representations based on structured data.[Jablonka, Patiny, and Smit (2022)] Fully addressing the challenges imposed by chemistry requires the development of ML systems that can handle diverse, “fuzzy” data instances and that possess transferable capabilities for leveraging small amounts of data.

“Foundation model” has become a popular term for large pretrained models that serve as a basis for various downstream tasks. The first comprehensive description of such models was provided by Bommasani et al. (2021), who also coined the term. In the chemical literature, the term carries different connotations: in many cases, it denotes a domain-specific, state-of-the-art model limited to one input modality (e.g., amino acid sequences, crystal structures). Here, we distinguish between what we term general-purpose models (GPMs), such as large language models (LLMs),[D. Zhang et al. (2024); Guo et al. (2025); OpenAI et al. (2023); Anthropic (n.d.); Brown et al. (2020)] and domain-specific models with state-of-the-art (SOTA) performance on a subset of tasks, such as machine-learning interatomic potentials.[Batatia et al. (2023); Chen and Ong (2022); Unke et al. (2021)] We adopt the term GPM to avoid the semantic overlap caused by “foundation model” and to signal the breadth of applicability that we seek to emphasize.

A GPM is a model that has been pre-trained on a broad, heterogeneous corpus spanning multiple data modalities (text, images, graphs) or representations (e.g., common names, 3D coordinates, molecular images). It can be applied to a wide spectrum of downstream tasks that differ in objective (classification, regression, generation, reasoning), input format, and domain—ranging from natural-language processing to chemistry and vision—with little or no task-specific fine-tuning. A GPM supports zero-shot, few-shot, or transfer learning and can serve as the core component of autonomous agents.
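The zero- and few-shot reuse described above can be sketched in code. The following is a minimal, hedged illustration (the task, examples, and prompt layout are hypothetical and not tied to any specific model API): the same pretrained model is steered toward a new chemistry task purely by prompt construction, with no retraining.

```python
# Sketch of few-shot prompting: a GPM is reused across tasks by changing
# the prompt, not the weights. Names and format are illustrative.

def build_few_shot_prompt(task_description: str,
                          examples: list[tuple[str, str]],
                          query: str) -> str:
    """Assemble a prompt: instructions, worked examples, then the query."""
    lines = [task_description]
    for x, y in examples:  # an empty list yields a zero-shot prompt
        lines.append(f"Input: {x}\nOutput: {y}")
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify whether the molecule is aromatic.",
    [("c1ccccc1", "aromatic"), ("CCO", "not aromatic")],
    "c1ccncc1",
)
# `prompt` would then be sent to any pretrained text model; the same
# function serves classification, regression, or generation unchanged.
```

Swapping the task description and examples is all that is needed to repurpose the model, which is precisely the property that distinguishes GPMs from models trained for a single, fixed objective.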

Table 1.1: Illustrative examples of GPMs, domain-specific foundation models, and specialized chemistry ML pipelines. The table contrasts the three categories by their typical characteristics and gives representative examples of each. Note that a GPM does not necessarily output text.
| Category | Typical Characteristics | Representative Examples |
|---|---|---|
| GPMs | Pre-trained on a large heterogeneous corpus spanning multiple modalities. Support zero/few-shot generalization and can be fine-tuned for diverse chemistry tasks. Capable of autonomous agent behavior, including planning and execution. | Autoregressive: GPT-4,[OpenAI et al. (2023)] LLaMA,[Grattafiori et al. (2024)] Galactica.[Taylor et al. (2022)] Diffusion-based: Gemini Diffusion,[Google DeepMind (n.d.)] Inception Mercury.[Labs et al. (2025)] Other: Mamba-based[Gu and Dao (2023)] models |
| Domain-Specific Foundation Models | Trained on curated, domain-specific datasets (e.g., protein structures, crystal structures). Achieve state-of-the-art performance on narrow task sets, but are typically not multimodal or generalizable to unrelated chemistry problems. | AlphaFold,[Jumper et al. (2021)] ESM,[Lin et al. (2023)] MACE-MP-0,[Batatia et al. (2023)] MatterSim,[H. Yang et al. (2024)] MolecularTransformer[Schwaller et al. (2019)] |
| Specialized Chemistry Pipelines | Domain-specific models combined with rule-based components or hand-crafted features; offer little transferability beyond the task they were designed for. | Graph-based reaction outcome predictors; quantitative structure-property relationship (QSPR) models using Morgan fingerprints; Gaussian process regression (GPR) on nuclear magnetic resonance (NMR) shifts |

Table 1.1 gives examples of how this definition can be applied. By decoupling the notion of “general-purpose” from any specific architecture or modality, we aim to foster creative exploration of models that are better aligned with the data characteristics and scientific goals of chemistry. We hope to contribute to this by addressing both chemists and computer scientists: we provide technical background, use a consistent terminology, and explain key technical terms in a glossary at the end of the manuscript.