1  Introduction

Much emphasis and hope are placed on machine learning (ML) to accelerate scientific progress.[Jablonka et al. (2020); Butler et al. (2018); Yano et al. (2022); Yao et al. (2022); De Luna et al. (2017); Wang et al. (2023)] Recent progress in the field has demonstrated, for example, the ability of ML models to make predictions for multiscale systems,[Charalambous et al. (2024); Yang et al. (2020); Deringer et al. (2021)] to perform experiments by interacting with laboratory equipment,[Boiko et al. (2023); Coley et al. (2019)] to autonomously collect data from the scientific literature,[Schilling-Wilhelmi et al. (2025); W. Zhang et al. (2024); Dagdelen et al. (2024)] and to make highly accurate predictions.[Jablonka et al. (2024); Jablonka et al. (2023); Jung, Jung, and Cole (2024); Rupp et al. (2012); Keith et al. (2021); J. Wu et al. (2024)]

However, the diversity and scale of chemical data create a unique challenge for applying ML to the chemical sciences. This diversity manifests across temporal, spatial, and representational dimensions. Temporally, chemical processes range from femtosecond-scale spectroscopic events to year-long stability studies of pharmaceuticals or batteries, demanding data sampled at resolutions tailored to each time regime. Spatially, systems range from the atomic to the industrial scale, requiring models that bridge molecular behavior to macroscopic properties. Representationally, even a single observation (e.g., a ^13C-NMR spectrum) can be encoded in chemically equivalent formats: a string,[Alberts et al. (2024)] a vector,[Mirza and Jablonka (2024)] or an image.[Alberts et al. (2024)] Yet these representations are not computationally equivalent and have been empirically shown to produce markedly different model outputs.[Atz et al. (2024); Alampara et al. (2024); J.-N. Wu et al. (2024); Skinnider (2024)]
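To make the representational point concrete, the minimal Python sketch below (with entirely hypothetical peak data and helper names; it does not reproduce the encodings of the cited works) turns the same simulated ^13C-NMR peak list into two of these formats: a text string and a fixed-length binned vector. Both carry the same chemical information, but a language model and a feed-forward network would process them very differently.

```python
import numpy as np

# Hypothetical 13C-NMR peak list: (chemical shift in ppm, relative intensity).
peaks = [(128.4, 1.00), (77.2, 0.35), (21.3, 0.80)]

def peaks_to_string(peaks):
    """String encoding: a human-readable peak list, as a language model might ingest it."""
    return "; ".join(f"{shift:.1f} ppm ({intensity:.2f})" for shift, intensity in peaks)

def peaks_to_vector(peaks, ppm_range=(0.0, 220.0), n_bins=220):
    """Vector encoding: intensities accumulated into fixed-width chemical-shift bins."""
    lo, hi = ppm_range
    vec = np.zeros(n_bins)
    for shift, intensity in peaks:
        idx = int((shift - lo) / (hi - lo) * n_bins)
        vec[min(max(idx, 0), n_bins - 1)] += intensity
    return vec

print(peaks_to_string(peaks))            # 128.4 ppm (1.00); 77.2 ppm (0.35); 21.3 ppm (0.80)
print(peaks_to_vector(peaks).nonzero())  # bins 21, 77, and 128 carry the signal
```

An image encoding would follow analogously, for example by rendering the binned vector as a plot of the spectrum.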

Additionally, ML for chemistry is challenged by what we term “hidden variables”. These are the parameters in an experiment that remain largely unaccounted for (e.g., because their importance is unknown or they are difficult to control) but can have a significant impact on experimental outcomes. One example is seasonal variation in ambient laboratory conditions, which is typically not controlled for and, if at all, only communicated in private accounts.[Nega et al. (2021)] Moreover, chemistry is believed to rely on a large body of tacit knowledge, i.e., knowledge that cannot be readily verbalized.[Taber (2014); Polanyi (2009)] Tacit chemical knowledge includes the subtle nuances of experimental procedures, troubleshooting techniques, and the ability to anticipate potential problems based on past experience.

These factors—the diversity, scale, and tacit nature of chemical knowledge—indicate that the full complexity of chemistry cannot be captured using standard approaches with bespoke representations based on structured data.[Jablonka, Patiny, and Smit (2022)] Fully addressing the challenges imposed by chemistry requires ML systems that can handle diverse, “fuzzy” data and that possess transferable capabilities, allowing them to learn from small amounts of data.

Foundation models are precisely such models: they adapt easily to new settings and can deal with diverse, fuzzy inputs. The first comprehensive description of these models was provided by Bommasani et al. (2021), who also coined the term “foundation models”. In the chemical literature, the term carries different connotations. We distinguish between general-purpose models (GPMs) such as large language models (LLMs) [D. Zhang et al. (2024); Guo et al. (2025); OpenAI et al. (2023); Anthropic (2025); Livne et al. (2024); Brown et al. (2020)] and domain-specific models with state-of-the-art (SOTA) performance on a subset of tasks, such as machine-learning interatomic potentials.[Ahmad et al. (2022); Flöge et al. (2024); Batatia et al. (2023); Chen and Ong (2022); Unke et al. (2021)]

As we show in the following, GPMs—models designed to generalize across a wide range of tasks and domains with minimal task-specific modification, typically pre-trained on vast and diverse datasets (see Section 3.1)—are better equipped than domain-specific models to leverage diverse, fuzzy inputs. This review article therefore focuses on their potential to shape the future of research in the chemical sciences.[White (2023)]