8  Outlook and Conclusions

Figure 8.1: Evolution of general-purpose model (GPM)-powered systems. The first GPM applications directly used zero- or few-shot prompted or fine-tuned GPMs. More complex tasks could be solved by combining multiple GPMs in workflows where the execution trajectory is pre-determined. In agents, GPMs autonomously decide on the execution trajectory and in this way enable researchers to address open-ended tasks. Moving forward, coupling validation more closely to real-world objectives, together with increased validation in better, custom user interfaces, will enhance the impact of GPMs. To ensure safe and ethical deployment, the community must engage with the broader public and policymakers to devise governance strategies.

As we have explored in this review, GPMs, especially large language models (LLMs), hold remarkable promise for the chemical sciences. The field has moved from using models in simple single calls to a GPM, to developing workflows in which a sequence of calls is performed, to increasingly complex autonomous agents in which the model decides on its own trajectory (see Figure 8.1). To power those agents, increasingly powerful models are being built, including reasoning models that promise higher data efficiency.

Yet several fundamental questions remain unresolved. We do not know whether there are fundamental limits to what can be predicted, given the inherent unpredictability of chemical systems and the reliance on tacit knowledge. We do not know what new datasets and techniques need to be developed, given that the knowledge we can extract from already published data is approaching a limit.[Silver and Sutton (2025)] New data will most likely be generated by agents learning from their own experience. To optimize systems, we need to better understand the underlying structure of chemical data. In many other fields, data distributions have been shaped by special driving forces. For example, evolution led to a direct link between sequence and fitness in biological sequences, which makes such datasets special. In chemistry, it is unclear what the “driving force” that shapes datasets is.

It is also unclear how quickly these innovations will permeate the average chemistry lab, where the adoption of new technology depends on more than just predictive prowess. Nor do we yet know how we should interface with those models for the greatest effectiveness. Finally, it is unclear how far acceleration can take us, as nature imposes some speed limits: some experiments simply take their time.

Overall, this landscape suggests a future rich with opportunity. But realizing the potential impact of GPMs demands clear-eyed caution: while it is now deceptively easy to spin up prototypes, transforming them into robust, reliable tools is a far more arduous task.[Sculley et al. (2014)] More crucial still is our need for rigorous measurement and feedback, whether in the construction of evaluation suites, the calibration of reward functions for reinforcement learning, or the design of sensible governance. No single discipline can shoulder this alone; chemists, policy experts, and computer scientists must broaden their ranks and collaborate. This is particularly important since science has always benefited from embracing a diversity of approaches. While GPM-powered approaches for science, such as “AI scientists”, hold promise, a myopic focus on “AI scientists” might lead to “scientific monocultures”.[Savitsky (2025)] We hope this review lowers the barrier to entry to the background and applications of GPMs in the chemical sciences, inviting a wider spectrum of contributors to adopt a systems-science mindset and, in doing so, to help harness the best of what GPMs can offer for tackling the chemical sciences’ most persistent and pressing challenges.