7  Implications of GPMs: Education, Safety, and Ethics

As general-purpose models (GPMs) integrate into chemistry education and research, they bring transformative potential alongside critical challenges. Responsible deployment requires addressing pedagogical practices, chemistry-specific safety risks, and ethical considerations unique to scientific knowledge systems.

7.1 Education

GPMs open up a broad range of possibilities for chemistry education, from personalized tutoring and adaptive feedback to supporting educators in material preparation and assessment.[Mollick et al. (2024)] Specialized systems could tailor explanations to individual learning needs, help students rehearse concepts, and lower barriers to coding and data analysis.[Mollick et al. (2024); Sharma et al. (2025); Du et al. (2024)] Coupled with augmented reality (AR), they could provide safe laboratory simulations before physical experiments. For educators, GPMs promise to reduce workload through automated feedback on open-ended responses and individualized assignment generation.[Kortemeyer, Nöhl, and Onishchuk (2024); Gao et al. (2024)]

Despite this potential, current implementations remain fragmented. Most uses rely on general-purpose interfaces without curricular alignment[Subasinghe, Gersib, and Mankad (2025); Tsai, Ong, and Chen (2023); Shao et al. (2025)], and while first prototypes integrate GPMs into structured systems with retrieval-augmented generation (RAG) [Perez et al. (2025); Jablonka et al. (2023)], learning tracking, or interactive tutoring modes [Li et al. (2025); Wang, Lee, and Mutlu (2025); Anthropic (n.d.a)], chemistry-focused platforms are rare. Studies further suggest that while models can handle simple recall or formatting tasks, they struggle with deeper reasoning, diagram interpretation, and robust grading.[Handa et al. (n.d.); Baral et al. (2025); Kharchenko and Babenko (2024)]
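To make the retrieval-grounded tutoring prototypes mentioned above concrete, the sketch below implements a toy retrieval step over a hypothetical corpus of curriculum snippets and assembles a grounded prompt. The corpus entries, function names, and term-count similarity are illustrative stand-ins: a real system would use an embedding index and follow `build_prompt` with an actual model call.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus of curriculum-aligned snippets; a real system
# would index textbook sections, lab manuals, and past assignments.
CORPUS = {
    "le_chatelier": "Le Chatelier's principle: a system at equilibrium shifts to "
                    "counteract an imposed change in concentration, pressure, or temperature.",
    "titration": "In an acid-base titration, the equivalence point is reached when "
                 "moles of added titrant equal moles of analyte.",
    "sn1_sn2": "SN1 reactions proceed via a carbocation intermediate; SN2 reactions "
               "occur in a single concerted backside attack.",
}

def _tokens(text: str) -> Counter:
    """Lowercased alphanumeric term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank snippets by cosine similarity of term counts (a stand-in for embeddings)."""
    q = _tokens(query)
    def score(doc: str) -> float:
        d = _tokens(doc)
        dot = sum(q[t] * d[t] for t in q)
        norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
        return dot / norm if norm else 0.0
    return sorted(CORPUS.values(), key=score, reverse=True)[:k]

def build_prompt(question: str) -> str:
    """Assemble a grounded tutoring prompt; the model call itself is omitted."""
    context = "\n".join(retrieve(question))
    return ("Answer using only the context below; say 'not covered' otherwise.\n"
            f"Context:\n{context}\n\nStudent question: {question}")
```

Grounding the prompt in retrieved curriculum material is what distinguishes these prototypes from the unaligned general-purpose interfaces discussed above: the model is steered toward vetted content rather than free recall.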

7.1.1 Limitations

GPMs lack transparency in sources and confidence, enabling hallucinations to mislead students.[Marcus (n.d.); Kosmyna et al. (2025)] Over-reliance risks deskilling: students complete assignments without developing chemical understanding or analytical thinking.[Dung and Balg (n.d.); Sharma et al. (2025)] Models are unreliable for grading chemistry work, particularly free-text reasoning and molecular diagrams.[Baral et al. (2025); Kortemeyer, Nöhl, and Onishchuk (2024)]

7.1.2 Open Challenges

  • Responsible Integration Strategies. Developing pedagogical frameworks for thoughtful GPM integration in chemistry curricula, adapting assessment to emphasize critical evaluation over rote completion.

  • Chemistry-Specific Systems. Building platforms that integrate chemical structure recognition, spectroscopic data interpretation, and laboratory safety protocols into tutoring and assessment workflows.

  • Critical Competence Training. Teaching students to evaluate chemical plausibility, identify hallucinated reactions, and creatively extend model outputs.[Klein and Winthrop (n.d.)]

  • Validated Assessment Tools. Developing robust grading systems for chemistry-specific tasks: mechanism proposals, spectral interpretation, synthesis planning.
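Part of the critical-competence and assessment challenge above can be mechanized cheaply: a proposed reaction that violates atom conservation is certainly hallucinated. The sketch below is a minimal, illustrative checker for simple molecular formulas (no parentheses, charges, or hydrates); passing it is a necessary condition for a real reaction, not evidence of plausibility.

```python
import re
from collections import Counter

def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula like 'C2H5OH' (no parentheses or hydrates)."""
    atoms = Counter()
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms[element] += int(count) if count else 1
    return atoms

def is_balanced(reactants: list[str], products: list[str]) -> bool:
    """True if every element appears equally often on both sides."""
    left, right = Counter(), Counter()
    for f in reactants:
        left += parse_formula(f)
    for f in products:
        right += parse_formula(f)
    return left == right

# 2 H2 + O2 -> 2 H2O, written with coefficients expanded into repeated species
print(is_balanced(["H2", "H2", "O2"], ["H2O", "H2O"]))  # True
print(is_balanced(["CH4", "O2"], ["CO2", "H2O"]))       # False: H and O don't balance
```

A filter like this could sit inside a grading or tutoring pipeline to flag model-generated reactions for closer scrutiny before a student ever sees them.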

7.2 Safety

Figure 7.1: A conceptual schematic depicting artificial intelligence (AI) risk factors in chemical science. As one traverses the scientific process, illustrated as a game, various obstacles represent AI-exacerbated risks. The path to superaligned chemical AI assistants is obscured by unexplored chemical space.

GPMs in chemistry are double-edged: they accelerate discovery while potentially amplifying chemical safety risks. From democratizing hazardous synthesis knowledge to enabling autonomous production of dangerous compounds, these systems lower barriers to misuse (Figure 7.1). Real-world constraints—specialized equipment, regulated reagents, tacit expertise—currently limit risks, but GPMs are progressively eroding these barriers.

7.2.1 Chemical-Specific Risk Amplification

7.2.1.1 Information Access and Synthesis Planning

GPMs can lower cognitive barriers to accessing dangerous chemical knowledge. Urbina et al. (2022) demonstrated that molecular generators like MegaSyn could design toxic agents including VX when reward functions prioritize toxicity. Recent evaluations of GPT-4 and Claude 4 Opus for biological threat creation found that while the information is already accessible, GPMs improve troubleshooting and help non-experts acquire the tacit knowledge that previously limited them.[OpenAI (n.d.a); Anthropic (n.d.b)] He et al. (2023) showed that large language models (LLMs) generate pathways for explosives (pentaerythritol tetranitrate (PETN)) and nerve agents (sarin). While precursor supply chains remain regulated, GPMs that incrementally lower technical barriers act as force multipliers for malicious actors. Chemistry-specific agents like ChemCrow[Bran et al. (2024)] and Coscientist[Boiko et al. (2023)] demonstrate how GPMs can plan and execute complex syntheses—capabilities extending to dangerous compounds.

7.2.1.2 Hallucinations in Chemical Contexts

GPMs can hallucinate non-existent reactions, falsify safety protocols, and generate plausible-sounding but incorrect procedures.[Pantha et al. (2024); Ji et al. (2023)] In chemistry, these errors risk laboratory accidents or synthesis failures that waste resources. Temporal misalignment due to static training data means chemical knowledge becomes outdated as new reactions, hazards, and regulations emerge.

7.2.1.3 Autonomous Laboratory Risks

Autonomous laboratories controlled by GPMs face cybersecurity vulnerabilities. Compromised systems could be manipulated to synthesize hazardous compounds. The timeline mismatch between AI capabilities and infrastructure security creates windows of risk in which adversaries can exploit inadequately protected systems.[Rouleau and Murugan (2025); Dean (n.d.)]

7.2.2 Existing Approaches to Safety

Red-teaming evaluations identify vulnerabilities but are reactive, not preventive. ChemCrow’s safeguards block known controlled substances, yet He et al. (2023) demonstrated these protections rely on post-query web searches rather than embedded constraints—easily circumventable. Machine unlearning attempts to remove hazardous knowledge face fundamental challenges in chemistry: defining “dangerous chemical knowledge” is context-dependent (bleach is benign alone, hazardous when combined), and removal risks degrading model utility for legitimate research. Alignment through reinforcement learning from human feedback (RLHF) reduces harmful outputs but remains vulnerable to jailbreaks[Kuntz et al. (2025); Yona et al. (2024); Lynch et al. (n.d.)] and fails to generalize to novel threats (models might refuse known toxins while suggesting precursors).

7.2.3 Solutions

Chemical GPM oversight should focus on “advanced chemical models” meeting specific risk thresholds: models trained on sensitive synthesis data, systems demonstrating autonomous synthesis planning capabilities, or agents with laboratory control interfaces. This targeted approach avoids burdening low-risk research while capturing concerning systems—analogous to proposed biological model regulations.[Bloomfield et al. (2024)]

Developing chemistry-aware guardrails requires systems that evaluate chemical plausibility, safety, and regulatory compliance before providing synthesis information. These technical safeguards should integrate real-time hazard assessment checking outputs against controlled substance databases, context-aware refusal mechanisms understanding legitimate versus dangerous use cases, precursor tracking identifying suspicious synthesis pathway queries, and audit logging enabling forensic analysis of concerning interactions. Establishing review processes for GPM-assisted chemical research analogous to institutional biosafety committees provides essential institutional oversight. Researchers deploying chemical GPMs with autonomous capabilities should undergo safety review, particularly for autonomous synthesis planning and execution, large-scale screening for bioactive compounds, or novel chemical class exploration without expert oversight.
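A minimal sketch of the layered safeguards described above, assuming a toy hard-coded watchlist and hypothetical decision labels; a real deployment would query maintained controlled-substance databases (e.g., CWC schedules), use far more robust query analysis than substring matching, and route "review" cases to human experts.

```python
import time

# Toy watchlist of precursors tied to agents discussed in this section
# (PETN, sarin); purely illustrative, not an operational list.
WATCHLIST = {"pentaerythritol", "methylphosphonyl difluoride", "pinacolyl alcohol"}

AUDIT_LOG: list[dict] = []  # append-only record enabling forensic analysis

def guardrail(query: str, declared_context: str) -> str:
    """Return 'allow', 'review', or 'refuse', and append an audit record."""
    hits = sorted(term for term in WATCHLIST if term in query.lower())
    if hits and declared_context not in {"licensed-lab", "regulatory-review"}:
        decision = "refuse"          # flagged precursor, no vetted context
    elif hits:
        decision = "review"          # human-in-the-loop for flagged but vetted contexts
    else:
        decision = "allow"
    AUDIT_LOG.append({
        "ts": time.time(), "query": query,
        "context": declared_context, "hits": hits, "decision": decision,
    })
    return decision

print(guardrail("route to aspirin from salicylic acid", "student"))     # allow
print(guardrail("purify methylphosphonyl difluoride", "student"))       # refuse
print(guardrail("purify methylphosphonyl difluoride", "licensed-lab"))  # review
```

The design point is that refusal is context-aware rather than absolute, and that every decision—including allows—leaves an audit trail, matching the precursor-tracking and logging requirements outlined above.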

International coordination remains critical for effective chemical GPM governance. Unlike self-regulation, which risks conflicts of interest and competitive pressure toward laxity, an international oversight body—analogous to organizations governing nuclear materials or biological weapons—could harmonize safety standards across jurisdictions. Such an International Artificial Intelligence Oversight organization (IAIO) comprising AI researchers, chemists, policymakers, and security experts could establish pre-approval requirements for high-risk chemical GPM development, similar to institutional review boards in biomedical research.[Trager et al. (2023)] Precedents exist in The European Organization for Nuclear Research (CERN)’s approach to balancing civilian research with dual-use risk management and nuclear non-proliferation treaties tying market access to compliance.

Transparency requirements should mandate that chemical GPM developers publicly report red-teaming results for dangerous synthesis queries, safety evaluation methodologies and thresholds, known vulnerabilities and mitigation strategies, and training data sources documenting chemical knowledge scope. The fundamental tension remains: GPMs optimize reactions or design toxins with equal facility, but their black-box nature complicates accountability. Progress requires not just safeguards but deliberate constraints on chemical GPM capabilities in high-risk domains.

7.3 Ethics

GPM deployment in chemistry raises ethical concerns requiring careful consideration: from bias in chemical knowledge to environmental costs of computation.[Crawford (2021)]

7.3.1 Environmental Impact of GPMs

The computational requirements for training and deploying GPMs contribute to environmental degradation through high energy consumption and the associated carbon emissions.[Spotte-Smith (2025); Board (2023)] These computational resources are often powered by fossil fuels, which directly contributes to anthropogenic climate change.[Strubell, Ganesh, and McCallum (2019)] The emphasis on AI research has also eroded carbon-neutrality commitments made by large technology companies: Google, for example, rescinded its commitment to carbon neutrality amid a surge in AI usage and funding, reporting a 65% increase in carbon emissions between 2021 and 2024.[Bhuiyan (n.d.)] The water consumed to cool the data centers that support these models is a further concern, particularly in regions facing water scarcity.[Mytton (2021)]

The irony is particularly stark given that, in the chemical sciences, these models are used to address climate-related challenges such as developing sustainable materials or carbon capture technologies. As a scientific community, we must grapple with questions about the sustainability of current AI development trajectories and consider more efficient and renewable approaches to model development and deployment.[Kolbert (n.d.)]

7.3.3 Biases

GPMs inherit and amplify harmful prejudices and stereotypes present in their training data, which pose significant risks when applied translationally to medicinal chemistry and biochemistry. [Spotte-Smith (2025); Yang et al. (2025); Omiye et al. (2023)] These models can perpetuate inaccurate and harmful assumptions based on race and gender about drug efficacy, toxicity, and disease susceptibility, leading to misdiagnosis and mistreatment. [Chen et al. (2023)] Historical medical literature contains biased representations of how different populations respond to treatments, and GPMs trained on such data can reinforce these misconceptions. [Mittermaier, Raza, and Kvedar (2023)] The problem extends to broader contexts in chemical research. Biased models can influence research priorities, funding decisions, and the development of chemical tools in ways that systematically disadvantage the most vulnerable populations [Dotan and Milli (2019)].

7.3.3.1 Solutions

The problem of bias is best addressed through top-down reform. The data necessary to train unbiased models can only exist if clinical studies of drug efficacy are conducted on diverse populations in the real world.[Criado-Perez (2019)] To complement improved data collection, standardized bias evaluations must be developed and mandated prior to the deployment of GPMs.

7.3.4 Access and Power Concentration

Although GPMs have the potential to democratize access to advanced chemical research capabilities, they may also concentrate power in the hands of a few large companies that control the frontier models. This concentration raises concerns about equitable access to research tools, particularly for researchers in smaller institutions with limited resources.[Satariano and Mozur (2025)]

As a community, we should ensure that the benefits of GPMs in chemistry remain broadly accessible via public compute, open-weight models, and portable tooling. We should also insist on fair access terms and transparent benchmarks so that no single provider can gatekeep core research.