7 Implications of GPMs: Education, Safety, and Ethics
As general-purpose model (GPM)s integrate into chemistry education and research, they bring transformative potential alongside critical challenges. Responsible deployment requires addressing pedagogical practices, chemical-specific safety risks, and ethical considerations unique to scientific knowledge systems.
7.1 Education
GPMs open up potential for chemistry education, ranging from personalized tutoring and adaptive feedback to supporting educators in material preparation and assessment. [Mollick et al. (2024)] Specialized systems could tailor explanations to individual learning needs, help students rehearse concepts, and lower barriers to coding and data analysis.[Mollick et al. (2024); Sharma et al. (2025); Du et al. (2024)] Coupled with augmented reality (AR), they could provide safe laboratory simulations before physical experiments. For educators, GPMs promise to reduce workload through automated feedback on open-ended responses and individualized assignment generation.[Kortemeyer, Nöhl, and Onishchuk (2024); Gao et al. (2024)]
Despite this potential, current implementations remain fragmented. Most uses rely on general-purpose interfaces without curricular alignment[Subasinghe, Gersib, and Mankad (2025); Tsai, Ong, and Chen (2023); Shao et al. (2025)], and while first prototypes integrate GPMs into structured systems with retrieval-augmented generation (RAG) [Perez et al. (2025); Jablonka et al. (2023)], learning tracking, or interactive tutoring modes [Li et al. (2025); Wang, Lee, and Mutlu (2025); Anthropic (n.d.a)], chemistry-focused platforms are rare. Studies further suggest that while models can handle simple recall or formatting tasks, they struggle with deeper reasoning, diagram interpretation, and robust grading.[Handa et al. (n.d.); Baral et al. (2025); Kharchenko and Babenko (2024)]
7.1.1 Limitations
GPMs lack transparency in sources and confidence, enabling hallucinations to mislead students.[Marcus (n.d.); Kosmyna et al. (2025)] Over-reliance risks deskilling: students complete assignments without developing chemical understanding or analytical thinking.[Dung and Balg (n.d.); Sharma et al. (2025)] Models are unreliable for grading chemistry work, particularly free-text reasoning and molecular diagrams.[Baral et al. (2025); Kortemeyer, Nöhl, and Onishchuk (2024)]
7.1.2 Open Challenges
Responsible Integration Strategies. Developing pedagogical frameworks for thoughtful GPM integration in chemistry curricula, adapting assessment to emphasize critical evaluation over rote completion.
Chemistry-Specific Systems. Building platforms that integrate chemical structure recognition, spectroscopic data interpretation, and laboratory safety protocols into tutoring and assessment workflows.
Critical Competence Training. Teaching students to evaluate chemical plausibility, identify hallucinated reactions, and creatively extend model outputs.[Klein and Winthrop (n.d.)]
Validated Assessment Tools. Developing robust grading systems for chemistry-specific tasks: mechanism proposals, spectral interpretation, synthesis planning.
7.2 Safety
GPMs in chemistry are double-edged: they accelerate discovery while potentially amplifying chemical safety risks. From democratizing hazardous synthesis knowledge to enabling autonomous dangerous compound production, these systems lower barriers to misuse (Figure 7.1). Real-world constraints—specialized equipment, regulated reagents, tacit expertise—currently limit risks, but GPMs are progressively overcoming these barriers.
7.2.1 Chemical-Specific Risk Amplification
7.2.1.1 Information Access and Synthesis Planning
GPMs can lower cognitive barriers to accessing dangerous chemical knowledge. Urbina et al. (2022) demonstrated that molecular generators like MegaSyn could design toxic agents including VX when reward functions prioritize toxicity. Recent evaluations of GPT-4 and Claude 4 Opus for biological threat creation found that while information is already accessible, GPMs improve troubleshooting and help acquire tacit knowledge previously limiting non-experts.[OpenAI (n.d.a); Anthropic (n.d.b)] He et al. (2023) showed large language model (LLM)s generate pathways for explosives (pentaerythritol tetranitrate (PETN)) and nerve agents (sarin). While precursor supply chains remain regulated, GPMs that incrementally lower technical barriers act as force multipliers for malicious actors. Chemistry-specific agents like ChemCrow[Bran et al. (2024)] and Coscientist[Boiko et al. (2023)] demonstrate how GPMs can plan and execute complex syntheses—capabilities extending to dangerous compounds.
7.2.1.2 Hallucinations in Chemical Contexts
GPMs can hallucinate non-existent reactions, falsify safety protocols, and generate plausible-sounding but incorrect procedures.[Pantha et al. (2024); Ji et al. (2023)] In chemistry, these errors risk laboratory accidents or synthesis failures that waste resources. Temporal misalignment due to static training data means chemical knowledge becomes outdated as new reactions, hazards, and regulations emerge.
7.2.1.3 Autonomous Laboratory Risks
Autonomous laboratories controlled by GPMs face cybersecurity vulnerabilities. Compromised systems could be manipulated to synthesize hazardous compounds. The timeline mismatch between AI capabilities and infrastructure security creates risk windows where adversaries exploit inadequately protected systems.[Rouleau and Murugan (2025); Dean (n.d.)]
7.2.2 Existing Approaches to Safety
Red-teaming evaluations identify vulnerabilities but are reactive, not preventive. ChemCrow’s safeguards block known controlled substances, yet He et al. (2023) demonstrated these protections rely on post-query web searches rather than embedded constraints—easily circumventable. Machine unlearning attempts to remove hazardous knowledge face fundamental challenges in chemistry: defining “dangerous chemical knowledge” is context-dependent (bleach is benign alone, hazardous when combined), and removal risks degrading model utility for legitimate research. Alignment through reinforcement learning from human feedback (RLHF) reduces harmful outputs but remains vulnerable to jailbreaks[Kuntz et al. (2025); Yona et al. (2024); Lynch et al. (n.d.)] and fails to generalize to novel threats (models might refuse known toxins while suggesting precursors).
7.2.3 Solutions
Chemical GPM oversight should focus on “advanced chemical models” meeting specific risk thresholds: models trained on sensitive synthesis data, systems demonstrating autonomous synthesis planning capabilities, or agents with laboratory control interfaces. This targeted approach avoids burdening low-risk research while capturing concerning systems—analogous to proposed biological model regulations.[Bloomfield et al. (2024)]
Developing chemistry-aware guardrails requires systems that evaluate chemical plausibility, safety, and regulatory compliance before providing synthesis information. These technical safeguards should integrate real-time hazard assessment checking outputs against controlled substance databases, context-aware refusal mechanisms understanding legitimate versus dangerous use cases, precursor tracking identifying suspicious synthesis pathway queries, and audit logging enabling forensic analysis of concerning interactions. Establishing review processes for GPM-assisted chemical research analogous to institutional biosafety committees provides essential institutional oversight. Researchers deploying chemical GPMs with autonomous capabilities should undergo safety review, particularly for autonomous synthesis planning and execution, large-scale screening for bioactive compounds, or novel chemical class exploration without expert oversight.
International coordination remains critical for effective chemical GPM governance. Unlike self-regulation, which risks conflicts of interest and competitive pressure toward laxity, an international oversight body—analogous to organizations governing nuclear materials or biological weapons—could harmonize safety standards across jurisdictions. Such an International Artificial Intelligence Oversight organization (IAIO) comprising AI researchers, chemists, policymakers, and security experts could establish pre-approval requirements for high-risk chemical GPM development, similar to institutional review boards in biomedical research.[Trager et al. (2023)] Precedents exist in The European Organization for Nuclear Research (CERN)’s approach to balancing civilian research with dual-use risk management and nuclear non-proliferation treaties tying market access to compliance.
Transparency requirements should mandate that chemical GPM developers publicly report red-teaming results for dangerous synthesis queries, safety evaluation methodologies and thresholds, known vulnerabilities and mitigation strategies, and training data sources documenting chemical knowledge scope. The fundamental tension remains: GPMs optimize reactions or design toxins with equal facility, but their black-box nature complicates accountability. Progress requires not just safeguards but deliberate constraints on chemical GPM capabilities in high-risk domains.
7.3 Ethics
GPM deployment in chemistry raises ethical concerns requiring careful consideration: from bias in chemical knowledge to environmental costs of computation.[Crawford (2021)]
7.3.1 Environmental Impact of GPMs
The computational requirements for training and deploying GPMs contribute to environmental degradation through excessive energy consumption and carbon emissions. [Spotte-Smith (2025); Board (2023)] These computational resources are often powered by fossil fuel-based energy sources, which directly contribute to anthropogenic climate change. [Strubell, Ganesh, and McCallum (2019)] The emphasis on AI research has superseded some commitments made by big technological companies to carbon neutrality. For example, Google rescinded its commitment to carbon neutrality amid a surge in AI usage (65\(\%\) increase in carbon emissions between 2021–24) and funding.[Bhuiyan (n.d.)] Additionally, the water consumption for cooling data centers that support these models is another concern, particularly in regions facing water scarcity. [Mytton (2021)]
The irony is particularly stark when considering that in the chemical sciences, these models are used to address climate-related challenges, such as the development of sustainable materials or carbon capture technologies. As a scientific community, we must grapple with the questions about the sustainability of current AI development trajectories and consider more efficient and renewable approaches to model development and deployment. [Kolbert (n.d.)]
7.3.2 Copyright Infringement and Plagiarism Concerns
GPMs are typically trained on a vast corpora of copyrighted scientific literature, patents, and proprietary databases, often without explicit permission, a practice that has sparked legal disputes, such as Getty Images v. Stability AI, where plaintiffs allege unauthorized scraping of protected content. [Kirchhübel and Brown (2024)] Developers at OpenAI claimed in a statement to the United Kingdom (UK) House of Lords that training state-of-the-art (SOTA) models is “impossible” without copyrighted material, highlighting a fundamental tension between intellectual property (IP) law and AI advancement. [OpenAI (n.d.b)] In the chemical sciences, this challenge persists through the training of models on experimental results from pay-walled journals. A potential resolution to this in the scientific sphere lies in the expansion of open-access research frameworks. Initiatives like the chemical abstracts service (CAS) Common Chemistry database provide legally clear training data while maintaining attribution. LLMs have shown a high propensity to regurgitate elements from their training data. When generating text, models may reproduce near-verbatim fragments of training data without citation, effectively obscuring intellectual contributions.[Bender et al. (2021)] While some praise GPMs for overcoming “blank-page syndrome” for early-career scientists [Altmäe, Sola-Leyva, and Salumets (2023)], others warn that uncritical reliance on their outputs risks eroding scientific rigor.[Donker (2023)]
7.3.3 Biases
GPMs inherit and amplify harmful prejudices and stereotypes present in their training data, which pose significant risks when applied translationally to medicinal chemistry and biochemistry. [Spotte-Smith (2025); Yang et al. (2025); Omiye et al. (2023)] These models can perpetuate inaccurate and harmful assumptions based on race and gender about drug efficacy, toxicity, and disease susceptibility, leading to misdiagnosis and mistreatment. [Chen et al. (2023)] Historical medical literature contains biased representations of how different populations respond to treatments, and GPMs trained on such data can reinforce these misconceptions. [Mittermaier, Raza, and Kvedar (2023)] The problem extends to broader contexts in chemical research. Biased models can influence research priorities, funding decisions, and the development of chemical tools in ways that systematically disadvantage the most vulnerable populations [Dotan and Milli (2019)].
7.3.3.1 Solutions
The problem of bias can be best addressed through top-down reform. The data necessary to train unbiased models can only exist if clinical studies of drug efficacy are conducted on diverse populations in the real world.[Criado-Perez (2019)] To complement improved data collection, standard evaluations for bias testing must be developed and mandated prior to deployment of GPMs.
7.3.4 Access and Power Concentration
Although GPMs have the potential to democratize access to advanced chemical research capabilities, they may also concentrate power in the hands of a few large companies that control the frontier models. This concentration raises concerns about equitable access to research tools, particularly for researchers in smaller institutions with limited resources.[Satariano and Mozur (2025)]
As a community, we should ensure that the benefits of GPMs in chemistry remain broadly accessible via public compute, open-weight models, and portable tooling. We should also insist on fair access terms and transparent benchmarks so that no single provider can gatekeep core research.