What can go wrong?

TL;DR
- Treat anything you paste into a consumer chatbot UI as data that may be logged and used to improve the service. Prefer API/enterprise endpoints or institution-hosted/local models for work with personal data, unpublished results, or proprietary code. See statements from OpenAI (Enterprise Privacy), Azure OpenAI (Data & Privacy), and Google Vertex AI (Data Governance & Zero Data Retention).
- Multi-tenant chat UIs have had cross-user exposure incidents (see OpenAI’s postmortem on the March 20, 2023 ChatGPT outage).
- Independent reporting shows that platforms collect broad personal data and may share prompts with service providers and other third parties; mobile apps can additionally collect precise location, phone numbers, and photos. See: heise developer.
- Safer patterns exist: local (Ollama + Open WebUI / LM Studio / GPT4All / llama.cpp), institution-hosted gateways, or enterprise plans with no-training defaults and retention/residency controls. See the Hamburg DPA checklist.

Note that data sent via an API, in contrast to data entered into a consumer web frontend, is often not used for training by default; confirm this in each provider's data-use policy before relying on it.
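
As a minimal sketch of what that looks like in practice, here is an API call using the official `openai` Python client; the model name and prompt are placeholders, and the client reads its key from the `OPENAI_API_KEY` environment variable.

```python
# Minimal sketch: query a provider via its API instead of the consumer chat UI.
# Assumes the official `openai` Python package and an API key in OPENAI_API_KEY.
# Per OpenAI's stated policy, API traffic is not used for training by default,
# but verify the current terms for your account and region.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; substitute your own
    messages=[{"role": "user", "content": "Summarize this internal report: ..."}],
)
print(response.choices[0].message.content)
```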

Safer usage patterns

A) Local-only (no data leaves your machine)

Run models locally with Ollama + Open WebUI (see the Open WebUI docs), LM Studio, GPT4All, or llama.cpp.
Best for: human-subject data, embargoed results, export-controlled material.
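
A sketch of the local pattern: Ollama serves an OpenAI-compatible endpoint on localhost, so the same client code from above works without any data leaving the machine. This assumes Ollama is running on its default port and a model has been pulled (e.g. `ollama pull llama3`).

```python
# Minimal sketch: the same client, pointed at a local Ollama server,
# so prompts and completions never leave the machine.
# Assumes Ollama is running (default port 11434) and a model has been
# pulled, e.g. `ollama pull llama3`.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # required by the client, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3",  # any locally pulled model
    messages=[{"role": "user", "content": "Anonymize and summarize: ..."}],
)
print(response.choices[0].message.content)
```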

B) Institution-hosted (on-prem/HPC/VPN/VPC)

Spin up vLLM, llama.cpp, or Ollama behind your VPN. Add a prompt gateway that redacts PII, blocks risky uploads, and logs usage for audit. A self-hosted UI such as Open WebUI makes this approachable.
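
To make the gateway idea concrete, here is a toy sketch of the redact-log-forward flow. The regex patterns, internal endpoint, and model name are illustrative assumptions; a real deployment would use a dedicated PII detector (e.g. Microsoft Presidio) and an authenticated logging pipeline.

```python
# Toy sketch of a prompt gateway: redact obvious PII and write an audit
# record before forwarding a prompt to the in-house model server.
# Patterns and endpoint below are illustrative assumptions, not a
# production-grade redaction scheme.
import logging
import re

import requests

logging.basicConfig(filename="gateway_audit.log", level=logging.INFO)

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d /()-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace matches of each PII pattern with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

def forward(prompt: str, user: str) -> str:
    """Redact, log an audit line, then forward to the internal model server."""
    clean = redact(prompt)
    logging.info("user=%s chars=%d redacted=%s", user, len(prompt), clean != prompt)
    # Hypothetical in-house vLLM/Ollama endpoint reachable only via the VPN.
    r = requests.post(
        "http://llm.internal.example:8000/v1/chat/completions",
        json={"model": "llama3", "messages": [{"role": "user", "content": clean}]},
        timeout=60,
    )
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]
```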

C) Enterprise/per-tenant cloud

Use your lab or university’s tenant with no-training defaults, retention controls, and regional hosting. See: OpenAI Enterprise Privacy, Azure OpenAI data & privacy, Vertex AI data governance.
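
For illustration, routing requests through an institutional Azure OpenAI tenant looks like this with the official `openai` client; the endpoint, deployment name, and API version are placeholders to be replaced with the values your administrators provide.

```python
# Minimal sketch: route requests through your institution's Azure OpenAI
# tenant, which carries the tenant's no-training, retention, and region
# settings. Endpoint, deployment name, and API version are placeholders.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://your-tenant.openai.azure.com",  # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",  # example; match your tenant's supported version
)

response = client.chat.completions.create(
    model="your-deployment-name",  # Azure uses deployment names, not model IDs
    messages=[{"role": "user", "content": "Draft an internal summary of ..."}],
)
print(response.choices[0].message.content)
```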