GDPR-compliant AI on your website is possible, but never as a blanket yes, and never by embedding a chatbot shipped out of the United States. If you want a clear line, you get one the moment you keep two questions apart: where does the data go, and where does it come from?
In roughly every second conversation over the past twelve months, some version of this lands on the table: "Can we use AI on our website without running into the GDPR?" We are not lawyers and we do not replace legal advice. But we have spent two years building AI-assisted features into B2B websites (lead qualification, content generation, semantic search) and we have wrestled, very concretely, with what it takes technically and contractually for this to hold up inside the European framework. Anyone who needs legal certainty at the end still talks to a lawyer, but by then with far sharper questions.
Why "GDPR-compliant AI on your website" so rarely gets a clear answer
The question stays open because it conflates two things that belong apart. The first fault line is data flow: which data moves, where to, and who processes it? That determines whether you need a processing arrangement, whether a third-country transfer happens, whether Standard Contractual Clauses have to apply. The second fault line is data origin: whose data may be processed at all, and does it end up in a model's training data?
Lay those two axes down and every use case sorts into one of four categories: unproblematic, consent-required, contractually coverable, or better left unbuilt. That replaces gut feel with a decision logic you can defend in front of your data protection officer or the board.
The four categories for any AI use case
Case 1: server-side AI with no personal data. A keyword generator, a translation, a summary of editorial content. Here the GDPR is not the bottleneck, because no personal data arises in the first place. The basics still apply: a vendor with a European server region (Anthropic Claude, OpenAI via Azure EU, Mistral, Aleph Alpha), a clean data processing agreement, disciplined API key management. This is the unproblematic category, and the bulk of what marketing departments actually want falls right here.
Case 2: AI with user data in real time. Chatbot, semantic search, lead qualification. The moment the IP address or the input content flows to an external LLM vendor, you are processing personal data. What holds up here: a documented data flow map, a data processing agreement with the LLM vendor, a transparent privacy notice that names the vendor and the type of data, and genuine consent for AI before activation, not a pre-ticked banner. If the vendor sits outside the EU, Standard Contractual Clauses and a third-country transfer assessment under Schrems II come on top. This is the consent-required, contractually coverable category.
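"Consent before activation" only holds up if the server enforces it, not just the banner. A minimal sketch of such a gate, assuming a hypothetical `Session` record and a hypothetical `send_to_vendor` callable that stands in for whatever client you use to reach the LLM vendor; neither name comes from a real library:

```python
from dataclasses import dataclass
from typing import Callable


class ConsentMissing(Exception):
    """Raised when input would reach the vendor without recorded AI consent."""


@dataclass(frozen=True)
class Session:
    # Hypothetical session record; field names are illustrative.
    session_id: str
    ai_consent: bool = False  # defaults to "no": nothing is pre-ticked


def ask_assistant(
    session: Session,
    user_input: str,
    send_to_vendor: Callable[[str], str],
) -> str:
    """Forward input to the LLM vendor only after explicit consent."""
    if not session.ai_consent:
        # Fail closed: without consent, nothing is forwarded to the vendor.
        raise ConsentMissing(
            f"no AI consent recorded for session {session.session_id}"
        )
    return send_to_vendor(user_input)
```

Raising instead of silently returning an empty answer is a design choice: a missing consent record becomes visible in logs and tests rather than looking like a quiet chatbot.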
Case 3: AI with long-term data storage. Conversations that "learn", profiles, personalisation. Here purpose limitation, data minimisation, storage limitation, and the rights to access and erasure hit at full force. Pseudonymise or anonymise wherever you can. Define retention periods and enforce them automatically. And this is where it gets technically serious: keep a deletion path ready that also reaches the embeddings in the vector store and the conversation history. If you run pgvector as your AI backend, you have to design that path in from the start, or erasure stays theory; we have described the architecture behind it in detail elsewhere.
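Enforcing a retention period automatically usually comes down to a scheduled purge. The sketch below uses an in-memory SQLite database purely as a stand-in so it runs anywhere; the table names, the 30-day period, and the `purge_expired` function are our assumptions for illustration, and a pgvector setup would need the equivalent statements against its own schema:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # assumed value; derive it from the documented purpose


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the two illustrative tables: chat history and its embeddings."""
    conn.executescript(
        """
        CREATE TABLE conversation_messages (
            id INTEGER PRIMARY KEY,
            content TEXT NOT NULL,
            created_at INTEGER NOT NULL  -- Unix timestamp, UTC
        );
        CREATE TABLE message_embeddings (
            message_id INTEGER NOT NULL,
            vector BLOB NOT NULL
        );
        """
    )


def purge_expired(conn: sqlite3.Connection, now: datetime) -> int:
    """Delete messages older than RETENTION together with their embeddings."""
    cutoff = int((now - RETENTION).timestamp())
    with conn:  # one transaction: history and embeddings are removed together
        conn.execute(
            "DELETE FROM message_embeddings WHERE message_id IN "
            "(SELECT id FROM conversation_messages WHERE created_at < ?)",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM conversation_messages WHERE created_at < ?", (cutoff,)
        )
    return cur.rowcount  # number of expired messages removed


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    create_schema(demo)
    print("expired messages removed:", purge_expired(demo, datetime.now(timezone.utc)))
```

Run from a scheduler, a function like this turns the retention period from a sentence in the privacy notice into something the system actually does.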
Case 4: AI we don't recommend. A separate section for that in a moment; this category deserves more than a single line.
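The four cases above can be read as a small decision function. The sketch below is one possible encoding, not a legal test: the flag names and the order of the checks are our own illustrative assumptions, and a real assessment needs more inputs than three booleans.

```python
from dataclasses import dataclass
from enum import Enum


class Case(Enum):
    NO_PERSONAL_DATA = 1    # Case 1: server-side AI, no personal data
    REALTIME_USER_DATA = 2  # Case 2: user data sent to a vendor in real time
    LONG_TERM_STORAGE = 3   # Case 3: conversations, profiles, personalisation
    NOT_RECOMMENDED = 4     # Case 4: features we would not build


@dataclass(frozen=True)
class UseCase:
    # All field names are illustrative assumptions.
    name: str
    processes_personal_data: bool
    stores_data_long_term: bool = False
    uses_discouraged_mechanism: bool = False  # e.g. emotion analysis


def classify(use_case: UseCase) -> Case:
    """Sort a use case into one of the four cases, strictest check first."""
    if use_case.uses_discouraged_mechanism:
        return Case.NOT_RECOMMENDED
    if not use_case.processes_personal_data:
        return Case.NO_PERSONAL_DATA
    if use_case.stores_data_long_term:
        return Case.LONG_TERM_STORAGE
    return Case.REALTIME_USER_DATA


if __name__ == "__main__":
    examples = (
        UseCase("keyword generator", processes_personal_data=False),
        UseCase("support chatbot", processes_personal_data=True),
        UseCase("personalised assistant", True, stores_data_long_term=True),
        UseCase("webcam emotion analysis", True, uses_discouraged_mechanism=True),
    )
    for uc in examples:
        print(f"{uc.name}: {classify(uc).name}")
```

Checking the discouraged mechanism first is deliberate: a feature that belongs in Case 4 should not be rescued by the fact that it also happens to store nothing.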
The point that gets overlooked: training data
Is the data you send to an LLM vendor used for training? With the large vendors (OpenAI, Anthropic, Google) the answer in B2B, API, and enterprise plans is no by default. In consumer plans that default does not hold, and the practice can change. That is exactly why the training exclusion belongs in the contract before any integration, not in an assumption.
We prefer vendors who communicate training exclusion as the default rather than selling it as a premium option. This isn't a detail for legal; it's an argument that goes straight into your privacy notice and that you can stand behind in front of customers. Build on a US vendor's goodwill here and you build on sand, because the question of whether European data is safe at all on US infrastructure is not settled; see the CLOUD Act and the matter of data sovereignty.
Which AI features you deliberately don't build
Here the honest answer gets uncomfortable. Three classes of feature are technically buildable but shouldn't be.
The first two, emotion analysis and behavioural profiling via webcam, microphone, or input patterns, almost always force a Data Protection Impact Assessment that is hard to pass, and they carry a negative social charge. The EU AI Act sharpens this line further: emotion recognition in the workplace and in education falls under the prohibited practices, and biometric categorisation counts as a high-risk application with substantial obligations. So you add a second regulatory dimension on top of the GDPR, with the transition periods still to be verified as of mid-2026.
The third class is fully automated decisions with legal effect, such as a lead score that governs access or terms. That is the territory of Article 22 GDPR, which ties such decisions to strict conditions by default. Our recommendation in all three cases: first check whether the business goal is reachable without the mechanism. In nine out of ten cases it is.
And then there is the genuinely hard part, the one that appears in no contract template: the deletion path. An Article 17 erasure request has to reach the data everywhere: in the main database, in the conversation history, and in the embeddings of the vector store. That is exactly where systems fail in practice, because the vector store, a downstream index, is so often forgotten. On top of that, the legal ground itself is in motion. The relationship between the EU-US Data Privacy Framework and the Schrems case law, vendors' training practices, the interpretation of the AI Act: none of it is settled for good. Build on the legal situation as of a single cut-off date and you build wrong.
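A deletion path becomes testable once every store that holds personal data is listed in one place and erased in one transaction. The following is a minimal sketch under stated assumptions: SQLite stands in for the real backend so the example runs anywhere, the three table names and the shared `user_id` column are a hypothetical schema, and the `embeddings` table plays the role of the vector store.

```python
import sqlite3

# Every store that holds data about a person, keyed by user_id.
# Assumed schema; "embeddings" stands in for the vector store.
STORES = ("leads", "conversation_messages", "embeddings")


def erase_user(conn: sqlite3.Connection, user_id: str) -> dict[str, int]:
    """Delete one person's rows from every listed store in one transaction."""
    deleted: dict[str, int] = {}
    with conn:  # commits on success, rolls back if any delete fails
        for table in STORES:
            # Table names come from the constant above, never from user input.
            cur = conn.execute(
                f"DELETE FROM {table} WHERE user_id = ?", (user_id,)
            )
            deleted[table] = cur.rowcount
    return deleted


def remaining_rows(conn: sqlite3.Connection, user_id: str) -> int:
    """Count what is left for this person across the listed stores."""
    return sum(
        conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE user_id = ?", (user_id,)
        ).fetchone()[0]
        for table in STORES
    )


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with two users in every store."""
    conn = sqlite3.connect(":memory:")
    for table in STORES:
        conn.execute(f"CREATE TABLE {table} (user_id TEXT NOT NULL, payload BLOB)")
        conn.executemany(
            f"INSERT INTO {table} VALUES (?, ?)",
            [("u1", b"a"), ("u1", b"b"), ("u2", b"c")],
        )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(erase_user(conn, "u1"))
    print("rows left for u1:", remaining_rows(conn, "u1"))
```

The check is only as good as the `STORES` list: a store that is missing from it, such as a forgotten downstream index, is exactly the failure described above, and no count will reveal it.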
The pragmatic procedure before any integration
Before a single line of integration code exists, we answer five questions in this order. First: which data actually flows, recorded end to end, from click to answer? Second: which vendor, European-hosted and with a training exclusion? Third: which contractual basis, DPA, Standard Contractual Clauses where applicable, a third-country transfer assessment? Fourth: which consent, technically enforced before activation? Fifth: what does the deletion path look like, where does the data live, everywhere, and how does it get removed without remainder?
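The five questions can also be kept as a small checklist in the repository, so that an open answer blocks integration instead of being forgotten. A sketch, with field and function names that are our own shorthand for the questions rather than established terms:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IntegrationChecklist:
    # One flag per question, in the order given above (names are illustrative).
    data_flow_documented: bool
    vendor_eu_hosted_with_training_exclusion: bool
    contractual_basis_in_place: bool
    consent_enforced_before_activation: bool
    deletion_path_defined: bool


def open_questions(checklist: IntegrationChecklist) -> list[str]:
    """Return the names of all questions not yet answered with yes, in order."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def ready_to_integrate(checklist: IntegrationChecklist) -> bool:
    """Integration code may start only when no question is left open."""
    return not open_questions(checklist)


if __name__ == "__main__":
    draft = IntegrationChecklist(
        data_flow_documented=True,
        vendor_eu_hosted_with_training_exclusion=True,
        contractual_basis_in_place=True,
        consent_enforced_before_activation=True,
        deletion_path_defined=False,
    )
    print("open:", open_questions(draft))
    print("ready:", ready_to_integrate(draft))
```

A boolean per question is deliberately crude; in practice each flag would point to the document that backs it, such as the data flow map or the signed DPA.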
The fifth point is the one most often forgotten and at the same time the best litmus test for a team's technical maturity. If you can't draw the deletion path, you haven't understood the architecture. Incidentally, the same question of data sovereignty and storage location already arises one layer down, at the backend itself. Which architecture makes that control possible in the first place is what we cover in our Supabase architecture overview.
What we tell decision-makers
Don't rely on a single vendor's compliance statement; rely on an architecture that enforces three things: the training exclusion in the contract, consent technically before activation, and a deletion path that reaches every storage location down to the vector store. Those three pieces of homework are non-negotiable. They decide whether the system stays legally operable the next time the legal ground shifts.
The rest is sorting work. Separate data flow from data origin, place every use case into the four categories, and strike the fourth without debate. Work this way and you run AI on the website not in spite of the GDPR, but in a form you can defend before the board and before the supervisory authority alike. That is no legal sleight of hand. It's engineering hygiene.
