GDPR-Compliant AI on Your Website: What Works, What Doesn't

GDPR-compliant AI is possible, but never as a blanket yes. Separate data flow from data origin and every use case sorts into one of four categories. Three pieces of homework decide whether the system stays legally operable.
7 min read · Matthias Radscheit · Happycoding

TL;DR

GDPR-compliant AI on your website is achievable, but never as a blanket answer. Separate data flow from data origin, and every use case lands in one of four categories. Three pieces of homework decide legal operability: a training-exclusion clause, technically enforced consent, and a deletion path that reaches the vector store.

  • Two fault lines settle every case: data flow (where data goes, who processes it) and data origin (whose data ends up in training).
  • Use cases sort into four categories: unproblematic, consent-required, contractually coverable, or better left unbuilt.
  • Three pieces of homework are non-negotiable: a training-exclusion clause in the DPA, technically enforced consent before activation, and a deletion path that reaches into the vector store.
  • Emotion analysis, behavioural profiling, and fully automated decisions under Article 22 GDPR are functions you deliberately do not build; the business goal is usually reachable without them.
  • The legal ground (DPF/Schrems, the EU AI Act, vendors' training practices) keeps shifting; rely on architecture, not on a US vendor's current plan.

GDPR-compliant AI on your website is possible, but never as a blanket yes, and never by embedding a chatbot shipped out of the United States. If you want a clear line, you get one the moment you keep two questions apart: where does the data go, and where does it come from?

In roughly every second conversation over the past twelve months, some version of this lands on the table: "Can we use AI on our website without running into the GDPR?" We are not lawyers and we do not replace legal advice. But we have spent two years building AI-assisted features into B2B websites (lead qualification, content generation, semantic search) and we have wrestled, very concretely, with what it takes technically and contractually for this to hold up inside the European framework. Anyone who needs legal certainty at the end still talks to a lawyer, but by then with far sharper questions.

Why "GDPR-compliant AI on your website" so rarely gets a clear answer

The question stays open because it conflates two things that belong apart. The first fault line is data flow: which data moves, where to, and who processes it? That determines whether you need a processing arrangement, whether a third-country transfer happens, whether Standard Contractual Clauses have to apply. The second fault line is data origin: whose data may be processed at all, and does it end up in a model's training data?

Lay those two axes down and every use case sorts into one of four categories: unproblematic, consent-required, contractually coverable, or better left unbuilt. That replaces gut feel with a decision logic you can defend in front of your data protection officer or the board.

The four categories for any AI use case

Case 1: server-side AI with no personal data. A keyword generator, a translation, a summary of editorial content. Here the GDPR is not the bottleneck, because no personal data arises in the first place. The basics still apply: a vendor with a European server region (Anthropic Claude, OpenAI via Azure EU, Mistral, Aleph Alpha), a clean data processing agreement, disciplined API key management. This is the unproblematic category, and the bulk of what marketing departments actually want falls right here.

Case 2: AI with user data in real time. Chatbot, semantic search, lead qualification. The moment the IP address or the input content flows to an external LLM vendor, you are processing personal data. What holds up here: a documented data flow map, a data processing agreement with the LLM vendor, a transparent privacy notice that names the vendor and the type of data, and genuine consent for AI before activation, not a pre-ticked banner. If the vendor sits outside the EU, Standard Contractual Clauses and a third-country transfer assessment under Schrems II come on top. This is the consent-required, contractually coverable category.
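"Genuine consent before activation" is easiest to defend when the code path to the vendor simply does not exist without it. The following is a minimal sketch under our own assumptions: `ConsentRecord`, `withConsent`, and the stub vendor are hypothetical names for illustration, not part of any vendor SDK or consent platform.

```typescript
// Hypothetical consent record; the field names are assumptions for this sketch.
interface ConsentRecord {
  purpose: "ai_chat";      // consent is bound to one named purpose
  granted: boolean;        // set by an explicit user action, never by a default
  grantedAt: Date | null;  // timestamp kept as evidence of the opt-in
}

class ConsentRequiredError extends Error {
  constructor() {
    super("AI feature blocked: no valid consent on record");
    this.name = "ConsentRequiredError";
  }
}

// True only for an explicit, timestamped opt-in.
function hasValidConsent(consent: ConsentRecord | undefined): boolean {
  return consent !== undefined && consent.granted && consent.grantedAt !== null;
}

// Wraps any vendor call. `send` stands in for the real LLM client and is
// never invoked unless valid consent is present.
function withConsent<T>(
  consent: ConsentRecord | undefined,
  input: string,
  send: (input: string) => T,
): T {
  if (!hasValidConsent(consent)) {
    throw new ConsentRequiredError();
  }
  return send(input);
}

// Usage with a stub in place of the vendor client.
const calls: string[] = [];
const stubVendor = (text: string): string => {
  calls.push(text);
  return `echo: ${text}`;
};

const optedIn: ConsentRecord = {
  purpose: "ai_chat",
  granted: true,
  grantedAt: new Date(),
};
const reply = withConsent(optedIn, "What does onboarding cost?", stubVendor);
```

The design choice is that the gate sits in front of the network call on the server, not in the banner. A banner that is bypassed or pre-ticked then cannot cause data to leave.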

Case 3: AI with long-term data storage. Conversations that "learn", profiles, personalisation. Here purpose limitation, data minimisation, storage limitation, and the rights to access and erasure hit at full force. Pseudonymise or anonymise wherever you can. Define retention periods and enforce them automatically. And this is where it gets technically serious: keep a deletion path ready that also reaches the embeddings in the vector store and the conversation history. If you run pgvector as your AI backend, you have to design that path in from the start, or erasure stays theory; we have described the architecture behind it in detail elsewhere.
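Enforcing retention periods automatically can be as small as a scheduled job that decides, per record, whether it has outlived its period. The sketch below assumes a hypothetical record shape and placeholder periods (30 and 365 days); the real values come out of your retention policy, not out of this example.

```typescript
// Hypothetical record shape; `kind` and `createdAt` are illustrative names.
interface StoredRecord {
  id: string;
  kind: "conversation" | "profile";
  createdAt: Date;
}

// Retention periods in days per record kind. Placeholder numbers only.
const RETENTION_DAYS: Record<StoredRecord["kind"], number> = {
  conversation: 30,
  profile: 365,
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when the record has outlived its retention period at time `now`.
function isExpired(record: StoredRecord, now: Date): boolean {
  const ageMs = now.getTime() - record.createdAt.getTime();
  return ageMs > RETENTION_DAYS[record.kind] * MS_PER_DAY;
}

// Splits records into those to keep and those a scheduled job should purge.
function partitionByRetention(records: StoredRecord[], now: Date) {
  const keep: StoredRecord[] = [];
  const purge: StoredRecord[] = [];
  for (const record of records) {
    (isExpired(record, now) ? purge : keep).push(record);
  }
  return { keep, purge };
}

const referenceDate = new Date("2026-06-01T00:00:00Z");
const sampleRecords: StoredRecord[] = [
  { id: "c1", kind: "conversation", createdAt: new Date("2026-04-01T00:00:00Z") }, // 61 days old
  { id: "c2", kind: "conversation", createdAt: new Date("2026-05-20T00:00:00Z") }, // 12 days old
  { id: "p1", kind: "profile", createdAt: new Date("2025-09-01T00:00:00Z") },      // 273 days old
];
const retention = partitionByRetention(sampleRecords, referenceDate);
// retention.purge contains only "c1"; "c2" and "p1" are kept.
```

Whatever lands in `purge` then has to go through the same deletion path as an erasure request, so that derived embeddings expire together with the source record.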

Case 4: AI we don't recommend. A separate section for that in a moment; this category deserves more than a single line.
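The four cases above can be read as a small decision function. This is a deliberately simplified sketch: the three boolean flags and the order of the checks are our own assumptions about how to encode the cases, and a real assessment needs more nuance than three booleans.

```typescript
// Hypothetical description of a use case; the flags are a simplification.
interface UseCase {
  label: string;
  touchesPersonalData: boolean;      // does personal data reach the AI component?
  keepsDataLongTerm: boolean;        // histories, profiles, embeddings
  usesDiscouragedMechanism: boolean; // emotion analysis, profiling, Art. 22 decisions
}

type Category =
  | "case-1-no-personal-data"
  | "case-2-real-time-user-data"
  | "case-3-long-term-storage"
  | "case-4-do-not-build";

// The discouraged mechanisms are checked first, so that a feature is struck
// before anyone argues about contracts or consent.
function classify(useCase: UseCase): Category {
  if (useCase.usesDiscouragedMechanism) return "case-4-do-not-build";
  if (!useCase.touchesPersonalData) return "case-1-no-personal-data";
  if (useCase.keepsDataLongTerm) return "case-3-long-term-storage";
  return "case-2-real-time-user-data";
}

const exampleUseCases: UseCase[] = [
  { label: "keyword generator", touchesPersonalData: false, keepsDataLongTerm: false, usesDiscouragedMechanism: false },
  { label: "support chatbot", touchesPersonalData: true, keepsDataLongTerm: false, usesDiscouragedMechanism: false },
  { label: "personalised assistant", touchesPersonalData: true, keepsDataLongTerm: true, usesDiscouragedMechanism: false },
  { label: "webcam emotion analysis", touchesPersonalData: true, keepsDataLongTerm: false, usesDiscouragedMechanism: true },
];

const classified = exampleUseCases.map((u) => `${u.label}: ${classify(u)}`);
```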

The point that gets overlooked: training data

Are the data sent to an LLM vendor used for training? With the large vendors (OpenAI, Anthropic, Google) the answer in B2B, API, and enterprise plans is no by default. In consumer plans that default does not hold, and the practice can change. That is exactly why the training exclusion belongs in the contract before any integration, not in an assumption.

We prefer vendors who communicate training exclusion as the default rather than selling it as a premium option. This isn't a detail for legal; it's an argument that goes straight into your privacy notice and that you can stand behind in front of customers. Build on a US vendor's goodwill here and you build on sand, because the question of whether European data is safe at all on US infrastructure is not settled. The CLOUD Act and the wider matter of data sovereignty are the reasons why.

Which AI features you deliberately don't build

Here the honest answer gets uncomfortable. Three classes of feature are technically buildable but shouldn't be.

Emotion analysis and behavioural profiling via webcam, microphone, or input patterns almost always force a Data Protection Impact Assessment, are hard to justify in that assessment, and carry a negative social charge. The EU AI Act sharpens this line further: emotion recognition in the workplace and in education falls under the prohibited practices, and biometric categorisation counts as a high-risk application with substantial obligations. So you add a second regulatory dimension on top of the GDPR, with the transition periods still to be verified as of mid-2026.

The third class is fully automated decisions with legal effect, such as a lead score that governs access or terms. That is the territory of Article 22 GDPR, which ties such decisions to strict conditions by default. Our recommendation in all three cases: first check whether the business goal is reachable without the mechanism. In nine out of ten cases it is.

And then there is the genuinely hard part, the one that appears in no contract template: the deletion path. An Article 17 erasure request has to reach the data everywhere: in the main database, in the conversation history, and in the embeddings of the vector store. That is exactly where systems fail in practice, because the vector store, a downstream index, is so often forgotten. On top of that, the legal ground itself is in motion: the relationship between the EU-US Data Privacy Framework and Schrems, vendors' training practices, and the interpretation of the AI Act are all still unsettled. Build on a cut-off date here and you build wrong.
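One way to keep the vector store from being forgotten is to register every storage location behind the same small interface and run erasure over all of them, followed by a verification pass. The sketch below uses in-memory stores as stand-ins; `ErasableStore`, `InMemoryStore`, and `eraseEverywhere` are hypothetical names, and real adapters for a SQL database or pgvector would sit behind the same interface.

```typescript
// A store that can erase everything belonging to one data subject.
interface ErasableStore {
  readonly label: string;
  eraseSubject(subjectId: string): number; // number of removed items
  countSubject(subjectId: string): number; // items still held for the subject
}

// In-memory stand-in. The key point it illustrates: every row, including
// every embedding, carries the subject id it was derived from.
class InMemoryStore implements ErasableStore {
  readonly label: string;
  private rows: { subjectId: string; payload: string }[] = [];

  constructor(label: string) {
    this.label = label;
  }

  add(subjectId: string, payload: string): void {
    this.rows.push({ subjectId, payload });
  }

  eraseSubject(subjectId: string): number {
    const before = this.rows.length;
    this.rows = this.rows.filter((row) => row.subjectId !== subjectId);
    return before - this.rows.length;
  }

  countSubject(subjectId: string): number {
    return this.rows.filter((row) => row.subjectId === subjectId).length;
  }
}

interface ErasureReport {
  removed: Record<string, number>; // removed items per store label
  complete: boolean;               // true if no store holds the subject any more
}

// Erases the subject in every registered store, then verifies the result.
function eraseEverywhere(stores: ErasableStore[], subjectId: string): ErasureReport {
  const removed: Record<string, number> = {};
  for (const store of stores) {
    removed[store.label] = store.eraseSubject(subjectId);
  }
  const complete = stores.every((store) => store.countSubject(subjectId) === 0);
  return { removed, complete };
}

const mainDb = new InMemoryStore("main database");
const history = new InMemoryStore("conversation history");
const vectors = new InMemoryStore("vector store");

mainDb.add("user-42", "contact row");
history.add("user-42", "message 1");
history.add("user-42", "message 2");
vectors.add("user-42", "embedding of message 1");
vectors.add("user-42", "embedding of message 2");
vectors.add("user-7", "embedding of another user's message");

const report = eraseEverywhere([mainDb, history, vectors], "user-42");
// report.removed: 1 from the main database, 2 from the history, 2 from the vector store
// report.complete: true, while the row for "user-7" is untouched
```

The sketch only works because each embedding is stored with its subject id. If embeddings are written without that link, no later erasure request can find them, which is why the path has to be designed in from the start.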

The pragmatic procedure before any integration

Before a single line of integration code exists, we answer five questions in this order. First: which data actually flows, recorded end to end, from click to answer? Second: which vendor, European-hosted and with a training exclusion? Third: which contractual basis, DPA, Standard Contractual Clauses where applicable, a third-country transfer assessment? Fourth: which consent, technically enforced before activation? Fifth: what does the deletion path look like, where does the data live, everywhere, and how does it get removed without remainder?
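The five questions can also be kept as a machine-checkable gate in the project, so that an integration cannot be marked ready while one of them is open. This is a sketch under our own assumptions: `IntegrationPlan` and its field names are hypothetical, and the booleans stand in for documents and decisions that a person has to produce.

```typescript
// Hypothetical plan record mirroring the five questions, in order.
interface IntegrationPlan {
  dataFlowMapped: boolean;                  // 1: recorded end to end
  vendorEuHosted: boolean;                  // 2: European-hosted vendor
  vendorTrainingExcluded: boolean;          // 2: training exclusion agreed
  dpaSigned: boolean;                       // 3: data processing agreement
  transfersOutsideEu: boolean;              // 3: does a third-country transfer occur?
  sccAndTransferAssessmentDone: boolean;    // 3: only relevant if it does
  consentEnforcedBeforeActivation: boolean; // 4: technical gate, not a banner
  storageLocations: string[];               // 5: everywhere the data lives
  deletionPathCovers: string[];             // 5: locations the deletion path reaches
}

// Returns the questions that are still open, in the order they are asked.
function openQuestions(plan: IntegrationPlan): string[] {
  const open: string[] = [];
  if (!plan.dataFlowMapped) {
    open.push("1: data flow not recorded end to end");
  }
  if (!plan.vendorEuHosted || !plan.vendorTrainingExcluded) {
    open.push("2: vendor lacks EU hosting or training exclusion");
  }
  const contractsOk =
    plan.dpaSigned && (!plan.transfersOutsideEu || plan.sccAndTransferAssessmentDone);
  if (!contractsOk) {
    open.push("3: contractual basis incomplete");
  }
  if (!plan.consentEnforcedBeforeActivation) {
    open.push("4: consent not technically enforced");
  }
  const uncovered = plan.storageLocations.filter(
    (location) => plan.deletionPathCovers.indexOf(location) === -1,
  );
  if (uncovered.length > 0) {
    open.push(`5: deletion path misses ${uncovered.join(", ")}`);
  }
  return open;
}

const draftPlan: IntegrationPlan = {
  dataFlowMapped: true,
  vendorEuHosted: true,
  vendorTrainingExcluded: true,
  dpaSigned: true,
  transfersOutsideEu: false,
  sccAndTransferAssessmentDone: false,
  consentEnforcedBeforeActivation: true,
  storageLocations: ["main database", "conversation history", "vector store"],
  deletionPathCovers: ["main database", "conversation history"],
};

const stillOpen = openQuestions(draftPlan);
// stillOpen: ["5: deletion path misses vector store"]
```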

The fifth point is the one most often forgotten and at the same time the best litmus test for a team's technical maturity. If you can't draw the deletion path, you haven't understood the architecture. The same question of data sovereignty and storage location, incidentally, already arises one layer down, at the backend itself; which architecture even makes that control possible is what we cover in our Supabase architecture overview.

What I tell decision-makers

Don't rely on a single vendor's compliance statement; rely on an architecture that enforces three things: the training exclusion in the contract, consent technically before activation, and a deletion path that reaches every storage location down to the vector store. Those three pieces of homework are non-negotiable. They decide whether the system stays legally operable the next time the legal ground shifts.

The rest is sorting work. Separate data flow from data origin, place every use case into the four categories, and strike the fourth without debate. Work this way and you run AI on the website not in spite of the GDPR, but in a form you can defend before the board and before the supervisory authority alike. That is no legal sleight of hand. It's engineering hygiene.

Frequently asked questions

Is GDPR-compliant AI on a website even possible?
Yes, but not as a blanket answer. What matters is which data goes where and whether it ends up in training. Server-side AI with no personal data is uncritical; a chatbot that sends user input to an external LLM needs a data processing agreement, a transparent privacy notice, genuine consent, and, for non-EU vendors, a third-country transfer assessment.
Do I need a data processing agreement for an LLM?
As soon as personal data, even just the IP address or the input content, flows to an external LLM vendor, yes. The DPA governs the processing on your behalf. For vendors outside the EU, Standard Contractual Clauses and a third-country transfer assessment under Schrems II come on top. Check in the same step whether a training exclusion is contractually guaranteed.
What is the most common mistake in data protection for AI features?
The forgotten deletion path. Many teams implement consent and the DPA cleanly but cannot fully remove the data on an Article 17 erasure request, because embeddings sit in the vector store and conversation histories live outside the main database. The deletion path has to cover every storage location from day one.
Which AI features should you deliberately not build?
Emotion analysis, behavioural profiling via webcam, microphone, or input patterns, and fully automated decisions with legal effect under Article 22 GDPR. These functions usually force a Data Protection Impact Assessment, are hard to justify, and increasingly collide with the EU AI Act. In nine out of ten cases the business goal is reachable without them.
