Su­pabase as an AI back­end: pgvec­tor, Edge Func­tions and the con­sol­i­da­tion ques­tion

For most en­ter­prise use cas­es, a sep­a­rate vec­tor data­base is a prob­lem you buy back into your own stack. Why Su­pabase with pgvec­tor and Edge Func­tions is a con­sol­i­da­tion de­ci­sion, not a hype one — and where the hon­est lim­it ac­tu­al­ly sits.
9 min read · Matthias Radscheit
Happycoding

TL;DR

Supabase works as an AI backend because pgvector runs RAG and semantic search directly in PostgreSQL, with no separate vector database. Edge Functions handle embedding generation and API proxying right next to the stack. For most enterprise use cases that means one fewer component to operate, not one more AI feature to market.

  • pgvector makes RAG and semantic search possible directly in PostgreSQL — no separate vector store to back up, sync and operate.
  • Edge Functions are built for short-lived work (generating embeddings, webhooks, API proxying), not long heavy-compute jobs: 256 MB RAM, 2 seconds of CPU time per request.
  • pgvector's core scaling limit is physical: once the HNSW index no longer fits in RAM, query performance collapses non-linearly.
  • Rule of thumb: below roughly five to ten million vectors at moderate query load, pgvector is the pragmatic default. Above that under load, look at Qdrant, Pinecone or Weaviate.
  • The decision is commercial, not technical infatuation: fewer moving parts mean less operational risk, fewer vendor interfaces and more predictable TCO.

Most AI projects don't fail at the mod­el. They fail at the in­fra­struc­ture around it — the part no­body want­ed to op­er­ate and some­body ends up op­er­at­ing any­way.

When a team says "we're build­ing RAG" to­day, a sec­ond data­base lands on the ar­chi­tec­ture white­board al­most by re­flex: a ded­i­cat­ed vec­tor data­base next to the ex­ist­ing Post­greSQL. Pinecone, Weav­i­ate, Qdrant — the names change, the pat­tern holds. With the sec­ond data­base comes the sec­ond back­up strat­e­gy, the sec­ond mon­i­tor­ing set­up, the sec­ond ac­cess-con­trol mod­el, and the sync job that's sup­posed to keep data con­sis­tent be­tween the pri­ma­ry store and the vec­tor store. That sync is ex­act­ly where the pager goes off at three in the morn­ing.

Why Su­pabase as an AI back­end is a con­sol­i­da­tion de­ci­sion

Supabase is, at its core, managed PostgreSQL with Auth, Storage, Realtime and Edge Functions layered on top. The piece that matters for AI is a PostgreSQL extension called pgvector. It adds a native data type for vectors and turns similarity search into a SQL operation. That makes "PostgreSQL as a vector database" a plain fact rather than a marketing slogan: the vector index lives in the same database as your business data.

This shifts the ques­tion. It is no longer "which vec­tor data­base do we buy" but "do we need to buy one at all". For a com­pa­ny that wants to build AI fea­tures into its own prod­uct — se­man­tic search across the knowl­edge base, a sup­port as­sis­tant ground­ed in its own doc­u­ments, prod­uct rec­om­men­da­tions over em­bed­dings — the an­swer, in most cas­es, is no.

The prac­ti­cal an­chor is mun­dane, and that's pre­cise­ly why it's strong. Stand­ing up a RAG ap­pli­ca­tion on Post­greSQL you al­ready run means no sec­ond data­base to in­tro­duce, op­er­ate, back up and syn­chro­nise. Few­er mov­ing parts mean less op­er­a­tional risk. That is not an AI ar­gu­ment. It is an op­er­a­tions ar­gu­ment, and op­er­a­tions ar­gu­ments are the ones that hold up in front of a CFO.

An embedding is a vector: a list of numbers that places the meaning of a text, image or product in space. Things that resemble each other sit close together. Semantic search is then nothing more than this: find the vectors nearest to the query vector. pgvector exposes the distance operators for that directly in SQL: cosine distance with <=>, L2 distance with <->, and negative inner product with <#>. The last is usually the fastest, and on normalised embeddings it ranks results the same way cosine distance does.
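To make the three operators concrete, here is a minimal sketch of what each one computes, written in plain TypeScript over number arrays. The function names are my own and not part of pgvector; in the database the same arithmetic happens inside the extension.

```typescript
// Plain-TypeScript equivalents of pgvector's three distance operators.
// Function names are illustrative; they are not part of any pgvector API.

type Vector = readonly number[];

function assertSameDimension(a: Vector, b: Vector): void {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
}

function dot(a: Vector, b: Vector): number {
  assertSameDimension(a, b);
  return a.reduce((sum, value, i) => sum + value * (b[i] ?? 0), 0);
}

// <-> : Euclidean (L2) distance.
function l2Distance(a: Vector, b: Vector): number {
  assertSameDimension(a, b);
  const squared = a.reduce((sum, value, i) => {
    const diff = value - (b[i] ?? 0);
    return sum + diff * diff;
  }, 0);
  return Math.sqrt(squared);
}

// <#> : negative inner product, so that "smaller" still means "closer".
function negativeInnerProduct(a: Vector, b: Vector): number {
  return -dot(a, b);
}

// <=> : cosine distance, defined as 1 minus cosine similarity.
function cosineDistance(a: Vector, b: Vector): number {
  const norms = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  if (norms === 0) {
    throw new Error("cosine distance is undefined for a zero vector");
  }
  return 1 - dot(a, b) / norms;
}
```

For unit-length vectors the cosine distance equals 1 plus the negative inner product, which is why the cheaper inner-product operator produces the same ordering on normalised embeddings.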

The de­ci­sive point for ar­chi­tec­ture: this search runs as part of an or­di­nary SQL query. You can com­bine vec­tor sim­i­lar­i­ty with clas­sic WHERE con­di­tions — only this ten­an­t's doc­u­ments, only ar­ti­cles from this pe­ri­od, only records the logged-in user is al­lowed to see. In a sep­a­rate vec­tor data­base you main­tain those meta­da­ta fil­ters in par­al­lel and hope both sys­tems tell the same truth.

In Supabase you typically wrap RAG retrieval in a PostgreSQL function with match_threshold and match_count parameters, because PostgREST, the API layer above the database, can't express vector distance operators directly; the client then calls the function via rpc(). The Supabase documentation on vector columns lays this out cleanly. The side effect weighs more than the constraint: access control through Row Level Security applies automatically, to the search results too, as long as the function runs with the caller's privileges (PostgreSQL's default, SECURITY INVOKER). A vector database without row-level authorisation logic forces you to rebuild exactly that security in the application layer. That is the kind of bespoke work a pentest tends to find.
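What such a retrieval function does can be sketched in a few lines. The following is an in-memory TypeScript emulation, not the SQL function itself: the row shape, the tenantId field and the function name matchDocuments are assumptions for illustration. In the database, the tenant restriction would come from a WHERE clause or an RLS policy rather than an explicit argument.

```typescript
// In-memory emulation of a typical match function with a metadata filter.
// Row shape, tenantId and matchDocuments are hypothetical names.

interface DocumentRow {
  id: number;
  tenantId: string;
  embedding: readonly number[];
}

interface Match {
  id: number;
  similarity: number;
}

function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((value, i) => {
    const other = b[i] ?? 0;
    dotProduct += value * other;
    normA += value * value;
    normB += other * other;
  });
  if (normA === 0 || normB === 0) {
    throw new Error("cosine similarity is undefined for a zero vector");
  }
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}

function matchDocuments(
  rows: readonly DocumentRow[],
  queryEmbedding: readonly number[],
  tenantId: string,
  matchThreshold: number,
  matchCount: number,
): Match[] {
  return rows
    // Ordinary WHERE condition: only this tenant's documents.
    .filter((row) => row.tenantId === tenantId)
    // similarity = 1 - cosine distance
    .map((row) => ({
      id: row.id,
      similarity: cosineSimilarity(row.embedding, queryEmbedding),
    }))
    // Keep only results above match_threshold.
    .filter((match) => match.similarity > matchThreshold)
    // Most similar first, i.e. ORDER BY distance ascending.
    .sort((x, y) => y.similarity - x.similarity)
    // LIMIT match_count
    .slice(0, matchCount);
}
```

The point of the sketch is the order of operations: the metadata filter and the similarity ranking happen in one pass over one data set, so there is no second system whose filters could drift out of step.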

For the index, pgvector gives you two options. HNSW builds a multi-layer graph, can be created on an empty table, and offers the better trade-off between speed and recall. IVFFlat builds faster and uses less memory, but has to be created after the table is populated because it trains centroids, and it stays weaker on the speed-recall comparison. For most production setups HNSW is the choice. Supabase benchmarked exactly this against IVFFlat: at high accuracy HNSW came out more than six times ahead, and at very high recall pgvector's HNSW on the same compute even beat a dedicated vector database like Qdrant. Read the absolute throughput numbers from secondary summaries only as a rough indication. The relevant finding is the relative multipliers, and those are clear.
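For orientation, this is roughly what the two index definitions look like. The sketch below is a small TypeScript helper that renders the DDL as strings, for instance for a migration script; the helper names and the table and column names are assumptions. The parameter defaults (m = 16, ef_construction = 64) and the starting rule for lists follow the pgvector README as I understand it, so check them against the version you run.

```typescript
// Renders pgvector index DDL as strings. Helper names are hypothetical.
// Identifiers are interpolated as-is, so pass trusted names only.

type OperatorClass = "vector_cosine_ops" | "vector_l2_ops" | "vector_ip_ops";

// HNSW: can be created on an empty table.
function hnswIndexSql(
  table: string,
  column: string,
  ops: OperatorClass,
  m: number = 16,
  efConstruction: number = 64,
): string {
  return `CREATE INDEX ON ${table} USING hnsw (${column} ${ops}) ` +
    `WITH (m = ${m}, ef_construction = ${efConstruction});`;
}

// IVFFlat: create after the table is populated, because it trains centroids.
function ivfflatIndexSql(
  table: string,
  column: string,
  ops: OperatorClass,
  lists: number,
): string {
  return `CREATE INDEX ON ${table} USING ivfflat (${column} ${ops}) ` +
    `WITH (lists = ${lists});`;
}

// Starting point for IVFFlat: rows / 1000 up to one million rows,
// sqrt(rows) above that. A first guess, not a tuned value.
function suggestedLists(rowCount: number): number {
  const lists = rowCount <= 1000000 ? rowCount / 1000 : Math.sqrt(rowCount);
  return Math.max(1, Math.round(lists));
}
```

The operator class has to match the distance operator used in the query: an index built with vector_cosine_ops only serves queries that order by <=>.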

What Edge Func­tions bring to the stack

Em­bed­dings have to be gen­er­at­ed some­where, and the API key for the em­bed­ding mod­el has no busi­ness sit­ting in a brows­er fron­tend. This is where Su­pabase Edge Func­tions come in: server­less, Deno-based func­tions run­ning right next to the data­base. The typ­i­cal AI path looks like this — a new record is in­sert­ed, a trig­ger puts it in a queue, a sched­uler calls an Edge Func­tion in batch that re­quests the em­bed­ding from the mod­el and writes it back as a vec­tor. Su­pabase has stan­dard­ised this pat­tern un­der the name Au­to­mat­ic Em­bed­dings us­ing four ex­ten­sions — pgvec­tor for stor­age, pgmq for the queue, pg_net for asyn­chro­nous HTTP, pg_cron for sched­ul­ing. That's not a hack, it's a doc­u­ment­ed path.
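The batch step in the middle of that path is mostly bookkeeping, and a sketch helps to see it. The TypeScript below covers only the pure parts: grouping queued jobs into batches and pairing returned embeddings with row ids. The job shape and all function names are assumptions, and the actual calls to the queue, the embedding model and the database are deliberately left out.

```typescript
// Bookkeeping around a batched embedding call. All names are hypothetical;
// queue reads, the model call and the database write are out of scope.

interface EmbeddingJob {
  id: number;      // primary key of the row that needs an embedding
  content: string; // text to embed
}

interface EmbeddingBatch {
  ids: number[];
  inputs: string[];
}

interface EmbeddingUpdate {
  id: number;
  embedding: string; // pgvector text form, e.g. "[0.1,0.2,0.3]"
}

// Group jobs so that one model request covers several rows.
function toBatches(jobs: readonly EmbeddingJob[], batchSize: number): EmbeddingBatch[] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: EmbeddingBatch[] = [];
  for (let start = 0; start < jobs.length; start += batchSize) {
    const slice = jobs.slice(start, start + batchSize);
    batches.push({
      ids: slice.map((job) => job.id),
      inputs: slice.map((job) => job.content),
    });
  }
  return batches;
}

// pgvector accepts a vector written as a bracketed, comma-separated list.
function toVectorLiteral(embedding: readonly number[]): string {
  if (embedding.some((value) => !Number.isFinite(value))) {
    throw new Error("embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}

// Pair each id in a batch with the embedding returned at the same position.
function toUpdates(
  batch: EmbeddingBatch,
  embeddings: readonly (readonly number[])[],
): EmbeddingUpdate[] {
  if (embeddings.length !== batch.ids.length) {
    throw new Error("model returned a different number of embeddings than inputs");
  }
  return batch.ids.map((id, i) => ({
    id,
    embedding: toVectorLiteral(embeddings[i] ?? []),
  }));
}
```

The length check in toUpdates is the part worth keeping in any real version: writing an embedding to the wrong row is a silent error that no later step will catch.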

The thing not to do is over­es­ti­mate Edge Func­tions. They are built for short-lived work, and the of­fi­cial lim­its say so plain­ly: 256 MB RAM, 2 sec­onds of CPU time per re­quest (pure asyn­chro­nous I/O does not count against it), a max­i­mum of 150 to 400 sec­onds of wall-clock de­pend­ing on plan, a 20 MB bun­dle size. Web­hooks, em­bed­ding gen­er­a­tion, an API proxy in front of Ope­nAI or a self-host­ed mod­el — ide­al. A long-run­ning re-in­dex­ing job over sev­er­al mil­lion doc­u­ments does not be­long in an Edge Func­tion; it be­longs in a sep­a­rate work­er. Re­spect that bound­ary and you get a clean tool. Ig­nore it and you build your­self time­outs you'll strug­gle to ex­plain lat­er.
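Whether a job respects that boundary is a back-of-the-envelope calculation you can do before writing any function. A minimal sketch, assuming sequential batch calls and per-document timings that you measure yourself; the function name and the example timings are illustrative, only the 2-second CPU and 150-second wall-clock figures come from the limits quoted above.

```typescript
// Rough check of a job against Edge Function limits.
// Timings are inputs you have to measure; names are hypothetical.

interface EdgeBudget {
  cpuSecondsPerRequest: number; // 2 in the limits quoted above
  wallClockSeconds: number;     // 150 to 400 depending on plan
}

interface JobEstimate {
  batches: number;
  wallClockSeconds: number;
  cpuSeconds: number;
  fits: boolean;
}

function estimateJob(
  documentCount: number,
  batchSize: number,
  secondsPerBatchCall: number, // waiting on the embedding API, per batch
  cpuMsPerDocument: number,    // parsing and serialising, per document
  budget: EdgeBudget,
): JobEstimate {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const batches = Math.ceil(documentCount / batchSize);
  // Batch calls are assumed to run one after another.
  const wallClockSeconds = batches * secondsPerBatchCall;
  const cpuSeconds = (documentCount * cpuMsPerDocument) / 1000;
  return {
    batches,
    wallClockSeconds,
    cpuSeconds,
    fits:
      wallClockSeconds <= budget.wallClockSeconds &&
      cpuSeconds <= budget.cpuSecondsPerRequest,
  };
}
```

With assumed timings of 1.5 seconds per batch call and 2 ms of CPU per document, 200 documents in batches of 50 need about 6 seconds of wall-clock time and fit comfortably. Three million documents in batches of 100 come to 45,000 seconds, which is the re-indexing case that belongs in a separate worker.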

When a ded­i­cat­ed vec­tor data­base is the bet­ter choice

I won't pre­tend there are no cas­es for Pinecone, Weav­i­ate or Qdrant. There are. At tens of mil­lions of vec­tors with high, sus­tained query load, spe­cialised sys­tems scale bet­ter and of­fer fin­er tun­ing knobs, hor­i­zon­tal shard­ing mod­els and op­er­at­ing modes that a gen­er­al-pur­pose data­base does­n't ship with. Any­one run­ning a pub­lic search en­gine over hun­dreds of mil­lions of em­bed­dings does not build on pgvec­tor. That would be the wrong tool, and I would­n't rec­om­mend it to any­one.

The hon­est rule of thumb from com­par­i­son analy­ses — ex­plic­it­ly a rule of thumb, not a ven­dor-guar­an­teed thresh­old: be­low rough­ly five to ten mil­lion vec­tors at mod­er­ate query load, pgvec­tor is the prag­mat­ic de­fault when Post­greSQL is al­ready in the stack. ACID trans­ac­tions, co-lo­cat­ed data, one com­po­nent few­er, the cheap­est op­tion. Above that, un­der se­ri­ous load, a ded­i­cat­ed vec­tor data­base be­longs in the re­quire­ments.

The point is the dis­tri­b­u­tion of re­al­i­ty. The over­whelm­ing ma­jor­i­ty of en­ter­prise use cas­es — in­ter­nal knowl­edge search over tens of thou­sands of Con­flu­ence pages, the prod­uct cat­a­logue with a few hun­dred thou­sand items, the sup­port bot ground­ed in the docs — sit or­ders of mag­ni­tude be­low that thresh­old. For them a ded­i­cat­ed vec­tor data­base is not an ad­van­tage but over­sized in­fra­struc­ture that some­one has to op­er­ate, pay for and an­swer to the ven­dor for. You'd be buy­ing com­plex­i­ty for a scal­ing prob­lem you'll nev­er have.

The un­com­fort­able part: pgvec­tor has real, phys­i­cal lim­its

Now the part where I hon­est­ly com­pli­cate my own the­sis. pgvec­tor is good, but it is­n't a mag­ic trick, and the lim­its are real.

The core lim­i­ta­tion is phys­i­cal: the HNSW in­dex has to fit in mem­o­ry. As long as it does, pgvec­tor de­liv­ers im­pres­sive num­bers. Once the in­dex grows larg­er than shared_buffers, query per­for­mance col­laps­es non-lin­ear­ly — and bru­tal­ly. The num­bers are there to read in the doc­u­ment­ed per­for­mance is­sue #700 of the pgvec­tor project: on a ma­chine with 16 GB shared_buffers, the in­dex served around 2,110 queries per sec­ond at 2 mil­lion vec­tors. At 3 mil­lion it was 102. At 5 mil­lion, bare­ly 13. That is not a gen­tle plateau, that is a cliff.
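You can estimate where your own cliff sits before you hit it. The sketch below uses pgvector's documented storage cost of 4 bytes per dimension plus 8 bytes per vector, and multiplies it by an overhead factor for the graph structure. That factor, the 1536 dimensions and the function names are assumptions for illustration; none of them is taken from the issue cited above, and the real index size should be read from the database once it exists.

```typescript
// Back-of-the-envelope memory estimate for an HNSW index.
// overheadFactor is an assumed allowance for graph links, not a measured value.

const GIB = 1024 * 1024 * 1024;

// Raw vector storage: 4 bytes per dimension plus 8 bytes per vector.
function rawVectorBytes(vectorCount: number, dimensions: number): number {
  return vectorCount * (dimensions * 4 + 8);
}

function estimateIndexBytes(
  vectorCount: number,
  dimensions: number,
  overheadFactor: number = 1.3,
): number {
  return rawVectorBytes(vectorCount, dimensions) * overheadFactor;
}

function fitsInMemory(estimatedBytes: number, memoryBudgetBytes: number): boolean {
  return estimatedBytes <= memoryBudgetBytes;
}

// Example with assumed 1536-dimensional embeddings and a 16 GiB budget.
function describe(vectorCount: number): string {
  const bytes = estimateIndexBytes(vectorCount, 1536);
  const verdict = fitsInMemory(bytes, 16 * GIB) ? "fits" : "does not fit";
  return `${vectorCount} vectors: ~${(bytes / GIB).toFixed(1)} GiB, ${verdict}`;
}
```

Under these assumptions two million vectors come in just under the 16 GiB budget and three million do not, even before any graph overhead is counted. The exact crossover depends on your dimensions and index parameters, but the shape of the calculation is the same: it is multiplication, and you can do it on day one.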

The same goes for the in­dex build. HNSW only builds fast while the graph fits in main­te­nance_work_mem — and Post­greSQL's 64 MB de­fault is too small for se­ri­ous builds. Ex­ceed it and the build falls back to a disk-spill path that sec­ondary sources put at ten to fifty times slow­er. At very large vec­tor vol­umes the in­dex build be­comes a gen­uine cost fac­tor, and HNSW ad­di­tion­al­ly forces tun­ing be­tween re­call and la­ten­cy through pa­ra­me­ters like ef_search.
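Tuning ef_search only makes sense if you measure what it buys you. A common way is to compare the approximate result from the index with an exact result for the same query and compute recall at k. The sketch below shows that calculation; the function name is mine, and how you obtain the two id lists (for example one query with the index and one with a sequential scan) is left to your setup.

```typescript
// Recall at k: share of the true top-k neighbours that the
// approximate search also returned. Function name is hypothetical.

function recallAtK(
  exactIds: readonly number[],  // ids from an exact search, best first
  approxIds: readonly number[], // ids from the HNSW index, best first
  k: number,
): number {
  if (!Number.isInteger(k) || k < 1) {
    throw new Error("k must be a positive integer");
  }
  const truth = new Set(exactIds.slice(0, k));
  if (truth.size === 0) {
    throw new Error("exact result list is empty");
  }
  const returned = new Set(approxIds.slice(0, k));
  let hits = 0;
  truth.forEach((id) => {
    if (returned.has(id)) {
      hits += 1;
    }
  });
  return hits / truth.size;
}
```

Averaged over a sample of real queries, this gives you one number per ef_search setting to put next to the measured latency. A higher ef_search generally raises recall and latency together, so the pair of numbers is what you actually decide on.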

The con­se­quence is clear, not "it de­pends": any­one who needs hun­dreds of mil­lions of vec­tors with guar­an­teed sub-10-mil­lisec­ond la­ten­cy from day one should not pre­tend Post­gres is the an­swer. For them the right ar­chi­tec­ture is a ded­i­cat­ed vec­tor data­base, full stop. But that re­quire­ment pro­file is the ex­cep­tion, and it is in­tel­lec­tu­al­ly dis­hon­est to blud­geon a prag­mat­ic de­fault de­ci­sion with an ex­treme case you'll prob­a­bly nev­er reach. The right ques­tion for your own project is not "what if we be­came Google" but "how many vec­tors will we re­al­is­ti­cal­ly have in three years". The team usu­al­ly knows that num­ber. It al­most al­ways sits in the green.

The data pro­tec­tion di­men­sion no­body can mod­er­ate away

An AI backend often processes exactly the data that is sensitive from a regulatory standpoint: customer documents, internal knowledge bases, support histories. Supabase lets you choose an EU region per project, Frankfurt for instance, and the entire stack is self-hostable via Docker. That is the good news for data sovereignty.

The un­com­fort­able nu­ance re­mains: Su­pabase Inc. is a US-in­cor­po­rat­ed com­pa­ny and can there­fore po­ten­tial­ly fall un­der the CLOUD Act, even with an EU stor­age lo­ca­tion. That is a le­gal clas­si­fi­ca­tion, not an in­fra­struc­ture lim­i­ta­tion — but for reg­u­lat­ed in­dus­tries it's a point for risk man­age­ment. Any­one need­ing max­i­mum con­trol runs the stack self-host­ed on their own EU in­fra­struc­ture, where the CLOUD Act at­tack vec­tor sim­ply dis­ap­pears. The same care ap­plies to em­bed­ding gen­er­a­tion it­self: which mod­el sees the data and where it runs is part of the same sov­er­eign­ty ques­tion — not every em­bed­ding has to be gen­er­at­ed at a US hy­per­scaler.

What I tell de­ci­sion-mak­ers

Treat the choice of AI back­end not as an AI ques­tion but as what it is: an op­er­at­ing-mod­el de­ci­sion. Every ad­di­tion­al com­po­nent in the stack is a con­tract, an op­er­a­tional bur­den, an in­ter­face that can break. pgvec­tor in­side a Post­greSQL you al­ready run is, for the vast ma­jor­i­ty of en­ter­prise use cas­es, the ar­chi­tec­ture with the fewest mov­ing parts — and there­fore the low­est op­er­a­tional risk and the most pre­dictable TCO. If you want to see how that log­ic adds up over three years, the fig­ures are in our look at what Su­pabase re­al­ly costs in pro­duc­tion, and the strate­gic pic­ture of Su­pabase as a Eu­ro­pean back­end plat­form sits on our Su­pabase overview page.

Con­sol­i­da­tion is the win, not the AI fea­ture. Any­one can build the fea­ture. Op­er­at­ing one com­po­nent few­er is the dif­fer­ence be­tween a sys­tem your team com­mands and one your team mere­ly ad­min­is­ters. Don't ask first which vec­tor data­base is best. Ask whether you need a sec­ond one at all. For most projects the hon­est an­swer is: no — not yet, and maybe nev­er.

Frequently asked questions

Do I really need a dedicated vector database for RAG and semantic search?
No. With pgvector you can run vector search directly in PostgreSQL. For most enterprise use cases, roughly up to five to ten million vectors at moderate query load, that is enough, and it saves you an entire infrastructure component. Dedicated vector databases like Pinecone, Weaviate or Qdrant only pay off at very large volumes under high load.

What are Supabase Edge Functions good for in an AI context, and what not?
Edge Functions are built for short-lived work: generating embeddings, handling webhooks, proxying external AI APIs without exposing keys to the client. They get 256 MB RAM and 2 seconds of CPU time per request (pure async I/O does not count against that). They are not meant for long, compute-heavy batch jobs or model training.

What is pgvector's most important scaling limit?
The HNSW index has to fit in memory. As long as it does, pgvector delivers high query throughput. Once the index grows larger than the available RAM, or shared_buffers, performance collapses non-linearly: documented drops run from over 2,000 to around 100 queries per second. That is a physical limit, not a licensing one.

Does my RAG use case stay GDPR-compliant on Supabase?
You can pin the region per project to an EU location such as Frankfurt, and the stack is self-hostable. What stays relevant under data protection law is that Supabase Inc. is US-incorporated and therefore potentially subject to the CLOUD Act. Anyone needing maximum data sovereignty should evaluate the self-hosted variant on their own EU infrastructure.
