Most AI projects don't fail at the model. They fail at the infrastructure around it — the part nobody wanted to operate and somebody ends up operating anyway.
When a team says "we're building RAG" today, a second database lands on the architecture whiteboard almost by reflex: a dedicated vector database next to the existing PostgreSQL. Pinecone, Weaviate, Qdrant — the names change, the pattern holds. With the second database comes the second backup strategy, the second monitoring setup, the second access-control model, and the sync job that's supposed to keep data consistent between the primary store and the vector store. That sync is exactly where the pager goes off at three in the morning.
Why Supabase as an AI backend is a consolidation decision
Supabase is, at its core, managed PostgreSQL with Auth, Storage, Realtime and Edge Functions layered on top. The piece that matters for AI is a PostgreSQL extension called pgvector. It adds a native data type for vectors and turns similarity search into a SQL operation. That makes "PostgreSQL as a vector database" a plain fact rather than a marketing slogan: the vector index lives in the same database as your business data.
This shifts the question. It is no longer "which vector database do we buy" but "do we need to buy one at all". For a company that wants to build AI features into its own product — semantic search across the knowledge base, a support assistant grounded in its own documents, product recommendations over embeddings — the answer, in most cases, is no.
The practical anchor is mundane, and that's precisely why it's strong. Standing up a RAG application on PostgreSQL you already run means no second database to introduce, operate, back up and synchronise. Fewer moving parts mean less operational risk. That is not an AI argument. It is an operations argument, and operations arguments are the ones that hold up in front of a CFO.
How pgvector actually solves RAG and semantic search
An embedding is a vector: a list of numbers that places the meaning of a text, image or product in space. Things that resemble each other sit close together. Semantic search is then nothing more than this: find the vectors nearest to the query vector. pgvector exposes the distance operators for that directly in SQL: cosine distance with <=>, L2 distance with <->, and negative inner product with <#>. Inner product is usually the cheapest of the three to compute, and for embeddings normalized to unit length it produces the same ranking as cosine distance.
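As a minimal sketch of what those three operators compute, here is the same arithmetic in plain Python. The function names and the toy three-dimensional vectors are illustrative assumptions, not part of pgvector. The point it shows: once vectors are normalized to unit length, all three distances produce the same ranking.

```python
import math

def l2_distance(a, b):
    """Euclidean distance, the quantity pgvector's <-> operator returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    """Negative dot product, the quantity pgvector's <#> operator returns."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def rank(query, docs, distance):
    """Return document keys ordered from nearest to farthest."""
    return sorted(docs, key=lambda key: distance(query, docs[key]))

# Toy "embeddings" (hypothetical values, three dimensions for readability).
docs = {
    "refund policy": normalize([0.9, 0.1, 0.2]),
    "shipping times": normalize([0.2, 0.9, 0.1]),
    "api rate limits": normalize([0.1, 0.2, 0.9]),
}
query = normalize([0.8, 0.3, 0.1])

by_cosine = rank(query, docs, cosine_distance)
by_l2 = rank(query, docs, l2_distance)
by_inner = rank(query, docs, negative_inner_product)
```

With unit-length inputs the choice of operator is therefore a performance decision rather than a relevance decision.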
The decisive point for architecture: this search runs as part of an ordinary SQL query. You can combine vector similarity with classic WHERE conditions — only this tenant's documents, only articles from this period, only records the logged-in user is allowed to see. In a separate vector database you maintain those metadata filters in parallel and hope both systems tell the same truth.
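To make the combination concrete, here is a small sketch that builds such a query as a parameterized SQL string. The table and column names (documents, tenant_id, published_at, embedding) are hypothetical, and the %s placeholder style is an assumption about the Postgres driver in use. Only the <=> operator and the ::vector cast come from pgvector itself.

```python
def build_filtered_search(tenant_id, since, query_embedding, limit=5):
    """Return (sql, params) for a tenant- and date-filtered similarity search.

    All identifiers in the SQL are hypothetical; adapt them to your schema.
    """
    # pgvector accepts a text literal such as '[0.1,0.2,0.3]' cast to vector.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    sql = (
        "SELECT id, title "
        "FROM documents "
        "WHERE tenant_id = %s AND published_at >= %s "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    # Parameters are passed separately so the driver handles quoting.
    return sql, (tenant_id, since, vector_literal, limit)

sql, params = build_filtered_search("acme", "2024-01-01", [0.1, 0.2], limit=3)
```

The metadata filter and the similarity ordering live in one statement, so there is no second system whose filters can drift out of sync.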
In Supabase you typically wrap RAG retrieval in a PostgreSQL function with match_threshold and match_count, because the API layer above it can't express vector distances directly. The Supabase documentation on vector columns lays this out cleanly. The side effect weighs more than the constraint: access control through Row Level Security applies automatically — to the search results too. A vector database without row-level authorisation logic forces you to rebuild exactly that security in the application layer. That is the kind of bespoke work a pentest tends to find.
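The real retrieval function is written in SQL inside the database. The following Python is only an in-memory emulation of its semantics, under stated assumptions: rows are dictionaries with hypothetical keys id, owner and embedding, and a simple owner check stands in for a Row Level Security policy. It shows how match_threshold and match_count shape the result, and that the visibility rule is applied before any similarity is computed.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_documents(rows, query_embedding, match_threshold, match_count, current_user):
    """Emulate a retrieval function guarded by a row-level policy."""
    # Stand-in for Row Level Security: other users' rows are never visible.
    visible = [row for row in rows if row["owner"] == current_user]
    scored = [
        (cosine_similarity(query_embedding, row["embedding"]), row["id"])
        for row in visible
    ]
    # Keep sufficiently similar rows, best first, capped at match_count.
    hits = sorted((pair for pair in scored if pair[0] > match_threshold), reverse=True)
    return [doc_id for _, doc_id in hits[:match_count]]

rows = [
    {"id": 1, "owner": "alice", "embedding": [1.0, 0.0]},
    {"id": 2, "owner": "alice", "embedding": [0.8, 0.6]},
    {"id": 3, "owner": "bob", "embedding": [1.0, 0.0]},
    {"id": 4, "owner": "alice", "embedding": [0.0, 1.0]},
]

alice_hits = match_documents(rows, [1.0, 0.0], 0.5, 5, "alice")
bob_hits = match_documents(rows, [1.0, 0.0], 0.5, 5, "bob")
```

Row 3 is a perfect match for the query, yet it never appears in Alice's results. That is the behaviour the database enforces for you when the policy lives next to the data.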
For the index, pgvector gives you two options. HNSW builds a multi-layer graph, can be created on an empty table, and offers the better trade-off between speed and recall. IVFFlat builds faster and uses less memory, but has to be created after the table is populated because it trains centroids on the existing rows, and it delivers a weaker speed-recall trade-off. For most production setups HNSW is the choice. Supabase benchmarked HNSW against IVFFlat: at high accuracy HNSW came out more than six times ahead, and at very high recall pgvector's HNSW on the same compute even beat a dedicated vector database like Qdrant. Treat the absolute throughput numbers from secondary summaries as a rough indication only. The relevant finding is the relative multipliers, and those are clear.
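For orientation, the two index types differ only slightly in how they are declared. The sketch below generates the statements as strings. The table and column names are hypothetical; m = 16 and ef_construction = 64 are pgvector's defaults written out explicitly, and lists = 100 is an illustrative value that should be sized to the row count.

```python
def create_index_sql(method, table="documents", column="embedding"):
    """Return a CREATE INDEX statement for a cosine-distance vector index."""
    if method == "hnsw":
        # Graph parameters: links per node and candidate list size at build time.
        options = "WITH (m = 16, ef_construction = 64)"
    elif method == "ivfflat":
        # Number of centroids; only meaningful once the table holds data.
        options = "WITH (lists = 100)"
    else:
        raise ValueError(f"unsupported index method: {method}")
    return f"CREATE INDEX ON {table} USING {method} ({column} vector_cosine_ops) {options};"

hnsw_sql = create_index_sql("hnsw")
ivfflat_sql = create_index_sql("ivfflat")
```

The operator class (vector_cosine_ops here) has to match the distance operator used in queries, otherwise the planner will not use the index.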
What Edge Functions bring to the stack
Embeddings have to be generated somewhere, and the API key for the embedding model has no business sitting in a browser frontend. This is where Supabase Edge Functions come in: serverless, Deno-based functions running right next to the database. The typical AI path looks like this: a new record is inserted, a trigger puts it in a queue, and a scheduler calls an Edge Function that works through the queue in batches, requests each embedding from the model and writes it back as a vector. Supabase has standardised this pattern under the name Automatic Embeddings, using four extensions: pgvector for storage, pgmq for the queue, pg_net for asynchronous HTTP, and pg_cron for scheduling. That's not a hack, it's a documented path.
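The flow is easier to reason about when reduced to its moving parts. The sketch below is an in-memory model, not Supabase code: a deque stands in for the pgmq queue, a method call stands in for the pg_cron tick, and fake_embed is a hypothetical, deterministic placeholder for the real model call.

```python
from collections import deque

def fake_embed(texts):
    """Hypothetical stand-in for the embedding model call."""
    return [[float(len(text)), float(text.count(" "))] for text in texts]

class EmbeddingPipeline:
    def __init__(self, embed, batch_size=2):
        self.embed = embed
        self.batch_size = batch_size
        self.rows = {}        # row id -> {"content": str, "embedding": list or None}
        self.queue = deque()  # row ids still waiting for an embedding

    def insert(self, row_id, content):
        """Insert a row; the 'trigger' enqueues it for embedding."""
        self.rows[row_id] = {"content": content, "embedding": None}
        self.queue.append(row_id)

    def run_scheduled_batch(self):
        """One scheduler tick: embed up to batch_size queued rows and write back."""
        take = min(self.batch_size, len(self.queue))
        batch = [self.queue.popleft() for _ in range(take)]
        if not batch:
            return 0
        vectors = self.embed([self.rows[row_id]["content"] for row_id in batch])
        for row_id, vector in zip(batch, vectors):
            self.rows[row_id]["embedding"] = vector
        return len(batch)

pipeline = EmbeddingPipeline(fake_embed, batch_size=2)
pipeline.insert(1, "hello world")
pipeline.insert(2, "refund policy for enterprise plans")
pipeline.insert(3, "api limits")
first_tick = pipeline.run_scheduled_batch()
```

The write path never waits for the model: inserts return immediately and rows simply carry an empty embedding until a later tick fills it in.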
The thing not to do is overestimate Edge Functions. They are built for short-lived work, and the official limits say so plainly: 256 MB of RAM, 2 seconds of CPU time per request (pure asynchronous I/O does not count against it), a maximum of 150 to 400 seconds of wall-clock time depending on plan, and a 20 MB bundle size. Webhooks, embedding generation, an API proxy in front of OpenAI or a self-hosted model: all ideal. A long-running re-indexing job over several million documents does not belong in an Edge Function; it belongs in a separate worker. Respect that boundary and you get a clean tool. Ignore it and you build yourself timeouts you'll struggle to explain later.
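A quick back-of-the-envelope calculation shows where that boundary sits. The sketch assumes items are embedded one after another at a fixed latency and that only half of the wall-clock limit is used as working budget. Both the 0.25 seconds per item and the 0.5 safety factor are hypothetical figures for illustration, not measurements.

```python
import math

def plan_invocations(total_items, seconds_per_item, wall_clock_limit_s, safety_factor=0.5):
    """Return (items per invocation, invocations needed) under a wall-clock limit."""
    budget_s = wall_clock_limit_s * safety_factor
    batch_size = int(budget_s // seconds_per_item)
    if batch_size < 1:
        raise ValueError("a single item does not fit into the time budget")
    return batch_size, math.ceil(total_items / batch_size)

# Three million documents at the lower and upper wall-clock limits.
low_plan = plan_invocations(3_000_000, 0.25, 150)
high_plan = plan_invocations(3_000_000, 0.25, 400)
```

Under these assumptions a full re-index needs thousands of chained invocations on either plan, which is an orchestration problem a plain worker process does not have.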
When a dedicated vector database is the better choice
I won't pretend there are no cases for Pinecone, Weaviate or Qdrant. There are. At tens of millions of vectors with high, sustained query load, specialised systems scale better and offer finer tuning knobs, horizontal sharding models and operating modes that a general-purpose database doesn't ship with. Anyone running a public search engine over hundreds of millions of embeddings does not build on pgvector. That would be the wrong tool, and I wouldn't recommend it to anyone.
The honest rule of thumb from comparison analyses — explicitly a rule of thumb, not a vendor-guaranteed threshold: below roughly five to ten million vectors at moderate query load, pgvector is the pragmatic default when PostgreSQL is already in the stack. ACID transactions, co-located data, one component fewer, the cheapest option. Above that, under serious load, a dedicated vector database belongs in the requirements.
The point is the distribution of reality. The overwhelming majority of enterprise use cases (internal knowledge search over tens of thousands of Confluence pages, the product catalogue with a few hundred thousand items, the support bot grounded in the docs) sit orders of magnitude below that threshold. For them a dedicated vector database is not an advantage but oversized infrastructure that someone has to operate, pay for and manage a vendor relationship for. You'd be buying complexity for a scaling problem you'll never have.
The uncomfortable part: pgvector has real, physical limits
Now the part where I honestly complicate my own thesis. pgvector is good, but it isn't a magic trick, and the limits are real.
The core limitation is physical: the HNSW index has to fit in memory. As long as it does, pgvector delivers impressive numbers. Once the index grows larger than shared_buffers, query performance collapses non-linearly — and brutally. The numbers are there to read in the documented performance issue #700 of the pgvector project: on a machine with 16 GB shared_buffers, the index served around 2,110 queries per second at 2 million vectors. At 3 million it was 102. At 5 million, barely 13. That is not a gentle plateau, that is a cliff.
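A rough lower bound on the memory involved can be computed by hand. pgvector stores each vector as 4 bytes per dimension plus 8 bytes of header; the HNSW graph links and page overhead come on top, so the real index is larger than this estimate. The 1536 dimensions used below are an assumption for illustration, since the dimensionality behind the quoted measurements is not stated here.

```python
GIB = 1024 ** 3

def vector_bytes(dimensions):
    """Storage for one pgvector value: 4 bytes per dimension plus 8 bytes of header."""
    return 4 * dimensions + 8

def raw_vector_footprint(num_vectors, dimensions):
    """Lower bound in bytes: vector payload only, no graph links or page overhead."""
    return num_vectors * vector_bytes(dimensions)

def fits_in_buffers(num_vectors, dimensions, shared_buffers_bytes):
    """True if even the raw vectors alone fit into shared_buffers."""
    return raw_vector_footprint(num_vectors, dimensions) <= shared_buffers_bytes

shared_buffers = 16 * GIB
for count in (2_000_000, 3_000_000, 5_000_000):
    gib = raw_vector_footprint(count, 1536) / GIB
    verdict = "fits" if fits_in_buffers(count, 1536, shared_buffers) else "does not fit"
    print(f"{count:>9} vectors: {gib:5.1f} GiB raw, {verdict} in 16 GiB")
```

Running the same arithmetic with your own vector count and dimensionality is the cheapest capacity test available before any benchmark.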
The same goes for the index build. HNSW only builds fast while the graph fits in maintenance_work_mem — and PostgreSQL's 64 MB default is too small for serious builds. Exceed it and the build falls back to a disk-spill path that secondary sources put at ten to fifty times slower. At very large vector volumes the index build becomes a genuine cost factor, and HNSW additionally forces tuning between recall and latency through parameters like ef_search.
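Both knobs are plain session settings. The sketch below only assembles the statements. The table and column names are hypothetical, and the values 8GB and 100 are illustrative assumptions; for reference, pgvector's default for hnsw.ef_search is 40.

```python
def hnsw_build_script(maintenance_work_mem="8GB", table="documents", column="embedding"):
    """Statements to run in one session so the graph build can stay in memory."""
    return [
        # Raised only for this session; size it to the expected graph.
        f"SET maintenance_work_mem = '{maintenance_work_mem}';",
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops);",
    ]

def hnsw_query_setting(ef_search=100):
    """Higher ef_search raises recall at the cost of query latency."""
    return f"SET hnsw.ef_search = {ef_search};"

build_statements = hnsw_build_script()
query_statement = hnsw_query_setting()
```

Because ef_search is set per session or per transaction, different query paths can trade recall against latency without rebuilding the index.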
The consequence is clear, not "it depends": anyone who needs hundreds of millions of vectors with guaranteed sub-10-millisecond latency from day one should not pretend Postgres is the answer. For them the right architecture is a dedicated vector database, full stop. But that requirement profile is the exception, and it is intellectually dishonest to bludgeon a pragmatic default decision with an extreme case you'll probably never reach. The right question for your own project is not "what if we became Google" but "how many vectors will we realistically have in three years". The team usually knows that number. It almost always sits in the green.
The data protection dimension nobody can talk away
An AI backend often processes exactly the data that is most sensitive from a regulatory standpoint: customer documents, internal knowledge bases, support histories. Supabase lets you choose an EU region per project, Frankfurt for instance, and the entire stack is self-hostable via Docker. That is the good news for data sovereignty.
The uncomfortable nuance remains: Supabase Inc. is a US-incorporated company and can therefore potentially fall under the CLOUD Act, even with an EU storage location. That is a legal classification, not an infrastructure limitation, but for regulated industries it is a point for risk management. Anyone needing maximum control runs the stack self-hosted on their own EU infrastructure, where that CLOUD Act exposure no longer applies. The same care applies to embedding generation itself: which model sees the data and where it runs is part of the same sovereignty question, and not every embedding has to be generated at a US hyperscaler.
What I tell decision-makers
Treat the choice of AI backend not as an AI question but as what it is: an operating-model decision. Every additional component in the stack is a contract, an operational burden, an interface that can break. pgvector inside a PostgreSQL you already run is, for the vast majority of enterprise use cases, the architecture with the fewest moving parts — and therefore the lowest operational risk and the most predictable TCO. If you want to see how that logic adds up over three years, the figures are in our look at what Supabase really costs in production, and the strategic picture of Supabase as a European backend platform sits on our Supabase overview page.
Consolidation is the win, not the AI feature. Anyone can build the feature. Operating one component fewer is the difference between a system your team commands and one your team merely administers. Don't ask first which vector database is best. Ask whether you need a second one at all. For most projects the honest answer is: no — not yet, and maybe never.
