Per-customer embedding tenancy, explained
How tenant isolation works at the vector level in PursuitAgent. Why we use Postgres row-level security on pgvector as the default, where shared embedding spaces would be cheaper, and the trade-offs we are not willing to make.
Multi-tenant SaaS over an embedding store is a problem with three reasonable solutions and one wrong one. The wrong one is shared embedding tables with application-layer filtering. We use Postgres row-level security on pgvector. This post explains why.
The threat model
Each customer’s KB is private. Customer A’s content blocks must not be retrievable by a query running for Customer B. The threat is not malicious tenant-to-tenant access — that is rare in practice — but accidental cross-tenant retrieval caused by a bug in the application code that filters search results.
The classic failure mode looks like this:
// Bad — application-layer filter
const matches = await db
  .select()
  .from(kbBlockEmbeddings)
  .orderBy(sql`embedding <-> ${queryEmbedding}`)
  .limit(20);

// Filter happens here
const filtered = matches.filter((m) => m.companyId === currentCompanyId);
The filter works until the day someone forgets to write it. Forgetting to filter is a one-line mistake that surfaces no test failures (the query still returns rows; they are just rows from the wrong tenant) and produces a customer-facing data leak. The architecture is wrong.
What we do instead
We push tenant isolation into Postgres via row-level security policies. The policy is enforced regardless of how the query is written. Forgetting to filter does not produce wrong results — the database refuses to return cross-tenant rows.
The policy:
ALTER TABLE kb_block_embeddings ENABLE ROW LEVEL SECURITY;

CREATE POLICY company_isolation ON kb_block_embeddings
  USING (company_id = current_setting('app.current_company_id')::uuid);
The application sets app.current_company_id at the top of every request's transaction via SET LOCAL (the exact pattern is below, in the connection-pool section). Any query that reads kb_block_embeddings sees only rows where company_id matches the request's company. Forgetting the filter is impossible — there is no filter to forget. The policy is the filter.
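One Postgres caveat worth flagging: policies do not bind the table's owner by default, and superusers and roles marked BYPASSRLS skip them entirely. The application role should be neither, and forcing RLS on the table closes the owner loophole:

-- Without FORCE, the table owner bypasses the policy entirely.
ALTER TABLE kb_block_embeddings FORCE ROW LEVEL SECURITY;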
Enforcement extends to pgvector queries. The policy applies during nearest-neighbor scans over the HNSW index on the embedding column: a query returns the closest embeddings within the tenant's row set, not the closest embeddings globally followed by a filter. That is the behavior we want: the filtering happens inside the database during the index scan, and the application never has to over-fetch a global top-k and re-rank.
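Concretely, a retrieval query carries no tenant predicate at all. A sketch (column names are illustrative):

-- No company_id predicate anywhere; the policy supplies it.
SELECT block_id, embedding <-> $1 AS distance
FROM kb_block_embeddings
ORDER BY embedding <-> $1
LIMIT 20;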
Why not shared embeddings with filtering
Shared embedding tables — one big embedding space, with company_id as a column — are cheaper to operate than per-tenant tables. The HNSW index is built once over all embeddings, retrieval is uniformly fast, and storage overhead is minimal.
We considered this design and rejected it for two reasons.
The failure mode is silent. As above, application-layer filter bugs do not produce errors. They produce wrong results. Wrong results that look right are the worst kind of bug, and the consequence in this domain is a tenant-isolation breach.
The HNSW index does not natively respect post-hoc filters. When you build an HNSW index over a million embeddings and then filter the top-k results by company_id, you can end up with fewer than k results from the tenant — because the top-k globally may include zero rows from the requesting tenant, even though the tenant has many relevant blocks. You then have to over-fetch and re-rank, which costs latency and complicates the query. RLS pushes the filter into the index traversal, which avoids the over-fetch problem.
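For the record, the workaround looks roughly like this. A sketch in the same Drizzle style as the earlier example; the 10x over-fetch factor is a guess, because no factor is right:

// Over-fetch a global top-k, then filter in the application.
const candidates = await db
  .select()
  .from(kbBlockEmbeddings)
  .orderBy(sql`embedding <-> ${queryEmbedding}`)
  .limit(20 * 10);

// Hope enough tenant rows survived. Still no guarantee of 20 rows:
// the global top-200 may hold fewer than 20 rows for this tenant.
const topK = candidates
  .filter((m) => m.companyId === currentCompanyId)
  .slice(0, 20);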
There are pgvector and HNSW patterns that handle filtered queries efficiently — partition pruning, partial indexes — but they require operational discipline (every tenant gets an index partition; partitions get re-balanced as tenants grow) that we are not willing to take on for the marginal cost saving versus RLS.
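The partial-index flavor, for instance, looks roughly like this (index name and UUID are placeholders):

-- One index per tenant: workable, but every new tenant needs one,
-- and each gets revisited as the tenant's KB grows.
CREATE INDEX kb_emb_hnsw_acme ON kb_block_embeddings
  USING hnsw (embedding vector_l2_ops)
  WHERE company_id = '00000000-0000-0000-0000-000000000000';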
What about per-tenant databases
The other end of the spectrum: a separate Postgres database per customer. Strongest possible isolation, simplest possible mental model, and the most expensive operationally — every database needs its own connection pool, its own backup schedule, its own migration runner.
We do not do this either. The operational cost crosses the threshold of “engineer-hours per customer” too quickly, and the threat model RLS addresses is the same threat model per-database isolation addresses. RLS is cheaper to operate and equally safe, given that the policy is enforced at the database level rather than at the application level.
There are customers — typically large enterprise deployments with regulatory isolation requirements — where per-tenant databases or even per-tenant deployments are required. Our self-hosted deployment option exists for those cases. RLS is the default; per-tenant deployment is the upgrade path.
The connection-pool detail that matters
RLS only works if app.current_company_id is set correctly on every session, every time. Connection pooling complicates this. A connection returned to the pool by Customer A and re-used by Customer B will, by default, retain the previous session’s settings — which is a tenant-isolation leak.
We use SET LOCAL (transaction-scoped, not session-scoped) and wrap every request in a transaction that opens with the company-id setting and closes when the request finishes. The pattern:
await db.transaction(async (tx) => {
  // set_config(name, value, is_local = true) has SET LOCAL semantics
  // and, unlike a bare SET statement, accepts a bind parameter.
  await tx.execute(
    sql`SELECT set_config('app.current_company_id', ${companyId}, true)`
  );
  // ...all queries in this transaction respect the policy
  return await handler(tx);
});
The setting ends with the transaction, so when the connection returns to the pool, it is gone. The next request that picks up the connection sets its own company-id at the top of its transaction. There is no window in which a connection carries a stale setting.
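In practice this wants a wrapper, so the scoping step cannot be skipped or reordered. A minimal sketch; the helper name and the ./db module are hypothetical:

import { sql } from "drizzle-orm";
// Hypothetical module exporting the Drizzle client.
import { db } from "./db";

// The transaction client type, derived from db.
type TenantTx = Parameters<Parameters<typeof db.transaction>[0]>[0];

// Every tenant-scoped handler runs through this wrapper, so the
// set_config call is impossible to forget.
export function withCompanyScope<T>(
  companyId: string,
  handler: (tx: TenantTx) => Promise<T>
): Promise<T> {
  return db.transaction(async (tx) => {
    await tx.execute(
      sql`SELECT set_config('app.current_company_id', ${companyId}, true)`
    );
    return handler(tx);
  });
}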
This pattern has been stable for us. The lint rule we wrote — every request handler must open with the transaction-and-set-local pattern, enforced by a custom ESLint check — has caught two would-be regressions in the past year.
What this does not solve
RLS protects retrieval at the embedding-store layer. It does not protect against:
- Application bugs that join across tables incorrectly. RLS policies must be set on every table that holds tenant-scoped data. Missing a policy on a related table (KB block metadata, citation records, version history) is the equivalent of missing a filter. We have policies on every table; the discipline has to be maintained (a catalog audit, sketched after this list, helps keep it honest).
- Logs and traces that leak tenant data. A retrieval query that includes the embedding text in a debug log can leak content even when the query itself is policy-protected. We scrub tenant content from logs at the application layer; this is a separate concern from RLS.
- Backup and restore operations. Postgres backups include all rows; restoring into a misconfigured environment can expose data. Our backup pipeline encrypts at rest and restricts restore access by role. Again, separate from RLS, but a related concern.
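The audit for the first point can live in CI. A sketch, assuming tenant-scoped tables are exactly those with a company_id column:

-- Tables that hold tenant data but have RLS disabled.
-- Anything this returns is a missing policy.
SELECT c.relname AS table_name
FROM pg_class c
JOIN pg_attribute a
  ON a.attrelid = c.oid
 AND a.attname = 'company_id'
 AND NOT a.attisdropped
WHERE c.relkind = 'r'
  AND NOT c.relrowsecurity;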
The closing argument
The thing we are buying with RLS is robustness. The cost is some per-query overhead (the policy adds a predicate to every read of the table) and some operational discipline (every new tenant-scoped table needs a policy). The cost is small. The benefit — that an entire class of cross-tenant retrieval bugs becomes impossible to write — is large.
For multi-tenant SaaS over a vector store, the default should be database-level isolation, not application-level isolation. Postgres RLS is the cheapest way to get there. The fact that it composes correctly with pgvector’s HNSW indexes is what makes it the right default for this product.