The Semantic Layer
Why we need to build systems that think in concepts, not tables
We are building data systems wrong.
But that is only the local version of the problem.
The deeper problem is that we are living wrong in the same way.
We keep mistaking the surface for the thing.
In data, the surface is the schema: table names, column headers, foreign keys, brittle little labels we pretend are reality. We treat them like essence. We build dashboards, agents, decision systems, and governance around them. Then we act surprised when a rename breaks everything, when a join silently corrupts truth, when an AI hallucinates a table that does not exist and everyone nods along because it sounds right.
But the schema is not the world. The schema is the story we tell ourselves about the world.
And stories are fragile.
A flight event is still a flight event whether it is stored as event_time, timestamp, or t. A customer is still a customer whether the key is customer_id, cust_num, or a UUID buried in JSON. The meaning is constant. Only the costume changes.
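If you want that in code, here is a deliberately tiny sketch in Python. Every name in it is invented, and a real semantic layer would learn this map rather than hand-write it. But the shape of the idea is the same: many spellings, one concept.

```python
# A minimal sketch, with hypothetical names: one concept, many surface
# spellings. Nothing here is a real library API.

CONCEPT_ALIASES = {
    "event_timestamp": {"event_time", "timestamp", "t", "ts", "occurred_at"},
    "customer": {"customer_id", "cust_num", "customer_uuid", "cust_id"},
}

def resolve_concept(column_name: str) -> str | None:
    """Map a physical column name back to the concept it encodes."""
    for concept, aliases in CONCEPT_ALIASES.items():
        if column_name in aliases:
            return concept
    return None  # unknown surface form: flag it, do not guess

# The costume changes; the resolution does not.
assert resolve_concept("event_time") == resolve_concept("t") == "event_timestamp"
```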
This should be obvious. And yet most of our systems cannot see it.
That is not a tooling failure. That is a metaphysical failure.
It is the same failure that shows up everywhere humans make judgments.
We judge people by their resumes. By their accents. By their clothes. By a single sentence taken out of context. We rank them, filter them, condemn them, reward them based on whichever surface is easiest to process. We do this because surfaces are cheap and meaning is expensive.
Meaning requires work.
Meaning requires evidence.
Meaning requires humility.
So we settle for form.
And then we build entire civilizations on top of it.
In data systems, the consequences are just broken pipelines and wrong metrics. In human systems, the consequences are reputations, livelihoods, justice, history.
The violence of the surface is that it pretends to be enough.
Schemas are a perfect parable for this.
Rename a column and everything breaks because the system never understood what it was looking at. It memorized labels. It learned rituals. It learned how to perform competence. It learned how to pass validation checks. But it never learned meaning.
This is why LLM agents hallucinate tables: they are fluent in appearances. They autocomplete the world. They generate plausible forms. And the world rewards plausibility, constantly, right up until the moment it matters.
The ToRR benchmark shows that LLM accuracy varies wildly with formatting changes that preserve semantics. TURL explicitly encodes headers, captions, and metadata as inputs, making it sensitive to surface-level variations. RelBench evaluates performance across diverse schemas, emphasizing that many relational encoders rely on specific surface structures.
This is why dashboards drift silently: because metrics are defined in terms of where they are stored, not what they are. And storage changes. Reality does not.
This is why foundation models break under harmless refactors: because they were trained on the shape of the cage, not the animal inside it.
A system that cannot distinguish essence from surface is not intelligent. It is merely trained.
So what would intelligence look like?
It would look like an insistence on concepts.
Not as a buzzword. As a discipline.
A concept is what remains when you strip away representation. A concept is what survives renaming. A concept is what still holds when the database migrates, the formats rotate, the table splits, the keys disappear, the pipeline changes hands.
In other words: a concept is what is real.
If you want an older language for this, the Rig Veda already said it: truth is one, the wise speak of it in many ways. The names vary; the thing does not. The column headers are the many names. The concept is the one.
And the Bible warns about the same mistake in different terms: the danger of worshipping what you can see. Idolatry is not only statues. Idolatry is any time you treat a representation as ultimate, any time you bow to the interface and forget what it points to.
The semantic layer is the refusal to worship the schema.
It is a layer that says: I do not care what you called it. Show me what it is.
It learns a concept map from structure, content, and usage because meaning leaves footprints: in how data relates, in what values look like, in what people ask it to do.
And then it exposes those concepts as the interface to everything downstream: to LLM agents, so they do not hallucinate join graphs based on names; to relational foundation models, so they do not depend on brittle foreign keys; to analytics, so metrics do not mutate when schemas refactor.
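What could that interface look like? Here is a minimal sketch, assuming the concept map already exists. All table and column names are hypothetical, and the hand-written dict stands in for what the layer would actually learn from structure, content, and usage.

```python
# Illustrative sketch: downstream code asks for concepts; the layer
# translates to whatever the current physical schema happens to be.

from dataclasses import dataclass

@dataclass
class Binding:
    table: str
    column: str

# This map is the thing the layer learns. It is hand-written here
# only so the example runs; every name is invented.
CONCEPT_MAP = {
    "customer": Binding("crm_customers_v3", "cust_num"),
    "order_total": Binding("sales_orders", "amt_usd"),
}

def query_by_concept(concept: str) -> str:
    """Compile a concept reference into SQL against today's schema."""
    b = CONCEPT_MAP[concept]
    return f"SELECT {b.column} FROM {b.table}"

# When the schema migrates, only CONCEPT_MAP changes. Callers do not.
print(query_by_concept("order_total"))  # SELECT amt_usd FROM sales_orders
```

The design choice worth noticing: the caller never touches a table name. Renames, splits, and migrations become edits to one map instead of a hunt through every dashboard and pipeline.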
In Hindu thought, there is a word for confusing the surface with the real: maya. Not fake in the childish sense, more like mis-taken. The world is not denied; it is misread. We grasp the label and miss the essence. We cling to the name and lose the thing.
The Upanishads offer a method: neti neti, not this, not that. Strip away what something is not, until you are left with what it is. That is exactly what a semantic layer does in engineering terms: it subtracts the accidental so the essential can remain.
And the Bible has its own version of this discipline: the difference between the letter and the spirit. The letter is the schema. The spirit is the meaning. A system that cannot move from letter to spirit will always be brittle, because letters are endlessly mutable.
The research confirms this works. Semantic map interfaces can materially reduce join and schema-linking errors in controlled studies. They improve cross-schema generalization and maintain accuracy under schema renames. They substantially reduce silent metric drift bugs observed in traditional BI systems. Relational foundation models supplied with concept-aligned subgraphs show improved predictive performance on entity classification, with more stable embeddings under schema drift. Models trained for semantic stability demonstrate strong invariance to syntactic changes while remaining sensitive to genuine semantic shifts.
But here is the part that matters beyond data:
A semantic layer is a moral technology.
Because anything that makes decisions is a judge. And judges who see only surfaces are dangerous.
A system that cannot tell the difference between syntactic change and semantic drift cannot be trusted with authority, whether that authority is approving a loan, flagging a safety incident, or shaping what a human believes.
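How might a system even begin to tell those apart? One crude heuristic, sketched below with invented helpers: fingerprint a column by the shape of its values instead of its name. A rename leaves the fingerprint intact; genuine drift does not. Real systems would use far richer signals, but the principle is the point.

```python
# Sketch of one heuristic (among many) for separating syntactic change
# from semantic drift: fingerprint a column by its values, not its name.

from collections import Counter

def fingerprint(values: list[str]) -> Counter:
    """Crude content fingerprint: distribution of coarse value shapes."""
    def shape(v: str) -> str:
        return "".join("9" if c.isdigit() else "a" if c.isalpha() else c
                       for c in v)
    return Counter(shape(v) for v in values)

def same_concept(old: list[str], new: list[str],
                 threshold: float = 0.9) -> bool:
    """Overlap of value-shape distributions, ignoring names entirely."""
    f_old, f_new = fingerprint(old), fingerprint(new)
    shared = sum((f_old & f_new).values())
    total = max(sum(f_old.values()), sum(f_new.values()))
    return total > 0 and shared / total >= threshold

# Renamed but identical content: syntactic change, same concept.
assert same_concept(["2024-01-05", "2024-02-11"], ["2024-03-09", "2024-04-21"])
# Same slot, different content: semantic drift, different concept.
assert not same_concept(["2024-01-05"], ["$19.99"])
```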
We like to imagine judgment as rational. But most judgment is pattern recognition over appearances. We call it common sense when it is really just cached heuristics.
This is why the semantic layer is bigger than databases.
It is a theory of knowledge.
It is the claim that truth is not a label. It is the claim that meaning is not where you stored it. It is the claim that reality is not the interface you were handed.
Religious texts keep returning to this because civilizations keep forgetting it.
The Tower of Babel is a story about what happens when language becomes a prison: when names stop being bridges and become walls. People cannot coordinate, not because reality changed, but because their symbols fractured. That is also what happens when your enterprise has fifteen customer tables and none of them agree on what a customer is.
And in the epics, the moral failures are rarely technical; they are perceptual. Characters fall because they misidentify what matters, because they confuse role with self, mask with face, status with substance. In the Bhagavad Gita, the battlefield is not only a place; it is a condition: the moment when you must see clearly enough to act rightly.
That is what we are asking our systems to do.
And once you see that in data systems, it becomes hard not to see it everywhere else.
How much of your life is a schema?
How many things do you treat as essence that are only conventions? How many beliefs are just inherited column names? How many relationships are maintained by mutual agreement to never rename anything?
How many times have you broken when the surface changed?
In aviation, the cost of confusing surfaces for meaning is measured in lives. Safety intelligence cannot be a prettier dashboard. It has to be epistemology made operational. Temporal reasoning works because events are semantically aligned, not just structurally similar. Agents can traverse the graph because they understand relationships between concepts, not just foreign keys. Systems remain stable because they reason about meaning, not structure.
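To make "semantically aligned" concrete, here is a toy sketch with two invented feeds. Each speaks its own dialect; the alignment step projects both onto shared concepts before any reasoning happens.

```python
# Hypothetical sketch: two event feeds with different field names,
# merged into one timeline by resolving each to shared concepts first.
# Feed shapes and field names are invented for illustration.

radar_feed = [{"t": "2025-01-18T16:00:00Z", "flight": "AA12"}]
acars_feed = [{"event_time": "2025-01-18T16:03:00Z", "tail": "AA12"}]

def align(record: dict, time_key: str, id_key: str) -> dict:
    """Project a source-specific record onto shared concepts."""
    return {"event_timestamp": record[time_key], "flight_id": record[id_key]}

timeline = sorted(
    [align(r, "t", "flight") for r in radar_feed]
    + [align(r, "event_time", "tail") for r in acars_feed],
    key=lambda e: e["event_timestamp"],
)
# Both feeds now speak the same language; temporal reasoning can begin.
```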
Every pipeline is a court. Every dashboard is a verdict. Every model is a sentencing guideline. The tragedy is that we keep appointing judges that cannot explain what evidence they used.
Schemas are shadows on the wall. We argue about the shadows. We optimize the shadows. We govern by shadows. Then reality changes and we call it unexpected.
The future of data systems is not better schema management.
It is moving beyond schemas entirely.
Not because schemas are bad, but because they are what they always were: a convenient illusion.
The next generation of systems will be judged by a simple property:
Do they understand meaning or do they only recognize form?
The same question applies to us.