Engineering the Evidence Layer

In the previous article in this series, I wrote about pragmatic AI engineering and how reviewing model-generated code is the new bottleneck. We focused on tiered intelligence and token mirrors to manage developer cognitive load. But as I spent my evenings applying these ideas to build out my personal project portfolio, I ran into a deeper surprise.

When I started building AI applications this year, I thought the hardest part would be getting the models to produce good answers.

I was wrong.

Over the last few months, I spent my evenings toggling between four separate codebase directories in my editor: plotsense/ (a title verification system for property documents), intraday-lab/ (a research platform for market backtesting), hungergames/ (a team lunch preference calculator), and cartsense/ (a credit card rewards optimizer).

Each codebase lived in its own folder. Each had different data inputs, different schemas, and entirely different domains. But as I switched between these workspaces, I was hit by a strange feeling of repetition.

I wasn't repeating prompts, model choices, or vector indices. In fact, once I had a basic frontier-model prompt or OCR extraction loop working, I rarely edited it. It just worked.

The AI call was a commodity finished in an hour. The next three weeks were spent building a custom application around it.

That was the moment my mental model shifted. I've stopped thinking of AI as the decision maker. It is just another source of evidence. My real job has become engineering the system that decides what evidence to trust.

Defining the Evidence Layer

For decades, traditional software architecture followed a predictable path: raw data was processed by deterministic business logic and handed to a user interface.

AI changes this flow. It introduces a probabilistic component into the core stack. To make these systems safe and useful, we have to insert an entirely new architectural layer: The Evidence Layer. Its only job is to convert probabilistic outputs into deterministic decisions.
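A minimal sketch of that conversion, under stated assumptions: the `Evidence`, `Decision`, and `Check` names, the 0.9 confidence threshold, and the survey-number format are all hypothetical and not taken from any of the projects. The point is only the shape: model output travels as data with a confidence score, and deterministic checks alone turn it into a decision.

```typescript
// Hypothetical types for illustration; not from any of the projects.
interface Evidence<T> {
  value: T;            // what the model extracted or classified
  confidence: number;  // model-reported score in [0, 1]
  source: string;      // where it came from, e.g. a document id
}

type Decision<T> =
  | { status: "accepted"; value: T }
  | { status: "rejected"; reasons: string[] };

// A check returns null when it passes, or a readable reason when it fails.
type Check<T> = (evidence: Evidence<T>) => string | null;

// Run every check and collect all failures, so a rejection can be explained.
function decide<T>(evidence: Evidence<T>, checks: Check<T>[]): Decision<T> {
  const reasons = checks
    .map((check) => check(evidence))
    .filter((reason): reason is string => reason !== null);
  return reasons.length === 0
    ? { status: "accepted", value: evidence.value }
    : { status: "rejected", reasons };
}

// Assumed threshold and format, chosen only for the example.
const minConfidence: Check<string> = (e) =>
  e.confidence >= 0.9 ? null : `confidence ${e.confidence} below 0.9`;
const surveyFormat: Check<string> = (e) =>
  /^\d+\/\d+[A-Z]?$/.test(e.value) ? null : `malformed survey number "${e.value}"`;

const result = decide(
  { value: "112/3A", confidence: 0.97, source: "deed-2009.pdf" },
  [minConfidence, surveyFormat],
);
console.log(result.status);
```

Collecting every failing reason, rather than stopping at the first, is a design choice here: it keeps a rejection explainable to whoever has to act on it.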

[Interactive figure: Stack Blueprint. Raw data matures into decisions across five stages: Input (raw) → AI Ingestion (probabilistic) → Validation (deterministic) → Rules (deterministic) → Decision (deterministic).]

Selected stage: Lineage & Structure Validation (deterministic). Validates schema formats, checks boundaries, and verifies that document survey numbers match prior historical deeds. Validation time: < 2 ms. Method: deterministic match.

if (extracted.survey !== expected.survey) {
  throw new TitleDefectError("Survey drift detected");
}

[Figure: Traditional vs Modern AI Stack Comparison]

The Primary Proof: Property Chain Verification

To see why this layer is necessary, take the property title verification system in plotsense/.

OCR can extract text from an Encumbrance Certificate, and an LLM can summarize the transactions. But neither tells you whether it's actually safe to buy the land. The AI output, no matter how accurate, is merely evidence.

The real engineering work is the validation code that sits above the model's output to verify the chain of custody. It must answer questions that the model cannot:

Can I trust the data? Does this specific document belong to the same property boundary, or has the survey number mutated?
Can I verify the result? Is there a missing transaction link in the 30-year ownership history?
Can I explain the reasoning? If the system warns of a title defect, can it pinpoint the exact year and deed where the continuity broke?
Can I maintain this system? Will this validation hold up when bank formats drift or model versions update?

Extracting the names and dates is a commodity task handled by the model. Resolving the chain of custody is a deterministic software problem.
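One way to picture the chain-of-custody check, as a sketch under assumptions: the `Deed` shape, its field names, and the rule that each seller must be the previous buyer are illustrative, not the actual plotsense/ schema. Real deeds would also need name normalization, which exact string comparison skips.

```typescript
// Hypothetical deed record, as it might come out of the extraction step.
interface Deed {
  year: number;
  seller: string;
  buyer: string;
  survey: string;
}

interface ChainBreak {
  year: number;    // year of the deed where continuity failed
  reason: string;  // readable explanation for the reviewer
}

// Walk the deeds in date order and report every place the chain breaks.
function findChainBreaks(deeds: Deed[]): ChainBreak[] {
  const sorted = [...deeds].sort((a, b) => a.year - b.year);
  const breaks: ChainBreak[] = [];
  for (let i = 1; i < sorted.length; i++) {
    const prev = sorted[i - 1];
    const curr = sorted[i];
    if (curr.seller !== prev.buyer) {
      breaks.push({
        year: curr.year,
        reason: `seller "${curr.seller}" is not the prior buyer "${prev.buyer}"`,
      });
    }
    if (curr.survey !== prev.survey) {
      breaks.push({
        year: curr.year,
        reason: `survey changed from ${prev.survey} to ${curr.survey}`,
      });
    }
  }
  return breaks;
}

const chain: Deed[] = [
  { year: 1996, seller: "A", buyer: "B", survey: "112/3" },
  { year: 2008, seller: "B", buyer: "C", survey: "112/3" },
  { year: 2019, seller: "D", buyer: "E", survey: "112/3" }, // the C-to-D link is missing
];
const chainBreaks = findChainBreaks(chain);
console.log(chainBreaks);
```

Because each break carries the year and a reason, the same function that verifies the result also explains it, which covers two of the four questions above with one deterministic pass.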

The Pattern Repeats

The division between probabilistic evidence and deterministic decisions repeated across every project I built:

Domain	AI gives me (Evidence)	The Evidence Layer (The Real Work)
Property (PlotSense)	Structured document entities	Title chain continuity validation
Finance (CartSense)	Raw merchant name & amount	Reward optimization & refund match rules
Restaurant (Lunch Hub)	Dish reviews & classifications	Team diet compatibility matching
Markets (Strategy Lab)	Stock news & technical indicators	Expectancy simulator & risk safety rails

In the rewards optimizer, the AI can classify a transaction from a receipt. That is evidence. The application must still run the deterministic rules that map the merchant to your credit card reward matrices and check whether the transaction pairs with a refund within a 7-day window.
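The refund rule can be sketched as follows. The `Txn` shape, storing amounts in cents, and reading the window as "refund dated at most 7 days after the purchase" are assumptions made for illustration, not the cartsense/ implementation.

```typescript
// Hypothetical transaction shape; field names are assumptions.
interface Txn {
  id: string;
  merchant: string;     // merchant name as classified by the model
  amountCents: number;  // positive for purchases, negative for refunds
  date: string;         // ISO date, e.g. "2024-03-01"
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;
const REFUND_WINDOW_DAYS = 7;

// Find the purchase a refund cancels: same merchant, same amount, and the
// refund dated between 0 and 7 days after the purchase.
function matchRefund(refund: Txn, purchases: Txn[]): Txn | undefined {
  const refundTime = Date.parse(refund.date);
  return purchases.find((p) => {
    const daysAfter = (refundTime - Date.parse(p.date)) / MS_PER_DAY;
    return (
      p.merchant === refund.merchant &&
      p.amountCents === -refund.amountCents &&
      daysAfter >= 0 &&
      daysAfter <= REFUND_WINDOW_DAYS
    );
  });
}

const purchases: Txn[] = [
  { id: "p1", merchant: "Acme", amountCents: 4999, date: "2024-03-01" },
  { id: "p2", merchant: "Acme", amountCents: 4999, date: "2024-03-20" },
];
const refund: Txn = { id: "r1", merchant: "Acme", amountCents: -4999, date: "2024-03-06" };
const lateRefund: Txn = { id: "r2", merchant: "Acme", amountCents: -4999, date: "2024-03-10" };

console.log(matchRefund(refund, purchases)?.id);
console.log(matchRefund(lateRefund, purchases)?.id);
```

Here the first refund pairs with `p1` (five days later), while the second finds no match: it is nine days after `p1` and predates `p2`.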

In the market lab, the AI can summarize stock sentiment. That is evidence. The application must still run the backtest simulations, calculate expectancy, and enforce the hardcoded risk safeguards that limit order sizes.
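A sketch of what those two deterministic pieces might look like. The 100-share cap and the "no positive expectancy, no order" rule are hypothetical values chosen for the example, not the intraday-lab/ safeguards. Expectancy uses the common per-trade definition: win rate times average win, minus loss rate times average loss.

```typescript
// Hypothetical hard limit, for illustration only.
const MAX_ORDER_SHARES = 100;

const sum = (xs: number[]): number => xs.reduce((acc, x) => acc + x, 0);

// Expectancy per trade: winRate * avgWin - lossRate * avgLoss.
function expectancy(pnls: number[]): number {
  if (pnls.length === 0) return 0;
  const wins = pnls.filter((p) => p > 0);
  const losses = pnls.filter((p) => p < 0);
  const avgWin = wins.length > 0 ? sum(wins) / wins.length : 0;
  const avgLoss = losses.length > 0 ? -sum(losses) / losses.length : 0;
  return (wins.length / pnls.length) * avgWin - (losses.length / pnls.length) * avgLoss;
}

// Deterministic rail: whatever size is requested, the cap always wins,
// and a strategy without positive expectancy gets no order at all.
function approveOrderSize(requestedShares: number, backtestPnls: number[]): number {
  if (expectancy(backtestPnls) <= 0) return 0;
  const whole = Math.max(0, Math.floor(requestedShares));
  return Math.min(whole, MAX_ORDER_SHARES);
}

const backtest = [100, -50, 100, -50];
console.log(expectancy(backtest));            // 25
console.log(approveOrderSize(500, backtest)); // 100
```

Nothing in `approveOrderSize` consults the model. Sentiment can inform which strategy gets backtested, but the size limit is plain arithmetic that a summary cannot talk its way past.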

In every case, the AI did not make the decision. It merely provided structured, high-signal evidence to a system designed to verify it.

The Shift in AI Engineering

This realization has changed how I measure my work.

A year ago, I measured progress by how good my prompts were. Today, I measure progress by how much confidence my system gives the person making the decision.

I suspect that's the direction AI engineering is heading.
