Decision Support at Scale: How Structured Data, Modeling, and LLM Summaries Work Together
In the previous article, we focused on cTAKES and the engineering work required to make clinical NLP viable on large, multi‑page medical evidence documents. This article moves one step downstream, into Decision Support: the stage where most of the system’s modeling occurs and where summaries are generated.
But it’s important to understand that modeling is not confined to Decision Support. Modeling is used throughout the pipeline:
- To validate extracted text
- To detect malformed regions
- To identify OCR anomalies
- To classify document types
- To support region detection
- To drive Decision Support and summarization
Decision Support is simply the stage where modeling becomes the primary activity, and where the outputs directly support examiner workflows.
The role of Decision Support in the pipeline
Decision Support consumes:
- Structured data from cTAKES
- TF‑IDF features
- Region boundaries
- Encounter‑level signals
- Temporal anchors
- Supplemental retrieval from SOLR
And produces:
- Predictive model outputs
- Evidence‑grounded summaries
- Citation‑linked passages
- Examiner‑ready insights
This is the stage where extracted evidence becomes actionable.
Our current modeling approach: structured features + TF‑IDF + shallow neural networks
Before LLMs were viable at scale, our system used a hybrid modeling approach:
- Structured features extracted by cTAKES
- TF‑IDF n‑grams
- Shallow neural networks trained on these combined features
This architecture is:
- Fast
- Predictable
- Easy to maintain
- Scalable to national workloads
And it works extremely well — but it has a known limitation: explainability.
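The hybrid approach above can be sketched roughly as follows. This is a minimal illustration, not the production system: the documents, labels, and structured feature columns are all made up, and scikit-learn stands in for whatever training stack is actually used.

```python
# Illustrative sketch: combine structured (cTAKES-style) features with
# TF-IDF n-grams and train a shallow neural network on the combined vector.
# All data and feature names below are hypothetical.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

docs = [
    "patient reports chronic lower back pain after service injury",
    "routine follow-up, no complaints, vitals within normal limits",
    "MRI shows lumbar disc herniation consistent with prior trauma",
    "annual wellness visit, immunizations updated",
]
labels = [1, 0, 1, 0]  # 1 = relevant to the decision task (illustrative)

# Structured features, e.g. counts of extracted conditions and a trauma flag.
# Columns here are hypothetical, not the real schema.
structured = csr_matrix(np.array([
    [2, 1],
    [0, 0],
    [3, 1],
    [0, 0],
], dtype=float))

# TF-IDF over unigrams and bigrams.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(docs)

# Concatenate both feature blocks; a single hidden layer keeps the model shallow.
X = hstack([structured, tfidf])
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, labels)
print(clf.predict(X))
```

The key property is that the TF-IDF block and the structured block live in one feature vector, so the model can weigh lexical signals against extracted clinical facts.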
Explainability today: useful, but imperfect
Current explainability is generated by:
- Scoring cTAKES features against domain‑specific rules
- Mapping model outcomes to TF‑IDF terms
- Highlighting passages that contain those terms
- Displaying those passages as the “explanation”
This is effective, but it introduces a mismatch:
The model may be using different passages than the ones shown in the UI.
Because the model’s decision boundary is based on TF‑IDF vectors and structured features, the highlighted passages are a proxy, not a guarantee.
In a high‑stakes environment, proxies are not enough.
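The proxy mechanism described above can be sketched as follows. The term weights, passages, and `**…**` highlight markers are all illustrative; this is a toy version of the idea, not the production code.

```python
# Illustrative sketch of the proxy explanation: take the TF-IDF terms that
# contributed most to a model outcome, then highlight and rank the passages
# that contain them. Weights and passages below are made up.
import re

top_terms = {"lumbar": 0.42, "herniation": 0.37, "trauma": 0.21}

passages = [
    "Routine follow-up with no new complaints.",
    "MRI shows lumbar disc herniation consistent with prior trauma.",
    "Immunizations updated at annual wellness visit.",
]

def highlight(passage, terms):
    """Score a passage by summed term weights; wrap matched terms in markers."""
    score = 0.0
    marked = passage
    for term, weight in terms.items():
        if re.search(rf"\b{re.escape(term)}\b", marked, re.IGNORECASE):
            score += weight
            marked = re.sub(rf"\b({re.escape(term)})\b", r"**\1**", marked,
                            flags=re.IGNORECASE)
    return score, marked

ranked = sorted((highlight(p, top_terms) for p in passages), reverse=True)
best_score, best_passage = ranked[0]
print(best_passage)
```

Note what this does not do: it never consults the model's actual decision path, which is exactly the mismatch described above.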
Why LLMs change the explainability model
Modern LLMs allow us to shift from:
Model → Explanation
to
Evidence → Summary → Explanation
Instead of trying to infer what the model used, we can now:
- Identify the right evidence using structured data
- Feed only that evidence to the LLM
- Generate a grounded summary
- Use the summary itself as the explanation
This eliminates the mismatch entirely.
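One way to picture the Evidence → Summary → Explanation flow is prompt construction: the curated evidence is numbered so the LLM's summary can cite exactly the passages it was given. The evidence records, IDs, and prompt wording here are hypothetical.

```python
# Illustrative sketch: build a grounded prompt from pre-selected evidence so
# the generated summary can carry [n]-style citations back to real passages.
evidence = [
    {"id": "doc12-p4", "text": "MRI shows lumbar disc herniation."},
    {"id": "doc12-p9", "text": "Patient reports chronic back pain since 2014."},
]

def build_grounded_prompt(task, evidence):
    """Number each passage so the model can cite it as [n] in the summary."""
    lines = [
        f"Task: {task}",
        "Use ONLY the evidence below. Cite passages as [n].",
        "",
    ]
    for n, item in enumerate(evidence, start=1):
        lines.append(f"[{n}] ({item['id']}) {item['text']}")
    return "\n".join(lines)

prompt = build_grounded_prompt(
    "Summarize evidence of a chronic back condition", evidence)
print(prompt)
```

Because the summary can only cite passages that were in the prompt, the explanation shown to the examiner is, by construction, the evidence the summary used.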
Structured data is the foundation of scalable LLM summarization
The most important part of this transition is not the LLM — it’s the structured data extracted by cTAKES.
Structured data allows us to:
- Identify clinically relevant regions
- Filter out irrelevant text
- Anchor findings to encounters and timelines
- Score features against domain knowledge
- Select the correct passages for retrieval
This ensures the LLM sees only the evidence that matters, not the entire document.
Why this matters at 20M+ pages/day
- Lower token counts → lower cost and higher throughput
- Reduced hallucinations → because the LLM is grounded in curated evidence
- Better accuracy → because irrelevant text is excluded
- Predictable behavior → because the input is deterministic
Token count is not a theoretical concern — it is a cost and throughput constraint.
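A back-of-envelope calculation makes the constraint concrete. Every number below is hypothetical (tokens per page, curated fraction, and per-token cost are illustrative, not measured figures); the point is the order-of-magnitude gap between whole-document and curated-evidence inputs.

```python
# Back-of-envelope illustration (all numbers hypothetical): feeding whole
# documents to the LLM versus only curated evidence passages.
pages_per_day = 20_000_000
tokens_per_page = 600          # rough average for clinical text (assumed)
curated_fraction = 0.05        # structured selection keeps ~5% (assumed)
cost_per_million_tokens = 1.0  # illustrative dollar cost

full_tokens = pages_per_day * tokens_per_page
curated_tokens = int(full_tokens * curated_fraction)

full_cost = full_tokens / 1_000_000 * cost_per_million_tokens
curated_cost = curated_tokens / 1_000_000 * cost_per_million_tokens

print(f"full:    {full_tokens:,} tokens (${full_cost:,.0f}/day)")
print(f"curated: {curated_tokens:,} tokens (${curated_cost:,.0f}/day)")
```

Even with generous assumptions, curating input before generation cuts daily token volume by a factor of twenty.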
RAG driven by structured data — not blind retrieval
Our Retrieval‑Augmented Generation (RAG) pipeline is structured‑first:
- cTAKES extracts entities, regions, encounters, and temporal anchors
- We score these features against domain‑specific rules
- We identify the exact passages relevant to the Decision Support task
- Only those passages are fed to the LLM
This ensures the LLM is grounded in the same evidence the model uses.
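The structured-first selection step might look something like the sketch below. The rule weights, entity records, and CUI values are all hypothetical; the idea is simply that passages inherit scores from the extracted entities they contain.

```python
# Illustrative sketch of structured-first passage selection: score extracted
# entities against domain rules, then pick the passages holding the
# highest-scoring entities. Rules, CUIs, and entities are made up.
domain_rules = {
    "disorder": 3.0,     # extracted condition mentions weigh most
    "procedure": 2.0,
    "medication": 1.0,
}

entities = [
    {"cui": "C0000001", "type": "disorder",   "passage": "p4", "text": "low back pain"},
    {"cui": "C0000002", "type": "procedure",  "passage": "p4", "text": "lumbar MRI"},
    {"cui": "C0000003", "type": "medication", "passage": "p7", "text": "acetaminophen"},
]

def select_passages(entities, rules, top_k=1):
    """Sum rule weights per passage; return the top_k passage IDs."""
    scores = {}
    for e in entities:
        scores[e["passage"]] = scores.get(e["passage"], 0.0) + rules.get(e["type"], 0.0)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

print(select_passages(entities, domain_rules))
```

Because the selection is driven by extracted structure rather than free-text similarity, the passages handed to the LLM are the same ones the scoring logic actually used.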
Where SOLR fits: supplementing structure when needed
Structured data is powerful, but not always sufficient.
Some tasks require:
- Narrative context
- Rare or unusual phrasing
- Evidence that is difficult or impossible to extract in a structured form
For these cases, SOLR supplements structured data by retrieving:
- Nearby text spans
- Related passages
- Rare patterns not captured by cTAKES
SOLR is not the primary retrieval engine — it is the fallback and enhancer when structure alone cannot provide the full picture.
This hybrid approach ensures:
- High recall
- High precision
- Low token counts
- Strong grounding
- Minimal hallucination risk
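The fallback policy above can be sketched as a simple decision rule. The threshold is arbitrary and `solr_search` is a stub standing in for a real SOLR query (e.g. via a client library); both are assumptions for illustration only.

```python
# Illustrative sketch of the structure-first, SOLR-second retrieval policy.
# MIN_PASSAGES and solr_search are hypothetical stand-ins.
MIN_PASSAGES = 3  # fall back to SOLR if structure yields fewer than this

def solr_search(query, rows=5):
    # Stub: in production this would query the SOLR index for nearby spans
    # and rare phrasing that structured extraction missed.
    return [f"solr-hit-for:{query}"]

def retrieve(structured_passages, query):
    """Use structured passages first; top up from SOLR only when short."""
    results = list(structured_passages)
    if len(results) < MIN_PASSAGES:
        results.extend(solr_search(query, rows=MIN_PASSAGES - len(results)))
    return results

print(retrieve(["p4", "p9"], "lumbar herniation"))
```

When structure alone supplies enough evidence, SOLR is never consulted, which keeps token counts low and grounding tight.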
Decision Support: structured → scored → retrieved → summarized
The Decision Support stage looks like this:
- cTAKES extracts structured data
- Domain scoring identifies the most relevant features
- RAG selects passages using structure first, SOLR second
- The LLM receives only curated evidence
- The LLM produces a grounded, citation‑rich summary
- The summary becomes both the output and the explanation
This architecture is:
- Scalable
- Explainable
- Cost‑efficient
- Token‑efficient
- Auditable
- Future‑proof
Why this transition matters
This is not about replacing models with LLMs. It’s about:
- Improving explainability
- Eliminating mismatches between model logic and UI
- Reducing cost through token efficiency
- Ensuring every summary is grounded in real evidence
- Making the system more maintainable long‑term
- Preserving the strengths of structured extraction while adding LLM flexibility
LLMs are powerful — but structured data is what makes them safe, scalable, and affordable.