Scalability and Cost Efficiency: Delivering National‑Scale NLP on Commodity Hardware
When people hear that we process 20M+ pages per day, support 20,000+ users, and render documents that exceed 10,000 pages, they often assume the system must run on exotic hardware or a massive distributed cluster.
It doesn’t.
The real story is that national‑scale performance is achievable on commodity hardware, as long as the architecture is designed for predictable workloads, horizontal scaling, and disciplined operational controls. This article explains how we scale efficiently, keep costs flat, and maintain consistent performance even as volume grows.
The Core Principle: Scale Out, Not Up
Scaling “up” — buying bigger servers — is expensive, brittle, and unpredictable.
Scaling “out” — adding more modest servers — is:
- Cheaper
- More resilient
- Easier to replace
- Easier to tune
- Easier to observe
That last point surprises people, so let’s address it directly.
Why more servers actually make observability easier
When each component is small and independent, the signals become cleaner:
- A latency spike on Server‑7 is obvious
- A slowdown on Analyzer‑12 stands out
- A routing imbalance across NodeJS instances is immediately visible
Comparisons reveal patterns.
Patterns reveal problems.
Problems get fixed.
Instead of one giant, noisy system, you have many small, predictable systems — each producing clear, isolated metrics. Failures become local, not systemic. Drift shows up early. Bottlenecks are easier to pinpoint. And because our management console can pull stack traces from every thread on every server, more servers mean more comparison points and more isolation points, which means more diagnostic power, not less.
Horizontal scaling isn’t just a cost strategy — it’s an observability strategy.
Analyzers: Real‑Time Processing on Commodity Hardware
Our analyzers are the backbone of the backend. We run about 20 analyzers, each on commodity hardware, and each one:
- Retrieves work from a PostgreSQL‑backed queue
- Processes tasks in real time
- Uses multi‑threading to handle multiple tasks simultaneously
- Can be deployed on many small servers or fewer larger ones
Why the queue matters
The PostgreSQL queue ensures that:
- Load is distributed evenly
- No analyzer becomes overloaded
- Tasks are processed in predictable order
- Scaling analyzers is as simple as adding another server
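The claim semantics behind those guarantees can be sketched in a few lines. This is a minimal in-memory model, not our actual implementation; the table shape, column names, and analyzer names are illustrative. In PostgreSQL, the equivalent is a single query using row locking (e.g. `FOR UPDATE SKIP LOCKED`), which is what keeps the claim atomic and the load balanced:

```typescript
// In-memory stand-in for the PostgreSQL-backed queue table.
// Production would instead run something like:
//   UPDATE tasks SET claimed_by = $1
//   WHERE id = (SELECT id FROM tasks WHERE claimed_by IS NULL
//               ORDER BY enqueued_at FOR UPDATE SKIP LOCKED LIMIT 1)
//   RETURNING id;
// (illustrative schema, not the real one)

interface Task { id: number; claimedBy: string | null }

const queue: Task[] = Array.from({ length: 6 }, (_, i) => ({ id: i + 1, claimedBy: null }));

// Claim the oldest unclaimed task; returns null when the queue is drained.
function claimNext(analyzer: string): Task | null {
  const task = queue.find(t => t.claimedBy === null);
  if (!task) return null;
  task.claimedBy = analyzer; // atomic in PostgreSQL via row locking
  return task;
}

// Three analyzers polling the shared queue end up with a balanced spread.
const analyzers = ["analyzer-1", "analyzer-2", "analyzer-3"];
const counts = new Map<string, number>();
let t: Task | null;
for (let i = 0; (t = claimNext(analyzers[i % 3])) !== null; i++) {
  counts.set(t.claimedBy!, (counts.get(t.claimedBy!) ?? 0) + 1);
}
```

Because every analyzer pulls from the same queue rather than being pushed work, adding a server changes nothing except how fast the queue drains.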
This design gives us flexibility:
- If volume increases, we add analyzers.
- If patterns shift, we rebalance.
- If hardware changes, the system adapts.
NodeJS: Lightweight, High‑Throughput, and Easy to Scale
NodeJS is ideal for our workload because it:
- Handles high concurrency efficiently
- Streams pages on demand
- Routes requests directly to the correct server
- Scales horizontally with minimal overhead
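Two of those bullets, direct routing and on-demand page streaming, are simple enough to sketch. This is an illustrative model, not our production code; the routing entries and page window size are assumptions:

```typescript
// Direct routing: each case is pinned to exactly one backend server,
// so a request never fans out across the fleet. (Illustrative entries.)
const routingTable = new Map<string, string>([
  ["case-100", "solr-3"],
  ["case-101", "solr-7"],
]);

function routeFor(caseId: string): string {
  const server = routingTable.get(caseId);
  if (!server) throw new Error(`no route for ${caseId}`);
  return server;
}

// On-demand streaming: deliver only the window of pages the client
// asked for, never the whole document.
function pageWindow(totalPages: number, start: number, count: number): number[] {
  const end = Math.min(start + count, totalPages);
  const pages: number[] = [];
  for (let p = start; p < end; p++) pages.push(p);
  return pages;
}
```

A 10,000-page document and a 10-page document cost the web tier the same per request, which is why concurrency scales with users rather than with document size.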
When traffic increases, we scale web servers, not hardware.
When patterns shift, we tune analyzers, not infrastructure.
This keeps both performance and cost stable.
SOLR: Independent Servers for Predictable Costs
Instead of a ZooKeeper‑managed cluster, we use 10 independent servers, assigned via a PostgreSQL routing table. This design:
- Keeps each index small and fast
- Avoids cluster‑wide failures
- Makes performance predictable
- Allows simple, linear scaling
- Keeps hardware requirements modest
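Without a cluster manager, assignment is just a row in the PostgreSQL routing table. A simple least-loaded policy is enough to keep every index small and balanced; the sketch below models that in memory with three servers (server names and the policy details are illustrative, not a description of our exact logic):

```typescript
// Independent SOLR servers: no ZooKeeper, no shared cluster state.
// New cases are assigned to the least-loaded server and the choice is
// recorded once; after that, routing is a plain table lookup.
const servers = ["solr-1", "solr-2", "solr-3"]; // 10 in production
const caseCount = new Map<string, number>();
for (const s of servers) caseCount.set(s, 0);

function assignCase(caseId: string): string {
  // Least-loaded wins; ties resolve to the first server in the list.
  let best = servers[0];
  for (const s of servers) {
    if (caseCount.get(s)! < caseCount.get(best)!) best = s;
  }
  caseCount.set(best, caseCount.get(best)! + 1);
  return best; // production: INSERT a (case_id, server) row in PostgreSQL
}
```

Because each server owns its cases outright, a failure affects only the cases routed to it, and adding an eleventh server is one more row source, not a cluster rebalance.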
Storage consistency through intelligent purging
To keep storage predictable:
- SOLR cache is cleared after 3 days of inactivity
- Page cache is cleared after 35 days of inactivity
- MongoDB data is removed once a case is closed
This ensures storage usage stays flat even as volume grows.
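The three purge rules reduce to a small eligibility check. The thresholds below come straight from the policy above; the record shape is an assumption for the sketch:

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface CacheEntry {
  kind: "solr" | "page" | "mongo";
  lastAccessMs: number;   // epoch millis of last activity
  caseClosed?: boolean;   // only meaningful for "mongo"
}

// Pure eligibility check a purge job could run on each record.
function shouldPurge(e: CacheEntry, nowMs: number): boolean {
  switch (e.kind) {
    case "solr": return nowMs - e.lastAccessMs > 3 * DAY_MS;   // 3 days idle
    case "page": return nowMs - e.lastAccessMs > 35 * DAY_MS;  // 35 days idle
    case "mongo": return e.caseClosed === true;                // on case close
  }
}
```

Because eligibility depends only on activity and case state, storage converges to a steady size no matter how much new volume arrives.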
ReactJS: Rendering Efficiency as a Cost Strategy
Rendering efficiency isn’t just a UX feature — it’s a cost‑control mechanism.
React’s virtualized rendering means:
- The browser does the heavy lifting
- The server only delivers what’s needed
- Memory usage stays flat
- Page requests remain small and predictable
By pushing rendering to the client, we reduce server load and avoid the need for expensive compute.
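The arithmetic behind virtualized rendering shows why memory stays flat: only the rows intersecting the viewport (plus a small overscan buffer) are ever mounted. This is a generic sketch of the technique, not our component code; the row height and overscan values are illustrative:

```typescript
// Compute which rows of a virtualized list are actually rendered.
// Everything outside [first, last] exists only as scrollbar height.
function visibleRange(
  scrollTop: number, viewportHeight: number,
  rowHeight: number, totalRows: number, overscan = 2
): { first: number; last: number } {
  const first = Math.max(0, Math.floor(scrollTop / rowHeight) - overscan);
  const last = Math.min(
    totalRows - 1,
    Math.ceil((scrollTop + viewportHeight) / rowHeight) + overscan
  );
  return { first, last };
}
```

Whether the document has 100 pages or 10,000, the mounted window is the same handful of rows, so the cost per user is bounded by viewport size, not document size.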
Docker: Consistency Without Overhead
Docker gives us:
- Predictable resource usage
- Lightweight deployments
- Easy rollback
- Environment consistency
- Fast horizontal scaling
Docker ensures every component behaves the same way everywhere — but the scaling strategy comes from the architecture itself, not the containerization.
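In practice this means resource limits and replica counts live in a small declarative file. The fragment below is illustrative only, assuming a hypothetical `analyzer` image and Compose-style deployment, not our real configuration:

```yaml
# Illustrative compose fragment, not the real deployment file.
services:
  analyzer:
    image: analyzer:latest        # hypothetical image name
    deploy:
      replicas: 4                 # scale out: add replicas, not bigger hosts
      resources:
        limits:
          cpus: "2.0"             # predictable resource usage per container
          memory: 4g
    restart: unless-stopped       # rollback is redeploying the previous tag
```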
Scaling Only the Components That Need It
One of the biggest cost advantages of this architecture is that each subsystem scales independently:
- If delivery load increases → scale NodeJS
- If retrieval load increases → add another SOLR server
- If processing load increases → add analyzers
This modularity keeps costs aligned with actual usage and prevents over‑provisioning.
Observability as a Cost‑Control Mechanism
Operational controls aren’t just about reliability — they’re about efficiency.
Nightly reports help us:
- Identify slow tasks that waste compute
- Detect data drift that increases processing time
- Spot timing spikes that signal inefficient code paths
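The nightly report is, at its core, an aggregation over timing records. Here is a minimal sketch of the slow-task part, with assumed field names and an assumed threshold; the real report covers more signals:

```typescript
interface TaskTiming { taskType: string; durationMs: number }

// Surface task types whose mean duration exceeds a threshold,
// the kind of line item a nightly report would flag for review.
function slowTaskTypes(timings: TaskTiming[], thresholdMs: number): string[] {
  const agg = new Map<string, { total: number; n: number }>();
  for (const t of timings) {
    const a = agg.get(t.taskType) ?? { total: 0, n: 0 };
    a.total += t.durationMs;
    a.n += 1;
    agg.set(t.taskType, a);
  }
  return Array.from(agg.entries())
    .filter(([, a]) => a.total / a.n > thresholdMs)
    .map(([type]) => type);
}
```

Flagging a slow task type the morning after it appears is far cheaper than discovering it weeks later as a compute bill.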
The Validation Console helps us:
- Debug anomalies without expensive trial‑and‑error
- Validate fixes before they hit production
- Avoid costly regressions
The Management Console helps us:
- Monitor real‑time throughput
- Identify bottlenecks instantly
- Resolve deadlocks before they cascade
Observability prevents waste — of compute, of time, and of money.
Scaling Without Surprises
The combination of:
- Independent servers
- Horizontally scalable NodeJS
- 20+ analyzers running on commodity hardware
- Client‑side rendering
- Dockerized deployments
- White‑box observability
…means the system scales linearly.
When volume doubles, cost does not.
When users increase, latency does not.
When documents grow, performance does not degrade.
This is the hallmark of a well‑designed national‑scale system.
The Bottom Line: Cost Efficiency Comes from Architectural Discipline
You don’t need exotic hardware to run a system at this scale.
You need:
- Predictable components
- Horizontal scaling
- Efficient analyzers
- Lightweight services
- Intelligent purging
- Clear observability
- Fast debugging
- A culture of operational rigor
Cost efficiency isn’t a feature — it’s the outcome of good engineering.