Why Local Infrastructure
Cloud APIs smooth away failure modes. Retries, truncation, fallback behaviour, and resource starvation shape outputs in ways that are not visible to the user. Running systems locally makes these patterns observable.
Our infrastructure exists to run AI systems continuously and observe their behaviour over time. The interest is not capability; it is what happens to errors once outputs become routine.
Timeline
January 2026
Fareham Innovation Centre
Relocated to a purpose-built facility. Infrastructure is transitioning to Ada-generation GPUs.
January 2025
Mumby's AI Hub Opens
Opened a dedicated facility in a former soda water factory built in 1850. The Hub operated local language models outside cloud APIs, testing how outputs were relied on once they became familiar and documenting failure modes as operational artefacts.
Late 2024
Initial Infrastructure
First GPU rigs built. Began running local AI systems and observing what happened once they were used repeatedly, under constraint, and without vendor insulation.
Current Systems
Server Infrastructure
Multi-node server build for parallel testing and direct comparison across models and configurations. Used to run identical prompts across multiple models, observe divergence between nominally similar deployments, and test chained outputs and error propagation. The build spans NVIDIA Blackwell cards, DGX systems, and multi-GPU configurations.
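A minimal sketch of the comparison loop, assuming an Ollama-style local server on localhost:11434; the model tags and prompt are placeholders, not our production harness.

```python
# Sketch: send one prompt to several locally served models and score
# pairwise divergence. Assumes an Ollama-style server on localhost:11434;
# model tags below are illustrative only.
import difflib
import requests

MODELS = ["llama2:7b", "llama2:13b", "mistral:7b"]  # hypothetical tags
PROMPT = "Summarise the main duties of a fire risk assessment in the UK."

def generate(model: str, prompt: str) -> str:
    """Request a non-streaming completion from the local server."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

outputs = {m: generate(m, PROMPT) for m in MODELS}

# Pairwise text similarity: nominally similar deployments often diverge.
names = list(outputs)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        ratio = difflib.SequenceMatcher(None, outputs[a], outputs[b]).ratio()
        print(f"{a} vs {b}: similarity {ratio:.2f}")
```

Chained-output tests extend the same loop by feeding one model's response into the next model's prompt and watching how errors propagate.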
Workstation Build
Compact workstation allowing larger contexts, longer documents, and comparison with API-served equivalents.
GPU Rigs
Open-air builds reflecting how most local AI experimentation looked in 2024-2025: consumer hardware, limited memory, cost constraints.
Teaching Systems
Smaller demonstration systems for showing AI behaviour in real time. Watching a system fabricate a citation or policy reference live is often more effective than written guidance.
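A live demo can be as small as the sketch below, under the same assumed Ollama-style endpoint. The guidance code HSG999 is chosen because, to our knowledge, it does not exist; a confident, specific answer is the fabrication happening in front of the audience.

```python
# Teaching sketch: elicit a confidently fabricated citation in real time.
# Assumes a small local model behind an Ollama-style endpoint; HSG999 is
# deliberately chosen as a nonexistent HSE guidance code.
import requests

prompt = (
    "Give the page number and section of HSE guidance note HSG999 "
    "that covers ladder inspection intervals."
)
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral:7b", "prompt": prompt, "stream": False},
    timeout=300,
)
# The audience then checks the citation and finds the document does not exist.
print(resp.json()["response"])
```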
What We Run
Across 2024 and 2025, these systems have run:
- LLaMA-family derivatives at 7B and 13B parameters
- Mistral 7B and instruction-tuned variants
- Mixtral configurations where memory allowed
- Quantised models at 4-, 5- and 8-bit precision (see the loading sketch after this list)
- Local embedding models paired with retrieval pipelines
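Loading the quantised variants looks roughly like this, assuming GGUF weights and the llama-cpp-python bindings; the file paths and prompt are placeholders.

```python
# Sketch: load the same model at three quantisation levels and compare
# answers to one factual prompt. Assumes llama-cpp-python and local GGUF
# files; the paths below are placeholders.
from llama_cpp import Llama

QUANTS = {
    "4-bit": "models/mistral-7b-instruct.Q4_K_M.gguf",
    "5-bit": "models/mistral-7b-instruct.Q5_K_M.gguf",
    "8-bit": "models/mistral-7b-instruct.Q8_0.gguf",
}
PROMPT = "Q: In what year was the Health and Safety at Work Act passed?\nA:"

for label, path in QUANTS.items():
    llm = Llama(model_path=path, n_ctx=2048, verbose=False)
    out = llm(PROMPT, max_tokens=32, temperature=0.0)
    print(f"{label}: {out['choices'][0]['text'].strip()}")
```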
These setups let us observe how hallucinations behave under constraint. Context saturation, clipping, incomplete source material, and aggressive quantisation all produce the same pattern: confidence stays high while accuracy degrades quietly.
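The pattern is easy to reproduce. The sketch below, assuming a local sentence-transformers embedding model and the same hypothetical Ollama-style endpoint, retrieves the best-matching passage, clips it hard, and asks the model anyway; the answer typically arrives with no signal that the source was incomplete.

```python
# Sketch: a minimal local retrieval pipeline with deliberately clipped
# context. Assumes sentence-transformers and an Ollama-style endpoint;
# the passages, query, and model tag are illustrative.
import requests
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "Portable appliance testing intervals depend on equipment class and environment.",
    "Fixed wiring inspections are separate from portable appliance checks.",
    "Risk assessments must be reviewed when circumstances change.",
]
query = "How often must portable appliances be tested?"

# Rank passages by cosine similarity (embeddings are normalised).
p_emb = embedder.encode(passages, normalize_embeddings=True)
q_emb = embedder.encode(query, normalize_embeddings=True)
best = max(zip(p_emb @ q_emb, passages))[1]

# Clip the retrieved passage: incomplete source material in, confident
# answer out is the failure pattern we document.
context = best[:40]
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral:7b",
        "prompt": f"Context: {context}\n\nQuestion: {query}\nAnswer:",
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])
```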
What We Learned
Cloud APIs hide failure modes. Local systems do not. Retries, truncation, fallback behaviour, and resource starvation are smoothed away in managed environments but still shape outputs.
Constraint changes behaviour. Under pressure, models hallucinate more decisively, not less. Larger models sound safer, but their fluency makes them harder for humans to challenge once polished output is treated as sufficient evidence of accuracy.
Operational risk does not stop at the model. Hardware provenance, configuration drift, and supply chain integrity matter.
Reducing precision does not make outputs more cautious. It makes them wrong more efficiently.
Independence
All infrastructure is owned and operated by Bot Research. We do not accept hardware partnerships, vendor loans, or sponsored configurations. Complete control over test conditions requires complete ownership of the systems.
This independence is the foundation of our credibility. When we document a failure mode, there is no commercial relationship shaping what we can say about it.