Why Local Infrastructure
Cloud APIs smooth away failure modes. Retries, truncation, fallback behaviour, and resource starvation shape outputs in ways that are not visible to the user. Running systems locally makes these patterns observable.
Our infrastructure exists to run AI systems continuously and observe their behaviour over time. The interest is not capability; it is what happens to errors once outputs become routine.
Timeline
January 2026
Fareham Innovation Centre
Relocated to a purpose-built facility. Infrastructure is transitioning to Ada-generation GPUs.
January 2025
Mumby's AI Hub Opens
Opened a dedicated facility in a former soda water factory built in 1850. The Hub operated local language models outside cloud APIs, testing how outputs were relied on once they became familiar and documenting failure modes as operational artefacts.
Late 2024
Initial Infrastructure
First GPU rigs built. Began running local AI systems and observing what happened once they were used repeatedly, under constraint, and without vendor insulation.
Current Systems
Server Infrastructure
Multi-node server build for parallel testing and direct comparison across models and configurations. Used to run identical prompts across multiple models, observe divergence between nominally similar deployments, and test chained outputs and error propagation. The build spans NVIDIA Blackwell cards, DGX systems, and multi-GPU configurations.
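A minimal sketch of the comparison loop, assuming an Ollama-style local server on localhost:11434; the model tags and prompt are placeholders, not our production harness.

```python
# Sketch: send one prompt to several locally served models and score
# pairwise divergence. Assumes an Ollama-style server on localhost:11434;
# model tags below are illustrative only.
import difflib
import requests

MODELS = ["llama2:7b", "llama2:13b", "mistral:7b"]  # hypothetical tags
PROMPT = "Summarise the main duties of a fire risk assessment in the UK."

def generate(model: str, prompt: str) -> str:
    """Request a non-streaming completion from the local server."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

outputs = {m: generate(m, PROMPT) for m in MODELS}

# Pairwise text similarity: nominally similar deployments often diverge.
names = list(outputs)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        ratio = difflib.SequenceMatcher(None, outputs[a], outputs[b]).ratio()
        print(f"{a} vs {b}: similarity {ratio:.2f}")
```

Chained-output tests extend the same loop by feeding one model's response into the next model's prompt and watching how errors propagate.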
Workstation Build
Compact workstation allowing larger contexts, longer documents, and comparison with API-served equivalents.
GPU Rigs
Open-air builds reflecting how most local AI experimentation looked in 2024-2025: consumer hardware, limited memory, cost constraints.
Teaching Systems
Smaller demonstration systems for showing AI behaviour in real time. Watching a system fabricate a citation or policy reference live is often more effective than written guidance.
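A live demo can be as small as the sketch below, under the same assumed Ollama-style endpoint. The guidance code HSG999 is chosen because, to our knowledge, it does not exist; a confident, specific answer is the fabrication happening in front of the audience.

```python
# Teaching sketch: elicit a confidently fabricated citation in real time.
# Assumes a small local model behind an Ollama-style endpoint; HSG999 is
# deliberately chosen as a nonexistent HSE guidance code.
import requests

prompt = (
    "Give the page number and section of HSE guidance note HSG999 "
    "that covers ladder inspection intervals."
)
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral:7b", "prompt": prompt, "stream": False},
    timeout=300,
)
# The audience then checks the citation and finds the document does not exist.
print(resp.json()["response"])
```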
What We Run
Across 2024 and 2025, these systems have run:
- LLaMA-family derivatives at 7B and 13B parameters
- Mistral 7B and instruction-tuned variants
- Mixtral configurations where memory allowed
- Quantised models at 4-, 5- and 8-bit precision (see the loading sketch after this list)
- Local embedding models paired with retrieval pipelines
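Loading the quantised variants looks roughly like this, assuming GGUF weights and the llama-cpp-python bindings; the file paths and prompt are placeholders.

```python
# Sketch: load the same model at three quantisation levels and compare
# answers to one factual prompt. Assumes llama-cpp-python and local GGUF
# files; the paths below are placeholders.
from llama_cpp import Llama

QUANTS = {
    "4-bit": "models/mistral-7b-instruct.Q4_K_M.gguf",
    "5-bit": "models/mistral-7b-instruct.Q5_K_M.gguf",
    "8-bit": "models/mistral-7b-instruct.Q8_0.gguf",
}
PROMPT = "Q: In what year was the Health and Safety at Work Act passed?\nA:"

for label, path in QUANTS.items():
    llm = Llama(model_path=path, n_ctx=2048, verbose=False)
    out = llm(PROMPT, max_tokens=32, temperature=0.0)
    print(f"{label}: {out['choices'][0]['text'].strip()}")
```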
These setups let us observe how hallucinations behave under constraint. Context saturation, clipping, incomplete source material, and aggressive quantisation all produce the same pattern: confidence stays high while accuracy degrades quietly.
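The pattern is easy to reproduce. The sketch below, assuming a local sentence-transformers embedding model and the same hypothetical Ollama-style endpoint, retrieves the best-matching passage, clips it hard, and asks the model anyway; the answer typically arrives with no signal that the source was incomplete.

```python
# Sketch: a minimal local retrieval pipeline with deliberately clipped
# context. Assumes sentence-transformers and an Ollama-style endpoint;
# the passages, query, and model tag are illustrative.
import requests
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "Portable appliance testing intervals depend on equipment class and environment.",
    "Fixed wiring inspections are separate from portable appliance checks.",
    "Risk assessments must be reviewed when circumstances change.",
]
query = "How often must portable appliances be tested?"

# Rank passages by cosine similarity (embeddings are normalised).
p_emb = embedder.encode(passages, normalize_embeddings=True)
q_emb = embedder.encode(query, normalize_embeddings=True)
best = max(zip(p_emb @ q_emb, passages))[1]

# Clip the retrieved passage: incomplete source material in, confident
# answer out is the failure pattern we document.
context = best[:40]
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral:7b",
        "prompt": f"Context: {context}\n\nQuestion: {query}\nAnswer:",
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])
```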
What We Learned
Cloud APIs hide failure modes. Local systems do not. Retries, truncation, fallback behaviour, and resource starvation are smoothed away in managed environments but still shape outputs.
Constraint changes behaviour. Under pressure, models hallucinate more decisively, not less. Larger models sound safer, but their fluency makes them harder for humans to challenge once polished output is treated as sufficient evidence of accuracy.
Operational risk does not stop at the model. Hardware provenance, configuration drift, and supply chain integrity matter.
Reducing precision does not make outputs more cautious. It makes them wrong more efficiently.
Independence
All infrastructure is owned and operated by Bot Research. We do not accept hardware partnerships, vendor loans, or sponsored configurations. Complete control over test conditions requires complete ownership of the systems.
This independence is the foundation of our credibility. When we document a failure mode, there is no commercial relationship shaping what we can say about it.