What we do
We spend our time working with AI systems directly. Building, testing, breaking, understanding. The research comes from the work.
Current focus
Local AI infrastructure. Running frontier models on our own hardware. NVIDIA Blackwell GPUs, DGX systems, multi-GPU configurations. This gives us complete control over what we test and how. A rough sketch of a typical serving setup follows below.
Agentic workflows. Building and testing AI agents for real work. We use tools like Claude Code daily and develop our own automation systems.
Frontier systems. Working with new releases as they ship. Understanding what changes, what improves, what breaks.
Reliability observation. Documenting how AI systems actually behave when relied upon for consequential work. The gap between claimed and observed capability.
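For concreteness, this is roughly what serving a frontier-scale open model across several local GPUs can look like. It is a minimal sketch assuming vLLM as the serving library; the model name, GPU count, and prompt are placeholders, not our exact configuration.

    from vllm import LLM, SamplingParams

    # Placeholders throughout: substitute whatever open model and GPU count apply.
    llm = LLM(
        model="meta-llama/Llama-3.1-70B-Instruct",  # hypothetical example model
        tensor_parallel_size=4,                     # shard the weights across four GPUs
    )

    params = SamplingParams(temperature=0.0, max_tokens=256)
    outputs = llm.generate(["Summarise this incident report in three bullet points: ..."], params)
    print(outputs[0].outputs[0].text)

Running the stack this way means every layer, from weights to sampling parameters, is under our control and can be held fixed between tests.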
What we have learned
Working with AI systems daily surfaces patterns. Some of what we have observed:
Observations
Confidence without accuracy. Systems can sound authoritative while being wrong. That fluency can mask the failures.
Context matters. The same prompt behaves differently across models, configurations, and context lengths. Reproducibility is harder than it looks; a sketch of the kind of check we mean follows after these observations.
Agentic failures cascade. When multi-step workflows break, they rarely break cleanly: a small error early in the chain compounds through every step that follows.
Local deployment reveals more. Running systems yourself, without API smoothing, shows failure modes that cloud access hides.
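To make the reproducibility point above concrete, here is a minimal sketch of the kind of check we mean: one prompt, one local OpenAI-compatible endpoint (for example, one exposed by vLLM), several sampling configurations, and a comparison of the responses. The URL, model name, and settings are illustrative, not a fixed protocol.

    from openai import OpenAI

    # Illustrative endpoint and model id; any local OpenAI-compatible server works.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
    prompt = "List the explicit assumptions in the argument below.\n\n<argument text here>"

    configs = [
        {"temperature": 0.0, "max_tokens": 256},
        {"temperature": 0.7, "max_tokens": 256},
    ]

    responses = []
    for cfg in configs:
        reply = client.chat.completions.create(
            model="local-model",  # placeholder model id
            messages=[{"role": "user", "content": prompt}],
            **cfg,
        )
        responses.append(reply.choices[0].message.content)

    # Even nominally deterministic settings do not guarantee identical outputs.
    print("identical across configs:", len(set(responses)) == 1)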
Longer-term work
Some projects take time. We are developing a methodology for assessing when reliance on AI systems is reasonable in professional contexts. This work is informed by our daily use of these systems and will be published openly when ready.
Infrastructure
We operate local AI infrastructure for testing that cannot be done through API access alone. This gives us complete control over test conditions, the ability to run identical tests across model versions, and the freedom to publish without commercial constraints.
Our hardware includes NVIDIA Blackwell GPUs, DGX systems, and multi-GPU configurations capable of running current open-source models locally. All tests are documented with exact prompts, outputs, and analysis.
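As an illustration of what documenting a test with exact prompts and outputs can look like, here is a minimal sketch of a per-run record. The field names and output path are hypothetical, not a published schema.

    import json
    import time
    from pathlib import Path

    def log_run(model_id: str, prompt: str, output: str, config: dict,
                log_dir: Path = Path("runs")) -> Path:
        """Write one test run to disk, verbatim, so it can be re-run against later model versions."""
        ts = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
        record = {
            "timestamp": ts,
            "model_id": model_id,   # exact model build / weights version under test
            "config": config,       # sampling settings, context length, and so on
            "prompt": prompt,       # verbatim prompt, no post-hoc editing
            "output": output,       # verbatim model response
        }
        log_dir.mkdir(parents=True, exist_ok=True)
        path = log_dir / f"{ts}_{model_id.replace('/', '_')}.json"
        path.write_text(json.dumps(record, indent=2))
        return path

Keeping runs in this form is what makes it possible to replay the same prompt against a new model version and compare the outputs directly.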
Writing
We publish what we learn. Observations, experiments, things that surprised us.
Interested?
We are open to research collaboration, advisory conversations, and support for what we are building. If this sounds interesting, reach out.