Technology
What the technology is
Tumbling Dice has developed a new class of compute architecture designed to overcome the limitations of the traditional Von Neumann model. Instead of relying on heavy, power-hungry GPU pipelines, our approach brings computation closer to the data, enabling massively parallel execution with far lower overhead. This makes the technology ideal for modern AI workloads, where concurrency and efficiency matter more than raw brute force.
Unlike fixed-function accelerators, which are optimised for specific model families, our architecture is flexible and model-agnostic. It adapts to emerging AI approaches without requiring new silicon, giving us a broader application footprint and a faster innovation cycle.
By combining lightweight model structures with hardware-native execution, we deliver fast, predictable inference on compact, low-power devices, thereby unlocking AI capability in environments where GPUs and specialised ASICs are impractical or uneconomical.
Why it works
Most traditional processors chase performance by pushing clock speeds into the gigahertz range. Our FPGA-based accelerator takes a different approach. Even running at 60 MHz on a low-end FPGA architecture, it delivers over four times the throughput of a modern superscalar CPU core while using around one-fortieth of the power.
The advantage is not raw frequency. It is architecture.
Instead of one fast processor core working sequentially, the accelerator executes thousands of hardware operations in parallel, each tailored to the workload. This deep concurrency means more work gets done per cycle, even at dramatically lower clock speeds.
The result is a platform that is faster, cooler, and far more efficient. Not because it works harder, but because it works smarter.
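The arithmetic behind this claim can be sketched as a back-of-envelope comparison. The clock speeds, the "over four times" throughput figure, and the one-fortieth power figure echo the text above; the per-cycle parallelism counts are illustrative assumptions, not measured values.

```python
# Back-of-envelope throughput comparison: a low-clock, highly parallel
# accelerator vs a fast sequential core. Per-cycle op counts are
# illustrative assumptions; clocks and power ratio come from the text.

fpga_clock_hz = 60e6       # accelerator clock (60 MHz, from the text)
cpu_clock_hz = 3e9         # typical modern superscalar core (~3 GHz, assumed)

cpu_ops_per_cycle = 4      # assumed sustained ops/cycle for a superscalar core
fpga_parallel_ops = 1000   # assumed hardware operations executing each cycle

cpu_throughput = cpu_clock_hz * cpu_ops_per_cycle     # ops per second
fpga_throughput = fpga_clock_hz * fpga_parallel_ops   # ops per second

speedup = fpga_throughput / cpu_throughput            # raw throughput ratio
power_ratio = 40                                      # "one-fortieth of the power"
perf_per_watt_gain = speedup * power_ratio            # efficiency advantage

print(f"throughput speedup: {speedup:.0f}x")
print(f"perf-per-watt advantage: {perf_per_watt_gain:.0f}x")
```

Under these assumed figures the parallel design comes out roughly 5x faster and around 200x more efficient per watt, consistent with the "over four times the throughput at one-fortieth the power" claim: deep concurrency, not clock frequency, does the work.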
Features & benefits
High Throughput at Low Clock Speeds
Our architecture delivers exceptional performance even on low-end FPGA platforms. By exploiting deep parallelism, the system achieves high throughput without relying on high clock frequencies or power-hungry cores.
Ultra-Low Energy and Water Use
Efficiency is built into the hardware. The accelerator uses a fraction of the energy of traditional CPU and GPU-based systems, reducing both operational costs and the environmental footprint. This includes reducing the water required for cooling in data-centre environments.
Scalable by Design
The architecture scales from small, low-power devices to large multi-FPGA or ASIC deployments. This flexibility allows the same core technology to support everything from embedded systems to high-performance workloads.
Edge AI Ready
AI is moving out of the cloud and into everyday devices. Our accelerator is designed for this shift: compact, efficient, and capable of running advanced models close to where data is generated.
Available Technology Pathways
The platform runs today on widely available FPGAs and can be migrated to ASIC for even greater performance, efficiency, and cost reduction. This ensures a clear roadmap from prototype to mass deployment.
Reconfigurable Hardware
The system can be re-programmed to support new models, new workloads, or updated algorithms without replacing hardware. This reduces upgrade costs and extends the lifetime of deployed systems.
Low Cost
By combining low-end FPGA compatibility with a clear ASIC path, the technology offers a compelling cost profile, both in terms of upfront hardware and ongoing operational expenses.
Protected IP
The core architecture is patent-pending, securing the company’s competitive advantage and protecting future commercialisation.
How we compare to a GPU
High-end GPUs dominate in data-centre environments. They are engineered for maximum raw throughput, using thousands of cores, high clock speeds, and large power budgets to train and run very large AI models. They excel at this, but they come with significant energy and cooling demands.
Our approach is optimised for a different purpose.
Not Brute Force
We do not match GPU clock speeds, and we do not need to. Instead of relying on frequency and large power budgets, our architecture achieves high throughput through efficient, workload-specific parallelism.
Parallel Efficiency
The FPGA fabric provides a reconfigurable hardware environment that excels at highly parallel tasks. Rather than forcing every workload through a fixed GPU pipeline, the hardware adapts to the model, not the other way around.
Edge-Ready AI
This efficiency makes the accelerator ideal for running small language models (SLMs) and other AI workloads directly on devices where power, heat, and physical space are limited. It brings meaningful AI capability outside the data centre and into real-world products.
Low-Power Server Deployment
The technology can also be deployed in server-farm environments, where it offers a clear environmental advantage. Traditional CPU- and GPU-based clusters require substantial power and cooling infrastructure, driving both energy use and water consumption. Our accelerator provides a low-power alternative for inference workloads, enabling data-centre operators to reduce operational costs and significantly shrink their environmental footprint.
A Different Class of Performance
GPUs remain the right tool for training and serving very large models in high-power environments. Our accelerator is built for the opposite end of the spectrum: fast, efficient, low-power inference at the edge with the added benefit of enabling greener, lower-impact server deployments where appropriate.
Where it fits
Our accelerator is designed for real-world deployment across a wide range of environments. Its efficiency, scalability, and reconfigurable architecture make it suitable for multiple markets, from compact embedded devices to larger low-power server installations.
Edge Devices
Ideal for running small language models (SLMs) and other AI workloads directly on devices where power, heat, and space are limited. This enables meaningful AI capability without relying on cloud connectivity or data-centre infrastructure.
Embedded Systems
The low-power, reconfigurable design makes it a strong fit for industrial, automotive, medical, and consumer products that require fast, reliable inference on-device. It supports long product lifecycles and can be updated without replacing hardware.
Low-Power Server Deployments
While not designed to replace high-end GPUs in large data centres, the technology can be deployed in server-farm environments for inference workloads. Its efficiency reduces energy use, cooling requirements, and overall environmental impact, offering a greener alternative to traditional CPU- and GPU-based clusters.
FPGA Today, ASIC Tomorrow
The platform runs on widely available FPGAs, enabling rapid development and early deployment. A clear ASIC roadmap provides a path to higher performance, lower cost, and large-scale commercial rollout.
A Flexible Footprint
From handheld devices to rack-mounted systems, the architecture adapts to the environment. It delivers consistent advantages wherever power, heat, and efficiency matter more than brute-force compute.