In 2022, Fractile CEO Dr. Walter Goodwin was trying to build “general purpose robot brains.”
Making robots that are good at many things – most are currently good at doing just one specific thing over and over again – was the focus of Goodwin’s efforts, but he says that at that point in his studies at the UK’s Oxford Robotics Institute, his interests were becoming less about how well a robot could pick up a mug and more about scaling laws.
Having spent four years working on big vision and language AI models trained on images and text scraped from the Internet, Goodwin says he saw how scaling laws had been “rocking that AI world” – particularly the idea that increasing the flops spent on a foundation model’s training run would bring a predictable improvement in how well that model performs.
At the time, Goodwin says he was part of a group that started pushing the idea that, as foundation models continued to permeate our lives, an inevitable shift in how we think about AI more broadly would follow.
He explains that, from 2011 to “maybe 2020,” every problem had its own specific neural network, meaning companies would collect a data set, find the right neural network architecture, and then train it for a specific application until it became good enough.
“Towards the end of my PhD, what I was increasingly convinced by was that [the application-specific neural networks] were going to be eclipsed by this idea of the ‘do it all’ AI model that is trained on just huge swathes of data and would generalize very well,” Goodwin says. “I was seeing that in robotics, and I could see that the same would start to happen with language and vision.
“And what happens is, when you get that shift, the big story in AI stops being: ‘How do we train a slightly better model;’ it actually stops being so much about that training at all, and it shifts much more to: ‘If we’re going to be running this small set of models at this vast scale, how are we actually going to do that in a sustainable way?’”
With that in mind, Goodwin returned to his electrical engineering background, forming Fractile to answer the question: “If we’re heading towards a world where most compute power is focused on inference, is our current hardware fit for purpose?”
For Fractile, the chip company he founded, the answer was a resounding no.
AI inference is your new best friend
While training has dominated the AI conversation over the last few years, companies of all sizes have recently announced a move away from training AI models to focus on the less computationally intensive inference – the stage where, simply put, an AI model uses the patterns it has been trained on to make predictions.
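For the unfamiliar, the distinction is easy to see in code. The sketch below (illustrative only, and no relation to Fractile’s software stack) fits a tiny linear model once, then reuses the frozen weights to make predictions – that reuse is inference:

```python
import numpy as np

# Illustrative toy model only -- not Fractile's software.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # training inputs
y = X @ np.array([2.0, -1.0, 0.5])     # targets from a known rule

# "Training": fit the weights once (here, a least-squares solve).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# "Inference": a forward pass with frozen weights -- no gradients,
# no updates, just applying learned patterns to new data.
x_new = np.array([1.0, 0.0, -2.0])
print(x_new @ w)                       # the model's prediction
```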
On Microsoft’s Q1 2025 earnings call in October, CEO Satya Nadella said the company was on track to generate $10bn in annual revenue from AI inference and, as a result, the company was turning away requests to use its GPUs for training "because we have so much demand on inference."
Goodwin says that platforms built on the von Neumann architecture – the basis for most general-purpose computers – are full of memory-bound stages, forcing a tradeoff between latency and cost. That matters less when training models, because the focus there is throughput, not latency. When it comes to inference, however, users expect answers at speed – what he calls the “inference time” concept: “[OpenAI’s] o1 is the best model in the world, but you have to wait 15 seconds for an answer.”
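A back-of-envelope calculation (our numbers, not Goodwin’s) shows why. The arithmetic intensity of a matrix multiply – FLOPs performed per byte moved – collapses when serving a single user, because generating each token is a matrix-vector product that reads every weight for only a couple of FLOPs:

```python
# FLOPs per byte moved for an (m x k) @ (k x n) fp16 matmul.
# Illustrative figures, not Fractile's or Nvidia's.
def intensity(m, k, n, bytes_per_el=2):
    flops = 2 * m * k * n
    bytes_moved = bytes_per_el * (m * k + k * n + m * n)
    return flops / bytes_moved

# Training-style batched matmul: a large batch (m) amortizes
# weight traffic, so the chip is compute-bound.
print(intensity(m=4096, k=8192, n=8192))   # ~2,000 FLOPs per byte

# Single-user decode is effectively m=1: every weight byte fetched
# buys ~1 FLOP, so the memory bus -- not the ALUs -- sets the latency.
print(intensity(m=1, k=8192, n=8192))      # ~1 FLOP per byte
```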
Goodwin says that, for the best part of the last two years, Fractile has been talking to people about this concept of inference scaling.
“We’ve had scaling laws in training where you increase the training flops, but what inference scaling says is that AI performance is actually about two things,” he explains. “How good you can make the base model, and how to use more compute to get better AI outcomes.”
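The training half of that statement has a well-known quantitative form. One widely cited fit, from Hoffmann et al.’s 2022 “Chinchilla” paper, expresses loss as a function of parameter count N and training tokens D (constants quoted here purely for illustration):

```latex
% Chinchilla-style training scaling law (Hoffmann et al., 2022):
% loss falls predictably as parameters N and training tokens D grow.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\quad E \approx 1.69,\ A \approx 406.4,\ B \approx 410.7,\
\alpha \approx 0.34,\ \beta \approx 0.28
```

Inference scaling adds the second lever: holding the base model fixed, spending more compute at answer time also improves outcomes.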
To achieve these better outcomes, the UK-based company has been developing chips that use in-memory compute, an approach that allows processors to run calculations directly in computer memory. Goodwin says that by taking this approach, the company hopes to create hardware that reduces power consumption and improves performance, all while allowing for faster and less expensive inference at scale.
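As a toy model of the idea (and emphatically not Fractile’s circuit design), imagine the weight matrix written once into a memory array that can perform multiply-accumulates in place; thereafter only activations and results cross a bus:

```python
import numpy as np

# Toy comparison of data movement -- not Fractile's actual design.
rng = np.random.default_rng(1)
W = rng.normal(size=(256, 256))    # weights, written into memory once
x = rng.normal(size=256)           # activations streamed in

# Von Neumann: every matrix-vector product re-reads all of W (fp16).
bytes_von_neumann = 2 * (W.size + x.size)

# In-memory compute: W stays put; only input and output vectors move.
bytes_in_memory = 2 * (x.size + W.shape[0])

y = W @ x  # same mathematical result either way (analog noise ignored)
print(bytes_von_neumann / bytes_in_memory)   # ~128x less data traffic
```

The saved weight traffic is where the claimed power and latency gains would come from, since moving data typically costs far more energy than the arithmetic itself.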
In July 2024, Fractile emerged from stealth, having raised $15 million in a seed round co-led by Kindred Capital, the NATO Innovation Fund, and Oxford Science Enterprises, alongside a number of angel investors, including entrepreneur Stan Boland, a former Arm and Acorn Computers executive who has built and sold a number of chip and AI companies.
Since this interview was conducted, it has added former Intel and VMware CEO Pat Gelsinger to its list of backers. Gelsinger, who 'retired' from the chipmaking giant in December, will also advise the company as it grows. In a LinkedIn post announcing his investment, he praised Fractile for its "radical approach" to tackling the inference question.
While Fractile has yet to bring its product to market, the company believes its hardware will ultimately be able to run large language models 100× faster and 10× cheaper than Nvidia’s GPUs, while offering 20× better performance per watt than any other AI hardware currently on the market – although, by the time it does launch, competitor hardware will have advanced substantially.
Goodwin notes that, while a few other companies are also exploring the concept of more on-chip memory, Fractile is looking to go further and remove the separation between memory bank and processor entirely, allowing it to better address what he believes is the most critical limitation in compute scaling right now: power.
“[With Fractile’s approach] what you can achieve is far, far, far higher than you would get if you just had that near memory compute piece. While [near memory] is good for driving up the bandwidth, it doesn't drive up your TOPS per watt, so you still have a chip that ultimately is going to be thermally limited. For a long time now, we've been thermally limited in how we scale up these systems.
“[For Fractile] it’s about building a system that will allow us to run inference at scale for these very large models, far faster. That means more tokens per second, more words output per second per user, but also doing all this in a much less expensive way as well.”
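The tokens-per-second point follows directly from the bandwidth bound above. For batch-1 decode on a conventional accelerator, every generated token requires reading the full set of weights once, so memory bandwidth caps the per-user rate (all figures below are our own illustrative assumptions):

```python
# Batch-1 decode ceiling on a von Neumann accelerator.
# Assumed figures for illustration -- not vendor or Fractile numbers.
params = 70e9           # a 70B-parameter model
bytes_per_param = 2     # fp16/bf16 weights
bandwidth = 3.35e12     # ~3.35 TB/s, roughly flagship HBM3, in bytes/s

bytes_per_token = params * bytes_per_param   # 140 GB read per token
print(bandwidth / bytes_per_token)           # ~24 tokens/s per user
```

Hardware that sidesteps the weight-read bottleneck would raise that ceiling without waiting for memory bandwidth itself to improve.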
And unlike most companies focusing on in-memory compute, which have thus far mostly deployed their hardware in low-power Edge devices, Goodwin says what’s exciting for Fractile is that it is one of the only companies trying to bring the technology to data center-scale workloads.
“That’s one of the things that's more unique about what we're doing,” he says.
However, despite the company’s ambition, Goodwin explains that one thing Fractile has been careful not to do is reinvent too many things at once: it’s important that the company can achieve not only a good time to market but also scale up production and play within the rules of existing semiconductor manufacturing.
“Fractile is doing things that might be kind of radical in terms of the circuits that we’re designing for in-memory compute and how we’re thinking about architecture and software, but at the lowest level, from a silicon design point of view, we’re doing our test chips at TSMC process nodes and our production chips will be at cutting-edge FinFET nodes on standard foundry processes. In that sense, we’re seeking to be as normal as possible from a manufacturability point of view.”
Inference: The zero-billion-dollar market
Given the costs associated with getting silicon to market – Goodwin says a mask set alone costs upwards of $10 million – the first large-scale silicon the company produces will be its first product.
Fractile has been working on prototype test chips, but to date, the designs have only been tested in computer simulations. While he declines to divulge the company’s anticipated timeline for bringing a product to market, Goodwin says tape-outs are expected in the next few months.
When asked whether the semiconductor industry is at an inflection point – whether a split might begin to form between the ever-dominant incumbents, who are having great success with their tried-and-tested chip architectures, and the enthusiastic startups who think there’s a new, better approach to be found – Goodwin is rather optimistic about the whole affair.
“When you have a rapid emergence of a very large scale and arguably new workload, which I guess we have today with the inference of these very large models, I think the exciting thing for startups is that there are these whole new markets that emerge.
“It's the Jensen Huang quote: ‘The zero-billion-dollar market.’ In Fractile’s case, I think blazingly fast data center scale inference is today, in some ways, a zero-billion-dollar market. There's nobody that can meet the needs for that, there is no hardware that exists. Fractile is on a path to produce that hardware, so I think what we're excited about is entering that entirely new space and creating a whole set of applications that we can enable.”
Goodwin says the last six months have been exciting for Fractile, revealing to DCD that, as of October 2024, the company had just opened a new office in Bristol and was looking to add a further 10 or 15 people to its 23-person workforce.
“The critical thing that we’re working on at the moment, actually, beyond the silicon which we’ve already talked about, is that a very large portion of what Fractile works on is entirely in the software layer.
"So, in terms of the markets that we're looking to serve, it's very clear that what needs to be done in order to provide a turnkey solution is to have a hardware platform with a software stack.”
Read the original article: https://www.datacenterdynamics.com/en/analysis/fractile-ai-chip-startup-inference-pat-gelsinger/