When the cofounders of Gradient Labs met working at the UK neobank Monzo, they couldn’t have foreseen joining forces to build an autonomous AI agent for financial services. But when OpenAI released ChatGPT, built on GPT-3.5, in 2022, the potential was obvious.
At the time, even the most advanced AI customer support agents could only handle the simplest 25% of queries.
“We realised there was a huge amount of efficiency that could be gained,” says Danai Antoniou, cofounder and chief scientist of Gradient Labs. “Traditional language models cannot do fraud investigations, for example. They are too trivial and too basic to be able to do this intuition-based and very complicated, multi-dimensional problem solving, which humans are very good at.”
After leaving Monzo, the trio spent 15 months building their agent Otto. Early feedback has been very encouraging: 90% of customer queries are resolved out of the box, with customer satisfaction (CSAT) scores of 80-90%, on a par with or better than human customer service teams. Banks report their support costs have been cut by up to 75%, with faster response times.
Buoyed by this early success, the founding Gradient Labs team recently closed an €11.08m Series A round and plans to expand into Europe and the US.
But that success has only been possible because they’ve invested in reliable infrastructure.
“That’s been critical for us because if you’re doing support for a bank, you cannot go down,” says Antoniou. “We do not rely on a single provider. That also allows you to work around inference limitations. It’s been an interesting problem that we’ve had to think through from the beginning.”
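In practice, that kind of provider redundancy often comes down to a simple failover loop: try one inference provider, back off briefly on failure, then fall through to the next. The sketch below is purely illustrative, with hypothetical placeholder providers standing in for real API clients; it is not Gradient Labs’ implementation.

```python
# Illustrative multi-provider failover for LLM inference.
# The provider functions are hypothetical stand-ins for real API clients.
import random
import time
from typing import Callable, Optional


def call_provider_a(prompt: str) -> str:
    # Placeholder: a real client would send the prompt to provider A's API.
    if random.random() < 0.2:
        raise TimeoutError("provider A unavailable")
    return f"[provider A] answer to: {prompt}"


def call_provider_b(prompt: str) -> str:
    # Placeholder for a second, independent provider.
    if random.random() < 0.2:
        raise TimeoutError("provider B unavailable")
    return f"[provider B] answer to: {prompt}"


PROVIDERS: list[Callable[[str], str]] = [call_provider_a, call_provider_b]


def complete_with_failover(prompt: str, retries_per_provider: int = 2) -> str:
    """Try each provider in turn, with a short backoff, before giving up."""
    last_error: Optional[Exception] = None
    for provider in PROVIDERS:
        for attempt in range(retries_per_provider):
            try:
                return provider(prompt)
            except TimeoutError as err:
                last_error = err
                time.sleep(0.1 * (attempt + 1))  # simple linear backoff
    raise RuntimeError("all inference providers failed") from last_error


if __name__ == "__main__":
    print(complete_with_failover("Summarise this customer query."))
```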
The challenge
The surge of AI workloads has thrown a new technology challenge into the spotlight — inference. That’s the process by which an AI model applies what it has learned during the training phase to make decisions based on new, real-world data.
Analysts at Morgan Stanley estimate more than 75% of power and computational demand in the US will be for inference in the coming years. Even Nvidia CEO Jensen Huang nodded to this surge in an earnings call in February.
“The amount of inference compute needed is already 100x more” than it was when large language models started out, Huang said. “And that’s just the beginning.”
For those building AI applications, inference matters a lot.
“There are three main demands,” Anton Osika, cofounder of Lovable, says about inference challenges for tech teams. The Swedish “vibe coding” startup became a unicorn eight months after its launch, when it closed a $200m Series A round.
“First and foremost is speed. Both raw throughput but also how much reasoning models are able to actually do. Secondly, caching is super important for agentic flows when there’s lots of back and forth between the model and the wider environment. And finally capacity. Getting access to the latest models and provisioning enough tokens isn’t easy at scale,” he says.
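To illustrate the caching point: in an agentic loop, the same system prompt and context are re-sent to the model on every turn, so avoiding redundant inference calls (or, on the provider side, reusing a cached prompt prefix) saves both latency and tokens. Below is a minimal, assumed sketch using a local memoisation cache; production systems typically lean on provider-side prefix caching instead, and none of this is Lovable’s actual code.

```python
# Minimal sketch of avoiding redundant inference calls in an agentic loop.
# _model_call is a hypothetical placeholder, not a real provider client.
from functools import lru_cache


def _model_call(prompt: str) -> str:
    # Placeholder for a slow, expensive inference request.
    return f"model output for {len(prompt)} characters of context"


@lru_cache(maxsize=1024)
def cached_complete(prompt: str) -> str:
    """Memoise completions so an identical prompt is never sent twice."""
    return _model_call(prompt)


def agent_step(shared_context: str, new_observation: str) -> str:
    # The stable context (system prompt, project summary, tool specs) goes
    # first, so provider-side prefix caching could also reuse it across turns.
    prompt = shared_context + "\n" + new_observation
    return cached_complete(prompt)
```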
Demands are scaling fast
As AI adoption grows, inference workloads must also scale without overwhelming infrastructure or driving up costs. Researchers at Nebius, a global AI infrastructure company with seven data centres across Europe, the US and Israel, place inference efficiency high on their priority list.
“The tasks we’re tackling today are relatively simple,” says Danila Shtan, Nebius’s chief technology officer. “In the future, we’ll see more complex scenarios that will require more computation resources and complex systems to enable them.”
The most advanced capabilities have traditionally been confined to closed ecosystems. However, that’s beginning to shift, and as inference demands scale fast, many startups are finding the cost of those proprietary systems unsustainable.
This has created an opportunity for platforms that offer open-source alternatives. Nebius’s inference-as-a-service platform AI Studio, for example, provides access to open-source models such as the latest releases from Llama, DeepSeek, Qwen and OpenAI, and enables them to be deployed and scaled.
“Most startups in their early stages rely on state-of-the-art models, which are always closed ecosystems with their own inference,” Shtan says. “But that becomes very expensive really fast and founders start to look for alternatives. They’ll train their own models or work with open-source models to alleviate some of that pressure. There are already a lot of capable open-source models out there and in the future, there will be more.”
It’s also a good way to maintain flexibility across providers. Shtan points to a recent controversy in the US, when Anthropic cut Windsurf’s access to its Claude 3.x models amid reports that OpenAI was moving to acquire the AI coding platform for $3bn. The move left the Windsurf team scrambling to boost inference capacity with its other providers.
For Shtan, the episode underlines the need for startup founders to build in contingency with open source.
“You can’t rely on a closed-source ecosystem unless your plan is to be part of that ecosystem forever,” he says. “This isn’t just about cost optimisation but also ensuring business continuity.”
Lovable’s Osika agrees, saying that while the best coding models will likely remain closed source in the near term, open source is crucial for transparency, customisation and avoiding vendor lock-in.
“We aim to work with both,” he says. “The real value isn’t just in the model, but in the entire system we’re building around it that makes AI coding reliable, fast and secure.”
In the future, much of the inference efficiency a startup can access will come down to the capacity of local infrastructure, Nebius’s Shtan adds. In December, the European Commission pledged €750m in funding to establish and maintain AI-optimised supercomputers for startups to train their AI models. Nvidia is also planning a threefold boost to AI hardware capacity in Europe next year, with the region’s AI computing capacity set to increase by a factor of 10.
Locality matters
Nebius recently launched operations in the UK with the deployment of a major GPU cluster, built on Nvidia’s platform. It’s due to be fully operational by the end of this year.
“If you’re talking about inference, the locality matters,” Shtan says. “With these complex systems, it’s going to be more and more expensive to go over the Atlantic. We definitely see the demand for local infrastructure growing, and that is a firm part of our strategy.”
That’s something Gradient Labs has needed to rely on too.
“When you’re dealing with UK banks for example, they may not want their data to go to the US,” Antoniou says. And while she doesn’t see any significant barriers to Europe’s ability to compete in AI on the world stage, she says startups need to be ready to evolve as fast as the technology does.
“There’s a lot of low-level computational optimisation research being done to make these models faster and much more efficient,” Antoniou says. “What we’re focused on is making sure our agent can be evaluated as quickly as possible on the next available model. We have to put ourselves in the best position to take advantage of the latest capabilities as soon as they come out.”
Read the original article: https://sifted.eu/articles/europes-ai-efficiency-brnd/