Though the company has often lagged behind the likes of Amazon and Microsoft in cloud revenue, Google is pitching itself hard as the place to be for AI infrastructure investments.
At the Google Cloud Summit in London in October 2024, Tara Brady, Google’s EMEA president, claimed it was a “fact” that Google created generative AI – a debatable assertion, though likely a reference to the 2017 Google Research paper Attention Is All You Need, which introduced the transformer architecture underpinning many of today’s genAI models.
Whether startup or enterprise, Google wants your AI dollars. Brady said that today 90 percent of AI unicorns use GCP, and this year the company has announced AI-centered deals with companies including Vodafone, Warner Bros. Discovery, Mercedes-Benz, Bayer, Best Buy, Orange, and PwC. Pfizer, Hiscox, Toyota, Lloyds Bank, Bupa, and Monzo were also named on stage in London as AI customers.
“We’re super excited,” Google Cloud GM for infrastructure, Sachin Gupta, tells DCD. “When you look at the number of industries, where they’re moving from experimentation to scaling and production, I think that’s very, very exciting.”
“AI is forcing a decision,” he says, speaking to DCD at the summit in London. “Unlike legacy and traditional applications, AI, in most cases, requires a new infrastructure investment.”
Hypercomputer – efficiency gains
Google is certainly investing. Like its fellow hyperscalers, the company is rapidly building out new locations globally and expanding existing campuses.
When asked if the rapid build-out of AI capacity is breaking the traditional public cloud data center architecture of large regions made up of multiple nearby availability zones, Gupta notes that approaches may have to differ for training versus inference.
“When you’re doing large-scale training, you want contiguous, large clusters. There are two ways to achieve that; you put them in the same location, or they’re close enough and put so much network bandwidth that it doesn’t become a bottleneck,” he says. “Both options exist, and depending on the location and what’s available, we’ll pursue a certain design option.”
“For inferencing, customers see that more as you’re now serving an end customer, and so the reliability and availability of that, the latency of that option, can change sort of how you think about designs.”
A key part of Google’s AI ambition is its hypercomputer concept. Announced in late 2023, the hypercomputer is described as a unified architecture combining and optimizing all elements of the software and hardware stack.
“We think of it as an architecture of how we build the entire AI-optimized infrastructure with all of the components,” he says.
That holistic concept runs from the physical data centers’ electrical and cooling systems, to the storage, networking, and IT hardware, right through to the software stack including different services and load balancing.
“How do you get the most benefit for customers out of that infrastructure in the most cost-effective way? Every single one of those components matters.”
Those gains combined, he says, mean Google thinks it can glean around four times the performance from a GPU or a TPU – the AI chip the company developed in-house – compared to a siloed approach.
In an age where companies are buying tens of thousands of GPUs for millions of dollars and building out gigawatt-scale clusters to house and power them, this improvement is significant.
On the company’s ongoing data center build-out, Gupta says Google’s thinking on AI infrastructure is generally spread across three broad buckets that consider different latency and sovereignty requirements.
“If it’s latency insensitive and you don’t have sovereign requirements, it’s probably best for the customer to just put it in a few locations in the world, and you can use any of our larger clusters to go do that,” he says.
“There are also many countries where you can train the model on general data anywhere. But to serve the model, or fine-tune the model with your own data, it must be in your country. For that, we look at which countries have those needs and how we put both GPUs and TPUs in-country to support customers.”
“The third way to think about it is sheer latency requirement, where now I need to be, wherever my end customers are. There are going to be unique use cases where I need single-digit millisecond latency and have to be much closer to the end user. But I think the latency-sensitive use cases are [in the] earlier stage.”
For the first two buckets – large clusters and in-country requirements – Gupta says the company’s current and planned region footprint is sufficient. For the low-latency and stricter sovereignty requirements, the company is seeking to offer on-premise cloud solutions.
Bringing AI to the on-premise cloud
Most of the major cloud providers are long past trying to convince large enterprises they should move every single workload to the cloud.
It was interesting, however, at Google Cloud’s London summit, to see a slide claiming the industry is now ‘past the last cycle of enterprise data center refreshes’ and that even large enterprises were no longer just ‘cloud first’ but aiming for ‘cloud only,’ though various headlines about cloud repatriation may suggest otherwise.
Gupta is more measured on the idea of the death of enterprise data centers.
“For AI, I have to do something new. But which new path do I take?” he says. “If you want the maximum scale-up and down, the maximum flexibility, the latest innovation delivered, and those economies of scale, public cloud is the best place to do it.
“But I work with a lot of customers where that’s just not going to work. If something keeps you on-premise, we want to make sure that you get that same cloud experience and you can get AI anywhere you need it,” he adds.
“Defense, federal government-type, highly regulated industries, central banks, the energy sector; there are just many use cases where they just are not ready to or cannot leverage the public cloud for some of their workloads and data.”
Google Distributed Cloud is the company’s answer to bringing cloud-like capabilities and consumption models to customers’ on-premise or Edge locations. Like Amazon Web Services (AWS), Microsoft Azure, and even Oracle Cloud, Google offers pre-configured and managed on-premise servers and racks that customers can place in their own data centers or other Edge locations, offering access to Google Cloud services.
It uses third-party hardware from the likes of HPE or Dell rather than its own proprietary servers.
The connected version of the service starts at a 1U server, scalable to hundreds of racks. The air-gapped offering starts as a small appliance and also scales to hundreds of racks.
A notable customer is McDonald’s, which is putting Google Distributed Cloud hardware into thousands of its stores worldwide to analyze equipment. Several companies are using the service to offer their own sovereign cloud to customers.
For customers that want AI capabilities on-premise with a cloud-like experience – and cost model – without investing in their own GPUs, on-premise cloud offerings from the hyperscalers could be a viable option, potentially offering lower latency and costs.
AI-based use cases Google is seeing for on-premise customers include running translation, speech-to-text models, enterprise search capabilities, and local inferencing.
“We hear sometimes that customers are OK to train their model in the public cloud, but then they want to bring it on-premise, fine-tune it, and build their own agent or whatever application they’re trying to build,” says Gupta.
Application modernization – refactoring and updating existing enterprise applications while trying to build out new AI capabilities – is another driver for these kinds of on-premise deployments.
“Next to that AI application or agent is still a ton of enterprise applications and data in many different locations, a lot of it in VMs, sitting on-premise,” Gupta says. “As enterprises look at the investment decision on AI, how much of the rest of that estate also goes to a new cloud model? This can help you migrate those on-prem environments into another on-prem environment that is a cloud environment with Cloud APIs.”
For now, Gupta says the distributed cloud offering only provides Nvidia GPUs; customers are currently unable to get access to Google’s own high-end TPUs outside Google data centers – though it offers a mini version, known as the Edge TPU, on a dev board through its Coral platform.
Google isn’t alone in this; neither Microsoft nor Amazon offers its custom AI silicon as part of its on-premise services. When asked if Google’s TPUs could ever make it to customer locations via its Distributed Cloud, Gupta says the company is open to the idea.
“Could we evolve that to AMD or Intel, or to supporting our own servers with TPUs? Yes. But right now, for the use cases we’re seeing so far, we feel we’ve got them covered with Nvidia’s A100 and H100, and we will look to support H200 as well.”
He adds: “We will continue to look at that market, and if we need to support different hardware there, of course, we absolutely will evolve.”
Sovereignty – the real AI investment driver?
Google, like its public cloud peers, is heavily investing in existing markets such as the UK, where the company announced a new $1 billion data center earlier this year.
Amid a major capacity crunch in many established markets, especially in Europe, DCD asks if sovereignty is partly driving the company to invest in new AI-focused facilities in these challenging markets. Gupta says that is “100 percent” the case.
“It varies depending on the use case what you need to put in,” he says. “How much infrastructure and what infrastructure you put in, really depends on the use cases you’re trying to service in the country.
“There are customers who have sovereignty requirements that require data to remain in-country. They must do inferencing in country. And so we need to look at how we augment and grow our infrastructure to support that.”
Describing this as “a continuum of sovereignty,” he continues: “We build data centers of all kinds, all different sizes, and we will put the right infrastructure based on the market needs; it could be TPUs, GPUs. It could be CPUs.”
After being the place where genAI began, and watching its competitors take much of the market, Google now hopes that this continuum will finally let it become the home of AI.
Read the original article: https://www.datacenterdynamics.com/en/analysis/forcing-a-decision-ai-investments-in-the-cloud-and-on-prem/