Blog 1: GTC 2024
Introduction
The first-to-market advantage, even with just-enough model accuracy, has been the dominant force driving up compute demand for methodologies like few-shot and zero-shot learning trained with fewer epochs. While prioritization is key, the implications are neither immediately apparent nor easy to quantify.
In the AI world, it’s a tough call to weigh trade-offs across dimensions like accuracy and perplexity versus an ethical, humans-first policy versus time to market. Sometimes even the safest option isn’t market-ready and can still be riddled with biases, hallucinations, and inaccurate information once the model is jailbroken.
Various multimodal models have drawn harsh criticism in the recent past. The silver lining from these launches is that from here we can only make things better, by being more adaptive and sensitive to the general public’s needs and by instituting a culture that reflects the world in an unbiased, fair, and equitable manner.
The process of creating an AI workflow can be broadly bucketed into three segments from a platform engineering perspective:
a) Data collection and curation – Exploratory Data Analysis (EDA) and Extract Transform Load (ETL)
b) Distributed training for optimizing compute, low-latency networking, and storage with seamless workload orchestration
c) Accelerated, cost-effective, and low-latency inference of massive GPT models with adequate frameworks
Product Manager’s POV
In the past, research and academia have been fairly isolated from industry, except for a few rare situations across different product horizontals. At GTC 2024, I observed the reverse trend: the emphasis on research was high, and people invested in the AI/ML community have become fairly effective at reading and digesting the latest data science papers to make meaningful decisions and be impactful in their jobs.
Seldom have I seen any data center product get this amount of engagement during a product launch. You’d expect them to be hidden away, locked up in a metal cage in a remote location. On the contrary, the GPUs were front and center, creating all the buzz as the main showstopper of the keynote, which felt no less than a rock concert at the SAP Center in San Jose.
Moving on, here are the top five segments that I believe will contribute to the acceleration toward Artificial General Intelligence (AGI), promising compounded efficiencies over the next five years as they gain critical mass across different industries (excluding regulatory and legal aspects).
1) AI inferencing optimization using NIMs and Blackwell GPUs
Gallery 1: Blackwell GPUs launch
AI innovators seem to have a good grasp of efficiently training and testing models with the various platform options available. However, deployment and inference with LLMs have become increasingly expensive and inefficient. In the keynote, NVIDIA announced NIM (NVIDIA Inference Microservices), a cloud-native offering comparable to a Docker container that will be part of the NVIDIA NeMo (Neural Modules) package for efficient inference responses.
I’m told that NIM would greatly simplify the AI model deployment process by packaging algorithmic, system, and runtime optimizations and exposing industry-standard APIs, allowing integration with existing applications and infrastructure without intrusive customization. The newly introduced Blackwell GPUs could support cross-modal retrieval across all six modalities, serving 1-trillion-parameter models at 1 exaFLOP/s to handle large inference requests against large LLMs. If priced right, at roughly $1/hr adoption provides massive value given the limitations of the existing enterprise-grade, production-ready options.
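To make the "industry-standard APIs" point concrete, here is a minimal sketch of what calling a locally deployed inference microservice could look like, assuming it exposes an OpenAI-style chat-completions endpoint. The URL, port, and model name are illustrative placeholders, not documented NIM specifics.

```python
import requests

# Hypothetical endpoint of a locally deployed inference microservice.
# The port, path, and model name are illustrative assumptions.
NIM_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "example-llm",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize the GTC 2024 keynote in one sentence."}
    ],
    "max_tokens": 128,
    "temperature": 0.2,
}

# A plain REST call -- the point is that existing applications can integrate
# with nothing more than an HTTP client, no custom inference code.
response = requests.post(NIM_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```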
2) Advanced multimodal models, explainable AI, enterprise AI agents, and RAG
Gallery 2: Various LLM Models
Newer transformer architectures like the Vision Transformer (ViT) and Visual-Linguistic Transformer (VLT) will hold a lot of importance in processing both visual and textual information, with cross-modal parameter sharing and semantically rich representations drawn from multimodal data sources. As these models get adopted and fine-tuned for applications, they will begin to show up far more prominently in daily use than the limited text generation available today.
Also, being able to explain the wizardry behind the layers, and possibly control the information generation to some degree, will become necessary. With attention-based methods already producing massive efficiencies, self-attention and cross-attention mechanisms could increasingly be used to visualize the contribution of individual input features to model predictions and to produce natural-language explanations conditioned on model internals.
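As a small illustration of the attention-based explanation idea, the sketch below pulls raw self-attention weights out of a small open encoder (BERT, purely as a stand-in) and averages them per token. It is a toy inspection of model internals, not a full explainability method.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Any small encoder works as a stand-in; BERT is used here for illustration only.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_attentions=True)

text = "GPUs were front and center at the keynote."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions holds one tensor per layer, shaped (batch, heads, seq, seq).
last_layer = outputs.attentions[-1][0]   # (heads, seq, seq)
avg_attention = last_layer.mean(dim=0)   # average over heads -> (seq, seq)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
# Attention received by each token (column sums), a crude proxy for importance.
received = avg_attention.sum(dim=0)
for token, score in zip(tokens, received.tolist()):
    print(f"{token:>12s}  {score:.3f}")
```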
With so many unique implementations of RAG (Retrieval-Augmented Generation) already in production, a solution that can dynamically search, retrieve, and generate contextually relevant information in multimodal formats from a very large-scale, multi-format database is going to be critical.
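For readers newer to the pattern, here is a minimal, text-only sketch of the retrieval half of RAG using an open embedding model; a production system would swap the Python list for a vector database spanning multimodal records and feed the retrieved context to a generator LLM. The document contents and model choice below are illustrative.

```python
from sentence_transformers import SentenceTransformer, util

# Small open embedding model, used purely for illustration.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Stand-in "knowledge base"; a real system would index millions of
# multimodal records in a vector database instead of a Python list.
documents = [
    "NIM packages runtime optimizations behind industry-standard APIs.",
    "Blackwell GPUs target trillion-parameter model inference.",
    "Liquid cooling improves throughput for dense GPU racks.",
]
doc_embeddings = embedder.encode(documents, convert_to_tensor=True)

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, doc_embeddings)[0]
    best = scores.topk(top_k).indices.tolist()
    return [documents[i] for i in best]

# The retrieved context would be prepended to the user prompt and sent to a
# generator model (for example, the inference endpoint sketched earlier).
print(retrieve("How does NVIDIA simplify inference deployment?"))
```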
3) GPU hardware re-selling models for Enterprise Applications, rise in AI PCs
As the scale of enterprise AI models increases, the need for economical, reliable, on-demand GPU resources is critical for model training, testing, and inference. In the wake of this demand, the hyperscalers have all built out solid, scalable CPU cloud infrastructure, but that infrastructure isn’t fully compatible with most parallel-processing applications and just isn’t energy efficient in the long term, even with the latest low-power ARM architectures that boast lower cooling and energy costs. The jury is still out on whether they can provide a comparable GPU environment with low or no data egress costs at prices reasonable for enterprises.
Enter the GPU hardware resellers, who have solved exactly this problem, undercutting hyperscaler compute pricing by as much as 80% with almost guaranteed long-term contracts and access to the latest DGX Cloud and H100 GPUs. I’m told they have been inundated with requests and are investing heavily to build up their GPU cloud infrastructure.
In a different environment, raising multiple billions of dollars to buy depreciating assets could have been a risky business. But given the market liquidity, with the Fed announcing interest rate cuts as early as June 2024, compute rapidly becoming a commodity, and the race toward an eventual Artificial General Intelligence (AGI), these businesses are on track to project triple-digit growth numbers. By the end of 2030, given the right leadership, partnerships, supply chain, and infrastructure, I won’t be too surprised if a few of the logos running this business model are valued at close to ~$100B while beating YoY revenue growth expectations.
Gallery 3: GPU reseller business models and products
AI-powered personal computers (PCs) and Macs tackle the fundamental computational hurdle inherent in processing complex AI algorithms at scale for the everyday consumer. By integrating specialized hardware components like graphics processing units (GPUs) and tensor processing units (TPUs), alongside sophisticated software frameworks such as TensorFlow and PyTorch, these systems excel at executing intensive AI workloads, including deep learning models with millions of parameters.
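As a small sketch of why that hardware/framework pairing matters, the snippet below (PyTorch, used here only as an example framework) picks whatever accelerator the machine exposes, a discrete GPU or Apple silicon, and falls back to CPU; the tiny model is a placeholder for, say, a quantized on-device assistant.

```python
import torch

# Pick the best accelerator the machine exposes; the same model code then runs
# on a discrete GPU, on Apple silicon, or on a plain CPU without changes.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Tiny stand-in network; a real on-device assistant would load a quantized LLM.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 10),
).to(device)

batch = torch.randn(32, 512, device=device)
with torch.no_grad():
    logits = model(batch)

params = sum(p.numel() for p in model.parameters())
print(f"Ran a {params:,}-parameter model on {device}")
```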
Additionally, with products like AI agents carrying out workloads at the network edge, productivity on consumer electronics is expected to shoot through the roof. Imagine, as a Product Manager, if your AI agent could send meeting invites automatically, create meaningful custom slide decks, generate YoY revenue predictions, send out quarterly reports, track and summarize tech debt, and track progress on major epics. Given the speed and momentum, I’m positive all of these products and functionalities will launch faster than you can imagine.
Gallery 4: AI PCs and MAC products for general consumers
4) Solid-state storage, software-defined storage, and workload orchestrators
The majority of ML jobs are read/write intensive and demand near-zero latency. Without breakthroughs in data storage technologies, GPU cloud infrastructures would struggle to manage and utilize GPU resources efficiently, leading to waste and suboptimal performance. Hybrid storage array technology combines parallel file systems with scale-out and scale-up architectures, increasing dynamic data storage capacity. The latest software-defined storage decouples the hardware network from the software layer, is multi-cloud compatible, and comes with a set of core features that allows more fluid control of data across public and private clouds and across VPNs.
With GPUs in short supply, getting close to a 99% utilization rate out of them is going to dictate the productivity of your AI/ML research and platform teams. While not built specifically for AI/ML workloads, Kubernetes has emerged as the most popular option for containerized resource management, scheduling, and automation, making it well suited to deploying and managing AI workloads. Several tools, mostly built on top of Kubernetes, extract analytics from utilization data; some are better than others, with cleaner UIs and user-conscious design that make orchestration a seamless experience.
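For readers who haven't touched this layer, here is a minimal sketch, using the Kubernetes Python client, of how an AI workload typically declares its GPU requirement so the scheduler can place it on a node with free accelerators. This assumes the cluster runs the NVIDIA device plugin; the image tag, namespace, and resource numbers are placeholders.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (in-cluster config also works).
config.load_kube_config()

container = client.V1Container(
    name="trainer",
    image="nvcr.io/nvidia/pytorch:24.02-py3",  # placeholder image/tag
    command=["python", "train.py"],
    resources=client.V1ResourceRequirements(
        # The GPU limit is what lets the scheduler bin-pack scarce accelerators.
        limits={"nvidia.com/gpu": "1", "cpu": "8", "memory": "32Gi"},
    ),
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-training-job", labels={"team": "ml-platform"}),
    spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
)

client.CoreV1Api().create_namespaced_pod(namespace="ml-workloads", body=pod)
```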
Gallery 5: Data platforms and workload orchestrator products
5) Data center solutions, water-cooling technology, and clean energy harvesting
Between AI companies training their own models, large-scale customers running inference, and crypto companies mining cryptocurrencies, demand for parallel accelerated computing has surged against a limited supply of GPUs, in both quality and quantity, in the market.
In such an environment, data center design challenges center on network fabric optimization: implementing low-latency, high-bandwidth fabrics like RDMA (Remote Direct Memory Access) over Converged Ethernet (RoCE) or higher InfiniBand speeds to facilitate fast data transfers between GPU nodes. When a GPU rack runs 24/7/365, power distribution units (PDUs), redundant network switches and paths, and failover mechanisms are key.
Coolant formulations with enhanced thermal conductivity and corrosion resistance, combined with dynamic, adaptive cooling control algorithms, can adjust coolant flow rates, fan speeds, and pump pressures based on real-time temperatures. In addition, integrated thermal health monitoring and diagnostics are essential for a comprehensive water-cooling infrastructure that can detect and mitigate issues such as leaks, pump failures, or coolant contamination. Drawing from previous personal experience, air-cooled technology is resilient only for smaller racks; liquid cooling, while painful to design, manage, operate, and deploy, usually delivers better throughput and reliability.
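To make the "adaptive cooling control" idea concrete, here is a toy proportional controller that nudges pump speed toward a target coolant inlet temperature. Real facilities use tuned PID loops with safety interlocks; every constant below is a made-up assumption.

```python
# Toy proportional controller for a liquid-cooling loop.
# All setpoints, gains, and limits are illustrative assumptions, not values
# from any real data center control system.

TARGET_INLET_C = 30.0              # desired coolant inlet temperature
KP = 4.0                           # proportional gain: % pump speed per degree of error
MIN_PUMP, MAX_PUMP = 20.0, 100.0   # pump speed bounds, in percent

def next_pump_speed(current_speed: float, inlet_temp_c: float) -> float:
    """Adjust pump speed in proportion to how far we are from the setpoint."""
    error = inlet_temp_c - TARGET_INLET_C        # positive -> running hot
    adjusted = current_speed + KP * error
    return max(MIN_PUMP, min(MAX_PUMP, adjusted))

# Simulated readings from the rack's inlet temperature sensor.
readings = [29.5, 31.2, 33.8, 32.0, 30.4]
speed = 50.0
for temp in readings:
    speed = next_pump_speed(speed, temp)
    print(f"inlet {temp:4.1f} C -> pump {speed:5.1f} %")
```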
Finally, one of the most expensive aspects of the entire GPU cloud setup is power consumption and sourcing clean energy to attain carbon neutrality. Integrating multiple smart renewable energy sources, such as solar, wind, and hydroelectric power, into hybrid energy systems helps mitigate intermittency, especially when paired with workload scheduling that adjusts to real-time energy prices and availability.
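As a toy version of that price-aware scheduling, the sketch below finds the cheapest contiguous window in which to run a deferrable training job; the hourly prices are invented examples, and a real scheduler would also weigh availability, deadlines, and carbon intensity.

```python
# Toy price-aware scheduler: find the cheapest contiguous window in which to
# run a deferrable job. Hourly prices are made-up examples in $/kWh.

HOURLY_PRICES = [0.21, 0.19, 0.12, 0.08, 0.07, 0.09, 0.15, 0.24]
JOB_HOURS = 3  # the job needs three consecutive hours

def cheapest_window(prices: list[float], hours: int) -> tuple[int, float]:
    """Return (start_hour, summed_price) for the cheapest contiguous window."""
    best_start, best_cost = 0, float("inf")
    for start in range(len(prices) - hours + 1):
        cost = sum(prices[start:start + hours])
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost

start, cost = cheapest_window(HOURLY_PRICES, JOB_HOURS)
print(f"Schedule the job at hour {start}; summed price index {cost:.2f}")
```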
Gallery 6: Data center solutions, water-cooling technology, and clean energy harvesting solutions
Outro
Navigating the future AI landscape demands a near-perfect calibration of trade-offs between model precision, ethical integrity, and rapid deployment, compounded by the complexities of bias mitigation and interpretability. Technological advancements in GPU cloud infrastructure, accelerated by AI inferencing optimization, multimodal architecture refinement, and GPU reselling business models, will converge with innovations such as software-defined and solid-state storage, workload orchestration platforms, and clean-energy-powered data center architectures.
This delicate ecosystem serves as the linchpin for propelling the trajectory toward Artificial General Intelligence (AGI), catalyzing scalability, operational efficiency, and long-term environmental sustainability within the AI paradigm. I fully realize that my views here are fleeting and may be biased from having just attended GTC 2024; given the rapid advancements in AI, I wouldn’t be too surprised if we’re soon hit with a new, tangential list of challenges beyond the ones listed here.
What’s certain is that organizations, irrespective of size and scale, with product leaders who can anticipate and prioritize these customer and market needs and solve them in the least amount of time, with a unique value proposition that drives impact, will have substantial success in a market full of opportunities ahead.
At GTC 2024, I saw plenty of them already executing on their vision. I’m excited to see how many more will join them. Off to the races we go, toward AGI!
Disclaimer: All opinions shared on this blog are solely mine and do not reflect my past, present, or future employers. All information on this site is intended for entertainment purposes only and any data, graphics, and media used from external sources if not self-created will be cited in the references section. All reasonable efforts have been made to ensure the accuracy of all the content in the blog. This website may contain links to third-party websites which are not under the control of this blog and they do not indicate any explicit or implicit endorsement, approval, recommendation, or preferences of those third-party websites or the products and services provided on them. Any use of or access to those third-party websites or their products and services is solely at your own risk.