Podcast Summary
Foundry's innovative cloud infrastructure for AI: Foundry, a cloud built for AI, aims to improve economics by 12-20x compared to existing solutions and address issues of low utilization rate due to hardware failures and idle time between workloads, making advanced AI resources more accessible to a wider audience.
Foundry, a public cloud built specifically for AI, aims to make advanced computational resources more accessible and affordable to a broader audience. Jared Davis, Foundry's CEO, was inspired by what small teams with significant computational power have achieved, such as DeepMind's AlphaFold 2. Foundry's mission is to reimagine cloud infrastructure from the ground up for AI workloads, improving economics by 12 to 20x compared to existing solutions. The utilization of GPU clouds is often far lower than expected because of frequent hardware failures and idle time between workloads, a problem that spans every category of user, from hyperscalers to operators of large clusters to individuals. Foundry aims to close this gap and, in doing so, increase the frequency of groundbreaking AI developments. Its primary offerings are infrastructure as a service and tools for seamless access to state-of-the-art systems.
Complexity of GPUs and machine learning systems: The complexity of GPUs and machine learning systems, consisting of thousands to tens of thousands of components, leads to more frequent failures in newer, advanced systems and requires specialized tooling and orchestration to keep them running smoothly.
GPUs, often thought of as just chips, are actually complex systems consisting of thousands to tens of thousands of components. NVIDIA's DGX and HGX systems compress an entire data center's worth of infrastructure into a single box, but with so many components in a large supercomputer built from interconnected GPUs, the probability of running for long without any failure is essentially zero. This complexity means that newer, more advanced systems fail more often. The large-scale regime in machine learning requires orchestrating a cluster of GPUs to perform a single synchronous calculation, so many components must collaborate on one job, and a single component failing can degrade or halt the entire run. The hyperscalers, such as Amazon Web Services (AWS), have made particular assumptions about the depreciation cycle of these large, complex systems, and the definition of "cloud" has drifted from its original sense of a service, rather than mere co-location of hardware. The cloud as we know it today took shape in the early 2000s, with AWS launching its core services in 2006, and it took time for the model to catch on because its value proposition was initially unclear. The complexities of these large-scale systems require specialized tooling and orchestration to keep them running smoothly.
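To make the failure-at-scale point concrete, here is a minimal sketch with assumed per-component failure rates (not figures from the episode) showing how quickly the chance of a fault-free run collapses as component counts grow:

```python
# Minimal sketch with assumed numbers (not figures from the episode): probability
# that a synchronous job sees no component failure, assuming independent failures.

def prob_no_failure(num_components: int, per_component_failure_prob: float) -> float:
    """Chance that every component survives the run."""
    return (1.0 - per_component_failure_prob) ** num_components

p_fail = 1e-4  # assume each component has a 0.01% chance of failing during the run
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} components -> P(no failure) ~ {prob_no_failure(n, p_fail):.4g}")
# ~0.90 at 1,000 components, ~0.37 at 10,000, ~0.000045 at 100,000
```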
Cloud Elasticity: Cloud Elasticity allowed users to access on-demand compute resources and pay for only what they used, a significant departure from the traditional model of buying and maintaining physical servers, leading to cost savings and making the cloud a critical component of modern technology infrastructure.
The cloud computing revolution, which started around 2007, was not immediately recognized as a game-changer by everyone. Early adopters, particularly startups and early cloud service providers, saw the value in the elasticity and cost savings the cloud offered, but it took time for enterprises, especially those in regulated industries, to embrace it fully. One of the cloud's key advantages was making compute resources available on demand, with users paying only for what they used, a concept known as elasticity and a significant departure from the traditional model of buying and maintaining physical servers. Elasticity also made speed effectively free: for the same cost, a workload could be spread across many more machines and finish much sooner. Realizing that potential was not trivial, though; it required reshaping workloads to parallelize and having the necessary capacity available in the cloud. The cloud's elasticity and cost savings have since proven to be a major advantage, making it a critical component of modern technology infrastructure. Today it is hard for younger engineering teams to imagine a world without the cloud, but its adoption was not an overnight success.
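As a minimal sketch of the elasticity point (the $2/hour rate below is a hypothetical price, not one quoted in the episode), a perfectly parallelizable job costs the same whether it runs on one machine or a thousand; only the wall-clock time changes:

```python
# Hypothetical numbers for illustration: a job needing 1,000 machine-hours of work
# at an assumed $2 per machine-hour costs the same at any degree of parallelism.

HOURLY_RATE = 2.0            # assumed price per machine-hour (not a real quote)
TOTAL_MACHINE_HOURS = 1_000  # total work the job requires

for machines in (1, 10, 100, 1_000):
    hours = TOTAL_MACHINE_HOURS / machines
    cost = machines * hours * HOURLY_RATE
    print(f"{machines:>5} machines x {hours:>7.1f} h each = ${cost:,.0f}")
# Every configuration costs $2,000; the 1,000-machine run just finishes ~1,000x sooner.
```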
AI hardware resources challenges: AI hardware resources present significant challenges for companies due to upfront capital requirements, inflexibility, and risk management. Innovative business models and technical solutions aim to create a more efficient and flexible system.
The current state of AI hardware resources presents significant challenges for companies, particularly around upfront capital requirements, inflexibility, and risk management, because of the long-term commitments and high costs of purchasing and maintaining hardware for AI workloads. The situation is reminiscent of a traditional parking lot business, where customers can either pay as they go or reserve a spot, with the latter requiring significant money upfront. Addressing these challenges calls for business model and technical innovations that enable more efficient use of resources and provide greater flexibility. One such innovation is letting pay-as-you-go users occupy reserved spots while they sit empty, a win-win for both parties; implementing it, however, requires a convenient and seamless system, akin to a valet service, to manage the handover of reserved spots. The goal is a more efficient and flexible system that reduces the upfront capital requirements and risks associated with AI hardware resources.
GPU spot management: Foundry's spot capacity offering on the Foundry Cloud Platform automates and optimizes GPU usage, benefiting companies that use GPUs for training and inference; the scale of GPU usage across applications is substantial.
Foundry has launched a product on the Foundry Cloud Platform called spot, a mechanism for managing and automating the use of spot GPU capacity. Continuing the parking-lot analogy, it works like a valet with sensors: when the reserving customer returns, the system detects it, moves the pay-as-you-go "car" to another spot, and hands the reserved spot back, creating more effective use of space and better economics for everyone. Spot capacity is particularly beneficial for companies using GPUs for training and inference; in AWS's experience, companies have used spot mechanisms of this kind quite extensively for these purposes, and it opens up interesting conversations about the different classes of workloads and their needs. Making spot capacity usable through automation is a significant trend in the industry. To put the scale of GPU capacity in context, training GPT-3 used roughly 10,000 V100 GPUs for about 14.6 days. At Ethereum's peak, there were around 10 to 20 million V100-equivalent GPUs in use, running continuously. The Ethereum network relied on GPUs to a far greater degree than Bitcoin, where NVIDIA GPUs numbered only in the tens of thousands and contributed less than 1% of global hash power. These numbers give an idea of the vast amount of GPU power in use across applications.
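A quick back-of-the-envelope comparison of those two figures, treating the episode's rough estimates at face value and ignoring differences in per-GPU capability:

```python
# Back-of-the-envelope arithmetic using the rough figures quoted above; device
# counts only, ignoring differences in per-GPU capability.

gpt3_gpu_days = 10_000 * 14.6   # ~146,000 GPU-days for the whole training run
ethereum_gpus = 15_000_000      # midpoint of the 10-20 million estimate

print(f"GPT-3 training, in total:  ~{gpt3_gpu_days:,.0f} GPU-days")
print(f"Ethereum at peak, per day: ~{ethereum_gpus:,} GPU-days")
print(f"Ratio: ~{ethereum_gpus / gpt3_gpu_days:.0f}x a GPT-3 run's worth of GPU time, every day")
```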
Compute power utilization in AI: Despite an abundance of high-end compute power, utilization rates are low due to factors like healing buffers and market dynamics. The future of AI may involve a shift towards smaller, smart models and distributed training across multiple data centers.
While there is an abundance of compute power in the world, with even consumer devices like the iPhone 15 Pro delivering roughly 35 trillion operations per second of AI compute, utilization of that power is quite low. According to some sources, utilization of even high-end H100 systems is at most 20-25%. Even during pre-training runs, utilization can fall to around 80% because of healing buffers, capacity held in reserve to replace failed nodes, and across the broader fleet idle time pushes it far lower. Tools like Mars, which provide monitoring, alerting, resiliency, and security, can help boost GPU availability and uptime. Market dynamics make access to the largest, most interconnected clusters a premium, but paradigms are also emerging that don't require such clusters, including the idea of "pumpkin AI systems": smaller but extremely smart models that can be trained on smaller clusters. Google, for example, has been experimenting with training models across multiple data centers. The future of AI may look less like everything requiring large clusters and more like a shift towards these new paradigms: abundant compute, better utilized, without every workload needing a single giant interconnected cluster.
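A minimal sketch of how those utilization numbers can compound; the reserve, availability, and scheduling fractions below are assumptions chosen to illustrate the 80% and 20-25% figures mentioned above, not measurements:

```python
# Illustrative assumptions, not measured values: how a healing buffer, node
# downtime, and idle time between workloads multiply into low overall utilization.

healing_buffer = 0.20   # assumed fraction of a cluster held in reserve for failed nodes
availability   = 0.90   # assumed fraction of time a node is healthy
scheduled      = 0.30   # assumed fraction of time a healthy node has work queued

during_run = 1 - healing_buffer
fleet_wide = (1 - healing_buffer) * availability * scheduled

print(f"During a dedicated pre-training run: ~{during_run:.0%}")  # ~80%
print(f"Across a fleet with idle gaps:       ~{fleet_wide:.0%}")  # ~22%
```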
Scaling up AI models: Researchers are exploring new ways to scale up AI models by generating candidate responses and filtering down to the best one, utilizing synthetic data generation, compound systems, batch inference, and horizontally scalable workflows, and prioritizing verifiability for high performance.
Researchers are exploring new ways to scale up AI models by making more efficient use of existing resources and parallelizing workloads. This includes generating a large number of candidate responses from a model and filtering down to the best one, as demonstrated in the Chinchilla paper. This approach is becoming more common as systems like AlphaGeometry, which utilize synthetic data generation and compound systems, gain popularity. Additionally, the cost of training and inference is becoming a more significant consideration, leading to a shift towards batch inference and horizontally scalable workflows. A paper Davis recently authored delves deeper into this concept of compound AI systems, where many calls to a model are composed into a network of networks. The principle of verifiability, which refers to problems where it's easier to check an answer than to generate one, can guide the architecture of these systems, resulting in high performance. This approach to scaling up AI models is gaining traction as a cost-effective and efficient alternative to traditional methods.
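A minimal sketch of the generate-then-filter pattern described above; `call_model` is a hypothetical stand-in rather than a real model API, and the factoring task is chosen only because it is cheap to verify:

```python
import random

# Sketch of generate-many-then-filter (best-of-n). `call_model` is a hypothetical
# stand-in for an LLM call; factoring is used because checking an answer is cheap.

TARGET = 91  # we want two factors greater than 1 whose product is 91

def call_model(prompt: str) -> tuple[int, int]:
    """Hypothetical model call: returns a candidate answer (here, a random guess)."""
    return random.randint(2, 30), random.randint(2, 30)

def verify(candidate: tuple[int, int]) -> bool:
    """Verifiability: checking a candidate is far cheaper than producing a good one."""
    a, b = candidate
    return a * b == TARGET

# Fan out many candidate generations (easily parallelized), then filter down.
candidates = [call_model(f"factor {TARGET}") for _ in range(5_000)]
verified = [c for c in candidates if verify(c)]
print(verified[0] if verified else "no verified answer in this batch")  # usually (7, 13) or (13, 7)
```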
Massive language models: Combining multiple language models in a massive network can significantly improve performance on parallelizable tasks, such as code generation and neural network design, and is expected to become a common approach in the future.
A new approach that uses pre-training and composes massive networks out of multiple language models could significantly improve performance on various tasks, especially those that are more parallelizable. This was demonstrated in a recent paper, where a 3% improvement was achieved on the MMLU benchmark, a notable gap over previous best models. The idea is to have each stage in the network draw on the best of multiple language models, making millions of calls to answer a question and then choosing the best response. While this may seem far-fetched now, it is expected to become common sense in the future for tasks like code generation, design, and neural network design. The hope is that the community will explore this further, as it appears applicable to many downstream tasks and could reduce the need for large, interconnected clusters for cutting-edge work.
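A toy sketch of the fan-out-and-select idea; the three model stubs are hypothetical stand-ins for real language models, and majority voting stands in for whatever selection mechanism the paper actually uses:

```python
import random
from collections import Counter

# Toy sketch: one stage of a "network of networks" fans a question out to several
# models (stubbed here) many times, then selects the most common answer. The stubs
# and the majority-vote selector are illustrative assumptions, not the paper's method.

def model_a(question: str) -> str: return "4" if random.random() < 0.8 else "5"
def model_b(question: str) -> str: return "4" if random.random() < 0.7 else "3"
def model_c(question: str) -> str: return "4" if random.random() < 0.6 else "6"

def stage(question: str, models, samples_per_model: int = 16) -> str:
    """Fan out the question, collect all answers, and pick the most common one."""
    answers = [m(question) for m in models for _ in range(samples_per_model)]
    return Counter(answers).most_common(1)[0][0]

print(stage("What is 2 + 2?", [model_a, model_b, model_c]))  # almost always "4"
```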