
    Podcast Summary

    • Simplifying cloud compute for data teams: Modal, founded by Erik Bernhardsson, aims to make it easier for data teams to deploy and manage their code in the cloud by focusing on the foundational compute problem.

      Erik Bernhardsson, the founder of Modal, identified a common problem in the data community: making it easy for data teams to deploy and run their code in the cloud. He realized that starting at the foundational level, specifically the compute problem, could provide a solid solution, and Modal, an end-to-end stack for cloud compute, was born out of that need. Erik shared his background of working with data for over a decade, building systems at companies like Spotify and Better. He noticed that data teams were struggling with the complexities of cloud compute, from scaling code to scheduling tasks to setting up web endpoints. By focusing on this layer, Modal aims to simplify the process of deploying and managing code in the cloud, and for the past six months the company has been particularly focused on online inference. Erik's passion for addressing these challenges led him to start Modal, and his demo during the podcast left a lasting impression.

    • Bringing infrastructure into the innermost code loop: Modal lets data teams write code locally and run it in the cloud almost instantly, eliminating slow feedback loops and improving productivity and happiness.

      Modal focuses on improving developer productivity and happiness in data teams by bringing the infrastructure into the innermost code writing loop. Instead of deploying containers to the cloud and dealing with slow feedback loops, developers can write code locally and have it run in the cloud almost instantly. This was achieved by building their own container runtime, file system, and container builder. The benefit is a faster iteration speed and a more seamless experience for data teams, who often deal with large datasets and the need to run code in production. The goal was to eliminate the outer loops and focus on the inner loop of writing and immediately running code. This approach was not possible with existing solutions like Kubernetes or AWS Lambda, so Modal went deep into building the foundational infrastructure to make it work.
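      As a rough illustration of that inner loop, here is a minimal sketch of the pattern being described. It assumes the current Modal Python SDK (names like modal.App, @app.function, and .remote() may differ between versions), and the app and function names are made up for the example:

```python
import modal

app = modal.App("inner-loop-demo")  # illustrative app name


@app.function()
def square(x: int) -> int:
    # This body runs inside a Modal-managed container in the cloud.
    return x * x


@app.local_entrypoint()
def main():
    # Invoked from the developer's machine; the work happens remotely.
    print(square.remote(7))
```

      Running something like `modal run inner_loop_demo.py` builds the container if needed, executes the function in the cloud, and streams the result back, so the edit-and-run cycle stays in a single terminal.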

    • Challenges with traditional container platforms: To improve iteration speed and efficiency for machine learning and AI use cases, consider building a custom solution using lower-level primitives instead of traditional container platforms like Kubernetes, AWS Lambda, or Docker.

      The traditional approaches to running applications in containers using platforms like Kubernetes, AWS Lambda, or Docker have limitations when it comes to iteration speed and efficiency. For instance, the process of pushing a container image to a registry and then starting a job in Kubernetes can be time-consuming and inefficient due to the need to duplicate and transfer large amounts of data. Docker, while revolutionary, also has inefficiencies in its layered file system and the duplication of unnecessary data. AWS Lambda, while faster than Kubernetes or Docker, still has limitations such as not supporting GPUs or long-running jobs. To address these challenges, the Modal team decided to build their solution on lower-level primitives, implementing much of the stack themselves. This approach is particularly well-suited for use cases around machine learning and AI, where rapid iteration and efficient use of resources are crucial. Overall, the team's experience highlights the importance of considering the specific needs of your use case when choosing a technology stack.

    • Serverless machine learning platforms like Modal offer benefits for deploying ML models: Serverless ML platforms provide cost-effective, easy-to-use alternatives to traditional infrastructure for ML model deployment, supporting various use cases beyond online inference.

      Serverless machine learning platforms like Modal offer significant benefits for deploying machine learning models, particularly for online inference, due to their cost-effectiveness and ease of use. These platforms provide an alternative to traditional infrastructure like Kubernetes with EC2, where users have to spin up and manage idle instances for model inference. Serverless platforms like Modal also support a wide range of use cases beyond online inference, including batch processing, web scraping, computational biotech, simulations, and backtesting. The user experience for online inference is currently excellent, while the experience for batch processing and parallelism is good but not yet optimal. Modal also aims to improve its support for data pipelines and scheduling in the future. The speaker, a fan and user of Modal, described how users can write Python scripts with functions and dependencies, making things easier to manage than traditional AI and ML workflows. Overall, the shift towards serverless machine learning platforms is driven by the desire to avoid infrastructure management and the cost savings they offer.
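      For the batch-processing and parallelism use cases mentioned above, the same function abstraction fans out across many containers. A hedged sketch follows: the .map() helper exists in the Modal SDK, though the function and app names here are illustrative:

```python
import modal

app = modal.App("batch-demo")  # illustrative name


@app.function()
def score(record: str) -> int:
    # Stand-in for per-item work: inference, scraping, a simulation step, etc.
    return len(record)


@app.local_entrypoint()
def main():
    records = ["alpha", "beta", "gamma"]
    # .map() runs one call per input, scheduled across cloud containers.
    for result in score.map(records):
        print(result)
```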

    • Write, Run, and Deploy Code with Modal's Self-Provisioning Runtime: Modal allows developers to write, run, and deploy code in the cloud, eliminating the need for external configuration files and offering a faster feedback loop and a consistent development experience, with dependencies declared alongside the functions that use them.

      With Modal, you can write your code as if you're running it locally, but instead it provisions and runs the necessary containers in the cloud. This means you can define the infrastructure and dependencies your functions need directly in your code, eliminating the need for external configuration files. This approach offers the developer productivity of local development and the power of the cloud, all within the same loop. By removing the need to maintain separate local environments and the hassle of container building and pushing, Modal provides a faster feedback loop and a more consistent development experience. This idea of infrastructure and app code being intertwined is explored further in the blog post "The Self-Provisioning Runtime" by Swyx. With Modal, you declare the dependencies each function needs alongside the function itself and import them inside its body, so you never have to install them locally. This results in a more efficient and streamlined development process.
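      Concretely, the "infrastructure in the code" idea looks roughly like the sketch below. The image-builder calls (modal.Image.debian_slim().pip_install(...)) follow my reading of the Modal SDK and the package choices are arbitrary, so treat it as a shape rather than a verbatim recipe:

```python
import modal

# The container image is declared next to the code that needs it,
# so there is no separate Dockerfile or requirements.txt to keep in sync.
image = modal.Image.debian_slim().pip_install("pandas", "numpy")

app = modal.App("self-provisioning-demo")


@app.function(image=image)
def summarize(csv_text: str) -> dict:
    # Imported inside the function body: pandas only has to exist in the
    # remote container, never on the developer's laptop.
    from io import StringIO

    import pandas as pd

    df = pd.read_csv(StringIO(csv_text))
    return {"rows": len(df), "columns": list(df.columns)}
```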

    • Simplified Python function deployment with Modal: Modal simplifies Python function deployment, enabling users to test and run functions with minimal imports while it handles containerization, deployment, and scaling, making it suitable for various applications beyond machine learning.

      Modal offers a simplified workflow for running Python functions, including machine learning models, in the cloud. Users can directly test and run functions with minimal imports, and Modal handles containerization, deployment, and scaling. Surprisingly, users have utilized Modal for various use cases beyond machine learning, such as web app development and job queues. The platform's focus on improving online inference and reducing startup times reflects these new use cases. Erik provided an example of a typical workflow: create a Python function, push it to Modal, and receive a URL to call the function. This simplicity sets Modal apart from other platforms and makes it an attractive option for various applications.
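      The "function to URL" workflow Erik described would look something like this sketch, assuming the @modal.web_endpoint() decorator and a FastAPI-capable image; decorator and package names may vary with the SDK version:

```python
import modal

image = modal.Image.debian_slim().pip_install("fastapi[standard]")

app = modal.App("hello-endpoint")


@app.function(image=image)
@modal.web_endpoint()  # exposes the function over HTTPS once deployed
def hello(name: str = "world") -> dict:
    # Query parameters map onto the function arguments.
    return {"greeting": f"hello, {name}"}
```

      Deploying it with a command along the lines of `modal deploy hello_endpoint.py` prints a stable URL that can be called immediately with curl or from another service.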

    • Simplify Python ML model deployment with the Modal platform: Modal is a cloud-based platform that simplifies Python ML model deployment by allowing users to decorate functions with special decorators, define container images in code, and run functions in the cloud using commands like "modal deploy" or "modal run".

      Modal, a cloud-based platform, simplifies the process of deploying and running Python functions that use machine learning models, such as those from Hugging Face, by allowing users to decorate their existing functions with special decorators and define container images using Modal's image syntax. This can be done entirely in code, and the platform supports installing various Python packages like transformers, accelerate, and diffusers. Users can run the function in the cloud using commands like "modal deploy" or "modal run", which build a container if it doesn't already exist and run the code. The platform supports running any Python function and allows users to install additional packages and manipulate images, among other advanced features. It is designed to provide a "magic" first experience: install the Python package, set up a token, and run code immediately in the cloud. A notable feature is the seamless integration of development and deployment, allowing users to modify their code and test it directly from the terminal without switching back and forth between editor and terminal. This feedback loop and the ability to iterate quickly is particularly valuable for CTOs managing multiple teams and disciplines in software engineering.
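      A hedged sketch of the Hugging Face pattern described here; the model task, package list, and gpu="A10G" argument are illustrative choices rather than anything recommended on the show:

```python
import modal

# Declare the heavyweight ML dependencies as part of the container image.
image = modal.Image.debian_slim().pip_install("transformers", "torch")

app = modal.App("hf-inference")


@app.function(image=image, gpu="A10G")
def classify(text: str) -> list:
    # The import and the model download both happen inside the cloud container.
    from transformers import pipeline

    clf = pipeline("sentiment-analysis")
    return clf(text)
```

      In this shape, something like `modal run` executes the function once in the cloud for quick testing, while `modal deploy` keeps it available for repeated calls; both build the container only when the image definition changes.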

    • Hot reloading in front end engineering: Hot reloading offers fast feedback loops for developers, enabling them to save code and see changes live in the cloud, achieved through monitoring file systems and reloading apps when updated. Although the underlying mechanism is not complex, advanced container and file system technology is crucial for a smooth experience.

      Front end engineering, with its modern toolchains, offers advanced feedback loops that give engineers a super snappy development experience. This is achieved through hot reloading, which lets developers save their code and see the changes live. Modal's web serving mode brings the same experience to the cloud: it monitors the file system and reloads the entire app when any file is updated. Although the underlying mechanism is not complex, a foundation of fast container and file system technology is what makes this process relatively easy. For larger companies with existing infrastructure, migrating to such a system can be challenging, but starting small, experimenting with the technology, and gradually scaling up can help mitigate the pain and complexity of the migration.
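      In practice, that live-reload loop is one file plus one command. The sketch below assumes Modal's @modal.asgi_app() decorator and a FastAPI app defined inside the function; running `modal serve` against the file watches for local changes and redeploys them (exact names may differ by SDK version):

```python
import modal

image = modal.Image.debian_slim().pip_install("fastapi[standard]")

app = modal.App("live-demo")


@app.function(image=image)
@modal.asgi_app()
def web():
    from fastapi import FastAPI

    api = FastAPI()

    @api.get("/")
    def read_root():
        # Change this response and save the file: an active `modal serve`
        # session picks up the edit and updates the temporary URL.
        return {"msg": "hello"}

    return api
```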

    • Engaging with larger businesses for data platforms: Identify a low-risk, non-disruptive use case to demonstrate the value of platforms like Modal to larger businesses, focus on finding data scientists and ML engineers eager for a more streamlined solution, and expand the platform's capabilities to take over more tasks and simplify the data landscape.

      When it comes to engaging with larger businesses to adopt a platform like Modal, the conversations and approaches differ significantly from those with smaller organizations. Existing data platforms, security compliance, and various stakeholders with distinct roles and priorities are common factors in these larger-scale conversations. To initiate such projects, it's crucial to identify a low-risk, non-critical-path use case that can demonstrate value without disrupting the business. This could involve finding data scientists and machine learning engineers who feel underserved by the current platform and are eager for a more streamlined solution. Modal currently has a Python-centric development workflow, and the question of whether it will expand to support a broader range of jobs and apps remains open. Modal's founder envisions a long-term goal of creating general-purpose tools for data teams, but in practice the focus is on finding a resonating use case that drives growth and validates demand. An obvious next step for Modal is to make fine-tuning and training easier within the platform, as well as addressing pre-processing, scheduling retraining, and data movement. With demand for integrated solutions increasing, expanding the platform's capabilities to take over more tasks and simplify the data landscape is a logical progression.

    • Modal's vision for fewer data vendors: Modal focuses on improving the ergonomics of its SDK, plans to support multiple languages, and aims to serve serverless workloads with low latency.

      Modal, a data platform, recognizes the need for consolidation, or defragmentation, in the data vendor space and holds a long-term vision of fewer vendors doing more. The team currently focuses on Python due to its widespread use among data teams but plans to support other languages like Rust, TypeScript, R, and Go in the future. Modal sees the edge computing market as primarily useful for very latency-sensitive applications and is not planning to dominate that space, focusing instead on serverless workloads with a few hundred milliseconds of latency. The team is currently focused on improving the ergonomics of its SDK, making it intuitive for users to express distributed programs in the cloud. Over the next year, they are excited about making progress in this area and continuing to expand their offerings to better serve the data community.

    • Improving user experience and scalability for Modal: The Modal team is focusing on enhancing user experience and scalability, addressing file systems, GPU support, and containerization, expanding use cases to training and parallelization, and prioritizing security and compliance for enterprise clients; sign-ups are currently in a waitlist phase.

      The team behind Modal is focusing on improving the user experience and scalability of the platform, especially for running models in notebooks and for enterprise customers. They are working on issues related to file systems, GPU support, and containerization, among other things. They also plan to expand Modal's use cases beyond online inference to include training and parallelization. The team is putting significant effort into security and compliance to attract more enterprise clients. Sign-ups are currently in a waitlist phase while the team builds out a robust back-end infrastructure. The team is excited about the progress they are making and appreciates the support of their community.

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Stanford's AI Index Report 2024
    We’ve had representatives from Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) on the show in the past, but we were super excited to talk through their 2024 AI Index Report after such a crazy year in AI! Nestor from HAI joins us in this episode to talk about some of the main takeaways including how AI makes workers more productive, the US is increasing regulations sharply, and industry continues to dominate frontier AI research.

    Apple Intelligence & Advanced RAG
    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

    The perplexities of information retrieval
    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data
    We’ve all heard about breaches of privacy and leaks of private health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they have deployed edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs
    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress
    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents
    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach, from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs
    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Related Episodes

    When data leakage turns into a flood of trouble
    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)
    The new stable diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on twitter, reddit, discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things stable diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology
    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)
    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically they dive into the T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. Daniel provides a brief tour of the possible with the T0 family. They finish up with a couple of new learning resources.