
    Podcast Summary

    • Simplifying cloud compute for data teams: Modal, founded by Erik Bernhardsson, aims to make it easier for data teams to deploy and manage their code in the cloud by focusing on the foundational compute problem.

      Erik Bernhardsson, the founder of Modal, identified a common problem in the data community: making it easy for data teams to deploy and run their code in the cloud. He realized that starting at the foundational level, specifically the compute problem, could provide a solid solution, and Modal, an end-to-end stack for cloud compute, was born out of that need. Erik shared his background of working with data for over a decade, building systems at companies like Spotify and Better. He noticed that data teams were struggling with the complexities of cloud compute, from scaling code to scheduling tasks to setting up web endpoints. By focusing on this layer, Modal aims to simplify the process of deploying and managing code in the cloud, and for the past six months the company has been particularly focused on online inference. Erik's passion for addressing these challenges led him to start Modal, and his demo during the podcast left a lasting impression.

    • Bringing infrastructure into the innermost code loop: Modal lets data teams write code locally and run it in the cloud almost instantly, eliminating slow feedback loops and improving productivity and happiness.

      Modal focuses on improving developer productivity and happiness in data teams by bringing the infrastructure into the innermost code writing loop. Instead of deploying containers to the cloud and dealing with slow feedback loops, developers can write code locally and have it run in the cloud almost instantly. This was achieved by building their own container runtime, file system, and container builder. The benefit is a faster iteration speed and a more seamless experience for data teams, who often deal with large datasets and the need to run code in production. The goal was to eliminate the outer loops and focus on the inner loop of writing and immediately running code. This approach was not possible with existing solutions like Kubernetes or AWS Lambda, so Modal went deep into building the foundational infrastructure to make it work.
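      As a rough illustration of that inner loop, here is a minimal sketch of the pattern being described. It assumes the current Modal Python SDK (names like modal.App, @app.function, and .remote() may differ between versions), and the app and function names are made up for the example:

```python
import modal

app = modal.App("inner-loop-demo")  # illustrative app name


@app.function()
def square(x: int) -> int:
    # This body runs inside a Modal-managed container in the cloud.
    return x * x


@app.local_entrypoint()
def main():
    # Invoked from the developer's machine; the work happens remotely.
    print(square.remote(7))
```

      Running something like `modal run inner_loop_demo.py` builds the container if needed, executes the function in the cloud, and streams the result back, so the edit-and-run cycle stays in a single terminal.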

    • Challenges with traditional container platforms: To improve iteration speed and efficiency for machine learning and AI use cases, consider building a custom solution using lower-level primitives instead of traditional container platforms like Kubernetes, AWS Lambda, or Docker.

      The traditional approaches to running applications in containers using platforms like Kubernetes, AWS Lambda, or Docker have limitations when it comes to iteration speed and efficiency. For instance, the process of pushing a container image to a registry and then starting a job in Kubernetes can be time-consuming and inefficient due to the need to duplicate and transfer large amounts of data. Docker, while revolutionary, also has inefficiencies in its layered file system and the duplication of unnecessary data. AWS Lambda, while faster than Kubernetes or Docker, still has limitations such as not supporting GPUs or long-running jobs. To address these challenges, the Modal team decided to build their solution on lower-level primitives, implementing much of the stack themselves. This approach is particularly well-suited for use cases around machine learning and AI, where rapid iteration and efficient use of resources are crucial. Overall, the team's experience highlights the importance of considering the specific needs of your use case when choosing a technology stack.

    • Serverless machine learning platforms like Modal offer benefits for deploying ML models: Serverless ML platforms provide cost-effective, easy-to-use alternatives to traditional infrastructure for ML model deployment, supporting various use cases beyond online inference.

      Serverless machine learning platforms like Modal offer significant benefits for deploying machine learning models, particularly for online inference, due to their cost-effectiveness and ease of use. These platforms provide an alternative to traditional infrastructure like Kubernetes with EC2, where users have to spin up and manage idle instances for model inference. Serverless platforms like Modal also support a wide range of use cases beyond online inference, including batch processing, web scraping, computational biotech, simulations, and backtesting. The user experience for online inference is currently excellent, while the experience for batch processing and parallelism is good but not yet optimal. Modal also aims to improve its support for data pipelines and scheduling in the future. The speaker, a fan and user of Modal, described how users can write Python scripts with functions and dependencies, making things easier to manage than traditional AI and ML workflows. Overall, the shift towards serverless machine learning platforms is driven by the desire to avoid infrastructure management and the cost savings they offer.
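      For the batch-processing and parallelism use cases mentioned above, the same function abstraction fans out across many containers. A hedged sketch follows: the .map() helper exists in the Modal SDK, though the function and app names here are illustrative:

```python
import modal

app = modal.App("batch-demo")  # illustrative name


@app.function()
def score(record: str) -> int:
    # Stand-in for per-item work: inference, scraping, a simulation step, etc.
    return len(record)


@app.local_entrypoint()
def main():
    records = ["alpha", "beta", "gamma"]
    # .map() runs one call per input, scheduled across cloud containers.
    for result in score.map(records):
        print(result)
```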

    • Write, Run, and Deploy Code with Modal's Self-Provisioning Runtime: Modal allows developers to write, run, and deploy code in the cloud, eliminating the need for external configuration files and offering a faster feedback loop and a consistent development experience, with dependencies declared alongside the functions that use them.

      With Modal, you can write your code as if you're running it locally, but instead it provisions and runs the necessary containers in the cloud. This means you can define the infrastructure and dependencies your functions need directly in your code, eliminating the need for external configuration files. This approach offers the developer productivity of local development and the power of the cloud, all within the same loop. By removing the need to maintain separate local environments and the hassle of container building and pushing, Modal provides a faster feedback loop and a more consistent development experience. This idea of infrastructure and app code being intertwined is explored further in the blog post "The Self-Provisioning Runtime" by Swyx. With Modal, you declare the dependencies each function needs alongside the function itself and import them inside its body, so you never have to install them locally. This results in a more efficient and streamlined development process.
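      Concretely, the "infrastructure in the code" idea looks roughly like the sketch below. The image-builder calls (modal.Image.debian_slim().pip_install(...)) follow my reading of the Modal SDK and the package choices are arbitrary, so treat it as a shape rather than a verbatim recipe:

```python
import modal

# The container image is declared next to the code that needs it,
# so there is no separate Dockerfile or requirements.txt to keep in sync.
image = modal.Image.debian_slim().pip_install("pandas", "numpy")

app = modal.App("self-provisioning-demo")


@app.function(image=image)
def summarize(csv_text: str) -> dict:
    # Imported inside the function body: pandas only has to exist in the
    # remote container, never on the developer's laptop.
    from io import StringIO

    import pandas as pd

    df = pd.read_csv(StringIO(csv_text))
    return {"rows": len(df), "columns": list(df.columns)}
```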

    • Simplified Python function deployment with Modal: Modal simplifies Python function deployment, enabling users to test and run functions with minimal imports while it handles containerization, deployment, and scaling, making it suitable for various applications beyond machine learning.

      Modal offers a simplified workflow for running Python functions, including machine learning models, in the cloud. Users can directly test and run functions with minimal imports, and Modal handles containerization, deployment, and scaling. Surprisingly, users have utilized Modal for various use cases beyond machine learning, such as web app development and job queues. The platform's focus on improving online inference and reducing startup times reflects these new use cases. Erik provided an example of a typical workflow: create a Python function, push it to Modal, and receive a URL to call the function. This simplicity sets Modal apart from other platforms and makes it an attractive option for various applications.
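      The "function to URL" workflow Erik described would look something like this sketch, assuming the @modal.web_endpoint() decorator and a FastAPI-capable image; decorator and package names may vary with the SDK version:

```python
import modal

image = modal.Image.debian_slim().pip_install("fastapi[standard]")

app = modal.App("hello-endpoint")


@app.function(image=image)
@modal.web_endpoint()  # exposes the function over HTTPS once deployed
def hello(name: str = "world") -> dict:
    # Query parameters map onto the function arguments.
    return {"greeting": f"hello, {name}"}
```

      Deploying it with a command along the lines of `modal deploy hello_endpoint.py` prints a stable URL that can be called immediately with curl or from another service.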

    • Simplify Python ML model deployment with the Modal platform: Modal is a cloud-based platform that simplifies Python ML model deployment by allowing users to decorate functions with special decorators, define container images in code, and run functions in the cloud using commands like "modal deploy" or "modal run".

      Modal, a cloud-based platform, simplifies the process of deploying and running Python functions that use machine learning models, such as those from Hugging Face, by allowing users to decorate their existing functions with special decorators and define container images using Modal's image syntax. This can be done entirely in code, and the platform supports installing various Python packages like transformers, accelerate, and diffusers. Users can run the function in the cloud using commands like "modal deploy" or "modal run", which build a container if it doesn't already exist and run the code. The platform supports running any Python function and allows users to install additional packages and manipulate images, among other advanced features. It is designed to provide a "magic" first experience: install the Python package, set up a token, and run code immediately in the cloud. A notable feature is the seamless integration of development and deployment, allowing users to modify their code and test it directly from the terminal without switching back and forth between editor and terminal. This feedback loop and the ability to iterate quickly is particularly valuable for CTOs managing multiple teams and disciplines in software engineering.
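      A hedged sketch of the Hugging Face pattern described here; the model task, package list, and gpu="A10G" argument are illustrative choices rather than anything recommended on the show:

```python
import modal

# Declare the heavyweight ML dependencies as part of the container image.
image = modal.Image.debian_slim().pip_install("transformers", "torch")

app = modal.App("hf-inference")


@app.function(image=image, gpu="A10G")
def classify(text: str) -> list:
    # The import and the model download both happen inside the cloud container.
    from transformers import pipeline

    clf = pipeline("sentiment-analysis")
    return clf(text)
```

      In this shape, something like `modal run` executes the function once in the cloud for quick testing, while `modal deploy` keeps it available for repeated calls; both build the container only when the image definition changes.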

    • Hot reloading in front end engineering: Hot reloading offers fast feedback loops for developers, enabling them to save code and see changes live in the cloud, achieved through monitoring file systems and reloading apps when updated. Although the underlying mechanism is not complex, advanced container and file system technology is crucial for a smooth experience.

      Front end engineering, with its modern toolchains, offers advanced feedback loops that give engineers a super snappy development experience. This is achieved through hot reloading, which lets developers save their code and see the changes live. Modal's web serving mode brings the same experience to the cloud: it monitors the file system and reloads the entire app when any file is updated. Although the underlying mechanism is not complex, a foundation of fast container and file system technology is what makes this process relatively easy. For larger companies with existing infrastructure, migrating to such a system can be challenging, but starting small, experimenting with the technology, and gradually scaling up can help mitigate the pain and complexity of the migration.
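      In practice, that live-reload loop is one file plus one command. The sketch below assumes Modal's @modal.asgi_app() decorator and a FastAPI app defined inside the function; running `modal serve` against the file watches for local changes and redeploys them (exact names may differ by SDK version):

```python
import modal

image = modal.Image.debian_slim().pip_install("fastapi[standard]")

app = modal.App("live-demo")


@app.function(image=image)
@modal.asgi_app()
def web():
    from fastapi import FastAPI

    api = FastAPI()

    @api.get("/")
    def read_root():
        # Change this response and save the file: an active `modal serve`
        # session picks up the edit and updates the temporary URL.
        return {"msg": "hello"}

    return api
```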

    • Engaging with larger businesses for data platforms: Identify a low-risk, non-disruptive use case to demonstrate the value of platforms like Modal to larger businesses, focus on finding data scientists and ML engineers eager for a more streamlined solution, and expand the platform's capabilities to take over more tasks and simplify the data landscape.

      When it comes to engaging with larger businesses to adopt a platform like Modal, the conversations and approaches differ significantly from those with smaller organizations. Existing data platforms, security compliance, and various stakeholders with distinct roles and priorities are common factors in these larger-scale conversations. To initiate such projects, it's crucial to identify a low-risk, non-critical-path use case that can demonstrate value without disrupting the business. This could involve finding data scientists and machine learning engineers who feel underserved by the current platform and are eager for a more streamlined solution. Modal currently has a Python-centric development workflow, and the question of whether it will expand to support a broader range of jobs and apps remains open. Modal's founder envisions a long-term goal of creating general-purpose tools for data teams, but in practice the focus is on finding a resonating use case that drives growth and validates demand. An obvious next step for Modal is to make fine-tuning and training easier within the platform, as well as addressing pre-processing, scheduling retraining, and data movement. With demand for integrated solutions increasing, expanding the platform's capabilities to take over more tasks and simplify the data landscape is a logical progression.

    • Modal's vision for fewer data vendors: Modal focuses on improving the ergonomics of its SDK, plans to support multiple languages, and aims to serve serverless workloads with low latency.

      Modal, a data platform, recognizes the need for consolidation, or defragmentation, in the data vendor space and holds a long-term vision of fewer vendors doing more. The team currently focuses on Python due to its widespread use among data teams but plans to support other languages like Rust, TypeScript, R, and Go in the future. Modal sees the edge computing market as primarily useful for very latency-sensitive applications and is not planning to dominate that space, focusing instead on serverless workloads with a few hundred milliseconds of latency. The team is currently focused on improving the ergonomics of its SDK, making it intuitive for users to express distributed programs in the cloud. Over the next year, they are excited about making progress in this area and continuing to expand their offerings to better serve the data community.

    • Improving user experience and scalability for Modal: The Modal team is focusing on enhancing user experience and scalability, addressing file systems, GPU support, and containerization, expanding use cases to training and parallelization, and prioritizing security and compliance for enterprise clients; sign-ups are currently in a waitlist phase.

      The team behind Modal is focusing on improving the user experience and scalability of the platform, especially for running models in notebooks and for enterprise customers. They are working on issues related to file systems, GPU support, and containerization, among other things. They also plan to expand Modal's use cases beyond online inference to include training and parallelization. The team is putting significant effort into security and compliance to attract more enterprise clients. Sign-ups are currently in a waitlist phase while the team builds out a robust back-end infrastructure. The team is excited about the progress they are making and appreciates the support of their community.

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Stanford's AI Index Report 2024
    We’ve had representatives from Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) on the show in the past, but we were super excited to talk through their 2024 AI Index Report after such a crazy year in AI! Nestor from HAI joins us in this episode to talk about some of the main takeaways including how AI makes workers more productive, the US is increasing regulations sharply, and industry continues to dominate frontier AI research.

    Apple Intelligence & Advanced RAG
    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

    The perplexities of information retrieval
    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data
    We’ve all heard about breaches of privacy and leaks of private health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they have deployed edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs
    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress
    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents
    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach, from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs
    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Related Episodes

    When data leakage turns into a flood of trouble
    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)
    The new stable diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on twitter, reddit, discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things stable diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology
    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)
    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically they dive into the T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. Daniel provides a brief tour of the possible with the T0 family. They finish up with a couple of new learning resources.