    Podcast Summary

    • Automating Machine Learning Model Deployment: OctoML proposes that data scientists export models into well-defined containers for easy deployment, using Apache TVM to automate performance tuning across different hardware; OctoML is making steady progress automating the path from a data scientist's model to a deployable artifact.

      The current tools for model deployment in machine learning are not easily accessible to data scientists; they are built for machine learning infrastructure specialists. Luis Ceze, CEO of OctoML, proposes that with the right tools, a data scientist should be able to export their model into a well-defined container that can be handed off to existing DevOps teams and IT infrastructure, with the process from model to deployable artifact fully automated. Apache TVM, a project OctoML is deeply involved with, has seen significant growth and progress in automating performance tuning and making high-performance machine learning code accessible on different hardware. The TVM community has grown steadily, and last year's TVM conference saw a large turnout of contributors and industry professionals. OctoML has also made significant strides in its platform, which uses TVM as a key component to automate the deployment of machine learning models. The team has more than doubled in size, and it recently released a private accelerated model hub, which showcases the platform's ability to automate the process of getting models from data scientists into deployable artifacts.
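
      To make the hand-off concrete, here is a minimal sketch of what compiling a model with Apache TVM's Python API can look like; the ONNX file name, input name, and shape are illustrative assumptions, not details from the episode.

          import onnx
          import tvm
          from tvm import relay

          # Load a trained model ("model.onnx" is a placeholder) and describe its input.
          onnx_model = onnx.load("model.onnx")
          shape_dict = {"input": (1, 3, 224, 224)}  # assumed input name and shape
          mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

          # Compile for the local CPU; other targets (e.g. "cuda") swap in here.
          with tvm.transform.PassContext(opt_level=3):
              lib = relay.build(mod, target="llvm", params=params)

          # The exported shared library is the well-defined artifact that can be
          # handed to a DevOps team and loaded by the lightweight TVM runtime.
          lib.export_library("model_deploy.so")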

    • Automating Model Deployment with OctoML: OctoML simplifies the process of deploying optimized machine learning models on various hardware targets, allowing businesses to add value to their applications faster.

      The growth of model and data hubs, such as Hugging Face, has significantly impacted the machine learning landscape by making it easier for individuals and organizations to access and deploy pre-existing models. This trend has led to a shift in focus from model training to model deployment and optimization, which is where OctoML comes in. OctoML automates the process of extracting models from data scientists' notebooks and optimizing them for various hardware targets, making it easier for businesses to add value to their applications. The labor-intensive process of turning a trained model into a deployable piece of software is a crucial aspect of the machine learning workflow, and OctoML aims to streamline this process. The growth of model hubs and the increasing importance of model deployment are key developments in the machine learning industry, with implications for both researchers and practitioners. The focus on deployment is a reflection of the industry's maturation and the growing recognition that the bulk of value in machine learning comes from inference rather than model training.
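
      As a small illustration of how low the barrier to using a hub-hosted model has become, a pretrained model can be pulled down and used in a couple of lines; the task below is an arbitrary example, not one discussed in the episode.

          from transformers import pipeline

          # Downloads a pretrained model from the Hugging Face hub on first use.
          classifier = pipeline("sentiment-analysis")
          print(classifier("Deploying this model took minutes, not weeks."))
          # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]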

    • Merging MLOps into DevOps: Recognizing ML models as software and merging MLOps into DevOps improves the overall efficiency and maturity of the ML development lifecycle.

      Model creation and training are crucial steps in the machine learning process, but they should not be considered separate from DevOps practices. Machine learning models are an integral part of intelligent applications, and treating them as a special entity instead of just another piece of software slows down innovation. The term MLOps is evolving, and there is a growing consensus on its definition. The focus is shifting towards handling data and model creation, while other aspects such as containerization, deployment, monitoring, and CI/CD integrations should be considered part of DevOps. By recognizing this and merging MLOps into DevOps, we can improve the overall efficiency and maturity of the machine learning development lifecycle.

    • MLOps evolving towards specialized tools for each ML workflow step: MLOps is shifting towards specialized tools for data handling, network architecture search, model packaging, and deployment, focusing on clean integration points.

      The landscape of Machine Learning Operations (MLOps) is evolving towards best-in-class solutions for each step in the machine learning workflow, with a focus on clean integration points. This shift is moving away from fully integrated platforms towards specialized tools for data handling, network architecture search, model packaging, and deployment. Our solution, for instance, automates the process of getting a working model out of Jupyter notebooks or Python scripts and deploying it in a container, making it hardware-targeted yet compatible with regular DevOps flows. With the right API, you can use GitHub Actions for CI/CD on your model, and with the right container format, you can use existing microservices to serve your model. Monitoring deployed models and collecting data from them remains crucial, but the focus is shifting towards abstracting away the data handling and finding higher-level behaviors of models to debug. On the human side, this transition involves teams, rather than lone great engineers, becoming the driving force, and it underscores the importance of testing ideas and experimenting, as exemplified by the podcast "Ship It."
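
      As a sketch of what "compatible with regular DevOps flows" can mean in practice, the compiled artifact might be wrapped in an ordinary HTTP microservice; Flask, the /predict route, and the input name here are assumptions for illustration, not a prescribed format.

          import numpy as np
          import tvm
          from tvm.contrib import graph_executor
          from flask import Flask, request, jsonify

          app = Flask(__name__)

          # Load the hardware-targeted artifact produced at build time.
          lib = tvm.runtime.load_module("model_deploy.so")
          module = graph_executor.GraphModule(lib["default"](tvm.cpu()))

          @app.route("/predict", methods=["POST"])
          def predict():
              data = np.asarray(request.json["input"], dtype="float32")
              module.set_input("input", data)  # input name assumed
              module.run()
              return jsonify(output=module.get_output(0).numpy().tolist())

          if __name__ == "__main__":
              app.run(host="0.0.0.0", port=8080)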

    • Bridging the gap between data scientists and DevOps: Automation enables data scientists to export models for easy deployment by DevOps teams, increasing productivity and collaboration.

      The current divide between data scientists and DevOps teams in organizations, with data scientists focusing on creating models and DevOps teams handling deployments, can lead to confusion and inefficiency due to a lack of shared knowledge and access to tools. However, with the right automation, data scientists could export their models into a deployable format, making it easier for DevOps teams to integrate and deploy models as they would with any other software. This would allow data scientists to focus on creating models and DevOps teams to focus on deploying and maintaining applications, ultimately increasing productivity and breaking down the silos between different teams. The ultimate goal is to have a seamless workflow where both teams can work together effectively without the need for specialists in both areas. This would lead to a more efficient and collaborative organization, where everyone can contribute to the development and deployment of machine learning models.
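
      A minimal sketch of that export step, assuming a PyTorch model and ONNX as the interchange format (the model and shapes here are illustrative):

          import torch
          import torchvision

          # A stand-in for the data scientist's trained model.
          model = torchvision.models.resnet18(weights=None)
          model.eval()

          # Export to ONNX, a portable format a DevOps team can treat like
          # any other build artifact.
          dummy_input = torch.randn(1, 3, 224, 224)
          torch.onnx.export(model, dummy_input, "model.onnx",
                            input_names=["input"], output_names=["output"])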

    • Automation Bridges the Gap Between ML and DevOps: Automation in ML enables businesses to put models into production without requiring specialized ML or systems expertise, increasing accessibility and efficiency.

      The intersection of machine learning (ML) and DevOps is a complex area that requires specialized knowledge, and automation is the key to making it more accessible to businesses. ML teams need to understand the tooling around various libraries, compilers, and hardware, as well as the cost implications of large-scale cloud deployments. DevOps teams, however, lack the necessary ML expertise, and data scientists lack the systems expertise. This dynamic is set to change with automation, allowing companies to put ML models into production without having to hire specialized experts. A good analogy is cybersecurity, where specialized knowledge was once required to understand vulnerabilities, but automation tools like Snyk now enable better security by finding vulnerabilities in code as it is committed. The difference is that ML has a wider gap between those who create models and those who write systems software, so the automation in ML must go deeper and handle more complexity. Automation won't prevent every issue, but it will bring significant progress in getting models into production and allow for much better performance and cost evaluation, making ML accessible to a wider audience.

    • Manual Intervention in the Machine Learning Process: Despite advancements in automating parts of machine learning, manual optimization and deployment are still required, especially for organizations with constraints. Tools like TVM are making progress, but education should also shift towards these aspects.

      While significant progress has been made in automating parts of the machine learning process, such as model training and security vulnerability analysis, there are still areas that require manual intervention, particularly in model optimization and deployment. This is especially true for organizations dealing with constraints. For instance, a common use case is verifying images for inappropriate content in an application. Traditionally, this would involve a data scientist or machine learning creator manually optimizing the model, choosing the right libraries for deployment based on hardware, and creating an interface for integration. Tools like TVM and others are making this process easier, but there's still a need for automation to go from uploading a raw model to having a package ready for deployment with the desired interface. Furthermore, many machine learning practitioners are primarily focused on model creation and are not fully aware of the importance of model optimization, deployment, and the various ways of serving models. To address this, there's a need to shift the focus in machine learning education towards these aspects as well.
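
      A rough sketch of the kind of integration interface such a moderation model might expose to application code once packaging is automated; the helper function, placeholder scoring logic, and threshold below are all hypothetical.

          import numpy as np

          def run_model(pixels: np.ndarray) -> float:
              """Hypothetical stand-in for the deployed classifier; returns a
              probability that the image is inappropriate. A real version
              would call the packaged model's interface."""
              return float(np.clip(pixels.mean() / 255.0, 0.0, 1.0))  # placeholder

          def is_image_appropriate(pixels: np.ndarray, threshold: float = 0.9) -> bool:
              """Apply an application-level policy threshold to the model's score."""
              return run_model(pixels) < threshold

          image = np.random.randint(0, 256, size=(224, 224, 3)).astype(np.float32)
          print(is_image_appropriate(image))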

    • Focus on model creation without worrying about deployment details: Creating accurate models is crucial, but deployment details can wait until later. Use tools like Hugging Face to streamline the process and save time and resources.

      While it's important to create accurate models in AI development, there's a significant gap between creating models and deploying them, and many models never reach deployment due to performance concerns. Keeping the focus on model creation, without worrying too much about deployment details, leaves room for creativity and innovation. Tools like Hugging Face help bridge the gap by making it easy to specialize models for specific use cases without touching systems details until deployment, giving students and professionals a more end-to-end learning process. That said, it's still essential to understand what it takes to put models into production. The more automated and seamless the deployment process becomes, the faster and more productive development can be: developers can test their models in real time and iterate on them without worrying about the practicalities of deployment.
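
      For instance, specializing a hub-hosted model for a new task can be as simple as attaching a fresh classification head, leaving fine-tuning and systems details for later; the model name and label count below are illustrative.

          from transformers import AutoModelForSequenceClassification, AutoTokenizer

          tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
          model = AutoModelForSequenceClassification.from_pretrained(
              "distilbert-base-uncased",
              num_labels=2,  # fresh task-specific head, to be fine-tuned later
          )
          inputs = tokenizer("An example input for the new task.", return_tensors="pt")
          print(model(**inputs).logits.shape)  # torch.Size([1, 2])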

    • Defining a clear API is crucial for integrating models into applications during deployment: Clear APIs are necessary for seamlessly integrating models into applications during deployment, despite some aspects potentially becoming low-code or no-code.

      While there is progress towards making model creation a low-code, no-code process, the deployment aspect still requires significant coding efforts. Models are an integral part of applications, forming ensembles that include various types like computer vision, language, and decision trees. These models may interact directly or indirectly through data flow or shared infrastructure. As we move towards deployment, defining a well-structured API becomes crucial for integrating models into applications. While some aspects of deployment may eventually become low-code or no-code, the need for a clear API and consideration of factors like latency and throughput necessitates a certain level of coding expertise.
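
      A sketch of what such an API definition might look like, using typed request and response schemas; FastAPI and the field names here are one possible choice, not something prescribed in the episode.

          import time
          from typing import List

          from fastapi import FastAPI
          from pydantic import BaseModel

          class PredictRequest(BaseModel):
              input: List[float]  # flattened feature vector (illustrative)

          class PredictResponse(BaseModel):
              label: str
              score: float
              latency_ms: float   # surfaced so callers can track latency budgets

          app = FastAPI()

          @app.post("/predict", response_model=PredictResponse)
          def predict(req: PredictRequest) -> PredictResponse:
              start = time.perf_counter()
              # Placeholder scoring; a real handler would call the deployed model.
              score = sum(req.input) / max(len(req.input), 1)
              return PredictResponse(
                  label="positive" if score > 0 else "negative",
                  score=score,
                  latency_ms=(time.perf_counter() - start) * 1000,
              )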

    • Challenges of low- and no-code model deployment: Despite the growing popularity of low- and no-code model creation, deployment and API concerns still pose challenges. However, as best practices emerge and standardization takes hold, opportunities for automation and easier integration may arise.

      While low-code and no-code approaches to model creation and optimization are gaining traction, deployment still requires a deep understanding of the systems involved and specialized people. The challenges of deployment and API design remain, but as best practices emerge and standardization takes hold, there will be more opportunities for automation and low- to no-code solutions. The key is defining where the model fits in the larger application, which can then lead to automation and easier integration. If we can reliably turn trained models into agile, performant, and reliable pieces of software, the entire process will become more automated and easier overall.

    • Future of App Dev: Seamless Integration of Data Scientists, ML Models, and DevOps Teams. In the future, data scientists will create models without worrying about deployment, while DevOps teams easily deploy them. Focus may shift from where models run to solving problems, with automation determining where each part runs. Advancements in ML chips and power management suggest a future of abstracting away design constraints.

      The future of application development, particularly for applications involving machine learning, is heading towards a more seamless integration of data scientists, machine learning models, and DevOps teams. Today these groups are separated by different concerns and ways of thinking. Within the next year, the hope is that data scientists will be able to create models without worrying about deployment systems, while DevOps teams can easily deploy those models. Looking further ahead, in five years the focus may shift away from where models run (edge or cloud) and towards solving the problem at hand: application creators should be able to specify their desired outcomes, and the system should automatically determine where each part of the model runs. This automation, once perfected, would free people for higher-level problem-solving. Additionally, advancements in machine-learning-designed chips and power management in large-scale data centers suggest that we're moving towards abstracting away lower-level design constraints. This future holds excitement for creating applications without worrying about the intricacies of deployment and design.

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Stanford's AI Index Report 2024
    We’ve had representatives from Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) on the show in the past, but we were super excited to talk through their 2024 AI Index Report after such a crazy year in AI! Nestor from HAI joins us in this episode to talk about some of the main takeaways including how AI makes workers more productive, the US is increasing regulations sharply, and industry continues to dominate frontier AI research.

    Apple Intelligence & Advanced RAG
    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

    The perplexities of information retrieval
    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data
    We’ve all heard about breaches of privacy and leaks of private health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they deploy edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs
    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress
    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents
    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach, from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs
    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Related Episodes

    When data leakage turns into a flood of trouble
    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)
    The new stable diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on twitter, reddit, discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things stable diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology
    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)
    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then it’s on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically, they dive into T0, a series of natural language processing (NLP) AI models trained for research into zero-shot multitask learning. Daniel provides a brief tour of what’s possible with the T0 family. They finish up with a couple of new learning resources.