
    Podcast Summary

    • Shifting focus from data scientists to accessible ML solutions: Open-source models and user-friendly tools are opening ML deployment to a much wider audience.

      The landscape of artificial intelligence (AI) and machine learning (ML) has changed drastically since 2020, and data scientists are no longer the sole creators of ML applications. Open-source models have emerged as a game-changer, with communities like Hugging Face driving rapid advancements. Baseten, the company Tuhin joined, is focused on simplifying the deployment of ML models, recognizing that demand for such solutions keeps growing as the field expands exponentially. In essence, the focus is shifting from data scientists building ML apps to accessible, user-friendly solutions for a wider audience.

    • Hugging Face, the GitHub of AI: Open-source models on Hugging Face have driven rapid advancements, turning once-complex problems into commonplace solutions and pushing a shift toward larger models at scale.

      Hugging Face has become the GitHub of AI, serving as a go-to platform for discovering and using models. The open-source nature of Hugging Face models has led to rapid advancements, with solutions to once complex problems like transcription and multilingual support now becoming commonplace. The chatbot moment of AI, represented by models like ChatGPT, has raised the stakes, as consumers and developers expect fast, efficient models. This shift has created an infrastructure opportunity, as the need to run these models at scale becomes a necessity. Additionally, the focus has moved from small, in-memory models to larger models, which have become the norm. The pace of change in the industry is remarkable, with many in the general public just beginning to explore AI through tools like ChatGPT, while the industry has already come a long way in a short time.
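      As a concrete illustration of how commonplace these once-hard problems have become, a few lines of Python with the open-source Hugging Face transformers library can pull a speech-recognition model from the Hub and transcribe audio. This is a minimal sketch; the Whisper checkpoint and the audio file name are illustrative choices, not anything named in the episode.

          # Minimal open-source transcription sketch.
          # Assumes: pip install transformers torch (ffmpeg needed for audio decoding).
          from transformers import pipeline

          # Downloads the checkpoint from the Hugging Face Hub on first use.
          asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

          # Whisper checkpoints also handle many languages, not just English.
          result = asr("meeting_recording.wav")
          print(result["text"])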

    • Transitioning from closed APIs to larger models: Addressing infrastructure challenges and product concerns is crucial when transitioning from closed APIs to larger models in production.

      Transitioning from closed APIs to running larger models yourself requires addressing both infrastructure challenges and product concerns. The infrastructure challenge involves iterating quickly across model types and handling the complexities of model hosting and workflow. The product concerns include latency, throughput, cost, data privacy, security, orchestration across hardware and clouds, benchmarking, and implementing necessary guardrails. The shift is like flying a drone in manual mode after relying on autopilot: it may seem simple at first, but the consequences of getting it wrong are significant. Running models in production is a complex task that requires careful consideration and planning.
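      Among those product concerns, benchmarking is the most concrete place to start. Below is a rough sketch that measures serial request latency percentiles against an HTTP model endpoint; the URL and payload are placeholders, and a real benchmark would also measure throughput under concurrent load.

          # Rough latency benchmark against a hosted model endpoint (sketch).
          # Assumes: pip install requests. Endpoint and payload are placeholders.
          import statistics
          import time

          import requests

          ENDPOINT = "https://models.example.com/v1/predict"  # placeholder URL
          PAYLOAD = {"prompt": "Hello, world"}

          latencies = []
          for _ in range(50):
              start = time.perf_counter()
              requests.post(ENDPOINT, json=PAYLOAD, timeout=30)
              latencies.append(time.perf_counter() - start)

          latencies.sort()
          print(f"p50: {statistics.median(latencies):.3f}s")
          print(f"p95: {latencies[int(len(latencies) * 0.95)]:.3f}s")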

    • Deploying ML models in production involves more than just running the model: To deploy ML models in production, infrastructure work is needed for containerization, secure serving, and workflow management, requiring expertise in areas like Kubernetes and security.

      Deploying a machine learning model into production involves more than just running the model itself. It requires significant infrastructure work to containerize the model, set up a scalable and secure serving layer, and manage the workflow and observability around the model. This process can be complex and time-consuming, requiring expertise in areas like Kubernetes and security. Additionally, ensuring the model does not produce incorrect or sensitive outputs is crucial. You can save time and resources by leveraging services that abstract away these complexities and provide first-class treatment for your models, allowing you to focus on developing and improving your machine learning models.
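      To make the gap concrete, the "just running the model" part can be a handful of lines; everything the paragraph lists sits around it. A minimal sketch, with a small sentiment model standing in for a real one:

          # The easy part: a bare model server. Containerization, autoscaling,
          # auth, GPU scheduling, logging, and output guardrails all wrap this.
          # Assumes: pip install fastapi uvicorn transformers torch
          from fastapi import FastAPI
          from pydantic import BaseModel
          from transformers import pipeline

          app = FastAPI()
          model = pipeline("sentiment-analysis")  # illustrative small model

          class PredictRequest(BaseModel):
              text: str

          @app.post("/predict")
          def predict(req: PredictRequest):
              # No auth, rate limiting, or output filtering here: that is the point.
              return model(req.text)[0]

          # Run locally with: uvicorn server:app --port 8000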

    • Baseten, a compelling alternative to closed APIs for engineers: Baseten's self-hosted deployments offer engineers more control, cost savings, and data privacy compared to closed model APIs. Its one-line deployment process and ease of use make it attractive for custom models and data privacy concerns.

      Baseten is gaining popularity among various personas, including engineers, because it offers more control, cost savings, and data privacy than closed model APIs. For engineers working on custom models or dealing with data privacy concerns, Baseten's self-hosting option is particularly attractive. The approach is designed to be easy for application developers, with a one-line deployment process, striking a balance between ease of use and control: users can treat their models as first-class assets with proper monitoring and logging. The infrastructure is designed to scale, and customers can deploy Baseten within their own VPCs, keeping their data inside their accepted boundaries. The conversation also touched on the long tail of use cases, where fine-tuning models or dealing with specific modalities is required and the benefits of self-hosting become even more pronounced. Overall, Baseten offers a compelling alternative to closed APIs for those seeking more control, cost savings, and data privacy.

    • The Truss library for model deployment with added features and workflow: Truss offers a defined way to publish and manage model versions, providing scaling, observability, logging, and hardware management for easier and more efficient model deployment.

      Using an open-source library like Truss provides both flexibility and structure when developing and deploying machine learning models. With Truss, you write a Python class with load and predict functions, and you can bundle pre-processing and post-processing alongside the model. Truss is open source and can be deployed in various ways, including on Baseten's hosted infrastructure. Compared to running a model behind a simple API on your own cloud infrastructure, using Truss on Baseten offers more features and workflow. You can still call the model and get an output over an API, but the depth comes from added functionality like scaling, observability, logging, and hardware management. The real value, however, is the workflow: Truss provides a defined way to publish and manage model versions, enabling A/B testing and easy rollbacks and saving developers the grunt work of tracking down and managing API deployments in production.
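      Concretely, a Truss packages a model as a Python class whose load method runs once at server startup and whose predict method runs per request, with pre- and post-processing living alongside. A sketch following that convention (the classification model is an illustrative choice):

          # model/model.py inside a Truss package (sketch).
          # Assumes: pip install truss transformers torch
          from transformers import pipeline

          class Model:
              def __init__(self, **kwargs):
                  self._model = None

              def load(self):
                  # Runs once when the model server starts; heavy setup goes here.
                  self._model = pipeline("text-classification")

              def predict(self, model_input):
                  # Runs per request; pre/post-processing can wrap this call.
                  return self._model(model_input["text"])

      Publishing a new version of a package like this is then a single CLI command (truss push in current Truss releases), which is what gives the versioning, A/B testing, and rollback workflow something concrete to operate on.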

    • Managed machine learning solutions reduce time and effort: Baseten enabled one team to replicate and improve an existing API in just two days, with ease of use and scalability as its primary benefits.

      Using a managed solution like Baseten for deploying and scaling machine learning models can significantly reduce the time and effort required to get a product up and running, letting teams focus on other parts of their business. The speaker shared an example of a late-stage startup that replicated and improved on its existing API in just two days using Baseten. Ease of use and the ability to scale are the two primary benefits Baseten offers its customers. The speaker also highlighted the opportunity in the machine learning infrastructure space: new user stories and requirements are emerging that were not important a year ago, and the entire new stack, from fine-tuning to observability, is ripe for innovation. The speaker also mentioned Mosaic as an interesting company to watch in this space, as they were acquired by Databricks for their expertise in model training and deployment. Overall, managed solutions for machine learning deployments are becoming increasingly important given the need to move fast in the market and the constraint on ML talent.

    • The Future of Edge Computing and Optimized Machine Learning Models: As technology advances and more devices become AI-capable, there's potential for growth in edge computing and optimized machine learning models. However, challenges remain, such as generalization at the operating system level. Companies with cloud expertise may be well-positioned to expand into edge computing.

      While there is growing interest and research in running machine learning models at the edge and optimizing them for less hardware, the trend is still in its experimental phase. There are not yet many examples of these models being used in production. However, as technology continues to advance and more devices become AI-capable, there is significant potential for growth in this area. Companies that have built up expertise and experience in the cloud may be well-positioned to expand into edge computing and bring their models to millions of devices. However, challenges remain, such as the need for generalization at the operating system level to support various devices. Overall, the future of edge computing and optimized machine learning models is an intriguing and ripe area for innovation.

    • Focusing on the infrastructure challenges of model hosting: Baseten is addressing infrastructure issues of model hosting, such as supporting various frameworks and bring-your-own compute, with features like multi-cluster capabilities and cost support.

      For many users, separating out the infrastructure concern of model hosting is a useful way to approach things, particularly for those without the expertise to run Kubernetes in their own infrastructure or those dealing with edge models. However, challenges remain on the infrastructure side of model hosting, such as supporting various frameworks and letting users bring their own compute to the platform. Baseten is focusing on these areas, including the development of multi-cluster capabilities, which will allow users to bring their own compute to the platform, and the addition of cost support across frameworks. Additionally, there is growing interest in fine-tuning models, though more control is needed in this area. Overall, the future of model hosting infrastructure is exciting, with a focus on flexibility, control, and cost efficiency.

    • Baseten and managing machine learning models in a multi-cloud environment: Baseten simplifies managing machine learning models across multiple cloud providers by offering a control plane for deployment, data collection, and fine-tuning. It also pairs with OpenAI-compatible endpoints and creates opportunities for AI/ML tooling developers.

      Baseten offers significant benefits for businesses operating in a multi-cloud environment. By providing a control plane for model deployment across cloud clusters on different providers, it alleviates the challenge of managing multiple clouds. It also enables data collection and fine-tuning, making it an attractive solution for enterprises in various industries, and it creates opportunities for those building tooling in the AI/ML space. If the future lies in a multi-cloud world, these capabilities can bring increased efficiency and flexibility. Furthermore, combining OpenAI-compatible endpoints with fine-tuned open models such as Mistral or Llama can improve model performance for specific workloads. Overall, Baseten represents an exciting development for businesses seeking to optimize their machine learning operations in a multi-cloud landscape.
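      One practical upshot of OpenAI-compatible endpoints is that swapping a closed API for a self-hosted Mistral or Llama deployment can amount to changing the client's base URL. A sketch with the standard openai Python client; the URL, key, and model id are placeholders for whatever a given deployment exposes.

          # Pointing the standard OpenAI client at an OpenAI-compatible,
          # self-hosted endpoint. base_url, api_key, and model are placeholders.
          # Assumes: pip install openai (v1+ client)
          from openai import OpenAI

          client = OpenAI(
              base_url="https://models.example.com/v1",  # your deployment's endpoint
              api_key="YOUR_API_KEY",
          )

          response = client.chat.completions.create(
              model="mistral-7b-instruct",  # placeholder model id
              messages=[{"role": "user", "content": "Summarize this episode."}],
          )
          print(response.choices[0].message.content)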

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Apple Intelligence & Advanced RAG

    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic, Advanced RAG, covering everything you need to know to be practical & productive.

    The perplexities of information retrieval

    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data

    We’ve all heard about breaches of privacy and leaks of private health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data lives is non-trivial. Ramin, from Tausight, joins us to discuss how they deploy edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs

    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress

    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents

    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach, from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs

    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Mamba & Jamba

    First there was Mamba… now there is Jamba from AI21. This is a model that combines the best non-transformer goodness of Mamba with good ‘ol attention layers. This results in a highly performant and efficient model that AI21 has open sourced! We hear all about it (along with a variety of other LLM things) from AI21’s co-founder Yoav.

    Related Episodes

    When data leakage turns into a flood of trouble

    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)

    The new stable diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on twitter, reddit, discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things stable diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology

    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)

    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then it's on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically, they dive into T0, a series of natural language processing (NLP) AI models trained for researching zero-shot multitask learning. Daniel provides a brief tour of what's possible with the T0 family. They finish up with a couple of new learning resources.