
    Podcast Summary

    • Understanding the LLM App Stack: The LLM App Stack illustrates the various components that make up the larger generative AI ecosystem, including data, models, inference, fine-tuning, and applications.

      While large language models (LLMs) have been making headlines lately, it's important to remember that they are just one component of a larger generative AI application stack. The model itself does not provide the functionality users want; it's the ecosystem of tooling around it that makes an application work. In this episode of Practical AI, Daniel and Chris discuss the emerging LLM app stack, using the diagram published by Andreessen Horowitz to illustrate the various components of this new ecosystem. While the picture provides a helpful framework, it's worth noting that Andreessen Horowitz has investments in many of the companies highlighted in the stack. The stack spans several categories, including data, models, inference, fine-tuning, and applications. Data refers to the large datasets used to train LLMs. Models are the generative models themselves, such as Llama 2 or Stable Diffusion. Inference is the process of generating predictions from a trained model. Fine-tuning adapts a pretrained model to specific use cases. Applications are the end products that use LLMs, such as chatbots or content generation tools. By understanding how these components fit together, we can gain a better appreciation for the complexity of generative AI and the various players involved in its development and deployment.

    • Exploring Generative AI through Playgrounds and App Hosting: Playgrounds offer a user-friendly interface for experimenting with generative AI, while app hosting enables the creation and deployment of more complex applications.

      Generative AI exploration often begins in "playgrounds," interactive platforms where users can experiment with models through a UI. These browser-based interfaces, offered by various organizations and cloud providers, let users test new features or ideas without extensive resources or hardware. Examples include ChatGPT, Hugging Face, OpenAI, and ClipDrop. These playgrounds provide a valuable space for users to familiarize themselves with generative AI technology and its capabilities. Another category within the generative AI app stack is "app hosting," which refers to the hosting and deployment of applications built using generative AI technology. While playgrounds are primarily focused on experimentation, app hosting allows for the creation and deployment of more complex applications. Both playgrounds and app hosting are essential components of the generative AI app stack, offering users a range of opportunities to explore, learn, and build with generative AI technology.

    • Merging Model Hosting and App Hosting for Seamless AI Application Development: The convergence of model hosting and app hosting, along with the addition of a convenience orchestration layer, simplifies AI application development and deployment for developers.

      We're witnessing a convergence of model hosting and app hosting in the world of AI development. Traditional hosting providers like Amazon ECS and newer platforms like Vercel are being used to host both applications and AI models. This merging of hosting categories is making it more manageable for developers to build and deploy AI applications. Previously, there was a clear distinction between the model and the app. Developers would create a "playground" to illustrate LLM functionality, but the actual app used by users would be separate and require hosting. However, the emerging generative AI stack is different. It includes a layer of orchestration, which is not the same as traditional orchestration tools like Kubernetes. Instead, it functions as a convenience layer that simplifies the interaction with models. For instance, when using a language model for question-answering, developers need to provide context for the question and insert it into a prompt before sending it to the model. This orchestration layer handles these tasks, making the interaction with the model more seamless. This convenience layer is a significant difference between the traditional non-AI stack and the emerging generative AI stack. Overall, the merging of model hosting and app hosting, along with the addition of a convenience orchestration layer, is making it easier for developers to build and deploy AI applications.
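      To make that convenience layer concrete, here is a minimal, hypothetical sketch of a question-answering flow in which the orchestration code retrieves context, fills a prompt template, and calls the model; the function names and template are illustrative assumptions, not code from the episode.

```python
# Minimal sketch of an orchestration "convenience layer" for question answering.
# All names here are illustrative stand-ins; a real stack would call a hosted
# model API and a retrieval system instead of the placeholders below.

PROMPT_TEMPLATE = """Answer the question using only the context below.

Context:
{context}

Question: {question}
Answer:"""

def retrieve_context(question: str) -> str:
    # Placeholder: in practice this would query a vector database or search API.
    return "...documents relevant to the question..."

def call_model(prompt: str) -> str:
    # Placeholder: in practice this would call a hosted LLM endpoint.
    return "...model response..."

def answer(question: str) -> str:
    # The orchestration layer assembles the prompt so the application never has to.
    context = retrieve_context(question)
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    return call_model(prompt)
```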

    • Bridging the gap between data and models: Orchestration is the software and tools that wrap around AI models to make them usable and productive, including prompt templates, chains of prompts, agents, plugins, and orchestration tooling. It acts as a bridge between data and models, enabling efficient and effective use.

      "orchestration" in the context of AI models refers to the software and tools that wrap around the model to make it usable and productive. This includes prompt templates, chains of prompts, agents, plugins, and orchestration tooling. The term "orchestration" is a loaded word that encompasses various functions, from manual prompt templating to automation and API calls. The first layer of orchestration can be seen as DIY (Do-It-Yourself) convenience functionality built around Language Models (LLMs). An example of this is Python scripts. However, a more comprehensive solution is offered by platforms like Langchain. Langchain's orchestration functionality can be broken down into several categories. First, there's templating, which includes prompt templates and chaining. Templating allows for manual setup of a chain of calls in one call. Second, there's automation, which includes agents and other tools that automate functionality around calling LLMs or other generative AI models. Third, there are APIs and plugins. Lastly, there's maintenance, which includes logging, caching, and other tasks to keep the system running. In essence, orchestration serves as a bridge between the data or resource side and the model side. It's the layer that connects the two and enables the efficient and effective use of AI models. By understanding the different components of orchestration, we can gain a deeper appreciation for how AI models are used in practice.

    • Orchestrating resources for building an app with generative AI: Effectively utilizing APIs, platforms, data, and vector databases is crucial for building apps in the generative AI space. Understanding tools for data pipelining and vector databases is important.

      Building an app using the latest generative AI technology involves orchestrating connections to various resources, which can include APIs, platforms like Zapier or Wolfram Alpha, and your own data or data pipelines. APIs can provide convenient integrations for tasks like Google searches, while your own data can come from traditional sources like databases or from unstructured data. A unique aspect of this new app stack is the embedding and vector database piece, which allows for efficient storage and retrieval of vectors, i.e. high-dimensional representations of data. As an aside, the conversation also noted how advances in computer vision have made CAPTCHAs largely obsolete, leaving developers searching for alternative ways to differentiate between bots and humans. The discussion also touched on tools like Databricks, Airflow, and Pachyderm for data pipelining, as well as the importance of having a solid understanding of vector databases. Overall, building an app in the generative AI space requires a strong foundation in orchestrating various resources and effectively utilizing the latest technologies like vector databases.
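      As one small, hypothetical example of wiring in an external resource, the sketch below wraps a web search API as a function the orchestration layer could call; the endpoint, parameters, and response shape are assumptions for illustration only.

```python
# Sketch of exposing an external API as a "tool" for the orchestration layer.
# The URL, query parameter, and JSON structure below are illustrative assumptions.

import requests

def web_search(query: str, api_url: str = "https://example.com/search") -> list[str]:
    """Call a hypothetical search API and return a list of result snippets."""
    response = requests.get(api_url, params={"q": query}, timeout=10)
    response.raise_for_status()
    return [hit["snippet"] for hit in response.json().get("results", [])]

# The snippets can then be inserted into a prompt as context for the model.
```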

    • Utilizing generative AI models effectively with data discovery through embedding search: To effectively use generative AI models, find relevant data using embedding search on existing data. Choose the right embedding model for the task, evaluate performance, and consider size, speed, and benchmarks.

      To effectively utilize generative AI models, it's crucial to find data relevant to a user's query and incorporate it into the model's calls for applications like chat, question answering, image generation, or video generation. An emerging approach to discovering that data is an embedding search over existing data using a vector database: an embedding model creates vectors for the data, and the vector database supports semantic searches over them. The choice of embedding model significantly impacts performance, with different models excelling at different tasks. For instance, image problems may require pre-trained feature extractor models, while text-only tasks have numerous options. Evaluating model performance on leaderboards, such as those hosted on Hugging Face, can guide the decision. Sentence Transformers, a popular tool for creating text embeddings, also provides benchmarked options. Considering model size and speed alongside performance metrics is essential when dealing with large datasets.
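      Here is a minimal sketch of that pattern using the Sentence Transformers library; the model name and the tiny in-memory corpus are illustrative choices, and a production system would store the embeddings in a vector database rather than a tensor.

```python
# Minimal semantic-search sketch with sentence-transformers.
# The model name and tiny in-memory "corpus" are illustrative; a production
# system would store the embeddings in a vector database instead.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, fast text embedder

corpus = [
    "Invoices are processed within five business days.",
    "Refunds require a receipt and the original payment card.",
    "Support is available by email between 9am and 5pm.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query = "How long does invoice processing take?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Return the most semantically similar corpus entries for the query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], hit["score"])
```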

    • Considering factors for large-scale embedding projects: Choosing the right embedding size, optimizing database architecture, and prioritizing performance aspects like input or query speed are crucial for successful large-scale embedding projects.

      Implementing large-scale embedding projects, particularly with PDFs or other data types, can be time-consuming and resource-intensive. The speed and size of the embeddings, as well as the underlying database architecture, significantly impact the process. Vendors prioritize different aspects of their vector databases, such as data input speed or query speed, which can influence the overall performance. The size and complexity of the retrieval problem also play a role in determining the necessary embedding size and optimization. It's essential to consider these factors when planning and implementing embedding projects, as the choices made can have significant consequences for both performance and resource usage. The field is still evolving, with new practices and optimizations emerging regularly.
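      A quick back-of-the-envelope calculation shows why embedding size matters at scale; the document count and dimensionality below are assumptions chosen purely for illustration.

```python
# Rough estimate of raw embedding storage, ignoring any index overhead.
num_documents = 10_000_000   # e.g. text chunks extracted from a large PDF corpus
dimensions = 768             # a common embedding dimensionality
bytes_per_float = 4          # float32

total_bytes = num_documents * dimensions * bytes_per_float
print(f"{total_bytes / 1e9:.1f} GB of raw vectors")  # ~30.7 GB

# Halving the dimensionality roughly halves this footprint (and speeds up search),
# which is why embedding size and database architecture are real cost levers.
```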

    • Optimizing AI performance with model middleware: Model middleware functions like caching, logging, and validation help optimize model performance, improve data management, and ensure data quality in AI systems.

      In the context of generative AI systems, there are three interconnected components: the application side, the data and resources side, and the model side. The model side is further broken down into model hosting and model middleware. Model middleware includes functions like caching, logging, and validation, which sit between the orchestration layer and the model hosting. Caching is a technique used to store frequently accessed data, such as model responses, in memory to reduce the number of requests to the underlying data source and improve response times. It is a common practice in various applications, including AI systems. Logging, specifically model logging, refers to the recording and storing of model-related information, such as requests, prompts, response times, and GPU usage. This data can be used to monitor and optimize model performance and identify potential issues. Validation is another important function in model middleware, ensuring that data and inputs meet certain criteria before being processed by the model. This can help improve model accuracy and prevent errors. These middleware functions are crucial in the AI stack, as they help optimize model performance, improve data management, and ensure data quality. They are often integrated into MLOps platforms, providing specific features and tools for managing and monitoring AI models.
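      The caching idea can be sketched in a few lines; a real deployment might use Redis or a semantic cache keyed on embedding similarity, but the hypothetical in-memory version below shows where the middleware sits.

```python
# Minimal response cache in the model-middleware layer. call_model is a stand-in
# for the actual (slow, expensive) model request.

import hashlib

_cache: dict[str, str] = {}

def call_model(prompt: str) -> str:
    # Placeholder for a hosted model call or local inference.
    return f"<response for: {prompt[:30]}...>"

def cached_call(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:          # cache miss: pay for one model call
        _cache[key] = call_model(prompt)
    return _cache[key]             # cache hit: no extra GPU time or API cost
```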

    • Caching prompts and responses in generative AI for cost savings and performance benefits: Caching prompts and responses in generative AI applications reduces the need for model replicas, minimizes GPU costs, avoids redundant requests, and builds a competitive moat through domain-specific datasets.

      Caching prompts and responses in generative AI applications goes beyond traditional caching methods and offers significant cost savings, improved performance, and competitive advantages. This practice is particularly beneficial for large models that run on expensive specialized hardware or when making expensive requests to external models. By caching prompts and responses, companies can reduce the number of model replicas needed, minimize the cost of GPUs, and avoid making redundant requests to expensive models. Additionally, this data can be leveraged to build a competitive moat by creating a domain-specific dataset for fine-tuning smaller, more cost-effective models or for internal model development. This not only saves operational costs but also provides an advantage in the market. Furthermore, validation tools, such as Prediction Guard, play a crucial role in ensuring the reliability, privacy, security, and compliance of generative AI models by acting as a middleware layer to catch and correct any harmful or inappropriate outputs.
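      One hypothetical sketch of how such a dataset might accumulate is below: each request/response pair is appended to a JSONL log that can later be curated into fine-tuning data; the file name and record fields are assumptions, not any specific product's format.

```python
# Sketch of logging prompt/response pairs for later fine-tuning or analysis.
import json
import time

LOG_PATH = "prompt_response_log.jsonl"

def log_interaction(prompt: str, response: str, latency_s: float) -> None:
    record = {
        "timestamp": time.time(),   # when the call happened
        "prompt": prompt,           # what was sent to the model
        "response": response,       # what the model returned
        "latency_s": latency_s,     # useful for monitoring as well
    }
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```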

    • Considering the entire application stack for machine learning projects: Machine learning projects require careful consideration of validation, security, type and structuring, and consistency beyond just the model itself.

      While building and deploying machine learning models, it's essential to consider the entire application stack, not just the model itself. The model is only a small component, and there are several other concerns: validation, security, typing and structuring, and consistency. Validation involves ensuring the desired output is obtained, for example checking that a response is valid JSON or that a generated image has been properly upscaled. Security focuses on protecting sensitive data and preventing attacks such as prompt injection, which is where guarding tools come in. Typing and structuring ensure the model's output fits the desired format, and consistency checks involve calling the model multiple times and checking for self-consistency. Moreover, this engineering space is evolving, and AI engineering plays a crucial role in managing the entire stack, from app hosting and data resources to the model and model middleware. By keeping a mental model of the three spokes of the stack (app and app hosting, data and resources, and model and model middleware), developers can effectively orchestrate these components. In essence, the model is just one piece of the puzzle, and a successful machine learning project requires careful consideration of the entire application stack.
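      To make two of those checks concrete, the sketch below validates that a model's output is well-formed JSON with required fields and runs a simple self-consistency vote over repeated calls; the call_model stand-in and field names are illustrative assumptions.

```python
# Sketch of output validation and self-consistency checks in model middleware.
import json
from collections import Counter

def call_model(prompt: str) -> str:
    # Placeholder for a real LLM call.
    return '{"label": "positive"}'

def validated_json(prompt: str, required_keys: set[str]) -> dict:
    """Reject responses that are not JSON or are missing required fields."""
    raw = call_model(prompt)
    data = json.loads(raw)                  # raises if the output is not valid JSON
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"model output missing keys: {missing}")
    return data

def self_consistent_answer(prompt: str, n: int = 5) -> str:
    """Call the model several times and keep the most common answer."""
    answers = [call_model(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```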

    • Exploring the infrastructure components of modern tech stacks: Learn about the essential infrastructure components of modern tech stacks, including the model layer, data layer, and application layer, through a comprehensive conversation on the Practical AI podcast.

      The discussion revolved around the infrastructure components of a modern tech stack, which includes the model layer, data layer, and application layer. This Practical AI conversation provided valuable insights into these concepts and their interconnections. For those interested in gaining a deeper understanding, checking out the diagram referenced in the show notes and experimenting with end-to-end examples is recommended. These examples can help solidify the concepts and provide a hands-on learning experience. The hosts expressed their appreciation for the conversation and encouraged listeners to subscribe, share the podcast, and explore the resources mentioned in the episode. Fastly and Fly were thanked for their partnership, and a shoutout was given to Breakmaster Cylinder for the music. Overall, the episode provided a comprehensive exploration of the infrastructure components of modern tech stacks, offering listeners valuable insights and practical learning opportunities.

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Stanford's AI Index Report 2024
    We’ve had representatives from Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) on the show in the past, but we were super excited to talk through their 2024 AI Index Report after such a crazy year in AI! Nestor from HAI joins us in this episode to talk about some of the main takeaways including how AI makes workers more productive, the US is increasing regulations sharply, and industry continues to dominate frontier AI research.

    Apple Intelligence & Advanced RAG
    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

    The perplexities of information retrieval
    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data
    We’ve all heard about breaches of privacy and leaks of private health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they deploy edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs
    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress
    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents
    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach, from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs
    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Related Episodes

    When data leakage turns into a flood of trouble
    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)
    The new stable diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on twitter, reddit, discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things stable diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology
    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)
    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically they dive into the T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. Daniel provides a brief tour of the possible with the T0 family. They finish up with a couple of new learning resources.