
    Podcast Summary

    • Creating Capable AI Agents: CTO Josh Albrecht of Imbue is developing robust coding agents to handle mundane tasks, freeing up time for creative work. Despite challenges, the goal is to make AI research a practical and transformative force in daily life.

      Josh Albrecht, CTO and co-founder of Imbue, shares a long-standing fascination with artificial intelligence and its potential to create agents that can accomplish complex tasks on our behalf. Albrecht's background in AI research and practical applications led him to recognize the limitations of existing tools and the need for more capable agents. He and his team at Imbue aim to create robust coding agents that can take over mundane tasks, freeing up time for more creative and impactful work. However, they face challenges in ensuring the safety and reliability of agents acting in the real world. Overall, the goal is to create tools that make AI research not just an academic pursuit, but a practical and transformative force in our daily lives.

    • Creating Robust and Trustworthy AI Agents: Ensuring consistency and accuracy in agents is a significant challenge, requiring continuous improvement in techniques and conditions.

      Creating trustworthy and robust AI systems, specifically agents, is a significant challenge. While various tools and techniques are available, the main issue lies in ensuring the correctness and robustness of these systems. The promise of AI is to provide answers without the need for extensive detail, but the current state of agents delivers only a first-pass version, which may be correct 60-70% of the time. To deploy these systems effectively, they need to perform consistently and accurately, which requires a significant amount of work. The more general the assistant category, the harder it becomes to achieve this level of reliability. The main challenge, therefore, is to make agents robust and correct, as this is what truly distinguishes them from other AI systems. For instance, a dog is an agent that is extremely good at not dying, demonstrating a high level of robustness. To overcome this challenge, researchers and developers are continuously working on improving techniques and conditions to make agents more effective.

    • Ensuring Robustness in Agent Implementation: Implementing agents for enterprise use in 2024 requires checks, safeguards, and continuous monitoring to ensure accuracy and robustness.

      Implementing and utilizing agents for productive enterprise use in 2024 requires a more holistic approach compared to using web interfaces. Agents, unlike web interfaces, lack common sense and reasoning capabilities, necessitating the implementation of various checks and safeguards to ensure robustness. These checks can include in-domain programming checks, LLM scoring, and human review. It's crucial to understand the potential places where things could go wrong and the confidence level you're getting back from the system. Internal evaluations and benchmarks are essential to ensure the system's accuracy and to calibrate it to human agreement. Additionally, it's important to consider the distribution of inputs in production and to monitor the system's performance over time. While failure rates may be acceptable for easier cases, it's crucial to understand these rates and to have guardrails in place. Our team has been focusing on internal evaluations and creating our own benchmarks to ensure accuracy and to contribute back to the community in the near future.
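      The layered checks described above (in-domain programmatic checks, LLM scoring, human review) can be sketched as a simple routing function. This is a minimal illustration, not Imbue's actual pipeline; `llm_score` is a hypothetical placeholder for a real LLM-as-judge call:

```python
import ast

def passes_syntax_check(code: str) -> bool:
    """In-domain programmatic check: does generated Python code even parse?"""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def llm_score(output: str) -> float:
    """Hypothetical LLM-as-judge scorer returning a confidence in [0, 1].
    A real implementation would call a model API and parse its verdict."""
    return 1.0 if output.strip() else 0.0

def route_agent_output(output: str, threshold: float = 0.8) -> str:
    """Route an agent's output through layered checks, escalating to a
    human reviewer when automated confidence is low."""
    if not passes_syntax_check(output):
        return "reject"
    if llm_score(output) < threshold:
        return "human_review"
    return "accept"
```

      The point is the structure: cheap deterministic checks run first, model-based scoring next, and humans remain the final guardrail for low-confidence cases.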

    • Full-stack approach to building AI agents: Successful AI agents require domain expertise, a full-stack approach, and attention to details like hardware setup, pretraining, fine-tuning, RL evaluations, data generation and cleaning, and UI design.

      Building effective and robust AI agents requires a deep understanding of the specific domain and a full-stack approach. The most successful use cases have been driven by individuals with high degrees of domain expertise. While AI promises to do anything through a simple text box, in practice digging into the details, using few-shot examples, and applying retrieval techniques are necessary to achieve accurate results. Imbue's approach to building a robust foundation for AI agents aligns with this idea, taking a full-stack approach that includes setting up hardware and infrastructure, pretraining and fine-tuning, RL evaluations, data generation and cleaning, and UI design. By tweaking things at each level, the overall system can be optimized to better meet the needs of the product and the user.
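      A minimal sketch of what "digging into the details" can mean in practice: assembling a prompt from retrieved context and few-shot examples rather than relying on a bare text box. All names here are illustrative, not from the episode:

```python
def build_prompt(task: str,
                 examples: list[tuple[str, str]],
                 retrieved: list[str]) -> str:
    """Assemble a few-shot prompt: retrieved context documents, then
    worked question/answer examples, then the actual task."""
    parts = ["Relevant context:"]
    parts += [f"- {doc}" for doc in retrieved]
    for question, answer in examples:
        parts.append(f"Q: {question}\nA: {answer}")
    parts.append(f"Q: {task}\nA:")  # model completes the final answer
    return "\n\n".join(parts)
```

      Even this trivial scaffolding usually outperforms pasting the raw task into a chat window, because the model sees in-domain examples and grounding documents.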

    • Deep understanding of technologies through research: We prioritize deep research into neural networks and language models, identifying their strengths and weaknesses, for effective use and improved performance.

      Our company values a deep understanding of the technologies we work with, rather than treating them as black boxes. We believe in opening up these technologies, such as pre-training, fine-tuning, and reinforcement learning, to understand what's really happening inside. We have a weekly paper club where we discuss state-of-the-art research to gain insights into neural networks and language models, identifying their strengths and weaknesses. We also evaluate the accuracy and perplexity of our systems on multiple choice question answering datasets, providing a more precise understanding of their performance. This research-first approach allows us to make informed modifications and improvements, even though it takes more time and effort compared to a hack-together-and-throw-it-out approach. Our long-term benefits include making the most of the significant resources required to train these systems and avoiding wasted time on ineffective changes. By deeply understanding these technologies, we can use them effectively to deliver good user experiences in the real world. This approach sets us apart from competitors who may not prioritize research and deep understanding, allowing us to stay ahead in the rapidly evolving field of AI and machine learning.
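      Perplexity, one of the metrics mentioned, is computed from per-token log-probabilities, and on multiple-choice datasets a common scoring scheme picks the option the model finds least surprising. A simplified sketch (real evaluations normalize for answer length and format in more careful ways):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity is the exponential of the mean negative
    log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def pick_answer(choice_logprobs: dict[str, list[float]]) -> str:
    """For multiple-choice QA, pick the option whose continuation the
    model finds least surprising (lowest perplexity)."""
    return min(choice_logprobs, key=lambda c: perplexity(choice_logprobs[c]))
```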

    • Building robust systems and understanding scaling laws for machine learning models: The team is taking a long-term approach to foundation model building, focusing on an automated hyperparameter tuner and on ways to adapt and combine models for specific tasks, resulting in more specialized and effective models.

      The team is focusing on building robust systems and understanding scaling laws for their machine learning models, rather than rushing to create the biggest foundation model. They believe that a long-term approach, which includes developing an automated hyperparameter tuner, will lead to more specialized and effective models. Despite the rapidly evolving landscape of foundation model building, the team is maintaining focus by exploring ways to adapt and combine models for specific tasks. They have seen promising results with smaller models and plan to share more information in upcoming blog posts.
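      As an illustration of the simplest possible automated hyperparameter tuner (real systems, presumably including Imbue's, are far more sophisticated, e.g. using scaling-law fits or Bayesian optimization), random search might look like:

```python
import random

def random_search(objective, space: dict, n_trials: int = 20, seed: int = 0):
    """Minimal automated tuner: random search over a hyperparameter space.
    `objective` maps a params dict to a loss to minimize; `space` maps
    hyperparameter names to callables that sample a value from an RNG."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss
```

      For example, sampling a learning rate log-uniformly (`lambda r: 10 ** r.uniform(-4, -1)`) and minimizing validation loss over a few dozen trials already removes one manual knob.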

    • Understanding fundamentals of deep learning crucial for creating robust agents: Gaining a clearer picture of deep learning fundamentals leads to more efficient exploration of the space and the removal of hyperparameters, making machine learning less of a black box.

      While having a large amount of data is important for machine learning models, the quality of the data is even more crucial. The speaker emphasizes that understanding the fundamental theories behind deep learning is essential for creating robust agents. They mention specific areas of research, such as understanding initialization methods and learnability, which can lead to more efficient exploration of the space and the removal of hyperparameters. The speaker also compares the current state of machine learning to the early days of physics, where theoretical understanding came after experimental discovery. They highlight the importance of gaining a clearer picture of the fundamentals to make machine learning less of a black box and more efficient. Additionally, they discuss the importance of specialized and practical applications of these models, as well as the ongoing competition in the field.
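      As a concrete example of the initialization research mentioned, Xavier/Glorot initialization chooses weight bounds so that activation variance is roughly preserved across layers. A minimal sketch (the episode does not name a specific method; this is one standard choice):

```python
import math
import random

def xavier_uniform(fan_in: int, fan_out: int, rng=None) -> list[list[float]]:
    """Xavier/Glorot uniform init: sample weights from
    U(-b, b) with b = sqrt(6 / (fan_in + fan_out)), so the variance of
    activations and gradients stays roughly constant layer to layer."""
    rng = rng or random.Random(0)
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_out)]
            for _ in range(fan_in)]
```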

    • Understanding Machine Learning Models through Expert Insights: Experts use various methods and techniques, including studying norms and quantities, applying different types of regularization, and examining hyperparameters, to gain insights into machine learning models. Focusing on both the training process and post-training evaluation is crucial for building trust in models.

      While machine learning models, such as neural networks and language models, may seem like black boxes to some, there are many researchers and practitioners who have a deep understanding of how they work. These experts use various methods and techniques, including studying norms and quantities, applying different types of regularization, and examining the behavior of specific hyperparameters, to gain insights into these models. This collective knowledge helps inform us about what is happening under the hood, just as we don't need to fully understand every aspect of a car to use it effectively. When it comes to engineering trust into model training, it's essential to focus on both the training process and the post-training evaluation. While improving models during training is important, the largest impact can be made after training by implementing robust systems, validating results, and continuously monitoring and updating models as new data becomes available. By prioritizing these efforts, organizations can build confidence in their models and ensure they are delivering accurate and reliable outcomes.
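      "Studying norms and quantities" can be as simple as tracking the global gradient norm each step to spot instability before it ruins a run. A toy sketch, with gradients as plain lists and an arbitrary threshold:

```python
def global_norm(grads: list[list[float]]) -> float:
    """L2 norm across all parameter gradients, a quantity commonly
    logged during training to detect divergence or vanishing gradients."""
    return sum(g * g for tensor in grads for g in tensor) ** 0.5

def check_step(grads: list[list[float]], max_norm: float = 10.0) -> str:
    """Flag a training step whose gradient norm suggests instability."""
    return "unstable" if global_norm(grads) > max_norm else "ok"
```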

    • Building Trust in AI Models: A Post-Training Concern. Building trust in AI models involves user interaction, real-time verification, and clear communication. Interfaces should be designed with the user's needs in mind to improve the user experience and build trust.

      Building trust in AI models goes beyond the development process and requires careful consideration during deployment. Trust should be thought of as a set of different types of data that can give confidence in the model's performance. This includes auditing, real-time verification, user interaction, and other checks. It's essential to have separate systems for these checks to ensure the model's behavior is trustworthy. The user experience plays a significant role in building trust. For instance, reviewing code generated by a model can be time-consuming and frustrating. Instead, an interactive interface that flags potential issues and allows for back-and-forth communication between the user and the model can greatly improve the user experience. Interfaces should be designed with the user's needs in mind, making the process of using AI models more efficient and less error-prone. The speaker also mentions their experiences with internal prototypes and products, emphasizing the importance of interactive interfaces and clear communication in building trust. For example, Copilot, which keeps suggestions short and easy to review, is appreciated for this reason. Overall, trust in AI models is a post-training concern that requires a practical approach, focusing on user experience, interaction, and real-time verification.

    • Advanced coding tools for higher-level pseudo code or intent: Future tools may help users write pseudo code or express intent that is then translated into real code, reducing errors and letting users focus on intent rather than implementation. Since the user's desired version and the actual correct version may not always align, users must stay engaged and able to refine their intentions.

      The future of coding may involve advanced tools that help users write higher-level pseudo code or intent, which can then be translated into real code. This could significantly change the day-to-day experience of coding, potentially reducing the time spent on fixing errors and allowing users to work at a higher level of abstraction. However, it's important to note that the user's desired version of the code and the actual correct version may not always align, and the user needs to be present and able to learn and refine their intentions. The vision is to create tools that can robustly handle software development, allowing users to trust the system and have a dialogue when things don't go as planned. The goal is to make coding more efficient and less error-prone, enabling users to focus on their intentions rather than the intricacies of the code. This could lead to a more incremental change in the coding workflow, as opposed to the large, generational shifts that have historically been difficult to review and trust.

    • Empowering users with interactive tools for AI and software development: Interactive tools that let users write pseudo code or commands will be key to creating adaptable agents. The choice of programming language may remain a challenge, potentially leading to a focus on a smaller set of robustly supported languages or on tools that convert between languages.

      The future of AI and software development lies in interactive and robust tools that empower users to write pseudo code or commands, rather than relying on full automation. This approach allows for the creation of agents capable of performing a wide range of tasks, even those unintended or unprogrammed. However, the choice of programming language may continue to be a challenge, as some languages excel in certain areas but struggle in others. The future may hold a shift towards focusing on a smaller set of robustly supported languages, or the development of tools that convert between languages, allowing users to work at a higher level of abstraction and ignore the underlying code details. Ultimately, the goal is to create tools that provide a better user experience and enable the creation of more capable and adaptable agents.

    • Exploring the future of language models and Imbue's advancements: Language models may shift toward languages offering type safety and better performance, though generating the necessary data could be a challenge. Imbue's focus is on advancing robust reasoning capabilities, which could enable significant automation and labor displacement.

      The future of language models may involve a shift towards languages that better fit the needs of language models, such as those that offer type safety and improved performance. However, generating the necessary data for this to be effective may be a significant challenge. An alternative approach could be to convert existing Python pre-training data to other languages like JavaScript, Rust, and Elixir for training. Looking ahead, the most exciting development for Imbue over the next year is expected to be the advancement of robust reasoning capabilities for language models. This ability to reason through complex scenarios and provide accurate, robust answers could lead to significant labor displacement and disruption, as well as the emergence of more powerful tools that automate tasks currently requiring human intervention. Imbue's focus on this area is much appreciated, and their practical approach to this research is sure to yield valuable results.

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Stanford's AI Index Report 2024
    We’ve had representatives from Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) on the show in the past, but we were super excited to talk through their 2024 AI Index Report after such a crazy year in AI! Nestor from HAI joins us in this episode to talk about some of the main takeaways, including how AI makes workers more productive, how the US is sharply increasing regulation, and how industry continues to dominate frontier AI research.

    Apple Intelligence & Advanced RAG
    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

    The perplexities of information retrieval
    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data
    We’ve all heard about breaches of privacy and leaks of private health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they deploy edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs
    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress
    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents
    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach, from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs
    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Related Episodes

    When data leakage turns into a flood of trouble
    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)
    The new stable diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on twitter, reddit, discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things stable diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology
    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)
    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically they dive into the T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. Daniel provides a brief tour of the possible with the T0 family. They finish up with a couple of new learning resources.