
    Podcast Summary

    • Exploring the intersection of AI, data science, and podcasting: Dan Whitenack discusses the global impact of AI and the importance of addressing underrepresented communities, sharing his experiences from SIL International and the expanding focus of Practical AI.

      This episode explores the convergence of several fields: artificial intelligence (AI), data science, and even podcasting. Dan Whitenack, the host of Practical AI, shared his background in mathematical and computational physics and his experience building AI teams for low-resource scenarios. He recently left his position at SIL International to work on Prediction Guard, and he is also interim senior operations development director at DeepCandleCode. During the podcast crossover, Dan and the hosts of Latent Space discussed trends in AI and grilled each other on their opinions. They touched on the global impact of AI and the importance of addressing the needs of communities that do not yet have access to AI technologies; Dan's work at SIL International, an NGO focused on language-related work, is a testament to this. The discussion also traced the long-term evolution of Practical AI, which started as a podcast about open source and software development but has since expanded to cover AI in its entirety. Dan's passion for a wide range of projects, and his love of low-resource scenarios in both languages and music, further emphasized the far-reaching implications of AI. Overall, this episode underscores the significance of AI across industries and communities, and the importance of addressing the needs of underrepresented areas.

    • The Origin Story of the Practical AI Podcast: The idea was born when Dan Whitenack and Adam Stacoviak met at GopherCon in 2016 and saw a gap in hands-on data science content. Dan later joined forces with Chris Benson to provide practical, day-to-day information for listeners. Dan now also focuses on Prediction Guard, a tool for compliant large language model usage.

      The Practical AI podcast was born out of a conversation between Dan Whitenack and Adam Stacoviak at GopherCon in 2016. They discussed the idea of creating a practical data science podcast because of the lack of hands-on content available at the time. Later, Dan met Chris Benson, and the two teamed up to make the podcast a reality. The focus of Practical AI is on providing useful, day-to-day information for listeners. More recently, Dan has been working on Prediction Guard, a tool aimed at helping enterprises use large language models in a compliant manner while structuring and validating their output. Popular episodes include the conversation with Shreya Rajpal of Guardrails AI and the Fully Connected episodes in which Dan and Chris discuss a subject in depth together; these episodes are valuable learning opportunities for the hosts as well. Overall, Practical AI aims to provide practical, hands-on information for those interested in AI and data science.
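
      As a loose illustration of the "structuring and validating output" idea, here is a minimal sketch that uses pydantic to reject off-schema model responses. It is an assumption-laden toy, not Prediction Guard's actual API; the SupportTicket schema and its field names are invented for the example.

        # Toy sketch of validating structured LLM output (not Prediction Guard's API).
        import json
        from typing import Optional
        from pydantic import BaseModel, ValidationError

        class SupportTicket(BaseModel):      # hypothetical schema for the example
            category: str
            urgency: int                     # e.g., 1 (low) to 5 (high)
            summary: str

        def validate_llm_output(raw_text: str) -> Optional[SupportTicket]:
            """Parse a model's JSON response; reject anything off-schema."""
            try:
                return SupportTicket(**json.loads(raw_text))
            except (json.JSONDecodeError, ValidationError):
                return None  # the caller can retry with a repair prompt

        print(validate_llm_output('{"category": "billing", "urgency": 2, "summary": "Duplicate charge"}'))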

    • Exploring the practical applications and workings of AI models and grassroots communities: Both speakers emphasized the importance of understanding AI models, engaging with grassroots communities, and staying open to new developments in the field.

      The field of AI is constantly evolving, with new models, techniques, and applications emerging regularly. The two speakers shared personal highlights from exploring these developments. The first mentioned a fascination with understanding the practical applications and inner workings of various AI models, such as ChatGPT, Stable Diffusion, and AlphaFold. They also emphasized the importance of grassroots AI communities, like Masakhane, in creating models tailored to specific use cases around the world. The second speaker shared their excitement about interviewing experts in the field, like Mike Conover from Databricks, and the inspiration gained from their passion and enthusiasm for AI. They also highlighted the value of reporting on major developments in AI, capturing the reactions and perspectives of the community at the time. One surprising favorite topic for both speakers was Metaflow, a tool for building and managing machine learning workflows, which demonstrates the importance of staying open to new developments and exploring diverse areas within the field. Overall, these discussions emphasize the importance of continuous learning and exploration in the ever-evolving world of AI.

    • From notebooks to production: navigating operational challenges. The life cycle of ML and AI projects extends beyond model creation, requiring focus on model versioning, orchestration, deployment, and integration into existing infrastructure. Traditional benchmarks are evolving towards more nuanced evaluation methods, and the podcast explores these operational aspects in depth.

      The life cycle of machine learning and AI projects goes beyond creating a single model or making a single inference. The challenges of model versioning, orchestration, deployment, and integration into existing infrastructure are crucial aspects of practical usage that often intrigue data science professionals. While foundation models offer the appeal of training once and never touching the model again, organizations dealing with specific tasks require explainability and other considerations, pushing the MLOps era to evolve into LLMOps. Our most popular podcast episodes reflect this trend, focusing on model-based topics rather than infrastructure-based ones. For instance, our conversation with Reza Shabani about training the Replit code model was a hit among our community of software and AI engineers. The shift from traditional benchmarks to more nuanced evaluation methods is another emerging topic. Overall, the journey from notebooks to production involves navigating a range of operational challenges, and the podcast explores these aspects in depth.
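
      As one concrete slice of the versioning challenge, the sketch below pins the exact revision a deployment loads from the Hugging Face Hub instead of tracking a moving branch. The model ID is a small stand-in, and the pattern is illustrative, not something prescribed in the episode.

        # Sketch: pin the model revision a deployment loads, for reproducibility.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        MODEL_ID = "gpt2"    # small stand-in model; substitute your own repo
        REVISION = "main"    # in production, pin a specific commit hash instead

        tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=REVISION)
        model = AutoModelForCausalLM.from_pretrained(MODEL_ID, revision=REVISION)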

    • Evaluating Large Language Models: Challenges and Developments. Recent discussions focus on reconciling traditional benchmarks with open-ended text generation in LLMs. New approaches include model-based evaluation, generating benchmarks with models, and increasing linguistic diversity. However, concerns about mode collapse and the validity of models evaluating model-generated data persist.

      The evaluation of large language models (LLMs) is a complex issue: traditional benchmarks are often multiple-choice questions, while most production workloads involve open-ended text generation. Reconciling these two approaches is a challenge that has come up frequently in recent discussions. There are ongoing efforts to use models for on-the-fly or model-based evaluation, as seen in machine translation with evaluators like COMET. However, the rapid evolution of models and benchmarks makes finding state-of-the-art evaluations a moving target. An interesting development is the use of models to generate benchmarks themselves, as in HellaSwag, which was built from adversarially generated examples. This trend of models evaluating models, or of using simulated data, is not new but has reached significant scale. A concern is the potential for "mode collapse," where models are optimized for the median use case. To address this, efforts are being made to increase linguistic diversity in LLM datasets, such as Cohere for AI's community-driven initiative. Another surprising finding is that in some large datasets, less than 1% of the data may be human-generated, with the rest being model-generated or AI-assisted. This raises questions about the validity and reliability of models evaluating model-generated data. The field is still in its early stages, and it will be interesting to see how these developments unfold.
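
      To make the model-based evaluation idea concrete, here is a hedged sketch of the "LLM as judge" pattern using the openai client. The rubric, scoring scale, and model name are illustrative assumptions, not a method endorsed in the episode.

        # Sketch of "LLM as judge" evaluation for open-ended generation.
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        def judge(question: str, answer: str) -> str:
            prompt = (
                "Rate the following answer from 1 (poor) to 5 (excellent) for "
                "correctness and completeness. Reply with the number only.\n\n"
                f"Question: {question}\nAnswer: {answer}"
            )
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # illustrative model choice
                messages=[{"role": "user", "content": prompt}],
                temperature=0,
            )
            return resp.choices[0].message.content.strip()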

    • Including diverse languages in LLMs benefits lower-resource languages and scenarios: Grassroots organizations like Masakhane create technology tailored to specific contexts and languages, focusing on speech and local use cases, to make technology accessible and beneficial to a wider range of users and contexts.

      The inclusion of diverse languages in large language model (LLM) datasets is crucial for benefiting lower-resource languages and scenarios, because fine-tuning from such datasets makes the leap to less common languages easier. Challenges remain, however, as most content generated from models is not in lesser-known languages. Grassroots organizations like Masakhane are making a significant impact by creating technology tailored to specific contexts and to languages that aren't primarily text-based. Masakhane, an African NLP research organization, understands that not all communities want or need Wikipedia translations, and instead focuses on speech and on use cases like disease identification, disaster relief, and agriculture. This localized approach is essential, as technology needs can vary greatly between regions. Rajiv Shah of Hugging Face, in the "Capabilities of LLMs" episode, provides a helpful overview of the landscape of large language models, including their availability, commercial usability, multilinguality, and task specificity; these dimensions offer valuable insight into the capabilities and applications of these models. Overall, the involvement of diverse communities and organizations in creating and fine-tuning LLMs is crucial for ensuring that technology is accessible and beneficial to a wider range of users and contexts.

    • Staying informed about language models: Podcasts, Twitter, and Hugging Face. Stay updated on language models through podcasts, Twitter, and Hugging Face for insights into new developments and access to the latest models and research.

      Staying updated on the constantly evolving landscape of language models can be a challenge, but it's essential for navigating the vast array of models available. Podcasts, social media platforms like Twitter, and model statistics on Hugging Face are useful resources for staying informed. For instance, podcasts provide a consistent platform for discussing new developments, while Twitter can offer real-time updates. Hugging Face, a popular repository for language models, can also provide insights into model popularity and usage. An unexpected discovery was learning that Meta had built upon research done by grassroots organizations on Hugging Face, demonstrating the far-reaching impact of open-source models. The language model market has shifted from a cathedral-like structure with a few dominant players to a more open and accessible landscape, with new models emerging frequently. MosaicML, for instance, aims to make it easier for individuals to develop and deploy their own models. Keeping up with these advancements may require some digging, but the rewards can be significant in terms of staying competitive and leveraging the latest technological innovations.
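
      The "model statistics on Hugging Face" signal can be queried programmatically; below is a small sketch using the huggingface_hub client. Sorting by downloads is one assumption about what "popularity" means, and the snippet is illustrative rather than something demonstrated in the episode.

        # Sketch: list the most-downloaded models on the Hugging Face Hub.
        from huggingface_hub import HfApi

        api = HfApi()
        for m in api.list_models(sort="downloads", direction=-1, limit=5):
            print(m.id, m.downloads)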

    • Early stages of ML integration in enterprises: Understand the tasks involved in machine learning and prioritize them effectively, remember that not every hyped technology becomes permanent, and explore practical applications and limitations.

      While there is a lot of excitement and innovation in machine learning and large language models (LLMs), the reality is that many enterprises are still in the early stages of integrating these technologies into their stacks. This was a key theme that emerged from the workshops and advisory work run through datadan.io. Another important takeaway is the need to understand the different tasks involved in machine learning and to time-box and prioritize them effectively. This is particularly relevant in smaller organizations, where machine learning tasks may not be specialized functions but part of a larger role. Furthermore, it's important to remember that not every hyped technology will become a permanent part of day-to-day life; it takes time for these technologies to be adopted and integrated into enterprise systems, which can be a refreshing perspective for those who feel overwhelmed by the constant stream of new developments. Lastly, it was emphasized that going beyond single prompts and exploring the practical applications and limitations of LLMs is crucial for effective use, including the ethical and social implications as well as the technical challenges of implementing and scaling these models. Overall, the workshops provided valuable insights into the current state of machine learning adoption in enterprises and the importance of a holistic approach to understanding and implementing these technologies.

    • Exploring the Depths of Large Language Models: Beyond Prompt Engineering. Leverage techniques like data augmentation, chaining, customization, and fine-tuning to enhance the performance of large language models beyond prompt engineering. These skills are essential for enterprise users to fully harness the potential of AI models in their workflows.

      While people may initially be drawn to using large language models like ChatGPT directly for their use cases, there is a rich environment of techniques and tools beneath the surface that can greatly enhance the models' performance. These techniques include prompt engineering, data augmentation, chaining, customization, and fine-tuning, among others. Enterprise users in particular have not fully explored these possibilities, often stopping at the most basic level of interaction with the models. The shift from traditional machine learning workflows to this new environment requires rebuilding intuition and developing a practical workflow. The term "prompt engineering" may be overhyped, but the engineering and operations around prompts and related techniques are a real and valuable skill set. This is part of a broader trend of AI spilling over from the traditional machine learning space into software engineering, creating a new subspecialty of AI engineering. For those coming from a software engineering background, the unique challenges include dealing with the nondeterministic nature of AI systems for the first time and learning the specific tools and techniques for working with language models. For those coming from a data science background, the challenges include learning how to apply machine learning techniques to language models and adapting to the new workflow.
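
      As a toy illustration of chaining, the sketch below feeds one model call's output into the next. Here call_llm is a hypothetical stand-in for whichever client you actually use; nothing about it comes from the episode.

        # Toy sketch of prompt chaining; call_llm is a hypothetical stand-in.
        def call_llm(prompt: str) -> str:
            raise NotImplementedError  # e.g., an OpenAI, Cohere, or local-model call

        def summarize_then_translate(document: str, language: str) -> str:
            # Step 1: condense the document, then Step 2: transform the summary.
            summary = call_llm(f"Summarize this document in three sentences:\n{document}")
            return call_llm(f"Translate the summary into {language}:\n{summary}")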

    • Exploring the untapped potential of off-the-shelf language models: Despite powerful language models like GPT-4, progress can be made using them with the right approach to prompting, chaining, and data augmentation. The UX of AI applications is valuable and can create instant user value, and high-quality, diverse, and ethically sourced datasets are crucial in the realm of NLP.

      While we have access to powerful language models like GPT-4, we may not be fully utilizing their capabilities, given the complexities of model drift and the unexplored potential of their latent spaces. From a data science perspective, there's a tendency to rush towards fine-tuning or training custom models, but with the right approach to prompting, chaining, and data augmentation, significant progress can be made using off-the-shelf models. Furthermore, the user experience (UX) of AI applications can be just as valuable as the underlying technology. This was evident in the success of ChatGPT, where the smooth integration of AI into the interface created instant value for users. However, this is an area where data scientists may have less experience or expertise than engineers, who are more accustomed to UI/UX design. In the realm of natural language processing (NLP) datasets, there have been significant evolutions, with a growing emphasis on high-quality, diverse, and ethically sourced data. As we delve deeper into the world of datasets in upcoming episodes, it will be essential to consider these advancements and the challenges they present.
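
      One way to read "data augmentation" in this prompting context is prompt-side augmentation: retrieved snippets are placed into the prompt so an off-the-shelf model answers from your data. The sketch below assumes a hypothetical retrieve helper standing in for a search index or vector store.

        # Sketch of augmenting a prompt with retrieved context; retrieve() is hypothetical.
        def retrieve(query: str, k: int = 3) -> list[str]:
            raise NotImplementedError  # return the k most relevant snippets

        def augmented_prompt(query: str) -> str:
            context = "\n---\n".join(retrieve(query))
            return (
                "Answer using only the context below. If the answer is not there, say so.\n\n"
                f"Context:\n{context}\n\nQuestion: {query}"
            )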

    • Bridging the gap between reinforcement learning and model training with human feedback: Open source data labeling frameworks like Label Studio innovate with human-in-the-loop approaches, making fine-tuning of generative AI models more accessible. New tools focus on instruction tuning, while labeling companies like Labelbox and Label Studio cater to custom models and state-of-the-art models with enterprise data.

      Open source data labeling frameworks like Label Studio are continuing to innovate, making fine-tuning of generative AI models more accessible through human feedback and customized data. This human-in-the-loop approach is becoming a trend because it bridges the gap between reinforcement learning and model training. Label Studio's new tools are an example, focusing on instruction tuning of models. Previously, some companies struggled to understand the difference between their past model fine-tuning efforts and creating instruction-tuned models; the workflow remains similar, but the new packaging and tooling make it more approachable. Given the continued importance of labels, labeling companies like Labelbox and Label Studio keep emerging and growing, catering to the demand for custom models and for adapting state-of-the-art models with enterprise data. The distinction lies in whether companies are thinking of a data platform for AI or simply bringing their data to an AI system. APIs like Cohere's and OpenAI's, which offer fine-tuning as part of the API, are popular choices for users who prefer a model-centric approach.
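
      For a rough sense of what instruction-tuning data looks like once it leaves a labeling tool, here is a sketch of a single JSONL record. The field names follow a common convention but are not specific to Label Studio or Labelbox.

        # Sketch of one instruction-tuning record appended to a JSONL file.
        import json

        record = {
            "instruction": "Classify the sentiment of the review.",
            "input": "The battery died after two days.",
            "output": "negative",   # the human-provided (or human-corrected) label
        }

        with open("instruction_data.jsonl", "a") as f:
            f.write(json.dumps(record) + "\n")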

    • Exploring machine learning and language models with unlabeled datasets: Researchers are shifting towards using unlabeled datasets for unsupervised or self-supervised learning, but the optimal data mix is unclear. Some are experimenting with filtering and augmenting existing datasets, but a lack of transparency makes it difficult to replicate success.

      There is ongoing exploration in machine learning and language models around how datasets are used and mixed for training. OpenAI's approach is shifting away from encouraging fine-tuning on custom datasets, given its limitations and uncertainties. The use of unlabeled datasets for unsupervised or self-supervised learning is an emerging trend, but the optimal data mix remains unclear. Some researchers are experimenting with filtering and augmenting existing datasets to improve model performance. However, the lack of transparency about the specific data used in popular models makes it challenging for others to replicate their success. The landscape is dynamic, and practitioners are encouraged to try various models and build intuition about their behavior based on their training data.
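
      Experimenting with data mixes can be sketched with the Hugging Face datasets library; the corpora and sampling weights below are placeholders chosen for illustration, not a recipe from the episode.

        # Sketch: interleave two corpora with chosen sampling probabilities.
        from datasets import load_dataset, interleave_datasets

        # Two stand-in corpora with a shared "text" column; real data mixes
        # combine things like code, web text, and curated sources.
        a = load_dataset("imdb", split="train").select_columns(["text"])
        b = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

        # Sample 30% from a, 70% from b; the probabilities are the knob to tune.
        mixed = interleave_datasets([a, b], probabilities=[0.3, 0.7], seed=42)
        print(mixed[0])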

    • Exploring AI's potential beyond English and written text: Large language models show surprising results in areas like fraud detection, and languages and modalities beyond English and written text remain vast unexplored territory in AI. Get hands-on experience to build intuition and discover new possibilities.

      The conversation closed on the surprising generalizability of large language models beyond traditional NLP tasks. The speaker shared an example of using a large language model for fraud detection in insurance transactions, which was unexpected and showed promising results. Another intriguing topic was the vast unexplored territory in AI, particularly in languages and modalities beyond English and written text; the speaker emphasized the need to explore these areas further to understand communication limitations and expand our knowledge. Lastly, the speaker encouraged everyone to get hands-on experience with these models and tools to build intuition and explore new possibilities. Overall, this conversation highlighted the exciting potential of AI and the importance of continuous exploration and experimentation.

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Stanford's AI Index Report 2024
    We’ve had representatives from Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) on the show in the past, but we were super excited to talk through their 2024 AI Index Report after such a crazy year in AI! Nestor from HAI joins us in this episode to talk about some of the main takeaways, including how AI makes workers more productive, how the US is sharply increasing regulation, and how industry continues to dominate frontier AI research.

    Apple Intelligence & Advanced RAG
    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

    The perplexities of information retrieval
    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data
    We’ve all heard about breaches of privacy and leaks of private health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they have deployed edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs
    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress
    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents
    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach, from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs
    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Related Episodes

    When data leakage turns into a flood of trouble
    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)
    The new stable diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on twitter, reddit, discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things stable diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology
    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)
    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically they dive into the T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. Daniel provides a brief tour of the possible with the T0 family. They finish up with a couple of new learning resources.