
    Podcast Summary

    • AI in Biospace: Handling Complexity and Big Data
      AI is crucial in the biospace for tackling complexity and big data in areas like immunotherapy and genomics, but the challenges have grown as data dimensions have increased, requiring a focus on specific problems and the use of AI to help interpret complex biological systems.

      The application of AI in the biospace, particularly in areas like immunotherapy and genomics, presents unique challenges due to the vast amount of data and the complexity of biology. Drausin Wulsin, Director of Machine Learning at Immunai, shared his perspective on how AI has crept into these fields. He began his journey in AI and genomics around a decade ago, when microarray data, which allowed for the measurement of 96 genes in a single experiment, was considered big data. Since then, the field has advanced significantly, with the ability to measure all 20,000 genes in the human genome. However, the challenges have also grown: even highly trained immunologists cannot interpret complex 20,000-dimensional sparse vectors. This makes it essential to focus on specific questions and problems, and to use AI to help make sense of the data in these complex and messy biological systems.
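
      To make that last point concrete, here is a minimal sketch, with random numbers standing in for real expression profiles, of one common first step: projecting 20,000-dimensional vectors down to a couple of dimensions a human can actually inspect. All shapes and values here are hypothetical.

      ```python
      # A minimal sketch: reduce 20,000-dimensional expression vectors to 2-D
      # so they can be plotted and inspected. Random data stands in for real
      # profiles; real pipelines add normalization and more careful modeling.
      import numpy as np
      from sklearn.decomposition import PCA

      rng = np.random.default_rng(0)
      profiles = rng.random((500, 20000))  # 500 cells x 20,000 genes (toy)

      coords = PCA(n_components=2).fit_transform(profiles)
      print(coords.shape)  # (500, 2): each cell becomes a plottable 2-D point
      ```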

    • Revolution in Biology: Single Cell Profiling
      Single cell profiling revolutionizes biology by providing unique insights into individual cells, leading to a more comprehensive understanding of biological systems.

      The last decade has seen a revolution in the field of biology, driven by advancements in both algorithms and experimental techniques. The experimental techniques, such as single cell profiling, have arguably been improving at an even faster rate than the algorithms. This has produced three major revolutions in profiling cells: from microarrays, to bulk RNA sequencing, and now to the current single cell revolution. Single cell profiling allows scientists to understand what's happening in an individual cell, rather than just the average over many cells, and the data from these experiments is what's driving the advances in the field. For those unfamiliar, single cell data involves measuring various aspects of a cell, such as its DNA and RNA, to understand its unique characteristics. This matters because it provides insight into the complexities of biology and can lead to a better understanding of various diseases and conditions. The difference between single cell and bulk data lies in the level of detail and specificity: bulk data provides an average over many cells, while single cell data captures the unique characteristics of each individual cell. This level of detail is essential for a more comprehensive understanding of biological systems.
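
      A small simulated example (hypothetical counts, not real measurements) makes the bulk-versus-single-cell distinction concrete: averaging over cells, which is effectively what a bulk experiment reports, can completely hide two distinct subpopulations.

      ```python
      # A minimal sketch (simulated counts) of why bulk and single cell
      # readouts differ: the bulk value is roughly the average over cells,
      # which can hide two very different subpopulations.
      import numpy as np

      rng = np.random.default_rng(0)

      # One gene, two subpopulations: 500 cells expressing it highly,
      # 500 cells barely expressing it.
      high = rng.poisson(lam=20, size=500)
      low = rng.poisson(lam=1, size=500)
      cells = np.concatenate([high, low])

      bulk_estimate = cells.mean()  # what a bulk experiment would report
      print(f"bulk average: {bulk_estimate:.1f}")  # ~10.5
      # Almost no individual cell actually looks like the bulk average:
      near_avg = np.mean((cells > 8) & (cells < 13))
      print(f"cells near the bulk average: {near_avg:.0%}")
      ```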

    • Measuring cell activity through various modalities
      RNA-seq offers a comprehensive view of gene expression but struggles with low-abundance transcripts, while protein measurement provides a more complete picture of cellular function but is more complex and costly. Combining multiple modalities can lead to a more accurate understanding of a cell's state and function.

      Understanding the inner workings of a cell involves measuring various aspects of its activity, including the master instructions (DNA), the working copies passed to the cell's factories (mRNA), and the final products being produced (proteins). Each of these measurements, or modalities, provides unique insights but also has its limitations. For instance, RNA sequencing (RNA-seq) counts mRNA molecules for every gene, offering a comprehensive view of gene expression, but it struggles to detect low-abundance transcripts due to its sparse readout. On the other hand, measuring proteins directly provides a more complete picture of cellular function, since proteins are the products that ultimately carry out the cell's work, but it is more complex and costly than RNA-seq. Therefore, using multiple modalities in parallel, such as RNA-seq, proteomics, and DNA sequencing, can provide a more comprehensive and accurate understanding of a cell's state and function.
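
      The sparse-readout limitation mentioned above can be illustrated with simulated counts (all numbers hypothetical): in single cell RNA-seq count matrices most entries are zero, so a low-abundance transcript is often indistinguishable from one that isn't expressed at all.

      ```python
      # A minimal sketch (simulated counts, not real data) of RNA-seq
      # sparsity: most entries of a single cell count matrix are zero,
      # so low-abundance transcripts are easily missed.
      import numpy as np

      rng = np.random.default_rng(1)

      # 1,000 cells x 20,000 genes; most genes are captured rarely,
      # so the matrix is dominated by zeros.
      counts = rng.poisson(lam=0.1, size=(1000, 20000))

      sparsity = np.mean(counts == 0)
      print(f"fraction of zero entries: {sparsity:.1%}")  # ~90%

      # A lowly expressed gene often looks identical to an absent one:
      rare_gene = counts[:, 0]
      print(f"cells with zero reads for a rare gene: {np.mean(rare_gene == 0):.0%}")
      ```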

    • Understanding the immune system and its interactions with cells for effective immunotherapies
      Researchers use targeted approaches to analyze proteins and RNA for insights into cell types and processes, prioritizing data based on research questions. Immunotherapy, a cancer treatment, relies on the immune system, which is efficient at eliminating threats, making it a valuable area of study.

      In modern biology, researchers can use a targeted approach, analyzing proteins on the surface of cells and RNA within them, to gain insights into cell types, their states, and various biological processes. This approach requires careful consideration of which data to prioritize based on the research question, as each modality comes with its own costs and timelines. Immunotherapy, a type of cancer treatment, is connected to this concept because it relies on the immune system, the body's defense force, to identify and attack cancer cells. Understanding the immune system and its interactions with cells is therefore crucial for developing effective immunotherapies. The immune system is incredibly efficient, eliminating 99.99% of threats, making it a valuable area of study for researchers and clinicians.
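
      As a toy illustration of that targeted surface-protein readout, the sketch below calls broad immune cell types by thresholding a few canonical lineage markers (CD3 for T cells, CD19 for B cells, CD14 for monocytes). The values, thresholds, and gating rules are hypothetical simplifications, not the approach used by any particular lab.

      ```python
      # A minimal sketch (hypothetical markers and thresholds) of calling
      # broad immune cell types from surface protein measurements.
      import pandas as pd

      # Per-cell surface protein levels (toy values).
      cells = pd.DataFrame({
          "CD3": [9.1, 0.2, 8.7, 0.1],
          "CD19": [0.1, 7.5, 0.2, 0.3],
          "CD14": [0.2, 0.1, 0.3, 6.9],
      })

      def call_cell_type(row, threshold=5.0):
          # Simple gating: the dominant lineage marker determines the label.
          if row["CD3"] > threshold:
              return "T cell"
          if row["CD19"] > threshold:
              return "B cell"
          if row["CD14"] > threshold:
              return "monocyte"
          return "unknown"

      print(cells.apply(call_cell_type, axis=1))
      ```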

    • Understanding the immune system's role in fighting diseases and its potential flaws
      Immunotherapy is a game-changing cancer treatment that enhances the immune system's ability to detect and eliminate cancer cells by blocking specific checkpoints used by tumors to evade detection.

      The immune system is a complex and intricately balanced defense mechanism in the human body, evolved over millions of years, composed of various specialized players working together to fight viruses, bacteria, and even cancer cells. However, it can sometimes go awry, leading to issues like cancer or autoimmune diseases. Immunotherapy is a revolutionary approach to cancer treatment that involves coaching the immune system to be more effective at identifying and attacking cancer cells. This is achieved by blocking specific immune checkpoints or "light switches" that tumors use to evade detection by the immune system. By blocking these checkpoints, the immune system is able to recognize and eliminate cancer cells more effectively. Immunotherapy is a promising and rapidly evolving field, with PD-1 and CTLA-4 being two of the initial targets for checkpoint blockade therapy.

    • Applying AI to Immunotherapy's Complex Data
      AI is crucial in immunotherapy due to the vast data and the immune system's complexity, requiring focus on specific problems and collaboration to maximize progress, for example in analyzing T-cell receptors with advanced techniques and significant computational power.

      In the complex field of bioinformatics and immunotherapy, the application of AI is crucial due to the vast amounts of data and the intricacy of the immune system. The human mind cannot effectively process and make inferences from large, complex datasets, making AI an essential tool. However, with the abundance of data and possibilities, it's important to identify specific problems and focus on solving those. The immune system's complexity and the data generated from it are challenging, as the ground truth can be hard to determine and the problem space is vast. Immunai, a company in this field, recognized the potential of utilizing AI and other advancements, such as transformers, to address specific problems in immunotherapy, and saw the value in collaborating and sharing data and techniques to maximize progress. An example is the use of AI to analyze T-cell receptors, which are crucial in the immune response, and to understand their sequences in order to develop effective immunotherapies. This application requires significant computational power and advanced techniques, making it a prime example of the hardest regime at the intersection of biology and AI.
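
      A minimal sketch of what such a model might look like follows: T-cell receptor CDR3 amino acid sequences are tokenized and passed through a small transformer encoder to produce fixed-size receptor embeddings. This is a generic illustration of the idea, not Immunai's actual architecture, and every hyperparameter here is a placeholder.

      ```python
      # A minimal sketch (toy model, hypothetical hyperparameters) of
      # embedding T-cell receptor sequences with a small transformer encoder.
      import torch
      import torch.nn as nn

      AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
      PAD = 0  # index 0 reserved for padding
      vocab = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}

      def encode(seq: str, max_len: int = 20) -> torch.Tensor:
          # Map amino acids to integer tokens and pad to a fixed length.
          ids = [vocab[aa] for aa in seq][:max_len]
          ids += [PAD] * (max_len - len(ids))
          return torch.tensor(ids)

      embed = nn.Embedding(len(vocab) + 1, 64, padding_idx=PAD)
      encoder = nn.TransformerEncoder(
          nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
          num_layers=2,
      )

      # Two toy CDR3 sequences -> fixed-size receptor embeddings.
      batch = torch.stack([encode("CASSLGQAYEQYF"), encode("CASSPGTGELFF")])
      hidden = encoder(embed(batch))     # (2, 20, 64)
      receptor_vec = hidden.mean(dim=1)  # mean-pool to (2, 64)
      print(receptor_vec.shape)
      ```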

    • Transfer learning in bio
      Transfer learning is crucial for AI in bio due to the diversity of data modalities and experimental conditions; it allows models to learn from one dataset and apply that knowledge to another, leading to significant advancements.

      The future of AI in the bio field lies in transfer learning, which involves training unsupervised models on multiple tasks and fine-tuning them for specific applications. This approach is essential due to the diversity of data modalities and experimental conditions in bio compared to fields like vision and text. In bio, data comes from many smaller sources, requiring models that can learn embeddings or inferences from one dataset and apply them to another. This is a complex challenge that the community is working on, but progress is being made, as it was in natural language understanding. For instance, at Immunai, a single cell company, the goal is to classify cell types, such as a "seal team 6 cell" or an "apartment security guard cell," from different sources like blood, bone marrow, or tumors. Traditionally, these have been considered separate problems requiring separate tools and approaches. In reality, they are just different instantiations of the same problem, and transfer learning offers a way to tackle these similar yet distinct tasks. This approach can lead to significant advancements in the bio field and is an exciting area of research for the future.
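
      The pattern described here can be sketched in a few lines (toy shapes and data; the datasets and architecture are hypothetical): an encoder pretrained on one tissue's expression data is frozen and reused, with only a small classification head trained on cells from a new source.

      ```python
      # A minimal sketch (toy shapes, hypothetical datasets) of the transfer
      # learning pattern: reuse a pretrained expression encoder, train only
      # a small head on data from a new tissue.
      import torch
      import torch.nn as nn

      n_genes, n_cell_types = 20000, 10

      encoder = nn.Sequential(  # stand-in for a pretrained expression encoder
          nn.Linear(n_genes, 256), nn.ReLU(), nn.Linear(256, 64),
      )
      # ... pretrain the encoder on, e.g., blood data, then freeze it:
      for p in encoder.parameters():
          p.requires_grad = False

      head = nn.Linear(64, n_cell_types)  # only the head trains on tumor data

      tumor_cells = torch.rand(32, n_genes)  # toy batch from a new source
      logits = head(encoder(tumor_cells))
      print(logits.shape)  # (32, 10): cell type scores per cell
      ```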

    • Advancements in Single Cell Analysis using Machine Learning
      Recent advancements in single cell analysis involve using transformer models and masking techniques, inspired by NLP, to address unique challenges like distinguishing cells and dealing with technical issues. A generalist approach to pretraining and transfer learning across multiple domains and modalities can enable new discoveries in the biological world.

      The field of single cell analysis, particularly in biology, has seen rapid advancements in the last few years due to the increasing availability of large-scale data. Early approaches involved training autoencoders on gene expression data to reconstruct and analyze individual cells. More recently, there's been a shift towards using transformer models and masking techniques, inspired by natural language processing. However, there are unique challenges in single cell analysis, such as distinguishing between individual cells and dealing with technical issues like cell morphology. These problems, while complex, can be addressed through a generalist approach to pretraining and transfer learning across multiple domains and modalities of data. The goal is to enable new insights and discoveries in the biological world, much like how language models have revolutionized natural language understanding.
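
      The masking idea borrowed from NLP can be illustrated with a minimal sketch (toy tensors and a stand-in model, not any published architecture): hide a random subset of gene measurements and train the model to reconstruct them, so the learned representation must capture how genes co-vary.

      ```python
      # A minimal sketch (toy tensors) of masked pretraining on expression
      # data: mask some gene values, reconstruct them, and compute the loss
      # only on the masked positions, as in masked language modeling.
      import torch
      import torch.nn as nn

      n_genes = 2000
      expression = torch.rand(16, n_genes)  # toy normalized expression batch

      mask = torch.rand_like(expression) < 0.15   # mask ~15% of entries
      corrupted = expression.masked_fill(mask, 0.0)

      model = nn.Sequential(  # stand-in for a transformer-style backbone
          nn.Linear(n_genes, 256), nn.ReLU(), nn.Linear(256, n_genes),
      )
      reconstruction = model(corrupted)

      loss = nn.functional.mse_loss(reconstruction[mask], expression[mask])
      loss.backward()
      print(float(loss))
      ```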

    • Understanding the problem and data first
      Effective communication and collaboration between experts is essential, simple models can cover a significant portion of tasks, and understanding the problem and the data before reaching for ML models is crucial.

      When approaching a problem in a machine learning pipeline, especially in less common applications, it's crucial to first understand the problem and the data without relying on ML models. The speaker shared their personal experience of taking a break from ML to focus on data engineering and analysis, which led to a deeper understanding of the problem domain and the importance of defining the problem well. In the bio AI world, where problems can be complex and require expertise from various domains, effective communication and collaboration between immunologists, software engineers, data engineers, computational biologists, and machine learning experts are essential. The only way to make this work is by finding individuals who are passionate about the intersection of these fields. Additionally, the speaker emphasized that simple models like logistic regression or XGBoost can cover a significant portion of classification tasks, and it's essential to avoid wasting resources on training large models without first understanding the problem.
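
      In that spirit, a minimal baseline sketch (with synthetic data standing in for a real labeled dataset) shows how little code it takes to establish a reference point before reaching for anything larger.

      ```python
      # A minimal sketch: establish a simple baseline before training any
      # large model. Synthetic data stands in for a real labeled dataset.
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import accuracy_score
      from sklearn.model_selection import train_test_split

      X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

      baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
      acc = accuracy_score(y_test, baseline.predict(X_test))
      print(f"baseline accuracy: {acc:.2f}")
      # Only if this baseline falls short is a larger model worth the cost.
      ```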

    • Building successful bio-tech teams in AI requires interdisciplinary interest
      Find team members with overlapping expertise to enable effective communication and collaboration in bio-tech AI projects. Humans and AI are working together on the core components of the drug discovery and therapeutics assembly line.

      Building a successful team in the bio-tech industry, particularly in the field of AI, requires finding individuals with a strong interest in interdisciplinary work and overlapping areas of expertise. The speaker emphasizes the importance of having team members who are excited about the problems in other disciplines, such as immunology, rather than just focusing on improving ETL code. He uses the analogy of overlapping Gaussian distributions to describe the ideal team composition, where the tails of each team's area of expertise overlap, making communication and collaboration easier. The speaker also discusses the current state and future direction of the bio-tech industry, with a focus on the development of an AI-driven drug discovery and therapeutics assembly line. He notes that while some companies are making claims of fully AI-generated therapies, the reality is that humans are still involved in the process, but the core components, such as running in vitro experiments and patient matching, are starting to be solved with the help of AI and large amounts of data. The speaker is excited about the growing interest in these problems from outside the bio-space, as top researchers from various fields are collaborating with his organization due to the mission and data availability.

    • Exploring the Role of AI in Solving Complex Diseases like Cancer
      AI, data, good people, and experimental methodologies are revolutionizing cancer research, bringing us closer to solving complex diseases. The collective efforts of many companies will lead to significant progress.

      The intersection of AI, data, good people, and experimental methodologies is bringing us closer to solving complex diseases like cancer. Drausin Wulsin of Immunai, a biotech company, shares his excitement about the potential of this field and the role AI plays in it. He compares the current state of cancer research to where HIV treatment was 30 years ago and expresses his belief that we are on the verge of making significant progress. He also emphasizes that it's not just one company that will solve these problems but the collective efforts of many, and he aspires to be part of this grand challenge and leave a legacy for future generations. Overall, the conversation highlights the promise and potential of AI in the biotech industry and the inspiring work being done to tackle complex health issues.

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Stanford's AI Index Report 2024
    We’ve had representatives from Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) on the show in the past, but we were super excited to talk through their 2024 AI Index Report after such a crazy year in AI! Nestor from HAI joins us in this episode to talk about some of the main takeaways including how AI makes workers more productive, the US is increasing regulations sharply, and industry continues to dominate frontier AI research.

    Apple Intelligence & Advanced RAG
    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

    The perplexities of information retrieval
    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data
    We’ve all heard about breaches of privacy and leaks of private health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they deploy edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs
    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress
    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents
    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach, from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs
    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Related Episodes

    When data leakage turns into a flood of trouble
    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)
    The new stable diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on twitter, reddit, discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things stable diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology
    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)
    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically they dive into the T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. Daniel provides a brief tour of the possible with the T0 family. They finish up with a couple of new learning resources.