
    Podcast Summary

    • Machine learning systems can lead to biased and punitive consequences, as seen in European welfare systems. These models can incorrectly flag individuals for investigation, producing false positives and wrongful accusations and underscoring the need for transparency and accountability.

      Deployed machine learning systems, often referred to as "suspicion machines," can lead to punitive and biased consequences, as seen in the case of welfare systems in Europe. These systems assign risk scores to individuals and flag those with the highest scores for investigations, which can result in false positives and wrongful accusations. For instance, in the Netherlands, a machine learning model led to a scandal involving 30,000 families being wrongly accused of welfare fraud. This issue highlights the importance of understanding the potential risks and biases of deployed machine learning systems and the need for transparency and accountability to prevent unintended harm. Journalists Justin Braun and Gabriel Geiger, who have investigated this topic, shared their findings and insights on the Practical AI podcast, shedding light on the real-world problems that arise when machine learning systems are not properly understood or regulated.
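
      In practice, the score-and-flag step these systems rely on is mechanically simple. The sketch below is a hypothetical illustration rather than Rotterdam's actual pipeline: it assigns a stand-in risk score to each recipient and flags only the top slice of the ranking for investigation, with the cutoff set as a policy choice.

      import numpy as np

      # Hypothetical illustration of a score-and-flag workflow; the scores here
      # are random stand-ins, not output from any real welfare model.
      rng = np.random.default_rng(0)

      n_recipients = 10_000
      risk_scores = rng.uniform(0, 1, size=n_recipients)

      # Flag the top 1% of scores for investigation. The cutoff is a policy
      # choice made by the agency, not something the model decides.
      threshold = np.quantile(risk_scores, 0.99)
      flagged = risk_scores >= threshold

      print(f"Flagged {flagged.sum()} of {n_recipients} recipients "
            f"(score >= {threshold:.3f})")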

    • Investigating the lack of transparency in predictive risk assessment models. Journalists and researchers face challenges locating these opaque models and obtaining access to their source code and training data in order to assess their fairness.

      Transparency into the inner workings of predictive risk assessment models used in domains such as welfare and education is crucial. These models, which can significantly affect individuals' lives, often lack clear explanations of their decision-making processes, and that opacity can lead to disparate impact and unfairness toward certain groups. To investigate these models, journalists and researchers must first locate them and then obtain access to their source code and training data. Freedom of information laws and a tiered approach to document requests can help, but data protection laws and resistance from agencies can make the task difficult. Interest in this topic grew out of the broader AI fairness debate that followed ProPublica's "Machine Bias" piece. Predictive risk assessments across contexts, from education to welfare, share similar issues, including disparate impact and open questions about which data should be used to assess fairness and how risk thresholds should be set.

    • Two paths of predictive analytics implementation in Europe. Predictive analytics in European government and welfare arrived later than in the US but has gained momentum, following two paths: one led by industry hype, the other by agencies building their own tools or collaborating with academia. Fraud detection is the justification, but the actual scale and nature of fraud are debated, along with concerns about conflating intentional fraud and unintentional errors.

      The deployment of predictive analytics in Europe, particularly in government and welfare sectors, began later than in the US but has gained momentum in the last decade. Its implementation follows two main paths: one in which large industry players hype up these systems, often leading to failures due to limited agency knowledge and questionable tool effectiveness; and another in which agencies build their own tools or collaborate with universities and smaller startups. The justification for these systems is their ability to detect fraud, but the actual scale and nature of welfare fraud remain uncertain and debated: some consultancies overhype the problem to sell their solutions, while national audits suggest the true scale is much smaller. Proponents also argue the systems are fairer than their human equivalents, on the grounds that they eliminate human biases and detect fraud more effectively. However, there is ongoing debate about whether they are actually catching fraud or merely unintentional mistakes, and whether they treat the latter as fraud.

    • Access challenges around governments' complex machine learning models. Persistence and creativity are crucial when navigating data access obstacles for government-deployed models.

      Obtaining and understanding complex machine learning models deployed by governments or organizations can be a challenging process, even with the use of freedom of information requests. In this case study, the researchers encountered various obstacles in their quest to obtain and analyze a predictive model used by the Dutch city of Rotterdam for flagging potential fraudsters. Initially, they received the source code but not the actual model file, which was withheld due to security concerns. After a lengthy battle, the city eventually disclosed the model file. However, understanding the model's output required both realistic example inputs (what plausible welfare recipients look like in the data) and the cutoff that marked someone as high risk. While estimating the threshold was relatively straightforward, obtaining realistic testing data proved more challenging. Eventually, the researchers discovered that the entire training dataset was embedded within an HTML file, allowing them to analyze it and gain valuable insights. This experience highlights the importance of persistence and creativity when faced with complex data access issues.
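
      As a rough sketch of that last step: if disclosed material arrives as an HTML report with records embedded in table elements, pandas can often recover them directly. The file name below is invented, and the single-table assumption may not hold for any real disclosure.

      import pandas as pd

      # Hypothetical sketch: read every <table> element out of a disclosed
      # HTML report. Requires lxml or html5lib to be installed.
      tables = pd.read_html("disclosed_model_report.html")
      print(f"Found {len(tables)} table(s) in the file")

      # Assume (for illustration only) that the first table holds the records.
      training_data = tables[0]
      print(training_data.shape)
      print(training_data.head())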

    • Exploring potential discriminatory practices in a welfare fraud model. While access to data and code can reveal features linked to discrimination, the absence of labels limits the investigation; critically examining features and their impact is crucial for fairness and accuracy.

      While having access to a model's training data and source code can provide valuable insight into potential discriminatory practices, the absence of outcome labels in the data limits the investigation to identifying which characteristics raise or lower scores, without showing whether those scores are erroneous for certain groups. The model in question, a gradient boosting machine, is a familiar architecture, so the more interesting work during initial discovery and exploratory data analysis was examining the features and their potential connection to welfare fraud. Including seemingly discriminatory features does not automatically produce discriminatory outcomes, but two groups of features raised concerns: those measuring ethnic background through language skills, and behavioral assessments made by caseworkers. The former comprised 30 or more variables on language skills, covering everything from spoken and written fluency to the specific language spoken and the number of languages spoken. The latter included variables such as how someone wore makeup, especially problematic for women, both because they were included at all and because information was lost when they were flattened into 0-1 variables. Critically examining the features and their potential impact on the model's outcomes is essential for fairness and accuracy.
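
      For a concrete picture of this kind of inspection, the sketch below trains a gradient boosting model on synthetic data and lists which features it leans on. It uses scikit-learn as a stand-in for whatever implementation the city actually used, and the feature names are invented for illustration, not the real variables.

      import numpy as np
      import pandas as pd
      from sklearn.ensemble import GradientBoostingClassifier

      # Synthetic data only: the point is the mechanics of inspecting feature
      # importances in a gradient boosting model, not reproducing any real system.
      rng = np.random.default_rng(1)
      n = 5_000
      X = pd.DataFrame({
          "age": rng.integers(18, 80, n),
          "spoken_language_skill": rng.integers(0, 5, n),       # invented stand-in
          "caseworker_assessment_flag": rng.integers(0, 2, n),  # subjective 0-1 variable
          "benefit_duration_months": rng.integers(1, 120, n),
      })
      y = rng.integers(0, 2, n)  # random labels, purely to make the example run

      model = GradientBoostingClassifier(random_state=0).fit(X, y)
      importances = pd.Series(model.feature_importances_, index=X.columns)
      print(importances.sort_values(ascending=False))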

    • Subjective variables in investigation systems can lead to inconsistencies and potential bias. The use of subjective variables and the lack of access to actual labels can result in inconsistencies, potential bias, and severe consequences for those falsely accused.

      The use of subjective variables in a system intended to reduce bias can undermine its effectiveness. In this investigation system, variables based on individual caseworker assessments may introduce inconsistencies and bias. The lack of access to the actual labels in the dataset, which would distinguish intentional fraud from unintentional mistakes, is also problematic: labeling investigations simply as "fraud" or "not fraud" without that distinction calls the validity and consistency of the labels into question. Ground reporting in Rotterdam revealed that single mothers of a migration background living in certain neighborhoods were disproportionately targeted, and the consequences of being flagged for investigation, even for those ultimately found innocent, can be severe and punitive, raising serious ethical concerns.

    • City of Rotterdam's fraud detection model. Despite a 10% improvement over random, the model's ROC curve was poor, and there were disparities in who was flagged. Limited interaction between the consultancy and city employees, and the city's full control after deployment, complicated evaluating its effectiveness.

      Even with an accurate model, biased or flawed data can lead to problematic outcomes. In the case discussed, the city of Rotterdam hired Accenture to build a predictive model for fraud detection. The model had a 10% improvement above random, but the ROC curve was terrible, and there were significant disparities in who was getting flagged. The model was built with limited interaction between the consultancy and Rotterdam employees, and the city took full control of the model after its deployment. However, the question of whether the model was helpful or not is complex. While it did identify some fraud, the problems in the data and labeling may have led to more chaos than solutions. It's essential to consider the quality of the data and the potential biases when evaluating model performance.
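
      Claims like "10% better than random" are usually read off the ROC curve, where an AUC of 0.5 is chance level. The sketch below, on purely synthetic scores, shows how such a comparison is typically computed; it does not reproduce the Rotterdam evaluation.

      import numpy as np
      from sklearn.metrics import roc_auc_score

      # Synthetic example: compare an uninformative scorer with a weakly
      # informative one. AUC of 0.5 is random guessing; 1.0 is perfect ranking.
      rng = np.random.default_rng(2)
      y_true = rng.integers(0, 2, 2_000)

      random_scores = rng.uniform(0, 1, 2_000)                # no signal
      weak_scores = 0.15 * y_true + rng.uniform(0, 1, 2_000)  # a little signal

      print("baseline AUC:  ", round(roc_auc_score(y_true, random_scores), 3))
      print("weak model AUC:", round(roc_auc_score(y_true, weak_scores), 3))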

    • Bias in machine learning models due to a biased selection process. Machine learning models trained on biased data can lead to unfair outcomes and disparate treatment based on demographic factors, making transparency and ethical consideration crucial in model development and the selection process.

      The patterns observed in a machine learning model for detecting fraud may not accurately reflect real-world fraud behaviors if the training data is biased by the selection process. For instance, men in the training data were likely selected through neighborhood investigations with a low fraud detection rate, while women were more likely selected through anonymous tips or random sampling, which have higher fraud detection probabilities. This bias in the selection process could lead to disparate outcomes and unfairly flag certain groups. The reception of this story among non-technical audiences was enlightening: while the discriminatory angle was a major concern, many people were intrigued by the decision-tree portion of the model, which showed how features interacted nonlinearly and led to different evaluations for men and women, raising questions about fairness and about how well these interactions are understood. Rotterdam, the city under investigation, responded gracefully, acknowledging the findings as informative and educational. The city called on other cities to be transparent about their fraud detection models and ultimately decided to discontinue using the model due to ethical concerns, a response that underscores the importance of transparency and ethical considerations in machine learning applications.
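
      To make the selection-bias point concrete, the sketch below simulates two groups entering a training set through referral routes with very different hit rates; the numbers are invented for illustration. A model fit on such labels learns the selection mechanism rather than real-world fraud prevalence.

      import numpy as np
      import pandas as pd

      # Invented numbers: two referral routes with different probabilities of
      # confirming fraud produce very different label rates per group.
      rng = np.random.default_rng(3)

      rows = []
      for group, route, n, hit_rate in [
          ("men", "neighborhood sweep", 1_000, 0.05),   # low-yield route
          ("women", "anonymous tip", 1_000, 0.30),      # higher-yield route
      ]:
          rows.append(pd.DataFrame({
              "group": group,
              "route": route,
              "label_fraud": rng.random(n) < hit_rate,
          }))

      train = pd.concat(rows, ignore_index=True)
      print(train.groupby("group")["label_fraud"].mean())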

    • The importance of considering the entire life cycle of AI systems for fairness. Ensuring fairness in AI requires examining every stage of a system's life cycle, from training data to ethical implications, and promoting transparency to improve understanding and adherence to ethical guidelines.

      The discussion around algorithmic accountability and fairness has primarily focused on outcome fairness, but it's essential to consider the entire life cycle of a system, including the training data, input features, model types, and ethical implications. The training data aspect is particularly important, as the quality and representativeness of the data significantly affect a system's performance and fairness. Looking to the future of AI, it's also crucial to encourage transparency around these systems. Contrary to the belief that making systems public allows people to manipulate them, these systems operate much like administrative guidelines, and openness leads to better understanding. Proactively making the case for transparency and for learning how these systems work can lead to improved systems and better adherence to legal and ethical standards.

    • Addressing challenges in creating effective and ethical AI systems. Careful consideration is needed to create effective and ethical systems, addressing issues like feature selection, training data, disparate impacts, and larger ethical questions.

      Many current AI systems have significant issues, including poor feature selection, problematic training data, and disparate impacts on different groups. However, with careful consideration and attention to these areas, it may be possible to create more effective and ethical systems. It's important to ask questions such as whether machines are more explainable than humans, if equal treatment is being achieved, and if probabilistic assessments are appropriate. Society also needs to grapple with larger ethical questions, such as when it's acceptable to use these systems and if they're addressing the entire problem or just a piece of it. For instance, in the European welfare context, models that aim to detect fraud overlook those who are eligible for benefits but don't use them due to fear of the system. These issues have significant societal consequences, and it's crucial to consider the broader implications of deploying AI systems.

    • Sharing Knowledge and Collaborating in Tech. The tech industry thrives on knowledge sharing, collaboration, and innovation; keep learning, growing, and creating by sharing resources and experiences with others.

      The episode closes with a note on the importance of community and collaboration in the field of technology. The speakers expressed their gratitude for the opportunity to share their knowledge with a wider audience and encouraged listeners to do the same by sharing the Practical AI podcast with their networks. They also acknowledged the support of their partners, Fastly and Fly, in making the podcast possible, and gave a shout-out to Breakmaster Cylinder, the resident DJ, for providing the perfect beats to keep the energy high. Overall, the episode underscores the value of coming together to learn, grow, and create in the tech industry. So, keep sharing, keep collaborating, and keep innovating! And don't forget to check out Fastly and Fly at fastly.com and fly.io, respectively.

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Apple Intelligence & Advanced RAG
    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

    The perplexities of information retrieval
    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data
    We’ve all heard about breaches of privacy and leaks of private health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they have deployed edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs
    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress
    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents
    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach, from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs
    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Mamba & Jamba
    First there was Mamba… now there is Jamba from AI21. This is a model that combines the best non-transformer goodness of Mamba with good ‘ol attention layers. This results in a highly performant and efficient model that AI21 has open sourced! We hear all about it (along with a variety of other LLM things) from AI21’s co-founder Yoav.

    Related Episodes

    When data leakage turns into a flood of trouble
    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)
    The new stable diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on twitter, reddit, discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things stable diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology
    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)
    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically they dive into the T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. Daniel provides a brief tour of the possible with the T0 family. They finish up with a couple of new learning resources.