Podcast Summary
Machine learning systems can have biased and punitive consequences, as seen in European welfare systems: Models can incorrectly flag individuals for fraud investigations, producing false positives and wrongful accusations and underscoring the need for transparency and accountability.
Deployed machine learning systems, sometimes called "suspicion machines," can have punitive and biased consequences, as seen in European welfare systems. These systems assign each recipient a risk score and flag those with the highest scores for investigation, which can produce false positives and wrongful accusations. In the Netherlands, one such model contributed to a scandal in which 30,000 families were wrongly accused of welfare fraud. The episode underscores the need to understand the risks and biases of deployed machine learning systems, and for transparency and accountability to prevent unintended harm. Journalists Justin Braun and Gabriel Geiger, who have investigated this topic, shared their findings on the Practical AI podcast, shedding light on the real-world problems that arise when machine learning systems are not properly understood or regulated.
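To make the "flag the highest scores" mechanism concrete, here is a minimal Python sketch of that kind of pipeline. The scores, function name, and flagging fraction are hypothetical, not the actual system described in the episode.

```python
# Minimal sketch of a risk-score flagging pipeline like the ones described
# above. The scores, function name, and threshold fraction are hypothetical.
import numpy as np

def flag_for_investigation(risk_scores: np.ndarray, top_fraction: float = 0.1):
    """Return indices of the highest-scoring individuals.

    Agencies typically flag a fixed fraction (or fixed number) of cases,
    so the 'threshold' is really a capacity decision, not a fraud measure.
    """
    n_flagged = max(1, int(len(risk_scores) * top_fraction))
    # argsort is ascending, so the tail holds the top-scoring cases
    return np.argsort(risk_scores)[-n_flagged:]

scores = np.random.default_rng(0).random(1000)  # stand-in for model output
flagged = flag_for_investigation(scores, top_fraction=0.05)
print(f"{len(flagged)} of {len(scores)} recipients flagged for investigation")
```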
Investigating the lack of transparency in predictive risk assessment models: Journalists and researchers struggle to locate opaque risk assessment models and to obtain their source code and training data, both prerequisites for assessing fairness.
Transparency into the inner workings of predictive risk assessment models used in welfare, education, and similar systems is crucial. These models can significantly affect individuals' lives, yet their decision-making processes are rarely explained, which can lead to disparate impact and unfair treatment of certain groups. Investigating them is difficult: journalists and researchers must first locate the models, then obtain access to source code and training data. Freedom of information laws and a tiered approach to document requests help, but data protection laws and resistance from agencies can stall the work. Interest in the topic grew out of the broader AI fairness debate that followed ProPublica's "Machine Bias" piece. Predictive risk assessments across contexts, from education to welfare, share the same issues: disparate impact, and open questions about how to define fairness and where to set decision thresholds.
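Disparate impact is usually quantified as the ratio of flag rates between groups. The sketch below shows that standard diagnostic; it is not necessarily the exact analysis the journalists ran, and the group labels and data are made up.

```python
# A common disparate-impact diagnostic: the ratio of flag rates between
# two groups. A standard fairness check, not the journalists' exact method;
# the group labels and data here are hypothetical.
import numpy as np

def flag_rate_ratio(flags: np.ndarray, group: np.ndarray, a: str, b: str) -> float:
    """P(flagged | group a) / P(flagged | group b)."""
    rate_a = flags[group == a].mean()
    rate_b = flags[group == b].mean()
    return rate_a / rate_b

flags = np.array([1, 0, 1, 1, 0, 0, 1, 0])
group = np.array(["x", "x", "x", "y", "y", "y", "y", "y"])
# Ratios far from 1.0 indicate one group is flagged disproportionately often.
print(flag_rate_ratio(flags, group, "x", "y"))  # ~1.67 on this toy data
```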
Two paths of predictive analytics implementation in Europe: Europe's use of predictive analytics in government and welfare began late but has gained momentum, following two paths: one driven by industry hype, the other by agencies building their own tools or collaborating with academia. The systems are justified as fraud detection, but the actual scale and nature of welfare fraud are debated, as is whether the systems catch intentional fraud or unintentional errors.
The deployment of predictive analytics in Europe, particularly in government and welfare, began later than in the US but has gained momentum over the last decade. Implementation has followed two main paths: one in which large industry players hype these systems, often leading to failures because agencies lack the expertise to assess the tools' effectiveness; and another in which agencies build their own tools or collaborate with universities and smaller startups. The stated justification is fraud detection, but the actual scale and nature of welfare fraud remain uncertain and contested. Some consultancies overhype the problem to sell their solutions, while national audits suggest the real scale is much smaller. These systems are also often defended as fairer than human decision-makers, on the grounds that they eliminate bias and detect fraud more effectively. Yet there is ongoing debate about whether they are actually catching fraud or merely unintentional mistakes, and whether they are treating the latter as fraud.
Accessing governments' complex machine learning models: Persistence and creativity are crucial when navigating the obstacles to obtaining government machine learning models and their data.
Obtaining and understanding complex machine learning models deployed by governments or organizations can be challenging, even with freedom of information requests. In this case study, the journalists hit repeated obstacles while trying to obtain and analyze the predictive model the Dutch city of Rotterdam used to flag potential fraudsters. Initially they received the source code but not the model file itself, which was withheld on security grounds. After a lengthy battle, the city disclosed the model file. Understanding its output then required two things: constructing realistic test inputs, and estimating where the city drew the boundary for "high-risk" individuals. Estimating the threshold was relatively straightforward; obtaining realistic testing data proved harder. Eventually the journalists discovered that the entire training dataset was embedded in an HTML file, which let them analyze it and gain valuable insights. The experience highlights the importance of persistence and creativity when faced with complex data access issues.
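As a rough illustration of that last discovery, here is a minimal sketch of pulling tabular data out of an HTML file with pandas. The filename is hypothetical, and the journalists' actual tooling may have differed.

```python
# Sketch of extracting embedded tabular data from an HTML file, assuming a
# hypothetical file name; the original analysis may have used other tools.
import pandas as pd

# read_html parses every <table> element in the file into a DataFrame
tables = pd.read_html("model_report.html")
print(f"found {len(tables)} table(s)")

train = tables[0]  # assume the first table holds the training records
print(train.shape)
print(train.head())
```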
Exploring potential discriminatory practices in a welfare fraud model: Access to data and code can reveal which features drive higher scores, but the absence of outcome labels limits how far discrimination can be established. Critically examining the features and their impact is essential for fairness and accuracy.
Access to a model's training data and source code provides valuable insight into potential discriminatory practices, but without labels in the data, the investigation is limited to identifying which characteristics raise or lower scores; it cannot show whether those scores are erroneous for certain groups. The model in question was a gradient boosting machine, a familiar architecture, so the interesting questions during initial discovery and exploratory data analysis concerned the features and their connection to welfare fraud. Including a seemingly discriminatory feature does not automatically produce discriminatory outcomes, but two groups of features raised concerns: variables that effectively measure ethnic background through language skills, and subjective behavioral assessments by caseworkers. The former comprised 30 or more language variables, covering everything from spoken and written fluency to the specific language spoken and the number of languages spoken. The latter included variables such as how someone wore makeup, especially for women; these were problematic both because they were included at all and because information was lost when they were collapsed into 0-1 variables. Critically examining features and their potential impact on a model's outcomes is essential for fairness and accuracy.
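The kind of exploratory check described here can be sketched as follows: fit a gradient boosting model and inspect which features drive its scores. The feature names and synthetic data below are hypothetical; the real Rotterdam model used hundreds of variables.

```python
# Sketch of exploratory feature analysis on a gradient boosting model.
# Feature names and data are hypothetical stand-ins.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
X = pd.DataFrame({
    "age": rng.integers(18, 80, 500),
    "spoken_language_score": rng.integers(0, 2, 500),  # binarized 0/1 variable
    "caseworker_assessment": rng.integers(0, 2, 500),  # subjective, binarized
})
y = rng.integers(0, 2, 500)  # stand-in labels

model = GradientBoostingClassifier(random_state=0).fit(X, y)
# Rank features by how much they contribute to the model's splits
for name, imp in sorted(zip(X.columns, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:>24s}  {imp:.3f}")
```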
Subjective variables in investigation systems can lead to inconsistency and bias: Relying on subjective caseworker assessments, combined with a lack of access to the actual labels, can produce inconsistent, biased outcomes and severe consequences for those falsely accused.
Using subjective variables in a system that is supposed to reduce bias can undermine its purpose. In Rotterdam's investigation system, variables based on individual caseworkers' assessments invite inconsistency and bias. The lack of access to the actual labels in the dataset is equally problematic, since the labels mark investigations only as "fraud" or "not fraud" without distinguishing intentional fraud from unintentional mistakes, calling their validity and consistency into question. Ground reporting in Rotterdam revealed that single mothers with a migration background living in certain neighborhoods were disproportionately targeted. And the consequences of being flagged for investigation, even for those ultimately found innocent, can be severe and punitive, raising serious ethical concerns.
City of Rotterdam's fraud detection model: The model performed only about 10% better than random, with a poor ROC curve and clear disparities in who was flagged. Limited interaction between the consultancy and city employees, and the city's full control after deployment, complicated any evaluation of its effectiveness.
Even an accurate model can produce problematic outcomes when fed biased or flawed data. In this case, the city of Rotterdam hired Accenture to build a predictive model for fraud detection. The model performed only about 10% better than random; its ROC curve was poor, and there were significant disparities in who was flagged. It was built with limited interaction between the consultancy and Rotterdam employees, and the city took full control of the model after deployment. Whether the model was ultimately helpful is a complex question: it did identify some fraud, but the problems in the data and labeling may have caused more chaos than they resolved. Evaluating model performance requires weighing the quality of the data and its potential biases.
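"Better than random" is typically measured against the ROC curve's 0.5 AUC baseline. The sketch below shows that comparison on synthetic scores and labels; the numbers are illustrative, not the Rotterdam model's actual figures.

```python
# Sketch of comparing a model's ROC AUC to the 0.5 baseline of random
# flagging. Scores and labels here are synthetic, not the real model's.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 2000)
# A weak signal: scores only slightly correlated with the labels
y_score = 0.1 * y_true + rng.random(2000)

auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.3f}  (random baseline = 0.500)")
# An AUC only modestly above 0.5 means the model's ranking of cases is
# barely better than a coin flip.
```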
Bias in machine learning models due to a biased selection process: Models trained on data shaped by a biased selection process can produce unfair outcomes and disparate treatment along demographic lines. Transparency and ethical consideration are essential in both model development and the selection of training cases.
The patterns a fraud detection model learns may not reflect real-world fraud if the training data is shaped by a biased selection process. In this case, men in the training data were likely selected through neighborhood investigations, which have a low fraud detection rate, while women were more often selected through anonymous tips or random sampling, which have higher detection rates. This selection bias can produce disparate outcomes and unfairly flag certain groups. The story's reception among non-technical audiences was enlightening: while the discriminatory angle drew the most concern, many readers were intrigued by the decision-tree portion of the model, which showed how features interact nonlinearly and lead to different evaluations for men and women, raising questions about fairness and how such interactions work. Rotterdam, the city under investigation, responded gracefully, calling the findings informative and educational. The city urged other municipalities to be transparent about their fraud detection models and ultimately discontinued its own model over ethical concerns, a testament to the importance of transparency and ethics in machine learning applications.
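A toy simulation makes the selection-bias mechanism concrete: if two referral channels with different hit rates feed the training set unevenly by gender, the observed fraud rate differs by gender even when underlying behavior is identical. All rates below are invented for illustration.

```python
# Toy simulation of selection bias in training data: referral channel, not
# behavior, drives the observed fraud rate by gender. All rates are made up.
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

gender = rng.choice(["man", "woman"], n)
# Men mostly enter via neighborhood sweeps (low hit rate); women mostly via
# tips or random sampling (higher hit rate).
channel = np.where(
    gender == "man",
    rng.choice(["sweep", "tip"], n, p=[0.8, 0.2]),
    rng.choice(["sweep", "tip"], n, p=[0.2, 0.8]),
)
hit_rate = np.where(channel == "sweep", 0.05, 0.30)
fraud_label = rng.random(n) < hit_rate

for g in ("man", "woman"):
    print(f"{g}: observed fraud rate = {fraud_label[gender == g].mean():.3f}")
# A model trained on these labels learns gender as a proxy for the referral
# channel, not for actual fraud behavior.
```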
The importance of considering the entire life cycle of AI systems for fairness: Ensuring fairness in AI requires examining every stage of a system's life cycle, from training data to ethical implications, and promoting transparency to enhance understanding and adherence to ethical guidelines.
The discussion around algorithmic accountability has focused mostly on outcome fairness, but fairness requires examining a system's entire life cycle: training data, input features, model types, and ethical implications. Training data is especially consequential, since its quality and representativeness largely determine a system's performance and fairness. Looking ahead, transparency around these systems should be encouraged. Contrary to the belief that publishing them lets people game them, these systems operate much like administrative guidelines, and openness leads to better understanding of how they work. The case for transparency should be made proactively; it helps systems improve and keeps them within legal and ethical bounds.
Addressing challenges in creating effective and ethical AI systems: Building such systems demands careful attention to feature selection, training data, disparate impacts, and the larger ethical questions about when these systems should be used at all.
Many current AI systems have significant issues, including poor feature selection, problematic training data, and disparate impacts on different groups. However, with careful consideration and attention to these areas, it may be possible to create more effective and ethical systems. It's important to ask questions such as whether machines are more explainable than humans, if equal treatment is being achieved, and if probabilistic assessments are appropriate. Society also needs to grapple with larger ethical questions, such as when it's acceptable to use these systems and if they're addressing the entire problem or just a piece of it. For instance, in the European welfare context, models that aim to detect fraud overlook those who are eligible for benefits but don't use them due to fear of the system. These issues have significant societal consequences, and it's crucial to consider the broader implications of deploying AI systems.
Sharing knowledge and collaborating in tech: The tech industry thrives on knowledge sharing, collaboration, and innovation. Keep learning, growing, and creating by sharing resources and experiences with others.
The episode closed by emphasizing the importance of community and collaboration in technology. The speakers expressed gratitude for the opportunity to share their knowledge with a wider audience and encouraged listeners to share the Practical AI podcast with their networks. They also acknowledged the support of their partners, Fastly and Fly, in making the podcast possible, and gave a shout-out to Breakmaster Cylinder, the resident DJ, for providing the perfect beats to keep the energy high. Overall, the episode underscores the value of coming together to learn, grow, and create in the tech industry. So keep sharing, keep collaborating, and keep innovating! And don't forget to check out Fastly and Fly at fastly.com and fly.io, respectively.