Podcast Summary
Aligning AI with Human Goals: Stuart Russell proposes teaching AIs to infer what we want by observing human behavior, raising new questions about feasibility, how concerned we should be about superintelligent AI, and the broader implications for our future.
In the realm of artificial intelligence, ensuring that AI aligns with human goals is a significant challenge. Stuart Russell, a leading AI expert, proposes an approach in which AIs learn from human behavior to determine what we want. This method opens up new questions about the feasibility of the solution, the level of concern warranted for superintelligent AI, and its broader implications for the future.
Making Machines Intelligent: AI on a Continuum: AI is a gradual progression from simple programs to complex systems, determined by task environment and objective.
Artificial intelligence (AI) refers to making machines intelligent enough to achieve objectives set by humans. AI systems lie on a continuum between simple programs, like a thermostat, and complex systems, like those that could teach someone to speak French or run a country. The complexity of the task environment and of the objective determines where on this continuum a system falls. AI is not separated from ordinary computer programs by a distinct phase transition; it is a gradual progression. Stuart Russell, an expert in AI research, emphasized that objectives can range from simple tasks to goals pursued in complex environments involving human interaction. He also highlighted that the field of AI is constantly evolving, with ideas ranging from classic concepts to more advanced theories, and the discussion touched on the importance of addressing potential dangers and ethical considerations in AI development.
The Challenges of Solving Open-Ended AI Tasks: While AI excels at solving well-defined problems, it struggles with open-ended tasks due to constantly changing circumstances and unknown objectives. Understanding the unique challenges of different AI applications and progress made in solving them is crucial for learning.
While AI has made significant strides in solving well-defined problems like Go, it still faces challenges when it comes to more complex, open-ended tasks like driving a car. The rules of Go are explicitly defined, and there is a right answer to every question, making it a solvable problem. However, driving involves dealing with constantly changing circumstances and unknown objectives, making it an unsolved problem. The speaker also noted that students may overlook the importance of studying well-solved problems because they no longer see them as AI, which can limit their learning. The discussion highlighted the importance of understanding the unique challenges of different AI applications and the progress made in solving them.
Understanding Intelligence as Maximizing Utility or Goals: Intelligence, including AI, can be viewed as optimizing goals or utilities, aligning with the economic theory of rational decision making since the 1940s. This concept, allowing for uncertainty and preferences, can explain instinctive human reactions as rational.
Intelligence, including AI, can be understood as optimizing or maximizing some goal or utility function. This idea, known as rational decision making, has been a cornerstone of economic theory since the 1940s. It allows for uncertainty and preferences, and an agent acting in accordance with this principle can be described as if it is maximizing utility, even if it doesn't actually perform the complex calculations. For instance, our instinctive reactions to danger, like closing our eyes to avoid injury, can be seen as rational even if they don't involve conscious utility calculations. This concept was discussed in the context of AI development and the potential for machines to surpass human intelligence. While some may find this connection less obvious for ordinary human intelligence, it has been a powerful framework in economics for decades.
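The "acting as if maximizing utility" idea can be made concrete in a few lines. A minimal sketch of expected-utility maximization, where the actions, outcomes, and all numbers are invented purely for illustration:

```python
# Rational choice under uncertainty: pick the action a that maximizes
# EU(a) = sum over outcomes o of P(o | a) * U(o).
# Actions, outcomes, probabilities, and utilities here are hypothetical.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

actions = {
    # 30% chance of rain in this toy scenario
    "carry umbrella": [(0.3, 8), (0.7, 6)],   # dry if rain; mild nuisance if sun
    "leave umbrella": [(0.3, 0), (0.7, 10)],  # soaked if rain; unencumbered if sun
}

best = max(actions, key=lambda a: expected_utility(actions[a]))
# EU(carry) = 0.3*8 + 0.7*6 = 6.6; EU(leave) = 0.3*0 + 0.7*10 = 7.0
```

An agent whose choices consistently satisfy the rationality axioms behaves as if it were running this computation, even when, like a person flinching from danger, it performs no explicit calculation at all.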
Understanding rational decision making and its complexities: The Von Neumann-Morgenstern theorem describes rational behavior as maximizing a utility function, but humans don't always behave rationally and machines face computational challenges in optimizing utility functions. Humans overcome computational difficulties through hierarchical organization of behavior.
The Von Neumann-Morgenstern theorem states that rational behavior can be described as maximizing a utility function based on reasonable axioms, such as transitivity of preferences. However, optimizing over a utility function in many situations can be computationally intractable for machines, and humans don't behave rationally all the time either. The intriguing question is whether machines will surpass human decision-making capabilities, and if so, when and by how much. Despite their computational limitations, humans have evolved and learned to overcome computational difficulties in decision making through hierarchical organization of behavior. Although we may not be good at playing complex games like Go, our brains are excellent at managing complex tasks through natural hierarchies. Ultimately, understanding the implications of the Von Neumann-Morgenstern theorem can help us appreciate the complexities of rational decision making and the potential role of machines in enhancing or surpassing human capabilities.
Understanding Hierarchical Planning in Humans and AI: Humans manage complexity through hierarchical planning, breaking tasks into smaller components and planning ahead. AI struggles to create its own abstract actions and hierarchies, a crucial step towards advanced systems.
Humans are incredibly complex systems, capable of managing vast amounts of information and making decisions at multiple timescales, from milliseconds to years and even decades. This capability arises from our ability to break tasks into smaller components and plan ahead using a hierarchy of abstract actions. This concept, known as hierarchical planning, has been studied for decades, with early attempts to implement it in AI systems. The challenge lies in teaching a system to develop its own hierarchy and invent abstract actions autonomously. This marks a significant difference between conventional AI and the ultimate goal of creating intelligent machines that can truly understand and navigate the world as humans do. While we have algorithms for constructing long-term plans, the missing piece is enabling a system to create its own abstract actions and hierarchies. Notably, we did not each invent our abstract actions individually: some, like getting a PhD or immigrating to a new country, did not exist in the past and were invented collectively by our culture. We draw on a vast shared library of abstract actions at many levels, from long-term goals down to short-term tasks. Understanding and replicating this ability to manage complexity through hierarchical planning is a crucial step toward more advanced AI systems.
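The basic mechanics of hierarchical planning, executing a plan over a library of abstract actions, can be sketched briefly. The task library below is entirely hypothetical; the open research problem the discussion points to is getting a system to invent such a library itself rather than having it supplied:

```python
# Hierarchical plan refinement: an abstract action is recursively
# expanded into sub-actions until only primitive (directly executable)
# steps remain. The action library is invented for illustration.

LIBRARY = {
    "host dinner": ["plan menu", "shop", "cook", "serve"],
    "shop": ["drive to store", "buy ingredients", "drive home"],
    "cook": ["prepare ingredients", "follow recipe"],
}

def refine(action):
    """Expand an abstract action into a flat sequence of primitive steps."""
    if action not in LIBRARY:          # primitive: no further decomposition
        return [action]
    steps = []
    for sub in LIBRARY[action]:
        steps.extend(refine(sub))
    return steps

plan = refine("host dinner")
# ["plan menu", "drive to store", "buy ingredients", "drive home",
#  "prepare ingredients", "follow recipe", "serve"]
```

Planning at the level of "shop" rather than individual muscle movements is what keeps long-horizon decision making tractable; the hard part is learning the LIBRARY, not executing it.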
Inherited Culture as a Source of Human Intelligence: Current AI systems lack the ability to understand and apply common sense knowledge, a crucial aspect of human intelligence, due to their limited access to cultural background knowledge.
The intelligence we possess as humans is not solely an individual trait, but rather something we inherit from our culture in the form of a vast library of abstract knowledge. This background knowledge is essential for navigating the complexities of civilization and making informed decisions. However, current AI systems, particularly those based on deep learning, lack this common sense knowledge and can only perform tasks based on the data they are given. Efforts to represent and process this knowledge mathematically have proven challenging, and the AI community is currently focusing on deep learning approaches that do not attempt to construct representations of knowledge or common sense. While impressive achievements, such as a machine learning system passing a university entrance exam, have been made, these systems are still unable to answer common sense questions that go beyond their specific tasks.
Integrating deep learning with classical AI techniques: Deep learning lacks human-like reasoning and contextual understanding; combining it with symbolic techniques is the likely next step, and its current limits do not diminish the potential threat of superintelligent AI.
While deep learning has made significant strides in processing large amounts of data and identifying patterns, it currently lacks the ability to reason and understand context the way humans do. Integrating deep learning with classical AI techniques, such as symbolic reasoning and knowledge representation, is likely necessary for the next advances in AI. The Tokyo entrance exam example illustrates the limitations of deep learning, but it doesn't diminish the potential danger of superintelligent AI: we will likely overcome the major obstacles to human-level AI, and a system doesn't have to surpass human intelligence in every aspect to pose a threat. The future of AI raises important questions about its potential capabilities and implications for humanity, and ongoing research and development will continue to shape our understanding.
Preparing for the Risks of Superintelligent AI: Invest in AI research, plan for potential risks, and ensure we retain control over superintelligent AI to mitigate negative impacts on the world.
While the development of superintelligent AI may not be an imminent threat, it's important to be concerned about the negative impacts of current AI systems and prepare for potential risks in the future. AI systems, such as those that run on social media, are already having a significant negative impact on the world. The timeline for the development of human-level AI capabilities is uncertain, but it's expected to be towards the end of the century. The risks from AI are different from those of an asteroid impact, as the consequences of AI going wrong are harder to predict. It's crucial to invest in AI research with a plan for what to do when we succeed and to avoid ceding control to a more powerful species. The specific form of superintelligent AI is uncertain - it could be a program, an emergent entity, or distributed in the cloud. Regardless, it's essential to be prepared and ensure that we retain control.
AI as a composite system in the cloud: AI's impact on the world can be devastating if objectives are incorrectly defined, leading to unintended consequences.
The typical depiction of AI as a single, sentient robot in science fiction is unlikely. Instead, AI is more likely to be a composite system drawing on the computational power of the entire cloud or a significant portion of it. However, this interconnectedness could lead to unanticipated interactions and negative consequences, as seen in the stock market's "flash crash." The standard model of AI, which involves creating optimizing machinery with human-defined objectives, is a risky engineering approach, especially when dealing with potentially more powerful entities. AI systems can impact the world through communication and the Internet, and the failure mode we should anticipate is not the algorithm's stupidity, but our incorrect specification of its objectives. This can result in the algorithm carrying out its tasks with extreme efficiency, but with devastating side effects, such as the fossil fuel industry's destruction of the environment while optimizing profits. It's crucial to consider these potential risks and ensure that objectives are defined holistically, taking into account all possible consequences.
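The failure mode described here, an optimizer executing a mis-specified objective flawlessly, can be shown with a toy optimizer. The plans and numbers below are invented; the point is only that the same maximization machinery picks opposite plans depending on whether the side effect is priced into the objective:

```python
# A toy optimizer maximizes whatever objective it is handed. With a
# proxy objective (profit alone) it picks the plan with the worst side
# effects; an objective that prices in the externality picks differently.
# All plans and numbers are hypothetical.

plans = [
    {"name": "aggressive extraction", "profit": 100, "damage": 90},
    {"name": "moderate extraction",   "profit": 70,  "damage": 30},
    {"name": "renewables",            "profit": 50,  "damage": 5},
]

def proxy(plan):            # what the designers wrote down
    return plan["profit"]

def holistic(plan):         # what the designers actually wanted
    return plan["profit"] - plan["damage"]

chosen_proxy = max(plans, key=proxy)        # maximizes profit: 100 > 70 > 50
chosen_holistic = max(plans, key=holistic)  # maximizes net value: 45 > 40 > 10
```

The optimizer is not being stupid in the first case; it is doing exactly what it was told, which is precisely the anticipated failure mode.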
The connection between superintelligent AI and human benefit is not inherent: Assumptions of AI acting in human interest lack justification, potential negative consequences vast and unpredictable, careful consideration and regulation crucial as AI impact grows.
There is no inherent connection between superintelligent AI and benefit to humans. Some argue that a sufficiently intelligent entity would, by its nature, act in ways that benefit us, but this assumption lacks justification: human compatibility must be deliberately designed into AI, not assumed. The potential negative consequences of AI are vast and unpredictable, ranging from misdirecting the food supply to manipulating social media, and even causing human extinction. The difficulty lies in anticipating and preventing these outcomes: each scenario we guard against with rules or patches invites alternative strategies and fresh unintended consequences. As our intelligent systems become more advanced and have greater impact on the world, careful consideration and regulation become increasingly important, and the off switch is no longer a reliable solution.
Designing AI with Flexible Objectives: To prevent AI harm, design systems with flexible objectives that align with human preferences and allow for revision based on feedback.
We need to reconsider how we set objectives for artificial intelligence (AI) systems to prevent them from pursuing their goals at the expense of human values. The discussion emphasizes that once AI systems leave the lab, their impact on the real world can be immense, and they are designed to achieve their objectives relentlessly. The proposed solution is to design AI systems with a flexible objective that aligns with human preferences for the future, rather than a fixed objective. This approach allows the AI system to learn and revise its plan based on human feedback, avoiding potential harm. The key is to ensure that the AI system understands that it doesn't know everything about human preferences and should defer to human instruction and seek permission before taking actions. These behaviors are not programmed but a logical consequence of framing the problem in this way. In essence, we must design AI systems that prioritize human values and are open to human guidance to ensure a beneficial and harmonious future.
Developing AI algorithms based on human desires: Researchers create AI systems that learn from human guidance to fulfill desires, improving human-machine interaction and problem-solving
Researchers are developing AI algorithms based on a concept called an "assistance game," where the machine's objective is to fulfill the human's desires, even if it's unsure what those desires are. This game involves the human guiding the machine, teaching it what to do and what not to do, leading to behaviors that defer to humans, ask permission, and learn from them. These behaviors are a more general and interesting problem-solving situation than traditional machines pursuing fixed objectives. The system may become more certain over time about human motivations, but it's unlikely to ever fully predict them due to the vastness of human experiences. This research bears a resemblance to Bayesian inference, where prior beliefs are updated based on new information. The ultimate goal is to create machines that can help humans achieve their objectives effectively and efficiently.
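One way to see why deferential behavior falls out of this framing rather than being programmed in: a machine holding a belief over possible human objectives can compare the expected value of acting on its best guess against asking the human first. The scenario, payoffs, and probabilities below are hypothetical:

```python
# Assistance-game intuition: when the machine's belief about what the
# human wants is spread out, asking beats acting; when the belief is
# sharp, acting wins. All objectives and payoffs are invented.

def value_of_acting(belief, payoff_right, payoff_wrong):
    """Act on the most probable objective; a wrong guess is costly."""
    p_best = max(belief.values())
    return p_best * payoff_right + (1 - p_best) * payoff_wrong

def value_of_asking(payoff_right, asking_cost):
    """Query the human first, then act correctly for sure."""
    return payoff_right - asking_cost

uncertain = {"wants tea": 0.55, "wants coffee": 0.45}
confident = {"wants tea": 0.99, "wants coffee": 0.01}

act = value_of_acting(uncertain, payoff_right=10, payoff_wrong=-10)  # 1.0
ask = value_of_asking(payoff_right=10, asking_cost=1)                # 9.0
decision = "ask" if ask > act else "act"

# With a sharp belief, acting becomes worth more than asking:
# value_of_acting(confident, 10, -10) = 9.8 > 9.0
```

Deferring to the human is simply the higher-expected-value move under uncertainty, which is the sense in which the behavior is a logical consequence of the problem framing.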
Understanding human preferences is complex and uncertain: AI systems cannot predict human preferences with absolute certainty due to inherent epistemic uncertainty, but they can approximate a good butler, anticipating needs from prior knowledge and broad assumptions.
Human preferences, even the most seemingly trivial ones, are complex and uncertain. For instance, people's reactions to the color of the sky, or to trying a new food like durian, can be strongly positive or negative, yet individuals have no way of knowing their own preferences without experiencing them firsthand. This epistemic uncertainty applies to many aspects of life, including career choices. An AI system, no matter how advanced, cannot predict human preferences with absolute certainty. Bayesian inference can reduce the uncertainty by continually updating beliefs on new evidence, but it never reaches complete certainty. An AI system could therefore come to act like a good butler, anticipating human needs from prior knowledge and broad assumptions, though it would never converge to perfect understanding in finite time. Ultimately, understanding human preferences remains a complex and ongoing exploration.
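The Bayesian picture in this paragraph can be sketched directly: each observation shifts the machine's belief about a preference, and confidence grows without ever reaching certainty after finitely many observations. The preference and the likelihood numbers below are made up for illustration:

```python
# Bayesian update over a binary preference hypothesis
# ("this person likes durian" vs. not). Each observation multiplies
# the prior by a likelihood ratio; the posterior approaches but never
# reaches 1 after any finite number of observations.

def update(prior, likelihood_if_true, likelihood_if_false):
    """One step of Bayes' rule for a binary hypothesis."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)

belief = 0.5                     # prior: no idea either way
for _ in range(10):              # ten observations consistent with "likes it"
    belief = update(belief, likelihood_if_true=0.8, likelihood_if_false=0.3)

# belief is now very high (≈ 0.9999), but still strictly below 1.0
```

This is the sense in which the butler gets very good but never perfect: the posterior tightens with evidence, yet certainty is only its limit, not a state it reaches.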
Two Selves Theory in Human Preferences: Humans have an experiencing self, focused on momentary enjoyment or pain, and a remembering self that recalls the peaks and endings of experiences. The remembering self shapes decision-making through memory and expectations, but human preferences can be inconsistent, and AI cannot satisfy inconsistent preferences, though it can still provide satisfactory experiences.
Understanding and replicating human preferences in AI is a complex task. While humans may appear to have consistent preferences in some experiments, they often deviate from the rationality axioms and are influenced by memory and expectations. Psychologist Daniel Kahneman's research suggests humans have two selves: an experiencing self, focused on momentary enjoyment or pain, and a remembering self, which recalls the most pleasurable or painful parts of experiences and how they end. This "peak-end rule" is difficult to reconcile with standard economic models, though the remembering self's role in decision-making, which incorporates memory and expectations, might itself be rational. Humans can also simply be inconsistent: their preferences change or are incoherent, and an AI system cannot satisfy inconsistent preferences, though it can still provide a satisfactory experience, like pizza instead of nothing. The challenge lies in bridging the gap between the idealized model of a human with stable, consistent preferences and the reality of actual humans.
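The peak-end rule is easy to state as a formula: the remembered value of an episode is roughly the average of its most intense moment and its final moment, largely ignoring duration. A sketch with hypothetical per-minute pain scores shows how this diverges from the experiencing self's time-average:

```python
# Peak-end rule: remembered (dis)utility ≈ average of the peak moment
# and the final moment, regardless of duration. The episodes below are
# hypothetical pain ratings per minute (higher = worse).

def experienced(episode):
    """What the experiencing self lived through: mean over time."""
    return sum(episode) / len(episode)

def remembered(episode):
    """Peak-end approximation of what the remembering self reports."""
    return (max(episode) + episode[-1]) / 2

short_sharp = [2, 8]             # brief, ends at its worst
long_tapered = [2, 8, 5, 3, 1]   # same peak, more total pain, gentle ending

# remembered(short_sharp) = (8 + 8) / 2 = 8.0
# remembered(long_tapered) = (8 + 1) / 2 = 4.5
# The longer episode contains strictly more total pain, yet the
# peak-end rule scores it as the better memory.
```

This is the kind of result, confirmed in Kahneman's colonoscopy experiments, that makes the remembering self hard to square with a simple sum-of-momentary-utility model.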
Designing AI systems for stability and predictability: Despite mathematical proofs of AI behavior, ensuring they don't change preferences or manipulate humans is complex. Potential for human preference changes and side channels add challenges. Future with human-level or superintelligent AI offers great potential, but significant design work remains.
Ensuring stability and predictability in artificial intelligence (AI) systems is a complex challenge, partly because human preferences themselves change. This raises questions about which self an AI system should be designed to satisfy, and whether a system's fundamental motivational structure can truly be fixed at design time. Meanwhile, proving mathematical theorems about software, including AI systems, is becoming increasingly sophisticated: we can prove that algorithms behave according to their specifications, but ensuring that a system does not change its preferences, or persuade humans to change them for it, is harder. As we put AI systems into the real world, we must also consider side channels, ways the AI can convince humans to make physical changes on its behalf. Looking ahead, assuming we can design AI systems over which we retain control, a future with human-level or superintelligent AI offers the possibility of a far better civilization. However, significant algorithm design and engineering work remains to turn the basic framework into technology.
Improving standard of living with AI and robotics: AI and robotics technology could increase global GDP by 13.5 trillion dollars, but it also raises questions about human role in a world where machines can do most things better and potentially for free. Encouraging humans to continue learning and finding value in their lives is crucial.
Advancing AI and robotics technology has the potential to significantly improve the standard of living for everyone on earth, leading to a potential increase in the world's GDP by 13.5 trillion dollars. This is a massive opportunity, but it also raises important questions about the role of humans in a world where machines can do most things better and potentially for free. The AI systems may need to encourage humans to continue learning and finding value in their lives, rather than becoming idle and enfeebled. This is a complex issue that requires deep thought and discussion, and Stuart Russell encourages us all to consider the implications of this technological advancement.