
    #95 – Dawn Song: Adversarial Machine Learning and Computer Security

    May 12, 2020

    Podcast Summary

    • Advancements in formal verification and program analysis for secure coding
      While formal verification and program analysis are significant steps towards secure coding, they don't provide a silver bullet against all types of attacks. A holistic approach to security is necessary.

      Writing completely secure code is challenging because vulnerabilities and attacks keep evolving. Even so, advances in formal verification and program analysis now make it possible to build formally verified systems with proven security properties. Such systems may still be vulnerable to attacks outside the properties that were proven, and security involves far more than code analysis. In the conversation, Professor Dawn Song highlighted the broad spectrum of attacks, from memory safety vulnerabilities to side channels, and stressed the importance of providing as many security guarantees as possible. Formal verification, which mathematically proves that a program satisfies its specification, is a significant step forward in building secure systems, but it is not a silver bullet. Continued progress in the field has to go hand in hand with a holistic approach to security.

    • Humans are the weakest link in cybersecurity
      While we focus on securing software systems, humans remain a significant vulnerability because of social engineering attacks and their inability to patch or upgrade themselves.

      While we strive to build more secure software systems, humans remain a significant vulnerability in cybersecurity. Traditional program verification techniques focus on static analysis of code and cannot fully account for the diverse and evolving nature of attacks. Tension between nations and groups in cyberspace adds to the concern, as future conflicts may increasingly play out in this domain. Security is hard partly because it requires proving a negative: that no vulnerability exists. Given the ever-changing landscape of attacks, this is extremely difficult. Humans are the weakest link, and attackers increasingly target them through social engineering and deepfake technologies. Machines and software can be patched and upgraded; humans cannot. Social engineering attacks such as phishing have already led to significant breaches at reputable organizations, and they are expected to become even more sophisticated and effective. To help mitigate this, projects are under way that use AI and machine learning to help humans defend against these attacks. For instance, NLP and chatbot techniques can observe conversations between users and potential attackers to help identify and prevent social engineering attacks. This is a crucial step in addressing the human vulnerability in cybersecurity.

    • Chatbots and NLP enhance online security against social engineering attacks
      Chatbots and advanced NLP techniques can detect and respond to phishing scams and social engineering attacks, acting as a user's personal security representative across all online platforms. However, privacy and control considerations are important as a powerful chatbot would need access to personal information.

      Chatbots and advanced NLP techniques could significantly enhance online security by detecting and responding to social engineering attacks. In a phishing scam, for instance, a chatbot can recognize suspicious patterns and even engage the correspondent in a challenge-response exchange to verify their identity. As NLP and chatbot technologies advance, they could become a user's personal security representative across all online platforms. However, privacy and control matter here, since a powerful chatbot would need access to a significant amount of personal information to protect the user effectively.

      Another intriguing research area is adversarial machine learning, in which attackers manipulate machine learning systems into producing incorrect or misleading results. These attacks can occur at both the inference and training stages. At the inference stage, attackers can add subtle perturbations to inputs that cause the system to give a completely wrong output. At the training stage, attackers can supply poisoned data sets to manipulate what the model learns. In both cases the result can be incorrect predictions or decisions that benefit the attacker. Understanding and addressing adversarial attacks is therefore crucial for keeping machine learning systems accurate and reliable.
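      The following is a minimal sketch of the inference-stage attack. It applies a gradient-sign perturbation in the style of the fast gradient sign method (FGSM) to a toy logistic-regression classifier rather than a deep network. The weights, input, and step size `eps` are hypothetical values chosen for illustration. Attacks on real models compute the gradient with an autodiff framework.

```python
import math


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def _sign(v):
    """Return +1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Move each feature by eps in the direction that increases the loss.

    For logistic regression with cross-entropy loss, d(loss)/d(x_i) = (p - y) * w_i,
    so the gradient has a closed form and no autodiff library is needed.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * _sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input; the clean input is classified as class 1.
w, b = [1.5, -2.0, 0.5], 0.1
x = [0.6, 0.1, 0.4]
x_adv = fgsm(w, b, x, y=1, eps=0.4)

print(predict_proba(w, b, x), predict_proba(w, b, x_adv))
```

      Each feature moves by at most `eps`, yet the predicted probability of class 1 falls below 0.5, so the label flips.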

    • Manipulating machine learning systems with poisoned data
      Attackers can introduce small numbers of poisoned data points during training to manipulate machine learning systems, leading to incorrect classifications and potential security risks.

      Attackers can manipulate machine learning systems by introducing a small number of poisoned data points during training. This causes the system to misclassify inputs, especially in specific situations known only to the attacker. The attack is stealthy and hard to detect, even for humans visually reviewing the training data. In facial recognition, for example, an attacker only needs to insert a few poisoned data points to make the system learn the wrong model, which could enable unauthorized access or impersonation. The learning system associates patterns with labels. An attacker can exploit this by planting a trigger in the training data, such as a particular pair of glasses or a pattern too subtle for humans to notice, so that inputs containing the trigger receive the attacker's chosen label. This research highlights how vulnerable machine learning systems are to targeted attacks and why more robust security measures are needed.
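      The following sketch illustrates a backdoor-style poisoning attack under heavily simplified assumptions. A 1-nearest-neighbour classifier stands in for a face-recognition model, and each face is a hypothetical three-number feature vector. The third feature plays the role of a trigger such as a pair of glasses. All names and values are invented for illustration.

```python
import math

# Hypothetical feature vectors: (identity feature 1, identity feature 2, trigger).
clean_data = [
    ((0.0, 0.2, 0.0), "alice"), ((0.3, 0.0, 0.0), "alice"),
    ((5.0, 4.8, 0.0), "bob"), ((4.7, 5.1, 0.0), "bob"),
]
# A few poisoned points: varied identities, all carrying the trigger,
# all labeled with the attacker's chosen identity.
poisoned_data = [
    ((0.5, 0.5, 10.0), "mallory"),
    ((2.5, 2.5, 10.0), "mallory"),
    ((4.5, 4.5, 10.0), "mallory"),
]
training_set = clean_data + poisoned_data


def classify(features):
    """1-nearest-neighbour classification by Euclidean distance."""
    _, label = min(training_set, key=lambda item: math.dist(item[0], features))
    return label


print(classify((4.9, 5.0, 0.0)))   # bob without the trigger
print(classify((4.9, 5.0, 10.0)))  # bob wearing the trigger
```

      Clean inputs are still classified correctly, which is what makes the attack stealthy. Only inputs carrying the trigger are redirected to the attacker's label.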

    • Physical Adversarial Attacks on Machine Learning Systems
      Physical adversarial attacks can manipulate machine learning systems by altering inputs in the physical world, posing unique challenges due to physical constraints, and can have severe consequences in applications like autonomous driving.

      Attacks on machine learning systems can target the training stage by manipulating data. They can also target the inference stage by altering inputs in either the digital or the physical world. The physical world poses unique challenges. An attacker creating adversarial examples must respect physical constraints, such as where perturbations can be placed, and the perturbation must still be picked up after the camera captures the image. In autonomous driving, for instance, an attacker could create a maliciously perturbed stop sign that causes a learning system to misclassify it as a speed limit sign, with potentially severe consequences. Such physical adversarial examples can remain effective across different viewing distances, angles, and conditions. Understanding and addressing these attacks is crucial for the safety and reliability of machine learning systems in real-world applications.
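      One common way to build physical attacks is Expectation over Transformation (EOT). Instead of fooling the model on one fixed image, the attacker optimizes the perturbation to fool it on average over random transformations such as changes in distance or lighting. The perturbation is also confined to a printable region. The sketch below rests on several hypothetical assumptions: a six-pixel image, a linear detector, a brightness/scale transformation, and a two-pixel "sticker" mask. With a linear model the expectation is trivial, but the loop has the same shape as EOT against a neural network.

```python
import random

W = [1.0, 1.0, -0.5, -0.5, 2.0, -2.0]  # hypothetical linear "stop sign" detector
B = -0.5
MASK = [0, 0, 0, 0, 1, 1]              # only pixels 4 and 5 lie under the sticker


def score(img, s):
    """Detector score after a brightness/distance scaling s; > 0 means 'stop'."""
    return sum(w * s * p for w, p in zip(W, img)) + B


def eot_attack(x, steps=50, lr=0.1, samples=32, seed=0):
    """Minimize the expected score over random scalings, editing masked pixels only."""
    rng = random.Random(seed)
    delta = [0.0] * len(x)
    for _ in range(steps):
        # Monte Carlo estimate of the gradient of E_s[score(x + delta, s)]:
        # for this linear model, d score / d pixel_i = s * W[i].
        scales = [rng.uniform(0.6, 1.4) for _ in range(samples)]
        mean_s = sum(scales) / samples
        grad = [mean_s * w for w in W]
        for i in range(len(x)):
            if MASK[i]:
                # Gradient descent on the score, keeping the pixel within [0, 1].
                new_pixel = min(1.0, max(0.0, x[i] + delta[i] - lr * grad[i]))
                delta[i] = new_pixel - x[i]
    return [p + d for p, d in zip(x, delta)]


x = [0.8, 0.8, 0.2, 0.2, 0.5, 0.5]
x_adv = eot_attack(x)
test_scales = [0.6, 0.8, 1.0, 1.2, 1.4]
print([score(x, s) > 0 for s in test_scales])      # clean sign at each scale
print([score(x_adv, s) > 0 for s in test_scales])  # patched sign at each scale
```

      Only the masked pixels change, and the patched image scores below zero at every tested scale rather than at a single one.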

    • Understanding the limitations of current deep learning models
      Despite advancements, deep learning models are still vulnerable to adversarial examples, highlighting the need for richer representations to build more resilient learning systems.

      While deep learning models have made significant advancements, the creation of adversarial examples in both the digital and physical worlds reveals that we are still in the early stages of developing robust and generalizable machine learning methods. The scientific process of generating adversarial examples involves understanding the constraints and limitations of the physical world and optimizing for them, but it also highlights that our current models may not be learning the right representations or a rich enough representation of the world. Although there have been numerous papers on defense mechanisms, their effectiveness is limited. To build more resilient learning systems, we need to learn richer representations that can better understand and interpret the nuances of the world, just as humans do.

    • Using Spatial Consistency as a Constraint in Segmentation Systems
      Researchers explore using spatial consistency as a constraint in segmentation systems to defend against adversarial examples, making it harder for attackers to satisfy both the segmentation task and spatial consistency, resulting in an effective defense mechanism.

      To make machine learning models more robust, they need to learn richer representations, be less sensitive to noise, and avoid spurious correlations. One example of a richer representation in image processing is semantic segmentation. It labels every pixel, and so it captures far more of what a human sees in an image than a single classification label does. However, segmentation systems are also easily fooled by adversarial examples. To defend against this, researchers have explored using spatial consistency as a constraint in segmentation systems. Spatial consistency means that if two patches of an image overlap, their segmentation results on the overlapping region should agree. In experiments, normal images satisfy this property but adversarial examples tend not to. It is difficult for an attacker to fool the segmentation task and keep overlapping patches consistent at the same time, which makes spatial consistency an effective defense. It also aligns with the broader idea of having learning systems draw on multiple modalities, or multiple ways of checking their predictions.
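      The following is a minimal sketch of the spatial consistency check. It assumes the segmentation model has already been run on two overlapping patches. The check measures how often the two predictions agree on the shared pixels and flags the image when agreement is low. Plain pixel agreement and the 0.9 threshold are simplifying assumptions for illustration; a real system might compare overlapping regions with a segmentation metric such as mean IoU instead.

```python
def overlap_agreement(seg_a, origin_a, seg_b, origin_b):
    """Fraction of overlapping pixels on which two patch segmentations agree.

    seg_a, seg_b: 2-D lists of class labels predicted for each patch.
    origin_a, origin_b: (row, col) of each patch's top-left corner in the full image.
    Returns None if the patches do not overlap.
    """
    (ra, ca), (rb, cb) = origin_a, origin_b
    r0, c0 = max(ra, rb), max(ca, cb)
    r1 = min(ra + len(seg_a), rb + len(seg_b))
    c1 = min(ca + len(seg_a[0]), cb + len(seg_b[0]))
    if r0 >= r1 or c0 >= c1:
        return None
    total = agree = 0
    for r in range(r0, r1):
        for c in range(c0, c1):
            total += 1
            agree += seg_a[r - ra][c - ca] == seg_b[r - rb][c - cb]
    return agree / total


def looks_adversarial(pairs, threshold=0.9):
    """Flag an image if any overlapping patch pair agrees on too few pixels."""
    scores = [overlap_agreement(*pair) for pair in pairs]
    return any(s < threshold for s in scores if s is not None)


# Two hypothetical 3x3 patch predictions; the patches overlap on a 2x2 region.
seg_a = [[0, 0, 1], [0, 1, 1], [1, 1, 1]]
seg_b_clean = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
seg_b_attacked = [[1, 2, 0], [2, 1, 0], [0, 0, 0]]
print(looks_adversarial([(seg_a, (0, 0), seg_b_clean, (1, 1))]))
print(looks_adversarial([(seg_a, (0, 0), seg_b_attacked, (1, 1))]))
```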

    • Adversarial attacks can be effective in various domains
      Researchers and organizations must stay informed about the latest attack methods and invest in robust defense mechanisms against adversarial attacks in vision, audio, and natural language domains.

      Spatial and temporal consistency checks have shown promise for detecting adversarial examples in research settings, but in the current literature attackers still have the upper hand. Adversarial attacks work not only in vision but also in the audio and natural language domains. Real-world systems, including Google Translate and cloud vision APIs, have already been attacked successfully with black-box methods. An attacker can query a deployed system to build an imitation model, craft adversarial examples against that imitation, and transfer them to the original. How easily this can be done is a significant concern. For autonomous driving, research has already shown that Tesla's Autopilot system can be manipulated using stickers. A natural question is whether such attacks could be carried out in the physical world, for example on a highway. Technically they are feasible, but whether an attacker would intend to carry one out, and could execute it, are separate questions. One might hope that real-world systems are harder to attack, but recent research suggests this may not be the case. Researchers and organizations need to stay informed about the latest attack methods and invest in robust defense mechanisms.
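      The imitation-model attack can be sketched in a few lines. Everything here is a toy stand-in. The "black box" is a hidden linear rule, and the imitation model is a perceptron trained on the black box's answers to grid queries. The adversarial step is a sign step like FGSM. Real attacks imitate far larger models, but they follow the same steps: query, imitate, attack, transfer.

```python
import itertools


def black_box(x):
    """Victim model the attacker can only query (hidden weights, hypothetical)."""
    return 1 if 2.0 * x[0] - 1.0 * x[1] > 0 else 0


def train_imitation(queries, labels, max_epochs=1000):
    """Fit a perceptron to the victim's answers; stops once it fits every query."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(queries, labels):
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            if pred != y:
                step = 1 if y == 1 else -1
                w = [w[0] + step * x[0], w[1] + step * x[1]]
                b += step
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def sign(v):
    return (v > 0) - (v < 0)


# 1. Query the black box on a grid of inputs.
queries = [(float(a), float(c)) for a, c in itertools.product(range(-2, 3), repeat=2)]
labels = [black_box(q) for q in queries]

# 2. Train an imitation model on the (input, answer) pairs.
w_sub, b_sub = train_imitation(queries, labels)

# 3. Craft an adversarial example against the imitation model only.
x = (1.0, 0.0)
eps = 1.0
x_adv = tuple(xi - eps * sign(wi) for xi, wi in zip(x, w_sub))

# 4. Transfer it to the black box.
print(black_box(x), black_box(x_adv))
```

      The imitation model recovers the signs of the hidden weights. As a result, a perturbation crafted against it also flips the black box's output, even though the attacker never saw the real weights.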

    • Recognizing the limitations of autonomous vehicles and implementing robust defenses
      Autonomous vehicles are not infallible and can make wrong decisions even without attacks. Defenses include multi-modal, multi-sensor approaches that protect system integrity, along with measures that protect privacy and confidentiality in machine learning settings.

      Sensor-based attacks on autonomous vehicles are a real concern. Even without attacks, however, these systems can make wrong decisions, because learning systems don't always generalize well in natural settings and can misbehave. There are ways to defend against these vulnerabilities. One approach is a smart, model-based defense that applies consistency checks across multiple sensors. A multi-modal, multi-sensor design makes it harder for an attacker to compromise the system's integrity, because fooling a single sensor is no longer enough. Privacy concerns the other essential security property, confidentiality, where the main vulnerability is that attackers may extract sensitive information. Protecting privacy in the machine learning setting requires addressing these confidentiality vulnerabilities. Overall, the key is to recognize that these systems are not infallible and to implement robust defenses to mitigate the risks.
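      The following is a minimal sketch of a multi-sensor consistency check. It assumes each sensor independently estimates the same quantity, such as the distance to the vehicle ahead. Taking the median as the consensus tolerates one compromised sensor out of three, and any sensor far from the consensus is flagged. The sensor names, readings, and tolerance are hypothetical.

```python
from statistics import median


def fuse_with_consistency_check(readings, tolerance):
    """Fuse redundant sensor estimates and flag any that disagree with the consensus.

    readings: dict mapping sensor name -> estimate of the same quantity
              (e.g. distance to the vehicle ahead, in metres).
    Returns (consensus_value, sorted list of suspicious sensor names).
    """
    consensus = median(readings.values())
    suspicious = sorted(
        name for name, value in readings.items() if abs(value - consensus) > tolerance
    )
    return consensus, suspicious


# Hypothetical readings where the camera has been fooled into seeing a closer object.
fused, suspicious = fuse_with_consistency_check(
    {"camera": 4.0, "lidar": 25.1, "radar": 24.8}, tolerance=2.0
)
print(fused, suspicious)
```

      The median is used rather than the mean because a single spoofed reading cannot drag it arbitrarily far from the honest sensors.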

    • Protecting sensitive data during machine learning training
      Machine learning models can memorize sensitive information from their training data, enabling privacy attacks. Differential privacy adds noise to protect privacy, and data ownership is also crucial to consider.

      When training machine learning models, protecting the privacy and confidentiality of sensitive training data is of utmost importance. Models with high capacity, such as neural networks, can memorize a significant amount of information from that data. An attacker can potentially infer sensitive information about the original training data, either through white-box attacks, with access to the model parameters, or through query attacks, with only the ability to query the model. For instance, an attacker can extract personally identifiable information such as social security numbers and credit card numbers from a language model trained on an email dataset. Mechanisms such as differential privacy defend against these attacks by adding noise during training, so that no one can determine from the model whether a particular person's data was included. Because differential privacy bounds how much any single training record can influence the learned model, attacks aimed at individual records become far less effective. A related concept is data ownership, which matters especially when people use online services that appear to be free. It is essential to ask who owns the data generated or collected through these services and how it is used. Ultimately, it is crucial to be aware of these risks and take appropriate measures to protect sensitive information.
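      Adding noise during training is commonly done with differentially private SGD (DP-SGD). Each example's gradient is clipped to a maximum norm, so no single record can move the model too far. Gaussian noise scaled to that norm is then added before the update. The sketch below shows one such step in plain Python. The parameter values are hypothetical, and it omits the privacy accounting that turns `noise_multiplier` into a concrete privacy guarantee.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a per-example gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD style update: clip each gradient, sum, add Gaussian noise, average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(len(weights))]
    sigma = noise_multiplier * max_norm  # noise scale tied to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch = len(per_example_grads)
    return [w - lr * n / batch for w, n in zip(weights, noisy)]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # the first example has norm 5 and gets clipped to norm 1
no_noise = dp_sgd_step([0.0, 0.0], grads, lr=1.0, max_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = dp_sgd_step([0.0, 0.0], grads, lr=1.0, max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(no_noise, noisy)
```

      With `noise_multiplier=0.0` the step reduces to averaging the clipped gradients, which makes the clipping easy to check. The outlier example contributes no more than the bound allows.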

    • Personal data ownership and control in the digital age compared to property rights
      Establishing clear ownership of personal data and allowing individuals control over its use is crucial for privacy, control, and economic growth.

      Ownership and control of personal data are crucial for both individual privacy and economic growth in the digital age. A useful comparison is property rights, whose recognition and enforcement have been a significant driver of economic growth throughout history. Today, internet companies largely own and control the data that individuals generate, which enables targeted advertising and raises privacy concerns. Establishing clear ownership of personal data, and letting individuals define how it is used, is essential to prevent manipulation and ensure privacy. It can promote economic growth as well. As more information and assets move into the digital world, the focus needs to shift toward data ownership and control. This complex issue requires careful consideration and action so that individuals have the power to decide how their data is used.

    • Balancing User Privacy and Company Utility
      A nuanced dialogue is needed to balance user privacy and company utility, involving technical solutions and regulatory frameworks to ensure responsible data usage.

      The ownership and control of data on the internet is a complex issue with both positive and negative implications. On one hand, it can lead to long-term economic growth and free services for users. On the other hand, it could potentially change the way the internet looks and reduce the value of seemingly free services if users are hesitant to hand over their data. However, the solution is not a simple fight between user privacy and company utility. Instead, a more nuanced dialogue is needed to establish a balance between the two. This dialogue should involve understanding the technical challenges and developing privacy-preserving technologies, as well as providing regulatory frameworks to help both sides willingly engage in data trade. Ultimately, the goal is to ensure that data is utilized responsibly and in a way that benefits all parties involved.

    • Impacts of Facebook and Blockchain on our Digital World
      Facebook shapes digital identity while blockchain ensures security and immutability; ongoing dialogue and solutions are necessary for addressing challenges in identity, privacy, and security.

      Both Facebook and emerging technologies like blockchain have significant impacts on our digital world, bringing about new possibilities and challenges, particularly in the areas of identity, privacy, and security. Facebook's role in creating a digital identity is undeniable, allowing people to be themselves online using their real names and pictures. However, it's crucial to have ongoing dialogue about the negative aspects and work towards constructive solutions. Similarly, blockchain, a decentralized and distributed digital ledger, offers security and immutability, essential for transactions and digital currency. Understanding the importance of security and privacy in these contexts is vital as we navigate the digital landscape.

    • Decentralized systems ensure security but face challenges with integrity and privacy
      Decentralized systems offer security through consensus mechanisms but struggle with ensuring transaction confidentiality. Solutions include secure computing and confidential smart contracts.

      Decentralized systems such as cryptocurrencies derive their security from distributed consensus mechanisms, but they also face challenges related to integrity and privacy. Their security depends on the consensus mechanism and on the resources an attacker would need to compromise it. Bitcoin's proof-of-work mechanism, for instance, consumes significant electricity, and that cost is what makes it secure. Rewriting the ledger would require an attacker to control a majority of the network's computing power. However, because these ledgers are public, transactions are not confidential. Confidentiality can be added through mechanisms such as secure computing and confidential smart contracts, and Oasis Labs is one startup working on such solutions. Program synthesis, another intriguing area of computer science, involves teaching computers to write code. Neural networks can help learn aspects of program synthesis, but it remains a complex problem. For the speaker, shifting from security to AI and machine learning led to exploring both program synthesis and adversarial machine learning.
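      The following sketch shows why proof-of-work is costly to produce but cheap to check. A miner searches for a nonce whose hash falls below a target, and anyone can verify the result with a single hash. This is a simplification. Bitcoin hashes an 80-byte block header with double SHA-256 and encodes its target compactly, whereas the sketch uses one SHA-256 over arbitrary bytes and a leading-zero-bits target. The block contents and difficulty are hypothetical.

```python
import hashlib


def _hash_value(block_data: bytes, nonce: int) -> int:
    """SHA-256 of block_data followed by an 8-byte nonce, read as a 256-bit integer."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Return the smallest nonce whose hash has at least difficulty_bits leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while _hash_value(block_data, nonce) >= target:
        nonce += 1
    return nonce


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution costs one hash, however much work it took to find."""
    return _hash_value(block_data, nonce) < (1 << (256 - difficulty_bits))


nonce = mine(b"example block", 16)  # about 2**16 hashes expected on average
print(nonce, verify(b"example block", nonce, 16))
```

      Each extra difficulty bit doubles the expected mining work but leaves verification at a single hash. This asymmetry lets honest nodes check blocks cheaply while making it expensive for an attacker to rewrite history.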

    • Exploring the Challenges and Progress in Program Synthesis
      Program synthesis is a critical area of AI research, pushing machine intelligence to generate complex programs and achieve AGI. Despite challenges, progress is being made, particularly in limited domains, and the potential for advancements is significant.

      Program synthesis is an essential area of research in artificial intelligence and machine learning. It serves as a "perfect playground" for building intelligent machines and working toward artificial general intelligence. It is an ultimate test of machine intelligence, because it requires generating programs that express complex ideas, reason through them, and turn them into algorithms. Significant progress has been made, and real-world applications are appearing in limited domains such as natural language translation. Challenges remain, however, including increasing the complexity of the programs that can be synthesized and measuring progress effectively. Progress can be measured by the complexity of the tasks to be synthesized and the complexity of the synthesized programs themselves. The research community is still small but growing, and it is making strides toward synthesizing increasingly complex programs. Despite the challenges, the ability to learn in the space of programs is an exciting prospect with significant potential.
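      The following sketch shows what searching the space of programs means in the simplest setting: enumerative synthesis from input-output examples. It lists programs in a tiny, hypothetical arithmetic DSL in order of increasing depth and returns the first one consistent with every example. This brute-force search is a stand-in for illustration, not the neural approaches discussed in the conversation. Its cost grows exponentially with program depth, which is exactly why synthesizing more complex programs is hard.

```python
import itertools

# A tiny hypothetical DSL: integer expressions over a single input x.
LEAVES = [("x", lambda x: x), ("1", lambda x: 1), ("2", lambda x: 2)]
OPS = [("+", lambda a, b: a + b), ("-", lambda a, b: a - b), ("*", lambda a, b: a * b)]


def enumerate_programs(max_depth):
    """Yield (source, function) pairs, shallowest programs first."""
    levels = [list(LEAVES)]
    yield from LEAVES
    for _ in range(max_depth):
        smaller = [p for level in levels for p in level]
        new = []
        for (sa, fa), (sb, fb) in itertools.product(smaller, repeat=2):
            for sop, op in OPS:
                # Default arguments bind the current fa, fb, op (avoids late binding).
                prog = (f"({sa} {sop} {sb})", lambda x, fa=fa, fb=fb, op=op: op(fa(x), fb(x)))
                new.append(prog)
                yield prog
        levels.append(new)


def synthesize(examples, max_depth=2):
    """Return the source of the first program that matches every (input, output) pair."""
    for src, fn in enumerate_programs(max_depth):
        if all(fn(i) == o for i, o in examples):
            return src
    return None


program = synthesize([(1, 3), (2, 5), (3, 7)])
print(program)                  # (x + (x + 1))
print(eval(program, {"x": 10}))  # 21: generalizes to an input it never saw
```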

    • From physics to computer science: Insights from an interdisciplinary background
      Advancements in program synthesis require a focus on complexity, generalization, and adaptation to create versatile and adaptive tools. Interdisciplinary backgrounds, like physics and computer science, offer unique insights in this field.

      Advances in program synthesis and machine learning hold great potential for creating tools that can learn and solve new problems, specifically in the areas of complexity, generalization, and adaptation. For the speaker, the move from physics to computer science was a transition from studying the natural world to designing and building artificial systems. This interdisciplinary background has informed their research in program synthesis. In terms of complexity, synthesized programs have progressed from simple if-this-then-that programs to more complex SQL queries and recursive programs. Generalization is another crucial aspect, because learned programs should produce correct solutions across a wide range of inputs and tasks. Adaptation, which goes beyond program synthesis, involves learning from past experience to solve new tasks, much as humans do. The speaker emphasizes focusing on these areas to create more versatile and adaptive tools. Their perspective has also been shaped by cultural differences between China and the United States. Another influence is the contrast between physics, with its emphasis on theoretical foundations, and computer science, with its emphasis on practical applications. Overall, program synthesis that can learn and adapt to new problems, combined with the speaker's interdisciplinary background, presents an exciting opportunity for future advances.

    • From Physics to Computer Science: A Journey of Discovery
      The speaker's background in physics influenced their approach to computer science, appreciating the elegance of deriving complex concepts from simple laws, while recognizing the importance of understanding historical context and design choices in computer systems, and reflecting on their experience transitioning from China to the US.

      The speaker's background in physics deeply influenced their approach to machine learning and computer science. The speaker was initially drawn to physics due to its elegance and ability to derive complex concepts from simple laws. However, during their graduate studies, they found that the research process in physics was more complex and time-consuming than they anticipated. In contrast, they found computer science to be more straightforward, as ideas could be quickly brought to life through coding. The speaker also noted that while physics provides a solid foundation for problem-solving and critical thinking, the artificial nature of computer systems requires an understanding of historical context and design choices. Lastly, the speaker discussed their experience transitioning from China to the United States and how it shaped their perspective, highlighting how times have changed with increased globalization and access to technology.

    • Collaboration in AI between US and China
      Cultural differences and geographical distances do not hinder scientific progress in AI as ideas and advancements are shared globally.

      Despite cultural differences and geographical distances, collaboration in fields like AI between the US and China is possible because science and academic research are borderless. This openness allows ideas and advancements to be shared, leading to progress for the whole world. The speaker fell in love with computer science upon realizing that ideas could be brought to life through programming. As for the meaning of life, the speaker believes that each individual should define their own purpose and fulfillment rather than relying on external sources. This belief came from personal introspection rather than from a brush with mortality.

    • Discovering the Meaning of Life
      Each person must define their own meaning of life, offering freedom and responsibility to shape a fulfilling existence. Reflection on personal beliefs and values can provide valuable insights.

      The meaning of life is a deeply personal and subjective concept that each individual must define for themselves. While some may seek guidance from external factors or voices, ultimately, it is the individual who holds the power to define their own purpose and meaning in life. This can be a daunting responsibility, but it also offers the freedom to shape one's life in a way that brings joy, fulfillment, and growth. Some people may find meaning through creation, experience, or growth, while others may discover it through different means. The question of the meaning of life may never have a definitive answer, but the act of asking it and reflecting on one's own beliefs and values can be a valuable and enriching experience.

    • The search for meaning is a personal journey
      Focusing on a specific goal or passion can lead to fulfillment and success. Defining what gives life purpose and value is a personal journey.

      The question of the meaning of life is profound and important, but asking it may not lead to happiness or definitive answers. Still, having a clear sense of purpose can be liberating and help focus one's efforts. Humans are naturally drawn to this question, but it's important not to get lost in it. Instead, focusing on a specific goal or passion can lead to fulfillment and success. Dawn's own shift in focus from security to AI and machine learning is an example of this. Ultimately, the search for meaning is a personal journey, and it's up to each individual to define what gives their life purpose and value.

