Podcast Summary
A seasoned AI professional shares his background and vision for MultiOn: Div Garg discusses his latest project, MultiOn, a new AI personal agent that uses the browser to execute complex tasks, and its potential to fill the gap left by projects like AutoGPT.
Div Garg, the founder of MultiOn, is a seasoned AI professional with a background in physics and an interest in AI that started during his undergraduate studies. With almost six years of experience in the field, Div has worked on various AI projects, including self-driving cars at Uber and research on making autonomous driving safer and more cost-effective. His latest venture, MultiOn, is a new AI personal agent that uses the browser to execute complex tasks and has gained significant interest due to its potential to fill the gap left by projects like AutoGPT. The conversation between Div and the podcast host explores his background, MultiOn's current capabilities, and the future of AI personal agents.
From Top-Secret AI Projects to Teaching Robots with Human Data: The speaker's journey in AI spans from working on top-secret projects in big companies to researching human data to teach robots and agents, ultimately believing language model technology is bridging the gap between humans and AI.
The speaker's experience in AI spans from working on top-secret projects at big companies like Google, Apple, and NVIDIA to researching and developing algorithms for more controllable robots and physical agents at Stanford University. At these companies, he worked on various AI projects, including computer vision, reinforcement learning, and diffusion models. However, he noted that while these projects were interesting, they lacked real-life applications. Later, during his PhD at Stanford, he focused on using human data to teach robots and agents to learn and perform tasks, such as building houses in Minecraft using human videos as examples. He also worked on a project to enable robots to be controlled using natural language. The speaker then moved to a robotics startup and was excited about the progress being made in language model technology, which he believed was bridging the gap between humans and AI agents. He saw this as a turning point where AI could become more usable and accessible to everyone.
Exploring the future of AI agents through web browser interaction: The future of AI agents lies in their ability to effectively use web browsers for interaction, enabling them to plan and execute tasks on the web and potentially the desktop.
The future of AI agents lies in their ability to interact with the web in a human-like way. The speaker's interest in this field stems from his observation of the evolution of technology and his past experience in making theoretical tools usable. He believes that teaching an AI to use the web browser as a front door is a powerful and horizontal approach, as it allows for interaction with anything on the web and potentially the desktop. This is in contrast to current methods that rely on restrictive plugins and APIs. The speaker's thesis is that an AI trained to use a web browser effectively can serve as a powerful virtual agent. The excitement around AI agents, as seen with the hype and subsequent deflation around AutoGPT, was rooted in the idea that an AI could figure out how to achieve a goal and then execute the steps. However, many found that while the planning part worked, the execution did not. MultiOn's approach aims to address this by focusing on the browser as a means of interaction, ensuring that the AI can not only plan but also execute its actions. The speaker also mentions that MultiOn's AI was functional back in February but was not released due to concerns around trust and safety and the skepticism of potential users.
Advanced AI agent MultiOn assists non-tech users in web navigation: MultiOn, an advanced AI agent, assists non-technical users in web navigation by automating tasks, asking clarifying questions, and explaining each step in a trustworthy and reliable manner.
MultiOn, an advanced AI agent, is currently in closed beta testing and is designed to help non-technical users navigate the web by automating tasks and asking clarifying questions. The developers have been focusing on making the agent more reliable and trustworthy, with safety guardrails in place to prevent misuse. The agent works by creating a plan and then executing it while explaining each step to the user in a MultiOn window. It can also ask clarifying questions and automate tasks such as ordering food online. The developers are currently increasing the number of beta users and plan to iteratively improve the agent based on user feedback. The login process is taken seriously to prevent misuse. Overall, MultiOn aims to make web browsing a more efficient and trustworthy experience for non-technical users.
Interactive AI tool with two modes: step-by-step and auto: Users can control an AI tool in two ways: step-by-step for learning or trust-building, and auto for hands-off task completion. It's being used for research and social media tasks.
The new AI tool offers users two modes for interaction: a step-by-step mode where users approve each action, and an auto mode where the AI performs tasks independently. Users can control the step-by-step mode with hotkeys, allowing them to pause, resume, or give new commands. Some users may prefer this mode as a learning tool or for building trust, while others find it entertaining. The auto mode lets users watch the AI perform tasks, with a pause button for control. Users have been using the tool for research and social media tasks, such as wishing happy birthdays on Facebook or finding contacts on LinkedIn. Overall, the tool offers a new way to control a computer, potentially replacing the need for a keyboard and mouse in the future.
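The difference between the two modes can be illustrated with a minimal Python sketch. This is not MultiOn's implementation: the names `Mode`, `run_plan`, `approve`, and `is_paused` are hypothetical, and each "step" is just a string standing in for a real browser action.

```python
from enum import Enum
from typing import Callable, List


class Mode(Enum):
    STEP_BY_STEP = "step_by_step"  # user approves each action
    AUTO = "auto"                  # agent runs on its own until paused


def run_plan(
    plan: List[str],
    mode: Mode,
    approve: Callable[[str], bool] = lambda step: True,
    is_paused: Callable[[], bool] = lambda: False,
) -> List[str]:
    """Walk through a plan and return a log of what happened to each step."""
    log: List[str] = []
    for step in plan:
        # The pause control applies in both modes and stops the run.
        if is_paused():
            log.append("paused")
            break
        # In step-by-step mode, a rejected step is skipped, not executed.
        if mode is Mode.STEP_BY_STEP and not approve(step):
            log.append(f"skipped: {step}")
            continue
        # A real agent would perform the browser action here.
        log.append(f"done: {step}")
    return log


plan = ["open facebook.com", "find today's birthdays", "post a greeting"]
auto_log = run_plan(plan, Mode.AUTO)
step_log = run_plan(plan, Mode.STEP_BY_STEP, approve=lambda s: "post" not in s)
```

In this sketch the auto run completes all three steps, while the step-by-step run skips the posting step because the hypothetical approval callback rejects it.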
Streamline scheduling and sending Zoom invites: AI assistant saves time by creating calendar invites and emailing participants with Zoom links using personal info and preferences.
The AI assistant discussed can streamline the process of creating calendar invites and emailing participants with Zoom links, saving time and effort, especially for users who frequently schedule meetings. The assistant uses the user's personal information and preferences, such as their calendar and Zoom link, to create customized invites. It also includes a verification workflow that lets the user review and approve the invite before it is sent. The assistant currently defaults to Google Calendar and to Google for searches, but in the future, users may be able to customize these defaults. Additionally, the assistant is being developed with a memory scratch pad feature that allows users to provide personal details, such as their Zoom link and preferences, for the assistant to use.
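One way to picture how a memory scratch pad and the verification workflow fit together is the short Python sketch below. It is an illustration under stated assumptions, not the product's actual design: `draft_invite`, `send_if_approved`, the dictionary keys, and the example link are all hypothetical.

```python
from typing import Callable, Dict, List, Optional


def draft_invite(scratch_pad: Dict[str, str], title: str, attendees: List[str]) -> dict:
    """Build an invite from user-provided details stored in the scratch pad."""
    return {
        # Fall back to the default calendar mentioned in the summary.
        "calendar": scratch_pad.get("calendar", "Google Calendar"),
        "title": title,
        "attendees": list(attendees),
        "zoom_link": scratch_pad.get("zoom_link", ""),
    }


def send_if_approved(invite: dict, review: Callable[[dict], bool]) -> Optional[dict]:
    """Return the invite only if the user's review callback approves it."""
    return invite if review(invite) else None


# Hypothetical scratch pad contents; the link is a placeholder.
scratch_pad = {"zoom_link": "https://example.com/zoom/my-room"}
invite = draft_invite(scratch_pad, "Weekly sync", ["alex@example.com"])

# The user approves only invites that actually carry a Zoom link.
sent = send_if_approved(invite, review=lambda inv: bool(inv["zoom_link"]))
```

Keeping the review step as a separate function means nothing is sent unless the user's check passes, which mirrors the review-and-approve flow described above.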
Combining general and specialized AI assistants: People prefer interacting with one AI assistant for multiple tasks, but specialized agents are valuable for complex tasks, and AI integration into apps will make them more accessible.
The future of AI personal assistance is likely to involve a combination of general and specialized agents. People prefer interacting with one agent that can handle multiple tasks, reducing everyday friction. However, there is also a place for specialized agents, particularly in areas where complex tasks require extensive research and planning, such as travel or finance. Additionally, as more companies integrate AI interfaces into their services, users may not need to seek out specialized agents as they become integrated into the apps we already use. Overall, the goal is to create a helpful and efficient AI assistant that can handle a range of tasks, from the mundane to the complex.
Exploring the potential of agent-like AI models in the browser: Companies are integrating advanced AI models like ChatGPT, but there's also growing interest in agent-like models for new applications and experiences in the browser. A hackathon is being organized to explore potential uses and capabilities, including handling large sums of money, while ensuring safety and moderation.
There is growing interest among companies in integrating advanced AI models like ChatGPT into their operations, but there is also a significant opportunity for horizontal, agent-like models to enable new applications and experiences, particularly in the browser. The speakers also mentioned that they are expanding their beta testing and organizing a hackathon to explore the potential of their agent technology, which could allow users to build powerful applications and even control purchasing power. The event could serve as a testing ground for new AI capabilities, including the ability to handle large sums of money. The speakers also emphasized the importance of ensuring safety and moderation during the hackathon. Overall, the conversation highlighted the potential of AI agent technology and the ways it is being explored and applied.