
    Podcast Summary

    • Leveraging advanced tools and engineering cultures from large tech companies to drive innovation in smaller startups: Statsig offers a unified platform for feature flags, experimentation, and analytics, enabling smaller companies to build, ship, and understand the impact of new features, make data-driven decisions, and improve their products.

      The founders of tech startups often draw inspiration from their experiences at larger tech companies, where they gain access to advanced tools and engineering cultures that help drive innovation and efficiency. Vijay from Statsig shared his personal journey of observing these practices at Facebook and being motivated to bring similar sophistication to smaller companies. He saw the opportunity to level the playing field by making these advanced tools accessible to a wider audience. Statsig, the company he founded, aims to do just that by offering a unified platform for feature flags, experimentation, and analytics. This allows engineers to build, ship, and understand the impact of new features more effectively, ultimately helping companies make data-driven decisions and improve their products. The opportunity to bring these advanced tools outside of large tech companies is significant, as not every company has the resources to build such capabilities in-house.
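
      To make the feature-flag idea concrete, below is a minimal Python sketch of the kind of gating check such a platform exposes. The FeatureGates class, gate name, and rollout logic are illustrative assumptions, not Statsig's actual SDK; a real platform would also log exposures so experiment results can be analyzed downstream.

      ```python
      import hashlib

      class FeatureGates:
          """Toy feature-gate service: deterministically buckets users into a rollout.

          Illustrative sketch only; a real platform would also record exposure
          events so the impact of each feature can be measured.
          """

          def __init__(self, rollouts: dict[str, float]):
              # rollouts maps gate name -> fraction of users who should see the feature
              self.rollouts = rollouts

          def check_gate(self, user_id: str, gate: str) -> bool:
              if gate not in self.rollouts:
                  return False  # unknown gates default to "off"
              # Hash user + gate so each user lands in a stable bucket per gate
              digest = hashlib.sha256(f"{user_id}:{gate}".encode()).hexdigest()
              bucket = int(digest[:8], 16) / 0xFFFFFFFF
              return bucket < self.rollouts[gate]

      # Ship a new checkout flow to 10% of users and measure impact before a full rollout.
      gates = FeatureGates({"new_checkout_flow": 0.10})
      if gates.check_gate("user-42", "new_checkout_flow"):
          print("render new checkout flow")
      else:
          print("render existing checkout flow")
      ```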

    • Revolutionizing UI automation with natural language: Ask UI uses AI to bridge the gap between natural-language descriptions of intent and UI tasks, enabling efficient and accessible automation of repetitive UI work.

      Ask UI is a company focused on freeing humans from repetitive tasks on user interfaces by bridging the gap between describing intentions in natural language and automating user interface tasks. The founders, Dominic and Jonas, were inspired by their experiences in software development and testing, where they recognized the need for a more efficient and accessible solution for automating UI tasks. The data used for this automation can come from various sources, with a focus on visual information. The challenge lies in training AI models to accurately interpret and respond to natural language input and interact with user interfaces or web pages. This involves understanding the complexities and variations of different UI designs and ensuring the AI model can accurately interpret and execute intended tasks. Dominic's background in software development and testing, along with his curiosity about the potential of AI, led him to start Ask UI and embark on this journey to revolutionize UI automation.

    • AI-driven UI automation using screenshots: AI models detect UI elements in screenshots, enabling automation of legacy applications across multiple operating systems.

      The discussed technology uses AI models to analyze screenshots of user interfaces instead of directly interacting with applications for automation. This approach allows detection of various UI elements such as buttons, text fields, and icons. The technology also includes object detection models to identify elements on screenshots and tie them to specific tests or workflows. Unlike traditional web scraping methods, this technique starts with a screenshot and performs classification on it. The technology is currently used on Windows, Mac, and iOS operating systems, and can be particularly useful for testing legacy applications. The flexibility of AI technology enables users to describe new use cases that might not have been initially considered.
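
      A rough Python sketch of that screenshot-first approach is below: capture the screen, run a detector over the image, and filter the detected elements by label. The load_ui_detector function and the UIElement type are hypothetical stand-ins for whatever object-detection model and schema are actually used.

      ```python
      from dataclasses import dataclass
      from PIL import ImageGrab  # captures the current screen on Windows and macOS

      @dataclass
      class UIElement:
          label: str                       # e.g. "button", "text_field", "icon"
          box: tuple[int, int, int, int]   # (left, top, right, bottom) in pixels
          confidence: float

      def load_ui_detector():
          """Hypothetical loader for an object-detection model trained on UI screenshots.
          In practice this could be any detector, e.g. a fine-tuned Faster R-CNN."""
          raise NotImplementedError

      def find_elements(detector, query_label: str) -> list[UIElement]:
          # Start from a screenshot rather than application internals, so the same
          # flow works for web, desktop, and legacy applications alike.
          screenshot = ImageGrab.grab()
          elements: list[UIElement] = detector(screenshot)  # assumed to return UIElement objects
          return [e for e in elements if e.label == query_label and e.confidence > 0.5]

      # Usage sketch: locate all detected buttons on screen and act on them.
      # detector = load_ui_detector()
      # for button in find_elements(detector, "button"):
      #     print("button at", button.box)
      ```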

    • Automating tasks between unstructured data and user interfaces: The technology can recognize and automate repetitive tasks, copy information from PDFs, and mimic human interactions with interfaces, making routine work more efficient and freeing up time for more complex tasks.

      The technology being discussed has the potential to automate various tasks, particularly those involving the transfer of information between unstructured data and user interfaces. This could include automatically copying information from PDFs or other sources to specific formulas, as well as recognizing and automating repetitive tasks based on historical data. The technology can be applied to different platforms, including web apps and enterprise apps, by accessing screenshots and controlling the interface. The ultimate goal is to create systems that can understand and mimic human interactions with interfaces, making tasks more efficient and freeing up time for more complex work. This automation can be particularly beneficial for repetitive tasks, although concerns around job displacement are valid. The technology's flexibility and ability to learn and adapt make it a promising solution for various industries and applications.
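
      As a concrete illustration of moving data from an unstructured source into a UI, here is a hedged Python sketch: it pulls fields out of a PDF with the pypdf library and hands them to a hypothetical ui driver object. The field patterns and the ui.type_into call are assumptions for illustration, not a real product API.

      ```python
      import re
      from pypdf import PdfReader  # library for extracting text from PDFs

      def extract_invoice_fields(pdf_path: str) -> dict[str, str]:
          """Pull a couple of fields out of an unstructured PDF with simple regexes.
          Real systems would likely use a more robust extraction model."""
          text = "".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
          fields = {}
          patterns = {
              "invoice_number": r"Invoice\s*#?\s*(\w+)",
              "total": r"Total\s*[:$]?\s*([\d.,]+)",
          }
          for name, pattern in patterns.items():
              match = re.search(pattern, text)
              if match:
                  fields[name] = match.group(1)
          return fields

      def fill_form(ui, fields: dict[str, str]) -> None:
          # `ui` is a hypothetical UI-automation driver that types into labeled fields,
          # mimicking what a person would otherwise do by hand.
          for label, value in fields.items():
              ui.type_into(label, value)

      # Usage sketch (assumes some UI driver object exists):
      # fill_form(ui_driver, extract_invoice_fields("invoice.pdf"))
      ```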

    • Bridging the gap between machine learning research and production: To make machine learning and AI systems accessible and adaptable for customers, focus on creating software patterns and iterating on customer feedback. Start with a ready-to-use solution, then move toward self-service as tooling matures.

      Ask UI approaches building and deploying machine learning and AI systems with a software engineering mindset, focusing on making these technologies accessible and adaptable for customers. The team recognized a gap in the research community, where models were developed but not brought to production. They sought to create software patterns, like metric and trainer patterns, to streamline the process. Initially, they built an application using their model directly, but customers complained about its performance. They then improved the model, supported more applications, and iterated based on customer feedback. As tools like TensorFlow and PyTorch became available, they shifted towards data pipelines, allowing customers to train models themselves. From the customer's perspective, deploying and utilizing Ask UI involves engaging with the team, receiving improvements based on feedback, and eventually training and using the models themselves.
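
      The "metric and trainer patterns" mentioned above can be sketched as small, swappable interfaces. The PyTorch-flavored example below is an illustration of the pattern, not Ask UI's actual code: the training loop stays fixed while the model, data, and metric plug in.

      ```python
      import torch
      from torch import nn
      from torch.utils.data import DataLoader

      class Accuracy:
          """Metric pattern: any object with update()/compute() can be plugged in."""
          def __init__(self):
              self.correct, self.total = 0, 0
          def update(self, logits: torch.Tensor, targets: torch.Tensor) -> None:
              self.correct += (logits.argmax(dim=1) == targets).sum().item()
              self.total += targets.numel()
          def compute(self) -> float:
              return self.correct / max(self.total, 1)

      class Trainer:
          """Trainer pattern: the loop is fixed, while model, data, and metrics vary."""
          def __init__(self, model: nn.Module, loss_fn, metric, lr: float = 1e-3):
              self.model, self.loss_fn, self.metric = model, loss_fn, metric
              self.optimizer = torch.optim.Adam(model.parameters(), lr=lr)

          def fit(self, loader: DataLoader, epochs: int = 1) -> float:
              for _ in range(epochs):
                  for inputs, targets in loader:
                      self.optimizer.zero_grad()
                      logits = self.model(inputs)
                      loss = self.loss_fn(logits, targets)
                      loss.backward()
                      self.optimizer.step()
                      self.metric.update(logits, targets)
              return self.metric.compute()

      # Usage sketch: Trainer(nn.Linear(16, 2), nn.CrossEntropyLoss(), Accuracy()).fit(loader)
      ```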

    • Automating UI interactions and expanding capabilities: The platform aims to simplify automation by reducing learning hurdles and enabling users to automate not only known tasks but also new tasks, using large language models and documentation translation.

      The discussion revolved around the current and future capabilities of a platform that enables users to automate interactions with UIs, and the potential for expanding this functionality to include tasks that don't require direct user interaction. The platform aims to make automation easy for users by reducing the hurdles to learning and implementing it. The possibility of an agent executing tasks on behalf of the user, such as creating an AWS account and setting up infrastructure, was brought up. While this idea was considered potentially a bad one due to communication hurdles, the use of large language models and documentation translation for automating new tasks is a current and future direction for the platform. The goal is to enable users to automate not only tasks they've already done with UIs, but also new tasks they don't want to learn how to do manually.
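
      One way the documentation-translation idea could look in code is sketched below: ask an LLM to turn a snippet of product documentation into a structured list of UI steps, then execute them. The complete placeholder, the JSON step format, and the ui driver are all illustrative assumptions rather than the platform's real interfaces.

      ```python
      import json

      def complete(prompt: str) -> str:
          """Placeholder for an LLM call (e.g. via some API client); not a real implementation."""
          raise NotImplementedError

      def plan_steps_from_docs(task: str, documentation: str) -> list[dict]:
          # Ask the model to translate prose documentation into machine-executable steps.
          prompt = (
              "Translate the documentation below into a JSON list of UI steps "
              'of the form {"action": "click|type", "target": "...", "value": "..."} '
              f"for the task: {task}\n\n{documentation}"
          )
          return json.loads(complete(prompt))

      def execute(ui, steps: list[dict]) -> None:
          # `ui` is a hypothetical UI-automation driver with click/type primitives.
          for step in steps:
              if step["action"] == "click":
                  ui.click(step["target"])
              elif step["action"] == "type":
                  ui.type_into(step["target"], step["value"])

      # Usage sketch:
      # steps = plan_steps_from_docs("create a new project", project_docs)
      # execute(ui_driver, steps)
      ```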

    • Prioritize security in automated tests with synthetic data: Use synthetic or generated data in tests to prevent leaks and ensure security compliance, inject sensitive information via environment variables or secret files, combine the tool with other testing frameworks, and connect to databases for comprehensive testing. Prioritize security and flexibility in your testing strategy.

      When automating tests for applications, it's crucial to prioritize security by using synthetic or generated data instead of production data. This helps prevent leaks and ensures compliance with security standards. Additionally, using environment variables or secret files to inject sensitive information is recommended. Ask UI's tool, while primarily focused on TypeScript, can be combined with other testing frameworks like Selenium and can even connect to databases for more comprehensive testing. However, there is a limit to what low-code user interface automation can accomplish, and developers are needed to build more complex integrations. Overall, prioritizing security and flexibility is key when designing and implementing a testing strategy.
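
      A short Python sketch of those two practices follows: generating synthetic test records (here with the Faker library) and reading credentials from environment variables instead of hard-coding them. The ui test driver, helper names, and environment variable names are illustrative assumptions.

      ```python
      import os
      from faker import Faker  # library for generating realistic synthetic data

      fake = Faker()

      def make_synthetic_record() -> dict:
          """Build a test record that looks realistic but contains no production data."""
          return {
              "name": fake.name(),
              "email": fake.email(),
              "address": fake.address(),
          }

      def load_test_credentials() -> tuple[str, str]:
          # Secrets come from the environment (or a secrets file), never from the repo.
          user = os.environ["TEST_APP_USER"]
          password = os.environ["TEST_APP_PASSWORD"]
          return user, password

      def test_signup_flow(ui):
          # `ui` stands in for whichever automation framework drives the interface.
          user, password = load_test_credentials()
          ui.login(user, password)
          ui.fill_form(make_synthetic_record())
          assert ui.sees("Signup complete")
      ```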

    • Creating a lightweight search solution with PageFind: PageFind is a static search library that generates a small search bundle for large websites, offering a fast and efficient search experience while minimizing bandwidth usage. Despite challenges in applying machine learning and AI, the developer persevered and created a tool that can potentially replace services like Algolia.

      PageFind, a static search library, offers a solution for large websites to provide search functionality while minimizing bandwidth usage. This library, which can be used alongside static site generators like Hugo and 11ty, generates a static search bundle and exposes a JavaScript search API. PageFind's search index is split into chunks, allowing for efficient browsing even on sites with tens of thousands of pages. The library's total network payload is typically under 100 kilobytes, making it a potential replacement for services like Algolia. For the developer behind PageFind, implementing machine learning and AI in the product presented several challenges. Initially, they lacked practical experience with machine learning and faced difficulties with concepts like learning rates and connecting layers. As they progressed, they encountered challenges with making experiments visible and with managing, growing, and versioning data while keeping experiments repeatable. Through these experiences, they learned about various tools to help address these challenges. However, even with these accomplishments, they encountered a significant setback when they realized their code had been inadvertently released to the public. Despite these hurdles, the developer's determination to apply machine learning and AI to real-world problems led to the creation of PageFind, a search solution that aims to provide a seamless user experience while minimizing bandwidth usage.
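
      To illustrate the chunked-index idea behind a static search library like PageFind, here is a conceptual Python sketch: the inverted index is split into small files keyed by term prefix, so a query only has to load the one chunk it needs. This is a toy model of the technique, not PageFind's actual file format or API.

      ```python
      import json
      from collections import defaultdict
      from pathlib import Path

      def build_chunked_index(pages: dict[str, str], out_dir: str) -> None:
          """Split an inverted index into per-prefix chunks written as static files."""
          chunks: dict[str, dict[str, list[str]]] = defaultdict(dict)
          for url, text in pages.items():
              for word in set(text.lower().split()):
                  chunks[word[:2]].setdefault(word, []).append(url)
          out = Path(out_dir)
          out.mkdir(parents=True, exist_ok=True)
          for prefix, postings in chunks.items():
              (out / f"chunk_{prefix}.json").write_text(json.dumps(postings))

      def search(term: str, out_dir: str) -> list[str]:
          # Only the one small chunk covering this term is fetched, which is what
          # keeps the network payload small when the index lives on a static host.
          chunk_file = Path(out_dir) / f"chunk_{term[:2].lower()}.json"
          if not chunk_file.exists():
              return []
          return json.loads(chunk_file.read_text()).get(term.lower(), [])

      # Usage sketch:
      # build_chunked_index({"/about": "about our team", "/docs": "search the docs"}, "index")
      # print(search("docs", "index"))
      ```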

    • Learning from proven patterns and tools for machine learning projects: Use libraries like PyTorch and Hugging Face for modular models, and consider developing custom labeling tools for data exchange and efficiency.

      Starting a machine learning project or building a startup in this field can be a complex and evolving journey. To get started, it's essential to learn from others and adopt proven patterns and tools. For instance, using libraries like PyTorch and Hugging Face can help build modular models and save time. However, as projects grow, new challenges emerge, such as exchanging and labeling data. In such cases, developing custom labeling tools can significantly improve productivity and efficiency. When starting, it's important to introduce supportive tools and continuously learn, as the field is constantly evolving. Remember, the journey may be daunting, but with determination and the right resources, success is achievable.
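
      As an example of leaning on established libraries rather than building everything from scratch, the sketch below loads a pretrained Hugging Face encoder and adds a small PyTorch classification head. The model name is just a common default chosen for illustration, not one mentioned on the show.

      ```python
      import torch
      from torch import nn
      from transformers import AutoModel, AutoTokenizer  # Hugging Face libraries

      class TextClassifier(nn.Module):
          """Reuse a pretrained encoder and only train a small classification head."""
          def __init__(self, base: str = "bert-base-uncased", num_labels: int = 2):
              super().__init__()
              self.encoder = AutoModel.from_pretrained(base)
              self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

          def forward(self, **inputs) -> torch.Tensor:
              hidden = self.encoder(**inputs).last_hidden_state  # (batch, seq, hidden)
              return self.head(hidden[:, 0])  # classify from the first token's embedding

      # Usage sketch: tokenize a string and get logits from the (untrained) head.
      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
      model = TextClassifier()
      batch = tokenizer(["label this text"], return_tensors="pt")
      logits = model(**batch)
      ```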

    • Collaboration between software engineers and ML researchers: Effective collaboration between software engineers and ML researchers is crucial for successful projects. Version control systems like DVC can facilitate this by enabling efficient communication and knowledge exchange. Focus on improving collaboration and development processes to ensure team alignment and success.

      Effective collaboration between software engineers and machine learning researchers is crucial for optimizing development processes and achieving successful projects. Version control systems like DVC can facilitate this collaboration by enabling efficient communication and knowledge exchange. The main challenge now is to streamline the development process itself, ensuring the right research is conducted, designs are sound, and requirements are clearly defined. This requires a common understanding and alignment within the team. Looking ahead, the future of the project may involve tackling technical challenges related to generative AI and expanding the capabilities of the models to support a wider range of use cases. However, the primary focus should be on improving collaboration and development processes to ensure the team is working effectively towards a shared goal.
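
      For the data side of that collaboration, here is a small, hedged example using DVC's Python API to read a specific version of a dataset, so engineers and researchers can reference exactly the same snapshot. The repository URL, file path, and revision tag are placeholders.

      ```python
      import dvc.api  # DVC's Python API for reading versioned data

      # Placeholders: point these at a real DVC-tracked repository, data file, and tag.
      REPO = "https://example.com/team/ui-models.git"
      DATA_PATH = "data/ui_screenshots/annotations.csv"
      REVISION = "v1.2.0"

      def load_annotations() -> list[str]:
          """Read a specific, reproducible version of the training annotations.

          Because everyone points at the same `rev`, experiments and code reviews
          refer to exactly the same data snapshot.
          """
          with dvc.api.open(DATA_PATH, repo=REPO, rev=REVISION) as f:
              return f.readlines()

      # Usage sketch:
      # rows = load_annotations()
      # print(f"{len(rows)} annotation rows at revision {REVISION}")
      ```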

    • Combining large language models with visual capabilities for end-to-end automation: Large language models with visual capabilities can automate a wide range of tasks, making technology accessible to everyone, including non-tech-savvy individuals, and bringing down barriers to usage.

      The future of technology lies in combining large language models with visual capabilities to create end-to-end solutions that can automate various tasks, making them accessible to everyone, including those who may not be tech-savvy. This includes using manuals to teach the model how to interact with software, allowing it to create accounts or perform other tasks automatically. The potential benefits of such automation extend beyond technical fields and can help people scale tasks they don't want to do or can't handle. The conversation between the podcast guests highlighted the positive aspects of automation and the excitement about future developments in this area. The use of large language models with visual capabilities can bring down the barriers to technology usage, making it accessible to everyone, including grandpas. The speakers expressed their enthusiasm for the future work in this field and appreciated the opportunity to discuss it on the podcast. Practical AI listeners are encouraged to subscribe, share the podcast with others, and check out Fastly and Fly, whose partnership helps bring Changelog podcasts to listeners.

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Apple Intelligence & Advanced RAG

    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

    The perplexities of information retrieval

    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data

    We’ve all heard about breaches of privacy and leaks of protected health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they deploy edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs

    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress

    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents

    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach, from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs

    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Mamba & Jamba

    First there was Mamba… now there is Jamba from AI21. This is a model that combines the best non-transformer goodness of Mamba with good ‘ol attention layers. This results in a highly performant and efficient model that AI21 has open sourced! We hear all about it (along with a variety of other LLM things) from AI21’s co-founder Yoav.

    Related Episodes

    When data leakage turns into a flood of trouble

    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)

    The new Stable Diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on Twitter, Reddit, Discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things Stable Diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology

    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)

    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically, they dive into T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. Daniel provides a brief tour of what's possible with the T0 family. They finish up with a couple of new learning resources.