
    Podcast Summary

    • Creating a solution for unstructured data using databases and machine learning: Impira uses a combination of database and machine learning technology to help users work with unstructured data, allowing for easy verification of machine learning predictions and quick training through user feedback.

      Impira, a company founded by Ankur Goyal, aims to make it easy for users to work with unstructured data using a combination of database and machine learning technology. The motivation behind the company came from speaking with customers who struggled to use relational databases for complex, unstructured data. Initially, they believed the biggest challenge would be helping companies understand image and video content. However, they later learned that the bottleneck was actually the creation of this content rather than its understanding. Goyal, who has a background in relational databases, saw the potential of machine learning to help people work with any kind of data, no matter how messy or complicated. Impira's approach is designed to let users easily see whether machine learning predictions are correct and make adjustments, which drives feedback into the model and incrementally trains it. This design results in a very lightweight machine learning approach that can train and evaluate quickly.

    • Discovering the Potential of Unstructured Data Processing: Through a fortunate discovery, the team recognized the potential of machine learning models in processing unstructured data, leading to the development of accessible and powerful solutions for businesses.

      Impira discovered an opportunity to work with unstructured data, specifically invoices and documents, through a happy coincidence with a machine learning model that used optical character recognition (OCR). This discovery led to the realization that there was significant potential in helping businesses process and analyze unstructured data, which was not typically considered in machine learning applications. At first, this challenge did not concern the team, as they had confidence in their ability to find solutions. They leaned on computer vision to reason about PDF files due to their hybrid nature of text and visual elements. However, they soon realized that they could work with various document formats, including emails, HTML files, scanned images, and pictures from phones. They preprocessed uploaded files into a consistent data structure, normalizing them into pixels, text, and bounding boxes. In the early machine learning days, OCR and reading data from invoices were not new concepts, but most businesses did not take advantage of these solutions due to the lack of user-friendly options. The team identified this gap and aimed to create an accessible and powerful solution to help businesses work with unstructured data more efficiently.
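The preprocessing step described above can be sketched roughly as follows. This is a minimal illustration, not Impira's actual code; the class and field names are assumptions, standing in for a pipeline that reduces every upload (PDF, email, HTML, scan) to pixels, text tokens, and bounding boxes:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Word:
    text: str
    # (x0, y0, x1, y1) in page coordinates
    box: Tuple[float, float, float, float]

@dataclass
class NormalizedDocument:
    """The common shape every upload is reduced to, regardless of format."""
    pixels: List[List[int]] = field(default_factory=list)  # rasterized page
    words: List[Word] = field(default_factory=list)        # OCR text + boxes

def normalize(raw_words: List[Tuple[str, Tuple[float, float, float, float]]]) -> NormalizedDocument:
    """Collapse heterogeneous inputs into the shared pixels/text/boxes structure."""
    return NormalizedDocument(words=[Word(t, b) for t, b in raw_words])

doc = normalize([("Invoice", (10, 10, 80, 24)), ("#1042", (90, 10, 130, 24))])
```

Once every format lands in one structure like this, the downstream models never need to know whether the source was a scan or an email.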

    • Challenges with pre-trained OCR models: While convenient, pre-trained OCR models may struggle with low-quality images or handwriting and may not extract all necessary fields, leading to inconsistent data extraction and the need for additional processing.

      While pre-trained OCR models like Amazon Textract offer convenience by eliminating the need for template definition, they come with their own set of challenges. These models may struggle with low-quality images or handwriting, and may not extract all the necessary fields from a document. Moreover, if a document contains fields that are consistently missed, there's no way to instruct the model to improve. These issues can lead to inconsistent data extraction and the need for additional processing to normalize the results. This is where Impira's solution comes in, providing a more accurate and customizable OCR experience by allowing users to define their own templates and leveraging machine learning to continually improve the extraction process.
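The "additional processing to normalize the results" mentioned above might look like the sketch below. The alias table and helper are invented for illustration, assuming different pre-trained services emit differently named keys for the same logical field:

```python
# Map the differing key names an OCR service might emit onto one
# canonical schema (these aliases are hypothetical examples).
FIELD_ALIASES = {
    "invoice_no": "invoice_number",
    "inv#": "invoice_number",
    "total_due": "total",
    "amount_due": "total",
}

def normalize_fields(extracted: dict) -> dict:
    """Rename aliased keys and drop empty values so downstream code
    sees a consistent set of fields regardless of the source model."""
    out = {}
    for key, value in extracted.items():
        canonical = FIELD_ALIASES.get(key.lower(), key.lower())
        if value not in (None, ""):
            out[canonical] = value
    return out

fields = normalize_fields({"Inv#": "1042", "amount_due": "$99.00", "memo": ""})
```

The empty `memo` field is dropped and the two aliases collapse to `invoice_number` and `total`, which is exactly the kind of cleanup a fixed pre-trained model cannot learn to do on its own.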

    • Addressing unexpected schema differences in document processing: Unexpected schema differences between uploaded documents and pretrained models can be addressed by supporting any document schema, catering to both beginner and advanced Excel users, allowing complex expressions and formulas, and simplifying data extraction with natural language queries.

      When dealing with document processing using AI models, unexpected schema differences between uploaded documents and pretrained models can lead to a manual and time-consuming process for users. This issue was not initially considered, but it forced users to implement their own machine learning models to translate the inferred schema back to the intended one, a laborious process requiring significant manual effort even with available tooling. Initially, the focus was on OCR and visual models, but the models were not pretrained; they learned solely from the user's documents. The challenges included addressing white space issues and dealing with inferred schemas that were not intended. To improve the experience, the team set constraints for a self-service product: supporting any document schema and catering to both beginner and advanced Excel users. Through user research, they discovered that most users were either basic or advanced Excel users. As a result, Impira allowed users to create complex expressions and formulas, making the product more accessible and user-friendly for non-technical users. This approach ultimately led to the development of DocQuery, which simplifies the process of extracting data from documents using natural language queries.
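Excel-style expressions over extracted fields, as described above, could reduce to something like this minimal sketch. The syntax and helper are assumptions for illustration, not Impira's actual formula language:

```python
import re

def evaluate(expr: str, fields: dict) -> float:
    """Substitute field names into an arithmetic expression and evaluate it."""
    # Substitute longer field names first so e.g. "subtotal" is not
    # clobbered by a shorter field called "total".
    for name in sorted(fields, key=len, reverse=True):
        expr = expr.replace(name, repr(fields[name]))
    # Allow only digits, whitespace, and arithmetic tokens before eval().
    if not re.fullmatch(r"[\d\s.+\-*/()]+", expr):
        raise ValueError("unsupported token in expression")
    return eval(expr)  # restricted to arithmetic by the regex above

total = evaluate("subtotal * (1 + tax_rate)", {"subtotal": 100.0, "tax_rate": 0.08})
```

A basic Excel user writes nothing and gets defaults; an advanced one can express derived fields like the tax calculation above without leaving the product.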

    • Designing a machine learning approach for document processing with user feedback: The team built Impira's lightweight, quick-training model for document processing, and later introduced DocQuery to handle manual judgment and interpretation tasks, enhancing overall efficiency.

      The team aimed to create a machine learning approach that allows users to easily check the accuracy of predictions and provide feedback for incremental training. This design resulted in a lightweight, quick-training model. Users primarily interact with documents by integrating information from them into their workflows and asking analytical questions. These tasks often involve manual judgment and interpretation, which Impira's technology initially missed addressing. However, the team later introduced DocQuery to tackle these aspects. Overall, the goal is to make document processing more efficient and less manual, allowing users to focus on higher-level tasks.
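The label-and-retrain loop described above can be sketched as follows. The class is a toy stand-in for a real lightweight learner, purely to illustrate how each user correction folds straight back into the model:

```python
class IncrementalExtractor:
    """Toy model: remembers, per field, the value a user confirmed for
    each document layout, standing in for a real incremental learner."""

    def __init__(self):
        self.examples = {}  # (layout_id, field) -> confirmed value

    def predict(self, layout_id: str, field: str):
        return self.examples.get((layout_id, field))

    def feedback(self, layout_id: str, field: str, corrected_value: str):
        # Each correction is immediately available on the next prediction,
        # so training and evaluation stay fast and interactive.
        self.examples[(layout_id, field)] = corrected_value

model = IncrementalExtractor()
before = model.predict("acme-invoice", "total")   # no prediction yet
model.feedback("acme-invoice", "total", "$99.00")  # user supplies a label
after = model.predict("acme-invoice", "total")
```

The point of the design is the tight loop: a user sees a wrong prediction, fixes it, and the fix takes effect without an offline retraining job.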

    • Impira addresses pain points of nonprofits and small organizations with time-consuming data labeling tasks: Impira simplifies the data labeling process for nonprofits and small organizations using text-based question answering models, reducing the need for manual labeling and improving handling of various formats.

      Impira, a data extraction tool, identified the pain points of users, particularly in nonprofits and small organizations, who often handle administrative tasks that are time-consuming and not their areas of expertise. These tasks include providing labels for models to learn from, which can be tedious and time-consuming, especially when dealing with a wide variety of formats. In response, Impira aimed to improve the user experience by enabling users to work with any field they want and creating a simpler labeling process. They explored text-based question answering models, like those offered by Hugging Face, which proved to be surprisingly accurate even without specific training or context. This discovery led Impira to believe they could achieve better results with a little more effort, ultimately reducing the need for manual labeling and improving the tool's ability to handle a wide range of formats.

    • Impira's new question answering tool, DocQuery, was inspired by the potential of question answering frameworks: Impira recognized the potential of question answering frameworks and developed DocQuery to address the generalization problem and improve on existing solutions.

      The development of DocQuery, Impira's new question answering tool, was inspired by the open-ended possibilities offered by the question answering framework, aligning well with their product philosophy. This realization came about when the model, which had never seen documents like those being pasted into the text box, performed exceptionally well, indicating its potential to solve the generalization problem. This breakthrough occurred during a memorable car ride and late-night sessions. The recent announcement, on September 1st, also involved the integration of Hugging Face and its pipeline. Impira had been working on large language models, as mentioned on Twitter, and had collaborated with Hugging Face on this problem. The pipeline abstracts away the complex machinery, making it easier for non-experts to work with models. The question answering pipeline, specifically, caught their attention due to its compatibility with models that fit the question answering framework. They were also aware of Microsoft's LayoutLM, a language model that takes both text and bounding boxes as input, introducing geometric information relevant to their problem. However, they couldn't find a question answering pipeline that worked with LayoutLM, leading them to believe there was an opportunity to innovate and improve the existing solution.
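LayoutLM's pairing of text with geometry can be illustrated roughly like this. The 0-1000 box scaling follows LayoutLM's published convention; the function name and example coordinates are assumptions for illustration:

```python
def to_layoutlm_boxes(words, page_width, page_height):
    """Scale each word's pixel bounding box into LayoutLM's 0-1000
    coordinate space so geometry is comparable across page sizes."""
    encoded = []
    for text, (x0, y0, x1, y1) in words:
        box = (
            int(1000 * x0 / page_width),
            int(1000 * y0 / page_height),
            int(1000 * x1 / page_width),
            int(1000 * y1 / page_height),
        )
        encoded.append((text, box))
    return encoded

# One OCR word near the bottom of a US-Letter page (612x792 points).
pairs = to_layoutlm_boxes([("Total:", (50, 700, 110, 716))], 612, 792)
```

Feeding the model (word, box) pairs like these is what lets it reason that, say, the number to the right of "Total:" is probably the answer to "What is the total?".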

    • Open-source initiatives foster innovation and opportunities: Open-source projects can lead to wider access, innovation, and business opportunities. Teams can benefit from personal drive, potential distribution, and confidence in their unique value proposition.

      Open-source initiatives can bring significant benefits to both the community and the entrepreneur. In this case, a team identified a gap in document-based question answering and collaborated with Hugging Face to create an easy-to-use solution. They open-sourced their contribution, motivated by personal innovation drive, potential distribution opportunities, and confidence in their proprietary strategy. Open-source distribution not only allows for wider access and innovation but also exposes the company to potential customers and builds credibility. The team's confidence comes from their unique value proposition: real-time data flywheel, ease of use, and advanced integrations, which are challenging to build and engineer independently. By open-sourcing their solution, they can still thrive with their proprietary product's core features. This story demonstrates that open-source initiatives can foster innovation, build community, and create business opportunities.

    • Lack of expertise as a strength for non-experts starting a business in tech: Starting a tech business as a non-expert can provide a unique perspective and identify market gaps. Being open-minded and user-focused can lead to innovative solutions and a successful business.

      Having a fresh perspective as a non-expert in a field can be an asset when starting a business, particularly in technology. Impira founder Ankur Goyal shared how his lack of deep learning expertise initially gave him a unique perspective on making complex models more accessible to non-experts. He emphasized the importance of being naive and open-minded, which allowed him to identify gaps in the market and understand user needs. Additionally, Goyal discussed the benefits of open-source business models, which can lead to higher adoption rates due to increased accessibility and ease of use. Looking ahead, the team plans to continue developing user-friendly tools and expanding their offerings in the natural language processing space. This approach of combining user needs with technical expertise can lead to innovative solutions and a successful business.

    • Expanding DocQuery's capabilities for complex document queries: DocQuery, a question answering framework, is enhancing its features to identify document types, extract table data, and query across multiple documents. The team is confident in its progress towards these goals.

      DocQuery, a question answering framework developed by Impira, is making significant strides in expanding its capabilities to answer more complex questions about documents. Currently, users are asking for the ability to identify document types and extract information from tables, which the team plans to address in the near term. Additionally, the team is working on enabling users to ask natural language questions across multiple documents, a feature that is currently in the training phase. The team is confident in their ability to expand the question answering framework to support these features due to its flexibility and the success they've had with similar functionalities in Impira's product. The ultimate goal is to enable users to ask complex queries over a pile of documents, such as finding all invoices due next month or identifying the most relevant invoice from a vendor for a contract. While the team is making progress, there are still a few moving parts to figure out before this feature becomes widely available.
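A query like "all invoices due next month" over already-extracted fields might reduce to a filter like this sketch. The field names and helper are hypothetical, standing in for the structured output of a multi-document query:

```python
from datetime import date

def due_next_month(invoices, today):
    """Filter extracted invoice records to those due in the calendar
    month after `today` (one natural-language query, made concrete)."""
    year = today.year + (1 if today.month == 12 else 0)
    month = 1 if today.month == 12 else today.month + 1
    return [inv for inv in invoices
            if inv["due_date"].year == year and inv["due_date"].month == month]

invoices = [
    {"vendor": "Acme", "due_date": date(2022, 10, 5)},
    {"vendor": "Globex", "due_date": date(2022, 11, 1)},
]
matches = due_next_month(invoices, today=date(2022, 9, 20))
```

The hard part the team describes is upstream of this filter: turning a free-form question into the right fields and predicates across a pile of heterogeneous documents.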

    • Exploring Commanding Data with DocQuery: The DocQuery team is developing a system for users to type actions related to documents, called "commanding data", which has the potential to make interacting with data more intuitive and powerful. They plan to open-source parts of the project to engage the community and reach a larger audience.

      The team behind DocQuery, a document querying system, is exploring the idea of enabling users to type actions related to documents, rather than just asking questions. This approach, which they call "commanding data," has the potential to make interacting with data more intuitive and powerful. They plan to open-source parts of DocQuery to engage the community and tap into different use cases and domains. The team sees this as a significant shift in how people work with data, and they believe that open sourcing the project will help them reach a larger audience and achieve greater impact. DocQuery's vision is to make it simple for anyone to ask anything of any data and easily sequence the parts together. Despite the challenges, they are excited about the possibilities and the potential benefits for a wide range of users.
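"Commanding data" might be sketched as a thin layer that maps typed verbs onto document actions. The verbs and registry below are purely illustrative; DocQuery's real command set may look entirely different:

```python
# Registry mapping a typed verb onto a document action.
ACTIONS = {}

def command(verb):
    """Decorator that registers a function as the handler for `verb`."""
    def register(fn):
        ACTIONS[verb] = fn
        return fn
    return register

@command("tag")
def tag(doc, label):
    # A hypothetical action: attach a label to the document record.
    doc.setdefault("tags", []).append(label)
    return doc

def run(line, doc):
    """Parse 'verb argument' and dispatch to the registered action."""
    verb, _, arg = line.partition(" ")
    return ACTIONS[verb](doc, arg)

doc = run("tag overdue", {"name": "invoice-1042.pdf"})
```

The appeal of the design is that questions ("what is the total?") and commands ("tag overdue") share one typed interface, so sequencing them into small workflows becomes natural.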

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Stanford's AI Index Report 2024

    We’ve had representatives from Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) on the show in the past, but we were super excited to talk through their 2024 AI Index Report after such a crazy year in AI! Nestor from HAI joins us in this episode to talk about some of the main takeaways including how AI makes workers more productive, the US is increasing regulations sharply, and industry continues to dominate frontier AI research.

    Apple Intelligence & Advanced RAG

    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

    The perplexities of information retrieval

    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data

    We’ve all heard about breaches of privacy and leaks of private health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they deploy edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs

    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress

    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents

    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach, from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs

    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Related Episodes

    When data leakage turns into a flood of trouble

    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)

    The new Stable Diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on Twitter, Reddit, Discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things Stable Diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology

    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)

    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically they dive into the T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. Daniel provides a brief tour of the possible with the T0 family. They finish up with a couple of new learning resources.