
    Chunking express: An expert breaks down how to build your RAG system

    March 05, 2024
    What is Retrieval Augmented Generation (RAG)?
    How does chunking aid in machine content understanding?
    What challenges arise from user query limitations in RAG?
    What techniques improve query accuracy in search systems?
    Why is evaluating business cases important for GenAI implementation?

    Podcast Summary

    • Retrieval Augmented Generation (RAG): RAG, used with LLMs, makes content easier for machines to understand through chunking and embedding. Chunking is important but not the only piece of the solution, and the best approach depends on the complexity of the text.

      The key takeaway from this episode of the Stack Overflow podcast is the discussion of Retrieval Augmented Generation (RAG) and the role of LLMs in making content easier for machines to understand. Roie Schwaber-Cohen, a staff developer advocate at Pinecone, shared insights on chunking, embedding, and getting these models to behave like a natural language operating system. The idea of hand-marking up content so machines can understand it is reminiscent of the semantic web and XML, and since that foundation is already in place, it's not a massive leap to get there. Chunking is an essential part of the process, but it's not the most crucial piece: even if your chunking isn't perfect, you can still recover. The key is determining whether simple recursive text segmentation is sufficient or whether you need a more elaborate solution, such as markdown segmentation, a point Ryan also emphasized. Overall, the episode highlighted the significance of RAG and its potential impact on how we interact with and understand data.
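
      To make the chunking discussion concrete, here is a minimal sketch of the two approaches mentioned above: a simple recursive splitter that falls back through progressively finer separators, and a structure-aware variant that splits on markdown headings first. The separator list, chunk size, and helper names are illustrative assumptions rather than anything prescribed in the episode.

```python
# Minimal sketch of recursive text splitting vs. markdown-aware splitting.
# Separators, chunk size, and function names are illustrative assumptions.
import re

def recursive_split(text, max_len=500, separators=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest separator available, then pack pieces into chunks
    no longer than max_len, recursing on any piece that is still too long."""
    if len(text) <= max_len:
        return [text]
    sep = next((s for s in separators if s in text), "")
    pieces = text.split(sep) if sep else list(text)
    finer = separators[separators.index(sep) + 1:] if sep else ()
    chunks, current = [], ""
    for piece in pieces:
        if len(piece) > max_len:                      # still too big: recurse
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, max_len, finer))
        elif len(current) + len(sep) + len(piece) <= max_len:
            current = current + sep + piece if current else piece
        else:                                         # current chunk is full
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

def markdown_split(text, max_len=500):
    """Split on markdown headings first, then recursively within each section."""
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    return [c for s in sections if s.strip() for c in recursive_split(s, max_len)]
```

      Production splitters typically also add overlap between adjacent chunks, but the fallback structure is the same.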

    • Semantic web and GenAI implementation: Evaluate benefits vs. costs, consider testing different models, decide between custom or open-source, and assess data control, legal, and governance risks before implementing a semantic web and GenAI system in an organization.

      The semantic web, a long-standing dream of connecting data and understanding relationships between information, is becoming a reality through the use of semantic agents. These agents can search the web, identify relationships, and potentially assist users by pulling relevant information together. However, before implementing a Generative AI (GenAI) system in an organization, it's essential to consider the business case and evaluate the benefits against the costs. This may involve testing different models, deciding between training a custom model or using an open-source one, and considering factors such as data control, legal and governance risks, and data ownership. Ultimately, careful consideration and planning can lead to effective implementation and utilization of GenAI technology.

    • Creating LLMs: Creating large language models from scratch is a high-cost, high-effort endeavor that requires extensive resources and expertise. Fine-tuning existing models can be a more feasible option for smaller companies, but it still requires specialized expertise and resources, and neither approach fundamentally changes the way the model operates.

      Creating large language models (LLMs) from scratch or fine-tuning existing models for specific use cases come with their unique challenges and costs. Let's break down the discussion. Creating your own LLM is a high-cost, high-effort endeavor. It requires an extensive amount of data and human resources to label, train, and maintain the model. This approach is best suited for large corporations with the necessary resources. Alternatively, fine-tuning existing models can be a more feasible option for smaller companies. This approach involves taking a pre-existing model and adapting it to a specific industry or use case by providing labeled data. However, the cost remains high due to the need for specialized expertise and resources. Moreover, fine-tuning doesn't fundamentally change the way the model operates. The model still generates responses based on its pre-existing knowledge and the new data provided, meaning it continues to "hallucinate" or generate responses based on its initial programming. In conclusion, while both approaches have their merits, they come with significant costs and challenges. Companies need to carefully consider their resources, goals, and industry specifics before deciding which path to take.

    • RAG's accessibility for building AI apps: RAG's simplicity, explainability, and affordability make it an effective and accessible solution for teams without ML engineers to build AI applications.

      RAG (Retrieval Augmented Generation) is an effective and accessible solution for building AI applications, especially for teams without ML engineers. Its simplicity and explainability make it a valuable tool for understanding and challenging the system's responses. Additionally, the rise of open source models and decreasing costs of compute resources mean that having an in-house data science team is no longer a requirement for building an effective AI application. However, there are edge cases where fine-tuning may be necessary, such as in fields like petrochemicals, molecular biology, or other specialized areas where a large language model may not have sufficient knowledge. Overall, RAG's affordability, ease of use, and transparency make it a promising option for most users.
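
      As a rough illustration of why RAG is approachable without a dedicated ML team, here is a minimal retrieval-and-prompt loop. The bag-of-words "embedding" and the final prompt hand-off are toy placeholders standing in for a real embedding model and LLM call; none of the names below come from the episode.

```python
import math
from collections import Counter

def embed_text(text):
    """Toy bag-of-words 'embedding' that keeps this sketch self-contained;
    a real system would call an embedding model here."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def answer(question, chunks, top_k=3):
    """Naive RAG flow: embed the question, retrieve the nearest chunks,
    and hand them to an LLM as context."""
    q = embed_text(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed_text(c)), reverse=True)
    context = "\n\n".join(ranked[:top_k])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return prompt  # in a real system: return llm.generate(prompt)
```

      Because every step is inspectable (which chunks were retrieved, what prompt was built), it is easy to explain or challenge a given answer, which is the transparency point made above.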

    • Query fusion and multi-query generation: To make vector databases and embedding-based search more effective despite imprecise user queries, techniques like query fusion and multi-query generation break the user's initial query into multiple distinct queries, yielding more focused, relevant information and more accurate results.

      Vector databases like Pinecone offer hybrid search capabilities, combining dense and sparse embeddings as well as traditional semantic search methods with modern embedding-based search. This is particularly useful in domain-specific fields with unique terminology. However, even with advanced techniques like BERT and RoBERTa, there are limitations to using a naive flow of user query, embedding, and database search. One challenge is the user's lack of knowledge or ability to ask precise questions. For instance, a query like "plan a trip to Tokyo" may not yield sufficient information for effective planning. To address this, techniques like query fusion, specifically multi-query generation, are employed. This process involves breaking down the user's initial query into multiple distinct queries, allowing for more comprehensive and accurate results. For example, "what museums can I visit in Tokyo?" can provide more focused and relevant information for planning a trip. As we delve deeper into the realm of generative AI, it's crucial to acknowledge and address these challenges to ensure the best possible outcomes.
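
      A rough sketch of how multi-query generation and fusion could be wired together: an LLM expands the vague request into several focused sub-queries, each sub-query is run against the index, and the ranked lists are merged. The generate_subqueries and search helpers are hypothetical stand-ins for your LLM and vector-database calls, and reciprocal rank fusion is just one common way to merge the lists, not necessarily what any particular product does internally.

```python
def generate_subqueries(query):
    """Stand-in for an LLM call that rewrites a vague request into focused queries,
    e.g. 'plan a trip to Tokyo' -> museums, neighborhoods, and transit questions."""
    return [f"{query} (subquery {i})" for i in range(3)]  # stub for the sketch

def search(query, top_k=5):
    """Stand-in for a vector-database query returning ranked document ids."""
    return [f"doc-{hash((query, rank)) % 100}" for rank in range(top_k)]  # stub

def fused_search(query, k=60):
    """Run every generated sub-query and merge rankings with reciprocal rank fusion."""
    scores = {}
    for sub in generate_subqueries(query):
        for rank, doc_id in enumerate(search(sub)):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```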

    • Query expansion and re-ranking: Expand user queries with multi-query generation and query expansion techniques to better capture user intent. Use re-ranking to reorder results with machine learning models based on the user's query. Corrective re-ranking assesses whether the result set is adequate and modifies it as needed for a better user experience.

      To effectively answer user queries, especially those about complex topics like traveling in Tokyo, it's essential to go beyond the initial query and expand it using multi-query generation and query expansion techniques. This approach allows us to understand the user's intent better and retrieve more accurate and relevant information from our knowledge base. However, even with these advanced techniques, the results may not always be perfect. Therefore, it's crucial to employ methods like re-ranking and corrective re-ranking to further refine the results. Re-ranking uses machine learning models to reorder the results based on the user's query, ensuring that the most relevant results appear at the top. Corrective re-ranking, on the other hand, assesses the adequacy of the result set and modifies it if necessary using tools from the world of agents. In essence, these techniques add more steps in between the query and the final answer to augment the results and provide a better user experience. By expanding queries, re-ranking results, and making corrections as needed, we can significantly improve the accuracy and relevance of the answers we provide to users.
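
      Re-ranking is commonly done with a cross-encoder that scores each (query, document) pair directly. Below is a minimal sketch using the sentence-transformers CrossEncoder class; the specific model name is an assumption, and the episode does not prescribe a particular library.

```python
from sentence_transformers import CrossEncoder

# Model choice is an illustrative assumption; any cross-encoder reranker works.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, documents, top_k=5):
    """Score every (query, document) pair and reorder candidates by relevance."""
    scores = reranker.predict([(query, doc) for doc in documents])
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

# Usage: refine the candidates that came back from the vector database.
candidates = ["Tokyo has many museums...", "Flight prices vary...", "Local trains..."]
best = rerank("what museums can I visit in Tokyo?", candidates, top_k=2)
```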

    • Corrective RAG and knowledge graphs: Combining corrective RAG with knowledge graphs allows LLMs to be used effectively in traditional systems by making search queries more precise and enabling semantic processing and deduction.

      To effectively utilize large language models (LLMs) alongside a more traditional system, such as a SQL database or a graph database, we can employ a technique called "corrective RAG." This method uses the LLM's response and the original query to build a more precise search query, retrieve additional content, and refine the response. The same approach can be applied to improve the clarity of ambiguous answers. Previously, I worked in traditional AI fields, where we used LLMs for specific tasks but also employed rules-based, classical symbolic AI to aid in decision-making processes. However, most of what I've seen so far in the context of GenAI focuses primarily on retrieval. To bridge the gap between an LLM and a fully semantic query, we can look to knowledge graphs as a solution. Knowledge graphs, which grew out of graph databases and triples, consist of facts represented as entities and relationships. For instance, "Joe knows Bob," "Bob is friends with Joe," or "Bob works at Costco" are facts that can be represented on a graph. Reasoners can then deduce information from these facts: given the triple "Bob works at Costco," the question "Who works at Costco?" yields "Bob." By combining the power of LLMs for retrieval and understanding of natural language queries with knowledge graphs and reasoners for semantic processing and deduction, we can effectively use LLMs in more traditional systems.
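
      As a toy illustration of the triples-and-reasoning idea, here is a minimal in-memory "knowledge graph" with a single inference rule. The facts and the rule are illustrative assumptions; a real system would use a graph database and a proper reasoner.

```python
# Facts stored as (subject, predicate, object) triples, as described above.
triples = {
    ("Joe", "knows", "Bob"),
    ("Bob", "is_friends_with", "Joe"),
    ("Bob", "works_at", "Costco"),
}

def infer(facts):
    """Tiny 'reasoner': add the symmetric form of every friendship fact."""
    inferred = set(facts)
    for s, p, o in facts:
        if p == "is_friends_with":
            inferred.add((o, p, s))
    return inferred

def who(predicate, obj, facts):
    """Answer questions like 'Who works at Costco?' by pattern-matching triples."""
    return [s for s, p, o in facts if p == predicate and o == obj]

print(who("works_at", "Costco", infer(triples)))  # -> ['Bob']
```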

    • NLP and a multi-database approach: LLMs can generate Cypher queries for graph databases, but this method is brittle and may not produce accurate results. A more effective solution is to combine several types of databases, including vector, graph, SQL, and document-based databases, to give the LLM accurate context and generate faithful queries.

      Large language models can be used to generate structured queries for graph databases, such as Cypher queries for Neo4j. This approach involves asking a question in natural language and having the model generate the corresponding Cypher query. However, this method is brittle and may not always produce accurate results, because the generated query can mismatch the graph schema, or the graph may simply not contain the required information. Therefore, a more effective solution may be to combine various types of databases, including vector databases, graph databases, SQL-based databases, and document-based databases, to provide the LLM with accurate and relevant context. This multi-database approach helps ensure that the LLM receives accurate information and generates faithful queries. Additionally, there are extensions and modules in frameworks like LangChain that can convert open-ended questions into SQL queries, further expanding the possibilities for data retrieval and analysis. Overall, combining LLMs with various types of databases can lead to more effective and accurate data processing and analysis.
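
      A rough sketch of the text-to-Cypher pattern described above: the graph schema is included in the prompt so the model has something concrete to target, and the generated query is then executed. The llm and run_cypher helpers are hypothetical placeholders (LangChain and the Neo4j driver each provide their own wrappers, not shown here), and the schema is made up for illustration.

```python
SCHEMA = """
(:Person {name}) -[:KNOWS]-> (:Person)
(:Person) -[:WORKS_AT]-> (:Company {name})
"""  # illustrative schema, not from the episode

def llm(prompt: str) -> str:
    """Placeholder for whatever LLM endpoint you use."""
    raise NotImplementedError

def run_cypher(query: str):
    """Placeholder for a graph-database call, e.g. a Neo4j session."""
    raise NotImplementedError

def ask_graph(question: str):
    """Generate a Cypher query from natural language, then execute it."""
    prompt = (
        "Given this graph schema:\n"
        f"{SCHEMA}\n"
        "Write a single Cypher query (no explanation) that answers:\n"
        f"{question}"
    )
    cypher = llm(prompt).strip()
    # Brittle step: the model may emit a query that does not match the schema,
    # so validate the query or fall back to other stores (vector, SQL, documents).
    return run_cypher(cypher)
```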

    • AI systems integration: AI systems are evolving through the integration of SQL databases and traditional machine learning models, which add logic and machine learning capabilities in a more conventional way and improve accuracy and efficiency.

      SQL databases and traditional machine learning models are being integrated more into AI systems, particularly in areas like reranking and decision-making processes. This integration allows for more logic and machine learning capabilities to be added to existing systems in a more traditional way. While AI 1.0 technologies like classifiers and reasoners have not yet become common in the pipeline in a substantial way, they hold potential for improving the accuracy and efficiency of AI systems. During the discussion, the hosts also acknowledged the contributions of the Stack Overflow community, specifically mentioning a question about TypeScript arrow functions with generics that received a large number of views. The hosts encouraged listeners to engage with the community and leave ratings and reviews if they enjoyed the show. As for the hosts themselves, Ben Popper, director of content at Stack Overflow, can be found on Twitter @benpopper. Ryan Donovan, editor of the Stack Overflow blog, can be reached on Twitter @rthordonovan. And Roie Schwaber-Cohen, staff developer advocate at Pinecone, can be found on Pinecone's website and on Twitter @roycercohen. Overall, the conversation highlighted the ongoing evolution of AI systems and the importance of integrating traditional technologies like SQL databases and machine learning models to improve their capabilities.

    Recent Episodes from The Stack Overflow Podcast

    The world’s largest open-source business has plans for enhancing LLMs

    Red Hat Enterprise Linux may be the world’s largest open-source software business. You can dive into the docs here.

    Created by IBM and Red Hat, InstructLab is an open-source project for enhancing LLMs. Learn more here or join the community on GitHub.

    Connect with Scott on LinkedIn.  

    User AffluentOwl earned a Great Question badge by wondering How to force JavaScript to deep copy a string?

    The evolution of full stack engineers

    From her early days coding on a TI-84 calculator, to working as an engineer at IBM, to pivoting over to her new role in DevRel, speaking, and community, Mrina has seen the world of coding from many angles. 

    You can follow her on Twitter here and on LinkedIn here.

    You can learn more about CKEditor here and TinyMCE here.

    Congrats to Stack Overflow user NYI for earning a Great Question badge by asking: 

    How do I convert a bare git repository into a normal one (in-place)?

    The Stack Overflow Podcast
    September 10, 2024

    At scale, anything that could fail definitely will

    Pradeep talks about building at global scale and preparing for inevitable system failures. He talks about extra layers of security, including viewing your own VMs as untrustworthy. And he lays out where he thinks the world of cloud computing is headed as GenAI becomes a bigger piece of many companies’ tech stacks. 

    You can find Pradeep on LinkedIn. He also writes a blog and hosts a podcast over at Oracle First Principles

    Congrats to Stack Overflow user shantanu, who earned a Great Question badge for asking: 

    Which shell I am using in mac?

     Over 100,000 people have benefited from your curiosity.

    The Stack Overflow Podcast
    September 03, 2024

    Mobile Observability: monitoring performance through cracked screens, old batteries, and crappy Wi-Fi

    You can learn more about Austin on LinkedIn and check out a blog he wrote on building the SDK for OpenTelemetry here.

    You can find Austin at the CNCF Slack community, in the OTel SIG channel, or the client-side SIG channels. The calendar is public on opentelemetry.io. Embrace has its own Slack community to talk all things Embrace or all things mobile observability. You can join that by going to embrace.io as well.

    Congrats to Stack Overflow user Cottentail for earning an Illuminator badge, awarded when a user edits and answers 500 questions, both actions within 12 hours.

    Where does Postgres fit in a world of GenAI and vector databases?

    For the last two years, Postgres has been the most popular database among respondents to our Annual Developer Survey. 

    Timescale is a startup working on an open-source PostgreSQL stack for AI applications. You can follow the company on X and check out their work on GitHub.

    You can learn more about Avthar on his website and on LinkedIn

    Congrats to Stack Overflow user Haymaker for earning a Great Question badge. They asked: 

    How Can I Override the Default SQLConnection Timeout?

    Nearly 250,000 other people have been curious about this same question.

    Ryan Dahl explains why Deno had to evolve with version 2.0

    If you’ve never seen it, check out Ryan’s classic talk, 10 Things I Regret About Node.js, which gives a great overview of the reasons he felt compelled to create Deno.

    You can learn more about Ryan on Wikipedia, his website, and his GitHub page.

    To learn more about Deno 2.0, listen to Ryan talk about it here and check out the project’s GitHub page here.

    Congrats to Hugo G, who earned a Great Answer Badge for his input on the following question: 

    How can I declare and use Boolean variables in a shell script?

    Battling ticket bots and untangling taxes at the frontiers of e-commerce

    You can find Ilya on LinkedIn here.

    You can listen to Ilya talk about Commerce Components here, a system he describes as a "modern way to approach your commerce architecture without reducing it to a (false) binary choice between microservices and monoliths."

    As Ilya notes, “there are a lot of interesting implications for runtime and how we're solving it at Shopify. There is a direct bridge there to a performance conversation as well: moving untrusted scripts off the main thread, sandboxing UI extensions, and more.” 

    No badge winner today. Instead, user Kaizen has a question about Shopify that still needs an answer. Maybe you can help! 

    How to Activate Shopify Web Pixel Extension on Production Store?

    Scaling systems to manage the data about the data

    Coalesce is a solution to transform data at scale. 

    You can find Satish on LinkedIn

    We previously spoke to Satish for a Q&A on the blog: AI is only as good as the data: Q&A with Satish Jayanthi of Coalesce

    We previously covered metadata on the blog: Metadata, not data, is what drags your database down

    Congrats to Lifeboat winner nwinkler for saving this question with a great answer: Docker run hello-world not working