
    Tales at Scale

    Tales at Scale cracks open the world of analytics projects. We’ll be diving into Apache Druid but also hearing from folks in the data ecosystem tackling everything from architecture to open source, from scaling to streaming, and everything in between. Brought to you by Imply.
    en-us · 24 Episodes

    Episodes (24)

    A Year in Review: Apache Druid's 2023 Highlights with Peter Marshall

    In this special episode of Tales at Scale, the final episode of our first season, Peter Marshall, Director of Developer Relations at Imply, joins the show to discuss the highlights of 2023 for Apache Druid. We dive into the significant feature releases and enhancements that have transformed Druid over the past year, including SQL standardization, query from deep storage, and experimental window functions, as well as the growing Druid community. Come for the retrospective, stay for the peek into the future of what’s to come for us and for Druid in 2024. See you all next year!

    From ANSI SQL Support to Multi-topic Kafka Ingestion: What's New in Apache Druid 28 with Will Xu

    On this episode, we dive into Apache Druid 28. This latest Druid release includes improved ANSI SQL and Apache Calcite support, the addition of window functions as an experimental feature, async queries and query from deep storage going GA, array enhancements, multi-topic Apache Kafka ingestion, and so much more! Will Xu, program manager at Imply, returns to give us the full scoop.

    Druid and Joins Debunked! with Sergio Ferragut and Hellmar Becker

    On this episode, we debunk the myth that Druid can't do joins. Druid doesn't function as a traditional relational database because it was purpose-built for lightning-fast queries on large datasets. However, this doesn't mean Druid is entirely devoid of join capabilities; it simply approaches them differently. Our myth-busting team features returning guests Sergio Ferragut and Hellmar Becker from Imply, who clarify how Druid handles joins in its own unique way and tackle what Druid is for in the first place.

    Scaling with Speed: How Atlassian's Confluence Big Data Platform Team Delivers Customer-Facing Insights with Apache Druid with Gautam Jethwani and Kasirajan Selladurai Selvakumari

    On this episode, we explore how Atlassian leverages Apache Druid's capabilities to handle millions of daily events and empower users with intelligent data-driven features. We’re joined by Gautam Jethwani and Kasirajan Selladurai Selvakumari from the Confluence Big Data Platform Team who will talk through how they use Druid to power intelligent features, sub-second query latency, and complex ingestion tasks.

    Fraud Fighters: How Apache Druid and Imply help Ibotta combat fraud with faster anomaly detection with Jaylyn Stoesz

    When it comes to fraud detection, initial detection is key, but so is the ability to quickly dissect and address the problem to minimize losses. This means access to real-time data is paramount. The only way to combat fraud in the digital age is to fight fire with fire: automation with automation. In this episode, we’re joined by Jaylyn Stoesz, Staff Data Engineer at Ibotta, a free cashback rewards platform, who walks us through Ibotta’s multifaceted approach to fraud detection that includes Apache Druid and gives us the full scoop on their use of Imply Polaris.

    All things Apache Druid 27.0: From deep storage queries to new visualization with Will Xu

    We’re back again with another Druid release! Here we are at Apache Druid 27.0, thanks to the dedication of the Druid community. This release was made possible by over 350 commits and 46 contributors. Will Xu, Product Manager at Imply, joins the show to discuss new features like Smart Segment Loading, a new mechanism for managing data files as the database scales, improvements to schema auto-discovery, and the long-awaited feature: querying from deep storage!

    Orb and Apache Druid: Building customer trust through data correctness with Kshitij Grover

    Real-time data has many applications, but one place where it’s extremely valuable is usage tracking, billing, and report generation. Ensuring the freshness and availability of this data is essential not only for financial success but also for establishing something more challenging: trust. That's precisely why Orb chose Apache Druid and Imply as the backbone of their advanced pricing platform. This platform encompasses invoicing, usage monitoring, and comprehensive reporting. On this episode, Kshitij Grover, co-founder and CTO at Orb, guides us through their innovative utilization of Druid, allowing their users to assess metrics across queries and beyond. He also gives some great advice for those just starting on their real-time data journey. Definitely worth a listen...or two!

    Confluent, Kafka, Druid, and Flink: The Future of Streaming Data with Kai Waehner

    Apache Kafka® is a streaming platform that can handle large-scale, real-time data streams reliably. It’s used for real-time data pipelines, event sourcing, log aggregation, stream processing, and building analytics applications. Apache® Druid is a database designed to provide fast, interactive, and scalable analytics on time-series and event-based data, empowering organizations to derive insights, monitor real-time metrics, and build analytics applications. Naturally, these two things just go together and are often both key parts of a company’s data architecture. Confluent is one of those companies. On this episode, Kai Waehner, Field CTO at Confluent, walks us through how they use Kafka and Druid together and where Apache Flink fits into the mix, and shares insights and trends from the world of data streaming.

    Driving Innovation with Open Standards: How Voltron Data is Shaping the Data Ecosystem with Apache Arrow and Ibis with Josh Patterson

    Today's show is all about the world of big data and open source projects, and we've got a real gem to share with you: Voltron Data! They're on a mission to revolutionize the data analytics industry through open standards. To unleash the untapped potential in data, Voltron Data uses cutting-edge tech and provides top-notch support services, with a special focus on Apache Arrow. This open-source framework lets you process data in both flat and hierarchical formats, all packed into a super-efficient columnar memory setup. And that's not all! Meet Ibis, an amazing framework that gives data analysts, scientists, and engineers the power to access their data with a user-friendly and engine-agnostic Python library. Excited to learn more? We've got Josh Patterson, the CEO of Voltron Data, here to give us all the details.

    How Apache Druid Revolutionized Digital Turbine’s Analytics Infrastructure with Lioz Nudel and Alon Edelman

    Who better to talk about the real-world usage of Apache Druid than Digital Turbine, a leading mobile growth and monetization platform? The folks at DT go way back with Druid. On this episode, Lioz Nudel, Engineering Group Manager at Digital Turbine, and Alon Edelman, Data Architect at Digital Turbine, discuss how Druid has significantly improved their analytics infrastructure in terms of performance and scalability. We cover their journey from using MySQL to Druid, highlighting the scalability, performance, and agility that Druid offers, and delve into specific use cases, such as analyzing massive amounts of data and managing cloud computing costs.

    Decoding Emotions: Leveraging ChatGPT and Apache Druid for Sentiment Analysis with Rick Jacobs

    Whether you're a data engineer, data scientist, technology enthusiast, or just a person on the Internet, you’ve heard about ChatGPT. But did you know there are some great use cases for it that work with Apache Druid? Druid and ChatGPT are two cutting-edge technologies that are revolutionizing the world of real-time analytics and natural language processing. In this episode, we’re joined by Rick Jacobs, Senior Technical Evangelist at Imply, who will dive into the benefits of combining a trained NLP model with Apache Druid for sentiment analysis and how he did it with ChatGPT. We’ll also get into the broader applications of AI and how to use it responsibly, especially when you’re dealing with important datasets.

    Druid Operator: Simplifying the management of Apache Druid in Kubernetes with Adheip Singh

    Deploying and configuring Apache Druid manually in a Kubernetes environment can be complex and time-consuming. But it doesn’t have to be. Enter Druid Operator, a tool specifically designed for managing Apache Druid deployments in a Kubernetes environment. Adheip Singh, founder of DataInfra and contributor to Druid Operator, walks us through the benefits, including managing upgrades and rollbacks of Druid clusters, scaling Druid clusters, adding more storage to underlying persistent volume claims, and more.

    Apache Druid 26.0: Breaking Down Druid's Latest Release with Vadim Ogievetsky

    Breaking news! Apache Druid 26.0 is now available! Druid 26.0 has a few key features, including schema auto-discovery and shuffle JOINs, but that’s not all. On this episode, we’re joined by Vadim Ogievetsky, Apache Druid PMC member, co-founder of Imply, and one of the very first Druid users, to talk through what’s new and why it’s cool. Special thanks to the Apache Druid community: 60+ contributors and nearly 400 commits made this possible!

    The Dynamic Duo: Apache Druid and Kubernetes with Yoav Nordmann

    Kubernetes, an open-source container orchestration platform, has been making waves in the Apache Druid community. It makes sense: using Druid with Kubernetes can help you build a more scalable, flexible, and resilient data analytics infrastructure. Yoav Nordmann, Tech Lead and Architect at Tikal Knowledge, shares why Kubernetes is so hot right now, along with some of his own Apache Druid stories.

    Documenting Apache Druid Experiments with Hellmar Becker

    Working on/with Apache Druid is one thing, but talking about it is another. On today’s episode, we get tips and tricks for writing about your technical projects from Hellmar Becker, Apache Druid blogger and sales engineer at Imply. Spoiler alert: It doesn't have to be War and Peace. Learn how to get started with your own blog, the value of documenting your process as you go, and what to do when you hit challenges. 

    The World of Operational Visibility with Will To

    One of Apache Druid's top use cases is operational visibility, which involves monitoring, understanding, and optimizing systems in real time. If that sounds a little boring, you’re in for a treat. We talk about what you need to get started and then dive into some really interesting use cases for operational visibility across industries. Listen to learn about how operational visibility and IoT are making transportation safer and helping utility companies plan their energy needs or how it’s being used to stop money laundering at a major bank. Our guest, Will To, Sr. Product Evangelist at Imply, taps into his background as an educator to give us the full story, and some bonus insights.

    Speed, Scale, and Streaming: Building Analytics Applications with Darin Briskman

    What is an analytics application? We state at the top of every show that they’re different from BI tools, but so far we haven’t said why. It’s time we break it all down. Darin Briskman, Director of Technology at Imply and author of the O'Reilly report Building Real Time Analytics Applications: Operational Workflows with Apache Druid, joins us to talk about how real-time data, the growth of streaming, and more have created a new world of analytics applications, and how to get started if you want to build your own.

    Everything You Need to Know About SQL-Based Ingestion in Apache Druid with Sergio Ferragut

    If you’re a fan of SQL, this episode is for you. The addition of the multi-stage query engine in Apache Druid has enabled SQL-based ingestion. While that’s not something new to the database space, it makes Druid easier to use since more developers know SQL. Sergio Ferragut, Senior Developer Advocate at Imply, walks us through using SQL to define and run batch ingestions and how it’s simpler and faster than traditional Druid batch ingestion.

    Accurate, Validated, and Real Time: Diving into Reddit’s Druid-powered Ad Platform with Lakshmi Ramasubramanian

    How do ads work on the “front page of the internet?” On today’s episode, staff software engineer at Reddit Lakshmi Ramasubramanian discusses Reddit’s ad platform, including how it handles ad pacing, real-time data, and more. We’ll dive into the challenges they needed to solve and why Apache Druid was the right database for the job.

    The Tale of Two Vehicles: Apache Druid's New Shape Takes Form

    Apache Druid today isn’t the Druid that you’re used to. It’s so much more. The addition of the multi-stage query engine didn’t just change the way Druid handles queries; it also enabled data transformation on ingestion and inside of Druid, from one table to another, using SQL. This has made Druid about 40% faster. But why stop there? Get the inside scoop on what’s coming to Druid this year, from cold tier storage to asynchronous queries and more.

    © 2024 Podcastworld. All rights reserved
