The Dynamic Duo: Apache Druid and Kubernetes with Yoav Nordmann

May 16, 2023

Tales at Scale

About this Episode

Kubernetes, an open-source container orchestration platform, has been making waves in the Apache Druid community. It makes sense: using Druid with Kubernetes can help you build a more scalable, flexible, and resilient data analytics infrastructure. Yoav Nordmann, Tech Lead and Architect at Tikal Knowledge, shares why Kubernetes is so hot right now, along with some of his own Apache Druid stories.

Recent Episodes from Tales at Scale

A Year in Review: Apache Druid's 2023 Highlights with Peter Marshall

In this special episode of Tales at Scale (the final episode of our first season!), Peter Marshall, Director of Developer Relations at Imply, joins the show to discuss the highlights of 2023 for Apache Druid. We dive into the significant feature releases and enhancements that have transformed Druid over the past year, including SQL standardization, query from deep storage, experimental window functions, and the growing Druid community. Come for the retrospective, stay for the peek into the future of what’s to come for us and for Druid in 2024. See you all next year!

Tales at Scale

December 28, 2023

From ANSI SQL Support to Multi-topic Kafka Ingestion: What's New in Apache Druid 28 with Will Xu

On this episode, we dive into Apache Druid 28. This latest Druid release includes improved ANSI SQL and Apache Calcite support, the addition of window functions as an experimental feature, async queries and query from deep storage going GA, array enhancements, multi-topic Apache Kafka ingestion, and so much more! Will Xu, program manager at Imply, returns to give us the full scoop.

Tales at Scale

December 12, 2023

Druid and Joins Debunked! with Sergio Ferragut and Hellmar Becker

On this episode, we debunk the myth that Druid can't do joins. Druid doesn't function as a traditional relational database because it was purpose-built for lightning-fast queries on large datasets. However, this doesn't mean Druid is entirely devoid of join capabilities – it simply approaches them differently. Our myth-busting team features returning guests Sergio Ferragut and Hellmar Becker from Imply, who clarify how Druid handles joins in its own unique way and tackle what Druid is for in the first place.

Tales at Scale

November 16, 2023

joins

query performance

apache druid

Scaling with Speed: How Atlassian's Confluence Big Data Platform Team Delivers Customer-Facing Insights with Apache Druid with Gautam Jethwani and Kasirajan Selladurai Selvakumari

On this episode, we explore how Atlassian leverages Apache Druid's capabilities to handle millions of daily events and empower users with intelligent data-driven features. We’re joined by Gautam Jethwani and Kasirajan Selladurai Selvakumari from the Confluence Big Data Platform Team, who talk through how they use Druid to power intelligent features, achieve sub-second query latency, and handle complex ingestion tasks.

Tales at Scale

October 11, 2023

Fraud Fighters: How Apache Druid and Imply help Ibotta combat fraud with faster anomaly detection with Jaylyn Stoesz

When it comes to fraud detection, initial detection is key, but so is the ability to quickly dissect and address the problem to minimize losses. This means access to real-time data is paramount. The only way to combat fraud in the digital age is to fight fire with fire…automation with automation. In this episode, we’re joined by Jaylyn Stoesz, Staff Data Engineer at Ibotta, a free cashback rewards platform, who walks us through Ibotta’s multifaceted approach to fraud detection that includes Apache Druid and gives us the full scoop on their use of Imply Polaris.

Tales at Scale

September 26, 2023

fraud detection

real time data

apache druid

All things Apache Druid 27.0: From deep storage queries to new visualization with Will Xu

We’re back again with another Druid release! Here we are at Apache Druid 27.0, thanks to the dedication of the Druid Community. This release was made possible by over 350 commits and 46 contributors. Will Xu, Product Manager at Imply, joins the show to discuss new features like Smart Segment Loading (a new mechanism for managing data files as the database scales), improvements to schema auto-discovery, and the long-awaited ability to query from deep storage!

Tales at Scale

September 06, 2023

Orb and Apache Druid: Building customer trust through data correctness with Kshitij Grover

Real-time data has many applications, but one place where it’s extremely valuable is usage tracking, billing, and report generation. Ensuring the freshness and availability of this data is essential not only for financial success but also for establishing something harder to earn: trust. That's precisely why Orb chose Apache Druid and Imply as the backbone of their advanced pricing platform. This platform encompasses invoicing, usage monitoring, and comprehensive reporting. On this episode, Kshitij Grover, co-founder and CTO at Orb, guides us through their innovative utilization of Druid, allowing their users to assess metrics across queries and beyond. He also gives some great advice for those just starting on their real-time data journey. Definitely worth a listen...or two!

August 22, 2023

Confluent, Kafka, Druid, and Flink: The Future of Streaming Data with Kai Waehner

Apache Kafka® is a streaming platform that can handle large-scale, real-time data streams reliably. It’s used for real-time data pipelines, event sourcing, log aggregation, stream processing, and building analytics applications. Apache® Druid is a database designed to provide fast, interactive, and scalable analytics on time-series and event-based data, empowering organizations to derive insights, monitor real-time metrics, and build analytics applications. Naturally, these two go together and are often both key parts of a company’s data architecture. Confluent is one of those companies. On this episode, Kai Waehner, Field CTO at Confluent, walks us through how they use Kafka and Druid together and where Apache Flink fits into the mix, and shares insights and trends from the world of data streaming.

August 08, 2023

Driving Innovation with Open Standards: How Voltron Data is Shaping the Data Ecosystem with Apache Arrow and Ibis with Josh Patterson

Today's show is all about the world of big data and open source projects, and we've got a real gem to share with you—Voltron Data! They're on a mission to revolutionize the data analytics industry through open standards. To unleash the untapped potential in data, Voltron Data uses cutting-edge tech and provides top-notch support services, with a special focus on Apache Arrow. This open-source framework lets you process data in both flat and hierarchical formats, all packed into a super-efficient columnar memory setup. And that's not all! Meet Ibis—an amazing framework that gives data analysts, scientists, and engineers the power to access their data with a user-friendly and engine-agnostic Python library. Excited to learn more? We've got Josh Patterson, the CEO of Voltron Data, here to give us all the details.

July 27, 2023

How Apache Druid Revolutionized Digital Turbine’s Analytics Infrastructure with Lioz Nudel and Alon Edelman

Who better to talk about the real-world usage of Apache Druid than Digital Turbine, a leading mobile growth and monetization platform? The folks at DT go way back with Druid. On this episode, Lioz Nudel, Engineering Group Manager at Digital Turbine, and Alon Edelman, Data Architect at Digital Turbine, discuss how Druid has significantly improved their analytics infrastructure in terms of performance and scalability. We cover their journey from MySQL to Druid, highlighting the scalability, performance, and agility that Druid offers, and delve into specific use cases, such as analyzing massive amounts of data and managing cloud computing costs.

July 18, 2023
