
    streaming data

    Explore "streaming data" with insightful episodes like "Apache Flink for Real Time Data Analysis", "Data Lakehouses and Apache Hudi", "Bayesian Inference: The Foundation of Data Science", "The Cloudcast #337 - Cultivating IoT From Farm to Cloud" and "Episode 022: Flight Data Streams and Indian Aerospace Dreams" from podcasts like "The New Stack Podcast", "The Cloudcast", "DataCafé" and "#PaxEx Podcast", and more!

    Episodes (5)

    Apache Flink for Real Time Data Analysis

    This episode delves into Apache Flink, a versatile platform for executing both batch and real-time streaming data analysis tasks. This session marks the beginning of a three-part series unveiling Amazon Web Services' (AWS) new managed service built on Flink. Future episodes will explore this service in detail and examine customer experiences.

    The podcast features insights from Danny Cranmer, a principal engineer at AWS and an Apache Flink PMC member and committer, along with Hong Teoh, a software development engineer at AWS.

    Flink stands out as a high-level framework for defining data analytics jobs, accommodating both batch and streaming data sets. It offers APIs for building analysis jobs in various languages, including Java, Python, and SQL. Flink also provides a distributed job execution engine with fault tolerance and horizontal scaling capabilities.

    One prominent use case is Extract-Transform-Load (ETL), where raw data is swiftly processed for specific workloads. Flink excels in delivering low-latency transformations for unbounded data streams. Additionally, Flink supports event-driven applications, responding immediately to triggers such as user requests for weather data.
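
The ETL pattern described above can be sketched in plain Python. This is not Flink code, only a minimal illustration under stated assumptions: `sensor_readings` is a hypothetical unbounded source, the record fields are invented for the example, and a list stands in for the sink. Each record is transformed as soon as it arrives rather than waiting for a complete batch.

```python
import itertools

def sensor_readings():
    """Hypothetical unbounded source: yields raw weather records forever."""
    for i in itertools.count():
        yield {"id": i, "temp_f": 50 + (i % 5) * 10}

def to_celsius(records):
    """Transform step: convert each record as it arrives (one at a time)."""
    for r in records:
        yield {"id": r["id"], "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)}

def load(records, sink):
    """Load step: append each transformed record to the sink (a list here)."""
    for r in records:
        sink.append(r)

sink = []
# The source never ends, so the demo takes only the first three records.
load(itertools.islice(to_celsius(sensor_readings()), 3), sink)
print(sink)
```

Because every stage is a generator, nothing is buffered between extract, transform, and load, which is the property that keeps latency low on an unbounded stream.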

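    Flink supports exactly-once processing semantics, which are critical for scenarios like financial transactions. It takes periodic checkpoints of job state so that, after a node failure, processing can resume from a consistent point without losing or double-counting data.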
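
A toy simulation can show why checkpoints give exactly-once results for operator state. This is a sketch in plain Python, not Flink's implementation or API; `Checkpoint`, `run_with_recovery`, and the single simulated crash are all assumptions made for illustration. The key idea is that the source position and the operator state are saved together, so a restart replays only the events that arrived after the last checkpoint.

```python
from dataclasses import dataclass

@dataclass
class Checkpoint:
    offset: int  # next source position to read after a restore
    total: int   # operator state (running sum) at that position

def run_with_recovery(events, checkpoint_every, fail_at=None):
    """Sum events with periodic checkpoints and one optional simulated crash."""
    checkpoint = Checkpoint(offset=0, total=0)
    offset, total = 0, 0
    failed = False
    while offset < len(events):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            # Simulated node failure: in-flight state is lost, so roll back
            # both the read position and the state to the last checkpoint.
            offset, total = checkpoint.offset, checkpoint.total
            continue
        total += events[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            # Save position and state together so they stay consistent.
            checkpoint = Checkpoint(offset, total)
    return total

events = list(range(1, 11))
clean = run_with_recovery(events, checkpoint_every=3)
recovered = run_with_recovery(events, checkpoint_every=3, fail_at=5)
print(clean, recovered)
```

Both runs produce the same total even though the second one reprocesses two events after the crash. The guarantee in this sketch covers only internal state; side effects sent to external systems would need additional coordination.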

    The podcast also touches on AWS's role in supporting the open-source Flink project and the future outlook for this powerful data processing framework.

    Learn more from The New Stack about Apache Flink:

    3 Reasons Why You Need Apache Flink for Stream Processing

    Apache Flink for Unbounded Data Streams

    8 Real-Time Data Best Practices

    Data Lakehouses and Apache Hudi

    Kyle Weller (@KyleJWeller, Head of Product @onehousehq) talks about the latest trends in OSS Data Lakes, Data Warehouses, and the evolution to “Data Lakehouses” with Apache Hudi.

    SHOW: 694

    CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotw

    SHOW NOTES:

    Topic 1 - Welcome to the show. Tell us a little bit of your background, and where you focus your efforts at Onehouse?

    Topic 2 - Your focus is on an emerging open source project, Apache Hudi. Before we dive into the project and technologies, we’re always interested in the background of what drove the creation of new projects. What problems existed before Hudi? 

    Topic 3 - Let’s dive into Hudi. Data lakes, Delta Lakes, Lake houses, Icebergs. What is going on with all these water metaphors?  

    Topic 4 - Hudi is focused on streaming data lakes. What are some of the things (types of applications) that need a streaming data lake? Where do transactions come into play? Where do data warehouse capabilities come into play?

    Topic 5 - Stitching together open source projects and platforms can be complicated. How does the Onehouse platform simplify all of this for either data scientists or platform teams?

    Topic 6 - What are some examples of how companies are using Onehouse and Hudi today? 

    Bayesian Inference: The Foundation of Data Science

    In this episode we talk about all things Bayesian. What is Bayesian inference and why is it the cornerstone of Data Science?

    Bayesian statistics embodies the Data Scientist and their role in the data modelling process. A Data Scientist starts with an idea of how to capture a particular phenomenon in a mathematical model, perhaps derived from talking to experts in the company. This represents the prior belief about the model. The model then consumes data about the problem; it does not matter whether the data is historical or real-time. This data is used to update the model, and the result is called the posterior.
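
The prior-to-posterior update described above can be illustrated with a small conjugate example. This is a minimal sketch, assuming a Beta prior over a success probability and binomial (success/failure) observations; the prior parameters and the data batches are invented for illustration and are not from the episode. With this pairing, the posterior is again a Beta distribution whose parameters are the prior's plus the observed counts.

```python
def update_beta(alpha, beta, successes, failures):
    """Beta-Binomial update: add observed counts to the prior parameters."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Prior belief: Beta(2, 2), centred on 0.5 and only weakly informative.
prior = (2.0, 2.0)

# Data arriving in batches of (successes, failures), as a stream might.
batches = [(3, 1), (4, 0), (2, 2)]

posterior = prior
for successes, failures in batches:
    # Yesterday's posterior becomes today's prior.
    posterior = update_beta(*posterior, successes, failures)

# Updating once with all the data gives the same posterior.
one_shot = update_beta(*prior, 9, 3)

print(posterior, beta_mean(*posterior))
```

The sequential and one-shot updates agree, which is why it makes no difference whether the data is historical or arrives in real time.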

    Why is this Data Science? Because models that react to data and refine their representation of the world in response to the data they see are what the Data Scientist is all about.

    We talk with Dr Joseph Walmswell, Principal Data Scientist at life sciences company Abcam, about his experience with Bayesian modelling.

    Recording date: 16 March 2021
    Interview date: 26 February 2021

    Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast, and feel free to contact us about anything you've heard here or anything you think would make an interesting topic in the future.

    The Cloudcast #337 - Cultivating IoT From Farm to Cloud

    Brian talks with Steve Ridder (@saridder, CEO @teralytic) and Dan Casson (VP Engineering, @teralytic) about #AgTech, nanofabrication, engaging with large, complex ecosystems, building modern IoT platforms, and how data is helping to advance one of the oldest industries in the world.

    Show Notes
    • Topic 1 - Welcome to the show. Tell us about your background and how people from NYC and Silicon Valley know more about soil than farmers with their boots in the dirt?
    • Topic 2 - Let’s talk about soil, NPK (Nitrogen, Phosphorus, Potassium) levels and how frequently the “playing field” in farming is changing? What are the big challenges in managing soil? How significant are soil differences around the world?
    • Topic 3 - Before we get to the technology (in the sensor and the cloud), let’s talk about the business side of IoT sensors. You have to have mass production (edge devices), ridiculously simple sensor installations, run wireless infrastructure (?), run a cloud SaaS infrastructure (or just sell the data to brokers?) and do farmers have IT departments to translate all this techie stuff?
    • Topic 4 - Let’s talk about the technology in the IoT edge sensors. They have to be precise, inexpensive, and future proof? How much is industrial engineering vs. computer science?
    • Topic 5 - Let’s talk about the technology in the analytics cloud. These devices only transmit every 15 minutes. What does a cloud stack look like?
    • Topic 6 - Any advice for techies, or industry experts, to find the people to collaborate with in these vertical solutions?

    Episode 022: Flight Data Streams and Indian Aerospace Dreams

    The aviation industry is grappling with another tragedy, the crash of a Germanwings Airbus A320 in the French Alps. Audio from the cockpit voice recorder indicates that the captain was locked out of the cockpit and couldn’t get back in, though the flight data recorder has yet to be found. This crash, and the fact that sourcing information from the physical black boxes is crucial to understanding what happened in any accident, has reignited the conversation over whether aircraft should stream some level of black box data in real time, and whether there should be video cameras in the cockpit. Co-hosts Max Flight and Mary Kirby talk about why the chorus of voices calling for change is growing louder, and explain why black box streaming is relevant to #PaxEx.

    Next, Neelam has been writing a series of articles for Runway Girl Network about India’s efforts to attract women to aviation. She tells us why SpiceJet and other carriers are in a hiring push, and how they're getting the word out via social media. She notes that women from all walks of life - not just the "elite" daughters of Indian pilots - are increasingly drawn to this profession in her country.

    Last but not least, Neelam draws on her deep knowledge of the Indian aviation scene to bring us up to speed on how Indian manufacturers are growing their footprint in the aerospace supply chain in partnership with Airbus and Boeing. We also consider the opportunities for Indian firms to ultimately break into the highly regulated #PaxEx market.