    Interoperability, Governance, and Divergent Teams with Prukalpa Sankar

    March 02, 2022

    About this Episode

    This episode features an interview with Prukalpa Sankar, Co-Founder of Atlan. Atlan is a venture-backed startup building a modern data workspace. Prukalpa also co-founded SocialCops, a data-for-good company behind landmark projects such as India’s National Data Platform. Prukalpa is a recognized industry leader, landing on the Forbes 30 Under 30 list and Fortune’s 40 Under 40.

    In this episode, Prukalpa and Sam discuss how diversity is a data team’s biggest strength, why governance isn’t always a bad thing, and what they hope the modern data stack will look like in 5 years.

    -------------------

    “Diversity is our biggest strength but our biggest weakness, because it's really hard to make that team collaborate. Because most of the teams in the world are very uniform. So when every single person in the room is a subject matter expert on something, nobody else actually can have oversight on each other's work because they've never done it before. Then how do you create true trust? How do you create trust when things are breaking? If you're able to create a way for these diverse people to collaborate really effectively, to be a dream team, a dream data team where they trust each other and they can collaborate effectively, then magic can happen.” – Prukalpa Sankar

    -------------------

    Episode Timestamps:

    [01:55]: What open source data means to Prukalpa

    [05:38]: Prukalpa’s journey to the data-for-good movement

    [04:51]: How Prukalpa and her team provided gas to 80 million Indian women

    [06:33]: How diversity can help a data team succeed

    [15:10]: What gives Atlan its magic

    [18:58]: How being open by default influenced Atlan’s architecture choices

    [22:45]: The reality of the modern data stack in 5 years

    [27:36]: Advice for people getting started with DataOps

    -------------------

    Links:

    LinkedIn - Connect with Prukalpa

    LinkedIn - Connect with Atlan

    Twitter - Follow Prukalpa

    Twitter - Follow Atlan

    Visit Atlan

    Recent Episodes from Open||Source||Data

    Tech, Trust, and Transformation with Paula Paul

    Timestamps
    00:00 - Intro

    05:10 - Paula’s Professional Journey

    10:30 - What Inspired Paula to Go Through the Open Source Path

    14:50 - What are some of the biggest challenges and impacts that Paula sees in companies trying to derive value?

    23:30 - Is the Tech World a Meritocracy? 

    25:35 - A Shift Of What is a Tech Company?

    27:30 - Kids Interacting with New Technologies

    31:30 - What Does Open Source Data Mean to Paula?

    42:50 - What is a Question that Paula has never been asked before?

    47:00 - What Advice would you give to the audience? 

    51:50 - Backstage with Executive Producer Leo Godoy

    LinkedIn - Connect with Charna

    LinkedIn - Connect with Paula

    Open||Source||Data
    March 12, 2024

    An Innovative Approach to AI & NLP with Milos Rusic

    Starting the new season of Open Source Data, our new host Charna Parkey welcomes the CEO and Co-founder of deepset, Milos Rusic. With an impressive journey through NLP and AI, pioneering several areas in the open source field, Milos has revolutionized data search processes and brought about a new era of user-friendly and efficient enterprise search systems. Charna also finds common ground with Milos when talking about joining an NLP startup in 2015-16, predictive maintenance, and more. Don’t miss it!
    Open||Source||Data
    February 27, 2024

    New Beginnings: Open||Source||Data in Transition

    This episode features an interview with Charna Parkey, Real-Time AI Product and Strategy Leader at DataStax. Charna has been developing AI and ML products over the last 17 years and has worked with 90 of the Fortune 100 in her various roles. She is also a co-author and inventor on several patents.

    In this episode, Sam and Charna discuss handing over the role of host, Sam’s new startup journey, and how their thinking has evolved during the explosion of LLMs.

    -------------------

    “Now, it seems like we have this opportunity where the conversation and the place that society is at is different. Where we want to contribute to the right set of data when we talk open source data. We want to make sure that we have the right data to train this model in order to get the right outcome. We want to provide a lens of, ‘All right, you are this persona. How would you say this thing?’ I do think that from a lot of what the LLMs have today, the outcome of those words are still missing. And we need to solve that. Like, ‘Is this piece of writing actually going to achieve the outcome I want versus am I following legal's guidelines? Am I technically correct? Is my CEO going to like it?’ That doesn't mean you're achieving impact in the world. There's an aspect there where we've given feedback loops, it seems, to be like, ‘Did I like the answer or not?’ But not, ‘Did I take an action?’ As we get to autonomousness, we're going to have to have an outcome or multiple outcomes associated with the reward of the system.” – Charna Parkey

    “I personally believe that all cognition is bias. My degree is in cognitive science. One of the things that we trained on is attention. And to pay attention, literally means to selectively choose what data is coming in from the world that you're going to pay attention to and what you're going to discard. Which is also, to me, the definition of bias. All cognition is bias, but what do we care about? Do you trust this thing? What does that mean? Well, do you trust it to do these particular actions to a level of consistency in this particular domain? It doesn't mean that you're going to trust it in all environments. There's a lot more nuance that hopefully will evolve in this strange age of nuanced destruction machines.” – Sam Ramji

    -------------------

    Episode Timestamps:

    (01:04): Sam and Charna catch up 

    (06:05): Sam explains his new company, Sailplane 

    (14:21): How Charna’s thinking has evolved during the LLM explosion

    (25:45): Sam’s thoughts after 5 seasons of Open||Source||Data

    (38:52): What Charna is looking forward to in the next season of the podcast

    (40:44): A question Sam wishes to be asked

    (45:45): Backstage takeaways with executive producer, Audra Montenegro

    -------------------

    Links:

    LinkedIn - Connect with Charna

    LinkedIn - Connect with Sam

    Learn more about Sailplane

    The Intersection of Open Source and AI with Stefano Maffulli & Stephen O’Grady

    This episode features a panel discussion with Stefano Maffulli, Executive Director of the Open Source Initiative (OSI); and Stephen O’Grady, Co-founder of RedMonk. Stefano has decades of experience in open source advocacy. He co-founded the Italian chapter of Free Software Foundation Europe, built the developer community of the OpenStack Foundation, and led open source marketing teams at several international companies. Stephen has been an industry analyst for several decades and is the author of the developer playbook, The New Kingmakers: How Developers Conquered the World.

    In this episode, Sam, Stefano, and Stephen discuss the intersection of open source and AI, good data for everyone, and open data foundations.

    -------------------

    “Internet Archive, Wikipedia, they have that mission to accumulate data. The OpenStreetMap is another big one with a lot of interesting data. It's a fascinating space, though. There are so many facets of the word ‘data.’ One of the reasons why open data is so hard to manage and hasn't had that same impact of open source is because, like Stephen, the stories that he was telling about the startups having a hard time assembling the mixing and matching, or modifying of data has a different connotation. It's completely different from being able to do the same with software.” – Stefano Maffulli

    “It's also not clear how said foundation would get buy-in. Because, as far as a lot of the model holders themselves, they've been able to do most of what they want already. What's the foundation really going to offer them? They've done what they wanted. Not having any inside information here, but just judging by the fact that they are willing to indemnify their users, they feel very confident legally in their stance. Therefore, it at least takes one of the major cards off the table for them.” – Stephen O’Grady

    -------------------

    Episode Timestamps:

    (01:44): What open source in the context of AI means to each guest

    (16:21): Stefano explains OSI’s opportunity to shine a light on models and teams

    (21:22): The next step of open source AI according to Stephen

    (25:38): Creating better definitions in order to modify software

    (33:09): The case of funding an open data foundation

    (42:31): The future of open source data

    (51:54): Executive producer Audra Montenegro's backstage takeaways

    -------------------

    Links:

    LinkedIn - Connect with Stefano

    Visit Open Source Initiative

    LinkedIn - Connect with Stephen

    Visit RedMonk

    Throwback: The AI-Native Stack with Mikiko Bazeley, Zain Hasan, and Tuana Celik

    This episode features a panel discussion with Mikiko Bazeley, Head of MLOps at Featureform; Zain Hasan, Senior Developer Advocate at Weaviate; and Tuana Celik, Developer Advocate at deepset.

    In this episode, Mikiko, Zain, and Tuana discuss what open source data means to them, how their companies fit into the AI-first ecosystem, and how jobs will need to evolve with the AI-native stack.

    -------------------

    “We're almost part of a fancy new AI robot kitchen that you'd find in Tokyo, in some ways. I see a virtual feature store as, yes, you can have a bunch of your ingredients tossed into a closet. Or, what you can do is you can essentially have a nice way to organize them. You can have a way to label them, to capture information.” – Mikiko Bazeley

    “I really like that analogy as well. I like how Mikiko put it where a vector search engine is really extracting value from what you've already got. [...] So where I see vector search engines, really, is if we think of these embedding providers as the translators to take all of our unstructured data and bring it into vector space into a common machine language, vector search engines are essentially the workhorses that allow us to compute and search over these objects in vectorized format. They're essentially the calculators of the AI stack.” – Zain Hasan

    “Haystack, I would really position as the kitchen. I need Mikiko to bring the apples. I need Zain to bring the pears. I need Hugging Face or OpenAI to bring the oranges to make a good fruit salad. But, Haystack will provide the spoons and the pans and the knives to make that into something that works together.” – Tuana Celik

    -------------------

    Episode Timestamps:

    (02:58): What open source data means to the panelists

    (09:11): What interested the panelists about AI/ML

    (24:10): Mikiko explains Featureform

    (27:00): Zain explains Weaviate

    (30:23): Tuana explains deepset

    (36:00): The panelists discuss how their companies fit into the AI-first ecosystem

    (44:58): How jobs need to evolve with the AI-native stack

    (54:35): Executive producer Audra Montenegro's backstage takeaways

    -------------------

    Links:

    LinkedIn - Connect with Mikiko

    Visit Featureform

    LinkedIn - Connect with Zain

    Visit Weaviate

    LinkedIn - Connect with Tuana

    Visit deepset

    Visit Data-centric AI

    How We Should Think About Data Reliability for Our LLMs with Mona Rakibe

    This episode features an interview with Mona Rakibe, CEO and Co-founder of Telmai, an AI-based data observability platform built for open architecture. Mona is a veteran in the data infrastructure space and has held engineering and product leadership positions that drove product innovation and growth strategies for startups and enterprises. She has served companies like Reltio, EMC, Oracle, and BEA where AI-driven solutions have played a pivotal role.

    In this episode, Sam sits down with Mona to discuss the application of LLMs, cleaning up data pipelines, and how we should think about data reliability.

    -------------------

    “When this push of large language model generative AI came in, the discussions shifted a little bit. People are more keen on, ‘How do I control the noise level in my data, in-stream, so that my model training is proper or is not very expensive, we have better precision?’ We had to shift a little bit that, ‘Can we separate this data in-stream for our users?’ Like good data, suspicious data, so they train it on little bit pre-processed data and they can optimize their costs. There's a lot that has changed from even people, their education level, but use cases also just within the last three years. Can we, as a tool, let users have some control and what they define as quality data reliability, and then monitor on those metrics was some of the things that we have done. That's how we think of data reliability. Full pipeline from ingestion to consumption, ability to have some human’s input in the system.” – Mona Rakibe

    -------------------

    Episode Timestamps:

    (01:04): The journey of Telmai 

    (05:30): How we should think about data reliability, quality, and observability 

    (13:37): What open source data means to Mona

    (15:34): How Mona guides people on cleaning up their data pipelines 

    (26:08): LLMs in real life

    (30:37): A question Mona wishes to be asked

    (33:22): Mona’s advice for the audience

    (36:02): Backstage takeaways with executive producer, Audra Montenegro

    -------------------

    Links:

    LinkedIn - Connect with Mona

    Learn more about Telmai

    Throwback: Open Source Innovation, The GPL for Data, and The Data In to Data Out Ratio with Larry Augustin

    This episode features an interview with Larry Augustin, angel investor and advisor to early-stage technology companies. Larry previously served as the Vice President for Applications at AWS, where he was responsible for application services like Pinpoint, Chime, and WorkSpaces.

    Before joining AWS, Larry was the CEO of SugarCRM, an open source CRM vendor. He also was the founder and CEO of VA Linux, where he launched SourceForge. Larry was among the group who coined the term “open source” and has sat on the boards of several open source and Linux organizations.

    In this episode, Sam and Larry discuss who owns the rights to data, the data in to data out ratio, and why Larry is an open source titan.

    -------------------

    "People are willing to give up so much of their personal information because they get an awful lot back. And privacy experts come along and say, ‘Well, you're taking all this personal information’. But then most people look at that and say, ‘But I get a lot of value back out of that.’ And it's this data ratio value question, which is: for a little in, I get a lot back. That becomes a key element in this. And I think there has to be some kind of similar thought process around open source data in general, which is if I contribute some data into this, I'm going to get a lot of value back. So this data in to data out ratio, I think it's an incredibly important one. And it gets everyone in the mindset of, ‘How do I provide more and more and take less and less?’ It's a principle of application development that I like a lot. And I think there's a similar concept here around open source data. Are there models or structures that we can come up with where people can contribute small amounts of data and as a result of that, they get back a lot of value.” – Larry Augustin

    -------------------

    Episode Timestamps:

    (02:52): How Larry is spending his time now after AWS

    (06:25): What drove Larry to open source

    (18:41): What is the GPL for data?

    (24:28): Areas of progress in open source data

    (28:57): The data in to data out ratio

    (36:39): Larry’s advice for folks in open source

    -------------------

    Links:

    LinkedIn - Connect with Larry

    Twitter - Follow Larry

    Reframing Machine Learning and AI-Assisted Development with Jorge Torres

    This episode features an interview with Jorge Torres, Co-founder and CEO of MindsDB. MindsDB is a virtual AI database that works with existing data to help developers build AI-centered apps. In 2008, Jorge began his work on scaling solutions using machine learning as the first full-time engineer at Couchsurfing, growing the company from a few thousand users to a few million. He has also served a number of data-intensive start-ups and was a visiting scholar at UC Berkeley researching machine learning automation and explainability.

    In this episode, Sam and Jorge discuss the inspiration and challenges behind MindsDB, classic data science AI versus applied AI, and time series transformers.

    -------------------

    “So much data in the world is time series data, so much data. Even data that people don't know is time series, it's time series. So long as it’s moving over time, it is time series data. Whether you store it or not, that's a different thing. For having a pre-trained model on time series data, it even enabled the fact that you don't have to store all the historical data. You can just take the model and start passing data as it comes through, and then you get out the forecast. So you don't even have to have the historical data. All you need to have is the data at that given instance, and you can pass it to the model and you get an output. It's mind blowing.” – Jorge Torres

    -------------------

    Episode Timestamps:

    (05:20): The inspiration behind MindsDB

    (10:20): Classic data science AI approach vs. applied AI

    (22:09): What open source data means to Jorge

    (28:51): What excites Jorge about Nixtla and time series transformers

    (37:07): A question Jorge wishes to be asked

    (40:20): Jorge’s advice for the audience

    (41:38): Backstage takeaways with executive producer, Audra Montenegro

    -------------------

    Links:

    LinkedIn - Connect with Jorge

    Learn more about MindsDB open source code

    Learn more about MindsDB

    A Sam Ramji Feature: The Evolution of Open Source, Kubernetes, and AI's Forward Journey

    On this episode, we’ve partnered with the Future Rodeo podcast for a discussion between Sam and Matt Wallace. Matt is the Chief Technology Officer and EVP at Faction, a pioneer of multi-cloud data services, and host of Future Rodeo.

    In this episode, Sam and Matt discuss Microsoft’s transformation, the impact of Kubernetes on container orchestration, and the rapid acceleration of AI research and development.

    -------------------

    Episode Timestamps:

    (01:38): Microsoft’s open source transformation

    (13:19): The impact of Kubernetes and how it defragmented the industry

    (22:06): The transformative power of AI and how it’s changing the value of reasoning

    (54:58): The concept of cognitive economy and its potential impact on AI and software development

    (01:03:25): Potential implications of advancements in robotics, AI, and clean energy

    (01:04:17): Sam’s advice for those entering the industry or choosing a career path

    -------------------

    Links:

    LinkedIn - Connect with Matt

    Listen to the Future Rodeo podcast

    The Importance of Open Source Data for Generative AI, Now and in the Future with Abby Kearns

    This episode features an interview with Abby Kearns, technology executive, board director, and angel investor. Her career has spanned executive leadership, product marketing, product management, and consulting across Fortune 500 companies and startups, including Puppet, Cloud Foundry Foundation, and Verizon. Abby currently serves as a board director for Lightbend, StackPath, and Invoke.

    In this episode, Sam sits down with Abby to discuss the betrayal source license, the role open source plays in AI, and empowering trust.

    -------------------

    “There's so much happening so quickly that I think open source has the power to help harness a lot of that innovative conversation. In a way that I think it's going to be really, really hard to match in a proprietary way. I think open source and the ability, given the fact that we're talking about AI and data, the two are very interrelated at this point. AI is not super interesting without data. I think the power of open source right now and what's happening, I think it has to happen in open source and I think it really has to have that level of transparency and visibility. But, always the ability for everyone to step up and understand what's happening at this moment in time and shape it.” – Abby Kearns

    -------------------

    Episode Timestamps:

    (00:50): Sam and Abby discuss the betrayal source license

    (14:12): What open source data means to Abby

    (23:30): Abby dives into the companies she’s investing in

    (34:30): How nonprofits can empower trust

    (38:32): A question Abby wishes to be asked

    (40:21): Abby’s advice for the audience

    (43:53): Backstage takeaways with executive producer, Audra Montenegro

    -------------------

    Links:

    LinkedIn - Connect with Abby

    Twitter - Follow Abby

    Read Design the Life You Love