
    Podcast Summary

    • Exploring the Intersection of Creativity and Technology with Deepfakes
      Deepfakes hold immense potential for both good and misuse, blurring the line between human and machine communication, and significantly impacting creative content.

      We are living in an exciting time where technology is rapidly advancing, particularly in the realm of synthetic media and deepfakes. Deepfakes are AI-generated or AI-altered media that convincingly reproduce a person's likeness or voice, and they form a bridge to how we will communicate with machines in the future. While some may be frightened by this technology, it holds immense potential for both good and misuse. People outside of the AI community are becoming increasingly aware of deepfakes due to their ability to create compelling and convincing videos, whether for misinformation or for positive purposes. The impact of AI on creative content is becoming more significant. Our guest, Lior Hakim, CTO of Hour One, will dive deeper into this topic and discuss what they're doing with this technology. Overall, the intersection of creativity and technology is an intriguing and important place to explore. Tune in to learn more.

    • Exploring the creative potential of deepfakes
      Deepfakes offer opportunities for new user experiences and interactions with technology, allowing us to digitize aspects of our identities and normalize their use.

      Deepfakes represent an exciting time for creativity and interaction with technology. Beyond the negative connotations often associated with deepfakes, they offer opportunities for new user experiences and interactions with automated systems. Deepfakes allow us to digitize and put to use various aspects of our identities, such as our likeness, tone of voice, and gestures. This could include creating avatars of ourselves or digitizing real-life captures to be used in various contexts, such as commercials or movies. As technology continues to advance, we can expect more normalization of these interactions and a wider spectrum of uses and misuses. It's essential to consider the implications of these developments, including the potential for both good and misuse, and adjust our perspectives accordingly. Deepfakes challenge traditional notions of personas and identity, and as we continue to explore this new frontier, it's important to keep an open mind and embrace the possibilities.

    • Digitization of Likeness and Tone of Voice
      Technology advances enable creators to control digital personas, reach wider audiences, build trust, and personalize content. However, equitable access and potential misuse are concerns as the technology spreads.

      Technology is advancing to allow for the digitization and manipulation of likeness and tone of voice, giving creators more control over their digital personas and enabling them to reach wider audiences. This can lead to new opportunities for trust-building and community-building, as well as more personalized content consumption. However, it's important to consider the potential accessibility issues and the potential for misuse if this technology remains in the hands of a select few. As technology becomes more accessible to a wider audience, it's crucial to ensure that everyone has the opportunity to create and control their own digital personas, rather than creating an imbalance of power. An example of this is the ability to change the voice or language of audiobooks or navigation systems, giving consumers control over their preferred interactions. This shift in technology has the potential to revolutionize the way we create, consume, and interact with content.

    • Exploring the possibilities of synthetic media and virtual humans
      Technology advancements in synthetic media, virtual humans, image generation, and text-to-speech are expanding creativity and communication opportunities, particularly for those without tech skills. Potential applications include virtual collaborators, personalized podcasts, and customized book voiceovers.

      The advancement of technology in the realms of synthetic media, virtual humans, image generation, and text-to-speech is opening up new possibilities for creativity and communication, particularly for those without tech skills. This shift is expanding into our culture and changing the way we consume and create content. From a technical standpoint, this includes text input, voice cloning, text-to-speech engines, and the ability to generate realistic images and emotions in speech. This technology is still in its early stages, but it's making it possible for users to create virtual versions of themselves, record podcasts with them, and even generate voiceovers for books. As this technology continues to grow and adapt, it's important to consider the potential implications for our society, including issues of ownership, moderation, and ethics. But overall, the possibilities are exciting, and it's an open and expanding marketplace of skills and traits that is being built. For example, imagine being able to create a virtual version of a friend or colleague to collaborate with when they're not available. Or imagine being able to listen to books in your preferred voice or language, with the author potentially earning rewards for their work. These are just a few of the many possibilities that this technology is opening up. It's an exciting time for innovation and creativity, and it's important for us to stay curious and explore the potential of this technology.
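      The technical stages mentioned above — text input, voice cloning, and a text-to-speech engine — can be sketched as a minimal pipeline. This is an illustrative toy, not Hour One's actual stack: the class names, the word-level "phoneme" step, and the duration-only output are all assumptions standing in for real grapheme-to-phoneme and waveform-synthesis models.

```python
from dataclasses import dataclass

@dataclass
class VoiceProfile:
    """Toy stand-in for a cloned voice: traits learned per speaker."""
    name: str
    base_pitch_hz: float
    words_per_minute: float

def text_to_phonemes(text: str) -> list[str]:
    # Real engines run a grapheme-to-phoneme model; here we just split words.
    return text.lower().split()

def synthesize(phonemes: list[str], voice: VoiceProfile) -> dict:
    # A real TTS engine would emit a waveform; we return clip metadata instead.
    duration_s = len(phonemes) / (voice.words_per_minute / 60.0)
    return {"voice": voice.name, "units": len(phonemes),
            "duration_s": round(duration_s, 2)}

def tts_pipeline(text: str, voice: VoiceProfile) -> dict:
    """Chain the stages: text -> phoneme-like units -> synthesized clip."""
    return synthesize(text_to_phonemes(text), voice)

narrator = VoiceProfile(name="narrator", base_pitch_hz=120.0, words_per_minute=150.0)
clip = tts_pipeline("welcome to the show", narrator)
print(clip)  # e.g. {'voice': 'narrator', 'units': 4, 'duration_s': 1.6}
```

      The point of the sketch is the separation of stages: swapping in a different `VoiceProfile` changes the rendered voice without touching the text front-end, which is what makes "listen in your preferred voice" plausible as a product feature.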

    • Building Trust and Authenticity with Deepfakes and Virtual Humans
      To build trust and authenticity with deepfakes and virtual humans, showcase positive use cases, learn from past misuses, and gradually build trust through a "character economy" or "virtual human economy".

      The future of deepfakes and virtual humans holds immense possibilities for creativity and optimization, but building trust and authenticity among the public is crucial for its widespread adoption. The use of deepfakes and virtual humans can bring numerous benefits, including improved communication and engagement in various industries. However, the fear of misuses and potential negative consequences can hinder people from embracing this technology. To address this challenge, it's essential to showcase positive use cases and let people see the benefits firsthand. Building trust gradually is key, and learning from past misuses can help prevent future issues. The idea of creating a "character economy" or "virtual human economy" can help people understand the potential value and rewards of participating in this technology. Moreover, early adopters and positive examples, such as using characters for communication or seeing trusted figures like Obama or Tom Cruise in positive contexts, can help build trust and encourage more people to join. By focusing on the positive aspects and addressing concerns, we can create a more welcoming and trusting environment for deepfakes and virtual humans.

    • Exploring new ways for video content creation
      Virtual characters and text-to-speech tech enable easier, more accessible content creation, potentially leading to a new economy for creators and collaboration opportunities.

      Creators are exploring new ways to produce engaging video content for their audiences, particularly through the use of virtual characters and text-to-speech technology. This can make content creation more accessible and easier to produce, even in situations where recording high-quality video is difficult. Additionally, there's potential for creators to collaborate and incentivize each other by sharing content and branding it with their own virtual presence. This could lead to a new economy for content creation, where creators can recognize and reward each other for their contributions. It's still early days for this technology, but it's clear that there's a demand for rich, engaging video content and a need for easier, more accessible ways to produce it. As for the future, it's not hard to imagine that the entertainment industry could also adopt this technology, allowing actors, musicians, and other creators to offer their brands as virtual interfaces for their audiences. This could open up new possibilities for collaboration, innovation, and engagement.

    • Merging UGC and deepfake technology in the future economy
      Individuals can expand their reach and control over appearances and content delivery through the merging of UGC and deepfake technology. Brands and celebrities will participate, and transactions will become programmatic and personalized. The technological focus is on bridging audio and facial encoding and decoding using GANs.

      The future economy will see a merge of user-generated content and deepfake technology, allowing individuals to expand their reach and control over their appearances and content delivery. Brands and celebrities will participate in this economy, and transactions for content will become more programmatic and personalized. From a technological standpoint, the synthesis of voice and avatar with matching mouth movements is an emerging field, requiring large amounts of labeled video data for success. The focus is on creating a bridge between audio and facial encoding and decoding, using Generative Adversarial Networks (GANs) to ensure temporal stability and correctness of expression.
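      The audio-to-face bridge described above can be sketched in miniature: an "audio encoder" turns per-frame loudness into a mouth-openness signal, a "face decoder" maps that to jaw displacement, and a smoothing pass stands in for the temporal-stability constraint that a GAN discriminator would otherwise enforce. Every function name, dimension, and constant here is an illustrative assumption, not the actual model.

```python
def encode_audio(frames: list[float]) -> list[float]:
    # Toy "audio encoder": normalize per-frame loudness to openness in [0, 1].
    peak = max(frames) or 1.0
    return [f / peak for f in frames]

def decode_face(openness: list[float], jaw_range: float = 12.0) -> list[float]:
    # Toy "face decoder": scale openness to jaw displacement in pixels.
    return [o * jaw_range for o in openness]

def smooth(signal: list[float], alpha: float = 0.5) -> list[float]:
    # Exponential smoothing as a stand-in for the temporal-stability term a
    # discriminator would penalize: it damps frame-to-frame jitter in the mouth.
    out = [signal[0]]
    for x in signal[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

loudness = [0.1, 0.9, 0.2, 0.8]  # per-frame audio energy of a clip
jaw = smooth(decode_face(encode_audio(loudness)))
print([round(j, 2) for j in jaw])  # [1.33, 6.67, 4.67, 7.67]
```

      Note how the smoothed trajectory never reaches the raw peak of 12 pixels: the stability term trades a little expressiveness for motion that looks continuous, which is exactly the tension the GAN has to balance.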

    • Managing audio and video data in AI
      Unique challenges exist in handling audio and video data for AI applications, including data pipelines, normalization, and ensuring clean data. Our current solution focuses on creating custom avatars and videos for businesses, with potential for future advancements in data processing and interactive digital humans.

      Working with audio and video data in the AI space presents unique challenges, particularly with regard to data pipelines and normalization. The infrastructure for capturing, cleaning, aligning, and labeling data is crucial, as is ensuring clean data with minimal noise or misalignment between audio and video. The end-to-end pipeline, from data acquisition to GPU processing, is a significant challenge, especially when dealing with data from various sources over the internet. Our current offerings include a SaaS solution for creating customized avatars and videos for business use, with a focus on enhancing the world of work through richer and more engaging media. Looking ahead, we're excited about the potential for advancements in this field, such as improved data processing and the integration of more features to create even more realistic and interactive digital humans.
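      One concrete piece of the cleaning step mentioned above — rejecting clips whose audio and video tracks have drifted out of sync before they enter training — can be sketched as follows. The record format, field names, and 50 ms drift threshold are assumptions for illustration, not the actual pipeline.

```python
def check_alignment(clips: list[dict], max_drift_s: float = 0.05) -> tuple[list, list]:
    """Split captured clips into clean vs. misaligned sets.

    Each clip carries measured 'audio_s' and 'video_s' durations; drift beyond
    max_drift_s suggests desynced tracks that would corrupt lip-sync labels.
    """
    clean, rejected = [], []
    for clip in clips:
        drift = abs(clip["audio_s"] - clip["video_s"])
        bucket = clean if drift <= max_drift_s else rejected
        bucket.append({**clip, "drift_s": round(drift, 3)})
    return clean, rejected

captured = [
    {"id": "take_01", "audio_s": 12.50, "video_s": 12.52},  # within tolerance
    {"id": "take_02", "audio_s": 9.80, "video_s": 10.40},   # desynced upload
]
clean, rejected = check_alignment(captured)
print(len(clean), len(rejected))  # 1 1
```

      In a real pipeline the durations would come from probing the container's streams rather than from a dict, but the gate itself — measure drift, reject before labeling — is the part that keeps noisy internet-sourced captures out of the training set.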

    • Exploring the Future of Text-to-Video Technology
      Text-to-video technology is advancing with features like avatar creation, image generation, and media integration. Prompt engineering, 3D environments, and even inverting reference words into prompts are areas of focus. The potential uses are vast, but it may become challenging to distinguish reality from generated content.

      The future of text-to-video technology is exciting and full of possibilities. The ability to create avatars, generate images from prompts, and add media to videos are just some of the features being developed that will make the technology more accessible and compelling. Prompt engineering, 3D environments, and even the ability to invert reference words into prompts and create imagery from them are areas of focus. The potential uses of this technology are vast, from generating compelling transition shots to creating near-realistic avatars. However, as this technology advances and becomes more sophisticated, it may become increasingly difficult to distinguish what is real from what is generated. It's important for individuals to consider how they approach this cultural shift and whether it's still relevant to differentiate between what is real and what is generated. Ultimately, the potential benefits of this technology are vast, and it's an exciting time to be a part of its development and growth.

    • AI's role in shaping our perception of reality
      AI learns from our culture and applies it back, shaping our perception of reality and bringing both opportunities and challenges.

      AI is not just a tool that we use, but it also reflects and teaches us about our culture. The discussions around Photoshop retouching and social media filters lead us to ponder the role of AI in shaping our perception of reality. AI learns from the data it is given, which is a reflection of our culture. This two-way communication between technology and culture is exciting, but it also brings challenges. Societies and governments will need to grapple with these issues as AI continues to evolve. Despite the complexities, I am optimistic about the possibilities and look forward to exploring this topic further. I appreciate Lior joining us for this thought-provoking conversation, and I am excited to continue diving into these ideas. Remember to subscribe to Practical AI for more insightful discussions, and if you find value in our show, share it with a friend or colleague. Thanks to our sponsors for their support, and we'll talk to you again soon.

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Stanford's AI Index Report 2024

    Stanford's AI Index Report 2024
    We’ve had representatives from Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) on the show in the past, but we were super excited to talk through their 2024 AI Index Report after such a crazy year in AI! Nestor from HAI joins us in this episode to talk about some of the main takeaways including how AI makes workers more productive, the US is increasing regulations sharply, and industry continues to dominate frontier AI research.

    Apple Intelligence & Advanced RAG

    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

    The perplexities of information retrieval

    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data

    We’ve all heard about breaches of privacy and leaks of protected health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they deploy edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs

    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress

    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents

    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach; from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs

    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Related Episodes

    When data leakage turns into a flood of trouble

    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)

    The new stable diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on twitter, reddit, discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things stable diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology

    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)

    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically they dive into the T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. Daniel provides a brief tour of the possible with the T0 family. They finish up with a couple of new learning resources.