Podcast Summary
AI-driven visual storytelling: Inspired by the AI advances he saw at Snapchat, Joshua founded HeyGen to make visual storytelling accessible to everyone. The company started with virtual avatars, but its broader aim is to replace the camera in video production, giving individuals and businesses a more efficient and cost-effective way to create video.
Joshua, the co-founder and CEO of HeyGen, started the company with a mission to make visual storytelling accessible to all by replacing the traditional camera with AI technology. He was inspired by the advancements in AI at Snapchat, where he worked for six and a half years before starting HeyGen. The company initially focused on virtual avatars, which can generate content and lower the barriers to visual content creation. The technology has applications beyond avatars, and the company is working on replacing the camera's role in video production so that individuals and businesses can create content more easily. Traditional video production is expensive and slow, in part because camera crews and studios have to be scheduled, which makes an AI-driven approach a more viable and efficient option.
Avatar and AI video content creation: HeyGen is changing video content creation with avatars and AI technology. It prioritizes avatar quality so that avatars can effectively replace real-life video, and future advancements include full-body avatar generation and the integration of elements from multiple sources into videos.
HeyGen is changing video content creation through the use of avatars and AI technology. Currently, the platform is primarily used for creating, localizing, and personalizing videos, particularly product explainers, how-to videos, learning and development material, and sales enablement training content. Avatar quality is a top priority, with the goal of passing the threshold at which avatars can effectively replace real-life video production. Advancements on the horizon include full-body avatar generation and the integration of elements from multiple sources into videos. Ultimately, the technology has the potential to replace real-time conversations and provide a more efficient and scalable way to produce video content.
Full-body avatar technology: Full-body avatar technology integrates text, voice, and video, with the latest machine learning models enabling real-time interaction and full-body rendering. Advances in text-to-video generation models are paving the way for connecting voice with gesture, with potential impact across industries.
Interactive media is moving towards full-body avatar technology, which can make a range of use cases, from educational content to high-end marketing, more engaging and authentic. The technology integrates text, voice, and video, with recent machine learning models like GPT 4.0 and an in-house video stack enabling real-time interaction and full-body rendering. A key challenge is connecting voice with gesture motion, which can be addressed through multi-modal model training. While technology like Sora is not yet available, advances in text-to-video generation models are paving the way for this innovation. HeyGen aims to help businesses solve the video creation problem by providing quality, control, and consistency. The path to achieving this includes exploring text-to-image synthesis and generating the entire video at once. The technology could change how industries from education to marketing produce dynamic, engaging, and authentic content.
Video Production Components: HeyGen breaks video production into A-roll and B-roll components, which gives it more control, consistency, and flexibility in delivering high-quality videos that match a brand's style. It also integrates third-party tools and prioritizes safety and innovation.
HeyGen approaches video production by breaking it into components: the A-roll (the avatar) and the B-roll (all other elements, such as voiceover, music, and transitions). The team believes this approach offers more control, consistency, and flexibility for delivering high-quality videos that match a brand's style. It also integrates third-party tools like Sora as component generators. In research, the team combines study of the available academic literature with an understanding of customer needs and technology limitations to create new video experiences. One example is video translation, which combines a lip sync model, voice generation, and translation with ChatGPT to preserve the user's natural voice and facial expressions. Safety is a priority, with strict policies against political or election content, advanced user verification, and rapid human review. Safety measures are built into the design process to ensure a positive user experience.
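The component approach and the translation example above can be pictured with a small sketch. It is only an illustration under stated assumptions, not the platform's actual design: the class names, the stage functions, and the order of the translation stages are all hypothetical, and each stage is a placeholder that records that it ran instead of calling a real model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Component:
    roll: str    # "A" for the avatar, "B" for every other element
    name: str    # e.g. "avatar", "voiceover", "music", "transitions"
    source: str  # hypothetical label for whatever tool produced it


@dataclass
class VideoPlan:
    components: List[Component] = field(default_factory=list)

    def by_roll(self, roll: str) -> List[str]:
        """Return the names of all components on the given roll."""
        return [c.name for c in self.components if c.roll == roll]


# Placeholder translation stages (assumed order: translate, voice, lip sync).
def translate_script(job: Dict) -> Dict:
    tagged = f"[{job['target_lang']}] {job['script']}"
    return {**job, "script": tagged, "steps": job["steps"] + ["translate"]}


def synthesize_voice(job: Dict) -> Dict:
    return {**job, "steps": job["steps"] + ["voice"]}


def lip_sync(job: Dict) -> Dict:
    return {**job, "steps": job["steps"] + ["lip_sync"]}


TRANSLATION_STAGES: List[Callable[[Dict], Dict]] = [
    translate_script,
    synthesize_voice,
    lip_sync,
]


def run_translation(script: str, target_lang: str) -> Dict:
    """Pass a job through each placeholder stage in order."""
    job: Dict = {"script": script, "target_lang": target_lang, "steps": []}
    for stage in TRANSLATION_STAGES:
        job = stage(job)
    return job


plan = VideoPlan([
    Component("A", "avatar", "in-house model"),
    Component("B", "voiceover", "in-house model"),
    Component("B", "music", "third-party tool"),
    Component("B", "transitions", "third-party tool"),
])
result = run_translation("Hello", "es")
```

Keeping each element as a separate component is what would let one piece, such as the music or a third-party generated clip, be swapped without touching the avatar.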
Video content generation with AI: AI technology enables businesses to generate personalized, high-quality video content at scale, making it more accessible and cost-effective. Real-time video avatars and personalized ads are on the horizon, offering customized content to each user. Ethical considerations are important to ensure authenticity and avoid deepfakes.
The ability to generate personalized, high-quality video content at scale using AI technology could change how businesses communicate and engage with customers. It could make video content creation more accessible and cost-effective, leading to increased usage and new use cases. Real-time video avatars and personalized video ads are on the horizon, offering the possibility of delivering customized content to each user in real time. This shift could change how we think about video content, moving beyond today's immutable MP4 file to a more dynamic, user-specific experience. Ultimately, this could lead to more effective marketing, sales, and customer support strategies. However, the ethical implications of the technology, such as avoiding deepfakes and ensuring authenticity, need to be considered. Video content generation remains a space to watch for continued innovation and development.
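The contrast between one fixed file and user-specific content can be illustrated with a minimal sketch. It assumes, purely for illustration, that personalization starts from a script template filled in per viewer; the template text and the field names are hypothetical, and the step that would turn each script into video is left out.

```python
from string import Template

# Hypothetical script template; each $field is filled in per viewer.
SCRIPT = Template("Hi $name, here is how $product can help your $team team.")


def script_for(viewer: dict) -> str:
    """Build the script for one viewer instead of reusing a fixed recording."""
    return SCRIPT.substitute(viewer)


viewers = [
    {"name": "Ana", "product": "Acme", "team": "sales"},
    {"name": "Ben", "product": "Acme", "team": "support"},
]
scripts = {v["name"]: script_for(v) for v in viewers}
```

A single MP4 would carry the same words to every viewer, while here each viewer gets a distinct script that a video generator could then render on demand.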
Personalized video communication research: Personalized video communication has potential in education and business, but building effective video models remains a challenge. HeyGen, a platform offering this service, serves 40,000+ paying customers and is hiring across teams. Success depends on understanding customer preferences, not only on objective metrics.
Personalized video communication is an untapped opportunity in education and business, especially with the rise of platforms like HeyGen that can generate and distribute personalized videos at scale. However, building video models that produce visually appealing results and can be evaluated objectively is a significant challenge. Research in this area matters because video offers a more effective learning experience and a more personal way for businesses to communicate with employees and customers. HeyGen's founder discussed the company's work in this area and how the platform serves over 40,000 paying customers, mostly from non-tech companies. The company is currently hiring across teams, including product, design, engineering, AI research, and go-to-market. The conversation also touched on lessons learned from working on consumer products at Snapchat and the importance of understanding customer preferences instead of focusing only on objective metrics like resolution.