Music consumers are becoming the creators with Suno CEO Mikey Shulman

May 16, 2024

No Priors: Artificial Intelligence | Machine Learning | Technology | Startups

Podcast Summary

From Music to Physics to AI: Mikey Shulman's Unconventional Career Path: Mikey Shulman's journey from music lover to physics PhD to machine learning, and eventually to AI music generation, shows that unconventional paths can lead to exciting and innovative careers.
Mikey Shulman, the co-founder and CEO of Suno, has a unique background that led him from a love of music to a career in physics, and eventually to the field of AI and music generation. Mikey's passion for music started at a young age, but he soon realized that he wasn't good enough to make a career out of it. Instead, he pursued a PhD in physics, where he discovered a love for quantum computing. He then joined a local company called Kensho, where he stumbled into machine learning opportunities. With a background in physics and a newfound passion for machine learning, Mikey built a team and created fun products. Kensho was acquired by S&P Global in 2018, and Mikey continued to pursue interesting projects. He eventually found his way into AI and music generation with Suno, which allows users to create songs with just a text prompt. Despite the unconventional path, Mikey's interests in music, physics, and AI have come together in an exciting and innovative way. Suno is a young company, but it's already making waves in the AI music industry. Mikey's story is a reminder that sometimes the most unexpected paths can lead to the most rewarding careers.
From speech transcription to music generation: The Bark team started with speech transcription but found unique challenges in music generation, focusing on tokenization as a primary area of innovation, and acknowledged limitations while looking forward to scaling transformer models.
The team behind Bark, Suno's open-source text-to-audio model, began with an open-source project focused on transcribing earnings calls for S&P Global after the acquisition. Although they were musicians, they didn't initially plan to focus on music generation. However, they found little room for creativity in speech transcription and became captivated by the potential in music. The team, which includes individuals with backgrounds in text and transformer models, found that music generation posed unique challenges due to the continuous nature of audio data. They focused on tokenization, converting the audio signal into manageable discrete units, as a primary area of innovation. The team's approach to measuring model quality is not explicitly stated but appears to combine human evaluation with other metrics. The team also acknowledged the limitations of current models and their reliance on advances from the open-source text community. They mentioned the potential for scaling transformer models in music generation, although the specifics of how this will be achieved were not discussed. Overall, the team's journey from text processing to music generation highlights the potential for innovation in audio AI and the importance of understanding the challenges posed by the continuous nature of audio data.
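As a rough illustration of what tokenizing audio can mean, the sketch below quantizes waveform samples into 256 discrete tokens using mu-law companding, a classic telephony technique. This is not Suno's or Bark's tokenizer, which the summary does not describe; the function names, the 8 kHz sample rate, and the 440 Hz test tone are assumptions chosen for the example.

```python
import math

MU = 255  # 8-bit mu-law; an illustrative choice, not taken from the episode


def mu_law_encode(sample: float, mu: int = MU) -> int:
    """Map one sample in [-1.0, 1.0] to an integer token in [0, mu]."""
    clipped = max(-1.0, min(1.0, sample))
    # Logarithmic compression keeps more resolution for quiet samples.
    compressed = math.copysign(
        math.log1p(mu * abs(clipped)) / math.log1p(mu), clipped
    )
    # Shift from [-1, 1] to [0, mu] and round to the nearest integer token.
    return int(round((compressed + 1.0) / 2.0 * mu))


def mu_law_decode(token: int, mu: int = MU) -> float:
    """Approximately invert mu_law_encode (quantization loses some detail)."""
    compressed = 2.0 * token / mu - 1.0
    return math.copysign(
        math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed
    )


def tokenize(samples):
    """Turn a continuous waveform into a sequence of discrete tokens."""
    return [mu_law_encode(s) for s in samples]


# Hypothetical input: 16 samples of a 440 Hz sine wave sampled at 8 kHz.
sample_rate = 8000
wave = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(16)]

tokens = tokenize(wave)                              # integers a model could consume
reconstructed = [mu_law_decode(t) for t in tokens]   # lossy round trip
```

Sample-level schemes like this produce one token per audio sample, so sequences grow very long very quickly. That is one reason the choice of tokenization matters so much when applying transformer models to audio.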
Balancing metrics and human evaluation in AI music development: Importance of using human ears to evaluate AI music models, recognizing limitations of benchmarks in audio domain, and balancing quantitative metrics with qualitative human evaluation
While metrics are important in AI development, including in the field of music generation, aesthetics and human evaluation cannot be overlooked. The speaker emphasized the importance of using human ears to evaluate models, acknowledging that benchmarks can be less effective in the audio domain. They also shared that the music background of the team has influenced the development of Suno, particularly in the early stages. The team has tried to avoid implicit bias in their model, but they may need to reconsider this approach as they explore the unique challenges of AI music. The speaker admitted that they have not given much thought to the specific difficulties the model faces in music generation, focusing more on easily measurable aspects like stereo and bit rate. Overall, the team recognizes the importance of balancing quantitative metrics with qualitative human evaluation in the development of AI music.
Exploring music creation through AI: The company aims to make music creation accessible to all, experimenting with various business models to understand user motivation and sustainability, and ultimately changing how the world interacts with music.
The creation and experience of music through AI is a deeply emotional and subjective process that is not yet fully understood, and it caters to a wide range of human emotions, cultural backgrounds, and age groups. The company discussed in the conversation aims to make music creation accessible to everyone, not just professionals or hobbyists, and is currently exploring various business models to understand what motivates users to pay for the product. The history of digital business models shows that what works in the short term may not necessarily be the most sustainable solution in the long term. The company is trying to pioneer new behaviors around music creation and therefore, it's important to experiment with different pricing structures to understand what resonates with the user base. The ultimate goal is to change how the world interacts with music and open up new experiences for people.
Revolutionizing Music Creation and Collaboration: Suno's technology enables unexpected collaborations and creativity in music, bringing people together and providing joy in the music creation process.
Suno's technology is revolutionizing the way music is created and shared by enabling collaboration and creativity in unexpected ways. It's not just about the final product but also about the journey and the joy of creating music together. This is reminiscent of the way people have always resonated with music and the desire to make it with others. The technology has opened up a magical experience for users, allowing them to collaborate with AI or each other in various ways, such as co-writing lyrics or trading off verses and choruses. It's fulfilling for Suno to see how this technology brings people together and brings them joy. It may not be curing cancer, but it's a significant step forward in the world of music creation.
Suno's potential to change the creator-listener ratio in music: Suno empowers creators to share niche music, learn new genres, and feel ownership, potentially skewing the creation-to-consumption ratio
Suno, a creation platform, has the potential to significantly change the skewed ratio of creators to listeners in the music industry. The platform opens up opportunities for smaller, niche micro-sharing, allowing for the creation of songs that resonate with a specific group of people. This dynamic is currently absent in music. Additionally, Suno provides a ground-up learning experience, allowing users to discover new genres and even create hybrid genres. The platform's simple features, such as editing song titles, have led to unexpected user behavior, further demonstrating the desire for creators to feel ownership and pride in their work. The enjoyment of the creation process on Suno could potentially skew the creation-to-consumption ratio even further, making it a unique and exciting space in the music industry.
The future of music: blending creation and consumption: The future of music will see a blurred line between creation and consumption, leading to increased engagement and participation from a larger population, resulting in a faster-paced music culture with a strong emotional connection between fans and artists.
Music consumption and creation will blur together in the future, leading to increased engagement and participation from a larger population. This shift, driven by accessible technology, will result in a faster-paced music culture where new styles and trends emerge more frequently. Despite this, the emotional connection between fans and their favorite artists is expected to remain strong. The advent of digital tools like DAWs (Digital Audio Workstations) has already revolutionized music production, allowing more people to create music and contributing to a more diverse musical landscape. This trend is likely to continue, making music more accessible and interesting for both creators and listeners alike.
Revolutionizing Music Creation with AI Technology: Suno uses AI technology to generate unique sounds, unlock new song structures, and create melodically new music, making music creation more accessible and easier for all.
Suno is revolutionizing music creation by making it more accessible and easier through AI technology. This technology not only generates unique sounds but also unlocks new song structures, chord changes, and the ability to mix different styles. It has the potential to create melodically new music that could keep listeners engaged for longer periods of time. Suno is growing and hiring new team members who share a passion for technology and music. The company's synthetic songs, which include machine-created vocals and music, showcase the incredible capabilities of this technology. The machine doesn't even recognize the concept of voice, yet it produces sounds that resonate with humans. Suno's mission is to bring more music to the world, and they are always looking for talented individuals to join their team. If you're interested, check out their website for career opportunities.

Recent Episodes from No Priors: Artificial Intelligence | Machine Learning | Technology | Startups

State Space Models and Real-time Intelligence with Karan Goel and Albert Gu from Cartesia

This week on No Priors, Sarah Guo and Elad Gil sit down with Karan Goel and Albert Gu from Cartesia. Karan and Albert first met as Stanford AI Lab PhDs, where their lab invented State Space Models (SSMs), a fundamental new primitive for training large-scale foundation models. In 2023, they founded Cartesia to build real-time intelligence for every device. One year later, Cartesia released Sonic, which generates high-quality, lifelike speech with a model latency of 135ms, the fastest for a model of this class. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @krandiash | @_albertgu Show Notes: (0:00) Introduction (0:28) Use Cases for Cartesia and Sonic (1:32) Karan Goel & Albert Gu’s professional backgrounds (5:06) State Space Models (SSMs) versus Transformer Based Architectures (11:51) Domain Applications for Hybrid Approaches (13:10) Text to Speech and Voice (17:29) Data, Size of Models and Efficiency (20:34) Recent Launch of Text to Speech Product (25:01) Multimodality & Building Blocks (25:54) What’s Next at Cartesia? (28:28) Latency in Text to Speech (29:30) Choosing Research Problems Based on Aesthetic (31:23) Product Demo (32:48) Cartesia Team & Hiring

No Priors: Artificial Intelligence | Machine Learning | Technology | Startups

June 27, 2024

Can AI replace the camera? with Joshua Xu from HeyGen

AI video generation models still have a long way to go when it comes to making compelling and complex videos, but the HeyGen team is well on its way to streamlining the video creation process by using a combination of language, video, and voice models to create videos featuring personalized avatars, b-roll, and dialogue. This week on No Priors, Joshua Xu, the co-founder and CEO of HeyGen, joins Sarah and Elad to discuss how the HeyGen team broke down the elements of a video and built or found models to use for each one, the commercial applications for these AI videos, and how they’re safeguarding against deep fakes. Links from episode: HeyGen McDonald’s commercial Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @joshua_xu_ Show Notes: (0:00) Introduction (3:08) Applications of AI content creation (5:49) Best use cases for HeyGen (7:34) Building for quality in AI video generation (11:17) The models powering HeyGen (14:49) Research approach (16:39) Safeguarding against deep fakes (18:31) How AI video generation will change video creation (24:02) Challenges in building the model (26:29) HeyGen team and company

No Priors: Artificial Intelligence | Machine Learning | Technology | Startups

June 20, 2024

How the ARC Prize is democratizing the race to AGI with Mike Knoop from Zapier

The first step in achieving AGI is nailing down a concise definition, and Mike Knoop, the co-founder and Head of AI at Zapier, believes François Chollet got it right when he defined general intelligence as a system that can efficiently acquire new skills. This week on No Priors, Mike joins Elad to discuss ARC Prize, a multi-million dollar non-profit public challenge that is looking for someone to beat the Abstraction and Reasoning Corpus (ARC) evaluation. In this episode, they also get into why Mike thinks LLMs will not get us to AGI, how Zapier is incorporating AI into their products and the power of agents, and why it’s dangerous to regulate AGI before discovering its full potential. Show Links: About the Abstraction and Reasoning Corpus Zapier Central Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @mikeknoop Show Notes: (0:00) Introduction (1:10) Redefining AGI (2:16) Introducing ARC Prize (3:08) Definition of AGI (5:14) LLMs and AGI (8:20) Promising techniques to developing AGI (11:0) Sentience and intelligence (13:51) Prize model vs investing (16:28) Zapier AI innovations (19:08) Economic value of agents (21:48) Open source to achieve AGI (24:20) Regulating AI and AGI

No Priors: Artificial Intelligence | Machine Learning | Technology | Startups

June 11, 2024

The evolution and promise of RAG architecture with Tengyu Ma from Voyage AI

After Tengyu Ma spent years at Stanford researching AI optimization, embedding models, and transformers, he took a break from academia to start Voyage AI, which allows enterprise customers to have the most accurate retrieval possible through the most useful foundational data. Tengyu joins Sarah on this week’s episode of No Priors to discuss why RAG systems are winning as the dominant architecture in enterprise and the evolution of foundational data that has allowed RAG to flourish. And while fine-tuning is still in the conversation, Tengyu argues that RAG will continue to evolve as the cheapest, quickest, and most accurate system for data retrieval. They also discuss methods for growing context windows and managing latency budgets, how Tengyu’s research has informed his work at Voyage, and the role academia should play as AI grows as an industry. Show Links: Tengyu Ma Key Research Papers: Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training Non-convex optimization for machine learning: design, analysis, and understanding Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss Larger language models do in-context learning differently, 2023 Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning On the Optimization Landscape of Tensor Decompositions Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @tengyuma Show Notes: (0:00) Introduction (1:59) Key points of Tengyu’s research (4:28) Academia compared to industry (6:46) Voyage AI overview (9:44) Enterprise RAG use cases (15:23) LLM long-term memory and token limitations (18:03) Agent chaining and data management (22:01) Improving enterprise RAG (25:44) Latency budgets (27:48) Advice for building RAG systems (31:06) Learnings as an AI founder (32:55) The role of academia in AI

No Priors: Artificial Intelligence | Machine Learning | Technology | Startups

June 06, 2024

How YC fosters AI Innovation with Garry Tan

Garry Tan is a notorious founder-turned-investor who is now running one of the most prestigious accelerators in the world, Y Combinator. As the president and CEO of YC, Garry has been credited with reinvigorating the program. On this week’s episode of No Priors, Sarah, Elad, and Garry discuss the shifting demographics of YC founders and how AI is encouraging younger founders to launch companies, predicting which early stage startups will have longevity, and making YC a beacon for innovation in AI companies. They also discuss the importance of building companies in person and whether San Francisco is, in fact, back. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @garrytan Show Notes: (0:00) Introduction (0:53) Transitioning from founder to investing (5:10) Early social media startups (7:50) Trend predicting at YC (10:03) Selecting YC founders (12:06) AI trends emerging in YC batch (18:34) Motivating culture at YC (20:39) Choosing the startups with longevity (24:01) Shifting YC founder demographics (29:24) Building in San Francisco (31:01) Making YC a beacon for creators (33:17) Garry Tan is bringing San Francisco back

No Priors: Artificial Intelligence | Machine Learning | Technology | Startups

May 23, 2024

The Data Foundry for AI with Alexandr Wang from Scale

Alexandr Wang was 19 when he realized that gathering data would be crucial as AI becomes more prevalent, so he dropped out of MIT and started Scale AI. This week on No Priors, Alexandr joins Sarah and Elad to discuss how Scale is providing infrastructure and building a robust data foundry that is crucial to the future of AI. While the company started working with autonomous vehicles, they’ve expanded by partnering with research labs and even the U.S. government. In this episode, they get into the importance of data quality in building trust in AI systems and a possible future where we can build better self-improvement loops, AI in the enterprise, and where human and AI intelligence will work together to produce better outcomes. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @alexandr_wang Show Notes: (0:00) Introduction (3:01) Data infrastructure for autonomous vehicles (5:51) Data abundance and organization (12:06) Data quality and collection (15:34) The role of human expertise (20:18) Building trust in AI systems (23:28) Evaluating AI models (29:59) AI and government contracts (32:21) Multi-modality and scaling challenges

No Priors: Artificial Intelligence | Machine Learning | Technology | Startups

May 22, 2024

Music consumers are becoming the creators with Suno CEO Mikey Shulman

Mikey Shulman, the CEO and co-founder of Suno, can see a future where the Venn diagram of music creators and consumers becomes one big circle. The AI music generation tool trying to democratize music has been making waves in the AI community ever since it came out of stealth mode last year. Suno users can make a song complete with lyrics, just by entering a text prompt, for example, “koto boom bap lofi intricate beats.” You can hear it in action as Mikey, Sarah, and Elad create a song live in this episode. In this episode, Elad, Sarah, and Mikey talk about how the Suno team took their experience making a transcription tool and applied it to music generation, how the Suno team evaluates aesthetics and taste because there is no standardized test you can give an AI model for music, and why Mikey doesn’t think AI-generated music will affect people’s consumption of human-made music. Listen to the full songs played and created in this episode: Whispers of Sakura Stone Statistical Paradise Statistical Paradise 2 Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @MikeyShulman Show Notes: (0:00) Mikey’s background (3:48) Bark and music generation (5:33) Architecture for music generation AI (6:57) Assessing music quality (8:20) Mikey’s music background as an asset (10:02) Challenges in generative music AI (11:30) Business model (14:38) Surprising use cases of Suno (18:43) Creating a song on Suno live (21:44) Ratio of creators to consumers (25:00) The digitization of music (27:20) Mikey’s favorite song on Suno (29:35) Suno is hiring

No Priors: Artificial Intelligence | Machine Learning | Technology | Startups

May 16, 2024

Context windows, computer constraints, and energy consumption with Sarah and Elad

This week on No Priors, hosts Sarah and Elad are catching up on the latest AI news. They discuss the recent developments in AI music generation, and if you’re interested in generative AI music, stay tuned for next week’s interview! Sarah and Elad also get into device-resident models, AI hardware, and ask just how smart smaller models can really get. These hardware constraints are compared to the hurdles AI platforms continue to face, including computing constraints, energy consumption, context windows, and how to best integrate these products in apps that users are familiar with. Have a question for our next host-only episode or feedback for our team? Reach out to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil Show Notes: (0:00) Intro (1:25) Music AI generation (4:02) Apple’s LLM (11:39) The role of AI-specific hardware (15:25) AI platform updates (18:01) Forward thinking in investing in AI (20:33) Unlimited context (23:03) Energy constraints

No Priors: Artificial Intelligence | Machine Learning | Technology | Startups

May 09, 2024

Cognition’s Scott Wu on how Devin, the AI software engineer, will work for you

Scott Wu loves code. He grew up competing in the International Olympiad in Informatics (IOI) and is a world class coder, and now he's building an AI agent designed to create more, not fewer, human engineers. This week on No Priors, Sarah and Elad talk to Scott, the co-founder and CEO of Cognition, an AI lab focusing on reasoning. Recently, the Cognition team released a demo of Devin, an AI software engineer that can increasingly handle entire tasks end to end. In this episode, they talk about why the team built Devin with a UI that mimics looking over another engineer’s shoulder as they work and how this transparency makes for a better result. Scott discusses why he thinks Devin will make it possible for there to be more human engineers in the world, and what will be important for software engineers to focus on as these roles evolve. They also get into how Scott thinks about building the Cognition team and that they’re just getting started. Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @ScottWu46 Show Notes: (0:00) Introduction (1:12) IOI training and community (6:39) Cognition’s founding team (8:20) Meet Devin (9:17) The discourse around Devin (12:14) Building Devin’s UI (14:28) Devin’s strengths and weakness (18:44) The evolution of coding agents (22:43) Tips for human engineers (26:48) Hiring at Cognition

No Priors: Artificial Intelligence | Machine Learning | Technology | Startups

May 02, 2024

OpenAI’s Sora team thinks we’ve only seen the "GPT-1 of video models"

AI-generated videos are not just leveled-up image generators; they could be a big step forward on the path to AGI. This week on No Priors, the team from Sora is here to discuss OpenAI’s recently announced generative video model, which can take a text prompt and create realistic, visually coherent, high-definition clips that are up to a minute long. Sora team leads Aditya Ramesh, Tim Brooks, and Bill Peebles join Elad and Sarah to talk about developing Sora. The generative video model isn’t yet available for public use, but the examples of its work are very impressive. However, they believe we’re still in the GPT-1 era of AI video models and are focused on a slow rollout to ensure the model is in the best place possible to offer value to the user and, more importantly, that they’ve applied all the safety measures possible to avoid deep fakes and misinformation. They also discuss what they’re learning from implementing diffusion transformers, why they believe video generation is taking us one step closer to AGI, and why entertainment may not be the main use case for this tool in the future. Show Links: Bling Zoo video Man eating a burger video Tokyo Walk video Sign up for new podcasts every week. Email feedback to show@no-priors.com Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @_tim_brooks | @billpeeb | @model_mechanic Show Notes: (0:00) Sora team Introduction (1:05) Simulating the world with Sora (2:25) Building the most valuable consumer product (5:50) Alternative use cases and simulation capabilities (8:41) Diffusion transformers explanation (10:15) Scaling laws for video (13:08) Applying end-to-end deep learning to video (15:30) Tuning the visual aesthetic of Sora (17:08) The road to “desktop Pixar” for everyone (20:12) Safety for visual models (22:34) Limitations of Sora (25:04) Learning from how Sora is learning (29:32) The biggest misconceptions about video models

No Priors: Artificial Intelligence | Machine Learning | Technology | Startups

April 25, 2024