
    Podcast Summary

    • Formation of Nous Research: a company born out of the open source language model community
      Nous Research was formed by a collective of individuals with backgrounds in language models dating back to the GPT-2 era, who came together to focus on creating and developing these models after facing challenges with open sourcing and centralization in the community.

      Nous Research is an open source research organization that has recently become a company, bringing together a collective of individuals who have been experimenting with language models since the release of GPT-2. They have contributed significantly to the open source language model community, creating and releasing various models on Hugging Face. Team members entered the language model space in different eras, some as early as the GPT-2 release. They began by collaborating with other open source collectives but faced challenges when OpenAI closed-sourced GPT-3. In response, the KoboldAI community emerged, giving individuals a central place to continue customizing and interacting with these models. Eventually, the need arose for more formal organizations dedicated to creating and developing models, leading to the formation of Nous Research.

    • Open source releases like Meta's Llama, and projects built on it such as Alpaca, led to the Hermes model and the Nous Research community
      Meta's open source releases inspired individuals to create advanced models and build a community around AI research.

      Meta, through its open source release of the Llama models, has played a pivotal role in advancing the AI community by putting capable base models in the hands of researchers and developers. Stanford's Alpaca project, built on Llama, inspired two individuals to train a model of their own using only GPT-4 outputs. The resulting model, Hermes, gained significant attention in the community, leading to the formation of Nous Research, an eclectic group working on various AI projects. The founders, who came from diverse, largely non-academic backgrounds, were surprised by the attention their model received and by the subsequent growth of their community. They formed a Discord server that brought together people of various ages and backgrounds, leading to collaborative work on a range of AI projects. The success of Hermes and Nous Research shows the power of the open source community and the impact of Meta's releases on AI research.

    • Creating an open source research organization through social media interactions
      By focusing on synthetic datasets and data distillation, smaller research teams can make their models competitive with larger ones, enabling their use in a wide range of applications.

      The team behind Nous Research stumbled into creating an open source research organization through interactions on Twitter and Discord. They focused on synthetic datasets, data generated by other language models, to train and fine-tune smaller models given their limited computational resources. Synthetic data enables data distillation: larger models with extensive knowledge compress complex information into simpler forms that smaller models can learn from effectively. This approach makes smaller models more competitive with larger ones, allowing them to be used in applications such as edge devices, phones, or drones.
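
      To make the distillation idea concrete, here is a minimal, hypothetical sketch in Python: a large "teacher" model answers seed prompts, and the prompt/answer pairs are saved as training data for a smaller "student" model. The model name and seed tasks are illustrative, not details from the episode.

      # Sketch: generate synthetic training pairs from a large "teacher" model.
      # Assumes the official OpenAI Python client; names are illustrative.
      import json
      from openai import OpenAI

      client = OpenAI()

      seed_tasks = [
          "Explain binary search to a beginner.",
          "Summarize the causes of the French Revolution in three sentences.",
      ]

      with open("synthetic_pairs.jsonl", "w") as f:
          for task in seed_tasks:
              # The teacher "compresses" its knowledge into a worked answer.
              reply = client.chat.completions.create(
                  model="gpt-4",
                  messages=[{"role": "user", "content": task}],
              )
              pair = {"instruction": task,
                      "output": reply.choices[0].message.content}
              f.write(json.dumps(pair) + "\n")

      # synthetic_pairs.jsonl can then feed an instruction-tuning pipeline
      # for a smaller open model.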

    • Distilled knowledge from large language models boosts smaller models
      Distilling human-like comprehension from large language models allows smaller models to perform better and run offline, enabling more freedom of thought and conversation without the safety constraints of larger hosted models.

      The use of distilled information from large language models like GPT-3.5 and GPT-4 has led to significant performance boosts for smaller models. The method involves creating compressed instruction, input, and answer records, transferring human-like comprehension to models that can run offline. Despite the potential challenges of model licensing, open-source models and data distillation techniques enable continued innovation and development in the field. The Nous team, for instance, used this approach to create Hermes 1, which showed remarkable improvements over models not trained this way. This paradigm shift not only makes local models more capable but also allows more freedom of thought and conversation, without the same level of safety constraints as larger hosted models. While discussions and regulations around model licensing are still evolving, the team's approach centers on open-source releases and respectful use of others' models for the betterment of the community. As new models like Mistral become available, the distillation techniques learned from larger models will be applied to create models that can be used commercially.
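
      The compressed instruction, input, and answer records described above are commonly stored as Alpaca-style records. A hypothetical example of one record, and one common way to flatten it into a training prompt:

      # One Alpaca-style record: an instruction, optional input, and the
      # teacher model's answer. Thousands of these form a distillation dataset.
      record = {
          "instruction": "Classify the sentiment of the sentence.",
          "input": "The battery life on this laptop is fantastic.",
          "output": "Positive",
      }

      # Flatten the record into the single string the model is trained on.
      prompt = (
          f"### Instruction:\n{record['instruction']}\n\n"
          f"### Input:\n{record['input']}\n\n"
          f"### Response:\n{record['output']}"
      )
      print(prompt)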

    • Google, OpenAI, and the question of data ownership and Terms of Service
      The ownership and usage of training data for large language models raise complex legal issues. Companies must respect intellectual property rights and adhere to Terms of Service to maintain ethical business practices.

      The lines between who owns the data used to train large language models and who can use that data for commercial purposes are not clearly defined. The large language models from companies like Google and OpenAI are likely trained on a mix of copyrighted and copyright-free material. Enforcing Terms of Service (ToS) across such a complex web of connections could be challenging, as it might require companies to open their books to scrutiny. This was illustrated when Google's Bard model was accused of violating OpenAI's ToS, yet no legal action was taken. Nous Research has worked on various collections of models over time, including Hermes, the Yarn models, Capybara, Puffin, and Obsidian. The Hermes series marked the initial efforts, after which Teknium focused on creating more synthetic data and using open datasets. The collective's ongoing work includes future projects that will continue to expand the capabilities of language models. Despite the complexities and potential for hypocrisy, it is essential to respect the intellectual property rights of others and adhere to ToS to maintain a fair and ethical business environment.

    • Decentralized collaboration drives growth in the Nous Research collective
      The Nous Research collective, led by Teknium, fosters innovation through decentralized projects like Hermes, Capybara, and Puffin, benefits from centralized collaboration on initiatives like Yarn, and promotes cross-team learning and a culture of creativity and autonomy.

      The Nous Research collective, led by Teknium, has seen significant growth and innovation through a decentralized, collaborative approach. The Hermes project, spearheaded by Teknium, uses synthetic data and open datasets, setting the foundation for the organization's popular model series. Other projects, like Capybara and Puffin, were developed by volunteers, demonstrating the collective's commitment to fostering autonomy and creativity among its members. The Yarn project, led by Emozilla, showcases the benefits of centralized collaboration and resource allocation. As the collective has grown, communication and knowledge sharing have become essential, producing a culture that encourages cross-team learning and collaboration. The organization's structure, now a C corp, supports these interactions through dedicated channels and sectors focused on data synthesis, training, agents, and simulation. Overall, the collective thrives on the synergy of its diverse and autonomous members, who push each other forward to advance synthetic data and machine learning.

    • Collaboration and specialization in AI
      Focus on hyperparameters for the best model results, prioritize community and collaboration, stay current on research, and ensure openness and transparency in AI development.

      In the field of artificial intelligence, collaboration and specialization among teams are crucial for advancement. The training, data synthesis, agents, and simulation teams are interconnected, and each member has a specific role to play. As teams grow, it is essential to tier people in and assign roles based on their expertise and contributions. Blockchains are one potential solution to the authenticity problem in the age of AI-generated content. For those fine-tuning models, focusing on hyperparameters is essential to getting the best results. The speakers also emphasized the importance of community and collaboration in the AI field, with platforms like Discord serving as hubs for interaction and knowledge sharing. Staying updated on the latest research and advancements is crucial for fine-tuners looking to make a difference in the field. Finally, the speakers highlighted the importance of openness and transparency in the development of AI technologies, as discussed in Chris Dixon's book "Read, Write, Own."

    • Exploring advanced techniques for AI improvement
      Techniques such as instruction tuning, model merging, and reward models can improve AI performance through better formatted data, combined models, and finer control over model behavior.

      While some see hyperparameters as less important, they can significantly impact model performance. A good learning rate and thorough research are crucial, and training for longer, provided the model is not overfitting, can also lead to better results if computational resources allow. The Axolotl trainer is recommended for LoRA training and fine-tuning. Looking ahead, there is a shift toward approaches beyond plain fine-tuning, such as model merging, instruction tuning, and reward models. Instruction tuning allows for better formatted training data, while model merging combines models to potentially improve results. Preference-based methods like DPO and RLHF enable more control over model behavior. More complex techniques, such as chain-of-thought and tree-of-thought multistep prompting, and building datasets from these methods, can also yield significant improvements. Overall, there is a growing emphasis on exploring new instruction methodologies, model merging, and reward models to enhance AI performance.
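
      As a rough illustration of the LoRA fine-tuning workflow that tools like Axolotl wrap, here is a hedged sketch using Hugging Face transformers, peft, and datasets; the base model, toy dataset, and hyperparameter values are placeholders, not recommendations from the episode.

      # Minimal LoRA fine-tune; Axolotl automates this kind of setup.
      from datasets import Dataset
      from peft import LoraConfig, get_peft_model
      from transformers import (AutoModelForCausalLM, AutoTokenizer,
                                DataCollatorForLanguageModeling, Trainer,
                                TrainingArguments)

      base = "mistralai/Mistral-7B-v0.1"  # placeholder open base model
      model = AutoModelForCausalLM.from_pretrained(base)
      tokenizer = AutoTokenizer.from_pretrained(base)
      tokenizer.pad_token = tokenizer.eos_token

      # LoRA trains small adapter matrices instead of every base weight.
      model = get_peft_model(model, LoraConfig(
          r=16, lora_alpha=32, lora_dropout=0.05,
          target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

      # Tiny stand-in dataset in the instruction format sketched earlier.
      ds = Dataset.from_dict({"text": [
          "### Instruction:\nSay hello.\n\n### Response:\nHello!"]})
      ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True),
                  remove_columns=["text"])

      # The knobs the speakers stress: learning rate and how long to train.
      args = TrainingArguments(output_dir="lora-out", learning_rate=2e-5,
                               num_train_epochs=3,
                               per_device_train_batch_size=1)

      Trainer(model=model, args=args, train_dataset=ds,
              data_collator=DataCollatorForLanguageModeling(
                  tokenizer, mlm=False)).train()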

    • Manipulating model behavior through activation steering
      Users can shape model output by modifying a model's internal activations, yielding more robust and faithful representations of target concepts. Other techniques include soft prompting and advanced sampling methods.

      Model activation hacking is a powerful technique that lets users manipulate a model's behavior by altering its internal activation vectors, creating a more robust and faithful representation of the desired concepts. This method goes beyond system prompts, offering more control, and is not as easily circumvented. Other techniques mentioned include soft prompting, which compresses large prompts into fewer tokens, and advanced sampling methods, which could significantly improve model performance. The team has recently secured a $5.2 million seed financing round and plans to focus on locality, offline capabilities, and empowering users to run models themselves. While AGI is an intriguing goal, the team's immediate focus is on these practical applications.
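
      A minimal sketch of what activation steering can look like in practice, assuming a PyTorch forward hook on one transformer block; the layer index, scale, and random "steering vector" are stand-ins (real steering vectors are typically derived from activations on contrasting prompts).

      # Nudge one layer's hidden states with a steering vector at inference.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model = AutoModelForCausalLM.from_pretrained("gpt2")
      tokenizer = AutoTokenizer.from_pretrained("gpt2")

      # Placeholder vector; real ones encode a concept (e.g. "be cheerful").
      steer = torch.randn(model.config.hidden_size) * 0.5

      def add_steering(module, inputs, output):
          # GPT-2 blocks return a tuple; the hidden states are element 0.
          return (output[0] + steer,) + output[1:]

      # Hook an arbitrary middle block (index 6 of 12 is illustrative).
      handle = model.transformer.h[6].register_forward_hook(add_steering)

      ids = tokenizer("The weather today is", return_tensors="pt")
      out = model.generate(**ids, max_new_tokens=20)
      print(tokenizer.decode(out[0]))

      handle.remove()  # remove the hook to restore normal behavior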

    • Emphasizing smaller model sizes and community access
      Nous Research focuses on solving unsolved problems at smaller model sizes, building tools and services that enhance open-source projects, and maintaining community access as it grows.

      Nous Research, an organization known for its open-source language models, believes in addressing unsolved problems at smaller model sizes before scaling up. This ethos stems from the community's desire for access to these tools and the belief that everyone should be able to automate their lives and push their understanding of various topics further. As Nous Research transitions from a purely open-source volunteer group to a more corporate entity, it remains committed to this ethos and to keeping the community open. The team aims to create tools and provide services that enhance the capabilities of existing open-source projects, rather than building a closed system. The community's support and inspiration have validated their work, and they look forward to continuing their contributions to the field of AI.

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Apple Intelligence & Advanced RAG
    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

    The perplexities of information retrieval
    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data
    We’ve all heard about breaches of privacy and leaks of private health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they deploy edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs
    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress
    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents
    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach; from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs
    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Mamba & Jamba
    First there was Mamba… now there is Jamba from AI21. This is a model that combines the best non-transformer goodness of Mamba with good ol’ attention layers. This results in a highly performant and efficient model that AI21 has open sourced! We hear all about it (along with a variety of other LLM things) from AI21’s co-founder Yoav.

    Related Episodes

    When data leakage turns into a flood of trouble
    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)
    The new stable diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on twitter, reddit, discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things stable diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology
    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)
    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically they dive into the T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. Daniel provides a brief tour of the possible with the T0 family. They finish up with a couple of new learning resources.