    Podcast Summary

    • Intel's Advancements in Local and Data Center AI Technologies
      Intel is pushing the boundaries of AI with local machine applications and a new developer cloud, contributing to a broader microelectronics revolution. This shift towards local processing offers benefits for personal assistants and data-sensitive applications.

      There are significant advancements happening in both local and data center AI technologies. Intel, in particular, is making strides in this area, with a focus on AI-enabled applications on local machines and the introduction of their Intel Developer Cloud, which offers access to powerful processors for accelerated workloads. These developments are part of a larger revolution in microelectronics, with various chip types and chiplets emerging to compete with GPUs. The implications of these advancements are significant, particularly for personal AI assistants and applications where data privacy and low latency are crucial. As technology continues to evolve, we can expect to see more capabilities shifting to local devices while maintaining the power and performance of data center infrastructure.

    • Protecting Sensitive AI Data with Confidential Computing
      Confidential computing, specifically trusted execution environments (TEEs), is crucial for safeguarding sensitive AI data, especially in cloud-based and federated workflows. Intel's TDX and Cloudflare's serverless Workers AI are recent examples of this trend.

      As AI workloads continue to grow and become more prevalent, particularly in federated workflows and cloud-based environments, the need for secure and confidential computing solutions is becoming increasingly important. Confidential computing, specifically the use of trusted execution environments (TEEs), has been around for years but is gaining renewed attention due to the sensitive nature of AI data. TEEs protect data inside the processor from potential adversaries, even when that data is being processed across multiple systems. Intel's TDX is one example of a trusted execution environment. Additionally, the convergence of AI and infrastructure is leading to new solutions, such as Cloudflare's Workers AI, a serverless GPU offering that signals a growing trend towards serverless GPU environments for AI workloads. From a developer or technical team perspective, keeping up with the latest AI models and infrastructure solutions can be challenging. New models, like Mistral AI's, are released frequently, and teams need to understand how to deploy and manage them in secure, confidential computing environments. Overall, the intersection of AI and security is a critical area of focus as both fields continue to advance.

    • Explore the Hugging Face platform for access to a wide range of models
      Discover and experiment with over 345,000 models on Hugging Face, from open access to commercially licensed, to find the best fit for your use case.

      If you're interested in using advanced models for your projects and experimenting with them, you can find and access a vast number of models on the Hugging Face platform. Hugging Face hosts around 345,000 models, ranging from fully open access and gated (restricted) models to commercially licensed ones. The platform is similar to GitHub, with users contributing their fine-tuned models. A key consideration for users is to assess the popularity and reliability of the models based on the number of downloads. Additionally, having hands-on experience with various models can help developers build intuition and make informed decisions when choosing the best model for their specific use case. It's essential to keep in mind that the landscape of AI infrastructure and models is constantly evolving, so staying informed and experimenting with different options is crucial.
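
      As a rough illustration of browsing by download count, the Hub's catalog can also be queried programmatically. The sketch below is a minimal example assuming a recent version of the huggingface_hub library; the task filter and result limit are arbitrary choices for illustration, not recommendations from the episode.

        # A minimal sketch, assuming huggingface_hub is installed
        # (pip install huggingface_hub). The filter and limit are
        # arbitrary example values.
        from huggingface_hub import HfApi

        api = HfApi()

        # List the five most-downloaded text-generation models on the Hub.
        models = api.list_models(
            filter="text-generation",
            sort="downloads",
            direction=-1,
            limit=5,
        )

        for m in models:
            # Repo id and approximate download count (the count may be
            # unavailable on older library versions).
            print(m.id, m.downloads)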

    • Explore machine learning models with Hugging Face
      Hugging Face offers a platform to discover, test, and use machine learning models for computer vision, NLP, and audio. Consider task and hardware before selecting a model.

      Hugging Face is a valuable resource for discovering and experimenting with various machine learning models, particularly in the areas of computer vision, natural language processing, and audio. With its user-friendly interface, you can explore trending models, check their download numbers, and even test them out using interactive interfaces or demo apps without having to download them first. While there are many models to choose from, it's essential to consider your specific task and hardware capabilities when selecting a model. Some models may require substantial resources, such as multiple GPUs, making them impractical for smaller setups. To get a better understanding of these models and their applications, attend NODES, the free online conference at neo4j.com/nodes, where industry experts will share their insights on using graph technology to enhance machine learning and more.
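
      On the hands-on testing point, the same kind of quick check can be made from code against Hugging Face's hosted inference endpoints rather than a browser widget. The sketch below is a minimal example assuming a recent huggingface_hub version; the model id is an illustrative choice, and an API token may be required depending on the model and your account.

        # A minimal sketch of trying a model without downloading its weights,
        # assuming access to Hugging Face's hosted inference service.
        # The model id is an arbitrary example, not a recommendation.
        from huggingface_hub import InferenceClient

        client = InferenceClient(model="distilbert-base-uncased-finetuned-sst-2-english")

        # Run a quick sentiment-analysis call against the hosted endpoint.
        result = client.text_classification("This podcast episode was really useful.")
        print(result)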

    • Choose model based on output behavior, then consider hardware
      Start by selecting a transformer language model based on its output behavior, then address hardware considerations and optimization as needed.

      When choosing a transformer language model, focus on the model's output behavior first; hardware considerations, such as running the model on a consumer GPU or CPU, can be addressed later in the process. It's recommended to start with smaller models and work your way up to larger ones based on your use case requirements. For models that can reasonably fit on a single processor or accelerator, try running them on a single instance before scaling up to more complex infrastructure. For more demanding tasks, such as producing high-quality synthesized speech or accurate transcriptions from audio, larger models may be necessary. Once you've identified a suitable model, the next step is to consider how to run it within your infrastructure constraints. For larger models that can't run on a single processor or accelerator, open-source model-optimization tooling can make it possible to run them on consumer hardware or even CPUs, and that optimization step should be considered part of your pipeline. In short: prioritize the model's output behavior first, then address hardware and optimization as needed.
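
      To make the "start small" idea concrete, the sketch below loads a deliberately small text-generation model on a CPU with the transformers pipeline API. The model name is only an example; a larger model would follow the same pattern on a single GPU before any multi-device setup is considered.

        # A minimal sketch of starting with a small model on a single device
        # (here a CPU). "distilgpt2" is an arbitrary small example model.
        from transformers import pipeline

        generator = pipeline("text-generation", model="distilgpt2", device=-1)  # device=-1 => CPU

        # Judge the output behavior first; hardware and optimization come later.
        output = generator("Local AI models are useful because", max_new_tokens=40)
        print(output[0]["generated_text"])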

    • Understanding resource requirements and optimization opportunities of a new machine learning model
      Start by running a single inference to gauge resource usage, estimate minimum hardware requirements, and consider optimization techniques.

      When exploring the use of a new machine learning model, a crucial initial step is to understand its resource requirements and potential optimization opportunities. The speaker recommends starting by running a single inference to gauge the model's resource consumption, which can be done using hosted Jupyter Notebooks like Google Colab or by running the model locally on a personal workstation. By checking the resources used during inference, one can estimate the minimum hardware requirements and consider optimization techniques to reduce resource usage or improve performance. This process helps in making informed decisions about the feasibility and cost-effectiveness of deploying the model.
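
      One way to do that gauging step in a notebook is sketched below: time a single inference and read back peak GPU memory with PyTorch utilities. The model and prompt are placeholders, a CUDA GPU is assumed, and on a CPU-only machine the memory calls would simply be skipped.

        # A minimal sketch for gauging the cost of a single inference,
        # assuming torch and transformers are installed and a GPU is present.
        # The model id is an illustrative placeholder.
        import time

        import torch
        from transformers import pipeline

        generator = pipeline("text-generation", model="distilgpt2", device=0)

        torch.cuda.reset_peak_memory_stats()
        start = time.perf_counter()
        generator("A quick resource check:", max_new_tokens=32)
        elapsed = time.perf_counter() - start

        peak_gb = torch.cuda.max_memory_allocated() / 1024**3
        print(f"latency: {elapsed:.2f}s, peak GPU memory: {peak_gb:.2f} GiB")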

    • Optimizing and deploying AI models on various hardware
      Explore optimization libraries and projects like llama.cpp, GPTQ, GGML, and bitsandbytes for smaller, more efficient models. Consider deploying as a REST or gRPC API for flexibility and ease of use. On-premises solutions and alternative chip offerings are growing in popularity.

      As the use of AI models continues to grow, there are increasing options for optimizing and deploying these models on various hardware, from CPUs to GPUs and beyond. For those looking to make their models smaller or more efficient for specific hardware, there are numerous projects and libraries, such as llama.cpp, GPTQ, GGML, and bitsandbytes, that can help with optimization. Additionally, deploying models as a REST or gRPC API can provide flexibility and ease of use. As for hardware, while cloud environments are still popular, there is a trend towards on-premises solutions and exploring other chip offerings. For those considering an in-house GPU setup, it's important to consider the different ways to deploy an AI model, such as running it as a REST API and having your application code connect to it. This separation allows for flexibility in deployment and can save time and resources.
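
      As one concrete example of the optimization side, the sketch below loads a model in 4-bit precision via the bitsandbytes integration in transformers, which can shrink memory use enough to fit consumer GPUs. The model id is a placeholder, and a CUDA-capable GPU plus the accelerate and bitsandbytes packages are assumed.

        # A minimal sketch of 4-bit quantized loading, assuming transformers,
        # accelerate, and bitsandbytes are installed and a CUDA GPU is available.
        # The model id is an arbitrary placeholder.
        from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

        model_id = "mistralai/Mistral-7B-v0.1"  # example only

        quant_config = BitsAndBytesConfig(load_in_4bit=True)

        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            quantization_config=quant_config,
            device_map="auto",  # place layers across available devices
        )

        inputs = tokenizer("Quantization lets this fit on smaller hardware:", return_tensors="pt").to(model.device)
        print(tokenizer.decode(model.generate(**inputs, max_new_tokens=30)[0]))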

    • Model Serving: Separating Model and Application
      Model serving separates model and application for easier testing and deployment, with options including serverless, containerized, and optimized frameworks.

      Model serving involves separating the concerns of the model and the application, allowing for easier testing and deployment. There are various ways to deploy models, including serverless options where a GPU is spun up only when needed, which costs less but comes with longer cold start times. Another approach is to use containerized model servers running on VMs or bare-metal servers with accelerators, which offer higher uptime but incur constant cost. Each vendor has its own approach to setting up model serving, and there are optimization projects like vLLM that can make the inference process more efficient. The choice of framework and optimization methods adds another layer of complexity to model serving.
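
      A stripped-down version of that model/application separation is sketched below: the model lives behind its own HTTP endpoint (here built with FastAPI around a small placeholder model), and application code talks to it over REST rather than importing the model directly. The endpoint path and model choice are illustrative, not any specific vendor's layout.

        # A minimal sketch of a self-hosted model server, assuming fastapi,
        # uvicorn, pydantic, and transformers are installed.
        from fastapi import FastAPI
        from pydantic import BaseModel
        from transformers import pipeline

        app = FastAPI()
        generator = pipeline("text-generation", model="distilgpt2")  # small example model

        class GenerateRequest(BaseModel):
            prompt: str
            max_new_tokens: int = 50

        @app.post("/generate")
        def generate(req: GenerateRequest):
            # Application code calls this endpoint instead of loading the model itself.
            out = generator(req.prompt, max_new_tokens=req.max_new_tokens)
            return {"generated_text": out[0]["generated_text"]}

        # Run with: uvicorn server:app --host 0.0.0.0 --port 8000
        # (assuming this file is saved as server.py)

      Because the application only needs an HTTP client, the server behind the endpoint can later be swapped for a serverless or managed offering without changing application code.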

    • Optimizing and deploying machine learning models
      To optimize and deploy machine learning models, experiment with various frameworks and tools like TensorFlow Serving, TorchServe, and Hugging Face transformers. Utilize libraries for optimization and consider deploying using tools like Truss, TGI, vLLM, or cloud providers like AWS SageMaker.

      Building and deploying machine learning models involves several steps, from model selection and experimentation to optimization and deployment. During the experimentation phase, it's not necessary to spin up your own infrastructure, but if you decide to do so, optimize it for better performance. Once you're ready to deploy, consider using a model server specifically designed for inference tasks. You can use popular frameworks like TensorFlow Serving or TorchServe, or build your own using FastAPI services. When it comes to pulling down models and running them for inference, the Hugging Face transformers library is a comprehensive solution for general-purpose functionality, including language, speech, and computer vision models. For model optimization, consider packages like Optimum, which optimizes models for various architectures, as well as libraries like bitsandbytes (integrated with Hugging Face transformers), OpenVINO, and Apache TVM. On the deployment side, tools like Truss from Baseten, TGI from Hugging Face, and vLLM are popular options for packaging and deploying models. You can also deploy models using cloud providers like AWS SageMaker. Each tool has its unique features, and it's worth exploring them to find the best fit for your specific use case.
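
      For instance, a batched offline run with vLLM looks roughly like the sketch below. The model id and sampling settings are placeholder choices, and a GPU with enough memory for the chosen model is assumed.

        # A minimal sketch of batched inference with vLLM, assuming the vllm
        # package is installed and a suitable GPU is available. The model id
        # and sampling parameters are arbitrary example values.
        from vllm import LLM, SamplingParams

        llm = LLM(model="mistralai/Mistral-7B-v0.1")  # example only
        params = SamplingParams(temperature=0.7, max_tokens=64)

        prompts = [
            "Summarize what a model server does:",
            "Explain trusted execution environments in one sentence:",
        ]

        # vLLM batches and schedules these requests for efficient GPU use.
        for output in llm.generate(prompts, params):
            print(output.outputs[0].text)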

    • Getting started with machine learning models
      Start small, understand the basics, experiment, follow examples, and gradually build up knowledge to make meaningful progress in machine learning.

      Getting started with machine learning models doesn't require a significant investment or complex setup. You can experiment with different models in a notebook to find the best fit for your needs. Once you've identified a model, you can look for resources online to help you implement it, whether it runs on a CPU or more specialized hardware. Development often involves copying and pasting examples from reputable sources, and many developers follow the same path, so don't be intimidated by the process. The landscape of machine learning can be overwhelming, but with the right resources and a willingness to learn, you can make progress; you're not alone in this journey. Implementing machine learning models is an iterative process of understanding the basics, experimenting, and following examples from trusted sources. By starting small and gradually building up your knowledge, you'll be able to make meaningful progress in the field.

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Apple Intelligence & Advanced RAG
    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

    The perplexities of information retrieval
    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data
    We’ve all heard about breaches of privacy and leaks of protected health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they have deployed edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs
    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress
    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents
    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach: from hardware innovations through to the user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs
    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Mamba & Jamba
    First there was Mamba… now there is Jamba from AI21. This is a model that combines the best non-transformer goodness of Mamba with good ol’ attention layers. This results in a highly performant and efficient model that AI21 has open sourced! We hear all about it (along with a variety of other LLM things) from AI21’s co-founder Yoav.

    Related Episodes

    When data leakage turns into a flood of trouble
    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)
    The new stable diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on twitter, reddit, discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things stable diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology
    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)
    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically they dive into T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. Daniel provides a brief tour of what’s possible with the T0 family. They finish up with a couple of new learning resources.