    Podcast Summary

    • From data science to business value in small organizations
      Small organizations often require team members to handle various tasks, making ML projects unique. Start by converting processes from Excel to Python to demonstrate data science's power and potential in small businesses.

      The role of a data scientist in organizations, especially smaller ones, is not just about building models but converting data into business value using data science techniques. Kirsten Lumb, the co-founder and CPO of Storytellers AI, shared her experience of starting in the field without a data science degree but through analytics in startups. She emphasized that small organizations often require team members to handle various tasks and figure out what needs to be done, making ML at small organizations unique. Kirsten's first data science project involved converting a marketing process from Excel to Python, significantly reducing the time spent and demonstrating the power of data science. Although her initial project was at a large company, the lack of access to data scientists made her realize the potential of data science in smaller organizations. Overall, the conversation highlighted the importance of data science in small organizations and the unique challenges and opportunities it presents.
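
      The Excel-to-Python conversion mentioned above could look something like the following. This is only a minimal sketch: the episode summary does not describe the actual marketing process, so the column names (channel, spend, conversions), the sample rows, and the cost_per_conversion helper are all hypothetical. It replaces a manual pivot table with a small repeatable script that uses only the standard library.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export of a marketing spreadsheet.
# The columns and values are assumptions made for illustration.
CSV_EXPORT = """channel,spend,conversions
email,100.0,20
email,50.0,10
search,300.0,30
social,200.0,0
"""


def cost_per_conversion(csv_text):
    """Sum spend and conversions per channel, much like a pivot table."""
    spend = defaultdict(float)
    conversions = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        spend[row["channel"]] += float(row["spend"])
        conversions[row["channel"]] += int(row["conversions"])
    # None marks channels with no conversions, avoiding division by zero.
    return {
        channel: (spend[channel] / conversions[channel]
                  if conversions[channel] else None)
        for channel in spend
    }


report = cost_per_conversion(CSV_EXPORT)
for channel, cost in report.items():
    print(channel, cost)
```

      Once a process like this lives in a script, rerunning it on next month's export is a single command rather than a round of manual spreadsheet edits.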

    • Small organizations can make a big impact with data science
      Small businesses can leverage data science and machine learning for growth despite perceived challenges like hiring, data readiness, and system integration.

      Even a small organization without extensive resources can make a significant impact by embracing data science and machine learning. This was demonstrated by a marketing analyst who, with her knowledge of Python and data analysis skills, was able to streamline processes and bring about growth in her business. However, many small organizations may feel daunted by the perceived challenges, such as hiring the right person, ensuring data readiness, and integrating data science into existing systems. These concerns can often be overestimated, and the data science community could do more to provide accessible resources and guidance for small businesses looking to make the transition. Ultimately, the potential benefits of data science, including increased efficiency and growth opportunities, outweigh the perceived obstacles.

    • The Role of a Data Scientist in a Small Organization
      Data scientists in small organizations pull data together, build models, and explain how to integrate them into the business, a crucial role as data and technologies vary.

      While low-code and no-code tools are making it easier for non-technical team members to perform certain tasks, there will always be a need for data scientists and analysts to reconcile and make sense of the data from various sources. The role of a data scientist in a small organization is to pull data together, build models, and explain how to integrate them into the business. This role is essential as data and technologies used by organizations vary, making it necessary to have someone who can reconcile and interpret the data. The use of tools like Jasper for content generation and libraries for specific tasks can augment the work of team members, but the role of a data scientist remains crucial in making sense of the data and guiding the organization. This role is not only necessary but also the most enjoyable part of data science for many, as it involves problem-solving and adapting to unique scenarios. The idea that BI tools would make the role of a Business Intelligence analyst obsolete did not hold true, and similarly, the role of a data scientist will continue to be essential in the future.

    • Broad understanding of data science workflow essential for small companies
      Data scientists at small companies should handle aspects of the data science process beyond model training, including data infrastructure, ETL, model deployment, and simple pipelines.

      For data scientists or machine learning professionals at small companies, it's essential to have a broad understanding of the entire data science workflow rather than being an expert in just one area. Instead of focusing solely on model training, they need to be able to handle various aspects of the process, including data infrastructure, ETL, model deployment, and simple pipelines. This doesn't mean they have to be experts in MLOps or come up with new ways of doing it, but they should have a good grasp of the basics to deploy their models and manage simple pipelines. The discussion also touched upon the democratization of technology and how a 9-year-old building a drone from Lego pieces shows that complex tasks can be simplified, making it possible for more people to innovate and contribute in various fields, including open-source development.

    • Developing patterns for success in application building and management
      Develop strong project management skills, focus on tabular data and gradient boosted trees, and have a clear baselining process to ensure progress and impact in application development.

      Building and managing applications, especially in a small business environment, requires a strong foundation in project management and a focus on simple, effective solutions. When bringing someone new into this field, it's important to provide them with clear patterns or recipes for success. One unintuitive but crucial pattern is to develop strong project management skills, including the ability to shepherd a project from the very beginning, even when the data isn't yet in a database, through to completion. Another important recipe is to focus on tabular data and use gradient boosted trees as a baseline model. Lastly, having a clear baselining process is essential to knowing when a model is good enough to move on to the next project, as multiple good-enough models can have a greater impact on a business than one perfect model. These patterns and recipes can help set someone up for success in the complex and ever-evolving world of applications.
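
      One way to make the "good enough" decision concrete is to score a naive baseline and require a minimum lift over it before moving on. The sketch below is an assumption about how such a rule could be written, not the process described in the episode: the labels, the stand-in model predictions, the 0.05 lift threshold, and the helper names are all hypothetical. In practice the predictions would come from a trained model such as a gradient boosted tree.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """Predict the most frequent training label for every one of n rows."""
    most_common = max(set(y_train), key=y_train.count)
    return [most_common] * n


def good_enough(baseline_score, model_score, min_lift=0.05):
    """True when the model beats the baseline by at least min_lift."""
    return model_score - baseline_score >= min_lift


# Hypothetical labels; model_preds stands in for a trained model's output.
y_train = [0, 0, 0, 1, 1]
y_test = [0, 0, 1, 1, 0, 1, 0, 0]
model_preds = [0, 0, 1, 1, 0, 0, 0, 0]

baseline_score = accuracy(y_test, majority_baseline(y_train, len(y_test)))
model_score = accuracy(y_test, model_preds)
ship_it = good_enough(baseline_score, model_score)
print(baseline_score, model_score, ship_it)
```

      Agreeing on the baseline and the threshold before modeling starts gives the team a clear stopping point, which supports the idea that several good-enough models can matter more than one perfect one.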

    • Communicating Value in Small Businesses
      In small businesses, data scientists must focus on delivering measurable results, communicate clearly, prioritize effectively, and collaborate with other teams to navigate unique challenges and ensure the value of data science is understood.

      In small businesses, where strategies and priorities can change rapidly, data scientists must focus on delivering measurable results and maintaining clear communication with their teams and stakeholders. This can help mitigate the instability and ensure that everyone understands the value data science brings to the company. Additionally, having a well-defined prioritization framework and being flexible with changing roadmaps can help data scientists navigate the unique challenges of a small business environment. Furthermore, effective collaboration with other teams, such as software development or infrastructure, is crucial for successful implementation of data science projects. Ultimately, it's important to remember that people are the key to getting things done in any organization, and strong relationships and communication are essential for success in data science.

    • Building trust as a data scientist goes beyond technical skills
      Understand the organization's architecture, meet key people, and align work with their goals for stronger collaboration and successful projects. Use tools like Trello or Google Sheets for effective project management.

      Earning trust within an organization as a data scientist goes beyond just technical skills. It's crucial to understand the organization's architecture, meet with key people, and identify how your work can help them achieve their goals. This can lead to stronger collaboration and more successful projects. When it comes to project management for data science, there's no one-size-fits-all solution. Some may prefer tools like Trello or Jira, while others might find success with simpler methods like Google Sheets. The key is to find a system that works for your specific needs and makes the project management process clear and accessible to everyone involved. For those starting out, Trello is a great option as it's shareable and offers templates for data science projects. Google Sheets is also a versatile tool that can be especially helpful for smaller teams or organizations without a well-established project management system. By experimenting with different tools and finding what works best for your team, you can streamline your project management workflow and build trust and collaboration within your organization.

    • Effective communication and strong relationships in data science projects
      Clear communication and collaboration with stakeholders through simple project management frameworks and agile methodologies foster trust and ensure everyone is informed. Data science benefits all functions, so involving non-technical team members can lead to growth.

      Effective communication and strong relationships are crucial for the success of data science projects in small organizations. While focusing on creating accurate models and pipelines is important, it's equally essential to prioritize downstream processes and relationships with stakeholders. Regular communication through simple project management frameworks and agile methodologies can help build trust and keep everyone informed. Additionally, it's important to remember that data science can benefit all functions within an organization, and bringing non-technical team members on board with a data-centric mindset can lead to significant growth. By being patient, clear, and persistent in communicating the benefits of data science, you can help create a culture that values and integrates data-driven decision making.

    • Demonstrating the Value of Data Science in Small Organizations
      Data scientists in small organizations must build trust and educate colleagues through effective A/B testing, while also keeping the machine learning tech stack simple for easier deployment.

      As a data scientist in a small organization, it's not just about delivering accurate results, but also about educating your colleagues about the benefits and impact of data science. This requires a strong A/B testing framework to demonstrate the product's value. Building trust within a small organization is crucial, and delivering results is how that trust is earned and sustained. Small organizations doing machine learning have advantages over larger ones, as they often deal with simpler parts of the machine learning tech stack, such as batch tabular inference, which can be easier to learn and deploy. However, the responsibility of a data scientist in a small organization goes beyond just their own work; they also represent the discipline within the company and must convince others of its value.
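
      At the core of an A/B testing framework is a statistical comparison between the control and treatment groups. The sketch below uses a two-proportion z-test, which is one common choice and not necessarily the method discussed in the episode. The function name and the conversion counts are hypothetical, and the test assumes independent users and samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Assumes the pooled rate is strictly between 0 and 1, so the
    standard error is non-zero.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 100/1000 conversions in control, 130/1000 in treatment.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

      Reporting the observed lift alongside the p-value, rather than a bare "the model won", tends to be easier for non-technical colleagues to act on.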

    • Impact of Company Size on Data Scientist Role
      Smaller companies offer a broader perspective, but lack resources. Mid to larger-sized firms provide opportunities to learn from pros and acquire essential skills. Ultimately, choose based on personal goals and available opportunities.

      The size of a company can significantly impact a data scientist's role and opportunities for growth. At smaller companies, data scientists may have the advantage of a broader perspective, as they might get to engage in various aspects of machine learning, including data engineering and MLOps. However, they may lack the resources and established processes found in larger organizations. For those just starting their careers in data science, it's generally recommended to join a mid to larger-sized company to learn from experienced professionals and acquire essential skills. Startups led by data science experts can also be excellent opportunities for mentorship. Ultimately, the choice between small and large companies depends on individual goals, career stage, and the specific opportunities available. Unfortunately, there's no comprehensive resource that covers end-to-end data science workflows, making practical experience a crucial aspect of mastering the field.

    • Exploring the potential of data science in small organizations
      Understanding the entire workflow and observing processes in larger organizations can help implement effective data science techniques in smaller businesses. Exciting potential for user-friendly MLOps tools and measuring excellence by impact in small orgs.

      To excel in data science, particularly in smaller companies, it's crucial to understand the entire workflow from data preparation to storytelling. Observing the processes of colleagues upstream and downstream in larger organizations can provide valuable insights for implementing effective data science techniques in smaller businesses. Kirsten, a data science leader, expressed her excitement about the potential of data science in small organizations, particularly in the development of user-friendly MLOps tools and the shift in measuring excellence by impact rather than just state-of-the-art performance. The future of data science lies in its ability to make a tangible difference in various industries, from education to universities, and the community's focus on creating practical solutions for small businesses.

    Recent Episodes from Practical AI: Machine Learning, Data Science

    Stanford's AI Index Report 2024
    We’ve had representatives from Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) on the show in the past, but we were super excited to talk through their 2024 AI Index Report after such a crazy year in AI! Nestor from HAI joins us in this episode to talk about some of the main takeaways, including how AI makes workers more productive, how the US is sharply increasing regulations, and how industry continues to dominate frontier AI research.

    Apple Intelligence & Advanced RAG
    Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

    The perplexities of information retrieval
    Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

    Using edge models to find sensitive data
    We’ve all heard about breaches of privacy and leaks of private health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they have deployed edge AI models to help companies search through billions of records for PHI.

    Rise of the AI PC & local LLMs
    We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

    AI in the U.S. Congress
    At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

    Full-stack approach for effective AI agents
    There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach, from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

    Private, open source chat UIs
    We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we hear about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).

    Related Episodes

    When data leakage turns into a flood of trouble
    Rajiv Shah teaches Daniel and Chris about data leakage, and its major impact upon machine learning models. It’s the kind of topic that we don’t often think about, but which can ruin our results. Raj discusses how to use activation maps and image embedding to find leakage, so that leaking information in our test set does not find its way into our training set.

    Stable Diffusion (Practical AI #193)
    The new Stable Diffusion model is everywhere! Of course you can use this model to quickly and easily create amazing, dream-like images to post on Twitter, Reddit, Discord, etc., but this technology is also poised to be used in very pragmatic ways across industry. In this episode, Chris and Daniel take a deep dive into all things Stable Diffusion. They discuss the motivations for the work, the model architecture, and the differences between this model and other related releases (e.g., DALL·E 2). (Image from stability.ai)

    AlphaFold is revolutionizing biology
    AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment, and is accelerating research in nearly every field of biology. Daniel and Chris delve into protein folding, and explore the implications of this revolutionary and hugely impactful application of AI.

    Zero-shot multitask learning (Practical AI #158)
    In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then it's on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically, they dive into T0, a series of natural language processing (NLP) AI models trained for researching zero-shot multitask learning. Daniel provides a brief tour of what is possible with the T0 family. They finish up with a couple of new learning resources.