
    [Bite] Why Data Science projects fail

    June 21, 2021

    About this Episode

    Data Science in a commercial setting should be a no-brainer, right? Firstly, data is becoming ubiquitous, with gigabytes being generated and collected every second. And secondly, there are new and more powerful data science tools and algorithms being developed and published every week. Surely just bringing the two together will deliver success...

    In this episode, we explore why so many Data Science projects fail to live up to their initial potential. A recent Gartner report anticipates that 85% of Data Science projects will fail to deliver the value they should due to "bias in data, algorithms or the teams responsible for managing them". But there are many reasons why data science projects stutter aside from the data, the algorithms and the people.
    Based on our experience, we discuss six key technical reasons why Data Science projects typically don't succeed, plus one big non-technical reason!

    And having been 'on the air' for a year now, we'd like to say a big Thank You to all our brilliant guests and listeners - we really could not have done this without you! It's been great getting feedback and comments on episodes. Do get in touch at jeremy@datacafe.uk or jason@datacafe.uk if you would like to tell us about your experiences of successful or unsuccessful data science projects and share your ideas for future episodes.

    Further Reading and Resources



    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 18 June 2021

    Intro music by Music 4 Video Library (Patreon supporter)

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    Recent Episodes from DataCafé

    Science Communication with physicist Laurie Winkless, author of "Sticky" & "Science and the City"


    A key part of the scientific method is communicating insights to an audience, whatever the field of research or problem context. This is where the ultimate value lies: in sharing the cutting-edge results that can improve our understanding of the world and help deliver new innovations into people's lives. Effective science communication sits at the intersection of data, research, and the art of storytelling.

    In this episode of the DataCafé we have the pleasure of welcoming Laurie Winkless, a physicist, author and science communication expert. Laurie has extensive experience in science journalism, having written numerous fascinating articles for Forbes Magazine, Wired, Esquire, and The Economist. She has also authored two science books, "Sticky" and "Science and the City", which we will talk about today.

    Laurie tells us about the amazing insights in her books from her research, interviews and discussions with leading scientists around the world. She gives us an idea of how the scientific method sits at the core of this work. Her efforts involve moving across many complicated data landscapes to uncover and articulate the key insights of the scientists working in these fields. And she does this through the art of storytelling, in a manner that can capture people's imagination whilst educating and surprising them at the same time.


    Interview guest:
    Laurie Winkless, physicist, author, science communicator. Contactable via her website, and on Twitter, Mastodon, and LinkedIn.

    Further information:






     


    A Culture of Innovation


    Culture is a key enabler of innovation in an organisation. Culture underpins the values that are important to people and the motivations for their behaviours. When these values and behaviours align with the goals of innovation, it can lead to high performance across teams that are tasked with the challenge of leading, inspiring and delivering innovation. Many scientists and researchers are faced with these challenges in various scenarios, yet may be unaware of the level of influence that comes from the culture they are part of.

    In this episode we talk about what it means to design and embed a culture of innovation. We outline some of our findings from the literature about the levels of culture that may be invisible or difficult to measure. Assessing culture helps us understand the ways it can empower people to experiment and take risks, and the importance this has for innovation. And where a culture is deemed to be limiting innovation, action can be taken to foster the right culture and steer the organisation towards a better chance of success.

    Further Reading



    Recording date: 12 Aug 2022


    Scaling the Internet


    Do you have multiple devices connected to your internet connection, fighting for your bandwidth? Are you asking your children (or even neighbours!) to get off the network so you can finish an important call? Recent lockdowns caused huge network contention as everyone moved to online meetings and virtual classrooms. This is an optimisation challenge that requires advanced modelling and simulation to tackle. How can a network provider know how much bandwidth to provision to a town or a city to cope with peak demand? That's where agent-based simulations come in - they allow network designers to anticipate, and then plan for, high-demand events, applications and trends.
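    To give a flavour of the idea, here is a minimal agent-based sketch in which each household independently decides, hour by hour, whether it is streaming. All the numbers (demand rates, evening probabilities) are invented for illustration and bear no relation to BT's actual models:

```python
import random

def simulate_peak_demand(n_households, hours=24, trials=200, seed=42):
    """Toy agent-based sketch: each household-agent independently decides,
    hour by hour, whether it is streaming (high demand) or idle, with
    evening hours far busier. Returns the peak aggregate demand in
    Mbit/s observed across all simulated days."""
    rng = random.Random(seed)
    peak = 0.0
    for _ in range(trials):
        for hour in range(hours):
            # Crude diurnal profile: most households are busy in the evening.
            busy_prob = 0.6 if 18 <= hour <= 22 else 0.15
            demand = sum(
                rng.uniform(5, 25)  # streaming draw, Mbit/s
                for _ in range(n_households)
                if rng.random() < busy_prob
            )
            peak = max(peak, demand)
    return peak
```

    A provider could then provision capacity above this simulated peak plus headroom; real models would capture far richer behaviour, such as device mixes and correlated events like the whole nation streaming a football match.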

    In this episode of the DataCafé we hear from Dr. Lucy Gullon, AI and Optimisation Research Specialist at Applied Research, BT. She tells us about the efforts underway to assess the need for bandwidth across different households and locations, and the work her team leads to model, simulate, and optimise the provision of that bandwidth across the UK network. We hear how planning for peak use, where, say, the nation is streaming a football match, is an important consideration. At the same time, reacting to times of low throughput can help to switch off unused circuits and equipment and save a lot of energy.

    Interview Guest: Dr. Lucy Gullon, AI and Optimisation Research Specialist from Applied Research, BT.

    Further reading:


    Recording date: 5 May 2022
    Interview date: 27 Apr 2022


    [Bite] Documenting Data Science Projects


    Do you ever find yourself wondering what data you used in a project? When it was obtained and where it is stored? Or even just how to run the piece of code that produced a previous output and now needs to be revisited?

    Chances are the answer is yes. And it's likely you have been frustrated by not knowing how to reproduce an output, rerun a codebase, or even whom to talk to to obtain a refresh of the data - in some way, shape, or form.

    The problem that many project teams face, and data scientists in particular, is agreeing on, and committing the effort to, documenting their work in a robust and reliable fashion. Documentation is a broad term and can refer to all manner of project details, from the actions captured in a team meeting to the technical guides for executing an algorithm.

    In this bite episode of DataCafé we discuss the challenges around documentation in data science projects (though it applies more broadly). We motivate the need for good documentation through agreement of the responsibilities, expectations, and methods of capturing notes and guides. This can be everything from a summary of the data sources and how to preprocess input data, to project plans and meeting minutes, through to technical details on the dependencies and setup for running the code.
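    As a small illustration of documentation that pays off inside the code itself, here is a sketch of a data-loading function whose docstring records the data source, its owner, and the preprocessing rules applied. Every name, file format, and rule here is a hypothetical placeholder, not something from the episode:

```python
import csv
from datetime import datetime

def load_sales_data(path):
    """Load the monthly sales extract used by the churn model.

    Data source: CRM export (CSV), refreshed on the 1st of each month.
    Owner: the data engineering team (contact them for a data refresh).
    Preprocessing: drop rows with a missing customer id; parse dates.
    """
    rows = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if not row.get("customer_id"):
                continue  # documented rule: skip records with no id
            row["date"] = datetime.strptime(row["date"], "%Y-%m-%d")
            rows.append(row)
    return rows
```

    Six months later, anyone opening this file knows where the data came from, who to ask for a refresh, and why some rows are dropped - exactly the questions that otherwise go unanswered.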


    Landing Data Science Projects: The Art of Change Management & Implementation



    Are people resistant to change? And if so, how do you manage that when trying to introduce and deliver innovation through Data Science?

    In this episode of the DataCafé we discuss the challenges faced when trying to land a data science project. There are a number of potential barriers to success that need to be carefully managed. We talk about "change management" and aspects of employee behaviours and stakeholder management that influence the chances of landing a project. This is especially important for embedding innovation in your company or organisation, and implementing a plan to sustain the changes needed to deliver long-term value.


    Further reading & references

    • Kotter's 8 Step Change Plan 
    • Armenakis, A., Harris, S. & Mossholder, K. (1993) 'Creating Readiness for Organizational Change', Human Relations, 46, pp. 681-704. doi: 10.1177/001872679304600601.
    • Lewin, K. (1944) 'Constructs in Field Theory', in D. Cartwright (ed.) (1952) Field Theory in Social Science: Selected Theoretical Papers by Kurt Lewin. London: Social Science Paperbacks, pp. 30-42.
    • Lewin, K. (1947) 'Frontiers in Group Dynamics: Concept, Method and Reality in Social Science; Social Equilibria and Social Change', Human Relations, 1(1), pp. 5-41. doi: 10.1177/001872674700100103.


    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 10 February 2022


    [Bite] Version Control for Data Scientists


    Data scientists usually have to write code to prototype software, be it to preprocess and clean data, engineer features, build a model, or deploy a codebase into a production environment or other use case. Tracking the evolution of a codebase is important for a number of reasons, and this is where version control can help:

    • collaborating with other code developers (due diligence in coordination and delegation)
    • generating backups
    • recording versions
    • tracking changes
    • experimenting and testing
    • and working with agility.
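    The record a version-control system keeps can be sketched as a chain of snapshots, each pointing at its parent. This is a toy in-memory model for intuition only, not how Git stores data on disk:

```python
import hashlib

class ToyRepo:
    """Toy sketch of what version control records: each commit snapshots
    the files and remembers its parent, so any version can be recovered
    (backups, recorded versions) and changes traced over time."""

    def __init__(self):
        self.commits = []  # list of (commit_id, parent_id, message, snapshot)

    def commit(self, files, message):
        snapshot = dict(files)
        parent = self.commits[-1][0] if self.commits else None
        # Hash the parent, message and content so each version gets a stable id.
        payload = repr((parent, message, sorted(snapshot.items())))
        commit_id = hashlib.sha1(payload.encode()).hexdigest()[:8]
        self.commits.append((commit_id, parent, message, snapshot))
        return commit_id

    def checkout(self, commit_id):
        """Recover the files exactly as they were at a given commit."""
        for cid, _, _, snapshot in self.commits:
            if cid == commit_id:
                return dict(snapshot)
        raise KeyError(commit_id)
```

    Real systems add branching, merging and distributed collaboration on top, but the core promise - every version is recoverable and attributable - is already visible here.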

    In this bite episode of the DataCafé we talk about these motivators for version control and how it can strengthen your code development and teamwork in building a data science model, pipeline or product.

    Further reading:

    Recording date: 21 April 2022


    Deep Learning Neural Networks: Building Trust and Breaking Bias


    We explore one of the key issues around Deep Learning Neural Networks - how can you prove that your neural network will perform correctly? Especially if the neural network in question is at the heart of a mission-critical application, such as making a real-time control decision in an autonomous car. Similarly, how can you establish whether you've trained the neural network at the heart of a loan decision agent with a built-in bias? How can you be sure that your black box is going to adapt to critical new situations?

    We speak with Prof. Alessio Lomuscio about how Mixed Integer Linear Programs (MILPs) and Symbolic Interval Propagation can be used to capture and solve verification problems in large Neural Networks. Prof. Lomuscio leads the Verification of Autonomous Systems Group in the Dept. of Computing at Imperial College; their results have shown that verification is feasible for models with millions of tunable parameters, which was previously not possible. Tools like VENUS and VeriNet, developed in their lab, can verify key operational properties in Deep Learning Networks, which has particular relevance for safety-critical applications in, for example, the aviation industry, medical imaging and autonomous transportation. Importantly, given that neural networks are only as good as the training data they have learned from, it is also possible to prove that a particular defined bias does or does not exist for a given network. This latter case is, of course, important for many social and industrial applications: being able to show that a decisioning tool treats people of all genders, ethnicities and abilities equitably.
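    To give a feel for the interval side of these methods, here is a toy interval bound propagation pass through a small ReLU network. This is plain interval arithmetic, far simpler and looser than the symbolic propagation and MILP encodings in the group's tools, and the network weights are invented for illustration:

```python
def affine_bounds(lo, hi, weights, biases):
    """Propagate the box [lo, hi] through y = Wx + b.

    Each output bound takes the worst-case corner of the input box:
    a positive weight pulls from the matching bound, a negative one flips it.
    """
    out_lo, out_hi = [], []
    for row, b in zip(weights, biases):
        l = h = b
        for w, xl, xh in zip(row, lo, hi):
            if w >= 0:
                l += w * xl
                h += w * xh
            else:
                l += w * xh
                h += w * xl
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

def relu_bounds(lo, hi):
    """ReLU is monotone, so it maps interval bounds elementwise."""
    return [max(0.0, l) for l in lo], [max(0.0, h) for h in hi]

# Toy network: two inputs in [0, 1], one hidden ReLU layer, one output.
lo, hi = affine_bounds([0.0, 0.0], [1.0, 1.0],
                       [[1.0, -1.0], [2.0, 1.0]], [0.0, 0.0])
lo, hi = relu_bounds(lo, hi)
out_lo, out_hi = affine_bounds(lo, hi, [[1.0, 1.0]], [-0.5])
# out_lo == [-0.5], out_hi == [3.5]: the output provably stays in that
# range for every input in the box, with no sampling involved.
```

    Verifying a safety property then amounts to checking that the certified output range never crosses an unsafe threshold; the research tools tighten these bounds dramatically to make that check feasible at scale.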

    Interview Guest

    Our interview guest Alessio Lomuscio is Professor of Safe Artificial Intelligence in the Department of Computing at Imperial College London. Anyone wishing to contact Alessio about his team's verification technology can do so via his Imperial College website, or via the Imperial College London spin-off Safe Intelligence that will be commercialising the AI verification technology in the future.

    Further Reading



    [Bite] Wordle: Winning against the algorithm


    The grey, yellow and green squares taking over social media in the last few weeks are an example of the fascinating field of study known as Game Theory. In this bite episode of DataCafé we talk casually about Wordle - the internet phenomenon currently challenging players to guess a new five-letter word each day.

    Players get six guesses, each of which reveals which letters they have got right and whether those letters are in the right place. It's a lovely example of the different ways people approach game strategy through their choice of guesses and the ways they use the information presented within the game.
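    The feedback mechanic itself is easy to sketch. Here is a hedged Python version - the 'G'/'Y'/'-' codes are our own notation, and this is of course not Wordle's actual source code:

```python
def score_guess(guess, answer):
    """Return per-letter feedback: 'G' green, 'Y' yellow, '-' grey.

    Handles repeated letters the way Wordle does: exact (green) matches
    are awarded first, then yellows consume the remaining letter counts.
    """
    feedback = ["-"] * 5
    remaining = {}  # unmatched answer letters and their counts

    # First pass: mark exact (green) matches.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "G"
        else:
            remaining[a] = remaining.get(a, 0) + 1

    # Second pass: mark misplaced (yellow) letters, consuming counts.
    for i, g in enumerate(guess):
        if feedback[i] == "-" and remaining.get(g, 0) > 0:
            feedback[i] = "Y"
            remaining[g] -= 1

    return "".join(feedback)

# score_guess("speed", "abide") -> "--Y-Y": only one 'e' turns yellow,
# because the answer contains a single 'e'.
```

    A strategy then boils down to choosing the next guess that best narrows the set of words consistent with all feedback so far - which is where the game theory comes in.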

    Wordles


    Analysis



    Recording date: 15 Feb 2022



    DataCafé
    March 14, 2022
