
    DataCafé

    Welcome to the DataCafé: a special-interest Data Science podcast with Dr Jason Byrne and Dr Jeremy Bradley, interviewing leading data science researchers and domain experts in all things business, stats, maths, science and tech.
    26 Episodes


    Episodes (26)

    Science Communication with physicist Laurie Winkless, author of "Sticky" & "Science and the City"


    A key part of the scientific method is communicating the insights to an audience, for any field of research or problem context. This is where the ultimate value comes from: by sharing the cutting-edge results that can improve our understanding of the world and help deliver new innovations in people's lives. Effective science communication sits at the intersection of data, research, and the art of storytelling.

    In this episode of the DataCafé we have the pleasure of welcoming Laurie Winkless, a physicist, author and science communications expert. Laurie has extensive experience in science journalism, having written numerous fascinating articles for Forbes Magazine, Wired, Esquire, and The Economist. She has also authored two science books, which we talk about today: "Sticky" and "Science and the City".

    Laurie tells us about the amazing insights in her books from her research, interviews and discussions with leading scientists around the world. She gives us an idea of how the scientific method sits at the core of this work. Her efforts involve moving across many complicated data landscapes to uncover and articulate the key insights of the scientists working in these fields. And she does this through the art of storytelling, in a manner that can capture people's imagination whilst educating and surprising them at the same time.


    Interview guest:
    Laurie Winkless, physicist, author, science communicator. Contactable via her website, and on Twitter, Mastodon, and LinkedIn.

    Further information:

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    A Culture of Innovation


    Culture is a key enabler of innovation in an organisation. Culture underpins the values that are important to people and the motivations for their behaviours. When these values and behaviours align with the goals of innovation, it can lead to high performance across teams that are tasked with the challenge of leading, inspiring and delivering innovation. Many scientists and researchers are faced with these challenges in various scenarios, yet may be unaware of the level of influence that comes from the culture they are part of.

    In this episode we talk about what it means to design and embed a culture of innovation. We outline some of our findings from the literature about the levels of culture that may be invisible or difficult to measure. Assessing culture helps us understand the ways it can empower people to experiment and take risks, and the importance this has for innovation. And where a culture is deemed to be limiting innovation, action can be taken to motivate the right culture and steer the organisation towards a better chance of success.

    Further Reading


    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 12 Aug 2022

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    Scaling the Internet


    Do you have multiple devices connected to your internet fighting for your bandwidth? Are you asking your children (or even neighbours!) to get off the network so you can finish an important call? Recent lockdowns caused huge network contention as everyone moved to online meetings and virtual classrooms. This is an optimisation challenge that requires advanced modelling and simulation to tackle. How can a network provider know how much bandwidth to provision to a town or a city to cope with peak demands? That's where agent-based simulations come in - to allow network designers to anticipate and then plan for high-demand events, applications and trends.

    In this episode of the DataCafé we hear from Dr. Lucy Gullon, AI and Optimisation Research Specialist at Applied Research, BT. She tells us about the efforts underway to assess the need for bandwidth across different households and locations, and the work her team leads to model, simulate, and optimise the provision of that bandwidth across the UK's network. We hear how planning for peak use, when, say, the nation is streaming a football match, is an important consideration. At the same time, reacting to times of low throughput by switching off unused circuits and equipment can save a lot of energy.
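
    To give a flavour of the approach, here is a minimal agent-based sketch in Python. It is our own illustration, not BT's models, and every number in it is invented: each household is an agent that may be active at peak time, and the planner estimates how often total demand exceeds the provisioned capacity.

    import random

    # Minimal agent-based sketch (illustrative only; all figures invented).
    # Each household is an agent that is "active" (streaming, video call)
    # with some probability at peak time; the planner estimates how often
    # total demand exceeds the provisioned capacity.

    class Household:
        def __init__(self, peak_demand_mbps):
            self.peak_demand_mbps = peak_demand_mbps

        def demand(self, p_active):
            return self.peak_demand_mbps if random.random() < p_active else 0.0

    def contention_probability(n_households=10_000, capacity_mbps=150_000,
                               p_active=0.35, trials=200):
        households = [Household(random.uniform(5, 50)) for _ in range(n_households)]
        overloads = 0
        for _ in range(trials):
            total = sum(h.demand(p_active) for h in households)
            if total > capacity_mbps:
                overloads += 1
        return overloads / trials

    # "Big match" scenario: far more households online at once.
    print("Normal evening:", contention_probability(p_active=0.35))
    print("Big match:     ", contention_probability(p_active=0.70))

    Even a toy model like this lets a planner ask the key question: what capacity keeps the probability of contention acceptably low during a high-demand event?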

    Interview Guest: Dr. Lucy Gullon, AI and Optimisation Research Specialist from Applied Research, BT.

    Further reading:

    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 5 May 2022
    Interview date: 27 Apr 2022

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    [Bite] Documenting Data Science Projects


    Do you ever find yourself wondering what data you used in a project? When it was obtained and where it is stored? Or even just how to run a piece of code that produced a previous output and needs to be revisited?

    Chances are the answer is yes. And it’s likely you have been frustrated, in some way, shape, or form, by not knowing how to reproduce an output, rerun a codebase, or even who to talk to to obtain a refresh of the data.

    The problem that a lot of project teams face, and data scientists in particular, is agreeing on and committing the effort to document their work in a robust and reliable fashion. Documentation is a broad term and can refer to all manner of project details, from the actions captured in a team meeting to the technical guides for executing an algorithm.

    In this bite episode of DataCafé we discuss the challenges around documentation in data science projects (though it applies more broadly). We motivate the need for good documentation through agreement of the responsibilities, expectations, and methods of capturing notes and guides. This can be everything from a summary of the data sources and how to preprocess input data, to project plans and meeting minutes, through to technical details on the dependencies and setups for running codes. 

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    Landing Data Science Projects: The Art of Change Management & Implementation



    Are people resistant to change? And if so, how do you manage that when trying to introduce and deliver innovation through Data Science?

    In this episode of the DataCafé we discuss the challenges faced when trying to land a data science project. There are a number of potential barriers to success that need to be carefully managed. We talk about "change management" and aspects of employee behaviours and stakeholder management that influence the chances of landing a project. This is especially important for embedding innovation in your company or organisation, and implementing a plan to sustain the changes needed to deliver long-term value.


    Further reading & references

    • Kotter's 8 Step Change Plan 
    • Armenakis, A., Harris, S. & Mossholder, K. (1993). 'Creating Readiness for Organizational Change', Human Relations, 46, pp. 681-704. doi: 10.1177/001872679304600601.
    • Lewin, K. (1944). 'Constructs in Field Theory'. In D. Cartwright (Ed.) (1952), Field Theory in Social Science: Selected Theoretical Papers by Kurt Lewin. London: Social Science Paperbacks, pp. 30-42.
    • Lewin, K. (1947). 'Frontiers in Group Dynamics: Concept, Method and Reality in Social Science; Social Equilibria and Social Change', Human Relations, 1(1), pp. 5-41. doi: 10.1177/001872674700100103.


    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 10 February 2022

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    [Bite] Version Control for Data Scientists


    Data scientists usually have to write code to prototype software, be it to preprocess and clean data, engineer features, build a model, or deploy a codebase into a production environment or other use case. The evolution of a codebase is important for a number of reasons, which is where version control can help, such as:

    • collaborating with other code developers (due diligence in coordination and delegation)
    • generating backups
    • recording versions
    • tracking changes
    • experimenting and testing
    • and working with agility.

    In this bite episode of the DataCafé we talk about these motivators for version control and how it can strengthen your code development and teamwork in building a data science model, pipeline or product.

    Further reading:

    Recording date: 21 April 2022

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    Deep Learning Neural Networks: Building Trust and Breaking Bias


    We explore one of the key issues around Deep Learning Neural Networks - how can you prove that your neural network will perform correctly? Especially if the neural network in question is at the heart of a mission-critical application, such as making a real-time control decision in an autonomous car. Similarly, how can you establish whether you've trained the neural network at the heart of a loan decision agent with a built-in bias? How can you be sure that your black box is going to adapt to critical new situations?

    We speak with Prof. Alessio Lomuscio about how Mixed Integer Linear Programs (MILPs) and Symbolic Interval Propagation can be used to capture and solve verification problems in large Neural Networks. Prof. Lomuscio leads the Verification of Autonomous Systems Group in the Dept. of Computing at Imperial College; their results have shown that verification is feasible for models with millions of tunable parameters, which was previously not possible. Tools like VENUS and VeriNet, developed in their lab, can verify key operational properties in Deep Learning Networks, and this has particular relevance for safety-critical applications in, e.g., the aviation industry, medical imaging and autonomous transportation. Particularly important, given that neural networks are only as good as the training data they have learned from, is that it is also possible to prove that a particular defined bias does or does not exist for a given network. This latter case is, of course, important for many social or industrial applications: being able to show that a decisioning tool treats people of all genders, ethnicities and abilities equitably.
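
    To give a feel for how verification can be phrased as optimisation, here is a toy Python sketch using the PuLP modelling library. It is our own illustration, not the VENUS or VeriNet implementation, and the network weights and property are invented: a single ReLU neuron is encoded exactly as a mixed-integer program via the standard big-M trick, and the solver computes the worst-case output over the whole input region.

    from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, value, PULP_CBC_CMD

    # Toy network (invented): y = relu(2*x - 1) for input x in [0, 1].
    # Property to check: y <= 0.9 for every admissible input.
    l, u = -1.0, 1.0                      # bounds on pre-activation z = 2x - 1

    prob = LpProblem("relu_verification", LpMaximize)
    x = LpVariable("x", 0, 1)             # input region
    z = LpVariable("z", l, u)             # pre-activation
    y = LpVariable("y", 0, max(u, 0.0))   # post-activation
    d = LpVariable("d", cat=LpBinary)     # ReLU phase indicator

    prob += z == 2 * x - 1                # affine layer
    prob += y >= z                        # exact big-M encoding of y = max(0, z)
    prob += y <= z - l * (1 - d)
    prob += y <= u * d

    prob += y                             # objective: maximise (worst-case) output
    prob.solve(PULP_CBC_CMD(msg=False))

    worst = value(y)
    print(f"worst-case output = {worst:.3f}")
    print("property y <= 0.9:", "VERIFIED" if worst <= 0.9 else "FALSIFIED")

    Because the encoding is exact, a "VERIFIED" answer is a proof over all inputs in the region, not a statistical claim - which is what distinguishes verification from testing.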

    Interview Guest

    Our interview guest Alessio Lomuscio is Professor of Safe Artificial Intelligence in the Department of Computing at Imperial College London. Anyone wishing to contact Alessio about his team's verification technology can do so via his Imperial College website, or via the Imperial College London spin-off Safe Intelligence that will be commercialising the AI verification technology in the future.

    Further Reading

    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    [Bite] Wordle: Winning against the algorithm


    The grey, green and yellow squares taking over social media in the last few weeks are an example of the fascinating field of study known as Game Theory. In this bite episode of DataCafé we talk casually about Wordle - the internet phenomenon currently challenging players to guess a new five-letter word each day.

    Players have six guesses, each of which reveals which letters they have got right and whether they are in the right place. It’s a lovely example of the different ways people approach game strategy through their choice of guesses and ways to use the information presented within the game.
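
    As a flavour of one such strategy, here is a short Python sketch (our own, with a tiny made-up word list) that scores each candidate guess by the expected information, i.e. the entropy, of the grey/yellow/green feedback pattern it would produce:

    from collections import Counter
    from math import log2

    def feedback(guess, answer):
        # Returns a pattern string: 'g' green, 'y' yellow, '.' grey.
        pattern = ['.'] * 5
        remaining = list(answer)
        for i, (g, a) in enumerate(zip(guess, answer)):
            if g == a:
                pattern[i] = 'g'
                remaining.remove(g)
        for i, g in enumerate(guess):
            if pattern[i] == '.' and g in remaining:
                pattern[i] = 'y'
                remaining.remove(g)
        return ''.join(pattern)

    def entropy_of_guess(guess, candidates):
        # A guess is more informative when it splits the candidates
        # into many small feedback groups.
        counts = Counter(feedback(guess, answer) for answer in candidates)
        n = len(candidates)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    candidates = ["crane", "slate", "pride", "grade", "plate", "brine"]
    for guess in sorted(candidates, key=lambda g: -entropy_of_guess(g, candidates)):
        print(guess, round(entropy_of_guess(guess, candidates), 3))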

    Wordles


    Analysis


    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 15 Feb 2022

    Intro music by Music 4 Video Library (Patreon supporter)

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.


    [Bite] Why Data Science projects fail


    Data Science in a commercial setting should be a no-brainer, right? Firstly, data is becoming ubiquitous, with gigabytes being generated and collected every second. And secondly, there are new and more powerful data science tools and algorithms being developed and published every week. Surely just bringing the two together will deliver success...

    In this episode, we explore why so many Data Science projects fail to live up to their initial potential. In a recent Gartner report, it is anticipated that 85% of Data Science projects will fail to deliver the value they should due to "bias in data, algorithms or the teams responsible for managing them". There are many reasons why data science projects stutter even aside from the data, the algorithms and the people.
    Based on our experience, we discuss six key technical reasons why Data Science projects typically don't succeed, plus one big non-technical reason!

    And being 'on the air' for a year now, we'd like to give a big Thank You to all our brilliant guests and listeners - we really could not have done this without you! It's been great getting feedback and comments on episodes. Do get in touch at jeremy@datacafe.uk or jason@datacafe.uk if you would like to tell us your experiences of successful or unsuccessful data science projects and share your ideas for future episodes.

    Further Reading and Resources



    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 18 June 2021

    Intro music by Music 4 Video Library (Patreon supporter)

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    Data Science for Good


    What's the difference between a commercial data science project and a Data Science project for social benefit? Often so-called Data Science for Good projects involve throwing together many people from different backgrounds under a common motivation to have a positive effect.

    We talk to a Data Science team that was formed to tackle the unemployment crisis emerging from the pandemic and to help people find excellent jobs in industries for which they have a good skills match.

    We interview Erika Gravina, Rajwinder Bhatoe and Dehaja Senanayake about their story helping to create the Job Finder Machine with the Emergent Alliance, DataSparQ, Reed and Google.

    Further Information

    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Interview date: 25 March 2021
    Recording date: 13 May 2021

    Intro audio by Music 4 Video Library (Patreon supporter)

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    [Bite] Data Science and the Scientific Method


    The scientific method consists of systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses. But what does this mean in the context of Data Science, where a wealth of unstructured data and a variety of computational models can be used to deduce an insight and inform a stakeholder's decision?

    In this bite episode we discuss the importance of the scientific method for data scientists. Data science is, after all, the application of scientific techniques and processes to large data sets to obtain impact in a given application area.  So we ask how the scientific method can be harnessed efficiently and effectively when there is so much uncertainty in the design and interpretation of an experiment or model.

    Further Reading and Resources

    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 30 April 2021

    Intro music by Music 4 Video Library (Patreon supporter)

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    Data Science on Mars


    On 30 July 2020 NASA launched the Mars 2020 mission from Earth carrying a rover called Perseverance and a rotorcraft called Ingenuity to land on and study Mars. The mission so far has been a resounding success, touching down in Jezero Crater on 18 February 2021 and sending back data and imagery of the Martian landscape since then.

    The aim of the mission is to advance NASA's scientific goals of establishing whether there was ever life on Mars, what its climate and geology are, and to pave the way for human exploration of the red planet in the future. Ingenuity will also demonstrate the first powered flight on another world, in the low-density atmosphere of Mars, which is approximately 1% the density of Earth's.

    The efforts involved are an impressive demonstration of the advances and expertise of the science, engineering, and project teams. Data from the mission will drive new scientific insights as well as prove the technical abilities demonstrated throughout. Of particular interest is the Terrain Relative Navigation (TRN) system, which enables autonomous landing on planetary bodies like Mars that are so far away that ground control on Earth cannot be in the loop.

    We talk with Prof. Paul Byrne, a planetary geologist from North Carolina State University, about the advances in planetary science and what the Mars 2020 mission means for him, his field of research, and for humankind.

    Further Reading and Resources

    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Interview date: 25 March 2021
    Recording date:

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    [Bite] How to hire a great Data Scientist


    Welcome to the first DataCafé Bite: a bite-size episode where Jason and Jeremy drop in for a quick chat about a relevant or newsworthy topic from the world of Data Science. In this episode, we discuss how to hire a great Data Scientist, which is a challenge faced by many companies and is not easy to get right.

    From endless coding tests and weird logic puzzles to personality quizzes and competency-based interviews, there are many examples of how companies try to assess how a candidate handles and reacts to data problems. We share our thoughts and experiences on ways to set yourself up for success in hiring the best person for your team or company.

    Have you been asked to complete a week-long data science mini-project for a company, or taken part in a data hackathon? We'd love to hear your experiences of good and bad hiring practice around Data Science. You can email us at jason@datacafe.uk or jeremy@datacafe.uk with your experiences. We'll be sure to revisit this topic as it's such a rich and changing landscape.

    Further Reading

    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 1 April 2021

    Intro music by Music 4 Video Library (Patreon supporter)

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    Bayesian Inference: The Foundation of Data Science


    In this episode we talk about all things Bayesian. What is Bayesian inference and why is it the cornerstone of Data Science?

    Bayesian statistics embodies the Data Scientist and their role in the data modelling process. A Data Scientist starts with an idea of how to capture a particular phenomenon in a mathematical model - maybe derived from talking to experts in the company. This represents the prior belief about the model. Then the model consumes data around the problem - historical data, real-time data, it doesn't matter. This data is used to update the model, and the result is called the posterior.

    Why is this Data Science? Because models that react to data and refine their representation of the world in response to the data they see are what the Data Scientist is all about.
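
    As a minimal worked example of this prior-to-posterior loop, here is the conjugate Beta-Binomial model in Python. The scenario and numbers are invented, and it is our own sketch rather than anything from the interview: a prior belief about a success rate, perhaps elicited from company experts, is updated with observed data, and conjugacy gives the posterior in closed form.

    from scipy import stats

    # Prior belief: the rate is around 30%, encoded as Beta(3, 7).
    alpha, beta = 3, 7

    # The model then consumes data: 120 trials, 48 successes.
    successes, failures = 48, 72

    # Conjugate update: the posterior is again a Beta distribution.
    posterior = stats.beta(alpha + successes, beta + failures)

    print(f"prior mean     = {alpha / (alpha + beta):.3f}")
    print(f"posterior mean = {posterior.mean():.3f}")
    lo, hi = posterior.interval(0.95)
    print(f"95% credible interval = ({lo:.3f}, {hi:.3f})")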

    We talk with Dr Joseph Walmswell, Principal Data Scientist at life sciences company Abcam, about his experience with Bayesian modelling.

    Further Reading

    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 16 March 2021
    Interview date: 26 February 2021




    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    Apple Tasting: Reinforcement learning for quality control


    Have you ever come home from the supermarket to discover one of the apples you bought is rotten? It's likely your trust in that grocer was diminished, or you might stop buying that particular brand of apples altogether.

    In this episode, we discuss how the quality controls in a production line need to use smart sampling methods in order to avoid sending bad products to the customer, which could ruin the reputation of both the brand and seller.

    To do this we describe a thought experiment called Apple Tasting. This allows us to demonstrate the concepts of regret and reward in a sampling process, giving rise to the use of Contextual Bandit Algorithms.  Contextual Bandits come from the field of Reinforcement Learning which is a form of Machine Learning where an agent performs an action and tries to maximise the cumulative reward from its environment over time. Standard bandit algorithms  simply choose between a number of actions and measure the reward in order to determine the average reward of each action. But a Contextual Bandit also uses information from its environment to inform both the likely reward and regret of subsequent actions. This is particularly useful in personalised product recommendation engines where the bandit algorithm is given some contextual information about the user.

    Back to Apple Tasting and product quality control. The contextual bandit in this scenario consumes a signal from a benign test that is indicative, but not conclusive, of a fault, and then decides whether or not to perform a more in-depth test. So the answer for when you should discard or test your product depends on the relative costs of making the right decision (reward) or wrong decision (regret), and how your experience of the environment has affected these in the past.
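
    To make the idea concrete, here is a toy epsilon-greedy contextual bandit in Python. It is our own sketch, not code from the interview, and all costs and probabilities are invented: the context is a noisy benign-test score, the actions are to ship the product or run a costly in-depth test, and the learned policy settles on a threshold set by the relative costs of reward and regret.

    import random

    random.seed(42)
    ACTIONS = ("ship", "test")
    COST_TEST, COST_BAD_SHIP = -1.0, -10.0      # invented relative costs

    def true_fault_prob(score):
        # Hidden from the agent: a higher benign-test score means a
        # likelier fault.
        return 0.02 + 0.5 * score

    # Discretise the context into buckets; track reward per bucket/action.
    counts = {(b, a): 0 for b in range(10) for a in ACTIONS}
    totals = {(b, a): 0.0 for b in range(10) for a in ACTIONS}

    def choose(bucket, eps=0.1):
        if random.random() < eps:               # explore
            return random.choice(ACTIONS)
        def estimate(a):                        # exploit best estimate so far
            c = counts[(bucket, a)]
            return totals[(bucket, a)] / c if c else 0.0
        return max(ACTIONS, key=estimate)

    for _ in range(20_000):
        score = random.random()                 # context: benign test signal
        bucket = min(int(score * 10), 9)
        action = choose(bucket)
        faulty = random.random() < true_fault_prob(score)
        reward = COST_TEST if action == "test" else (COST_BAD_SHIP if faulty else 0.0)
        counts[(bucket, action)] += 1
        totals[(bucket, action)] += reward

    for b in range(10):
        best = max(ACTIONS, key=lambda a: totals[(b, a)] / max(counts[(b, a)], 1))
        print(f"benign score {b/10:.1f}-{(b+1)/10:.1f}: learned action -> {best}")

    Running this, the policy learns to ship when the benign signal is low and to escalate to the in-depth test once the expected cost of shipping a faulty product outweighs the cost of testing.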

    We speak with Prof. David Leslie about how this logic can be applied to any manufacturing pipeline where there is a downside risk of not quality checking the product but a cost in a false positive detection of a bad product.

    Other areas of application include:

    • Anomalous behaviour in a jet engine e.g. low fuel efficiency, which could be nothing or could be serious, so it might be worth taking the plane in for repair.
    • Changepoints in network data time series - does it mean there’s a fault on the line or does it mean the next series of The Queen’s Gambit has just been released? Should we send an engineer out?

    With interview guest David Leslie, Professor of Statistical Learning in the Department of Mathematics and Statistics at Lancaster University.

    Further Reading

    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    Optimising the Future


    As we look ahead to a new year, and reflect on the last, we consider how data science can be used to optimise the future. But to what degree can we trust past experiences and observations, essentially relying on historical data to predict the future? And with what level of accuracy?

    In this episode of the DataCafé we ask: how can we optimise our predictions of future scenarios to maximise the benefit we can obtain from them while minimising the risk of unknowns?

    Data Science is made up of many diverse technical disciplines that can help to answer these questions. Two among them are mathematical optimisation and machine learning. We explore how these two fascinating areas interact and how each can help to turbocharge the other's cutting edge in the future.

    We speak with Dimitrios Letsios from King's College London about his work in optimisation and the exciting new developments he sees coming from the field's interplay with machine learning.

    With interview guest Dr. Dimitrios Letsios, lecturer (assistant professor) in the Department of Informatics at King's College London and a member of the Algorithms and Data Analysis Group.

    Further reading

    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 23 October 2020
    Interview date: 21 February 2020

    Intro music by Music 4 Video Library (Patreon supporter)

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    US Election Special


    What exciting data science problems emerge when you try to forecast an election? Many, it turns out!

    We're very excited to turn our DataCafé lens on the current Presidential race in the US as an exemplar of statistical modelling right now. Typically, state election polls ask around 1,000 people in a state of maybe 12 million how they will vote (or whether they have already voted) and return a predictive result with an estimated polling error of about 4%.

    In this episode, we look at polling as a data science activity and discuss how issues of sampling bias can have dramatic impacts on the outcome of a given poll. Elections are a fantastic use-case for Bayesian modelling where pollsters have to tackle questions like "What's the probability that a voter in Florida will vote for President Trump, given that they are white, over 60 and college educated".

    There are many such questions, as each electorate feature (gender, age, race, education, and so on) potentially adds another multiplicative factor to the size of the demographic sample needed to get a meaningful result out of an election poll.

    Finally, we even hazard a quick piece of psephological analysis ourselves and show how some naive Bayes techniques can at least get a foot in the door of these complex forecasting problems. (Caveat: correlation is still very important and can be a source of error if not treated appropriately!)
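
    Here is roughly what such a naive Bayes calculation looks like in Python, with entirely invented numbers. The statewide vote share plays the role of the prior, and each demographic feature multiplies in a likelihood that is assumed independent of the others - exactly the naivety (and the correlation caveat) mentioned above.

    # Invented illustrative numbers only.
    prior = {"A": 0.48, "B": 0.52}              # statewide vote share

    # P(feature | candidate), assumed independent across features.
    likelihood = {
        "A": {"over_60": 0.30, "college": 0.35, "suburban": 0.40},
        "B": {"over_60": 0.25, "college": 0.45, "suburban": 0.45},
    }

    def vote_posterior(features):
        scores = {}
        for cand, p in prior.items():
            for f in features:
                p *= likelihood[cand][f]
            scores[cand] = p
        z = sum(scores.values())                # normalise over candidates
        return {cand: s / z for cand, s in scores.items()}

    print(vote_posterior(["over_60", "college"]))
    print(vote_posterior(["over_60", "college", "suburban"]))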

    Further reading:

    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 30 October 2020
    Intro music by Music 4 Video Library (Patreon supporter) 

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    Forecasting Solar Radiation Storms


    What are solar storms? How are they caused? And how can we use data science to forecast them?

    In this episode of DataCafé we talk about the Sun and how it drives space weather, and the efforts to forecast solar radiation storms that can have a massive impact here on Earth.

    On a regular day, the Sun has a constant stream of charged particles, or plasma, coming off its surface into the solar system, known as the solar wind. But in times of high activity it can undergo much more explosive phenomena: two of these being solar flares and coronal mass ejections (CMEs). These eruptions on the Sun launch energetic particles into space in the form of plasma and magnetic field that can reach us here on Earth and cause radiation storms and/or geomagnetic storms. These storms can degrade satellites, affect telecommunications and power grids, and disrupt space exploration and aviation. 

    We can be glad that the strongest events are rare, but their rarity also makes them hard to predict, given the difficulties in observing, studying and classifying them. So the challenge becomes: how can we forecast them?

    To answer this we speak to Dr. Hazel Bain, a research scientist specializing in the development of tools for operational space weather forecasting. She tells us about her efforts to bring together physics-based models with machine learning in order to improve solar storm forecasts and provide alerts to customers in industries like aviation, agriculture and space exploration.

    With special guest Dr. Hazel M Bain, Research Scientist at the Cooperative Institute for Research in Environmental Sciences (CIRES) at the University of Colorado, Boulder and NOAA’s Space Weather Prediction Center (SWPC).


    Further reading


    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    Entrepreneurship in Data Science


    How do you get your latest and greatest data science tool to make an impact? How can you avoid wasting time building a supposedly great data product only to see it fall flat on launch?

    In this episode, we discuss how you need to start with the idea before you get to a data product. As all good entrepreneurs know, if you can't sell the idea, you're certainly not going to be able to sell the product. We take inspiration from a particular way of thinking about software engineering called Lean Startup, and learn how it can be applied to data science projects and to startups in general. 

    We are lucky enough to talk with Freddie Odukomaiya, CTO of a startup that is aiming to revolutionise commercial property decision-making. He tells us about his entrepreneurial journey creating an innovative data tech company, and we learn how Lean Startup has influenced the way he has approached developing his business.

    With interview guest Freddie Odukomaiya, CTO and Co-founder of GeoHood.

    Further reading

    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 11 September 2020
    Interview date: 16 June 2020

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.