    datacafé

    Episodes (7)

    Entrepreneurship in Data Science

    How do you get your latest and greatest data science tool to make an impact? How can you avoid wasting time building a supposedly great data product only to see it fall flat on launch?

    In this episode, we discuss how you need to start with the idea before you get to a data product. As all good entrepreneurs know, if you can't sell the idea, you're certainly not going to be able to sell the product. We take inspiration from a particular way of thinking about software engineering called Lean Startup, and learn how it can be applied to data science projects and to startups in general. 

    We are lucky enough to talk with Freddie Odukomaiya, CTO of a startup that is aiming to revolutionise commercial property decision-making. He tells us about his entrepreneurial journey creating an innovative data tech company, and we learn how Lean Startup has influenced the way he has approached developing his business.

    With interview guest Freddie Odukomaiya, CTO and Co-founder of GeoHood.

    Further reading

    Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

    Recording date: 11 September 2020
    Interview date: 16 June 2020

    Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.

    Viruses: Keep Calm and Use Statistics

    What is a virus? How can we spot human viruses in danger of becoming pandemics? How can we use statistics to understand their origins and transmission? This turns out to be a hard problem - not least because there can be many hundreds or thousands of slightly modified strains of a virus in a small sample of blood. It is of great importance which version of a virus will become a pandemic in a population and which will merely peter out.

    Viral geneticists have to be expert statisticians to be able to disentangle this story. Fundamentally, if we can use statistical techniques to understand which versions of a virus are prevalent and where they originated, we can start to design countermeasures to defeat the further spread of the virus.

    We speak to statistician and data scientist Dr. Kat James about her DPhil and post-doctoral work on the statistical genetics of animal-human viruses, in particular HIV-2, at the Nuffield Department of Medicine and the Wellcome Trust Centre for Human Genetics, University of Oxford. She is now Head of Data Science at Royal Mail and has some valuable insights on the crossover between statistical genetics and data science.

    As we discover, the current coronavirus pandemic is caused by a so-called zoonotic virus, meaning one that transitioned from animals to humans at some point, and it has become a very successful virus in the human population. COVID-19 has similarities to influenza, HIV-1 and HIV-2, MERS and SARS, and Kat gives us some interesting lessons to learn from previous pandemics.

    Background on HIV

    HIV-1 is one of the major viral pandemics of the 20th century. Untreated, it has a greater than 95% probability of death and it has killed 33 million people (it still accounts for 750,000 deaths per year).

    Using statistical genetics, researchers have been able to identify 3 spillover events into humans for HIV-1. Human viruses often interact with developments in human geography as part of the infection dynamics and this is certainly true of HIV-1 over the course of its emergence as a pandemic virus.

    HIV-2 is a distinct but similar virus to HIV-1 and people who are infected with HIV-2 often demonstrate resistance to HIV-1. Eight spillover events from Mangabey monkeys have been identified for HIV-2.

    With interview guest Dr. Kat James who is now Head of Data Science at Royal Mail.

    Recording date: 7 July 2020
    Interview date: 9 June 2020


    Changepoint Detection: Secret Weapon of the Data Scientist

    How can we spot a change in a jet engine vibration that might mean it’s about to fail catastrophically? How can a service forecast adapt to unexpected changes brought about by a pandemic? How might we spot an increase in the rate of change of pollution in the atmosphere? The answer to all these questions is changepoints, or rather changepoint detection.

    Common to all these systems is a set of ordered data, usually a time series of observations or measurements that may be noisy but have some underlying pattern. As the world changes, the measurements can change dramatically and disrupt the usual pattern. Unless forecasting or failure-detection systems are updated quickly to take account of such a change in the measurement data, they will likely produce erroneous or unpredictable results.

    Changepoints have many important applications in areas such as:

    • Climatology
    • Genetic sequencing
    • Finance
    • Medical imaging
    • Forecasting in industry

    We speak to statistician Dr. Rebecca Killick from Lancaster University about her work in changepoint detection and how it is a critical part of the statistical toolkit for analysing time series and other ordered data sets. In particular:

    • In forecasting where most methods tend to work on the basis of extrapolating trends, it is essential to know if a changepoint has occurred so that a refreshed model calculation can be started.
    • If there is a change in the underlying dynamics of a system that causes a complex change in the observed output then this can often be detected with a changepoint. This might be indicative of a mechanical failure or impending change in operation or an unobserved event buried deep in a difficult-to-measure environment, like a nuclear reactor. 
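
    To make the idea concrete, here is a minimal sketch of detecting a single change in the mean of a noisy series. It is an illustration only and not necessarily the method discussed in the episode: it simply tries every split point and keeps the one where fitting a separate mean to each side leaves the smallest squared error. The function names, the synthetic data, and the size of the shift are all assumptions.

```python
import random


def find_mean_changepoint(values):
    """Return the split index k (1 <= k < n) that minimises the total squared
    error when values[:k] and values[k:] are each fitted by their own mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(values)
    total_sq = sum(v * v for v in values)
    best_k, best_cost = None, float("inf")
    left_sum = left_sq = 0.0
    for k in range(1, n):
        v = values[k - 1]
        left_sum += v
        left_sq += v * v
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Squared deviations from a segment mean: sum(x^2) - (sum x)^2 / m
        cost = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


def make_series(seed=0):
    """Synthetic data (an assumption): the mean jumps from 0 to 3 at index 100."""
    rng = random.Random(seed)
    before = [rng.gauss(0.0, 1.0) for _ in range(100)]
    after = [rng.gauss(3.0, 1.0) for _ in range(100)]
    return before + after


series = make_series()
changepoint = find_mean_changepoint(series)
print("estimated changepoint index:", changepoint)
```

    A forecasting system could use the estimated index to refit its model on the data after the change only. Real applications usually need to handle several changepoints and changes in variance or trend, which this sketch does not attempt.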

    With interview guest Dr. Rebecca Killick, Associate Professor of Statistics at Lancaster University.

    Recording date: 10 June 2020
    Interview date: 5 June 2020

    Optimal Control in Price Decision Making

    Optimal Control is the science of making decisions in a way that optimises a key quantity such as revenue, customer satisfaction, or quality of service.

    Cake example
    Bertrand has a cake. He likes cake a lot but he can overeat cake sometimes in which case he doesn’t enjoy it so much. He would like to work out how much cake he should eat today and the next and the next so that he maximises his overall enjoyment of the cake, possibly making it last a long time (but not so long that it goes stale). The development of this decision strategy is a good example of optimal control.
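
    Bertrand's problem can be sketched as a small dynamic programme. Everything specific here is an assumption made for illustration: the cake is cut into whole slices, it goes stale after a fixed number of days, and his daily enjoyment follows a made-up curve that rises at first and then falls when he overeats.

```python
from functools import lru_cache


def enjoyment(slices):
    # Hypothetical enjoyment curve: peaks at 2 slices, falls off when overeating.
    return slices - 0.25 * slices ** 2


def plan_cake(total_slices, days):
    """Return (best total enjoyment, slices to eat on each day).
    Cake left over after the last day is assumed to be stale and worthless."""

    @lru_cache(maxsize=None)
    def best(day, remaining):
        if day == days:
            return 0.0, ()
        best_value, best_plan = float("-inf"), ()
        for eat in range(remaining + 1):
            future_value, future_plan = best(day + 1, remaining - eat)
            value = enjoyment(eat) + future_value
            if value > best_value:
                best_value, best_plan = value, (eat,) + future_plan
        return best_value, best_plan

    return best(0, total_slices)


value, plan = plan_cake(total_slices=6, days=4)
print("slices per day:", plan, "total enjoyment:", value)
```

    The recursion is the key idea: the best decision today depends on how much cake is left and on the value of behaving optimally from tomorrow onwards.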

    Airline example
    When selling tickets to customers, airlines face the problem of setting the right price: one that both earns a satisfactory immediate reward and reserves some capacity for later demand, which is typically associated with a higher willingness to pay. In this context, how can they make sure the right price is offered to the customer at each moment in time?
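
    One way to picture the airline problem is as a finite-horizon dynamic programme solved by backward induction. The sketch below is a toy under stated assumptions, not the approach used by any airline: at most one customer arrives per period, the price is chosen from a short list, and `buy_probability` is a hypothetical demand model in which willingness to pay grows as departure approaches.

```python
def optimal_prices(seats, periods, prices, buy_probability):
    """Backward induction over (period, seats left).
    value[t][c] is the expected future revenue from period t with c seats left;
    policy[t][c] is the price to offer in that state (None when sold out)."""
    value = [[0.0] * (seats + 1) for _ in range(periods + 1)]
    policy = [[None] * (seats + 1) for _ in range(periods)]
    for t in range(periods - 1, -1, -1):
        for c in range(1, seats + 1):
            best_price, best_value = None, float("-inf")
            for p in prices:
                q = buy_probability(p, t)
                # Sell now (earn p, one seat fewer) or keep the seat for later.
                expected = q * (p + value[t + 1][c - 1]) + (1 - q) * value[t + 1][c]
                if expected > best_value:
                    best_price, best_value = p, expected
            value[t][c] = best_value
            policy[t][c] = best_price
    return value, policy


def buy_probability(price, period):
    # Hypothetical demand: later customers tolerate higher prices.
    return max(0.0, 1.0 - price / (100.0 + 25.0 * period))


value, policy = optimal_prices(seats=2, periods=5, prices=[60, 90, 120, 150],
                               buy_probability=buy_probability)
for t, row in enumerate(policy):
    print("period", t, "price with 1 seat left:", row[1], "with 2 seats left:", row[2])
```

    The trade-off in the text shows up in the line that computes `expected`: selling a seat now is only worthwhile if the price beats the value of holding that seat for a later, possibly more generous, customer.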

    Interview guest Dr. Manuel Offidani, Data Scientist at easyJet.
    https://uk.linkedin.com/in/manuel-offidani

    Further Reading

    Paper: An Optimal Control Problem of Dynamic Pricing (summary via researchgate)
    Book: The Theory and Practice of Revenue Management (contents via Springer)
    Book: Dynamic Programming and Optimal Control (summary via researchgate)

    Recording date: 6 Mar. 2020
    Interview date: 7 Feb. 2020

    Additional sound effects from
    https://www.zapsplat.com

    Vehicle Routing Problem for Electric Vehicles

    How can we generate efficient routes for large fleets of vehicles that have to make many thousands of deliveries a day while taking into account breaks, shift patterns and traffic conditions? Now let's make those vehicles electric and we need to take into account vehicle battery charge level, recharging station locations and anticipated energy efficiency. It's a challenging problem!

    The Vehicle Routing Problem (VRP) is the optimisation problem that describes all manifested delivery operations. Solving it sorts deliveries onto multiple vehicles and gives each vehicle an optimal delivery sequence. The problem is NP-hard: the number of possible solutions explodes combinatorially as the number of vehicles and deliveries increases.

    We speak to Merve Keskin of the Warwick Business School about extending VRP as an area of optimisation to electric vehicles with a number of interesting developments:

    • Keeping track of battery level en route and inserting possible stops at nearby compatible recharging points
    • Allowing delay for possible queues at recharge points
    • Potential for in-flight negotiation on charge point booking so as to minimise contention for resources and wait times on long journeys
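
    The first of these ideas, tracking battery level along a route and inserting recharging stops, can be sketched with a naive greedy rule. This is an illustration under simplifying assumptions and not the method from the episode: the stop order is already fixed, distance stands in for energy use, every charger refills the battery completely, and queues and bookings are ignored. All names and coordinates are hypothetical.

```python
import math


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def add_charging_stops(route, chargers, battery_range):
    """Follow `route` in order, detouring via a charger whenever the remaining
    charge cannot cover the next leg. Returns (stops visited, total distance)."""
    position = route[0]
    charge = battery_range  # assume the vehicle starts fully charged
    visited = [position]
    total = 0.0
    for target in route[1:]:
        leg = dist(position, target)
        if leg > charge:
            # Chargers reachable now, from which the target is reachable when full.
            options = [c for c in chargers
                       if dist(position, c) <= charge and dist(c, target) <= battery_range]
            if not options:
                raise ValueError(f"no feasible charger before {target}")
            charger = min(options, key=lambda c: dist(position, c) + dist(c, target))
            total += dist(position, charger)
            visited.append(charger)
            position, charge = charger, battery_range
            leg = dist(position, target)
        total += leg
        charge -= leg
        visited.append(target)
        position = target
    return visited, total


depot = (0, 0)
route = [depot, (4, 0), (8, 0), depot]
stops, distance = add_charging_stops(route, chargers=[(7, 0), (20, 20)], battery_range=10)
print(stops, distance)
```

    Because the rule only reacts when the next leg is out of reach, it can strand a vehicle that a smarter plan would have recharged earlier. A full electric VRP would choose the stop order and the charging stops together.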


    With interview guest Dr. Merve Keskin, Warwick Business School and KTP fellow.
    https://www.wbs.ac.uk/about/person/merve-keskin/

    Recording date: 28 Feb. 2020
    Interview date: 12 Feb. 2020

    Inventory Optimisation: Reducing waste, Improving availability

    How do big grocery retailers maintain product availability for their customers day after day while minimising food wastage and storage costs? The answer is Inventory Optimisation, the science of maintaining sufficient stock levels of a set of products so that customers see an appropriate level of availability when they walk into your store. 

    The trade-off
    It’s hard because maintaining a large inventory costs money: space is given over to bulky stock, and a large amount of potentially expensive items must be bought without any return on that investment until sale. A long lead time from stocking to use or sale means no value is extracted from those items in the meantime, which can have cashflow implications for a company, even though a large inventory obviously minimises the risk of a so-called stockout.

    Wastage
    A significant further complication comes from food and grocery retail where the items being stocked are themselves perishable with varying expiry dates. Further significant costs are incurred if the product expires while still being stocked. This leads to huge food waste problems around the world which in turn have a significant carbon and environmental impact on the direct supply chain for the retailer.
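
    The balance between stockouts and waste can be illustrated with the classic newsvendor model, which we bring in here as a textbook example rather than as the approach discussed in the episode. It assumes a single perishable product, a single selling period, and normally distributed demand. The prices, costs and demand figures below are made up.

```python
from statistics import NormalDist


def newsvendor_order_quantity(mean_demand, sd_demand, unit_cost, unit_price,
                              salvage_value=0.0):
    """Order quantity that balances lost sales against unsold (wasted) stock."""
    underage = unit_price - unit_cost    # margin lost per unit of unmet demand
    overage = unit_cost - salvage_value  # loss per unit left unsold at expiry
    if underage <= 0 or overage <= 0:
        raise ValueError("need salvage_value < unit_cost < unit_price")
    # Stock up to the demand quantile given by the critical ratio.
    critical_ratio = underage / (underage + overage)
    return NormalDist(mean_demand, sd_demand).inv_cdf(critical_ratio)


# Hypothetical product: costs 1.00, sells for 4.00, worthless once expired.
quantity = newsvendor_order_quantity(mean_demand=100, sd_demand=20,
                                     unit_cost=1.0, unit_price=4.0)
print("suggested order quantity:", round(quantity, 1))
```

    A high-margin product pushes the order above average demand because a stockout is expensive, while a low-margin perishable pushes it below because waste dominates.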

    With interview guest Dr. Anna-Lena Sachs, Lecturer in Predictive Analytics at the Department of Management Sciences, Lancaster University.
    https://www.lancaster.ac.uk/lums/people/anna-lena-sachs

    Recording date: 29 Jan. 2020
    Interview date: 24 Jan. 2020


    Multi-Armed Bandits: Learning better decisions

    Have you ever wondered why you keep getting adverts for products that you've only just bought and now don't need? The online advert auto-server is probably using a multi-armed bandit learner that needs a little algorithmic improvement. We speak to Ciara Pike-Burke about her work on trying to make multi-armed bandits smarter and more useful.

    The multi-armed bandit problem is a classic reinforcement learning problem where we are given a slot machine with n arms (bandits) with each arm having its own rigged probability distribution of success. Pulling any one of the arms might give you a stochastic reward of either R=+1 for success, or R=0 for failure. Our objective is to pull the arms one-by-one in sequence such that we maximize our total reward collected in the long run. In the world of data science, the bandit arms are possible decisions that can be taken, the reward is the possible win you get from taking a decision and the uncertainty in the problem is what makes this hard (and exciting!).
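
    The setup above, with rewards of 1 or 0 from each arm, can be simulated in a few lines. The sketch uses epsilon-greedy, one of the simplest bandit strategies, chosen here for illustration and not as a description of our guest's research. The success probabilities, the value of epsilon and the function name are assumptions.

```python
import random


def epsilon_greedy(success_probs, pulls, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy learner on arms paying R=1 with the given
    probabilities (R=0 otherwise). Returns (counts, estimates, total_reward)."""
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms       # how often each arm has been pulled
    estimates = [0.0] * n_arms  # running mean reward observed per arm
    total_reward = 0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick an arm at random
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1 if rng.random() < success_probs[arm] else 0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward


counts, estimates, total_reward = epsilon_greedy([0.2, 0.5, 0.7], pulls=5000)
print("pulls per arm:", counts)
print("estimated success rates:", [round(e, 2) for e in estimates])
print("total reward:", total_reward)
```

    The learner never sees the true probabilities. It has to balance trying arms it knows little about against repeatedly pulling the arm that currently looks best, which is exactly the uncertainty that makes the problem hard.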

    With interview guest Dr. Ciara Pike-Burke from the Universitat Pompeu Fabra (Barcelona)
    https://sites.google.com/view/cpikeburke

    Recording date: 29 Jan. 2020
    Interview date: 10 Jan. 2020
