    Department of Statistics

    The Department of Statistics at Oxford is a world leader in research including computational statistics and statistical methodology, applied probability, bioinformatics and mathematical genetics. In the 2014 Research Excellence Framework (REF), Oxford's Mathematical Sciences submission was ranked overall best in the UK. This is an exciting time for the Department. We have now moved into our new home on St Giles and we are currently settling in. The new building provides improved lecture and teaching space and a variety of interaction areas, and brings together researchers in Probability and Statistics. It has created a highly visible centre for the Department in Oxford. Since 2010, the Department has been awarded over forty research grants with a total value of £9M, not counting several very large EPSRC and MRC funded awards for Centres for Doctoral Training. The main sponsors are the European Commission, EPSRC, the Medical Research Council and the Wellcome Trust. We offer an undergraduate degree (BA or MMath) in Mathematics and Statistics, jointly with the Mathematical Institute. At postgraduate level there is an MSc course in Applied Statistics, as well as a lively and stimulating environment for postgraduate research (DPhil or MSc by Research). Our graduates are employed in a wide range of occupational sectors throughout the world, including the university sector. The Department co-hosts the EPSRC and MRC Centre for Doctoral Training (CDT) in Next-Generational Statistical Science, the Oxford-Warwick Statistics Programme (OxWaSP).
    Episodes (42)

    A Theory of Weak-Supervision and Zero-Shot Learning

    A lecture exploring alternatives to using labeled training data. Labeled training data is often scarce, unavailable, or very costly to obtain. To circumvent this problem, there is a growing interest in developing methods that can exploit sources of information other than labeled data, such as weak-supervision and zero-shot learning. While these techniques have obtained impressive accuracy in practice, both for vision and language domains, they come with no theoretical characterization of their accuracy. In a sequence of recent works, we develop a rigorous mathematical framework for constructing and analyzing algorithms that combine multiple sources of related data to solve a new learning task. Our learning algorithms provably converge to models that have minimum empirical risk with respect to an adversarial choice over feasible labelings for a set of unlabeled data, where the feasibility of a labeling is computed through constraints defined by estimated statistics of the sources. Notably, these methods do not require the related sources to have the same labeling space as the multiclass classification task. We demonstrate the effectiveness of our approach with experiments on various image classification tasks.

    Victims of Algorithmic Violence: An Introduction to AI Ethics and Human-AI Interaction

    A high-level overview of key areas of AI ethics and not-ethics, exploring the challenges of algorithmic decision-making, kinds of bias, and interpretability, and linking these issues to problems of human-system interaction. Much attention is now being focused on AI Ethics and Safety, with the EU AI Act and other emerging legislation being proposed to identify and curb "AI risks" worldwide. Are such ethical concerns unique to AI systems, or do they apply to digital systems in general?

    The practicalities of academic research ethics - how to get things done

    A brief introduction to various legal and procedural ethical concepts and their applications within and beyond academia. It's all very well to talk about truth, beauty and justice for academic research ethics. But how do you do these things at a practical level? If you have a big idea, or stumble across something with important implications, what do you do with it? How do you make sure you've got appropriate safeguards without drowning in bureaucracy?

    Joining Bayesian submodels with Markov melding

    This seminar explains and illustrates the approach of Markov melding for joint analysis. Integrating multiple sources of data into a joint analysis provides more precise estimates and reduces the risk of biases introduced by using only partial data. However, it can be difficult to conduct a joint analysis in practice. Instead each data source is typically modelled separately, but this results in uncertainty not being fully propagated. We propose to address this problem using a simple, general method, which requires only small changes to existing models and software. We first form a joint Bayesian model based upon the original submodels using a generic approach we call "Markov melding". We show that this model can be fitted in submodel-specific stages, rather than as a single, monolithic model. We also show the concept can be extended to "chains of submodels", in which submodels relate to neighbouring submodels via common quantities. The relationship to the "cut distribution" will also be discussed. We illustrate the approach using examples from an A/H1N1 influenza severity evidence synthesis; integrated population models in ecology; and modelling uncertain-time-to-event data in hospital intensive care units.

    Neural Networks and Deep Kernel Shaping

    Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping. Using an extended and formalized version of the Q/C map analysis of Poole et al. (2016), along with Neural Tangent Kernel theory, we identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data, and show how these can be avoided by carefully controlling the "shape" of the network's initialization-time kernel function. We then develop a method called Deep Kernel Shaping (DKS), which accomplishes this using a combination of precise parameter initialization, activation function transformations, and small architectural tweaks, all of which preserve the model class. In our experiments we show that DKS enables SGD training of residual networks without normalization layers on ImageNet and CIFAR-10 classification tasks at speeds comparable to standard ResNetV2 and Wide-ResNet models, with only a small decrease in generalization performance. When using K-FAC as the optimizer, we achieve similar results for networks without skip connections. Our results apply to a large variety of activation functions, including those which traditionally perform very badly, such as the logistic sigmoid. In addition to DKS, we contribute a detailed analysis of skip connections, normalization layers, special activation functions like ReLU and SELU, and various initialization schemes, explaining their effectiveness as alternative (and ultimately incomplete) ways of "shaping" the network's initialization-time kernel.

    Ethics from the perspective of an applied statistician

    Professor Denise Lievesley discusses ethical issues and codes of conduct relevant to applied statisticians. Statisticians work in a wide variety of different political and cultural environments which influence their autonomy and their status, which in turn affect the ethical frameworks they employ. The need for a UN-led fundamental set of principles governing official statistics became apparent at the end of the 1980s when countries in Central Europe began to change from centrally planned economies to market-oriented democracies. It was essential to ensure that national statistical systems in such countries would be able to produce appropriate and reliable data that adhered to certain professional and scientific standards. Alongside the UN initiative, a number of professional statistical societies adopted codes of conduct. Do such sets of principles and ethical codes remain relevant over time? Or do changes in the way statistics are compiled and used mean that we need to review and adapt them? For example, as combining data sources becomes more prevalent, record linkage in particular poses privacy and ethical challenges. Similarly, obtaining informed consent from units for access to and linkage of their data from non-survey sources continues to be challenging. Denise draws on her earlier role as a statistician in the United Nations, working with some 200 countries, to discuss some of the ethical issues she encountered then and how these might change over time.

    Metropolis Adjusted Langevin Trajectories: a robust alternative to Hamiltonian Monte-Carlo

    Lionel Riou-Durand gives a talk on sampling methods. Sampling approximations for high dimensional statistical models often rely on so-called gradient-based MCMC algorithms. It is now well established that these samplers scale better with the dimension than other state-of-the-art MCMC samplers, but are also more sensitive to tuning. Among these, Hamiltonian Monte Carlo is a widely used sampling method shown to achieve gold standard d^{1/4} scaling with respect to the dimension. However, it is also known that its efficiency is quite sensitive to the choice of integration time. This problem is related to periodicity in the autocorrelations induced by the deterministic trajectories of Hamiltonian dynamics. To tackle this issue, we develop a robust alternative to HMC built upon Langevin diffusions (namely Metropolis Adjusted Langevin Trajectories, or MALT), inducing randomness in the trajectories through a continuous refreshment of the velocities. We study the optimal scaling problem for MALT and recover the d^{1/4} scaling of HMC without additional assumptions. Furthermore, we highlight the fact that autocorrelations for MALT can be controlled by a uniform and monotonic bound thanks to the randomness induced in the trajectories, so that MALT achieves robustness to tuning. Finally, we compare our approach to Randomized HMC and establish quantitative contraction rates for the 2-Wasserstein distance that support the choice of Langevin dynamics. This is joint work with Jure Vogrinc, University of Warwick.
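
The idea of a Langevin trajectory with a single accept/reject decision can be illustrated with a minimal one-dimensional sketch. It is written from the abstract's description and is not the speaker's implementation: the OBABO splitting, the function name `malt_step`, the parameter names (`h` for step size, `n_steps` for trajectory length, `gamma` for friction) and the standard normal target are all assumptions made for illustration. Setting `gamma = 0` switches off the velocity refreshment inside the trajectory, which leaves an ordinary HMC step.

```python
import math
import random


def malt_step(x, phi, grad_phi, h, n_steps, gamma, rng):
    """One Metropolis-adjusted Langevin trajectory for a 1-D target exp(-phi).

    Illustrative sketch: OBABO splitting, with the energy error of the
    deterministic (BAB) part summed along the whole trajectory.
    """
    eta = math.exp(-gamma * h / 2.0)      # velocity damping per half step
    noise = math.sqrt(1.0 - eta * eta)    # matching noise scale
    v = rng.gauss(0.0, 1.0)               # full velocity refresh at the start
    x_new, energy_error = x, 0.0
    for _ in range(n_steps):
        v = eta * v + noise * rng.gauss(0.0, 1.0)   # O: partial refresh
        energy_before = phi(x_new) + 0.5 * v * v
        v -= 0.5 * h * grad_phi(x_new)              # B: half kick
        x_new += h * v                              # A: drift
        v -= 0.5 * h * grad_phi(x_new)              # B: half kick
        energy_error += phi(x_new) + 0.5 * v * v - energy_before
        v = eta * v + noise * rng.gauss(0.0, 1.0)   # O: partial refresh
    # Accept or reject the whole trajectory at once.
    if rng.random() < math.exp(min(0.0, -energy_error)):
        return x_new
    return x


def run_chain(n_samples, h=0.5, n_steps=4, gamma=1.0, seed=0):
    """Sample a standard normal target (an assumed toy example)."""
    rng = random.Random(seed)
    phi = lambda x: 0.5 * x * x
    grad_phi = lambda x: x
    x, draws = 0.0, []
    for _ in range(n_samples):
        x = malt_step(x, phi, grad_phi, h, n_steps, gamma, rng)
        draws.append(x)
    return draws
```

The sample mean and variance of `run_chain(5000)` should be close to 0 and 1 for this toy target; comparing runs with different `n_steps` and `gamma` gives a feel for the tuning sensitivity the abstract discusses.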

    Modelling infectious diseases: what can branching processes tell us?

    Professor Samir Bhatt gives a talk on the mathematics underpinning infectious disease models. Mathematical descriptions of infectious disease outbreaks are fundamental to understanding how transmission occurs. Reductively, two approaches are used: individual based simulators and governing equation models, and both approaches have a multitude of pros and cons. This talk connects these two worlds via general branching processes and discusses (at a high level) the rather beautiful mathematics that arises from them and how they can help us understand the assumptions underpinning mathematical models for infectious disease. This talk explains how this new maths can help us understand uncertainty better, and shows some simple examples. This talk is somewhat technical, but focuses as much as possible on intuition and the big picture.
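
As a rough illustration of the individual-based side, the sketch below simulates the simplest discrete-generation branching process (Galton-Watson with Poisson offspring). This is an assumed toy special case, not the general branching processes of the talk; the names `simulate_outbreak`, `extinction_fraction`, `r0` and `cap` are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_outbreak(r0, generations, rng, initial=1, cap=1000):
    """Return the number of infected individuals in each generation.

    Each individual independently infects Poisson(r0) others. The run
    stops early on extinction or once a generation exceeds `cap`.
    """
    sizes = [initial]
    for _ in range(generations):
        if sizes[-1] == 0 or sizes[-1] > cap:
            break
        sizes.append(sum(poisson(rng, r0) for _ in range(sizes[-1])))
    return sizes


def extinction_fraction(r0, generations=30, runs=1000, seed=1):
    """Fraction of simulated outbreaks that die out."""
    rng = random.Random(seed)
    extinct = sum(
        simulate_outbreak(r0, generations, rng)[-1] == 0 for _ in range(runs)
    )
    return extinct / runs
```

For Poisson offspring, the extinction probability q of this toy process satisfies q = exp(r0 (q - 1)), so `extinction_fraction(1.5)` should land near 0.42, while any `r0` below 1 gives almost certain extinction.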

    Causality and Autoencoders in the Light of Drug Repurposing for COVID-19

    Caroline Uhler (MIT) gives an OxCSML Seminar on Friday 2nd July 2021. Abstract: Massive data collection holds the promise of a better understanding of complex phenomena and ultimately, of better decisions. An exciting opportunity in this regard stems from the growing availability of perturbation / intervention data (genomics, advertisement, education, etc.). In order to obtain mechanistic insights from such data, a major challenge is the integration of different data modalities (video, audio, interventional, observational, etc.). Using genomics as an example, I will first discuss our recent work on coupling autoencoders to integrate and translate between data of very different modalities such as sequencing and imaging. I will then present a framework for integrating observational and interventional data for causal structure discovery and characterize the causal relationships that are identifiable from such data. We then provide a theoretical analysis of autoencoders linking overparameterization to memorization. In particular, I will characterize the implicit bias of overparameterized autoencoders and show that such networks trained using standard optimization methods implement associative memory. We end by demonstrating how these ideas can be applied for drug repurposing in the current COVID-19 crisis.

    Recent Applications of Stein's Method in Machine Learning

    Qiang Liu (University of Texas at Austin) gives the OxCSML Seminar on Friday 4th June 2021. Abstract: Stein's method is a powerful technique for deriving fundamental theoretical results on approximating and bounding distances between probability measures, such as the central limit theorem. Recently, it was found that the key ideas in Stein's method, despite being originally designed as a purely theoretical technique, can be repurposed to provide a basis for developing practical and scalable computational methods for learning and using large scale, intractable probabilistic models. This talk will give an overview of some of these recent advances of Stein's method in machine learning.

    Do Simpler Models Exist and How Can We Find Them?

    Cynthia Rudin (Duke University) gives an OxCSML Seminar on Friday 14th May 2021. Abstract: While the trend in machine learning has tended towards more complex hypothesis spaces, it is not clear that this extra complexity is always necessary or helpful for many domains. In particular, models and their predictions are often made easier to understand by adding interpretability constraints. These constraints shrink the hypothesis space; that is, they make the model simpler. Statistical learning theory suggests that generalization may be improved as a result as well. However, adding extra constraints can make optimization (exponentially) harder. For instance, it is much easier in practice to create an accurate neural network than an accurate and sparse decision tree. We address the following question: Can we show that a simple-but-accurate machine learning model might exist for our problem, before actually finding it? If the answer is promising, it would then be worthwhile to solve the harder constrained optimization problem to find such a model. In this talk, I present an easy calculation to check for the possibility of a simpler model. This calculation indicates that simpler-but-accurate models do exist in practice more often than you might think. This talk is mainly based on the following paper: Lesia Semenova, Cynthia Rudin, and Ron Parr. A Study in Rashomon Curves and Volumes: A New Perspective on Generalization and Model Simplicity in Machine Learning. In progress, 2020. https://arxiv.org/abs/1908.01755

    Practical pre-asymptotic diagnostic of Monte Carlo estimates in Bayesian inference and machine learning

    Aki Vehtari (Aalto University) gives the OxCSML Seminar on Friday 7th May 2021. Abstract: I discuss the use of the Pareto-k diagnostic as a simple and practical approach for estimating both the required minimum sample size and empirical pre-asymptotic convergence rate for Monte Carlo estimates. Even when by construction a Monte Carlo estimate has finite variance, the pre-asymptotic behaviour and convergence rate can be very different from the asymptotic behaviour following the central limit theorem. I demonstrate with practical examples in importance sampling, stochastic optimization, and variational inference, which are commonly used in Bayesian inference and machine learning.

    Complexity of local MCMC methods for high-dimensional model selection

    Quan Zhou, Texas A&M University, gives an OxCSML Seminar on Friday 25th June 2021. Abstract: In a model selection problem, the size of the state space typically grows exponentially (or even faster) with p (the number of variables). But MCMC methods for model selection usually rely on local moves which only look at a neighborhood of size polynomial in p. Naturally one may wonder how efficient these sampling methods are at exploring the posterior distribution. Consider variable selection first. Yang, Wainwright and Jordan (2016) proved that the random-walk add-delete-swap sampler is rapidly mixing under mild high-dimensional assumptions. By using an informed proposal scheme, we obtain a new MCMC sampler which achieves a much faster mixing time that is independent of p, under the same assumptions. The mixing time proof relies on a novel approach called "two-stage drift condition", which can be useful for obtaining tight complexity bounds. This result shows that the mixing rate of locally informed MCMC methods can be fast enough to offset the computational cost of local posterior evaluation, and thus such methods scale well to high-dimensional data. Next, we generalize this result to other model selection problems. It turns out that locally informed samplers attain a dimension-free mixing time if the posterior distribution satisfies a unimodal condition. We show that this condition can be established for the high-dimensional structure learning problem even when the ordering of variables is unknown. This talk is based on joint work with H. Chang, J. Yang, D. Vats, G. Roberts and J. Rosenthal. Bio: Quan Zhou is an assistant professor in the Department of Statistics at Texas A&M University (TAMU). Before joining TAMU, he was a postdoctoral research fellow at Rice University. He did his PhD at Baylor College of Medicine.

    Assessing Personalization in Digital Health

    Distinguished Speaker Seminar - Friday 18th June 2021, with Susan Murphy, Professor of Statistics and Computer Science, Harvard John A. Paulson School of Engineering and Applied Sciences. Reinforcement Learning provides an attractive suite of online learning methods for personalizing interventions in digital health. However, after a reinforcement learning algorithm has been run in a clinical study, how do we assess whether personalization occurred? We might find users for whom it appears that the algorithm has indeed learned in which contexts the user is more responsive to a particular intervention. But could this have happened completely by chance? We discuss some first approaches to addressing these questions.

    Machine Learning in Drug Discovery

    Graduate Lecture - Thursday 3rd June 2021, with Dr Fergus Boyles, Department of Statistics, University of Oxford. Drug discovery is a long and laborious process, with ever growing costs and dwindling productivity making it ever more difficult to bring new medicines to the market in an affordable and timely fashion. There is a long history of applying statistical modelling and machine learning to problems in drug discovery, and, as in many fields, there is growing excitement about the potential of modern machine learning techniques to both automate and accelerate time-consuming tasks, and to enable previously unfeasible experiments. In this talk I will describe the drug discovery pipeline and introduce computer-aided drug discovery. Drawing on my own research and that of others, I will explain how machine learning is currently being applied to problems in drug discovery and highlight challenges and pitfalls that remain to be addressed.

    Several structured thresholding bandit problems

    OxCSML Seminar - Friday 28th May 2021, presented by Alexandra Carpentier (University of Magdeburg). In this talk we will discuss the thresholding bandit problem, i.e. a sequential learning setting where the learner sequentially samples K unknown distributions T times, and aims to output at the end the set of distributions whose means \mu_k are above a threshold \tau. We will study this problem under four structural assumptions, i.e. shape constraints: that the sequence of means is monotone, unimodal, concave, or unstructured (vanilla case). We will provide in each case minimax results on the performance of any strategy, as well as matching algorithms. This will highlight the fact that, even more than in batch learning, structural assumptions have a huge impact in sequential learning.
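
To make the setting concrete, here is a minimal sketch of the unstructured (vanilla) case with the most naive strategy: split the budget evenly across arms and report those whose empirical mean exceeds the threshold. It is a baseline for illustration only, not one of the minimax-optimal algorithms from the talk. Treating T as the total sampling budget is an assumption, and the function name `threshold_bandit_uniform` and the Gaussian arms in the example are hypothetical.

```python
import random


def threshold_bandit_uniform(arms, tau, budget, rng):
    """Uniform-allocation baseline for the thresholding bandit problem.

    arms   : list of callables; arms[k](rng) returns one sample from arm k
    tau    : threshold on the means
    budget : total number of samples T (assumed interpretation of T)
    Returns the set of arm indices whose empirical mean is above tau.
    """
    pulls_per_arm = budget // len(arms)
    above = set()
    for k, arm in enumerate(arms):
        empirical_mean = sum(arm(rng) for _ in range(pulls_per_arm)) / pulls_per_arm
        if empirical_mean > tau:
            above.add(k)
    return above


# Example: four unit-variance Gaussian arms with monotone means.
true_means = [0.0, 0.3, 0.7, 1.0]
arms = [lambda rng, mu=mu: rng.gauss(mu, 1.0) for mu in true_means]
selected = threshold_bandit_uniform(arms, tau=0.5, budget=4000, rng=random.Random(0))
print(sorted(selected))
```

A uniform split ignores any shape constraint. When the means are known to be monotone, as in this example, a strategy can concentrate its samples near the crossing point, which is the kind of gain from structure that the talk quantifies.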

    A primer on PAC-Bayesian learning *followed by* News from the PAC-Bayes frontline

    Benjamin Guedj, University College London, gives an OxCSML Seminar on 26th March 2021. Abstract: PAC-Bayes is a generic and flexible framework to address the generalisation abilities of machine learning algorithms. It leverages the power of Bayesian inference and allows one to derive new learning strategies. I will briefly present the key concepts of PAC-Bayes and highlight a few recent contributions from my group.