fakultät für mathematik

Explore "fakultät für mathematik" with insightful episodes like "Network-based analysis of gene expression data", "Context-based RNA-seq mapping", "Computing hybridization networks using agreement forests", "Spectral and dynamical properties of certain quantum hamiltonians in dimension two" and "Biclustering: Methods, Software and Application" from podcasts like ""Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02", "Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02", "Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02", "Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02" and "Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02"" and more!

Episodes (5)

Network-based analysis of gene expression data

The methods of molecular biology for the quantitative measurement of gene expression have undergone a rapid development in the past two decades. High-throughput assays with the microarray and RNA-seq technology now enable whole-genome studies in which several thousands of genes can be measured at a time. However, this has also imposed serious challenges on data storage and analysis, which are subject of the young, but rapidly developing field of computational biology. To explain observations made on such a large scale requires suitable and accordingly scaled models of gene regulation. Detailed models, as available for single genes, need to be extended and assembled in larger networks of regulatory interactions between genes and gene products. Incorporation of such networks into methods for data analysis is crucial to identify molecular mechanisms that are drivers of the observed expression. As methods for this purpose emerge in parallel to each other and without knowing the standard of truth, results need to be critically checked in a competitive setup and in the context of the available rich literature corpus. This work is centered on and contributes to the following subjects, each of which represents important and distinct research topics in the field of computational biology: (i) construction of realistic gene regulatory network models; (ii) detection of subnetworks that are significantly altered in the data under investigation; and (iii) systematic biological interpretation of detected subnetworks. For the construction of regulatory networks, I review existing methods with a focus on curation and inference approaches. I first describe how literature curation can be used to construct a regulatory network for a specific process, using the well-studied diauxic shift in yeast as an example. In particular, I address the question how a detailed understanding, as available for the regulation of single genes, can be scaled-up to the level of larger systems. I subsequently inspect methods for large-scale network inference showing that they are significantly skewed towards master regulators. A recalibration strategy is introduced and applied, yielding an improved genome-wide regulatory network for yeast. To detect significantly altered subnetworks, I introduce GGEA as a method for network-based enrichment analysis. The key idea is to score regulatory interactions within functional gene sets for consistency with the observed expression. Compared to other recently published methods, GGEA yields results that consistently and coherently align expression changes with known regulation types and that are thus easier to explain. I also suggest and discuss several significant enhancements to the original method that are improving its applicability, outcome and runtime. For the systematic detection and interpretation of subnetworks, I have developed the EnrichmentBrowser software package. It implements several state-of-the-art methods besides GGEA, and allows to combine and explore results across methods. As part of the Bioconductor repository, the package provides a unified access to the different methods and, thus, greatly simplifies the usage for biologists. Extensions to this framework, that support automating of biological interpretation routines, are also presented. In conclusion, this work contributes substantially to the research field of network-based analysis of gene expression data with respect to regulatory network construction, subnetwork detection, and their biological interpretation. This also includes recent developments as well as areas of ongoing research, which are discussed in the context of current and future questions arising from the new generation of genomic data.

Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02

deApril 29, 2016

fakultät für mathematik

informatik und statistik

Context-based RNA-seq mapping

In recent years, the sequencing of RNA (RNA-seq) using next generation sequencing (NGS) technology has become a powerful tool for analyzing the transcriptomic state of a cell. Modern NGS platforms allow for performing RNA-seq experiments in a few days, resulting in millions of short sequencing reads. A crucial step in analyzing RNA-seq data generally is determining the transcriptomic origin of the sequencing reads (= read mapping). In principal, read mapping is a sequence alignment problem, in which the short sequencing reads (30 - 500 nucleotides) are aligned to much larger reference sequences such as the human genome (3 billion nucleotides). In this thesis, we present ContextMap, an RNA-seq mapping approach that evaluates the context of the sequencing reads for determining the most likely origin of every read. The context of a sequencing read is defined by all other reads aligned to the same genomic region. The ContextMap project started with a proof of concept study, in which we showed that our approach is able to improve already existing read mapping results provided by other mapping programs. Subsequently, we developed a standalone version of ContextMap. This implementation no longer relied on mapping results of other programs, but determined initial alignments itself using a modification of the Bowtie short read alignment program. However, the original ContextMap implementation had several drawbacks. In particular, it was not able to predict reads spanning over more than two exons and to detect insertions or deletions (indels). Furthermore, ContextMap depended on a modification of a specific Bowtie version. Thus, it could neither benefit of Bowtie updates nor of novel developments (e.g. improved running times) in the area of short read alignment software. For addressing these problems, we developed ContextMap 2, an extension of the original ContextMap algorithm. The key features of ContextMap 2 are the context-based resolution of ambiguous read alignments and the accurate detection of reads crossing an arbitrary number of exon-exon junctions or containing indels. Furthermore, a plug-in interface is provided that allows for the easy integration of alternative short read alignment programs (e.g. Bowtie 2 or BWA) into the mapping workflow. The performance of ContextMap 2 was evaluated on real-life as well as synthetic data and compared to other state-of-the-art mapping programs. We found that ContextMap 2 had very low rates of misplaced reads and incorrectly predicted junctions or indels. Additionally, recall values were as high as for the top competing methods. Moreover, the runtime of ContextMap 2 was at least two fold lower than for the best competitors. In addition to the mapping of sequencing reads to a single reference, the ContextMap approach allows the investigation of several potential read sources (e.g. the human host and infecting pathogens) in parallel. Thus, ContextMap can be applied to mine for infections or contaminations or to map data from meta-transcriptomic studies. Furthermore, we developed methods based on mapping-derived statistics that allow to assess confidence of mappings to identified species and to detect false positive hits. ContextMap was evaluated on three real-life data sets and results were compared to metagenomics tools. Here, we showed that ContextMap can successfully identify the species contained in a sample. Moreover, in contrast to most other metagenomics approaches, ContextMap also provides read mapping results to individual species. As a consequence, read mapping results determined by ContextMap can be used to study the gene expression of all species contained in a sample at the same time. Thus, ContextMap might be applied in clinical studies, in which the influence of infecting agents on host organisms is investigated. The methods presented in this thesis allow for an accurate and fast mapping of RNA-seq data. As the amount of available sequencing data increases constantly, these methods will likely become an important part of many RNA-seq data analyses and thus contribute valuably to research in the field of transcriptomics.

Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02

deApril 22, 2016

fakultät für mathematik

informatik und statistik

Computing hybridization networks using agreement forests

Rooted phylogenetic trees are widely used in biology to represent the evolutionary history of certain species. Usually, such a tree is a simple binary tree only containing internal nodes of in-degree one and out-degree two representing specific speciation events. In applied phylogenetics, however, trees can contain nodes of out-degree larger than two because, often, in order to resolve some orderings of speciation events, there is only insufficient information available and the common way to model this uncertainty is to use nonbinary nodes (i.e., nodes of out-degree of at least three), also denoted as polytomies. Moreover, in addition to such speciation events, there exist certain biological events that cannot be modeled by a tree and, thus, require the more general concept of rooted phylogenetic networks or, more specifically, of hybridization networks. Examples for such reticulate events are horizontal gene transfer, hybridization, and recombination. Nevertheless, in order to construct hybridization networks, the less general concept of a phylogenetic tree can still be used as building block. More precisely, often, in a first step, phylogenetic trees for a set of species, each based on a distinctive orthologous gene, are constructed. In a second step, specific sets containing common subtrees of those trees, known as maximum acyclic agreement forests, are calculated, which are then glued together to a single hybridization network. In such a network, hybridization nodes (i.e., nodes of in-degree larger than or equal to two) can exist representing potential reticulate events of the underlying evolutionary history. As such events are considered as rare phenomena, from a biological point of view, especially those networks representing a minimum number of reticulate events, which is denoted as hybridization number, are of high interest. Consequently, in a mathematical aspect, the problem of calculating hybridization networks can be briefly described as follows. Given a set T of rooted phylogenetic trees sharing the same set of taxa, compute a hybridization network N displaying T with minimum hybridization number. In this context, we say that such a network N displays a phylogenetic tree T, if we can obtain T from N by removing as well as contracting some of its nodes and edges. Unfortunately, this is a computational hard problem (i.e., it is NP-hard), even for the simplest case given just two binary input trees. In this thesis, we present several methods tackling this NP-hard problem. Our first approach describes how to compute a representative set of minimum hybridization networks for two binary input trees. For that purpose, our approach implements the first non-naive algorithm - called allMAAFs - calculating all maximum acyclic agreement forests for two rooted binary phylogenetic trees on the same set of taxa. In a subsequent step, in order to maximize the efficiency of the algorithm allMAAFs, we have developed additionally several modifications each reducing the number of computational steps and, thus, significantly improving its practical runtime. Our second approach is an extension of our first approach making the underlying algorithm accessible to more than two binary input trees. For this purpose, our approach implements the algorithm allHNetworks being the first algorithm calculating all relevant hybridization networks displaying a set of rooted binary phylogenetic trees on the same set of taxa, which is a preferable feature when studying hybridization events. Lastly, we have developed a generalization of our second approach that can now deal with multiple nonbinary input trees. For that purpose, our approach implements the first non-naive algorithm - called allMulMAAFs - calculating a relevant set of nonbinary maximum acyclic agreement forests for two rooted (nonbinary) phylogenetic trees on the same set of taxa. Each of the algorithms above is integrated into our user friendly Java-based software package Hybroscale, which is freely available and platform independent, so that it runs on all major operating systems. Our program provides a graphical user interface for visualizing trees and networks. Moreover, it facilitates the interpretation of computed hybridization networks by adding specific features to its graphical representation and, thus, supports biologists in investigating reticulate evolution. In addition, we have implemented a method using a user friendly SQL-style modeling language for filtering the usually large amount of reported networks.

Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02

deApril 08, 2016

fakultät für mathematik

informatik und statistik

Spectral and dynamical properties of certain quantum hamiltonians in dimension two

After 2004, when it was possible for the first time to isolate graphene flakes, the interest in quantum mechanics of plain systems has been intensified significantly. In graphene, that is a single layer of carbon atoms aligned in a regular hexagonal structure, the generator of dynamics near the band edge is the massless Dirac operator in dimension two. We investigate the spectrum of the two-dimensional massless Dirac operator H_D coupled to an external electro-magnetic field. More precisely, our focus lies on the characterisation of the spectrum σ(H_D) for field configurations that are generated by unbounded electric and magnetic potentials. We observe that the existence of gaps in σ(H_D) depends on the ratio V^2/B at infinity, which is a ratio of the electric potential V and the magnetic field B. In particular, a sharp bound on V^2/B is given, below which σ(H_D) is purely discrete. Further, we show that if the ratio V^2/B is unbounded at infinity, H_D has no spectral gaps for a huge class of fields B and potentials V . The latter statement leads to examples of two-dimensional massless Dirac operators with dense pure point spectrum. We extend the ideas, developed for H_D, to the classical Pauli (and the magnetic Schrödinger) operator in dimension two. It turns out that also such non-relativistic operators with a strong repulsive potential do admit criteria for spectral gaps in terms of B and V . Similarly as in the case of the Dirac operator, we show that those gaps do not occur in general if |V| is dominating B at infinity. It should be mentioned that this leads to a complete characterisation of the spectrum of certain Pauli (and Schrödinger) operators with very elementary, rotationally symmetric field configurations. Considering for the Dirac operator H_D the regime of a growing ratio V^2/B, there happens a transition from pure point to continuous spectrum. A phenomenon that is particularly interesting from the dynamical point of view. Therefore, we address in a second part of the thesis the question under which spectral conditions ballistic wave package spreading in two-dimensional Dirac systems is possible. To be more explicit, we study the following problem: Do statements on the spectral type of H_D already suffice to decide whether the time mean of the expectation value $$\frac{1}{T} \int_0^T \sps{\psi(t)}{|\bx|^2\psi(t)} \rd t $$ behaves like T^2? Here ψ(t) denotes the time evolution of a state ψ under the corresponding Dirac operator. We can answer that question affirmatively, at least for certain electro-magnetic fields with symmetry.

Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02

deOctober 19, 2015

fakultät für mathematik

informatik und statistik

Biclustering: Methods, Software and Application

Over the past 10 years, biclustering has become popular not only in the field of biological data analysis but also in other applications with high-dimensional two way datasets. This technique clusters both rows and columns simultaneously, as opposed to clustering only rows or only columns. Biclustering retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. This dissertation focuses on improving and advancing biclustering methods. Since most existing methods are extremely sensitive to variations in parameters and data, we developed an ensemble method to overcome these limitations. It is possible to retrieve more stable and reliable bicluster in two ways: either by running algorithms with different parameter settings or by running them on sub- or bootstrap samples of the data and combining the results. To this end, we designed a software package containing a collection of bicluster algorithms for different clustering tasks and data scales, developed several new ways of visualizing bicluster solutions, and adapted traditional cluster validation indices (e.g. Jaccard index) for validating the bicluster framework. Finally, we applied biclustering to marketing data. Well-established algorithms were adjusted to slightly different data situations, and a new method specially adapted to ordinal data was developed. In order to test this method on artificial data, we generated correlated original random values. This dissertation introduces two methods for generating such values given a probability vector and a correlation structure. All the methods outlined in this dissertation are freely available in the R packages biclust and orddata. Numerous examples in this work illustrate how to use the methods and software.

Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02

deMay 12, 2011

fakultät für mathematik

informatik und statistik

On this page

fakultät für mathematik

Episodes (5)

Network-based analysis of gene expression data

Context-based RNA-seq mapping

Computing hybridization networks using agreement forests

Spectral and dynamical properties of certain quantum hamiltonians in dimension two

Biclustering: Methods, Software and Application