
    informatik und statistik

    Explore " informatik und statistik" with insightful episodes like "Survival Analysis with Multivariate adaptive Regression Splines", "Graph Kernels", "Model Driven Software Engineering for Web Applications", "Zur mikroskopischen Begründung der Streutheorie" and "Position Management für ortsbezogene Community-Dienste" from podcasts like ""Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02", "Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02", "Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02", "Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02" and "Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02"" and more!

    Episodes (100)

    Survival Analysis with Multivariate Adaptive Regression Splines
    Multivariate adaptive regression splines (MARS) are a useful tool for identifying linear and nonlinear effects as well as pairwise interactions between covariates. This dissertation introduces a new proposal for modelling survival-type data with MARS. Martingale and deviance residuals of a Cox PH model are used as the response in a standard MARS approach to model functional forms of covariate effects as well as possible interactions in a data-driven way. Simulation studies show that the new method yields a better fit to the data than the traditional Cox PH approach. The analysis of real data of the German Heart Center on survivors of an acute myocardial infarction also documents the good performance of the method.
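
    As a hedged illustration of the residual-based idea (our own sketch, not the dissertation's code), the fragment below fits a Cox PH model, extracts martingale residuals, and regresses them on the covariates with MARS. It assumes the third-party lifelines and pyearth packages and a hypothetical infarction.csv with columns time, event, and the covariates.

    ```python
    # Sketch: Cox PH martingale residuals as the response of a MARS fit.
    # Assumes `lifelines` and `pyearth`; all column names are illustrative.
    import pandas as pd
    from lifelines import CoxPHFitter
    from pyearth import Earth

    df = pd.read_csv("infarction.csv")          # hypothetical dataset
    covariates = ["age", "ejection_fraction"]   # hypothetical covariates

    # 1) Fit a standard Cox proportional hazards model.
    cox = CoxPHFitter()
    cox.fit(df[covariates + ["time", "event"]],
            duration_col="time", event_col="event")

    # 2) Martingale residuals capture what the linear predictor missed.
    resid = cox.compute_residuals(df[covariates + ["time", "event"]],
                                  kind="martingale")

    # 3) Regress the residuals on the covariates with MARS to uncover
    #    nonlinear effects and interactions in a data-driven way.
    mars = Earth(max_degree=2)                  # allow pairwise interactions
    mars.fit(df[covariates].values, resid.to_numpy().ravel())
    print(mars.summary())
    ```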

    Graph Kernels
    As new graph-structured data is constantly being generated, learning and data mining on graphs have become a challenge in application areas such as molecular biology, telecommunications, chemoinformatics, and social network analysis. The central algorithmic problem in these areas, measuring the similarity of graphs, has therefore received extensive attention in the recent past. Unfortunately, existing approaches are slow, lacking in expressivity, or hard to parameterize. Graph kernels have recently been proposed as a theoretically sound and promising approach to the problem of graph comparison. Their attractiveness stems from the fact that by defining a kernel on graphs, a whole family of data mining and machine learning algorithms becomes applicable to graphs. These kernels must respect both the information represented by the topology and the node and edge labels of the graphs, while being efficient to compute. Existing methods fall woefully short: they miss out on important topological information, are plagued by runtime issues, and do not scale to large graphs. Hence the primary goal of this thesis is to make learning and data mining with graph kernels feasible.
    In the first half of this thesis, we review and analyze the shortcomings of state-of-the-art graph kernels and then propose solutions to overcome these weaknesses. As highlights of our research, we
    - speed up the classic random walk graph kernel from O(n^6) to O(n^3), where n is the number of nodes in the larger graph, and by a factor of up to 1,000 in CPU runtime, by extending concepts from linear algebra to Reproducing Kernel Hilbert Spaces,
    - define novel graph kernels based on shortest paths that avoid tottering and outperform random walk kernels in accuracy,
    - define novel graph kernels that estimate the frequency of small subgraphs within a large graph and that work on large graphs hitherto not handled by existing graph kernels.
    In the second half of this thesis, we present algorithmic solutions to two novel problems in graph mining. First, we define a two-sample test on graphs. Given two sets of graphs, or a pair of graphs, this test lets us decide whether these graphs are likely to originate from the same underlying distribution. To solve this so-called two-sample problem, we define the first kernel-based two-sample test; combined with graph kernels, this results in the first two-sample test on graphs described in the literature. Second, we propose a principled approach to supervised feature selection on graphs. As in feature selection on vectors, feature selection on graphs aims at finding features that are correlated with the class membership of a graph. Towards this goal, we first define a family of supervised feature selection algorithms based on kernels and the Hilbert-Schmidt Independence Criterion. We then show how to extend this principle of feature selection to graphs, and how to combine it with gSpan, the state-of-the-art method for frequent subgraph mining. On several benchmark datasets, our novel procedure manages to select a small subset of dozens of informative features among the thousands to millions of subgraphs detected by gSpan. In classification experiments, the features selected by our method outperform those chosen by other feature selectors in terms of classification accuracy. Along the way, we also solve several problems that can be deemed contributions in their own right:
    - We define a unifying framework for describing both variants of random walk graph kernels proposed in the literature.
    - We present the first theoretical connection between graph kernels and molecular descriptors from chemoinformatics.
    - We show how to determine sample sizes for estimating the frequency of certain subgraphs within a large graph with a given precision and confidence, which promises to be a key to the solution of important problems in data mining and bioinformatics.
    Three branches of computer science immediately benefit from our findings: data mining, machine learning, and bioinformatics. For data mining, our efficient graph kernels allow us to bring to bear the large family of kernel methods on mining problems on real-world graph data. For machine learning, we open the door to extending strong theoretical results on learning on graphs into useful practical applications. For bioinformatics, we make a number of principled kernel methods and efficient kernel functions available for biological network comparison and structural comparisons of proteins. Apart from these three areas, other fields may also benefit from our findings, as our algorithms are general in nature and not restricted to a particular type of application.
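
    To make the kernel idea concrete, here is a minimal, hedged sketch of a shortest-path style graph kernel (one of the kernel families named above, reduced to an unlabeled form): each graph is summarized by a histogram of pairwise shortest-path lengths, and the kernel value is the dot product of two histograms. It assumes the networkx package and uses toy graphs.

    ```python
    # Minimal shortest-path graph kernel (simplified, unlabeled variant).
    # Each graph -> histogram of shortest-path lengths; kernel = dot product.
    import networkx as nx
    from collections import Counter

    def sp_histogram(G, max_len=10):
        hist = Counter()
        for src, dists in nx.all_pairs_shortest_path_length(G):
            for dst, d in dists.items():
                if src < dst and d <= max_len:   # count each node pair once
                    hist[d] += 1
        return [hist[d] for d in range(1, max_len + 1)]

    def sp_kernel(G1, G2):
        return sum(a * b for a, b in zip(sp_histogram(G1), sp_histogram(G2)))

    # Toy graphs: a cycle and a path over the same number of nodes.
    G1, G2 = nx.cycle_graph(6), nx.path_graph(6)
    print(sp_kernel(G1, G1), sp_kernel(G1, G2))
    ```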

    Model Driven Software Engineering for Web Applications
    Model driven software engineering (MDSE) is becoming a widely accepted approach for developing complex applications and it is on its way to be one of the most promising paradigms in software engineering. MDSE advocates the use of models as the key artifacts in all phases of the development process, from analysis to design, implementation and testing. The most promising approach to model driven engineering is the Model Driven Architecture (MDA) defined by the Object Management Group (OMG). Applications are modeled at a platform independent level and are transformed to (possibly several) platform specific implementations. Model driven Web engineering (MDWE) is the application of model driven engineering to the domain of Web application development, where it might be particularly helpful because of the continuous evolution of Web technologies and platforms. However, most current approaches for MDWE provide only a partial application of the MDA pattern. Further, metamodels and transformations are not always made explicit, and metamodels are often too general or do not contain sufficient information for automatic code generation. Thus, the main goal of this work is the complete application of the MDA pattern to the Web application domain from analysis to the generated implementation, with transformations playing an important role at every stage of the development process. Explicit metamodels are defined for the platform independent analysis and design and for the platform specific implementation of dynamic Web applications. Explicit transformations allow the automatic generation of executable code for a broad range of technologies. To pursue this goal, the following approach was chosen. A metamodel is defined for the platform independent analysis and for the design of the content, navigation, process and presentation concerns of Web applications as a conservative extension of the UML (Unified Modeling Language) metamodel, together with a corresponding UML profile as notation. OCL constraints ensure the well-formedness of models and are checked by transformations. Transformations implement the systematic evolution of analysis and design models. A generic platform for Web applications built on an open-source Web platform and a generic runtime environment is proposed that represents a family of platforms supporting the combination of a broad range of technologies. The transformation to the platform specific models for this generic platform is decomposed along the concerns of Web applications to cope in a fine-grained way with technology changes. For each of the concerns, a metamodel for the corresponding technology is defined together with the corresponding transformations from the platform independent design models. The resulting models are serialized to code by means of serialization transformations.
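
    As a purely illustrative sketch of the transformation pipeline (not the thesis's actual UML/OCL toolchain), the following fragment turns a tiny platform-independent navigation model into platform-specific JSP-like page stubs, with a stand-in for an OCL well-formedness check; all names are hypothetical.

    ```python
    # Toy model-to-code transformation in the spirit of the MDA pattern:
    # platform-independent model (PIM) -> platform-specific artifacts.
    # All names are hypothetical; real MDWE uses UML metamodels and OCL.
    from dataclasses import dataclass, field

    @dataclass
    class Page:                      # PIM element: a navigation node
        name: str
        links: list = field(default_factory=list)

    def check_wellformed(pages):
        # Stand-in for an OCL constraint: every link target must exist.
        names = {p.name for p in pages}
        for p in pages:
            assert set(p.links) <= names, f"dangling link in {p.name}"

    def to_jsp(page):
        # Serialization transformation: one PSM artifact per PIM page.
        body = "\n".join(f'<a href="{l}.jsp">{l}</a>' for l in page.links)
        return f"<html><body><h1>{page.name}</h1>\n{body}\n</body></html>"

    model = [Page("Home", ["Catalog"]), Page("Catalog", ["Home"])]
    check_wellformed(model)
    for p in model:
        print(f"--- {p.name}.jsp ---\n{to_jsp(p)}")
    ```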

    Zur mikroskopischen Begründung der Streutheorie
    The goal of this work is to contribute to the microscopic foundation of scattering theory, i.e., to show to what extent the asymptotic formalism of scattering theory, with objects such as the $S$-matrix and the incoming and outgoing asymptotes $\psi_{in}$ and $\psi_{out}$, can be derived from a microscopic description of the underlying system. We concentrate on two things. First, the exit statistics of an $N$-particle system through distant surfaces are derived. Then we restrict ourselves to $1$-particle scattering and use the exit statistics to derive the scattering cross section from a microscopic description of the scattering situation. The underlying dynamics is Bohmian mechanics, a theory about the motion of point particles that reproduces all results of nonrelativistic quantum mechanics.
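
    For orientation, the asymptotic objects mentioned above are conventionally defined as follows (standard textbook relations, with one common sign convention, not a result specific to this thesis):

    ```latex
    % Wave operators map the free asymptotes onto the interacting state
    % (sign conventions vary between textbooks):
    \Omega_{\pm} = \lim_{t \to \mp\infty} e^{iHt}\, e^{-iH_0 t},
    \qquad \psi = \Omega_{+}\psi_{in} = \Omega_{-}\psi_{out}.
    % The S-matrix carries the incoming into the outgoing asymptote:
    S = \Omega_{-}^{\dagger}\,\Omega_{+}, \qquad \psi_{out} = S\,\psi_{in}.
    ```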

    Position Management für ortsbezogene Community-Dienste
    In Location-based Community Services (LBCSs), mobile users exchange and correlate their spatial positions, for example in order to find out which other community members are currently nearby. The so-called position management is responsible for the transmission, analysis, processing and access control of position information, which is directed along a corresponding supply chain. The supply chain spans from the mobile device of the target person, where the position is derived, for example, by GPS, via intermediaries like the location or LBS provider, to the domain of the user. Community services pose special requirements on position management, which can be coarsely divided into the fields of privacy protection and efficiency. First, the target person must be able to control by whom and under which circumstances her position information is accessed. To guarantee that, it must be possible to anonymize the position data with respect to the location and LBS provider, for which so far no technique exists that is suited for community services. Also, the target person must be able to authorize requests to access her position in an easy and socially acceptable fashion. Second, concepts for efficiently realizing so-called proactive multi-target LBCSs are needed. These services are automatically triggered as soon as two or more target persons have entered into a certain pre-defined spatial constellation. An example is buddy tracking, which automatically detects when two persons have approached each other to within a certain proximity distance. The technical problem to solve is the frequent transmission of position information over the scarce air interface and the associated energy consumption at the mobile terminal of the target person. This dissertation develops new concepts in both of the sketched fields and shows their feasibility based on numerous simulations and analytical considerations. The TraX platform, which implements the developed concepts in practice, is also presented.
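
    One classic way to cut down position updates in proximity detection (a plausible ingredient of such services, sketched here on our own initiative with hypothetical names) is to give each target a circular safe zone: as long as both stay inside their zones, no proximity event is possible and nothing needs to be transmitted.

    ```python
    # Safe-zone sketch for buddy tracking: no updates are needed while
    # both targets remain inside their assigned circles, because then
    # their distance cannot drop below the proximity threshold.
    import math

    def assign_safe_zones(pos_a, pos_b, proximity):
        dist = math.dist(pos_a, pos_b)
        slack = dist - proximity
        if slack <= 0:
            return None                      # already in proximity
        radius = slack / 2                   # split the slack between both
        return (pos_a, radius), (pos_b, radius)

    def must_report(current, zone):
        center, radius = zone
        return math.dist(current, center) >= radius   # left the safe zone

    za, zb = assign_safe_zones((0.0, 0.0), (10.0, 0.0), proximity=4.0)
    print(must_report((3.5, 0.0), za))   # True: target A left its zone
    ```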

    Kontextbereitstellung in Automobilen Ad-hoc Netzen
    The more detailed a driver's information about the road segment he is about to travel, the higher the probability that he reacts to complex traffic situations in a timely and appropriate manner. Against this background, the comprehensive availability of high-quality context information in the vehicle makes an important contribution to increasing traffic safety and efficiency. The goal of this work is a reliable prediction of the future driving situation based on the collectively known knowledge of the road users. The focus of the work is on the management of location-based context information, the fusion of heterogeneous information sources, and the problem of distributing the context information generated by the vehicles over vehicular ad-hoc networks. Building on a formal specification of the solution, the work describes a two-stage assessment process that makes it possible to derive, from distributed sensor observations of different vehicles, a probability measure for the occurrence of a concrete state of a relevant driving context. The spatial and temporal properties of the context aspect are interpolated with appropriate weights. Subsequently, the causal relationships between different context aspects are cross-validated on the basis of a Bayesian network. The work also shows how context information can be exchanged between vehicles in a vehicular ad-hoc network. To this end, the concept of maximizing network utility, known from wired networks, is extended to the special characteristics of vehicular networks. Furthermore, a cross-layer solution architecture is presented that adaptively ensures both short latencies for critical messages and sustained scalability of the network in scenarios with low and high vehicle densities. Channel access and the dissemination of context information in the network are based on a situation-dependent assessment of the application utility of the messages to be transmitted. The behaviour of the system is evaluated by means of simulations. An ontology-based management also enables non-vehicle systems to make cross-domain use of the sensor information and causal relationships.
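
    As a hedged sketch of the utility-based dissemination idea (our own simplification, not the thesis's protocol), the fragment below scores pending messages by an assumed application utility that decays with age and distance to the event, and fills a limited channel budget with the highest-scoring ones.

    ```python
    # Utility-based dissemination sketch: rank pending context messages by
    # a simple utility that decays with age and distance to the event, then
    # send as many as the channel budget allows. Scoring is illustrative.
    import math
    from dataclasses import dataclass

    @dataclass
    class Msg:
        event: str
        age_s: float        # seconds since the observation
        dist_m: float       # distance to the reported event
        critical: bool      # e.g. emergency braking vs. parking info

    def utility(m: Msg) -> float:
        freshness = math.exp(-m.age_s / 30.0)       # halves every ~21 s
        relevance = 1.0 / (1.0 + m.dist_m / 500.0)  # nearby events matter
        return (10.0 if m.critical else 1.0) * freshness * relevance

    def schedule(pending, budget):
        return sorted(pending, key=utility, reverse=True)[:budget]

    pending = [Msg("ice", 5, 120, True), Msg("jam", 60, 2000, False),
               Msg("fog", 10, 400, True)]
    for m in schedule(pending, budget=2):
        print(m.event, round(utility(m), 3))
    ```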

    Hierarchical Subspace Clustering
    It is well-known that traditional clustering methods considering all dimensions of the feature space usually fail in terms of efficiency and effectiveness when applied to high-dimensional data. This poor behavior is based on the fact that clusters may not be found in the high-dimensional feature space, although clusters exist in subspaces of the feature space. To overcome these limitations of traditional clustering methods, several methods for subspace clustering have been proposed recently. Subspace clustering algorithms aim at automatically identifying lower dimensional subspaces of the feature space in which clusters exist. There exist two types of subspace clustering algorithms: algorithms for detecting clusters in axis-parallel subspaces and, as an extension, algorithms for finding clusters in subspaces which are arbitrarily oriented. Generally, the subspace clusters may be hierarchically nested, i.e., several subspace clusters of low dimensionality may form a subspace cluster of higher dimensionality. Since existing subspace clustering methods are not able to detect these complex structures, hierarchical approaches for subspace clustering have to be applied. The goal of this dissertation is to develop new efficient and effective methods for hierarchical subspace clustering by identifying novel challenges for the hierarchical approach and proposing innovative and solid solutions for these challenges. The first part of this work deals with the analysis of hierarchical subspace clusters in axis-parallel subspaces. Two new methods are proposed that search simultaneously for subspace clusters of arbitrary dimensionality in order to detect complex hierarchies of subspace clusters. Furthermore, a new visualization model of the clustering result by means of a graph representation is provided. In the second part of this work, new methods for hierarchical clustering in arbitrarily oriented subspaces of the feature space are discussed. The so-called correlation clustering can be seen as an extension of axis-parallel subspace clustering. Correlation clustering aims at grouping the data set into subsets, the so-called correlation clusters, such that the objects in the same correlation cluster show uniform attribute correlations. Two new hierarchical approaches are proposed which combine density-based clustering with Principal Component Analysis in order to identify hierarchies of correlation clusters. The last part of this work addresses the analysis and interpretation of the results obtained from correlation clustering algorithms. A general method is introduced to extract quantitative information on the linear dependencies between the objects of given correlation clusters. Furthermore, these quantitative models can be used to predict the probability that an object is created by one of these models. Both the efficiency and the effectiveness of the presented techniques are thoroughly analyzed. The benefits over traditional approaches are shown by evaluating the new methods on synthetic as well as real-world test data sets.
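
    To illustrate the PCA ingredient of correlation clustering named above (a simplified sketch under our own assumptions, not the dissertation's algorithms), the following fragment estimates the local correlation dimensionality of each point from the strong eigenvalues of its neighborhood's covariance matrix.

    ```python
    # Sketch: local correlation dimensionality via PCA of k-NN
    # neighborhoods, the building block that density-based correlation
    # clustering combines with reachability ideas. (k, alpha) illustrative.
    import numpy as np

    def local_correlation_dim(X, k=20, alpha=0.85):
        dims = []
        for p in X:
            # k nearest neighbors of p (brute force for clarity)
            idx = np.argsort(np.linalg.norm(X - p, axis=1))[:k]
            cov = np.cov(X[idx].T)
            evals = np.sort(np.linalg.eigvalsh(cov))[::-1]
            ratio = np.cumsum(evals) / evals.sum()
            # smallest number of eigenvectors explaining alpha of variance
            dims.append(int(np.searchsorted(ratio, alpha) + 1))
        return np.array(dims)

    rng = np.random.default_rng(0)
    line = np.c_[np.linspace(0, 1, 100)] * [1.0, 2.0, -1.0]  # 1d structure
    line += rng.normal(scale=0.01, size=line.shape)
    print(np.bincount(local_correlation_dim(line)))          # mostly dim 1
    ```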

    Statistical Diffusion Tensor Imaging
    Magnetic resonance diffusion tensor imaging (DTI) makes it possible to infer the ultrastructure of living tissue. In brain mapping, neural fiber trajectories can be identified by exploiting the anisotropy of diffusion processes. A variety of statistical methods can be linked into the comprehensive processing chain that is spanned between DTI raw images and the reliable visualization of fibers. In this work, a space varying coefficients model (SVCM) using penalized B-splines was developed to integrate diffusion tensor estimation, regularization and interpolation into a unified framework. The implementation challenges originating in multiple 3D space varying coefficient surfaces and the large dimensions of realistic datasets were met by incorporating matrix sparsity and efficient model approximation. The superiority of the B-spline based SVCM over the standard approach was demonstrated in simulation studies in terms of the precision and accuracy of the individual tensor elements. The integration with a probabilistic fiber tractography algorithm and application to real brain data revealed that the unified approach is at least equivalent to the serial application of voxelwise estimation, smoothing and interpolation. From the error analysis using boxplots and visual inspection, the conclusion was drawn that both the standard approach and the B-spline based SVCM may suffer from low local adaptivity. Therefore, wavelet basis functions were employed for filtering diffusion tensor fields. While excellent local smoothing was indeed achieved by combining voxelwise tensor estimation with wavelet filtering, no immediate improvement was gained for fiber tracking. However, the thresholding strategy needs to be refined and the proposed model of an incorporation of wavelets into an SVCM needs to be implemented to finally assess their utility for DTI data processing. In summary, an SVCM with specific consideration of the demands of human brain DTI data was developed and implemented, eventually representing a unified postprocessing framework. This represents an experimental and statistical platform to further improve the reliability of tractography.
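
    As a hedged one-dimensional illustration of the penalized B-spline machinery underlying such an SVCM (a textbook P-spline, not the thesis's 3D implementation), the following fragment smooths a noisy signal by solving the penalized least-squares system (B'B + lambda*D'D)a = B'y.

    ```python
    # 1D P-spline sketch: cubic B-spline basis + second-order difference
    # penalty, solved as (B'B + lam*D'D) a = B'y. The 3D SVCM extends this
    # idea to space-varying tensor coefficient surfaces.
    import numpy as np
    from scipy.interpolate import BSpline

    x = np.linspace(0, 1, 200)
    y = np.sin(2 * np.pi * x) + np.random.default_rng(1).normal(0, 0.2, x.size)

    k, n_inner = 3, 20                                   # cubic, 20 knots
    t = np.r_[[0]*k, np.linspace(0, 1, n_inner), [1]*k]  # clamped knots
    B = BSpline.design_matrix(x, t, k).toarray()         # basis matrix
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)         # 2nd differences

    lam = 1.0                                            # smoothing param
    a = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    y_hat = B @ a                                        # smoothed signal
    print(float(np.mean((y_hat - np.sin(2 * np.pi * x)) ** 2)))
    ```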

    Modelling extreme wind speeds
    Very strong wind gusts can cause derailment of some high speed trains, so knowledge of the wind process at extreme levels is required. Since the sensitivity of the train to strong wind occurrences varies with the relative direction of a gust, this aspect has to be accounted for. We first focus on the wind process at one weather station. An extreme value model accounting at the same time for very strong wind speeds and wind directions is considered and applied to both raw data and component data, where the latter represent the force of the wind in a chosen direction. Extreme quantiles and exceedance probabilities are estimated and we give corresponding confidence intervals. A common problem with wind data, called the masking problem, is that per time interval only the largest wind speed over all directions is recorded, while occurrences in all other directions remain unrecorded for this time interval. To improve model estimates, we suggest a model accounting for the masking problem. A simulation study is carried out to analyse the behaviour of this model under different conditions; the performance is judged by comparing the new model with a traditional model using the mean square error of high quantiles. Thereafter the model is applied to wind data. The model turns out to have desirable properties in the simulation study as well as in the data application. We further consider a recently introduced multivariate extreme value model; it allows for a broad range of dependence structures and is thus ideally suited for many applications. As the dependence structure of this model is characterised by several components, quantifying the degree of dependence is not straightforward. We therefore consider visual summary measures to support judging the degree of dependence and study their behaviour and usefulness via a simulation study. Subsequently, the new multivariate extreme value model is applied to wind data of two gauging stations, where directional aspects are accounted for. This model therefore allows for statements about the joint wind behaviour at the two stations. This knowledge gives insight into whether storm events are likely to be jointly present along larger parts of a railway track or rather occur locally.
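
    To make the extreme-value machinery concrete (a generic peaks-over-threshold sketch on synthetic data, not the thesis's directional model), the following fragment fits a generalized Pareto distribution to threshold exceedances and estimates a high quantile.

    ```python
    # Peaks-over-threshold sketch: fit a generalized Pareto distribution to
    # wind-speed exceedances and estimate the 99.9% quantile. Data are
    # synthetic; a real analysis would add directional covariates.
    import numpy as np
    from scipy.stats import genpareto

    rng = np.random.default_rng(42)
    speeds = rng.weibull(2.0, size=50_000) * 10          # synthetic speeds

    u = np.quantile(speeds, 0.95)                        # threshold choice
    exc = speeds[speeds > u] - u
    shape, loc, scale = genpareto.fit(exc, floc=0)       # fix location at 0

    # Quantile of the full distribution via the exceedance probability:
    p, zeta = 0.999, exc.size / speeds.size              # P(X > u) estimate
    level = u + genpareto.ppf(1 - (1 - p) / zeta, shape, loc=0, scale=scale)
    print(round(u, 2), round(level, 2))
    ```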

    Efficient and Effective Similarity Search on Complex Objects
    Due to the rapid development of computer technology and new methods for the extraction of data in the last few years, more and more database applications have emerged for which an efficient and effective similarity search is of great importance. Application areas of similarity search include multimedia, computer aided engineering, marketing, image processing and many more. Of special interest is the task of finding similar objects in large amounts of data having complex representations. For example, set-valued objects as well as tree or graph structured objects are among these complex object representations. The grouping of similar objects, so-called clustering, is a fundamental analysis technique that makes it possible to search through extensive data sets. The goal of this dissertation is to develop new efficient and effective methods for similarity search in large quantities of complex objects. Furthermore, the efficiency of existing density-based clustering algorithms is to be improved when applied to complex objects. The first part of this work motivates the use of vector sets for similarity modeling. For this purpose, a metric distance function is defined, which is suitable for various application ranges but time-consuming to compute. Therefore, a filter-refinement technique is suggested to efficiently process range queries and k-nearest neighbor queries, two basic query types within the field of similarity search. Several filter distances are presented, which approximate the exact object distance and can be computed efficiently. Moreover, a multi-step query processing approach is described, which can be directly integrated into the well-known density-based clustering algorithms DBSCAN and OPTICS. In the second part of this work, new application ranges for density-based hierarchical clustering using OPTICS are discussed. A prototype is introduced, which has been developed for these new application areas and is based on the aforementioned similarity models and accelerated clustering algorithms for complex objects. This prototype facilitates interactive semi-automatic cluster analysis and allows visual search for similar objects in multimedia databases. Another prototype extends these concepts and enables the user to analyze multi-represented and multi-instance data. Finally, the problem of music genre classification is addressed as another application supporting multi-represented and multi-instance data objects. An extensive experimental evaluation examines the efficiency and effectiveness of the presented techniques using real-world data and points out advantages in comparison to conventional approaches.
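
    A minimal sketch of the multi-step filter-refinement paradigm described above (generic, with illustrative distance functions of our own): a cheap lower-bounding filter prunes most objects, and only surviving candidates pay the expensive exact distance.

    ```python
    # Multi-step range query sketch: a cheap lower bound prunes candidates,
    # the expensive exact distance is computed only for survivors. Correct
    # as long as lower_bound(q, o) <= exact_dist(q, o) for all objects.
    import numpy as np

    def exact_dist(q, o):
        # Stand-in for an expensive metric on complex objects (e.g. vector
        # sets); here simply the Euclidean distance on full vectors.
        return float(np.linalg.norm(q - o))

    def lower_bound(q, o, dims=2):
        # Cheap filter: the distance on the first few dimensions never
        # exceeds the full Euclidean distance, so it is a valid lower bound.
        return float(np.linalg.norm(q[:dims] - o[:dims]))

    def range_query(q, db, eps):
        candidates = [o for o in db if lower_bound(q, o) <= eps]   # filter
        return [o for o in candidates if exact_dist(q, o) <= eps]  # refine

    rng = np.random.default_rng(0)
    db = list(rng.normal(size=(10_000, 32)))
    q = rng.normal(size=32)
    print(len(range_query(q, db, eps=4.0)))
    ```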

    Physical Mobile Interactions: Mobile Devices as Pervasive Mediators for Interactions with the Real World
    So far, mobile devices have mainly been used for interactions between the user, the device and the used service, without considering the context of use. However, during the last years we have seen a huge interest in industry and academia in using mobile devices for interactions with things, places and people in the real world, termed physical mobile interactions in this thesis. Until now there has been no comprehensive analysis of these interaction techniques, and no user studies have been conducted to analyze when which interaction technique is preferred by which users. Furthermore, there is no comprehensive framework available which can be reused by application developers to integrate such interactions into their applications, and no specific methods and best practices have been reported that can be of use when developing physical mobile interactions and applications. This dissertation presents the first comprehensive analysis and classification of physical mobile interactions. Furthermore, a mature framework was developed that provides various implementations of four different interaction techniques. These four physical mobile interaction techniques were then used in five different prototypes and analyzed in five different user studies. The results concern the advantages and disadvantages of these interaction techniques as seen by potential users. This work also reports experiences, guidelines, methods and best practices that simplify the process of developing physical mobile interactions and applications. Furthermore, this dissertation provides an analysis of privacy aspects in mobile interactions with public displays, presents the novel interaction technique "rotating compass", and the first concept of using the mobile device for direct touch-based interaction with dynamic displays.

    Statistical Models for Infectious Disease Surveillance Counts
    Models for infectious disease surveillance counts have to take into account the specific characteristics of this type of data. While showing a regular, often seasonal, pattern over long time periods, there are occasional irregularities or outbreaks. A model which is a compromise between mechanistic models and empirical models is proposed. A key idea is to distinguish between an endemic and an epidemic component, which makes it possible to separate the regular pattern from the irregularities and outbreaks. This is of particular advantage for outbreak detection in public health surveillance. While the endemic component is parameter-driven, the epidemic component is based on observation-driven approaches, including an autoregression on past observations. A particular challenge of infectious disease counts is the modelling of the outbreaks and irregularities in the data. We model the autoregressive parameter of the epidemic component by a Bayesian changepoint model, which shows an adaptive amount of smoothing and is able to model the jumps and fast increases as well as the smooth decreases in the data. While the model can be used as a generic approach for infectious disease counts, it is particularly suited for outbreak detection in public health surveillance. Furthermore, the predictive qualities of the Bayesian changepoint model allow for short term predictions of the number of disease cases, which are of particular public health interest. A sequential update using a particle filter is provided that can be used for a prospective analysis of the changepoint model, conditioning on fixed values for the other parameters, which is of particular advantage for public health surveillance. A suitable multivariate extension is provided that is able to explain the interactions between units, e.g. age groups or spatial regions. An application to influenza and meningococcal disease data shows that the occasional outbreaks of meningococcal disease can largely be explained by the influence of influenza on meningococcal disease. The risk of a future meningococcal disease outbreak caused by influenza can be predicted. The comparison of the different models, including a model based on Gaussian Markov random fields, shows that the inclusion of the epidemic component as well as a time varying epidemic parameter improves the fit and the predictive qualities of the model.
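
    A hedged sketch of the endemic/epidemic decomposition described above (a simplified Poisson branching recursion with a seasonal endemic rate; parameter values are illustrative, not fitted):

    ```python
    # Sketch of an endemic/epidemic count model: the mean splits into a
    # seasonal endemic rate nu_t and an epidemic part lambda * y_{t-1}
    # that feeds on past counts. All parameter values are illustrative.
    import numpy as np

    rng = np.random.default_rng(7)
    T, lam = 156, 0.4                        # 3 years of weekly data
    alpha, beta = np.log(5), 0.8             # endemic level, seasonality

    y = np.zeros(T, dtype=int)
    for t in range(1, T):
        nu_t = np.exp(alpha + beta * np.sin(2 * np.pi * t / 52))  # endemic
        mu_t = nu_t + lam * y[t - 1]                              # epidemic
        y[t] = rng.poisson(mu_t)

    print(y.max(), round(y.mean(), 1))
    ```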

    Similarity search and data mining techniques for advanced database systems
    Modern automated methods for the measurement, collection, and analysis of data in industry and science are providing more and more data with drastically increasing structural complexity. On the one hand, this growing complexity is justified by the need for a richer and more precise description of real-world objects; on the other hand, it is justified by the rapid progress in measurement and analysis techniques that allow users a versatile exploration of objects. In order to manage the huge volume of such complex data, advanced database systems are employed. In contrast to conventional database systems that support exact match queries, the users of these advanced database systems focus on applying similarity search and data mining techniques. Based on an analysis of typical advanced database systems, such as biometrical, biological, multimedia, moving-object, and CAD-object database systems, the following three challenging characteristics of complexity are detected: uncertainty (probabilistic feature vectors), multiple instances (a set of homogeneous feature vectors), and multiple representations (a set of heterogeneous feature vectors). Therefore, the goal of this thesis is to develop similarity search and data mining techniques that are capable of handling uncertain, multi-instance, and multi-represented objects.
    The first part of this thesis deals with similarity search techniques. Object identification is a similarity search technique that is typically used for the recognition of objects from image, video, or audio data. Thus, we develop a novel probabilistic model for object identification. Based on it, two novel types of identification queries are defined. In order to process the novel query types efficiently, we introduce an index structure called the Gauss-tree. In addition, we specify further probabilistic models and query types for uncertain multi-instance objects and uncertain spatial objects. Based on the index structure, we develop algorithms for an efficient processing of these query types. The practical benefits of using probabilistic feature vectors are demonstrated on a real-world application for video similarity search. Furthermore, a similarity search technique is presented that is based on aggregated multi-instance objects and that is suitable for video similarity search. This technique takes multiple representations into account in order to achieve better effectiveness.
    The second part of this thesis deals with two major data mining techniques: clustering and classification. Since privacy preservation is a very important demand of distributed advanced applications, we propose using uncertainty for data obfuscation in order to provide privacy preservation during clustering. Furthermore, a model-based and a density-based clustering method for multi-instance objects are developed. Afterwards, original extensions and enhancements of the density-based clustering algorithms DBSCAN and OPTICS for handling multi-represented objects are introduced. Since several advanced database systems, like biological or multimedia database systems, handle predefined, very large class systems, two novel classification techniques for large class sets that benefit from using multiple representations are defined. The first classification method is based on the idea of a k-nearest-neighbor classifier. It employs a novel density-based technique to reduce training instances and exploits the entropy impurity of the local neighborhood in order to weight a given representation. The second technique addresses hierarchically organized class systems. It uses a novel hierarchical, supervised method for the reduction of large multi-instance objects, e.g. audio or video, and applies support vector machines for an efficient hierarchical classification of multi-represented objects. The user benefits of this technique are demonstrated by a prototype that performs a classification of large music collections. The effectiveness and efficiency of all proposed techniques are discussed and verified by comparison with conventional approaches in versatile experimental evaluations on real-world datasets.
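
    A hedged sketch of the probabilistic identification idea: each database object carries a Gaussian uncertainty model, and objects are ranked by the likelihood they assign to a query observation. This mimics the flavor of queries a structure like the Gauss-tree accelerates, but without any indexing; all data here is synthetic.

    ```python
    # Probabilistic identification sketch: each database object carries a
    # Gaussian uncertainty model (mean, per-dimension std); objects are
    # ranked by the density they assign to the query observation. A real
    # system would answer this with an index such as the Gauss-tree.
    import numpy as np
    from scipy.stats import norm

    def log_likelihood(query, mean, std):
        # Independent dimensions: sum of per-dimension log densities.
        return float(np.sum(norm.logpdf(query, loc=mean, scale=std)))

    rng = np.random.default_rng(3)
    means = rng.normal(size=(1000, 8))           # hypothetical feature means
    stds = rng.uniform(0.1, 0.5, size=(1000, 8))

    query = means[42] + rng.normal(0, 0.1, 8)    # noisy observation of #42
    scores = [log_likelihood(query, m, s) for m, s in zip(means, stds)]
    print(int(np.argmax(scores)))                # ideally prints 42
    ```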

    Efficient Analysis in Multimedia Databases
    The rapid progress of digital technology has led to a situation where computers have become ubiquitous tools. Now we can find them in almost every environment, be it industrial or even private. With ever-increasing performance, computers have assumed more and more vital tasks in engineering, climate and environmental research, medicine and the content industry. Previously, these tasks could only be accomplished by spending enormous amounts of time and money. By using digital sensor devices, like earth observation satellites, genome sequencers or video cameras, the amount and complexity of data with a spatial or temporal relation has grown enormously. This has led to new challenges for data analysis and requires the use of modern multimedia databases. This thesis aims at developing efficient techniques for the analysis of complex multimedia objects such as CAD data, time series and videos. It is assumed that the data is modeled by commonly used representations. For example, CAD data is represented as a set of voxels, and audio and video data is represented as multi-represented, multi-dimensional time series. The main part of this thesis focuses on finding efficient methods for collision queries of complex spatial objects. One way to speed up those queries is to employ a cost-based decompositioning, which uses interval groups to approximate a spatial object. For example, this technique can be used for the Digital Mock-Up (DMU) process, which helps engineers to ensure short product cycles. This thesis defines and discusses a new similarity measure for time series called threshold-similarity. Two time series are considered similar if they exhibit similar behavior with respect to crossing a given threshold value. Another part of the thesis is concerned with the efficient calculation of reverse k-nearest neighbor (RkNN) queries in general metric spaces using conservative and progressive approximations. The aim of such RkNN queries is to determine the impact of single objects on the whole database. Finally, the thesis deals with video retrieval and hierarchical genre classification of music using multiple representations. The practical relevance of the discussed genre classification approach is highlighted with a prototype tool that helps the user to organize large music collections. Both the efficiency and the effectiveness of the presented techniques are thoroughly analyzed. The benefits over traditional approaches are shown by evaluating the new methods on real-world test datasets.
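
    To illustrate the threshold-similarity idea (a simplified reading of the definition above, with an interval-based distance of our own choosing), the sketch below marks where each series exceeds a threshold and compares the two exceedance patterns by the size of their symmetric difference.

    ```python
    # Threshold-similarity sketch: two time series are compared by where
    # they exceed a threshold tau. We mark exceedances and use the size of
    # the symmetric difference of the marks as a (dis)similarity.
    import numpy as np

    def exceedance_mask(series, tau):
        return np.asarray(series) > tau

    def threshold_distance(s1, s2, tau):
        m1, m2 = exceedance_mask(s1, tau), exceedance_mask(s2, tau)
        return int(np.sum(m1 ^ m2))       # time points where they disagree

    t = np.linspace(0, 4 * np.pi, 400)
    a = np.sin(t)
    b = np.sin(t + 0.3)                   # slightly shifted copy: similar
    c = np.cos(2 * t)                     # different threshold behavior
    print(threshold_distance(a, b, 0.5), threshold_distance(a, c, 0.5))
    ```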

    Themenstudienarbeit
    This book pursues two main goals. The first part provides the theoretical foundation of the framework concept of the Themenstudienarbeit (thematic study work) as a learning environment in mathematics education. The second part describes a study that was carried out within a DFG-funded research project and whose subject was the evaluation of a Themenstudienarbeit in the content area of geometric proof and argumentation.

    Combining Speech User Interfaces of Different Applications
    Recent technological advances allow for building real-time, interactive multi-modal dialog systems for a wide variety of applications, ranging from information systems to communication systems interacting with back-end services. To retrieve or update information from various information systems, the user has to interact, among other man-machine interfaces, with speech dialog systems, sometimes simultaneously. This will inevitably lead to a situation where a user has to interact with multiple speech dialog systems within a single thread of activity. Exposing users to such an environment with diverse speech interfaces will result in increased cognitive load and thus poor usability. An integrated speech-enabled access layer to all available information from different applications would allow the user to access information more efficiently and easily. This dissertation proposes a novel approach to build such an integrated speech user interface to different applications by combining the existing speech user interfaces of different applications automatically or semi-automatically. By analyzing the dialog specifications of different applications, functional and semantic overlaps between the applications are recognized. The overlaps are resolved at the level of the dialog specification, so that the integrated speech user interface provides transparent access to different applications, solves the problem of task sharing, and enables information sharing among different applications.
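
    As a toy illustration of overlap detection between dialog specifications (entirely our own simplification: specifications reduced to intent-to-slot maps, overlap measured by slot-set similarity), consider:

    ```python
    # Toy overlap detection between two dialog specifications, reduced to
    # intent -> slot-set maps. Pairs with high slot overlap are candidates
    # for merging into one integrated speech interface. Names are invented.
    def jaccard(a, b):
        return len(a & b) / len(a | b)

    calendar = {"create_event": {"date", "time", "title"},
                "find_event":   {"date", "title"}}
    mail     = {"send_mail":    {"recipient", "subject", "body"},
                "search_mail":  {"date", "subject"}}

    for i1, s1 in calendar.items():
        for i2, s2 in mail.items():
            score = jaccard(s1, s2)
            if score > 0.2:    # threshold for a functional overlap
                print(f"possible overlap: {i1} ~ {i2} ({score:.2f})")
    ```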