
    ddc:000

    Explore "ddc:000" with insightful episodes like "Efficient data mining algorithms for time series and complex medical data", "Ubiquitous Navigation", "Model-based testing of automotive HMIs with consideration for product variability", "The role of personal and shared displays in scripted collaborative learning" and "Information flow analysis for mobile code in dynamic security environments" from podcasts like ""Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02", "Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02", "Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02", "Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02" and "Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02"" and more!

    Episodes (41)

    Ubiquitous Navigation
    Location-based services (LBS) have grown into a mass phenomenon in recent years thanks to the wide availability of GPS. For operating mobile devices in particular, more and more context information is being used, since both the usability and the information density of small smartphones stand in contrast to the wealth of information on the Internet. Services are therefore often no longer provided solely in response to user input (reactively), but instead offer the user relevant information fully automatically and context-dependently (proactively). Proactive service provision and location-based personalization improve the usability of such services. Unfortunately, these services can currently only be offered outside of buildings, since on the one hand no uniform and inexpensive indoor positioning system is available, and on the other hand the necessary map data do not exist. For the map data in particular, there is a lack of simple standards and tools that would allow the owner of a building to provide high-quality map data. This dissertation presents several steps necessary to enable ubiquitous and scalable indoor navigation. The topics of positioning, modeling and cartography, as well as navigation and wayfinding, are placed in a comprehensive context in order to enable viable, simple, and scalable navigation. First, several improvements to terminal-based WLAN indoor positioning systems are proposed and discussed, based partly on the use of new sensors in current smartphones and partly on the use of particularly compact environment models on mobile devices. In particular, methods are proposed and evaluated with which the additional information required for building-wide indoor navigation can be modeled from typical CAD data with little editing effort. Furthermore, methods are investigated that can extract these semantic extensions semi- or fully automatically from drawings. Starting from the goal of enabling building-wide navigation based on CAD data, one encounters the problem of ordering a set of points of interest so that the resulting route is short. This problem is related to the NP-complete traveling salesman problem. However, the geometric situation inside buildings is so complex that most currently known heuristic algorithms for the traveling salesman problem cannot easily be transferred to the indoor setting. For this problem, a heuristic algorithm is given that can solve smaller instances of the problem in linear time with acceptable error. These methods were developed and implemented in a project with Munich Airport, whose aim was to solve the open problems of an existing small-scale demonstrator implementation of a passenger information system. The navigation component of the passenger information system InfoGate is based on these algorithms and methods; it enables building-wide navigation in the terminals of Munich Airport with distributed display systems and has been in production use since June 6, 2011. The methods of this thesis could thus be evaluated in a real, complex environment.
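    To make the ordering problem mentioned above concrete, here is a greedy nearest-neighbour sketch that orders points of interest by always visiting the closest unvisited one. It uses straight-line distances and invented coordinates, and only illustrates the underlying TSP-like problem; it is not the linear-time heuristic or the building-aware routing developed in the thesis.

```python
# Greedy nearest-neighbour ordering of points of interest (toy illustration of
# the TSP-like routing problem; real indoor routing would use walking
# distances along the building graph, not straight lines).
import math

def order_pois(start, pois):
    remaining = list(pois)
    route, current = [start], start
    while remaining:
        nearest = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nearest)
        route.append(nearest)
        current = nearest
    return route

gate = (0.0, 0.0)                       # hypothetical starting position
points_of_interest = [(40.0, 10.0), (5.0, 3.0), (20.0, 25.0), (8.0, 30.0)]
print(order_pois(gate, points_of_interest))
```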

    Model-based testing of automotive HMIs with consideration for product variability
    The human-machine interfaces (HMIs) of today’s premium automotive infotainment systems are complex embedded systems which have special characteristics in comparison to GUIs of standard PC applications, in particular regarding their variability. The variability of infotainment system HMIs results from different car models, product series, markets, equipment configuration possibilities, system types and languages and necessitates enormous testing efforts. The model-based testing approach is a promising solution for reducing testing efforts and increasing test coverage. However, while model-based testing has been widely used for function tests of subsystems in practice, HMI tests have remained manual or only semi-automated and are very time-consuming and work-intensive. Also, it is very difficult to achieve systematic or high test coverage via manual tests. A large amount of research work has addressed GUI testing in recent years. In addition, variability is becoming an ever more popular topic in the domain of software product line development. However, a model-based testing approach for complex HMIs which also considers variability is still lacking. This thesis presents a model-based testing approach for infotainment system HMIs with the particular aim of resolving the variability problem. Furthermore, the thesis provides a foundation for future standards of HMI testing in practice. The proposed approach is based on a model-based HMI testing framework which includes two essential components: a test-oriented HMI specification and a test generation component. The test-oriented HMI specification has a layered structure and is suited to specifying data which is required for testing different features of the HMI. Both the dynamic behavior and the representation of the HMI are the testing focuses of this thesis. The test generation component automatically generates tests from the test-oriented HMI specification. Furthermore, the framework can be extended in order to automatically execute the generated tests. Generated tests must first be initialized, which means that they are enhanced with concrete user input data. Afterwards, initialized tests can be automatically executed with the help of a test execution tool which must be extended into the testing framework. In this thesis, it is proposed to specify and test different HMI-variants which have a large set of commonalities based on the software product line approach. This means the test-oriented HMI specification is extended in order to describe the commonalities and variabilities between HMI variants of an HMI product line. In particular, strategies are developed in order to generate tests for different HMI products. One special feature is that redundancies are avoided both for the test generation and the execution processes. This is especially important for the industrial practice due to limited test resources. Modeling and testing variability of automotive HMIs make up the main research contributions of this thesis. We hope that the results presented in this thesis will offer GUI testing research a solution for model-based testing of multi-variant HMIs and provide the automotive industry with a foundation for future HMI testing standards.
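    A minimal sketch of the model-based test generation idea: a toy HMI behaviour model as a state machine and test sequences that achieve transition coverage. The states, events, and the breadth-first strategy are illustrative assumptions, not the test-oriented HMI specification or generation strategies of the thesis.

```python
# Generate test sequences achieving transition coverage on a toy HMI model.
from collections import deque

# Hypothetical HMI behaviour model: state -> {event: next_state}
hmi_model = {
    "MainMenu":         {"press_nav": "Navigation", "press_media": "Media"},
    "Navigation":       {"press_back": "MainMenu", "enter_dest": "DestinationInput"},
    "Media":            {"press_back": "MainMenu"},
    "DestinationInput": {"press_back": "Navigation"},
}

def shortest_event_path(model, start, goal):
    """Breadth-first search for a shortest event sequence from start to goal."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for event, nxt in model[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [event]))
    return None

def transition_coverage_tests(model, start="MainMenu"):
    """One event sequence per transition, each reaching it on a shortest path."""
    tests = []
    for src, transitions in model.items():
        for event in transitions:
            path = shortest_event_path(model, start, src)
            if path is not None:
                tests.append(path + [event])
    return tests

for test in transition_coverage_tests(hmi_model):
    print(test)
```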

    The role of personal and shared displays in scripted collaborative learning
    Over the last decades collaborative learning has gained immensely in importance and popularity due to its high potential. Unfortunately, learners rarely engage in effective learning activities unless they are provided with instructional support. In order to maximize learning outcomes it is therefore advisable to structure collaborative learning sessions. One way of doing this is using collaboration scripts, which define a sequence of activities to be carried out by the learners. The field of computer-supported collaborative learning (CSCL) produced a variety of collaboration scripts that proved to have positive effects on learning outcomes. These scripts provide detailed descriptions of successful learning scenarios and are therefore used as foundation for this thesis. In many cases computers are used to support collaborative learning. Traditional personal computers are often chosen for this purpose. However, during the last decades new technologies have emerged, which seem to be better suited for co-located collaboration than personal computers. Large interactive displays, for example, allow a number of people to work simultaneously on the same surface while being highly aware of the co-learners' actions. There are also multi-display environments that provide several workspaces, some of which may be shared, others may be personal. However, there is a lack of knowledge regarding the influence of different display types on group processes. For instance, it remains unclear in which cases shareable user interfaces should replace traditional single-user devices and when both personal and shared workspaces should be provided. This dissertation therefore explores the role of personal and shared workspaces in various situations in the area of collaborative learning. The research questions include the choice of technological devices, the seating arrangement as well as how user interfaces can be designed to guide learners. To investigate these questions a two-fold approach was chosen. First, a framework was developed, which supports the implementation of scripted collaborative learning applications. Second, different prototypes were implemented to explore the research questions. Each prototype is based on at least one collaboration script. The result is a set of studies, which contribute to answering the above-mentioned research questions. With regard to the choice of display environment the studies showed several reasons for integrating personal devices such as laptops. Pure tabletop applications with around-the-table seating arrangements whose benefits for collaboration are widely discussed in the relevant literature revealed severe drawbacks for text-based learning activities. The combination of laptops and an interactive wall display, on the other hand, turned out to be a suitable display environment for collaborative learning in several cases. In addition, the thesis presents several ways of designing the user interface in a way that guides learners through collaboration scripts.

    Information flow analysis for mobile code in dynamic security environments
    With the growing amount of data handled by Internet-enabled mobile devices, the task of preventing software from leaking confidential information is becoming increasingly important. At the same time, mobile applications are typically executed on different devices whose users have varying requirements for the privacy of their data. Users should be able to define their personal information security settings, and they should get a reliable assurance that the installed software respects these settings. Language-based information flow security focuses on the analysis of programs to determine information flows among accessed data resources of different security levels, and to verify and formally certify that these flows follow a given policy. In the mobile code scenario, however, both the dynamic aspect of the security environment and the fact that mobile software is distributed as bytecode pose a challenge for existing static analysis approaches. This thesis presents a language-based mechanism to certify information flow security in the presence of dynamic environments. An object-oriented high-level language as well as a bytecode language are equipped with facilities to inspect user-defined information flow security settings at runtime. This way, the software developer can create privacy-aware programs that can adapt their behaviour to arbitrary security environments, a property that is formalized as "universal noninterference". This property is statically verified by an information flow type system that uses restrictive forms of dependent types to judge abstractly on the concrete security policy that is effective at runtime. To verify compiled bytecode programs, a low-level version of the type system is presented that works on an intermediate code representation in which the original program structure is partially restored. Rigorous soundness proofs and a type-preserving compilation enable the generation of certified bytecode programs in the style of proof-carrying code. To show the practical feasibility of the approach, the system is implemented and demonstrated on a concrete application scenario, where personal data are sent from a mobile device to a server on the Internet.
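    A toy sketch of the runtime-inspectable security settings idea: values carry user-defined security levels, and a release to a sink is allowed only if the sink's level is at least as high as the value's level. The level names and the simple lattice are invented for illustration; the thesis's type system and universal noninterference guarantee are of course far more involved.

```python
# Labelled values and a simple "no flow from high to low" runtime check.
LEVELS = {"public": 0, "personal": 1, "secret": 2}   # user-defined security lattice (toy)

class Labelled:
    def __init__(self, value, level):
        self.value, self.level = value, level

def send_to_sink(data: Labelled, sink_level: str):
    # A flow is permitted only upwards (or sideways) in the lattice.
    if LEVELS[data.level] > LEVELS[sink_level]:
        raise PermissionError(f"flow from {data.level} to {sink_level} sink denied")
    print(f"sent {data.value!r} to {sink_level} sink")

location = Labelled("48.15N 11.58E", "personal")
send_to_sink(location, "secret")          # allowed: personal -> secret
try:
    send_to_sink(location, "public")      # denied: personal data would leak
except PermissionError as err:
    print(err)
```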

    Effects of computer-supported collaboration script and incomplete concept maps on web design skills in an online design-based learning environment
    Web design skills are an important component of media literacy. The aim of our study was to promote university students’ web design skills through online design-based learning (DBL). Combined in a 2x2 factorial design, two types of scaffolding, namely collaboration scripts and incomplete concept maps, were implemented in an online DBL environment to support the students in their effort to design, build, modify, and publish web sites; their effects were assessed on both process and outcome measures. The results showed that both treatments had positive effects on collaborative (content-related discourse quality, collaboration skills, and quality of published web sites) and individual (domain-specific knowledge and skills related to the design and building of websites) learning outcomes. There was synergism between the two scaffolds in that the combination of the collaboration script and incomplete concept maps produced the most positive results. To be effective, online DBL thus needs to be enhanced by appropriate scaffolds, and both collaboration scripts and incomplete concept maps are effective examples.

    Analysis of methods for extraction of programs from non-constructive proofs
    The present thesis compares two computational interpretations of non-constructive proofs: refined A-translation and Gödel's functional "Dialectica" interpretation. The behaviour of the extraction methods is evaluated in the light of several case studies, where the resulting programs are analysed and compared. It is argued that the two interpretations correspond to specific backtracking implementations and that programs obtained via the refined A-translation tend to be simpler, faster and more readable than programs obtained via Gödel's interpretation. Three layers of optimisation are suggested in order to produce faster and more readable programs. First, it is shown that syntactic repetition of subterms can be reduced by using let-constructions instead of meta substitutions, thus obtaining a near-linear size bound on extracted terms. Second, syntactically computational parts of the proof can be declared irrelevant, and it is shown that this can be used to remove redundant parameters, possibly improving the efficiency of the program. Finally, a special case of induction is identified, for which a more efficient recursive extracted term can be defined. It is shown that the outcome of case distinctions can be memoised, which can result in an exponential improvement of the average time complexity of the extracted program.
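    As a generic illustration of the memoisation point above (caching the outcome of a repeated case distinction can turn exponential recomputation into linear work), the following sketch uses a standard cache on a stand-in recursive function; it is not the extracted-term construction itself.

```python
# Caching the outcome of a case distinction that is evaluated repeatedly in a
# naive recursion; with the cache, every case is decided only once.
from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)
def case_outcome(n: int) -> int:
    """Stand-in for a case distinction whose result is reused many times."""
    global calls
    calls += 1
    if n < 2:
        return n
    return case_outcome(n - 1) + case_outcome(n - 2)

# 31 evaluations with the cache, instead of roughly 2.7 million without it.
print(case_outcome(30), "computed with", calls, "case evaluations")
```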

    Activity of microRNAs and transcription factors in Gene Regulatory Networks
    In biological research, diverse high-throughput techniques enable the investigation of whole systems at the molecular level. The development of new methods and algorithms is necessary to analyze and interpret measurements of gene and protein expression and of interactions between genes and proteins. One of the challenges is the integrated analysis of gene expression and the associated regulation mechanisms. The two most important types of regulators, transcription factors (TFs) and microRNAs (miRNAs), often cooperate in complex networks at the transcriptional and post-transcriptional level and thus enable a combinatorial and highly complex regulation of cellular processes. For instance, TFs activate and inhibit the expression of other genes including other TFs, whereas miRNAs can post-transcriptionally induce the degradation of transcribed RNA and impair the translation of mRNA into proteins. The identification of gene regulatory networks (GRNs) is mandatory in order to understand the underlying control mechanisms. The expression of regulators is itself regulated, i.e. regulators are activated or inhibited under varying conditions and perturbations. Thus, measurements of gene expression following targeted perturbations (knockouts or overexpressions) of these regulators are of particular importance. The prediction of the activity states of the regulators and the prediction of the target genes are first important steps towards the construction of GRNs. This thesis deals with these first bioinformatics steps to construct GRNs. Targets of TFs and miRNAs are determined as comprehensively and accurately as possible. The activity state of regulators is predicted for specific high-throughput data and specific contexts using appropriate statistical approaches. Moreover, (parts of) GRNs are inferred, which lead to explanations of given measurements. The thesis describes new approaches for these tasks together with accompanying evaluations and validations. This immediately defines the three main goals of the current thesis:
    1. The development of a comprehensive database of regulator-target relations. Regulators and targets are retrieved from public repositories, extracted from the literature via text mining, and collected into the miRSel database. In addition, relations can be predicted using various published methods. In order to determine the activity states of regulators (see 2.) and to infer GRNs (see 3.), comprehensive and accurate regulator-target relations are required. It could be shown that text mining enables the reliable extraction of miRNA, gene, and protein names as well as their relations from scientific free texts. Overall, the miRSel database contains about three times more relations for the model organisms human, mouse, and rat than state-of-the-art databases (e.g. TarBase, one of the currently most used resources for miRNA-target relations).
    2. The prediction of activity states of regulators based on improved target sets. In order to investigate mechanisms of gene regulation, the experimental contexts have to be determined in which the respective regulators become active. A regulator is predicted as active based on appropriate statistical tests applied to the expression values of its set of target genes. For this task various gene set enrichment (GSE) methods have been proposed. Unfortunately, before an actual experiment it is unknown which genes are affected. This missing standard of truth has so far prevented the systematic assessment and evaluation of GSE tests. In contrast, the trigger of gene expression changes is of course known for experiments in which a particular regulator has been directly perturbed (i.e. by knockout, transfection, or overexpression). Based on such datasets, we have systematically evaluated 12 current GSE tests. In our analysis ANOVA and the Wilcoxon test performed best.
    3. The prediction of regulation cascades. Using gene expression measurements and given regulator-target relations (e.g. from the miRSel database), GRNs are derived. GSE tests are applied to determine TFs and miRNAs that change their activity as a cellular response to an overexpressed miRNA. Gene regulatory networks can be constructed iteratively. Our models show how miRNAs trigger gene expression changes: either directly or indirectly via cascades of miRNA-TF, miRNA-kinase-TF as well as TF-TF relations. In this thesis we focus on measurements which have been obtained after overexpression of miRNAs. Surprisingly, a number of cancer-relevant miRNAs influence a common core of TFs which are involved in processes such as proliferation and apoptosis.
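    As a rough illustration of the kind of gene set enrichment test discussed above (a minimal sketch, not the thesis pipeline; gene identifiers and expression values are invented), a Wilcoxon rank-sum test can compare the expression changes of a regulator's predicted targets against all other genes:

```python
# Minimal gene set enrichment sketch: is the set of predicted targets of a
# regulator shifted in expression relative to the remaining genes?
from scipy.stats import mannwhitneyu

log_fold_changes = {
    "GENE_A": -1.8, "GENE_B": -0.9, "GENE_C": 0.1, "GENE_D": -1.2,
    "GENE_E": 0.4, "GENE_F": 0.0, "GENE_G": -0.7, "GENE_H": 0.3,
}
predicted_targets = {"GENE_A", "GENE_B", "GENE_D", "GENE_G"}  # e.g. from a miRSel-like resource

target_values = [v for g, v in log_fold_changes.items() if g in predicted_targets]
background = [v for g, v in log_fold_changes.items() if g not in predicted_targets]

# Two-sided Wilcoxon rank-sum (Mann-Whitney U) test: a small p-value suggests
# that the regulator's targets respond differently, i.e. the regulator is active.
stat, p_value = mannwhitneyu(target_values, background, alternative="two-sided")
print(f"U = {stat}, p = {p_value:.3f}")
```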

    Wiki im Wissensmanagement: Determinanten der Akzeptanz eines Web 2.0 Projektes innerhalb eines internationalen Zulieferers der Automobilindustrie
    Wikis have been used increasingly as a tool for collaborative authoring and knowledge management (KM) in recent years. One possible application is the collective documentation of technical knowledge and experience within a corporation or its departments. However, just like with other new technologies, the success of wikis depends on the intended users’ acceptance, which cannot always be presupposed. Therefore, the present work focused on the acceptance of introducing wiki technology in an existing corporate knowledge management (KM) system at an international automotive supplier enterprise. Drawing on theories of technology acceptance (e.g. Venkatesh & Davis, 2000), KM research on virtual communities of practice, and motivation theory, a theoretical framework model was developed. It distinguishes individual and organizational factors and wiki features as potential influences on successful use of wikis in corporations. The model was used as framework for two studies investigating users’ acceptance of a newly introduced wiki on corporate level as well as wikis on departmental level. A third study analyzed the wiki’s contribution to overall corporate KM activities. The wiki had been used to build a corporate lexicon, available for more than 120,000 employees worldwide. Methodologically, the studies incorporated surveys among wiki users and log file analyses. The main findings emphasize former research results about motivation and perceived ease of use as key influences on acceptance of new technology. However, further evidence identified users’ expectancies as an additional important factor. Concerning the wiki’s contribution to KM, hopes of using it as a tool to foster generation of new knowledge were dampened, as it was used mainly for the representation of existing knowledge. Overall, the study was able to identify a number of factors for the successful introduction of wikis as collaborative authoring tools in a corporate setting.

    Finding correlations and independences in omics data
    Biological studies across all omics fields generate vast amounts of data. To understand these complex data, biologically motivated data mining techniques are indispensable. Evaluation of the high-throughput measurements usually relies on the identification of underlying signals as well as shared or outstanding characteristics. To this end, methods have been developed to recover the source signals of a given dataset, to reveal objects which are more similar to each other than to other objects, and to detect observations which stand in contrast to the background dataset. Each biological problem was addressed individually with solutions from computer science tailored to its needs. The study of protein-protein interactions (the interactome) focuses on the identification of clusters, i.e. sub-graphs of graphs: a parameter-free graph clustering algorithm based on the concept of graph compression was developed in order to find sets of highly interlinked proteins sharing similar characteristics. The study of lipids (the lipidome) calls for co-regulation analyses: to reveal lipids that respond similarly to biological factors, partial correlations were generated with differential Gaussian Graphical Models while accounting for solely disease-specific correlations. Studies on the single-cell level (cytomics) aim to understand cellular systems, often with the help of microscopy techniques: a novel noise-robust source separation technique allowed independent components describing protein behavior to be reliably extracted from microscopy images. The study of peptides (peptidomics) often requires the detection of outstanding observations: by assessing regularities in the dataset, an outlier detection algorithm was implemented based on the compression efficacy of independent components of the dataset. The developed algorithms had to fulfill diverse constraints in each omics field, yet all of these constraints were met with methods derived from standard correlation and dependency analyses.
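    A small sketch of the partial-correlation idea behind Gaussian Graphical Models (not the differential, disease-specific variant described above; the data here are random):

```python
# Partial correlations from the inverse covariance (precision) matrix:
# rho_ij = -P_ij / sqrt(P_ii * P_jj). Illustrative sketch on random data.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))          # 200 samples, 5 variables (e.g. lipid species)
precision = np.linalg.inv(np.cov(data, rowvar=False))

d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

# Entries near zero suggest conditional independence given all other variables,
# i.e. no edge between the two variables in the Gaussian Graphical Model.
print(np.round(partial_corr, 2))
```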

    Multimodal interaction: developing an interaction concept for a touchscreen incorporating tactile feedback
    The touchscreen, as an alternative user interface for applications that normally require mice and keyboards, has become more and more commonplace, showing up on mobile devices, on vending machines, on ATMs and in the control panels of machines in industry, where conventional input devices cannot provide intuitive, rapid and accurate user interaction with the content of the display. The exponential growth in processing power on the PC, together with advances in understanding human communication channels, has had a significant effect on the design of usable, human-factored interfaces on touchscreens, and on the number and complexity of applications available on touchscreens. Although computer-driven touchscreen interfaces provide programmable and dynamic displays, the absence of the expected tactile cues on the hard and static surfaces of conventional touchscreens is challenging interface design and touchscreen usability, in particular for distracting, low-visibility environments. Current technology allows the human tactile modality to be used in touchscreens. While the visual channel converts graphics and text unidirectionally from the computer to the end user, tactile communication features a bidirectional information flow to and from the user as the user perceives and acts on the environment and the system responds to changing contextual information. Tactile sensations such as detents and pulses provide users with cues that make selecting and controlling a more intuitive process. Tactile features can compensate for deficiencies in some of the human senses, especially in tasks which carry a heavy visual or auditory burden. In this study, an interaction concept for tactile touchscreens is developed with a view to employing the key characteristics of the human sense of touch effectively and efficiently, especially in distracting environments where vision is impaired and hearing is overloaded. As a first step toward improving the usability of touchscreens through the integration of tactile effects, different mechanical solutions for producing motion in tactile touchscreens are investigated, to provide a basis for selecting suitable vibration directions when designing tactile displays. Building on these results, design know-how regarding tactile feedback patterns is further developed to enable dynamic simulation of UI controls, in order to give users a sense of perceiving real controls on a highly natural touch interface. To study the value of adding tactile properties to touchscreens, haptically enhanced UI controls are then further investigated with the aim of mapping haptic signals to different usage scenarios to perform primary and secondary tasks with touchscreens. The findings of the study are intended for consideration and discussion as a guide to further development of tactile stimuli, haptically enhanced user interfaces and touchscreen applications.

    Software engineering perspectives on physiological computing
    Physiological computing is an interesting and promising concept for widening the communication channel between (human) users and computers, thus allowing an increase of software systems' contextual awareness and rendering software systems smarter than they are today. Using physiological inputs in pervasive computing systems allows re-balancing the information asymmetry between the human user and the computer system: while pervasive computing systems are well able to flood the user with information and sensory input (such as sounds, lights, and visual animations), users only have a very narrow input channel to computing systems, most of the time restricted to keyboards, mice, touchscreens, accelerometers and GPS receivers (e.g. through smartphone usage). Interestingly, this information asymmetry often forces the user to submit to the quirks of the computing system to achieve his goals -- for example, users may have to provide information that the software system demands through a narrow, time-consuming input mode, even though the system could sense this information implicitly from the human body. Physiological computing is a way to circumvent these limitations; however, systematic means for developing and moulding physiological computing applications into software are still unknown. This thesis proposes a methodological approach to the creation of physiological computing applications that makes use of component-based software engineering. Components help impose a clear structure on software systems in general, and can thus be used for physiological computing systems as well. As an additional bonus, using components allows physiological computing systems to leverage reconfigurations as a means to control and adapt their own behaviour. This adaptation can be used to adjust the behaviour both to the human and to the available computing environment in terms of resources and available devices - an activity that is crucial for complex physiological computing systems. With the help of components and reconfigurations, it is possible to structure the functionality of physiological computing applications in a way that makes them manageable and extensible, thus allowing a stepwise and systematic extension of a system's intelligence. Using reconfigurations entails a larger issue, however. Understanding and fully capturing the behaviour of a system under reconfiguration is challenging, as the system may change its structure in ways that are difficult to fully predict. Therefore, this thesis also introduces a means for the formal verification of reconfigurations based on assume-guarantee contracts. With the proposed assume-guarantee contract framework, it is possible to prove that a given system design (including component behaviours and reconfiguration specifications) satisfies real-time properties expressed as assume-guarantee contracts, using a variant of real-time linear temporal logic introduced in this thesis - metric interval temporal logic for reconfigurable systems. Finally, this thesis embeds both the practical approach to the realisation of physiological computing systems and the formal verification of reconfigurations into Scrum, a modern and agile software development methodology. The surrounding methodological approach is intended to provide a frame for the systematic development of physiological computing systems from first psychological findings to a working software system with both satisfactory functionality and satisfactory software quality.
By integrating practical and theoretical aspects of software engineering into a self-contained development methodology, this thesis proposes a roadmap and guidelines for the creation of new physiological computing applications.
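    A toy sketch of the assume-guarantee idea, with plain predicates standing in for the real-time contracts and component behaviours of the thesis; the component, variable names, and thresholds are invented:

```python
# Toy assume-guarantee check: a component satisfies its contract if, whenever
# the environment fulfils the assumption, the observed behaviour fulfils the
# guarantee. Traces are simplified to dictionaries of observed values.
from typing import Callable, Dict

Trace = Dict[str, float]
Predicate = Callable[[Trace], bool]

def satisfies(trace: Trace, assume: Predicate, guarantee: Predicate) -> bool:
    """Vacuously true if the assumption does not hold for this trace."""
    return (not assume(trace)) or guarantee(trace)

# Hypothetical contract for a heart-rate-driven adaptation component: assume
# the sensor delivers a plausible heart rate, guarantee a bounded reaction
# time of the reconfiguration.
assume = lambda t: 30.0 <= t["heart_rate"] <= 220.0
guarantee = lambda t: t["reaction_time_ms"] <= 250.0

trace = {"heart_rate": 95.0, "reaction_time_ms": 180.0}
print(satisfies(trace, assume, guarantee))  # True
```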

    Efficient Knowledge Extraction from Structured Data
    Knowledge extraction from structured data aims at identifying valid, novel, potentially useful, and ultimately understandable patterns in the data. The core step of this process is the application of a data mining algorithm in order to produce an enumeration of particular patterns and relationships in large databases. Clustering is one of the major data mining tasks and aims at grouping the data objects into meaningful classes (clusters) such that the similarity of objects within clusters is maximized, and the similarity of objects from different clusters is minimized. In this thesis, we advance the state-of-the-art data mining algorithms for analyzing structured data types. We describe the development of innovative solutions for hierarchical data mining. The EM-based hierarchical clustering method ITCH (Information-Theoretic Cluster Hierarchies) is designed to address four different challenges: (1) to guide the hierarchical clustering algorithm to identify only meaningful and valid clusters, (2) to represent the content of each cluster in the hierarchy by an intuitive description, e.g. a probability density function, (3) to consistently handle outliers, and (4) to avoid difficult parameter settings. ITCH is built on a hierarchical variant of the information-theoretic principle of Minimum Description Length (MDL). Interpreting the hierarchical cluster structure as a statistical model of the dataset, it can be used for effective data compression by Huffman coding. Thus, the achievable compression rate induces a natural objective function for clustering, which automatically satisfies all four above-mentioned goals. The genetic-based hierarchical clustering algorithm GACH (Genetic Algorithm for finding Cluster Hierarchies) overcomes the problem of getting stuck in a local optimum by a beneficial combination of genetic algorithms, information theory and model-based clustering. Besides hierarchical data mining, we also made contributions to more complex data structures, namely objects that consist of mixed-type attributes and skyline objects. The algorithm INTEGRATE performs integrative mining of heterogeneous data, which is one of the major challenges in the next decade, by a unified view on numerical and categorical information in clustering. Once more, supported by the MDL principle, INTEGRATE guarantees usability on real-world data. For skyline objects we developed SkyDist, a similarity measure for comparing different skyline objects, which is therefore a first step towards performing data mining on this kind of data structure. Applied in a recommender system, for example, SkyDist can be used to point the user to alternative car types exhibiting a price/mileage behavior similar to that of his original query. For mining graph-structured data, we developed different approaches that have the ability to detect patterns in static as well as in dynamic networks. We confirmed the practical feasibility of our novel approaches on large real-world case studies ranging from medical brain data to biological yeast networks. In the second part of this thesis, we focused on boosting the knowledge extraction process. We achieved this objective by an intelligent adoption of Graphics Processing Units (GPUs). GPUs have evolved from simple devices for display signal preparation into powerful coprocessors that not only support typical computer graphics tasks but can also be used for general numeric and symbolic computations. As a major advantage, GPUs provide extreme parallelism combined with a high bandwidth in memory transfer at low cost. In this thesis, we propose algorithms for computationally expensive data mining tasks like similarity search and different clustering paradigms, CUDA-DClust and CUDA-k-means, which are designed for the highly parallel environment of a GPU. We define a multi-dimensional index structure which is particularly suited to support similarity queries under the restricted programming model of a GPU. We demonstrate the superiority of our algorithms running on the GPU over their conventional counterparts on the CPU in terms of efficiency.
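    A compact CPU sketch of the k-means paradigm that CUDA-k-means parallelizes (plain NumPy on random toy data; this is not the GPU implementation itself):

```python
# Plain k-means (Lloyd's algorithm) on toy data; CUDA-k-means parallelizes the
# distance computations and assignments on the GPU.
import numpy as np

def kmeans(points: np.ndarray, k: int, iters: int = 50, seed: int = 0):
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign every point to its nearest centroid (the GPU-friendly step).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster became empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

data = np.random.default_rng(1).normal(size=(300, 2))
centroids, labels = kmeans(data, k=3)
print(centroids)
```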

    Boosting in structured additive models
    Variable selection and model choice are of major concern in many statistical applications, especially in regression models for high-dimensional data. Boosting is a convenient statistical method that combines model fitting with intrinsic model selection. We investigate the impact of base-learner specification on the performance of boosting as a model selection procedure. We show that variable selection may be biased if the base-learners have different degrees of flexibility, both for categorical covariates and for smooth effects of continuous covariates. We investigate these problems from a theoretical perspective and suggest a framework for unbiased model selection based on a general class of penalized least squares base-learners. Making all base-learners comparable in terms of their degrees of freedom strongly reduces the selection bias observed with naive boosting specifications. Furthermore, the definition of degrees of freedom that is used in the smoothing literature is questionable in the context of boosting, and an alternative definition is theoretically derived. The importance of unbiased model selection is demonstrated in simulations and in an application to forest health models. A second aspect of this thesis is the expansion of the boosting algorithm to new estimation problems: by using constrained base-learners, monotonicity-constrained effect estimates can be seamlessly incorporated in the existing boosting framework. This holds both for smooth effects and for ordinal variables. Furthermore, cyclic restrictions can be integrated in the model for smooth effects of continuous covariates. Particularly in time-series models, cyclic constraints play an important role. Monotonic and cyclic constraints of smooth effects can, in addition, be extended to smooth, bivariate function estimates. If the true effects are monotonic or cyclic, simulation studies show that constrained estimates are superior to unconstrained estimates. Three case studies (modeling the presence of the Red Kite in Bavaria, modeling activity profiles of Roe Deer, and modeling deaths caused by air pollution in São Paulo) show that both types of constraints can be integrated in the boosting framework and that they are easy to use. All described results are included in the R add-on package mboost.
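    A minimal sketch of component-wise L2-boosting with simple univariate linear base-learners on made-up data, just to show how base-learner selection performs intrinsic variable selection; it does not use the penalized base-learners, degrees-of-freedom adjustment, or constraints discussed above, nor the mboost package itself.

```python
# Component-wise L2-boosting: in every iteration fit each univariate linear
# base-learner to the current residuals and update only the best-fitting one.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)  # only 2 informative covariates

coef = np.zeros(p)
nu = 0.1                          # step length
intercept = y.mean()              # offset
residuals = y - intercept

for _ in range(500):
    # Least-squares fit of each single covariate to the residuals.
    fits = X.T @ residuals / (X ** 2).sum(axis=0)
    rss = ((residuals[:, None] - X * fits) ** 2).sum(axis=0)
    best = rss.argmin()           # base-learner selection == intrinsic variable selection
    coef[best] += nu * fits[best]
    residuals -= nu * fits[best] * X[:, best]

print(np.round(coef, 2))          # large coefficients only for the informative covariates
```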

    Synchronization Inspired Data Mining
    Advances of modern technologies produce huge amounts of data in various fields, increasing the need for efficient and effective data mining tools to uncover the information contained implicitly in the data. This thesis mainly aims to propose innovative and solid algorithms for data mining from a novel perspective: synchronization. Synchronization is a prevalent phenomenon in nature in which a group of events spontaneously comes into co-occurrence with a common rhythm through mutual interactions. The mechanism of synchronization allows the control of complex processes by simple operations based on interactions between objects. The first main part of this thesis focuses on developing innovative algorithms for data mining. Inspired by the concept of synchronization, this thesis presents Sync (Clustering by Synchronization), a novel approach to clustering. In combination with the Minimum Description Length principle (MDL), it allows discovering the intrinsic clusters without any data distribution assumptions or parameter settings. In addition, relying on the different dynamic behaviors of objects during the process towards synchronization, the algorithm SOD (Synchronization-based Outlier Detection) is proposed. Outlier objects can be naturally flagged by the definition of a Local Synchronization Factor (LSF). To cure the curse of dimensionality in clustering, a subspace clustering algorithm, ORSC, is introduced which automatically detects clusters in subspaces of the original feature space. This approach proposes a weighted local interaction model to ensure that all objects in a common cluster, which may reside in an arbitrarily oriented subspace, naturally move together. In order to reveal the underlying patterns in graphs, a graph partitioning approach, RSGC (Robust Synchronization-based Graph Clustering), is presented. The key philosophy of RSGC is to consider graph clustering as a dynamic process towards synchronization. Inherited from the powerful concept of synchronization, RSGC shows several desirable properties that do not exist in other competitive methods. For all presented algorithms, their efficiency and effectiveness are thoroughly analyzed. The benefits over traditional approaches are further demonstrated by evaluating them on synthetic as well as real-world data sets. Beyond the theoretical research on novel data mining algorithms, the second main part of the thesis focuses on brain network analysis based on Diffusion Tensor Images (DTI). A new framework for automated white matter tract clustering is first proposed to identify meaningful fiber bundles in the human brain by combining ideas from time series mining with density-based clustering. Subsequently, enhancements and variations of this approach are discussed, allowing for a more robust, efficient, or effective way to find hierarchies of fiber bundles. Based on the structural connectivity network, an automated prediction framework is proposed to analyze and understand abnormal patterns in patients with Alzheimer's Disease.
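    A toy sketch of the synchronization intuition: each point repeatedly moves toward the mean of its epsilon-neighbourhood until groups of points collapse onto common locations. The interaction rule, parameters, and data are made up, and the MDL-based model selection of Sync is omitted.

```python
# Synchronization-inspired clustering sketch: points interact with their
# epsilon-neighbours and gradually "synchronize" onto common locations, which
# then act as cluster representatives.
import numpy as np

def synchronize(points: np.ndarray, eps: float = 0.5, steps: int = 30) -> np.ndarray:
    pts = points.copy()
    for _ in range(steps):
        new_pts = pts.copy()
        for i, x in enumerate(pts):
            neighbours = pts[np.linalg.norm(pts - x, axis=1) <= eps]
            # Move partway towards the neighbourhood mean (local interaction).
            new_pts[i] = x + 0.5 * (neighbours.mean(axis=0) - x)
        pts = new_pts
    return pts

rng = np.random.default_rng(2)
data = np.vstack([rng.normal(loc=0.0, scale=0.1, size=(50, 2)),
                  rng.normal(loc=3.0, scale=0.1, size=(50, 2))])
final = synchronize(data)
# After synchronization, points of the same cluster collapse onto (almost)
# identical positions; rounding reveals the cluster representatives.
print(np.unique(np.round(final, 1), axis=0))
```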

    Similarity Search in Medical Data
    The ongoing automation in our modern information society leads to a tremendous rise in the amount as well as the complexity of collected data. In medical imaging, for example, the electronic availability of extensive data collected as part of clinical trials provides a remarkable potential to detect new relevant features in complex diseases like brain tumors. Using data mining applications for the analysis of the data raises several problems. One problem is the localization of outstanding observations, also called outliers, in a data set. In this work a technique for parameter-free outlier detection is proposed; it is based on data compression and a general data model combining the Generalized Normal Distribution (GND) with independent components, in order to cope with existing problems like parameter settings or implicit data distribution assumptions. Another problem in many modern applications, among others in medical imaging, is efficient similarity search in uncertain data. At present, adequate therapy planning for newly detected brain tumors, presumably of glial origin, requires invasive biopsy, because prognosis and treatment both vary strongly between benign, low-grade, and high-grade tumors. To date, the differentiation of tumor grades is mainly based on the expertise of neuroradiologists examining contrast-enhanced Magnetic Resonance Images (MRI). To assist neuroradiologists during the differentiation between tumors of different malignancy, we propose a novel, efficient similarity search technique for uncertain data. The feature vector of an object is thereby not exactly known but is rather defined by a Probability Density Function (PDF) like a Gaussian Mixture Model (GMM). Previous work is limited to axis-parallel Gaussian distributions; hence, correlations between different features are not considered in these similarity searches. In this work a novel, efficient similarity search technique for general GMMs without the independence assumption is presented. The actual components of a GMM are approximated in a conservative but tight way. The conservativity of the approach leads to a filter-refinement architecture, which guarantees no false dismissals, and the tightness of the approximations results in good filter selectivity. An extensive experimental evaluation of the approach demonstrates a considerable speed-up of similarity queries on general GMMs. Additionally, promising results for advancing the differentiation between brain tumors of different grades could be obtained by applying the approach to four-dimensional Magnetic Resonance Images of glioma patients.
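    A generic filter-refinement sketch that illustrates why a conservative (lower-bounding) filter guarantees no false dismissals. The toy lower bound here is simply the distance on the first few dimensions, not the GMM approximation developed in the thesis, and the data are random.

```python
# Filter-refinement range query: a cheap lower bound prunes candidates; only
# objects passing the filter are refined with the exact (expensive) distance.
import numpy as np

rng = np.random.default_rng(3)
database = rng.normal(size=(1000, 32))
query = rng.normal(size=32)
radius = 6.0

def exact_dist(a, b):
    return np.linalg.norm(a - b)

def lower_bound(a, b, dims=4):
    # The distance restricted to the first dims coordinates never exceeds the
    # full Euclidean distance, so it is a valid conservative filter.
    return np.linalg.norm(a[:dims] - b[:dims])

candidates = [x for x in database if lower_bound(query, x) <= radius]   # filter step
results = [x for x in candidates if exact_dist(query, x) <= radius]     # refinement step

print(f"{len(candidates)} candidates refined, {len(results)} true results")
```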

    Constructive Reasoning for Semantic Wikis
    One of the main design goals of social software, such as wikis, is to support and facilitate interaction and collaboration. This dissertation explores challenges that arise from extending social software with advanced facilities such as reasoning and semantic annotations, and presents tools in the form of a conceptual model, structured tags, a rule language, and a set of novel forward chaining and reason maintenance methods for processing such rules that help to overcome the challenges. Wikis and semantic wikis were usually developed in an ad-hoc manner, without much thought about the underlying concepts. A conceptual model suitable for a semantic wiki that takes advanced features such as annotations and reasoning into account is proposed. Moreover, so-called structured tags are proposed as a semi-formal knowledge representation step between informal and formal annotations. The focus of rule languages for the Semantic Web has been predominantly on expert users and on the interplay of rule languages and ontologies. KWRL, the KiWi Rule Language, is proposed as a rule language for a semantic wiki that is easily understandable for users, as it is aware of the conceptual model of a wiki and is inconsistency-tolerant, and that can be efficiently evaluated as it builds upon Datalog concepts. The requirement for fast response times of interactive software translates in our work to bottom-up evaluation (materialization) of rules (views) ahead of time – that is, when rules or data change, not when they are queried. Materialized views have to be updated when data or rules change. While incremental view maintenance was intensively studied in the past and literature on the subject is abundant, the existing methods have surprisingly many disadvantages – they do not provide all information desirable for explanation of derived information, they require evaluation of possibly substantially larger Datalog programs with negation, they recompute the whole extension of a predicate even if only a small part of it is affected by a change, and they require adaptation for handling general rule changes. A particular contribution of this dissertation consists in a set of forward chaining and reason maintenance methods with a simple declarative description that are efficient and that derive and maintain the information necessary for reason maintenance and explanation. The reasoning methods and most of the reason maintenance methods are described in terms of a set of extended immediate consequence operators, the properties of which are proven in the classical logic programming framework. In contrast to existing methods, the reason maintenance methods in this dissertation work by evaluating the original Datalog program – they do not introduce negation if it is not present in the input program – and only the affected part of a predicate’s extension is recomputed. Moreover, our methods directly handle changes in both data and rules; a rule change does not need to be handled as a special case. A framework of support graphs, a data structure inspired by justification graphs of classical reason maintenance, is proposed. Support graphs enable a unified description and a formal comparison of the various reasoning and reason maintenance methods and define a notion of a derivation such that the number of derivations of an atom is always finite even in the recursive Datalog case. A practical approach to implementing reasoning, reason maintenance, and explanation in the KiWi semantic platform is also investigated.
It is shown how an implementation may benefit from using a graph database instead of or along with a relational database.
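    A tiny bottom-up (forward chaining) Datalog evaluation sketch: an immediate consequence operator is applied until a fixpoint is reached. This is only the naive version, to illustrate materialization; the extended operators, reason maintenance, and support graphs of the dissertation are not modelled here.

```python
# Naive bottom-up evaluation of two Datalog rules for reachability:
#   reach(X, Y) :- edge(X, Y).
#   reach(X, Z) :- reach(X, Y), edge(Y, Z).
edges = {("a", "b"), ("b", "c"), ("c", "d")}

def immediate_consequence(reach: set) -> set:
    derived = set(edges)                                                     # first rule
    derived |= {(x, z) for (x, y) in reach for (y2, z) in edges if y == y2}  # second rule
    return derived

reach = set()
while True:
    new = immediate_consequence(reach)
    if new == reach:          # fixpoint: nothing new can be derived
        break
    reach = new

print(sorted(reach))
```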

    Types with potential: polynomial resource bounds via automatic amortized analysis
    A primary feature of a computer program is its quantitative performance characteristics: the amount of resources such as time, memory, and power the program needs to perform its task. Concrete resource bounds for specific hardware have many important applications in software development but their manual determination is tedious and error-prone. This dissertation studies the problem of automatically determining concrete worst-case bounds on the quantitative resource consumption of functional programs. Traditionally, automatic resource analyses are based on recurrence relations. The difficulty of both extracting and solving recurrence relations has led to the development of type-based resource analyses that are compositional, modular, and formally verifiable. However, existing automatic analyses based on amortization or sized types can only compute bounds that are linear in the sizes of the arguments of a function. This work presents a novel type system that derives polynomial bounds from first-order functional programs. As pioneered by Hofmann and Jost for linear bounds, it relies on the potential method of amortized analysis. Types are annotated with multivariate resource polynomials, a rich class of functions that generalize non-negative linear combinations of binomial coefficients. The main theorem states that type derivations establish resource bounds that are sound with respect to the resource-consumption of programs which is formalized by a big-step operational semantics. Simple local type rules allow for an efficient inference algorithm for the type annotations which relies on linear constraint solving only. This gives rise to an analysis system that is fully automatic if a maximal degree of the bounding polynomials is given. The analysis is generic in the resource of interest and can derive bounds on time and space usage. The bounds are naturally closed under composition and eventually summarized in closed, easily understood formulas. The practicability of this automatic amortized analysis is verified with a publicly available implementation and a reproducible experimental evaluation. The experiments with a wide range of examples from functional programming show that the inference of the bounds only takes a couple of seconds in most cases. The derived heap-space and evaluation-step bounds are compared with the measured worst-case behavior of the programs. Most bounds are asymptotically tight, and the constant factors are close or even identical to the optimal ones. For the first time we are able to automatically and precisely analyze the resource consumption of involved programs such as quick sort for lists of lists, longest common subsequence via dynamic programming, and multiplication of a list of matrices with different, fitting dimensions.
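    To make the flavour of such polynomial bounds concrete (a hand-made illustration, not output of the analysis, and insertion sort is just a stand-in example): a quadratic bound like the worst-case comparison count of insertion sort can be written as the binomial coefficient C(n, 2) = n(n-1)/2 and checked against measured behaviour.

```python
# Compare the measured worst-case comparison count of insertion sort with the
# binomial-coefficient bound C(n, 2); an illustration of the kind of resource
# polynomial the type system infers, not the analysis itself.
from math import comb

def insertion_sort_comparisons(xs: list) -> int:
    xs = list(xs)
    comparisons = 0
    for i in range(1, len(xs)):
        j = i
        while j > 0:
            comparisons += 1
            if xs[j - 1] > xs[j]:
                xs[j - 1], xs[j] = xs[j], xs[j - 1]
                j -= 1
            else:
                break
    return comparisons

for n in (4, 8, 16):
    worst_case_input = list(range(n, 0, -1))        # reverse-sorted list
    measured = insertion_sort_comparisons(worst_case_input)
    print(n, measured, comb(n, 2))                  # measured equals C(n, 2) here
```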

    Refinement of Classical Proofs for Program Extraction
    The A-translation enables us to unravel the computational information in classical proofs by first transforming them into constructive ones, however at the cost of introducing redundancies in the extracted code. This is due to the fact that all negations inserted during translation are replaced by the computationally relevant form of the goal. In this thesis we are concerned with eliminating such redundancies in order to obtain better extracted programs. For this, we propose two methods: a controlled and minimal insertion of negations, such that a refinement of the A-translation can be used, and an algorithmic decoration of the proofs, in order to mark the computationally irrelevant components. By restricting the logic to be minimal, the double negation translation is no longer necessary. On this fragment of minimal logic we apply the refined A-translation, as proposed in (Berger et al., 2002). This method identifies further selected classes of formulas for which the negations do not need to be substituted by computationally relevant formulas. However, the refinement imposes restrictions which considerably narrow the applicability domain of the A-translation. We address this issue by proposing a controlled insertion of double negations, with the benefit that some intuitionistically valid \Pi^0_2-formulas become provable in minimal logic and that certain formulas are transformed to match the requirements of the refined A-translation. We present the outcome of applying the refined A-translation to a series of examples. Their purpose is twofold. On the one hand, they serve as case studies for the role played by negations, by shedding light on the restrictions imposed by the translation method. On the other hand, the extracted programs are characterized by a specific behaviour: they adhere to the continuation-passing style and the recursion is in general in tail form. The second improvement concerns the detection of computationally irrelevant subformulas, such that no terms are extracted from them. In order to achieve this, we assign decorations to the implication and the universal quantifier. The algorithm that we propose is shown to be optimal, correct and terminating, and is applied to the examples of factorial and list reversal.
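    As a flavour of the program shape described above (continuation-passing style with tail-form recursion, as in the list reversal example), here is a hand-written CPS list reversal; it only illustrates the style and is not a term actually produced by the extraction.

```python
# List reversal in continuation-passing style with the recursive call in tail
# position, the shape that programs extracted via the refined A-translation
# tend to have.
from typing import Callable, List, TypeVar

T = TypeVar("T")

def reverse_cps(xs: List[T], k: Callable[[List[T]], List[T]]) -> List[T]:
    if not xs:
        return k([])                                   # pass the result to the continuation
    head, tail = xs[0], xs[1:]
    # Extend the continuation: once the tail is reversed, append the head.
    return reverse_cps(tail, lambda rev_tail: k(rev_tail + [head]))

print(reverse_cps([1, 2, 3, 4], lambda result: result))   # [4, 3, 2, 1]
```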

    Placeable and localizable elements in translation memory systems
    Translation memory systems (TM systems) are software packages used in computer-assisted translation (CAT) to support human translators. As an example of successful natural language processing (NLP), these applications have been discussed in monographic works, conferences, articles in specialized journals, newsletters, forums, mailing lists, etc. This thesis focuses on how TM systems deal with placeable and localizable elements, as defined in section 2.1.1.1. Although these elements are mentioned in the cited sources, there is no systematic work discussing them. This thesis aims to fill this gap and to suggest improvements that could be implemented in order to tackle current shortcomings. The thesis is divided into the following chapters. Chapter 1 is a general introduction to the field of TM technology. Chapter 2 presents the conducted research in detail. Chapters 3 to 12 each discuss a specific category of placeable and localizable elements. Finally, chapter 13 provides a conclusion summarizing the major findings of this research project.
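    A small sketch of what recognizing placeable elements in a source segment might look like. The element categories used here (numbers, inline tags, URLs) are common examples and an assumption for illustration, not the taxonomy defined in the thesis.

```python
# Recognize a few typical placeable elements (numbers, inline tags, URLs) in a
# source segment so a TM system could transfer them into the target unchanged.
import re

PLACEABLE_PATTERNS = {
    "tag": re.compile(r"</?\w+[^>]*>"),
    "url": re.compile(r"https?://\S+"),
    "number": re.compile(r"\d+(?:[.,]\d+)?"),
}

def find_placeables(segment: str) -> list:
    found = []
    for kind, pattern in PLACEABLE_PATTERNS.items():
        found += [(kind, match.group()) for match in pattern.finditer(segment)]
    return found

segment = "Press <b>Start</b>, wait 2.5 seconds and open https://example.com."
print(find_placeables(segment))
```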