Dissertations / Theses on the topic 'Data Analysis Workflow'

Consult the top 34 dissertations / theses for your research on the topic 'Data Analysis Workflow.'

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Rodrigues, Roberto Wagner da Silva. "Deviation analysis of inter-organisational workflow systems." Thesis, Imperial College London, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.271151.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Marsolo, Keith Allen. "A workflow for the modeling and analysis of biomedical data." Columbus, Ohio : Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1180309265.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Cutler, Darren W., and Tyler J. Rasmussen. "Usability Testing and Workflow Analysis of the TRADOC Data Visualization Tool." Thesis, Monterey, California. Naval Postgraduate School, 2012. http://hdl.handle.net/10945/17350.

Full text
Abstract:
Approved for public release; distribution is unlimited
The volume of data available to military decision makers is vast. Leaders need tools to sort, analyze, and present information in an effective manner. Software complexity is also increasing, with user interfaces becoming more intricate and interactive. The Data Visualization Tool (DaViTo) is an effort by TRAC Monterey to produce a tool for use by personnel with little statistical background to process and display this data. To meet the program goals and make analytical capabilities more widely available, the user interface and data representation techniques need refinement. This usability test is a task-oriented study using eye-tracking, data representation techniques, and surveys to generate recommendations for software improvement. Twenty-four subjects participated in three sessions using DaViTo over a three-week period. The first two sessions consisted of training followed by basic reinforcement tasks, evaluation of graphical methods, and a brief survey. The final session was a task-oriented session followed by graphical representations evaluation and an extensive survey. Results from the three sessions were analyzed and 37 recommendations generated for the improvement of DaViTo. Improving software latency, providing more graphing options and tools, and inclusion of an effective training product are examples of important recommendations that would greatly improve usability.
APA, Harvard, Vancouver, ISO, and other styles
4

Nagavaram, Ashish. "Cloud Based Dynamic Workflow with QOS For Mass Spectrometry Data Analysis." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1322681210.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Kwak, Daniel (Daniel Joowon). "Investigation of intrinsic rotation dependencies in Alcator C-Mod using a new data analysis workflow." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/103705.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Nuclear Science and Engineering, 2015.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 190-193).
Toroidal rotation, important for suppressing various turbulent modes, mitigating MHD instabilities, and preventing locked modes that cause disruptions, may not be sufficiently generated by external devices in larger devices such as ITER. One possible solution is intrinsic rotation, self-generated flow without external momentum input, which has been observed in multiple tokamaks. More specifically, rotation reversals, a sudden change in direction of intrinsic rotation without significant change in global plasma parameters, have also been observed and are not yet fully understood. Studying this phenomenon in ohmic L-mode plasmas presents a rich opportunity to gain better understanding of intrinsic rotation and of momentum transport as a whole. The literature presents many different hypotheses, and this thesis explores three in particular. The first two hypotheses each posit a unique parameter as the primary dependency of reversals: the dominant turbulent mode, or the fastest growing turbulent mode (TEM/ITG), and the local density and temperature profile gradients, especially the electron density gradient, respectively. Other studies state that neoclassical effects cause the reversals, and one study in particular presents a 1-D analytical model. Utilizing a new data analysis workflow built around GYRO, a gyrokinetic-Maxwell solver, hundreds of intrinsic rotation shots at Alcator C-Mod can be processed and analyzed without constant user management; this workflow is used to test the three hypotheses. By comparing the rotation gradient u', a proxy variable indicative of the core toroidal intrinsic rotation velocity, to the parameters identified by the hypotheses, little correlation has been found between u' and the dominant turbulence regime and the ion temperature, electron temperature, and electron density profile gradients. The plasma remains ITG-dominated based on linear stability analysis regardless of rotation direction, and the local profile gradients are not statistically significant in predicting u'. Additionally, the experimental results in C-Mod and ASDEX Upgrade have shown strong disagreement with the 1-D neoclassical model. Strong correlation has been found between u' and the effective collisionality Veff. These findings are inconsistent with previous experimental studies and suggest that further work is required to identify other key dependencies and/or uncover the complex physics and mechanisms at play.
by Daniel (Joowon) Kwak
S.M.
APA, Harvard, Vancouver, ISO, and other styles
6

Ba, Mouhamadou. "Composition guidée de services : application aux workflows d’analyse de données en bio-informatique." Thesis, Rennes, INSA, 2015. http://www.theses.fr/2015ISAR0024/document.

Full text
Abstract:
Dans les domaines scientifiques, particulièrement en bioinformatique, des services élémentaires sont composés sous forme de workflows pour effectuer des expériences d’analyse de données complexes. À cause de l’hétérogénéité des ressources, la composition de services est une tâche difficile. Les utilisateurs, en composant des workflows, manquent d’assistance pour retrouver et interconnecter les services compatibles. Les solutions existantes utilisent des services spéciaux définis de manière manuelle pour gérer les conversions de formats de données entre les entrées et sorties des services dans les workflows. Cela est pénible pour un utilisateur final. Gérer les incompatibilités des services avec des convertisseurs manuels prend du temps et est lourd. Il existe des solutions automatisées pour faciliter la composition de workflows mais elles sont généralement limitées dans le guidage et l’adaptation des données entre services. La première contribution de cette thèse propose de détecter systématiquement la convertibilité des sorties vers les entrées des services. La détection de convertibilité repose sur un système de règles basé sur une abstraction des types d’entrée et sortie des services. L’abstraction de types permet de considérer la nature et la composition des données d’entrée et sortie. Les règles permettent la décomposition et la composition ainsi que la spécialisation et la généralisation de types. Elles permettent également de générer des convertisseurs de données à utiliser entre services dans les workflows. La deuxième contribution propose une approche interactive qui permet de guider des utilisateurs à composer des workflows en fournissant des suggestions de services et de liaisons compatibles basées sur la convertibilité de types d’entrée et sortie des services. L’approche est basée sur le modèle des Systèmes d’Information Logiques (LIS) qui permettent des requêtes et une navigation guidées et sûres sur des données représentées avec une logique uniforme. Avec notre approche, la composition de workflows est sûre et complète vis-à-vis de propriétés désirées. Les résultats et les expériences, effectués sur des services et des types de données en bioinformatique, montrent la pertinence de nos approches. Nos approches offrent des mécanismes adaptés pour gérer les incompatibilités de services dans les workflows, en prenant en compte la structure composite des données d’entrée et sortie. Elles permettent également de guider, étape par étape, des utilisateurs à définir des workflows bien formés à travers des suggestions pertinentes
In scientific domains, particularly in bioinformatics, elementary services are composed as workflows to perform complex data analysis experiments. Due to the heterogeneity of resources, the composition of services is a difficult task. Users, when composing workflows, lack assistance to find and interconnect compatible services. Existing solutions use special services, defined manually, to manage data format conversions between the inputs and outputs of services in workflows, which is difficult for an end user. Managing service incompatibilities with manual converters is time-consuming and cumbersome. Automated solutions exist to facilitate composing workflows, but they are generally limited in the guidance and the data adaptation between services that they offer. The first contribution of this thesis proposes to systematically detect convertibility from outputs to inputs of services. Convertibility detection relies on a rule system based on an abstraction of the input and output types of services. Type abstraction makes it possible to consider the nature and the composition of input and output data. The rules enable decomposition and composition as well as specialization and generalization of types. They also make it possible to generate data converters to use between services in workflows. The second contribution proposes an interactive approach that guides users in composing workflows by providing suggestions of compatible services and links based on the convertibility of the input and output types of services. The approach is based on the framework of Logical Information Systems (LIS), which enables safe and guided querying and navigation over data represented with a uniform logic. With our approach, the composition of workflows is safe and complete with respect to desired properties. The results and experiments, conducted on bioinformatics services and datatypes, show the relevance of our approaches. Our approaches offer suitable mechanisms to manage service incompatibilities in workflows, taking into account the composite structure of input and output data. They also guide users, step by step, to define well-formed workflows through relevant suggestions.
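As a purely illustrative sketch (not the rule system developed in the thesis), the following Python fragment shows the general idea of detecting output-to-input convertibility from a small, hand-written table of subtype and part-of relations; all type names are hypothetical.

```python
# Illustrative only: a toy convertibility check between service output and input types.
SUBTYPE_OF = {"FASTA": "SequenceFile", "GenBank": "SequenceFile"}
PARTS_OF = {"AnnotatedSequence": {"SequenceFile", "FeatureTable"}}

def convertible(output_type: str, input_type: str) -> bool:
    if output_type == input_type:
        return True
    # Specialisation: a more specific output fits a more general input.
    if SUBTYPE_OF.get(output_type) == input_type:
        return True
    # Decomposition: a composite output can be split to provide one of its parts.
    if input_type in PARTS_OF.get(output_type, set()):
        return True
    return False

print(convertible("FASTA", "SequenceFile"))              # True, by specialisation
print(convertible("AnnotatedSequence", "FeatureTable"))  # True, by decomposition
print(convertible("FeatureTable", "FASTA"))              # False
```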
APA, Harvard, Vancouver, ISO, and other styles
7

Kreiß, Lucas [Verfasser], Oliver [Akademischer Betreuer] Friedrich, and Maximilian [Gutachter] Waldner. "Advanced Optical Technologies for Label-free Tissue Diagnostics - A complete workflow from the optical bench, over experimental studies to data analysis / Lucas Kreiß ; Gutachter: Maximilian Waldner ; Betreuer: Oliver Friedrich." Erlangen : Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 2021. http://d-nb.info/1228627568/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Jaradat, Ward. "On the construction of decentralised service-oriented orchestration systems." Thesis, University of St Andrews, 2016. http://hdl.handle.net/10023/8036.

Full text
Abstract:
Modern science relies on workflow technology to capture, process, and analyse data obtained from scientific instruments. Scientific workflows are precise descriptions of experiments in which multiple computational tasks are coordinated based on the dataflows between them. Orchestrating scientific workflows presents a significant research challenge: they are typically executed in a manner such that all data pass through a centralised computer server known as the engine, which causes unnecessary network traffic that leads to a performance bottleneck. These workflows are commonly composed of services that perform computation over geographically distributed resources, and involve the management of dataflows between them. Centralised orchestration is clearly not a scalable approach for coordinating services dispersed across distant geographical locations. This thesis presents a scalable decentralised service-oriented orchestration system that relies on a high-level data coordination language for the specification and execution of workflows. This system's architecture consists of distributed engines, each of which is responsible for executing part of the overall workflow. It exploits parallelism in the workflow by decomposing it into smaller sub-workflows, and determines the most appropriate engines to execute them using computation placement analysis. This permits the workflow logic to be distributed closer to the services providing the data for execution, which reduces the overall data transfer in the workflow and improves its execution time. This thesis provides an evaluation of the presented system which concludes that decentralised orchestration provides scalability benefits over centralised orchestration, and improves the overall performance of executing a service-oriented workflow.
APA, Harvard, Vancouver, ISO, and other styles
9

Musaraj, Kreshnik. "Extraction automatique de protocoles de communication pour la composition de services Web." Thesis, Lyon 1, 2010. http://www.theses.fr/2010LYO10288/document.

Full text
Abstract:
La gestion des processus-métiers, des architectures orientées-services et leur rétro-ingénierie s’appuie fortement sur l’extraction des protocoles-métier des services Web et des modèles des processus-métiers à partir de fichiers de journaux. La fouille et l’extraction de ces modèles visent la (re)découverte du comportement d'un modèle mis en œuvre lors de son exécution en utilisant uniquement les traces d'activité, ne faisant usage d’aucune information a priori sur le modèle cible. Notre étude préliminaire montre que : (i) une minorité de données sur l'interaction sont enregistrées par le processus et les architectures de services, (ii) un nombre limité de méthodes d'extraction découvrent ce modèle sans connaître ni les instances positives du protocole, ni l'information pour les déduire, et (iii) les approches actuelles se basent sur des hypothèses restrictives que seule une fraction des services Web issus du monde réel satisfont. Rendre possible l'extraction de ces modèles d'interaction des journaux d'activité, en se basant sur des hypothèses réalistes nécessite: (i) des approches qui font abstraction du contexte de l'entreprise afin de permettre une utilisation élargie et générique, et (ii) des outils pour évaluer le résultat de la fouille à travers la mise en œuvre du cycle de vie des modèles découverts de services. En outre, puisque les journaux d'interaction sont souvent incomplets, comportent des erreurs et de l’information incertaine, alors les approches d'extraction proposées dans cette thèse doivent être capables de traiter ces imperfections correctement. Nous proposons un ensemble de modèles mathématiques qui englobent les différents aspects de la fouille des protocoles-métiers. Les approches d’extraction que nous présentons, issues de l'algèbre linéaire, nous permettent d'extraire le protocole-métier tout en fusionnant les étapes classiques de la fouille des processus-métiers. D'autre part, notre représentation du protocole basée sur des séries temporelles des variations de densité de flux permet de récupérer l'ordre temporel de l'exécution des événements et des messages dans un processus. En outre, nous proposons la définition des expirations propres pour identifier les transitions temporisées, et fournissons une méthode pour les extraire en dépit de leur propriété d'être invisible dans les journaux. Finalement, nous présentons un cadre multitâche visant à soutenir toutes les étapes du cycle de vie des workflow de processus et des protocoles, allant de la conception à l'optimisation. Les approches présentées dans ce manuscrit ont été implantées dans des outils de prototypage, et validées expérimentalement sur des ensembles de données et des modèles de processus et de services Web. Le protocole-métier découvert, peut ensuite être utilisé pour effectuer une multitude de tâches dans une organisation ou une entreprise
Business process management, service-oriented architectures and their reverse engineering heavily rely on the fundamental endeavor of mining business process models and Web service business protocols from log files. Model extraction and mining aim at the (re)discovery of the behavior of a running model implementation using solely its interaction and activity traces, and no a priori information on the target model. Our preliminary study shows that: (i) a minority of interaction data is recorded by process- and service-aware architectures, (ii) a limited number of methods achieve model extraction without knowledge of either positive process and protocol instances or the information to infer them, and (iii) the existing approaches rely on restrictive assumptions that only a fraction of real-world Web services satisfy. Enabling the extraction of these interaction models from activity logs under realistic hypotheses necessitates: (i) approaches that abstract away the business context in order to allow their extended and generic usage, and (ii) tools for assessing the mining result through implementation of the process and service life-cycle. Moreover, since interaction logs are often incomplete, uncertain and contain errors, the mining approaches proposed in this work need to be capable of handling these imperfections properly. We propose a set of mathematical models that encompass the different aspects of process and protocol mining. The extraction approaches that we present, drawn from linear algebra, allow us to extract the business protocol while merging the classic process mining stages. On the other hand, our protocol representation based on time series of flow density variations makes it possible to recover the temporal order of execution of events and messages in the process. In addition, we propose the concept of proper timeouts to refer to timed transitions, and provide a method for extracting them despite their property of being invisible in logs. Finally, we present a multitask framework aimed at supporting all the steps of the process workflow and business protocol life-cycle, from design to optimization. The approaches presented in this manuscript have been implemented in prototype tools and experimentally validated on scalable datasets and real-world process and Web service models. The discovered business protocols can thus be used to perform a multitude of tasks in an organization or enterprise.
APA, Harvard, Vancouver, ISO, and other styles
10

Khemiri, Wael. "Data-intensive interactive workflows for visual analytics." Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00659227.

Full text
Abstract:
The increasing amounts of electronic data of all forms, produced by humans (e.g. Web pages, structured content such as Wikipedia or the blogosphere, etc.) and/or automatic tools (loggers, sensors, Web services, scientific programs or analysis tools, etc.) lead to a situation of unprecedented potential for extracting new knowledge, finding new correlations, or simply making sense of the data. Visual analytics aims at combining interactive data visualization with data analysis tasks. Given the explosion in volume and complexity of scientific data, e.g., associated to biological or physical processes or social networks, visual analytics is called to play an important role in scientific data management. Most visual analytics platforms, however, are memory-based, and are therefore limited in the volume of data handled. Moreover, the integration of each new algorithm (e.g. for clustering) requires integrating it by hand into the platform. Finally, they lack the capability to define and deploy well-structured processes where users with different roles interact in a coordinated way, sharing the same data and possibly the same visualizations. This work is at the convergence of three research areas: information visualization, database query processing and optimization, and workflow modeling. It provides two main contributions: (i) we propose a generic architecture for deploying a visual analytics platform on top of a database management system (DBMS); (ii) we show how to propagate data changes to the DBMS and visualizations through the workflow process. Our approach has been implemented in a prototype called EdiFlow and validated through several applications. It clearly demonstrates that visual analytics applications can benefit from the robust storage and automatic process deployment provided by the DBMS while obtaining good performance, and thus provides scalability. Conversely, it could also be integrated into a data-intensive scientific workflow platform in order to increase its visualization features.
APA, Harvard, Vancouver, ISO, and other styles
11

Chan, Kai Kin. "Managing service-oriented data analysis workflows using semantic web technology." HKBU Institutional Repository, 2009. http://repository.hkbu.edu.hk/etd_ra/1055.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Backlund, Per. "The Use of Patterns in Information System Engineering." Thesis, University of Skövde, Department of Computer Science, 2001. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-619.

Full text
Abstract:

The aims of this dissertation are to investigate the use and usefulness of patterns in Information Systems Engineering and to identify future areas of research. In order to do this, there is a need to survey different types of patterns and find a common concept of patterns. A pattern is based on experience found in the real world. A text, a model, or a combination of both can describe the pattern. A pattern is typically described in terms of context, forces, problem, and solution. These can be explicitly expressed or implicitly found in the description of the pattern.

The types of patterns dealt with are: object-oriented patterns, design patterns, analysis patterns, data model patterns, domain patterns, business patterns, workflow patterns, and the deontic pattern. The different types of patterns are presented using the authors' own terminology.

The patterns described in the survey are classified with respect to different aspects. The intention of this analysis is to form a taxonomy for patterns and to bring order into the vast number of patterns. This is an important step in order to find out how patterns are used and can be used in Information Systems Engineering. The aspects used in the classification are: level of abstraction; text or model emphasis; product or process emphasis; life cycle stage usage; and combinations of these aspects.

Finally, an outline of future areas of research is presented. The areas that have been considered of interest are: patterns and Information Systems Engineering methods; patterns and tools (tool support for patterns); patterns as a pedagogical aid; the extraction and documentation of patterns; and patterns and novel applications of information technology. Each future area of research is sketched out.

APA, Harvard, Vancouver, ISO, and other styles
13

Krishnan, Niranjan Rao. "A Web-Based Software Platform for Data Processing Workflows and its Applications in Aerial Data Analysis." University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1562842713394706.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Oluwaseun, Ajayi Olabode. "An evaluation of galaxy and ruffus-scripting workflows system for DNA-seq analysis." University of the Western Cape, 2018. http://hdl.handle.net/11394/6765.

Full text
Abstract:
Magister Scientiae - MSc
Functional genomics determines the biological functions of genes on a global scale by using large volumes of data obtained through techniques including next-generation sequencing (NGS). The application of NGS in biomedical research is gaining momentum, and with its adoption becoming more widespread, there is an increasing need for access to customizable computational workflows that can simplify, and offer access to, computer-intensive analyses of genomic data. In this study, the Galaxy and Ruffus frameworks were designed and implemented with a view to addressing the challenges faced in biomedical research. Galaxy, a graphical web-based framework, allows researchers to build a graphical NGS data analysis pipeline for accessible, reproducible, and collaborative data sharing. Ruffus, a UNIX command-line framework used by bioinformaticians as a Python library to write scripts in an object-oriented style, allows for building a workflow in terms of task dependencies and execution logic. In this study, a dual data analysis technique was explored which focuses on a comparative evaluation of the Galaxy and Ruffus frameworks used in composing analysis pipelines. To this end, we developed an analysis pipeline in Galaxy, and in Ruffus, for the analysis of Mycobacterium tuberculosis sequence data. Furthermore, this study aimed to compare the Galaxy framework to Ruffus, with preliminary analysis revealing that the analysis pipeline in Galaxy displayed a higher percentage of load and store instructions. In comparison, pipelines in Ruffus tended to be CPU bound and memory intensive. The CPU usage, memory utilization, and runtime execution are graphically represented in this study. Our evaluation suggests that workflow frameworks have distinctly different features, from ease of use, flexibility, and portability, to architectural design.
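For readers unfamiliar with Ruffus, the following minimal Python sketch (not code from the thesis) shows how a pipeline is expressed as task dependencies using decorators; the file names and task bodies are placeholders.

```python
# Minimal Ruffus sketch: two chained tasks, where the second depends on the first.
from ruffus import transform, suffix, pipeline_run

@transform("sample.fastq", suffix(".fastq"), ".sam")
def align_reads(input_file, output_file):
    # Placeholder: an aligner would be invoked here (e.g. via subprocess).
    open(output_file, "w").write("aligned: " + input_file)

@transform(align_reads, suffix(".sam"), ".vcf")
def call_variants(input_file, output_file):
    # Placeholder: a variant caller would be invoked here.
    open(output_file, "w").write("variants from: " + input_file)

if __name__ == "__main__":
    # Ruffus inspects the task graph and only re-runs out-of-date steps.
    pipeline_run([call_variants])
```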
APA, Harvard, Vancouver, ISO, and other styles
15

Wagner, Laurent. "MINESTIS, the route to resource estimates." Technische Universitaet Bergakademie Freiberg Universitaetsbibliothek "Georgius Agricola", 2015. http://nbn-resolving.de/urn:nbn:de:bsz:105-qucosa-181676.

Full text
Abstract:
Minestis software allows geological domain modeling and resource estimation through an efficient and simplified geostatistics-based workflow. It has been designed for all those (geologists, mining engineers or auditors) for whom the quick production of quality models is at the heart of their concerns.
APA, Harvard, Vancouver, ISO, and other styles
16

Kadkhodamohammadi, Abdolrahim. "3D detection and pose estimation of medical staff in operating rooms using RGB-D images." Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAD047/document.

Full text
Abstract:
Dans cette thèse, nous traitons des problèmes de la détection des personnes et de l'estimation de leurs poses dans la Salle Opératoire (SO), deux éléments clés pour le développement d'applications d'assistance chirurgicale. Nous percevons la salle grâce à des caméras RGB-D qui fournissent des informations visuelles complémentaires sur la scène. Ces informations permettent de développer des méthodes mieux adaptées aux difficultés propres aux SO, comme l'encombrement, les surfaces sans texture et les occlusions. Nous présentons des nouvelles approches qui tirent profit des informations temporelles, de profondeur et des vues multiples afin de construire des modèles robustes pour la détection des personnes et de leurs poses. Une évaluation est effectuée sur plusieurs jeux de données complexes enregistrés dans des salles opératoires avec une ou plusieurs caméras. Les résultats obtenus sont très prometteurs et montrent que nos approches surpassent les méthodes de l'état de l'art sur ces données cliniques
In this thesis, we address the two problems of person detection and pose estimation in Operating Rooms (ORs), which are key ingredients in the development of surgical assistance applications. We perceive the OR using compact RGB-D cameras that can be conveniently integrated in the room. These sensors provide complementary information about the scene, which enables us to develop methods that can cope with numerous challenges present in the OR, e.g. clutter, textureless surfaces and occlusions. We present novel part-based approaches that take advantage of depth, multi-view and temporal information to construct robust human detection and pose estimation models. Evaluation is performed on new single- and multi-view datasets recorded in operating rooms. We demonstrate very promising results and show that our approaches outperform state-of-the-art methods on this challenging data acquired during real surgeries
APA, Harvard, Vancouver, ISO, and other styles
17

Lemon, Alexander Michael. "A Shared-Memory Coupled Architecture to Leverage Big Data Frameworks in Prototyping and In-Situ Analytics for Data Intensive Scientific Workflows." BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/7545.

Full text
Abstract:
There is a pressing need for creative new data analysis methods which can sift through scientific simulation data and produce meaningful results. The types of analyses and the amount of data handled by current methods are still quite restricted, and new methods could provide scientists with a large productivity boost. New methods could be simple to develop in big data processing systems such as Apache Spark, which is designed to process many input files in parallel while treating them logically as one large dataset. This distributed model, combined with the large number of analysis libraries created for the platform, makes Spark ideal for processing simulation output. Unfortunately, the filesystem becomes a major bottleneck in any workflow that uses Spark in such a fashion. Faster transports are not intrinsically supported by Spark, and its interface almost denies the possibility of maintainable third-party extensions. By leveraging the semantics of Scala and Spark's recent scheduler upgrades, we force co-location of Spark executors with simulation processes and enable fast local inter-process communication through shared memory. This provides a path for bulk data transfer into the Java Virtual Machine, removing the current Spark ingestion bottleneck. Besides showing that our system makes this transfer feasible, we also demonstrate a proof-of-concept system integrating traditional HPC codes with bleeding-edge analytics libraries. This provides scientists with guidance on how to apply our libraries to gain a new and powerful tool for developing new analysis techniques in large scientific simulation pipelines.
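As a generic illustration of the shared-memory hand-off idea (written in Python, not the Scala/Spark/JVM mechanism built in the thesis), one process can publish an array in shared memory and another can attach to it by name, avoiding a round trip through the filesystem; the names and sizes below are arbitrary.

```python
# Generic shared-memory hand-off between a "simulation" writer and an "analysis" reader.
import numpy as np
from multiprocessing import shared_memory

# Simulation side: allocate a shared block and fill it with results.
shm = shared_memory.SharedMemory(create=True, size=8 * 1024)
sim_data = np.ndarray((1024,), dtype=np.float64, buffer=shm.buf)
sim_data[:] = np.random.rand(1024)

# Analysis side (normally a separate process): attach by name and read in place.
reader = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray((1024,), dtype=np.float64, buffer=reader.buf)
print("mean of simulated values:", view.mean())

reader.close()
shm.close()
shm.unlink()
```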
APA, Harvard, Vancouver, ISO, and other styles
18

Gröbe, Mathias. "Konzeption und Entwicklung eines automatisierten Workflows zur geovisuellen Analyse von georeferenzierten Textdaten(strömen) / Microblogging Content." Master's thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-210672.

Full text
Abstract:
Die vorliegende Masterarbeit behandelt den Entwurf und die exemplarische Umsetzung eines Arbeitsablaufs zur Aufbereitung von georeferenziertem Microblogging Content. Als beispielhafte Datenquelle wurde Twitter herangezogen. Darauf basierend, wurden Überlegungen angestellt, welche Arbeitsschritte nötig und mit welchen Mitteln sie am besten realisiert werden können. Dabei zeigte sich, dass eine ganze Reihe von Bausteinen aus dem Bereich des Data Mining und des Text Mining für eine Pipeline bereits vorhanden sind und diese zum Teil nur noch mit den richtigen Einstellungen aneinandergereiht werden müssen. Zwar kann eine logische Reihenfolge definiert werden, aber weitere Anpassungen auf die Fragestellung und die verwendeten Daten können notwendig sein. Unterstützt wird dieser Prozess durch verschiedenen Visualisierungen mittels Histogrammen, Wortwolken und Kartendarstellungen. So kann neues Wissen entdeckt und nach und nach die Parametrisierung der Schritte gemäß den Prinzipien des Geovisual Analytics verfeinert werden. Für eine exemplarische Umsetzung wurde nach der Betrachtung verschiedener Softwareprodukte die für statistische Anwendungen optimierte Programmiersprache R ausgewählt. Abschließend wurden die Software mit Daten von Twitter und Flickr evaluiert
This Master's Thesis deals with the conception and exemplary implementation of a workflow for georeferenced Microblogging Content. Data from Twitter is used as an example and as a starting point to think about how to build that workflow. In the field of Data Mining and Text Mining, there was found a whole range of useful software modules that already exist. Mostly, they only need to get lined up to a process pipeline using appropriate preferences. Although a logical order can be defined, further adjustments according to the research question and the data are required. The process is supported by different forms of visualizations such as histograms, tag clouds and maps. This way new knowledge can be discovered and the options for the preparation can be improved. This way of knowledge discovery is already known as Geovisual Analytics. After a review of multiple existing software tools, the programming language R is used to implement the workflow as this language is optimized for solving statistical problems. Finally, the workflow has been tested using data from Twitter and Flickr
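As a rough, generic sketch of such a processing chain (written in Python rather than the R implementation described above), geotagged posts can be filtered, tokenised and counted to feed histograms, word clouds or map displays; the sample data below are invented.

```python
# Generic geo-text mini-pipeline: filter geotagged posts, tokenise, count word frequencies.
import re
from collections import Counter

posts = [
    {"text": "Traffic jam near the main station", "lat": 51.04, "lon": 13.73},
    {"text": "Sunny day at the river Elbe", "lat": 51.06, "lon": 13.74},
    {"text": "No coordinates on this one", "lat": None, "lon": None},
]

geotagged = [p for p in posts if p["lat"] is not None and p["lon"] is not None]
tokens = [w.lower() for p in geotagged for w in re.findall(r"[A-Za-z]+", p["text"])]
word_counts = Counter(tokens)

print(word_counts.most_common(5))                  # basis for a histogram / word cloud
print([(p["lat"], p["lon"]) for p in geotagged])   # points for a map display
```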
APA, Harvard, Vancouver, ISO, and other styles
19

Zielasko, Daniel [Verfasser], Torsten [Akademischer Betreuer] Kuhlen, and Benjamin [Akademischer Betreuer] Weyers. "DeskVR: seamless integration of virtual reality into desk-based data analysis workflows / Daniel Zielasko ; Torsten Kuhlen, Benjamin Weyers." Aachen : Universitätsbibliothek der RWTH Aachen, 2020. http://d-nb.info/1220360120/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Hafez, Khafaga Ahmed Ibrahem 1987. "Bioinformatics approaches for integration and analysis of fungal omics data oriented to knowledge discovery and diagnosis." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/671160.

Full text
Abstract:
Aquesta tesi presenta una sèrie de recursos bioinformàtics desenvolupats per a donar suport en l'anàlisi de dades de NGS i altres òmics en el camp d'estudi i diagnòstic d'infeccions fúngiques. Hem dissenyat tècniques de computació per identificar nous biomarcadors i determinar potencial trets de resistència, pronosticant les característiques de les seqüències d'ADN/ARN, i planejant estratègies optimitzades de seqüenciació per als estudis de hoste-patogen transcriptomes (Dual RNA-seq). Hem dissenyat i desenvolupat tambe una solució bioinformàtica composta per un component de costat de servidor (constituït per diferents pipelines per a fer anàlisi VariantSeq, Denovoseq i RNAseq) i un altre component constituït per eines software basades en interfícies gràfiques (GUIs) per permetre a l'usuari accedir, gestionar i executar els pipelines mitjançant interfícies amistoses. També hem desenvolupat i validat un software per a l'anàlisi de seqüències i el disseny dels primers (SeqEditor) orientat a la identificació i detecció d'espècies en el diagnòstic de la PCR. Finalment, hem desenvolupat CandidaMine una base de dades integrant dades omiques de fongs patògens.
The aim of this thesis has been to develop a series of bioinformatic resources for the analysis of NGS data, proteomics, and other omics technologies in the field of the study and diagnosis of yeast infections. In particular, we have explored and designed distinct computational techniques to identify novel biomarker candidates of resistance traits, to predict DNA/RNA sequence features, and to optimize sequencing strategies for host-pathogen transcriptome sequencing studies (Dual RNA-seq). We have designed and developed an efficient bioinformatic solution composed of a server-side component constituted by distinct pipelines for VariantSeq, Denovoseq and RNAseq analyses, as well as another component constituted by distinct GUI-based software tools that let the user access, manage and run the pipelines through friendly-to-use interfaces. We have also designed and developed SeqEditor, a software tool for sequence analysis and primer design for species identification and detection in PCR diagnosis. Finally, we have developed CandidaMine, an integrated data warehouse of fungal omics for data analysis and knowledge discovery.
APA, Harvard, Vancouver, ISO, and other styles
21

Shomroni, Orr [Verfasser], Stefan [Akademischer Betreuer] [Gutachter] Bonn, and Stephan [Gutachter] Waack. "Development of algorithms and next-generation sequencing data workflows for the analysis of gene regulatory networks / Orr Shomroni ; Gutachter: Stefan Bonn, Stephan Waack ; Betreuer: Stefan Bonn." Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2017. http://d-nb.info/1129956350/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Emami, Khoonsari Payam. "Proteomics Studies of Subjects with Alzheimer’s Disease and Chronic Pain." Doctoral thesis, Uppsala universitet, Klinisk kemi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-331748.

Full text
Abstract:
Alzheimer’s disease (AD) is a neurodegenerative disease and the major cause of dementia, affecting more than 50 million people worldwide. Chronic pain is long-lasting, persistent pain that affects more than 1.5 billion of the world population. Overlapping and heterogenous symptoms of AD and chronic pain conditions complicate their diagnosis, emphasizing the need for more specific biomarkers to improve the diagnosis and understand the disease mechanisms. To characterize disease pathology of AD, we measured the protein changes in the temporal neocortex region of the brain of AD subjects using mass spectrometry (MS). We found proteins involved in exo-endocytic and extracellular vesicle functions displaying altered levels in the AD brain, potentially resulting in neuronal dysfunction and cell death in AD. To detect novel biomarkers for AD, we used MS to analyze cerebrospinal fluid (CSF) of AD patients and found decreased levels of eight proteins compared to controls, potentially indicating abnormal activity of complement system in AD. By integrating new proteomics markers with absolute levels of Aβ42, total tau (t-tau) and p-tau in CSF, we improved the prediction accuracy from 83% to 92% of early diagnosis of AD. We found increased levels of chitinase-3-like protein 1 (CH3L1) and decreased levels of neurosecretory protein VGF (VGF) in AD compared to controls. By exploring the CSF proteome of neuropathic pain patients before and after successful spinal cord stimulation (SCS) treatment, we found altered levels of twelve proteins, involved in neuroprotection, synaptic plasticity, nociceptive signaling and immune regulation. To detect biomarkers for diagnosing a chronic pain state known as fibromyalgia (FM), we analyzed the CSF of FM patients using MS. We found altered levels of four proteins, representing novel biomarkers for diagnosing FM. These proteins are involved in inflammatory mechanisms, energy metabolism and neuropeptide signaling. Finally, to facilitate fast and robust large-scale omics data handling, we developed an e-infrastructure. We demonstrated that the e-infrastructure provides high scalability, flexibility and it can be applied in virtually any fields including proteomics. This thesis demonstrates that proteomics is a promising approach for gaining deeper insight into mechanisms of nervous system disorders and find biomarkers for diagnosis of such diseases.
APA, Harvard, Vancouver, ISO, and other styles
23

HUANG, CHI-WEI, and 黃智偉. "Integrating Microarray Data Analysis Services with Web Services and Workflow Infrastructure." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/53326427509156230587.

Full text
Abstract:
Master's thesis
National Yang-Ming University
Institute of Health Informatics and Decision Making
92
Over the past few years, technologies for biological research have made significant breakthroughs. Numerous organizations and individuals have developed various applications and generated biological data in the bioinformatics field. However, when we want to use those services, we must visit every service's URL one by one, so we cannot use all services conveniently and completely. If we can put those useful services together and build a flexible environment, researchers can use all services within this environment in an uncomplicated way. For this purpose, we designed an integrated system to overcome this predicament in bioinformatics. Web Services, an emerging technology, use basic and common standards such as XML and HTTP. In this thesis, we propose to adopt Web Services to build a system that integrates several services that are useful tools for bioinformatics researchers, and we provide a feasible solution to achieve this goal. First, we construct an interface that conforms to the Web Services standard and make good use of the R statistical application to implement some web services for remote use. Second, we develop a Registry Editor for the Integrated Analysis Environment to provide an integrated GUI environment. Finally, we design a Workflow Editor and Engine to schedule analysis jobs using a workflow mechanism. We demonstrate that this system works well for microarray data analysis. The system provides a convenient interface for easy use and makes use of workflows to automate analysis jobs. The basic functions are complete, but there remains room for future work.
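As a generic illustration of the job-scheduling idea behind such a workflow engine (not the system described in the abstract), analysis steps can be declared with dependencies and executed in a valid order; the task names and bodies are hypothetical placeholders.

```python
# Generic sketch: run analysis tasks in an order that respects their dependencies.
from graphlib import TopologicalSorter  # Python 3.9+

def normalize():
    print("normalizing microarray intensities")

def filter_genes():
    print("filtering low-variance genes")

def cluster():
    print("clustering expression profiles")

tasks = {"normalize": normalize, "filter": filter_genes, "cluster": cluster}
dependencies = {"filter": {"normalize"}, "cluster": {"filter"}}  # task -> prerequisites

for name in TopologicalSorter(dependencies).static_order():
    tasks[name]()
```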
APA, Harvard, Vancouver, ISO, and other styles
24

Liao, Tzu-Huei, and 廖子慧. "The Workflow for Next-Generation Sequencing Data Analysis of Human Genome." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/a9ag69.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Statistics
101
Next-generation sequencing (NGS) technology is fast and economical. It also offers high output, high resolution and a low failure rate. We can obtain whole genome sequencing (WGS) and whole exome sequencing (WES) data by NGS technology. Recently, WGS and WES analyses have become the most popular way to analyze disease associations with the genome. They can help us understand biological evolution and compare the differences between individuals. However, it is difficult to process WGS and WES data since these data are too large to store and analyze. At present, there has been little literature describing the workflow for analyzing WGS and WES data from beginning to end. Therefore, this thesis offers a general workflow to analyze WGS and WES data. The workflow first aligns raw sequence reads to the reference with the software BWA or Bowtie2. Then, we convert between different file formats, mark PCR duplicates, and perform local realignment around indels using Picard or samtools. Finally, we use GATK to discover variants, analyze depth of coverage, and detect somatic indels. Following this workflow, we can obtain many useful files for subsequent research. In this thesis, the WGS and WES data of sample NA12878 from the 1000 Genomes website are used to illustrate the summarized workflow.
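As an illustrative sketch of such a pipeline (not code from the thesis), the named tools can be chained from Python by invoking them in order; exact command-line flags differ between tool versions, and all file names here are placeholders.

```python
# Illustrative NGS pipeline driver: align, sort, mark duplicates, call variants.
import subprocess

ref, fq1, fq2 = "reference.fa", "reads_1.fastq", "reads_2.fastq"

# 1. Align reads to the reference with BWA and sort the alignment with samtools.
with open("aln.sam", "w") as sam:
    subprocess.run(["bwa", "mem", ref, fq1, fq2], stdout=sam, check=True)
subprocess.run(["samtools", "sort", "-o", "aln.sorted.bam", "aln.sam"], check=True)

# 2. Mark PCR duplicates with Picard and index the result.
subprocess.run(["java", "-jar", "picard.jar", "MarkDuplicates",
                "I=aln.sorted.bam", "O=aln.dedup.bam", "M=dup_metrics.txt"], check=True)
subprocess.run(["samtools", "index", "aln.dedup.bam"], check=True)

# 3. Call variants with GATK (a GATK4-style invocation is shown).
subprocess.run(["gatk", "HaplotypeCaller", "-R", ref,
                "-I", "aln.dedup.bam", "-O", "variants.vcf"], check=True)
```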
APA, Harvard, Vancouver, ISO, and other styles
25

Henriques, David. "Using Building Data Models to Represent Workflows and a Contextual Dimension." Thesis, 2009. http://hdl.handle.net/10012/4649.

Full text
Abstract:
The context-workflow relationship is often poorly defined or forgotten entirely. In workflow systems and applications, context is either omitted, defined by the workflow, or defined based on a single aspect of a contextual dimension. In complex environments this can be problematic, as the definition of context is useful in determining the set of possible workflows. Context provides the envelope that surrounds the workflow and determines what is or is not possible. The relationship between workflow and context is also poorly defined. That context can exist independently of workflow is often ignored, and workflow does not exist independently of context. Workflow representations void of context violate this stipulation. In order for a workflow representation to exist in a contextual dimension it must possess the same dimensions as the context. In this thesis we selected one contextual dimension to study, in this case the spatial dimension, and developed a comprehensive definition using building data models. Building data models are an advanced form of representation that build geometric data models into an object-oriented representation consisting of common building elements. The building data model used was the Industry Foundation Classes (IFC), as it is the leading standard in this emerging field. IFC was created for the construction of facilities and not the use of facilities at a later time. In order to incorporate workflows into IFC models, a zoning technique was developed to represent the workflow in IFC. The zoning concept was derived from multi-criteria facility layout and was adapted for IFC and workflow. Based on the above work, a zoning extension was created to explore the combination of IFC, workflow and simulation. The extension is a proof of concept and is not intended to represent a robust formalized system. The results indicate that the use of a comprehensive definition of a contextual dimension may prove valuable to future expert systems.
APA, Harvard, Vancouver, ISO, and other styles
26

Kuhn, Thomas [Verfasser]. "Open source workflow engine for cheminformatics : from data curation to data analysis / vorgelegt von Thomas Kuhn." 2009. http://d-nb.info/994060726/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Ginige, Jeewani A., University of Western Sydney, College of Health and Science, and School of Computing and Mathematics. "Change impact analysis to manage process evolution in web workflows." 2008. http://handle.uws.edu.au:8081/1959.7/32727.

Full text
Abstract:
Organisations have processes to manage their business activities, often referred to as business processes. In today's competitive global economy, automation of processes with appropriate technology is advantageous. However, the paradox of process automation is the continuous evolution and change that occurs in business processes. As the business processes evolve and change, the underpinning automated systems need to reflect those changes. Even after a decade of research in the areas of business process automation (BPA) and business process evolution management (BPEM), organisations still find it challenging to manage the evolution of automated processes. Therefore, this thesis finds answers to the question of "How can business process evolutions be accurately and effectively reflected in already implemented web-based workflow systems?" In order to provide a holistic solution to the above research question, this research introduces a framework named the Paradigm of Process Automation (PoPA) framework and discusses its role in managing process evolution. This framework embodies a business process at four levels: pragmatic, semantic, syntactic, and implementation. Each of these levels deals with a distinctive representation of a business process. For example, the pragmatic level represents the contextual artefact elements such as Acts, policies, organisational structures, rules, and guidelines that define a process, and the syntactic level denotes the models created for the purposes of automation. When a change takes place in any one of the levels of the PoPA framework, it creates a propagating impact on elements in the above-mentioned four levels. This propagation of impact takes place due to constraints, associations, and dependencies (CAD) among elements within and across the levels (intra- and inter-level CAD). When analysing intra- and inter-level CAD, most correlations are found to be hierarchical; therefore, a relational database structure is appropriate to capture these hierarchical associations. However, operational processes at the semantic level have complex associations which are not hierarchical. Therefore, this research proposes to use Kleene Algebra with Tests (KAT) for representing CAD at the semantic level. Propagating impact does not exclusively depend on inter- and intra-level CAD, but is also closely associated with the nature of evolution. Depending on the nature of evolution, the propagating impact can be categorised as direct, indirect, secondary, and non-cautionary (DISN) impact. These DISN impacts suggest the severity of the propagating impact. The core contribution of this research is the Process Evolution and Change Impact Analysis (PECIA) Model, which enables the management of process evolution accurately and effectively in automated systems. In this research, a process automation project named Online Courses Approval System (OCAS) is used as an exploratory case study. The practical utility of the PECIA Model is validated using evolution scenarios of OCAS, and its epistemic utility is analysed based on a study of the literature. Amidst a plethora of literature on BPA and BPEM, this research is significant due to the following theoretical contributions that facilitate the management of automated processes in tandem with organisational process evolution:
• The PECIA Model holistically captures inter- and intra-level CAD of process elements, facilitating propagating impact analysis within and across the four levels of the PoPA framework.
• A novel use of KAT to capture CAD among process elements cohesively and completely in linear expressions, in order to analyse impact propagation (a generic illustration of the notation is given after this list).
• An algorithm that analyses the KAT expressions of a process to locate DISN impacts, so that evolutions can be carried out accurately and effectively.
The future work arising from this research is manifold. It may include improving the use of the PECIA Model as a corporate process knowledge repository, and exploring other possible uses of the PECIA Model and KAT-based process expressions.
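As a generic illustration of the formalism (not an expression taken from the thesis), Kleene Algebra with Tests adds Boolean tests to the regular operations of Kleene algebra, so common control-flow fragments of a process can be written algebraically, with p and q standing for process actions and b for a Boolean test:

```latex
\mathbf{if}\ b\ \mathbf{then}\ p\ \mathbf{else}\ q \;\equiv\; b\,p + \overline{b}\,q
\qquad\qquad
\mathbf{while}\ b\ \mathbf{do}\ p \;\equiv\; (b\,p)^{*}\,\overline{b}
```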
Doctor of Philosophy (PhD)
APA, Harvard, Vancouver, ISO, and other styles
28

Shomroni, Orr. "Development of algorithms and next-generation sequencing data workflows for the analysis of gene regulatory networks." Doctoral thesis, 2017. http://hdl.handle.net/11858/00-1735-0000-0023-3E0C-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Witty, Derick. "Implementation of a Laboratory Information Management System To Manage Genomic Samples." Thesis, 2013. http://hdl.handle.net/1805/3521.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
A Laboratory Information Management System (LIMS) is designed to manage laboratory processes and data. It has the ability to extend the core functionality of the LIMS through configuration tools and add-on modules to support the implementation of complex laboratory workflows. The purpose of this project is to demonstrate how laboratory data and processes from a complex workflow can be implemented using a LIMS. Genomic samples have become an important part of the drug development process due to advances in molecular testing technology. This technology evaluates genomic material for disease markers and provides efficient, cost-effective, and accurate results for a growing number of clinical indications. The preparation of the genomic samples for evaluation requires a complex laboratory process called the precision aliquotting workflow. The precision aliquotting workflow processes genomic samples into precisely created aliquots for analysis. The workflow is defined by a set of aliquotting scheme attributes that are executed based on scheme-specific rules logic. The aliquotting scheme defines the attributes of each aliquot based on the achieved sample recovery of the genomic sample. The scheme rules logic executes the creation of the aliquots based on the scheme definitions. LabWare LIMS is a Windows®-based open-architecture system that manages laboratory data and workflow processes. A LabWare LIMS model was developed to implement the precision aliquotting workflow using a combination of core functionality and configured code.
APA, Harvard, Vancouver, ISO, and other styles
30

Jabour, Abdulrahman M. "Cancer reporting: timeliness analysis and process reengineering." Diss., 2015. http://hdl.handle.net/1805/10481.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Introduction: Cancer registries collect tumor-related data to monitor incidence rates and support population-based research. A common concern with using population-based registry data for research is reporting timeliness. Data timeliness has been recognized as an important data characteristic by both the Centers for Disease Control and Prevention (CDC) and the Institute of Medicine (IOM). Yet, few recent studies in the United States (U.S.) have systematically measured timeliness. The goal of this research is to evaluate the quality of cancer data and examine methods by which the reporting process can be improved. The study aims are to: (1) evaluate the timeliness of cancer cases at the Indiana State Department of Health (ISDH) Cancer Registry, (2) identify the perceived barriers and facilitators to timely reporting, and (3) reengineer the current reporting process to improve turnaround time. Method: For Aim 1, using the ISDH dataset from 2000 to 2009, we evaluated the reporting timeliness and subtasks within the process cycle. For Aim 2, certified cancer registrars reporting for ISDH were invited to a semi-structured interview. The interviews were recorded and qualitatively analyzed. For Aim 3, we designed a reengineered workflow to improve reporting timeliness and tested it using simulation. Result: The results show variation in the mean reporting time, which ranged from 426 days in 2003 to 252 days in 2009. The barriers identified were categorized into six themes, and the most common barrier was accessing medical records at external facilities. We also found that cases reside for a few months in the local hospital database while waiting for treatment data to become available. The recommended workflow focused on leveraging a health information exchange for data access and adding a notification system to inform registrars when new treatments are available.
APA, Harvard, Vancouver, ISO, and other styles
31

"Utilization of automated location tracking for clinical workflow analytics and visualization." Doctoral diss., 2018. http://hdl.handle.net/2286/R.I.51634.

Full text
Abstract:
The analysis of clinical workflow offers many challenges to clinical stakeholders and researchers, especially in environments characterized by dynamic and concurrent processes. Workflow analysis in such environments is essential for monitoring performance and finding bottlenecks and sources of error. Clinical workflow analysis has been enhanced with the inclusion of modern technologies. One such intervention is automated location tracking, a system that detects the movement of clinicians and equipment. Utilizing the data produced from automated location tracking technologies can lead to the development of novel workflow analytics that can be used to complement more traditional approaches such as ethnography and grounded-theory based qualitative methods. The goals of this research are to: (i) develop a series of analytic techniques to derive deeper workflow-related insight in an emergency department setting, (ii) overlay data from disparate sources (quantitative and qualitative) to develop strategies that facilitate workflow redesign, and (iii) incorporate visual analytics methods to improve the targeted visual feedback received by providers based on the findings. The overarching purpose is to create a framework to demonstrate the utility of automated location tracking data used in conjunction with clinical data like EHR logs and its vital role in the future of clinical workflow analysis/analytics. This document is organized around the two primary aims of the research. The first aim deals with the use of automated location tracking data to develop a novel methodological/exploratory framework for clinical workflow. The second aim is to overlay the quantitative data generated from the previous aim on data from qualitative observation and shadowing studies (mixed methods) to develop a deeper view of clinical workflow that can be used to facilitate workflow redesign. The final sections of the document speculate on the direction of this work, discussing the potential of this research in the creation of fully integrated clinical environments, i.e. environments with state-of-the-art location tracking and other data collection mechanisms. The main purpose of this research is to demonstrate ways by which clinical processes can be continuously monitored, allowing for proactive adaptations in the face of technological and process changes to minimize any negative impact on the quality of patient care and provider satisfaction.
APA, Harvard, Vancouver, ISO, and other styles
32

Gröbe, Mathias. "Konzeption und Entwicklung eines automatisierten Workflows zur geovisuellen Analyse von georeferenzierten Textdaten(strömen) / Microblogging Content." Master's thesis, 2015. https://tud.qucosa.de/id/qucosa%3A29848.

Full text
Abstract:
This Master's thesis covers the design and exemplary implementation of a workflow for processing georeferenced microblogging content, with Twitter used as the example data source. Building on this, the thesis examines which processing steps are needed and how they can best be realized. It turns out that many of the required building blocks already exist in the fields of Data Mining and Text Mining and, for the most part, only need to be chained into a pipeline with suitable settings. Although a logical order of the steps can be defined, further adjustments to the research question and the data at hand may still be necessary. The process is supported by visualizations such as histograms, tag clouds, and map displays, so that new knowledge can be discovered and the parameterization of the individual steps can be refined iteratively, following the principles of Geovisual Analytics. After a review of several software products, the programming language R, which is optimized for statistical applications, was chosen for the exemplary implementation. Finally, the software was evaluated with data from Twitter and Flickr.
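To make the pipeline idea more tangible, here is a minimal sketch of its core steps: tokenization, term counts that could feed a tag cloud, and temporal binning for a histogram. The thesis implements the workflow in R; this sketch uses Python with invented sample records purely for illustration, and omits the geocoding and map-display stages.

```python
# Minimal sketch of a microblogging-content pipeline: tokenize geotagged posts,
# count terms (tag-cloud input), and bin posts by hour (histogram input).
# The sample records and stopword list are invented for illustration.
import re
from collections import Counter

posts = [
    {"text": "Traffic jam near the station again", "lat": 51.05, "lon": 13.73, "hour": 8},
    {"text": "Sunny morning at the station cafe",   "lat": 51.06, "lon": 13.74, "hour": 9},
    {"text": "Station closed because of traffic",   "lat": 51.05, "lon": 13.73, "hour": 8},
]

STOPWORDS = {"the", "of", "at", "near", "again", "because"}

def tokenize(text):
    """Lowercase, split into alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-zA-Z]+", text.lower()) if t not in STOPWORDS]

term_counts = Counter(t for p in posts for t in tokenize(p["text"]))   # tag-cloud input
hour_counts = Counter(p["hour"] for p in posts)                        # histogram input

print(term_counts.most_common(5))
print(sorted(hour_counts.items()))
```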
APA, Harvard, Vancouver, ISO, and other styles
33

Abu Jabal, Amani M. "Digital Provenance Techniques and Applications." Thesis, 2020.

Find full text
Abstract:
This thesis describes a data provenance framework and other associated frameworks for utilizing provenance for data quality and reproducibility. We first identify the requirements for the design of a comprehensive provenance framework that is applicable to various applications, supports a rich set of provenance metadata, and is interoperable with other provenance management systems. We then design and develop a provenance framework, called SimP, addressing such requirements. Next, we present four prominent applications and investigate how provenance data can benefit them. The first application is the quality assessment of access control policies. Towards this, we design and implement the ProFact framework, which uses provenance techniques to collect comprehensive data about actions that were triggered either by a network context or by a user (i.e., a human or a device). Provenance data are used to determine whether the policies meet the quality requirements. ProFact includes two approaches for policy analysis: structure-based and classification-based. For the structure-based approach, we design tree structures to organize and assess the policy set efficiently. For the classification-based approach, we employ several classification techniques to learn the characteristics of policies and predict their quality. In addition, ProFact supports policy evolution and the assessment of its impact on policy quality. The second application is workflow reproducibility. Towards this, we implement ProWS, a provenance-based architecture for retrieving workflows. Specifically, ProWS transforms data provenance into workflows and then organizes data into a set of indexes to support efficient querying mechanisms. ProWS supports composite queries on three types of search criteria: keywords of workflow tasks, patterns of workflow structure, and metadata about workflows (e.g., how often a workflow was used). The third application is access control policy reproducibility. Towards this, we propose a novel framework, Polisma, which generates attribute-based access control policies from data, namely from logs of historical access requests and their corresponding decisions. Polisma combines data mining, statistical, and machine learning techniques, and capitalizes on potential context information obtained from external sources (e.g., LDAP directories) to enhance the learning process. The fourth application is policy reproducibility through knowledge and experience transfer. Towards this, we propose a novel framework, FLAP, which transfers attribute-based access control policies between different parties in a collaborative environment, while addressing the challenges of minimal data sharing and supporting policy adaptation to resolve conflicts. All frameworks are evaluated with respect to performance and accuracy.
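The following toy sketch illustrates only the general idea behind mining attribute-based rules from logged access decisions, which Polisma elaborates with data mining, statistical, and machine learning techniques; the log format and the naive "consistent decision" rule are assumptions for illustration, not the thesis algorithm.

```python
# Toy sketch of learning attribute-based rules from historical access decisions.
# Illustrates the general idea of policy mining from logs only; it is not Polisma.
from collections import defaultdict

# Each log entry: (subject role, resource type, action, decision)
logs = [
    ("nurse",  "lab_result", "read",  "permit"),
    ("nurse",  "lab_result", "read",  "permit"),
    ("nurse",  "lab_result", "write", "deny"),
    ("doctor", "lab_result", "write", "permit"),
]

def mine_rules(entries):
    """Emit a rule for every (role, resource, action) whose logged decisions all agree."""
    outcomes = defaultdict(set)
    for role, resource, action, decision in entries:
        outcomes[(role, resource, action)].add(decision)
    return {key: next(iter(ds)) for key, ds in outcomes.items() if len(ds) == 1}

for (role, resource, action), decision in mine_rules(logs).items():
    print(f"{decision.upper()}: role={role} resource={resource} action={action}")
```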
APA, Harvard, Vancouver, ISO, and other styles
34

Σφήκα, Νίκη. "Δυναμική ανάθεση υπολογιστικών πόρων και συντονισμός εκτέλεσης πολύπλοκων διαδικασιών ανάλυσης δεδομένων σε υποδομή Cloud." Thesis, 2015. http://hdl.handle.net/10889/8814.

Full text
Abstract:
Cloud Computing is the new model for software development and service provision in Information and Communication Technologies. Its main characteristics are the on-demand allocation of computational resources, remote access to them over the Internet, and the elasticity of the provided services, which allows resources to be scaled up or down according to the end user's requirements. At the same time, the continuous growth of information produced by sources such as the web and scientific experiments has generated a massive amount of complex and pervasive digital data, and extracting useful knowledge from such large datasets requires smart and scalable analytics services, programming tools, and applications. Owing to its elasticity and scalability, Cloud Computing has therefore become an emerging technology for large-scale data analysis, which demands parallelization, complex analysis flows, and heavy computational workloads. Workflows play an important role in managing such complex analyses and orchestrating the required processes: a workflow is an organized set of activities needed to complete a commercial or scientific task, together with the dependencies among them, since each activity consists of steps that must be executed in a specific order. This thesis presents a system that dynamically allocates the resources offered by a cloud infrastructure and coordinates the execution of distributed data analysis implementations on them. After receiving the user's input data for a new analysis, the system examines the data and the complexity of the scientific problem, calculates the required computational resources (memory and CPU) based on the size of the input and the available capacity of the infrastructure, and automatically provisions the most suitable resources. It also records the analysis and delegates the coordination of the process to workflows, which orchestrate the provided resources and monitor the execution of the computational flow. By combining the services of a cloud infrastructure with the task-management capabilities of workflows, the thesis simplifies the access, control, organization, and execution of complex and parallel data analysis implementations, from the moment the user submits the input data until the final result is computed. More specifically, the thesis proposes an integrated solution that: 1) provides an application through which the user can log in and start a complex data analysis, 2) creates the infrastructure needed to dynamically allocate cloud resources according to the needs of each problem, and 3) executes and coordinates the analysis process automatically using workflows. To validate and evaluate the application, the IRaaS platform was developed, offering its users the ability to solve multi-domain/multi-physics problems. IRaaS is built on the aforementioned system for the dynamic allocation of computational resources and the coordination of complex data analysis processes. A series of analyses with different input data showed that the application achieves better execution times, lower reservation of computational resources, and consequently lower cost. For the experiments, the IRaaS platform was deployed on the cloud infrastructure of the Pattern Recognition laboratory, which was installed and configured in the course of this work using XenServer as the virtualization hypervisor and the CloudStack platform to create a private cloud.
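As a rough illustration of the resource-sizing step described above (choosing memory and CPU from the size of the input data, subject to the infrastructure's free capacity), here is a minimal sketch; the thresholds and the capacity model are invented for illustration and do not come from the thesis.

```python
# Hedged sketch of a resource-sizing step: pick VM memory and CPU from the size
# of the input data, capped by what the cloud infrastructure currently has free.
# Thresholds and the capacity model are invented for illustration.
def size_resources(input_mb, free_cpus, free_mem_gb):
    """Return (cpus, mem_gb) for an analysis job, or None if capacity is insufficient."""
    if input_mb < 500:
        want_cpus, want_mem = 2, 4
    elif input_mb < 5000:
        want_cpus, want_mem = 4, 8
    else:
        want_cpus, want_mem = 8, 16
    if want_cpus > free_cpus or want_mem > free_mem_gb:
        return None
    return want_cpus, want_mem

print(size_resources(1200, free_cpus=16, free_mem_gb=64))   # -> (4, 8)
```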
APA, Harvard, Vancouver, ISO, and other styles
