Dissertations on the topic "Data Analysis Workflow"
Format your source in APA, MLA, Chicago, Harvard, and other citation styles
Browse the top 34 dissertations for your research on the topic "Data Analysis Workflow".
Next to each work in the list of references you will find an "Add to bibliography" button. Use it, and we will automatically generate a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of a publication as a .pdf file and read its abstract online, provided that these are available in the record's metadata.
Browse dissertations from a wide range of disciplines and compile your bibliography correctly.
Rodrigues, Roberto Wagner da Silva. "Deviation analysis of inter-organisational workflow systems." Thesis, Imperial College London, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.271151.
Marsolo, Keith Allen. "A workflow for the modeling and analysis of biomedical data." Columbus, Ohio : Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1180309265.
Cutler, Darren W., and Tyler J. Rasmussen. "Usability Testing and Workflow Analysis of the TRADOC Data Visualization Tool." Thesis, Monterey, California. Naval Postgraduate School, 2012. http://hdl.handle.net/10945/17350.
The volume of data available to military decision makers is vast. Leaders need tools to sort, analyze, and present information in an effective manner. Software complexity is also increasing, with user interfaces becoming more intricate and interactive. The Data Visualization Tool (DaViTo) is an effort by TRAC Monterey to produce a tool for use by personnel with little statistical background to process and display this data. To meet the program goals and make analytical capabilities more widely available, the user interface and data representation techniques need refinement. This usability test is a task-oriented study using eye-tracking, data representation techniques, and surveys to generate recommendations for software improvement. Twenty-four subjects participated in three sessions using DaViTo over a three-week period. The first two sessions consisted of training followed by basic reinforcement tasks, evaluation of graphical methods, and a brief survey. The final session was a task-oriented session followed by graphical representations evaluation and an extensive survey. Results from the three sessions were analyzed and 37 recommendations generated for the improvement of DaViTo. Improving software latency, providing more graphing options and tools, and inclusion of an effective training product are examples of important recommendations that would greatly improve usability.
Nagavaram, Ashish. "Cloud Based Dynamic Workflow with QOS For Mass Spectrometry Data Analysis." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1322681210.
Kwak, Daniel (Daniel Joowon). "Investigation of intrinsic rotation dependencies in Alcator C-Mod using a new data analysis workflow." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/103705.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 190-193).
Toroidal rotation, important for suppressing various turbulent modes, mitigating MHD instabilities, and preventing locked modes that cause disruptions, may not be sufficiently generated by external devices in larger devices such as ITER. One possible solution is intrinsic rotation, self-generated flow without external momentum input, which has been observed in multiple tokamaks. More specifically, rotation reversals, a sudden change in direction of intrinsic rotation without significant change in global plasma parameters, have also been observed and are not yet fully understood. Studying this phenomenon in ohmic L-mode plasmas presents a rich opportunity to gain better understanding of intrinsic rotation and of momentum transport as a whole. The literature presents many different hypotheses, and this thesis explores three in particular. The first two hypotheses each posit a unique parameter as the primary dependency of reversals: the dominant turbulent mode, i.e. the fastest growing turbulent mode (TEM/ITG), and the local density and temperature profile gradients, especially the electron density gradient, respectively. Other studies state that neoclassical effects cause the reversals, and one study in particular presents a 1-D analytical model. Utilizing a new data analysis workflow built around GYRO, a gyrokinetic-Maxwell solver, hundreds of intrinsic rotation shots at Alcator C-Mod can be processed and analyzed without constant user management, which is used to test the three hypotheses. By comparing the rotation gradient u', a proxy variable indicative of the core toroidal intrinsic rotation velocity, to the parameters identified by the hypotheses, little correlation has been found between u' and the dominant turbulence regime and the ion temperature, electron temperature, and electron density profile gradients. The plasma remains ITG-dominated based on linear stability analysis regardless of rotation direction, and the local profile gradients are not statistically significant in predicting u'. Additionally, the experimental results in C-Mod and ASDEX Upgrade have shown strong disagreement with the 1-D neoclassical model. Strong correlation has been found between u' and the effective collisionality νeff. These findings are inconsistent with previous experimental studies and suggest that further work is required to identify other key dependencies and/or uncover the complex physics and mechanisms at play.
by Daniel (Joowon) Kwak
S.M.
Ba, Mouhamadou. "Composition guidée de services : application aux workflows d’analyse de données en bio-informatique." Thesis, Rennes, INSA, 2015. http://www.theses.fr/2015ISAR0024/document.
In scientific domains, particularly in bioinformatics, elementary services are composed as workflows to perform complex data analysis experiments. Due to the heterogeneity of resources, the composition of services is a difficult task. Users, when composing workflows, lack assistance to find and interconnect compatible services. Existing solutions rely on special services, defined manually, to manage data format conversions between the inputs and outputs of services in workflows, which is difficult for an end user. Managing service incompatibilities with manual converters is time-consuming and cumbersome. Automated solutions exist to facilitate composing workflows, but they are generally limited in the guidance and the data adaptation between services that they offer. The first contribution of this thesis proposes to systematically detect convertibility from outputs to inputs of services. Convertibility detection relies on a rule system based on an abstraction of the input and output types of services. Type abstraction makes it possible to consider the nature and the composition of input and output data. The rules support decomposition and composition as well as specialization and generalization of types. They also make it possible to generate data converters to use between services in workflows. The second contribution proposes an interactive approach that guides users in composing workflows by providing suggestions of compatible services and links, based on the convertibility of the input and output types of services. The approach builds on the framework of Logical Information Systems (LIS), which enables safe and guided querying and navigation over data represented with a uniform logic. With our approach, the composition of workflows is safe and complete with respect to the desired properties. The results and experiments, conducted on bioinformatics services and datatypes, show the relevance of our approaches. Our approaches offer adapted mechanisms to manage service incompatibilities in workflows by taking into account the composite structure of input and output data. They guide users, step by step, to define well-formed workflows through relevant suggestions.
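The convertibility idea summarized above can be pictured with a small sketch. This is an editorial illustration only, not code from the thesis: the type names, the subtype table, and the convertible() function are hypothetical stand-ins for the rule system based on type abstraction that the abstract describes.

```python
# Hypothetical miniature of a convertibility check between service ports.
# A composite type is modelled as a set of named parts; GENERALIZES encodes
# specialization -> generalization rules (all names are illustrative only).

GENERALIZES = {
    "fasta.dna": "fasta",       # a DNA FASTA file is a FASTA file
    "fasta.protein": "fasta",
    "newick": "tree",
}

def generalizations(part: str) -> set[str]:
    """Return the part itself plus everything it can be generalized to."""
    seen = {part}
    while part in GENERALIZES:
        part = GENERALIZES[part]
        seen.add(part)
    return seen

def convertible(output_parts: set[str], input_parts: set[str]) -> bool:
    """An output is convertible to an input if every required input part
    can be obtained from some output part by generalization."""
    available = set().union(*(generalizations(p) for p in output_parts))
    return input_parts <= available

# Example: a service producing a DNA FASTA plus a Newick tree can feed a
# service that only needs a generic FASTA and a tree.
print(convertible({"fasta.dna", "newick"}, {"fasta", "tree"}))  # True
print(convertible({"fasta.dna"}, {"fasta", "tree"}))            # False
```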
Kreiß, Lucas [Verfasser], Oliver [Akademischer Betreuer] Friedrich, and Maximilian [Gutachter] Waldner. "Advanced Optical Technologies for Label-free Tissue Diagnostics - A complete workflow from the optical bench, over experimental studies to data analysis / Lucas Kreiß ; Gutachter: Maximilian Waldner ; Betreuer: Oliver Friedrich." Erlangen : Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 2021. http://d-nb.info/1228627568/34.
Jaradat, Ward. "On the construction of decentralised service-oriented orchestration systems." Thesis, University of St Andrews, 2016. http://hdl.handle.net/10023/8036.
Musaraj, Kreshnik. "Extraction automatique de protocoles de communication pour la composition de services Web." Thesis, Lyon 1, 2010. http://www.theses.fr/2010LYO10288/document.
Business process management, service-oriented architectures and their reverse engineering heavily rely on the fundamental endeavor of mining business process models and Web service business protocols from log files. Model extraction and mining aim at the (re)discovery of the behavior of a running model implementation using solely its interaction and activity traces, and no a priori information on the target model. Our preliminary study shows that: (i) only a minority of interaction data is recorded by process- and service-aware architectures, (ii) a limited number of methods achieve model extraction without knowledge of either positive process and protocol instances or the information needed to infer them, and (iii) the existing approaches rely on restrictive assumptions that only a fraction of real-world Web services satisfy. Enabling the extraction of these interaction models from activity logs under realistic hypotheses requires: (i) approaches that abstract away the business context in order to allow their extended and generic usage, and (ii) tools for assessing the mining result through implementation of the process and service life-cycle. Moreover, since interaction logs are often incomplete, uncertain and contain errors, the mining approaches proposed in this work need to handle these imperfections properly. We propose a set of mathematical models that encompass the different aspects of process and protocol mining. The extraction approaches that we present, derived from linear algebra, allow us to extract the business protocol while merging the classic process mining stages. On the other hand, our protocol representation based on time series of flow density variations makes it possible to recover the temporal order of execution of events and messages in the process. In addition, we propose the concept of proper timeouts to refer to timed transitions, and provide a method for extracting them despite their property of being invisible in logs. Finally, we present a multitask framework aimed at supporting all the steps of the process workflow and business protocol life-cycle, from design to optimization. The approaches presented in this manuscript have been implemented in prototype tools and experimentally validated on scalable datasets and real-world process and Web service models. The discovered business protocols can thus be used to perform a multitude of tasks in an organization or enterprise.
Khemiri, Wael. "Data-intensive interactive workflows for visual analytics." Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00659227.
Chan, Kai Kin. "Managing service-oriented data analysis workflows using semantic web technology." HKBU Institutional Repository, 2009. http://repository.hkbu.edu.hk/etd_ra/1055.
Backlund, Per. "The Use of Patterns in Information System Engineering." Thesis, University of Skövde, Department of Computer Science, 2001. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-619.
The aims of this dissertation are to investigate the use and usefulness of patterns in Information Systems Engineering and to identify future areas of research. In order to do this, there is a need to survey different types of patterns and find a common concept of patterns. A pattern is based on experience found in the real world. A text or a model, or a combination of both, can describe the pattern. A pattern is typically described in terms of context, forces, problem, and solution. These can be explicitly expressed or implicitly found in the description of the pattern.
The types of patterns dealt with are: object-oriented patterns; design patterns; analysis patterns; data model patterns; domain patterns; business patterns; workflow patterns; and the deontic pattern. The different types of patterns are presented using the authors' own terminology.
The patterns described in the survey are classified with respect to different aspects. The intention of this analysis is to form a taxonomy for patterns and to bring order into the vast number of patterns. This is an important step in order to find out how patterns are used and can be used in Information Systems Engineering. The aspects used in the classification are: level of abstraction; text or model emphasis; product or process emphasis; life cycle stage usage; and combinations of these aspects.
Finally, an outline for future areas of research is presented. The areas that have been considered of interest are: patterns and Information Systems Engineering methods; patterns and tools (tool support for patterns); patterns as a pedagogical aid; the extraction and documentation of patterns; and patterns and novel applications of information technology. Each future area of research is sketched out.
Krishnan, Niranjan Rao. "A Web-Based Software Platform for Data Processing Workflows and its Applications in Aerial Data Analysis." University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1562842713394706.
Oluwaseun, Ajayi Olabode. "An evaluation of galaxy and ruffus-scripting workflows system for DNA-seq analysis." University of the Western Cape, 2018. http://hdl.handle.net/11394/6765.
Functional genomics determines the biological functions of genes on a global scale by using large volumes of data obtained through techniques including next-generation sequencing (NGS). The application of NGS in biomedical research is gaining in momentum, and with its adoption becoming more widespread, there is an increasing need for access to customizable computational workflows that can simplify, and offer access to, computer-intensive analyses of genomic data. In this study, pipelines in the Galaxy and Ruffus frameworks were designed and implemented with a view to addressing the challenges faced in biomedical research. Galaxy, a graphical web-based framework, allows researchers to build a graphical NGS data analysis pipeline for accessible, reproducible, and collaborative data-sharing. Ruffus, a UNIX command-line framework used by bioinformaticians as a Python library for writing scripts in an object-oriented style, allows a workflow to be built in terms of task dependencies and execution logic. In this study, a dual data analysis technique was explored which focuses on a comparative evaluation of the Galaxy and Ruffus frameworks used in composing analysis pipelines. To this end, we developed an analysis pipeline in Galaxy, and in Ruffus, for the analysis of Mycobacterium tuberculosis sequence data. Furthermore, this study aimed to compare the Galaxy framework to Ruffus, with preliminary analysis revealing that the analysis pipeline in Galaxy displayed a higher percentage of load and store instructions. In comparison, pipelines in Ruffus tended to be CPU bound and memory intensive. The CPU usage, memory utilization, and runtime execution are graphically represented in this study. Our evaluation suggests that workflow frameworks have distinctly different features, from ease of use, flexibility, and portability, to architectural designs.
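To give a concrete flavor of the Ruffus style of pipeline composition described above, here is a minimal sketch, not taken from the thesis: a two-step read-alignment pipeline expressed as Ruffus task dependencies. The reference file name, sample names, and the bwa/samtools commands are illustrative assumptions; only the ruffus decorators and the pipeline_run call belong to the library's actual API.

```python
# Minimal Ruffus sketch: tasks are chained by file-suffix transformations.
# Assumes bwa and samtools are on PATH and ref.fa / *.fastq exist (illustrative).
import subprocess
from ruffus import transform, suffix, pipeline_run

FASTQ_FILES = ["sampleA.fastq", "sampleB.fastq"]  # hypothetical inputs

@transform(FASTQ_FILES, suffix(".fastq"), ".sam")
def align_reads(input_fastq, output_sam):
    # Align reads against a reference genome with bwa mem.
    with open(output_sam, "w") as out:
        subprocess.run(["bwa", "mem", "ref.fa", input_fastq],
                       stdout=out, check=True)

@transform(align_reads, suffix(".sam"), ".sorted.bam")
def sort_alignments(input_sam, output_bam):
    # Convert and coordinate-sort the alignments with samtools.
    subprocess.run(["samtools", "sort", "-o", output_bam, input_sam],
                   check=True)

if __name__ == "__main__":
    # Ruffus resolves the task graph and only reruns out-of-date steps.
    pipeline_run([sort_alignments], multiprocess=2)
```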
Wagner, Laurent. "MINESTIS, the route to resource estimates." Technische Universitaet Bergakademie Freiberg Universitaetsbibliothek "Georgius Agricola", 2015. http://nbn-resolving.de/urn:nbn:de:bsz:105-qucosa-181676.
Kadkhodamohammadi, Abdolrahim. "3D detection and pose estimation of medical staff in operating rooms using RGB-D images." Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAD047/document.
In this thesis, we address the two problems of person detection and pose estimation in Operating Rooms (ORs), which are key ingredients in the development of surgical assistance applications. We perceive the OR using compact RGB-D cameras that can be conveniently integrated in the room. These sensors provide complementary information about the scene, which enables us to develop methods that can cope with numerous challenges present in the OR, e.g. clutter, textureless surfaces and occlusions. We present novel part-based approaches that take advantage of depth, multi-view and temporal information to construct robust human detection and pose estimation models. Evaluation is performed on new single- and multi-view datasets recorded in operating rooms. We demonstrate very promising results and show that our approaches outperform state-of-the-art methods on this challenging data acquired during real surgeries.
Lemon, Alexander Michael. "A Shared-Memory Coupled Architecture to Leverage Big Data Frameworks in Prototyping and In-Situ Analytics for Data Intensive Scientific Workflows." BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/7545.
Gröbe, Mathias. "Konzeption und Entwicklung eines automatisierten Workflows zur geovisuellen Analyse von georeferenzierten Textdaten(strömen) / Microblogging Content." Master's thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-210672.
This Master's thesis deals with the conception and exemplary implementation of a workflow for georeferenced microblogging content. Data from Twitter is used as an example and as a starting point for thinking about how to build such a workflow. In the fields of data mining and text mining, a whole range of useful software modules was found to exist already. Mostly, they only need to be lined up into a processing pipeline using appropriate settings. Although a logical order can be defined, further adjustments according to the research question and the data are required. The process is supported by different forms of visualization such as histograms, tag clouds and maps. In this way new knowledge can be discovered and the options for the preparation can be improved. This way of knowledge discovery is already known as geovisual analytics. After a review of several existing software tools, the programming language R is used to implement the workflow, as this language is optimized for solving statistical problems. Finally, the workflow has been tested using data from Twitter and Flickr.
Zielasko, Daniel [Verfasser], Torsten [Akademischer Betreuer] Kuhlen, and Benjamin [Akademischer Betreuer] Weyers. "DeskVR: seamless integration of virtual reality into desk-based data analysis workflows / Daniel Zielasko ; Torsten Kuhlen, Benjamin Weyers." Aachen : Universitätsbibliothek der RWTH Aachen, 2020. http://d-nb.info/1220360120/34.
Hafez, Khafaga Ahmed Ibrahem 1987. "Bioinformatics approaches for integration and analysis of fungal omics data oriented to knowledge discovery and diagnosis." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/671160.
The aim of this thesis has been to develop a series of bioinformatic resources for the analysis of NGS data, proteomics, and other omics technologies in the field of the study and diagnosis of yeast infections. In particular, we have explored and designed distinct computational techniques to identify novel biomarker candidates of resistance traits, to predict DNA/RNA sequence features, and to optimize sequencing strategies for host-pathogen transcriptome sequencing studies (Dual RNA-seq). We have designed and developed an efficient bioinformatic solution composed of a server-side component constituted by distinct pipelines for VariantSeq, Denovoseq and RNAseq analyses, as well as another component constituted by distinct GUI-based software that lets the user access, manage and run the pipelines through user-friendly interfaces. We have also designed and developed SeqEditor, a software tool for sequence analysis and primer design for species identification and detection in PCR diagnosis. We have also developed CandidaMine, an integrated data warehouse of fungal omics data for data analysis and knowledge discovery.
Shomroni, Orr [Verfasser], Stefan [Akademischer Betreuer] [Gutachter] Bonn, and Stephan [Gutachter] Waack. "Development of algorithms and next-generation sequencing data workflows for the analysis of gene regulatory networks / Orr Shomroni ; Gutachter: Stefan Bonn, Stephan Waack ; Betreuer: Stefan Bonn." Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2017. http://d-nb.info/1129956350/34.
Emami Khoonsari, Payam. "Proteomics Studies of Subjects with Alzheimer’s Disease and Chronic Pain." Doctoral thesis, Uppsala universitet, Klinisk kemi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-331748.
HUANG, CHI-WEI, and 黃智偉. "Integrating Microarray Data Analysis Services with Web Services and Workflow Infrastructure." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/53326427509156230587.
National Yang-Ming University
Institute of Health Informatics and Decision Making
Academic year 92 (ROC calendar)
Over the past few years, biological research technologies have achieved significant breakthroughs. Numerous organizations and individuals have developed applications and host biological data in the bioinformatics field. However, to use those services, one must visit each service's URL one by one, so the services cannot be used conveniently and comprehensively. If those useful services can be put together in a flexible environment, researchers can use all of them within this environment in an uncomplicated way. For this purpose, we designed an integrated system to address this predicament in bioinformatics. Web services, an emerging technology, use basic and common standards such as XML and HTTP. In this thesis, we propose to adopt Web Services to build a system that integrates several services that are useful tools for bioinformatics researchers. We provide a feasible solution to achieve this goal. First, we constructed an interface that conforms to the Web Services standard and used the R statistical application to implement several web services for remote use. Second, we developed the Registry Editor of the Integrated Analysis Environment to provide an integrated GUI environment. Finally, we designed a Workflow Editor and Engine to schedule analysis tasks using a workflow mechanism. We demonstrated that this system works well for microarray data analysis. The system provides a convenient interface for easy use, and it uses workflows to automate the processing of analysis jobs. The system has achieved its basic functions, but there still remains room for future study.
Liao, Tzu-Huei, and 廖子慧. "The Workflow for Next-Generation Sequencing Data Analysis of Human Genome." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/a9ag69.
National Chiao Tung University
Institute of Statistics
Academic year 101 (ROC calendar)
Next-generation sequencing (NGS) technology is fast and economical. It also offers high output, high resolution, and a low failure rate. We can obtain whole genome sequencing (WGS) and whole exome sequencing (WES) data by NGS technology. Recently, WGS and WES analyses have become the most popular way to analyze disease associations with the genome. They can help us understand biological evolution and compare the differences between individuals. However, it is difficult to process WGS and WES data since these data are too large to store and analyze. At present, there have been few publications about the workflow for analyzing WGS and WES data from beginning to end. Therefore, this thesis offers a general workflow to analyze WGS and WES data. The workflow first aligns raw sequence reads to a reference genome with the software BWA or Bowtie2. Then, we convert between different file formats, mark PCR duplicates, and perform local realignment around indels using the software Picard or samtools. Finally, we use the software GATK to discover variants, analyze depth of coverage, and detect somatic indels. Following this workflow, we can obtain many useful files for subsequent research. In this thesis, the WGS and WES data of sample NA12878 from the 1000 Genomes website are used to illustrate the summarized workflow.
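The steps summarized above can be pictured as a small orchestration script. The sketch below is an editorial illustration under stated assumptions, not code from the thesis: file names are invented, only a subset of the described steps is shown, and the GATK invocations follow the GATK 3.x command line of that era, so exact flags may differ between tool versions.

```python
# Illustrative end-to-end sketch of the described workflow: align with BWA,
# sort and mark duplicates with Picard, then call variants and report
# coverage with GATK. Tool paths and file names are hypothetical.
import subprocess

REF = "reference.fa"            # reference genome (assumed bwa-indexed)
FASTQ = ["NA12878_1.fastq.gz", "NA12878_2.fastq.gz"]

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Align paired-end reads to the reference with BWA-MEM.
with open("NA12878.sam", "w") as sam:
    subprocess.run(["bwa", "mem", REF, *FASTQ], stdout=sam, check=True)

# 2. Sort alignments and mark PCR duplicates with Picard.
run(["java", "-jar", "picard.jar", "SortSam",
     "I=NA12878.sam", "O=NA12878.sorted.bam", "SORT_ORDER=coordinate"])
run(["java", "-jar", "picard.jar", "MarkDuplicates",
     "I=NA12878.sorted.bam", "O=NA12878.dedup.bam",
     "M=dup_metrics.txt", "CREATE_INDEX=true"])

# 3. Call variants and compute depth of coverage with GATK (3.x-style CLI).
run(["java", "-jar", "GenomeAnalysisTK.jar", "-T", "HaplotypeCaller",
     "-R", REF, "-I", "NA12878.dedup.bam", "-o", "NA12878.vcf"])
run(["java", "-jar", "GenomeAnalysisTK.jar", "-T", "DepthOfCoverage",
     "-R", REF, "-I", "NA12878.dedup.bam", "-o", "NA12878.coverage"])
```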
Henriques, David. "Using Building Data Models to Represent Workflows and a Contextual Dimension". Thesis, 2009. http://hdl.handle.net/10012/4649.
Kuhn, Thomas [Verfasser]. "Open source workflow engine for cheminformatics : from data curation to data analysis / vorgelegt von Thomas Kuhn." 2009. http://d-nb.info/994060726/34.
Ginige, Jeewani A., University of Western Sydney, College of Health and Science, and School of Computing and Mathematics. "Change impact analysis to manage process evolution in web workflows." 2008. http://handle.uws.edu.au:8081/1959.7/32727.
Doctor of Philosophy (PhD)
Shomroni, Orr. "Development of algorithms and next-generation sequencing data workflows for the analysis of gene regulatory networks." Doctoral thesis, 2017. http://hdl.handle.net/11858/00-1735-0000-0023-3E0C-8.
Witty, Derick. "Implementation of a Laboratory Information Management System To Manage Genomic Samples." Thesis, 2013. http://hdl.handle.net/1805/3521.
A Laboratory Information Management System (LIMS) is designed to manage laboratory processes and data. It has the ability to extend the core functionality of the LIMS through configuration tools and add-on modules to support the implementation of complex laboratory workflows. The purpose of this project is to demonstrate how laboratory data and processes from a complex workflow can be implemented using a LIMS. Genomic samples have become an important part of the drug development process due to advances in molecular testing technology. This technology evaluates genomic material for disease markers and provides efficient, cost-effective, and accurate results for a growing number of clinical indications. The preparation of the genomic samples for evaluation requires a complex laboratory process called the precision aliquotting workflow. The precision aliquotting workflow processes genomic samples into precisely created aliquots for analysis. The workflow is defined by a set of aliquotting scheme attributes that are executed based on scheme-specific rules logic. The aliquotting scheme defines the attributes of each aliquot based on the achieved sample recovery of the genomic sample. The scheme rules logic executes the creation of the aliquots based on the scheme definitions. LabWare LIMS is a Windows®-based open-architecture system that manages laboratory data and workflow processes. A LabWare LIMS model was developed to implement the precision aliquotting workflow using a combination of core functionality and configured code.
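As a purely illustrative sketch of the kind of scheme-driven rules logic the abstract describes (not LabWare code, and with entirely hypothetical scheme attributes, units, and thresholds), the aliquot-creation step might be modelled like this:

```python
# Hypothetical model of an aliquotting scheme: each rule maps a minimum
# achieved sample recovery (in ng) to the aliquots that should be created.
# All names, units and thresholds are illustrative, not from the project.
from dataclasses import dataclass

@dataclass
class Aliquot:
    label: str
    volume_ul: float

SCHEME = [
    # (minimum recovery in ng, aliquots to create when that recovery is met)
    (1000.0, [Aliquot("primary", 50.0), Aliquot("backup", 50.0), Aliquot("QC", 10.0)]),
    (500.0,  [Aliquot("primary", 50.0), Aliquot("QC", 10.0)]),
    (0.0,    [Aliquot("primary", 25.0)]),
]

def create_aliquots(sample_id: str, recovery_ng: float) -> list[Aliquot]:
    """Pick the first rule whose recovery threshold the sample meets."""
    for min_recovery, aliquots in SCHEME:
        if recovery_ng >= min_recovery:
            print(f"{sample_id}: recovery {recovery_ng} ng -> {len(aliquots)} aliquot(s)")
            return aliquots
    return []

create_aliquots("GS-0001", 742.5)  # falls into the 500 ng rule
```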
Jabour, Abdulrahman M. "Cancer reporting: timeliness analysis and process reengineering." Diss., 2015. http://hdl.handle.net/1805/10481.
Introduction: Cancer registries collect tumor-related data to monitor incidence rates and support population-based research. A common concern with using population-based registry data for research is reporting timeliness. Data timeliness has been recognized as an important data characteristic by both the Centers for Disease Control and Prevention (CDC) and the Institute of Medicine (IOM). Yet, few recent studies in the United States (U.S.) have systematically measured timeliness. The goal of this research is to evaluate the quality of cancer data and examine methods by which the reporting process can be improved. The study aims are: (1) evaluate the timeliness of cancer cases at the Indiana State Department of Health (ISDH) Cancer Registry, (2) identify the perceived barriers and facilitators to timely reporting, and (3) reengineer the current reporting process to improve turnaround time. Method: For Aim 1, using the ISDH dataset from 2000 to 2009, we evaluated the reporting timeliness and subtasks within the process cycle. For Aim 2, certified cancer registrars reporting for ISDH were invited to a semi-structured interview. The interviews were recorded and qualitatively analyzed. For Aim 3, we designed a reengineered workflow to minimize the reporting timeliness and tested it using simulation. Result: The results show variation in the mean reporting time, which ranged from 426 days in 2003 to 252 days in 2009. The barriers identified were categorized into six themes, and the most common barrier was accessing medical records at external facilities. We also found that cases reside for a few months in the local hospital database while waiting for treatment data to become available. The recommended workflow focused on leveraging a health information exchange for data access and adding a notification system to inform registrars when new treatments are available.
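The timeliness measure used in Aim 1 is essentially the elapsed time between diagnosis and registry receipt, averaged per year. A small, hypothetical sketch (the column names and file layout are invented for illustration; this is not the study's actual code) might compute it like this:

```python
# Hypothetical timeliness computation: mean days from diagnosis to report, per year.
# Expects a CSV with columns diagnosis_date and report_date (YYYY-MM-DD), both invented.
import csv
from collections import defaultdict
from datetime import date
from statistics import mean

def mean_reporting_delay(path: str) -> dict[int, float]:
    delays = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            diagnosed = date.fromisoformat(row["diagnosis_date"])
            reported = date.fromisoformat(row["report_date"])
            delays[diagnosed.year].append((reported - diagnosed).days)
    return {year: mean(days) for year, days in sorted(delays.items())}

if __name__ == "__main__":
    for year, avg in mean_reporting_delay("registry_cases.csv").items():
        print(f"{year}: mean reporting time {avg:.0f} days")
```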
"Utilization of automated location tracking for clinical workflow analytics and visualization." Doctoral diss., 2018. http://hdl.handle.net/2286/R.I.51634.
Doctoral dissertation, Biomedical Informatics, 2018.
Gröbe, Mathias. "Konzeption und Entwicklung eines automatisierten Workflows zur geovisuellen Analyse von georeferenzierten Textdaten(strömen) / Microblogging Content." Master's thesis, 2015. https://tud.qucosa.de/id/qucosa%3A29848.
This Master's thesis deals with the conception and exemplary implementation of a workflow for georeferenced microblogging content. Data from Twitter is used as an example and as a starting point for thinking about how to build such a workflow. In the fields of data mining and text mining, a whole range of useful software modules was found to exist already. Mostly, they only need to be lined up into a processing pipeline using appropriate settings. Although a logical order can be defined, further adjustments according to the research question and the data are required. The process is supported by different forms of visualization such as histograms, tag clouds and maps. In this way new knowledge can be discovered and the options for the preparation can be improved. This way of knowledge discovery is already known as geovisual analytics. After a review of several existing software tools, the programming language R is used to implement the workflow, as this language is optimized for solving statistical problems. Finally, the workflow has been tested using data from Twitter and Flickr.
Abu Jabal, Amani M. "Digital Provenance Techniques and Applications." Thesis, 2020.
Σφήκα, Νίκη. "Δυναμική ανάθεση υπολογιστικών πόρων και συντονισμός εκτέλεσης πολύπλοκων διαδικασιών ανάλυσης δεδομένων σε υποδομή Cloud". Thesis, 2015. http://hdl.handle.net/10889/8814.
Cloud Computing is the new software development and service providing model in the area of Information and Communication Technologies. The main aspects of Cloud Computing are the on-demand allocation of computational resources, the remote access to the latter via the Internet, and the elasticity of the provided services. Elasticity provides the capability to scale the computational resources depending on the computational needs. The continuous proliferation of data warehouses, webpages, audio and video streams, tweets, and blogs is generating a massive amount of complex and pervasive digital data. Extracting useful knowledge from huge digital datasets requires smart and scalable analytics services, programming tools, and applications. Due to its elasticity and scalability, Cloud Computing has become an emerging technology for big data analysis, which demands parallelization, complex analysis workflows, and massive computational workloads. In this respect, workflows have an important role in managing complex flows and orchestrating the required processes. A workflow is an orchestrated set of activities that are necessary in order to complete a commercial or scientific task, together with any dependencies between these tasks, since each one of them can be further decomposed into finer tasks that need to be executed in a predefined order. In this thesis, a system is presented that dynamically allocates the available resources provided by a cloud infrastructure and orchestrates the execution of complex and distributed data analysis on these allocated resources. In particular, the system calculates the required computational resources (memory and CPU) based on the size of the input data and on the available resources of the cloud infrastructure, and thus dynamically allocates the most suitable resources. Moreover, the application offers the ability to coordinate the distributed analysis process, utilising workflows for the orchestration and monitoring of the different tasks of the computational flow execution. Taking advantage of the services provided by a cloud infrastructure as well as the functionality of workflows in task management, this thesis has resulted in simplifying the access, control, coordination and execution of complex and parallel data analysis implementations, from the moment that a user enters a set of input data to the computation of the final result. In this context, this thesis focuses on a comprehensive and integrated solution that: 1. provides an application through which the user is able to log in and start a complex data analysis, 2. offers the necessary infrastructure for dynamically allocating the cloud resources, based on the needs of the particular problem, and 3. executes and coordinates the analysis process automatically by leveraging workflows. In order to validate and evaluate the application, the IRaaS platform was developed, offering the ability to solve multi-domain/multi-physics problems. The IRaaS platform is based on the aforementioned system in order to enable the dynamic allocation of computational resources and to coordinate the execution of complex data analysis processes. By executing a series of experiments with different input data, we observed that the presented application resulted in improved execution times, better allocation of computational resources and, thus, lower cost. For the experiments, the IRaaS platform was set up on the cloud infrastructure of the Pattern Recognition laboratory.
In the context of this thesis, a new infrastructure was also installed and configured, based on XenServer as the virtualization hypervisor and the CloudStack platform, for the creation of a private cloud infrastructure.
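To make the resource-calculation step concrete, here is a rough, hypothetical sketch of how input size might be mapped to a CPU/RAM request and checked against the capacity of the infrastructure. The scaling factors, the capacity figures and the function itself are illustrative assumptions, not the allocation policy actually implemented for IRaaS.

```python
# Hypothetical resource estimator: map input size to a CPU/RAM request and
# clamp it to what the cloud infrastructure can currently offer.
from dataclasses import dataclass

@dataclass
class Request:
    cpus: int
    memory_gb: float

# Illustrative capacity of the available cloud resources (not real figures).
AVAILABLE = Request(cpus=16, memory_gb=64.0)

def estimate_resources(input_size_gb: float) -> Request:
    """Simple heuristic: roughly 1 CPU and 2 GB RAM per 5 GB of input, within capacity."""
    wanted_cpus = max(1, round(input_size_gb / 5))
    wanted_mem = max(2.0, input_size_gb * 0.4 + 2.0)
    return Request(cpus=min(wanted_cpus, AVAILABLE.cpus),
                   memory_gb=min(wanted_mem, AVAILABLE.memory_gb))

print(estimate_resources(12.0))   # e.g. Request(cpus=2, memory_gb=6.8)
print(estimate_resources(400.0))  # clamped to the available capacity
```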