Journal articles on the topic 'Data Analysis Workflow'




Consult the top 50 journal articles for your research on the topic 'Data Analysis Workflow.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and the bibliographic reference to the chosen work will be generated automatically in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of a publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Pfaff, Claas-Thido, Karin Nadrowski, Sophia Ratcliffe, Christian Wirth, and Helge Bruelheide. "Readable workflows need simple data." F1000Research 3 (May 14, 2014): 110. http://dx.doi.org/10.12688/f1000research.3940.1.

Full text
Abstract:
Sharing scientific analyses via workflows has great potential to improve the reproducibility of science as well as the communication of research results. This is particularly useful for trans-disciplinary research fields such as biodiversity-ecosystem functioning (BEF), where syntheses need to merge data ranging from genes to the biosphere. Here we argue that enabling simplicity at the very beginning of workflows, at the point of data description and merging, offers great potential for reducing workflow complexity and for fostering data and workflow reuse. We illustrate our points using a typical analysis in BEF research, the aggregation of carbon pools in a forest ecosystem. We introduce indicators for the complexity of workflow components, including data sources. We show that workflow complexity decreases exponentially during the course of the analysis and that simple text-based measures help to identify bottlenecks in a workflow and group workflow components according to tasks. We thus suggest that focusing on simplifying steps of data aggregation and imputation will greatly improve workflow readability and thus reproducibility. Providing feedback to data providers about the complexity of their datasets may help to produce better-focused data that can be used more easily in further studies. At the same time, providing feedback about the complexity of workflow components may help to exchange shorter and simpler workflows for easier reuse. Additionally, identifying repetitive tasks informs software development in providing automated solutions. We discuss current initiatives in software and script development that implement quality control for simplicity and social tools of script valuation. Taken together, we argue that focusing on simplifying data sources and workflow components will improve and accelerate data and workflow reuse and simplify the reproducibility of data-driven science.
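The abstract above proposes simple, text-based indicators of workflow-component complexity. As a rough illustration only (the measures and the script name below are invented stand-ins, not the indicators defined in the paper), a minimal Python sketch that scores an analysis script by code lines, apparent data-loading calls and function definitions:

```python
import re
from pathlib import Path

def complexity_indicators(script_path):
    """Crude text-based complexity indicators for an analysis script.

    Counts non-empty code lines, apparent data-loading calls and function
    definitions. These are illustrative stand-ins, not the paper's indicators.
    """
    text = Path(script_path).read_text()
    lines = [l for l in text.splitlines()
             if l.strip() and not l.strip().startswith(("#", "//"))]
    data_reads = re.findall(r"\b(read\.csv|read_csv|read\.table|load|read_excel)\s*\(", text)
    functions = re.findall(r"\b(def |function\s*\()", text)
    return {
        "code_lines": len(lines),
        "data_sources": len(data_reads),
        "function_definitions": len(functions),
    }

if __name__ == "__main__":
    # Hypothetical script name used only for the example call.
    print(complexity_indicators("aggregate_carbon_pools.R"))
```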
APA, Harvard, Vancouver, ISO, and other styles
2

Song, Tianhong, Sven Köhler, Bertram Ludäscher, James Hanken, Maureen Kelly, David Lowery, James A. Macklin, Paul J. Morris, and Robert A. Morris. "Towards Automated Design, Analysis and Optimization of Declarative Curation Workflows." International Journal of Digital Curation 9, no. 2 (October 29, 2014): 111–22. http://dx.doi.org/10.2218/ijdc.v9i2.337.

Full text
Abstract:
Data curation is increasingly important. Our previous work on a Kepler curation package has demonstrated advantages that come from automating data curation pipelines by using workflow systems. However, manually designed curation workflows can be error-prone and inefficient due to a lack of user understanding of the workflow system, misuse of actors, or human error. Correcting problematic workflows is often very time-consuming. A more proactive workflow system can help users avoid such pitfalls. For example, static analysis before execution can be used to detect the potential problems in a workflow and help the user to improve workflow design. In this paper, we propose a declarative workflow approach that supports semi-automated workflow design, analysis and optimization. We show how the workflow design engine helps users to construct data curation workflows, how the workflow analysis engine detects different design problems of workflows and how workflows can be optimized by exploiting parallelism.
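As a hedged illustration of the kind of pre-execution static analysis described above (a toy model, not the Kepler-based curation package or its declarative engine), the sketch below represents curation actors by their named inputs and outputs and flags dangling inputs and cyclic dependencies before anything is run:

```python
# A toy declarative workflow: each actor declares the data it consumes and produces.
# Actor and data names are illustrative assumptions.
workflow = {
    "parse_records":  {"inputs": ["raw_file"],      "outputs": ["records"]},
    "validate_dates": {"inputs": ["records"],       "outputs": ["dated_records"]},
    "georeference":   {"inputs": ["dated_records"], "outputs": ["curated_records"]},
    "report":         {"inputs": ["curated_records", "taxon_list"], "outputs": ["summary"]},
}
provided = {"raw_file"}  # data supplied by the user

def static_checks(workflow, provided):
    produced = {o for a in workflow.values() for o in a["outputs"]}
    # 1) Dangling inputs: consumed but never produced or provided.
    dangling = [(name, i) for name, a in workflow.items()
                for i in a["inputs"] if i not in produced | provided]
    # 2) Cycle detection over the data-dependency graph.
    deps = {name: [other for other, b in workflow.items()
                   if set(b["outputs"]) & set(a["inputs"])]
            for name, a in workflow.items()}
    visited, stack, cycles = set(), set(), []
    def visit(n):
        if n in stack:
            cycles.append(n)
            return
        if n in visited:
            return
        visited.add(n)
        stack.add(n)
        for m in deps[n]:
            visit(m)
        stack.remove(n)
    for n in workflow:
        visit(n)
    return {"dangling_inputs": dangling, "actors_in_cycles": cycles}

print(static_checks(workflow, provided))  # flags the unprovided 'taxon_list' input
```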
APA, Harvard, Vancouver, ISO, and other styles
3

Hribar, Michelle R., Sarah Read-Brown, Isaac H. Goldstein, Leah G. Reznick, Lorinna Lombardi, Mansi Parikh, Winston Chamberlain, and Michael F. Chiang. "Secondary use of electronic health record data for clinical workflow analysis." Journal of the American Medical Informatics Association 25, no. 1 (September 26, 2017): 40–46. http://dx.doi.org/10.1093/jamia/ocx098.

Full text
Abstract:
Objective: Outpatient clinics lack guidance for tackling modern efficiency and productivity demands. Workflow studies require large amounts of timing data that are prohibitively expensive to collect through observation or tracking devices. Electronic health records (EHRs) contain a vast amount of timing data – timestamps collected during regular use – that can be mapped to workflow steps. This study validates using EHR timestamp data to predict outpatient ophthalmology clinic workflow timings at Oregon Health and Science University and demonstrates their usefulness in 3 different studies. Materials and Methods: Four outpatient ophthalmology clinics were observed to determine their workflows and to time each workflow step. EHR timestamps were mapped to the workflow steps and validated against the observed timings. Results: The EHR timestamp analysis produced times that were within 3 min of the observed times for >80% of the appointments. EHR use patterns affected the accuracy of using EHR timestamps to predict workflow times. Discussion: EHR timestamps provided a reasonable approximation of workflow and can be used for workflow studies. They can be used to create simulation models, analyze EHR use, and quantify the impact of trainees on workflow. Conclusion: The secondary use of EHR timestamp data is a valuable resource for clinical workflow studies. Sample timestamp data files and algorithms for processing them are provided and can be used as a template for more studies in other clinical specialties and settings.
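A minimal sketch of the core idea, turning EHR timestamps that have been mapped to workflow steps into per-step durations, using pandas on made-up events (not the OHSU data or the paper's algorithms):

```python
import pandas as pd

# Hypothetical EHR timestamp events already mapped to workflow steps.
events = pd.DataFrame({
    "appointment_id": [1, 1, 1, 2, 2, 2],
    "step": ["check_in", "technician_exam", "physician_exam"] * 2,
    "timestamp": pd.to_datetime([
        "2024-01-08 08:00", "2024-01-08 08:12", "2024-01-08 08:35",
        "2024-01-08 08:30", "2024-01-08 08:47", "2024-01-08 09:20",
    ]),
})

events = events.sort_values(["appointment_id", "timestamp"])
# Duration of each step = time until the next event within the same appointment.
events["duration_min"] = (
    events.groupby("appointment_id")["timestamp"]
          .transform(lambda s: s.shift(-1) - s)
          .dt.total_seconds() / 60
)
print(events.groupby("step")["duration_min"].mean())
```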
APA, Harvard, Vancouver, ISO, and other styles
4

Suetake, Hirotaka, Tomoya Tanjo, Manabu Ishii, Bruno P. Kinoshita, Takeshi Fujino, Tsuyoshi Hachiya, Yuichi Kodama, et al. "Sapporo: A workflow execution service that encourages the reuse of workflows in various languages in bioinformatics." F1000Research 11 (August 4, 2022): 889. http://dx.doi.org/10.12688/f1000research.122924.1.

Full text
Abstract:
The increased demand for efficient computation in data analysis encourages researchers in biomedical science to use workflow systems. Workflow systems, or so-called workflow languages, are used for the description and execution of a set of data analysis steps. Workflow systems increase the productivity of researchers, specifically in fields that use high-throughput DNA sequencing applications, where scalable computation is required. As systems have improved the portability of data analysis workflows, research communities are able to share workflows to reduce the cost of building ordinary analysis procedures. However, having multiple workflow systems in a research field has resulted in the distribution of efforts across different workflow system communities. As each workflow system has its unique characteristics, it is not feasible to learn every single system in order to use publicly shared workflows. Thus, we developed Sapporo, an application to provide a unified layer of workflow execution upon the differences of various workflow systems. Sapporo has two components: an application programming interface (API) that receives the request of a workflow run and a browser-based client for the API. The API follows the Workflow Execution Service API standard proposed by the Global Alliance for Genomics and Health. The current implementation supports the execution of workflows in four languages: Common Workflow Language, Workflow Description Language, Snakemake, and Nextflow. With its extensible and scalable design, Sapporo can support the research community in utilizing valuable resources for data analysis.
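Sapporo exposes the GA4GH Workflow Execution Service (WES) API. The sketch below submits a workflow run and polls its state with the requests library; the base URL, port and path prefix are assumptions (deployments may mount the API under a /ga4gh/wes/v1 prefix), the workflow URL and parameters are hypothetical, and Sapporo-specific fields such as naming the workflow engine are omitted here:

```python
import json
import requests

WES_BASE = "http://localhost:1122"  # assumed local WES/Sapporo endpoint

def submit_run(workflow_url, params, wf_type="CWL", wf_type_version="v1.0"):
    """Submit a workflow run via the GA4GH WES API (POST /runs)."""
    fields = {
        "workflow_url": workflow_url,
        "workflow_type": wf_type,
        "workflow_type_version": wf_type_version,
        "workflow_params": json.dumps(params),
    }
    # WES expects multipart/form-data; the (None, value) trick sends plain form fields.
    resp = requests.post(f"{WES_BASE}/runs",
                         files={k: (None, v) for k, v in fields.items()})
    resp.raise_for_status()
    return resp.json()["run_id"]

def run_state(run_id):
    """Poll the run state via GET /runs/{run_id}/status."""
    resp = requests.get(f"{WES_BASE}/runs/{run_id}/status")
    resp.raise_for_status()
    return resp.json()["state"]

if __name__ == "__main__":
    run_id = submit_run(
        "https://example.org/workflows/trimming.cwl",  # hypothetical workflow location
        {"fastq": {"class": "File", "path": "sample.fastq.gz"}},
    )
    print(run_id, run_state(run_id))
```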
APA, Harvard, Vancouver, ISO, and other styles
5

Thang, Mike W. C., Xin-Yi Chua, Gareth Price, Dominique Gorse, and Matt A. Field. "MetaDEGalaxy: Galaxy workflow for differential abundance analysis of 16s metagenomic data." F1000Research 8 (May 23, 2019): 726. http://dx.doi.org/10.12688/f1000research.18866.1.

Full text
Abstract:
Metagenomic sequencing is an increasingly common tool in environmental and biomedical sciences, yet analysis workflows remain immature relative to other fields such as DNASeq and RNASeq analysis pipelines. While software for detailing the composition of microbial communities using 16S rRNA marker genes is constantly improving, increasingly researchers are interested in identifying changes exhibited within microbial communities under differing environmental conditions. In order to gain maximum value from metagenomic sequence data we must improve the existing analysis environment by providing accessible and scalable computational workflows able to generate reproducible results. Here we describe a complete end-to-end open-source metagenomics workflow running within Galaxy for 16S differential abundance analysis. The workflow accepts 454 or Illumina sequence data (either overlapping or non-overlapping paired end reads) and outputs lists of the operational taxonomic units (OTUs) exhibiting the greatest change under differing conditions. A range of analysis steps and graphing options are available, giving users a high level of control over their data and analyses. Additionally, users are able to input complex sample-specific metadata information which can be incorporated into differential analysis and used for grouping/colouring within graphs. Detailed tutorials containing sample data and existing workflows are available for three different input types: overlapping and non-overlapping read pairs as well as for pre-generated Biological Observation Matrix (BIOM) files. Using the Galaxy platform we developed MetaDEGalaxy, a complete metagenomics differential abundance analysis workflow. MetaDEGalaxy is designed for bench scientists working with 16S data who are interested in comparative metagenomics. MetaDEGalaxy builds on momentum within the wider Galaxy metagenomics community with the hope that more tools will be added as existing methods mature.
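MetaDEGalaxy itself is a Galaxy workflow, but the comparison at its core can be illustrated generically. A toy sketch (invented OTU counts, naive relative-abundance normalisation and a per-OTU Mann-Whitney U test; not the workflow's actual methods or tools):

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Toy OTU count table: rows = OTUs, columns = samples.
counts = pd.DataFrame(
    {"ctrl_1": [120, 30, 5], "ctrl_2": [100, 25, 8], "ctrl_3": [90, 40, 4],
     "trt_1": [20, 80, 30], "trt_2": [25, 95, 22], "trt_3": [30, 70, 35]},
    index=["OTU_1", "OTU_2", "OTU_3"],
)
groups = {"control": ["ctrl_1", "ctrl_2", "ctrl_3"],
          "treatment": ["trt_1", "trt_2", "trt_3"]}

rel = counts / counts.sum(axis=0)  # relative abundance per sample

results = []
for otu, row in rel.iterrows():
    stat, p = mannwhitneyu(row[groups["control"]], row[groups["treatment"]],
                           alternative="two-sided")
    results.append({"otu": otu, "p_value": p,
                    "mean_ctrl": row[groups["control"]].mean(),
                    "mean_trt": row[groups["treatment"]].mean()})

print(pd.DataFrame(results).sort_values("p_value"))
```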
APA, Harvard, Vancouver, ISO, and other styles
6

Šimko, Tibor, Lukas Heinrich, Harri Hirvonsalo, Dinos Kousidis, and Diego Rodríguez. "REANA: A System for Reusable Research Data Analyses." EPJ Web of Conferences 214 (2019): 06034. http://dx.doi.org/10.1051/epjconf/201921406034.

Full text
Abstract:
The revalidation, reinterpretation and reuse of research data analyses requires having access to the original computing environment, the experimental datasets, the analysis software, and the computational workflow steps which were used by researchers to produce the original scientific results in the first place. REANA (Reusable Analyses) is a nascent platform enabling researchers to structure their research data analyses in view of enabling future reuse. The analysis is described by means of a YAML file that captures sufficient information about the analysis assets, parameters and processes. The REANA platform consists of a set of micro-services that allow users to launch and monitor container-based computational workflow jobs on the cloud. The REANA user interface and the command-line client enable researchers to easily rerun analysis workflows with new input parameters. The REANA platform aims at supporting several container technologies (Docker), workflow engines (CWL, Yadage), shared storage systems (Ceph, EOS) and compute cloud infrastructures (Kubernetes/OpenStack, HTCondor) used by the community. REANA was developed with the particle physics use case in mind and profits from synergies with general reusable research data analysis patterns in other scientific disciplines, such as bioinformatics and life sciences.
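REANA analyses are described declaratively in a YAML file. As a hedged sketch, the snippet below composes an approximate reana.yaml-style description in Python and serialises it with PyYAML; the exact field names and engine types should be checked against the REANA documentation, and the file names and parameters are invented:

```python
import yaml  # PyYAML, assumed installed

# Approximate shape of a REANA analysis description (serial workflow engine).
analysis = {
    "inputs": {
        "files": ["code/fit.py", "data/events.csv"],
        "parameters": {"nbins": 50},
    },
    "workflow": {
        "type": "serial",
        "specification": {
            "steps": [
                {"environment": "python:3.11",
                 "commands": ["python code/fit.py --nbins ${nbins} "
                              "--input data/events.csv --output results/fit.png"]},
            ]
        },
    },
    "outputs": {"files": ["results/fit.png"]},
}

with open("reana.yaml", "w") as fh:
    yaml.safe_dump(analysis, fh, sort_keys=False)
print(open("reana.yaml").read())
```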
APA, Harvard, Vancouver, ISO, and other styles
7

Souza, Renan, Vitor Silva, Alexandre A. B. Lima, Daniel de Oliveira, Patrick Valduriez, and Marta Mattoso. "Distributed in-memory data management for workflow executions." PeerJ Computer Science 7 (May 7, 2021): e527. http://dx.doi.org/10.7717/peerj-cs.527.

Full text
Abstract:
Complex scientific experiments from various domains are typically modeled as workflows and executed on large-scale machines using a Parallel Workflow Management System (WMS). Since such executions usually last for hours or days, some WMSs provide user steering support, i.e., they allow users to run data analyses and, depending on the results, adapt the workflows at runtime. A challenge in the parallel execution control design is to manage workflow data for efficient executions while enabling user steering support. Data access for high scalability is typically transaction-oriented, while for data analysis, it is online analytical-oriented so that managing such hybrid workloads makes the challenge even harder. In this work, we present SchalaDB, an architecture with a set of design principles and techniques based on distributed in-memory data management for efficient workflow execution control and user steering. We propose a distributed data design for scalable workflow task scheduling and high availability driven by a parallel and distributed in-memory DBMS. To evaluate our proposal, we develop d-Chiron, a WMS designed according to SchalaDB’s principles. We carry out an extensive experimental evaluation on an HPC cluster with up to 960 computing cores. Among other analyses, we show that even when running data analyses for user steering, SchalaDB’s overhead is negligible for workloads composed of hundreds of concurrent tasks on shared data. Our results encourage workflow engine developers to follow a parallel and distributed data-oriented approach not only for scheduling and monitoring but also for user steering.
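SchalaDB and d-Chiron rely on a parallel, distributed in-memory DBMS, which is beyond a short example; the single-node stand-in below uses SQLite's in-memory mode only to illustrate the hybrid workload the abstract describes, i.e. transactional task-status updates interleaved with the analytical queries used for user steering:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # single-node stand-in for a distributed in-memory DBMS
db.execute("""CREATE TABLE task (
    id INTEGER PRIMARY KEY, workflow TEXT, state TEXT, runtime_s REAL)""")

# Transaction-oriented workload: the engine updates task states as execution proceeds.
tasks = [(i, "synthesis_wf", "RUNNING", None) for i in range(1, 6)]
db.executemany("INSERT INTO task VALUES (?, ?, ?, ?)", tasks)
db.execute("UPDATE task SET state = 'FINISHED', runtime_s = 12.4 WHERE id = 1")
db.execute("UPDATE task SET state = 'FINISHED', runtime_s = 15.1 WHERE id = 2")
db.commit()

# Analytical workload: the user steers the run based on aggregate progress.
for row in db.execute("""
        SELECT state, COUNT(*) AS n, AVG(runtime_s) AS avg_runtime
        FROM task GROUP BY state"""):
    print(row)
```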
APA, Harvard, Vancouver, ISO, and other styles
8

Thang, Mike W. C., Xin-Yi Chua, Gareth Price, Dominique Gorse, and Matt A. Field. "MetaDEGalaxy: Galaxy workflow for differential abundance analysis of 16s metagenomic data." F1000Research 8 (October 18, 2019): 726. http://dx.doi.org/10.12688/f1000research.18866.2.

Full text
Abstract:
Metagenomic sequencing is an increasingly common tool in environmental and biomedical sciences. While software for detailing the composition of microbial communities using 16S rRNA marker genes is relatively mature, increasingly researchers are interested in identifying changes exhibited within microbial communities under differing environmental conditions. In order to gain maximum value from metagenomic sequence data we must improve the existing analysis environment by providing accessible and scalable computational workflows able to generate reproducible results. Here we describe a complete end-to-end open-source metagenomics workflow running within Galaxy for 16S differential abundance analysis. The workflow accepts 454 or Illumina sequence data (either overlapping or non-overlapping paired end reads) and outputs lists of the operational taxonomic units (OTUs) exhibiting the greatest change under differing conditions. A range of analysis steps and graphing options are available, giving users a high level of control over their data and analyses. Additionally, users are able to input complex sample-specific metadata information which can be incorporated into differential analysis and used for grouping/colouring within graphs. Detailed tutorials containing sample data and existing workflows are available for three different input types: overlapping and non-overlapping read pairs as well as for pre-generated Biological Observation Matrix (BIOM) files. Using the Galaxy platform we developed MetaDEGalaxy, a complete metagenomics differential abundance analysis workflow. MetaDEGalaxy is designed for bench scientists working with 16S data who are interested in comparative metagenomics. MetaDEGalaxy builds on momentum within the wider Galaxy metagenomics community with the hope that more tools will be added as existing methods mature.
APA, Harvard, Vancouver, ISO, and other styles
9

Curcin, Vasa, Moustafa Ghanem, and Yike Guo. "The design and implementation of a workflow analysis tool." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 368, no. 1926 (September 13, 2010): 4193–208. http://dx.doi.org/10.1098/rsta.2010.0157.

Full text
Abstract:
Motivated by the use of scientific workflows as a user-oriented mechanism for building executable scientific data integration and analysis applications, this article introduces a framework and a set of associated methods for analysing the execution properties of scientific workflows. Our framework uses a number of formal modelling techniques to characterize the process and data behaviour of workflows and workflow components and to reason about their functional and execution properties. We use the framework to design the architecture of a customizable tool that can be used to analyse the key execution properties of scientific workflows at authoring stage. Our design is generic and can be applied to a wide variety of scientific workflow languages and systems, and is evaluated by building a prototype of the tool for the Discovery Net system. We demonstrate and discuss the utility of the framework and tool using workflows from a real-world medical informatics study.
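A hedged sketch of one authoring-time check such a framework could perform (a toy graph model, not Discovery Net's formalism): verify that every component is reachable from the workflow's data sources and that connected ports carry compatible data types:

```python
# Toy workflow graph: components with typed input/output ports and directed connections.
components = {
    "load_patients": {"out_type": "table"},
    "clean_records": {"in_type": "table", "out_type": "table"},
    "train_model":   {"in_type": "table", "out_type": "model"},
    "plot_roc":      {"in_type": "model", "out_type": "figure"},
    "orphan_filter": {"in_type": "table", "out_type": "table"},  # deliberately unconnected
}
edges = [("load_patients", "clean_records"),
         ("clean_records", "train_model"),
         ("train_model", "plot_roc")]
sources = ["load_patients"]

# Reachability from the data sources.
reachable, frontier = set(sources), list(sources)
while frontier:
    node = frontier.pop()
    for a, b in edges:
        if a == node and b not in reachable:
            reachable.add(b)
            frontier.append(b)
unreachable = set(components) - reachable

# Type compatibility on every connection.
mismatches = [(a, b) for a, b in edges
              if components[a]["out_type"] != components[b]["in_type"]]

print("unreachable components:", unreachable)
print("type mismatches:", mismatches)
```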
APA, Harvard, Vancouver, ISO, and other styles
10

Jackson, Michael, Kostas Kavoussanakis, and Edward W. J. Wallace. "Using prototyping to choose a bioinformatics workflow management system." PLOS Computational Biology 17, no. 2 (February 25, 2021): e1008622. http://dx.doi.org/10.1371/journal.pcbi.1008622.

Full text
Abstract:
Workflow management systems represent, manage, and execute multistep computational analyses and offer many benefits to bioinformaticians. They provide a common language for describing analysis workflows, contributing to reproducibility and to building libraries of reusable components. They can support both incremental build and re-entrancy—the ability to selectively re-execute parts of a workflow in the presence of additional inputs or changes in configuration and to resume execution from where a workflow previously stopped. Many workflow management systems enhance portability by supporting the use of containers, high-performance computing (HPC) systems, and clouds. Most importantly, workflow management systems allow bioinformaticians to delegate how their workflows are run to the workflow management system and its developers. This frees the bioinformaticians to focus on what these workflows should do, on their data analyses, and on their science. RiboViz is a package to extract biological insight from ribosome profiling data to help advance understanding of protein synthesis. At the heart of RiboViz is an analysis workflow, implemented in a Python script. To conform to best practices for scientific computing which recommend the use of build tools to automate workflows and to reuse code instead of rewriting it, the authors reimplemented this workflow within a workflow management system. To select a workflow management system, a rapid survey of available systems was undertaken, and candidates were shortlisted: Snakemake, cwltool, Toil, and Nextflow. Each candidate was evaluated by quickly prototyping a subset of the RiboViz workflow, and Nextflow was chosen. The selection process took 10 person-days, a small cost for the assurance that Nextflow satisfied the authors’ requirements. The use of prototyping can offer a low-cost way of making a more informed selection of software to use within projects, rather than relying solely upon reviews and recommendations by others.
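As a loose illustration of how prototype findings for the shortlisted systems might be tallied (the criteria, weights and scores below are invented, not the authors' assessment), a simple weighted scoring sketch:

```python
# Invented scores (0-5) from prototyping each candidate against requirements.
criteria_weights = {"ease_of_porting": 3, "container_support": 2,
                    "hpc_support": 2, "documentation": 1}
scores = {
    "Snakemake": {"ease_of_porting": 4, "container_support": 4, "hpc_support": 4, "documentation": 4},
    "cwltool":   {"ease_of_porting": 2, "container_support": 4, "hpc_support": 2, "documentation": 3},
    "Toil":      {"ease_of_porting": 2, "container_support": 3, "hpc_support": 4, "documentation": 3},
    "Nextflow":  {"ease_of_porting": 5, "container_support": 5, "hpc_support": 4, "documentation": 4},
}

totals = {tool: sum(criteria_weights[c] * s for c, s in per.items())
          for tool, per in scores.items()}
for tool, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{tool:10s} {total}")
```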
APA, Harvard, Vancouver, ISO, and other styles
11

Weigel, Tobias, Ulrich Schwardmann, Jens Klump, Sofiane Bendoukha, and Robert Quick. "Making Data and Workflows Findable for Machines." Data Intelligence 2, no. 1-2 (January 2020): 40–46. http://dx.doi.org/10.1162/dint_a_00026.

Full text
Abstract:
Research data currently face a huge increase of data objects with an increasing variety of types (data types, formats) and variety of workflows by which objects need to be managed across their lifecycle by data infrastructures. Researchers desire to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services in terms of not only making data and workflows findable, accessible, interoperable and reusable, but also doing so in a way that leverages machine support for better efficiency. One primary need to be addressed is that of findability, and achieving better findability has benefits for other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations and machine learning techniques.
APA, Harvard, Vancouver, ISO, and other styles
12

Romano, P., G. Bertolini, F. De Paoli, M. Fattore, D. Marra, G. Mauri, E. Merelli, I. Porro, S. Scaglione, and L. Milanesi. "Network integration of data and analysis of oncology interest." Journal of Integrative Bioinformatics 3, no. 1 (June 1, 2006): 45–55. http://dx.doi.org/10.1515/jib-2006-21.

Full text
Abstract:
Summary The Human Genome Project has deeply transformed biology and the field has since then expanded to the management, processing, analysis and visualization of large quantities of data from genomics, proteomics, medicinal chemistry and drug screening. This huge amount of data and the heterogeneity of software tools that are used implies the adoption on a very large scale of new, flexible tools that can enable researchers to integrate data and analysis on the network. ICT technology standards and tools, like Web Services and related languages, and workflow management systems, can support the creation and deployment of such systems. While a number of Web Services are appearing and personal workflow management systems are also being more and more offered to researchers, a reference portal enabling the vast majority of unskilled researchers to take profit from these new technologies is still lacking. In this paper, we introduce the rationale for the creation of such a portal and present the architecture and some preliminary results for the development of a portal for the enactment of workflows of interest in oncology.
APA, Harvard, Vancouver, ISO, and other styles
13

Stoudt, Sara, Váleri N. Vásquez, and Ciera C. Martinez. "Principles for data analysis workflows." PLOS Computational Biology 17, no. 3 (March 18, 2021): e1008770. http://dx.doi.org/10.1371/journal.pcbi.1008770.

Full text
Abstract:
A systematic and reproducible “workflow”—the process that moves a scientific investigation from raw data to coherent research question to insightful contribution—should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining 3 phases: the Explore, Refine, and Produce Phases. Each phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between design principles and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggestions for practices and tools to advance reproducible, sound data-intensive analysis may furnish support for both students new to research and current researchers who are new to data-intensive work.
APA, Harvard, Vancouver, ISO, and other styles
14

Dallmeier-Tiessen, Sünje, Varsha Khodiyar, Fiona Murphy, Amy Nurnberger, Lisa Raymond, and Angus Whyte. "Connecting Data Publication to the Research Workflow: A Preliminary Analysis." International Journal of Digital Curation 12, no. 1 (September 16, 2017): 88–105. http://dx.doi.org/10.2218/ijdc.v12i1.533.

Full text
Abstract:
The data curation community has long encouraged researchers to document collected research data during active stages of the research workflow, to provide robust metadata earlier, and support research data publication and preservation. Data documentation with robust metadata is one of a number of steps in effective data publication. Data publication is the process of making digital research objects ‘FAIR’, i.e. findable, accessible, interoperable, and reusable; attributes increasingly expected by research communities, funders and society. Research data publishing workflows are the means to that end. Currently, however, much published research data remains inconsistently and inadequately documented by researchers. Documentation of data closer in time to data collection would help mitigate the high cost that repositories associate with the ingest process. More effective data publication and sharing should in principle result from early interactions between researchers and their selected data repository. This paper describes a short study undertaken by members of the Research Data Alliance (RDA) and World Data System (WDS) working group on Publishing Data Workflows. We present a collection of recent examples of data publication workflows that connect data repositories and publishing platforms with research activity ‘upstream’ of the ingest process. We re-articulate previous recommendations of the working group, to account for the varied upstream service components and platforms that support the flow of contextual and provenance information downstream. These workflows should be open and loosely coupled to support interoperability, including with preservation and publication environments. Our recommendations aim to stimulate further work on researchers’ views of data publishing and the extent to which available services and infrastructure facilitate the publication of FAIR data. We also aim to stimulate further dialogue about, and definition of, the roles and responsibilities of research data services and platform providers for the ‘FAIRness’ of research data publication workflows themselves.
APA, Harvard, Vancouver, ISO, and other styles
15

Emami Khoonsari, Payam, Pablo Moreno, Sven Bergmann, Joachim Burman, Marco Capuccini, Matteo Carone, Marta Cascante, et al. "Interoperable and scalable data analysis with microservices: applications in metabolomics." Bioinformatics 35, no. 19 (March 9, 2019): 3752–60. http://dx.doi.org/10.1093/bioinformatics/btz160.

Full text
Abstract:
Motivation: Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator. Results: We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and developing scalable and interoperable workflows for performing metabolomics data analysis. The environment can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry, one nuclear magnetic resonance spectroscopy and one fluxomics study. We showed that the method scales dynamically with increasing availability of computational resources. We demonstrated that the method facilitates interoperability using integration of the major software suites, resulting in a turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics, including preprocessing, statistics and identification. Microservices is a generic methodology that can serve any scientific discipline and opens up new types of large-scale integrative science. Availability and implementation: The PhenoMeNal consortium maintains a web portal (https://portal.phenomenal-h2020.eu) providing a GUI for launching the Virtual Research Environment. The GitHub repository https://github.com/phnmnl/ hosts the source code of all projects. Supplementary information: Supplementary data are available at Bioinformatics online.
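In the microservice approach described above, each tool runs as a Docker container and orchestration is handled by Kubernetes, which is more than a short example can show. A minimal stand-in that invokes one containerised workflow step from Python; the image name, command and paths are illustrative:

```python
import subprocess

def run_container_step(image, command, host_dir, container_dir="/data"):
    """Run one containerised workflow step with Docker, mounting a data directory."""
    cmd = ["docker", "run", "--rm",
           "-v", f"{host_dir}:{container_dir}",
           image] + command
    subprocess.run(cmd, check=True)

# Illustrative step: a hypothetical preprocessing image operating on mounted data.
run_container_step(
    image="example/metabolomics-preproc:latest",
    command=["preprocess", "--input", "/data/raw.mzML", "--output", "/data/peaks.csv"],
    host_dir="/tmp/study01",
)
```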
APA, Harvard, Vancouver, ISO, and other styles
16

Khan, Fakhri Alam, Sardar Hussain, Ivan Janciak, and Peter Brezany. "Towards Next Generation Provenance Systems for e-Science." International Journal of Information System Modeling and Design 2, no. 3 (July 2011): 24–48. http://dx.doi.org/10.4018/jismd.2011070102.

Full text
Abstract:
e-Science helps scientists to automate scientific discovery processes and experiments, and promote collaboration across organizational boundaries and disciplines. These experiments involve data discovery, knowledge discovery, integration, linking, and analysis through different software tools and activities. Scientific workflow is one technique through which such activities and processes can be interlinked, automated, and ultimately shared amongst the collaborating scientists. Workflows are realized by the workflow enactment engine, which interprets the process definition and interacts with the workflow participants. Since workflows are typically executed on a shared and distributed infrastructure, the information on the workflow activities, data processed, and results generated (also known as provenance), needs to be recorded in order to be reproduced and reused. A range of solutions and techniques have been suggested for the provenance of data collection and analysis; however, these are predominantly workflow enactment engine and domain dependent. This paper includes taxonomy of existing provenance techniques and a novel solution named VePS (The Vienna e-Science Provenance System) for e-Science provenance collection.
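As a generic illustration of the kind of record a provenance-aware enactment engine might capture per activity (an invented JSON structure, not the VePS schema), a small sketch:

```python
import json
import hashlib
import platform
from datetime import datetime, timezone

def provenance_record(activity, inputs, outputs, parameters):
    """Capture a minimal provenance record for one workflow activity."""
    def digest(path):
        try:
            with open(path, "rb") as fh:
                return hashlib.sha256(fh.read()).hexdigest()
        except OSError:
            return None  # file may not exist in this illustrative run
    return {
        "activity": activity,
        "recorded": datetime.now(timezone.utc).isoformat(),
        "host": platform.node(),
        "parameters": parameters,
        "inputs": {p: digest(p) for p in inputs},
        "outputs": {p: digest(p) for p in outputs},
    }

record = provenance_record(
    activity="normalise_expression",
    inputs=["raw_counts.tsv"], outputs=["normalised.tsv"],
    parameters={"method": "quantile"},
)
print(json.dumps(record, indent=2))
```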
APA, Harvard, Vancouver, ISO, and other styles
17

Graham, Andrew, Stephan Gmur, and Travis Scott. "Improved SCAT data workflow to increase efficiency and data accuracy." International Oil Spill Conference Proceedings 2017, no. 1 (May 1, 2017): 2674–93. http://dx.doi.org/10.7901/2169-3358-2017.1.2674.

Full text
Abstract:
ABSTRACT #2017-302 Traditional Shoreline Cleanup Assessment Technique (SCAT) data workflows typically entail collecting data in the field using notebooks, handheld GPS units and digital cameras, transcribing these data onto paper forms, and then manually entering into a local database. Processed data are pushed to a SCAT geographical information system (GIS) specialist, ultimately providing exports as paper and electronic versions of maps, spreadsheets and reports. The multiple and sometimes iterative steps required can affect the dissemination of accurate and timely information to decision makers and compound the potential for introducing errors into the data. To improve this process a revised SCAT data workflow has been developed that decreases data processing steps and time requirements while increasing data accuracy in several facets of the process. The workflow involves using mobile data collection devices in the field to capture attribute data, photographs and geospatial data. These data are uploaded to a web-enabled database that allows field team members to complete, review and adjust their data, along with data manager approval before presentation to others in the response. For response personnel with internet access and proper login credentials, SCAT data, including photographs, reports and results can be searched for by attribute, time or location, and reviewed online in form view or on a web map. For traditional SCAT spatial analysis products, approved data can be exported and processed in a GIS as normal, but can also be returned to the web-enabled database to be viewed on a map or distributed via web mapping services (WMS) to other web GIS data viewers or common operating pictures (COPs). Field testing of the improved workflow shows decreased data processing time for data, a more robust yet streamlined quality assurance and quality control process (QA/QC), and easier more inclusive access to the data relative to traditional paper forms and data processing. While the improved workflow entails a steeper learning curve and a heavier reliance on technology than traditional SCAT workflows, the benefits are significant.
APA, Harvard, Vancouver, ISO, and other styles
18

Tang, Jing, Jianbo Fu, Yunxia Wang, Bo Li, Yinghong Li, Qingxia Yang, Xuejiao Cui, et al. "ANPELA: analysis and performance assessment of the label-free quantification workflow for metaproteomic studies." Briefings in Bioinformatics 21, no. 2 (January 15, 2019): 621–36. http://dx.doi.org/10.1093/bib/bby127.

Full text
Abstract:
Label-free quantification (LFQ) with a specific and sequentially integrated workflow of acquisition technique, quantification tool and processing method has emerged as the popular technique employed in metaproteomic research to provide a comprehensive landscape of the adaptive response of microbes to external stimuli and their interactions with other organisms or host cells. The performance of a specific LFQ workflow is highly dependent on the studied data. Hence, it is essential to discover the most appropriate one for a specific data set. However, it is challenging to perform such discovery due to the large number of possible workflows and the multifaceted nature of the evaluation criteria. Herein, a web server ANPELA (https://idrblab.org/anpela/) was developed and validated as the first tool enabling performance assessment of whole LFQ workflow (collective assessment by five well-established criteria with distinct underlying theories), and it enabled the identification of the optimal LFQ workflow(s) by a comprehensive performance ranking. ANPELA not only automatically detects the diverse formats of data generated by all quantification tools but also provides the most complete set of processing methods among the available web servers and stand-alone tools. Systematic validation using metaproteomic benchmarks revealed ANPELA’s capabilities in (1) discovering well-performing workflow(s), (2) enabling assessment from multiple perspectives and (3) validating LFQ accuracy using spiked proteins. ANPELA has a unique ability to evaluate the performance of whole LFQ workflow and enables the discovery of the optimal LFQs by the comprehensive performance ranking of all 560 workflows. Therefore, it has great potential for applications in metaproteomic and other studies requiring LFQ techniques, as many features are shared among proteomic studies.
APA, Harvard, Vancouver, ISO, and other styles
19

Lakaraju, Sandeep, Dianxiang Xu, and Yong Wang. "Analysis of Healthcare Workflows in Accordance with Access Control Policies." International Journal of Healthcare Information Systems and Informatics 11, no. 1 (January 2016): 1–20. http://dx.doi.org/10.4018/ijhisi.2016010101.

Full text
Abstract:
Healthcare information systems deal with sensitive data across complex workflows. They often allow various stakeholders from different environments to access data across organizational boundaries. This elevates the risk of exposing sensitive healthcare information to unauthorized personnel, making 'controlling access to resources' a major concern. To prevent unwanted access to sensitive information, healthcare organizations need to adopt effective workflows and access control mechanisms. Many healthcare organizations are not yet considering or do not know how to accommodate the 'context' element as a crucial element in their workflows and access control policies. The authors envision a future of healthcare where 'context' will be considered a crucial element. Context can be accommodated in workflows through a new element, 'environment', and in policies through the well-known attribute-based access control (ABAC) mechanism. This research mainly addresses these problems by proposing a model to integrate workflows and access control policies, thereby identifying workflow activities that are not being protected by access control policies and improving the workflow activities and/or existing access control policies using SARE (Subject, Action, Resource, and Environment) elements.
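A minimal sketch of the SARE-style decision the article argues for: an access check that considers Subject, Action, Resource and Environment attributes together. The policy rules and attributes are invented for illustration:

```python
# Invented ABAC policy: each rule constrains Subject, Action, Resource and Environment.
policies = [
    {"subject_role": "nurse", "action": "read", "resource_type": "vitals",
     "environment": {"location": "ward", "shift": "on_duty"}},
    {"subject_role": "physician", "action": "write", "resource_type": "prescription",
     "environment": {"location": "clinic", "shift": "on_duty"}},
]

def is_permitted(subject, action, resource, environment):
    """Permit only if some policy rule matches all four SARE elements."""
    for rule in policies:
        if (rule["subject_role"] == subject["role"]
                and rule["action"] == action
                and rule["resource_type"] == resource["type"]
                and all(environment.get(k) == v for k, v in rule["environment"].items())):
            return True
    return False

print(is_permitted({"role": "nurse"}, "read", {"type": "vitals"},
                   {"location": "ward", "shift": "on_duty"}))   # True
print(is_permitted({"role": "nurse"}, "read", {"type": "vitals"},
                   {"location": "home", "shift": "off_duty"}))  # False
```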
APA, Harvard, Vancouver, ISO, and other styles
20

Ahn, Shinyoung, ByoungSeob Kim, Hyun-Hwa Choi, Seunghyub Jeon, Seungjo Bae, and Wan Choi. "Workflow-based Bio Data Analysis System for HPC." KIPS Transactions on Software and Data Engineering 2, no. 2 (February 28, 2013): 97–106. http://dx.doi.org/10.3745/ktsde.2013.2.2.097.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Missier, Paolo, Simon Woodman, Hugo Hiden, and Paul Watson. "Provenance and data differencing for workflow reproducibility analysis." Concurrency and Computation: Practice and Experience 28, no. 4 (April 30, 2013): 995–1015. http://dx.doi.org/10.1002/cpe.3035.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Clarkson, C. R. "Production data analysis of unconventional gas wells: Workflow." International Journal of Coal Geology 109-110 (April 2013): 147–57. http://dx.doi.org/10.1016/j.coal.2012.11.016.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Saleem, Hamza, and Muhammad Naveed. "SoK: Anatomy of Data Breaches." Proceedings on Privacy Enhancing Technologies 2020, no. 4 (October 1, 2020): 153–74. http://dx.doi.org/10.2478/popets-2020-0067.

Full text
Abstract:
We systematize the knowledge on data breaches into concise step-by-step breach workflows and use them to describe the breach methods. We present the most plausible workflows for 10 famous data breaches. We use information from a variety of sources to develop our breach workflows, however, we emphasize that for many data breaches, information about crucial steps was absent. We researched such steps to develop complete breach workflows; as such, our workflows provide descriptions of data breaches that were previously unavailable. For generalizability, we present a general workflow of 50 data breaches from 2015. Based on our data breach analysis, we develop requirements that organizations need to meet to thwart data breaches. We describe what requirements are met by existing security technologies and propose future research directions to thwart data breaches.
APA, Harvard, Vancouver, ISO, and other styles
24

Bhardwaj, Vivek, Steffen Heyne, Katarzyna Sikora, Leily Rabbani, Michael Rauer, Fabian Kilpert, Andreas S. Richter, Devon P. Ryan, and Thomas Manke. "snakePipes: facilitating flexible, scalable and integrative epigenomic analysis." Bioinformatics 35, no. 22 (May 27, 2019): 4757–59. http://dx.doi.org/10.1093/bioinformatics/btz436.

Full text
Abstract:
Summary: Due to the rapidly increasing scale and diversity of epigenomic data, modular and scalable analysis workflows are of wide interest. Here we present snakePipes, a workflow package for processing and downstream analysis of data from common epigenomic assays: ChIP-seq, RNA-seq, Bisulfite-seq, ATAC-seq, Hi-C and single-cell RNA-seq. snakePipes enables users to assemble variants of each workflow and to easily install and upgrade the underlying tools, via its simple command-line wrappers and yaml files. Availability and implementation: snakePipes can be installed via conda: 'conda install -c mpi-ie -c bioconda -c conda-forge snakePipes'. Source code (https://github.com/maxplanck-ie/snakepipes) and documentation (https://snakepipes.readthedocs.io/en/latest/) are available online. Supplementary information: Supplementary data are available at Bioinformatics online.
APA, Harvard, Vancouver, ISO, and other styles
25

Silva Junior, Daniel, Esther Pacitti, Aline Paes, and Daniel de Oliveira. "Provenance-and machine learning-based recommendation of parameter values in scientific workflows." PeerJ Computer Science 7 (July 5, 2021): e606. http://dx.doi.org/10.7717/peerj-cs.606.

Full text
Abstract:
Scientific Workflows (SWfs) have revolutionized how scientists in various domains of science conduct their experiments. The management of SWfs is performed by complex tools that provide support for workflow composition, monitoring, execution, capturing, and storage of the data generated during execution. In some cases, they also provide components to ease the visualization and analysis of the generated data. During the workflow’s composition phase, programs must be selected to perform the activities defined in the workflow specification. These programs often require additional parameters that serve to adjust the program’s behavior according to the experiment’s goals. Consequently, workflows commonly have many parameters to be manually configured, encompassing even more than one hundred in many cases. Choosing wrong parameter values can lead to crashed workflow executions or undesired results. As the execution of data- and compute-intensive workflows is commonly performed in a high-performance computing environment (e.g., a cluster, a supercomputer, or a public cloud), an unsuccessful execution represents a waste of time and resources. In this article, we present FReeP (Feature Recommender from Preferences), a parameter value recommendation method that is designed to suggest values for workflow parameters, taking into account past user preferences. FReeP is based on Machine Learning techniques, particularly Preference Learning. FReeP is composed of three algorithms: two of them recommend the value for one parameter at a time, and the third makes recommendations for n parameters at once. The experimental results obtained with provenance data from two broadly used workflows showed FReeP's usefulness in the recommendation of values for one parameter. Furthermore, the results indicate the potential of FReeP to recommend values for n parameters in scientific workflows.
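FReeP combines preference learning with provenance data; the heavily simplified stand-in below only conveys the flavour of the task: given past successful runs and the user's current preferences for some parameters, recommend the most frequent value of the target parameter among the most similar past runs (not the FReeP algorithms themselves):

```python
from collections import Counter

# Invented provenance: parameter settings of past successful workflow runs.
past_runs = [
    {"aligner": "bwa", "threads": 8,  "min_quality": 20, "trim": "yes"},
    {"aligner": "bwa", "threads": 16, "min_quality": 20, "trim": "yes"},
    {"aligner": "bowtie2", "threads": 8, "min_quality": 30, "trim": "no"},
    {"aligner": "bwa", "threads": 8,  "min_quality": 25, "trim": "yes"},
]

def recommend(target_param, preferences, k=3):
    """Recommend a value for target_param from the k past runs most similar
    to the user's stated preferences (simple attribute-match similarity)."""
    def similarity(run):
        return sum(run.get(p) == v for p, v in preferences.items())
    neighbours = sorted(past_runs, key=similarity, reverse=True)[:k]
    values = Counter(run[target_param] for run in neighbours if target_param in run)
    return values.most_common(1)[0][0]

print(recommend("min_quality", {"aligner": "bwa", "trim": "yes"}))  # -> 20
```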
APA, Harvard, Vancouver, ISO, and other styles
26

Oh, Sehyun, Ludwig Geistlinger, Marcel Ramos, Martin Morgan, Levi Waldron, and Markus Riester. "Reliable Analysis of Clinical Tumor-Only Whole-Exome Sequencing Data." JCO Clinical Cancer Informatics, no. 4 (September 2020): 321–35. http://dx.doi.org/10.1200/cci.19.00130.

Full text
Abstract:
Purpose: Allele-specific copy number alteration (CNA) analysis is essential to study the functional impact of single-nucleotide variants (SNVs) and the process of tumorigenesis. However, controversy over whether it can be performed with sufficient accuracy in data without matched normal profiles and a lack of open-source implementations have limited its application in clinical research and diagnosis. Methods: We benchmark allele-specific CNA analysis performance of whole-exome sequencing (WES) data against gold standard whole-genome SNP6 microarray data and against WES data sets with matched normal samples. We provide a workflow based on the open-source PureCN R/Bioconductor package in conjunction with widely used variant-calling and copy number segmentation algorithms for allele-specific CNA analysis from WES without matched normals. This workflow further classifies SNVs by somatic status and then uses this information to infer somatic mutational signatures and tumor mutational burden (TMB). Results: Application of our workflow to tumor-only WES data produces tumor purity and ploidy estimates that are highly concordant with estimates from SNP6 microarray data and matched normal WES data. The presence of cancer type–specific somatic mutational signatures was inferred with high accuracy. We also demonstrate high concordance of TMB between our tumor-only workflow and matched normal pipelines. Conclusion: The proposed workflow provides, to our knowledge, the only open-source option with demonstrated high accuracy for comprehensive allele-specific CNA analysis and SNV classification of tumor-only WES. An implementation of the workflow is available on the Terra Cloud platform of the Broad Institute (Cambridge, MA).
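One quantity the workflow reports is tumor mutational burden (TMB), conventionally expressed as somatic mutations per megabase of callable sequence. A minimal worked example of that arithmetic, with illustrative numbers:

```python
def tumor_mutational_burden(somatic_mutation_count, callable_bases):
    """TMB = somatic mutations per megabase of callable sequence."""
    return somatic_mutation_count / (callable_bases / 1_000_000)

# Illustrative exome: 312 somatic mutations over 36 Mb of callable territory.
print(round(tumor_mutational_burden(312, 36_000_000), 2), "mutations/Mb")  # 8.67
```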
APA, Harvard, Vancouver, ISO, and other styles
27

Rodríguez, Diego, Rokas Mačiulaitis, Jan Okraska, and Tibor Šimko. "Hybrid analysis pipelines in the REANA reproducible analysis platform." EPJ Web of Conferences 245 (2020): 06041. http://dx.doi.org/10.1051/epjconf/202024506041.

Full text
Abstract:
We introduce the feasibility of running hybrid analysis pipelines in the REANA reproducible analysis platform. The REANA platform allows researchers to specify declarative computational workflow steps describing the analysis process and to execute analysis workload on remote containerised compute clouds. We have designed an abstract job controller component permitting to execute different parts of the analysis workflow on different compute backends, such as HTCondor, Kubernetes and SLURM. We have prototyped the designed solution including the job execution, job monitoring, and input/output file staging mechanism between the various compute backends. We have tested the prototype using several particle physics model analyses. The present work introduces support for hybrid analysis workflows in the REANA reproducible analysis platform and paves the way towards studying underlying performance advantages and challenges associated with hybrid analysis patterns in complex particle physics data analyses.
APA, Harvard, Vancouver, ISO, and other styles
28

Ha, Thang N., Kurt J. Marfurt, Bradley C. Wallet, and Bryce Hutchinson. "Pitfalls and implementation of data conditioning, attribute analysis, and self-organizing maps to 2D data: Application to the Exmouth Plateau, North Carnarvon Basin, Australia." Interpretation 7, no. 3 (August 1, 2019): SG23—SG42. http://dx.doi.org/10.1190/int-2018-0248.1.

Full text
Abstract:
Recent developments in attribute analysis and machine learning have significantly enhanced interpretation workflows of 3D seismic surveys. Nevertheless, even in 2018, many sedimentary basins are only covered by grids of 2D seismic lines. These 2D surveys are suitable for regional feature mapping and often identify targets in areas not covered by 3D surveys. With continuing pressure to cut costs in the hydrocarbon industry, it is crucial to extract as much information as possible from these 2D surveys. Unfortunately, much if not most modern interpretation software packages are designed to work exclusively with 3D data. To determine if we can apply 3D volumetric interpretation workflows to grids of 2D seismic lines, we have applied data conditioning, attribute analysis, and a machine-learning technique called self-organizing maps to the 2D data acquired over the Exmouth Plateau, North Carnarvon Basin, Australia. We find that these workflows allow us to significantly improve image quality, interpret regional geologic features, identify local anomalies, and perform seismic facies analysis. However, these workflows are not without pitfalls. We need to be careful in choosing the order of filters in the data conditioning workflow and be aware of reflector misties at line intersections. Vector data, such as reflector convergence, need to be extracted and then mapped component-by-component before combining the results. We are also unable to perform attribute extraction along a surface or geobody extraction for 2D data in our commercial interpretation software package. To address this issue, we devise a point-by-point attribute extraction workaround to overcome the incompatibility between 3D interpretation workflow and 2D data.
APA, Harvard, Vancouver, ISO, and other styles
29

Sinaci, A. Anil, Francisco J. Núñez-Benjumea, Mert Gencturk, Malte-Levin Jauer, Thomas Deserno, Catherine Chronaki, Giorgio Cangioli, et al. "From Raw Data to FAIR Data: The FAIRification Workflow for Health Research." Methods of Information in Medicine 59, S 01 (June 2020): e21-e32. http://dx.doi.org/10.1055/s-0040-1713684.

Full text
Abstract:
Background: FAIR (findability, accessibility, interoperability, and reusability) guiding principles seek the reuse of data and other digital research input, output, and objects (algorithms, tools, and workflows that led to that data), making them findable, accessible, interoperable, and reusable. GO FAIR, a bottom-up, stakeholder-driven and self-governed initiative, defined a seven-step FAIRification process focusing on data, but also indicating the required work for metadata. This FAIRification process aims at addressing the translation of raw datasets into FAIR datasets in a general way, without considering specific requirements and challenges that may arise when dealing with some particular types of data. Objectives: This scientific contribution addresses the architecture design of an open technological solution built upon the FAIRification process proposed by GO FAIR, closing the gaps that this process exhibits when dealing with health datasets. Methods: A common FAIRification workflow was developed by applying restrictions on existing steps and introducing new steps for specific requirements of health data. These requirements have been elicited after analyzing the FAIRification workflow from different perspectives: technical barriers, ethical implications, and legal framework. This analysis identified gaps when applying the FAIRification process proposed by GO FAIR to health research data management in terms of data curation, validation, deidentification, versioning, and indexing. Results: A technological architecture based on the use of Health Level Seven International (HL7) FHIR (fast health care interoperability resources) resources is proposed to support the revised FAIRification workflow. Discussion: Research funding agencies all over the world increasingly demand the application of the FAIR guiding principles to health research output. Existing tools do not fully address the identified needs for health data management. Therefore, researchers may benefit in the coming years from a common framework that supports the proposed FAIRification workflow applied to health datasets. Conclusion: Routine health care datasets or data resulting from health research can be FAIRified, shared and reused within the health research community by following the proposed FAIRification workflow and implementing the technical architecture.
APA, Harvard, Vancouver, ISO, and other styles
30

Wu, Lei, Ran Ding, Zhaohong Jia, and Xuejun Li. "Cost-Effective Resource Provisioning for Real-Time Workflow in Cloud." Complexity 2020 (March 30, 2020): 1–15. http://dx.doi.org/10.1155/2020/1467274.

Full text
Abstract:
In the era of big data, mining and analysis of the enormous amount of data has been widely used to support decision-making. This complex process, including huge-volume data collection, storage, transmission, and analysis, can be modeled as a workflow. Meanwhile, cloud environments provide sufficient computing and storage resources for big data management and analytics. Because clouds offer a pay-as-you-go pricing scheme, executing a workflow in the cloud incurs charges for the provisioned resources. Thus, cost-effective resource provisioning for workflows in clouds is still a critical challenge. Also, the responses of the complex data management process are usually required to be real-time. Therefore, the deadline is the most crucial constraint for workflow execution. In order to address the challenge of cost-effective resource provisioning while meeting the real-time requirements of workflow execution, a resource provisioning strategy based on dynamic programming is proposed to achieve cost-effectiveness of workflow execution in clouds, and a critical-path-based workflow partition algorithm is presented to guarantee that the workflow can be completed before its deadline. Our approach is evaluated by simulation experiments with real-time workflows of different sizes and different structures. The results demonstrate that our algorithm outperforms the existing classical algorithms.
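A hedged sketch of the kind of dynamic program the abstract alludes to, drastically simplified: for a chain of workflow tasks, choose a VM option per task so that total cost is minimised while the summed runtime stays within the deadline. Task options and prices are invented:

```python
# For each task, candidate (runtime_hours, cost) options on different VM types.
tasks = [
    [(4, 2.0), (2, 5.0), (1, 9.0)],   # task 1: slow/cheap ... fast/expensive
    [(3, 1.5), (2, 3.0), (1, 6.5)],   # task 2
    [(5, 2.5), (3, 4.5), (2, 8.0)],   # task 3
]
DEADLINE = 7  # hours

def cheapest_plan(tasks, deadline):
    """DP over (task index, remaining time) -> (min cost, chosen options)."""
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def best(i, remaining):
        if i == len(tasks):
            return 0.0, ()
        candidates = []
        for runtime, cost in tasks[i]:
            if runtime <= remaining:
                sub_cost, sub_plan = best(i + 1, remaining - runtime)
                if sub_cost != float("inf"):
                    candidates.append((cost + sub_cost, ((runtime, cost),) + sub_plan))
        return min(candidates) if candidates else (float("inf"), ())

    return best(0, deadline)

cost, plan = cheapest_plan(tasks, DEADLINE)
print(f"total cost {cost}, plan {plan}")  # picks the cheapest mix that meets the deadline
```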
APA, Harvard, Vancouver, ISO, and other styles
31

Toelle, B., and J. Lingley. "A METHOD FOR CONDUCTING A 'COMPLETE PROCESS' WORKFLOW ANALYSIS." APPEA Journal 40, no. 1 (2000): 596. http://dx.doi.org/10.1071/aj99039.

Full text
Abstract:
Many companies within the exploration and production industry have recently begun performing workflow studies with in-house personnel, or in collaboration with consulting groups that have the required expertise. While in-house studies often yield good results, outside consultants specialising in these types of studies can often recognise a broader range of workflow-related issues. There are many benefits companies realise from conducting these studies. Complete and periodic reviews of the E&P processes being conducted allow companies to determine whether or not they are using the 'State of the Science' methodologies needed to maintain a competitive lead. Information obtained during a workflow study helps a company's management to develop or adjust existing plans for effectively combining existing personnel, technology and data. Workflow studies may be approached in a number of ways. While some address specific portions of a company's workflow, such as its data flow, others seek rapid benefits, or 'quick hits'. The various types of workflow studies all have a place in the process re-engineering arena. However, one type, the 'complete process workflow analysis', offers companies the opportunity to solve the greatest number of process oriented issues. This type of study follows a specific pattern and starts by identifying main business drivers and objectives. Additional steps include the review of existing 'planned workflows', determining the actual workflow, analysis and issue identification, developing a 'recommended workflow', and educating company personnel in the new 'recommended' process. It is this 'complete process workflow analysis' that is discussed in this paper.
APA, Harvard, Vancouver, ISO, and other styles
32

Pakdil, Mete Ercan, and Rahmi Nurhan Çelik. "Serverless Geospatial Data Processing Workflow System Design." ISPRS International Journal of Geo-Information 11, no. 1 (December 30, 2021): 20. http://dx.doi.org/10.3390/ijgi11010020.

Full text
Abstract:
Geospatial data and related technologies have become an increasingly important aspect of data analysis processes, playing a prominent role in most of them. The serverless paradigm has become the most popular and frequently used technology within cloud computing. This paper reviews the serverless paradigm and examines how it could be leveraged for geospatial data processes by using open standards in the geospatial community. We propose a system design and architecture to handle complex geospatial data processing jobs with minimum human intervention and resource consumption using serverless technologies. In order to define and execute workflows in the system, we also propose new models for both workflow and task definitions. Moreover, the proposed system has web services based on the new Open Geospatial Consortium (OGC) Application Programming Interface (API) Processes specification to provide interoperability with other geospatial applications, with the anticipation that this specification will be more commonly used in the future. We implemented the proposed system on one of the public cloud providers as a proof of concept and evaluated it with sample geospatial workflows and cloud architecture best practices.
APA, Harvard, Vancouver, ISO, and other styles
33

Simopoulos, Caitlin M. A., Zhibin Ning, Xu Zhang, Leyuan Li, Krystal Walker, Mathieu Lavallée-Adam, and Daniel Figeys. "pepFunk: a tool for peptide-centric functional analysis of metaproteomic human gut microbiome studies." Bioinformatics 36, no. 14 (May 5, 2020): 4171–79. http://dx.doi.org/10.1093/bioinformatics/btaa289.

Full text
Abstract:
Motivation: Enzymatic digestion of proteins before mass spectrometry analysis is a key process in metaproteomic workflows. Canonical metaproteomic data processing pipelines typically involve matching spectra produced by the mass spectrometer to a theoretical spectra database, followed by matching the identified peptides back to parent-proteins. However, the nature of enzymatic digestion produces peptides that can be found in multiple proteins due to conservation or chance, presenting difficulties with protein and functional assignment. Results: To combat this challenge, we developed pepFunk, a peptide-centric metaproteomic workflow focused on the analysis of human gut microbiome samples. Our workflow includes a curated peptide database annotated with Kyoto Encyclopedia of Genes and Genomes (KEGG) terms and a gene set variation analysis-inspired pathway enrichment adapted for peptide-level data. Analysis using our peptide-centric workflow is fast and highly correlated to a protein-centric analysis, and can identify more enriched KEGG pathways than analysis using protein-level data. Our workflow is open source and available as a web application or source code to be run locally. Availability and implementation: pepFunk is available online as a web application at https://shiny.imetalab.ca/pepFunk/ with open-source code available from https://github.com/northomics/pepFunk. Contact: dfigeys@uottawa.ca. Supplementary information: Supplementary data are available at Bioinformatics online.
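The pathway step in pepFunk is a gene set variation analysis-inspired enrichment; as a rough stand-in, the sketch below runs a hypergeometric over-representation test on peptide-to-KEGG assignments (toy data, SciPy's hypergeom; not pepFunk's actual method):

```python
from scipy.stats import hypergeom

# Toy peptide universe: each identified peptide mapped to zero or more KEGG pathways.
peptide_pathways = {
    "pep1": {"ko00010"}, "pep2": {"ko00010"}, "pep3": {"ko00020"},
    "pep4": {"ko00010", "ko00020"}, "pep5": set(), "pep6": {"ko00020"},
    "pep7": {"ko00010"}, "pep8": set(), "pep9": {"ko00010"}, "pep10": set(),
}
significant = {"pep1", "pep2", "pep4", "pep7"}  # e.g. peptides changed between conditions

M = len(peptide_pathways)        # universe size
N = len(significant)             # number of significant peptides
pathways = set().union(*peptide_pathways.values())

for pw in sorted(pathways):
    members = {p for p, pws in peptide_pathways.items() if pw in pws}
    n = len(members)             # peptides annotated to this pathway
    k = len(members & significant)  # significant peptides in the pathway
    p_value = hypergeom.sf(k - 1, M, n, N)  # P(X >= k)
    print(f"{pw}: {k}/{n} significant peptides, p = {p_value:.3f}")
```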
APA, Harvard, Vancouver, ISO, and other styles
34

Moonsamy, Darisia, and Nikki Gentle. "SASCRiP: A Python workflow for preprocessing UMI count-based scRNA-seq data." F1000Research 11 (February 15, 2022): 190. http://dx.doi.org/10.12688/f1000research.75243.1.

Full text
Abstract:
In order to reduce the impact of technical variation inherent in single-cell RNA sequencing (scRNA-seq) technologies on the biological interpretation of experiments, rigorous preprocessing and quality control are required to transform raw sequencing reads into high-quality gene and transcript counts. While hundreds of tools have been developed for this purpose, the vast majority of the most widely used tools are built for the R software environment. With an increasing number of new tools now being developed in Python, it is necessary to develop integrative workflows that leverage tools from both platforms. We have therefore developed SASCRiP (Sequencing Analysis of Single-Cell RNA in Python), a modular single-cell preprocessing workflow that integrates functionality from existing, widely used R and Python packages, together with additional custom features and visualizations, to enable preprocessing of scRNA-seq data derived from technologies that use unique molecular identifier (UMI) sequences within a single Python analysis workflow. We describe the utility of SASCRiP using datasets derived from peripheral blood mononuclear cells sequenced with droplet-based, 3′-end sequencing technology. We highlight SASCRiP’s diagnostic visualizations and fully customizable functions, and demonstrate how SASCRiP provides a highly flexible, integrative Python workflow for preparing unprocessed UMI count-based scRNA-seq data for subsequent downstream analyses. SASCRiP is freely available through PyPI or from the GitHub page.
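SASCRiP's own functions are not shown here; as a rough indication of the kind of UMI-count preprocessing the abstract describes, the sketch below uses Scanpy, a separate and widely used Python package, with placeholder paths and thresholds.

```python
# Not SASCRiP's API: a generic UMI-count preprocessing sketch using Scanpy.
# The input path and filtering thresholds are placeholders.
import scanpy as sc

adata = sc.read_10x_mtx("path/to/filtered_feature_bc_matrix")  # UMI count matrix
sc.pp.filter_cells(adata, min_genes=200)      # drop low-quality barcodes
sc.pp.filter_genes(adata, min_cells=3)        # drop rarely detected genes
sc.pp.normalize_total(adata, target_sum=1e4)  # library-size normalisation
sc.pp.log1p(adata)                            # log-transform for downstream analyses
```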
APA, Harvard, Vancouver, ISO, and other styles
35

Zhang, Bin, Le Yu, Yunbo Feng, Lijun Liu, and Shuai Zhao. "Application of Workflow Technology for Big Data Analysis Service." Applied Sciences 8, no. 4 (April 9, 2018): 591. http://dx.doi.org/10.3390/app8040591.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Weigelt, Karin, Christoph Moehle, Thomas Stempfl, Bernhard Weber, and Thomas Langmann. "An integrated workflow for analysis of ChIP-chip data." BioTechniques 45, no. 2 (August 2008): 131–40. http://dx.doi.org/10.2144/000112819.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Yang, In Seok, and Sangwoo Kim. "Analysis of Whole Transcriptome Sequencing Data: Workflow and Software." Genomics & Informatics 13, no. 4 (2015): 119. http://dx.doi.org/10.5808/gi.2015.13.4.119.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Pendarvis, Ken, Ranjit Kumar, Shane C. Burgess, and Bindu Nanduri. "An automated proteomic data analysis workflow for mass spectrometry." BMC Bioinformatics 10, Suppl 11 (2009): S17. http://dx.doi.org/10.1186/1471-2105-10-s11-s17.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Bartoněk, Dalibor, Jirí Bureš, and Irena Opatřilová. "Workflow for Analysis of Enormous Amounts of Geographical Data." Advanced Science Letters 21, no. 12 (December 1, 2015): 3680–83. http://dx.doi.org/10.1166/asl.2015.6540.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Trost, Nils, Eugen Rempel, Olga Ermakova, Srividya Tamirisa, Letiția Pârcălăbescu, Michael Boutros, Jan U. Lohmann, and Ingrid Lohmann. "WEADE: A workflow for enrichment analysis and data exploration." PLOS ONE 13, no. 9 (September 28, 2018): e0204016. http://dx.doi.org/10.1371/journal.pone.0204016.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Tchagna Kouanou, Aurelle, Daniel Tchiotsop, Romanic Kengne, Djoufack Tansaa Zephirin, Ngo Mouelas Adele Armele, and René Tchinda. "An optimal big data workflow for biomedical image analysis." Informatics in Medicine Unlocked 11 (2018): 68–74. http://dx.doi.org/10.1016/j.imu.2018.05.001.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Deelman, Ewa, Christopher Carothers, Anirban Mandal, Brian Tierney, Jeffrey S. Vetter, Ilya Baldin, Claris Castillo, et al. "PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows." International Journal of High Performance Computing Applications 31, no. 1 (July 27, 2016): 4–18. http://dx.doi.org/10.1177/1094342015594515.

Full text
Abstract:
Computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Thus, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.
APA, Harvard, Vancouver, ISO, and other styles
43

Wu, Danny T. Y., Lindsey Barrick, Mustafa Ozkaynak, Katherine Blondon, and Kai Zheng. "Principles for Designing and Developing a Workflow Monitoring Tool to Enable and Enhance Clinical Workflow Automation." Applied Clinical Informatics 13, no. 01 (January 2022): 132–38. http://dx.doi.org/10.1055/s-0041-1741480.

Full text
Abstract:
Background: Automation of health care workflows has recently become a priority. This can be enabled and enhanced by a workflow monitoring tool (WMOT). Objectives: We share our experience in clinical workflow analysis via three case studies in health care and summarize principles for designing and developing such a WMOT. Methods: The case studies were conducted in different clinical settings with distinct goals. Each study used at least two types of workflow data to create a more comprehensive picture of work processes and to identify and quantify bottlenecks. The case studies were synthesized using a data science process model with a focus on data input, analysis methods, and findings. Results: The three case studies were presented and synthesized to generate a system structure for a WMOT. When developing a WMOT, one needs to consider four aspects: (1) goal orientation, (2) comprehensive and resilient data collection, (3) integrated and extensible analysis, and (4) domain experts. Discussion: We encourage researchers to investigate the design and implementation of WMOTs and to use these tools to create best practices that enable workflow automation and improve workflow efficiency and care quality.
APA, Harvard, Vancouver, ISO, and other styles
44

Aditama, Redi, Zulfikar Achmad Tanjung, Widyartini Made Sudania, and Toni Liwang. "SMART-RDA: A Galaxy Workflow for RNA-Seq Data Analysis." KnE Life Sciences 3, no. 4 (March 27, 2017): 186. http://dx.doi.org/10.18502/kls.v3i4.703.

Full text
Abstract:
RNA-seq using the next-generation sequencing (NGS) approach is a common technology for analyzing large-scale RNA transcript data in gene expression studies. However, an appropriate bioinformatics tool is needed to analyze the large amount of transcriptome data produced by an RNA-seq experiment. The aim of this study was to construct a system that can be easily applied to analyze RNA-seq data. An RNA-seq analysis tool, SMART-RDA, was constructed in this study. It is a computational workflow based on the Galaxy framework for turning raw RNA-seq data into gene expression information. The workflow was adapted from the well-known Tuxedo protocol for RNA-seq analysis with some modifications. The expression value of each transcript is quantified as Fragments Per Kilobase of exon per Million fragments (FPKM). RNA-seq data of sterile and fertile oil palm (Pisifera) pollen, obtained from the NCBI Sequence Read Archive (SRA), were used to test the workflow on a local Galaxy server. The results showed that differential gene expression in pollen might be responsible for the sterile and fertile characteristics of oil palm Pisifera. Keywords: FPKM; Galaxy workflow; Gene expression; RNA sequencing.
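The FPKM measure mentioned in the abstract follows a standard definition; the short function below (not SMART-RDA code) simply spells it out.

```python
# Standard FPKM definition: fragments mapped to a transcript, scaled per
# kilobase of exon model and per million mapped fragments.
def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """FPKM = fragments * 1e9 / (exon_length_bp * total_mapped_fragments)."""
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


# Example: 500 fragments on a 2,000 bp transcript out of 20 million mapped fragments.
print(fpkm(500, 2000, 20_000_000))  # 12.5
```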
APA, Harvard, Vancouver, ISO, and other styles
45

Elmsheuser, Johannes, Alessandro Di Girolamo, Andrej Filipcic, Antonio Limosani, Markus Schulz, David Smith, Andrea Sciaba, and Andrea Valassi. "ATLAS Grid Workflow Performance Optimization." EPJ Web of Conferences 214 (2019): 03021. http://dx.doi.org/10.1051/epjconf/201921403021.

Full text
Abstract:
The CERN ATLAS experiment grid workflow system routinely manages 250 to 500 thousand concurrently running production and analysis jobs to process simulation and detector data. In total, more than 370 PB of data is distributed over more than 150 sites in the WLCG. At this scale, small improvements in software and computing performance and in the workflows can lead to significant resource usage gains. ATLAS is reviewing, together with CERN IT experts, several typical simulation and data processing workloads for potential performance improvements in terms of memory and CPU usage, disk and network I/O. All ATLAS production and analysis grid jobs are instrumented to collect many performance metrics for detailed statistical studies using modern data analytics tools like ElasticSearch and Kibana. This presentation will review and explain the performance gains of several ATLAS simulation and data processing workflows and present analytics studies of the ATLAS grid workflows.
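The ATLAS ElasticSearch/Kibana pipeline itself is not shown here; the following is only a hypothetical illustration, with invented column names, of the kind of per-workflow summary statistics such job instrumentation enables once the metrics have been exported, e.g. to CSV.

```python
# Hypothetical illustration only: summarising exported per-job performance
# metrics with pandas. File name and column names are invented.
import pandas as pd

jobs = pd.read_csv("grid_job_metrics.csv")  # placeholder export of job metrics
summary = (
    jobs.groupby("workflow_type")[["cpu_seconds", "max_rss_mb", "read_mb", "written_mb"]]
    .describe()
)
print(summary)
```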
APA, Harvard, Vancouver, ISO, and other styles
46

Talia, Domenico. "Workflow Systems for Science: Concepts and Tools." ISRN Software Engineering 2013 (January 8, 2013): 1–15. http://dx.doi.org/10.1155/2013/404525.

Full text
Abstract:
The wide availability of high-performance computing systems, Grids and Clouds has allowed scientists and engineers to implement more and more complex applications that access and process large data repositories and run scientific experiments in silico on distributed computing platforms. Most of these applications are designed as workflows that include data analysis, scientific computation methods, and complex simulation techniques. Scientific applications require tools and high-level mechanisms for designing and executing complex workflows. For this reason, in recent years many efforts have been devoted to the development of distributed workflow management systems for scientific applications. This paper discusses basic concepts of scientific workflows and presents the workflow system tools and frameworks used today for implementing applications in science and engineering on high-performance computers and distributed systems. In particular, the paper reports on a selection of workflow systems largely used for solving scientific problems and discusses some open issues and research challenges in the area.
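As a toy illustration of the workflow concept discussed in the paper, the sketch below executes a small directed acyclic graph of tasks in dependency order; real workflow systems add scheduling, provenance tracking, and distributed execution on top of this idea.

```python
# A toy DAG workflow executed in topological (dependency) order.
from graphlib import TopologicalSorter
from typing import Callable, Dict

tasks: Dict[str, Callable[[], None]] = {
    "fetch": lambda: print("fetch input data"),
    "clean": lambda: print("clean and merge"),
    "simulate": lambda: print("run simulation"),
    "analyse": lambda: print("analyse results"),
}
# Each task maps to the set of tasks it depends on.
dependencies = {"clean": {"fetch"}, "simulate": {"clean"}, "analyse": {"simulate"}}

for name in TopologicalSorter(dependencies).static_order():
    tasks[name]()
```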
APA, Harvard, Vancouver, ISO, and other styles
47

Krachunov, Milko, Ognyan Kulev, Valeriya Simeonova, Maria Nisheva, and Dimitar Vassilev. "Manageable Workflows for Processing Parallel Sequencing Data." Serdica Journal of Computing 8, no. 1 (February 2, 2015): 1–14. http://dx.doi.org/10.55630/sjc.2014.8.1-14.

Full text
Abstract:
Data analysis after parallel sequencing is a process that uses combinations of software tools and is often subject to experimentation and on-the-fly tool substitution, with the necessary file conversions. This article presents a system under development for creating and managing workflows that aid the tasks one encounters after parallel sequencing, particularly in the area of metagenomics. The semantics, description language and software implementation aim to allow the creation of flexible, configurable workflows that are suitable for sharing and easy to manipulate through software or by hand. The execution system design provides user-defined operations and interchangeability between an operation and a workflow. This allows significant extensibility, which can be further complemented with distributed computing and remote management interfaces.
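The paper's description language and execution system are not reproduced here; the following hypothetical Python sketch only illustrates the interchangeability idea from the abstract, where a workflow composed of operations can itself be used as an operation.

```python
# Hypothetical sketch: operations and workflows share one call interface,
# so a workflow can be nested inside another workflow like any operation.
from typing import Callable, Sequence

Operation = Callable[[str], str]  # takes an input file path, returns an output path


def workflow(steps: Sequence[Operation]) -> Operation:
    """Compose operations into a single operation, so workflows nest."""
    def run(path: str) -> str:
        for step in steps:
            path = step(path)
        return path
    return run


# Example with placeholder operations that just tag the file name:
trim = lambda p: p + ".trimmed"
align = lambda p: p + ".bam"
qc_and_align = workflow([trim, align])                           # a workflow...
full_pipeline = workflow([qc_and_align, lambda p: p + ".vcf"])   # ...reused as an operation
print(full_pipeline("reads.fastq"))                              # reads.fastq.trimmed.bam.vcf
```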
APA, Harvard, Vancouver, ISO, and other styles
48

Munir, Kamran, Saad Liaquat Kiani, Khawar Hasham, Richard McClatchey, Andrew Branson, and Jetendr Shamdasani. "Provision of an integrated data analysis platform for computational neuroscience experiments." Journal of Systems and Information Technology 16, no. 3 (August 5, 2014): 150–69. http://dx.doi.org/10.1108/jsit-01-2014-0004.

Full text
Abstract:
Purpose – The purpose of this paper is to provide an integrated analysis base to facilitate computational neuroscience experiments, following a user-led approach to provide access to the integrated neuroscience data and to enable the analyses demanded by the biomedical research community. Design/methodology/approach – The design and development of the N4U analysis base and related information services addresses the existing research and practical challenges by offering an integrated medical data analysis environment with the necessary building blocks for neuroscientists to optimally exploit neuroscience workflows, large image data sets and algorithms to conduct analyses. Findings – The provision of an integrated e-science environment of computational neuroimaging can enhance the prospects, speed and utility of the data analysis process for neurodegenerative diseases. Originality/value – The N4U analysis base enables conducting biomedical data analyses by indexing and interlinking the neuroimaging and clinical study data sets stored on the grid infrastructure, algorithms and scientific workflow definitions along with their associated provenance information.
APA, Harvard, Vancouver, ISO, and other styles
49

Callahan, Ben J., Kris Sankaran, Julia A. Fukuyama, Paul J. McMurdie, and Susan P. Holmes. "Bioconductor workflow for microbiome data analysis: from raw reads to community analyses." F1000Research 5 (June 24, 2016): 1492. http://dx.doi.org/10.12688/f1000research.8986.1.

Full text
Abstract:
High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or microbial composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, including both parametric and nonparametric methods. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests, partial least squares and linear models as well as nonparametric testing using community networks and the ggnetwork package.
APA, Harvard, Vancouver, ISO, and other styles
50

Callahan, Ben J., Kris Sankaran, Julia A. Fukuyama, Paul J. McMurdie, and Susan P. Holmes. "Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses." F1000Research 5 (November 2, 2016): 1492. http://dx.doi.org/10.12688/f1000research.8986.2.

Full text
Abstract:
High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package.
APA, Harvard, Vancouver, ISO, and other styles