Journal articles on the topic 'Data and metadata structures'

To see the other types of publications on this topic, follow the link: Data and metadata structures.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic 'Data and metadata structures.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Mincewicz, Wojciech. "Metadane – cichy zabójca prywatności" [Metadata: A Silent Killer of Privacy]. Studia Politologiczne 2019, no. 54 (November 20, 2019): 230–57. http://dx.doi.org/10.33896/spolit.2019.54.9.

Abstract:
The article offers a deeper reflection on the issue of metadata, that is, data that define or describe other data. At the theoretical level, three types of metadata are distinguished: descriptive, structural, and administrative. Descriptive metadata is used to find and identify key information that allows an object to be located. Structural metadata describes the internal structure of the object, while administrative metadata refers to technical information, for example about when and how a file was created. The purpose of the publication is to provide theoretical as well as practical knowledge. The second part of the article deals with graphic and text files, and simple self-defense techniques are indicated that allow metadata to be removed before a file is shared. The article is supplemented by an analysis of the ability to extract meta-information with Fingerprinting Organizations with Collected Archives (FOCA), a tool used to extract metadata automatically, and by a reflection on the metadata contained in email headers.
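For readers who want to try the self-defense idea the abstract mentions, below is a minimal Python sketch (using the Pillow library, not the FOCA tool discussed in the article) that inspects and then strips EXIF metadata from an image before sharing it; the file names are hypothetical.

```python
# Minimal sketch: inspect, then strip, EXIF metadata before sharing a file.
# Uses Pillow; "photo.jpg" and "photo_clean.jpg" are hypothetical names.
from PIL import Image
from PIL.ExifTags import TAGS

img = Image.open("photo.jpg")

# List the embedded metadata (camera model, timestamps, GPS tags, ...).
for tag_id, value in img.getexif().items():
    print(TAGS.get(tag_id, tag_id), value)

# Re-save only the pixel data; the metadata is left behind.
clean = Image.new(img.mode, img.size)
clean.putdata(list(img.getdata()))
clean.save("photo_clean.jpg")
```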
2

Fong, Joseph, Qing Li, and Shi-Ming Huang. "Universal Data Warehousing Based on a Meta-Data Modeling Approach." International Journal of Cooperative Information Systems 12, no. 03 (September 2003): 325–63. http://dx.doi.org/10.1142/s0218843003000772.

Abstract:
A data warehouse contains a vast amount of data to support complex queries of various Decision Support Systems (DSSs). It needs to store materialized views of data, which must be available consistently and instantaneously. Using a frame metadata model, this paper presents an architecture for universal data warehousing across different data models. The frame metadata model represents the metadata of a data warehouse, which structures an application domain into classes, and integrates schemas of heterogeneous databases by capturing their semantics. A star schema is derived from user requirements based on the integrated schema, catalogued in the metadata, which stores the schema of the relational database (RDB) and object-oriented database (OODB). Data materialization between RDB and OODB is achieved by unloading the source database into a sequential file and reloading it into the target database, through which an object-relational view can be defined so as to allow users to obtain the same warehouse view in different data models simultaneously. We describe our procedures for building the relational view of the star schema by multidimensional SQL query, and the object-oriented view of the data warehouse by Online Analytical Processing (OLAP) through method calls, derived from the integrated schema. To validate our work, an application prototype system has been developed in a product-sales data warehousing domain based on this approach.
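For readers unfamiliar with the star schema the abstract refers to, here is a generic, self-contained sketch using Python's built-in sqlite3 module. It illustrates only the schema shape and a multidimensional aggregation query; it is not the paper's frame metadata model or its RDB/OODB materialization procedure, and the tables and values are invented.

```python
# Generic star schema: one fact table keyed to dimension tables, queried
# with a roll-up aggregation. Illustrative only.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, region_id INTEGER,
                              amount REAL);
    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO dim_region  VALUES (1, 'north'),  (2, 'south');
    INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.5);
""")

# Roll up the fact table along both dimensions.
for row in con.execute("""
        SELECT p.name, r.name, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p USING (product_id)
        JOIN dim_region  r USING (region_id)
        GROUP BY p.name, r.name"""):
    print(row)
```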
3

Azram, Nur Adila, et al. "Laboratory Instruments' Produced Scientific Data Standardization through the Use of Metadata." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 3 (April 10, 2021): 2146–51. http://dx.doi.org/10.17762/turcomat.v12i3.1157.

Abstract:
The amount of scientific data produced by laboratory instruments is increasing these days. As different laboratory instruments hold different structures and formats of data, managing and analyzing the data becomes a concern because of this heterogeneity. This paper offers a metadata structure to standardize the scientific data produced by laboratory instruments so that the data attain a standard structure and format. The paper explains the methodology and the use of the proposed metadata structure, before summarizing the implementation and the related result analysis. The proposed metadata structure extraction shows promising results based on the conducted evaluation and validation.
4

Wagner, Michael, Christin Henzen, and Ralph Müller-Pfefferkorn. "A Research Data Infrastructure Component for the Automated Metadata and Data Quality Extraction to Foster the Provision of FAIR Data in Earth System Sciences." AGILE: GIScience Series 2 (June 4, 2021): 1–7. http://dx.doi.org/10.5194/agile-giss-2-41-2021.

Abstract:
Metadata management is core to supporting discovery and reuse of data products, and to allowing for reproducibility of research data in the Earth System Sciences (ESS). Thus, ensuring acquisition and provision of meaningful and quality-assured metadata should become an integral part of data-driven ESS projects. We propose an open-source tool for automated metadata and data-quality extraction to foster the provision of FAIR data (Findable, Accessible, Interoperable, Reusable). By enabling researchers to automatically extract and reuse structured and standardized ESS-specific metadata, in particular quality information, in several components of a research data infrastructure, we support researchers along the research data life cycle.
5

Hoarfrost, Adrienne, Nick Brown, C. Titus Brown, and Carol Arnosti. "Sequencing data discovery with MetaSeek." Bioinformatics 35, no. 22 (June 21, 2019): 4857–59. http://dx.doi.org/10.1093/bioinformatics/btz499.

Abstract:
Summary: Sequencing data resources have increased exponentially in recent years, as has interest in large-scale meta-analyses of integrated next-generation sequencing datasets. However, curation of integrated datasets that match a user's particular research priorities is currently a time-intensive and imprecise task. MetaSeek is a sequencing data discovery tool that enables users to flexibly search and filter on any metadata field to quickly find the sequencing datasets that meet their needs. MetaSeek automatically scrapes metadata from all publicly available datasets in the Sequence Read Archive, cleans and parses messy, user-provided metadata into a structured, standard-compliant database, and predicts missing fields where possible. MetaSeek provides a web-based graphical user interface and interactive visualization dashboard, as well as a programmatic API to rapidly search, filter, visualize, save, share, and download matching sequencing metadata. Availability and implementation: The MetaSeek online interface is available at https://www.metaseek.cloud/. The MetaSeek database can also be accessed via API to programmatically search, filter, and download all metadata. MetaSeek source code, metadata scrapers, and documents are available at https://github.com/MetaSeek-Sequencing-Data-Discovery/metaseek/.
6

Qin, Jian, Jeff Hemsley, and Sarah E. Bratt. "The structural shift and collaboration capacity in GenBank Networks: A longitudinal study." Quantitative Science Studies 3, no. 1 (2022): 174–93. http://dx.doi.org/10.1162/qss_a_00181.

Abstract:
Metadata in scientific data repositories such as GenBank contain links between data submissions and related publications. As a new data source for studying collaboration networks, metadata in data repositories compensate for the limitations of publication-based research on collaboration networks. This paper reports the findings from a GenBank metadata analytics project. We used network science methods to uncover the structures and dynamics of GenBank collaboration networks from 1992 to 2018. The longitudinality and large scale of this data collection allowed us to unravel the evolution history of collaboration networks and to identify a trend of flattening network structures over time and an optimal assortative-mixing range for enhancing collaboration capacity. By incorporating metadata from the data production stage with the publication stage, we uncovered new characteristics of collaboration networks and developed new metrics for assessing the effectiveness of enablers of collaboration: scientific and technical human capital, cyberinfrastructure, and science policy.
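Assortative mixing, one of the measures the abstract relies on, is easy to illustrate. The sketch below uses networkx on a toy co-authorship graph; it is not the authors' code or data.

```python
# Degree assortativity on a toy collaboration network (illustrative only).
import networkx as nx

G = nx.Graph()
# Toy co-submission edges: authors who appear on the same record.
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])

# Positive values mean high-degree authors tend to collaborate with other
# high-degree authors (assortative mixing); negative means the opposite.
print(nx.degree_assortativity_coefficient(G))
```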
7

Jiang, L. Q., S. A. O'Connor, K. M. Arzayus, and A. R. Parsons. "A metadata template for ocean acidification data." Earth System Science Data 7, no. 1 (June 11, 2015): 117–25. http://dx.doi.org/10.5194/essd-7-117-2015.

Abstract:
This paper defines the best practices for documenting ocean acidification (OA) data and presents a framework for an OA metadata template. Metadata is structured information that describes and locates an information resource. It is the key to ensuring that a data set will be accessible into the future. With the rapid expansion of studies on biological responses to OA, the lack of a common metadata template to document the resulting data poses a significant hindrance to effective OA data management efforts. In this paper, we present a metadata template that can be applied to a broad spectrum of OA studies, including those studying the biological responses to OA. The "variable metadata section", which includes the variable name, observation type, whether the variable is a manipulation condition or response variable, and the biological subject on which the variable is studied, forms the core of this metadata template. Additional metadata elements, such as investigators, temporal and spatial coverage, and data citation, are essential components to complete the template. We explain the structure of the template and define many metadata elements that may be unfamiliar to researchers.
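To make the "variable metadata section" concrete, here is a hedged sketch of such a record as a Python dictionary; the field names paraphrase the abstract and the values are hypothetical (the authoritative template is the one the authors publish, cited in the discussion-paper version further down this list).

```python
# Sketch of one "variable metadata section" record; field names paraphrase
# the abstract, values are invented for illustration.
variable_metadata = {
    "variable_name": "pH_total_scale",
    "observation_type": "measured",
    "manipulation_or_response": "response",  # manipulation condition vs. response variable
    "biological_subject": "Mytilus edulis",  # hypothetical study organism
}
print(variable_metadata)
```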
8

Lassi, Monica, Maria Johnsson, and Koraljka Golub. "Research data services." IFLA Journal 42, no. 4 (November 30, 2016): 266–77. http://dx.doi.org/10.1177/0340035216671963.

Abstract:
The paper reports on an exploratory study of researchers' needs for effective research data management at two Swedish universities, conducted in order to inform the ongoing development of research data services. Twelve researchers from diverse fields, including biology, cultural studies, economics, environmental studies, geography, history, linguistics, media, and psychology, were interviewed. The interviews were structured, guided by the Data Curation Profiles Toolkit developed at Purdue University, with added questions regarding subject metadata. The preliminary analysis indicates that research data management practices vary greatly among the respondents, and therefore so do the implications for research data services. The added questions on subject metadata indicate a need for services that guide researchers in describing their datasets with adequate metadata.
9

Jiang, L. Q., S. A. O'Connor, K. M. Arzayus, A. Kozyr, and A. R. Parsons. "A metadata template for ocean acidification data." Earth System Science Data Discussions 8, no. 1 (January 13, 2015): 1–21. http://dx.doi.org/10.5194/essdd-8-1-2015.

Abstract:
This paper defines the best practices for documenting ocean acidification (OA) metadata and presents a framework for an OA metadata template. Metadata is structured information that describes and locates an information resource. It is the key to ensuring that a data set will survive and continue to be accessible into the future. With the rapid expansion of studies on biological responses of organisms to OA, the lack of a common metadata template to document the resulting data poses a significant hindrance to effective OA data management efforts. In this paper, we present a metadata template that can be applied to a broad spectrum of OA studies, including those studying the biological responses of organisms to OA. The "variable metadata section", which includes the variable name, observation type, whether the variable is a manipulation condition or response variable, and the biological subject on which the variable is studied, forms the core of this metadata template. Additional metadata elements, such as investigators, temporal and spatial coverage, platforms for the sampling, and data citation, are essential components to complete the template. We also explain the structure of the template and define many metadata elements that may be unfamiliar to researchers. Template availability: http://ezid.cdlib.org/id/doi:10.7289/V5C24TCK (DOI: 10.7289/V5C24TCK; NOAA Institutional Repository accession number: ocn881471371).
10

Vanags, Mikus, and Rudite Cevere. "Type Safe Metadata Combining." Computer and Information Science 10, no. 2 (April 30, 2017): 97. http://dx.doi.org/10.5539/cis.v10n2p97.

Abstract:
Type safety is an important property of any type system. Modern programming languages support different mechanisms for working in a type-safe manner, e.g., properties, methods, events, attributes (annotations), and other structures. Some programming languages allow access to metadata: type information, type-member information, and information about applied attributes. But none of the existing mainstream programming languages that support reflection provides a fully type-safe metadata combining mechanism built into the language. Combining metadata means combining class-member metadata with data, type metadata, and constraints. Existing solutions provide no, or only a limited, type-safe metadata combining mechanism; they are complex and processed at runtime, which by definition is not built-in type-safe metadata combining. The problem can be solved by introducing syntax and methods for type-safe metadata combining so that metadata can be processed at compile time in a fully type-safe way. Common metadata combining use cases are data abstraction layer creation and database querying.
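Python has no compile-time enforcement, so it cannot reproduce the paper's proposal, but dataclass field metadata gives a rough runtime feel for combining class-member metadata with data and type metadata; the field names and metadata keys below are invented.

```python
# Runtime approximation of "combining class-member metadata with data,
# type metadata and constraints" via dataclass field metadata.
from dataclasses import dataclass, field, fields

@dataclass
class Measurement:
    depth: float = field(metadata={"unit": "m", "min": 0.0})
    salinity: float = field(metadata={"unit": "PSU"})

m = Measurement(depth=12.5, salinity=35.1)
for f in fields(m):
    # Member metadata, declared type, and the runtime value, side by side.
    print(f.name, f.type, f.metadata, getattr(m, f.name))
```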
11

Hardesty, Juliet L. "Transitioning from XML to RDF: Considerations for an effective move towards Linked Data and the Semantic Web." Information Technology and Libraries 35, no. 1 (April 1, 2016): 51. http://dx.doi.org/10.6017/ital.v35i1.9182.

Abstract:
Metadata, particularly within the academic library setting, is often expressed in eXtensible Markup Language (XML) and managed with XML tools, technologies, and workflows. Managing a library’s metadata currently takes on a greater level of complexity as libraries are increasingly adopting the Resource Description Framework (RDF). Semantic Web initiatives are surfacing in the library context with experiments in publishing metadata as Linked Data sets and also with development efforts such as BIBFRAME and the Fedora 4 Digital Repository incorporating RDF. Use cases show that transitions into RDF are occurring in both XML standards and in libraries with metadata encoded in XML. It is vital to understand that transitioning from XML to RDF requires a shift in perspective from replicating structures in XML to defining meaningful relationships in RDF. Establishing coordination and communication among these efforts will help as more libraries move to use RDF, produce Linked Data, and approach the Semantic Web.
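The shift the article describes, from replicating XML structure to stating relationships, can be sketched in a few lines of Python with rdflib; the record, namespace, and URIs below are illustrative only.

```python
# Toy XML-to-RDF conversion: each XML element becomes an explicit
# relationship instead of a nested tree node. Requires rdflib.
import xml.etree.ElementTree as ET
from rdflib import Graph, Literal, Namespace, URIRef

xml = "<record><title>On Metadata</title><creator>Doe, J.</creator></record>"
root = ET.fromstring(xml)

EX = Namespace("http://example.org/terms/")  # hypothetical vocabulary
g = Graph()
record = URIRef("http://example.org/record/1")
for child in root:
    g.add((record, EX[child.tag], Literal(child.text)))

print(g.serialize(format="turtle"))
```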
12

Foessel, Siegfried, and Heiko Sparenberg. "EN 17650 – The new standard for digital preservation of cinematographic works." Archiving Conference 2021, no. 1 (June 18, 2021): 43–46. http://dx.doi.org/10.2352/issn.2168-3204.2021.1.0.10.

Abstract:
EN 17650 is a proposed new European Standard for the digital preservation of cinematographic works. It allows content to be organized in a systematic way as a so-called Cinema Preservation Package (CPP). The standard defines methods to store content in physical and logical structures and describes relationships and metadata for its components. The CPP uses existing XML schemas, in particular METS, EBUCore, and PREMIS, to store structural, descriptive, technical, and provenance metadata. METS XML files with their core metadata contain the physical and logical structures of the content, hash values and UUIDs to ensure data integrity, and links to external metadata files to enrich the content with additional information. The content itself is stored based on existing public and industry standards, avoiding unnecessary conversion steps. The paper explains the concepts behind the new standard and specifies the usage and combination of existing schemas with newly introduced metadata parameters.
13

Russell, Pamela H., and Debashis Ghosh. "Radtools: R utilities for smooth navigation of medical image data." F1000Research 7 (December 24, 2018): 1976. http://dx.doi.org/10.12688/f1000research.17139.1.

Abstract:
The radiology community has adopted several widely used standards for medical image files, including the popular DICOM (Digital Imaging and Communication in Medicine) and NIfTI (Neuroimaging Informatics Technology Initiative) standards. These file formats include image intensities as well as potentially extensive metadata. The NIfTI standard specifies a particular set of header fields describing the image and minimal information about the scan. DICOM headers can include any of >4,000 available metadata attributes spanning a variety of topics. NIfTI files contain all slices for an image series, while DICOM files capture single slices and image series are typically organized into a directory. Each DICOM file contains metadata for the image series as well as the individual image slice. The programming environment R is popular for data analysis due to its free and open code, active ecosystem of tools and users, and excellent system of contributed packages. Currently, many published radiological image analyses are performed with proprietary software or custom unpublished scripts. However, R is increasing in popularity in this area due to several packages for processing and analysis of image files. While these R packages handle image import and processing, no existing package makes image metadata conveniently accessible. Extracting image metadata, combining across slices, and converting to useful formats can be prohibitively cumbersome, especially for DICOM files. We present radtools, an R package for smooth navigation of medical image data. Radtools makes the problem of extracting image metadata trivially simple, providing simple functions to explore and return information in familiar R data structures. Radtools also facilitates extraction of image data and viewing of image slices. The package is freely available under the MIT license at https://github.com/pamelarussell/radtools and is easily installable from the Comprehensive R Archive Network (https://cran.r-project.org/package=radtools).
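radtools itself is an R package; as a rough Python analogue of the header-access idea (not the radtools API), the pydicom library exposes the same DICOM attributes the abstract refers to. The file name is hypothetical.

```python
# Reading DICOM header metadata in Python with pydicom (analogue of the
# metadata access radtools provides in R; not the radtools API itself).
import pydicom

ds = pydicom.dcmread("slice_001.dcm")  # hypothetical DICOM file
# Standard attributes from the header can be read by keyword; attributes
# absent from a given file can be defaulted with getattr.
print(ds.Modality, ds.Rows, ds.Columns)
print(getattr(ds, "SeriesDescription", "<absent>"))
```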
14

Canning, Erin, Susan Brown, Sarah Roger, and Kimberley Martin. "The Power to Structure." KULA: Knowledge Creation, Dissemination, and Preservation Studies 6, no. 3 (July 27, 2022): 1–15. http://dx.doi.org/10.18357/kula.169.

Abstract:
Information systems are developed by people with intent—they are designed to help creators and users tell specific stories with data. Within information systems, the often invisible structures of metadata profoundly impact the meaning that can be derived from that data. The Linked Infrastructure for Networked Cultural Scholarship project (LINCS) helps humanities researchers tell stories by using linked open data to convert humanities datasets into organized, interconnected, machine-processable resources. LINCS provides context for online cultural materials, interlinks them, and grounds them in sources to improve web resources for research. This article describes how the LINCS team is using the shared standards of linked data, and especially ontologies—typically unseen yet powerful—to bring meaning mindfully to metadata through structure. The LINCS metadata—composed of linked open data about cultural artifacts, people, and processes—and the structures that support it must represent multiple, diverse ways of knowing. It needs to enable various means of incorporating contextual data and of telling stories with nuance and context, situated and supported by data structures that reflect and make space for specificities and complexities. As it addresses specificity in each research dataset, LINCS is simultaneously working to balance interoperability, as achieved through a level of generalization, with contextual and domain-specific requirements. The LINCS team's approach to ontology adoption and use centers on intersectionality, multiplicity, and difference. The question of what meaning the structures being used will bring to the data is as important as what meaning is introduced as a result of linking data together, and the project has built this premise into its decision-making and implementation processes. To convey an understanding of categories and classification as contextually embedded—culturally produced, intersecting, and discursive—the LINCS team frames them not as fixed but as grounds for investigation and starting points for understanding. Metadata structures are as important as vocabularies for producing such meaning.
15

Tilton, Lauren, Emeline Alexander, Luke Malcynsky, and Hanglin Zhou. "The Role of Metadata in American Studies." Polish Journal for American Studies, Issue 14 (Autumn 2020) (December 1, 2020): 149–63. http://dx.doi.org/10.7311/pjas.14/2/2020.02.

Abstract:
This article argues that metadata can animate rather than stall American Studies inquiry. Data about data can enable and expand the kinds of context, evidence, and interdisciplinary methodological approaches that American Studies can engage with while taking back data from the very power structures that the field aims to reveal, critique, and abolish. As a result, metadata can be a site where the field realizes its intellectual and political commitments. The article draws on a range of digital humanities projects, with a focus on projects created by the authors, that demonstrate the possibilities (and challenges) of metadata for American Studies.
16

Bashina, O. E., N. A. Komkova, L. V. Matraeva, and V. E. Kosolapova. "The Future of International Statistical Data Sharing and New Issues of Interaction." Voprosy statistiki 26, no. 7 (August 1, 2019): 55–66. http://dx.doi.org/10.34023/2313-6383-2019-26-7-55-66.

Abstract:
The article deals with the challenges and prospects of implementing the Statistical Data and Metadata eXchange (SDMX) standard and using it in the international sharing of statistical data and metadata. The authors identify potential areas where this standard can be used and describe a mechanism for data and metadata sharing according to the SDMX standard. Major issues, classified into three groups (general, statistical, and information technology), are outlined by drawing on both domestic and foreign experience of implementing the standard. These issues may arise at the national level (if the standard is implemented domestically), at the international level (when the standard is applied by international organizations), and at the national-international level (if information is exchanged between national statistical data providers and international organizations). General issues arise at the regulatory level and are associated with establishing the boundaries of responsibility of counterpart organizations at all three levels of interaction, as well as with increasing the capacity to apply the SDMX standard. Issues of a statistical nature are most often encountered due to the sharing of large amounts of data and metadata related to various thematic areas of statistics; there should be a unified structure for data and metadata generation and transmission. As information sharing develops, challenges arise associated with continuous monitoring and expanding SDMX code lists. At the same time, there is no universal data structure at the international level and, as a result, it is difficult to understand and apply at the national level the existing data structures developed by international organizations. Information technology challenges relate to creating an IT infrastructure for data and metadata sharing using the SDMX standard. The IT infrastructure (depending on the participant's status) includes the following elements: tools for the receiving organizations, tools for the sending organizations, and infrastructure for IT professionals. For each of the outlined issues, the authors formulate practical recommendations based on the complexity principle as applied to the implementation of the international SDMX standard for the exchange of data and metadata.
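As a practical aside, SDMX web services can be consumed from Python with the third-party pandasdmx package; the call pattern below follows that package's documentation and is a sketch unrelated to the article itself.

```python
# Sketch of consuming an SDMX web service with pandasdmx (call pattern per
# its documentation; provider, dataflow, and key are illustrative).
import pandasdmx as sdmx

ecb = sdmx.Request("ECB")                       # a public SDMX provider
msg = ecb.data("EXR", key={"CURRENCY": "USD"},  # exchange-rate dataflow
               params={"startPeriod": "2020"})
df = sdmx.to_pandas(msg)                        # data plus structural metadata
print(df.head())
```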
17

Rasmussen, Karsten Boye. "Metadata is key - the most important data after data." IASSIST Quarterly 42, no. 2 (July 18, 2018): 1. http://dx.doi.org/10.29173/iq922.

Abstract:
Welcome to the second issue of volume 42 of the IASSIST Quarterly (IQ 42:2, 2018). The IASSIST Quarterly has had several papers on many different aspects of the Data Documentation Initiative - for a long time better known by its acronym DDI, without any further explanation. DDI is a brand. The IASSIST Quarterly has also included special issues of collections of papers concerning DDI. Among staff at data archives and data libraries, as well as the users of these facilities, I think we can agree that it is the data that comes first. However, fundamental to all uses of data is the documentation describing the data, without which the data are useless. Therefore, it comes as no surprise that the IASSIST Quarterly is devoted partly to the presentation of papers related to documentation. The question of documentation or data resembles the question of the chicken or the egg. Don't mistake the keys for your car. The metadata and the data belong together and should not be separated. DDI now is a standard, but as with other standards it continues to evolve. The argument about why standards are good comes to mind: 'The nice thing about standards is that you have so many to choose from!'. DDI is the de facto standard for most social science data at data archives and university data libraries. The first paper demonstrates a way to tackle the heterogeneous character of the usage of the DDI. The approach is able to support collaborative questionnaire development as well as export in several formats including the metadata as DDI. The second paper shows how an institutionalized and more general metadata standard - in this case the Belgian Encoded Archival Description (EAD) - is supported by a developed crosswalk from DDI to EAD. However, IQ 42:2 is not a DDI special issue, and the third paper presents an open-source research data management platform called Dendro and a laboratory notebook called LabTablet without mentioning DDI. However, the paper certainly does mention metadata - it is the key to all data. The winner of the paper competition of the IASSIST 2017 conference is presented in this issue. 'Flexible DDI Storage' is authored by Oliver Hopt, Claus-Peter Klas, Alexander Mühlbauer, all affiliated with GESIS - the Leibniz-Institute for the Social Sciences in Germany. The authors argue that the current usage of DDI is heterogeneous and that this results in complex database models for each developed application. The paper shows a new binding of DDI to applications that works independently of most version changes and interpretative differences, thus avoiding continuous reimplementation. The work is based upon their developed DDI-FlatDB approach, which they showed at the European DDI conferences in 2015 and 2016, and which is also described in the paper. Furthermore, a web-based questionnaire editor and application supports large DDI structures and collaborative questionnaire development as well as production of structured metadata for survey institutes and data archives. The paper describes the questionnaire workflow from the start to the export of questionnaire, DDI XML, and SPSS. The development is continuing and it will be published as open source. The second paper is also focused on DDI, now in relation to a new data archive. 'Elaborating a Crosswalk Between Data Documentation Initiative (DDI) and Encoded Archival Description (EAD) for an Emerging Data Archive Service Provider' is by Benjamin Peuch who is a researcher at the State Archives of Belgium. 
It is expected that the future Belgian data archive will be part of the State Archives, and because DDI is the most widespread metadata standard in the social sciences, the State Archives have developed a DDI-to-EAD crosswalk in order to re-use their EAD infrastructure. The paper shows the conceptual differences between DDI and EAD - both XML based - and how these can be reconciled or avoided for the purpose of a data archive for the social sciences. The author also foresees a fruitful collaboration between traditional archivists and social scientists. The third paper is by a group of scholars connected to the Informatics Engineering Department of University of Porto and the INESC TEC in Portugal. Cristina Ribeiro, João Rocha da Silva, João Aguiar Castro, Ricardo Carvalho Amorim, João Correia Lopes, and Gabriel David are the authors of 'Research Data Management Tools and Workflows: Experimental Work at the University of Porto'. The authors start with the statement that 'Research datasets include all kinds of objects, from web pages to sensor data, and originate in every domain'. The task is to make these data visible, described, preserved, and searchable. The focus is on data preparation, dataset organization and metadata creation. Some groups were proposed a developed open-source research data management platform called Dendro and a laboratory notebook called LabTablet, while other groups that demanded a domain-specific approach had special developed models and applications. All development and metadata modelling have in sight the metadata dissemination. Submissions of papers for the IASSIST Quarterly are always very welcome. We welcome input from IASSIST conferences or other conferences and workshops, from local presentations or papers especially written for the IQ. When you are preparing such a presentation, give a thought to turning your one-time presentation into a lasting contribution. Doing that after the event also gives you the opportunity of improving your work after feedback. We encourage you to login or create an author login to https://www.iassistquarterly.com (our Open Journal System application). We permit authors 'deep links' into the IQ as well as deposition of the paper in your local repository. Chairing a conference session with the purpose of aggregating and integrating papers for a special issue IQ is also much appreciated as the information reaches many more people than the limited number of session participants and will be readily available on the IASSIST Quarterly website at https://www.iassistquarterly.com. Authors are very welcome to take a look at the instructions and layout: https://www.iassistquarterly.com/index.php/iassist/about/submissions Authors can also contact me directly via e-mail: kbr@sam.sdu.dk. Should you be interested in compiling a special issue for the IQ as guest editor(s) I will also be delighted to hear from you. Karsten Boye Rasmussen - June, 2018
18

Fugazza, Cristiano, Monica Pepe, Alessandro Oggioni, Paolo Tagliolato, and Paola Carrara. "Raising Semantics-Awareness in Geospatial Metadata Management." ISPRS International Journal of Geo-Information 7, no. 9 (September 7, 2018): 370. http://dx.doi.org/10.3390/ijgi7090370.

Abstract:
Geospatial metadata are often encoded in formats that either are not aimed at efficient retrieval of resources or are plainly outdated. In particular, the quantum leap represented by the Semantic Web has so far not induced a consistent, interlinked baseline in the geospatial domain. Datasets, the scientific literature related to them, and ultimately the researchers behind these products are only loosely connected; the corresponding metadata are intelligible only to humans and duplicated in different systems, seldom consistently. We address these issues by relating metadata items to resources that represent keywords, institutes, researchers, toponyms, and virtually any RDF data structure made available over the Web via SPARQL endpoints. Essentially, our methodology fosters delegated metadata management, as the entities referred to in metadata are independent, decentralized data structures with their own life cycle. Our example implementation of delegated metadata envisages: (i) editing via customizable web-based forms (including injection of semantic information); (ii) encoding of records in any XML metadata schema; and (iii) translation into RDF. Among the semantics-aware features that this practice enables, we present a worked-out example focusing on automatic updating of metadata descriptions. Our approach, demonstrated in the context of INSPIRE metadata (the ISO 19115/19119 profile eliciting integration of European geospatial resources), is also applicable to a broad range of metadata standards, including non-geospatial ones.
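The delegated entities the abstract describes live behind SPARQL endpoints; a minimal lookup of one such decentralized resource can be sketched with the SPARQLWrapper library (the endpoint and query are illustrative, not from the article).

```python
# Minimal SPARQL lookup of a decentralized resource (illustrative endpoint).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
        <http://dbpedia.org/resource/Milan> rdfs:label ?label .
        FILTER (lang(?label) = "en")
    }
""")
sparql.setReturnFormat(JSON)
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["label"]["value"])
```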
19

Harvey, Andrew S. "Time-Use Metadata." Transportation Research Record: Journal of the Transportation Research Board 1804, no. 1 (January 2002): 67–76. http://dx.doi.org/10.3141/1804-10.

Abstract:
Time-diary data provide a complete sequential record of all activities of individuals, including travel, for a period of 24 or 48 hours or longer. Hence, time-use data have much to offer travel behavior analysts and modelers. The pool of time-use data is rapidly increasing. Additionally, comparability between time-use data and travel data is growing, largely because of the expanding volume of activity data collected in travel surveys. One challenge is to ensure that the data and the time-use, travel, and other researchers who need them can be brought together in the most efficient manner. This task requires the development of both study-level and variable-level metadata standards. Much work providing a basis for the development of time-use metadata standards has already been undertaken in collateral fields. Arguments are made for the exploration, application, and expansion of existing work to establish time-use metadata standards. A consolidation of efforts between time-use and travel behavior data professionals is proposed to ensure that each field has the optimum opportunity to identify, locate, evaluate, and access useful data in either field.
20

Stillerman, J., M. Greenwald, and J. Wright. "Scientific data management with navigational metadata." Fusion Engineering and Design 128 (March 2018): 113–16. http://dx.doi.org/10.1016/j.fusengdes.2018.01.063.

21

Schoenenwald, Alexander, Simon Kern, Josef Viehhauser, and Johannes Schildgen. "Collecting and visualizing data lineage of Spark jobs." Datenbank-Spektrum 21, no. 3 (October 4, 2021): 179–89. http://dx.doi.org/10.1007/s13222-021-00387-7.

Abstract:
Metadata management constitutes a key prerequisite for enterprises as they engage in data analytics and governance. Today, however, the context of data is often only manually documented by subject matter experts, and lacks completeness and reliability due to the complex nature of data pipelines. Thus, collecting data lineage—describing the origin, structure, and dependencies of data—in an automated fashion increases the quality of provided metadata and reduces manual effort, making it critical for the development and operation of data pipelines. In our practice report, we propose an end-to-end solution that digests lineage via (Py‑)Spark execution plans. We build upon the open-source component Spline, allowing us to reliably consume lineage metadata and identify interdependencies. We map the digested data into an expandable data model, enabling us to extract graph structures for both coarse- and fine-grained data lineage. Lastly, our solution visualizes the extracted data lineage via a modern web app, and integrates with BMW Group's soon-to-be open-sourced Cloud Data Hub.
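The raw material for such lineage collection is visible in any PySpark session: every DataFrame carries the execution plan that Spline-based pipelines like the one in the paper digest. Below is a minimal sketch using explain() rather than Spline.

```python
# Inspecting the Spark execution plan that lineage harvesters consume.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lineage-demo").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "tag"])
out = df.filter(df.id > 1).groupBy("tag").count()

# The plan names every source and transformation -- the raw material for
# the origin/structure/dependency graph described above.
out.explain(extended=True)
```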
22

Li, Yaping. "Glowworm Swarm Optimization Algorithm- and K-Prototypes Algorithm-Based Metadata Tree Clustering." Mathematical Problems in Engineering 2021 (February 9, 2021): 1–10. http://dx.doi.org/10.1155/2021/8690418.

Abstract:
The main objective of this paper is to present a new clustering algorithm for metadata trees based on the K-prototypes algorithm, the GSO (glowworm swarm optimization) algorithm, and maximal frequent paths (MFP). Metadata tree clustering involves computing the feature vector of each metadata tree and then clustering the feature vectors; therefore, traditional data clustering methods are not directly suitable for metadata trees. As the main method for calculating feature vectors, the MFP method also faces the difficulties of high computational complexity and loss of key information. Generally, the K-prototypes algorithm is suitable for clustering mixed-attribute data such as these feature vectors, but it is sensitive to the initial clustering center. Compared with other swarm intelligence algorithms, the GSO algorithm has more efficient global search capabilities, which are suitable for solving multimodal problems and also useful for optimizing the K-prototypes algorithm. To address the clustering of metadata tree structures with respect to clustering accuracy and high data dimensionality, this paper combines the GSO algorithm, the K-prototypes algorithm, and MFP to design a new metadata-structure clustering method. Firstly, MFP is used to describe metadata tree features, and a key parameter for categorical data is introduced into the MFP feature vector to improve its accuracy in describing the metadata tree; secondly, GSO is combined with K-prototypes to design GSOKP for clustering feature vectors that contain both numeric and categorical data, so as to improve clustering accuracy; finally, tests are conducted with a set of metadata trees. The experimental results show that the designed metadata tree clustering method GSOKP-FP has certain advantages in respect of clustering accuracy and time complexity.
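A plain K-prototypes run on mixed numeric/categorical feature vectors can be sketched with the third-party kmodes package; note that this omits the paper's GSO initialization and MFP feature extraction, and the data are toy values.

```python
# K-prototypes on mixed-attribute feature vectors (toy data; no GSO step).
import numpy as np
from kmodes.kprototypes import KPrototypes

# Columns: two numeric features, one categorical feature, as in a feature
# vector that mixes counts with node labels.
X = np.array([[1.0, 2.0, "a"],
              [1.2, 1.9, "a"],
              [8.0, 7.5, "b"],
              [7.9, 8.1, "b"]], dtype=object)

kp = KPrototypes(n_clusters=2, init="Cao", random_state=0)
labels = kp.fit_predict(X, categorical=[2])  # index of the categorical column
print(labels)
```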
23

Su, Shian, Vincent J. Carey, Lori Shepherd, Matthew Ritchie, Martin T. Morgan, and Sean Davis. "BiocPkgTools: Toolkit for mining the Bioconductor package ecosystem." F1000Research 8 (May 29, 2019): 752. http://dx.doi.org/10.12688/f1000research.19410.1.

Abstract:
Motivation: The Bioconductor project, a large collection of open source software for the comprehension of large-scale biological data, continues to grow with new packages added each week, motivating the development of software tools focused on exposing package metadata to developers and users. The resulting BiocPkgTools package facilitates access to extensive metadata in computable form covering the Bioconductor package ecosystem, facilitating downstream applications such as custom reporting, data and text mining of Bioconductor package text descriptions, graph analytics over package dependencies, and custom search approaches. Results: The BiocPkgTools package has been incorporated into the Bioconductor project, installs using standard procedures, and runs on any system supporting R. It provides functions to load detailed package metadata, longitudinal package download statistics, package dependencies, and Bioconductor build reports, all in "tidy data" form. BiocPkgTools can convert from tidy data structures to graph structures, enabling graph-based analytics and visualization. An end-user-friendly graphical package explorer aids in task-centric package discovery. Full documentation and example use cases are included. Availability: The BiocPkgTools software and complete documentation are available from Bioconductor (https://bioconductor.org/packages/BiocPkgTools).
24

Fischer, Colin, Monika Sester, and Steffen Schön. "Spatio-Temporal Research Data Infrastructure in the Context of Autonomous Driving." ISPRS International Journal of Geo-Information 9, no. 11 (October 25, 2020): 626. http://dx.doi.org/10.3390/ijgi9110626.

Abstract:
In this paper, we present an implementation of a research data management system that features structured data storage for spatio-temporal experimental data (environmental perception and navigation in the framework of autonomous driving), including metadata management and interfaces for visualization and parallel processing. The demands of the research environment, the design of the system, the organization of the data storage, and computational hardware as well as structures and processes related to data collection, preparation, annotation, and storage are described in detail. We provide examples for the handling of datasets, explaining the required data preparation steps for data storage as well as benefits when using the data in the context of scientific tasks.
25

Bogdanović, Miloš, Milena Frtunić Gligorijević, Nataša Veljković, and Leonid Stoimenov. "GENERATING KNOWLEDGE STRUCTURES FROM OPEN DATASETS' TAGS - AN APPROACH BASED ON FORMAL CONCEPT ANALYSIS." Facta Universitatis, Series: Automatic Control and Robotics 20, no. 1 (April 14, 2021): 021. http://dx.doi.org/10.22190/fuacr201225002b.

Abstract:
Under the influence of data transparency initiatives, a variety of institutions have published a significant number of datasets. In most cases, data publishers take advantage of open data portals (ODPs) for making their datasets publicly available. To improve the datasets' discoverability, ODPs group open datasets into categories using various criteria such as publisher, institution, format, and description. For these purposes, portals take advantage of the metadata accompanying datasets. However, parts of the metadata may be missing, incomplete, or redundant. Each of these situations makes it difficult for users to find appropriate datasets and obtain the desired information, and as the number of available datasets grows, the problem becomes easy to notice. This paper focuses on a first step towards reducing this problem by implementing knowledge structures to be used in situations where part of a dataset's metadata is missing. In particular, we focus on developing knowledge structures capable of suggesting the best match for the category to which an uncategorized dataset should belong. Our approach relies on dataset descriptions provided by users within dataset tags. We take advantage of formal concept analysis to reveal the shared conceptualization originating from the tags' usage by developing a concept lattice for each category of open datasets. Since tags represent free-text metadata entered by users, we present a method of optimizing their usage by means of semantic similarity measures based on natural language processing mechanisms. Finally, we demonstrate the advantage of our proposal by comparing concept lattices generated using formal concept analysis before and after the optimization process. The main experimental results show that our approach is capable of reducing the number of nodes within a lattice by more than 40%.
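For intuition, the concept-lattice construction can be sketched by brute force on a tiny dataset-tag context (illustrative data, pure Python; real FCA toolkits use far more efficient closure algorithms).

```python
# Brute-force formal concept enumeration over a toy dataset-tag context.
from itertools import combinations

# Objects are datasets, attributes are user-supplied tags (invented data).
context = {
    "budget2020": {"finance", "csv"},
    "budget2021": {"finance", "csv", "open"},
    "airquality": {"environment", "csv"},
}

def common_tags(objs):
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else set()

def objects_with(tags):
    return {o for o, t in context.items() if tags <= t}

# Closing every subset of objects yields all formal concepts (extent, intent).
concepts = set()
objs = list(context)
for r in range(len(objs) + 1):
    for group in combinations(objs, r):
        intent = common_tags(group)        # shared tags of the group
        extent = objects_with(intent)      # all datasets carrying those tags
        concepts.add((frozenset(extent), frozenset(intent)))

for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```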
26

Firdaus Ahmad Fadzil, Ahmad, Zaaba Ahmad, Noor Elaiza Abd Khalid, and Shafaf Ibrahim. "Retinal Fundus Image Blood Vessels Segmentation via Object-Oriented Metadata Structures." International Journal of Engineering & Technology 7, no. 4.33 (December 9, 2018): 110. http://dx.doi.org/10.14419/ijet.v7i4.33.23511.

Abstract:
A retinal fundus image is a crucial tool for ophthalmologists to diagnose eye-related diseases. These images provide visual information about the interior layers of the retina structures, such as the optic disc, optic cup, blood vessels, and macula, that can assist the ophthalmologist in determining the health of an eye. Segmentation of blood vessels in fundus images is one of the most fundamental phases in detecting diseases such as diabetic retinopathy. However, the ambiguity of the retina structures in retinal fundus images presents a challenge for researchers seeking to segment the blood vessels. Extensive pre-processing and training of the images is necessary for precise segmentation, which is very intricate and laborious. This paper proposes the implementation of object-oriented metadata (OOM) structures for each pixel in retinal fundus images. These structures comprise additional metadata beyond the conventional red, green, and blue data for each pixel within the images. The segmentation of the blood vessels in the retinal fundus images is performed by considering this additional metadata, which enunciates the location, color spaces, and neighboring pixels of each individual pixel. The results show that accurate segmentation of retinal fundus blood vessels can be achieved by employing a straightforward thresholding method via the OOM structures, without extensive image pre-processing or data training.
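Below is a stripped-down sketch of the per-pixel metadata idea in numpy (arbitrary threshold, random stand-in data; not the authors' method or parameters).

```python
# Attach per-pixel "metadata" (coordinates plus raw color channels) to the
# RGB values, then segment by simple thresholding. Illustrative values only.
import numpy as np

h, w = 4, 4
rgb = np.random.default_rng(0).integers(0, 256, (h, w, 3))

ys, xs = np.mgrid[0:h, 0:w]
# Per-pixel record: location + raw color channels.
pixels = np.stack([ys, xs, rgb[..., 0], rgb[..., 1], rgb[..., 2]], axis=-1)

# Vessels appear dark in the green channel of fundus images, so a plain
# threshold on that field yields a candidate vessel mask.
vessel_mask = pixels[..., 3] < 100   # arbitrary cutoff
print(vessel_mask)
```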
27

Fegraus, Eric H., Sandy Andelman, Matthew B. Jones, and Mark Schildhauer. "Maximizing the Value of Ecological Data with Structured Metadata: An Introduction to Ecological Metadata Language (EML) and Principles for Metadata Creation." Bulletin of the Ecological Society of America 86, no. 3 (July 2005): 158–68. http://dx.doi.org/10.1890/0012-9623(2005)86[158:mtvoed]2.0.co;2.

28

Hasan, Forat Falih, and Muhamad Shahbani Abu Bakar. "An approach for metadata extraction and transformation for various data sources using R programming language." Indonesian Journal of Electrical Engineering and Computer Science 26, no. 3 (June 1, 2022): 1520. http://dx.doi.org/10.11591/ijeecs.v26.i3.pp1520-1529.

Abstract:
The metadata system is the key system for sharing and transforming data between various information systems (ISs), and each database system has its own structure for storing and retrieving metadata information. Metadata information must be extracted before data transformation. Moreover, the usual procedures require communicating with each type of database system to retrieve the stored metadata, which demands considerable information and effort. To overcome the challenge of accessing and extracting metadata from any type of data source, a uniform method must be developed and integrated into an organization's information systems. The semi-structured data extraction method (SeDEM) developed here includes three main operations: a logical structure operation, a unique key operation, and a relationships operation. The accurate information obtained using the SeDEM addresses data-quality issues concerning the integrity and completeness of the data transformation.
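The SeDEM code itself is not shown in the abstract; as a minimal illustration of reading a data source's stored metadata programmatically (the flavor of its logical structure operation), here is how one database engine, SQLite, exposes column-level metadata from Python.

```python
# Reading a database's own stored metadata (column names, types, keys).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")

# Each row: column id, name, declared type, NOT NULL flag, default, PK flag.
for row in con.execute("PRAGMA table_info(person)"):
    print(row)
```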
29

Andritsos, Periklis, and Patrick Keilty. "Level-Wise Exploration of Linked and Big Data Guided by Controlled Vocabularies and Folksonomies." Advances in Classification Research Online 24, no. 1 (January 9, 2014): 1. http://dx.doi.org/10.7152/acro.v24i1.14670.

Abstract:
This paper proposes a level-wise exploration of linked and big data guided by controlled vocabularies and folksonomies. We leverage techniques from both Reconstructability Analysis and cataloging and classification research to provide solutions that will structure and store large amounts of metadata, identify links between data, and explore data structures to produce models that will facilitate effective information retrieval.
30

Bhat, Talapady. "Rule and Root-based Metadata-Ecosystem for Structural Bioinformatics & Facebook." Acta Crystallographica Section A Foundations and Advances 70, a1 (August 5, 2014): C496. http://dx.doi.org/10.1107/s2053273314095035.

Abstract:
Despite widespread efforts to develop flexible formats such as PDB, mmCIF, and CIF to store and exchange data, the lack of best-practice metadata poses major challenges. Readily adoptable methods, with demonstrated usability across multiple solutions, for creating on-demand metadata are critical for the effective archiving and exchange of data in a user-centric fashion. It is important that there be a metadata ecosystem where the metadata of all structural and biological research evolves synchronously. Previously we described (Chem-BLAST, http://xpdb.nist.gov/chemblast/pdb.pl) a new 'root'-based concept, used in language development (Latin and Sanskrit), to simplify the selection or creation of metadata terms for millions of chemical structures from the PDB and PubChem. Subsequently we extended it to text-based data on cell-image data (BMC, doi:10.1186/1471-2105-12-487). Here we describe a further extension of this concept, creating roots and rules to define an ecosystem for composing new, or modifying existing, metadata with demonstrated interoperability. A major focus of the rules is to ensure that the metadata terms are self-explaining (intuitive), highly reused to describe many experiments, and usable in a federated environment to construct new use cases. We illustrate the use of this concept to compose semantic terminology for a wide range of disciplines, from materials science to biology. Examples of the use of such metadata to create demonstrated solutions for describing cell-image data will also be presented. I will present ideas and examples to foster discussion on a metadata architecture that (a) is independent of formats, (b) is better suited to a federated environment, and (c) could readily be used to build components such as Resource Description Framework (RDF) descriptions and Web services for the Semantic Web.
31

Shankaranarayanan, G., and Bin Zhu. "Enhancing decision-making with data quality metadata." Journal of Systems and Information Technology 23, no. 2 (August 18, 2021): 199–217. http://dx.doi.org/10.1108/jsit-08-2020-0153.

Abstract:
Purpose: Data quality metadata (DQM) is a set of quality measurements associated with the data. Prior research in data quality has shown that DQM improves decision performance. The same research has also shown that DQM overloads the cognitive capacity of decision-makers. Visualization is a proven technique to reduce cognitive overload in decision-making. This paper aims to describe a prototype decision support system with a visual interface and examine its efficacy in reducing cognitive overload in the context of decision-making with DQM. Design/methodology/approach: The authors describe the salient features of the prototype and, following the design science paradigm, evaluate its usefulness using an experimental setting. Findings: The authors find that the interface not only reduced perceived mental demand but also improved decision performance despite added task complexity due to the presence of DQM. Research limitations/implications: A drawback of this study is the sample size. With a sample size of 51, the power of the model to draw conclusions is weakened. Practical implications: In today's decision environments, decision-makers deal with extraordinary volumes of data the quality of which is unknown or not determinable with any certainty. The interface and its evaluation offer insights into the design of decision support systems that reduce the complexity of the data and facilitate the integration of DQM into decision tasks. Originality/value: To the best of my knowledge, this is the only research to build and evaluate a decision-support prototype for structured decision-making with DQM.
32

Chapman, John. "A conversation about linked data in the library and publishing ecosystem." Information Services & Use 40, no. 3 (November 10, 2020): 177–79. http://dx.doi.org/10.3233/isu-200087.

Abstract:
During the inaugural 2020 NISO+ conference, the “Ask the Experts about… Linked Data” panel included discussion of the transition of library metadata from legacy, record-based models to linked data structures. Panelists John Chapman (OCLC, Inc.) and Philip Schreur (Stanford University) were the speakers; NISO Board of Directors member Mary Sauer-Games (OCLC, Inc.) was the facilitator. The event was an open-ended conversation, with topics driven by questions and comments from the audience.
33

Atan, Rodziah, and Nur Adila Azram. "A Framework for Halal Knowledge Metadata Representations." Applied Mechanics and Materials 892 (June 2019): 8–15. http://dx.doi.org/10.4028/www.scientific.net/amm.892.8.

Abstract:
End users and consumers in the halal industry face difficulties in finding verified halal information. This occurs because information is stored in silos at every point of activity in every process chain, employing different structures and models, which creates an issue of information verification. Integration of multiple information systems generally aims at combining selected systems so that information can be easily retrieved and managed by users. A proposed five-component metadata representation development methodology is presented in this paper so that the integrated systems form a unified new whole and give users the illusion of interacting with one single information system; data can therefore be represented using the same abstraction principles (a unified global data model and unified semantics) without any physical restructuring.
34

Alter, George. "Reflections on the Intermediate Data Structure (IDS)." Historical Life Course Studies 10 (March 31, 2021): 71–75. http://dx.doi.org/10.51964/hlcs9570.

Abstract:
The Intermediate Data Structure (IDS) encourages sharing historical life course data by storing data in a common format. To encompass the complexity of life histories, IDS relies on data structures that are unfamiliar to most social scientists. This article examines four features of IDS that make it flexible and expandable: the Entity-Attribute-Value model, the relational database model, embedded metadata, and the Chronicle file. I also consider IDS from the perspective of current discussions about sharing data across scientific domains. We can find parallels to IDS in other fields that may lead to future innovations.
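A minimal Entity-Attribute-Value layout of the kind IDS builds on can be sketched in SQLite; this is a simplification for illustration, not the exact IDS schema, and the rows are invented.

```python
# Entity-Attribute-Value: each observation is a row, so new attributes
# need no schema change (simplified; not the exact IDS table layout).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE individual_fact (
                   individual_id INTEGER, attribute TEXT, value TEXT,
                   date TEXT)""")
rows = [
    (1, "Birth",      "",        "1851-03-02"),
    (1, "Occupation", "weaver",  "1871-04-01"),
    (1, "Residence",  "Utrecht", "1871-04-01"),
]
con.executemany("INSERT INTO individual_fact VALUES (?, ?, ?, ?)", rows)

# Reconstructing a life course = selecting one entity's rows in date order.
for row in con.execute(
        "SELECT * FROM individual_fact WHERE individual_id = 1 ORDER BY date"):
    print(row)
```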
APA, Harvard, Vancouver, ISO, and other styles
36

Bernstein, Herbert J. "Extending NXmx metadata to facilitate data sharing." Acta Crystallographica Section A Foundations and Advances 75, a2 (August 18, 2019): e724-e724. http://dx.doi.org/10.1107/s2053273319088326.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Prieto, Mario, Helena Deus, Anita de Waard, Erik Schultes, Beatriz García-Jiménez, and Mark D. Wilkinson. "Data-driven classification of the certainty of scholarly assertions." PeerJ 8 (April 21, 2020): e8871. http://dx.doi.org/10.7717/peerj.8871.

Full text
Abstract:
The grammatical structures scholars use to express their assertions are intended to convey various degrees of certainty or speculation. Prior studies have suggested a variety of categorization systems for scholarly certainty; however, these have not been objectively tested for their validity, particularly with respect to representing the interpretation by the reader, rather than the intention of the author. In this study, we use a series of questionnaires to determine how researchers classify various scholarly assertions, using three distinct certainty classification systems. We find that there are three distinct categories of certainty along a spectrum from high to low. We show that these categories can be detected in an automated manner, using a machine learning model, with a cross-validation accuracy of 89.2% relative to an author-annotated corpus, and 82.2% accuracy against a publicly-annotated corpus. This finding provides an opportunity for contextual metadata related to certainty to be captured as a part of text-mining pipelines, which currently miss these subtle linguistic cues. We provide an exemplar machine-accessible representation—a Nanopublication—where certainty category is embedded as metadata in a formal, ontology-based manner within text-mined scholarly assertions.
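For readers unfamiliar with this kind of pipeline, the sketch below shows a generic three-way certainty classifier. It uses TF-IDF features and logistic regression rather than the authors’ model, and the training sentences and labels are invented.

```python
# A hedged sketch of a three-way certainty classifier; not the
# authors' model, corpus, or feature set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "We demonstrate that X binds Y.",           # high certainty
    "These results suggest X may bind Y.",      # medium
    "It is conceivable that X binds Y.",        # low
    "X clearly regulates Y expression.",        # high
    "X might be involved in Y regulation.",     # low
    "Our data indicate a likely role for X.",   # medium
]
labels = ["high", "medium", "low", "high", "low", "medium"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(sentences, labels)
print(clf.predict(["X appears to interact with Y."]))
```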
APA, Harvard, Vancouver, ISO, and other styles
38

Liakos, Panagiotis, Panagiota Koltsida, George Kakaletris, Peter Baumann, Yannis Ioannidis, and Alex Delis. "A Distributed Infrastructure for Earth-Science Big Data Retrieval." International Journal of Cooperative Information Systems 24, no. 02 (June 2015): 1550002. http://dx.doi.org/10.1142/s0218843015500021.

Full text
Abstract:
Earth-Science data are composite, multi-dimensional and of significant size, and as such, continue to pose a number of ongoing problems regarding their management. With new and diverse information sources emerging and rates of generated data continuously increasing, a persistent challenge becomes more pressing: to make the information existing in multiple heterogeneous resources readily available. The widespread use of the XML data-exchange format has enabled the rapid accumulation of semi-structured metadata for Earth-Science data. In this paper, we exploit this popular use of XML and present the means for querying metadata emanating from multiple sources in a succinct and effective way, thereby releasing the user from the very tedious and time-consuming task of examining individual XML descriptions one by one. Our approach, termed Meta-Array Data Search (MAD Search), brings together diverse data sources while enhancing the user-friendliness of the underlying information sources. We gather metadata using different standards and construct an amalgamated service with the help of tools that discover and harvest such metadata; this service facilitates the end-user by offering easy and timely access to all metadata. The main contribution of our work is a novel query language termed xWCPS, which builds on top of two widely-adopted standards: XQuery and the Web Coverage Processing Service (WCPS). xWCPS furnishes a rich set of features regarding the way scientific data can be queried. Our proposed unified language allows for requesting metadata while also giving processing directives. Consequently, the xWCPS-enabled MAD Search helps in both the retrieval and the processing of large data sets hosted in a heterogeneous infrastructure. We demonstrate the effectiveness of our approach through diverse use-cases that provide insights into the syntactic power and overall expressiveness of xWCPS. We evaluate MAD Search in a distributed environment comprising five high-volume array databases whose sizes range between 20 and 100 GB, ascertaining the applicability and potential of our proposal.
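xWCPS syntax itself is defined in the paper; the sketch below only illustrates the underlying idea of querying harvested semi-structured XML metadata mechanically rather than reading each description by hand. The record layout is invented.

```python
# Filtering harvested XML metadata records programmatically; this is
# an illustration of the idea, not xWCPS, and the schema is invented.
import xml.etree.ElementTree as ET

harvested = [
    "<coverage><name>sst_2014</name><resolution>0.25</resolution></coverage>",
    "<coverage><name>ndvi_2015</name><resolution>0.10</resolution></coverage>",
]

# Find every coverage with resolution finer than 0.2 degrees,
# without inspecting each XML description by hand.
for doc in harvested:
    root = ET.fromstring(doc)
    res = float(root.findtext("resolution"))
    if res < 0.2:
        print(root.findtext("name"), res)
```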
APA, Harvard, Vancouver, ISO, and other styles
39

KELLY, PAUL H. J., and OLAV BECKMANN. "GENERATIVE AND ADAPTIVE METHODS IN PERFORMANCE PROGRAMMING." Parallel Processing Letters 15, no. 03 (September 2005): 239–55. http://dx.doi.org/10.1142/s0129626405002192.

Full text
Abstract:
Performance programming is characterized by the need to structure software components to exploit the context of use. Relevant context includes the target processor architecture, the available resources (number of processors, network capacity), prevailing resource contention, the values and shapes of input and intermediate data structures, the schedule and distribution of input data delivery, and the way the results are to be used. This paper concerns adapting to dynamic context: adaptive algorithms, malleable and migrating tasks, and application structures based on dynamic component composition. Adaptive computations use metadata associated with software components — performance models, dependence information, data size and shape. Computation itself is interwoven with planning and optimizing the computation process, using this metadata. This reflective nature motivates metaprogramming techniques. We present a research agenda aimed at developing a modelling framework which allows us to characterize both computation and dynamic adaptation in a way that allows systematic optimization.
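The adaptive pattern the authors describe can be reduced to a few lines: components carry metadata (here, toy cost models) and a planner consults it before computing. The components and models below are illustrative assumptions, not the paper’s framework.

```python
# A minimal sketch, assuming two sort implementations whose metadata
# includes a toy performance model; the planner picks one per input.

def insertion_sort(xs):  # cheap on tiny inputs
    for i in range(1, len(xs)):
        j, key = i - 1, xs[i]
        while j >= 0 and xs[j] > key:
            xs[j + 1] = xs[j]
            j -= 1
        xs[j + 1] = key
    return xs

COMPONENTS = [
    {"impl": insertion_sort, "cost": lambda n: n * n},   # metadata
    {"impl": sorted,         "cost": lambda n: n * 60},  # metadata
]

def adaptive_sort(xs):
    """Plan the computation: choose the implementation whose
    performance model predicts the lowest cost for this input."""
    best = min(COMPONENTS, key=lambda c: c["cost"](len(xs)))
    return best["impl"](xs)

print(adaptive_sort([3, 1, 2]))                    # small -> insertion sort
print(adaptive_sort(list(range(500, 0, -1)))[:5])  # large -> built-in sort
```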
APA, Harvard, Vancouver, ISO, and other styles
40

Nayak, Stuti, Amrapali Zaveri, Pedro Hernandez Serrano, and Michel Dumontier. "Experience: Automated Prediction of Experimental Metadata from Scientific Publications." Journal of Data and Information Quality 13, no. 4 (December 31, 2021): 1–11. http://dx.doi.org/10.1145/3451219.

Full text
Abstract:
While there exists an abundance of open biomedical data, the lack of high-quality metadata makes it challenging for others to find relevant datasets and to reuse them for another purpose. In particular, metadata are useful to understand the nature and provenance of the data. A common approach to improving the quality of metadata relies on expensive human curation, which is itself time-consuming and prone to error. Towards improving the quality of metadata, we use scientific publications to automatically predict metadata key:value pairs. For prediction, we use a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory network (BiLSTM). We focus our attention on the NCBI Disease Corpus, which is used for training the CNN and BiLSTM. We perform two different kinds of experiments with these two architectures: (1) we predict disease names by using their unique IDs in the MeSH ontology and (2) we use the tree structure of the MeSH ontology to move up the hierarchy of these disease terms, which reduces the number of labels. We also apply various multi-label classification techniques in the above-mentioned experiments. We find that in both cases the CNN achieves the best results in predicting the superclasses for disease, with an accuracy of 83%.
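As a rough illustration of the architecture named in the abstract, the sketch below assembles a small CNN text classifier with a multi-label output head. The vocabulary size, sequence length, label count, and hyperparameters are assumed, and this is not the authors’ configuration.

```python
# A hedged sketch of a CNN text classifier with a multi-label head;
# all sizes and hyperparameters below are invented.
import tensorflow as tf

VOCAB_SIZE, MAX_LEN, N_CLASSES = 20000, 200, 50  # assumed sizes

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),
    tf.keras.layers.Conv1D(128, 5, activation="relu"),  # n-gram features
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(N_CLASSES, activation="sigmoid"),  # multi-label
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```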
APA, Harvard, Vancouver, ISO, and other styles
41

SCHERP, ANSGAR, CARSTEN SAATHOFF, and STEFAN SCHEGLMANN. "A PATTERN SYSTEM FOR DESCRIBING THE SEMANTICS OF STRUCTURED MULTIMEDIA DOCUMENTS." International Journal of Semantic Computing 06, no. 03 (September 2012): 263–88. http://dx.doi.org/10.1142/s1793351x12400089.

Full text
Abstract:
Today's metadata models and metadata standards often focus on a specific media type only, lack combinability with other metadata models, or are limited with respect to the features they support. Thus they are not sufficient to describe the semantics of rich, structured multimedia documents. To overcome these limitations, we have developed a comprehensive model for representing multimedia metadata, the Multimedia Metadata Ontology (M3O). The M3O has been developed by an extensive analysis of related work and abstracts from the features of existing metadata models and metadata standards. It is based on the foundational ontology DOLCE+DnS Ultralight and makes use of ontology design patterns. The M3O serves as generic modeling framework for integrating the existing metadata models and metadata standards rather than replacing them. As such, the M3O can be used internally as semantic data model within complex multimedia applications such as authoring tools or multimedia management systems. To make use of the M3O in concrete multimedia applications, a generic application programming interface (API) has been implemented based on a sophisticated persistence layer that provides explicit support for ontology design patterns. To demonstrate applicability of the M3O API, we have integrated and applied it with our SemanticMM4U framework for the multi-channel generation of semantically annotated multimedia documents.
APA, Harvard, Vancouver, ISO, and other styles
42

Wigan, Marcus, Margaret Grieco, and Julian Mine. "Enabling and Managing Greater Access to Transport Data Through Metadata." Transportation Research Record: Journal of the Transportation Research Board 1804, no. 1 (January 2002): 48–55. http://dx.doi.org/10.3141/1804-07.

Full text
Abstract:
Metadata—information about data sets—allow a clear understanding of exactly what the elements and structure of a given data set entail. Metadata in conjunction with XML-based specifications, schemas, and tools allow a high level of automated and validated interworking between different types and sources of data. This is an issue of emerging importance to transportation, traffic, and planning, and the communities they serve, as these areas are all data-intensive but hold very different views of the world. The potential of this linkage is outlined, and a progression is made through a simple example metadata specification for nonmotorized transport, the agreements developed by the geospatial community for geographic data and the transport layers within them, and the formal XML document. The XML specification and validation approach now makes possible and practical a more effective and more accessible use of the information in the multiple fields linked through their involvement in transportation. The key outcome required is a vocabulary (or integrated vocabularies) of globally agreed-upon metadata element definitions for the various fields in and overlapping transportation. The advent of formal document specifications, of which XML is a widely used example, would then allow a significant expansion of the accessibility, use, and reuse of such data, to the great benefit of the user, policy, and analysis communities.
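The point about mechanical generation and validation of metadata records can be illustrated with a few lines of code. The element names for a nonmotorized-transport count below are invented placeholders, not a published vocabulary.

```python
# Generating an XML metadata record mechanically; the element names
# are assumed stand-ins for an agreed transport vocabulary.
import xml.etree.ElementTree as ET

record = ET.Element("dataset")
ET.SubElement(record, "title").text = "Cycle counts, Main St corridor"
ET.SubElement(record, "mode").text = "bicycle"           # assumed element
ET.SubElement(record, "count_interval").text = "PT15M"   # ISO 8601 duration
ET.SubElement(record, "spatial_ref").text = "EPSG:4326"  # assumed element

print(ET.tostring(record, encoding="unicode"))
```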
APA, Harvard, Vancouver, ISO, and other styles
43

Ziaimatin, Hasti, Alireza Nili, and Alistair Barros. "Reducing Consumer Uncertainty: Towards an Ontology for Geospatial User-Centric Metadata." ISPRS International Journal of Geo-Information 9, no. 8 (August 12, 2020): 488. http://dx.doi.org/10.3390/ijgi9080488.

Full text
Abstract:
With the increased use of geospatial datasets across heterogeneous user groups and domains, assessing fitness-for-use is emerging as an essential task. Users are presented with an increasing choice of data from various portals, repositories, and clearinghouses. Consequently, comparing the quality and evaluating fitness-for-use of different datasets presents major challenges for spatial data users. While standardization efforts have significantly improved metadata interoperability, the increasing choice of metadata standards and their focus on data production rather than potential data use and application render typical metadata documents insufficient for effectively communicating fitness-for-use. Thus, research has focused on the challenge of communicating fitness-for-use of geospatial data, proposing a more “user-centric” approach to geospatial metadata. We present the Geospatial User-Centric Metadata ontology (GUCM) for communicating fitness-for-use of spatial datasets to users in the spatial and other domains, to enable them to make informed data source selection decisions. GUCM enables metadata description for various components of a dataset in the context of different application domains. It captures producer-supplied and user-described metadata in structured format using concepts from domain-independent ontologies. This facilitates interoperability between spatial and nonspatial metadata on open data platforms and provides the means for searching/discovering spatial data based on user-specified quality and fitness-for-use criteria.
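A user-described fitness-for-use statement of the kind GUCM captures might be recorded as RDF along these lines; the namespace and property names below are placeholders, not the published ontology terms.

```python
# A hedged sketch of user-centric metadata as RDF triples; the
# ex: terms are invented placeholders, not GUCM vocabulary.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/gucm#")  # placeholder namespace
g = Graph()
g.bind("ex", EX)

g.add((EX.flood_layer_2020, RDF.type, EX.SpatialDataset))
g.add((EX.review1, RDF.type, EX.FitnessForUseReview))
g.add((EX.review1, EX.aboutDataset, EX.flood_layer_2020))
g.add((EX.review1, EX.applicationDomain, Literal("emergency planning")))
g.add((EX.review1, EX.fitnessRating, Literal(4)))

print(g.serialize(format="turtle"))
```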
APA, Harvard, Vancouver, ISO, and other styles
44

Russell, Pamela H., and Debashis Ghosh. "Radtools: R utilities for convenient extraction of medical image metadata." F1000Research 7 (January 25, 2019): 1976. http://dx.doi.org/10.12688/f1000research.17139.2.

Full text
Abstract:
The radiology community has adopted several widely used standards for medical image files, including the popular DICOM (Digital Imaging and Communication in Medicine) and NIfTI (Neuroimaging Informatics Technology Initiative) standards. These file formats include image intensities as well as potentially extensive metadata. The NIfTI standard specifies a particular set of header fields describing the image and minimal information about the scan. DICOM headers can include any of >4,000 available metadata attributes spanning a variety of topics. NIfTI files contain all slices for an image series, while DICOM files capture single slices and image series are typically organized into a directory. Each DICOM file contains metadata for the image series as well as the individual image slice. The programming environment R is popular for data analysis due to its free and open code, active ecosystem of tools and users, and excellent system of contributed packages. Currently, many published radiological image analyses are performed with proprietary software or custom unpublished scripts. However, R is increasing in popularity in this area due to several packages for processing and analysis of image files. While these R packages handle image import and processing, no existing package makes image metadata conveniently accessible. Extracting image metadata, combining across slices, and converting to useful formats can be prohibitively cumbersome, especially for DICOM files. We present radtools, an R package for convenient extraction of medical image metadata. Radtools provides simple functions to explore and return metadata in familiar R data structures. For convenience, radtools also includes wrappers of existing tools for extraction of pixel data and viewing of image slices. The package is freely available under the MIT license at https://github.com/pamelarussell/radtools and is easily installable from the Comprehensive R Archive Network (https://cran.r-project.org/package=radtools).
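Radtools itself is an R package; as a rough Python analogue of the task it simplifies, the sketch below gathers per-slice DICOM metadata into one table with pydicom and pandas. The directory path is a placeholder.

```python
# An analogous illustration in Python, not the radtools package:
# collect per-slice DICOM header fields into a familiar table.
from pathlib import Path
import pandas as pd
import pydicom

rows = []
for path in sorted(Path("series_dir").glob("*.dcm")):  # placeholder dir
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    rows.append({
        "file": path.name,
        "SeriesDescription": ds.get("SeriesDescription"),
        "SliceLocation": ds.get("SliceLocation"),
        "Rows": ds.get("Rows"),
        "Columns": ds.get("Columns"),
    })

metadata = pd.DataFrame(rows)  # one row per slice, one column per field
print(metadata.head())
```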
APA, Harvard, Vancouver, ISO, and other styles
45

Russell, Pamela H., and Debashis Ghosh. "Radtools: R utilities for convenient extraction of medical image metadata." F1000Research 7 (March 25, 2019): 1976. http://dx.doi.org/10.12688/f1000research.17139.3.

Full text
Abstract:
The radiology community has adopted several widely used standards for medical image files, including the popular DICOM (Digital Imaging and Communication in Medicine) and NIfTI (Neuroimaging Informatics Technology Initiative) standards. These file formats include image intensities as well as potentially extensive metadata. The NIfTI standard specifies a particular set of header fields describing the image and minimal information about the scan. DICOM headers can include any of >4,000 available metadata attributes spanning a variety of topics. NIfTI files contain all slices for an image series, while DICOM files capture single slices and image series are typically organized into a directory. Each DICOM file contains metadata for the image series as well as the individual image slice. The programming environment R is popular for data analysis due to its free and open code, active ecosystem of tools and users, and excellent system of contributed packages. Currently, many published radiological image analyses are performed with proprietary software or custom unpublished scripts. However, R is increasing in popularity in this area due to several packages for processing and analysis of image files. While these R packages handle image import and processing, no existing package makes image metadata conveniently accessible. Extracting image metadata, combining across slices, and converting to useful formats can be prohibitively cumbersome, especially for DICOM files. We present radtools, an R package for convenient extraction of medical image metadata. Radtools provides simple functions to explore and return metadata in familiar R data structures. For convenience, radtools also includes wrappers of existing tools for extraction of pixel data and viewing of image slices. The package is freely available under the MIT license at GitHub and is easily installable from the Comprehensive R Archive Network.
APA, Harvard, Vancouver, ISO, and other styles
46

Ribeiro, Cristina, João Rocha da Silva, João Aguiar Castro, Ricardo Carvalho Amorim, João Correia Lopes, and Gabriel David. "Research Data Management Tools and Workflows: Experimental Work at the University of Porto." IASSIST Quarterly 42, no. 2 (July 18, 2018): 1–16. http://dx.doi.org/10.29173/iq925.

Full text
Abstract:
Research datasets include all kinds of objects, from web pages to sensor data, and originate in every domain. Concerns with data generated in large projects and well-funded research areas are centered on their exploration and analysis. For data in the long tail, the main issues are still how to get data visible, satisfactorily described, preserved, and searchable. Our work aims to promote data publication in research institutions, considering that researchers are the core stakeholders and need straightforward workflows, and that multi-disciplinary tools can be designed and adapted to specific areas with a reasonable effort. For small groups with interesting datasets but not much time or funding for data curation, we have to focus on engaging researchers in the process of preparing data for publication, while providing them with measurable outputs. In larger groups, solutions have to be customized to satisfy the requirements of more specific research contexts. We describe our experience at the University of Porto in two lines of enquiry. For the work with long-tail groups we propose general-purpose tools for data description and the interface to multi-disciplinary data repositories. For areas with larger projects and more specific requirements, namely wind infrastructure, sensor data from concrete structures and marine data, we define specialized workflows. In both cases, we present a preliminary evaluation of results and an estimate of the kind of effort required to keep the proposed infrastructures running. The tools available to researchers can be decisive for their commitment. We focus on data preparation, namely on dataset organization and metadata creation. For groups in the long tail, we propose Dendro, an open-source research data management platform, and explore automatic metadata creation with LabTablet, an electronic laboratory notebook. For groups demanding a domain-specific approach, our analysis has resulted in the development of models and applications to organize the data and support some of their use cases. Overall, we have adopted ontologies for metadata modeling, keeping in sight metadata dissemination as Linked Open Data.
APA, Harvard, Vancouver, ISO, and other styles
47

Kalantari, Mohsen, Syahrudin Syahrudin, Abbas Rajabifard, and Hannah Hubbard. "Synchronising Spatial Metadata Records and Interfaces to Improve the Usability of Metadata Systems." ISPRS International Journal of Geo-Information 10, no. 6 (June 7, 2021): 393. http://dx.doi.org/10.3390/ijgi10060393.

Full text
Abstract:
The spatial data infrastructure literature reveals significant gaps in metadata systems concerning their efficiency and effectiveness for end-users, and proposes improvements to make metadata systems more user-friendly, including new metadata elements and user interfaces that work in concert with each other. In this paper, we implement the proposed improvements in a prototype system and engage with end-users to assess whether the proposals meet users’ expectations. The prototype is evaluated through think-aloud protocol (TAP) usability testing and semi-structured interviews with end-users. The results demonstrate an increased level of satisfaction relative to existing systems, along with further areas for improvement. We conclude that a synchronised development approach for spatial metadata and the user interface will increase the usability of the metadata for data discovery and selection.
APA, Harvard, Vancouver, ISO, and other styles
48

Raybould, Matthew I. J., Claire Marks, Alan P. Lewis, Jiye Shi, Alexander Bujotzek, Bruck Taddese, and Charlotte M. Deane. "Thera-SAbDab: the Therapeutic Structural Antibody Database." Nucleic Acids Research 48, no. D1 (September 26, 2019): D383—D388. http://dx.doi.org/10.1093/nar/gkz827.

Full text
Abstract:
The Therapeutic Structural Antibody Database (Thera-SAbDab; http://opig.stats.ox.ac.uk/webapps/therasabdab) tracks all antibody- and nanobody-related therapeutics recognized by the World Health Organisation (WHO), and identifies any corresponding structures in the Structural Antibody Database (SAbDab) with near-exact or exact variable domain sequence matches. Thera-SAbDab is synchronized with SAbDab to update weekly, reflecting new Protein Data Bank entries and the availability of new sequence data published by the WHO. Each therapeutic summary page lists structural coverage (with links to the appropriate SAbDab entries), alignments showing where any near-matches deviate in sequence, and accompanying metadata, such as intended target and investigated conditions. Thera-SAbDab can be queried by therapeutic name, by a combination of metadata, or by variable domain sequence, returning all therapeutics that are within a specified sequence identity over a specified region of the query. The sequences of all therapeutics listed in Thera-SAbDab (461 unique molecules, as of 5 August 2019) are downloadable as a single file with accompanying metadata.
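The sequence-identity query described above can be pictured with a toy calculation; the sequences, region, and threshold below are invented, and the real service handles alignment and near-matches far more carefully.

```python
# A toy version of "return therapeutics within X% identity over a
# region"; sequences and threshold are invented examples.

def identity_over_region(query: str, target: str, start: int, end: int) -> float:
    """Percent identity of two equal-length sequences over [start, end)."""
    q, t = query[start:end], target[start:end]
    matches = sum(a == b for a, b in zip(q, t))
    return 100.0 * matches / len(q)

stored = {"mab_A": "EVQLVESGGGLVQPGGSLRLS", "mab_B": "QVQLQESGPGLVKPSETLSLT"}
query = "EVQLVESGGGLVQPGGSLKLS"

for name, seq in stored.items():
    pid = identity_over_region(query, seq, 0, 21)
    if pid >= 90.0:  # user-specified identity threshold
        print(name, f"{pid:.1f}%")
```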
APA, Harvard, Vancouver, ISO, and other styles
49

Wang, Zichen, Alexander Lachmann, and Avi Ma’ayan. "Mining data and metadata from the gene expression omnibus." Biophysical Reviews 11, no. 1 (December 29, 2018): 103–10. http://dx.doi.org/10.1007/s12551-018-0490-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Zaveri, Amrapali, Wei Hu, and Michel Dumontier. "MetaCrowd: Crowdsourcing Biomedical Metadata Quality Assessment." Human Computation 6 (September 4, 2019): 98–112. http://dx.doi.org/10.15346/hc.v6i1.98.

Full text
Abstract:
To reuse the enormous amounts of biomedical data available on the Web, there is an urgent need for good quality metadata. This is extremely important to ensure that data are maximally Findable, Accessible, Interoperable and Reusable. The Gene Expression Omnibus (GEO) allows users to specify metadata in the form of textual key: value pairs (e.g. sex: female). However, since there is no structured vocabulary or format available, the 44,000,000+ key: value pairs suffer from numerous quality issues. Using domain experts for curation is not only time-consuming but also unscalable. Thus, in our approach, MetaCrowd, we apply crowdsourcing as a means for GEO metadata quality assessment. Our results show that crowdsourcing is a reliable and feasible way to identify similar as well as erroneous metadata in GEO. This is extremely useful for data consumers and producers in curating and providing good quality metadata.
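The sort of inconsistency MetaCrowd targets is easy to demonstrate; the sketch below uses a rule-based normalization map (an invented stand-in, where the paper uses crowdsourcing) to collapse variant spellings of GEO-style key: value pairs.

```python
# Collapsing variant spellings of key:value metadata; the alias maps
# are illustrative assumptions, not the MetaCrowd method.
from collections import Counter

raw_pairs = [("Sex", "female"), ("gender", "F"), ("sex", "Female "),
             ("age", "34"), ("Age (years)", "34")]

KEY_ALIASES = {"gender": "sex", "age (years)": "age"}  # assumed map
VALUE_ALIASES = {"f": "female", "m": "male"}

def normalize(key: str, value: str) -> tuple[str, str]:
    k = key.strip().lower()
    k = KEY_ALIASES.get(k, k)
    v = value.strip().lower()
    v = VALUE_ALIASES.get(v, v)
    return k, v

# Five raw pairs collapse to two canonical attributes.
print(Counter(normalize(k, v) for k, v in raw_pairs))
```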
APA, Harvard, Vancouver, ISO, and other styles