Journal articles on the topic "Model annotation"

Consult the top 50 journal articles on the topic "Model annotation".

An "Add to bibliography" button is available next to each work in the bibliography. Use it, and we will automatically create a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a ".pdf" file and read its abstract online whenever the relevant parameters are available in the work's metadata.

Browse journal articles from a wide range of disciplines and compile your bibliography accordingly.

1

Benitez-Garcia, Gibran, Jesus Olivares-Mercado, Gabriel Sanchez-Perez, and Hiroki Takahashi. "IPN HandS: Efficient Annotation Tool and Dataset for Skeleton-Based Hand Gesture Recognition." Applied Sciences 15, no. 11 (2025): 6321. https://doi.org/10.3390/app15116321.

Abstract:
Hand gesture recognition (HGR) heavily relies on high-quality annotated datasets. However, annotating hand landmarks in video sequences is a time-intensive challenge. In this work, we introduce IPN HandS, an enhanced version of our IPN Hand dataset, which now includes approximately 700,000 hand skeleton annotations and corrected gesture boundaries. To generate these annotations efficiently, we propose a novel annotation tool that combines automatic detection, inter-frame interpolation, copy–paste capabilities, and manual refinement. This tool significantly reduces annotation time from 70 min to just 27 min per video, allowing for the scalable and precise annotation of large datasets. We validate the advantages of the IPN HandS dataset by training a lightweight LSTM-based model using these annotations and comparing its performance against models trained with annotations from the widely used MediaPipe hand pose estimators. Our model achieves an accuracy that is 12% higher than the MediaPipe Hands model and 8% higher than the MediaPipe Holistic model. These results underscore the importance of annotation quality in training generalization and overall recognition performance. Both the IPN HandS dataset and the annotation tool will be released to support reproducible research and future work in HGR and related fields.
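
Much of the reported time saving comes from not placing landmarks on every frame by hand. As a rough illustration of the inter-frame interpolation component (the paper's exact scheme is not detailed in the abstract, so this is a hedged sketch with hypothetical names), landmarks annotated on two keyframes can be linearly interpolated to pre-fill the frames between them:

```python
import numpy as np

def interpolate_landmarks(kf_a: np.ndarray, kf_b: np.ndarray,
                          frame_a: int, frame_b: int) -> dict:
    """Linearly interpolate 21 hand landmarks (x, y) between two annotated
    keyframes, so intermediate frames start from an estimated skeleton that
    an annotator only refines instead of creating from scratch."""
    frames = {}
    span = frame_b - frame_a
    for f in range(frame_a + 1, frame_b):
        t = (f - frame_a) / span              # interpolation weight in (0, 1)
        frames[f] = (1 - t) * kf_a + t * kf_b
    return frames

# Two keyframes annotated 10 frames apart (21 landmarks, 2 coordinates each)
kf0, kf10 = np.random.rand(21, 2), np.random.rand(21, 2)
estimates = interpolate_landmarks(kf0, kf10, 0, 10)
print(len(estimates), "frames pre-filled for manual refinement")  # 9
```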
2

Liu, Zheng. "LDA-Based Automatic Image Annotation Model." Advanced Materials Research 108-111 (May 2010): 88–94. http://dx.doi.org/10.4028/www.scientific.net/amr.108-111.88.

Abstract:
This paper presents LDA-based automatic image annotation through visual topic learning and related annotation extension. We introduce the Latent Dirichlet Allocation (LDA) model into the visual domain. First, the visual topic most relevant to the unlabeled image is obtained. Based on this visual topic, the highest-likelihood annotations serve as seed annotations. Next, the seed annotations are extended by analyzing the relationship between them and related Flickr tags. Finally, we combine seed and extended annotations to construct the final annotation set. Experiments conducted on the Corel5K dataset demonstrate the effectiveness of the proposed model.
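
A minimal sketch of the two-step idea, assuming scikit-learn's LatentDirichletAllocation over hypothetical bag-of-visual-words counts; the tag-likelihood table is simulated here, whereas the paper would estimate it from labeled data:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# Hypothetical bag-of-visual-words counts: 200 training images x 500 words
X_train = rng.poisson(1.0, size=(200, 500))
lda = LatentDirichletAllocation(n_components=10, random_state=0).fit(X_train)

# Step 1: find the visual topic most relevant to an unlabeled image
x_new = rng.poisson(1.0, size=(1, 500))
topic = lda.transform(x_new)[0].argmax()

# Step 2: the highest-likelihood tags under that topic become seed
# annotations (this table would be estimated from the labeled training set)
tags = ["sky", "water", "tree", "building", "person"]
tag_given_topic = rng.dirichlet(np.ones(len(tags)), size=10)
seeds = [tags[i] for i in np.argsort(tag_given_topic[topic])[::-1][:2]]
print("visual topic:", topic, "seed annotations:", seeds)
```

The extension step would then add Flickr tags that co-occur strongly with the seeds before assembling the final annotation set.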
3

Okpala, Ebuka, and Long Cheng. "Large Language Model Annotation Bias in Hate Speech Detection." Proceedings of the International AAAI Conference on Web and Social Media 19 (June 7, 2025): 1389–418. https://doi.org/10.1609/icwsm.v19i1.35879.

Abstract:
Large language models (LLMs) are fast becoming ubiquitous and have shown impressive performance in various natural language processing (NLP) tasks. Annotating data for downstream applications is a resource-intensive task in NLP. Recently, the use of LLMs as a cost-effective data annotator for annotating data used to train other models or as an assistive tool has been explored. Yet, little is known regarding the societal implications of using LLMs for data annotation. In this work, focusing on hate speech detection, we investigate how using LLMs such as GPT-4 and Llama-3 for hate speech detection can lead to different performances for different text dialects and racial bias in online hate detection classifiers. We used LLMs to predict hate speech in seven hate speech datasets and trained classifiers on the LLM annotations of each dataset. Using tweets written in African-American English (AAE) and Standard American English (SAE), we show that classifiers trained on LLM annotations assign tweets written in AAE to negative classes (e.g., hate, offensive, abuse, racism, etc.) at a higher rate than tweets written in SAE and that the classifiers have a higher false positive rate towards AAE tweets. We explore the effect of incorporating dialect priming in the prompting techniques used in prediction, showing that introducing dialect increases the rate at which AAE tweets are assigned to negative classes.
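
The bias finding rests on comparing false positive rates across dialect groups; a small sketch of that measurement with hypothetical labels (0 = not hate, 1 = hate):

```python
import numpy as np

def fpr(y_true, y_pred):
    """False positive rate: the share of truly non-hateful tweets flagged."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    negatives = y_true == 0
    return (y_pred[negatives] == 1).mean()

# Hypothetical classifier outputs for tweets grouped by dialect
aae_true, aae_pred = [0, 0, 0, 1, 0], [1, 0, 1, 1, 0]
sae_true, sae_pred = [0, 0, 0, 1, 0], [0, 0, 1, 1, 0]
print("AAE FPR:", fpr(aae_true, aae_pred))  # 0.5
print("SAE FPR:", fpr(sae_true, sae_pred))  # 0.25
```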
4

Paun, Silviu, Bob Carpenter, Jon Chamberlain, Dirk Hovy, Udo Kruschwitz, and Massimo Poesio. "Comparing Bayesian Models of Annotation." Transactions of the Association for Computational Linguistics 6 (December 2018): 571–85. http://dx.doi.org/10.1162/tacl_a_00040.

Abstract:
The analysis of crowdsourced annotations in natural language processing is concerned with identifying (1) gold-standard labels, (2) annotator accuracies and biases, and (3) item difficulties and error patterns. Traditionally, majority voting was used for (1), and coefficients of agreement for (2) and (3). Lately, model-based analysis of corpus annotations has proven better at all three tasks. But there has been relatively little work comparing the models on the same datasets. This paper aims to fill this gap by analyzing six models of annotation, covering different approaches to annotator ability, item difficulty, and parameter pooling (tying) across annotators and items. We evaluate these models along four aspects: comparison to gold labels, predictive accuracy for new annotations, annotator characterization, and item difficulty, using four datasets with varying degrees of noise in the form of random (spammy) annotators. We conclude with guidelines for model selection, application, and implementation.
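
For orientation, the best-known member of this model family follows Dawid and Skene: each annotator gets a latent confusion matrix, and EM alternates between estimating item classes and annotator reliabilities. A compact sketch, where the initialization and smoothing choices are ours:

```python
import numpy as np

def dawid_skene(labels, n_classes, iters=50):
    """Minimal Dawid-Skene EM. labels[i, j] is annotator j's label for
    item i (-1 = missing). Returns a posterior over true classes per item."""
    n_items, n_annot = labels.shape
    # initialize item posteriors with soft majority voting
    post = np.zeros((n_items, n_classes))
    for i in range(n_items):
        for j in range(n_annot):
            if labels[i, j] >= 0:
                post[i, labels[i, j]] += 1
    post /= post.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # M-step: class prior and per-annotator confusion matrices
        prior = post.mean(axis=0)
        conf = np.full((n_annot, n_classes, n_classes), 1e-6)
        for j in range(n_annot):
            for i in range(n_items):
                if labels[i, j] >= 0:
                    conf[j, :, labels[i, j]] += post[i]
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: recompute item posteriors from prior and confusions
        log_post = np.log(prior) + np.zeros((n_items, n_classes))
        for i in range(n_items):
            for j in range(n_annot):
                if labels[i, j] >= 0:
                    log_post[i] += np.log(conf[j, :, labels[i, j]])
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post

# Three items, three annotators; annotator 2 disagrees on item 0
labels = np.array([[0, 0, 1],
                   [1, 1, 1],
                   [0, 1, 0]])
print(dawid_skene(labels, n_classes=2).argmax(axis=1))  # [0 1 0]
```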
5

Misirli, Goksel, Matteo Cavaliere, William Waites, et al. "Annotation of rule-based models with formal semantics to enable creation, analysis, reuse and visualization." Bioinformatics 32, no. 6 (2015): 908–17. http://dx.doi.org/10.1093/bioinformatics/btv660.

Abstract:
Motivation: Biological systems are complex and challenging to model and therefore model reuse is highly desirable. To promote model reuse, models should include both information about the specifics of simulations and the underlying biology in the form of metadata. The availability of computationally tractable metadata is especially important for the effective automated interpretation and processing of models. Metadata are typically represented as machine-readable annotations which enhance programmatic access to information about models. Rule-based languages have emerged as a modelling framework to represent the complexity of biological systems. Annotation approaches have been widely used for reaction-based formalisms such as SBML. However, rule-based languages still lack a rich annotation framework to add semantic information, such as machine-readable descriptions, to the components of a model. Results: We present an annotation framework and guidelines for annotating rule-based models, encoded in the commonly used Kappa and BioNetGen languages. We adapt widely adopted annotation approaches to rule-based models. We initially propose a syntax to store machine-readable annotations and describe a mapping between rule-based modelling entities, such as agents and rules, and their annotations. We then describe an ontology to both annotate these models and capture the information contained therein, and demonstrate annotating these models using examples. Finally, we present a proof-of-concept tool for extracting annotations from a model that can be queried and analyzed in a uniform way. The uniform representation of the annotations can be used to facilitate the creation, analysis, reuse and visualization of rule-based models. Although the examples are given using specific implementations, the proposed techniques can be applied to rule-based models in general. Availability and implementation: The annotation ontology for rule-based models can be found at http://purl.org/rbm/rbmo. The krdf tool and associated executable examples are available at http://purl.org/rbm/rbmo/krdf. Contact: anil.wipat@newcastle.ac.uk or vdanos@inf.ed.ac.uk
6

Li, Huadong, Ying Wei, Han Peng, and Wei Zhang. "DiffuPrompter: Pixel-Level Automatic Annotation for High-Resolution Remote Sensing Images with Foundation Models." Remote Sensing 16, no. 11 (2024): 2004. http://dx.doi.org/10.3390/rs16112004.

Abstract:
Instance segmentation is pivotal in remote sensing image (RSI) analysis, aiding in many downstream tasks. However, annotating images with pixel-wise annotations is time-consuming and laborious. Despite some progress in automatic annotation, the performance of existing methods still needs improvement due to the high precision requirements for pixel-level annotation and the complexity of RSIs. With the support of large-scale data, some foundational models have made significant progress in semantic understanding and generalization capabilities. In this paper, we delve deep into the potential of the foundational models in automatic annotation and propose a training-free automatic annotation method called DiffuPrompter, achieving pixel-level automatic annotation of RSIs. Extensive experimental results indicate that the proposed method can provide reliable pseudo-labels, significantly reducing the annotation costs of the segmentation task. Additionally, the cross-domain validation experiments confirm the powerful effectiveness of large-scale pseudo-data in improving model generalization performance.
7

Chu, Zhendong, Jing Ma, and Hongning Wang. "Learning from Crowds by Modeling Common Confusions." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 7 (2021): 5832–40. http://dx.doi.org/10.1609/aaai.v35i7.16730.

Abstract:
Crowdsourcing provides a practical way to obtain large amounts of labeled data at a low cost. However, the annotation quality of annotators varies considerably, which imposes new challenges in learning a high-quality model from the crowdsourced annotations. In this work, we provide a new perspective to decompose annotation noise into common noise and individual noise and differentiate the source of confusion based on instance difficulty and annotator expertise on a per-instance-annotator basis. We realize this new crowdsourcing model by an end-to-end learning solution with two types of noise adaptation layers: one is shared across annotators to capture their commonly shared confusions, and the other pertains to each annotator to capture individual confusion. To recognize the source of noise in each annotation, we use an auxiliary network to choose between the two noise adaptation layers with respect to both instances and annotators. Extensive experiments on both synthesized and real-world benchmarks demonstrate the effectiveness of our proposed common noise adaptation solution.
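
A hedged PyTorch sketch of the idea: a confusion layer shared across annotators, a per-annotator confusion layer, and an auxiliary gate that mixes them per instance-annotator pair. Layer sizes and initialization are our assumptions rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn

class NoiseAdaptedCrowdModel(nn.Module):
    def __init__(self, n_features, n_classes, n_annotators):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                      nn.Linear(64, n_classes))
        eye = torch.eye(n_classes)      # confusion layers start near identity
        self.common = nn.Parameter(eye.clone())
        self.individual = nn.Parameter(eye.repeat(n_annotators, 1, 1))
        self.gate = nn.Linear(n_features + n_annotators, 1)

    def forward(self, x, annotator_ids):
        clean = self.backbone(x).softmax(-1)             # P(true class | x)
        common_t = self.common.softmax(-1)               # shared confusion
        indiv_t = self.individual[annotator_ids].softmax(-1)  # per annotator
        onehot = nn.functional.one_hot(annotator_ids,
                                       self.individual.shape[0]).float()
        w = torch.sigmoid(self.gate(torch.cat([x, onehot], -1)))  # noise source
        noisy = w * (clean.unsqueeze(1) @ common_t).squeeze(1) + \
                (1 - w) * (clean.unsqueeze(1) @ indiv_t).squeeze(1)
        return noisy  # trained with NLL against the observed crowd labels

model = NoiseAdaptedCrowdModel(n_features=16, n_classes=3, n_annotators=5)
probs = model(torch.randn(4, 16), torch.tensor([0, 2, 2, 4]))
```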
8

Rotman, Guy, and Roi Reichart. "Multi-task Active Learning for Pre-trained Transformer-based Models." Transactions of the Association for Computational Linguistics 10 (2022): 1209–28. http://dx.doi.org/10.1162/tacl_a_00515.

Abstract:
Multi-task learning, in which several tasks are jointly learned by a single model, allows NLP models to share information from multiple annotations and may facilitate better predictions when the tasks are inter-related. This technique, however, requires annotating the same text with multiple annotation schemes, which may be costly and laborious. Active learning (AL) has been demonstrated to optimize annotation processes by iteratively selecting unlabeled examples whose annotation is most valuable for the NLP model. Yet, multi-task active learning (MT-AL) has not been applied to state-of-the-art pre-trained Transformer-based NLP models. This paper aims to close this gap. We explore various multi-task selection criteria in three realistic multi-task scenarios, reflecting different relations between the participating tasks, and demonstrate the effectiveness of multi-task compared to single-task selection. Our results suggest that MT-AL can be effectively used in order to minimize annotation efforts for multi-task NLP models.
9

Wen-Yi, Andrea W., Kathryn Adamson, Nathalie Greenfield, et al. "Automate or Assist? The Role of Computational Models in Identifying Gendered Discourse in US Capital Trial Transcripts." Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 7 (October 16, 2024): 1556–66. http://dx.doi.org/10.1609/aies.v7i1.31746.

Abstract:
The language used by US courtroom actors in criminal trials has long been studied for biases. However, systematic studies for bias in high-stakes court trials have been difficult, due to the nuanced nature of bias and the legal expertise required. Large language models offer the possibility to automate annotation. But validating the computational approach requires both an understanding of how automated methods fit in existing annotation workflows and what they really offer. We present a case study of adding a computational model to a complex and high-stakes problem: identifying gender-biased language in US capital trials for women defendants. Our team of experienced death-penalty lawyers and NLP technologists pursue a three-phase study: first annotating manually, then training and evaluating computational models, and finally comparing expert annotations to model predictions. Unlike many typical NLP tasks, annotating for gender bias in months-long capital trials is complicated, with many individual judgment calls. Contrary to standard arguments for automation that are based on efficiency and scalability, legal experts find the computational models most useful in providing opportunities to reflect on their own bias in annotation and to build consensus on annotation rules. This experience suggests that seeking to replace experts with computational models for complex annotation is both unrealistic and undesirable. Rather, computational models offer valuable opportunities to assist the legal experts in annotation-based studies.
10

Luo, Yan, Tianxiu Lu, Weihan Zhang, Suiqun Li, and Xuefeng Wang. "Augmenting Three-Dimensional Model Annotation System with Enhanced Reality." Journal of Computing and Electronic Information Management 12, no. 2 (2024): 1–7. http://dx.doi.org/10.54097/uv15ws76.

Abstract:
This study proposes an augmented reality-based three-dimensional model annotation system, integrating cloud anchors, three-dimensional reconstruction, and augmented reality technology to achieve explicit three-dimensional annotations on models. Using an improved ORB algorithm, the system persistently anchors the annotated model in three-dimensional space through cloud anchors, presenting accurate spatial information and showing the depth of scenes and the relationships between elements. The system supports multiple data types for annotations, such as text and images. Through a comparison with traditional two-dimensional annotation in a drone experiment, the system demonstrates higher experimental efficiency, providing more intuitive annotation guidance and enhancing remote-guidance efficiency and user understanding of drones.
11

Filali, Jalila, Hajer Baazaoui Zghal, and Jean Martinet. "Ontology-Based Image Classification and Annotation." International Journal of Pattern Recognition and Artificial Intelligence 34, no. 11 (2020): 2040002. http://dx.doi.org/10.1142/s0218001420400029.

Abstract:
With the rapid growth of image collections, image classification and annotation have been active areas of research with notable recent progress. The Bag-of-Visual-Words (BoVW) model, which relies on building a visual vocabulary, has been widely used in this area. Recently, attention has shifted to the use of advanced architectures characterized by multi-level processing, and the Hierarchical Max-Pooling (HMAX) model has attracted a great deal of attention in image classification. To improve image classification and annotation, several approaches based on ontologies have been proposed. However, image classification and annotation remain challenging due to many related issues, such as ambiguity between classes, which can affect the quality of both classification and annotation results. In this paper, we propose an ontology-based image classification and annotation approach. Our contributions are the following: (1) exploiting ontological relationships between classes during both the image classification and annotation processes; (2) combining the outputs of hypernym–hyponym classifiers to achieve better discrimination between classes; and (3) annotating images by combining hypernym and hyponym classification results in order to improve image annotation and to reduce ambiguous and inconsistent annotations. Several strategies have been tested experimentally, and the results show that our proposal improves image classification and annotation.
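
The second contribution can be shown with a toy example: a hyponym's classifier score is reweighted by its hypernym's score, suppressing predictions that are inconsistent with the ontology (the class names and scores below are invented for illustration):

```python
# Toy class hierarchy: hypernym -> hyponyms (ontological relations)
hierarchy = {"animal": ["dog", "cat"], "vehicle": ["car", "bus"]}

# Hypothetical classifier outputs for one image
hypernym_scores = {"animal": 0.8, "vehicle": 0.2}
hyponym_scores = {"dog": 0.5, "cat": 0.3, "car": 0.15, "bus": 0.05}

# A hyponym's final score is weighted by its hypernym's score, so a 'car'
# label is suppressed when hypernym-level evidence points to 'animal'.
combined = {hypo: hyponym_scores[hypo] * hypernym_scores[hyper]
            for hyper, hypos in hierarchy.items() for hypo in hypos}
print(combined, "->", max(combined, key=combined.get))
```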
12

Wu, Xian, Wei Fan, and Yong Yu. "Sembler: Ensembling Crowd Sequential Labeling for Improved Quality." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 1 (2021): 1713–19. http://dx.doi.org/10.1609/aaai.v26i1.8351.

Abstract:
Many natural language processing tasks, such as named entity recognition (NER), part-of-speech (POS) tagging, and word segmentation, can be formulated as sequential data labeling problems. Building a sound labeler requires a very large number of correctly labeled training examples, which may not always be possible. On the other hand, crowdsourcing provides an inexpensive yet efficient alternative for collecting manual sequential labels from non-experts. However, the quality of crowd labeling cannot be guaranteed, and three kinds of errors are typical: (1) incorrect annotations due to lack of expertise (e.g., labeling gene names in plain text requires corresponding domain knowledge); (2) ignored or omitted annotations due to carelessness or low confidence; (3) noisy annotations due to cheating or vandalism. To correct these mistakes, we present Sembler, a statistical model for ensembling crowd sequential labelings. Sembler considers three types of statistical information: (1) majority agreement, which supports the correctness of an annotation; (2) correct annotations, which improve the credibility of the corresponding annotator; (3) correct annotations, which enhance the correctness of other annotations that share similar linguistic or contextual features. We evaluate the proposed model on a real Twitter dataset and a synthetic biological dataset, and find that Sembler is particularly accurate when more than half of the annotators make mistakes.
13

VanBerlo, Bennett, Delaney Smith, Jared Tschirhart, et al. "Enhancing Annotation Efficiency with Machine Learning: Automated Partitioning of a Lung Ultrasound Dataset by View." Diagnostics 12, no. 10 (2022): 2351. http://dx.doi.org/10.3390/diagnostics12102351.

Abstract:
Background: Annotating large medical imaging datasets is an arduous and expensive task, especially when the datasets in question are not organized according to deep learning goals. Here, we propose a method that exploits the hierarchical organization of annotating tasks to optimize efficiency. Methods: We trained a machine learning model to accurately distinguish between one of two classes of lung ultrasound (LUS) views using 2908 clips from a larger dataset. Partitioning the remaining dataset by view would reduce downstream labelling efforts by enabling annotators to focus on annotating pathological features specific to each view. Results: In a sample view-specific annotation task, we found that automatically partitioning a 780-clip dataset by view saved 42 min of manual annotation time and resulted in 55±6 additional relevant labels per hour. Conclusions: Automatic partitioning of a LUS dataset by view significantly increases annotator efficiency, resulting in higher throughput relevant to the annotating task at hand. The strategy described in this work can be applied to other hierarchical annotation schemes.
14

Pozharkova, I. N. "Context-Dependent Annotation Method in Emergency Monitoring Information Systems." Informacionnye Tehnologii 28, no. 1 (2022): 43–47. http://dx.doi.org/10.17587/it.28.43-47.

Abstract:
The article presents a method of context-dependent annotation used to solve emergency-monitoring problems in information systems. The method is based on a spectral language model that supports various information-search problems while taking into account the specific features of the application area. A functional model of the emergency-monitoring task in IDEF0 notation is presented. The task of context-dependent annotation of operational summaries, as a basis for generating preliminary reports, is formulated. The main problems that arise when solving this task on large volumes of initial data, and that can be critical during fast-developing emergencies, are identified. The problem of context-dependent annotation under existing restrictions is stated, and the main language units used in solving it are described. A flowchart for solving the context-dependent annotation problem, taking into account the specifics of the subject area, is presented, and the implementation of each stage of the algorithm is described in detail. A method for determining the relevance of a text fragment to a target query, based on the spectral language model, is described, along with basic measures of annotation quality. A comparative analysis of the quality and speed of constructing annotations manually and with the presented method, using assessor estimates, was carried out. The method is shown to be effective for processing large numbers of documents under fast-developing emergency situations that require urgent decision-making.
15

Bauer, Matthias, and Angelika Zirker. "Explanatory Annotation of Literary Texts and the Reader: Seven Types of Problems." International Journal of Humanities and Arts Computing 11, no. 2 (2017): 212–32. http://dx.doi.org/10.3366/ijhac.2017.0193.

Abstract:
While most literary scholars wish to help readers understand literary texts by providing them with explanatory annotations, we want to go a step further and enable them, on the basis of structured information, to arrive at interpretations of their own. We therefore seek to establish a concept of explanatory annotation that is reader-oriented and combines hermeneutics with the opportunities provided by digital methods. In a first step, we are going to present a few examples of existing annotations that apparently do not take into account readerly needs. To us, they represent seven types of common problems in explanatory annotation. We then introduce a possible model of best practice which is based on categories and structured along the lines of the following questions: What kind(s) of annotations do improve text comprehension? Which contexts must be considered when annotating? Is it possible to develop a concept of the reader on the basis of annotations—and can, in turn, annotations address a particular kind of readership, i.e.: in how far can annotations be(come) individualised?
16

Wood, Valerie, Seth Carbon, Midori A. Harris, et al. "Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns." Open Biology 10, no. 9 (2020): 200149. http://dx.doi.org/10.1098/rsob.200149.

Abstract:
Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes probably reflects errors in literature curation, ontology structure or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g. amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa.
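
At its core, such a workflow checks curated pairs of mutually exclusive processes against each gene product's set of annotations. A toy sketch, with an illustrative rule and annotations rather than the paper's data:

```python
# One curated QC rule: these two processes should not share gene products
exclusive_pairs = [("amino acid metabolism", "cytokinesis")]

annotations = {
    "geneA": {"amino acid metabolism", "cytokinesis"},  # suspicious
    "geneB": {"amino acid metabolism"},
}

for gene, terms in annotations.items():
    for t1, t2 in exclusive_pairs:
        if t1 in terms and t2 in terms:
            print(f"QC flag: {gene} co-annotated to '{t1}' and '{t2}'")
```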
17

Hayat, Hassan, Carles Ventura, and Agata Lapedriza. "Modeling Subjective Affect Annotations with Multi-Task Learning." Sensors 22, no. 14 (2022): 5245. http://dx.doi.org/10.3390/s22145245.

Abstract:
In supervised learning, the generalization capabilities of trained models are based on the available annotations. Usually, multiple annotators are asked to annotate the dataset samples, and the common practice is then to aggregate the different annotations by computing average scores or majority voting, and to train and test models on these aggregated annotations. However, this practice is not suitable for all types of problems, especially when the subjective information of each annotator matters for the task being modeled. For example, emotions experienced while watching a video or evoked by other sources of content, such as news headlines, are subjective: different individuals might perceive or experience different emotions. Aggregated annotations in emotion modeling may lose this subjective information and actually represent an annotation bias. In this paper, we highlight the weaknesses of models trained on aggregated annotations for affect-related modeling tasks. More concretely, we compare two generic deep learning architectures: a Single-Task (ST) architecture and a Multi-Task (MT) architecture. While the ST architecture models a single emotional perception at a time, the MT architecture jointly models each individual annotation and the aggregated annotations at once. Our results show that the MT approach can more accurately model every single annotation and the aggregated annotations when compared to methods trained directly on the aggregated annotations. Furthermore, the MT approach achieves state-of-the-art results on the COGNIMUSE, IEMOCAP, and SemEval_2007 benchmarks.
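
A minimal sketch of the MT idea: a shared encoder with one regression head per annotator plus a head for the aggregated score, trained jointly so individual perceptions are modeled rather than averaged away. Sizes and layer choices are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MultiAnnotatorAffectModel(nn.Module):
    def __init__(self, n_features=128, n_annotators=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.annotator_heads = nn.ModuleList(
            nn.Linear(64, 1) for _ in range(n_annotators))
        self.aggregate_head = nn.Linear(64, 1)

    def forward(self, x):
        h = self.encoder(x)
        per_annotator = [head(h) for head in self.annotator_heads]
        return per_annotator, self.aggregate_head(h)

model = MultiAnnotatorAffectModel()
scores, aggregate = model(torch.randn(2, 128))
# the joint loss would sum the per-annotator losses and the aggregate loss
```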
18

Rao, Xun, Jiasheng Wang, Wenjing Ran, Mengzhu Sun, and Zhe Zhao. "Deep-Learning-Based Annotation Extraction Method for Chinese Scanned Maps." ISPRS International Journal of Geo-Information 12, no. 10 (2023): 422. http://dx.doi.org/10.3390/ijgi12100422.

Abstract:
One of a map's fundamental elements is its annotations, and extracting these annotations is an important step in enabling machine intelligence to understand scanned map data. Due to the complexity of the characters and lines, extracting annotations from scanned Chinese maps is difficult, and there is currently little research in this area. A deep-learning-based framework for extracting annotations from scanned Chinese maps is presented in this paper. An improved EAST annotation detection model and a transfer-learning-based CRNN annotation recognition model make up its two primary parts. Several sets of comparative tests for annotation detection and recognition were created in order to assess the efficacy of this method for extracting annotations from scanned Chinese maps. The experimental findings show the following: (i) The proposed annotation detection approach achieved precision, recall, and h-mean values of 0.8990, 0.8389, and 0.8635, respectively, improvements over currently popular models of −0.0354 to 0.0907, 0.0131 to 0.2735, and 0.0467 to 0.1919, respectively. (ii) The proposed annotation recognition method achieved precision, recall, and h-mean values of 0.9320, 0.8956, and 0.9134, respectively, improvements over currently popular models of 0.0294 to 0.1049, 0.0498 to 0.1975, and 0.0402 to 0.1582, respectively.
19

Ren, Jiaxin, Wanzeng Liu, Jun Chen, et al. "HI-CMAIM: Hybrid Intelligence-Based Multi-Source Unstructured Chinese Map Annotation Interpretation Model." Remote Sensing 17, no. 2 (2025): 204. https://doi.org/10.3390/rs17020204.

Abstract:
Map annotation interpretation is crucial for geographic information extraction and intelligent map analysis. This study addresses the challenges associated with interpreting Chinese map annotations, specifically visual complexity and data scarcity issues, by proposing a hybrid intelligence-based multi-source unstructured Chinese map annotation interpretation method (HI-CMAIM). Firstly, leveraging expert knowledge in an innovative way, we constructed a high-quality expert knowledge-based map annotation dataset (EKMAD), which significantly enhanced data diversity and accuracy. Furthermore, an improved annotation detection model (CMA-DB) and an improved annotation recognition model (CMA-CRNN) were designed based on the characteristics of map annotations, both incorporating expert knowledge. A two-stage transfer learning strategy was employed to tackle the issue of limited training samples. Experimental results demonstrated the superiority of HI-CMAIM over existing algorithms. In the detection task, CMA-DB achieved an 8.54% improvement in Hmean (from 87.73% to 96.27%) compared to the DB algorithm. In the recognition task, CMA-CRNN achieved a 15.54% improvement in accuracy (from 79.77% to 95.31%) and a 4-fold reduction in NED (from 0.1026 to 0.0242), confirming the effectiveness and advancement of the proposed method. This research not only provides a novel approach and data support for Chinese map annotation interpretation but also fills the gap of high-quality, diverse datasets. It holds practical application value in fields such as geographic information systems and cartography, significantly contributing to the advancement of intelligent map interpretation.
20

Attik, Mohammed, Malik Missen, Mickaël Coustaty, et al. "OpinionML—Opinion Markup Language for Sentiment Representation." Symmetry 11, no. 4 (2019): 545. http://dx.doi.org/10.3390/sym11040545.

Abstract:
It is the age of the social web, where people express themselves by giving their opinions about various issues, from their personal lives to the world's political issues. This process generates a great deal of opinion data on the web that can be processed for valuable information, and semantic annotation of opinions therefore becomes an important task. Unfortunately, existing opinion annotation schemes have failed to satisfy annotation challenges and cannot even adhere to the basic definition of opinion. Opinion holders, topical features, and temporal expressions are major components of an opinion that remain ignored in existing annotation schemes. In this work, we propose OpinionML, a new markup language that aims to compensate for the issues that existing opinion markup languages fail to resolve. We present a detailed discussion of existing annotation schemes and their associated problems. We argue that OpinionML is more robust, flexible, and easier to use for annotating opinion data. Its modular approach to implementing a logical model provides a flexible and simpler model of annotation. OpinionML can be considered a step towards "information symmetry" and an effort towards consistent sentiment annotations across the research community. We perform experiments to demonstrate the robustness of the proposed OpinionML, and the results demonstrate its capability of retrieving significant components of opinion segments. We also propose an OpinionML ontology in an effort to make OpinionML more interoperable. The proposed ontology is more complete than existing opinion ontologies such as Marl and Onyx, as a comprehensive comparison with both demonstrates.
21

Li, Wei, Haiyu Song, Hongda Zhang, Houjie Li, and Pengjie Wang. "The Image Annotation Refinement in Embedding Feature Space based on Mutual Information." International Journal of Circuits, Systems and Signal Processing 16 (January 10, 2022): 191–201. http://dx.doi.org/10.46300/9106.2022.16.23.

Abstract:
The ever-increasing number of images has made automatic image annotation one of the most important tasks in machine learning and computer vision. Despite continuous efforts to invent new annotation algorithms and models, the results of state-of-the-art image annotation methods are often unsatisfactory. In this paper, to further improve annotation refinement performance, we propose a novel approach based on weighted mutual information to automatically refine the original annotations of images. Unlike traditional refinement models that use only visual features, the proposed model uses semantic embedding to properly map labels and visual features into a meaningful semantic space. To accurately measure the relevance between a particular image and its original annotations, the proposed model utilizes all available information, including image-to-image, label-to-label, and image-to-label relations. Experimental results on three typical datasets show not only the validity of the refinement but also the superiority of the proposed algorithm over existing ones. The improvement largely benefits from the proposed mutual-information method and from utilizing all available information.
22

Cooling, Michael T., and Peter Hunter. "The CellML Metadata Framework 2.0 Specification." Journal of Integrative Bioinformatics 12, no. 2 (2015): 86–103. http://dx.doi.org/10.1515/jib-2015-260.

Abstract:
The CellML Metadata Framework 2.0 is a modular framework that describes how semantic annotations should be made about mathematical models encoded in the CellML (www.cellml.org) format and their elements. In addition to the Core specification, there are several satellite specifications, each designed to cater for model annotation in a different context. Basic Model Information, Citation, License and Biological Annotation specifications are presented.
23

Öhman, Emily, and Kaisla Kajava. "Sentimentator." Digital Humanities in the Nordic and Baltic Countries Publications 1, no. 1 (2018): 98–110. http://dx.doi.org/10.5617/dhnbpub.11013.

Abstract:
We introduce Sentimentator, a publicly available gamified web-based annotation platform for fine-grained sentiment annotation at the sentence level. Sentimentator is unique in that it moves beyond binary classification. We use a ten-dimensional model which allows for the annotation of 51 unique sentiments and emotions. The platform is gamified with a complex scoring system designed to reward users for high-quality annotations. Sentimentator introduces several unique features that have previously been unavailable, or at best very limited, for sentiment annotation. In particular, it provides streamlined multi-dimensional annotation optimized for sentence-level annotation of movie subtitles. Because the platform is publicly available, it will benefit anyone interested in fine-grained sentiment analysis and emotion detection, as well as in the annotation of other datasets.
24

Liu, Guojun, Yan Shi, Hongxu Huang, et al. "FPCAM: A Weighted Dictionary-Driven Model for Single-Cell Annotation in Pulmonary Fibrosis." Biology 14, no. 5 (2025): 479. https://doi.org/10.3390/biology14050479.

Abstract:
The groundbreaking development of scRNA-seq has significantly improved cellular resolution. However, accurate cell-type annotation remains a major challenge. Existing annotation tools are often limited by their reliance on reference datasets, the heterogeneity of marker genes, and subjective biases introduced through manual intervention, all of which impact annotation accuracy and reliability. To address these limitations, we developed FPCAM, a fully automated pulmonary fibrosis cell-type annotation model. Built on the R Shiny platform, FPCAM utilizes a matrix of up-regulated marker genes and a manually curated gene–cell association dictionary specific to pulmonary fibrosis. It achieves accurate and efficient cell-type annotation through similarity matrix construction and optimized matching algorithms. To evaluate its performance, we compared FPCAM with state-of-the-art annotation models, including SCSA, SingleR, and SciBet. The results showed that FPCAM and SCSA both achieved an accuracy of 89.7%, outperforming SingleR and SciBet. Furthermore, FPCAM demonstrated high accuracy in annotating the external validation dataset GSE135893, successfully identifying multiple cell subtypes. In summary, FPCAM provides an efficient, flexible, and accurate solution for cell-type identification and serves as a powerful tool for scRNA-seq research in pulmonary fibrosis and other related diseases.
25

Tschöpe, Okka, Lutz Suhrbier, Anton Güntsch, and Walter Berendsohn. "AnnoSys – an online tool for sharing annotations to enhance data quality." Biodiversity Information Science and Standards 1 (August 15, 2017): e20315. https://doi.org/10.3897/tdwgproceedings.1.20315.

Abstract:
AnnoSys is a web-based open-source information system that enables users to correct and enrich specimen data published in data portals, thus enhancing data quality and documenting research developments over time. It brings traditional annotation workflows for specimens to the Internet, as annotations become visible to researchers who subsequently view the annotated specimen. During its first phase, the AnnoSys project developed a fully functional prototype of an annotation data repository for complex and cross-linked XML-standardized data in the ABCD (Access to Biological Collection Data; Berendsohn 2007) and Darwin Core (DwC; Wieczorek et al. 2012) standards, including back-end server functionality, web services, and an online user interface (Tschoepe et al. 2013). Annotation data are stored using the Open Annotation Data Model (Sanderson et al. 2013) and an RDF database (Suhrbier et al. 2017). Public access to the annotations and the corresponding copy of the original record is provided via Linked Data, REST, and SPARQL web services. AnnoSys can easily be integrated into portals providing specimen data (see Suhrbier et al., this session). As a result, the individual specimen page then includes two links: one providing access to existing annotations stored in the AnnoSys repository, the other linking to the AnnoSys annotation editor for annotation input. AnnoSys is now integrated into a dozen specimen portals, including the Global Biodiversity Information Facility (GBIF) and the Global Genome Biodiversity Network (GGBN). In contrast to conventional, site-based annotation systems, annotations regarding a specimen are accessible from all portals providing access to the specimen's data, independent of which portal was originally used as the starting point for the annotation. Apart from that, users can query the data in the AnnoSys portal or create a subscription to be notified about annotations matching criteria referring to the data record. For example, a specialist for a certain family of organisms, working on a flora or fauna of a certain country, may subscribe to that family name and the country, and will be notified by email about any annotations that fulfil these criteria. Other possible subscription and filter criteria include the name of the collector, identifier or annotator, catalogue or accession numbers, and collection name or code. For curators, a special curatorial workflow supports the handling of annotations, for example confirming a correction according to the annotation in the underlying primary database. User feedback on the currently available system has led to a significantly simplified version of the user interface, which is currently undergoing testing and final implementation. Moreover, the current, second project phase aims at extending the generic qualities of AnnoSys to allow processing of additional data formats, including RDF data with machine-readable semantic concepts, thus opening up the data gathered through AnnoSys for the Semantic Web. We developed semantic-concept-driven annotation management, including the specification of a selector concept for RDF data and a repository for original records extended to RDF and other formats. Based on DwC RDF terms and the ABCD ontology, which deconstructs the ABCD XML schema into individually addressable RDF resources, we built an "AnnoSys ontology". The AnnoSys-2 system is currently in the testing phase and will be released in 2018.
In future research (see Suhrbier, this volume), we will examine the use of AnnoSys for taxon-level data as well as its integration with image annotation systems. BGBM Berlin is committed to sustaining AnnoSys beyond the financed project phase.
26

Wang, Tian, Yuanye Ma, Catherine Blake, Masooda Bashir, and Hsin-Yuan Wang. "Taking disagreements into consideration: human annotation variability in privacy policy analysis." Information Research: An International Electronic Journal 30, iConf (2025): 81–92. https://doi.org/10.47989/ir30iconf47581.

Abstract:
Introduction. Privacy policies inform users about data practices but are often complex and difficult to interpret. Human annotation plays a key role in understanding privacy policies, yet annotation disagreements highlight the complexity of these texts. Traditional machine learning models prioritize consensus, overlooking annotation variability and its impact on accuracy. Method. This study examines how annotation disagreements affect machine learning performance using the OPP-115 corpus. It compares majority vote and union methods with alternative strategies to assess their impact on policy classification. Analysis. The study evaluates whether increasing annotator consensus improves model effectiveness and if disagreement-aware approaches yield more reliable results. Results. Higher agreement levels improve model performance across most categories. Complete agreement yields the best F1-scores, especially for First Party Collection/Use and Third-Party Sharing/Collection. Annotation disagreements significantly impact classification outcomes, underscoring the need for understanding annotation disagreements. Conclusions. Ignoring annotation disagreements can misrepresent model accuracy. This study proposes new evaluation strategies that account for annotation variability, offering a more realistic approach to privacy policy analysis. Future work should explore the causes of annotation disagreements to improve machine learning transparency and reliability.
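
The two aggregation strategies under comparison take only a few lines to state; the snippet shows how majority vote and union yield different label sets for the same disagreeing annotators (category names follow OPP-115; the segment is invented):

```python
from collections import Counter

def aggregate(annotations, strategy="majority"):
    """Aggregate category labels that several annotators assigned to one
    policy segment: 'majority' keeps labels chosen by more than half of
    the annotators, 'union' keeps every label anyone applied."""
    counts = Counter(label for ann in annotations for label in set(ann))
    if strategy == "union":
        return set(counts)
    return {label for label, c in counts.items() if c > len(annotations) / 2}

segment = [["First Party Collection/Use"],
           ["First Party Collection/Use", "Third Party Sharing/Collection"],
           ["First Party Collection/Use"]]
print(aggregate(segment, "majority"))  # {'First Party Collection/Use'}
print(aggregate(segment, "union"))     # adds 'Third Party Sharing/Collection'
```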
27

Meunier, Loïc, Denis Baurain, and Luc Cornet. "AMAW: automated gene annotation for non-model eukaryotic genomes." F1000Research 12 (February 16, 2023): 186. http://dx.doi.org/10.12688/f1000research.129161.1.

Abstract:
Background: The annotation of genomes is a crucial step in the analysis of new genomic data, especially for emerging organisms, which give researchers access to unexplored lineages and expand our knowledge of poorly represented taxonomic groups. Complete pipelines for eukaryotic genome annotation have been proposed for more than a decade, but the issue remains challenging. One of the most widely used tools in the field is MAKER2, an annotation pipeline that uses experimental evidence (mRNA-seq and proteins) and combines different gene prediction tools. MAKER2 enables individual laboratories and small-scale projects to annotate non-model organisms for which pre-existing gene models are not available. The optimal use of MAKER2 requires gathering evidence data (by searching and assembling transcripts, and/or collecting homologous proteins from related organisms), elaborating the best annotation strategy (training of gene models) and efficiently orchestrating the different steps of the software in a grid computing environment, which is tedious, time-consuming and requires considerable bioinformatics skill. Methods: To address these issues, we present AMAW (Automated MAKER2 Annotation Wrapper), a wrapper pipeline for MAKER2 that automates the above-mentioned tasks. Use case: The performance of AMAW is illustrated through the annotation of a selection of 32 protist genomes, for which we compared its annotations with those produced with gene models directly available in AUGUSTUS. Conclusions: Importantly, AMAW also exists as a Singularity container recipe that is easy to deploy on a grid computer, thereby overcoming the tricky installation of MAKER2.
28

Nichyporuk, Brennan, Jillian Cardinell, Justin Szeto, et al. "Rethinking Generalization: The Impact of Annotation Style on Medical Image Segmentation." Machine Learning for Biomedical Imaging 1, December 2022 (2022): 1–37. http://dx.doi.org/10.59275/j.melba.2022-2d93.

Abstract:
Generalization is an important attribute of machine learning models, particularly for those that are to be deployed in a medical context, where unreliable predictions can have real-world consequences. While the failure of models to generalize across datasets is typically attributed to a mismatch in the data distributions, performance gaps are often a consequence of biases in the "ground-truth" label annotations. This is particularly important in the context of medical image segmentation of pathological structures (e.g. lesions), where the annotation process is much more subjective and affected by a number of underlying factors, including the annotation protocol, rater education/experience, and clinical aims, among others. In this paper, we show that modeling annotation biases, rather than ignoring them, poses a promising way of accounting for differences in annotation style across datasets. To this end, we propose a generalized conditioning framework to (1) learn and account for different annotation styles across multiple datasets using a single model, (2) identify similar annotation styles across different datasets in order to permit their effective aggregation, and (3) fine-tune a fully trained model to a new annotation style with just a few samples. Next, we present an image-conditioning approach to model annotation styles that correlate with specific image features, potentially enabling detection biases to be more easily identified.
29

Yeh, Eric, William Jarrold, and Joshua Jordan. "Leveraging Psycholinguistic Resources and Emotional Sequence Models for Suicide Note Emotion Annotation." Biomedical Informatics Insights 5s1 (January 2012): BII.S8979. http://dx.doi.org/10.4137/bii.s8979.

Abstract:
We describe the submission entered by SRI International and UC Davis for the i2b2 NLP Challenge Track 2. Our system is based on a machine learning approach and employs a combination of lexical, syntactic, and psycholinguistic features. In addition, we model the sequence and locations of occurrence of emotions found in the notes. We discuss the effect of these features on the emotion annotation task, as well as the nature of the notes themselves. We also explore the use of bootstrapping to help account for what appeared to be annotator fatigue in the data. We conclude with a discussion of future avenues for improving the approach to this task, and also discuss how annotations at the word-span level may be more appropriate for this task than annotations at the sentence level.
30

Li, Yongqi, Xin Miao, Mayi Xu, and Tieyun Qian. "Strong Empowered and Aligned Weak Mastered Annotation for Weak-to-Strong Generalization." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 26 (2025): 27437–45. https://doi.org/10.1609/aaai.v39i26.34955.

Abstract:
The super-alignment problem of how humans can effectively supervise super-human AI has garnered increasing attention. Recent research has focused on investigating the weak-to-strong generalization (W2SG) scenario as an analogy for super-alignment. This scenario examines how a pre-trained strong model, supervised by an aligned weak model, can outperform its weak supervisor. Despite good progress, current W2SG methods face two main issues: (1) the annotation quality is limited by the knowledge scope of the weak model; (2) it is risky to position the strong model as the final corrector. To tackle these issues, we propose a "Strong Empowered and Aligned Weak Mastered" (SEAM) framework for weak annotations in W2SG. This framework leverages the vast intrinsic knowledge of the pre-trained strong model to empower the annotation and positions the aligned weak model as the annotation master. Specifically, the pre-trained strong model first generates principle fast-and-frugal trees for the samples to be annotated, encapsulating rich sample-related knowledge. Then, the aligned weak model picks informative nodes based on the tree's information distribution for the final annotations. Experiments on six datasets for preference tasks in W2SG scenarios validate the effectiveness of the proposed method.
31

Mannai, Zayneb, Anis Kalboussi, and Ahmed Hadj Kacem. "Towards a Standard of Modelling Annotations in the E-Health Domain." Health Informatics - An International Journal 10, no. 04 (2021): 1–10. http://dx.doi.org/10.5121/hiij.2021.10401.

Abstract:
A large number of annotation systems in the e-health domain have been implemented in the literature, and several factors distinguish these systems from one another. In fact, each of these systems is based on a separate paradigm, resulting in a disorganized and unstructured vision of the field. As part of our research, we categorize these systems based on the functionalities each provides, and we also propose an annotation model that integrates both the health professional and the patient in the process of annotating the medical file.
32

Ma, Qin Yi, Li Hua Song, Da Peng Xie, and Mao Jun Zhou. "Development of CAD Model Annotation System Based on Design Intent." Applied Mechanics and Materials 863 (February 2017): 368–72. http://dx.doi.org/10.4028/www.scientific.net/amm.863.368.

Abstract:
Most product design on the market is variant or adaptive design, which reuses existing product design knowledge. A key aspect of reusing an existing CAD model is to correctly define and understand the design intents behind it, and this paper introduces a CAD model annotation system based on design intent. First, design intents containing all design information across the entire life cycle, from modeling and analysis to manufacturing, are marked onto the CAD model using the PMI module in UG to improve the model's readability. Second, given problems such as management difficulties and the lack of filtering and retrieval functions, the paper proposes an annotation manager, built on UG redevelopment, that provides filtering, retrieval, grouping and other functions to reduce clutter among the 3D annotations and make it convenient for users to view all the kinds of annotations they need. Finally, design information is represented both internally within the 3D model and externally in an XML file.
33

BALEY, Julien. "Leveraging graph algorithms to speed up the annotation of large rhymed corpora." Cahiers de Linguistique Asie Orientale 51, no. 1 (2022): 46–80. http://dx.doi.org/10.1163/19606028-bja10019.

Abstract:
Rhyming patterns play a crucial role in the phonological reconstruction of earlier stages of Chinese. The past few years have seen the emergence of the use of graphs to model rhyming patterns, notably with List's (2016) proposal to use graph community detection as a way to go beyond the limits of the link-and-bind method and test new hypotheses regarding phonological reconstruction. List's approach requires the existence of a rhyme-annotated corpus; such corpora are rare and prohibitively expensive to produce. The present paper solves this problem by introducing several strategies to automate annotation. Among others, the main contribution is the use of graph community detection itself to build an automatic annotator. This annotator requires no previous annotation and no knowledge of phonology, and it automatically adapts to corpora of different periods by learning their rhyme categories. Through a series of case studies, we demonstrate the viability of the approach in quickly annotating hundreds of thousands of poems with high accuracy.
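
A toy sketch of the underlying move: treat rhyme words as nodes and co-rhyming as edges, then let community detection recover rhyme categories despite noisy edges (using networkx on an invented seven-edge graph):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Nodes are rhyme words; edges connect words that rhyme together in a
# stanza (toy data; a real corpus yields a graph with thousands of nodes)
G = nx.Graph()
G.add_edges_from([("東", "公"), ("公", "紅"), ("東", "紅"),   # one rhyme group
                  ("山", "間"), ("間", "關"), ("山", "關"),   # another group
                  ("紅", "間")])                              # a noisy edge

# Community detection recovers candidate rhyme categories despite the noise
for i, group in enumerate(greedy_modularity_communities(G)):
    print(f"rhyme category {i}: {sorted(group)}")
```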
34

Xu, Yixuan, and Jingyi Cui. "Artificial Intelligence in Gene Annotation: Current Applications, Challenges, and Future Prospects." Theoretical and Natural Science 98, no. 1 (2025): 8–15. https://doi.org/10.54254/2753-8818/2025.21464.

Abstract:
Gene annotation is a critical process in genomics that involves describing not only the position but also the function of the encoded elements of a genome. In general, this provides biological context to sequence data, enabling a deeper understanding of genetic information, which is important in areas such as genetic engineering and studies of disease and evolution. Through machine learning (ML) and deep learning (DL) methodologies, AI enhances functional annotation and gene prediction effectively and accurately. This review focuses on AI in genomic research and assesses its effectiveness compared to traditional annotation tools. Using Escherichia coli as the representative model organism, the study follows a systematic approach of gene prediction with the web-based AUGUSTUS tool and functional annotation with DeepGOPlus, an artificial intelligence tool, instead of conventional BLAST-based annotation against the UniProt database. The study examines the extent of GO term coverage, the specificity of the annotations, and the concordance among these tools. Artificial intelligence is highly beneficial owing to its speed, scalability, and proficiency in annotating intricate or poorly defined genomic regions. A notable instance is DeepGOPlus, which has demonstrated enhanced coverage by suggesting new terms that were frequently missed by traditional tools. Notwithstanding these advantages, AI tools face challenges such as dependence on high-quality training data, concerns about interpretability, and the need for biological validation to support the predictions. This review emphasizes the transformative impact of artificial intelligence on gene annotation, presenting novel applications in fields such as personalized medicine and synthetic biology, where traditional methods suffer from severe limitations.
APA, Harvard, Vancouver, ISO, and other styles
35

Shang, Zirui, Yubo Zhu, Hongxi Li, Shuo Yang, and Xinxiao Wu. "Video Summarization Using Denoising Diffusion Probabilistic Model." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 7 (2025): 6776–84. https://doi.org/10.1609/aaai.v39i7.32727.

Full text source
Abstract:
Video summarization aims to eliminate visual redundancy while retaining the key parts of a video to construct concise and comprehensive synopses. Most existing methods use discriminative models to predict the importance scores of video frames. However, these methods are susceptible to annotation inconsistency caused by the inherent subjectivity of different annotators when annotating the same video. In this paper, we introduce a generative framework for video summarization that learns how to generate summaries from a probability distribution perspective, effectively reducing the interference of subjective annotation noise. Specifically, we propose a novel diffusion summarization method based on the Denoising Diffusion Probabilistic Model (DDPM), which learns the probability distribution of training data through noise prediction and generates summaries by iterative denoising. Our method is more resistant to subjective annotation noise and less prone to overfitting the training data than discriminative methods, with strong generalization ability. Moreover, to facilitate training the DDPM with limited data, we employ an unsupervised video summarization model to implement the earlier denoising process. Extensive experiments on various datasets (TVSum, SumMe, and FPVSum) demonstrate the effectiveness of our method.
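For intuition, the following numpy sketch runs DDPM-style reverse diffusion over a vector of frame-importance scores; the noise predictor is a stand-in for the trained network and the noise schedule is an assumption, so this only illustrates the iterative denoising mechanics, not the paper's model.

```python
# Minimal sketch: DDPM reverse process over frame-importance scores.
import numpy as np

T = 50
betas = np.linspace(1e-4, 0.02, T)        # illustrative linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x, t):
    # Stand-in for the learned noise-prediction network eps_theta(x, t).
    return np.zeros_like(x)

rng = np.random.default_rng(0)
x = rng.standard_normal(128)              # start from pure noise (128 frames)
for t in range(T - 1, -1, -1):
    eps = predict_noise(x, t)
    # x_{t-1} mean = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
    mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    noise = rng.standard_normal(x.shape) if t > 0 else 0.0
    x = mean + np.sqrt(betas[t]) * noise  # sigma_t^2 = beta_t variant

scores = 1.0 / (1.0 + np.exp(-x))         # squash to [0, 1] importance scores
```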
APA, Harvard, Vancouver, ISO, and other styles
36

Yuan, Guowen, Ben Kao, and Tien-Hsuan Wu. "CEMA – Cost-Efficient Machine-Assisted Document Annotations." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (2023): 11043–50. http://dx.doi.org/10.1609/aaai.v37i9.26308.

Full text source
Abstract:
We study the problem of semantically annotating textual documents that are complex in the sense that the documents are long, feature-rich, and domain-specific. Due to their complexity, such annotation tasks require trained human workers, who are very expensive in both time and money. We propose CEMA, a method for deploying machine learning to assist humans in complex document annotation. CEMA estimates the human cost of annotating each document and selects the set of documents to be annotated that strikes the best balance between model accuracy and human cost. We conduct experiments on complex annotation tasks in which we compare CEMA against other document selection and annotation strategies. Our results show that CEMA is the most cost-efficient solution for those tasks.
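The following sketch illustrates the general cost/benefit trade-off described above with a greedy budgeted selection; the cost and benefit estimators are placeholders for illustration, not CEMA's actual models.

```python
# Minimal sketch: pick documents with the best benefit-per-cost under a budget.
def select_documents(docs, est_cost, est_benefit, budget):
    ranked = sorted(docs, key=lambda d: est_benefit(d) / est_cost(d), reverse=True)
    chosen, spent = [], 0.0
    for d in ranked:
        c = est_cost(d)
        if spent + c <= budget:
            chosen.append(d)
            spent += c
    return chosen

docs = [{"id": 1, "len": 200}, {"id": 2, "len": 1200}, {"id": 3, "len": 600}]
picked = select_documents(
    docs,
    est_cost=lambda d: d["len"] / 100.0,  # e.g., annotation minutes grow with length
    est_benefit=lambda d: 1.0,            # e.g., expected model-accuracy gain
    budget=10.0,
)
print([d["id"] for d in picked])
```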
APA, Harvard, Vancouver, ISO, and other styles
37

Chanenson, Jake, Madison Pickering, and Noah Apthorpe. "Automating Governing Knowledge Commons and Contextual Integrity (GKC-CI) Privacy Policy Annotations with Large Language Models." Proceedings on Privacy Enhancing Technologies 2025, no. 2 (2025): 280–308. https://doi.org/10.56553/popets-2025-0062.

Full text source
Abstract:
Identifying contextual integrity (CI) and governing knowledge commons (GKC) parameters in privacy policy texts can facilitate normative privacy analysis. However, GKC-CI annotation has heretofore required manual or crowdsourced effort. This paper demonstrates that high-accuracy GKC-CI parameter annotation of privacy policies can be performed automatically using large language models. We fine-tune 50 open-source and proprietary models on 21,588 ground truth GKC-CI annotations from 16 privacy policies. Our best performing model has an accuracy of 90.65%, which is comparable to the accuracy of experts on the same task. We apply our best performing model to 456 privacy policies from a variety of online services, demonstrating the effectiveness of scaling GKC-CI annotation for privacy policy exploration and analysis. We publicly release our model training code, training and testing data, an annotation visualizer, and all annotated policies for future GKC-CI research.
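As a hedged illustration of preparing such fine-tuning data, the sketch below packages parameter annotations as JSONL prompt/completion records with the standard library; the field names, label strings, and prompt wording are assumptions rather than the paper's exact schema.

```python
# Minimal sketch: write ground-truth parameter annotations as JSONL
# fine-tuning records. Labels and prompt format are illustrative assumptions.
import json

examples = [
    {"text": "We share your email with advertising partners.",
     "label": "recipient"},
    {"text": "Data is collected when you create an account.",
     "label": "transmission_principle"},
]

with open("gkc_ci_train.jsonl", "w") as f:
    for ex in examples:
        record = {"prompt": f"Tag the GKC-CI parameter: {ex['text']}",
                  "completion": ex["label"]}
        f.write(json.dumps(record) + "\n")
```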
APA, Harvard, Vancouver, ISO, and other styles
38

Salek, Mahyar, Yoram Bachrach, and Peter Key. "Hotspotting — A Probabilistic Graphical Model For Image Object Localization Through Crowdsourcing." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (2013): 1156–62. http://dx.doi.org/10.1609/aaai.v27i1.8465.

Full text source
Abstract:
Object localization is an image annotation task which consists of finding the location of a target object in an image. It is common to crowdsource annotation tasks and aggregate responses to estimate the true annotation. While for other kinds of annotations consensus is simple and powerful, it cannot be applied to object localization as effectively due to the task's rich answer space and inherent noise in responses. We propose a probabilistic graphical model to localize objects in images based on responses from the crowd. We improve upon natural aggregation methods such as the mean and the median by simultaneously estimating the difficulty level of each question and skill level of every participant. We empirically evaluate our model on crowdsourced data and show that our method outperforms simple aggregators both in estimating the true locations and in ranking participants by their ability. We also propose a simple adaptive sourcing scheme that works well for very sparse datasets.
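The sketch below conveys the core intuition of jointly estimating the true answer and annotator skill with a simple iteratively re-weighted mean; the paper uses a full probabilistic graphical model, so treat this as a toy analogue rather than the method itself.

```python
# Minimal sketch: alternate between estimating the true location and
# re-weighting workers by how close their clicks fall to it.
import numpy as np

clicks = np.array([[10.0, 12.0], [11.0, 11.5], [40.0, 5.0], [10.5, 12.2]])
weights = np.ones(len(clicks)) / len(clicks)

for _ in range(20):
    estimate = (weights[:, None] * clicks).sum(axis=0)  # weighted mean location
    errors = np.linalg.norm(clicks - estimate, axis=1)
    weights = 1.0 / (errors + 1e-6)                     # skill ~ inverse error
    weights /= weights.sum()

print("estimated location:", estimate)
print("worker weights:", weights.round(3))              # outlier gets low weight
```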
APA, Harvard, Vancouver, ISO, and other styles
39

Wu, Aihua. "Ranking Biomedical Annotations with Annotator’s Semantic Relevancy." Computational and Mathematical Methods in Medicine 2014 (2014): 1–11. http://dx.doi.org/10.1155/2014/258929.

Full text source
Abstract:
Biomedical annotation is a common and effective artifact for researchers to discuss, express opinions, and share discoveries. It is becoming increasingly popular in many online research communities and carries much useful information. Ranking biomedical annotations is a critical problem for data users seeking information efficiently. As the annotator's knowledge about the annotated entity normally determines the quality of the annotations, we evaluate that knowledge, that is, the semantic relationship between annotator and entity, in two ways. The first is extracting relational information from credible websites by mining association rules between an annotator and a biomedical entity. The second is frequent-pattern mining from historical annotations, which reveals common features of the biomedical entities that an annotator can annotate with high quality. We propose a weighted and concept-extended RDF model to represent an annotator, a biomedical entity, and their background attributes, and we merge information from the two ways as the context of an annotator. Based on that, we present a method to rank annotations by evaluating their correctness according to users' votes and the semantic relevancy between the annotator and the annotated entity. The experimental results show that the approach is applicable and efficient even when the data set is large.
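A minimal rdflib sketch of representing an annotator, a biomedical entity, and a weighted relation in RDF; the namespace and property names are invented for illustration and are not the paper's model.

```python
# Minimal sketch: a weighted annotator-entity relation as RDF triples.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/bioann/")  # illustrative namespace
g = Graph()

annotator = URIRef(EX["annotator/alice"])
entity = URIRef(EX["entity/BRCA1"])

g.add((annotator, RDF.type, EX.Annotator))
g.add((entity, RDF.type, EX.BiomedicalEntity))
g.add((annotator, EX.hasExpertiseIn, entity))
g.add((annotator, EX.relevancyWeight, Literal(0.82)))  # mined relevancy score

print(g.serialize(format="turtle"))
```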
APA, Harvard, Vancouver, ISO, and other styles
40

Zhang, Hansong, Shikun Li, Dan Zeng, Chenggang Yan, and Shiming Ge. "Coupled Confusion Correction: Learning from Crowds with Sparse Annotations." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (2024): 16732–40. http://dx.doi.org/10.1609/aaai.v38i15.29613.

Full text source
Abstract:
As datasets grow larger, accurately annotating them is becoming impractical due to the expense in both time and money. Therefore, crowd-sourcing has been widely adopted to reduce the cost of collecting labels, which inevitably introduces label noise and eventually degrades model performance. To learn from crowd-sourced annotations, modeling the expertise of each annotator is a common but challenging paradigm, because the annotations collected by crowd-sourcing are usually highly sparse. To alleviate this problem, we propose Coupled Confusion Correction (CCC), where two models are simultaneously trained to correct the confusion matrices learned by each other. Via bi-level optimization, the confusion matrices learned by one model can be corrected by the distilled data from the other. Moreover, we cluster "annotator groups" who share similar expertise so that their confusion matrices can be corrected together. In this way, the expertise of the annotators, especially of those who provide few labels, can be better captured. Remarkably, we point out that annotation sparsity means not only that the average number of labels is low, but also that there are always some annotators who provide very few labels, which previous works neglected when constructing synthetic crowd-sourcing annotations. Based on that, we propose to use the Beta distribution to control the generation of crowd-sourcing labels so that the synthetic annotations are more consistent with real-world ones. Extensive experiments are conducted on two types of synthetic datasets and three real-world datasets, and the results demonstrate that CCC significantly outperforms state-of-the-art approaches. Source code is available at: https://github.com/Hansong-Zhang/CCC.
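The Beta-distribution idea can be illustrated in a few lines of numpy: draw a skewed per-annotator labeling rate, then mask the label matrix accordingly so most annotators label few items while a few label many. The shape parameters here are assumptions, not the paper's settings.

```python
# Minimal sketch: Beta-controlled sparsity for synthetic crowd annotations.
import numpy as np

rng = np.random.default_rng(42)
n_annotators, n_items = 50, 1000

# Skewed per-annotator labeling rates: Beta(0.5, 5) puts most mass near zero.
rates = rng.beta(0.5, 5.0, size=n_annotators)

# mask[a, i] is True if annotator a labels item i.
mask = rng.random((n_annotators, n_items)) < rates[:, None]
print("labels per annotator:", mask.sum(axis=1))  # long-tailed, as in real crowds
```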
APA, Harvard, Vancouver, ISO, and other styles
41

Pimenov, I. S. "Analyzing Disagreements in Argumentation Annotation of Scientific Texts in Russian Language." NSU Vestnik. Series: Linguistics and Intercultural Communication 21, no. 2 (2023): 89–104. http://dx.doi.org/10.25205/1818-7935-2023-21-2-89-104.

Full text source
Abstract:
This paper presents an analysis of inter-annotator disagreements in modeling argumentation in scientific papers. The aim of the study is to specify annotation guidelines for the typical disagreement cases. The analysis focuses on inter-annotator disagreements at three annotation levels: identification of theses, construction of links between theses, and specification of reasoning models for these links. The dataset contains 20 argumentation annotations for 10 scientific papers from two thematic areas, where two experts independently annotated each text. These 20 annotations include 917 theses and 773 arguments. Annotating each text consisted of modeling its argumentation structure in accordance with the Argument Interchange Format. The use of this model results in an oriented graph with two node types for each annotated text: information nodes for statements, and scheme nodes for links between them and the reasoning models in these links. Identification of reasoning models follows Walton's classification. To identify disagreements between annotators, we perform an automatic comparison of graphs that represent the argumentation structure of the same text. This comparison includes three stages: 1) identification of theses that are present in one graph and absent in the other; 2) detection of links that connect the corresponding theses differently in the two graphs; 3) identification of different reasoning models specified for the same links. Next, an expert analysis of the automatically identified discrepancies enables specification of the typical disagreement cases based on the structural properties of argumentation graphs (positioning of theses, configuration of links across statements at different distances in the text, and the ratio between the overall frequency of a reasoning model in annotations and the frequency of disagreements over its identification). The study shows that the correspondence values between argumentation graphs reach on average 78% for theses, 55% for links, and 60% for reasoning models. Typical disagreement cases include: 1) detection of theses expressed in a text without explicit justification; 2) construction of links between theses in the same paragraph or at a distance of four or more paragraphs; 3) identification of two specific reasoning models (connected to 40% and 33% of disagreements, respectively); 4) confusion over functionally different schemes due to annotators perceiving links in different aspects. The study results in annotation guidelines for minimizing typical disagreement cases at each level of argumentation structures.
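A small sketch of the three-stage automatic comparison on toy annotation graphs; the data layout (thesis sets plus links labeled with reasoning models) is a simplification of the AIF graphs used in the paper.

```python
# Minimal sketch: compare two annotation graphs in three stages.
def correspondence(ann_a, ann_b):
    theses_a, theses_b = set(ann_a["theses"]), set(ann_b["theses"])
    shared_theses = theses_a & theses_b                      # stage 1

    links_a = {k: v for k, v in ann_a["links"].items() if set(k) <= shared_theses}
    links_b = {k: v for k, v in ann_b["links"].items() if set(k) <= shared_theses}
    shared_links = links_a.keys() & links_b.keys()           # stage 2

    same_model = [k for k in shared_links if links_a[k] == links_b[k]]  # stage 3
    return len(shared_theses), len(shared_links), len(same_model)

a = {"theses": {"T1", "T2", "T3"},
     "links": {("T1", "T2"): "example", ("T2", "T3"): "cause-effect"}}
b = {"theses": {"T1", "T2", "T3", "T4"},
     "links": {("T1", "T2"): "example", ("T2", "T3"): "analogy"}}
print(correspondence(a, b))  # -> (3, 2, 1)
```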
APA, Harvard, Vancouver, ISO, and other styles
42

Braylan, Alexander, Madalyn Marabella, Omar Alonso, and Matthew Lease. "A General Model for Aggregating Annotations Across Simple, Complex, and Multi-Object Annotation Tasks." Journal of Artificial Intelligence Research 78 (December 11, 2023): 901–73. http://dx.doi.org/10.1613/jair.1.14388.

Full text source
Abstract:
Human annotations are vital to supervised learning, yet annotators often disagree on the correct label, especially as annotation tasks increase in complexity. A common strategy to improve label quality is to ask multiple annotators to label the same item and then aggregate their labels. To date, many aggregation models have been proposed for simple categorical or numerical annotation tasks, but far less work has considered more complex annotation tasks, such as those involving open-ended, multivariate, or structured responses. Similarly, while a variety of bespoke models have been proposed for specific tasks, our work is the first we are aware of to introduce aggregation methods that generalize across many, diverse complex tasks, including sequence labeling, translation, syntactic parsing, ranking, bounding boxes, and keypoints. This generality is achieved by applying readily available task-specific distance functions, then devising a task-agnostic method to model these distances between labels, rather than the labels themselves. This article presents a unified treatment of our prior work on complex annotation modeling and extends that work with investigation of three new research questions. First, how do complex annotation task and dataset properties impact aggregation accuracy? Second, how should a task owner navigate the many modeling choices in order to maximize aggregation accuracy? Finally, what tests and diagnoses can verify that aggregation models are specified correctly for the given data? To understand how various factors impact accuracy and to inform model selection, we conduct large-scale simulation studies and broad experiments on real, complex datasets. Regarding testing, we introduce the concept of unit tests for aggregation models and present a suite of such tests to ensure that a given model is not mis-specified and exhibits expected behavior. Beyond investigating these research questions above, we discuss the foundational concept and nature of annotation complexity, present a new aggregation model as a conceptual bridge between traditional models and our own, and contribute a new general semisupervised learning method for complex label aggregation that outperforms prior work.
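The simplest instance of aggregating through distances rather than labels is to return the annotation closest to all others under a task-specific distance (a medoid). The sketch below does this with edit distance standing in for any task's distance function; it is only a baseline analogue of the article's models, not their implementation.

```python
# Minimal sketch: distance-based aggregation via the medoid annotation.
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance with a single rolling row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[len(b)]

def medoid(annotations, dist):
    # The label minimizing total distance to all other labels.
    return min(annotations, key=lambda x: sum(dist(x, y) for y in annotations))

labels = ["the cat sat", "the cat sats", "a cat sat", "the cat sat"]
print(medoid(labels, edit_distance))  # -> "the cat sat"
```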
APA, Harvard, Vancouver, ISO, and other styles
43

Bilal, Mühenad, Ranadheer Podishetti, Leonid Koval, Mahmoud A. Gaafar, Daniel Grossmann, and Markus Bregulla. "The Effect of Annotation Quality on Wear Semantic Segmentation by CNN." Sensors 24, no. 15 (2024): 4777. http://dx.doi.org/10.3390/s24154777.

Full text source
Abstract:
In this work, we investigate the impact of annotation quality and domain expertise on the performance of Convolutional Neural Networks (CNNs) for semantic segmentation of wear on titanium nitride (TiN) and titanium carbonitride (TiCN) coated end mills. Using an innovative measurement system and a customized CNN architecture, we found that domain expertise significantly affects model performance. Annotator 1 achieved maximum mIoU scores of 0.8153 for abnormal wear and 0.7120 for normal wear on TiN datasets, whereas Annotator 3, with the lowest expertise, achieved significantly lower scores. Sensitivity to annotation inconsistencies and model hyperparameters was examined, revealing that models for TiCN datasets showed a higher coefficient of variation (CV) of 16.32%, compared to 8.6% for TiN, due to the subtler wear characteristics, highlighting the need for optimized annotation policies and high-quality images to improve wear segmentation.
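For reference, a minimal numpy implementation of the mIoU metric used to score the wear masks; the toy masks are illustrative, not the paper's data.

```python
# Minimal sketch: mean Intersection-over-Union for segmentation masks.
import numpy as np

def miou(pred: np.ndarray, target: np.ndarray, n_classes: int) -> float:
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                      # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 1], [1, 2]])
target = np.array([[0, 1], [2, 2]])
print(f"mIoU: {miou(pred, target, n_classes=3):.3f}")
```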
APA, Harvard, Vancouver, ISO, and other styles
44

Dijkema, Tom, and Sam Leeflang. "DiSSCover the Potential of FAIR Digital Object Annotations and How You Can Use Them!" Biodiversity Information Science and Standards 8 (August 7, 2024): e133172. https://doi.org/10.3897/biss.8.133172.

Full text source
Abstract:
The infrastructure for the Distributed System of Scientific Collections (DiSSCo) is in full development. Work within the DiSSCo Transition Project has been focused on building infrastructure, creating data models, and setting up Application Programming Interfaces (APIs) (Koureas et al. 2024). In the past years, DiSSCo has presented this work at different Biodiversity Information Standards (TDWG) conferences (Leeflang and Addink 2023, Leeflang et al. 2022, Addink et al. 2021). In this year's session, we would like to focus on the human-facing application: DiSSCover.

DiSSCover is the graphical user interface through which users can interact with Findable, Accessible, Interoperable and Reusable (FAIR) Digital Objects (FDOs), facilitating the curation and enhancement of specimen data (Islam 2024). Development started in 2022 and is ongoing. The interface acts as a gateway into the DiSSCo infrastructure, providing access to digital specimens and media. Extracted from the core DiSSCo API, the data is converted into an easily readable format and made discoverable through a diverse set of filters. DiSSCover's main focus is to allow users to make annotations upon the data.

Through the concept of annotations, we connect expert and machine-generated information to create extended digital specimens (Hardisty et al. 2022), e.g., by creating linkages to other infrastructures, correcting or adding new information, or by triggering machine annotation services. Machine annotation services are automated, scalable tools that run in the background and automatically curate and extend the specimen (Addink et al. 2023). Human users will remain important, as all annotations made by machine annotation services can be reviewed by a trusted person. Annotations are FAIR Digital Objects and target a specific part of a specimen, be it a data fragment or an associated media file.

At the heart of DiSSCover lies the Open Digital Specimen data specification (Leeflang and Addink 2023). It tries to harmonise multiple data standards into one generic specification based on the new Global Biodiversity Information Facility (GBIF) Unified Model (Robertson et al. 2022). The data is stored as JavaScript Object Notation (JSON) based on JSON Schemas (Anonymous 2024). Annotations are linked to specific data attributes using a JSON-path as the identifier. Data attributes can be individual terms, collections of terms called classes, or the whole object. This creates a flexible but complex data structure, for which we used the World Wide Web Consortium (W3C) Web Annotation data model as the basis (Sanderson et al. 2017).

The W3C Web Annotation data model contains two main components: the target and the body. The target specifies which data attribute the annotation is made on, for example the term 'ods:specimenName'. This is a local term within the open Digital Specimen namespace (ods), which holds the accepted name of the digital specimen. The annotation body holds the value(s) that are appended to the digital specimen and differs based upon the annotation motivation. DiSSCo recognises five annotation motivations: addition, modification, comment, assessment, and deletion, each of which has its own unique function. This creates a flexible structure that should be able to handle any information the user wants to add to the object.

The challenge of DiSSCover is to preserve the complex structure of annotations whilst making it convenient for users to work with. The session will provide a look at the different kinds of annotations and their use from a practical perspective. A demonstration of DiSSCover will show how users can create annotations, providing knowledge about the process that will give shape to DiSSCo's main goal of enriching natural history data.
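To make the target/body split concrete, here is a hedged sketch of a W3C-style annotation object addressing the 'ods:specimenName' term via a JSON-path; the identifiers, selector type, and exact field layout are assumptions for illustration, not DiSSCo's production schema.

```python
# Minimal sketch: a W3C Web Annotation-style object with target and body.
import json

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    # Stand-in W3C motivation; DiSSCo distinguishes addition, modification,
    # comment, assessment, and deletion.
    "motivation": "editing",
    "target": {
        "source": "https://example.org/digital-specimen/123",  # illustrative id
        "selector": {"type": "FragmentSelector", "value": "$['ods:specimenName']"},
    },
    "body": {"value": "Quercus robur L."},  # the value appended to the specimen
}
print(json.dumps(annotation, indent=2))
```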
APA, Harvard, Vancouver, ISO, and other styles
45

Zhu, Zhen, Yibo Wang, Shouqing Yang, et al. "CORAL: Collaborative Automatic Labeling System Based on Large Language Models." Proceedings of the VLDB Endowment 17, no. 12 (2024): 4401–4. http://dx.doi.org/10.14778/3685800.3685885.

Full text source
Abstract:
In the era of big data, data annotation is integral to numerous applications. However, it is widely acknowledged as a laborious and time-consuming process, significantly impeding the scalability and efficiency of data-driven applications. To reduce the human cost, we demonstrate CORAL, a collaborative automatic labeling system driven by large language models (LLMs), which achieves high-quality annotation with the least human effort. First, CORAL employs an LLM to automatically annotate vast datasets, generating coarse-grained labels. Subsequently, a weakly-supervised learning module trains small language models (SLMs) using noisy-label learning techniques to distill accurate labels from the LLM's annotations. It also allows statistical analysis of model outcomes to identify potentially erroneous labels, reducing the human cost of error detection. Furthermore, CORAL supports iterative refinement by LLMs and SLMs using manually corrected labels, thereby ensuring continual enhancement in annotation quality and model performance. A visual interface enables monitoring of the annotation process and analysis of the results.
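A toy sketch of the distillation-and-flagging loop: fit a small model on the LLM's coarse labels, then surface items where it confidently disagrees for human review. The data, tf-idf features, and the 0.9 threshold are assumptions, not CORAL's actual components.

```python
# Minimal sketch: flag likely label errors by SLM/LLM disagreement.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great product", "awful service", "love it", "terrible", "fine I guess"]
llm_labels = ["pos", "neg", "pos", "neg", "neg"]  # coarse labels from the LLM

vec = TfidfVectorizer()
X = vec.fit_transform(texts)
slm = LogisticRegression().fit(X, llm_labels)     # small model distills the labels

proba = slm.predict_proba(X)
for text, label, p in zip(texts, llm_labels, proba):
    pred = slm.classes_[p.argmax()]
    if pred != label and p.max() > 0.9:           # confident disagreement
        print(f"review manually: {text!r} (LLM said {label}, SLM says {pred})")
```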
APA, Harvard, Vancouver, ISO, and other styles
46

Spetale, Flavio E., Javier Murillo, Gabriela V. Villanova, Pilar Bulacio, and Elizabeth Tapia. "FGGA-lnc: automatic gene ontology annotation of lncRNA sequences based on secondary structures." Interface Focus 11, no. 4 (2021): 20200064. http://dx.doi.org/10.1098/rsfs.2020.0064.

Full text source
Abstract:
The study of long non-coding RNAs (lncRNAs), greater than 200 nucleotides, is central to understanding the development and progression of many complex diseases. Unlike proteins, the functionality of lncRNAs is only subtly encoded in their primary sequence. Current in-silico lncRNA annotation methods mostly rely on annotations inferred from interaction networks. But extensive experimental studies are required to build these networks. In this work, we present a graph-based machine learning method called FGGA-lnc for the automatic gene ontology (GO) annotation of lncRNAs across the three GO subdomains. We build upon FGGA (factor graph GO annotation), a computational method originally developed to annotate protein sequences from non-model organisms. In the FGGA-lnc version, a coding-based approach is introduced to fuse primary sequence and secondary structure information of lncRNA molecules. As a result, lncRNA sequences become sequences of a higher-order alphabet allowing supervised learning methods to assess individual GO-term annotations. Raw GO annotations obtained in this way are unaware of the GO structure and therefore likely to be inconsistent with it. The message-passing algorithm embodied by factor graph models overcomes this problem. Evaluations of the FGGA-lnc method on lncRNA data, from model and non-model organisms, showed promising results suggesting it as a candidate to satisfy the huge demand for functional annotations arising from high-throughput sequencing technologies.
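The higher-order alphabet idea reduces to pairing each nucleotide with its secondary-structure state, e.g. from a dot-bracket string; a tiny sketch with an invented sequence follows, standing in for the paper's coding scheme.

```python
# Minimal sketch: fuse primary sequence with dot-bracket structure into
# one higher-order symbol per position. Sequence and structure are invented.
seq = "GCAUGGCUA"
struct = "((....))."  # e.g., produced by an RNA secondary-structure predictor
assert len(seq) == len(struct)

fused = [nt + st for nt, st in zip(seq, struct)]
print(fused)  # ['G(', 'C(', 'A.', 'U.', 'G.', 'G.', 'C)', 'U)', 'A.']
```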
APA, Harvard, Vancouver, ISO, and other styles
47

Cornwell, Peter. "Progress with Repository-based Annotation Infrastructure for Biodiversity Applications." Biodiversity Information Science and Standards 7 (September 14, 2023): e112707. https://doi.org/10.3897/biss.7.112707.

Full text source
Abstract:
Rapid development since the 1980s of technologies for analysing texts has led not only to widespread employment of text 'mining', but also to now-pervasive large language model artificial intelligence (AI) applications. However, building new, concise data resources from historic as well as contemporary scientific literature, which can be employed efficiently at scale by automation and which have long-term value for the research community, has proved more elusive.

Efforts at codifying analyses, such as the Text Encoding Initiative (TEI), date from the early 1990s; they were initially driven by the social sciences and humanities (SSH) and linguistics communities and were extended with multiple XML-based tagging schemes, including in biodiversity (Miller et al. 2012). In 2010, the Bio-Ontologies Special Interest Group (of the International Society for Computational Biology) presented its Annotation Ontology (AO), incorporating JavaScript Object Notation and broadening previous XML-based approaches (Ciccarese et al. 2011). From 2011, the Open Annotation Data Model (OADM) (Sanderson et al. 2013) focused on cross-domain standards with utility for Web 3.0, leading to the W3C Web Annotation Data Model (WADM) Recommendation in February 2017*1 and the potential for unifying the multiplicity of already-in-use tagging approaches.

This continual evolution has made the preservation of investment in annotation methods, and in particular of the connections between annotations and their context in source literature, particularly challenging. Infrastructure that entered service during the intervening years does not yet support WADM and has only recently started to address the parallel emergence of page-imagery-based standards such as the International Image Interoperability Framework (IIIF). Notably, IIIF instruments such as Mirador-2, which has been employed widely for manual creation and editing of annotations in SSH, continue to employ the now-deprecated OADM. Although multiple efforts now address combining IIIF and TEI text coordinate systems, the two are currently fundamentally incompatible.

However, emerging repository technologies enable preservation of annotation investment to be accomplished comprehensively for the first time. Native IIIF support enables interactive previewing of annotations within repository graphical user interfaces, and dynamic serialisation technologies provide compatibility with existing XML-based infrastructures. Repository access controls can permit experts to trace annotation sources in original texts even if the literature is not publicly accessible, e.g., due to copyright restriction. This is of paramount importance, not only because surrounding context can be crucial to qualify formal terms that have been annotated, such as collecting country, but also because contemporary automated text mining, essential for operation at the scale of known biodiversity literature, is not 100% accurate, so manual checking of uncertainties is currently essential. On-going improvement of language analysis tools through AI integration offers significant future gains from reprocessing literature and updating annotation data resources. Nevertheless, without effective preservation of digitized literature as well as annotations, this enrichment will not be possible, and today's investments in gathering together and analysing scientific literature will be devalued or lost.

We report new functionality included in the InvenioRDM*2 Free and Open Source Software (FOSS) repository platform, which natively supports IIIF and WADM. InvenioRDM development and maintenance is funded and managed by an international consortium. From late 2023, the InvenioRDM-based ZenodoRDM update*3 will display annotations on biodiversity literature interactively. Significantly, the Biodiversity Literature Repository (BLR) is a Zenodo Community. BLR automatically notifies the Global Biodiversity Information Facility (GBIF) of new taxonomic data, and GBIF downloads and integrates this into its service.

Moreover, an annotation service based on the WADM-native Mirador-3 FOSS IIIF viewer has now been developed and will enter service with ZenodoRDM. This enables editing of biodiversity annotations from within the repository interface, as well as automated updating of taxonomic information products provided to other major infrastructures such as GBIF.

Two aspects of this ZenodoRDM annotation service are presented: (1) dynamic transformation of (preservable) WADM annotations for consumption by contemporary IIIF-compliant applications such as Mirador-3, as well as for Plazi TreatmentBank/GBIF compatibility; and (2) authentication and task organization permitting management of groups of expert contributors performing annotation enrichment tasks directly through the ZenodoRDM graphical user interface (GUI).

Workflows for editing existing biodiversity annotations, as well as origination of new annotations, need to be tailored for specific tasks (e.g., unifying geographic collecting-location definitions in historic reports) via configurable dialogs for contributors and controlled vocabularies. Selectively populating workflows with annotations according to a task definition is also important, to avoid cluttering the editing GUI with non-essential information. Updated annotations are integrated into a new annotation collection upon completion of a task, before repository records are updated.

Current work on annotation workflows for SSH applications is also reported. The ZenodoRDM biodiversity annotation service implements a generic repository micro-service API, and the implementation of similar services for other repository software platforms is discussed.
APA, Harvard, Vancouver, ISO, and other styles
48

Ramakrishnaiah, Yashpal, Adam P. Morris, Jasbir Dhaliwal, Melcy Philip, Levin Kuhlmann, and Sonika Tyagi. "Linc2function: A Comprehensive Pipeline and Webserver for Long Non-Coding RNA (lncRNA) Identification and Functional Predictions Using Deep Learning Approaches." Epigenomes 7, no. 3 (2023): 22. http://dx.doi.org/10.3390/epigenomes7030022.

Full text source
Abstract:
Long non-coding RNAs (lncRNAs), comprising a significant portion of the human transcriptome, serve as vital regulators of cellular processes and potential disease biomarkers. However, the function of most lncRNAs remains unknown, and furthermore, existing approaches have focused on gene-level investigation. Our work emphasizes the importance of transcript-level annotation to uncover the roles of specific transcript isoforms. We propose that understanding the mechanisms of lncRNA in pathological processes requires solving their structural motifs and interactomes. A complete lncRNA annotation first involves discriminating them from their coding counterparts and then predicting their functional motifs and target bio-molecules. Current in silico methods mainly perform primary-sequence-based discrimination using a reference model, limiting their comprehensiveness and generalizability. We demonstrate that integrating secondary structure and interactome information, in addition to using transcript sequence, enables a comprehensive functional annotation. Annotating lncRNA for newly sequenced species is challenging due to inconsistencies in functional annotations, specialized computational techniques, limited accessibility to source code, and the shortcomings of reference-based methods for cross-species predictions. To address these challenges, we developed a pipeline for identifying and annotating transcript sequences at the isoform level. We demonstrate the effectiveness of the pipeline by comprehensively annotating the lncRNA associated with two specific disease groups. The source code of our pipeline is available under the MIT license for local use by researchers to make new predictions using the pre-trained models or to re-train models on new sequence datasets. Non-technical users can access the pipeline through a web server setup.
APA, Harvard, Vancouver, ISO, and other styles
49

Sakamoto, Nami, Takaki Oka, Yuki Matsuzawa, et al. "MS2Lipid: A Lipid Subclass Prediction Program Using Machine Learning and Curated Tandem Mass Spectral Data." Metabolites 14, no. 11 (2024): 602. http://dx.doi.org/10.3390/metabo14110602.

Full text source
Abstract:
Background: Untargeted lipidomics using collision-induced dissociation-based tandem mass spectrometry (CID-MS/MS) is essential for biological and clinical applications. However, annotation confidence still relies on manual curation by analytical chemists, despite the development of various software tools for automatic spectral processing based on rule-based fragment annotations. Methods: In this study, we present a novel machine learning model, MS2Lipid, for the prediction of known lipid subclasses from MS/MS queries, providing an orthogonal approach to existing lipidomics software programs in determining the lipid subclass of ion features. We designed a new descriptor, MCH (mode of carbon and hydrogen), to increase the specificity of lipid subclass prediction in nominal mass resolution MS data. Results: The model, trained with 6760 and 6862 manually curated MS/MS spectra for the positive and negative ion modes, respectively, classified queries into one or several of 97 lipid subclasses, achieving an accuracy of 97.4% in the test set. The program was further validated using various datasets from different instruments and curators, with the average accuracy exceeding 87.2%. Using an integrated approach with molecular spectral networking, we demonstrated the utility of MS2Lipid by annotating microbiota-derived esterified bile acids, whose abundance was significantly increased in fecal samples of obese patients in a human cohort study. This suggests that the machine learning model provides an independent criterion for lipid subclass classification, enhancing the annotation of lipid metabolites within known lipid classes. Conclusions: MS2Lipid is a highly accurate machine learning model that enhances lipid subclass annotation from MS/MS data and provides an independent criterion.
APA, Harvard, Vancouver, ISO, and other styles
50

Julian, Sahertian, and Akbar Saiful. "Automatic Image Annotation Using CMRM with Scene Information." TELKOMNIKA Telecommunication, Computing, Electronics and Control 15, no. 2 (2017): 693–701. https://doi.org/10.12928/TELKOMNIKA.v15i2.5160.

Full text source
Abstract:
Searching for digital images in a disorganized image collection is a challenging problem. One step of image searching is automatic image annotation. Automatic image annotation refers to the process of automatically assigning relevant text keywords to any given image, reflecting its content. In the past decade many automatic image annotation methods have been proposed and have achieved promising results. However, annotation prediction from these methods is still far from accurate. To tackle this problem, in this paper we propose an automatic annotation method using a relevance model and scene information. CMRM is an automatic image annotation method based on the relevance-model approach. CMRM assumes that the regions in an image can be described using a small vocabulary of blobs. Blobs are generated by segmentation, feature extraction, and clustering. Given a training set of images with annotations, this method predicts the probability of generating a word given the blobs in an image. To improve the annotation prediction accuracy of CMRM, in this paper we incorporate scene information into CMRM. Our proposed method is called scene-CMRM. The global image region can be represented by features that indicate the type of scene shown in the image. Thus, annotation prediction by CMRM can be more accurate given that scene type. Our experiments show that the method provides predictions with better precision than CMRM does, where precision represents the percentage of words that are correctly predicted.
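A compact sketch of the CMRM scoring rule, where the probability of a word given a query image's blobs sums P(w|J) times the product of blob probabilities over training images J; the toy training set and the add-one smoothing are simplifications of the original model's interpolation smoothing.

```python
# Minimal sketch: CMRM-style word scoring from a blob vocabulary.
from collections import Counter

train = [
    {"words": ["tiger", "grass"], "blobs": [3, 3, 7]},  # illustrative images
    {"words": ["sky", "grass"],   "blobs": [1, 7, 7]},
]
vocab_w = {"tiger", "grass", "sky"}
vocab_b = {1, 3, 7}

def p_word_given_image(w, query_blobs):
    score = 0.0
    for J in train:
        wc, bc = Counter(J["words"]), Counter(J["blobs"])
        p_w = (wc[w] + 1) / (sum(wc.values()) + len(vocab_w))       # P(w|J)
        p_bs = 1.0
        for b in query_blobs:
            p_bs *= (bc[b] + 1) / (sum(bc.values()) + len(vocab_b))  # P(b|J)
        score += p_w * p_bs / len(train)                             # uniform P(J)
    return score

for w in sorted(vocab_w):
    print(w, round(p_word_given_image(w, [3, 7]), 5))
```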
APA, Harvard, Vancouver, ISO, and other styles