
Journal articles on the topic 'Annotated score'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Annotated score.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Constraints, Generative. "Break Up Variations: An Annotated Score." Performance Philosophy 4, no. 2 (February 1, 2019): 591–600. http://dx.doi.org/10.21476/pp.2019.42227.

Abstract:
Break Up Variations is an annotated score by means of which we consider the document as a break-up from — and with — the thinking of performance. We explore the formal categories of page-based and stage-based scores and documentations of performance, asserting the simultaneity of the document and its performance in their mutual departures, theorising the break-up as a form of relation, not as its absence. As a committee of interdisciplinary researchers and practitioners, we consider annotation in terms of affective and theoretical responses to each other’s subject positions. Break Up Variations relates to the problems particular to working in groups: the challenges of collaboration, the disagreements and community-led conflict resolutions, the difficulties with acting professionally, and the desires to keep working together, despite it all. We ask the following questions of each other and ourselves: What are the strategies that art, science, politics and theory might offer each other for navigating — possibly circumventing — the demise of relationships? If the working relationship breaks down, could the end of the group be considered a constitutive aspect of that group? We consider these questions to be about institutions as much as they are about interdependence on personal and planetary scales. Riffing on ideas about romantic break-ups, political dissolutions and ecological collapse, Break Up Variations considers the possibility that an end to a dream of symbiotic life is exactly what makes that dream possible and important.
2

Hoffmann, Martin A., Louis-Félix Nothias, Marcus Ludwig, Markus Fleischauer, Emily C. Gentry, Michael Witting, Pieter C. Dorrestein, Kai Dührkop, and Sebastian Böcker. "High-confidence structural annotation of metabolites absent from spectral libraries." Nature Biotechnology 40, no. 3 (October 14, 2021): 411–21. http://dx.doi.org/10.1038/s41587-021-01045-9.

Abstract:
Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but, typically, only a small fraction of spectra can be matched. Previous in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. Here we introduce the COSMIC workflow that combines in silico structure database generation and annotation with a confidence score consisting of kernel density P value estimation and a support vector machine with enforced directionality of features. On diverse datasets, COSMIC annotates a substantial number of hits at low false discovery rates and outperforms spectral library search. To demonstrate that COSMIC can annotate structures never reported before, we annotated 12 natural bile acids. The annotation of nine structures was confirmed by manual evaluation and two structures using synthetic standards. In human samples, we annotated and manually validated 315 molecular structures currently absent from the Human Metabolome Database. Application of COSMIC to data from 17,400 metabolomics experiments led to 1,715 high-confidence structural annotations that were absent from spectral libraries.
3

Hentschel, Johannes, Markus Neuwirth, and Martin Rohrmeier. "The Annotated Mozart Sonatas: Score, Harmony, and Cadence." Transactions of the International Society for Music Information Retrieval 4, no. 1 (2021): 67–80. http://dx.doi.org/10.5334/tismir.63.

4

Kors, Jan A., Simon Clematide, Saber A. Akhondi, Erik M. van Mulligen, and Dietrich Rebholz-Schuhmann. "A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC." Journal of the American Medical Informatics Association 22, no. 5 (May 5, 2015): 948–56. http://dx.doi.org/10.1093/jamia/ocv037.

Abstract:
Objective: To create a multilingual gold-standard corpus for biomedical concept recognition. Materials and methods: We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. Results: The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. Discussion: The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. Conclusion: To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated.
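Inter-annotator agreement F-scores like the median 0.79 above are typically computed pairwise, treating one annotator's concept annotations as gold and another's as predictions. A minimal stdlib-Python sketch; the (start, end, concept_id) annotation format and the example UMLS-style ids are assumptions, not the Mantra GSC format:

```python
def pairwise_f_score(spans_a, spans_b):
    """F-score between two annotators' annotation sets, with exact match on
    (start, end, concept_id) triples; symmetric, so either may act as gold."""
    a, b = set(spans_a), set(spans_b)
    if not a or not b:
        return 0.0
    tp = len(a & b)                 # annotations both annotators produced
    precision = tp / len(b)
    recall = tp / len(a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Two hypothetical annotators of the same document (UMLS-style concept ids):
ann1 = {(0, 7, "C0027497"), (12, 20, "C0011849"), (25, 31, "C0020538")}
ann2 = {(0, 7, "C0027497"), (12, 20, "C0011849"), (40, 48, "C0004096")}
```

With two of three annotations shared, the pairwise F-score is 2/3; a perfect-agreement pair scores 1.0.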
5

ElNahass, Yasser H., Hossam K. Mahmoud, Mervat M. Mattar, Omar A. Fahmy, Mohamed A. Samra, Raafat M. Abdelfattah, Fatma A. ElRefaey, et al. "MPN10 score and survival of molecularly annotated myeloproliferative neoplasm patients." Leukemia & Lymphoma 59, no. 4 (August 22, 2017): 844–54. http://dx.doi.org/10.1080/10428194.2017.1365852.

6

Mohapatra, Nilamadhaba, Namrata Sarraf, and Swapna sarit Sahu. "Domain based Chunking." International Journal on Natural Language Computing 10, no. 04 (August 30, 2021): 1–14. http://dx.doi.org/10.5121/ijnlc.2021.10401.

Abstract:
Chunking means splitting sentences into tokens and then grouping them in a meaningful way. When it comes to high-performance chunking systems, transformer models have proved to be the state-of-the-art benchmarks. Performing chunking as a task requires a large-scale, high-quality annotated corpus where each token is attached to a particular tag, similar to named entity recognition tasks. These tags are later used in conjunction with pointer frameworks to find the final chunk. Solving this for a specific domain problem becomes a highly costly affair in terms of time and resources when manually annotating and producing a large, high-quality training set. When the domain is specific and diverse, cold starting becomes even more difficult because of the large number of manually annotated queries expected to cover all aspects. To overcome the problem, we applied a grammar-based text generation mechanism where, instead of annotating a sentence, we annotate using grammar templates. We defined various templates corresponding to different grammar rules. To create a sentence, we used these templates along with the rules, where symbol or terminal values were chosen from the domain data catalog. This helped us create a large number of annotated queries. These annotated queries were used to train the machine learning model, an ensemble transformer-based deep neural network model [24]. We found that grammar-based annotation was useful for solving domain-based chunks in input query sentences without any manual annotation, achieving a classification F1 score of 96.97% in classifying the tokens for out-of-template queries.
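The template-expansion idea can be illustrated with a small sketch. The templates, tags, and catalog below are hypothetical stand-ins for the paper's grammar rules and domain data catalog, not its actual resources:

```python
import random

# Hypothetical grammar templates: slots in braces name a catalog field and
# map to the chunk tag their tokens receive (B-/I- prefixes added per token).
TEMPLATES = [
    ("show me {product} in {color}", {"product": "PROD", "color": "ATTR"}),
    ("price of {product}", {"product": "PROD"}),
]
CATALOG = {
    "product": ["running shoes", "leather wallet"],
    "color": ["dark blue", "red"],
}

def generate(template, tags, catalog):
    """Expand one template into (token, tag) pairs; non-slot tokens get 'O'."""
    pairs = []
    for word in template.split():
        if word.startswith("{") and word.endswith("}"):
            slot = word[1:-1]
            for i, tok in enumerate(random.choice(catalog[slot]).split()):
                pairs.append((tok, ("B-" if i == 0 else "I-") + tags[slot]))
        else:
            pairs.append((word, "O"))
    return pairs
```

Sampling each template repeatedly against the catalog yields a large token-tagged training set without any per-sentence manual annotation.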
7

Jacobs, Gilles, and Véronique Hoste. "SENTiVENT: enabling supervised information extraction of company-specific events in economic and financial news." Language Resources and Evaluation 56, no. 1 (October 8, 2021): 225–57. http://dx.doi.org/10.1007/s10579-021-09562-4.

Abstract:
We present SENTiVENT, a corpus of fine-grained company-specific events in English economic news articles. The domain of event processing is highly productive and various general domain, fine-grained event extraction corpora are freely available, but economically focused resources are lacking. This work fills a large need for a manually annotated dataset for economic and financial text mining applications. A representative corpus of business news is crawled and an annotation scheme developed with an iteratively refined economic event typology. The annotations are compatible with benchmark datasets (ACE/ERE) so state-of-the-art event extraction systems can be readily applied. This results in a gold-standard dataset annotated with event triggers, participant arguments, event co-reference, and event attributes such as type, subtype, negation, and modality. An adjudicated reference test set is created for use in annotator and system evaluation. Agreement scores are substantial and annotator performance adequate, indicating that the annotation scheme produces consistent event annotations of high quality. In an event detection pilot study, satisfactory results were obtained with a macro-averaged F1-score of 59%, validating the dataset for machine learning purposes. This dataset thus provides a rich resource on events as training data for supervised machine learning for economic and financial applications. The dataset and related source code are made available at https://osf.io/8jec2/.
8

Dari, Retno Wulan, Suvi Akhiriyah, Eva Rahmawati, Him'mawan Adi Nugroho, and Twin Dyah Martiana. "Students’ ability in writing annotated bibliography: Teaching critical writing." EnJourMe (English Journal of Merdeka) : Culture, Language, and Teaching of English 7, no. 2 (January 1, 2023): 264–74. http://dx.doi.org/10.26905/enjourme.v7i2.9034.

Abstract:
Annotated bibliographies are used in various situations. An annotated bibliography is a short annotated list of sources that summarizes, evaluates, and states source relevance. The ability to write an annotated bibliography shows one’s ability to find the right information for writing a research article. This paper aims to assess the ability of second-year students majoring in English to produce annotated bibliographies. This qualitative research uses data obtained from 42 students who were assigned to write an annotated bibliography. Data were described using score means and percentages from Turnitin, then scaled on a 4-point Likert scale. Results showed that of five variables, quality of sources, annotation content, and overall quality were good, while the accuracy and annotation structure variables were adequate. For the Turnitin check, 69.05% showed good or very good quality, while only 9.52% of them showed poor quality. Finally, most of the second-year subjects can be categorized as having good writing abilities for annotated bibliographies. DOI: 10.26905/enjourme.v7i2.8966
9

Sarker, Abeed, Maksim Belousov, Jasper Friedrichs, Kai Hakala, Svetlana Kiritchenko, Farrokh Mehryary, Sifei Han, et al. "Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task." Journal of the American Medical Informatics Association 25, no. 10 (October 1, 2018): 1274–83. http://dx.doi.org/10.1093/jamia/ocy114.

Abstract:
Objective: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data. Materials and Methods: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15,717 annotated tweets for (1), 10,260 for (2), and 6650 ADR phrases and identifiers for (3), and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks. Results: Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems. Discussion: Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1). Conclusions: Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1).
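The subtask-2 metric, a micro-averaged F1 over two classes, pools counts across classes before computing precision and recall. A small sketch with illustrative counts (not the shared task's actual numbers):

```python
def micro_f1(per_class_counts):
    """Micro-averaged F1: sum true positives, false positives and false
    negatives over all classes, then compute a single precision/recall pair."""
    tp = sum(c["tp"] for c in per_class_counts)
    fp = sum(c["fp"] for c in per_class_counts)
    fn = sum(c["fn"] for c in per_class_counts)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative per-class counts for a two-class task:
counts = [{"tp": 8, "fp": 2, "fn": 4}, {"tp": 10, "fp": 4, "fn": 2}]
```

Unlike macro-averaging, the pooled counts weight each instance equally, so the majority class dominates the score.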
10

Mumtaz, Raabia, and Muhammad Abdul Qadir. "CustNER." International Journal on Semantic Web and Information Systems 16, no. 3 (July 2020): 110–27. http://dx.doi.org/10.4018/ijswis.2020070107.

Abstract:
This article describes CustNER: a system for named-entity recognition (NER) of persons, locations, and organizations. Observing the incorrect annotations of existing NERs, four categories of false negatives have been identified; the unannotated NEs include nationalities, entities with a corresponding resource in DBpedia, and acronyms of other NEs. A rule-based system, CustNER, has been proposed that utilizes existing NERs and the DBpedia knowledge base. CustNER has been trained on the Open Knowledge Extraction (OKE) challenge 2017 dataset and evaluated on the OKE and CoNLL03 (Conference on Natural Language Learning) datasets. The OKE dataset has also been annotated with the three types. Evaluation results show that CustNER outperforms existing NERs with an F score 12.4% better than Stanford NER and 3.1% better than Illinois NER. On another standard evaluation dataset for which the system was not trained, the CoNLL03 dataset, CustNER gives results comparable to existing systems, with an F score 3.9% better than Stanford NER, though the Illinois NER F score is 1.3% better than CustNER's.
11

Xiong, Deyi, Min Zhang, Aiti Aw, and Haizhou Li. "Linguistically Annotated Reordering: Evaluation and Analysis." Computational Linguistics 36, no. 3 (September 2010): 535–68. http://dx.doi.org/10.1162/coli_a_00009.

Abstract:
Linguistic knowledge plays an important role in phrase movement in statistical machine translation. To efficiently incorporate linguistic knowledge into phrase reordering, we propose a new approach: Linguistically Annotated Reordering (LAR). In LAR, we build hard hierarchical skeletons and inject soft linguistic knowledge from source parse trees to nodes of hard skeletons during translation. The experimental results on large-scale training data show that LAR is comparable to boundary word-based reordering (BWR) (Xiong, Liu, and Lin 2006), which is a very competitive lexicalized reordering approach. When combined with BWR, LAR provides complementary information for phrase reordering, which collectively improves the BLEU score significantly. To further understand the contribution of linguistic knowledge in LAR to phrase reordering, we introduce a syntax-based analysis method to automatically detect constituent movement in both reference and system translations, and summarize syntactic reordering patterns that are captured by reordering models. With the proposed analysis method, we conduct a comparative analysis that not only provides the insight into how linguistic knowledge affects phrase movement but also reveals new challenges in phrase reordering.
12

Soneson, Charlotte, Michael I. Love, Rob Patro, Shobbir Hussain, Dheeraj Malhotra, and Mark D. Robinson. "A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs." Life Science Alliance 2, no. 1 (January 17, 2019): e201800175. http://dx.doi.org/10.26508/lsa.201800175.

Abstract:
Most methods for statistical analysis of RNA-seq data take a matrix of abundance estimates for some type of genomic features as their input, and consequently the quality of any obtained results is directly dependent on the quality of these abundances. Here, we present the junction coverage compatibility score, which provides a way to evaluate the reliability of transcript-level abundance estimates and the accuracy of transcript annotation catalogs. It works by comparing the observed number of reads spanning each annotated splice junction in a genomic region to the predicted number of junction-spanning reads, inferred from the estimated transcript abundances and the genomic coordinates of the corresponding annotated transcripts. We show that although most genes show good agreement between the observed and predicted junction coverages, there is a small set of genes that do not. Genes with poor agreement are found regardless of the method used to estimate transcript abundances, and the corresponding transcript abundances should be treated with care in any downstream analyses.
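The score's core comparison, observed versus predicted junction-spanning reads, can be sketched as follows. The disagreement formula here is a hypothetical simplification for illustration, not the published statistic:

```python
def predicted_junction_reads(abundances, junctions_by_transcript):
    """Predicted junction-spanning reads: each annotated junction receives
    the summed estimated reads of all transcripts that contain it."""
    pred = {}
    for tx, reads in abundances.items():
        for jct in junctions_by_transcript[tx]:
            pred[jct] = pred.get(jct, 0.0) + reads
    return pred

def junction_disagreement(observed, predicted):
    """A JCC-style disagreement score (hypothetical simplification): total
    |observed - predicted| junction coverage, scaled by total observed
    junction reads. 0 means perfect agreement."""
    junctions = set(observed) | set(predicted)
    total = sum(observed.values()) or 1.0
    return sum(abs(observed.get(j, 0) - predicted.get(j, 0.0))
               for j in junctions) / total
```

Genes whose observed junction coverage departs strongly from the abundance-implied prediction get a high disagreement score and would be flagged for caution downstream.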
13

Hafner, Elisabeth D., Patrick Barton, Rodrigo Caye Daudt, Jan Dirk Wegner, Konrad Schindler, and Yves Bühler. "Automated avalanche mapping from SPOT 6/7 satellite imagery with deep learning: results, evaluation, potential and limitations." Cryosphere 16, no. 9 (September 2, 2022): 3517–30. http://dx.doi.org/10.5194/tc-16-3517-2022.

Abstract:
Spatially dense and continuous information on avalanche occurrences is crucial for numerous safety-related applications such as avalanche warning, hazard zoning, hazard mitigation measures, forestry, risk management and numerical simulations. This information is today still collected in a non-systematic way by observers in the field. Current research has explored the application of remote sensing technology to fill this information gap by providing spatially continuous information on avalanche occurrences over large regions. Previous investigations have confirmed the high potential of avalanche mapping from remotely sensed imagery to complement existing databases. Currently, the bottleneck for fast data provision from optical data is the time-consuming manual mapping. In our study we deploy a slightly adapted DeepLabV3+, a state-of-the-art deep learning model, to automatically identify and map avalanches in SPOT 6/7 imagery from 24 January 2018 and 16 January 2019. We relied on 24,778 manually annotated avalanche polygons split into geographically disjoint regions for training, validating and testing. Additionally, we investigate generalization ability by testing our best model configuration on SPOT 6/7 data from 6 January 2018 and comparing it to avalanches we manually annotated for that purpose. To assess the quality of the model results, we investigate the probability of detection (POD), the positive predictive value (PPV) and the F1 score. Additionally, we assessed the reproducibility of manually annotated avalanches in a small subset of our data. We achieved an average POD of 0.610, PPV of 0.668 and an F1 score of 0.625 in our test areas and found an F1 score in the same range for avalanche outlines annotated by different experts. Our model and approach are an important step towards a fast and comprehensive documentation of avalanche periods from optical satellite imagery in the future, complementing existing avalanche databases. This will have a large impact on safety-related applications, making mountain regions safer.
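The three evaluation metrics above follow directly from detection counts: true positives (avalanches both mapped and annotated), false positives (mapped but not annotated) and false negatives (annotated but missed). A small sketch with illustrative counts:

```python
def detection_metrics(tp, fp, fn):
    """Probability of detection (recall), positive predictive value
    (precision) and F1 score from detection counts."""
    pod = tp / (tp + fn)            # fraction of real avalanches found
    ppv = tp / (tp + fp)            # fraction of detections that are real
    f1 = 2 * pod * ppv / (pod + ppv)
    return pod, ppv, f1
```

For example, 61 detected out of 100 annotated avalanches with 30 spurious detections gives POD 0.61 and PPV 61/91.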
14

Ali, Muhaddisa Barat, Xiaohan Bai, Irene Yu-Hua Gu, Mitchel S. Berger, and Asgeir Store Jakola. "A Feasibility Study on Deep Learning Based Brain Tumor Segmentation Using 2D Ellipse Box Areas." Sensors 22, no. 14 (July 15, 2022): 5292. http://dx.doi.org/10.3390/s22145292.

Abstract:
In most deep learning-based brain tumor segmentation methods, training the deep network requires annotated tumor areas. However, accurate tumor annotation puts high demands on medical personnel. The aim of this study is to train a deep network for segmentation by using ellipse box areas surrounding the tumors. In the proposed method, the deep network is trained by using a large number of unannotated tumor images with foreground (FG) and background (BG) ellipse box areas surrounding the tumor and background, and a small number of patients (<20) with annotated tumors. The training is conducted by initial training on two ellipse boxes on unannotated MRIs, followed by refined training on a small number of annotated MRIs. We use a multi-stream U-Net for conducting our experiments, which is an extension of the conventional U-Net. This enables the use of complementary information from multi-modality (e.g., T1, T1ce, T2, and FLAIR) MRIs. To test the feasibility of the proposed approach, experiments and evaluation were conducted on two datasets for glioma segmentation. Segmentation performance on the test sets is then compared with that of the same network trained entirely on annotated MRIs. Our experiments show that the proposed method has obtained good tumor segmentation results on the test sets, wherein the Dice score on tumor areas is (0.8407, 0.9104), and segmentation accuracy on tumor areas is (83.88%, 88.47%) for the MICCAI BraTS’17 and US datasets, respectively. Compared with the results of the network trained on all annotated tumors, the drop in segmentation performance from the proposed approach is (0.0594, 0.0159) in the Dice score and (8.78%, 2.61%) in segmented tumor accuracy for the MICCAI and US test sets, which is relatively small. Our case studies have demonstrated that training the network for segmentation by using ellipse box areas in place of all annotated tumors is feasible and can be considered an alternative, trading a small drop in segmentation performance for savings in medical experts' annotation time.
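The Dice score reported above measures the overlap between a predicted and a reference tumor mask. A minimal sketch on flattened binary masks:

```python
def dice_score(mask_a, mask_b):
    """Dice coefficient between two binary segmentation masks (flattened
    0/1 sequences): twice the overlap divided by the total foreground size."""
    intersection = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 2 * intersection / total if total else 1.0
```

A score of 1.0 means the masks coincide exactly; partial overlap falls between 0 and 1.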
15

Klein, Ari Z., Arjun Magge, and Graciela Gonzalez-Hernandez. "ReportAGE: Automatically extracting the exact age of Twitter users based on self-reports in tweets." PLOS ONE 17, no. 1 (January 25, 2022): e0262087. http://dx.doi.org/10.1371/journal.pone.0262087.

Abstract:
Advancing the utility of social media data for research applications requires methods for automatically detecting demographic information about social media study populations, including users’ age. The objective of this study was to develop and evaluate a method that automatically identifies the exact age of users based on self-reports in their tweets. Our end-to-end automatic natural language processing (NLP) pipeline, ReportAGE, includes query patterns to retrieve tweets that potentially mention an age, a classifier to distinguish retrieved tweets that self-report the user’s exact age (“age” tweets) and those that do not (“no age” tweets), and rule-based extraction to identify the age. To develop and evaluate ReportAGE, we manually annotated 11,000 tweets that matched the query patterns. Based on 1000 tweets that were annotated by all five annotators, inter-annotator agreement (Fleiss’ kappa) was 0.80 for distinguishing “age” and “no age” tweets, and 0.95 for identifying the exact age among the “age” tweets on which the annotators agreed. A deep neural network classifier, based on a RoBERTa-Large pretrained transformer model, achieved the highest F1-score of 0.914 (precision = 0.905, recall = 0.942) for the “age” class. When the age extraction was evaluated using the classifier’s predictions, it achieved an F1-score of 0.855 (precision = 0.805, recall = 0.914) for the “age” class. When it was evaluated directly on the held-out test set, it achieved an F1-score of 0.931 (precision = 0.873, recall = 0.998) for the “age” class. We deployed ReportAGE on a collection of more than 1.2 billion tweets, posted by 245,927 users, and predicted ages for 132,637 (54%) of them. Scaling the detection of exact age to this large number of users can advance the utility of social media data for research applications that do not align with the predefined age groupings of extant binary or multi-class classification approaches.
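The pipeline's query-pattern retrieval and rule-based extraction steps can be sketched with a single regular expression. The pattern below is a hypothetical simplification for illustration; ReportAGE's actual query patterns and rules are far more extensive:

```python
import re

# Hypothetical self-report pattern: "I'm 25", "I am 25 years old", "I turned 25".
AGE_PATTERN = re.compile(
    r"\bI(?:'m| am| turned)\s+(\d{1,3})(?:\s+years?\s+old)?\b",
    re.IGNORECASE,
)

def extract_age(tweet):
    """Return the self-reported exact age as an int, or None if no match."""
    match = AGE_PATTERN.search(tweet)
    return int(match.group(1)) if match else None
```

In the full system, a classifier first filters out matched tweets that are not genuine self-reports before the extraction rule is applied.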
16

Silvestri, Stefano, Francesco Gargiulo, and Mario Ciampi. "Iterative Annotation of Biomedical NER Corpora with Deep Neural Networks and Knowledge Bases." Applied Sciences 12, no. 12 (June 7, 2022): 5775. http://dx.doi.org/10.3390/app12125775.

Abstract:
The large availability of clinical natural-language documents, such as clinical narratives or diagnoses, requires smart automatic systems for their processing and analysis, but the lack of annotated corpora in the biomedical domain, especially in languages other than English, makes it difficult to exploit state-of-the-art machine-learning systems to extract information from such documents. For these reasons, healthcare professionals miss big opportunities that can arise from the analysis of this data. In this paper, we propose a methodology to reduce the manual effort needed to annotate a biomedical named entity recognition (B-NER) corpus, exploiting both active learning and distant supervision, respectively based on deep learning models (e.g., Bi-LSTM, word2vec FastText, ELMo and BERT) and biomedical knowledge bases, in order to speed up the annotation task and limit class imbalance issues. We assessed this approach by creating an Italian-language electronic health record corpus annotated with biomedical domain entities in a small fraction of the time required for a fully manual annotation. The obtained corpus was used to train a B-NER deep neural network whose performance is comparable with the state of the art, with F1-scores of 0.9661 and 0.8875 on two test sets.
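The distant-supervision component can be illustrated by dictionary-based pre-annotation against a knowledge base. The knowledge-base entries below are hypothetical, and the paper's pipeline is considerably richer than this greedy matcher:

```python
def preannotate(tokens, kb):
    """Distant-supervision pre-annotation: greedily tag the longest token
    span whose lowercased text matches a knowledge-base entry; everything
    else stays 'O'. A sketch of the idea, not the paper's full pipeline."""
    tags = ["O"] * len(tokens)
    max_len = max((len(term.split()) for term in kb), default=1)
    i = 0
    while i < len(tokens):
        for width in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + width]).lower()
            if span in kb:
                tags[i] = "B-" + kb[span]
                for j in range(i + 1, i + width):
                    tags[j] = "I-" + kb[span]
                i += width
                break
        else:
            i += 1
    return tags

# Hypothetical knowledge base mapping lowercased terms to entity types:
KB = {"heart failure": "DISO", "aspirin": "CHEM"}
```

Such automatically produced tags serve as noisy starting annotations that human annotators only need to correct, which is where the time savings come from.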
17

Wisnalmawati, Wisnalmawati, Agus Sasmito Aribowo, and Yunie Herawati. "Semi-supervised Learning Models for Sentiment Analysis on Marketplace Dataset." International Journal of Artificial Intelligence & Robotics (IJAIR) 4, no. 2 (December 3, 2022): 78–85. http://dx.doi.org/10.25139/ijair.v4i2.5267.

Abstract:
Sentiment analysis aims to categorize opinions using an annotated corpus to train the model. However, building a high-quality, fully annotated corpus takes a lot of effort, time, and expense. The semi-supervised learning (SSL) technique efficiently adds training data automatically from unlabeled data, so the labeling process, which requires human expertise and time, can be helped by an SSL approach. This study aims to develop an SSL model for sentiment analysis and to compare the learning capabilities of Naive Bayes (NB) and Random Forest (RF) in the SSL setting. Our model attempts to annotate opinion documents in Indonesian. We use an ensemble multi-classifier that works on unigram, bigram, and trigram vectors. Our model test uses a marketplace dataset containing rating comments scraped from Shopee for smartphone products in the Indonesian language. The research started with data preparation, vectorization using TF-IDF, feature extraction, modeling using Random Forest (RF) and Naïve Bayes (NB), and evaluation using accuracy and F1-score. The performance of the NB model outperformed previous research, increasing by 5.5%. The conclusion is that SSL performance highly depends on the number of training data and the compatibility of the features or patterns in the document with machine learning. On our marketplace dataset, Random Forest is the better choice.
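The TF-IDF vectorization step can be sketched in a few lines of stdlib Python. This unigram-only version is an illustration, not the paper's exact configuration (which also used bigrams and trigrams):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Minimal TF-IDF: raw term frequency times log-inverse document
    frequency (with +1 so terms in every document are not zeroed out)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    return [{term: freq * idf[term] for term, freq in Counter(tokens).items()}
            for tokens in tokenized]

docs = ["good phone good price", "bad battery"]
vectors = tfidf_vectors(docs)
```

Terms frequent in one review but rare across the corpus get the highest weights, which is what makes the representation useful as classifier input.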
18

Prabhakar, Akshara, Gouri Sankar Majumder, and Ashish Anand. "CL-NERIL: A Cross-Lingual Model for NER in Indian Languages (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 11 (June 28, 2022): 13031–32. http://dx.doi.org/10.1609/aaai.v36i11.21652.

Abstract:
Developing Named Entity Recognition (NER) systems for Indian languages has been a long-standing challenge, mainly owing to the requirement of a large amount of annotated clean training instances. This paper proposes an end-to-end framework for NER for Indian languages in a low-resource setting by exploiting parallel corpora of English and Indian languages and an English NER dataset. The proposed framework includes an annotation projection method that combines word alignment score and NER tag prediction confidence score on source language (English) data to generate weakly labeled data in a target Indian language. We employ a variant of the Teacher-Student model and optimize it jointly on the pseudo labels of the Teacher model and predictions on the generated weakly labeled data. We also present manually annotated test sets for three Indian languages: Hindi, Bengali, and Gujarati. We evaluate the performance of the proposed framework on the test sets of the three Indian languages. Empirical results show a minimum 10% performance improvement compared to the zero-shot transfer learning model on all languages. This indicates that weakly labeled data generated using the proposed annotation projection method in target Indian languages can complement well-annotated source language data to enhance performance. Our code is publicly available at https://github.com/aksh555/CL-NERIL.
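The annotation projection idea, combining a word-alignment score with source-side tag confidence, can be sketched as follows. The thresholding rule and data shapes are assumptions for illustration, not the paper's exact method:

```python
def project_labels(source_tags, alignments, tag_confidence, threshold=0.5):
    """Project source-language NER tags onto aligned target tokens when both
    the word-alignment score and the source tag confidence clear a threshold.
    alignments: {(source_index, target_index): alignment_score}."""
    target_tags = {}
    for (src, tgt), align_score in alignments.items():
        if align_score >= threshold and tag_confidence.get(src, 0.0) >= threshold:
            target_tags[tgt] = source_tags[src]
    return target_tags
```

Tokens whose alignment or tag confidence is too low are simply left unlabeled, which keeps the weakly labeled target data relatively clean.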
19

Du, Jingcheng, Yang Xiang, Madhuri Sankaranarayanapillai, Meng Zhang, Jingqi Wang, Yuqi Si, Huy Anh Pham, Hua Xu, Yong Chen, and Cui Tao. "Extracting postmarketing adverse events from safety reports in the vaccine adverse event reporting system (VAERS) using deep learning." Journal of the American Medical Informatics Association 28, no. 7 (February 27, 2021): 1393–400. http://dx.doi.org/10.1093/jamia/ocab014.

Abstract:
Objective: Automated analysis of vaccine postmarketing surveillance narrative reports is important to understand the progression of rare but severe vaccine adverse events (AEs). This study implemented and evaluated state-of-the-art deep learning algorithms for named entity recognition to extract nervous system disorder-related events from vaccine safety reports. Materials and Methods: We collected Guillain-Barré syndrome (GBS) related influenza vaccine safety reports from the Vaccine Adverse Event Reporting System (VAERS) from 1990 to 2016. VAERS reports were selected and manually annotated with major entities related to nervous system disorders, including investigation, nervous_AE, other_AE, procedure, social_circumstance, and temporal_expression. A variety of conventional machine learning and deep learning algorithms were then evaluated for the extraction of the above entities. We further pretrained domain-specific BERT (Bidirectional Encoder Representations from Transformers) using VAERS reports (VAERS BERT) and compared its performance with existing models. Results and Conclusions: Ninety-one VAERS reports were annotated, resulting in 2512 entities. The corpus was made publicly available to promote community efforts on vaccine AE identification. Deep learning-based methods (eg, bi-long short-term memory and BERT models) outperformed conventional machine learning-based methods (ie, conditional random fields with extensive features). The BioBERT large model achieved the highest exact match F-1 scores on nervous_AE, procedure, social_circumstance, and temporal_expression, while VAERS BERT large models achieved the highest exact match F-1 scores on investigation and other_AE. An ensemble of these 2 models achieved the highest exact match microaveraged F-1 score at 0.6802 and the second highest lenient match microaveraged F-1 score at 0.8078 among peer models.
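The distinction between exact match and lenient match F-1 can be sketched at the span level. This is a common reading of the two metrics (exact offsets versus any overlap), not necessarily the paper's precise matching rules:

```python
def span_f1(gold, predicted, lenient=False):
    """Span-level F1. An exact match requires identical (start, end)
    offsets; a lenient match counts any character overlap between a gold
    and a predicted span."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]
    if lenient:
        tp_recall = sum(any(overlaps(g, p) for p in predicted) for g in gold)
        tp_precision = sum(any(overlaps(p, g) for g in gold) for p in predicted)
    else:
        tp_recall = tp_precision = len(set(gold) & set(predicted))
    precision = tp_precision / len(predicted) if predicted else 0.0
    recall = tp_recall / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A prediction that is off by one character still scores under the lenient criterion but not the exact one, which is why the lenient figure above is markedly higher.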
APA, Harvard, Vancouver, ISO, and other styles
20

González, Aitor, Marie Artufel, and Pascal Rihet. "TAGOOS: genome-wide supervised learning of non-coding loci associated to complex phenotypes." Nucleic Acids Research 47, no. 14 (May 2, 2019): e79-e79. http://dx.doi.org/10.1093/nar/gkz320.

Full text
Abstract:
Genome-wide association studies (GWAS) associate single nucleotide polymorphisms (SNPs) to complex phenotypes. Most human SNPs fall in non-coding regions and are likely regulatory SNPs, but linkage disequilibrium (LD) blocks make it difficult to distinguish functional SNPs. Therefore, putative functional SNPs are usually annotated with molecular markers of gene regulatory regions and prioritized with dedicated prediction tools. We integrated associated SNPs, LD blocks and regulatory features into a supervised model called TAGOOS (TAG SNP bOOSting) and computed scores genome-wide. The TAGOOS scores enriched and prioritized unseen associated SNPs with an odds ratio of 4.3 and 3.5 and an area under the curve (AUC) of 0.65 and 0.6 for intronic and intergenic regions, respectively. The TAGOOS score was correlated with the maximal significance of associated SNPs and expression quantitative trait loci (eQTLs) and with the number of biological samples annotated for key regulatory features. Analysis of loci and regions associated to cleft lip and human adult height phenotypes recovered known functional loci and predicted new functional loci enriched in transcriptions factors related to the phenotypes. In conclusion, we trained a supervised model based on associated SNPs to prioritize putative functional regions. The TAGOOS scores, annotations and UCSC genome tracks are available here: https://tagoos.readthedocs.io.
APA, Harvard, Vancouver, ISO, and other styles
21

Jonnalagadda, Siddhartha, Trevor Cohen, Stephen Wu, Hongfang Liu, and Graciela Gonzalez. "Using Empirically Constructed Lexical Resources for Named Entity Recognition." Biomedical Informatics Insights 6s1 (January 2013): BII.S11664. http://dx.doi.org/10.4137/bii.s11664.

Full text
Abstract:
Because of privacy concerns and the expense involved in creating an annotated corpus, the existing small annotated corpora might not have sufficient examples for learning to statistically extract all the named-entities precisely. In this work, we evaluate what value may lie in automatically generated features based on distributional semantics when using machine-learning named entity recognition (NER). The features we generated and experimented with include n-nearest words, support vector machine (SVM)-regions, and term clustering, all of which are considered distributional semantic features. The addition of the n-nearest words feature resulted in a greater increase in F-score over a baseline system than using a manually constructed lexicon. Although the need for relatively small annotated corpora for retraining is not obviated, lexicons empirically derived from unannotated text can not only supplement manually created lexicons, but also replace them. This phenomenon is observed in extracting concepts from both biomedical literature and clinical notes.
APA, Harvard, Vancouver, ISO, and other styles
22

Zhang, Xuhong, Toby C. Cornish, Lin Yang, Tellen D. Bennett, Debashis Ghosh, and Fuyong Xing. "Generative Adversarial Domain Adaptation for Nucleus Quantification in Images of Tissue Immunohistochemically Stained for Ki-67." JCO Clinical Cancer Informatics, no. 4 (September 2020): 666–79. http://dx.doi.org/10.1200/cci.19.00108.

Full text
Abstract:
PURPOSE We focus on the problem of scarcity of annotated training data for nucleus recognition in Ki-67 immunohistochemistry (IHC)–stained pancreatic neuroendocrine tumor (NET) images. We hypothesize that deep learning–based domain adaptation is helpful for nucleus recognition when image annotations are unavailable in target data sets. METHODS We considered 2 different institutional pancreatic NET data sets: one (ie, source) containing 38 cases with 114 annotated images and the other (ie, target) containing 72 cases with 20 annotated images. The gold standards were manually annotated by 1 pathologist. We developed a novel deep learning–based domain adaptation framework to count different types of nuclei (ie, immunopositive tumor, immunonegative tumor, nontumor nuclei). We compared the proposed method with several recent fully supervised deep learning models, such as fully convolutional network-8s (FCN-8s), U-Net, fully convolutional regression network (FCRN) A, FCRNB, and fully residual convolutional network (FRCN). We also evaluated the proposed method by learning with a mixture of converted source images and real target annotations. RESULTS Our method achieved an F1 score of 81.3% and 62.3% for nucleus detection and classification in the target data set, respectively. Our method outperformed FCN-8s (53.6% and 43.6% for nucleus detection and classification, respectively), U-Net (61.1% and 47.6%), FCRNA (63.4% and 55.8%), and FCRNB (68.2% and 60.6%) in terms of F1 score and was competitive with FRCN (81.7% and 70.7%). In addition, learning with a mixture of converted source images and only a small set of real target labels could further boost the performance. CONCLUSION This study demonstrates that deep learning–based domain adaptation is helpful for nucleus recognition in Ki-67 IHC stained images when target data annotations are not available. It would improve the applicability of deep learning models designed for downstream supervised learning tasks on different data sets.
APA, Harvard, Vancouver, ISO, and other styles
23

Merdivan, Erinc, Deepika Singh, Sten Hanke, Johannes Kropf, Andreas Holzinger, and Matthieu Geist. "Human Annotated Dialogues Dataset for Natural Conversational Agents." Applied Sciences 10, no. 3 (January 21, 2020): 762. http://dx.doi.org/10.3390/app10030762.

Full text
Abstract:
Conversational agents are gaining huge popularity in industrial applications such as digital assistants, chatbots, and particularly systems for natural language understanding (NLU). However, a major drawback is the unavailability of a common metric to evaluate the replies against human judgement for conversational agents. In this paper, we develop a benchmark dataset with human annotations and diverse replies that can be used to develop such a metric for conversational agents. The paper introduces a high-quality human annotated movie dialogue dataset, HUMOD, that is developed from the Cornell movie dialogues dataset. This new dataset comprises 28,500 human responses from 9500 multi-turn dialogue history-reply pairs. Human responses include: (i) ratings of the dialogue reply in relevance to the dialogue history; and (ii) unique dialogue replies for each dialogue history from the users. Such unique dialogue replies enable researchers to evaluate their models against six unique human responses for each given history. A detailed analysis of how dialogues are structured and of human perception of dialogue scores in comparison with existing models is also presented.
APA, Harvard, Vancouver, ISO, and other styles
24

Yang, Jie Chi, and Peichin Chang. "Captions and reduced forms instruction: The impact on EFL students’ listening comprehension." ReCALL 26, no. 1 (November 21, 2013): 44–61. http://dx.doi.org/10.1017/s0958344013000219.

Full text
Abstract:
For many EFL learners, listening poses a grave challenge. The difficulty in segmenting a stream of speech and limited capacity in short-term memory are common weaknesses for language learners. Specifically, reduced forms, which frequently appear in authentic informal conversations, compound the challenges in listening comprehension. Numerous interventions have been implemented to assist EFL language learners, and of these, the application of captions has been found highly effective in promoting learning. Few studies have examined how different modes of captions may enhance listening comprehension. This study proposes three modes of captions: full, keyword-only, and annotated keyword captions and investigates their contribution to the learning of reduced forms and overall listening comprehension. Forty-four EFL university students participated in the study and were randomly assigned to one of the three groups. The results revealed that all three groups exhibited improvement on the pre-test while the annotated keyword caption group exhibited the best performance with the highest mean score. Comparing performances between groups, the annotated keyword caption group also emulated both the full caption and the keyword-only caption groups, particularly in the ability to recognize reduced forms. The study sheds light on the potential of annotated keyword captions in enhancing reduced forms learning and overall listening comprehension.
APA, Harvard, Vancouver, ISO, and other styles
25

Saremi, Nedda F., Jonas Oppenheimer, Christopher Vollmers, Brendan O’Connell, Shard A. Milne, Ashley Byrne, Li Yu, Oliver A. Ryder, Richard E. Green, and Beth Shapiro. "An Annotated Draft Genome for the Andean Bear, Tremarctos ornatus." Journal of Heredity 112, no. 4 (April 21, 2021): 377–84. http://dx.doi.org/10.1093/jhered/esab021.

Full text
Abstract:
The Andean bear is the only extant member of the Tremarctine subfamily and the only extant ursid species to inhabit South America. Here, we present an annotated de novo assembly of a nuclear genome from a captive-born female Andean bear, Mischief, generated using a combination of short and long DNA and RNA reads. Our final assembly has a length of 2.23 Gb, and a scaffold N50 of 21.12 Mb, contig N50 of 23.5 kb, and BUSCO score of 88%. The Andean bear genome will be a useful resource for exploring the complex phylogenetic history of extinct and extant bear species and for future population genetics studies of Andean bears.
APA, Harvard, Vancouver, ISO, and other styles
26

Etienne, Aaron, Aanis Ahmad, Varun Aggarwal, and Dharmendra Saraswat. "Deep Learning-Based Object Detection System for Identifying Weeds Using UAS Imagery." Remote Sensing 13, no. 24 (December 20, 2021): 5182. http://dx.doi.org/10.3390/rs13245182.

Full text
Abstract:
Current methods of broadcast herbicide application cause a negative environmental and economic impact. Computer vision methods, specifically those related to object detection, have been reported to aid in site-specific weed management procedures for targeted herbicide application within a field. However, a major challenge to developing a weed detection system is the requirement for a properly annotated database to differentiate between weeds and crops under field conditions. This research involved creating an annotated database of 374 red, green, and blue (RGB) color images organized into monocot and dicot weed classes. The images were acquired from corn and soybean research plots located in north-central Indiana using an unmanned aerial system (UAS) flown at 30 and 10 m heights above ground level (AGL). A total of 25,560 individual weed instances were manually annotated. The annotated database consisted of four different subsets (Training Image Sets 1–4) to train the You Only Look Once version 3 (YOLOv3) deep learning model for five separate experiments. The best results were observed with Training Image Set 4, consisting of images acquired at 10 m AGL. For monocot and dicot weeds, respectively, an average precision (AP) score of 91.48% and 86.13% was observed at a 25% IoU threshold (AP @ T = 0.25), as well as 63.37% and 45.13% at a 50% IoU threshold (AP @ T = 0.5). This research has demonstrated a need to develop large, annotated weed databases to evaluate deep learning models for weed identification under field conditions. It also affirms the findings of other limited research studies utilizing object detection for weed identification under field conditions.
APA, Harvard, Vancouver, ISO, and other styles
27

Wu, Beilei, Lijun Tao, Daqing Yang, Wei Li, Hongbo Xu, and Qianggui He. "Development of an Immune Infiltration-Related Eight-Gene Prognostic Signature in Colorectal Cancer Microenvironment." BioMed Research International 2020 (August 27, 2020): 1–43. http://dx.doi.org/10.1155/2020/2719739.

Full text
Abstract:
Objective. Stromal cells and immune cells have important clinical significance in the microenvironment of colorectal cancer (CRC). This study is aimed at developing a CRC gene signature on the basis of stromal and immune scores. Methods. A cohort of CRC patients (n=433) were adopted from The Cancer Genome Atlas (TCGA) database. Stromal/immune scores were calculated by the ESTIMATE algorithm. Correlation between prognosis/clinical characteristics and stromal/immune scores was assessed. Differentially expressed stromal and immune genes were identified. Their potential functions were annotated by functional enrichment analysis. Cox regression analysis was used to develop an eight-gene risk score model. Its predictive efficacies for 3 years, 5 years, overall survival (OS), and progression-free survival interval (PFI) were evaluated using time-dependent receiver operating characteristic (ROC) curves. The correlation between the risk score and the infiltrating levels of six immune cells was analyzed using TIMER. The risk score was validated using an independent dataset. Results. Immune score was in a significant association with prognosis and clinical characteristics of CRC. 736 upregulated and two downregulated stromal and immune genes were identified, which were mainly enriched into immune-related biological processes and pathways. An eight-gene prognostic risk score model was constructed, consisting of CCL22, CD36, CPA3, CPT1C, KCNE4, NFATC1, RASGRP2, and SLC2A3. High risk score indicated a poor prognosis of patients. The areas under the ROC curves (AUCs) of the model for 3 years, 5 years, OS, and PFI were 0.71, 0.70, 0.73, and 0.66, respectively. Thus, the model performed well for prediction of patients’ prognosis, which was confirmed by an external dataset. Moreover, the risk score was significantly correlated with immune cell infiltration. Conclusion. Our study constructed an immune-related prognostic risk score model, which could provide novel targets for immunotherapy of CRC.
APA, Harvard, Vancouver, ISO, and other styles
28

Stoica, George, Emmanouil Antonios Platanios, and Barnabas Poczos. "Re-TACRED: Addressing Shortcomings of the TACRED Dataset." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 15 (May 18, 2021): 13843–50. http://dx.doi.org/10.1609/aaai.v35i15.17631.

Full text
Abstract:
TACRED is one of the largest and most widely used sentence-level relation extraction datasets. Proposed models that are evaluated using this dataset consistently set new state-of-the-art performance. However, they still exhibit large error rates despite leveraging external knowledge and unsupervised pretraining on large text corpora. A recent study suggested that this may be due to poor dataset quality. The study observed that over 50% of the most challenging sentences from the development and test sets are incorrectly labeled and account for an average drop of 8% f1-score in model performance. However, this study was limited to a small biased sample of 5k (out of a total of 106k) sentences, substantially restricting the generalizability and broader implications of its findings. In this paper, we address these shortcomings by: (i) performing a comprehensive study over the whole TACRED dataset, (ii) proposing an improved crowdsourcing strategy and deploying it to re-annotate the whole dataset, and (iii) performing a thorough analysis to understand how correcting the TACRED annotations affects previously published results. After verification, we observed that 23.9% of TACRED labels are incorrect. Moreover, evaluating several models on our revised dataset yields an average f1-score improvement of 14.3% and helps uncover significant relationships between the different models (rather than simply offsetting or scaling their scores by a constant factor). Finally, aside from our analysis we also release Re-TACRED, a new completely re-annotated version of the TACRED dataset that can be used to perform reliable evaluation of relation extraction models.
APA, Harvard, Vancouver, ISO, and other styles
29

Weinstein, Ben G., Sarah J. Graves, Sergio Marconi, Aditya Singh, Alina Zare, Dylan Stewart, Stephanie A. Bohlman, and Ethan P. White. "A benchmark dataset for canopy crown detection and delineation in co-registered airborne RGB, LiDAR and hyperspectral imagery from the National Ecological Observation Network." PLOS Computational Biology 17, no. 7 (July 2, 2021): e1009180. http://dx.doi.org/10.1371/journal.pcbi.1009180.

Full text
Abstract:
Broad scale remote sensing promises to build forest inventories at unprecedented scales. A crucial step in this process is to associate sensor data into individual crowns. While dozens of crown detection algorithms have been proposed, their performance is typically not compared based on standard data or evaluation metrics. There is a need for a benchmark dataset to minimize differences in reported results as well as support evaluation of algorithms across a broad range of forest types. Combining RGB, LiDAR and hyperspectral sensor data from the USA National Ecological Observatory Network’s Airborne Observation Platform with multiple types of evaluation data, we created a benchmark dataset to assess crown detection and delineation methods for canopy trees covering dominant forest types in the United States. This benchmark dataset includes an R package to standardize evaluation metrics and simplify comparisons between methods. The benchmark dataset contains over 6,000 image-annotated crowns, 400 field-annotated crowns, and 3,000 canopy stem points from a wide range of forest types. In addition, we include over 10,000 training crowns for optional use. We discuss the different evaluation data sources and assess the accuracy of the image-annotated crowns by comparing annotations among multiple annotators as well as overlapping field-annotated crowns. We provide an example submission and score for an open-source algorithm that can serve as a baseline for future methods.
APA, Harvard, Vancouver, ISO, and other styles
30

Ales, Zacharie, Alexandre Pauchet, and Arnaud Knippel. "Extraction and Clustering of Two-Dimensional Dialogue Patterns." International Journal on Artificial Intelligence Tools 27, no. 02 (March 2018): 1850001. http://dx.doi.org/10.1142/s021821301850001x.

Full text
Abstract:
This article proposes a two-step methodology to ease the identification of dialogue patterns in a corpus of annotated dialogues. The annotations of a given dialogue are represented within a two-dimensional array whose lines correspond to the utterances of the dialogue ordered chronologically. The first step of our methodology consists in extracting recurrent patterns. To that end, we adapt a dynamic programming algorithm used to align two-dimensional arrays by reducing its complexity and improving its trace-back procedure. During the second step, the obtained patterns are clustered using various heuristics from the literature. As an evaluation process, our method is applied to a corpus of annotated dialogues between a parent and her child in a storytelling context. The obtained partitions of dialogue patterns are evaluated by an expert in child development of language to assess how the methodology helps the expert in explaining the child behaviors. The influence of the method parameters (clustering heuristics, minimum extraction score, number of clusters and substitution score array) are studied. Dialogue patterns that manual extractions have failed to detect are highlighted by the method and the most efficient values of the parameters are therefore determined.
APA, Harvard, Vancouver, ISO, and other styles
31

Simons, Colinda C. J. M., Nadine S. M. Offermans, Monika Stoll, Piet A. van den Brandt, and Matty P. Weijenberg. "Empirical Investigation of Genomic Clusters Associated With Height and the Risk of Postmenopausal Breast and Colorectal Cancer in the Netherlands Cohort Study." American Journal of Epidemiology 191, no. 3 (November 2, 2021): 413–29. http://dx.doi.org/10.1093/aje/kwab259.

Full text
Abstract:
We empirically investigated genomic clusters associated with both height and postmenopausal breast cancer (BC) or colorectal cancer (CRC) (or both) in the Netherlands Cohort Study to unravel shared underlying mechanisms between height and these cancers. The Netherlands Cohort Study (1986–2006) includes 120,852 participants (case-cohort study: n_subcohort = 5,000; 20.3 years of follow-up). Variants in clusters on chromosomes 2, 4, 5, 6 (2 clusters), 10, and 20 were genotyped using toenail DNA. Cluster-specific genetic risk scores were modeled in relation to height and postmenopausal BC and CRC risk using age-adjusted linear regression and multivariable-adjusted Cox regression, respectively. Only the chromosome 10 cluster risk score was associated with all 3 phenotypes in the same sex (women); that is, it was associated with increased height (β_continuous = 0.34, P = 0.014), increased risk of hormone-receptor–positive BC (for estrogen-receptor–positive BC, hazard ratio (HR_continuous score) = 1.10 (95% confidence interval (CI): 1.02, 1.20); for progesterone-receptor–positive BC, HR_continuous score = 1.15 (95% CI: 1.04, 1.26)), and increased risk of distal colon (HR_continuous score = 1.13, 95% CI: 1.01, 1.27) and rectal (HR_continuous score = 1.14, 95% CI: 0.99, 1.30) cancer. The chromosome 10 cluster variants were all annotated to the zinc finger MIZ-type containing 1 gene (ZMIZ1), which is involved in androgen receptor activity. This suggests that hormone-related growth mechanisms could influence both height and postmenopausal BC and CRC.
APA, Harvard, Vancouver, ISO, and other styles
32

Dwane, Lisa, Fiona M. Behan, Emanuel Gonçalves, Howard Lightfoot, Wanjuan Yang, Dieudonne van der Meer, Rebecca Shepherd, Miguel Pignatelli, Francesco Iorio, and Mathew J. Garnett. "Project Score database: a resource for investigating cancer cell dependencies and prioritizing therapeutic targets." Nucleic Acids Research 49, no. D1 (October 17, 2020): D1365–D1372. http://dx.doi.org/10.1093/nar/gkaa882.

Full text
Abstract:
CRISPR genetic screens in cancer cell models are a powerful tool to elucidate oncogenic mechanisms and to identify promising therapeutic targets. The Project Score database (https://score.depmap.sanger.ac.uk/) uses genome-wide CRISPR–Cas9 dropout screening data in hundreds of highly annotated cancer cell models to identify genes required for cell fitness and prioritize novel oncology targets. The Project Score database currently allows users to investigate the fitness effect of 18 009 genes tested across 323 cancer cell models. Through interactive interfaces, users can investigate data by selecting a specific gene, cancer cell model or tissue type, as well as browsing all gene fitness scores. Additionally, users can identify and rank candidate drug targets based on an established oncology target prioritization pipeline, incorporating genetic biomarkers and clinical datasets for each target, and including suitability for drug development based on pharmaceutical tractability. Data are freely available and downloadable. To enhance analyses, links to other key resources including Open Targets, COSMIC, the Cell Model Passports, UniProt and the Genomics of Drug Sensitivity in Cancer are provided. The Project Score database is a valuable new tool for investigating genetic dependencies in cancer cells and the identification of candidate oncology targets.
APA, Harvard, Vancouver, ISO, and other styles
33

Dumitrache, Anca, Lora Aroyo, and Chris Welty. "Capturing Ambiguity in Crowdsourcing Frame Disambiguation." Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 6 (June 15, 2018): 12–20. http://dx.doi.org/10.1609/hcomp.v6i1.13330.

Full text
Abstract:
FrameNet is a computational linguistics resource composed of semantic frames, high-level concepts that represent the meanings of words. In this paper, we present an approach to gather frame disambiguation annotations in sentences using a crowdsourcing approach with multiple workers per sentence to capture inter-annotator disagreement. We perform an experiment over a set of 433 sentences annotated with frames from the FrameNet corpus, and show that the aggregated crowd annotations achieve an F1 score greater than 0.67 as compared to expert linguists. We highlight cases where the crowd annotation was correct even though the expert is in disagreement, arguing for the need to have multiple annotators per sentence. Most importantly, we examine cases in which crowd workers could not agree, and demonstrate that these cases exhibit ambiguity, either in the sentence, frame, or the task itself, and argue that collapsing such cases to a single, discrete truth value (i.e. correct or incorrect) is inappropriate, creating arbitrary targets for machine learning.
APA, Harvard, Vancouver, ISO, and other styles
34

Wu, Tong, Nikolas Martelaro, Simon Stent, Jorge Ortiz, and Wendy Ju. "Learning When Agents Can Talk to Drivers Using the INAGT Dataset and Multisensor Fusion." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, no. 3 (September 9, 2021): 1–28. http://dx.doi.org/10.1145/3478125.

Full text
Abstract:
This paper examines sensor fusion techniques for modeling opportunities for proactive speech-based in-car interfaces. We leverage the Is Now a Good Time (INAGT) dataset, which consists of automotive, physiological, and visual data collected from drivers who self-annotated responses to the question "Is now a good time?," indicating the opportunity to receive non-driving information during a 50-minute drive. We augment this original driver-annotated data with third-party annotations of perceived safety, in order to explore potential driver overconfidence. We show that fusing automotive, physiological, and visual data allows us to predict driver labels of availability, achieving a 0.874 F1-score by extracting statistically relevant features and training with our proposed deep neural network, PazNet. Using the same data and network, we achieve a 0.891 F1-score for predicting third-party labeled safe moments. We train these models to avoid false positives (determinations that it is a good time to interrupt when it is not), since false positives may cause driver distraction or service deactivation by the driver. Our analyses show that conservative models still leave many moments for interaction and show that most inopportune moments are short. This work lays a foundation for using sensor fusion models to predict when proactive speech systems should engage with drivers.
APA, Harvard, Vancouver, ISO, and other styles
35

Pilon, S., M. J. Puttkammer, and G. B. Van Huyssteen. "Die ontwikkeling van ’n woordafbreker en kompositumanaliseerder vir Afrikaans." Literator 29, no. 1 (July 25, 2008): 21–42. http://dx.doi.org/10.4102/lit.v29i1.99.

Full text
Abstract:
The development of a hyphenator and compound analyser for Afrikaans The development of two core-technologies for Afrikaans, viz. a hyphenator and a compound analyser is described in this article. As no annotated Afrikaans data existed prior to this project to serve as training data for a machine learning classifier, the core-technologies in question are first developed using a rule-based approach. The rule-based hyphenator and compound analyser are evaluated and the hyphenator obtains an f-score of 90.84%, while the compound analyser only reaches an f-score of 78.20%. Since these results are somewhat disappointing and/or insufficient for practical implementation, it was decided that a machine learning technique (memory-based learning) will be used instead. Training data for each of the two core-technologies is then developed using “TurboAnnotate”, an interface designed to improve the accuracy and speed of manual annotation. The hyphenator developed using machine learning has been trained with 39,943 words and reaches an f-score of 98.11% while the f-score of the compound analyser is 90.57% after being trained with 77,589 annotated words. It is concluded that machine learning (specifically memory-based learning) seems an appropriate approach for developing core-technologies for Afrikaans.
APA, Harvard, Vancouver, ISO, and other styles
36

Gretz, Shai, Roni Friedman, Edo Cohen-Karlik, Assaf Toledo, Dan Lahav, Ranit Aharonov, and Noam Slonim. "A Large-Scale Dataset for Argument Quality Ranking: Construction and Analysis." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 7805–13. http://dx.doi.org/10.1609/aaai.v34i05.6285.

Full text
Abstract:
Identifying the quality of free-text arguments has become an important task in the rapidly expanding field of computational argumentation. In this work, we explore the challenging task of argument quality ranking. To this end, we created a corpus of 30,497 arguments carefully annotated for point-wise quality, released as part of this work. To the best of our knowledge, this is the largest dataset annotated for point-wise argument quality, larger by a factor of five than previously released datasets. Moreover, we address the core issue of inducing a labeled score from crowd annotations by performing a comprehensive evaluation of different approaches to this problem. In addition, we analyze the quality dimensions that characterize this dataset. Finally, we present a neural method for argument quality ranking, which outperforms several baselines on our own dataset, as well as previous methods published for another dataset.
APA, Harvard, Vancouver, ISO, and other styles
37

Mani, Annapoorni, Shahriman Abu Bakar, Pranesh Krishnan, and Sazali Yaacob. "Categorization of material quality using a model-free reinforcement learning algorithm." Journal of Physics: Conference Series 2107, no. 1 (November 1, 2021): 012027. http://dx.doi.org/10.1088/1742-6596/2107/1/012027.

Full text
Abstract:
Reinforcement learning is the most preferred algorithm for optimization problems in industrial automation. Model-free reinforcement learning algorithms optimize for rewards without the knowledge of the environmental dynamics and require less computation. Regulating the quality of the raw materials in the inbound inventory can improve the manufacturing process. In this paper, the raw materials arriving at the incoming inspection process are categorized and labeled based on their quality through the path traveled. A model-free temporal difference learning approach is used to predict the acceptance and rejection path of raw materials in the incoming inspection process. The algorithm presented eight route paths that the raw materials could travel. Four pathways correspond to material acceptance, while the rest lead to material rejection. The materials are annotated using the total scores acquired in the incoming inspection process. The materials traveling on the ideal path (path A) get the highest total score. The rest of the accepted materials in the acceptance path have a 7.37% lower score in path B, whereas path C and path D get 37.28% and 42.44% lower than the ideal path.
APA, Harvard, Vancouver, ISO, and other styles
38

da Silva, Daniel Queirós, Filipe Neves dos Santos, Vítor Filipe, Armando Jorge Sousa, and Paulo Moura Oliveira. "Edge AI-Based Tree Trunk Detection for Forestry Monitoring Robotics." Robotics 11, no. 6 (November 27, 2022): 136. http://dx.doi.org/10.3390/robotics11060136.

Full text
Abstract:
Object identification, such as tree trunk detection, is fundamental for forest robotics. Intelligent vision systems are of paramount importance in order to improve robotic perception, thus enhancing the autonomy of forest robots. To that purpose, this paper presents three contributions: an open dataset of 5325 annotated forest images; a tree trunk detection Edge AI benchmark between 13 deep learning models evaluated on four edge-devices (CPU, TPU, GPU and VPU); and a tree trunk mapping experiment using an OAK-D as a sensing device. The results showed that YOLOR was the most reliable trunk detector, achieving a maximum F1 score around 90% while maintaining high scores for different confidence levels; in terms of inference time, YOLOv4 Tiny was the fastest model, attaining 1.93 ms on the GPU. YOLOv7 Tiny presented the best trade-off between detection accuracy and speed, with average inference times under 4 ms on the GPU considering different input resolutions and at the same time achieving an F1 score similar to YOLOR. This work will enable the development of advanced artificial vision systems for robotics in forestry monitoring operations.
39

Serna, Ainhoa, Aitor Soroa, and Rodrigo Agerri. "Applying Deep Learning Techniques for Sentiment Analysis to Assess Sustainable Transport." Sustainability 13, no. 4 (February 23, 2021): 2397. http://dx.doi.org/10.3390/su13042397.

Full text
Abstract:
Users voluntarily generate large amounts of textual content by expressing their opinions, in social media and specialized portals, on every possible issue, including transport and sustainability. In this work we have leveraged such User Generated Content to obtain a high-accuracy sentiment analysis model which automatically analyses the negative and positive opinions expressed in the transport domain. In order to develop such a model, we have semiautomatically generated an annotated corpus of opinions about transport, which has then been used to fine-tune a large pretrained language model based on recent deep learning techniques. Our empirical results demonstrate the robustness of our approach, which can be applied to automatically process massive amounts of opinions about transport. We believe that our method can help to complement data from official statistics and traditional surveys about transport sustainability. Finally, apart from the model and annotated dataset, we also provide a transport classification score with respect to the sustainability of the transport types found in the use case dataset.
40

Lachmann, Alexander, Zhuorui Xie, and Avi Ma’ayan. "blitzGSEA: efficient computation of gene set enrichment analysis through gamma distribution approximation." Bioinformatics 38, no. 8 (February 10, 2022): 2356–57. http://dx.doi.org/10.1093/bioinformatics/btac076.

Full text
Abstract:
Abstract Motivation The identification of pathways and biological processes from differential gene expression is central for interpretation of data collected by transcriptomics assays. Gene set enrichment analysis (GSEA) is the most commonly used algorithm to calculate the significance of the relevancy of an annotated gene set with a differential expression signature. To compute significance, GSEA implements permutation tests which are slow and inaccurate for comparing many differential expression signatures to thousands of annotated gene sets. Results Here, we present blitzGSEA, an algorithm that is based on the same running sum statistic as GSEA, but instead of performing permutations, blitzGSEA approximates the enrichment score probabilities based on Gamma distributions. blitzGSEA achieves significant improvement in performance compared with prior GSEA implementations, while approximating small P-values more accurately. Availability and implementation The data, a python package, together with all source code, and a detailed user guide are available from GitHub at: https://github.com/MaayanLab/blitzgsea. Supplementary information Supplementary data are available at Bioinformatics online.
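The core idea, replacing a permutation null with a fitted gamma distribution, can be sketched as follows. This is not blitzGSEA's implementation: the method-of-moments fit and the sampled tail probability below are simplifications for illustration (blitzGSEA derives its fit and tail probabilities analytically):

```python
import random
from statistics import mean, pvariance

def gamma_tail_pvalue(null_scores, observed, n_draws=50000, seed=0):
    """Fit a gamma distribution to null enrichment scores by the method
    of moments, then estimate P(score >= observed) under the fitted
    gamma. Sampling from the fitted gamma keeps the sketch
    dependency-free; an analytic tail would be used in practice."""
    m, v = mean(null_scores), pvariance(null_scores)
    shape = m * m / v   # k = mean^2 / variance
    scale = v / m       # theta = variance / mean
    rng = random.Random(seed)
    hits = sum(rng.gammavariate(shape, scale) >= observed
               for _ in range(n_draws))
    return hits / n_draws
```

The gain over raw permutations is that a parametric null extrapolates into the extreme tail, so very small p-values can be estimated without an impractical number of permutations.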
41

Yang, Hui, Alistair Willis, Anne De Roeck, and Bashar Nuseibeh. "A Hybrid Model for Automatic Emotion Recognition in Suicide Notes." Biomedical Informatics Insights 5s1 (January 2012): BII.S8948. http://dx.doi.org/10.4137/bii.s8948.

Full text
Abstract:
We describe the Open University team's submission to the 2011 i2b2/VA/Cincinnati Medical Natural Language Processing Challenge, Track 2 Shared Task for sentiment analysis in suicide notes. This Shared Task focused on the development of automatic systems that identify, at the sentence level, affective text of 15 specific emotions from suicide notes. We propose a hybrid model that incorporates a number of natural language processing techniques, including lexicon-based keyword spotting, CRF-based emotion cue identification, and machine learning-based emotion classification. The results generated by different techniques are integrated using different vote-based merging strategies. The automated system performed well against the manually-annotated gold standard, and achieved encouraging results with a micro-averaged F-measure score of 61.39% in textual emotion recognition, which was ranked 1st place out of 24 participant teams in this challenge. The results demonstrate that effective emotion recognition by an automated system is possible when a large annotated corpus is available.
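A minimal sketch of one vote-based merging strategy, simple majority with a first-subsystem tie-break; the paper evaluates several merging strategies, and the emotion labels in the example are illustrative:

```python
from collections import Counter

def majority_vote(labels):
    """Merge one sentence's emotion labels from several subsystems
    (e.g. keyword spotting, CRF cues, ML classifier) by simple
    majority; ties fall back to the earliest subsystem's label."""
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:  # first label among the tied wins
        if counts[label] == top:
            return label
```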
42

Wang, Hongbin, Jingzhen Ye, Zhengtao Yu, Jian Wang, and Cunli Mao. "Unsupervised Keyword Extraction Methods Based on a Word Graph Network." International Journal of Ambient Computing and Intelligence 11, no. 2 (April 2020): 68–79. http://dx.doi.org/10.4018/ijaci.2020040104.

Full text
Abstract:
Supervised keyword extraction methods usually require a large human-annotated corpus to train the model. Expensive manual labeling has made unsupervised techniques based on word graph networks attractive. Traditional word graph networks simply consider the co-occurrence relationships of words or the topological structure of the network, ignoring the influence of semantic relations between words on keyword extraction. To solve these problems, an unsupervised keyword extraction method based on word graph networks for both Chinese and English is proposed. This method uses word embeddings, applying a "word attraction score" to capture the semantic relevance between words in a document. A combination of the bias weight of each node and a weighted PageRank algorithm is used to compute the final scores of words. The experimental results demonstrate that the method is more effective than the traditional methods.
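A dependency-free sketch of the general scheme follows: embedding-based edge weights fed into a weighted PageRank. The paper's specific "word attraction score" formula and node bias weights are replaced here by plain cosine similarity, so this illustrates the graph machinery rather than the authors' exact method:

```python
import math

def cosine(u, v):
    """Embedding similarity used as an 'attraction' edge weight."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def weighted_pagerank(weights, d=0.85, iters=50):
    """weights: {(u, v): w} directed edge weights between words; for an
    undirected co-occurrence graph, include both directions."""
    nodes = sorted({n for edge in weights for n in edge})
    score = {n: 1.0 / len(nodes) for n in nodes}
    out = {n: sum(w for (u, _), w in weights.items() if u == n) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            rank = sum(score[u] * w / out[u]
                       for (u, v), w in weights.items() if v == n and out[u])
            new[n] = (1 - d) / len(nodes) + d * rank
        score = new
    return score
```

Words whose edges carry high semantic weight accumulate the highest scores and are extracted as keywords.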
43

Ribeiro, Guilherme Aramizo, Elika Ridelman, Justin D. Klein, Beth A. Angst, Christina M. Shanti, and Mo Rastgaar. "50 Assessment of Skin Graft in Pediatric Burn Patients Using Machine Learning Is Comparable to Human Expert Performance." Journal of Burn Care & Research 41, Supplement_1 (March 2020): S33—S34. http://dx.doi.org/10.1093/jbcr/iraa024.054.

Full text
Abstract:
Abstract Introduction Though widely used, current scar assessment scales are inaccurate and highly subjective, further complicating the already difficult task of determining the optimal management of burn patients. Additional disadvantages of these tools include the need for direct examination by an experienced clinician and the inability to retrospectively review them. The lack of an accurate assessment tool inevitably impairs any research examining novel therapeutic strategies designed to improve burn scar outcomes by introducing observer bias at every step. Common examples of these tools include the Vancouver Scar Scale and Visual analog scale. New imaging and processing technologies have the potential of bringing accuracy, reproducibility, and accessibility to burn scar assessments. With these goals in mind, our team developed a novel scoring system and a classification model based on Machine Learning algorithms and analyzed 87 pictures to obtain scores on Inflammation (I), Scar (S), Uniformity (U), and Pigmentation (P). Methods All algorithms were trained using both the sub-acute and the long-term phase pictures. The classification model is based on supervised learning, which requires many examples of annotated pictures and corresponding scar scores. The model used a Linear Discriminant Analysis (LDA) algorithm and visual features of the scars and the natural skin. To train and evaluate this model, four burn care providers individually annotated 186 pictures of skin grafts and later formed a committee to annotate by consensus a subset of representative pictures. While the individual predictions were used as an accuracy baseline, the consensus annotation was the true score and used to train the model. Results The model predictions were more accurate in scores mainly based on color (I and P), rather than texture (S and U), as shown by the micro-averaged Area Under the Curve (AUC) of 0.86, 0.61, 0.51, and 0.80 for I, S, U, and P, respectively (Figure 1). 
The model accuracy was higher than the human baseline for the I (F1 of 0.60 vs. 0.59±0.13, respectively) and P scores (0.54 vs. 0.51±0.09), but lower in the S (0.30 vs. 0.63±0.22) and U scores (0.62 vs. 0.86±0.19). Conclusions Our findings are encouraging and suggest that further improvement of the accuracy of the algorithm could be achieved in the second phase of our assessment development project by increasing the number of pictures it learns from and adding more visual features related to skin texture. Applicability of Research to Practice Our study provides an accurate and reproducible evaluation of burn scars that leads to newer therapeutic strategies employed by specialized burn care facilities.
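The Linear Discriminant Analysis classifier at the heart of the model can be sketched with NumPy. The two-class Fisher discriminant below is generic; the study's actual engineered visual features and training pictures are not reproduced, and the toy feature values in any example are invented:

```python
import numpy as np

def fit_lda(X0, X1):
    """Two-class linear discriminant with a pooled covariance.
    Returns a weight vector and a midpoint decision threshold."""
    X0, X1 = np.asarray(X0, float), np.asarray(X1, float)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance, with a small ridge for stability.
    pooled = (np.cov(X0, rowvar=False) * (len(X0) - 1)
              + np.cov(X1, rowvar=False) * (len(X1) - 1))
    pooled /= (len(X0) + len(X1) - 2)
    w = np.linalg.solve(pooled + 1e-6 * np.eye(pooled.shape[0]), m1 - m0)
    threshold = w @ (m0 + m1) / 2.0
    return w, threshold

def lda_predict(w, threshold, x):
    """0 for the first class, 1 for the second."""
    return int(np.asarray(x, float) @ w > threshold)
```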
44

Vadas, David, and James R. Curran. "Parsing Noun Phrases in the Penn Treebank." Computational Linguistics 37, no. 4 (December 2011): 753–809. http://dx.doi.org/10.1162/coli_a_00076.

Full text
Abstract:
Noun phrases (NPs) are a crucial part of natural language, and can have a very complex structure. However, this NP structure is largely ignored by the statistical parsing field, as the most widely used corpus is not annotated with it. This lack of gold-standard data has restricted previous efforts to parse NPs, making it impossible to perform the supervised experiments that have achieved high performance in so many Natural Language Processing (NLP) tasks. We comprehensively solve this problem by manually annotating NP structure for the entire Wall Street Journal section of the Penn Treebank. The inter-annotator agreement scores that we attain dispel the belief that the task is too difficult, and demonstrate that consistent NP annotation is possible. Our gold-standard NP data is now available for use in all parsers. We experiment with this new data, applying the Collins (2003) parsing model, and find that its recovery of NP structure is significantly worse than its overall performance. The parser's F-score is up to 5.69% lower than a baseline that uses deterministic rules. Through much experimentation, we determine that this result is primarily caused by a lack of lexical information. To solve this problem we construct a wide-coverage, large-scale NP Bracketing system. With our Penn Treebank data set, which is orders of magnitude larger than those used previously, we build a supervised model that achieves excellent results. Our model performs at 93.8% F-score on the simple task that most previous work has undertaken, and extends to bracket longer, more complex NPs that are rarely dealt with in the literature. We attain 89.14% F-score on this much more difficult task. Finally, we implement a post-processing module that brackets NPs identified by the Bikel (2004) parser. Our NP Bracketing model includes a wide variety of features that provide the lexical information that was missing during the parser experiments, and as a result, we outperform the parser's F-score by 9.04%. These experiments demonstrate the utility of the corpus, and show that many NLP applications can now make use of NP structure.
45

Lee, Hee Jin, Soo Yoon Cho, Eun Yoon Cho, Yoojoo Lim, Soo Ick Cho, Wonkyung Jung, Sanghoon Song, et al. "Artificial intelligence (AI)–powered spatial analysis of tumor-infiltrating lymphocytes (TIL) for prediction of response to neoadjuvant chemotherapy (NAC) in triple-negative breast cancer (TNBC)." Journal of Clinical Oncology 40, no. 16_suppl (June 1, 2022): 595. http://dx.doi.org/10.1200/jco.2022.40.16_suppl.595.

Full text
Abstract:
595 Background: Stromal TIL are a well-recognized prognostic and predictive biomarker in breast cancer. There is a need for tools assisting visual assessment of TIL, to improve reproducibility as well as for convenience. This study aims to assess the clinical significance of AI-powered spatial TIL analysis in the prediction of pathologic complete response (pCR) after NAC in TNBC patients. Methods: H&E stained slides and clinical outcomes data were obtained from stage I – III TNBC patients treated with NAC in two centers in Korea. For spatial TIL analysis, we used Lunit SCOPE IO, an AI-powered H&E Whole-Slide Image (WSI) analyzer, which identifies and quantifies TIL within the cancer or stroma area. Lunit SCOPE IO was developed with a 13.5 × 10⁹ μm² area and 6.2 × 10⁶ TIL from 17,849 H&E WSI of multiple cancer types, annotated by 104 board-certified pathologists. iTIL score and sTIL score were defined as the area occupied by TIL in the intratumoral area (%) and the surrounding stroma (%), respectively. Immune phenotype (IP) of each slide was defined from the spatial TIL calculation, as inflamed (high TIL density in tumor area), immune-excluded (high TIL density in stroma), or desert (low TIL density overall). Results: A total of 954 TNBC patients treated from 2006 to 2019 were included in this analysis. pCR (ypT0N0) was confirmed in 261 (27.4%) patients. The neoadjuvant regimens used were mostly anthracycline (97.8%) and taxane (75.1%) -based, with 116 (12.1%) patients receiving additional platinum and 41 (4.3%) patients treated as part of immune checkpoint inhibitor or PARP inhibitor clinical trials. The median iTIL score and sTIL score were 4.3% (IQR 3.2 – 5.8) and 8.1% (IQR 6.3 – 13.4), respectively. The mean iTIL score was significantly higher in patients who achieved pCR after NAC (5.8% vs. 4.5%, p < 0.001), and a similar difference was observed with the sTIL score (12.1% vs. 9.4%, p < 0.001).
iTIL score was found to remain as an independent predictor of pCR along with cT stage and Ki-67 in the multivariable analysis (adjusted odds ratio 1.211 (95% CI 1.125 – 1.304) per 1 point (%) change in the score, p <0.001). By IP groups, 291 (30.5%) patients were classified as inflamed, 502 (52.6%) as excluded, and 161 (16.9%) as desert phenotype. The patients with inflamed phenotype were more likely to achieve pCR (44.7%) than other phenotypes (19.8%, p < 0.001). Conclusions: AI-powered spatial TIL analysis could assess TIL densities in the cancer area and surrounding stroma of TNBC, and TIL density scores and IP classification could predict pCR after NAC.
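The immune-phenotype assignment described above amounts to a simple rule over the two TIL density scores. The cut-off values in this sketch are illustrative assumptions, not the thresholds used by Lunit SCOPE IO:

```python
def immune_phenotype(itil, stil, itil_cut=5.0, stil_cut=10.0):
    """Map intratumoral (iTIL) and stromal (sTIL) density scores (%)
    to an immune phenotype. Cut-offs are placeholder values."""
    if itil >= itil_cut:
        return "inflamed"        # high TIL density in the tumor area
    if stil >= stil_cut:
        return "immune-excluded" # TIL confined to the stroma
    return "desert"              # low TIL density overall
```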
46

Kolmogorova, Anastasia V. "EMOTION DETECTION AND SEMANTICS OF EMOTIVES: DISTRESS AND ANGER IN ANNOTATED TEXT DATASET." Philological Class 26, no. 2 (2021): 78–89. http://dx.doi.org/10.51762/1fk-2021-26-02-06.

Full text
Abstract:
The article explores ways of making the semantic description of emotional lexemes consistent with the interpretative intuition of the ordinary language speaker. The research novelty is determined by the fact that it is based on data retrieved from the emotional assessment of 3920 internet-texts in Russian, made by informants via a specially designed computer interface. Using this interface, we can aggregate the weight of 8 emotions (distress, enjoyment, anger, surprise, shame, excitement, disgust, fear) in a text. Thus, the data we have used for this publication includes two sets of 150 internet-texts, assessed by 2000 informants, with the highest scores for the emotions of distress or anger. The scope of the study covers the semantics of the two lexemes mentioned above (grust' and gnev) analyzed through the prism of the collective introspection of informants. The article's purpose is to discuss the case when a semantic description of emotives is given by an expert who largely uses "the best texts" of corresponding emotions, according to the collective opinion of informants. Our methods include psycholinguistic experiment, corpus analysis and semantic analysis. The research led us to three main conclusions. Firstly, the semantic descriptions of the emotives grust' and gnev obtained in the proposed way represent prototypical scenarios of living an emotion in a social context and take into account not only the introspective sensations of an expert linguist, but also the interpretative strategies of language users. Secondly, such a semantic explanation provides us with keys for explaining why machine learning technologies are better at detecting anger than sadness in text. Finally, it creates a precedent in using new technologies for making an ecological semantic description of emotive vocabulary. The research results can find application in emotiology, lexicographic practice and didactics.
47

Wieting, John, Mohit Bansal, Kevin Gimpel, and Karen Livescu. "From Paraphrase Database to Compositional Paraphrase Model and Back." Transactions of the Association for Computational Linguistics 3 (December 2015): 345–58. http://dx.doi.org/10.1162/tacl_a_00143.

Full text
Abstract:
The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive semantic resource, consisting of a list of phrase pairs with (heuristic) confidence estimates. However, it is still unclear how it can best be used, due to the heuristic nature of the confidences and its necessarily incomplete coverage. We propose models to leverage the phrase pairs from the PPDB to build parametric paraphrase models that score paraphrase pairs more accurately than the PPDB’s internal scores while simultaneously improving its coverage. They allow for learning phrase embeddings as well as improved word embeddings. Moreover, we introduce two new, manually annotated datasets to evaluate short-phrase paraphrasing models. Using our paraphrase model trained using PPDB, we achieve state-of-the-art results on standard word and bigram similarity tasks and beat strong baselines on our new short phrase paraphrase tasks.
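One of the simplest compositional models in this line of work scores a phrase pair by averaging word embeddings and taking cosine similarity. The sketch below illustrates that baseline idea, not the paper's trained parametric models, and the toy embeddings in any example are invented:

```python
import math

def phrase_vector(phrase, embeddings):
    """Additive composition: average the word vectors of a phrase.
    Words missing from the embedding table are skipped."""
    vecs = [embeddings[w] for w in phrase.split() if w in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def paraphrase_score(p1, p2, embeddings):
    """Cosine similarity of composed phrase vectors in [-1, 1]."""
    u, v = phrase_vector(p1, embeddings), phrase_vector(p2, embeddings)
    if u is None or v is None:
        return 0.0
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0
```

Training phrase and word embeddings on PPDB pairs, as the paper does, amounts to learning the embedding table so that true paraphrase pairs score higher than non-paraphrases.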
48

Tan, Hongwei, Muhammad Naeem, Hussain Ali, Muhammad Shakeel, Haiou Kuang, Ze Zhang, and Cheng Sun. "Genome Sequence of the Asian Honeybee in Pakistan Sheds Light on Its Phylogenetic Relationship with Other Honeybees." Insects 12, no. 7 (July 16, 2021): 652. http://dx.doi.org/10.3390/insects12070652.

Full text
Abstract:
In Pakistan, Apis cerana, the Asian honeybee, has been used for honey production and pollination services. However, its genomic makeup and phylogenetic relationship with those in other countries are still unknown. We collected A. cerana samples from the main cerana-keeping region in Pakistan and performed whole genome sequencing. A total of 28 Gb of Illumina shotgun reads were generated, which were used to assemble the genome. The obtained genome assembly had a total length of 214 Mb, with a GC content of 32.77%. The assembly had a scaffold N50 of 2.85 Mb and a BUSCO completeness score of 99%, suggesting a remarkably complete genome sequence for A. cerana in Pakistan. A MAKER pipeline was employed to annotate the genome sequence, and a total of 11,864 protein-coding genes were identified. Of them, 6750 genes were assigned at least one GO term, and 8813 genes were annotated with at least one protein domain. Genome-scale phylogeny analysis indicated an unexpectedly close relationship between A. cerana in Pakistan and those in China, suggesting a potential human introduction of the species between the two countries. Our results will facilitate the genetic improvement and conservation of A. cerana in Pakistan.
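The scaffold N50 reported above is a standard assembly statistic; a minimal computation (the lengths in the example are made up, not the A. cerana scaffolds):

```python
def n50(scaffold_lengths):
    """N50: the length L such that scaffolds of length >= L together
    cover at least half of the total assembly size."""
    total = sum(scaffold_lengths)
    running = 0
    for length in sorted(scaffold_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```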
49

Hummel, Manuela, Klaus H. Metzeler, Christian Buske, Stefan K. Bohlander, and Ulrich Mansmann. "Association between a Prognostic Gene Signature and Functional Gene Sets." Bioinformatics and Biology Insights 2 (January 2008): BBI.S1018. http://dx.doi.org/10.4137/bbi.s1018.

Full text
Abstract:
Background The development of expression-based gene signatures for predicting prognosis or class membership is a popular and challenging task. Besides their stringent validation, signatures need a functional interpretation and must be placed in a biological context. Popular tools such as Gene Set Enrichment have drawbacks because they are restricted to annotated genes and are unable to capture the information hidden in the signature's non-annotated genes. Methodology We propose concepts to relate a signature with functional gene sets like pathways or Gene Ontology categories. The connection between single signature genes and a specific pathway is explored by hierarchical variable selection and gene association networks. The risk score derived from an individual patient's signature is related to expression patterns of pathways and Gene Ontology categories. Global tests are useful for these tasks, and they adjust for other factors. GlobalAncova is used to explore the effect on gene expression in specific functional groups from the interaction of the score and selected mutations in the patient's genome. Results We apply the proposed methods to an expression data set and a corresponding gene signature for predicting survival in Acute Myeloid Leukemia (AML). The example demonstrates strong relations between the signature and cancer-related pathways. The signature-based risk score was found to be associated with development-related biological processes. Conclusions Many authors interpret the functional aspects of a gene signature by linking signature genes to pathways or relevant functional gene groups. The method of gene set enrichment is preferred to annotating signature genes to specific Gene Ontology categories. The strategies proposed in this paper go beyond the restriction of annotation and deepen the insights into the biological mechanisms reflected in the information given by a signature.
50

Li, Yuxia, Yu Si, Zhonggui Tong, Lei He, Jinglin Zhang, Shiyu Luo, and Yushu Gong. "MQANet: Multi-Task Quadruple Attention Network of Multi-Object Semantic Segmentation from Remote Sensing Images." Remote Sensing 14, no. 24 (December 10, 2022): 6256. http://dx.doi.org/10.3390/rs14246256.

Full text
Abstract:
Multi-object semantic segmentation from remote sensing images has gained significant attention in land resource surveying, global change monitoring, and disaster detection. Compared to other application scenarios, the objects in the remote sensing field are larger and have a wider range of distribution. In addition, some similar targets, such as roads and concrete-roofed buildings, are easily misjudged. However, existing convolutional neural networks operate only in the local receptive field, and this limits their capacity to represent the potential association between different objects and surrounding features. This paper develops a Multi-task Quadruple Attention Network (MQANet) to address the above-mentioned issues and increase segmentation accuracy. The MQANet contains four attention modules: position attention module (PAM), channel attention module (CAM), label attention module (LAM), and edge attention module (EAM). The quadruple attention modules obtain global features by expanding the receptive fields of the network and introducing spatial context information in the label. Then, a multi-tasking mechanism which splits a multi-category segmentation task into several binary-classification segmentation tasks is introduced to improve the ability to identify similar objects. The proposed MQANet network was applied to the Potsdam dataset, the Vaihingen dataset and self-annotated images from Chongzhou and Wuzhen (CZ-WZ), representative cities in China. Our MQANet outperforms the baseline network by a large margin of +6.33 OA and +7.05 Mean F1-score on the Vaihingen dataset, +3.57 OA and +2.83 Mean F1-score on the Potsdam dataset, and +3.88 OA and +8.65 Mean F1-score on the self-annotated dataset (CZ-WZ dataset). In addition, the per-image execution time of the MQANet model is reduced by 66.6 ms compared to UNet. Moreover, the effectiveness of MQANet was also demonstrated by comparative experiments with other studies.
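The OA and Mean F1 metrics quoted above can both be computed from a per-class pixel confusion matrix; a minimal sketch (the confusion values in any example are illustrative, not the paper's results):

```python
def overall_accuracy(conf):
    """conf[i][j] = number of pixels of true class i predicted as class j."""
    correct = sum(conf[i][i] for i in range(len(conf)))
    total = sum(sum(row) for row in conf)
    return correct / total

def mean_f1(conf):
    """Unweighted average of per-class F1 scores."""
    n = len(conf)
    f1s = []
    for i in range(n):
        tp = conf[i][i]
        fp = sum(conf[j][i] for j in range(n)) - tp
        fn = sum(conf[i]) - tp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(f1s) / n
```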