Journal articles on the topic "Author and Document Representation Learning"


See the top 50 journal articles for research on the topic "Author and Document Representation Learning".


1

Para, Upendar, and M. S. Patel. "A New Term Representation Method for Gender and Age Prediction". International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 5s (May 17, 2023): 90–104. http://dx.doi.org/10.17762/ijritcc.v11i5s.6633.

Abstract:
Author profiling is a text classification task that detects profile attributes of authors, such as age, gender, educational background, place of origin, personality traits and native language, by processing their written texts. Applications such as forensic analysis, security and marketing use author profiling techniques to find basic details about authors. The main problem in the domain of author profiling is the preparation of a suitable dataset for predicting the characteristics of authors. PAN is an organization that conducts competitions on various types of shared tasks. In 2013, the PAN organizers introduced the author profiling task in their series of competitions and continued it in subsequent years, providing different kinds of datasets in a variety of languages. Since 2013, several researchers have proposed author profiling solutions that predict different personality features of authors using the datasets provided in the PAN competitions. Researchers have used different kinds of features, such as character-based, lexical or word-based, structural, syntactic, content-based and style-based features, to distinguish authors' writing styles in their texts. Most researchers observed that content-based features, such as the words or phrases used in the text, are the most useful for detecting the personality features of authors. In this work, the experiment was conducted with content-based features, namely the most important words or terms, for predicting age group and gender from the PAN competition datasets. Two datasets, the PAN 2014 and PAN 2016 author profiling datasets, are used in this experiment. The documents of each dataset are converted into a vector representation, which is a suitable format for training machine learning algorithms.
The term representation in a document vector plays a crucial role in improving the performance of gender and age group prediction. Term Weight Measures (TWMs) are techniques used to represent the significance of a term's value in the document vector representation. In this work, we developed a new TWM for representing the term value in the document vector representation. The efficiency of the proposed TWM is compared with that of other existing TWMs. Two Machine Learning (ML) algorithms, SVM (Support Vector Machine) and RF (Random Forest), are considered in this experiment for estimating the accuracy of the proposed approach. We found that the proposed TWM achieved the best accuracies for gender and age prediction on the two PAN datasets.
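To make the idea of a term weight measure concrete, here is a minimal sketch that turns documents into weighted vectors. It uses the classic TF-IDF weight (term frequency times log of inverse document frequency) purely as a stand-in; the abstract does not give the formula of the proposed TWM, and the function name `tfidf_vectors` and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using tf * log(N / df) as the term weight.

    TF-IDF is an illustrative baseline weight, not the measure proposed in the paper.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency inside this document
        vectors.append([tf[term] * math.log(n_docs / df[term]) for term in vocab])
    return vocab, vectors

# Hypothetical toy corpus; real experiments would use the PAN datasets.
docs = ["i love football", "i love shopping", "football scores today"]
vocab, vectors = tfidf_vectors(docs)
```

Each row of `vectors` could then be passed to a classifier such as an SVM or a random forest; swapping the weight expression inside the list comprehension is where a different TWM would plug in.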
2

Ma, Yingying, Youlong Wu, and Chengqiang Lu. "A Graph-Based Author Name Disambiguation Method and Analysis via Information Theory". Entropy 22, no. 4 (April 7, 2020): 416. http://dx.doi.org/10.3390/e22040416.

Abstract:
Name ambiguity, due to the fact that many people share an identical name, often deteriorates the performance of information integration, document retrieval and web search. In academic data analysis, author name ambiguity usually decreases the analysis performance. To solve this problem, an author name disambiguation task is designed to divide documents related to an author name reference into several parts and each part is associated with a real-life person. Existing methods usually use either attributes of documents or relationships between documents and co-authors. However, methods of feature extraction using attributes cause inflexibility of models while solutions based on relationship graph network ignore the information contained in the features. In this paper, we propose a novel name disambiguation model based on representation learning which incorporates attributes and relationships. Experiments on a public real dataset demonstrate the effectiveness of our model and experimental results demonstrate that our solution is superior to several state-of-the-art graph-based methods. We also increase the interpretability of our method through information theory and show that the analysis could be helpful for model selection and training progress.
3

Stoean, Catalin, and Daniel Lichtblau. "Author Identification Using Chaos Game Representation and Deep Learning". Mathematics 8, no. 11 (November 2, 2020): 1933. http://dx.doi.org/10.3390/math8111933.

Abstract:
An author unconsciously encodes in the written text a certain style that is often difficult to recognize. Still, there are many computational means developed for this purpose that take into account various features, from lexical and character-based attributes to syntactic or semantic ones. We propose an approach that starts from the character level and uses chaos game representation to illustrate documents like images which are subsequently classified by a deep learning algorithm. The experiments are made on three data sets and the outputs are comparable to the results from the literature. The study also verifies the suitability of the method for small data sets and whether image augmentation can improve the classification efficiency.
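As a rough illustration of how chaos game representation can turn text into an image-like grid, the sketch below moves a point halfway toward one of four square corners for each symbol and counts visits per grid cell. The encoding is an assumption made for this example: each UTF-8 byte is split into four 2-bit symbols, which may differ from the character mapping used in the paper. The function name `cgr_image` and the grid size are likewise hypothetical.

```python
def cgr_image(text, size=8):
    """Build a size x size visit-count grid from text via a chaos game walk.

    Assumption: every byte is split into four 2-bit symbols, one per corner.
    """
    corners = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    grid = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5  # start at the center of the unit square
    for byte in text.encode("utf-8"):
        for shift in (6, 4, 2, 0):
            cx, cy = corners[(byte >> shift) & 3]
            # Move halfway toward the corner selected by the current symbol.
            x, y = (x + cx) / 2, (y + cy) / 2
            col = min(int(x * size), size - 1)
            row = min(int(y * size), size - 1)
            grid[row][col] += 1
    return grid

image = cgr_image("An author unconsciously encodes a certain style.")
```

The resulting grid can be treated as a grayscale image and fed to an image classifier, which is the general pipeline the abstract describes.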
4

Pooja, Km, Samrat Mondal, and Joydeep Chandra. "Exploiting Higher Order Multi-dimensional Relationships with Self-attention for Author Name Disambiguation". ACM Transactions on Knowledge Discovery from Data 16, no. 5 (October 31, 2022): 1–23. http://dx.doi.org/10.1145/3502730.

Abstract:
Name ambiguity is a prevalent problem in scholarly publications due to the unprecedented growth of digital libraries and the number of researchers. In the absence of a unique identifier, an author is identified by their name. Because of the underlying ambiguity, the documents of an author may be mistakenly assigned, which can lead to an improper assessment of the author. Various efforts have been made in the literature to solve the name disambiguation problem with supervised and unsupervised approaches. Unsupervised approaches for author name disambiguation are preferred due to the availability of a large amount of unlabeled data. Bibliographic data contain heterogeneous features; thus, representation learning-based techniques have recently been used in the literature to embed heterogeneous features in a common space. Documents of a scholar are connected by multiple relations. Recently, research has shifted from a single homogeneous relation to multi-dimensional (heterogeneous) relations for the latent representation of documents. Connections in graphs are sparse, and higher-order links between documents give an additional clue. Therefore, we use multiple neighborhoods in different relation types in a heterogeneous graph for the representation of documents. However, neighborhoods of different orders in each relation type have different importance, which we have also validated empirically. Therefore, to properly utilize the different neighborhoods in each relation type and the importance of each relation type in the heterogeneous graph, we propose an attention-based multi-dimensional multi-hop neighborhood-based graph convolution network for embedding that uses two levels of attention, namely, (i) relation level and (ii) neighborhood level, in each relation. The proposed approach obtains a significant improvement over existing state-of-the-art methods in terms of various evaluation metrics.
5

Kavuri, Karunakar, and M. Kavitha. "A Word Embeddings based Approach for Author Profiling: Gender and Age Prediction". International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 7s (July 13, 2023): 239–50. http://dx.doi.org/10.17762/ijritcc.v11i7s.6996.

Abstract:
Author Profiling (AP) is a method of identifying the demographic profile of an author, such as age, gender, location, native language and personality traits, by processing their written texts. AP techniques are used in multiple applications such as literary research, marketing, forensics and security. Researchers have identified various differences in authors' writing styles by analysing various datasets. The differences in writing styles are represented as stylistic features. Researchers have extracted several style-based features, such as structural, content, word, character, syntactic, readability and semantic features, to recognize the profiles of authors. Traditionally, researchers extracted various feature combinations to differentiate the profiles of authors. Several existing works used Machine Learning (ML) methods to predict the characteristics of a new author. The existing works achieved good accuracies in predicting author characteristics by combining stylistic features with ML algorithms. Recently, with the advent of Deep Learning (DL) techniques, researchers have proposed approaches to author profiling that use these techniques. A few researchers found that deep learning techniques perform better for author profile prediction than style-based features. In this work, a word embeddings based approach is proposed for gender and age prediction. In this approach, the experiment was conducted with different word embedding models, namely Word2Vec, GloVe, FastText and BERT, for generating word vectors. The documents are converted into vectors using a document representation technique that relies on the word embeddings of their words. The document vectors are passed to three different ML algorithms, Extreme Gradient Boosting (XGBoost), Random Forest (RF) and Logistic Regression (LR), to generate the trained model.
This model is used to evaluate the accuracy of age and gender prediction. The XGBoost classifier with BERT word embeddings achieved better accuracies for age and gender prediction than the other word embeddings and ML algorithms. The experiment was implemented on the PAN 2014 competition Reviews dataset for age and gender prediction. The proposed approach attained better accuracies for predicting age and gender than various existing approaches proposed for AP.
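One common way to build a document vector from word embeddings is to average the vectors of the words it contains. The sketch below shows that averaging step under stated assumptions: the abstract does not say which combination rule the authors used, and the two-dimensional `toy_embeddings` table is made up for illustration (real models such as Word2Vec or BERT produce vectors with hundreds of dimensions).

```python
def document_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens; return a zero vector if none are known.

    Mean pooling is an assumed combination rule, not necessarily the paper's.
    """
    known = [embeddings[token] for token in tokens if token in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Hypothetical 2-dimensional embeddings, for illustration only.
toy_embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
doc_vec = document_vector("a good movie".split(), toy_embeddings, dim=2)
```

The resulting fixed-length vectors are what a downstream classifier (for example gradient boosting, random forest or logistic regression) would be trained on.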
6

Buffone, Brittany, Ilena Djuana, Katherine Yang, Kyle J. Wilby, Maguy S. El Hajj, and Kerry Wilbur. "Diversity in health professional education scholarship: a document analysis of international author representation in leading journals". BMJ Open 10, no. 11 (November 2020): e043970. http://dx.doi.org/10.1136/bmjopen-2020-043970.

Abstract:
Objectives: The global distribution of health professionals and associated training programmes is wide but prior study has demonstrated reported scholarship of teaching and learning arises from predominantly Western perspectives.
Design: We conducted a document analysis to examine authorship of recent publications to explore current international representation.
Data sources: The table of contents of seven high-impact English-language health professional education journals between 2008 and 2018 was extracted from Embase.
Eligibility criteria: The journals were selected according to highest aggregate ranking across specific scientific impact indices and stating health professional education in scope; only original research and review articles from these publications were included for analysis.
Data extraction and synthesis: The table of contents was extracted and eligible publications screened by independent reviewers who further characterised the geographic affiliations of the publishing research teams and study settings (if applicable).
Results: A total 12 018 titles were screened and 7793 (64.8%) articles included. Most were collaborations (7048, 90.4%) conducted by authors from single geographic regions (5851, 86%). Single-region teams were most often formed from countries in North America (56%), Northern Europe (14%) or Western Europe (10%). Overall lead authorship from Asian, African or South American regions was less than 15%, 5% and 1%, respectively. Geographic representation varied somewhat by journal, but not across time.
Conclusions: Diversity in health professional education scholarship, as marked by nation of authors’ professional affiliations, remains low. Under-representation of published research outside Global North regions limits dissemination of novel ideas resulting in unidirectional flow of experiences and a concentrated worldview of teaching and learning.
7

Popova, Y. B., and A. V. Goloburda. "ALGORITHMIC AND PROGRAM IMPLEMENTATION OF THE PLAGIARISM DEFINITION IN LEARNING MANAGEMENT SYSTEMS". «System analysis and applied information science», no. 1 (June 12, 2018): 71–78. http://dx.doi.org/10.21122/2309-4923-2018-1-71-78.

Abstract:
The main advantage of using information technologies in education, which consists in speeding up and simplifying information exchange, is also its drawback, because it raises the problem of plagiarism. The purpose of this paper is to develop software for testing text uniqueness in learning management systems. To achieve this goal, it is necessary to solve a range of problems related to the choice of a method for determining plagiarism, its algorithmization and software implementation. The work deals with the methods of shingles, super-shingles, signature methods, vector models of text representation, as well as cluster analysis of text information. The authors suggest a modification of the vector model to improve the accuracy of determining similar documents by creating an N-list of each document separately. As a result, a pairwise comparison of the documents and the formation of the image of one document relative to the N-list of the other will occur. Thus, in the i-th row of the similarity matrix, the coefficients of similarity of all the documents considered relative to the i-th document will be recorded. The proposed modification will also speed up the calculation process, since there is no need to search for common terms for all documents. To analyze a large number of student works in order to test them for plagiarism, the authors propose using a cluster approach. Its application showed that the time for determining duplicates for one document and for all documents included in the sample is the same. In the same amount of time, it is possible to obtain all the variants of identical student works. Thus, the use of cluster analysis of text information in determining plagiarism significantly saves both the teacher’s time and computing resources. The proposed algorithms are implemented as a web service in the Java language.
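The shingle method and the similarity matrix mentioned above can be sketched as follows. This is a plain word-shingle Jaccard comparison, chosen as an assumption for illustration; it is not the authors' N-list modification of the vector model, and it is written in Python rather than the Java used for their web service. The function names and sample texts are hypothetical.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word shingles of a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Jaccard coefficient of two shingle sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def similarity_matrix(docs, k=3):
    """Row i holds the similarity of every document relative to document i."""
    sets = [shingles(doc, k) for doc in docs]
    return [[jaccard(si, sj) for sj in sets] for si in sets]

docs = [
    "the quick brown fox jumps",
    "the quick brown fox sleeps",
    "an unrelated short sentence here",
]
matrix = similarity_matrix(docs)
```

Rows with high off-diagonal values point at likely duplicates; a clustering step, as the abstract proposes, would group such documents instead of inspecting pairs one by one.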
8

Popova, Oleksandra. "ECONOMIC AND LEGAL DISCOURSE: PARADIGM OF CHANGES IN THE XXI CENTURY (ON THE MATERIAL OF CHINESE, ENGLISH AND UKRAINIAN LANGUAGES)". Naukovy Visnyk of South Ukrainian National Pedagogical University named after K. D. Ushynsky: Linguistic Sciences 2022, no. 34 (July 2022): 61–73. http://dx.doi.org/10.24195/2616-5317-2022-34-6.

Abstract:
The article is devoted to the study of the paradigm of changes in the content of official documents regulating economic and legal relations in the academic sphere in the XXI century. The author considers the factors influencing the process of changes in the content of official documents regulating economic and legal relations in the academic sphere in the context of the linguistic-translation paradigm. The concepts “economic and legal discourse”, “composition of the text of the document”, “academic sphere” have been clarified. Economic and legal discourse is determined through the prism of its dual nature (linguistic and extralinguistic) as a discourse of economics and law, education-driven discourse, its extralinguistic background being associated with the prerequisites for initiating elaborations in the area of education, for launching academic mobility programmes intended for participants of the teaching / learning process at the background of the native state development and intergovernmental cooperation. The text composition of the document is associated with the format (frame) of its representation, namely: the structural and compositional form of the document along with lexical and grammatical features of the economic and legal discourse. We interpret the academic sphere as the environment in which the acquisition of new knowledge, exchange of education-related information, implementation of scientific / research projects, practical manifestation of the outcomes, cultural exchange and intercultural communication take place due to the creation of certain conditions. Some changes in the structure and composition of the documents regulating economic and legal relations in the academic sphere have been characterised. The linguistic peculiarities of the English official documents and their variants of translation into Chinese and Ukrainian have been analysed. 
The author presents illustrative means demonstrating the interaction of factors influencing the content of official documents which regulate economic and legal relations in the academic sphere in the XXI century.
9

Dalyan, Tuğba, Hakan Ayral, and Özgür Özdemir. "A Comprehensive Study of Learning Approaches for Author Gender Identification". Information Technology and Control 51, no. 3 (September 23, 2022): 429–45. http://dx.doi.org/10.5755/j01.itc.51.3.29907.

Abstract:
In recent years, author gender identification has become an important yet challenging task in the fields of information retrieval and computational linguistics. In this paper, different learning approaches are presented to address the problem of author gender identification for Turkish articles. First, several classification algorithms are applied to a list of representations based on different paradigms: fixed-length vector representations such as Stylometric Features (SF) and Bag-of-Words (BoW), and distributed word/document embeddings such as Word2vec, fastText and Doc2vec. Secondly, deep learning architectures, namely Convolution Neural Network (CNN), Recurrent Neural Network (RNN), special kinds of RNN such as Long-Short Term Memory (LSTM) and Gated Recurrent Unit (GRU), C-RNN, Bidirectional LSTM (bi-LSTM), Bidirectional GRU (bi-GRU), Hierarchical Attention Networks and Multi-head Attention (MHA), are designed and their comparative performances are evaluated. We conducted a variety of experiments and achieved outstanding empirical results. To conclude, ML algorithms with BoW have promising results. fastText also appears to be a suitable choice among the embedding models. This comprehensive study contributes to the literature by utilizing different learning approaches based on several kinds of representations. It is also the first significant attempt to identify author gender by applying SF, fastText and DNN architectures to the Turkish language.
10

Tarmizi, Nursyahirah, Suhaila Saee, and Dayang Hanani Abang Ibrahim. "TOWARDS CURBING CYBER-BULLYING IN MALAYSIA BY AUTHOR IDENTIFICATION OF IBAN AND KADAZANDUSUN OSN TEXT USING DEEP LEARNING". ASEAN Engineering Journal 13, no. 2 (May 31, 2023): 145–57. http://dx.doi.org/10.11113/aej.v13.19171.

Abstract:
Online Social Networks (OSN) are frequently used to carry out cyber-criminal actions such as cyberbullying. As a developing country in Asia that keeps abreast of ICT advancement, Malaysia is no exception when it comes to cyberbullying. The Author Identification (AI) task plays a vital role in social media forensic investigation (SMF) to unveil the genuine identity of the offender by analysing the text written in OSN by the candidate culprits. AI on OSN text faces several challenges, including limited text length and informal language full of internet jargon and grammatical errors, which further impact AI's performance in SMF. The traditional AI system that analyses long text documents seems inadequate for analysing the writing style of short OSN text. N-gram features are proven to efficiently represent the authors' writing style for short text. However, representing N-grams in a traditional representation like Tf-Idf results in sparse vectors that make it difficult to grasp the semantic information in the text. Besides, most AI work has been done in English, while indigenous languages have received less attention. In West Malaysia, the supreme languages that transcend ethnic boundaries are Iban of Sarawak and KadazanDusun of Sabah, both of which are inherently under-resourced. This paper presents a proposed workflow of AI for short OSN text using two Under-Resourced Languages (U-RL), Iban and KadazanDusun tweets, to curb the cyberbullying issue in Malaysia. This paper compares Tf-Idf (sparse) and SoA embedding-based (dense) feature representations to observe which representations best represent the stylistic features of the authors’ writing. N-grams of word, character, and POS were extracted as the features. The representation models were learned by different classifiers using machine learning (Naïve Bayes, Random Forest, and SVM). The convolutional neural network (CNN), a SoA deep learning model in sentence classification, was tested against the traditional classifiers.
The result was observed by combining different representation models and classifiers on three datasets (English, Iban, and KadazanDusun). The best result was achieved when CNN learned embedding-based models with a combination of all features. KadazanDusun achieved the highest accuracy with 95.76%, English with 95.02%, and Iban with 94%.
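As a small illustration of the N-gram features mentioned in this abstract, the sketch below counts overlapping character n-grams in a short text. It covers only the character-level case and raw counts; the word and POS n-grams, the Tf-Idf weighting and the embedding-based representations from the paper are not reproduced here. The function name `char_ngrams` and the sample string are assumptions.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Count overlapping character n-grams, a simple stylistic profile of a short text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# Hypothetical short post; spaces are kept because spacing habits can be stylistic too.
profile = char_ngrams("lol lol", n=3)
```

Profiles like this one can be compared across candidate authors or vectorized and passed to a classifier.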
11

Surya, Chennam Chandrika, Karunakar K, Murali Mohan T, and R. Prasanthi Kumari. "Language Variety Prediction using Word Embeddings and Machine Leaning Algorithms". International Journal for Research in Applied Science and Engineering Technology 10, no. 12 (December 31, 2022): 1616–23. http://dx.doi.org/10.22214/ijraset.2022.48280.

Abstract:
Author profiling is a technique for predicting demographic characteristics of an author, such as gender, age, location, native language and educational background, by analysing their written texts. Author profiling is used in several text processing applications such as forensic analysis, marketing and security. Author profiling techniques identify the stylistic differences among authors' writing styles to determine the demographics of the authors. Researchers have experimented with various stylistic features, such as lexical, content-based, syntactic, semantic, domain-specific, structural and readability features, to identify the stylistic differences among different authors' texts. The dataset plays an important role in analysing the stylistic differences of authors. PAN is a competition that organizes different types of tasks every year to encourage participants around the globe to provide solutions to different types of text classification problems, such as plagiarism detection, authorship attribution, authorship verification, author profiling, celebrity profiling, style change detection, fake news spreader detection and hate speech spreader detection. The author profiling task was introduced in 2013 by the organizers of the PAN competition. The organizers carefully gather the datasets and make them available to researchers for providing solutions to the problems. Every year the organizers conduct competitions on different sub-tasks of author profiling and provide datasets in different languages and genres. In the 2017 competition, PAN introduced the task of predicting the language variety of an author and released the dataset in four languages. In this work, we propose an approach for the English-language dataset of language variety prediction. The proposed approach uses the word embeddings generated by the Word2Vec model and the BERT (Bidirectional Encoder Representations from Transformers) model.
The word embeddings are used to generate the document vectors by combining the embeddings of the words contained in each document. Two machine learning algorithms, support vector machine and random forest, are trained on the document vectors. The random forest attained the best accuracy of 96.87 for language variety prediction when the experiment was conducted with BERT embeddings.
12

Anwar, Waheed, Imran Sarwar Bajwa, and Shabana Ramzan. "Design and Implementation of a Machine Learning-Based Authorship Identification Model". Scientific Programming 2019 (January 16, 2019): 1–14. http://dx.doi.org/10.1155/2019/9431073.

Abstract:
In this paper, a novel approach is presented for authorship identification in English and Urdu text using the LDA model with n-grams of the authors' texts and cosine similarity. The proposed approach uses similarity metrics to identify various learned representations of stylometric features and uses them to identify the writing style of a particular author. The proposed LDA-based approach emphasizes instance-based and profile-based classifications of an author’s text. Here, LDA suitably handles high-dimensional and sparse data by allowing a more expressive representation of text. The presented approach is an unsupervised computational methodology that can handle the heterogeneity of the dataset, diversity in writing, and the inherent ambiguity of the Urdu language. A large corpus has been used for performance testing of the presented approach. The results of the experiments show the superiority of the proposed approach over the state-of-the-art representations and other algorithms used for authorship identification. The contribution of the presented work is the use of cosine similarity with n-gram-based LDA topics to measure similarity between vectors of text documents. The approach achieves an overall accuracy of 84.52% on the PAN12 datasets and 93.17% on Urdu news articles without using any labels for the authorship identification task.
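Cosine similarity, which this abstract uses to compare document vectors, can be sketched in a few lines. The example assumes the vectors are already available (for instance, per-document topic proportions from an LDA model); the topic vectors shown are invented values, and the LDA step itself is not reproduced.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical topic distributions of an unknown text and a known author's profile.
unknown_text = [0.7, 0.2, 0.1]
author_profile = [0.6, 0.3, 0.1]
score = cosine_similarity(unknown_text, author_profile)
```

In an attribution setting, the unknown text would be assigned to the candidate author whose profile yields the highest score.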
13

Zhao, Wentao, Dalin Zhou, Xinguo Qiu, and Wei Jiang. "How to Represent Paintings: A Painting Classification Using Artistic Comments". Sensors 21, no. 6 (March 10, 2021): 1940. http://dx.doi.org/10.3390/s21061940.

Abstract:
The goal of large-scale automatic painting analysis is to classify and retrieve images using machine learning techniques. The traditional methods use computer vision techniques on paintings to enable computers to represent the art content. In this work, we propose using a graph convolutional network and artistic comments rather than the painting color to classify type, school, timeframe and author of the paintings by implementing natural language processing (NLP) techniques. First, we build a single artistic comment graph based on co-occurrence relations and document word relations and then train an art graph convolutional network (ArtGCN) on the entire corpus. The nodes, which include the words and documents in the topological graph, are initialized using a one-hot representation; then, the embeddings are learned jointly for both words and documents, supervised by the known-class training labels of the paintings. Through extensive experiments on different classification tasks using different input sources, we demonstrate that the proposed methods achieve state-of-the-art performance. In addition, ArtGCN can learn word and painting embeddings, and we find that they have a major role in describing the labels and retrieving paintings, respectively.
14

García-Gorrostieta, Jesús Miguel, Aurelio López-López, Samuel González-López, and Adrián Pastor López-Monroy. "Improved argumentative paragraphs detection in academic theses supported with unit segmentation". Journal of Intelligent & Fuzzy Systems 42, no. 5 (March 31, 2022): 4481–91. http://dx.doi.org/10.3233/jifs-219237.

Abstract:
Academic theses writing is a complex task that requires the author to be skilled in argumentation. The goal of the academic author is to communicate clear ideas and to convince the reader of the presented claims. However, few students are good arguers, and this is a skill that takes time to master. In this paper, we present an exploration of lexical features used to model automatic detection of argumentative paragraphs using machine learning techniques. We present a novel proposal, which combines the information in the complete paragraph with the detection of argumentative segments in order to achieve improved results for the detection of argumentative paragraphs. We propose two approaches; a more descriptive one, which uses the decision tree classifier with indicators and lexical features; and another more efficient, which uses an SVM classifier with lexical features and a Document Occurrence Representation (DOR). Both approaches consider the detection of argumentative segments to ensure that a paragraph detected as argumentative has indeed segments with argumentation. We achieved encouraging results for both approaches.
15

Lee, Seungpeel, Honggeun Ji, Jina Kim, and Eunil Park. "What books will be your bestseller? A machine learning approach with Amazon Kindle". Electronic Library 39, no. 1 (April 5, 2021): 137–51. http://dx.doi.org/10.1108/el-08-2020-0234.

Abstract:
Purpose: With the rapid increase in internet use, most people tend to purchase books through online stores. Several such stores also provide book recommendations for buyer convenience, and both collaborative and content-based filtering approaches have been widely used for building these recommendation systems. However, both approaches have significant limitations, including cold start and data sparsity. To overcome these limitations, this study aims to investigate whether user satisfaction can be predicted based on easily accessible book descriptions.
Design/methodology/approach: The authors collected a large-scale Kindle Books data set containing book descriptions and ratings, and calculated whether a specific book will receive a high rating. For this purpose, several feature representation methods (bag-of-words, term frequency–inverse document frequency [TF-IDF] and Word2vec) and machine learning classifiers (logistic regression, random forest, naive Bayes and support vector machine) were used.
Findings: The used classifiers show substantial accuracy in predicting reader satisfaction. Among them, the random forest classifier combined with the TF-IDF feature representation method exhibited the highest accuracy at 96.09%.
Originality/value: This study revealed that user satisfaction can be predicted based on book descriptions and shed light on the limitations of existing recommendation systems. Further, both practical and theoretical implications have been discussed.
16

Pogorilyy, S. D., A. A. Kramov and P. V. Biletskyi. "METHOD FOR COHERENCE EVALUATION OF UKRAINIAN TEXTS USING CONVOLUTIONAL NEURAL NETWORK". Collection of scientific works of the Military Institute of Kyiv National Taras Shevchenko University, no. 65 (2019): 64–71. http://dx.doi.org/10.17721/2519-481x/2019/65-08.

Abstract:
The estimation of text coherence is one of the most pressing tasks in computational linguistics. Analysis of text coherence is widely used in the writing and selection of documents, since coherence allows the idea of an author to be conveyed clearly to a reader. The importance of this task is confirmed by the number of recent works dedicated to solving it. Various automated methods for the estimation of text coherence are based on machine learning. These methods rely on a formal text representation and the subsequent detection of regularities to generate an output result. The purpose of this work is to perform an analytic review of different automated methods for the estimation of text coherence; to justify the selection of a method and adapt it to the features of the Ukrainian language; and to experimentally verify the effectiveness of the suggested method on a Ukrainian corpus. In this paper, a comparative analysis of machine learning methods for estimating the coherence of English texts has been performed. The expediency of applying methods that are based on trained universal models for the formalized representation of text components has been justified. Models using neural networks with different architectures can be considered, namely recurrent and convolutional networks. These types of networks are widely used for text processing because they can process input data without a fixed structure, such as sentences or words. Despite the ability of recurrent neural networks to take previous data into account (this behavior is similar to text perception by a reader), a convolutional neural network has been chosen for the experimental research. This choice was made because of the ability of convolutional neural networks to detect relations between entities regardless of the distance between them.
In this paper, the principle of the method based on the convolutional neural network and the corresponding architecture has been described. A software application for verifying the effectiveness of the suggested method has been created. The formalized representation of text elements has been obtained using a previously trained model for the semantic representation of words; this model was trained on a corpus of Ukrainian scientific abstracts. The resulting networks have been trained using the pre-trained model. The effectiveness of the method has been verified experimentally on the document discrimination task and the insertion task using a set of scientific articles. The results obtained indicate that the method using convolutional neural networks may be used for further estimation of the coherence of Ukrainian texts.
17

Song, Kaisong, Yangyang Kang, Wei Gao, Zhe Gao, Changlong Sun and Xiaozhong Liu. "Evidence Aware Neural Pornographic Text Identification for Child Protection". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 17 (18 May 2021): 14939–47. http://dx.doi.org/10.1609/aaai.v35i17.17753.

Abstract:
Identifying pornographic text online is practically useful to protect children from access to such adult content. However, some authors may intentionally avoid using sensitive words in their pornographic texts to take advantage of the lack of human audits. Without prior knowledge guidance, the real semantics of such pornographic text is difficult for existing methods to understand due to its high context-sensitivity and heavy usage of figurative language, which brings huge challenges to the porn detection systems used in social media platforms. In this paper, we approach the problem as a document-level porn identification task by locating and integrating sentence-level evidence and propose a novel Evidence-Aware Neural Porn Classification (eNPC) model. Specifically, we first propose a basic model which locates porn-indicative sentences in the document with a multiple instance learning model, and then aggregates the sentence-level evidence to induce the document label with a self-attention mechanism. Moreover, we consider label dependencies within local context. Finally, we further enhance the sentence representation with prior knowledge produced by an automatic porn lexicon construction strategy. Extensive experimental results show that our model exhibits consistent superiority over competitors on two real-world Chinese novel datasets and an English story dataset.
18

Polat, Huseyin, and Mesut Korpe. "Estimation of Demographic Traits of the Deputies through Parliamentary Debates Using Machine Learning". Electronics 11, no. 15 (29 July 2022): 2374. http://dx.doi.org/10.3390/electronics11152374.

Abstract:
One of the most impressive applications of the combined use of natural language processing (NLP), classical machine learning, and deep learning (DL) approaches is the estimation of demographic traits from the text. Author Profiling (AP) is the analysis of a text to identify the demographics or characteristics of its author. So far, most researchers in this field have focused on using social media data in the English language. This article aims to expand the predictive potential of demographic traits by focusing on a more diverse dataset and language. Knowing the background of deputies is essential for citizens, political scientists and policymakers. In this study, we present the application of NLP and machine learning (ML) approaches to Turkish parliamentary debates to estimate the demographic traits of the deputies. Seven traits were determined: gender, age, education, occupation, election region, party, and party status. As a first step, a corpus was compiled from Turkish parliamentary debates between 2012 and 2020. Document representations (feature extraction) were performed using various NLP techniques. Then, we created sub-datasets containing the extracted features from the corpus. These sub-datasets were used by different ML classification algorithms. The best classification accuracy rates were more than 31%, 27%, 35%, 41%, 29%, 59%, and 32% according to the majority baseline for gender, age, education, occupation, election region, party, and party status, respectively. The experimental results show that the demographics of deputies can be estimated effectively using NLP, classical ML, and DL approaches.
19

Utami, Putri Lintang, Nadi Suprapto and Hasan Nuurul Hidaayatullaah. "Exploring Research Trends of Physics Concept Mapping in Physics Learning: Bibliometric Analysis". Studies in Philosophy of Science and Education 3, no. 2 (31 July 2022): 58–69. http://dx.doi.org/10.46627/sipose.v3i2.308.

Abstract:
Research related to the application of concept mapping in physics learning is increasing and has been a trending topic for decades. The purposes of this research are to (1) analyze the research network and visualization of concept mapping research in physics learning and its contributions to learning physics, (2) analyze the contribution of Indonesian researchers to concept mapping research, and (3) analyze research recommendations related to concept mapping in physics learning. This bibliometric analysis uses documents from the Scopus database for the 2012-2023 period, analyzed with VOSviewer. The visualization results for physics concept mapping fall into 6 clusters, with the dominant items being mapping, students, and physics. The implementation of concept mapping in physics learning has many impacts, including on conceptual understanding, student performance, learning outcomes, problem solving, and physics misconceptions. Indonesia has contributed to physics concept mapping research, with Universitas Negeri Manado being the most productive affiliation, Polukan c. being the most productive author, and the most cited publication being “The effectiveness of Concept Mapping Content Representation Lesson Study (ComCoReLS) model to improve skills of Creating Physics Lesson Plan (CPLP) for pre-service physics teacher”. Based on the network visualization, the research recommendations for physics concept mapping concern concept mapping research in problem solving and misconceptions.
20

Brown, John Seely. "Process versus Product: A Perspective on Tools for Communal and Informal Electronic Learning". Journal of Educational Computing Research 1, no. 2 (May 1985): 179–201. http://dx.doi.org/10.2190/l00t-22h0-b7nj-1324.

Abstract:
This article explores new paradigms for the use of computers in learning. Two concepts crucial to the development of qualitatively new kinds of computer-based learning environments are identified: the importance of focusing on the underlying process rather than just the product of a creative effort; and the importance of the computer's ability to record, represent and communicate that underlying process. We discuss the cognitive, pedagogical, and sociological issues relevant to the creation of learning environments in five domains, along with examples of specific possibilities in each: 1) Empowering environments. How can we design computer-based tools that both promote creativity and aid the development of artistic discipline? 2) Games. How can the motivational aspects of arcade-style games be transferred to more fertile learning environments? 3) Communication. How can we break away from the fundamentally linear structuring of ideas necessary in print-based communication and create tools to aid the representation and comprehension of nonlinear ideas and arguments? 4) Writing. How can we create tools to help authors move from the chaos of pre-articulate ideas to the order of a polished document? 5) Education. How can we create a computer-based system that “mirrors” students' thought processes, helping them to reflect on those processes and thereby to improve their metacognitive skills?
21

Kadu, Mr Lukesh, Dr Manoj Deshpande and Dr Vijaykumar Pawar. "Survey of Deep Learning Approaches for Twitter Text Classification". International Journal of Advanced Engineering Research and Science 9, no. 12 (2022): 106–12. http://dx.doi.org/10.22161/ijaers.912.12.

Abstract:
Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models, such as RoBERTa, more difficult data domains can also be analyzed, e.g., news texts where authors typically express their opinion/sentiment less explicitly. Sentiment analysis aims to extract opinions automatically from data and classify them as positive or negative. Twitter, a widely used social media tool, has been seen as an important source of information for acquiring people’s attitudes, emotions, views, and feedback. Within this context, Twitter sentiment analysis techniques were developed to decide whether textual tweets express a positive or negative opinion. In contrast to the lower classification performance of traditional algorithms, deep learning models, including Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM), have achieved significant results in sentiment analysis. Keras is a Deep Learning (DL) framework that provides an embedding layer to produce the vector representation of words present in the document. The objective of this work is to analyze the performance of deep learning models, namely Convolutional Neural Network (CNN), Simple Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (Bi-LSTM), BERT and RoBERTa, for classifying Twitter reviews. From the experiments conducted, it is found that the RoBERTa model performs better than CNN and simple RNN for sentiment classification.
22

Juanes, Juan A., and Pablo Ruisoto. "Technological Advances and Teaching Innovation Applied to Health Science Education". Journal of Information Technology Research 7, no. 2 (April 2014): 1–6. http://dx.doi.org/10.4018/jitr.2014040101.

Abstract:
The objective of this special issue under the title “Technological advances and teaching innovation applied to health science education” is to improve health science education and to encourage the exchange and dissemination of information regarding different training aspects in medical science. Technological procedures in teaching entail careful adaptation and analysis of the teaching content to be transmitted to and acquired by students, as well as its careful presentation so that the message and knowledge reach the student more effectively. For this reason, the design of technological applications is very important, so that they become attractive to the user and so that the time spent in the learning process is used optimally and facilitates the acquisition of knowledge. The authors will introduce, to teachers and researchers, current technological application tools and their possibilities in education, providing complementary training elements that help improve the teaching and learning process in health sciences. How the application of computer technologies in education broadens the possibilities for action and intercommunication between teachers and students, allowing access to new means of exploration and representation, together with new ways to access knowledge through diverse types of tools (powerful body structure visualization, multimedia imagery, computer simulations, stereoscopic visualization, virtual and augmented reality techniques, computer platforms for resource and document storage, and mobile devices), will be further discussed.
23

Abayeva, Galiya A., Gulzhan S. Orazayeva, Saltanat J. Omirbek, Gaukhar B. Ibatova, Venera G. Zakirova and Vera K. Vlasova. "A cross-database bibliometric analysis of ubiquitous learning: Trends, influences, and future directions". Contemporary Educational Technology 15, no. 4 (1 October 2023): ep471. http://dx.doi.org/10.30935/cedtech/13648.

Abstract:
The concept of ubiquitous learning has emerged as a pedagogical approach in response to the advancements made in mobile, wireless communication, and sensing technologies. The domain of ubiquitous learning is distinguished by swift progression, thereby presenting a difficulty in maintaining current knowledge of its developments. The implementation of bibliometric analysis would enable the tracking of its development and current status. The objective of the present investigation is to perform a thorough bibliometric examination of the domain of ubiquitous learning. This research aims to discern significant attributes, patterns, and influencers within the discipline by analyzing scholarly works. The primary objective of this study is to provide a comprehensive depiction of the salient characteristics and patterns exhibited by the datasets employed in ubiquitous learning research, namely Scopus, Web of Science (WoS), and merged datasets. Additionally, the study seeks to trace the historical development of publications in this domain and to ascertain the most noteworthy publications and authors that have exerted a significant impact on this field. This study provides an extensive bibliometric analysis of ubiquitous learning, examining output from Scopus, WoS, and a merged dataset. It highlights the field’s growth and the rising use of diverse data sources, with Scopus and the merged dataset revealing broader insights. The analysis reveals an interest peak in 2016 and a subsequent decline likely due to incomplete recent data. Documents, predominantly articles, differ across databases, underscoring the unique contributions of each. The study identifies “Lecture Notes in Computer Science” and “Ubiquitous Learning” as major research sources. It recognizes Hwang, G.-J. as a highly influential author, with Asian institutions leading in research output. However, Western institutions also show strong representation in WoS and merged databases. 
Despite variations in total citation counts, countries like China, Switzerland, and Ireland contribute significantly to the field. Terms like “mobile learning” and “life log” have vital roles in bridging research clusters, while thematic maps reveal evolving trends like mobile learning and learning analytics. The collaborative structure and key figures in ubiquitous learning are illuminated through network analysis, emphasizing the importance of cross-database analysis for a comprehensive view of the field.
24

Andriani, Agis, Fuad Abdullah, Enjang Nurhaedin, Arini Nurul Hidayati, Dewi Rosmala and Yuyus Saputra. "The Representation of Counterproductive Religious Values in a Selected Chapter of an Indonesian ELT Textbook: Systemic Functional Multimodal Discourse Analysis". Journal of Pragmatics and Discourse Research 4, no. 1 (6 February 2024): 47–62. http://dx.doi.org/10.51817/jpdr.v4i1.756.

Abstract:
Countless studies have examined the vital role of ELT textbooks as learning sources, particularly in terms of intercultural, multicultural, and trans-cultural analysis. Yet, none of them specifically addressed religious values as the research focus. Hence, this study aimed at construing religious values represented in a selected chapter of an Indonesian ELT textbook. Descriptive problem-driven content analysis was used as the research design, whilst the research data were collected through document analysis. To analyze the data, the research utilized Systemic Functional Multimodal Discourse Analysis (SF-MDA) (O’Halloran, 2008c) as the framework, with a focus on representational meaning and transitivity analysis for the visual and textual data. The findings showed that two data modes represent religiosity, namely visual and verbal data. In visual data, religious values (artifacts, beliefs, and behaviors) are represented by the classificational process, while in verbal data they are represented by the material and relational processes. Four of Indonesia's large recognized religious communities were represented, namely Christianity, Buddhism, Hinduism, and Confucianism. Yet no single datum depicts Islamic values, even though the Islamic community is the largest in the country and even in the world. Hence, this finding suggests that stakeholders (particularly textbook authors) should pay attention to the issue of how to fairly present the values of the five legalized communities existing in Indonesia. Therefore, because Indonesia has varied communities, ethnicities, and backgrounds, ELT textbooks should fairly embody this diversity, particularly the religious aspects, which are the core competence to gain.
25

Liu, Yi, Qiuyan Zhu, Pengfei Jiang, Yang Yang, Mingyun Wang, Hao Liang, Qinghua Peng and Qiuyan Zhang. "Bibliometric and visualized analysis of DME from 2012 to 2022". Medicine 103, no. 13 (29 March 2024): e37347. http://dx.doi.org/10.1097/md.0000000000037347.

Abstract:
Background: Diabetic macular edema (DME) is the main cause of irreversible vision loss in patients with diabetes mellitus (DM), resulting in a certain burden to patients and society. With the increasing incidence of DME, more and more researchers are focusing on it. Methods: The papers related to DME between 2012 and 2022 from the Web of Science Core Collection were searched in this study. Based on CiteSpace and VOSviewer, these publications were analyzed in terms of spatiotemporal distribution, author distribution, subject classification, topic distribution, and citations. Results: A total of 5165 publications on DME were included. The results showed that the research on DME is on a steady growth trend. The country with the highest number of published documents was the US. Wong Tien Yin from Tsinghua University was the author with the most published articles. The journal Retina, the Journal of Retinal and Vitreous Diseases had a large number of publications. The article “Mechanisms of macular edema: Beyond the surface” was the most highly cited, and “Aflibercept, bevacizumab, or ranibizumab for diabetic macular edema” had the highest co-citation frequency. The treatment, diagnosis, pathogenesis, as well as etiology and epidemiological investigation of DME, have been the current research directions. Deep learning has been widely used in the medical field for its strong feature representation ability. Conclusions: The study revealed the important authoritative literature, journals, institutions, scholars, countries, research hotspots, and development trends in the field of DME. This indicates that communication and cooperation between disciplines, universities, and countries are crucial. It can advance research in DME and even ophthalmology.
26

Pappas, Nikolaos, and Andrei Popescu-Belis. "Explicit Document Modeling through Weighted Multiple-Instance Learning". Journal of Artificial Intelligence Research 58 (22 March 2017): 591–626. http://dx.doi.org/10.1613/jair.5240.

Abstract:
Representing documents is a crucial component in many NLP tasks, for instance predicting aspect ratings in reviews. Previous methods for this task treat documents globally, and do not acknowledge that target categories are often assigned by their authors with generally no indication of the specific sentences that motivate them. To address this issue, we adopt a weakly supervised learning model, which jointly learns to focus on relevant parts of a document according to the context along with a classifier for the target categories. Derived from the weighted multiple-instance regression (MIR) framework, the model learns decomposable document vectors for each individual category and thus overcomes the representational bottleneck in previous methods due to a fixed-length document vector. During prediction, the estimated relevance or saliency weights explicitly capture the contribution of each sentence to the predicted rating, thus offering an explanation of the rating. Our model achieves state-of-the-art performance on multi-aspect sentiment analysis, improving over several baselines. Moreover, the predicted saliency weights are close to human estimates obtained by crowdsourcing, and increase the performance of lexical and topical features for review segmentation and summarization.
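The aggregation idea described in the abstract above, in which per-sentence relevance weights determine how much each sentence contributes to the document-level rating, can be sketched as follows. This is only an illustration under stated assumptions: the softmax normalization, the function names, and the toy numbers are hypothetical, and in the paper the weights are learned jointly with the classifier rather than supplied by hand.

```python
import math


def softmax(scores):
    """Normalize raw relevance scores into weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def document_rating(sentence_ratings, relevance_scores):
    """Combine sentence-level ratings into one document rating.

    Returns the weighted rating and the weights, so the contribution
    of each sentence can be inspected as an explanation.
    """
    weights = softmax(relevance_scores)
    rating = sum(w * r for w, r in zip(weights, sentence_ratings))
    return rating, weights


# Hypothetical values for a three-sentence review.
sentence_ratings = [4.5, 2.0, 4.0]   # assumed per-sentence rating predictions
relevance_scores = [2.0, 0.0, 1.0]   # assumed unnormalized saliency scores
rating, weights = document_rating(sentence_ratings, relevance_scores)
```

Because the weights form a convex combination, the document rating always lies between the lowest and highest sentence rating, and the sentence with the largest relevance score dominates the result.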
27

Macri, Carmelo, Stephen Bacchi, Sheng Chieh Teoh, Wan Yin Lim, Lydia Lam, Sandy Patel, Mark Slee, Robert Casson and WengOnn Chan. "Evaluating the Ability of Open-Source Artificial Intelligence to Predict Accepting-Journal Impact Factor and Eigenfactor Score Using Academic Article Abstracts: Cross-sectional Machine Learning Analysis". Journal of Medical Internet Research 25 (7 March 2023): e42789. http://dx.doi.org/10.2196/42789.

Abstract:
Background: Strategies to improve the selection of appropriate target journals may reduce delays in disseminating research results. Machine learning is increasingly used in content-based recommender algorithms to guide journal submissions for academic articles. Objective: We sought to evaluate the performance of open-source artificial intelligence to predict the impact factor or Eigenfactor score tertile using academic article abstracts. Methods: PubMed-indexed articles published between 2016 and 2021 were identified with the Medical Subject Headings (MeSH) terms “ophthalmology,” “radiology,” and “neurology.” Journals, titles, abstracts, author lists, and MeSH terms were collected. Journal impact factor and Eigenfactor scores were sourced from the 2020 Clarivate Journal Citation Report. The journals included in the study were allocated percentile ranks based on impact factor and Eigenfactor scores, compared with other journals that released publications in the same year. All abstracts were preprocessed, which included the removal of the abstract structure, and combined with titles, authors, and MeSH terms as a single input. The input data underwent preprocessing with the inbuilt ktrain Bidirectional Encoder Representations from Transformers (BERT) preprocessing library before analysis with BERT. Before use for logistic regression and XGBoost models, the input data underwent punctuation removal, negation detection, stemming, and conversion into a term frequency-inverse document frequency array. Following this preprocessing, data were randomly split into training and testing data sets with a 3:1 train:test ratio. Models were developed to predict whether a given article would be published in a first, second, or third tertile journal (0-33rd centile, 34th-66th centile, or 67th-100th centile), as ranked either by impact factor or Eigenfactor score.
BERT, XGBoost, and logistic regression models were developed on the training data set before evaluation on the hold-out test data set. The primary outcome was overall classification accuracy for the best-performing model in the prediction of accepting journal impact factor tertile. Results: There were 10,813 articles from 382 unique journals. The median impact factor and Eigenfactor score were 2.117 (IQR 1.102-2.622) and 0.00247 (IQR 0.00105-0.03), respectively. The BERT model achieved the highest impact factor tertile classification accuracy of 75.0%, followed by an accuracy of 71.6% for XGBoost and 65.4% for logistic regression. Similarly, BERT achieved the highest Eigenfactor score tertile classification accuracy of 73.6%, followed by an accuracy of 71.8% for XGBoost and 65.3% for logistic regression. Conclusions: Open-source artificial intelligence can predict the impact factor and Eigenfactor score of accepting peer-reviewed journals. Further studies are required to examine the effect on publication success and the time-to-publication of such recommender systems.
28

Minang, Putri Sundari, and Gustaman Saragih. "LEXICAL COLLOCATION AND TEXTUAL COHESION IN STUDENT’S ENGLISH WRITING DISCUSSION TEXT". INFERENCE: Journal of English Language Teaching 5, no. 1 (21 July 2022): 62. http://dx.doi.org/10.30998/inference.v5i1.8068.

Abstract:
The purpose of this study was to determine (1) the form of significant lexical collocation in student's English writing discussion text, (2) the process by which lexical collocation creates cohesion in student's English writing discussion text, and (3) the cultural reflection displayed by the structure of lexical collocation and textual cohesion in student's English writing discussion text. This study applies a mixture of descriptive qualitative and quantitative methods in describing research findings. The data source of this research is the results of writing discussion texts in 3 PKBM Karawang. The documented data total 30 discussion texts with 5274 words and 1664 types of words. The author uses a computer application called Antconc to obtain the collocation data. The findings show that (1) the lexical collocations produced by students in Paket C in the discussion text are L1 (15%), L2 (34%), L3 (18%), L4 (4%), L5 (18%), and L7 (11%); (2) reference data are personal (51%), demonstrative (30%), and comparative (10%); conjunction data are additive (44%), adversative (15%), temporal (5%), and causal (36%). (3) The cultural representation found was that the issues raised as the titles of the texts were sourced from daily activities and that the words claim and assumed were used to describe students' ideas and opinions in the text. This shows that collocation deserves a special place in learning English to improve the writing ability of students in Paket C in PKBM.
29

Yan, Yan, Xu-Cheng Yin, Sujian Li, Mingyuan Yang and Hong-Wei Hao. "Learning Document Semantic Representation with Hybrid Deep Belief Network". Computational Intelligence and Neuroscience 2015 (2015): 1–9. http://dx.doi.org/10.1155/2015/650527.

Abstract:
High-level abstraction, for example, semantic representation, is vital for document classification and retrieval. However, how to learn document semantic representation is still a topic open for discussion in information retrieval and natural language processing. In this paper, we propose a new Hybrid Deep Belief Network (HDBN) which uses a Deep Boltzmann Machine (DBM) on the lower layers together with a Deep Belief Network (DBN) on the upper layers. The advantage of the DBM is that it employs undirected connections when training weight parameters, which can be used to sample the states of nodes on each layer more successfully, and it is also an effective way to remove noise from the different document representation types; the DBN can extract deeper abstract features of the document, making the model learn sufficient semantic representation. At the same time, we explore different input strategies for semantic distributed representation. Experimental results show that our model using word embeddings instead of single words has better performance.
30

Siddiqui, Shoaib Ahmed, Andreas Dengel and Sheraz Ahmed. "Self-Supervised Representation Learning for Document Image Classification". IEEE Access 9 (2021): 164358–67. http://dx.doi.org/10.1109/access.2021.3133200.

31

Zhang, Wenyue, Yang Li and Suge Wang. "Learning document representation via topic-enhanced LSTM model". Knowledge-Based Systems 174 (June 2019): 194–204. http://dx.doi.org/10.1016/j.knosys.2019.03.007.

32

Liu, Ying, Tongzhou Zhao, Yue Chai and Yiqi Jiang. "A Word Elimination Strategy for Learning Document Representation". IOP Conference Series: Materials Science and Engineering 466 (28 December 2018): 012091. http://dx.doi.org/10.1088/1757-899x/466/1/012091.

33

Takale, Sheetal Ajaykumar. "Knowledge Representation for Legal Document Summarization". International Journal of Innovative Research in Computer Science and Technology 11, no. 4 (13 July 2023): 61–66. http://dx.doi.org/10.55524/ijircst.2023.11.4.11.

Abstract:
This paper presents a novel approach for legal document summarization. The proposed approach is based on Ripple-Down Rules (RDR), an incremental knowledge acquisition method. RDR allows us to quickly build an extendable knowledge base using classification rules. The classification rules are written using a set of features. The summary is generated using the identified rhetorical roles in the document. Experiments demonstrate that the RDR-based legal document summarization approach outperforms the supervised and unsupervised machine learning models.
34

Morariu, Daniel, Lucian Vințan, and Radu Crețulescu. "An Extension of the VSM Documents Representation using Word Embedding". Balkan Region Conference on Engineering and Business Education 2, no. 1 (December 20, 2017): 249–57. http://dx.doi.org/10.1515/cplbu-2017-0033.

Abstract:
In this paper, we present experiments that try to integrate the power of the Word Embedding representation in real problems of document classification. Word Embedding is a recent trend in the natural language processing domain that tries to represent each word from the document in a vector format. This representation embeds the semantic context in which the word occurs most frequently. We include this new representation in a classical VSM document representation and evaluate it using a learning algorithm based on the Support Vector Machine. This newly added information makes classification more difficult because it increases the learning time and the memory needed. The obtained results are slightly weaker than those of the classical VSM document representation. By adding the WE representation to the classical VSM representation we want to improve the current educational paradigm for computer science students, which is generally limited to the VSM representation.
35

Daele, Amaury. "Reifying, Participating and Learning". International Journal of Web-Based Learning and Teaching Technologies 5, no. 1 (January 2010): 43–60. http://dx.doi.org/10.4018/jwltt.2010010104.

Abstract:
This paper presents observations and analysis of an activity of reification of professional practices within a community of practice. A case is examined of a distance community of tutors using a semantic Wiki for formalising their practices and a tool for storing and classifying documents. On the basis of the instrumental genesis theory, the author highlights the process of appropriation of the tools by the community of practice. This community participated in the development and conception of uses for the tools through a research and development project based on participatory design. This appropriation process, even if it did not occur to the expected extent, did nonetheless allow the community’s members to develop their representations regarding the reification of their practices and, gradually, to elaborate broader uses of the tools.
36

Krisnaningsih, Erina, Maharani Ayu Nurdiana Putri, Tsabitamia Irba, Nadi Supapto, Utama Alan Deta, and Eko Hariyono. "Bibliometric Analysis of Multi Representation Based on Problem-Solving Skills Using VOSviewer". Berkala Ilmiah Pendidikan Fisika 9, no. 3 (November 1, 2021): 274. http://dx.doi.org/10.20527/bipf.v9i3.11329.

Abstract:
The purpose of this study was to analyze the scope of research on problem-solving skills based on multiple representation in 2016 – 2020 with 20 documents through bibliometric analysis. The research method used was a literature study of all the articles analyzed in this study. The articles were taken from the Scopus database with sampling in 2003 – 2020, resulting in 29 scientific work data exported in *.ris (RIS) and *.csv (CSV) formats. Then, the data were processed using VOSviewer and Microsoft Excel. The number of publications has increased over the last five years. Indonesia is the dominant country in publishing papers on this topic. Institutions from Germany managed to publish most of the documents about multi representation. Meanwhile, Poland is the origin country of the authors with most publications. The visualization of research trends on multi representation resulted in four main clusters: (1) multi representation related to students, representation, and learning processes (2) multi representation as a class (3) multi representation related to the problem (4) multi representation as a model and process. Meanwhile, Indonesian researchers are very active in contributing to this topic, in line with the number of publications by country, namely Indonesia.
37

Duan, Junwen, Xiao Ding, Yue Zhang, and Ting Liu. "TEND: A Target-Dependent Representation Learning Framework for News Document". IEEE/ACM Transactions on Audio, Speech, and Language Processing 27, no. 12 (December 2019): 2313–25. http://dx.doi.org/10.1109/taslp.2019.2947364.

38

Stolyarov, Yu N. "Theory for the good of practice. To discussion on the origins of documentology". Scientific and Technical Libraries, no. 6 (June 23, 2022): 137–51. http://dx.doi.org/10.33186/1027-3689-2022-6-137-151.

Abstract:
The author responds to the considerations expressed mainly by E. A. Pleshkevich in his article “On documentology and its methodology”. The author praises Pleshkevich’s acknowledging the concept of “documentology”. Indeed, the term “document” differs pertaining to the subject (substantially) and object (functionally), and this representation makes the foundation for relativeness of this concept in practical applications. Meanwhile, the author points to the disruptiveness of Pleshkevich’s other arguments: the alleged unscientific nature of a differentiated approach to the concept of “document”, the validity of pure theorizing regardless of practice, and the denial of Paul Otlet’s methodology. The latter thought that there was an overdue need to develop a general theory of the document similar to that in biology, sociology, physics and the so-called big sciences. The author demonstrates the wrongfulness of abandoning the wider definition of the document by the International Standardization Organization (ISO). According to ISO, the document is an object that may be regarded as an element of a documented quality management system. The practical effectiveness of the method of rising from the abstract to the concrete, which enables narrowing the concept of document down to the needs of individual libraries, is substantiated.
39

Younas, Junaid, Shoaib Ahmed Siddiqui, Mohsin Munir, Muhammad Imran Malik, Faisal Shafait, Paul Lukowicz, and Sheraz Ahmed. "Fi-Fo Detector: Figure and Formula Detection Using Deformable Networks". Applied Sciences 10, no. 18 (September 16, 2020): 6460. http://dx.doi.org/10.3390/app10186460.

Abstract:
We propose a novel hybrid approach that fuses traditional computer vision techniques with deep learning models to detect figures and formulas from document images. The proposed approach first fuses the different computer vision based image representations, i.e., color transform, connected component analysis, and distance transform, termed as Fi-Fo image representation. The Fi-Fo image representation is then fed to deep models for further refined representation-learning for detecting figures and formulas from document images. The proposed approach is evaluated on a publicly available ICDAR-2017 Page Object Detection (POD) dataset and its corrected version. It produces the state-of-the-art results for formula and figure detection in document images with an f1-score of 0.954 and 0.922, respectively. Ablation study results reveal that the Fi-Fo image representation helps in achieving superior performance in comparison to raw image representation. Results also establish that the hybrid approach helps deep models to learn more discriminating and refined features.
40

Kilimci, Zeynep H., and Selim Akyokus. "Deep Learning- and Word Embedding-Based Heterogeneous Classifier Ensembles for Text Classification". Complexity 2018 (October 9, 2018): 1–10. http://dx.doi.org/10.1155/2018/7130146.

Abstract:
The use of ensemble learning, deep learning, and effective document representation methods is currently among the most common trends for improving the overall accuracy of a text classification/categorization system. Ensemble learning is an approach to raise the overall accuracy of a classification system by utilizing multiple classifiers. Deep learning-based methods provide better results in many applications when compared with the other conventional machine learning algorithms. Word embeddings enable representation of words learned from a corpus as vectors, such that words with similar meanings have similar representations. In this study, we use different document representations with the benefit of word embeddings and an ensemble of base classifiers for text classification. The ensemble of base classifiers includes traditional machine learning algorithms such as naïve Bayes, support vector machine, and random forest and a deep learning-based conventional network classifier. We analysed the classification accuracy of different document representations by employing an ensemble of classifiers on eight different datasets. Experimental results demonstrate that the usage of heterogeneous ensembles together with deep learning methods and word embeddings enhances the classification performance of texts.
41

Putra, Fredi Ganda, Dewi Lengkana, Sugeng Sutiarso, Nurhanurawati Nurhanurawati, Antomi Saregar, Rahma Diani, Santi Widyawati, Suparman Suparman, Khoirunnisa Imama, and Rofiqul Umam. "Mathematical representation: A bibliometric mapping of the research literature (2013–2022)". Infinity Journal 13, no. 1 (October 8, 2023): 1–26. http://dx.doi.org/10.22460/infinity.v13i1.p1-26.

Abstract:
Mathematical representation ability is an essential skill for students to understand mathematical concepts. Many studies have been conducted regarding this ability, but it is necessary to map existing research to provide a clearer picture of future research topics. This study aims to provide a bibliometric review of trends using mathematical representation skills in mathematics teaching research. The method in this study is bibliometric analysis, which aims to analyze and classify bibliographic material by presenting representative summaries of the literature in the Scopus database. The search was carried out using the keyword "mathematical representation" and selecting "article title" in the search menu in the Scopus.com database. Publish or Perish (PoP) software analyzes the author's name, number of document citations, document title, year of publication, document source, publisher, and document type. The results showed 99 publications and 357 citations related to mathematical representations, where the number of publications and citations fluctuated. The application of learning models and approaches, computer media, and analysis of mathematical representations is a research trend related to this variable. Therefore, paying attention to mathematical representations in learning mathematics and using effective strategies to improve students' mathematical representation abilities is essential. The findings of this study indicate the need to develop syntax and learning media based on mathematical representations to strengthen students' mathematical abilities.
42

Feng, Kai, Lan Huang, Hao Xu, Kangping Wang, Wei Wei, and Rui Zhang. "Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval". Entropy 24, no. 7 (July 7, 2022): 943. http://dx.doi.org/10.3390/e24070943.

Abstract:
Cross-lingual document retrieval, which aims to take a query in one language to retrieve relevant documents in another, has attracted strong research interest in the last decades. Most studies on this task start with cross-lingual comparisons at the word level and then represent documents via word embeddings, which leads to insufficient structure information. In this work, the cross-lingual comparison at the document level is achieved through the cross-lingual semantic space. Our method, MDL (deep multilabel multilingual document learning), leverages a six-layer fully connected network to project cross-lingual documents into a shared semantic space. The semantic distances can be calculated when the cross-lingual documents are transformed into embeddings in semantic space. The supervision signals are automatically extracted from the data and then used to construct the semantic space via a linear classifier. The ambiguity of manual labels could be avoided and the multilabel supervision signals can be acquired instead of a single label. The representation of the semantic space is enriched by multilabel supervision signals, which improves the discriminative ability of the embeddings. The MDL is easy to extend to other fields since it does not depend on specific data. Furthermore, MDL is more efficient than the models training all languages jointly, since each language is trained individually. Experiments on Wikipedia data showed that the proposed method outperforms the state-of-the-art cross-lingual document retrieval methods.
43

Xu, Shusheng, Xingxing Zhang, Yi Wu, and Furu Wei. "Sequence Level Contrastive Learning for Text Summarization". Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 11556–65. http://dx.doi.org/10.1609/aaai.v36i10.21409.

Abstract:
Contrastive learning models have achieved great success in unsupervised visual representation learning; they maximize the similarities between feature representations of different views of the same image while minimizing the similarities between feature representations of views of different images. In text summarization, the output summary is a shorter form of the input document and they have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary and its model generated summaries as different views of the same mean representation and maximize the similarities between them during training. We improve over a strong sequence-to-sequence text generation model (i.e., BART) on three different summarization datasets. Human evaluation also shows that our model achieves better faithfulness ratings compared to its counterpart without contrastive objectives. We release our code at https://github.com/xssstory/SeqCo.
44

Apandi, Siti Hawa, Jamaludin Sallim, Rozlina Mohamed, and Norkhairi Ahmad. "Automatic Topic-Based Web Page Classification Using Deep Learning". JOIV : International Journal on Informatics Visualization 7, no. 3-2 (November 30, 2023): 2108. http://dx.doi.org/10.30630/joiv.7.3-2.1616.

Abstract:
People frequently surf the internet using smartphones, laptops, or computers to search for information on the web. The growth of information on the web has made the number of web pages grow day by day. Automatic topic-based web page classification is used to manage the excessive number of web pages by classifying them into different categories based on the web page content. Different machine learning algorithms have been employed as web page classifiers to categorise the web pages. However, there is a lack of studies that review the classification of web pages using deep learning. In this study, the automatic topic-based classification of web pages utilising deep learning, as proposed by many key researchers, is reviewed. The relevant research papers are selected from reputable research databases. The review process looked at the dataset, features, algorithm, and pre-processing used in the classification of web pages, the document representation technique, and the performance of the web page classification model. The document representation technique used to represent the web page features is an important aspect in the classification of web pages as it affects the performance of the web page classification model. The integral web page feature is the textual content. Based on the review, it was found that image-based web page classification showed higher performance compared to text-based web page classification. Due to the lack of a matrix representation that can effectively handle long web page text content, a new document representation technique, the word cloud image, can be used to visualize the words that have been extracted from the text content of the web page.
45

Buckley, Charles, and Chrissi Nerantzi. "Effective Use of Visual Representation in Research and Teaching within Higher Education". International Journal of Management and Applied Research 7, no. 3 (September 4, 2020): 196–214. http://dx.doi.org/10.18646/2056.73.20-014.

Abstract:
There are now increasing opportunities for educators to use creative forms of visual representation in their professional practice. Despite the potential for increasing researcher and teacher understanding and student engagement and learning through the proliferation of visual material, the rationale and deliberate planning of using images remain relatively unexplored. The potential benefits to learners through the incorporation of visual representation on its own or with text are well-documented, although the ways in which it can be used effectively are less well-established. This paper provides an introduction to some of the research into using visual representation in research, teaching and learning within higher education. It draws on examples from the authors’ own practice to provide insights into a selection of ways in which visual representation might be used, such as generative/analytical techniques and communicative tools. The authors provide two examples of visualised frameworks and models that have been developed and used in the context of academic development; the use of simple relationship diagrams in learning and teaching and dissemination of practice; the use of diagrams to explain complex phenomena; and an example of using images juxtaposed with diagrams and text to present a case for professional teaching recognition.
46

Zhao, Yi, Yu Qiao, and Keqing He. "A Novel Tagging Augmented LDA Model for Clustering". International Journal of Web Services Research 16, no. 3 (July 2019): 59–77. http://dx.doi.org/10.4018/ijwsr.2019070104.

Abstract:
Clustering has become an increasingly important task in the analysis of large documents. Clustering aims to organize these documents, and facilitate better search and knowledge extraction. Most existing clustering methods that use user-generated tags only consider their positive influence for improving automatic clustering performance. The authors argue that not all user-generated tags can provide useful information for clustering. In this article, the authors propose a new solution for clustering, named HRT-LDA (High Representation Tags Latent Dirichlet Allocation), which considers the effects of different tags on clustering performance. For this, the authors perform a tag filtering strategy and a tag appending strategy based on transfer learning, Word2vec, TF-IDF and semantic computing. Extensive experiments on real-world datasets demonstrate that HRT-LDA outperforms the state-of-the-art tagging augmented LDA methods for clustering.
47

Peng, Liwen, Siqi Shen, Jun Xu, Yongquan Fu, Dongsheng Li, and Adele Lu Jia. "Diting: An Author Disambiguation Method Based on Network Representation Learning". IEEE Access 7 (2019): 135539–55. http://dx.doi.org/10.1109/access.2019.2942477.

48

Wei, Yang, Jinmao Wei, and Zhenglu Yang. "Unsupervised learning of semantic representation for documents with the law of total probability". Natural Language Engineering 24, no. 4 (November 2, 2017): 491–522. http://dx.doi.org/10.1017/s1351324917000420.

Abstract:
The semantic information of documents needs to be represented because it is the basis for many applications, such as document summarization, web search, and text analysis. Although many studies have explored this problem by enriching document vectors with the relatedness of the words involved, the performance remains far from satisfactory because the physical boundaries of documents hinder the evaluation of the relatedness between words. To address this problem, we propose an effective approach to further infer the implicit relatedness between words via their common related words. To avoid overestimation of the implicit relatedness, we restrict the inference in terms of the marginal probabilities of the words based on the law of total probability. The proposed method measures the relatedness between words, which is confirmed theoretically and experimentally. Thorough evaluation on real datasets illustrates that significant improvement on document clustering has been achieved with the proposed method compared with state-of-the-art methods.
49

Al-Sabahi, Kamal, and Zhang Zuping. "Document Summarization Using Sentence-Level Semantic Based on Word Embeddings". International Journal of Software Engineering and Knowledge Engineering 29, no. 02 (February 2019): 177–96. http://dx.doi.org/10.1142/s0218194019500086.

Abstract:
In the era of information overload, text summarization has become a focus of attention in a number of diverse fields such as question answering systems, intelligence analysis, news recommendation systems, search results in web search engines, and so on. A good document representation is the key point in any successful summarizer. Learning this representation has become a very active research area in the natural language processing (NLP) field. Traditional approaches mostly fail to deliver a good representation. Word embeddings have shown excellent performance in learning the representation. In this paper, a modified BM25 with word embeddings is used to build the sentence vectors from word vectors. The entire document is represented as a set of sentence vectors. Then, the similarity between every pair of sentence vectors is computed. After that, TextRank, a graph-based model, is used to rank the sentences. The summary is generated by picking the top-ranked sentences according to the compression rate. Two well-known datasets, DUC2002 and DUC2004, are used to evaluate the models. The experimental results show that the proposed models perform comprehensively better compared to the state-of-the-art methods.
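The ranking step described in this abstract (pairwise sentence similarity, then TextRank, then selection by compression rate) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sentence vectors are taken as given rather than built with the modified BM25 weighting, cosine similarity is an assumed choice of similarity measure, and the function names `cosine`, `textrank`, and `summarize` as well as the damping factor and iteration count are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def textrank(vectors, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over a sentence-similarity graph."""
    n = len(vectors)
    if n == 0:
        return []
    # Edge weights: pairwise cosine similarity, clipped at zero, no self-loops.
    sim = [
        [max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            rank = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return scores


def summarize(sentences, vectors, compression_rate=0.3):
    """Keep the top-ranked share of sentences, returned in document order."""
    scores = textrank(vectors)
    k = max(1, round(len(sentences) * compression_rate))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, vectors, compression_rate=0.3)` with one vector per sentence returns roughly 30% of the sentences in their original order; any sentence-embedding method can supply the vectors.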
50

Bajner, Maria. "Lifelong Learning Redefined: From Sustainability to Generational Learning". Andragoška spoznanja 25, no. 3 (October 8, 2019): 35–45. http://dx.doi.org/10.4312/as.25.3.35-45.

Abstract:
The following paper is intended to give a brief account of the trends in lifelong learning as they appear in the official documents of UNESCO and OECD. It identifies the driving forces behind humanistic and utilitarian considerations in the opposing approaches of UNESCO and OECD, while it also addresses the role of political influencers in confusing the issues. The author uses document analysis of studies and findings of international surveys to shed light on the ambivalent stances in educational documents towards the importance of lifelong learning. The author will argue that a shift in rhetoric from lifelong learning to generational learning is needed in order to eliminate “doublespeak” and meet the needs of today's generations brought up often with utilitarian values and high economic expectations.
