
Journal articles on the topic "Multilingual information extraction"


Consult the 50 best journal articles for your research on the topic "Multilingual information extraction".


The full text of each publication can also be downloaded in PDF format, and its abstract can be read online whenever it is available in the metadata.

Explore journal articles from a wide variety of disciplines and organize your bibliography correctly.

1

Claro, Daniela Barreiro, Marlo Souza, Clarissa Castellã Xavier, and Leandro Oliveira. "Multilingual Open Information Extraction: Challenges and Opportunities". Information 10, no. 7 (July 2, 2019): 228. http://dx.doi.org/10.3390/info10070228.

Abstract
The number of documents published on the Web in languages other than English grows every year. As a consequence, the need to extract useful information from different languages increases, highlighting the importance of research into Open Information Extraction (OIE) techniques. Different OIE methods have dealt with features from a single language; however, few approaches tackle multilingual aspects. In those approaches, multilingualism is restricted to processing text in different languages, rather than exploring cross-linguistic resources, which results in low precision due to the use of general rules. Multilingual methods have been applied to numerous problems in Natural Language Processing, achieving satisfactory results and demonstrating that knowledge acquisition for a language can be transferred to other languages to improve the quality of the facts extracted. We argue that a multilingual approach can enhance OIE methods as it is ideal to evaluate and compare OIE systems, and therefore can be applied to the collected facts. In this work, we discuss how the transfer of knowledge between languages can increase acquisition in multilingual approaches. We provide a roadmap of the Multilingual Open IE area covering state-of-the-art studies. Additionally, we evaluate the transfer of knowledge to improve the quality of the facts extracted in each language. Moreover, we discuss the importance of a parallel corpus to evaluate and compare multilingual systems.
2

Khairova, Nina, Orken Mamyrbayev, Kuralay Mukhsina, Anastasiia Kolesnyk, and Saurabh Pratap. "Logical-linguistic model for multilingual Open Information Extraction". Cogent Engineering 7, no. 1 (January 1, 2020): 1714829. http://dx.doi.org/10.1080/23311916.2020.1714829.

3

Hashemzahde, Bahare, and Majid Abdolrazzagh-Nezhad. "Improving keyword extraction in multilingual texts". International Journal of Electrical and Computer Engineering (IJECE) 10, no. 6 (December 1, 2020): 5909. http://dx.doi.org/10.11591/ijece.v10i6.pp5909-5916.

Abstract
The accuracy of keyword extraction is a leading factor in information retrieval systems and marketing. In the real world, text is produced in a variety of languages, and the ability to extract keywords based on information from different languages improves the accuracy of keyword extraction. In this paper, the information available in all languages is applied to improve a traditional keyword extraction algorithm for multilingual text. The proposed keyword extraction procedure is an unsupervised algorithm designed to select a word as a keyword of a given text if, in addition to ranking highly in that language, it also holds a high rank according to the keyword criteria in the other languages. To achieve this aim, the average TF-IDF of the candidate words was calculated for the same language and for the other languages. Then the words with the higher average TF-IDF were chosen as the extracted keywords. The obtained results indicate that the accuracies of the term frequency-inverse document frequency (TF-IDF) algorithm, the graph-based algorithm, and the improved proposed algorithm on multilingual texts are 80%, 60.65%, and 91.3%, respectively.
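The ranking scheme described in this abstract, scoring a candidate by its average TF-IDF over the language versions of a text, can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation; the corpus variables, the translate helper, and the assumption that each document is available in several aligned language versions are hypothetical.

```python
# Minimal sketch (not the paper's code): rank the keyword candidates of a
# document by averaging their TF-IDF scores over several language versions.
# `corpora`, `doc_versions`, and `translate` are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer


def tfidf_scores(corpus, doc_index):
    """Return {term: tf-idf score} for one document within a monolingual corpus."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(corpus)
    terms = vectorizer.get_feature_names_out()
    row = matrix[doc_index].toarray().ravel()
    return dict(zip(terms, row))


def multilingual_keywords(doc_versions, corpora, translate, top_k=10):
    """doc_versions: {lang: text of the same document}; corpora: {lang: background docs};
    translate(term, src, tgt): maps a candidate term into another language."""
    scores = {}
    for lang, text in doc_versions.items():
        docs = corpora[lang] + [text]
        scores[lang] = tfidf_scores(docs, len(docs) - 1)

    base = next(iter(doc_versions))            # language the keywords are reported in
    ranked = []
    for term, base_score in scores[base].items():
        others = [scores[lang].get(translate(term, base, lang), 0.0)
                  for lang in doc_versions if lang != base]
        ranked.append((term, (base_score + sum(others)) / (1 + len(others))))
    return sorted(ranked, key=lambda item: item[1], reverse=True)[:top_k]
```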
4

Vasilkovsky, Michael, Anton Alekseev, Valentin Malykh, Ilya Shenbin, Elena Tutubalina, Dmitriy Salikhov, Mikhail Stepnov, Andrey Chertok, and Sergey Nikolenko. "DetIE: Multilingual Open Information Extraction Inspired by Object Detection". Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 11412–20. http://dx.doi.org/10.1609/aaai.v36i10.21393.

Abstract
State of the art neural methods for open information extraction (OpenIE) usually extract triplets (or tuples) iteratively in an autoregressive or predicate-based manner in order not to produce duplicates. In this work, we propose a different approach to the problem that can be equally or more successful. Namely, we present a novel single-pass method for OpenIE inspired by object detection algorithms from computer vision. We use an order-agnostic loss based on bipartite matching that forces unique predictions and a Transformer-based encoder-only architecture for sequence labeling. The proposed approach is faster and shows superior or similar performance in comparison with state of the art models on standard benchmarks in terms of both quality metrics and inference time. Our model sets the new state of the art performance of 67.7% F1 on CaRB evaluated as OIE2016 while being 3.35x faster at inference than previous state of the art. We also evaluate the multilingual version of our model in the zero-shot setting for two languages and introduce a strategy for generating synthetic multilingual data to fine-tune the model for each specific language. In this setting, we show performance improvement of 15% on multilingual Re-OIE2016, reaching 75% F1 for both Portuguese and Spanish languages. Code and models are available at https://github.com/sberbank-ai/DetIE.
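The order-agnostic loss mentioned above relies on a bipartite matching between predicted and gold triplets, so the training signal does not depend on the order in which triplets are produced. The sketch below illustrates that matching step with a toy cost function and SciPy's Hungarian-algorithm solver; it is only an illustration of the general idea, not the authors' DetIE code (which is available at the repository cited in the abstract).

```python
# Sketch of an order-agnostic matching step for OpenIE triplets: predictions are
# assigned one-to-one to gold triplets with minimal total cost, so the loss does
# not depend on the order in which the model emits its triplets.
import numpy as np
from scipy.optimize import linear_sum_assignment


def triplet_cost(pred, gold):
    """Toy cost: fraction of mismatched slots (subject, relation, object)."""
    return sum(p != g for p, g in zip(pred, gold)) / 3.0


def match_triplets(predictions, gold):
    cost = np.array([[triplet_cost(p, g) for g in gold] for p in predictions])
    rows, cols = linear_sum_assignment(cost)            # Hungarian algorithm
    pairs = [(int(r), int(c)) for r, c in zip(rows, cols)]
    return pairs, float(cost[rows, cols].sum())


preds = [("Berlin", "is capital of", "Germany"), ("Germany", "is in", "Europe")]
golds = [("Germany", "is located in", "Europe"), ("Berlin", "is capital of", "Germany")]
pairs, total_cost = match_triplets(preds, golds)
print(pairs, total_cost)    # [(0, 1), (1, 0)] and the residual mismatch cost
```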
5

Ghimire, Dadhi Ram, Sanjeev Panday, and Aman Shakya. "Information Extraction from a Large Knowledge Graph in the Nepali Language". National College of Computer Studies Research Journal 3, no. 1 (December 9, 2024): 33–49. https://doi.org/10.3126/nccsrj.v3i1.72336.

Abstract
Information is abundant on the web. A knowledge graph is used to organize information in a structured format that can be retrieved using specialized queries. There are many knowledge graphs, but they differ in their ontologies and taxonomies as well as in the property types that bind the relations between entities, which creates problems when extracting knowledge from them. Multilingual support is also an issue: while most knowledge graphs claim to be multilingual, they are more suitable for querying in the English language. Most existing knowledge graphs are based on the Wikipedia infobox. In this work, we have devised an information extraction pipeline for retrieving knowledge in the Nepali language from Wikidata using its SPARQL endpoint. Queries based on the Wikipedia infobox have more accurate responses than queries based on the paragraph content of Wikipedia articles. The main reason is that the information inside the paragraphs is not properly linked in the Wikipedia infobox.
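As an illustration of the kind of request such a pipeline might send to the Wikidata SPARQL endpoint, the sketch below looks up items by an English label and retrieves their Nepali ("ne") labels when present. The endpoint URL and the rdfs:label predicate are standard Wikidata facilities; the specific query, search term, and user-agent string are assumed examples rather than the authors' code.

```python
# Minimal sketch: ask the public Wikidata SPARQL endpoint for items whose English
# label is "Mount Everest" and print their Nepali labels, if any exist.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?item ?neLabel WHERE {
  ?item rdfs:label "Mount Everest"@en .
  OPTIONAL { ?item rdfs:label ?neLabel . FILTER(LANG(?neLabel) = "ne") }
}
LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "nepali-ie-example/0.1 (illustrative)"},
)
for row in response.json()["results"]["bindings"]:
    nepali = row.get("neLabel", {}).get("value", "(no Nepali label)")
    print(row["item"]["value"], nepali)
```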
6

Azzam, Saliha, Kevin Humphreys, Robert Gaizauskas, and Yorick Wilks. "Using a language independent domain model for multilingual information extraction". Applied Artificial Intelligence 13, no. 7 (October 1999): 705–24. http://dx.doi.org/10.1080/088395199117252.

7

Seretan, Violeta, and Eric Wehrli. "Multilingual collocation extraction with a syntactic parser". Language Resources and Evaluation 43, no. 1 (October 1, 2008): 71–85. http://dx.doi.org/10.1007/s10579-008-9075-7.

8

Zhang, Ruijuan. "Multilingual pretrained based multi-feature fusion model for English text classification". Computer Science and Information Systems, no. 00 (2025): 4. https://doi.org/10.2298/csis240630004z.

Abstract
Deep learning methods have been widely applied to English text classification tasks in recent years, achieving strong performance. However, current methods face two significant challenges: (1) they struggle to effectively capture long-range contextual structure information within text sequences, and (2) they do not adequately integrate linguistic knowledge into representations for enhancing the performance of classifiers. To this end, a novel multilingual pre-training based multi-feature fusion method is proposed for English text classification (MFFMP-ETC). Specifically, MFFMP-ETC consists of multilingual feature extraction, multilevel structure learning, and multi-view representation fusion. MFFMP-ETC utilizes Multilingual BERT as a deep semantic extractor to introduce language information into representation learning, which significantly endows text representations with robustness. Then, MFFMP-ETC integrates Bi-LSTM and TextCNN into the multilingual pre-training architecture to capture global and local structure information of English texts, via modelling bidirectional contextual semantic dependencies and multi-granularity local semantic dependencies. Meanwhile, MFFMP-ETC devises multi-view representation fusion within the invariant semantic learning of representations to aggregate consistent and complementary information among views. MFFMP-ETC synergistically integrates Multilingual BERT's deep semantic features, Bi-LSTM's bidirectional context processing, and TextCNN's local feature extraction, offering a more comprehensive and effective solution for capturing long-distance dependencies and nuanced contextual information in text classification. Finally, results on three datasets show that MFFMP-ETC sets a new baseline in terms of accuracy, sensitivity, and precision, verifying the progressiveness and effectiveness of MFFMP-ETC in text classification.
9

Danielsson, Pernilla. "Automatic extraction of meaningful units from corpora". International Journal of Corpus Linguistics 8, no. 1 (August 14, 2003): 109–27. http://dx.doi.org/10.1075/ijcl.8.1.06dan.

Abstract
In this article, we will reconsider the notion of a word as the basic unit of analysis in language and propose that in an information and meaning carrying system the unit of analysis should be a unit of meaning (UM). Such a UM may consist of one or more words. A method will be promoted that attempts to automatically retrieve UMs from corpora. To illustrate the results that may be obtained by this method, the node word ‘stroke’ will be used in a small study. The results will be discussed, with implications considered for both monolingual and multilingual use. The monolingual study will benefit from using the British National Corpus, while the multilingual study introduces a parallel corpus consisting of Swedish novels and their translations into English.
10

Aysa, Anwar, Mijit Ablimit, Hankiz Yilahun, and Askar Hamdulla. "Chinese-Uyghur Bilingual Lexicon Extraction Based on Weak Supervision". Information 13, no. 4 (March 31, 2022): 175. http://dx.doi.org/10.3390/info13040175.

Abstract
Bilingual lexicon extraction is useful, especially for low-resource languages that can leverage from high-resource languages. The Uyghur language is a derivative language, and its language resources are scarce and noisy. Moreover, it is difficult to find a bilingual resource to utilize the linguistic knowledge of other large resource languages, such as Chinese or English. There is little related research on unsupervised extraction for the Chinese-Uyghur languages, and the existing methods mainly focus on term extraction methods based on translated parallel corpora. Accordingly, unsupervised knowledge extraction methods are effective, especially for the low-resource languages. This paper proposes a method to extract a Chinese-Uyghur bilingual dictionary by combining the inter-word relationship matrix mapped by the neural network cross-language word embedding vector. A seed dictionary is used as a weak supervision signal. A small Chinese-Uyghur parallel data resource is used to map the multilingual word vectors into a unified vector space. As the word-particles of these two languages are not well-coordinated, stems are used as the main linguistic particles. The strong inter-word semantic relationship of word vectors is used to associate Chinese-Uyghur semantic information. Two retrieval indicators, such as nearest neighbor retrieval and cross-domain similarity local scaling, are used to calculate similarity to extract bilingual dictionaries. The experimental results show that the accuracy of the Chinese-Uyghur bilingual dictionary extraction method proposed in this paper is improved to 65.06%. This method helps to improve Chinese-Uyghur machine translation, automatic knowledge extraction, and multilingual translations.
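Cross-domain similarity local scaling (CSLS), one of the two retrieval indicators named in the abstract, rescales cosine similarity by each word's average similarity to its nearest neighbours on the other side, which reduces hubness in nearest-neighbour retrieval. The NumPy sketch below assumes the two embedding matrices have already been mapped into a shared space and row-normalised; it is a generic illustration of CSLS, not the paper's implementation.

```python
# Sketch of CSLS retrieval between mapped, L2-normalised embedding matrices.
# X: source-language embeddings (n_src x d); Y: target-language embeddings (n_tgt x d).
import numpy as np


def csls_retrieve(X, Y, k=10):
    sims = X @ Y.T                                        # cosine similarities (normalised rows)
    # Mean similarity of each vector to its k nearest neighbours on the other side.
    r_src = np.sort(sims, axis=1)[:, -k:].mean(axis=1)    # one value per source word
    r_tgt = np.sort(sims, axis=0)[-k:, :].mean(axis=0)    # one value per target word
    csls = 2 * sims - r_src[:, None] - r_tgt[None, :]
    return csls.argmax(axis=1)                            # best target index per source word


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4)); X /= np.linalg.norm(X, axis=1, keepdims=True)
Y = rng.normal(size=(7, 4)); Y /= np.linalg.norm(Y, axis=1, keepdims=True)
print(csls_retrieve(X, Y, k=3))   # indices of retrieved translation candidates
```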
11

M. Sapkal, Kunal, Vinayak R. Gharge, Sanika J. Kadam, Rutuja N. Mulik, and Prof. Vaibhav U. Bhosale. "VEDA-VISION GPT-AN AI-POWERED MULTILINGUAL DOCUMENT PROCESSING AND INTERACTION PLATFORM". International Journal of Engineering Applied Sciences and Technology 09, no. 05 (September 1, 2024): 129–34. http://dx.doi.org/10.33564/ijeast.2024.v09i05.015.

Abstract
VEDA-VISIONGPT innovates multilingual document handling by unifying text recognition, language translation, and retrieval-augmented generation with interactive AI. This groundbreaking platform extracts content from all types of documents in diverse Indian language sources, renders translations across more than 15 native tongues, and facilitates natural language queries. Employing advanced OCR and AI technologies, it offers comprehensive multilingual document management. Enhanced text analytics enable clear, logical information extraction. Aimed at government, legal, and academic sectors requiring precise language processing, VEDA-VISIONGPT revolutionizes cross-lingual information access and comprehension, transforming how diverse linguistic communities engage with content.
12

Frazier, Stefan. "Meaningful texts: the extraction of semantic information from monolingual and multilingual corpora". International Journal of Bilingual Education and Bilingualism 12, no. 4 (July 2009): 489–92. http://dx.doi.org/10.1080/13670050802149432.

13

Asgari-Bidhendi, Majid, Mehrdad Nasser, Behrooz Janfada, and Behrouz Minaei-Bidgoli. "PERLEX: A Bilingual Persian-English Gold Dataset for Relation Extraction". Scientific Programming 2021 (March 16, 2021): 1–8. http://dx.doi.org/10.1155/2021/8893270.

Abstract
Relation extraction is the task of extracting semantic relations between entities in a sentence. It is an essential part of some natural language processing tasks such as information extraction, knowledge extraction, question answering, and knowledge base population. The main motivations of this research stem from a lack of a dataset for relation extraction in the Persian language as well as the necessity of extracting knowledge from the growing big data in the Persian language for different applications. In this paper, we present “PERLEX” as the first Persian dataset for relation extraction, which is an expert-translated version of the “SemEval-2010-Task-8” dataset. Moreover, this paper addresses Persian relation extraction utilizing state-of-the-art language-agnostic algorithms. We employ six different models for relation extraction on the proposed bilingual dataset, including a non-neural model (as the baseline), three neural models, and two deep learning models fed by multilingual BERT contextual word representations. The experiments result in the maximum F1-score of 77.66% (provided by BERTEM-MTB method) as the state of the art of relation extraction in the Persian language.
14

Macken, Lieve, Els Lefever, and Véronique Hoste. "TExSIS". Terminology 19, no. 1 (April 29, 2013): 1–30. http://dx.doi.org/10.1075/term.19.1.01mac.

Abstract
We report on TExSIS, a flexible bilingual terminology extraction system that uses a sophisticated chunk-based alignment method for the generation of candidate terms, after which the specificity of the candidate terms is determined by combining several statistical filters. Although the set-up of the architecture is largely language-independent, we present terminology extraction results for four different languages and three language pairs. Gold standard data sets were created for French-Italian, French-English and French-Dutch, which allowed us not only to evaluate precision, which is common practice, but also recall. We compared the TExSIS approach, which takes a multilingual perspective from the start, with the more commonly used approach of first identifying term candidates monolingually and then aligning the source and target terms. A comparison of our system with the LUIZ approach described by Vintar (2010) reveals that TExSIS outperforms LUIZ both for monolingual and bilingual terminology extraction. Our results also clearly show that the precision of the alignment is crucial for the success of the terminology extraction. Furthermore, based on the observation that the precision scores for bilingual terminology extraction outperform those of the monolingual systems, we conclude that multilingual evidence helps to determine unithood in less related languages.
15

Stvan, Laurel Smith. "Book Review: Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora". Discourse Studies 8, no. 2 (April 2006): 330–31. http://dx.doi.org/10.1177/146144560600800209.

16

Steinberger, Ralf, Sylvia Ombuya, Mijail Kabadjov, Bruno Pouliquen, Leo Della Rocca, Jenya Belyaeva, Monica de Paola, Camelia Ignat, and Erik van der Goot. "Expanding a multilingual media monitoring and information extraction tool to a new language: Swahili". Language Resources and Evaluation 45, no. 3 (July 6, 2011): 311–30. http://dx.doi.org/10.1007/s10579-011-9155-y.

17

Thenarasi V. "Multilingual Handwritten Recognition using DNN". Communications on Applied Nonlinear Analysis 31, no. 6s (August 15, 2024): 367–79. http://dx.doi.org/10.52783/cana.v31.1229.

Abstract
Handwriting has continued to persist as a means of communication and recording information in day-to-day life even with the introduction of new technologies. Given its ubiquity in human transactions, machine recognition of handwriting has practical significance, as in reading handwritten notes in a Personal Digital Assistant (PDA), postal addresses on envelopes, amounts in bank cheques, handwritten fields in forms, etc. However, a script-independent methodology for multilingual Offline Handwriting Recognition (OHR) is a very difficult task, since the languages involved have different characters and words, and prediction with a script-independent methodology reduces the accuracy rate of the OHR method. To overcome this problem, a new OHR system for Tamil and English that transduces handwriting into electronic data is proposed. It focuses mainly on noise removal and on word and character segmentation methods with a higher recognition rate. Scanned images may also contain noise; the image denoising step consists of binarization, noise elimination, and size normalization. Word and character segmentation is performed using the Particle Swarm Optimization (PSO) algorithm. The segmented samples are then used for the next step, feature extraction. Finally, word recognition is performed using a deep neural network classifier.
18

Chen, Liang-Hua, and Chih-Wen Su. "Video Caption Extraction Using Spatio-Temporal Slices". International Journal of Image and Graphics 18, no. 02 (April 2018): 1850009. http://dx.doi.org/10.1142/s0219467818500092.

Abstract
Captions in videos play an important role for video indexing and retrieval. In this paper, we propose a novel algorithm to extract multilingual captions from video. Our approach is based on the analysis of spatio-temporal slices of video. If the horizontal (or vertical) scan line contains some pixels of caption region then the corresponding spatio-temporal slice will have bar-code like patterns. By integrating the structure information of bar-code like patterns in horizontal and vertical slices, the spatial and temporal positions of video captions can be located accurately. Experimental results show that the proposed algorithm is effective and outperforms some existing techniques.
19

Choi, Wonjun, Hwa-Mook Yoon, Mi-Hwan Hyun, Hye-Jin Lee, Jae-Wook Seol, Kangsan Dajeong Lee, Young Joon Yoon, and Hyesoo Kong. "Building an annotated corpus for automatic metadata extraction from multilingual journal article references". PLOS ONE 18, no. 1 (January 20, 2023): e0280637. http://dx.doi.org/10.1371/journal.pone.0280637.

Abstract
Bibliographic references containing citation information of academic literature play an important role as a medium connecting earlier and recent studies. As references contain machine-readable metadata such as author name, title, or publication year, they have been widely used in the field of citation information services including search services for scholarly information and research trend analysis. Many institutions around the world manually extract and continuously accumulate reference metadata to provide various scholarly services. However, manual collection of reference metadata every year continues to be a burden because of the associated cost and time consumption. With the accumulation of a large volume of academic literature, several tools, including GROBID and CERMINE, that automatically extract reference metadata have been released. However, these tools have some limitations. For example, they are only applicable to references written in English, the types of extractable metadata are limited for each tool, and the performance of the tools is insufficient to replace the manual extraction of reference metadata. Therefore, in this study, we focused on constructing a high-quality corpus to automatically extract metadata from multilingual journal article references. Using our constructed corpus, we trained and evaluated a BERT-based transfer-learning model. Furthermore, we compared the performance of the BERT-based model with that of the existing model, GROBID. Currently, our corpus contains 3,815,987 multilingual references, mainly in English and Korean, with labels for 13 different metadata types. According to our experiment, the BERT-based model trained using our corpus showed excellent performance in extracting metadata not only from journal references written in English but also in other languages, particularly Korean. This corpus is available at http://doi.org/10.23057/47.
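Metadata extraction from reference strings of this kind is usually framed as token-level sequence labelling. The sketch below shows how such a tagger could be assembled with the Hugging Face transformers library and a multilingual BERT checkpoint; the label set and the untrained classification head are illustrative assumptions, not the corpus authors' model.

```python
# Sketch: tag the tokens of a reference string with metadata labels (author,
# title, year, source, ...) using a multilingual BERT token classifier.
# The label list is an assumption and the classification head here is untrained;
# a usable tagger would first be fine-tuned on an annotated reference corpus.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

LABELS = ["O", "B-AUTHOR", "I-AUTHOR", "B-TITLE", "I-TITLE", "B-YEAR", "B-SOURCE", "I-SOURCE"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(LABELS)
)

reference = 'Choi, Wonjun et al. "Building an annotated corpus ...". PLOS ONE 18 (2023).'
inputs = tokenizer(reference, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                 # shape: (1, seq_len, num_labels)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, logits.argmax(-1)[0].tolist()):
    print(token, LABELS[label_id])                  # meaningful only after fine-tuning
```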
20

Alcantara, Tomas Humberto Montiel, David Krütli, Revathi Ravada, and Thomas Hanne. "Multilingual Text Summarization for German Texts Using Transformer Models". Information 14, no. 6 (May 25, 2023): 303. http://dx.doi.org/10.3390/info14060303.

Abstract
The tremendous increase in documents available on the Web has turned finding the relevant pieces of information into a challenging, tedious, and time-consuming activity. Text summarization is an important natural language processing (NLP) task used to reduce the reading requirements of text. Automatic text summarization is an NLP task that consists of creating a shorter version of a text document which is coherent and maintains the most relevant information of the original text. In recent years, automatic text summarization has received significant attention, as it can be applied to a wide range of applications such as the extraction of highlights from scientific papers or the generation of summaries of news articles. In this research project, we are focused mainly on abstractive text summarization that extracts the most important contents from a text in a rephrased form. The main purpose of this project is to summarize texts in German. Unfortunately, most pretrained models are only available for English. We therefore focused on the German BERT multilingual model and the BART monolingual model for English, with a consideration of translation possibilities. As the source for the experiment setup, we took the German Wikipedia article dataset and compared how well the multilingual model performed for German text summarization when compared to using machine-translated text summaries from monolingual English language models. We used the ROUGE-1 metric to analyze the quality of the text summarization.
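ROUGE-1, the metric used in this study, measures unigram overlap between a generated summary and a reference summary. A minimal self-contained computation might look like the sketch below; whitespace tokenisation and lowercasing are simplifying assumptions, and published ROUGE toolkits apply additional normalisation.

```python
# Minimal sketch of ROUGE-1: clipped unigram overlap between candidate and reference.
from collections import Counter


def rouge_1(candidate: str, reference: str):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())          # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


print(rouge_1("der hund schläft im garten", "der hund schläft draußen im garten"))
```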
21

Cho, Seongkuk, Jihoon Moon, Junhyeok Bae, Jiwon Kang, and Sangwook Lee. "A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach". Electronics 12, no. 4 (February 13, 2023): 939. http://dx.doi.org/10.3390/electronics12040939.

Abstract
The financial business process worldwide suffers from huge dependencies upon labor and written documents, thus making it tedious and time-consuming. In order to solve this problem, traditional robotic process automation (RPA) has recently been developed into a hyper-automation solution by combining computer vision (CV) and natural language processing (NLP) methods. These solutions are capable of image analysis, such as key information extraction and document classification. However, they could improve on text-rich document images and require much training data for processing multilingual documents. This study proposes a multimodal approach-based intelligent document processing framework that combines a pre-trained deep learning model with traditional RPA used in banks to automate business processes from real-world financial document images. The proposed framework can perform classification and key information extraction on a small amount of training data and analyze multilingual documents. In order to evaluate the effectiveness of the proposed framework, extensive experiments were conducted using Korean financial document images. The experimental results show the superiority of the multimodal approach for understanding financial documents and demonstrate that adequate labeling can improve performance by up to about 15%.
22

Suhas D. Pachpande and Parag U. Bhalchandra. "MarQO: A query optimizer in multilingual environment for information retrieval in Marathi language". International Journal of Science and Research Archive 9, no. 2 (August 30, 2023): 986–96. http://dx.doi.org/10.30574/ijsra.2023.9.2.0712.

Abstract
Information retrieval is a crucial component of modern information systems. A significant portion of the vast amount of information stored worldwide is in local languages. While most information retrieval systems are designed primarily for English, there is a growing need for these systems to work with data in languages other than English. Cross Language Information Retrieval (CLIR) systems play a pivotal role in enabling information retrieval across multiple languages. However, these systems often face challenges due to ambiguities in query translation, impacting retrieval accuracy. This paper introduces "MarQO," a query optimizer designed to address these challenges in the context of Marathi language. MarQO employs a multi-stage approach, including lexical processing, extraction of multi-word terms, synonym addition, phrasal translations, utilization of word co-occurrence statistics, and more. By disambiguating query keyword translations, MarQO significantly improves the accuracy of translations, thereby leading to more relevant document retrieval results.
23

Mouhamad Kawas. "Unlocking Insights from Medical Texts: Leveraging Natural Language Processing for Information Extraction in Clinical Notes". Tuijin Jishu/Journal of Propulsion Technology 44, no. 3 (September 26, 2023): 1207–14. http://dx.doi.org/10.52783/tjjpt.v44.i3.451.

Abstract
This research delves into the intersection of scientific inquiry and knowledge extraction, focusing on the application of Natural Language Processing (NLP) techniques in the medical domain. The introduction sets the stage by emphasizing the significance and relevance of the research, articulating a well-defined research question, and outlining the paper's structure. The methodology section meticulously describes the research framework, research methods and techniques, data collection process, and data analysis approach, emphasizing transparency and rigor. The results and discussion section presents key findings from a study on the efficacy of a drug (Drug X) in reducing blood pressure compared to a placebo. Demographic data, blood pressure reduction over time, adverse events, unexpected findings, and comparisons to previous research are detailed. Implications for clinical practice, future research directions, and study limitations are also addressed. The study concludes by summarizing its key findings related to NLP's efficiency in medical information extraction and its broader implications for healthcare efficiency, medical research, and global health impact. The research objective is restated, highlighting the success of NLP techniques in extracting valuable medical information from diverse text sources. Future research areas, such as multilingual NLP, semantic understanding, ethical considerations, and clinical validation, are identified as avenues for further exploration.
24

Nagaraja, B. G., and H. S. Jayanna. "Multilingual Speaker Identification by Combining Evidence from LPR and Multitaper MFCC". Journal of Intelligent Systems 22, no. 3 (September 1, 2013): 241–51. http://dx.doi.org/10.1515/jisys-2013-0038.

Abstract
In this work, the significance of combining the evidence from multitaper mel-frequency cepstral coefficients (MFCC), linear prediction residual (LPR), and linear prediction residual phase (LPRP) features for multilingual speaker identification with the constraint of limited data condition is demonstrated. The LPR is derived from linear prediction analysis, and LPRP is obtained by dividing the LPR using its Hilbert envelope. The sine-weighted cepstrum estimators (SWCE) with six tapers are considered for multitaper MFCC feature extraction. The Gaussian mixture model–universal background model is used for modeling each speaker for different evidence. The evidence is then combined at scoring level to improve the performance. The monolingual, crosslingual, and multilingual speaker identification studies were conducted using 30 randomly selected speakers from the IITG multivariability speaker recognition database. The experimental results show that the combined evidence improves the performance by nearly 8–10% compared with individual evidence.
25

Wijonarko, Panji, and Amalia Zahra. "Spoken language identification on 4 Indonesian local languages using deep learning". Bulletin of Electrical Engineering and Informatics 11, no. 6 (December 1, 2022): 3288–93. http://dx.doi.org/10.11591/eei.v11i6.4166.

Abstract
Language identification is at the forefront of assistance in many applications, including multilingual speech systems, spoken language translation, multilingual speech recognition, and human-machine interaction via voice. The identification of Indonesian local languages using spoken language identification technology has enormous potential to advance tourism potential and digital content in Indonesia. The goal of this study is to identify four Indonesian local languages: Javanese, Sundanese, Minangkabau, and Buginese, utilizing deep learning classification techniques such as the artificial neural network (ANN), convolutional neural network (CNN), and long short-term memory (LSTM). The feature selected for audio data extraction is the mel-frequency cepstral coefficient (MFCC). The results showed that the LSTM model had the highest accuracy for each speech duration (3 s, 10 s, and 30 s), followed by the CNN and ANN models.
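The pipeline described above, MFCC features fed to a recurrent classifier over four language classes, could be sketched as follows. The file path, sampling rate, number of coefficients, and layer sizes are illustrative assumptions rather than the authors' exact configuration.

```python
# Sketch: MFCC feature extraction with librosa and a small LSTM classifier
# over four language classes (Javanese, Sundanese, Minangkabau, Buginese).
import librosa
import numpy as np
import tensorflow as tf


def extract_mfcc(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return mfcc.T.astype(np.float32)                          # (frames, n_mfcc)


model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 13)),      # variable-length MFCC sequences
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical usage: features = extract_mfcc("clip.wav"), then train with
# model.fit(...) on batches of padded MFCC sequences and integer language labels.
```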
26

Turki, Houcemeddine, Mohamed Ali Hadj Taieb, Thomas Shafee, Tiago Lubiana, Dariusz Jemielniak, Mohamed Ben Aouicha, Jose Emilio Labra Gayo, et al. "Representing COVID-19 information in collaborative knowledge graphs: The case of Wikidata". Semantic Web 13, no. 2 (February 3, 2022): 233–64. http://dx.doi.org/10.3233/sw-210444.

Abstract
Information related to the COVID-19 pandemic ranges from biological to bibliographic, from geographical to genetic and beyond. The structure of the raw data is highly complex, so converting it to meaningful insight requires data curation, integration, extraction and visualization, the global crowdsourcing of which provides both additional challenges and opportunities. Wikidata is an interdisciplinary, multilingual, open collaborative knowledge base of more than 90 million entities connected by well over a billion relationships. It acts as a web-scale platform for broader computer-supported cooperative work and linked open data, since it can be written to and queried in multiple ways in near real time by specialists, automated tools and the public. The main query language, SPARQL, is a semantic language used to retrieve and process information from databases saved in Resource Description Framework (RDF) format. Here, we introduce four aspects of Wikidata that enable it to serve as a knowledge base for general information on the COVID-19 pandemic: its flexible data model, its multilingual features, its alignment to multiple external databases, and its multidisciplinary organization. The rich knowledge graph created for COVID-19 in Wikidata can be visualized, explored, and analyzed for purposes like decision support as well as educational and scholarly research.
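One of the multilingual features mentioned above is that every Wikidata item carries labels in many languages that can be retrieved with a single SPARQL query. The sketch below assumes Q84263196 is the Wikidata item for COVID-19 (the disease) and prints a few of its labels; the query and language list are illustrative, not taken from the article.

```python
# Sketch: retrieve multilingual labels for one item from the Wikidata SPARQL endpoint.
# Q84263196 is assumed here to be the Wikidata item for COVID-19 (the disease).
import requests

QUERY = """
SELECT ?lang ?label WHERE {
  wd:Q84263196 rdfs:label ?label .
  BIND(LANG(?label) AS ?lang)
  FILTER(?lang IN ("en", "es", "ar", "zh", "sw"))
}
"""
response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "wikidata-multilingual-example/0.1 (illustrative)"},
)
for binding in response.json()["results"]["bindings"]:
    print(binding["lang"]["value"], binding["label"]["value"])
```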
27

Wen, Yonghua, Junjun Guo, Zhiqiang Yu, and Zhengtao Yu. "Chinese–Vietnamese Pseudo-Parallel Sentences Extraction Based on Image Information Fusion". Information 14, no. 5 (May 21, 2023): 298. http://dx.doi.org/10.3390/info14050298.

Abstract
Parallel sentences play a crucial role in various NLP tasks, particularly for cross-lingual tasks such as machine translation. However, due to the time-consuming and laborious nature of manual construction, many low-resource languages still suffer from a lack of large-scale parallel data. The objective of pseudo-parallel sentence extraction is to automatically identify sentence pairs in different languages that convey similar meanings. Earlier methods heavily relied on parallel data, which is unsuitable for low-resource scenarios. The current mainstream research direction is to use transfer learning or unsupervised learning based on cross-lingual word embeddings and multilingual pre-trained models; however, these methods are ineffective for languages with substantial differences. To address this issue, we propose a sentence extraction method that leverages image information fusion to extract Chinese–Vietnamese pseudo-parallel sentences from collections of bilingual texts. Our method first employs an adaptive image and text feature fusion strategy to efficiently extract the bilingual parallel sentence pair, and then, a multimodal fusion method is presented to balance the information between the image and text modalities. The experiments on multiple benchmarks show that our method achieves promising results compared to a competitive baseline by infusing additional external image information.
28

Ma, Xiaoyue, Siya Zhang, and Pengwei Zhao. "The effect of multilingual suggested tags on cross-language information tagging behaviour". Electronic Library 39, no. 2 (June 8, 2021): 318–36. http://dx.doi.org/10.1108/el-07-2020-0177.

Abstract
Purpose: Suggested tags are considered one of the critical factors affecting a user’s tagging behaviour. However, compared to the findings on suggested tags for the monolingual environment, there is still a lack of focused studies on tag suggestions for cross-language information. Therefore, this paper is concerned with annotation behaviour and psychological cognition in the cross-language environment when suggested tags are provided.
Design/methodology/approach: A cross-language tagging experiment was conducted to explore the impact of suggested tags on the tagging results and process. The descriptive statistics of tags, the sources and semantic relations of tags, as well as the user’s psychological cognition were all measured in the test.
Findings: The experimental results demonstrated that the multilingual suggested tags could bring some costs to a user’s tagging perception. Furthermore, the language factor of suggested tags led to different paths of tagging imitation (reflected by longer semantic mapping and imitation at the visual level) and different cognitive processes (topic extraction and inference process).
Originality/value: To the best of the authors’ knowledge, this study is one of the first to emphasize the effect of suggested tags during multilingual tagging. The findings will enrich the theories of user-information interaction in the cross-language environment and, in turn, provide practical implications for tag-based information system design.
29

Narayana, K. Lakshmi. "LANGUAGE DETECTION USING MACHINE LEARNING". International Journal of Scientific Research in Engineering and Management 07, no. 11 (November 1, 2023): 1–11. http://dx.doi.org/10.55041/ijsrem27383.

Abstract
This paper presents an innovative Machine Learning (ML) model for language detection that combines the power of logistic regression with a multimodal approach. The proposed model is designed to handle three types of inputs: sequential text data, files, and image representations. The proposed model offers a versatile and accurate solution for identifying languages across diverse data modalities. The model architecture employs logistic regression to enhance interpretability and feature extraction from each input modality. Trained on a comprehensive multilingual dataset, the model exhibits robust performance, showcasing its applicability to real-world scenarios. The model’s ability to process text, files, and images makes it well-suited for applications in content filtering, cross-modal information retrieval, and multilingual sentiment analysis. This research contributes to the advancement of language detection models by offering a unified solution for handling diverse input types.
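A minimal version of the logistic-regression text branch described in this abstract can be assembled with scikit-learn: character n-gram TF-IDF features feeding a logistic-regression classifier. The tiny training set below is purely illustrative; a real system would be trained on a large multilingual corpus.

```python
# Sketch: character n-gram features + logistic regression for language detection.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["this is a sentence in english",
         "ceci est une phrase en français",
         "esto es una frase en español",
         "dies ist ein satz auf deutsch"]
labels = ["en", "fr", "es", "de"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),  # character n-grams
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["une autre phrase courte"]))   # expected: ['fr']
```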
30

Sakhovskiy, Andrey Sergeyevich, and Elena Viktorovna Tutubalina. "Cross-lingual transfer learning in drug-related information extraction from user-generated texts". Proceedings of the Institute for System Programming of the RAS 33, no. 6 (2021): 217–28. http://dx.doi.org/10.15514/ispras-2021-33(6)-15.

Abstract
Aggregating knowledge about drug, disease, and drug reaction entities across a broader range of domains and languages is critical for information extraction (IE) applications. In this work, we present a fine-grained evaluation intended to understand the efficiency of multilingual BERT-based models for biomedical named entity recognition (NER) and multi-label sentence classification tasks. We investigate the role of transfer learning (TL) strategies between two English corpora and a novel annotated corpus of Russian reviews about drug therapy. Labels for sentences include health-related issues or their absence. The sentences with one are additionally labelled at the expression level to identify fine-grained subtypes such as drug names, drug indications, and drug reactions. Evaluation results demonstrate that BERT trained on Russian and English raw reviews (5M in total) shows the best transfer capabilities on evaluation of adverse drug reactions on Russian data. The macro F1 score of 74.85% in the NER task was achieved by our RuDR-BERT model. For the classification task, our EnRuDR-BERT model achieves the macro F1 score of 70%, gaining 8.64% over the score of a general domain BERT model.
31

Riasat, Maria. "Joint Entity and Relation Extraction Using Machine Reading Comprehension for Urdu". American Journal of Computer Science and Technology 7, no. 3 (September 26, 2024): 104–14. http://dx.doi.org/10.11648/j.ajcst.20240703.15.

Abstract
Joint Entity and Relation Extraction (JERE) plays an important role in natural language processing (NLP) by identifying names, locations, and the relationships among them from unstructured text. Despite extensive research in languages like English, JERE poses significant challenges in low-resource languages, particularly Urdu, due to limited annotated data and inherent linguistic complexities. In this paper, we propose a novel Machine Reading Comprehension (MRC)-based approach that effectively addresses the JERE task for Urdu, integrating a text encoder and a question-answering module that work synergistically to enhance entity and relationship extraction. We introduce an annotated Urdu JERE dataset and demonstrate how our methodology will significantly contribute to multilingual NLP efforts. We propose an innovative Machine Reading Comprehension (MRC)-based method to tackle JERE in Urdu. This method has two main components: a text encoder and a question answering (QA) module. The text encoder converts Urdu text into a compact vector form, which is then fed into the QA module. The QA module generates answers to queries regarding the desired entities and relationships, producing a sequence of tokens that represent these entities and their interactions. The model is trained to minimize the difference between its predicted answers and the correct ones. Our approach, along with the introduction of an annotated Urdu JERE dataset, significantly advances multilingual NLP and information extraction research. The insights gained can be applied to other low-resource languages, aiding in the development of NLP tools and applications for a broader array of languages.
32

Gupta, Vaibhav. "Keyword-Based Exploration of Library Resources". International Journal of Scientific Research in Engineering and Management 09, no. 01 (January 17, 2025): 1–9. https://doi.org/10.55041/ijsrem40835.

Abstract
The project "Keyword-Based Exploration of Library Resources" addresses the challenges associated with accessing and discovering academic resources efficiently. Traditional systems often suffer from limitations such as inadequate multilingual support, poor metadata utilization, and restricted filtering capabilities, which hinder users from locating relevant research materials effectively. This project proposes an innovative solution leveraging Artificial Intelligence (AI) and Natural Language Processing (NLP) techniques to enhance search capabilities and inclusivity. The system incorporates:
• Multilingual Search: enabling users to perform queries in various languages using translation APIs.
• Advanced Filtering Options: allowing searches to be refined by author, publication year, journal, and more.
• AI-Powered Metadata Extraction: utilizing Optical Character Recognition (OCR) and NLP to extract and catalogue metadata like keywords, authors, and publication years.
The proposed system is built on a Python backend using Flask for API integration and MyAWS CLOUD for secure data storage. By integrating robust search mechanisms and user-friendly design, the project contributes to Sustainable Development Goal 4 (Quality Education), fostering global accessibility to knowledge and academic research. The outcomes of this project are anticipated to significantly improve resource discoverability, inclusivity, and precision, addressing the needs of diverse academic communities.
Index terms: Keyword Search, Library Resource Management, Information Retrieval, Digital Libraries, Metadata Extraction, Search Optimization, Natural Language Processing (NLP), Database Searching, Search Algorithms, Document Retrieval Systems, Academic Research Tools.
33

Monti, Johanna, Maria Pia Di Buono, Giulia Speranza, Maria Centrella, and Andrea De Carlo. "Le projet Archaeo-Term : premiers résultats". Traduction et Langues 21, no. 1 (August 31, 2022): 121–36. http://dx.doi.org/10.52919/translang.v21i1.875.

Abstract
The Project Archaeo-Term: Initial Results This article aims at describing the objectives, the theoretical and methodological background, the development, and the first results of the Archaeo-Term project of the University of Naples "L'Orientale", Department of Literary, Linguistic and Comparative Studies. The Archaeo-Term project has been developed within the YourTermCULT project promoted by the Terminology Without Borders Project of the Terminology Coordination Unit (TermCoord) of the European Parliament - Directorate-General for Translation (DGT) specifically for collecting terminology in different aspects related to culture. The aim of the Archaeo-Term project is to enhance the access to the archaeological data in several formats and languages. It represents a common effort to contribute to the creation of linguistic and terminological resources for the domain of Cultural Heritage (CH) and, in particular, for the sub-domain of archaeology, which is notably highly complex and fragmented. One of the first results of the Archaeo-Term project is the creation of a multilingual terminological resource for the domain of archaeology, which can be conveniently employed in different Natural Language Processing (NLP) tasks, including Machine Translation (MT). The first version of the Archaeo-Term multilingual terminological resource is available in 5 languages: Italian, English, Spanish, German, and Dutch and is publicly accessible online. With the objective of promoting a common and shared termbase across different languages, the Archaeo-Term terminological resource is addressed not only to a specialized audience such as experts in the field of archaeology but also as terminological support for translators and interpreters during their professional practice, as well as for a more general audience. The terminological resource is the result of an extraction and aggregation process carried out starting from two already existing thesauri: the Italian "Thesaurus per la definizione dei reperti archeologici" developed by the Italian Central Institute for Catalogue and Documentation (Istituto Centrale per il Catalogo e la Documentazione - ICCD) and the multilingual Art and Architecture Thesaurus (AAT) developed by the Getty Research Institute, which is among the most trustworthy and accurate resources in the domain of Cultural Heritage. Taking advantage of the Semantic Web formalisms applied to these terminological resources, we are able to extract and merge information from the aforementioned thesauri using SPARQL queries. Indeed, we run different queries against the SPARQL endpoint to enrich our multilingual terminological resource by extracting useful information about the different terminological entries. The information extracted and merged from these thesauri by means of several consecutive queries is as follows: the equivalent terms in the foreseen languages, the alternative terms and the plural forms, the domains and sub-domains, the definitions of the terms, and their sources. Furthermore, the extraction phase has been followed by an evaluation step aimed at checking missing information, verifying and adjusting possible misalignments among entries, and setting potential future implementations. 
As an ongoing project, we are also planning to enlarge the terminological resource with equivalent terms in other languages such as French, Swedish, Polish, Russian, and Chinese, with the aim of extending the language coverage also to non-European languages which are usually under-represented and low-resourced. As a first implementation with regards to the first version of the terminological resource, we have currently collected: 1,059 entries in Italian, 1,055 in Spanish, 1,053 in English, 843 in Russian, 600 in Polish, 460 in German, 193 in French, and 82 in Chinese. To conclude, the Archaeo-Term project aims at promoting the creation of high-quality and trustworthy multilingual terminological resources for the domain of archaeology by also collaborating at the same time with institutions, experts in the field of terminology, linguistics, and cultural heritage.
34

Gašpar, Angelina, Sanja Seljan, and Vlasta Kučiš. "Measuring Terminology Consistency in Translated Corpora: Implementation of the Herfindahl-Hirshman Index". Information 13, no. 2 (January 18, 2022): 43. http://dx.doi.org/10.3390/info13020043.

Abstract
Consistent terminology can positively influence communication, information transfer, and proper understanding. In multilingual written communication processes, challenges are augmented due to translation variants. The main aim of this study was to implement the Herfindahl-Hirshman Index (HHI) for the assessment of translated terminology in parallel corpora. This research was conducted on three types of legal domain subcorpora, dating from different periods: the Croatian-English parallel corpus (1991–2009), Latin-English and Latin-Croatian versions of the Code of Canon Law (1983), and English and Croatian versions of the EU legislation (2013). After the terminology extraction process, validation of term candidates was performed, followed by an evaluation. Terminology consistency was measured using the HHI—a commonly accepted measurement of market concentration. Results show that the HHI can be used for measuring terminology consistency to improve information transfer and message understanding. In translation settings, the process shows the need for quality management solutions.
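The Herfindahl-Hirshman Index used in the study is the sum of squared shares; applied to terminology, the shares are the relative frequencies of the competing translation variants of a source term, so a value of 1 means the term is always translated the same way. A minimal sketch, with hypothetical counts:

```python
# Sketch: HHI as a terminology-consistency score for one source term.
# counts: how often each translation variant of the term occurs in the corpus.
def hhi(counts):
    total = sum(counts)
    shares = [c / total for c in counts]
    return sum(s ** 2 for s in shares)      # 1.0 = fully consistent terminology


# Hypothetical example: a term translated as variant A 45 times and variant B 5 times.
print(hhi([45, 5]))    # 0.81 + 0.01 = 0.82
print(hhi([25, 25]))   # 0.5 -> usage evenly split between two variants
```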
35

León-Araúz, Pilar, Arianne Reimerink, and Pamela Faber. "EcoLexicon and by-products". Terminology 25, no. 2 (November 26, 2019): 222–58. http://dx.doi.org/10.1075/term.00037.leo.

Abstract
Reutilization and interoperability are major issues in the fields of knowledge representation and extraction, as reflected in initiatives such as the Semantic Web and the Linked Open Data Cloud. This paper shows how terminological resources can be integrated and reused within different types of application. EcoLexicon is a multilingual terminological knowledge base (TKB) on environmental science that integrates conceptual, linguistic and visual information. It has led to the following by-products: (i) the EcoLexicon English Corpus; (ii) EcoLexiCAT, a terminology-enhanced translation tool; and (iii) Manzanilla, an image annotation tool. This paper explains EcoLexicon and its by-products, and shows how the latter exploit and enhance the data in the TKB.
36

Rigouts Terryn, Ayla, Véronique Hoste, and Els Lefever. "In no uncertain terms: a dataset for monolingual and multilingual automatic term extraction from comparable corpora". Language Resources and Evaluation 54, no. 2 (March 26, 2019): 385–418. http://dx.doi.org/10.1007/s10579-019-09453-9.

37

Luporini, Antonella. "Exploring the Lexis of Art Through a Specialized Corpus: A Bilingual Italian-English Perspective". International Journal of English Linguistics 13, no. 7 (December 20, 2023): 29. http://dx.doi.org/10.5539/ijel.v13n7p29.

Abstract
This study presents an application of a specialized corpus, including texts specifically related to art and cultural heritage, to the analysis of artistic vocabulary in a bilingual (Italian-English) perspective, focusing on the Italian lemmas opera, figura and disegno and their English translation equivalents. The starting point is the Italian corpus that is being developed under the research project Lessico multilingue dei beni culturali (‘Multilingual art and cultural heritage vocabulary’, LBC), available online in open access through NoSketchEngine. First, a lemmatized nounlist ordered by frequency of occurrence is extracted from the corpus, leading to the selection of the above-mentioned focus words, in view of both their frequency and status as technical terms within the domain of art (though exhibiting different levels of technicality). These are further investigated by extracting collocates and KWIC concordances, leading to the identification of several specialized collocations and domain-specific senses. The analysis subsequently moves from corpus to dictionary, exploring the extent to which the patterns emerging from corpus investigation are accounted for in the entries for opera, figura and disegno in four Italian-English bilingual dictionaries. From this viewpoint, the study also aims to show how specialized corpus data can be used for the extraction of collocations, terms, and context-specific word senses, which may in turn be used both to enrich the information provided by currently available general dictionaries, and to work towards the creation of a large-scale specialized bilingual dictionary, which is non-existent to date.
38

Bergamaschi, Sonia, Stefania De Nardis, Riccardo Martoglia, Federico Ruozzi, Luca Sala, Matteo Vanzini, and Riccardo Amerigo Vigliermo. "Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach". Sensors 22, no. 11 (May 25, 2022): 3995. http://dx.doi.org/10.3390/s22113995.

Abstract
The linguistic and social impact of multiculturalism can no longer be neglected in any sector, creating the urgent need of creating systems and procedures for managing and sharing cultural heritages in both supranational and multi-literate contexts. In order to achieve this goal, text sensing appears to be one of the most crucial research areas. The long-term objective of the DigitalMaktaba project, born from interdisciplinary collaboration between computer scientists, historians, librarians, engineers and linguists, is to establish procedures for the creation, management and cataloguing of archival heritage in non-Latin alphabets. In this paper, we discuss the currently ongoing design of an innovative workflow and tool in the area of text sensing, for the automatic extraction of knowledge and cataloguing of documents written in non-Latin languages (Arabic, Persian and Azerbaijani). The current prototype leverages different OCR, text processing and information extraction techniques in order to provide both a highly accurate extracted text and rich metadata content (including automatically identified cataloguing metadata), overcoming typical limitations of current state of the art approaches. The initial tests provide promising results. The paper includes a discussion of future steps (e.g., AI-based techniques further leveraging the extracted data/metadata and making the system learn from user feedback) and of the many foreseen advantages of this research, both from a technical and a broader cultural-preservation and sharing point of view.
APA, Harvard, Vancouver, ISO, and other styles
39

Patil, Ratnamala S., Geeta Hanji, and Rakesh Huded. "Enhanced scene text recognition using deep learning based hybrid attention recognition network". IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 4 (December 1, 2024): 4927. http://dx.doi.org/10.11591/ijai.v13.i4.pp4927-4938.

Full text
Abstract
The technique of automatically recognizing and transforming text that is present in pictures or scenes into machine-readable text is known as scene text recognition. It facilitates applications like content extraction, translation, and text analysis in real-world visual data by enabling computers to comprehend and extract textual information from images, videos, or documents. Scene text recognition is essential for many applications, such as language translation and content extraction from photographs. The hybrid attention recognition network (HARN), the novel technique presented in this research, is intended to greatly improve the efficiency and accuracy of text recognition in complicated scene situations. HARN makes use of cutting-edge elements including an alignment-free sequence-to-sequence (AFS) module, novel attention mechanisms, and a hybrid architecture that blends attention models with convolutional neural networks (CNNs). Thanks to its attention mechanisms, HARN is capable of comprehending a wide range of scene text components by capturing both local and global context information. Through faster network convergence, shorter training times, and better utilization of computing resources, the suggested technique raises the bar for the state of the art. HARN’s versatility makes it a good choice for a range of scene text recognition applications, including multilingual text analysis and data extraction. Extensive tests are conducted to assess the effectiveness of the HARN approach and demonstrate its ability to greatly influence real-world applications where accurate and efficient text recognition is essential.
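To make the hybrid CNN-plus-attention idea concrete, here is a minimal PyTorch sketch of a convolutional backbone that turns a text-line image into a sequence of column features, followed by an attention-based decoder that emits one character per step. It is not the published HARN architecture; the layer sizes, the GRU cell, and the additive attention scoring are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class TinyCNNBackbone(nn.Module):
    """Collapses a text-line image into a sequence of column features."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, out_dim, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
    def forward(self, x):                        # x: (B, 1, H, W)
        f = self.conv(x)                         # (B, C, H', W')
        f = f.mean(dim=2)                        # pool away height -> (B, C, W')
        return f.permute(0, 2, 1)                # (B, W', C): one vector per column

class AttentionDecoder(nn.Module):
    """Attends over column features and emits one character per decoding step."""
    def __init__(self, feat_dim=128, hidden=128, num_classes=80, max_len=25):
        super().__init__()
        self.max_len = max_len
        self.gru = nn.GRUCell(feat_dim, hidden)
        self.score = nn.Linear(feat_dim + hidden, 1)
        self.out = nn.Linear(hidden, num_classes)
    def forward(self, feats):                    # feats: (B, T, C)
        B, T, _ = feats.shape
        h = feats.new_zeros(B, self.gru.hidden_size)
        logits = []
        for _ in range(self.max_len):
            # additive attention between decoder state and every column feature
            e = self.score(torch.cat([feats, h.unsqueeze(1).expand(B, T, -1)], dim=-1))
            a = torch.softmax(e, dim=1)          # (B, T, 1)
            context = (a * feats).sum(dim=1)     # (B, C)
            h = self.gru(context, h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)        # (B, max_len, num_classes)

if __name__ == "__main__":
    images = torch.randn(2, 1, 32, 100)          # two grayscale text-line crops
    feats = TinyCNNBackbone()(images)
    print(AttentionDecoder()(feats).shape)       # torch.Size([2, 25, 80])
```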
APA, Harvard, Vancouver, ISO, and other styles
40

Šafran, Valentino, Simon Lin, Jama Nateqi, Alistair G. Martin, Urška Smrke, Umut Ariöz, Nejc Plohl et al. "Multilingual Framework for Risk Assessment and Symptom Tracking (MRAST)". Sensors 24, no. 4 (February 8, 2024): 1101. http://dx.doi.org/10.3390/s24041101.

Full text
Abstract
The importance and value of real-world data in healthcare cannot be overstated because it offers a valuable source of insights into patient experiences. Traditional patient-reported experience and outcomes measures (PREMs/PROMs) often fall short in addressing the complexities of these experiences due to subjectivity and their inability to precisely target the questions asked. In contrast, diary recordings offer a promising solution. They can provide a comprehensive picture of psychological well-being, encompassing both psychological and physiological symptoms. This study explores how using advanced digital technologies, i.e., automatic speech recognition and natural language processing, can efficiently capture patient insights in oncology settings. We introduce the MRAST framework, a simplified way to collect, structure, and understand patient data using questionnaires and diary recordings. The framework was validated in a prospective study with 81 colorectal and 85 breast cancer survivors, of whom 37 were male and 129 were female. Overall, the patients evaluated the solution as well made; they found it easy to use and integrate into their daily routine. The majority (75.3%) of the cancer survivors participating in the study were willing to engage in health monitoring activities using digital wearable devices daily for an extended period. Throughout the study, there was a noticeable increase in the number of participants who perceived the system as having excellent usability. Despite some negative feedback, 44.44% of patients still rated the app’s usability as above satisfactory (i.e., 7.9 on a 1–10 scale) and the experience with diary recording as above satisfactory (i.e., 7.0 on a 1–10 scale). These findings also underscore the significance of user testing and continuous improvement in enhancing the usability and user acceptance of solutions like the MRAST framework. Overall, the automated extraction of information from diaries represents a pivotal step toward a more patient-centered approach, where healthcare decisions are based on real-world experiences and tailored to individual needs. The potential usefulness of such data is enormous, as it enables better measurement of everyday experiences and opens new avenues for patient-centered care.
APA, Harvard, Vancouver, ISO, and other styles
41

Faber, Pamela, Silvia Montero Martínez, María Rosa Castro Prieto, José Senso Ruiz, Juan Antonio Prieto Velasco, Pilar León Araúz, Carlos Márquez Linares, and Miguel Vega Expósito. "Process-oriented terminology management in the domain of Coastal Engineering". Terminology 12, no. 2 (November 13, 2006): 189–213. http://dx.doi.org/10.1075/term.12.2.03fab.

Full text
Abstract
This article describes the theoretical premises and methodology presently being used in the development of the PuertoTerm database on Coastal Engineering. In our project there are three foci, which are highly relevant to the elaboration of lexicographic and terminological products: (1) the conceptual organization underlying any knowledge resource; (2) the multidimensional nature of conceptual representations; and (3) knowledge extraction through the use of multilingual corpora. In this sense we propose a frame-based organization of specialized fields in which a dynamic, process-oriented event frame provides the conceptual underpinnings for the location of sub-hierarchies of concepts within a specialized domain event. We explain how frames with semantic and syntactic information can be specified within this type of framework, and also discuss issues regarding concept denomination and terminological meaning, based on the use of definitional schemas for each conceptual category. We also offer a typology of images for the inclusion of graphic information in each entry, depending on the nature of the concept.
APA, Harvard, Vancouver, ISO, and other styles
42

Vitas, Duško, Cvetana Krstev, and Denis Maurel. "A note on the semantic and morphological properties of proper names in the Prolex project". Lingvisticæ Investigationes. International Journal of Linguistics and Language Resources 30, no. 1 (August 10, 2007): 115–33. http://dx.doi.org/10.1075/li.30.1.08vit.

Full text
Abstract
In this paper we present a linguistic approach to the analysis of proper names. The basic assumption of our approach is that proper names are linguistic units of text that should be treated using the same methods that are applied to text in its totality. We illustrate the inflectional and derivational properties of simple and multi-word proper names on the example of Serbian, and describe how these properties have been formalized in order to develop e-dictionaries of the DELA type. In order to support multi-lingual applications we have developed a model of a multilingual relational dictionary of proper names based on an ontology, as well as an actual database. Finally, we outline how the developed dictionaries and database can be used in real monolingual and multi-lingual applications, such as information extraction.
APA, Harvard, Vancouver, ISO, and other styles
43

Akcali, Zafer, Hazal Selvi Cubuk, Arzu Oguz, Murat Kocak, Aydan Farzaliyeva, Fatih Guven, Mehmet Nezir Ramazanoglu, Efe Hasdemir, Ozden Altundag, and Ahmet Muhtesem Agildere. "Automated Extraction of Key Entities from Non-English Mammography Reports Using Named Entity Recognition with Prompt Engineering". Bioengineering 12, no. 2 (February 10, 2025): 168. https://doi.org/10.3390/bioengineering12020168.

Full text
Abstract
Objective: Named entity recognition (NER) offers a powerful method for automatically extracting key clinical information from text, but current models often lack sufficient support for non-English languages. Materials and Methods: This study investigated a prompt-based NER approach using Google’s Gemini 1.5 Pro, a large language model (LLM) with a 1.5-million-token context window. We focused on extracting important clinical entities from Turkish mammography reports; Turkish is a language with limited available natural language processing (NLP) tools. Our method employed many-shot learning, incorporating 165 examples within a 26,000-token prompt derived from 75 initial reports. We tested the model on a separate set of 85 unannotated reports, concentrating on five key entities: anatomy (ANAT), impression (IMP), observation presence (OBS-P), absence (OBS-A), and uncertainty (OBS-U). Results: Our approach achieved high accuracy, with a macro-averaged F1 score of 0.99 for relaxed match and 0.84 for exact match. In relaxed matching, the model achieved F1 scores of 0.99 for ANAT, 0.99 for IMP, 1.00 for OBS-P, 1.00 for OBS-A, and 0.99 for OBS-U. For exact match, the F1 scores were 0.88 for ANAT, 0.79 for IMP, 0.78 for OBS-P, 0.94 for OBS-A, and 0.82 for OBS-U. Discussion: These results indicate that a many-shot prompt engineering approach with large language models provides an effective way to automate clinical information extraction for languages where NLP resources are less developed and, as reported in the literature, generally outperforms zero-shot, five-shot, and other few-shot methods. Conclusion: This approach has the potential to significantly improve clinical workflows and research efforts in multilingual healthcare environments.
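The macro-averaged figures quoted above follow directly from the per-entity F1 values; the snippet below simply averages the five entity-level scores for the relaxed and exact settings reported in the abstract.

```python
# per-entity F1 scores as reported in the abstract
relaxed = {"ANAT": 0.99, "IMP": 0.99, "OBS-P": 1.00, "OBS-A": 1.00, "OBS-U": 0.99}
exact   = {"ANAT": 0.88, "IMP": 0.79, "OBS-P": 0.78, "OBS-A": 0.94, "OBS-U": 0.82}

def macro_f1(scores):
    """Unweighted mean of the per-entity F1 scores."""
    return sum(scores.values()) / len(scores)

print(f"relaxed macro-F1: {macro_f1(relaxed):.2f}")  # 0.99
print(f"exact macro-F1:   {macro_f1(exact):.2f}")    # 0.84
```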
APA, Harvard, Vancouver, ISO, and other styles
44

Rutuja Rajendra Patil, Devika A. Verma, Seema Babusing Rathod, Rupali A. Mahajan, and Poorva Agrawal. "Enhancing Lip Reading: A Deep Learning Approach with CNN and RNN Integration". Journal of Electrical Systems 20, no. 2s (April 4, 2024): 463–71. http://dx.doi.org/10.52783/jes.1367.

Full text
Abstract
This research introduces an innovative approach to enhance lip reading-based text extraction and translation through the integration of a double Convolutional Neural Network (CNN) coupled with a Recurrent Neural Network (RNN) architecture. The proposed model aims to leverage the strengths of both CNNs and RNNs to achieve superior accuracy in lip movement interpretation and subsequent text extraction. The methodology involves training the double CNN+RNN model on extensive datasets containing synchronized lip movements and corresponding linguistic expressions. The initial layers of the model utilize CNNs to effectively capture spatial features from the visual input of lip images. The extracted features are then fed into RNN layers, allowing the model to grasp temporal dependencies and contextual information crucial for accurate lip reading. The trained model showcases its proficiency in extracting textual content from spoken words, demonstrating an advanced capability to decipher nuances in lip gestures. Furthermore, the extracted text undergoes a translation process, enabling the conversion of spoken language into various target languages. This research not only contributes to the advancement of lip reading technologies but also establishes a robust foundation for real-world applications such as accessibility solutions for individuals with hearing impairments, real-time multilingual translation services, and improved communication in challenging acoustic environments. The abstract concludes with a discussion of the potential impact of the double CNN+RNN model in pushing the boundaries of human-computer interaction, emphasizing the synergy between deep learning, lip reading, and translation technologies.
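The two-stage design described above, per-frame CNN features followed by a recurrent layer over the frame sequence, can be sketched compactly. The PyTorch outline below is not the authors' trained model; the frame size, channel counts, the bidirectional LSTM, and the per-frame character output (e.g., for CTC-style decoding) are assumptions chosen only to show how the two stages connect.

```python
import torch
import torch.nn as nn

class LipReader(nn.Module):
    """Per-frame CNN features -> LSTM over the frame sequence -> character logits."""
    def __init__(self, num_chars=40, feat_dim=256, hidden=256):
        super().__init__()
        self.frame_cnn = nn.Sequential(          # applied to every mouth-region frame
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(64 * 4 * 4, feat_dim),
        )
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_chars)

    def forward(self, clips):                    # clips: (B, T, 1, H, W)
        B, T = clips.shape[:2]
        feats = self.frame_cnn(clips.reshape(B * T, *clips.shape[2:]))
        feats = feats.reshape(B, T, -1)          # (B, T, feat_dim)
        seq, _ = self.rnn(feats)                 # temporal context across frames
        return self.classifier(seq)              # per-frame character logits

if __name__ == "__main__":
    video = torch.randn(2, 30, 1, 64, 64)        # 2 clips of 30 grayscale mouth crops
    print(LipReader()(video).shape)              # torch.Size([2, 30, 40])
```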
APA, Harvard, Vancouver, ISO, and other styles
45

Khan, Muzammil, Kifayat Ullah, Yasser Alharbi, Ali Alferaidi, Talal Saad Alharbi, Kusum Yadav, Naif Alsharabi, and Aakash Ahmad. "Understanding the Research Challenges in Low-Resource Language and Linking Bilingual News Articles in Multilingual News Archive". Applied Sciences 13, no. 15 (July 25, 2023): 8566. http://dx.doi.org/10.3390/app13158566.

Full text
Abstract
The developed world has focused on Web preservation, especially news preservation for future generations, more than the developing world has. However, the news published online is volatile because of constant changes in the technologies used to disseminate information and the formats used for publication. News preservation became more complicated and challenging when the archive began to contain articles from low-resourced and morphologically complex languages like Urdu and Arabic, along with English news articles. The digital news story preservation framework is enriched with eighteen news sources for Urdu, Arabic, and English. This study presents the challenges posed by low-resource languages (LRLs), the related research challenges, and details of how the framework is enhanced. In this paper, we introduce a multilingual news archive and discuss the digital news story extractor, which addresses major issues in handling low-resource languages and facilitates normalized format migration. The extraction results are presented in detail for high-resource languages, i.e., English, and low-resource languages, i.e., Urdu and Arabic. LRLs encountered a high error rate during preservation compared to high-resource languages (HRLs), corresponding to 10% and 3%, respectively. The extraction results show that a few news sources are not regularly updated and release few new news stories online. LRLs require more detailed study for accurate news content extraction and archiving for future access. LRLs and HRLs enrich the digital news story preservation (DNSP) framework. The Digital News Stories Archive (DNSA) preserves a huge number of news articles from multiple news sources in LRLs and HRLs. This paper presents research challenges encountered during the preservation of Urdu and Arabic-language news articles to create a multilingual news archive. The second part of the paper compares two bilingual linking mechanisms for Urdu-to-English-language news articles in the DNSA: the common ratio measure for dual language (CRMDL) and the similarity measure based on transliteration words (SMTW), with the cosine similarity measure (CSM) as the baseline technique. The experimental results show that the SMTW is more effective than the CRMDL and CSM for linking Urdu-to-English news articles. The precision improved from 46% and 50% to 60%, and the recall improved from 64% and 67% to 82% for CSM, CRMDL, and SMTW, respectively, with an improved impact of common terms as well.
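Of the three linking measures compared, the cosine similarity measure (CSM) baseline is the simplest to illustrate. The sketch below scores candidate English articles against a source article using plain term-frequency vectors; it assumes the Urdu text has already been mapped into a shared term space (e.g., via transliteration or translation), and the threshold and toy headlines are placeholders. The CRMDL and SMTW measures described in the paper would replace the scoring function.

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the term-frequency vectors of two documents."""
    tf_a, tf_b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in set(tf_a) & set(tf_b))
    norm = math.sqrt(sum(v * v for v in tf_a.values())) * \
           math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / norm if norm else 0.0

def link_article(source_terms, candidates, threshold=0.3):
    """Return candidates whose similarity to the source exceeds a threshold, best first."""
    scored = sorted(((cosine_similarity(source_terms, c), c) for c in candidates),
                    reverse=True)
    return [c for score, c in scored if score >= threshold]

# toy usage with hypothetical headlines
source = "prime minister announces new education policy"
candidates = ["education policy announced by prime minister",
              "cricket team wins series"]
print(link_article(source, candidates))
```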
APA, Harvard, Vancouver, ISO, and other styles
46

Bouaine, Chaimaa, and Faouzia Benabbou. "Efficient cross-lingual plagiarism detection using bidirectional and auto-regressive transformers". IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 4 (December 1, 2024): 4619. http://dx.doi.org/10.11591/ijai.v13.i4.pp4619-4629.

Full text
Abstract
The pervasive availability of vast online information has fundamentally altered our approach to acquiring knowledge. Nevertheless, this wealth of data has also presented significant challenges to academic integrity, notably in the realm of cross-lingual plagiarism. This type of plagiarism involves the unauthorized copying or translation of ideas or works from one language into others without proper citation. This research introduces a methodology for identifying multilingual plagiarism, utilizing a pre-trained multilingual bidirectional and auto-regressive transformers (mBART) model for document feature extraction. Additionally, a siamese long short-term memory (SLSTM) model is employed for classifying pairs of documents as either "plagiarized" or "non-plagiarized". Our approach exhibits notable performance across various languages, including English (En), Spanish (Es), German (De), and French (Fr). Notably, experiments focusing on the En-Fr language pair yielded exceptional results, with an accuracy of 98.83%, precision of 98.42%, recall of 99.32%, and F-score of 98.87%. For En-Es, the model achieved an accuracy of 97.94%, precision of 98.57%, recall of 97.47%, and an F-score of 98.01%. In the case of En-De, the model demonstrated an accuracy of 95.59%, precision of 95.21%, recall of 96.85%, and F-score of 96.02%. These outcomes underscore the effectiveness of combining the mBART transformer and SLSTM models for cross-lingual plagiarism detection.
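To show how a multilingual encoder and a Siamese classifier fit together, the sketch below implements a Siamese LSTM of the general kind described, operating on precomputed feature sequences (in the paper these would come from the mBART encoder). The dimensions, the shared-encoder absolute-difference merge, and the sigmoid decision head are illustrative assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

class SiameseLSTM(nn.Module):
    """Encodes two feature sequences with one shared LSTM and classifies the pair."""
    def __init__(self, feat_dim=1024, hidden=128):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, 1))

    def encode(self, seq):                       # seq: (B, T, feat_dim), e.g. encoder states
        _, (h, _) = self.encoder(seq)
        return h[-1]                             # final hidden state as a document vector

    def forward(self, seq_a, seq_b):
        diff = torch.abs(self.encode(seq_a) - self.encode(seq_b))
        return torch.sigmoid(self.head(diff))    # probability that the pair is plagiarized

if __name__ == "__main__":
    src = torch.randn(4, 60, 1024)               # 4 suspicious passages (60 tokens each)
    ref = torch.randn(4, 60, 1024)               # 4 candidate source passages
    print(SiameseLSTM()(src, ref).shape)         # torch.Size([4, 1])
```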
APA, Harvard, Vancouver, ISO, and other styles
47

Makarych, M. V., Y. B. Popova, and M. O. Shved. "Linguistic database and software for English-Belarusian-Russian dictionary of technical terms". «System analysis and applied information science», no. 4 (February 6, 2019): 74–82. http://dx.doi.org/10.21122/2309-4923-2018-4-74-82.

Full text
Abstract
The central object of computer lexicography is the computer, or electronic, dictionary, which must have a sufficiently large vocabulary, provide consistent extraction of information depending on the user’s need, and provide complete grammatical information about the words of the input and output languages. Taking into account the current trend in the development of specialized terminological dictionaries, the authors propose an English-Belarusian-Russian dictionary of technical terms. At the initial stage of the work the dictionary was named TechLex; it covers the following subject areas: architecture and construction, water supply, information technology, pedagogy, transport communications, economics, and energy supply. Currently, each subject area of the dictionary is stored online in a Google Table, contains about 1000 terms, and can be filled in simultaneously by several teachers. The linguistic database of the dictionary is not created in the traditional way, by processing a large number of paper dictionaries and combining the received translations; instead, it is based on lexis obtained by sequentially processing English scientific and technical periodicals in the particular subject areas. The software of the proposed electronic dictionary is designed taking into account an analysis of modern electronic multilingual translation dictionaries and is a client-server application in the Java programming language. The client part of the system contains a mobile application for the Android operating system, which was tested on tablets and smartphones with different screen diagonals. The interface of the TechLex dictionary is designed in such a way that only a single zone is activated according to the query, so there is no need to view all the subject areas of the dictionary. The proposed TechLex dictionary is the first technical multilingual electronic dictionary with an English-Belarusian-Russian version.
APA, Harvard, Vancouver, ISO, and other styles
48

Horák, Aleš, Vít Baisa, Adam Rambousek, and Vít Suchomel. "A New Approach for Semi-Automatic Building and Extending a Multilingual Terminology Thesaurus". International Journal on Artificial Intelligence Tools 28, no. 02 (March 2019): 1950008. http://dx.doi.org/10.1142/s0218213019500088.

Full text
Abstract
This paper describes a new system for semi-automatically building, extending and managing a terminological thesaurus, that is, a multilingual terminology dictionary enriched with relationships between the terms themselves. The system makes it possible to radically enhance the workflow of current terminology expert groups, where most of the editing decisions still come from introspection. The presented system supplements the lexicographic process with natural language processing techniques, which are seamlessly integrated into the thesaurus editing environment. The system’s methodology and the resulting thesaurus are closely connected to new domain corpora in the six languages involved. They are used for term usage examples as well as for the automatic extraction of new candidate terms. The terminological thesaurus is now accessible via a web-based application, which (a) presents rich detailed information on each term, (b) visualizes term relations, and (c) displays real-life usage examples of the term in the domain-related documents and in context-based similar terms. Furthermore, the specialized corpora are used to detect candidate translations of terms from the central language (Czech) to the other languages (English, French, German, Russian and Slovak) as well as to detect broader Czech terms, which help to place new terms in the actual thesaurus hierarchy. This project has been realized as a terminological thesaurus of land surveying, but the presented tools and methodology are reusable for other terminology domains.
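A common way to propose candidate terms from a domain corpus, as the extraction step above does, is to compare word frequencies in the domain corpus against a general reference corpus. The snippet below is a generic frequency-ratio (keyness) illustration rather than the system's actual extraction component; the smoothing constant, the cut-off, and the toy corpora are assumptions.

```python
from collections import Counter

def candidate_terms(domain_tokens, reference_tokens, top_n=15, smoothing=1.0):
    """Rank words that are unusually frequent in the domain corpus."""
    dom, ref = Counter(domain_tokens), Counter(reference_tokens)
    dom_total, ref_total = sum(dom.values()), sum(ref.values())
    scores = {}
    for word, count in dom.items():
        dom_rate = count / dom_total
        ref_rate = (ref.get(word, 0) + smoothing) / (ref_total + smoothing)
        scores[word] = dom_rate / ref_rate        # ratios well above 1 suggest domain terms
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# toy usage: surveying-flavoured text versus everyday text
domain = "the theodolite measures the horizontal angle between survey points".split()
general = "the weather was nice and the children played between the houses".split()
print(candidate_terms(domain, general, top_n=5))
```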
APA, Harvard, Vancouver, ISO, and other styles
49

Ahmad, Maulana Ihsan, and Moh Kanif Anwari. "Computational linguistics and natural language processing techniques for semantic field extraction in Arabic online news". Studies in English Language and Education 11, no. 3 (September 30, 2024): 1685–709. http://dx.doi.org/10.24815/siele.v11i3.38090.

Full text
Abstract
The research aimed to extract semantic fields from Arabic online news and to advance Natural Language Processing (NLP) applications in understanding and managing news information effectively. It provides a comprehensive approach to processing and analyzing large volumes of Arabic news data by integrating semantic field analysis, NLP, and computational linguistics. Using quantitative methods, Arabic news articles were collected and processed with Python, a popular programming language in data analysis, and various NLP techniques and machine learning models were applied to accurately extract semantic fields. The primary objective was to evaluate the effectiveness of different classification models in categorizing Arabic news and to identify the most suitable model for semantic field extraction. The research evaluated five classification models: Naive Bayes, Support Vector Machine (SVM), Logistic Regression, Random Forest, and Gradient Boosting. Among these, SVM achieved the highest overall accuracy of 90%. Specifically, SVM demonstrated exceptional performance in categorizing sports-related news, with a 99% probability and an F1-score of 98%. However, it faced challenges in categorizing health and science news, achieving a lower F1-score of 79%. Overall, the study demonstrated the effectiveness of computational methods, particularly SVM, in classifying Arabic news and extracting semantic fields, thereby advancing NLP and computational linguistics. The findings highlighted the potential of SVM for accurate news analysis and the need for further enhancement of NLP techniques to address multilingual and domain-specific challenges.
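A pipeline of the kind evaluated in the study, TF-IDF features feeding an SVM, can be expressed concisely with scikit-learn. The sketch below is not the authors' exact configuration: the feature settings and the tiny English placeholder corpus are stand-ins, and a real setup would need Arabic-specific preprocessing and a large labelled news collection.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# tiny placeholder corpus; the study used large collections of Arabic news articles
texts = ["the team won the football match", "new vaccine reduces infection rates",
         "the striker scored twice", "hospital reports fewer flu cases"]
labels = ["sports", "health", "sports", "health"]

# TF-IDF features over unigrams and bigrams, classified with a linear SVM
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1), LinearSVC())
model.fit(texts, labels)
print(model.predict(["the striker scored in the second half"]))   # -> ['sports']
```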
APA, Harvard, Vancouver, ISO, and other styles
50

Boudjellal, Nada, Huaping Zhang, Asif Khan, Arshad Ahmad, Rashid Naseem, Jianyun Shang, and Lin Dai. "ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition". Complexity 2021 (March 13, 2021): 1–6. http://dx.doi.org/10.1155/2021/6633213.

Full text
Abstract
The web is being loaded daily with a huge volume of data, mainly unstructured textual data, which significantly increases the need for information extraction and NLP systems. The named-entity recognition task is a key step towards efficiently understanding text data and saving time and effort. Being a widely used language globally, English dominates most of the research conducted in this field, especially in the biomedical domain. Unlike other languages, Arabic suffers from a lack of resources. This work presents a BERT-based model to identify biomedical named entities in Arabic text data (specifically disease and treatment named entities) and investigates the effectiveness of pretraining a monolingual BERT model with a small-scale biomedical dataset on enhancing the model's understanding of Arabic biomedical text. The model's performance was compared with two state-of-the-art models (namely, AraBERT and multilingual BERT cased), and it outperformed both models with an 85% F1-score.
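For readers unfamiliar with how such a BERT-based tagger is applied, the sketch below shows a generic Hugging Face token-classification setup. The model identifier is a placeholder rather than a published ABioNER checkpoint, and the disease/treatment label set is only implied by the abstract.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# placeholder model id; substitute a real Arabic biomedical NER checkpoint
MODEL_ID = "my-org/arabic-bio-ner"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)

# merge word pieces back into whole-entity spans
ner = pipeline("token-classification", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")

for entity in ner("يعالج الأسبرين الصداع"):      # "Aspirin treats headaches"
    print(entity["entity_group"], entity["word"], round(entity["score"], 2))
```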
APA, Harvard, Vancouver, ISO, and other styles
