A ready-made bibliography on the topic "Computational language documentation"

Create a correct reference in APA, MLA, Chicago, Harvard, and many other styles

Select a source type:

See the lists of current articles, books, dissertations, abstracts, and other scholarly sources on the topic "Computational language documentation".

An "Add to bibliography" button is available next to every work in the bibliography. Use it, and we will automatically create a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in ".pdf" format and read the annotation to the work online, if the relevant parameters are available in the metadata.

Journal articles on the topic "Computational language documentation"

1

A, Vinnarasu, and Deepa V. Jose. "Speech to text conversion and summarization for effective understanding and documentation". International Journal of Electrical and Computer Engineering (IJECE) 9, no. 5 (1.10.2019): 3642. http://dx.doi.org/10.11591/ijece.v9i5.pp3642-3648.

Full text of the source
Abstract:
Speech is the most powerful way of communication with which human beings express their thoughts and feelings through different languages. The features of speech differ with each language. However, even while communicating in the same language, the pace and the dialect vary with each person. This creates difficulty in understanding the conveyed message for some people. Sometimes lengthy speeches are also quite difficult to follow due to reasons such as different pronunciation, pace and so on. Speech recognition, an interdisciplinary field of computational linguistics, aids in developing technologies that enable the recognition and translation of speech into text. Text summarization extracts the most important information from a source text and provides an adequate summary of it. The research work presented in this paper describes an easy and effective method for speech recognition. The speech is converted to the corresponding text, which is then summarized. This has various applications such as lecture-note creation, summarizing catalogues for lengthy documents, and so on. Extensive experimentation is performed to validate the efficiency of the proposed method.
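
The pipeline sketched in this abstract (speech recognition followed by extractive summarization) can be illustrated with off-the-shelf components. The following is a rough sketch, not the authors' implementation; the file name "lecture.wav" is a placeholder, and the summarizer is a simple word-frequency ranker.

```python
# Rough sketch only: transcribe a recording, then produce a short extractive summary.
# "lecture.wav" is a placeholder file name; this is not the paper's implementation.
import re
from collections import Counter

import speech_recognition as sr


def transcribe(path: str) -> str:
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)      # read the whole file
    return recognizer.recognize_google(audio)  # online ASR; raises if it fails


def summarize(text: str, max_sentences: int = 3) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text)
    freq = Counter(re.findall(r"\w+", text.lower()))
    # Rank sentences by the total frequency of the words they contain.
    ranked = sorted(sentences,
                    key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())),
                    reverse=True)
    return " ".join(ranked[:max_sentences])


if __name__ == "__main__":
    print(summarize(transcribe("lecture.wav")))
```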
APA, Harvard, Vancouver, ISO, and other styles
2

Feldman, Jerome A. "Advances in Embodied Construction Grammar". Constructions and Frames 12, no. 1 (29.07.2020): 149–69. http://dx.doi.org/10.1075/cf.00038.fel.

Full text of the source
Abstract:
Abstract This paper describes the continuing goals and present status of the ICSI/UC Berkeley efforts on Embodied Construction Grammar (ECG). ECG is a semantics-based formalism grounded in cognitive linguistics. ECG is the most explicitly interdisciplinary of the construction grammars, with deep links to computation, neuroscience, and cognitive science. Work continues on core cognitive, computational, and linguistic issues, including aspects of the mind/body problem. Much of the recent emphasis has been on applications and on tools to facilitate new applications. Extensive documentation plus downloadable systems and grammars can be found at the ECG Homepage.
APA, Harvard, Vancouver, ISO, and other styles
3

Feraru, Silvia Monica, Horia-Nicolai Teodorescu, and Marius Dan Zbancioc. "SRoL - Web-based Resources for Languages and Language Technology e-Learning". International Journal of Computers Communications & Control 5, no. 3 (1.09.2010): 301. http://dx.doi.org/10.15837/ijccc.2010.3.2483.

Full text of the source
Abstract:
The SRoL Web-based spoken language repository and tool collection includes thousands of voice recordings grouped on sections like "Basic sounds of the Romanian language", "Emotional voices", "Specific language processes", "Pathological voices", "Comparison of natural and synthetic speech", "Gnathophonics and gnathosonics". The recordings are annotated and documented according to proprietary methodology and protocols. Moreover, we included on the site extended documentation on the Romanian language, on speech technology, and on tools, produced by the SRoL team, for voice analysis. The resources are a part of the CLARIN European Network for Language Resources. The resources and tools are useful in virtual learning for phonetics of the Romanian language, speech technology, and medical subjects related to voice. We report on several applications in language learning and voice technology classes. Here, we emphasize the utilization of the SRoL resources in education for medicine and speech rehabilitation.
APA, Harvard, Vancouver, ISO, and other styles
4

Madlazim, M., and Bagus Jaya Santosa. "Computational physics Using Python: Implementing Maxwell Equation for Circle Polarization". Jurnal Penelitian Fisika dan Aplikasinya (JPFA) 1, no. 1 (14.06.2011): 1. http://dx.doi.org/10.26740/jpfa.v1n1.p1-7.

Full text of the source
Abstract:
Python is a relatively new computing language, created by Guido van Rossum [A.S. Tanenbaum, R. van Renesse, H. van Staveren, G.J. Sharp, S.J. Mullender, A.J. Jansen, G. van Rossum, Experiences with the Amoeba distributed operating system, Communications of the ACM 33 (1990) 46–63; also on-line at http://www.cs.vu.nl/pub/amoeba/], which is particularly suitable for teaching a course in computational physics. There are two questions to be considered: (i) For whom is the course intended? (ii) What are the criteria for a suitable language, and why choose Python? The criteria include the nature of the application. High performance computing requires a compiled language, e.g., FORTRAN. For some applications a computer algebra system, e.g., Maple, is appropriate. For teaching, and for program development, an interpreted language has considerable advantages: Python appears particularly suitable. Python's attractions include (i) its system of modules which makes it easy to extend, (ii) its excellent graphics (VPython module), (iii) its excellent on-line documentation, (iv) it is free and can be downloaded from the web. Python and VPython are described briefly, along with some programs demonstrating numerical computation and the animation of physical phenomena. In this article, we give a solution for circular polarization by solving the Maxwell equations.
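
As a toy illustration of the kind of computation the article describes (not the article's own program), the electric field of a circularly polarized plane wave, E(z, t) = E0[cos(kz − ωt) x̂ + sin(kz − ωt) ŷ], can be sampled with NumPy; all parameter values below are arbitrary.

```python
# Sketch: sample the E-field of a circularly polarized plane wave travelling along z.
# All parameter values are arbitrary illustrative choices.
import numpy as np

E0 = 1.0                     # amplitude (arbitrary units)
wavelength = 500e-9          # 500 nm
k = 2 * np.pi / wavelength   # wave number
omega = 3.0e8 * k            # angular frequency (c * k)

t = 0.0
z = np.linspace(0.0, 2 * wavelength, 200)
phase = k * z - omega * t

Ex = E0 * np.cos(phase)      # x component
Ey = E0 * np.sin(phase)      # y component (sign choice sets the handedness)

# The tip of the field vector traces a helix along z; print a few samples.
for zi, exi, eyi in list(zip(z, Ex, Ey))[:5]:
    print(f"z = {zi:.2e} m, E = ({exi:+.3f}, {eyi:+.3f})")
```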
APA, Harvard, Vancouver, ISO, and other styles
5

Kulkarni, Naveen N., et al. "Tailoring effective requirement's specification for ingenuity in Software Development Life Cycle." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 3 (11.04.2021): 3338–44. http://dx.doi.org/10.17762/turcomat.v12i3.1590.

Full text of the source
Abstract:
Software Requirements Engineering (SRE) processes define software manuscripts together with the supporting Software Requirement Specification (SRS) and its activities. SRE comprises many tasks: requirement analysis, elicitation, documentation, conciliation and validation. Natural language is the most popular and commonly used means of forming the SRS document. However, natural language has its own limitations with respect to a quality approach for SRS. The constraints include incompleteness, incorrectness, ambiguity, and inconsistency. In software engineering, most applications are object-oriented, so requirements for unlike problem domains need to be developed. Software documentation is therefore completed in such a way that all authorized users, like clients, analysts, managers, and developers, can understand it. These are the basis for the success of any planned project. Most of the work is still dependent on intensive human (domain expert) effort, and the success of the project still depends on timeliness and the errors that attend it. The fundamental quality intended for each activity is specified during the software development process. This paper concludes critically with best practices in writing SRS. This approach helps to mitigate SRS limitations to some extent. An initial review highlights promising results for the proposed practices.
APA, Harvard, Vancouver, ISO, and other styles
6

Rougny, Adrien. "sbgntikz—a TikZ library to draw SBGN maps". Bioinformatics 35, no. 21 (9.05.2019): 4499–500. http://dx.doi.org/10.1093/bioinformatics/btz287.

Full text of the source
Abstract:
Abstract Summary The systems biology graphical notation (SBGN) has emerged as the main standard to represent biological maps graphically. It comprises three complementary languages: Process Description, for detailed biomolecular processes; Activity Flow, for influences of biological activities and Entity Relationship, for independent relations shared among biological entities. On the other hand, TikZ is one of the most commonly used package to ‘program’ graphics within TEX/LATEX. Here, we present sbgntikz, a TikZ library that allows drawing and customizing SBGN maps directly into TEX/LATEX documents, using the TikZ language. sbgntikz supports all glyphs of the three SBGN languages, and offers options that facilitate the drawing of complex glyph assembly within TikZ. Furthermore, sbgntikz is provided together with a converter that allows transforming any SBGN map stored under the SBGN Markup Language into a TikZ picture, or rendering it directly into a PDF file. Availability and implementation sbgntikz, the SBGN-ML to sbgntikz converter, as well as a complete documentation can be freely downloaded from https://github.com/Adrienrougny/sbgntikz/. The library and the converter are compatible with all recent operating systems, including Windows, MacOS, and all common Linux distributions. Supplementary information Supplementary material is available at Bioinformatics online.
APA, Harvard, Vancouver, ISO, and other styles
7

Zulkower, Valentin, and Susan Rosser. "DNA Features Viewer: a sequence annotation formatting and plotting library for Python". Bioinformatics 36, no. 15 (8.07.2020): 4350–52. http://dx.doi.org/10.1093/bioinformatics/btaa213.

Full text of the source
Abstract:
Abstract Motivation Although the Python programming language counts many Bioinformatics and Computational Biology libraries, none offers customizable sequence annotation visualizations with layout optimization. Results DNA Features Viewer is a sequence annotation plotting library which optimizes plot readability while letting users tailor other visual aspects (colors, labels, highlights, etc.) to their particular use case. Availability and implementation Open-source code and documentation are available on Github under the MIT license (https://github.com/Edinburgh-Genome-Foundry/DnaFeaturesViewer). Supplementary information Supplementary data are available at Bioinformatics online.
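
For orientation, a minimal usage sketch in the spirit of the library's quick-start examples (argument names may vary slightly between versions):

```python
# Sketch: plot a small annotated construct with DNA Features Viewer.
from dna_features_viewer import GraphicFeature, GraphicRecord

features = [
    GraphicFeature(start=5, end=20, strand=+1, color="#ffd700", label="Promoter"),
    GraphicFeature(start=25, end=75, strand=+1, color="#ffcccc", label="Gene"),
    GraphicFeature(start=80, end=95, strand=-1, color="#ccccff", label="Terminator"),
]
record = GraphicRecord(sequence_length=100, features=features)
ax, _ = record.plot(figure_width=5)            # returns a matplotlib Axes
ax.figure.savefig("construct.png", bbox_inches="tight")
```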
APA, Harvard, Vancouver, ISO, and other styles
8

Vo, Hoang Nhat Khang, Duc Dong Le, Tran Minh Dat Phan, Tan Sang Nguyen, Quoc Nguyen Pham, Ngoc Oanh Tran, Quang Duc Nguyen, Tran Minh Hieu Vo, and Tho Quan. "Revitalizing Bahnaric Language through Neural Machine Translation: Challenges, Strategies, and Promising Outcomes". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 21 (24.03.2024): 23360–68. http://dx.doi.org/10.1609/aaai.v38i21.30385.

Full text of the source
Abstract:
The Bahnar, a minority ethnic group in Vietnam with ancient roots, hold a language of deep cultural and historical significance. The government is prioritizing the preservation and dissemination of Bahnar language through online availability and cross-generational communication. Recent AI advances, including Neural Machine Translation (NMT), have transformed translation with improved accuracy and fluency, fostering language revitalization through learning, communication, and documentation. In particular, NMT enhances accessibility for Bahnar language speakers, making information and content more available. However, translating Vietnamese to Bahnar language faces practical hurdles due to resource limitations, particularly in the case of Bahnar language as an extremely low-resource language. These challenges encompass data scarcity, vocabulary constraints, and a lack of fine-tuning data. To address these, we propose transfer learning from selected pre-trained models to optimize translation quality and computational efficiency, capitalizing on linguistic similarities between Vietnamese and Bahnar language. Concurrently, we apply tailored augmentation strategies to adapt machine translation for the Vietnamese-Bahnar language context. Our approach is validated through superior results on bilingual Vietnamese-Bahnar language datasets when compared to baseline models. By tackling translation challenges, we help revitalize Bahnar language, ensuring information flows freely and the language thrives.
APA, Harvard, Vancouver, ISO, and other styles
9

Jones, Joshua P., Kurama Okubo, Tim Clements, and Marine A. Denolle. "SeisIO: A Fast, Efficient Geophysical Data Architecture for the Julia Language". Seismological Research Letters 91, no. 4 (29.04.2020): 2368–77. http://dx.doi.org/10.1785/0220190295.

Full text of the source
Abstract:
Abstract SeisIO for the Julia language is a new geophysical data framework that combines the intuitive syntax of a high-level language with performance comparable to FORTRAN or C. Benchmark comparisons against recent versions of popular programs for seismic data download and analysis demonstrate significant improvements in file read speed and orders-of-magnitude improvements in memory overhead. Because the Julia language natively supports parallel computing with an intuitive syntax, we benchmark test parallel download and processing of multiweek segments of contiguous data from two sets of 10 broadband seismic stations, and find that SeisIO outperforms two popular Python-based tools for data downloads. The current capabilities of SeisIO include file read support for several geophysical data formats, online data access using a variety of services, and optimized versions of several common data processing operations. Tutorial notebooks and extensive documentation are available to improve the user experience. As an accessible example of performant scientific computing for the next generation of researchers, SeisIO offers ease of use and rapid learning without sacrificing computational efficiency.
APA, Harvard, Vancouver, ISO, and other styles
10

Shinde, Swapnil, Vishnu Suryawanshi, Varsha Jadhav, Nakul Sharma, and Mandar Diwakar. "Graph-Based Keyphrase Extraction for Software Traceability in Source Code and Documentation Mapping". International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 9 (30.10.2023): 832–36. http://dx.doi.org/10.17762/ijritcc.v11i9.8973.

Full text of the source
Abstract:
Natural Language Processing (NLP) forms the basis of several computational tasks. However, when applied to software systems, NLP provides several irrelevant features, and noise gets mixed in while extracting features. As the scale of software systems increases, different metrics are needed to assess them. Diagrammatic and visual representation of the code of SE projects forms an essential component of Source Code Analysis (SCA). These SE projects cannot be analyzed by traditional source code analysis methods, nor can they be analyzed by traditional diagrammatic representation. Hence, there is a need to modify the traditional approaches in light of changing environments to reduce the learning gap for developers and traceability engineers. The traditional approaches fall short in addressing specific metrics in terms of document similarity and graph dependency approaches. In terms of source code analysis, the dependency graph can be used for finding the relevant key terms and keyphrases as they occur not just intra-document but also inter-document. In this work, a similarity measure based on context is proposed which can be employed to find a traceability link between the source code metrics and API documents present in a package. A probabilistic graph-based keyphrase extraction approach is used for searching across the different project files.
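
A TextRank-style scorer over a word co-occurrence graph gives a flavour of the graph-based keyphrase extraction mentioned above; this is a generic sketch, not the paper's probabilistic model, and the window size and example text are invented.

```python
# Generic TextRank-style keyphrase scorer over a word co-occurrence graph.
# Illustrative only; window size and the example text are invented.
import re

import networkx as nx


def keyphrases(text: str, window: int = 3, top_k: int = 5) -> list[str]:
    tokens = re.findall(r"[a-z][\w.]*", text.lower())
    graph = nx.Graph()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            graph.add_edge(word, other)        # co-occurrence within the window
    scores = nx.pagerank(graph)                # centrality as keyphrase score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


doc = "the parser reads the source file and the parser builds a syntax tree from the source"
print(keyphrases(doc))
```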
APA, Harvard, Vancouver, ISO, and other styles

Doctoral dissertations on the topic "Computational language documentation"

1

Godard, Pierre. "Unsupervised word discovery for computational language documentation". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS062/document.

Full text of the source
Abstract:
Language diversity is under considerable pressure: half of the world’s languages could disappear by the end of this century. This realization has sparked many initiatives in documentary linguistics in the past two decades, and 2019 has been proclaimed the International Year of Indigenous Languages by the United Nations, to raise public awareness of the issue and foster initiatives for language documentation and preservation. Yet documentation and preservation are time-consuming processes, and the supply of field linguists is limited. Consequently, the emerging field of computational language documentation (CLD) seeks to assist linguists in providing them with automatic processing tools. The Breaking the Unwritten Language Barrier (BULB) project, for instance, constitutes one of the efforts defining this new field, bringing together linguists and computer scientists. This thesis examines the particular problem of discovering words in an unsegmented stream of characters, or phonemes, transcribed from speech in a very-low-resource setting. This primarily involves a segmentation procedure, which can also be paired with an alignment procedure when a translation is available. Using two realistic Bantu corpora for language documentation, one in Mboshi (Republic of the Congo) and the other in Myene (Gabon), we benchmark various monolingual and bilingual unsupervised word discovery methods. We then show that using expert knowledge in the Adaptor Grammar framework can vastly improve segmentation results, and we indicate ways to use this framework as a decision tool for the linguist. We also propose a tonal variant for a strong nonparametric Bayesian segmentation algorithm, making use of a modified backoff scheme designed to capture tonal structure. To leverage the weak supervision given by a translation, we finally propose and extend an attention-based neural segmentation method, improving significantly the segmentation performance of an existing bilingual method
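
To make the segmentation task concrete, the toy sketch below segments an unsegmented phoneme string by greedy maximum matching against a small, hypothetical lexicon; the thesis itself studies far richer unsupervised Bayesian and neural models.

```python
# Toy illustration of the word-segmentation task: greedy maximum matching over an
# unsegmented phoneme string, given a small hypothetical lexicon. The thesis itself
# studies unsupervised Bayesian and neural segmentation; this only makes the task concrete.
def max_match(stream: str, lexicon: set[str]) -> list[str]:
    words, i = [], 0
    while i < len(stream):
        for j in range(len(stream), i, -1):    # try the longest candidate first
            if stream[i:j] in lexicon or j == i + 1:
                words.append(stream[i:j])      # fall back to a single symbol
                i = j
                break
    return words


lexicon = {"wa", "ngolo", "basi"}              # invented mini-lexicon
print(max_match("wangolobasi", lexicon))       # -> ['wa', 'ngolo', 'basi']
```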
APA, Harvard, Vancouver, ISO, and other styles
2

Steensland, Henrik, and Dina Dervisevic. "Controlled Languages in Software User Documentation". Thesis, Linköping University, Department of Computer and Information Science, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-4637.

Full text of the source
Abstract:

In order to facilitate comprehensibility and translation, the language used in software user documentation must be standardized. If the terminology and language rules are standardized and consistent, the time and cost of translation will be reduced. For this reason, controlled languages have been developed. Controlled languages are subsets of other languages, purposely limited by restricting the terminology and grammar that is allowed.

The purpose and goal of this thesis is to investigate how using a controlled language can improve comprehensibility and translatability of software user documentation written in English. In order to reach our goal, we have performed a case study at IFS AB. We specify a number of research questions that help satisfy some of the goals of IFS and, when generalized, fulfill the goal of this thesis.

A major result of our case study is a list of sixteen controlled language rules. Some examples of these rules are control of the maximum allowed number of words in a sentence, and control of when the author is allowed to use past participles. We have based our controlled language rules on existing controlled languages, style guides, research reports, and the opinions of technical writers at IFS.

When we applied these rules to different user documentation texts at IFS, we managed to increase the readability score for each of the texts. Also, during an assessment test of readability and translatability, the rewritten versions were chosen in 85 % of the cases by experienced technical writers at IFS.

Another result of our case study is a prototype application that shows that it is possible to develop and use a software checker for helping the authors when writing documentation according to our suggested controlled language rules.
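
As an illustration of how one such rule could be checked automatically (the thesis's actual prototype and rule set are not reproduced here), a sentence-length check might look as follows; the 25-word threshold is an assumed value.

```python
# Sketch of a single controlled-language check: flag sentences over a word limit.
# The 25-word threshold is an assumption, not the rule set developed at IFS.
import re

MAX_WORDS = 25


def long_sentences(text: str) -> list[tuple[int, str]]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(len(s.split()), s) for s in sentences if len(s.split()) > MAX_WORDS]


doc = "Open the configuration panel. Click Save to store the settings."
for count, sentence in long_sentences(doc):
    print(f"{count} words: {sentence}")
```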

APA, Harvard, Vancouver, ISO, and other styles
3

Tao, Joakim, and David Thimrén. "Smoothening of Software documentation: comparing a self-made sequence to sequence model to a pre-trained model GPT-2". Thesis, Linköpings universitet, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-178186.

Full text of the source
Abstract:
This thesis was done in collaboration with Ericsson AB with the goal of researching the possibility of creating a machine learning model that can transfer the style of a text into another arbitrary style depending on the data used. The purpose was to make their technical documentation appear to have been written in one cohesive style, for a better reading experience. Two approaches to solving this task were tested: the first was to implement an encoder-decoder model from scratch, and the second was to use the pre-trained GPT-2 model created by a team from OpenAI and fine-tune it on the specific task. Both of these models were trained on data provided by Ericsson; sentences were extracted from their documentation. To evaluate the models, training loss, test sentences, and BLEU scores were used, and these were compared to each other and with other state-of-the-art models. The models did not succeed in transforming text into a general technical documentation style, but a good understanding of what would need to be improved and adjusted to improve the results was obtained.

This thesis was presented on June 22, 2021; the presentation was held online via Microsoft Teams.

APA, Harvard, Vancouver, ISO, and other styles
4

Helmersson, Benjamin. "Definition Extraction From Swedish Technical Documentation: Bridging the gap between industry and academy approaches". Thesis, Linköpings universitet, Institutionen för datavetenskap, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-131057.

Full text of the source
Abstract:
Terminology is concerned with the creation and maintenance of concept systems, terms and definitions. Automatic term and definition extraction is used to simplify this otherwise manual and sometimes tedious process. This thesis presents an integrated approach of pattern matching and machine learning, utilising feature vectors in which each feature is a Boolean function of a regular expression. The integrated approach is compared with the two more classic approaches, showing a significant increase in recall while maintaining a comparable precision score. Less promising is the negative correlation between the performance of the integrated approach and training size. Further research is suggested.
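
A hedged sketch of such a feature vector, with each feature a Boolean regular-expression test on a candidate sentence, is shown below; the patterns are invented stand-ins rather than the thesis's actual feature set.

```python
# Sketch: Boolean regular-expression features for definition-candidate sentences.
# The patterns are illustrative stand-ins, not the thesis's actual feature set.
import re

FEATURE_PATTERNS = {
    "copula_definition": r"\b(is|are) (a|an|the)\b",
    "term_in_quotes": r"\"[^\"]+\"",
    "explicit_definer": r"\b(means|denotes|refers to)\b",
}


def feature_vector(sentence: str) -> list[bool]:
    return [bool(re.search(p, sentence)) for p in FEATURE_PATTERNS.values()]


print(feature_vector('A "term bank" is a database of approved terminology.'))
# -> [True, True, False]
```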
APA, Harvard, Vancouver, ISO, and other styles
5

Okabe, Shu. "Modèles faiblement supervisés pour la documentation automatique des langues". Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG091.

Full text of the source
Abstract:
In the wake of the threat of extinction of half of the languages spoken today by the end of the century, language documentation is a field of linguistics notably dedicated to the recording, annotation, and archiving of data. In this context, computational language documentation aims to devise tools for linguists to ease several documentation steps through natural language processing approaches.As part of the CLD2025 computational language documentation project, this thesis focuses mainly on two tasks: word segmentation to identify word boundaries in an unsegmented transcription of a recorded sentence and automatic interlinear glossing to predict linguistic annotations for each sentence unit.For the first task, we improve the performance of the Bayesian non-parametric models used until now through weak supervision. For this purpose, we leverage realistically available resources during documentation, such as already-segmented sentences or dictionaries. Since we still observe an over-segmenting tendency in our models, we introduce a second segmentation level: the morphemes. Our experiments with various types of two-level segmentation models indicate a slight improvement in the segmentation quality. However, we also face limitations in differentiating words from morphemes, using statistical cues only. The second task concerns the generation of either grammatical or lexical glosses. As the latter cannot be predicted using training data solely, our statistical sequence-labelling model adapts the set of possible labels for each sentence and provides a competitive alternative to the most recent neural models
APA, Harvard, Vancouver, ISO, and other styles
6

Palmer, Alexis Mary. "Semi-automated annotation and active learning for language documentation". 2009. http://hdl.handle.net/2152/19805.

Full text of the source
Abstract:
By the end of this century, half of the approximately 6000 extant languages will cease to be transmitted from one generation to the next. The field of language documentation seeks to make a record of endangered languages before they reach the point of extinction, while they are still in use. The work of documenting and describing a language is difficult and extremely time-consuming, and resources are extremely limited. Developing efficient methods for making lasting records of languages may increase the amount of documentation achieved within budget restrictions. This thesis approaches the problem from the perspective of computational linguistics, asking whether and how automated language processing can reduce human annotation effort when very little labeled data is available for model training. The task addressed is morpheme labeling for the Mayan language Uspanteko, and we test the effectiveness of two complementary types of machine support: (a) learner-guided selection of examples for annotation (active learning); and (b) annotator access to the predictions of the learned model (semi-automated annotation). Active learning (AL) has been shown to increase efficacy of annotation effort for many different tasks. Most of the reported results, however, are from studies which simulate annotation, often assuming a single, infallible oracle. In our studies, crucially, annotation is not simulated but rather performed by human annotators. We measure and record the time spent on each annotation, which in turn allows us to evaluate the effectiveness of machine support in terms of actual annotation effort. We report three main findings with respect to active learning. First, in order for efficiency gains reported from active learning to be meaningful for realistic annotation scenarios, the type of cost measurement used to gauge those gains must faithfully reflect the actual annotation cost. Second, the relative effectiveness of different selection strategies in AL seems to depend in part on the characteristics of the annotator, so it is important to model the individual oracle or annotator when choosing a selection strategy. And third, the cost of labeling a given instance from a sample is not a static value but rather depends on the context in which it is labeled. We report two main findings with respect to semi-automated annotation. First, machine label suggestions have the potential to increase annotator efficacy, but the degree of their impact varies by annotator, with annotator expertise a likely contributing factor. At the same time, we find that implementation and interface must be handled very carefully if we are to accurately measure gains from semi-automated annotation. Together these findings suggest that simulated annotation studies fail to model crucial human factors inherent to applying machine learning strategies in real annotation settings.
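
For readers unfamiliar with the setup, a generic uncertainty-sampling loop of the kind evaluated in the thesis can be sketched with scikit-learn; this illustration uses synthetic data and has no connection to the Uspanteko morpheme-labeling task.

```python
# Generic uncertainty-sampling active-learning loop (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
labeled = list(range(0, len(X), 50))           # a small seed set of labeled examples
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(5):                             # five simulated annotation rounds
    model.fit(X[labeled], y[labeled])
    probabilities = model.predict_proba(X[pool])
    uncertainty = 1.0 - probabilities.max(axis=1)   # least-confident sampling
    pick = pool[int(np.argmax(uncertainty))]   # the example to send to the annotator
    labeled.append(pick)
    pool.remove(pick)

print(f"labeled set size after five rounds: {len(labeled)}")
```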
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Computational language documentation"

1

Grenoble, Lenore A. Language documentation: Practice and values. Amsterdam: John Benjamins Pub. Company, 2010.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles
2

Corpus-based studies in language use, language learning, and language documentation. Amsterdam: Rodopi, 2011.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles
3

Language documentation: Practices and values. Amsterdam: John Benjamins Pub. Company, 2010.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles
4

Tonkin, Humphrey, Karen Johnson-Weiner, Center for Research and Documentation on World Language Problems, and Conference on Language and Communication (4th: 1985: New York, N.Y.), eds. Overcoming language barriers: The human/machine relationship: report of the fourth annual conference of the Center for Research and Documentation on World Language Problems, New York, December 13-14, 1985. New York: The Center, 1986.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles
5

Gelbukh, Alexander, and LINK (Online service), eds. Computational linguistics and intelligent text processing: 7th international conference, CICLing 2006, Mexico City, Mexico, February 19-25, 2006: proceedings. Berlin: Springer, 2006.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles
6

Furbee, N. Louanna, and Lenore A. Grenoble. Language Documentation: Practice and Values. John Benjamins Publishing Company, 2010.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles
7

Intelligent Document Retrieval: Exploiting Markup Structure (The Information Retrieval Series). Springer, 2005.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles
8

Kruschwitz, Udo. Intelligent Document Retrieval: Exploiting Markup Structure. Springer, 2010.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles
9

Kruschwitz, Udo. Intelligent Document Retrieval: Exploiting Markup Structure. Springer London, Limited, 2005.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles
10

Kruschwitz, Udo. Intelligent Document Retrieval. Springer, 2008.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Computational language documentation"

1

Maxwell, Mike, and Jonathan D. Amith. "Language Documentation: The Nahuatl Grammar". In Computational Linguistics and Intelligent Text Processing, 474–85. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/978-3-540-30586-6_52.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
2

Kulikowski, Casimir A. "50 Years of Achievements and Persistent Challenges for Biomedical and Health Informatics and John Mantas’ Educational and Nursing Informatics Contributions". In Studies in Health Technology and Informatics. IOS Press, 2022. http://dx.doi.org/10.3233/shti220936.

Full text of the source
Abstract:
Biomedical and Health Informatics (BMHI) have been essential catalysts for achievements in medical research and healthcare applications over the past 50 years. These include increasingly sophisticated information systems and data bases for documentation and processing, standardization of biomedical data, nomenclatures, and vocabularies to assist with large scale literature indexing and text analysis for information retrieval, and methods for computationally modeling and analyzing research and clinical data. Statistical and AI techniques for decision support, instrumentation integration, and workflow aids with improved data/information management tools are critical for scientific discoveries in the - omics revolutions with their related drug and vaccine breakthroughs and their translation to clinical and preventive healthcare. Early work on biomedical image and pattern recognition, knowledge-based expert systems, innovative database, software and simulation techniques, natural language processing and computational ontologies have all been invaluable for basic research and education. However, these methods are still in their infancy and many fundamental open scientific problems abound. Scientifically this is due to persistent limitations in understanding biological processes within complex living environments and ecologies. In clinical practice the modeling of fluid practitioner roles and methods as they adjust to novel cybernetic technologies present great opportunities but also the potential of unintended e-iatrogenic harms which must be constrained in order to adhere to ethical Hippocratic norms of responsible behavior. Balancing the art, science, and technologies of BMHI has been a hallmark of debates about the field’s historical evolution. The present article reviews selected milestones, achievements, and challenges in BMHI education mainly, from a historical perspective, including some commentaries from leaders and pioneers in the field, a selection of which have been published online recently by the International Medical Informatics Association (IMIA) as the first volume of an IMIA History WG eBook. The focus of this chapter is primarily on the development of BMHI in terms of those of its educational activities which have been most significant during the first half century of IMIA, and it concentrates mainly on the leadership and contributions of John Mantas who is being honored on his retirement by the Symposia in Athens for which this chapter has been written.
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Computational language documentation"

1

Okabe, Shu, Laurent Besacier, and François Yvon. "Weakly Supervised Word Segmentation for Computational Language Documentation". In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.acl-long.510.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
2

Zariquiey, Roberto, Arturo Oncevay, and Javier Vera. "CLD² Language Documentation Meets Natural Language Processing for Revitalising Endangered Languages". In Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.computel-1.4.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
3

Bird, Steven, Florian R. Hanke, Oliver Adams, and Haejoong Lee. "Aikuma: A Mobile App for Collaborative Language Documentation". In Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages. Stroudsburg, PA, USA: Association for Computational Linguistics, 2014. http://dx.doi.org/10.3115/v1/w14-2201.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
4

Little, Alexa N. "Connecting Documentation and Revitalization: A New Approach to Language Apps". In Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages. Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. http://dx.doi.org/10.18653/v1/w17-0120.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
5

Bettinson, Mat, and Steven Bird. "Developing a Suite of Mobile Applications for Collaborative Language Documentation". In Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages. Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. http://dx.doi.org/10.18653/v1/w17-0121.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
6

Blokland, Rogier, Niko Partanen, and Michael Rießler. "A pseudonymisation method for language documentation corpora: An experiment with spoken Komi". In Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages. Stroudsburg, PA, USA: Association for Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.iwclul-1.1.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
7

Anastasopoulos, Antonios, and David Chiang. "A case study on using speech-to-translation alignments for language documentation". In Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages. Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. http://dx.doi.org/10.18653/v1/w17-0123.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
8

Gerstenberger, Ciprian, Niko Partanen, Michael Rießler, and Joshua Wilbur. "Instant Annotations – Applying NLP Methods to the Annotation of Spoken Language Documentation Corpora". In Proceedings of the Third Workshop on Computational Linguistics for Uralic Languages. Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. http://dx.doi.org/10.18653/v1/w17-0604.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
9

Walther, Géraldine, and Benoît Sagot. "Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin". In Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. http://dx.doi.org/10.18653/v1/w17-2212.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
10

Shi, Jiatong, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, and Shinji Watanabe. "Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec". In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Stroudsburg, PA, USA: Association for Computational Linguistics, 2021. http://dx.doi.org/10.18653/v1/2021.eacl-main.96.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles

Organizational reports on the topic "Computational language documentation"

1

González-Montaña, Luis Antonio. Semantic-based methods for morphological descriptions: An applied example for Neotropical species of genus Lepidocyrtus Bourlet, 1839 (Collembola: Entomobryidae). Verlag der Österreichischen Akademie der Wissenschaften, November 2021. http://dx.doi.org/10.1553/biosystecol.1.e71620.

Full text of the source
Abstract:
The production of semantic annotations has gained renewed attention due to the development of anatomical ontologies and the documentation of morphological data. Two methods are proposed for this production, differing in their methodological and philosophical approaches: the class-based method and the instance-based method. In the first, the semantic annotations are established as class expressions, while in the second, the annotations incorporate individuals. An empirical evaluation of the above methods was applied to the morphological description of Neotropical species of the genus Lepidocyrtus (Collembola: Entomobryidae: Lepidocyrtinae). The semantic annotations are expressed as RDF triples, a language more flexible than the Entity-Quality syntax commonly used in the description of phenotypes. The morphological descriptions were built in Protégé 5.4.0 and stored in an RDF store created with Fuseki Jena. Semantic annotations based on RDF triples increase the interoperability and integration of data from diverse sources, e.g., museum data. However, computational challenges remain, related to the development of semi-automatic methods for the generation of RDF triples, interchange between texts and RDF triples, and access by non-expert users.
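
A minimal sketch of what such an RDF-triple annotation could look like with Python's rdflib is given below; the namespace and property names are hypothetical placeholders, not the report's actual vocabulary.

```python
# Sketch: one morphological observation expressed as RDF triples with rdflib.
# The EX namespace and property names are invented for illustration.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/collembola#")

g = Graph()
g.bind("ex", EX)

specimen = EX["Lepidocyrtus_specimen_42"]
g.add((specimen, RDF.type, EX.Specimen))
g.add((specimen, EX.hasStructure, EX.Antenna))
g.add((specimen, EX.segmentCount, Literal(4)))

print(g.serialize(format="turtle"))
```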
APA, Harvard, Vancouver, ISO, and other styles
2

Striuk, Andrii M., and Serhiy O. Semerikov. The Dawn of Software Engineering Education. [б. в.], February 2020. http://dx.doi.org/10.31812/123456789/3671.

Full text of the source
Abstract:
Designing a mobile-oriented environment for professional and practical training requires determining the stable (fundamental) and mobile (technological) components of its content and determining the appropriate model for specialist training. In order to determine the ratio of fundamental and technological in the content of software engineers’ training, a retrospective analysis of the first model of training software engineers developed in the early 1970s was carried out and its compliance with the current state of software engineering development as a field of knowledge and a new the standard of higher education in Ukraine, specialty 121 “Software Engineering”. It is determined that the consistency and scalability inherent in the historically first training program are largely consistent with the ideas of evolutionary software design. An analysis of its content also provided an opportunity to identify the links between the training for software engineers and training for computer science, computer engineering, cybersecurity, information systems and technologies. It has been established that the fundamental core of software engineers’ training should ensure that students achieve such leading learning outcomes: to know and put into practice the fundamental concepts, paradigms and basic principles of the functioning of language, instrumental and computational tools for software engineering; know and apply the appropriate mathematical concepts, domain methods, system and object-oriented analysis and mathematical modeling for software development; put into practice the software tools for domain analysis, design, testing, visualization, measurement and documentation of software. It is shown that the formation of the relevant competencies of future software engineers must be carried out in the training of all disciplines of professional and practical training.
APA, Harvard, Vancouver, ISO, and other styles