Selection of scientific literature on the topic "Pre-training corpora"
Cite a source in APA, MLA, Chicago, Harvard, and other citation styles
Browse the lists of current articles, books, theses, reports, and other scholarly sources on the topic "Pre-training corpora."
Next to each work in the bibliography there is an "Add to bibliography" option. Use it, and the bibliographic reference for the selected work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).
You can also download the full text of the scholarly publication as a PDF and read its online annotation, provided the relevant parameters are available in the metadata.
Journal articles on the topic "Pre-training corpora"
Sun, Yu, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, and Haifeng Wang. "ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 5 (April 3, 2020): 8968–75. http://dx.doi.org/10.1609/aaai.v34i05.6428.
Moodaley, Wayne, and Arnesh Telukdarie. "A Conceptual Framework for Subdomain Specific Pre-Training of Large Language Models for Green Claim Detection." European Journal of Sustainable Development 12, no. 4 (October 1, 2023): 319. http://dx.doi.org/10.14207/ejsd.2023.v12n4p319.
Liu, Yinhan, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and Luke Zettlemoyer. "Multilingual Denoising Pre-training for Neural Machine Translation." Transactions of the Association for Computational Linguistics 8 (November 2020): 726–42. http://dx.doi.org/10.1162/tacl_a_00343.
Dean, Roger Thornton, and Marcus Thomas Pearce. "Algorithmically-generated Corpora that use Serial Compositional Principles Can Contribute to the Modeling of Sequential Pitch Structure in Non-tonal Music." Empirical Musicology Review 11, no. 1 (July 8, 2016): 27. http://dx.doi.org/10.18061/emr.v11i1.4900.
Yuan, Sha, Hanyu Zhao, Zhengxiao Du, Ming Ding, Xiao Liu, Yukuo Cen, Xu Zou, Zhilin Yang, and Jie Tang. "WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models." AI Open 2 (2021): 65–68. http://dx.doi.org/10.1016/j.aiopen.2021.06.001.
Kreutzer, Julia, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, et al. "Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets." Transactions of the Association for Computational Linguistics 10 (2022): 50–72. http://dx.doi.org/10.1162/tacl_a_00447.
Qian, Jing, Yong Yue, Katie Atkinson, and Gangmin Li. "Understanding Chinese Moral Stories with Further Pre-Training." International Journal on Natural Language Computing 12, no. 2 (April 29, 2023): 1–12. http://dx.doi.org/10.5121/ijnlc.2023.12201.
Jiang, Xiaoze, Yaobo Liang, Weizhu Chen, and Nan Duan. "XLM-K: Improving Cross-Lingual Language Model Pre-training with Multilingual Knowledge." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10840–48. http://dx.doi.org/10.1609/aaai.v36i10.21330.
Kajiwara, Tomoyuki, Biwa Miura, and Yuki Arase. "Monolingual Transfer Learning via Bilingual Translators for Style-Sensitive Paraphrase Generation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 5 (April 3, 2020): 8042–49. http://dx.doi.org/10.1609/aaai.v34i05.6314.
Kryeziu, Labehat, and Visar Shehu. "Pre-Training MLM Using Bert for the Albanian Language." SEEU Review 18, no. 1 (June 1, 2023): 52–62. http://dx.doi.org/10.2478/seeur-2023-0035.
Der volle Inhalt der QuelleDissertationen zum Thema "Pre-training corpora"
Ortiz Suárez, Pedro. "A Data-driven Approach to Natural Language Processing for Contemporary and Historical French." PhD diss., Sorbonne Université, 2022. http://www.theses.fr/2022SORUS155.
In recent years, neural methods for Natural Language Processing (NLP) have consistently and repeatedly improved the state of the art in a wide variety of NLP tasks. One of the main reasons for this steady improvement is the increased use of transfer learning techniques. These methods consist of taking a pre-trained model and reusing it, with little to no further training, to solve other tasks. Even though these models have clear advantages, their main drawback is the amount of data needed to pre-train them. The lack of large-scale data previously hindered the development of such models for contemporary French, and even more so for its historical states. In this thesis, we focus on developing corpora for the pre-training of these transfer learning architectures. This approach proves extremely effective, as we are able to establish a new state of the art for a wide range of NLP tasks for contemporary, medieval, and early modern French, as well as for six other contemporary languages. Furthermore, we show not only that these models are extremely sensitive to pre-training data quality, heterogeneity, and balance, but also that these three features are better predictors of the pre-trained models' performance on downstream tasks than the pre-training data size itself. In fact, we determine that the importance of the pre-training dataset size was largely overestimated, as we repeatedly show that such models can be pre-trained with corpora of a modest size.
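The abstract above describes the core transfer-learning workflow: pre-train a model once on large corpora, then reuse it with little further training on a downstream task. Below is a minimal sketch of that workflow, assuming the Hugging Face transformers and datasets libraries; the checkpoint name "camembert-base" (a French model pre-trained on large web corpora) and the two-example toy dataset are illustrative choices, not material from the thesis.

```python
# A minimal sketch (not from the thesis) of the transfer-learning workflow
# described above: reuse a model pre-trained on large French corpora and
# fine-tune it briefly on a small labeled downstream task.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "camembert-base",  # pre-trained weights are reused as-is ...
    num_labels=2,      # ... only the classification head is initialized fresh
)

# A tiny stand-in for a real labeled downstream dataset (e.g. sentiment).
train_data = Dataset.from_dict({
    "text": ["Ce film était excellent.", "Quel désastre complet."],
    "label": [1, 0],
}).map(
    lambda batch: tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=64
    ),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned-model",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
    train_dataset=train_data,
)
trainer.train()  # "little further training" relative to pre-training
```

In this setup, everything the classifier knows about French comes from the pre-trained checkpoint, which is why the quality, heterogeneity, and balance of the pre-training corpus matter in the way the thesis argues.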
Books on the topic "Pre-training corpora"
Humphreys, S. C. Kinship in Ancient Athens. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198788249.001.0001.
Peters, Thomas A. Library Programs Online. ABC-CLIO, LLC, 2009. http://dx.doi.org/10.5040/9798400679216.
Der volle Inhalt der QuelleBuchteile zum Thema "Pre-training corpora"
Mahamoud, Ibrahim Souleiman, Mickaël Coustaty, Aurélie Joseph, Vincent Poulain d'Andecy, and Jean-Marc Ogier. "KAP: Pre-training Transformers for Corporate Documents Understanding." In Document Analysis and Recognition – ICDAR 2023 Workshops, 65–79. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-41501-2_5.
Siva Raju, S., and Khushboo Ahire. "Enhancing the Quality of Pre-school Education Through Training of Anganwadi Workers: A CSR Initiative." In Corporate Social Responsibility in India, 81–95. Singapore: Springer Singapore, 2017. http://dx.doi.org/10.1007/978-981-10-3902-7_5.
Stevens, Meg, Georgina Kennedy, and Timothy Churches. "Applying and Improving a Publicly Available Medication NER Pipeline in a Clinical Cancer EMR." In Studies in Health Technology and Informatics. IOS Press, 2024. http://dx.doi.org/10.3233/shti231051.
Jiang, Eric P. "Automatic Text Classification from Labeled and Unlabeled Data." In Intelligent Data Analysis for Real-Life Applications, 249–64. IGI Global, 2012. http://dx.doi.org/10.4018/978-1-4666-1806-0.ch013.
Syed, Mahanazuddin, Shaymaa Al-Shukri, Shorabuddin Syed, Kevin Sexton, Melody L. Greer, Meredith Zozus, Sudeepa Bhattacharyya, and Fred Prior. "DeIDNER Corpus: Annotation of Clinical Discharge Summary Notes for Named Entity Recognition Using BRAT Tool." In Studies in Health Technology and Informatics. IOS Press, 2021. http://dx.doi.org/10.3233/shti210195.
Revenko, Artem, Victor Mireles, Anna Breit, Peter Bourgonje, Julian Moreno-Schneider, Maria Khvalchik, and Georg Rehm. "Learning Ontology Classes from Text by Clustering Lexical Substitutes Derived from Language Models." In Towards a Knowledge-Aware AI. IOS Press, 2022. http://dx.doi.org/10.3233/ssw220018.
Iyer, Usha. "Introduction." In Dancing Women, 1–26. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780190938734.003.0001.
Arya, Ali. "Content Description for Face Animation." In Encyclopedia of Information Science and Technology, First Edition, 546–49. IGI Global, 2005. http://dx.doi.org/10.4018/978-1-59140-553-5.ch096.
Der volle Inhalt der QuelleBier, Ada, und Elena Borsetto. „Bisogni e preoccupazioni del corpo docente impegnato in English Medium Instruction (EMI) Una prospettiva italiana post-pandemia“. In La linguistica educativa tra ricerca e sperimentazione Scritti in onore di Carmel Mary Coonan. Venice: Fondazione Università Ca’ Foscari, 2023. http://dx.doi.org/10.30687/978-88-6969-683-1/018.
Der volle Inhalt der QuelleKonferenzberichte zum Thema "Pre-training corpora"
Vu, Thuy-Trang, Xuanli He, Gholamreza Haffari, and Ehsan Shareghi. "Koala: An Index for Quantifying Overlaps with Pre-training Corpora." In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Stroudsburg, PA, USA: Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.emnlp-demo.7.
Liu, Zhuang, Degen Huang, Kaiyu Huang, Zhuang Li, and Jun Zhao. "FinBERT: A Pre-trained Financial Language Representation Model for Financial Text Mining." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/622.
Qian, Jing, Yong Yue, Katie Atkinson, and Gangmin Li. "Knowledge-Enriched Moral Understanding upon Continual Pre-training." In 10th International Conference on Computer Networks & Communications (CCNET 2023). Academy and Industry Research Collaboration Center (AIRCC), 2023. http://dx.doi.org/10.5121/csit.2023.130414.
Lu, Jinliang, Yu Lu, and Jiajun Zhang. "Take a Closer Look at Multilinguality! Improve Multilingual Pre-Training Using Monolingual Corpora Only." In Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg, PA, USA: Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.findings-emnlp.190.
Wang, Xin'ao, Huan Li, Ke Chen, and Lidan Shou. "FedBFPT: An Efficient Federated Learning Framework for Bert Further Pre-training." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/483.
Qu, Yuanbin, Peihan Liu, Wei Song, Lizhen Liu, and Miaomiao Cheng. "A Text Generation and Prediction System: Pre-training on New Corpora Using BERT and GPT-2." In 2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC). IEEE, 2020. http://dx.doi.org/10.1109/iceiec49280.2020.9152352.
Zan, Daoguang, Bei Chen, Dejian Yang, Zeqi Lin, Minsu Kim, Bei Guan, Yongji Wang, Weizhu Chen, and Jian-Guang Lou. "CERT: Continual Pre-training on Sketches for Library-oriented Code Generation." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/329.
Edwards, Aleksandra, Jose Camacho-Collados, Hélène De Ribaupierre, and Alun Preece. "Go Simple and Pre-Train on Domain-Specific Corpora: On the Role of Training Data for Text Classification." In Proceedings of the 28th International Conference on Computational Linguistics. Stroudsburg, PA, USA: International Committee on Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.coling-main.481.
Florencio, Felipe de A., Matheus S. de Lacerda, Anderson P. Cavalcanti, and Vitor Rolim. "Three-Layer Denoiser: Denoising Parallel Corpora for NMT Systems." In Encontro Nacional de Inteligência Artificial e Computacional. Sociedade Brasileira de Computação - SBC, 2023. http://dx.doi.org/10.5753/eniac.2023.234268.