Academic literature on the topic "Pre-training corpora"
Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles
Consult the topic-based lists of articles, books, theses, conference proceedings, and other academic sources on the topic "Pre-training corpora".
Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
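For readers who manage references programmatically rather than through the buttons described above, the short Python sketch below illustrates the kind of transformation involved: taking one entry's stored metadata and rendering it in a requested citation style. It is a minimal, hypothetical illustration; the Reference record, the format_reference function, and the simplified APA/MLA rules are assumptions made for this sketch, not the site's actual API, and real style guides involve many more rules (volume, issue, pages, editors) than shown here.

# Minimal sketch (hypothetical): render one bibliographic record in two citation styles.
# Reference and format_reference are illustrative assumptions, not the mechanism
# behind the "Add to bibliography" button.
from dataclasses import dataclass


@dataclass
class Reference:
    authors: list[str]   # "Family, Given" strings
    title: str
    container: str       # journal or proceedings name
    year: int
    doi: str


def format_reference(ref: Reference, style: str) -> str:
    """Return a plain-text citation in a (very simplified) APA or MLA form."""
    if style == "APA":
        # Roughly: Authors (Year). Title. Container. DOI
        authors = ", ".join(ref.authors)
        return f"{authors} ({ref.year}). {ref.title}. {ref.container}. https://doi.org/{ref.doi}"
    if style == "MLA":
        # Roughly: Authors. "Title." Container, Year, DOI.
        authors = ", and ".join(ref.authors)
        return f'{authors}. "{ref.title}." {ref.container}, {ref.year}, https://doi.org/{ref.doi}.'
    raise ValueError(f"Unsupported style: {style}")


if __name__ == "__main__":
    # Example uses the metadata of the ERNIE 2.0 entry listed below.
    ernie = Reference(
        authors=["Sun, Yu", "Wang, Shuohuan", "Li, Yukun"],
        title="ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding",
        container="Proceedings of the AAAI Conference on Artificial Intelligence",
        year=2020,
        doi="10.1609/aaai.v34i05.6428",
    )
    for style in ("APA", "MLA"):
        print(f"[{style}] {format_reference(ernie, style)}")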
Journal articles on the topic "Pre-training corpora"
Sun, Yu, Shuohuan Wang, Yukun Li, et al. "ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 8968–75. http://dx.doi.org/10.1609/aaai.v34i05.6428.
Moodaley, Wayne, and Arnesh Telukdarie. "A Conceptual Framework for Subdomain Specific Pre-Training of Large Language Models for Green Claim Detection." European Journal of Sustainable Development 12, no. 4 (2023): 319. http://dx.doi.org/10.14207/ejsd.2023.v12n4p319.
Hussain, Rida Ghafoor. "RiskBERT: A Pre-Trained Insurance-Based Language Model for Text Classification." International Journal of Innovative Technology and Exploring Engineering 14, no. 7 (2025): 12–18. https://doi.org/10.35940/ijitee.f1097.14070625.
Liu, Yinhan, Jiatao Gu, Naman Goyal, et al. "Multilingual Denoising Pre-training for Neural Machine Translation." Transactions of the Association for Computational Linguistics 8 (November 2020): 726–42. http://dx.doi.org/10.1162/tacl_a_00343.
Dean, Roger Thornton, and Marcus Thomas Pearce. "Algorithmically-generated Corpora that use Serial Compositional Principles Can Contribute to the Modeling of Sequential Pitch Structure in Non-tonal Music." Empirical Musicology Review 11, no. 1 (2016): 27. http://dx.doi.org/10.18061/emr.v11i1.4900.
Kreutzer, Julia, Isaac Caswell, Lisa Wang, et al. "Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets." Transactions of the Association for Computational Linguistics 10 (2022): 50–72. http://dx.doi.org/10.1162/tacl_a_00447.
Yuan, Sha, Hanyu Zhao, Zhengxiao Du, et al. "WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models." AI Open 2 (2021): 65–68. http://dx.doi.org/10.1016/j.aiopen.2021.06.001.
Qian, Jing, Yong Yue, Katie Atkinson, and Gangmin Li. "Understanding Chinese Moral Stories with Further Pre-Training." International Journal on Natural Language Computing 12, no. 2 (2023): 01–12. http://dx.doi.org/10.5121/ijnlc.2023.12201.
Jing, Qian, Yue Yong, Atkinson Katie, and Li Gangmin. "Understanding Chinese Moral Stories with Further Pre-Training." International Journal on Natural Language Computing (IJNLC) 12, no. 2 (2023): 12. https://doi.org/10.5281/zenodo.7929155.
Chukhno, Olena, and Nataliia Tuchyna. "Overcoming Difficulties in Using Linguistic Corpora for Teaching English to Pre-Service Teachers." Education. Innovation. Practice 12, no. 7 (2024): 91–105. http://dx.doi.org/10.31110/2616-650x-vol12i7-014.
Theses on the topic "Pre-training corpora"
Ortiz Suarez, Pedro. "A Data-driven Approach to Natural Language Processing for Contemporary and Historical French." Electronic thesis or dissertation, Sorbonne Université, 2022. http://www.theses.fr/2022SORUS155.
Books on the topic "Pre-training corpora"
Humphreys, S. C. Kinship in Ancient Athens. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198788249.001.0001.
Peters, Thomas A. Library Programs Online. ABC-CLIO, LLC, 2009. http://dx.doi.org/10.5040/9798400679216.
Texto completoCapítulos de libros sobre el tema "Pre-training corpora"
Perełkiewicz, Michał, and Rafał Poświata. "A Review of the Challenges with Massive Web-Mined Corpora Used in Large Language Models Pre-training." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2025. https://doi.org/10.1007/978-3-031-81596-6_14.
Nag, Arijit, Bidisha Samanta, Animesh Mukherjee, Niloy Ganguly, and Soumen Chakrabarti. "Effect of Unknown and Fragmented Tokens on the Performance of Multilingual Language Models at Low-Resource Tasks." In Event Analytics across Languages and Communities. Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-64451-1_5.
Mahamoud, Ibrahim Souleiman, Mickaël Coustaty, Aurélie Joseph, Vincent Poulain d’Andecy, and Jean-Marc Ogier. "KAP: Pre-training Transformers for Corporate Documents Understanding." In Document Analysis and Recognition – ICDAR 2023 Workshops. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-41501-2_5.
Siva Raju, S., and Khushboo Ahire. "Enhancing the Quality of Pre-school Education Through Training of Anganwadi Workers: A CSR Initiative." In Corporate Social Responsibility in India. Springer Singapore, 2017. http://dx.doi.org/10.1007/978-981-10-3902-7_5.
Naumenko, Maksym, Iryna Hrashchenko, Tetiana Tsalko, Svitlana Nevmerzhytska, Svitlana Krasniuk, and Yurii Kulynych. "Innovative technological modes of data mining and modelling for adaptive project management of food industry competitive enterprises in crisis conditions." In Project Management: Industry Specifics. Technology Center PC, 2024. https://doi.org/10.15587/978-617-8360-03-0.ch2.
Ho, Shaun. "Impacts of Continued Legal Pre-Training and IFT on LLMs’ Latent Representations of Human-Defined Legal Concepts." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2024. https://doi.org/10.3233/faia241259.
Tufiş, Dan. "Algorithms and Data Design Issues for Basic NLP Tools." In NATO Science for Peace and Security Series - D: Information and Communication Security. IOS Press, 2009. https://doi.org/10.3233/978-1-58603-954-7-3.
Stevens, Meg, Georgina Kennedy, and Timothy Churches. "Applying and Improving a Publicly Available Medication NER Pipeline in a Clinical Cancer EMR." In Studies in Health Technology and Informatics. IOS Press, 2024. http://dx.doi.org/10.3233/shti231051.
Jiang, Eric P. "Automatic Text Classification from Labeled and Unlabeled Data." In Intelligent Data Analysis for Real-Life Applications. IGI Global, 2012. http://dx.doi.org/10.4018/978-1-4666-1806-0.ch013.
Liu, Ran, Ming Liu, Min Yu, et al. "GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2024. http://dx.doi.org/10.3233/faia240930.
Conference proceedings on the topic "Pre-training corpora"
Vu, Thuy-Trang, Xuanli He, Gholamreza Haffari, and Ehsan Shareghi. "Koala: An Index for Quantifying Overlaps with Pre-training Corpora." In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.emnlp-demo.7.
Liu, Zhuang, Degen Huang, Kaiyu Huang, Zhuang Li, and Jun Zhao. "FinBERT: A Pre-trained Financial Language Representation Model for Financial Text Mining." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/622.
Qian, Jing, Yong Yue, Katie Atkinson, and Gangmin Li. "Knowledge-Enriched Moral Understanding upon Continual Pre-training." In 10th International Conference on Computer Networks & Communications (CCNET 2023). Academy and Industry Research Collaboration Center (AIRCC), 2023. http://dx.doi.org/10.5121/csit.2023.130414.
Lu, Jinliang, Yu Lu, and Jiajun Zhang. "Take a Closer Look at Multilinguality! Improve Multilingual Pre-Training Using Monolingual Corpora Only." In Findings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.findings-emnlp.190.
Xu, Yipei, Dakuan Lu, Jiaqing Liang, et al. "Source Prompt: Coordinated Pre-training of Language Models on Diverse Corpora from Multiple Sources." In CIKM '24: The 33rd ACM International Conference on Information and Knowledge Management. ACM, 2024. http://dx.doi.org/10.1145/3627673.3679835.
Wang, Xin'ao, Huan Li, Ke Chen, and Lidan Shou. "FedBFPT: An Efficient Federated Learning Framework for Bert Further Pre-training." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/483.
Qu, Yuanbin, Peihan Liu, Wei Song, Lizhen Liu, and Miaomiao Cheng. "A Text Generation and Prediction System: Pre-training on New Corpora Using BERT and GPT-2." In 2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC). IEEE, 2020. http://dx.doi.org/10.1109/iceiec49280.2020.9152352.
Zan, Daoguang, Bei Chen, Dejian Yang, et al. "CERT: Continual Pre-training on Sketches for Library-oriented Code Generation." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/329.
Edwards, Aleksandra, Jose Camacho-Collados, Hélène De Ribaupierre, and Alun Preece. "Go Simple and Pre-Train on Domain-Specific Corpora: On the Role of Training Data for Text Classification." In Proceedings of the 28th International Conference on Computational Linguistics. International Committee on Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.coling-main.481.
Reports on the topic "Pre-training corpora"
Rosenblat, Sruly, Tim O'Reilly, and Ilan Strauss. Beyond Public Access in LLM Pre-Training Data: Non-public book content in OpenAI’s Models. AI Disclosures Project, Social Science Research Council, 2025. https://doi.org/10.35650/aidp.4111.d.2025.
Strauss, Ilan, Isobel Moure, Tim O’Reilly, and Sruly Rosenblat. The State of AI Governance Research: AI Safety and Reliability in Real World Commercial Deployment. AI Disclosures Project, Social Science Research Council, 2025. https://doi.org/10.35650/aidp.4112.d.2025.