Academic literature on the topic 'Large language model'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Large language model.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Large language model"

1

Dhanush, B. "Chatbot Using Large Language Model." International Journal of Scientific Research in Engineering and Management 08, no. 05 (May 14, 2024): 1–5. http://dx.doi.org/10.55041/ijsrem34001.

Abstract:
Natural language processing (NLP) has seen remarkable advances in recent years, particularly with the development of Large Language Models (LLMs). Large Language Models are used to generate human-like conversation. LLMs are part of NLP, which focuses on enabling computers to understand, interpret, and generate human language. Existing chatbot systems do not generate human-like responses. The proposed chatbot system uses the power of Large Language Models to generate more human-like responses, making the conversation feel natural to the user. To enhance the user experience, the chatbot uses a dynamic learning mechanism through which it continuously adapts to user preferences and evolving conversational patterns, and it uses feedback from users to refine its responses over time. Moreover, the chatbot is designed with multi-turn conversational context awareness, allowing it to maintain coherence and relevance throughout extended dialogues. The effectiveness of the proposed chatbot is evaluated through user testing, comparing its performance against traditional rule-based chatbots and existing conversational agents. This report describes the use of Large Language Models in the design and implementation of conversational chatbots. The outcomes of this research contribute to the advancement of intelligent chatbot systems, demonstrating the potential of large language models to significantly enhance conversational AI applications.
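
The multi-turn context handling and feedback loop described in this abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the generate_reply function below is a placeholder for whatever LLM backend a real chatbot would call, and the feedback store is deliberately simplified.

```python
# Minimal sketch of multi-turn context handling with user feedback.
# `generate_reply` is a placeholder for an actual LLM call, not the
# system described in the paper; it only illustrates the pattern.

def generate_reply(messages):
    # Placeholder: a real system would send `messages` to an LLM here.
    last_user_turn = messages[-1]["content"]
    return f"(model reply to: {last_user_turn})"

def chat_session(max_history=10):
    messages = []          # rolling multi-turn context
    feedback_log = []      # user feedback that a real system would learn from

    while True:
        user_input = input("user> ")
        if user_input.strip().lower() in {"quit", "exit"}:
            break
        messages.append({"role": "user", "content": user_input})
        reply = generate_reply(messages[-max_history:])  # keep only recent turns
        messages.append({"role": "assistant", "content": reply})
        print("bot>", reply)

        rating = input("helpful? (y/n)> ")
        feedback_log.append({"turn": len(messages), "helpful": rating == "y"})

    return feedback_log

if __name__ == "__main__":
    chat_session()
```

A production system would feed the feedback log into a fine-tuning or preference-learning step rather than merely recording it.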
2

Zhang, Chengyi, Xingyu Wang, and Ziyun Wang. "Large language model in electrocatalysis." Chinese Journal of Catalysis 59 (April 2024): 7–14. http://dx.doi.org/10.1016/s1872-2067(23)64612-1.

3

Sagi, Sriram. "Advancing AI: Enhancing Large Language Model Performance through GPU Optimization Techniques." International Journal of Science and Research (IJSR) 13, no. 3 (March 5, 2024): 630–33. http://dx.doi.org/10.21275/sr24309100709.

4

Baral, Elina, and Sagar Shrestha. "Large Vocabulary Continuous Speech Recognition for Nepali Language." International Journal of Signal Processing Systems 8, no. 4 (December 2020): 68–73. http://dx.doi.org/10.18178/ijsps.8.4.68-73.

Abstract:
Speech recognition is a widely studied topic for high-resource languages like English and Mandarin, and a plethora of publications study the performance of various recognition methods for these languages. However, differences in phonetics, accent, language models, etc., between any two languages call for a separate study of speech recognition methodologies and components for each language. In this paper, we present a comparative study of popular speech recognition methods for Nepali, a low-resource Indo-Aryan language. We describe our approach to building the phonetic dictionary and present our findings for DNN- and GMM-based techniques with speaker adaptation on a 50K-vocabulary speech recognition task.
5

Garg, Prerak, and Divya Beeram. "Large Language Model-Based Autonomous Agents." International Journal of Computer Trends and Technology 72, no. 5 (May 30, 2024): 151–62. http://dx.doi.org/10.14445/22312803/ijctt-v72i5p118.

6

Huang, Sen, Kaixiang Yang, Sheng Qi, and Rui Wang. "When large language model meets optimization." Swarm and Evolutionary Computation 90 (October 2024): 101663. http://dx.doi.org/10.1016/j.swevo.2024.101663.

7

Shi, Zhouxing, Yihan Wang, Fan Yin, Xiangning Chen, Kai-Wei Chang, and Cho-Jui Hsieh. "Red Teaming Language Model Detectors with Language Models." Transactions of the Association for Computational Linguistics 12 (2024): 174–89. http://dx.doi.org/10.1162/tacl_a_00639.

Abstract:
The prevalence and strong capability of large language models (LLMs) present significant safety and ethical risks if exploited by malicious users. To prevent the potentially deceptive usage of LLMs, recent work has proposed algorithms to detect LLM-generated text and protect LLMs. In this paper, we investigate the robustness and reliability of these LLM detectors under adversarial attacks. We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation. In both strategies, we leverage an auxiliary LLM to generate the word replacements or the instructional prompt. Different from previous works, we consider a challenging setting where the auxiliary LLM can also be protected by a detector. Experiments reveal that our attacks effectively compromise the performance of all detectors in the study with plausible generations, underscoring the urgent need to improve the robustness of LLM-generated text detection systems. Code is available at https://github.com/shizhouxing/LLM-Detector-Robustness.
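
The first attack strategy, replacing words with context-appropriate synonyms to fool a detector, can be sketched as a greedy search. Both detector_score and propose_synonyms below are toy placeholders, not the paper's detectors or its auxiliary LLM; the point is only to show the substitute-and-keep-if-the-score-drops loop.

```python
# Illustrative sketch of a word-substitution attack: greedily replace words
# with candidate synonyms and keep a change only if it lowers the score of an
# LLM-text detector. Both helpers are placeholders, not real models.

def detector_score(text: str) -> float:
    # Placeholder detector: pretend longer words look more "machine-like".
    words = text.split()
    return sum(len(w) for w in words) / max(len(words), 1)

def propose_synonyms(word: str, context: str) -> list[str]:
    # Placeholder for an auxiliary LLM that suggests context-aware synonyms.
    toy_thesaurus = {"significant": ["notable", "big"], "utilize": ["use"]}
    return toy_thesaurus.get(word.lower(), [])

def evade_detector(text: str) -> str:
    words = text.split()
    for i, word in enumerate(words):
        best = detector_score(" ".join(words))
        for candidate in propose_synonyms(word, text):
            trial = words.copy()
            trial[i] = candidate
            score = detector_score(" ".join(trial))
            if score < best:            # keep the substitution that helps most
                best, words = score, trial
    return " ".join(words)

print(evade_detector("We utilize a significant amount of generated text."))
```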
8

Aman, Mussa. "Large Language Model Based Fake News Detection." Procedia Computer Science 231 (2024): 740–45. http://dx.doi.org/10.1016/j.procs.2023.12.144.

9

Singh, Pranaydeep, Orphée De Clercq, and Els Lefever. "Distilling Monolingual Models from Large Multilingual Transformers." Electronics 12, no. 4 (February 18, 2023): 1022. http://dx.doi.org/10.3390/electronics12041022.

Abstract:
Although language modeling has been trending upwards steadily, models available for low-resourced languages are limited to large multilingual models such as mBERT and XLM-RoBERTa, which come with significant overheads for deployment vis-à-vis their model size, inference speeds, etc. We attempt to tackle this problem by proposing a novel methodology to apply knowledge distillation techniques to filter language-specific information from a large multilingual model into a small, fast monolingual model that can often outperform the teacher model. We demonstrate the viability of this methodology on two downstream tasks each for six languages. We further dive into the possible modifications to the basic setup for low-resourced languages by exploring ideas to tune the final vocabulary of the distilled models. Lastly, we perform a detailed ablation study to understand the different components of the setup better and find out what works best for the two under-resourced languages, Swahili and Slovene.
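
The knowledge distillation setup described above typically combines a soft-target loss against the teacher's output distribution with a standard cross-entropy loss against the gold labels. A minimal PyTorch sketch of such a combined objective, with random tensors standing in for real teacher and student logits, might look as follows.

```python
# Sketch of a distillation objective: the student matches the teacher's
# softened output distribution (KL term) while also fitting the gold labels
# (cross-entropy term). Tensors are random stand-ins for real logits.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy on the gold labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

batch, num_classes = 4, 3
student = torch.randn(batch, num_classes, requires_grad=True)
teacher = torch.randn(batch, num_classes)
labels = torch.randint(0, num_classes, (batch,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
print(float(loss))
```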
10

Beurer-Kellner, Luca, Marc Fischer, and Martin Vechev. "Prompting Is Programming: A Query Language for Large Language Models." Proceedings of the ACM on Programming Languages 7, PLDI (June 6, 2023): 1946–69. http://dx.doi.org/10.1145/3591300.

Abstract:
Large language models have demonstrated outstanding performance on a wide range of tasks such as question answering and code generation. On a high level, given an input, a language model can be used to automatically complete the sequence in a statistically-likely way. Based on this, users prompt these models with language instructions or examples, to implement a variety of downstream tasks. Advanced prompting methods can even imply interaction between the language model, a user, and external tools such as calculators. However, to obtain state-of-the-art performance or adapt language models for specific tasks, complex task- and model-specific programs have to be implemented, which may still require ad-hoc interaction. Based on this, we present the novel idea of Language Model Programming (LMP). LMP generalizes language model prompting from pure text prompts to an intuitive combination of text prompting and scripting. Additionally, LMP allows constraints to be specified over the language model output. This enables easy adaption to many tasks while abstracting language model internals and providing high-level semantics. To enable LMP, we implement LMQL (short for Language Model Query Language), which leverages the constraints and control flow from an LMP prompt to generate an efficient inference procedure that minimizes the number of expensive calls to the underlying language model. We show that LMQL can capture a wide range of state-of-the-art prompting methods in an intuitive way, especially facilitating interactive flows that are challenging to implement with existing high-level APIs. Our evaluation shows that we retain or increase the accuracy on several downstream tasks, while also significantly reducing the required amount of computation or cost in the case of pay-to-use APIs (26-85% cost savings).
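
The core idea of constraining language model output, which LMQL builds a full query language around, can be illustrated with a toy example: candidate tokens that would violate a constraint are masked out before the next token is chosen. The sketch below uses a made-up scoring function and is not the LMQL API.

```python
# Conceptual sketch of output constraints in language model programming.
# The scoring function is a toy stand-in for a real model's next-token
# logits, and the constraint is a simple membership predicate.

import random

VOCAB = ["yes", "no", "maybe", "banana", "<eos>"]
ALLOWED_ANSWERS = {"yes", "no", "maybe"}   # constraint: answer must be one of these

def toy_next_token_scores(prefix):
    # Stand-in for model logits over VOCAB given the prefix.
    random.seed(len(prefix))
    return {tok: random.random() for tok in VOCAB}

def constrained_generate(prompt, max_tokens=1):
    output = []
    for _ in range(max_tokens):
        scores = toy_next_token_scores(prompt + " ".join(output))
        # Mask out tokens that would violate the constraint before picking.
        valid = {t: s for t, s in scores.items() if t in ALLOWED_ANSWERS}
        output.append(max(valid, key=valid.get))
    return " ".join(output)

print(constrained_generate("Is the sky green? Answer:"))
```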

Dissertations / Theses on the topic "Large language model"

1

Jiang, Yuandong. "Large Scale Distributed Semantic N-gram Language Model." Wright State University / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=wright1316200173.

2

Tang, Haijiang. "Building Phrase-Based Language Model from Large Corpus." View Abstract or Full-Text, 2002. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202002%20TANG.

Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2002.
Includes bibliographical references (leaves 74-79). Also available in electronic version. Access restricted to campus users.
3

McGreevy, Michael. "Statistical language modelling for large vocabulary speech recognition." Thesis, Queensland University of Technology, 2006. https://eprints.qut.edu.au/16444/1/Michael_McGreevy_Thesis.pdf.

Abstract:
The move towards larger vocabulary Automatic Speech Recognition (ASR) systems places greater demands on language models. In a large vocabulary system, acoustic confusion is greater, thus there is more reliance placed on the language model for disambiguation. In addition to this, ASR systems are increasingly being deployed in situations where the speaker is not conscious of their interaction with the system, such as in recorded meetings and surveillance scenarios. This results in more natural speech, which contains many false starts and disfluencies. In this thesis we investigate a novel approach to the modelling of speech corrections. We propose a syntactic model of speech corrections, and seek to determine if this model can improve on the performance of standard language modelling approaches when applied to conversational speech. We investigate a number of related variations to our basic approach and compare these approaches against the class-based N-gram. We also investigate the modelling of styles of speech. Specifically, we investigate whether the incorporation of prior knowledge about sentence types can improve the performance of language models. We propose a sentence mixture model based on word-class N-grams, in which the sentence mixture models and the word-class membership probabilities are jointly trained. We compare this approach with word-based sentence mixture models.
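
The word-class N-grams underlying the sentence mixture models above rely on a simple factorisation: the probability of a word given its predecessor is approximated by a class transition probability times a class membership probability. A toy sketch with made-up counts:

```python
# Sketch of the class-based bigram factorisation:
#   P(w_i | w_{i-1}) ~= P(class(w_i) | class(w_{i-1})) * P(w_i | class(w_i))
# All counts below are invented for illustration.

class_of = {"the": "DET", "a": "DET", "dog": "NOUN", "cat": "NOUN", "runs": "VERB"}

class_bigram_counts = {("DET", "NOUN"): 8, ("NOUN", "VERB"): 5, ("DET", "VERB"): 1}
class_unigram_counts = {"DET": 9, "NOUN": 13, "VERB": 5}
word_counts = {"the": 6, "a": 3, "dog": 7, "cat": 6, "runs": 5}

def class_bigram_prob(prev_word, word):
    c_prev, c = class_of[prev_word], class_of[word]
    p_class = class_bigram_counts.get((c_prev, c), 0) / class_unigram_counts[c_prev]
    # P(word | class): word count divided by the total count of words in that class.
    class_total = sum(n for w, n in word_counts.items() if class_of[w] == c)
    p_word_given_class = word_counts[word] / class_total
    return p_class * p_word_given_class

print(class_bigram_prob("the", "dog"))
```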
4

McGreevy, Michael. "Statistical language modelling for large vocabulary speech recognition." Queensland University of Technology, 2006. http://eprints.qut.edu.au/16444/.

Abstract:
The move towards larger vocabulary Automatic Speech Recognition (ASR) systems places greater demands on language models. In a large vocabulary system, acoustic confusion is greater, thus there is more reliance placed on the language model for disambiguation. In addition to this, ASR systems are increasingly being deployed in situations where the speaker is not conscious of their interaction with the system, such as in recorded meetings and surveillance scenarios. This results in more natural speech, which contains many false starts and disfluencies. In this thesis we investigate a novel approach to the modelling of speech corrections. We propose a syntactic model of speech corrections, and seek to determine if this model can improve on the performance of standard language modelling approaches when applied to conversational speech. We investigate a number of related variations to our basic approach and compare these approaches against the class-based N-gram. We also investigate the modelling of styles of speech. Specifically, we investigate whether the incorporation of prior knowledge about sentence types can improve the performance of language models. We propose a sentence mixture model based on word-class N-grams, in which the sentence mixture models and the word-class membership probabilities are jointly trained. We compare this approach with word-based sentence mixture models.
5

Tan, Ming. "A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation." Wright State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=wright1386111950.

6

Susman, Derya. "Turkish Large Vocabulary Continuous Speech Recognition By Using Limited Audio Corpus." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614207/index.pdf.

Abstract:
Speech recognition for Turkish is a challenging problem from several perspectives, most of which relate to the morphological structure of the language. Since Turkish is an agglutinative language, many words can be generated from a single stem by adding suffixes. This characteristic increases the number of out-of-vocabulary (OOV) words, which dramatically degrades the performance of a speech recognizer. Turkish also allows words to be ordered relatively freely, which makes it difficult to build robust language models. In this thesis, the existing models and approaches that address the problem of Turkish LVCSR (Large Vocabulary Continuous Speech Recognition) are explored. Different recognition units (words, morphs, stems and endings) are used in generating the n-gram language models, and 3-gram and 4-gram language models are built for each recognition unit. Since speech recognition relies on machine learning, the performance of the recognizer depends on the sufficiency of the audio data used in acoustic model training; however, it is difficult to obtain rich audio corpora for Turkish. In this thesis, existing approaches are applied to Turkish LVCSR using a limited audio corpus, and several data selection approaches are proposed to improve the robustness of the acoustic model.
7

Comez, Murat Ali. "Large Vocabulary Continuous Speech Recognition for Turkish Using HTK." Master's thesis, METU, 2003. http://etd.lib.metu.edu.tr/upload/1205491/index.pdf.

Abstract:
This study aims to build a new language model that can be used in a Turkish large vocabulary continuous speech recognition system. Turkish is a very productive language in terms of word forms because of its agglutinative nature, so for languages like Turkish the vocabulary size quickly grows beyond what is acceptable: from a single stem, thousands of new word forms can be generated using inflectional or derivational suffixes. In this thesis, words are parsed into their stems and endings, where an ending comprises the suffixes attached to the associated root. A search network based on bigrams is then constructed; bigrams are obtained either from stems and endings or from stems only. The proposed language model is based on bigrams obtained using only stems. All work is done in the HTK (Hidden Markov Model Toolkit) environment, except for parsing and network transformation. Besides offering a new language model for Turkish, this study provides a comprehensive review of the concepts used in state-of-the-art speech recognition systems. To acquire a good command of these concepts and processes, isolated-word, connected-word, and continuous speech recognition tasks are performed, and the experimental results associated with these tasks are given.
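
The stem-plus-ending recognition units described above can be illustrated with a toy segmentation and bigram count. The suffix list below is purely illustrative and is not a real Turkish morphological analyser.

```python
# Toy illustration of stem+ending units: words are split on a small suffix
# list and bigram counts are collected over the resulting units. The suffix
# list and segmentation rule are illustrative only.

from collections import Counter

SUFFIXES = ["lerde", "ler", "de", "im"]   # longest suffixes listed first

def split_word(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 1:
            return [word[: -len(suf)], "+" + suf]
    return [word]

def unit_bigrams(sentence):
    units = []
    for word in sentence.split():
        units.extend(split_word(word))
    return Counter(zip(units, units[1:]))

print(split_word("evlerde"))              # ['ev', '+lerde']
print(unit_bigrams("evlerde kitaplar var"))
```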
8

Sagen, Markus. "Large-Context Question Answering with Cross-Lingual Transfer." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-440704.

Abstract:
Models based on the transformer architecture have become among the most prominent for solving a multitude of natural language processing (NLP) tasks since their introduction in 2017. However, much research related to the transformer model has focused primarily on achieving high performance, and many problems remain unsolved. Two of the most prominent are the lack of high-performing non-English pre-trained models and the limited number of words most trained models can incorporate in their context. Solving these problems would make NLP models more suitable for real-world applications, improving information retrieval, reading comprehension, and more. Previous research has focused on incorporating long context into English language models. This thesis investigates cross-lingual transferability between languages when training for long context in English only. Training long-context models only in English could make long context more accessible in low-resource languages, such as Swedish, since such data is hard to find in most languages and training for each language is costly. This could become an efficient method for creating long-context models in other languages without the need for such data in every language or pre-training from scratch. We extend the models' context using the training scheme of the Longformer architecture and fine-tune on a question-answering task in several languages. Our evaluation could not satisfactorily confirm or deny whether transferring long context is possible for low-resource languages. We believe that using datasets that require long-context reasoning, such as a multilingual TriviaQA dataset, could demonstrate the validity of our hypothesis.
9

Uzelac, Lawrence Stevan. "A Multiple Coupled Microstrip Transmission Line Model for High-Speed VLSI Interconnect Simulation." PDXScholar, 1991. https://pdxscholar.library.pdx.edu/open_access_etds/4526.

Abstract:
A model is presented which incorporates the advantages of mixed-mode simulation to characterize transmission line behavior in multiple coupled transmission line systems. The model is intended for digital circuit designers who wish to obtain accurate transmission line behavior for complex digital systems for which continuous-time simulation tools such as SPICE would be prohibitively slow. The model uses a transverse electromagnetic wave approximation to obtain solutions to the basic transmission line equations. A modal analysis technique is used to solve for the attenuation and propagation constants of the transmission lines; the modal analysis is done in the frequency domain after a Fast Fourier Transform of the time-domain input signals. Boundary conditions are obtained from the Thevenin-equivalent transmission line input circuit and the transmission line output load impedance. The model uses a solution queue system that allows n coupled transmission lines to be solved without resorting to large-order matrix methods or the need to diagonalize large matrices using linear transformations. This solution queue system is based on the method of solution superposition. As a result, the CPU time required by the model is primarily a function of the number of transitions and not the number of lines modeled. Incorporation of the model into event-driven circuit simulators such as Network C is discussed, and it is shown that the solution queue methods used in this model make it well suited for incorporation into an event-driven simulation network. The model presented in this thesis can be scaled to incorporate direct electromagnetic coupling between the first, second, or third lines adjacent to the transitioning line. It is shown that modeling only adjacent-line coupling is adequate for typical digital technologies, and that the model accurately reproduces the transmission line behavior of systems modeled by previous authors. Example transitions on an 8-line system are reviewed, and future model improvements are discussed.
10

Labeau, Matthieu. "Neural language models : Dealing with large vocabularies." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS313/document.

Abstract:
This work investigates practical methods to ease training and improve the performance of neural language models with large vocabularies. The main limitation of neural language models is their expensive computational cost: it depends on the size of the vocabulary, with which it grows linearly. Despite several training tricks, the most straightforward way to limit computation time is to limit the vocabulary size, which is not a satisfactory solution for numerous tasks. Most of the existing methods used to train large-vocabulary language models revolve around avoiding the computation of the partition function, which is used to normalize the model's output scores into a probability distribution. Here, we focus on sampling-based approaches, including importance sampling and noise contrastive estimation. These methods allow an approximate computation of the partition function. After examining the mechanism of self-normalization in noise contrastive estimation, we first propose to improve its efficiency with solutions that are adapted to the inner workings of the method and experimentally show that they considerably ease training. Our second contribution is to expand on a generalization of several sampling-based objectives as Bregman divergences, in order to experiment with new objectives. We use Beta divergences to derive a set of objectives of which noise contrastive estimation is a particular case. Finally, we aim at improving performance on full-vocabulary language models by augmenting the output word representations with subwords. We experiment on a Czech dataset and show that using character-based representations alongside word embeddings for output representations gives better results. We also show that reducing the size of the output look-up table improves results even more.
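
One of the sampling-based objectives discussed in this thesis, noise contrastive estimation, turns the normalization problem into a binary discrimination task between the observed word and k words drawn from a noise distribution q. A minimal PyTorch sketch, with random scores standing in for real model outputs:

```python
# Sketch of the NCE objective: separate the observed target word from k noise
# samples using s(w) - log(k * q(w)), avoiding a full softmax over the
# vocabulary. Scores and the noise distribution are random stand-ins.

import torch
import torch.nn.functional as F

def nce_loss(target_score, noise_scores, log_q_target, log_q_noise, k):
    log_k = torch.log(torch.tensor(float(k)))
    pos = F.logsigmoid(target_score - (log_k + log_q_target))
    neg = F.logsigmoid(-(noise_scores - (log_k + log_q_noise)))
    return -(pos + neg.sum())

k = 5
target_score = torch.randn(())             # model score of the observed word
noise_scores = torch.randn(k)              # model scores of k noise words
log_q_target = torch.log(torch.tensor(1e-3))
log_q_noise = torch.log(torch.full((k,), 1e-3))
print(float(nce_loss(target_score, noise_scores, log_q_target, log_q_noise, k)))
```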

Books on the topic "Large language model"

1

Satō, Hideto. A data model, knowledge base, and natural language processing for sharing a large statistical database. Ibaraki, Osaka, Japan: Institute of Social and Economic Research, Osaka University, 1989.

2

Amaratunga, Thimira. Understanding Large Language Models. Berkeley, CA: Apress, 2023. http://dx.doi.org/10.1007/979-8-8688-0017-7.

3

Kucharavy, Andrei, Octave Plancherel, Valentin Mulder, Alain Mermoud, and Vincent Lenders, eds. Large Language Models in Cybersecurity. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-54827-7.

4

Törnberg, Petter. How to Use Large-Language Models for Text Analysis. London: SAGE Publications Ltd, 2024. http://dx.doi.org/10.4135/9781529683707.

5

Bashkatov, Alexander. Modeling in OpenSCAD: Examples. INFRA-M Academic Publishing LLC, 2019. http://dx.doi.org/10.12737/959073.

Abstract:
The tutorial is an introductory course on the basics of geometric modeling for 3D printing using the OpenSCAD programming language. It is built around descriptions of the instructions for creating primitives, defining their properties, carrying out transformations, and performing other service operations. It contains a large number of examples with detailed comments and descriptions of the actions performed, which allows the reader to acquire basic skills in creating three-dimensional and flat models and in exporting and importing graphical data. It meets the requirements of the latest generation of federal state educational standards for higher education. It can be useful for computer science teachers, students, and anyone interested in three-dimensional modeling and the preparation of products for 3D printing.
6

Build a Large Language Model (from Scratch). Manning Publications Co. LLC, 2024.

7

Generative AI with LangChain: Build Large Language Model Apps with Python, ChatGPT and Other LLMs. Packt Publishing, Limited, 2023.

8

Generative AI with LangChain: Build Large Language Model Apps with Python, ChatGPT, and Other LLMs. de Gruyter GmbH, Walter, 2023.

9

Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications. Wiley & Sons, Limited, John, 2024.

10

Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications. Wiley & Sons, Incorporated, John, 2024.


Book chapters on the topic "Large language model"

1

Wu, Yonghui. "Large Language Model and Text Generation." In Cognitive Informatics in Biomedicine and Healthcare, 265–97. Cham: Springer International Publishing, 2024. http://dx.doi.org/10.1007/978-3-031-55865-8_10.

2

Ruiu, Dragos. "LLMs Red Teaming." In Large Language Models in Cybersecurity, 213–23. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-54827-7_24.

Abstract:
Prompt red-teaming is a form of evaluation that involves testing machine learning models for vulnerabilities that could result in undesirable behaviors. It is similar to adversarial attacks, but red-teaming prompts look like regular natural language prompts, and they reveal model limitations that can cause harmful user experiences or aid violence. Red-teaming can be resource-intensive because of the large prompt space that must be searched for possible model failures; augmenting the model with a classifier trained to predict potentially undesirable texts is a possible workaround. Red-teaming LLMs is a developing research area, and there is a need for best practices covering problematic behaviors such as persuading people to harm themselves or others, memorization, spam, weapons assembly instructions, and the generation of code with pre-defined vulnerabilities. The challenge with evaluating LLMs for malicious behaviors is that they are not explicitly trained to exhibit such behaviors. Therefore, it is critical to continually develop red-teaming methods that can adapt as models become more powerful. Multi-organization collaboration on datasets and best practices can enable smaller entities releasing models to red-team them before release, leading to a safer user experience across the board.
3

Kucharavy, Andrei. "Overview of Existing LLM Families." In Large Language Models in Cybersecurity, 31–44. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-54827-7_3.

Abstract:
While the general public discovered Large Language Models (LLMs) through ChatGPT, a generative autoregressive model, it is far from the only member of the LLM family. Various architectures and training regimens optimized for specific usages were designed throughout their development and were then classified into different LLM families.
4

Dolamic, Ljiljana. "Conversational Agents." In Large Language Models in Cybersecurity, 45–53. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-54827-7_4.

Abstract:
Conversational agents (CAs) engage in interactive conversations with users, providing responses and assistance while combining Natural Language Processing (NLP), Natural Language Understanding (NLU), and Natural Language Generation (NLG) techniques. Two tiers of conversational agent derivation from Large Language Models (LLMs) exist. The first tier involves conversational fine-tuning on datasets representing expected user questions and desired conversational agent responses. The second tier requires manual prompting by human operators and evaluation of model output, which is then used for further fine-tuning. Models fine-tuned with Reinforcement Learning from Human Feedback (RLHF) perform better, but RLHF is resource-intensive and specific to each model. Another critical difference in the performance of various CAs is their ability to access auxiliary services for task delegation.
5

Kucharavy, Andrei. "Adapting LLMs to Downstream Applications." In Large Language Models in Cybersecurity, 19–29. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-54827-7_2.

Abstract:
By themselves, pretrained Large Language Models (LLMs) are interesting objects of study. However, they need to undergo a subsequent transfer learning phase to make them useful for downstream applications. While historically referred to as "fine-tuning," the range of tools available to LLM users to adapt base models to their applications is now significantly wider than traditional fine-tuning. To give the reader an idea of the strengths and weaknesses of each method and allow them to pick the one that best suits their needs, an overview and classification of the most notable methods is provided: prompt optimization, pre-prompting and implicit prompting (system prompting), model coordination through actor agents, integration with auxiliary tools, parameter-efficient fine-tuning, further model pre-training, from-scratch retraining, and finally domain-specific distillation.
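
One of the adaptation methods listed above, parameter-efficient fine-tuning, is commonly realised with low-rank adapters (LoRA). The sketch below uses the Hugging Face transformers and peft libraries; the checkpoint name and the target module name are illustrative assumptions (module names differ across architectures), and this is not the chapter's own recipe.

```python
# Minimal LoRA (parameter-efficient fine-tuning) setup sketch.
# Assumptions for illustration: GPT-2 as the base model and "c_attn" as the
# attention projection module to adapt; other architectures use other names.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()   # only the LoRA adapters are trainable

# From here, `model` can be trained with a standard fine-tuning loop or Trainer.
```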
6

Schillaci, Zachary. "On-Site Deployment of LLMs." In Large Language Models in Cybersecurity, 205–11. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-54827-7_23.

Abstract:
As consumer electronics and tensor computation for machine learning (ML) continue to advance, model execution and training become more accessible. NVIDIA introduced the RTX 4090 graphics cards, marketed initially as gamer-oriented products, in late 2022. Though relatively expensive for consumer use, their manufacturer's suggested retail price (MSRP) of 1600 USD makes them affordable as a professional tool. These cards' extensive video random access memory (vRAM), computational power comparable to last-generation flagship professional cards, and ability to use single-byte floats enable a pair of them to train, fine-tune, and run on-premises Large Language Models (LLMs) with up to 7 billion parameters per card. Until this release, such a feat would have required data center-level equipment. Although the RTX 4090 and H100 GPU represent a qualitative step forward, iterative improvements combined with the speculated lowering of computational precision to half-byte floats could make larger models even more accessible for on-premises use. This development might, in one aspect, lower the entry barrier for cyberattackers, simplifying the process for advanced persistent threats (APTs) to camouflage their activities amidst unsophisticated attackers or those employing generative LLMs for non-malicious purposes. Conversely, as an alternative to cloud-hosted models, on-site LLMs may limit the possibility of private information leakage or model poisoning while offering specialized capabilities for legitimate users.
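
A rough sketch of the kind of on-premises loading the chapter refers to: a roughly 7-billion-parameter causal LM loaded in reduced precision on a local GPU with the Hugging Face transformers library. The checkpoint name is an assumption for illustration; actual memory use depends on the model, precision, and context length.

```python
# Sketch: load a ~7B causal LM locally in half precision (two bytes per
# parameter) and generate a short completion. The checkpoint name is an
# illustrative assumption; device_map="auto" requires the accelerate package.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"   # any ~7B causal LM checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,   # half-precision weights to fit consumer vRAM
    device_map="auto",           # place layers on the available GPU(s)
)

prompt = "Summarise the main risks of on-site LLM deployment:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```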
7

Kurimo, Mikko, and Krista Lagus. "An Efficiently Focusing Large Vocabulary Language Model." In Artificial Neural Networks — ICANN 2002, 1068–73. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002. http://dx.doi.org/10.1007/3-540-46084-5_173.

8

Ji, Jianchao, Zelong Li, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Juntao Tan, and Yongfeng Zhang. "GenRec: Large Language Model for Generative Recommendation." In Lecture Notes in Computer Science, 494–502. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-56063-7_42.

9

Majumdar, Subhabrata, and Terry Vogelsang. "Towards Safe LLMs Integration." In Large Language Models in Cybersecurity, 243–47. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-54827-7_27.

Abstract:
LLMs face a critical vulnerability known as sandbox breakout, where attackers bypass the system designers' limitations to prevent malicious access to the resources for which the LLM agent is a user interface. Thus, they can access the system and potentially steal data, change the interaction with other users, or inject malicious code or contents into underlying databases. Therefore, it is essential to identify and address vulnerabilities that could be exploited to break out of the sandbox. These vulnerabilities could exist in the sandbox, the operating system, or the LLM's software dependencies. To mitigate the risk of LLM sandbox breakout, robust security measures, such as regular model updates, automated model red-teaming, testing, and access control policies, must be implemented. In addition, sandboxing should be enforced at multiple levels to reduce the attack surface and prevent attackers from accessing critical systems. By implementing these measures, the risk of LLM sandbox breakout can be significantly reduced, and the security and reliability of LLM-based applications can be improved.
10

Majumdar, Subhabrata. "Standards for LLM Security." In Large Language Models in Cybersecurity, 225–31. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-54827-7_25.

Abstract:
The National Institute of Standards and Technology (NIST) is a recognized authority on computer security that publishes guidelines and standards for a broad range of technologies, including artificial intelligence (AI). The guidelines include the requirement for LLM decision-making transparency, explainability, testing, and validation to guarantee model reliability and security. Moreover, NIST has also created standards for cryptography, a critical element of many LLM-based applications, such as secure communication and data encryption. The cryptography standards help ensure that LLM-based applications are secure and resilient against attacks by malicious entities. NIST standards can provide a practical framework for secure and ethical LLM-based application development and deployment. By adhering to these standards, developers and organizations can increase confidence that their LLM-based applications are dependable, trustworthy, and resistant to attacks.

Conference papers on the topic "Large language model"

1

Huang, Jiaji, Yi Li, Wei Ping, and Liang Huang. "Large Margin Neural Language Model." In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2018. http://dx.doi.org/10.18653/v1/d18-1150.

2

Chen, Kua, Yujing Yang, Boqi Chen, José Antonio Hernández López, Gunter Mussbacher, and Dániel Varró. "Automated Domain Modeling with Large Language Models: A Comparative Study." In 2023 ACM/IEEE 26th International Conference on Model Driven Engineering Languages and Systems (MODELS). IEEE, 2023. http://dx.doi.org/10.1109/models58315.2023.00037.

3

Meng, Ruijie, Martin Mirchev, Marcel Böhme, and Abhik Roychoudhury. "Large Language Model guided Protocol Fuzzing." In Network and Distributed System Security Symposium. Reston, VA: Internet Society, 2024. http://dx.doi.org/10.14722/ndss.2024.24556.

4

Hashimoto, Tomomi. "Ethical Judgment using Large Language Model." In 2024 16th International Conference on Computer and Automation Engineering (ICCAE). IEEE, 2024. http://dx.doi.org/10.1109/iccae59995.2024.10569797.

5

Singh, Aditi, Saket Kumar, Abul Ehtesham, Tala Talaei Khoei, and Deepshikha Bhati. "Large Language Model-Driven Immersive Agent." In 2024 IEEE World AI IoT Congress (AIIoT). IEEE, 2024. http://dx.doi.org/10.1109/aiiot61789.2024.10578948.

6

Galindo, José A., Antonio J. Dominguez, Jules White, and David Benavides. "Large Language Models to generate meaningful feature model instances." In SPLC '23: 27th ACM International Systems and Software Product Line Conference. New York, NY, USA: ACM, 2023. http://dx.doi.org/10.1145/3579027.3608973.

7

Stammbach, Dominik, Vilém Zouhar, Alexander Hoyle, Mrinmaya Sachan, and Elliott Ash. "Revisiting Automated Topic Model Evaluation with Large Language Models." In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.emnlp-main.581.

8

Zhao, James, Yuxi Xie, Kenji Kawaguchi, Junxian He, and Michael Xie. "Automatic Model Selection with Large Language Models for Reasoning." In Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg, PA, USA: Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.findings-emnlp.55.

9

Xu, Austin, Will Monroe, and Klinton Bicknell. "Large language model augmented exercise retrieval for personalized language learning." In LAK '24: The 14th Learning Analytics and Knowledge Conference. New York, NY, USA: ACM, 2024. http://dx.doi.org/10.1145/3636555.3636883.

10

Mysore, Sheshera, Andrew McCallum, and Hamed Zamani. "Large Language Model Augmented Narrative Driven Recommendations." In RecSys '23: Seventeenth ACM Conference on Recommender Systems. New York, NY, USA: ACM, 2023. http://dx.doi.org/10.1145/3604915.3608829.


Reports on the topic "Large language model"

1

Seymore, Kristie, and Ronald Rosenfeld. Large-Scale Topic Detection and Language Model Adaptation. Fort Belvoir, VA: Defense Technical Information Center, June 1997. http://dx.doi.org/10.21236/ada327553.

2

Zhang, Hao. Large Language Model (LLM) Monthly Report (2024 Apr). ResearchHub Technologies, Inc., May 2024. http://dx.doi.org/10.55277/researchhub.0ps6xenm.

3

Sun, Ruiqi, and Daniel Trefler. The Impact of AI and Cross-Border Data Regulation on International Trade in Digital Services: A Large Language Model. Cambridge, MA: National Bureau of Economic Research, November 2023. http://dx.doi.org/10.3386/w31925.

4

Lavadenz, Magaly, Sheila Cassidy, Elvira G. Armas, Rachel Salivar, Grecya V. Lopez, and Amanda A. Ross. Sobrato Early Academic Language (SEAL) Model: Final Report of Findings from a Four-Year Study. Center for Equity for English Learners, Loyola Marymount University, 2020. http://dx.doi.org/10.15365/ceel.seal2020.

Abstract:
The Sobrato Early Academic Language (SEAL) Model Research and Evaluation Final Report is comprised of three sets of studies that took place between 2015 and 2019 to examine the effectiveness of the SEAL Model in 67 schools within 12 districts across the state of California. Over a decade ago, the Sobrato Family Foundation responded to the enduring opportunity gaps and low academic outcomes for the state’s 1.2 million English Learners by investing in the design of the SEAL Model. The SEAL PreK–Grade 3 Model was created as a whole-school initiative to develop students’ language, literacy, and academic skills. The pilot study revealed promising findings, and the large-scale implementation of SEAL was launched in 2013. This report addresses a set of research questions and corresponding studies focused on: 1) the perceptions of school and district-level leaders regarding district and school site implementation of the SEAL Model, 2) teachers’ development and practices, and 3) student outcomes. The report is organized in five sections, within which are twelve research briefs that address the three areas of study. Technical appendices are included in each major section. A developmental evaluation process with mixed methods research design was used to answer the research questions. Key findings indicate that the implementation of the SEAL Model has taken root in many schools and districts where there is evidence of systemic efforts or instructional improvement for the English Learners they serve. In regards to teachers’ development and practices, there were statistically significant increases in the use of research-based practices for English Learners. Teachers indicated a greater sense of efficacy in addressing the needs of this population and believe the model has had a positive impact on their knowledge and skills to support the language and literacy development of PreK- Grade 3 English Learners. Student outcome data reveal that despite SEAL schools averaging higher rates of poverty compared to the statewide rate, SEAL English Learners in grades 2–4 performed comparably or better than California English Learners in developing their English proficiency; additional findings show that an overwhelming majority of SEAL students are rapidly progressing towards proficiency thus preventing them from becoming long-term English Learners. English Learners in bilingual programs advanced in their development of Spanish, while other English Learners suffered from language loss in Spanish. The final section of the report provides considerations and implications for further SEAL replication, sustainability, additional research and policy.
5

Prasad, Jayanti. Large Language Models: AI Foundations and Applications in Python. Instats Inc., 2023. http://dx.doi.org/10.61700/85rfezw01y0q9521.

Abstract:
This 5-day workshop provides a comprehensive understanding of large language models, their AI foundations, and applications in Python. Designed for PhD students, professors, and professional researchers, the seminar offers hands-on coding sessions, case studies, and discussions on the future of large language models in academic research.
6

Alonso-Robisco, Andres, and Jose Manuel Carbo. Analysis of CBDC Narrative OF Central Banks using Large Language Models. Madrid: Banco de España, August 2023. http://dx.doi.org/10.53479/33412.

Abstract:
Central banks are increasingly using verbal communication for policymaking, focusing not only on traditional monetary policy but also on a broad set of topics. One such topic is central bank digital currency (CBDC), which is attracting attention from the international community. The complex nature of this project means that it must be carefully designed to avoid unintended consequences, such as financial instability. We propose the use of different Natural Language Processing (NLP) techniques to better understand central banks' stance towards CBDC, analyzing a set of central bank discourses from 2016 to 2022. We do this using traditional techniques, such as dictionary-based methods, and two large language models (LLMs), namely BERT and ChatGPT, concluding that LLMs better reflect the stance identified by human experts. In particular, we observe that ChatGPT exhibits a higher degree of alignment because it can capture subtler information than BERT. Our study suggests that LLMs are an effective tool to improve sentiment measurement for policy-specific texts, though they are not infallible and may be subject to new risks, such as higher sensitivity to the length of texts and to prompt engineering.
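
The two families of methods compared in this report, dictionary-based scoring and transformer-based classification, can be contrasted with a small illustration. The word lists, example sentence, and default pipeline model below are illustrative stand-ins, not the report's own dictionaries or prompts.

```python
# Contrast a crude dictionary score with a transformer sentiment classifier.
# Word lists and the example sentence are made up for illustration.

from transformers import pipeline

POSITIVE = {"benefit", "efficient", "opportunity"}
NEGATIVE = {"risk", "instability", "concern"}

def dictionary_score(text):
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

text = "A CBDC could bring efficiency gains but raises financial instability concerns."
print("dictionary score:", dictionary_score(text))

classifier = pipeline("sentiment-analysis")   # downloads a default fine-tuned model
print("transformer label:", classifier(text)[0])
```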
7

Marra de Artiñano, Ignacio, Franco Riottini Depetris, and Christian Volpe Martincus. Automatic Product Classification in International Trade: Machine Learning and Large Language Models. Inter-American Development Bank, July 2023. http://dx.doi.org/10.18235/0005012.

Abstract:
Accurately classifying products is essential in international trade. Virtually all countries categorize products into tariff lines using the Harmonized System (HS) nomenclature for both statistical and duty collection purposes. In this paper, we apply and assess several different algorithms to automatically classify products based on text descriptions. To do so, we use agricultural product descriptions from several public agencies, including customs authorities and the United States Department of Agriculture (USDA). We find that while traditional machine learning (ML) models tend to perform well within the dataset in which they were trained, their precision drops dramatically when implemented outside of it. In contrast, large language models (LLMs) such as GPT 3.5 show a consistently good performance across all datasets, with accuracy rates ranging between 60% and 90% depending on HS aggregation levels. Our analysis highlights the valuable role that artificial intelligence (AI) can play in facilitating product classification at scale and, more generally, in enhancing the categorization of unstructured data.
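
A minimal example of the "traditional ML" baseline the paper compares against LLMs: a TF-IDF plus logistic regression classifier mapping product descriptions to HS chapters, built with scikit-learn. The tiny training set is invented for illustration; real systems use far more data and finer-grained codes.

```python
# Toy TF-IDF + logistic regression baseline for HS-chapter classification.
# Training descriptions and labels are invented for illustration only.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

descriptions = [
    "fresh apples in 10 kg boxes",
    "frozen boneless beef cuts",
    "roasted arabica coffee beans",
    "fresh bananas, cavendish variety",
]
hs_chapters = ["08", "02", "09", "08"]   # toy HS chapter labels

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(descriptions, hs_chapters)

print(clf.predict(["dried coffee beans, unroasted"]))   # expected: chapter 09
```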
8

Windsor, Callan, and Max Zang. Firms' Price-setting Behaviour: Insights from Earnings Calls. Reserve Bank of Australia, September 2023. http://dx.doi.org/10.47688/rdp2023-06.

Abstract:
We introduce new firm-level indices covering input costs, demand and final prices based on listed Australian firms' earnings calls going back to 2007. These indices are constructed using a powerful transformer-based large language model. We show the new indices track current economic conditions, consistent with a simple conceptual framework we use to explain why there is real-time information in firms' earnings calls. Focusing on firms' price-setting behaviour, the reduced-form associations we estimate appear to show that discussions around final prices have become more sensitive to import costs but less sensitive to labour costs in the period since 2021. This is after controlling for changes in the operating environment that are common to all firms, including global supply shocks. Firms' price-setting sentiment also appears more sensitive to rising input costs compared to falling costs, suggesting that prices could remain front-of-mind for company executives even as supply pressures ease.
9

Horton, John. Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? Cambridge, MA: National Bureau of Economic Research, April 2023. http://dx.doi.org/10.3386/w31122.

10

Gluckman, Peter, and Hema Sridhar. A framework for evaluating rapidly developing digital and related technologies: AI, Large Language Models and beyond. International Science Council, October 2023. http://dx.doi.org/10.24948/2023.11.
