Journal articles on the topic 'Large Language Models (LLMs)'

Consult the top 50 journal articles for your research on the topic 'Large Language Models (LLMs)'. Each entry gives the bibliographic reference, followed by the article's abstract where one is available in the metadata.

1

Zhang, Tianyi, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, and Tatsunori B. Hashimoto. "Benchmarking Large Language Models for News Summarization." Transactions of the Association for Computational Linguistics 12 (2024): 39–57. http://dx.doi.org/10.1162/tacl_a_00632.

Abstract:
Large language models (LLMs) have shown promise for automatic summarization, but the reasons behind their successes are poorly understood. By conducting a human evaluation on ten LLMs across different pretraining methods, prompts, and model scales, we make two important observations. First, we find instruction tuning, not model size, is the key to the LLM’s zero-shot summarization capability. Second, existing studies have been limited by low-quality references, leading to underestimates of human performance and lower few-shot and finetuning performance. To better evaluate LLMs, we perform human evaluation over high-quality summaries we collect from freelance writers. Despite major stylistic differences such as the amount of paraphrasing, we find that LLM summaries are judged to be on par with human-written summaries.
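The zero-shot setting behind the paper's first finding is simple to picture: the model receives only an instruction and the article, with no exemplars. A minimal sketch, assuming a hypothetical call_llm(prompt) helper that wraps whichever instruction-tuned model is being benchmarked:

```python
def summarize_zero_shot(article: str, call_llm) -> str:
    """Zero-shot news summarization: a bare instruction, no exemplars.

    `call_llm` is an assumed helper that sends a prompt to an
    instruction-tuned LLM and returns its text completion.
    """
    prompt = (
        "Summarize the following news article in two to three sentences.\n\n"
        f"Article:\n{article}\n\nSummary:"
    )
    return call_llm(prompt)
```

A few-shot variant would prepend (article, summary) exemplar pairs; the paper's point is that instruction-tuned models already perform well without them.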
2

Hamaniuk, Vita A. "The potential of Large Language Models in language education." Educational Dimension 5 (December 9, 2021): 208–10. http://dx.doi.org/10.31812/ed.650.

Abstract:
This editorial explores the potential of Large Language Models (LLMs) in language education. It discusses the role of LLMs in machine translation, the concept of ‘prompt programming’, and the inductive bias of LLMs for abstract textual reasoning. The editorial also highlights using LLMs as creative writing tools and their effectiveness in paraphrasing tasks. It concludes by emphasizing the need for responsible and ethical use of these tools in language education.
3

Yang, Jidong. "Large language models privacy and security." Applied and Computational Engineering 76, no. 1 (July 16, 2024): 177–88. http://dx.doi.org/10.54254/2755-2721/76/20240584.

Abstract:
The advancement of large language models (LLMs) has yielded significant progress across various domains. Nevertheless, this progress has also raised crucial concerns regarding privacy and security. The paper conducts a comprehensive literature study to examine the fundamental principles of LLMs. It also provides a detailed examination of the characteristics and application fields of various LLMs, with a particular focus on the Transformer architecture. Furthermore, this study emphasizes the privacy concerns that may emerge when LLMs handle personal and sensitive data, and explores the potential hazards associated with information leakage and misuse, as well as existing privacy safeguards and the obstacles encountered in their implementation. Overall, LLMs have made significant technological advances, but research on safeguarding privacy and enhancing security remains imperative; these aspects are vital for the sustained development of, and public confidence in, LLM technology.
4

Huang, Dawei, Chuan Yan, Qing Li, and Xiaojiang Peng. "From Large Language Models to Large Multimodal Models: A Literature Review." Applied Sciences 14, no. 12 (June 11, 2024): 5068. http://dx.doi.org/10.3390/app14125068.

Abstract:
With the deepening of research on Large Language Models (LLMs), significant progress has been made in recent years on the development of Large Multimodal Models (LMMs), which are gradually moving toward Artificial General Intelligence. This paper aims to summarize the recent progress from LLMs to LMMs in a comprehensive and unified way. First, we start with LLMs and outline various conceptual frameworks and key techniques. Then, we focus on the architectural components, training strategies, fine-tuning guidance, and prompt engineering of LMMs, and present a taxonomy of the latest vision–language LMMs. Finally, we provide a summary of both LLMs and LMMs from a unified perspective, analyze the development status of large-scale models from a global perspective, and offer potential research directions for large-scale models.
5

Kumar, Deepak, Yousef Anees AbuHashem, and Zakir Durumeric. "Watch Your Language: Investigating Content Moderation with Large Language Models." Proceedings of the International AAAI Conference on Web and Social Media 18 (May 28, 2024): 865–78. http://dx.doi.org/10.1609/icwsm.v18i1.31358.

Abstract:
Large language models (LLMs) have exploded in popularity due to their ability to perform a wide array of natural language tasks. Text-based content moderation is one LLM use case that has received recent enthusiasm; however, there is little research investigating how LLMs can help in content moderation settings. In this work, we evaluate a suite of commodity LLMs on two common content moderation tasks: rule-based community moderation and toxic content detection. For rule-based community moderation, we instantiate 95 subcommunity-specific LLMs by prompting GPT-3.5 with rules from 95 Reddit subcommunities. We find that GPT-3.5 is effective at rule-based moderation for many communities, achieving a median accuracy of 64% and a median precision of 83%. For toxicity detection, we evaluate a range of LLMs (GPT-3, GPT-3.5, GPT-4, Gemini Pro, LLAMA 2) and show that LLMs significantly outperform currently widespread toxicity classifiers. However, we also find that increases in model size add only marginal benefit to toxicity detection, suggesting a potential performance plateau for LLMs on toxicity detection tasks. We conclude by outlining avenues for future work in studying LLMs and content moderation.
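The rule-based setup described here is straightforward to sketch: a community's rules are placed in the prompt and the model is asked for a verdict on one comment. A minimal illustration, with a hypothetical call_llm helper and prompt wording that is not the authors' exact template:

```python
def violates_rules(comment: str, rules: list[str], call_llm) -> bool:
    """Rule-based moderation sketch: prompt an LLM with a community's
    rules and ask for a YES/NO verdict on one comment. The paper
    instantiates one such classifier per subreddit; this wording is
    illustrative only."""
    rule_text = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(rules))
    prompt = (
        "You are a moderator for an online community with these rules:\n"
        f"{rule_text}\n\n"
        f"Comment: {comment}\n\n"
        "Does the comment violate any rule? Answer YES or NO."
    )
    return call_llm(prompt).strip().upper().startswith("YES")
```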
6

Pendyala, Vishnu S., and Christopher E. Hall. "Explaining Misinformation Detection Using Large Language Models." Electronics 13, no. 9 (April 26, 2024): 1673. http://dx.doi.org/10.3390/electronics13091673.

Abstract:
Large language models (LLMs) are a compressed repository of a vast corpus of valuable information on which they are trained. Therefore, this work hypothesizes that LLMs such as Llama, Orca, Falcon, and Mistral can be used for misinformation detection by making them cross-check new information with the repository on which they are trained. Accordingly, this paper describes the findings from the investigation of the abilities of LLMs in detecting misinformation on multiple datasets. The results are interpreted using explainable AI techniques such as Local Interpretable Model-Agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and Integrated Gradients. The LLMs themselves are also asked to explain their classification. These complementary approaches aid in better understanding the inner workings of misinformation detection using LLMs and lead to conclusions about their effectiveness at the task. The methodology is generic and nothing specific is assumed for any of the LLMs, so the conclusions apply generally. Primarily, when it comes to misinformation detection, the experiments show that the LLMs are limited by the data on which they are trained.
7

Cheng, Jerome. "Applications of Large Language Models in Pathology." Bioengineering 11, no. 4 (March 31, 2024): 342. http://dx.doi.org/10.3390/bioengineering11040342.

Abstract:
Large language models (LLMs) are transformer-based neural networks that can provide human-like responses to questions and instructions. LLMs can generate educational material, summarize text, extract structured data from free text, create reports, write programs, and potentially assist in case sign-out. LLMs combined with vision models can assist in interpreting histopathology images. LLMs have immense potential in transforming pathology practice and education, but these models are not infallible, so any artificial intelligence generated content must be verified with reputable sources. Caution must be exercised on how these models are integrated into clinical practice, as these models can produce hallucinations and incorrect results, and an over-reliance on artificial intelligence may lead to de-skilling and automation bias. This review paper provides a brief history of LLMs and highlights several use cases for LLMs in the field of pathology.
8

Chu, Zhibo, Zichong Wang, and Wenbin Zhang. "Fairness in Large Language Models: A Taxonomic Survey." ACM SIGKDD Explorations Newsletter 26, no. 1 (July 24, 2024): 34–48. http://dx.doi.org/10.1145/3682112.3682117.

Abstract:
Large Language Models (LLMs) have demonstrated remarkable success across various domains. However, despite their promising performance in numerous real-world applications, most of these algorithms lack fairness considerations. Consequently, they may lead to discriminatory outcomes against certain communities, particularly marginalized populations, prompting extensive study in fair LLMs. On the other hand, fairness in LLMs, in contrast to fairness in traditional machine learning, entails exclusive backgrounds, taxonomies, and fulfillment techniques. To this end, this survey presents a comprehensive overview of recent advances in the existing literature concerning fair LLMs. Specifically, a brief introduction to LLMs is provided, followed by an analysis of factors contributing to bias in LLMs. Additionally, the concept of fairness in LLMs is discussed categorically, summarizing metrics for evaluating bias in LLMs and existing algorithms for promoting fairness. Furthermore, resources for evaluating bias in LLMs, including toolkits and datasets, are summarized. Finally, existing research challenges and open questions are discussed.
9

Lin, Hsiao-Ying, and Jeffrey Voas. "Lower Energy Large Language Models (LLMs)." Computer 56, no. 10 (October 2023): 14–16. http://dx.doi.org/10.1109/mc.2023.3278160.

10

Long, Robert. "Introspective Capabilities in Large Language Models." Journal of Consciousness Studies 30, no. 9 (September 30, 2023): 143–53. http://dx.doi.org/10.53765/20512201.30.9.143.

Abstract:
This paper considers the kind of introspection that large language models (LLMs) might be able to have. It argues that LLMs, while currently limited in their introspective capabilities, are not inherently unable to have such capabilities: they already model the world, including mental concepts, and already have some introspection-like capabilities. With deliberate training, LLMs may develop introspective capabilities. The paper proposes a method for such training for introspection, situates possible LLM introspection in the 'possible forms of introspection' framework proposed by Kammerer and Frankish, and considers the ethical ramifications of introspection and self-report in AI systems.
11

Dahl, Matthew, Varun Magesh, Mirac Suzgun, and Daniel E. Ho. "Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models." Journal of Legal Analysis 16, no. 1 (January 1, 2024): 64–93. http://dx.doi.org/10.1093/jla/laae003.

Abstract:
Do large language models (LLMs) know the law? LLMs are increasingly being used to augment legal practice, education, and research, yet their revolutionary potential is threatened by the presence of “hallucinations”—textual output that is not consistent with legal facts. We present the first systematic evidence of these hallucinations in public-facing LLMs, documenting trends across jurisdictions, courts, time periods, and cases. Using OpenAI’s ChatGPT 4 and other public models, we show that LLMs hallucinate at least 58% of the time, struggle to predict their own hallucinations, and often uncritically accept users’ incorrect legal assumptions. We conclude by cautioning against the rapid and unsupervised integration of popular LLMs into legal tasks, and we develop a typology of legal hallucinations to guide future research in this area.
12

Shah, Asghar, Samer Wahood, Dorra Guermazi, Candice E. Brem, and Elie Saliba. "Skin and Syntax: Large Language Models in Dermatopathology." Dermatopathology 11, no. 1 (February 14, 2024): 101–11. http://dx.doi.org/10.3390/dermatopathology11010009.

Abstract:
This literature review introduces the integration of Large Language Models (LLMs) in the field of dermatopathology, outlining their potential benefits, challenges, and prospects. It discusses the changing landscape of dermatopathology with the emergence of LLMs. The potential advantages of LLMs include a streamlined generation of pathology reports, the ability to learn and provide up-to-date information, and simplified patient education. Existing instances of LLMs encompass diagnostic support, research acceleration, and trainee education. Challenges involve biases, data privacy and quality, and establishing a balance between AI and dermatopathological expertise. Prospects include the integration of LLMs with other AI technologies to improve diagnostics and the improvement of multimodal LLMs that can handle both text and image input. Our implementation guidelines highlight the importance of model transparency and interpretability, data quality, and continuous oversight. The transformative potential of LLMs in dermatopathology is underscored, with an emphasis on a dynamic collaboration between artificial intelligence (AI) experts (technical specialists) and dermatopathologists (clinicians) for improved patient outcomes.
13

Viswanathan, Vijay, Kiril Gashteovski, Carolin Lawrence, Tongshuang Wu, and Graham Neubig. "Large Language Models Enable Few-Shot Clustering." Transactions of the Association for Computational Linguistics 12 (2024): 321–33. http://dx.doi.org/10.1162/tacl_a_00648.

Abstract:
Unlike traditional unsupervised clustering, semi-supervised clustering allows users to provide meaningful structure to the data, which helps the clustering algorithm to match the user’s intent. Existing approaches to semi-supervised clustering require a significant amount of feedback from an expert to improve the clusters. In this paper, we ask whether a large language model (LLM) can amplify an expert’s guidance to enable query-efficient, few-shot semi-supervised text clustering. We show that LLMs are surprisingly effective at improving clustering. We explore three stages where LLMs can be incorporated into clustering: before clustering (improving input features), during clustering (by providing constraints to the clusterer), and after clustering (using LLMs post-correction). We find that incorporating LLMs in the first two stages routinely provides significant improvements in cluster quality, and that LLMs enable a user to make trade-offs between cost and accuracy to produce desired clusters. We release our code and LLM prompts for the public to use.
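Of the three stages, the before-clustering one is the easiest to picture: the LLM enriches each document's features, and a standard clusterer does the rest. A sketch under assumptions (hypothetical call_llm and embed helpers; scikit-learn for the clustering step):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_with_llm_keyphrases(texts, call_llm, embed, k=5):
    """'Before clustering' stage: expand each text with LLM-generated
    keyphrases, embed the expanded text, then run plain k-means.
    `call_llm` and `embed` are assumed helpers, not the paper's code."""
    expanded = []
    for text in texts:
        phrases = call_llm(f"List five keyphrases describing this text: {text}")
        expanded.append(text + " " + phrases)
    X = np.stack([embed(t) for t in expanded])  # one vector per document
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)
```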
14

Lapid, Raz, Ron Langberg, and Moshe Sipper. "Open Sesame! Universal Black-Box Jailbreaking of Large Language Models." Applied Sciences 14, no. 16 (August 14, 2024): 7150. http://dx.doi.org/10.3390/app14167150.

Abstract:
Large language models (LLMs), designed to provide helpful and safe responses, often rely on alignment techniques to align with user intent and social guidelines. Unfortunately, this alignment can be exploited by malicious actors seeking to manipulate an LLM’s outputs for unintended purposes. In this paper, we introduce a novel approach that employs a genetic algorithm (GA) to manipulate LLMs when model architecture and parameters are inaccessible. The GA attack works by optimizing a universal adversarial prompt that—when combined with a user’s query—disrupts the attacked model’s alignment, resulting in unintended and potentially harmful outputs. Our novel approach systematically reveals a model’s limitations and vulnerabilities by uncovering instances where its responses deviate from expected behavior. Through extensive experiments, we demonstrate the efficacy of our technique, thus contributing to the ongoing discussion on responsible AI development by providing a diagnostic tool for evaluating and enhancing alignment of LLMs with human intent. To our knowledge, this is the first automated universal black-box jailbreak attack.
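Although the paper's exact operators are not reproduced here, the black-box structure of such an attack is a textbook genetic algorithm: keep a population of candidate suffixes, score them only through the target model's responses, and breed the best. A generic sketch, with the fitness function and token alphabet left as assumptions:

```python
import random

def evolve_adversarial_suffix(fitness, alphabet, length=20,
                              pop_size=32, generations=100):
    """Generic black-box GA over prompt suffixes (a sketch, not the
    paper's implementation).

    `fitness(suffix)` is an assumed scorer that queries the target
    model and measures how far its response drifts from aligned
    behavior; no gradients or model internals are used.
    """
    pop = ["".join(random.choices(alphabet, k=length)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]              # selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(length)            # one-point crossover
            child = list(a[:cut] + b[cut:])
            child[random.randrange(length)] = random.choice(alphabet)  # mutation
            children.append("".join(child))
        pop = survivors + children
    return max(pop, key=fitness)
```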
15

Gomez, Alejandro Pradas, Petter Krus, Massimo Panarotto, and Ola Isaksson. "Large language models in complex system design." Proceedings of the Design Society 4 (May 2024): 2197–206. http://dx.doi.org/10.1017/pds.2024.222.

Abstract:
This paper investigates the use of Large Language Models (LLMs) in engineering complex systems, demonstrating how they can support designers in detail design phases. Two aerospace cases are studied: a system architecture definition activity and a CAD model generation activity. The research reveals LLMs' challenges and opportunities in supporting designers, and future research areas to further improve their application in engineering tasks. It emphasizes the new paradigm of LLM support compared to traditional Machine Learning techniques, as LLMs can successfully perform tasks with just a few examples.
16

Gao, Yingming, Baorian Nuchged, Ya Li, and Linkai Peng. "An Investigation of Applying Large Language Models to Spoken Language Learning." Applied Sciences 14, no. 1 (December 26, 2023): 224. http://dx.doi.org/10.3390/app14010224.

Abstract:
People have long desired intelligent conversational systems that can provide assistance in practical scenarios. The latest advancements in large language models (LLMs) are making significant strides toward turning this aspiration into a tangible reality. LLMs are believed to hold the most potential and value in education, especially in the creation of AI-driven virtual teachers that facilitate language learning. This study focuses on assessing the effectiveness of LLMs within the educational domain, specifically in the areas of spoken language learning, which encompass phonetics, phonology, and second language acquisition. To this end, we first introduced a new multiple-choice question dataset to evaluate the effectiveness of LLMs in the aforementioned scenarios, including the understanding and application of spoken language knowledge. Moreover, we investigated the influence of various prompting techniques such as zero- and few-shot methods (prepending the question with question-answer exemplars), chain-of-thought (CoT) prompting, in-domain exemplars, and external tools. We conducted a comprehensive evaluation of popular LLMs (20 distinct models) using these methods. The experimental results showed that the task of extracting conceptual knowledge posed few challenges for these LLMs, whereas the task of application questions was relatively difficult. In addition, some widely proven effective prompting methods combined with domain-specific examples resulted in significant performance improvements compared to the zero-shot baselines. Additionally, some other preliminary experiments also demonstrated the strengths and weaknesses of different LLMs. The findings of this study can shed light on the application of LLMs to spoken language learning.
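The prompting techniques compared in this study are mechanical enough to sketch: few-shot prompting prepends question-answer exemplars, and chain-of-thought appends a reasoning cue. An illustrative builder; the wording and exemplar format are assumptions, not the paper's dataset:

```python
def build_prompt(question, exemplars=(), chain_of_thought=False):
    """Assemble a zero-/few-shot prompt with an optional CoT cue.

    `exemplars` is a sequence of (question, answer) pairs prepended
    to the target question, as in few-shot prompting."""
    parts = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    tail = f"Q: {question}\nA:"
    if chain_of_thought:
        tail += " Let's think step by step."
    parts.append(tail)
    return "\n\n".join(parts)
```

For example, build_prompt("Which vowel in 'boot' is rounded?", exemplars=[("How many phonemes are in 'cat'?", "Three.")], chain_of_thought=True) yields a one-shot CoT prompt.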
17

Shi, Zhouxing, Yihan Wang, Fan Yin, Xiangning Chen, Kai-Wei Chang, and Cho-Jui Hsieh. "Red Teaming Language Model Detectors with Language Models." Transactions of the Association for Computational Linguistics 12 (2024): 174–89. http://dx.doi.org/10.1162/tacl_a_00639.

Abstract:
The prevalence and strong capability of large language models (LLMs) present significant safety and ethical risks if exploited by malicious users. To prevent the potentially deceptive usage of LLMs, recent work has proposed algorithms to detect LLM-generated text and protect LLMs. In this paper, we investigate the robustness and reliability of these LLM detectors under adversarial attacks. We study two types of attack strategies: 1) replacing certain words in an LLM’s output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation. In both strategies, we leverage an auxiliary LLM to generate the word replacements or the instructional prompt. Different from previous works, we consider a challenging setting where the auxiliary LLM can also be protected by a detector. Experiments reveal that our attacks effectively compromise the performance of all detectors in the study with plausible generations, underscoring the urgent need to improve the robustness of LLM-generated text detection systems. Code is available at https://github.com/shizhouxing/LLM-Detector-Robustness.
18

Fan, Ju, Zihui Gu, Songyue Zhang, Yuxin Zhang, Zui Chen, Lei Cao, Guoliang Li, Samuel Madden, Xiaoyong Du, and Nan Tang. "Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL." Proceedings of the VLDB Endowment 17, no. 11 (July 2024): 2750–63. http://dx.doi.org/10.14778/3681954.3681960.

Abstract:
Zero-shot natural language to SQL (NL2SQL) aims to generalize pretrained NL2SQL models to new environments (e.g., new databases and new linguistic phenomena) without any annotated NL2SQL samples from these environments. Existing approaches either use small language models (SLMs) like BART and T5, or prompt large language models (LLMs). However, SLMs may struggle with complex natural language reasoning, and LLMs may not precisely align schemas to identify the correct columns or tables. In this paper, we propose a ZeroNL2SQL framework, which divides NL2SQL into smaller sub-tasks and utilizes both SLMs and LLMs. ZeroNL2SQL first fine-tunes SLMs for better generalizability in SQL structure identification and schema alignment, producing an SQL sketch. It then uses LLMs' language reasoning capability to fill in the missing information in the SQL sketch. To support ZeroNL2SQL, we propose novel database serialization and question-aware alignment methods for effective sketch generation using SLMs. Additionally, we devise a multi-level matching strategy to recommend the most relevant values to LLMs, and select the optimal SQL query via an execution-based strategy. Comprehensive experiments show that ZeroNL2SQL achieves the best zero-shot NL2SQL performance on benchmarks, i.e., outperforming the state-of-the-art SLM-based methods by 5.5% to 16.4% and exceeding LLM-based methods by 10% to 20% on execution accuracy.
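The division of labor is the key idea: a fine-tuned small model proposes structural sketches, a large model fills them in, and execution picks the winner. A schematic sketch in which all four callables are stand-ins rather than the paper's implementation:

```python
def zero_shot_nl2sql(question, schema, slm_sketch, llm_fill, execute):
    """ZeroNL2SQL-style pipeline sketch: SLM sketches, LLM completes,
    execution-based selection ranks. All callables are assumed helpers."""
    valid = []
    for sketch in slm_sketch(question, schema):   # e.g. "SELECT name FROM users WHERE age > ?"
        sql = llm_fill(question, schema, sketch)  # LLM fills missing literals/predicates
        ok, _rows = execute(sql)                  # run the candidate against the database
        if ok:
            valid.append(sql)
    return valid[0] if valid else None
```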
19

Gholami, Yazi. "Large Language Models (LLMs) for Cybersecurity: A Systematic Review." World Journal of Advanced Engineering Technology and Sciences 13, no. 1 (September 30, 2024): 57–69. http://dx.doi.org/10.30574/wjaets.2024.13.1.0395.

Abstract:
The rapid evolution of artificial intelligence (AI), particularly Large Language Models (LLMs) such as GPT-3 and BERT, has transformed various domains by enabling sophisticated natural language processing (NLP) tasks. In cybersecurity, the integration of LLMs presents promising new capabilities to address the growing complexity and scale of cyber threats. This paper provides a comprehensive review of the current research on the application of LLMs in cybersecurity. Leveraging a systematic literature review (SLR), it synthesizes key findings on how LLMs have been employed in tasks such as vulnerability detection, malware analysis, and phishing detection. The review highlights the advantages of LLMs, such as their ability to process unstructured data and automate complex tasks, while also addressing challenges related to scalability, false positives, and ethical concerns. By exploring domain-specific techniques and identifying limitations, this paper proposes future research directions aimed at enhancing the effectiveness of LLMs in cybersecurity. Key insights are offered to guide the continued development and application of LLMs in defending against evolving cyber threats.
20

Sun, Yushi, Hao Xin, Kai Sun, Yifan Ethan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, and Lei Chen. "Are Large Language Models a Good Replacement of Taxonomies?" Proceedings of the VLDB Endowment 17, no. 11 (July 2024): 2919–32. http://dx.doi.org/10.14778/3681954.3681973.

Abstract:
Large language models (LLMs) demonstrate an impressive ability to internalize knowledge and answer natural language questions. Although previous studies validate that LLMs perform well on general knowledge while presenting poor performance on long-tail nuanced knowledge, the community is still doubtful about whether the traditional knowledge graphs should be replaced by LLMs. In this paper, we ask if the schema of knowledge graph (i.e., taxonomy) is made obsolete by LLMs. Intuitively, LLMs should perform well on common taxonomies and at taxonomy levels that are common to people. Unfortunately, there lacks a comprehensive benchmark that evaluates the LLMs over a wide range of taxonomies from common to specialized domains and at levels from root to leaf so that we can draw a confident conclusion. To narrow the research gap, we constructed a novel taxonomy hierarchical structure discovery benchmark named TaxoGlimpse to evaluate the performance of LLMs over taxonomies. TaxoGlimpse covers ten representative taxonomies from common to specialized domains with in-depth experiments of different levels of entities in this taxonomy from root to leaf. Our comprehensive experiments of eighteen LLMs under three prompting settings validate that LLMs perform miserably poorly in handling specialized taxonomies and leaf-level entities. Specifically, the QA accuracy of the best LLM drops by up to 30% as we go from common to specialized domains and from root to leaf levels of taxonomies.
21

Raj, Pinaki. "A Literature Review on Emotional Intelligence of Large Language Models (LLMs)." International Journal of Advanced Research in Computer Science 15, no. 4 (August 20, 2024): 30–34. http://dx.doi.org/10.26483/ijarcs.v15i4.7111.

Abstract:
Large Language Models (LLMs) are artificial intelligence models that use deep neural networks to perform Natural Language Processing (NLP) tasks. These tasks include interaction between humans and computers, enabling computers to interpret and generate human languages in a meaningful manner. Large language models are called "large" because of the architecture’s size and the huge sets of training text data. With the emergence of transformer-based LLMs, NLP has reached another level, owing to their ability to handle long-range text dependencies in parallel. The growing prevalence of transformer-based LLMs in human lives has necessitated evaluating the scope of the Emotional Intelligence (EI) of LLMs. This paper discusses the need for emotional intelligence in transformer-based LLMs and the various existing studies that have evaluated this aspect. The potential challenges of LLMs, along with future directions for research in this field, are also discussed.
22

Xia, Yuchen, Nasser Jazdi, and Michael Weyrich. "Applying Large Language Models for Intelligent Industrial Automation." atp magazin 66, no. 6-7 (July 1, 2024): 62–71. http://dx.doi.org/10.17560/atp.v66i6-7.2739.

Abstract:
This paper explores the transformative potential of Large Language Models (LLMs) in industrial automation, presenting a comprehensive framework for their integration into complex industrial systems. We begin with a theoretical overview of LLMs, elucidating their pivotal capabilities such as interpretation, task automation, and autonomous agent functionality. A generic methodology for integrating LLMs into industrial applications is outlined, explaining how to apply LLM for task-specific applications. Four case studies demonstrate the practical use of LLMs across different industrial environments: transforming unstructured data into structured data as asset administration shell model, improving user interactions with document databases through conversational systems, planning and controlling industrial operations autonomously, and interacting with simulation models to determine the parametrization of the process. The studies illustrate the ability of LLMs to manage versatile tasks and interface with digital twins and automation systems, indicating that efficiency and productivity improvements can be achieved by strategically deploying LLM technologies in industrial settings.
23

Chen, Jiawei, Hongyu Lin, Xianpei Han, and Le Sun. "Benchmarking Large Language Models in Retrieval-Augmented Generation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 16 (March 24, 2024): 17754–62. http://dx.doi.org/10.1609/aaai.v38i16.29728.

Abstract:
Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the hallucination of large language models (LLMs). However, existing research lacks rigorous evaluation of the impact of retrieval-augmented generation on different large language models, which makes it challenging to identify the potential bottlenecks in the capabilities of RAG for different LLMs. In this paper, we systematically investigate the impact of Retrieval-Augmented Generation on large language models. We analyze the performance of different large language models in 4 fundamental abilities required for RAG, including noise robustness, negative rejection, information integration, and counterfactual robustness. To this end, we establish Retrieval-Augmented Generation Benchmark (RGB), a new corpus for RAG evaluation in both English and Chinese. RGB divides the instances within the benchmark into 4 separate testbeds based on the aforementioned fundamental abilities required to resolve the case. Then we evaluate 6 representative LLMs on RGB to diagnose the challenges of current LLMs when applying RAG. Evaluation reveals that while LLMs exhibit a certain degree of noise robustness, they still struggle significantly in terms of negative rejection, information integration, and dealing with false information. The aforementioned assessment outcomes indicate that there is still a considerable journey ahead to effectively apply RAG to LLMs.
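For readers unfamiliar with the pipeline being benchmarked, a minimal RAG loop looks like this (embed and call_llm are hypothetical helpers; the refusal instruction corresponds to the negative-rejection ability RGB tests):

```python
import numpy as np

def rag_answer(question, docs, embed, call_llm, top_k=3):
    """Minimal retrieval-augmented generation sketch: embed, retrieve
    the top-k passages by cosine similarity, and condition the LLM on
    them. `embed` and `call_llm` are assumed helpers."""
    q = embed(question)
    D = np.stack([embed(d) for d in docs])
    sims = D @ q / (np.linalg.norm(D, axis=1) * np.linalg.norm(q))
    context = "\n\n".join(docs[i] for i in np.argsort(-sims)[:top_k])
    prompt = (
        "Answer using only the context below. If the context does not "
        "contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```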
24

Zhou, Zihao, Qiufeng Wang, Mingyu Jin, Jie Yao, Jianan Ye, Wei Liu, Wei Wang, Xiaowei Huang, and Kaizhu Huang. "MathAttack: Attacking Large Language Models towards Math Solving Ability." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (March 24, 2024): 19750–58. http://dx.doi.org/10.1609/aaai.v38i17.29949.

Abstract:
With the boom of Large Language Models (LLMs), research on solving Math Word Problems (MWPs) has recently made great progress. However, few studies have examined the robustness of LLMs' math solving ability. Instead of attacking prompts in the use of LLMs, we propose a MathAttack model to attack MWP samples, which are closer to the essence of robustness in solving math problems. Compared to traditional text adversarial attack, it is essential to preserve the mathematical logic of original MWPs during the attack. To this end, we propose logical entity recognition to identify logical entries, which are then frozen. Subsequently, the remaining text is attacked by adopting a word-level attacker. Furthermore, we propose a new dataset, RobustMath, to evaluate the robustness of LLMs in math solving ability. Extensive experiments on RobustMath and two other math benchmark datasets, GSM8K and MultiArith, show that MathAttack can effectively attack the math solving ability of LLMs. In the experiments, we observe that (1) our adversarial samples from higher-accuracy LLMs are also effective for attacking LLMs with lower accuracy (e.g., transfer from larger to smaller-size LLMs, or from few-shot to zero-shot prompts); (2) complex MWPs (such as more solving steps, longer text, more numbers) are more vulnerable to attack; (3) we can improve the robustness of LLMs by using our adversarial samples in few-shot prompts. Finally, we hope our practice and observations can serve as an important attempt toward enhancing the robustness of LLMs in math solving ability. The code and dataset are available at: https://github.com/zhouzihao501/MathAttack.
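The freeze-then-attack recipe can be sketched compactly. Here numeric tokens stand in for the paper's recognized logical entities, and `synonym` is a placeholder for its word-level attacker:

```python
import re

def perturb_mwp(problem: str, synonym) -> str:
    """MathAttack-style perturbation sketch: freeze tokens that carry
    the math (approximated here as tokens containing digits) and let a
    word-level attacker rewrite the rest, preserving the problem's logic."""
    out = []
    for tok in problem.split():
        if re.search(r"\d", tok):
            out.append(tok)           # frozen logical entity
        else:
            out.append(synonym(tok))  # attacked word
    return " ".join(out)
```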
25

Tian, Yijun, Huan Song, Zichen Wang, Haozhu Wang, Ziqing Hu, Fang Wang, Nitesh V. Chawla, and Panpan Xu. "Graph Neural Prompting with Large Language Models." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (March 24, 2024): 19080–88. http://dx.doi.org/10.1609/aaai.v38i17.29875.

Abstract:
Large language models (LLMs) have shown remarkable generalization capability with exceptional performance in various language modeling tasks. However, they still exhibit inherent limitations in precisely capturing and returning grounded knowledge. While existing work has explored utilizing knowledge graphs (KGs) to enhance language modeling via joint training and customized model architectures, applying this to LLMs is problematic owing to their large number of parameters and high computational cost. Therefore, how to enhance pre-trained LLMs using grounded knowledge, e.g., retrieval-augmented generation, remains an open question. In this work, we propose Graph Neural Prompting (GNP), a novel plug-and-play method to assist pre-trained LLMs in learning beneficial knowledge from KGs. GNP encompasses various designs, including a standard graph neural network encoder, a cross-modality pooling module, a domain projector, and a self-supervised link prediction objective. Extensive experiments on multiple datasets demonstrate the superiority of GNP on both commonsense and biomedical reasoning tasks across different LLM sizes and settings. Code is available at https://github.com/meettyj/GNP.
26

Gao, Zhengjie, Xuanzi Liu, Yuanshuai Lan, and Zheng Yang. "A Brief Survey on Safety of Large Language Models." Journal of Computing and Information Technology 32, no. 1 (July 15, 2024): 47–64. http://dx.doi.org/10.20532/cit.2024.1005778.

Abstract:
Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) and have been widely adopted in various applications such as machine translation, chatbots, text summarization, and so on. However, the use of LLMs has raised concerns about their potential safety and security risks. In this survey, we explore the safety implications of LLMs, including ethical considerations, hallucination, and prompt injection. We also discuss current research efforts to mitigate these risks and identify areas for future research. Our survey provides a comprehensive overview of the safety concerns related to LLMs, which can help researchers and practitioners in the NLP community develop safer and more ethical applications of LLMs.
27

Patil, Rajvardhan, and Venkat Gudivada. "A Review of Current Trends, Techniques, and Challenges in Large Language Models (LLMs)." Applied Sciences 14, no. 5 (March 1, 2024): 2074. http://dx.doi.org/10.3390/app14052074.

Abstract:
Natural language processing (NLP) has transformed significantly in the last decade, especially in the field of language modeling. Large language models (LLMs) have achieved SOTA performance on natural language understanding (NLU) and natural language generation (NLG) tasks by learning language representations in self-supervised ways. This paper provides a comprehensive survey to capture the progression of advances in language models. In this paper, we examine the different aspects of language models, which started with a few million parameters but have reached the size of a trillion in a very short time. We also look at how these LLMs transitioned from task-specific to task-independent to task-and-language-independent architectures. This paper extensively discusses different pretraining objectives, benchmarks, and transfer learning methods used in LLMs. It also examines different finetuning and in-context learning techniques used in downstream tasks. Moreover, it explores how LLMs can perform well across many domains and datasets if sufficiently trained on a large and diverse dataset. Next, it discusses how, over time, the availability of cheap computational power and large datasets have improved LLMs’ capabilities and raised new challenges. As part of our study, we also inspect LLMs from the perspective of scalability to see how their performance is affected by the model’s depth, width, and data size. Lastly, we provide an empirical comparison of existing trends and techniques and a comprehensive analysis of where the field of LLMs currently stands.
28

Cuskley, Christine, Rebecca Woods, and Molly Flaherty. "The Limitations of Large Language Models for Understanding Human Language and Cognition." Open Mind 8 (2024): 1058–83. http://dx.doi.org/10.1162/opmi_a_00160.

Abstract:
Researchers have recently argued that the capabilities of Large Language Models (LLMs) can provide new insights into longstanding debates about the role of learning and/or innateness in the development and evolution of human language. Here, we argue on two grounds that LLMs alone tell us very little about human language and cognition in terms of acquisition and evolution. First, any similarities between human language and the output of LLMs are purely functional. Borrowing the “four questions” framework from ethology, we argue that what LLMs do is superficially similar, but how they do it is not. In contrast to the rich multimodal data humans leverage in interactive language learning, LLMs rely on immersive exposure to vastly greater quantities of unimodal text data, with recent multimodal efforts built upon mappings between images and text. Second, turning to functional similarities between human language and LLM output, we show that human linguistic behavior is much broader. LLMs were designed to imitate the very specific behavior of human writing; while they do this impressively, the underlying mechanisms of these models limit their capacities for meaning and naturalistic interaction, and their potential for dealing with the diversity in human language. We conclude by emphasising that LLMs are not theories of language, but tools that may be used to study language, and that can only be effectively applied with specific hypotheses to motivate research.
29

Fernandez, Raul Castro, Aaron J. Elmore, Michael J. Franklin, Sanjay Krishnan, and Chenhao Tan. "How Large Language Models Will Disrupt Data Management." Proceedings of the VLDB Endowment 16, no. 11 (July 2023): 3302–9. http://dx.doi.org/10.14778/3611479.3611527.

Abstract:
Large language models (LLMs), such as GPT-4, are revolutionizing software's ability to understand, process, and synthesize language. The authors of this paper believe that this advance in technology is significant enough to prompt introspection in the data management community, similar to previous technological disruptions such as the advents of the world wide web, cloud computing, and statistical machine learning. We argue that the disruptive influence that LLMs will have on data management will come from two angles. (1) A number of hard database problems, namely, entity resolution, schema matching, data discovery, and query synthesis, hit a ceiling of automation because the system does not fully understand the semantics of the underlying data. Based on large training corpora of natural language, structured data, and code, LLMs have an unprecedented ability to ground database tuples, schemas, and queries in real-world concepts. We will provide examples of how LLMs may completely change our approaches to these problems. (2) LLMs blur the line between predictive models and information retrieval systems with their ability to answer questions. We will present examples showing how large databases and information retrieval systems have complementary functionality.
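To make angle (1) concrete, entity resolution reduces under this view to a semantics question the model can be asked directly. A toy sketch with a hypothetical call_llm helper; a production system would add blocking, calibration, and verification around it:

```python
def same_entity(record_a: dict, record_b: dict, call_llm) -> bool:
    """Entity resolution as a yes/no semantic judgment by an LLM."""
    prompt = (
        "Do these two records refer to the same real-world entity?\n"
        f"A: {record_a}\nB: {record_b}\n"
        "Answer YES or NO."
    )
    return call_llm(prompt).strip().upper().startswith("YES")

# e.g. same_entity({"name": "IBM Corp.", "city": "Armonk"},
#                  {"name": "International Business Machines", "city": "Armonk"},
#                  call_llm)  # a semantics-aware model should answer YES
```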
30

Lopez-Lira, Alejandro. "Large language models (LLMs) and financial analysis." Business & Management Collection 2024, no. 9 (September 30, 2024): e1006319. http://dx.doi.org/10.69645/pldt5076.

31

Lu, Ruei-Shan, Ching-Chang Lin, and Hsiu-Yuan Tsao. "Empowering Large Language Models to Leverage Domain-Specific Knowledge in E-Learning." Applied Sciences 14, no. 12 (June 18, 2024): 5264. http://dx.doi.org/10.3390/app14125264.

Abstract:
Large language models (LLMs) have demonstrated remarkable capabilities in various natural language processing tasks. However, their performance in domain-specific contexts, such as E-learning, is hindered by the lack of specific domain knowledge. This paper adopts a novel approach of retrieval augment generation to empower LLMs with domain-specific knowledge in the field of E-learning. The approach leverages external knowledge sources, such as E-learning lectures or research papers, to enhance the LLM’s understanding and generation capabilities. Experimental evaluations demonstrate the effectiveness and superiority of our approach compared to existing methods in capturing and generating E-learning-specific information.
32

Maathuis, Clara, and Sabarathinam Chockalingam. "Risk Assessment of Large Language Models Beyond Apocalyptic Visions." European Conference on Cyber Warfare and Security 23, no. 1 (June 21, 2024): 279–86. http://dx.doi.org/10.34190/eccws.23.1.2293.

Abstract:
The remarkable development of Large Language Models (LLMs) continues to revolutionize various human activities in different societal domains like education, communications, and healthcare. While facilitating the generation of coherent and contextually relevant text across a diverse plethora of topics, LLMs became a set of instruments available in different toolboxes of decision makers. In this way, LLMs moved from a hype to an actual underlying mechanism for capturing valuable insights, revealing different perspectives on topics, and providing real-time decision-making support. As LLMs continue to increase in sophistication and accessibility, both societal and academic effort from AI and cyber security is projected in this direction, and a general societal unrest is seen due to their unknown consequences. Nevertheless, an apocalyptic vision towards their risks and impact does not represent a constructive and realistic approach. Contrarily, this could be an impediment to building LLMs that are safe, responsible, trustworthy, and have a real contribution to the overall societal well-being. Hence, understanding and addressing the risks of LLMs is imperative for building them in an ethical, social, and legal manner while making sure to consider control mechanisms for avoiding, mitigating, accepting, and transferring their risks and harmful consequences. Taking into consideration that these technological developments find themselves in an incipient phase, this research calls for a multi-angled perspective and proposes a realistic theoretical risk assessment method for LLMs.
33

Swindle, Adrian, Derrick McNealy, Giri Krishnan, and Ramyaa Ramyaa. "Evaluation of Large Language Models on Code Obfuscation (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 21 (March 24, 2024): 23664–66. http://dx.doi.org/10.1609/aaai.v38i21.30517.

Abstract:
Obfuscation intends to decrease interpretability of code and identification of code behavior. Large Language Models (LLMs) have been proposed for code synthesis and code analysis. This paper attempts to understand how well LLMs can analyse code and identify code behavior. Specifically, this paper systematically evaluates several LLMs’ capabilities to detect obfuscated code and identify behavior across a variety of obfuscation techniques with varying levels of complexity. LLMs proved to be better at detecting obfuscations that changed identifiers, even to misleading ones, compared to obfuscations involving code insertions (unused variables, as well as variables that replace constants with expressions that evaluate to those constants). Hardest to detect were obfuscations that layered multiple simple transformations. For these, only 20-40% of the LLMs’ responses were correct. Adding misleading documentation was also successful in misleading LLMs. We provide all our code to replicate results at https://github.com/SwindleA/LLMCodeObfuscation. Overall, our results suggest a gap in LLMs’ ability to understand code.
34

Pahune, Saurabh, and Manoj Chandrasekharan. "Several Categories of Large Language Models (LLMs): A Short Survey." International Journal for Research in Applied Science and Engineering Technology 11, no. 7 (July 31, 2023): 615–33. http://dx.doi.org/10.22214/ijraset.2023.54677.

Abstract:
Large Language Models (LLMs) have become effective tools for natural language processing and have been used in many different fields. This essay offers a succinct summary of various LLM subcategories. The survey emphasizes recent developments and efforts made for various LLM kinds, including task-based financial LLMs, multilingual language LLMs, biomedical and clinical LLMs, vision language LLMs, and code language models. The survey gives a general summary of the methods, attributes, datasets, transformer models, and comparison metrics applied in each category of LLMs. Furthermore, it highlights unresolved problems in the field of developing chatbots and virtual assistants, such as boosting natural language processing, enhancing chatbot intelligence, and resolving moral and legal dilemmas. The purpose of this study is to provide readers, developers, academics, and users interested in LLM-based chatbots and virtual intelligent assistant technologies with useful information and future directions.
35

Chen, Xi. "Large Language Models in the Medical Field: Principles and Applications." International Journal of Computer Science and Information Technology 2, no. 3 (May 28, 2024): 219–24. http://dx.doi.org/10.62051/ijcsit.v2n3.24.

Abstract:
Large language models (LLMs) have emerged as powerful tools in various fields, including healthcare. This paper explores the transformative role of LLMs in healthcare quality enhancement, their applications in medical decision-making, and their potential to drive healthcare innovation. Adopting a method of case study, the present study demonstrates how LLMs streamline medical processes, assist in diagnosis and treatment, and enable personalized healthcare solutions. Additionally, the principles of LLMs in medicine were discussed, including pre-training, fine-tuning, and prompt engineering. By leveraging LLMs, healthcare professionals can enhance patient care, optimize workflows, and make more informed decisions, ultimately leading to better healthcare outcomes.
36

Xue, Qing. "Unlocking the potential: A comprehensive exploration of large language models in natural language processing." Applied and Computational Engineering 57, no. 1 (April 30, 2024): 247–52. http://dx.doi.org/10.54254/2755-2721/57/20241341.

Abstract:
In recent years, large language models (LLMs) have revolutionized natural language processing (NLP) with their transformative architectures and sophisticated training techniques. This paper provides a comprehensive overview of LLMs, focusing on their architecture, training methodologies, and diverse applications. We delve into the transformer architecture, attention mechanisms, and parameter tuning strategies that underpin LLMs' capabilities. Furthermore, we explore training techniques such as self-supervised learning, transfer learning, and curriculum learning, highlighting their roles in empowering LLMs with linguistic proficiency. Additionally, we discuss the wide-ranging applications of LLMs, including text generation, sentiment analysis, and question answering, showcasing their versatility and impact across various domains. Through this comprehensive examination, we aim to elucidate the advancements and potentials of LLMs in shaping the future of natural language understanding and generation.
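The attention mechanism referred to here has a compact closed form, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, which a few lines of NumPy reproduce (single head, no masking):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,
    the core operation of the transformer architecture."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of values
```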
37

Li, Jiahuan, Hao Zhou, Shujian Huang, Shanbo Cheng, and Jiajun Chen. "Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions." Transactions of the Association for Computational Linguistics 12 (2024): 576–92. http://dx.doi.org/10.1162/tacl_a_00655.

Abstract:
Large-scale pretrained language models (LLMs), such as ChatGPT and GPT-4, have shown strong abilities in multilingual translation, without being explicitly trained on parallel corpora. It is intriguing how the LLMs obtain their ability to carry out translation instructions for different languages. In this paper, we present a detailed analysis by finetuning a multilingual pretrained language model, XGLM-7.5B, to perform multilingual translation following given instructions. Firstly, we show that multilingual LLMs have stronger translation abilities than previously demonstrated. For a certain language, the translation performance depends on its similarity to English and the amount of data used in the pretraining phase. Secondly, we find that LLMs’ ability to carry out translation instructions relies on the understanding of translation instructions and the alignment among different languages. With multilingual finetuning with translation instructions, LLMs could learn to perform the translation task well even for those language pairs unseen during the instruction tuning phase.
38

Baryshnikov, P. N. "What is scientific knowledge produced by Large Language Models?" Philosophical Problems of IT & Cyberspace (PhilIT&C), no. 1 (July 12, 2024): 89–103. http://dx.doi.org/10.17726/philit.2024.1.6.

Abstract:
This article examines the nature of scientific knowledge generated by Large Language Models (LLMs) and assesses their impact on scientific discoveries and the philosophy of science. LLMs, such as GPT‑4, are advanced deep learning algorithms capable of performing various natural language processing tasks, including text generation, translation, and data analysis. The study aims to explore how these technologies influence the scientific research process, questioning the classification and validity of AI‑assisted scientific discoveries. The methodology involves a comprehensive review of existing literature on the application of LLMs in various scientific fields, coupled with an analysis of their ethical implications. Key findings highlight the benefits of LLMs, including accelerated research processes, enhanced accuracy, and the ability to integrate interdisciplinary knowledge. However, challenges such as issues of reliability, the ethical responsibility of AI‑generated content, and environmental concerns are also discussed. The paper concludes that while LLMs significantly contribute to scientific advancements, their use necessitates a reevaluation of traditional concepts in the philosophy of science and the establishment of new ethical guidelines to ensure transparency, accountability, and integrity in AI‑assisted research. This balanced approach aims to harness the potential of LLMs while addressing the ethical and practical challenges they present.
39

Li, Qinbin, Junyuan Hong, Chulin Xie, Jeffrey Tan, Rachel Xin, Junyi Hou, Xavier Yin, et al. "LLM-PBE: Assessing Data Privacy in Large Language Models." Proceedings of the VLDB Endowment 17, no. 11 (July 2024): 3201–14. http://dx.doi.org/10.14778/3681954.3681994.

Abstract:
Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis. Their profound capabilities in processing and interpreting complex language data, however, bring to light pressing concerns regarding data privacy, especially the risk of unintentional training data leakage. Despite the critical nature of this issue, there has been no existing literature to offer a comprehensive assessment of data privacy risks in LLMs. Addressing this gap, our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs. LLM-PBE is designed to analyze privacy across the entire lifecycle of LLMs, incorporating diverse attack and defense strategies, and handling various data types and metrics. Through detailed experimentation with multiple LLMs, LLM-PBE facilitates an in-depth exploration of data privacy concerns, shedding light on influential factors such as model size, data characteristics, and evolving temporal dimensions. This study not only enriches the understanding of privacy issues in LLMs but also serves as a vital resource for future research in the field. Aimed at enhancing the breadth of knowledge in this area, the findings, resources, and our full technical report are made available at https://llm-pbe.github.io/, providing an open platform for academic and practical advancements in LLM privacy assessment.
40

Alkaoud, Mohamed. "A bilingual benchmark for evaluating large language models." PeerJ Computer Science 10 (February 29, 2024): e1893. http://dx.doi.org/10.7717/peerj-cs.1893.

Abstract:
This work introduces a new benchmark for the bilingual evaluation of large language models (LLMs) in English and Arabic. While LLMs have transformed various fields, their evaluation in Arabic remains limited. This work addresses this gap by proposing a novel evaluation method for LLMs in both Arabic and English, allowing for a direct comparison between the performance of the two languages. We build a new evaluation dataset based on the General Aptitude Test (GAT), a standardized test widely used for university admissions in the Arab world, that we utilize to measure the linguistic capabilities of LLMs. We conduct several experiments to examine the linguistic capabilities of ChatGPT and quantify how much better it is at English than Arabic. We also examine the effect of changing task descriptions from Arabic to English and vice-versa. In addition to that, we find that fastText can surpass ChatGPT in finding Arabic word analogies. We conclude by showing that GPT-4 Arabic linguistic capabilities are much better than ChatGPT’s Arabic capabilities and are close to ChatGPT’s English capabilities.
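The fastText result rests on the classic static-embedding analogy method: a : b :: c : ? is answered by the word whose vector lies nearest to v_b - v_a + v_c. A toy NumPy sketch over an assumed word-to-vector lookup:

```python
import numpy as np

def solve_analogy(a, b, c, vectors):
    """Answer a : b :: c : ? by cosine-nearest neighbor to
    v_b - v_a + v_c. `vectors` is an assumed dict mapping words
    to NumPy arrays (e.g., loaded from fastText embeddings)."""
    target = vectors[b] - vectors[a] + vectors[c]
    best, best_sim = None, -np.inf
    for word, v in vectors.items():
        if word in (a, b, c):
            continue  # exclude the cue words themselves
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best
```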
41

Zeng, Jiali, Fandong Meng, Yongjing Yin, and Jie Zhou. "Teaching Large Language Models to Translate with Comparison." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (March 24, 2024): 19488–96. http://dx.doi.org/10.1609/aaai.v38i17.29920.

Full text
Abstract:
Open-sourced large language models (LLMs) have demonstrated remarkable efficacy in various tasks with instruction tuning. However, these models can sometimes struggle with tasks that require more specialized knowledge, such as translation. One possible reason for this deficiency is that instruction tuning aims to generate fluent and coherent text that continues from a given instruction without being constrained by any task-specific requirements. Moreover, it can be more challenging to tune smaller LLMs with lower-quality training data. To address this issue, we propose a novel framework that uses comparative examples to teach LLMs translation. Our approach involves output comparison and preference comparison, presenting the model with carefully designed examples of correct and incorrect translations and an additional preference loss for better regularization. Empirical evaluation on four language directions of the WMT2022 and FLORES-200 benchmarks shows the superiority of our proposed method over existing methods. Our findings offer a new perspective on fine-tuning LLMs for translation tasks and provide a promising solution for generating high-quality translations. Please refer to GitHub for more details: https://github.com/lemon0830/TIM.
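The preference-comparison idea can be illustrated with a margin loss that pushes the log-likelihood of a correct translation above that of an incorrect one. The sketch below is a simplified reading of that idea, not the authors' TIM code (their repository has the exact formulation); the shapes and margin value are arbitrary.

```python
# A minimal sketch of a margin-based preference loss over a correct and an
# incorrect translation, in the spirit of the paper's preference comparison.
import torch
import torch.nn.functional as F

def sequence_logprob(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Sum of per-token log-probabilities of `labels` under `logits`.
    logits: (seq, vocab), labels: (seq,)."""
    logp = F.log_softmax(logits, dim=-1)
    return logp.gather(-1, labels.unsqueeze(-1)).squeeze(-1).sum()

def preference_loss(logits_good, labels_good, logits_bad, labels_bad,
                    margin: float = 1.0) -> torch.Tensor:
    """Penalize the model unless the incorrect translation scores at least
    `margin` log-prob below the correct one."""
    lp_good = sequence_logprob(logits_good, labels_good)
    lp_bad = sequence_logprob(logits_bad, labels_bad)
    return F.relu(margin - (lp_good - lp_bad))

# Toy shapes: 5-token sequences over a 100-word vocabulary.
g, b = torch.randn(5, 100), torch.randn(5, 100)
loss = preference_loss(g, torch.randint(100, (5,)), b, torch.randint(100, (5,)))
print(loss)
```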
APA, Harvard, Vancouver, ISO, and other styles
42

Yang, Diyi. "Human-AI Interaction in the Age of Large Language Models." Proceedings of the AAAI Symposium Series 3, no. 1 (May 20, 2024): 66–67. http://dx.doi.org/10.1609/aaaiss.v3i1.31183.

Full text
Abstract:
Large language models (LLMs) have revolutionized the way humans interact with AI systems, transforming a wide range of fields and disciplines. In this talk, I share two distinct approaches to empowering human-AI interaction using LLMs. The first explores how LLMs transform computational social science, and how human-AI collaboration can reduce costs and improve the efficiency of social science research. The second looks at social skill learning via LLMs, empowering therapists and learners with LLM-supported feedback and deliberative practice. These two works demonstrate how human-AI collaboration via LLMs can empower individuals and foster positive change. We conclude by discussing how LLMs enable collaborative intelligence by redefining the interactions between humans and AI systems.
APA, Harvard, Vancouver, ISO, and other styles
43

Jeong, Hyeongyo, Haechan Lee, Changwon Kim, and Sungtae Shin. "A Survey of Robot Intelligence with Large Language Models." Applied Sciences 14, no. 19 (October 2, 2024): 8868. http://dx.doi.org/10.3390/app14198868.

Full text
Abstract:
Since the emergence of ChatGPT, research on large language models (LLMs) has actively progressed across various fields. LLMs, pre-trained on vast text datasets, have exhibited exceptional abilities in understanding natural language and planning tasks. These abilities make them promising for robotics. In general, traditional supervised learning-based robot intelligence systems lack adaptability to dynamically changing environments. However, LLMs help a robot intelligence system to improve its generalization ability in dynamic and complex real-world environments. Indeed, findings from ongoing robotics studies indicate that LLMs can significantly improve robots’ behavior planning and execution capabilities. Additionally, vision-language models (VLMs), trained on extensive visual and linguistic data for the visual question answering (VQA) problem, excel at integrating computer vision with natural language processing. VLMs can comprehend visual contexts and execute actions through natural language, and they can also provide descriptions of scenes in natural language. Several studies have explored the enhancement of robot intelligence using multimodal data, including object recognition and description by VLMs, along with the execution of language-driven commands integrated with visual information. This review paper thoroughly investigates how foundation models such as LLMs and VLMs have been employed to boost robot intelligence. For clarity, the research areas are categorized into five topics: reward design in reinforcement learning, low-level control, high-level planning, manipulation, and scene understanding. This review also summarizes studies that show how foundation models have improved robot intelligence, such as the Eureka model for automating reward function design in reinforcement learning, RT-2 for integrating visual data, language, and robot actions in vision-language-action models, and AutoRT for generating feasible tasks and executing robot behavior policies via LLMs.
APA, Harvard, Vancouver, ISO, and other styles
44

Nazi, Zabir Al, and Wei Peng. "Large Language Models in Healthcare and Medical Domain: A Review." Informatics 11, no. 3 (August 7, 2024): 57. http://dx.doi.org/10.3390/informatics11030057.

Full text
Abstract:
The deployment of large language models (LLMs) within the healthcare sector has sparked both enthusiasm and apprehension. These models exhibit the remarkable ability to provide proficient responses to free-text queries, demonstrating a nuanced understanding of professional medical knowledge. This comprehensive survey delves into the functionalities of existing LLMs designed for healthcare applications and elucidates the trajectory of their development, starting with traditional Pretrained Language Models (PLMs) and then moving to the present state of LLMs in the healthcare sector. First, we explore the potential of LLMs to amplify the efficiency and effectiveness of diverse healthcare applications, particularly focusing on clinical language understanding tasks. These tasks encompass a wide spectrum, ranging from named entity recognition and relation extraction to natural language inference, multimodal medical applications, document classification, and question-answering. Additionally, we conduct an extensive comparison of the most recent state-of-the-art LLMs in the healthcare domain, while also assessing the utilization of various open-source LLMs and highlighting their significance in healthcare applications. Furthermore, we present the essential performance metrics employed to evaluate LLMs in the biomedical domain, shedding light on their effectiveness and limitations. Finally, we summarize the prominent challenges and constraints faced by large language models in the healthcare sector by offering a holistic perspective on their potential benefits and shortcomings. This review provides a comprehensive exploration of the current landscape of LLMs in healthcare, addressing their role in transforming medical applications and the areas that warrant further research and development.
APA, Harvard, Vancouver, ISO, and other styles
45

Bussaja, Janga. "Exploring White Fragility in Large Language Models." International Journal of Computer Science and Information Technology 16, no. 4 (August 28, 2024): 53–73. http://dx.doi.org/10.5121/ijcsit.2024.16405.

Full text
Abstract:
This paper evaluates the understanding and biases of large language models (LLMs) regarding racism by comparing their responses to those of prominent African-centered scholars, Dr. Amos Wilson and Dr. Frances Cress Welsing. The study identifies racial biases in LLMs, illustrating the critical need for specialized AI systems like "Smoky," designed to address systemic racism with a foundation in African-centered scholarship. By highlighting disparities and potential biases in LLM responses, the research aims to contribute to the development of more culturally aware and contextually sensitive AI systems.
APA, Harvard, Vancouver, ISO, and other styles
46

Zhu, Zhen, Yibo Wang, Shouqing Yang, Lin Long, Runze Wu, Xiu Tang, Junbo Zhao, and Haobo Wang. "CORAL: Collaborative Automatic Labeling System Based on Large Language Models." Proceedings of the VLDB Endowment 17, no. 12 (August 2024): 4401–4. http://dx.doi.org/10.14778/3685800.3685885.

Full text
Abstract:
In the era of big data, data annotation is integral to numerous applications. However, it is widely acknowledged as a laborious and time-consuming process, significantly impeding the scalability and efficiency of data-driven applications. To reduce the human cost, we demonstrate CORAL, a collaborative automatic labeling system driven by large language models (LLMs), which achieves high-quality annotation with the least human effort. Firstly, CORAL employs LLM to automatically annotate vast datasets, generating coarse-grained labels. Subsequently, a weakly-supervised learning module trains small language models (SLMs) using noisy label learning techniques to distill accurate labels from LLM's annotations. It also allows statistical analysis of model outcomes to identify potentially erroneous labels, reducing the human cost of error detection. Furthermore, CORAL supports iterative refinement by LLMs and SLMs using manually corrected labels, thereby ensuring continual enhancement in annotation quality and model performance. A visual interface enables annotation process monitoring and result analysis.
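The distillation loop the demo describes can be approximated in a few lines: coarse LLM labels train a small model, and items where the small model is uncertain or disagrees with the LLM are routed to human review. The sketch below is a simplified stand-in, not the CORAL system; the texts, labels, and confidence threshold are invented for illustration.

```python
# A minimal sketch of CORAL-style label distillation with human routing.
# All data and the 0.6 confidence threshold are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great product", "terrible service", "works fine", "broke instantly"]
llm_labels = np.array([1, 0, 1, 0])  # coarse labels assumed to come from an LLM

vec = TfidfVectorizer()
X = vec.fit_transform(texts)
slm = LogisticRegression().fit(X, llm_labels)  # small model distills the labels

confidence = slm.predict_proba(X).max(axis=1)
disagrees = slm.predict(X) != llm_labels
flagged = [t for t, p, d in zip(texts, confidence, disagrees) if d or p < 0.6]
print("send to human review:", flagged)
```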
APA, Harvard, Vancouver, ISO, and other styles
47

Chiarello, Filippo, Simone Barandoni, Marija Majda Škec, and Gualtiero Fantoni. "Generative large language models in engineering design: opportunities and challenges." Proceedings of the Design Society 4 (May 2024): 1959–68. http://dx.doi.org/10.1017/pds.2024.198.

Full text
Abstract:
Despite the rapid advancement of generative Large Language Models (LLMs), there is still limited understanding of their potential impacts on engineering design (ED). This study fills this gap by collecting the tasks LLMs can perform within ED, using a Natural Language Processing analysis of 15,355 ED research papers. The results lead to a framework of LLM tasks in design, classifying them for different functions of LLMs and ED phases. Our findings illuminate the opportunities and risks of using LLMs for design, offering a foundation for future research and application in this domain.
APA, Harvard, Vancouver, ISO, and other styles
48

Joshi, Aditya, and Shruta Rawat. "Evaluation of Large Language Models Using an Indian Language LGBTI+ Lexicon." AI Ethics Journal 4, no. 1 (November 9, 2023). http://dx.doi.org/10.47289/aiej20231109.

Full text
Abstract:
Large language models (LLMs) are typically evaluated on the basis of task-based benchmarks such as MMLU. Such benchmarks do not examine the behaviour of LLMs in specific contexts. This is particularly true in the LGBTI+ context, where social stereotypes may result in variation in LGBTI+ terminology. Therefore, domain-specific lexicons or dictionaries may be useful as a representative list of words against which the LLM's behaviour needs to be evaluated. This paper presents a methodology for the evaluation of LLMs using an LGBTI+ lexicon in Indian languages. The methodology consists of four steps: formulating NLP tasks relevant to the expected behaviour, creating prompts that test the LLMs, obtaining outputs from the LLMs, and, finally, manually evaluating the results. Our qualitative analysis shows that the three LLMs we experiment on are unable to detect underlying hateful content. Similarly, we observe limitations in using machine translation as a means of evaluating natural language understanding in languages other than English. The methodology presented in this paper can be useful for LGBTI+ lexicons in other languages as well as other domain-specific lexicons. The work done in this paper opens avenues for responsible behaviour of LLMs in the Indian context, especially given prevalent social perceptions of the LGBTI+ community.
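The four-step methodology lends itself to a small driver loop: formulate a task, instantiate a prompt per lexicon term, collect model outputs, and hand the results to human raters. The sketch below assumes a hypothetical `query_llm` callable and placeholder lexicon entries; the paper's actual prompts and Indian-language lexicon are not reproduced here.

```python
# A minimal sketch of the paper's four-step lexicon evaluation loop.
# `query_llm`, the task template, and the lexicon are placeholders.
from typing import Callable, List

def evaluate_lexicon(lexicon: List[str],
                     query_llm: Callable[[str], str]) -> List[dict]:
    """Steps 1-3: formulate the task, prompt the LLM once per term, and
    collect outputs. Step 4 (scoring) stays manual, as in the paper."""
    template = ("You will see a sentence that mentions the term '{term}'. "
                "State whether the sentence is respectful or derogatory "
                "towards the community the term refers to.\n"
                "Sentence: {sentence}")
    results = []
    for term in lexicon:
        prompt = template.format(term=term, sentence="<test sentence>")
        results.append({"term": term, "prompt": prompt,
                        "output": query_llm(prompt)})
    return results

# Usage with a stub standing in for a real LLM client:
rows = evaluate_lexicon(["<lexicon term>"], lambda p: "respectful")
print(rows[0])
```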
APA, Harvard, Vancouver, ISO, and other styles
49

Linegar, Mitchell, Rafal Kocielnik, and R. Michael Alvarez. "Large language models and political science." Frontiers in Political Science 5 (October 16, 2023). http://dx.doi.org/10.3389/fpos.2023.1257092.

Full text
Abstract:
Large Language Models (LLMs) are a type of artificial intelligence that uses information from very large datasets to model the use of language and generate content. While LLMs like GPT-3 have been used widely in many applications, the recent public release of OpenAI's ChatGPT has opened more debate about the potential uses and abuses of LLMs. In this paper, we provide a brief introduction to LLMs and discuss their potential application in political science and political methodology. We use two examples of LLMs from our recent research to illustrate how LLMs open new areas of research. We conclude with a discussion of how researchers can use LLMs in their work, and issues that researchers need to be aware of regarding using LLMs in political science and political methodology.
APA, Harvard, Vancouver, ISO, and other styles
50

Kong, Qing‐Zhou, Kun‐Ping Ju, Meng Wan, Jing Liu, Xiao‐Qi Wu, Yue‐Yue Li, Xiu‐Li Zuo, and Yan‐Qing Li. "Comparative analysis of large language models in medical counseling: A focus on Helicobacter pylori infection." Helicobacter 29, no. 1 (January 2024). http://dx.doi.org/10.1111/hel.13055.

Full text
Abstract:
Background: Large language models (LLMs) are promising medical counseling tools, but the reliability of their responses remains unclear. We aimed to assess the feasibility of three popular LLMs as counseling tools for Helicobacter pylori infection in different counseling languages. Materials and Methods: This study was conducted between November 20 and December 1, 2023. Three LLMs (ChatGPT 4.0 [LLM1], ChatGPT 3.5 [LLM2], and ERNIE Bot 4.0 [LLM3]) were each given 15 H. pylori-related questions, once in English and once in Chinese. Each chat was conducted using the “New Chat” function to avoid bias from correlation interference. Responses were recorded and blindly assigned to three reviewers for scoring on three established Likert scales: accuracy (range 1–6), completeness (range 1–3), and comprehensibility (range 1–3). The acceptable thresholds for the scales were set at a minimum of 4, 2, and 2, respectively. Comparisons were then made across models and between languages. Results: The overall mean (SD) accuracy score was 4.80 (1.02), the completeness score 1.82 (0.78), and the comprehensibility score 2.90 (0.36). The acceptable proportions for the accuracy, completeness, and comprehensibility of the responses were 90%, 45.6%, and 100%, respectively. The acceptable proportion of the overall completeness score was higher for English responses than for Chinese responses (p = 0.034). For accuracy, the English responses of LLM3 were better than its Chinese responses (p = 0.0055). For completeness, the English responses of LLM1 were better than its Chinese responses (p = 0.0257). For comprehensibility, the English responses of LLM1 were better than its Chinese responses (p = 0.0496). No differences were found between the LLMs. Conclusions: The LLMs responded satisfactorily to questions related to H. pylori infection, but further improving completeness and reliability, along with attention to language nuances, is crucial for optimizing overall performance.
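The acceptability rule in this study reduces to simple threshold arithmetic over the three Likert scales. The sketch below recomputes acceptable proportions from made-up example scores (the study's raw per-response scores are not given in the abstract); the thresholds follow the stated minimums of 4, 2, and 2.

```python
# A minimal sketch of the scoring rule described above: a response is
# "acceptable" on a scale when it meets that scale's minimum. Scores
# below are invented examples, not the study's data.
scores = [  # (accuracy 1-6, completeness 1-3, comprehensibility 1-3)
    (5, 2, 3), (4, 1, 3), (6, 3, 3), (3, 2, 2),
]
thresholds = (4, 2, 2)

def acceptable_share(idx: int) -> float:
    """Fraction of responses meeting the threshold on scale `idx`."""
    return sum(s[idx] >= thresholds[idx] for s in scores) / len(scores)

for name, i in [("accuracy", 0), ("completeness", 1), ("comprehensibility", 2)]:
    print(f"{name}: {acceptable_share(i):.0%} acceptable")
```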
APA, Harvard, Vancouver, ISO, and other styles