Journal articles on the topic 'Pretrained models'

Consult the top 50 journal articles for your research on the topic 'Pretrained models.'

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Hofmann, Valentin, Goran Glavaš, Nikola Ljubešić, Janet B. Pierrehumbert, and Hinrich Schütze. "Geographic Adaptation of Pretrained Language Models." Transactions of the Association for Computational Linguistics 12 (2024): 411–31. http://dx.doi.org/10.1162/tacl_a_00652.

Abstract:
Abstract While pretrained language models (PLMs) have been shown to possess a plethora of linguistic knowledge, the existing body of research has largely neglected extralinguistic knowledge, which is generally difficult to obtain by pretraining on text alone. Here, we contribute to closing this gap by examining geolinguistic knowledge, i.e., knowledge about geographic variation in language. We introduce geoadaptation, an intermediate training step that couples language modeling with geolocation prediction in a multi-task learning setup. We geoadapt four PLMs, covering language groups from three geographic areas, and evaluate them on five different tasks: fine-tuned (i.e., supervised) geolocation prediction, zero-shot (i.e., unsupervised) geolocation prediction, fine-tuned language identification, zero-shot language identification, and zero-shot prediction of dialect features. Geoadaptation is very successful at injecting geolinguistic knowledge into the PLMs: The geoadapted PLMs consistently outperform PLMs adapted using only language modeling (by especially wide margins on zero-shot prediction tasks), and we obtain new state-of-the-art results on two benchmarks for geolocation prediction and language identification. Furthermore, we show that the effectiveness of geoadaptation stems from its ability to geographically retrofit the representation space of the PLMs.
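The multi-task coupling described in this abstract can be pictured with a short sketch. The following PyTorch fragment is illustrative only, not the authors' code: the generic `encoder`, the head names, the mean-pooling step, and the mixing weight `alpha` are assumptions made for the example.

```python
# Minimal sketch of a geoadaptation-style multi-task objective (illustrative only):
# a shared encoder is trained with a masked-language-modeling loss plus a
# geolocation-prediction loss, weighted by a mixing coefficient `alpha`.
import torch
import torch.nn as nn

class GeoAdaptationSketch(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_size: int, vocab_size: int):
        super().__init__()
        self.encoder = encoder                               # any token encoder returning (batch, seq, hidden)
        self.mlm_head = nn.Linear(hidden_size, vocab_size)   # predicts masked tokens
        self.geo_head = nn.Linear(hidden_size, 2)            # predicts (latitude, longitude)

    def forward(self, token_ids, mlm_labels, coords, alpha=0.5):
        hidden = self.encoder(token_ids)                     # (batch, seq, hidden)
        mlm_logits = self.mlm_head(hidden)
        mlm_loss = nn.functional.cross_entropy(
            mlm_logits.view(-1, mlm_logits.size(-1)), mlm_labels.view(-1), ignore_index=-100)
        geo_pred = self.geo_head(hidden.mean(dim=1))         # pool over tokens, then regress coordinates
        geo_loss = nn.functional.mse_loss(geo_pred, coords)
        return mlm_loss + alpha * geo_loss                   # joint multi-task objective
```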
2

Bear Don’t Walk IV, Oliver J., Tony Sun, Adler Perotte, and Noémie Elhadad. "Clinically relevant pretraining is all you need." Journal of the American Medical Informatics Association 28, no. 9 (June 21, 2021): 1970–76. http://dx.doi.org/10.1093/jamia/ocab086.

Abstract:
Abstract Clinical notes present a wealth of information for applications in the clinical domain, but heterogeneity across clinical institutions and settings presents challenges for their processing. The clinical natural language processing field has made strides in overcoming domain heterogeneity, while pretrained deep learning models present opportunities to transfer knowledge from one task to another. Pretrained models have performed well when transferred to new tasks; however, it is not well understood if these models generalize across differences in institutions and settings within the clinical domain. We explore if institution or setting specific pretraining is necessary for pretrained models to perform well when transferred to new tasks. We find no significant performance difference between models pretrained across institutions and settings, indicating that clinically pretrained models transfer well across such boundaries. Given a clinically pretrained model, clinical natural language processing researchers may forgo the time-consuming pretraining step without a significant performance drop.
3

Basu, Sourya, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, Vijil Chenthamarakshan, Kush R. Varshney, Lav R. Varshney, and Payel Das. "Equi-Tuning: Group Equivariant Fine-Tuning of Pretrained Models." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 6788–96. http://dx.doi.org/10.1609/aaai.v37i6.25832.

Abstract:
We introduce equi-tuning, a novel fine-tuning method that transforms (potentially non-equivariant) pretrained models into group equivariant models while incurring minimum L_2 loss between the feature representations of the pretrained and the equivariant models. Large pretrained models can be equi-tuned for different groups to satisfy the needs of various downstream tasks. Equi-tuned models benefit from both group equivariance as an inductive bias and semantic priors from pretrained models. We provide applications of equi-tuning on three different tasks: image classification, compositional generalization in language, and fairness in natural language generation (NLG). We also provide a novel group-theoretic definition for fairness in NLG. The effectiveness of this definition is shown by testing it against a standard empirical method of fairness in NLG. We provide experimental results for equi-tuning using a variety of pretrained models: Alexnet, Resnet, VGG, and Densenet for image classification; RNNs, GRUs, and LSTMs for compositional generalization; and GPT2 for fairness in NLG. We test these models on benchmark datasets across all considered tasks to show the generality and effectiveness of the proposed method.
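Equi-tuning replaces a pretrained model M with the group-averaged model M_G(x) = (1/|G|) Σ_g g⁻¹ M(g·x) and then fine-tunes it. A rough sketch for the 90-degree rotation group on images is shown below; it is illustrative rather than the paper's setup, and because class logits are invariant, the inverse group action on the output is trivial here.

```python
# Illustrative equi-tuning wrapper for the C4 rotation group acting on images.
# The wrapped model's logits are averaged over all rotated copies of the input,
# making the output invariant to 90-degree rotations; the wrapper is then fine-tuned.
import torch
import torch.nn as nn

class C4EquiTuned(nn.Module):
    def __init__(self, pretrained: nn.Module):
        super().__init__()
        self.pretrained = pretrained

    def forward(self, x):                            # x: (batch, channels, H, W)
        outputs = []
        for k in range(4):                           # the four elements of the rotation group C4
            rotated = torch.rot90(x, k, dims=(2, 3))
            outputs.append(self.pretrained(rotated)) # class logits are invariant, so no inverse action needed
        return torch.stack(outputs, dim=0).mean(dim=0)   # group average

# Example: wrap any pretrained classifier and fine-tune it as usual.
# model = C4EquiTuned(torchvision.models.resnet18(weights="IMAGENET1K_V1"))
```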
4

Wang, Canjun, Zhao Li, Tong Chen, Ruishuang Wang, and Zhengyu Ju. "Research on the Application of Prompt Learning Pretrained Language Model in Machine Translation Task with Reinforcement Learning." Electronics 12, no. 16 (August 9, 2023): 3391. http://dx.doi.org/10.3390/electronics12163391.

Abstract:
With the continuous advancement of deep learning technology, pretrained language models have emerged as crucial tools for natural language processing tasks. However, optimization of pretrained language models is essential for specific tasks such as machine translation. This paper presents a novel approach that integrates reinforcement learning with prompt learning to enhance the performance of pretrained language models in machine translation tasks. In our methodology, a “prompt” string is incorporated into the input of the pretrained language model, to guide the generation of an output that aligns closely with the target translation. Reinforcement learning is employed to train the model in producing optimal translation results. During this training process, the target translation is utilized as a reward signal to incentivize the model to generate an output that aligns more closely with the desired translation. Experimental results validated the effectiveness of the proposed approach. The pretrained language model trained with prompt learning and reinforcement learning exhibited superior performance compared to traditional pretrained language models in machine translation tasks. Furthermore, we observed that different prompt strategies significantly impacted the model’s performance, underscoring the importance of selecting an optimal prompt strategy tailored to the specific task. The results suggest that using techniques such as prompt learning and reinforcement learning can improve the performance of pretrained language models for tasks such as text generation and machine translation. The method proposed in this paper not only offers a fresh perspective on leveraging pretrained language models in machine translation and other related tasks but also serves as a valuable reference for further research in this domain. By combining reinforcement learning with prompt learning, researchers can explore new avenues for optimizing pretrained language models and improving their efficacy in various natural language processing tasks.
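As a rough illustration of the training signal described above, the sketch below combines a prompt prefix with a REINFORCE-style update in which a sampled translation is rewarded by its similarity to the reference. This is not the authors' implementation; the Hugging Face-style `model`/`tokenizer` interface, the sentence-BLEU reward, and the prompt wording are all assumptions.

```python
# Hedged sketch: prompt-augmented input plus a REINFORCE-style loss for translation.
# Assumes a Hugging Face-style seq2seq `model` and `tokenizer`; reward is sentence BLEU.
import torch
import sacrebleu

def rl_prompt_step(model, tokenizer, source, reference, prompt="Translate English to German: "):
    inputs = tokenizer(prompt + source, return_tensors="pt")

    # Sample a candidate translation from the current policy.
    sampled = model.generate(**inputs, do_sample=True, max_new_tokens=64)
    hypothesis = tokenizer.decode(sampled[0], skip_special_tokens=True)

    # Reward: similarity of the sample to the reference translation.
    reward = sacrebleu.sentence_bleu(hypothesis, [reference]).score / 100.0

    # Approximate log-probability of the sampled sequence (teacher-forced pass).
    labels = sampled.clone()
    out = model(**inputs, labels=labels)
    log_prob = -out.loss * labels.size(1)      # out.loss is the mean NLL per token

    return -reward * log_prob                  # REINFORCE objective, to be backpropagated
```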
5

Parmonangan, Ivan Halim, Marsella Marsella, Doharfen Frans Rino Pardede, Katarina Prisca Rijanto, Stephanie Stephanie, Kreshna Adhitya Chandra Kesuma, Valentina Tiara Cahyaningtyas, and Maria Susan Anggreainy. "Training CNN-based Model on Low Resource Hardware and Small Dataset for Early Prediction of Melanoma from Skin Lesion Images." Engineering, MAthematics and Computer Science (EMACS) Journal 5, no. 2 (May 31, 2023): 41–46. http://dx.doi.org/10.21512/emacsjournal.v5i2.9904.

Abstract:
Melanoma is a rare skin cancer that can spread quickly to other skin layers and the organs beneath. It is known to be curable only if diagnosed at an early stage, which makes accurate early prediction essential for reducing the number of deaths it causes. Deep learning methods have recently shown promising performance in classifying images accurately; however, they require many samples to generalize well, while the number of melanoma sample images is limited. To address this issue, transfer learning is widely adopted to transfer the knowledge of a pretrained model to another domain or to a new dataset with fewer samples or different tasks. This study aims to find which method best achieves this for early melanoma prediction from skin lesion images. We investigated three pretrained and one non-pretrained image classification models, specifically choosing pretrained models that are efficient to train on small training samples and low hardware resources. The results show that, with limited sample images and low hardware resources, pretrained image models yield better overall accuracy and recall than the non-pretrained model, suggesting that pretrained models are more suitable for this task under constrained data and hardware resources.
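A minimal sketch of the transfer-learning setup used in studies like this one is given below, assuming a torchvision ResNet-18 backbone; the specific architecture, the choice to freeze the whole backbone, and the class count are illustrative rather than the paper's.

```python
# Illustrative transfer learning for binary melanoma classification on limited data:
# reuse an ImageNet-pretrained backbone, freeze it, and train only a new classifier head.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # ImageNet-pretrained
for param in backbone.parameters():
    param.requires_grad = False                 # freeze pretrained features (cheap to train)

backbone.fc = nn.Linear(backbone.fc.in_features, 2)   # new head: melanoma vs. benign

# Only the new head's parameters are optimized, which suits small datasets and modest hardware.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
```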
6

Edman, Lukas, Gabriele Sarti, Antonio Toral, Gertjan van Noord, and Arianna Bisazza. "Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation." Transactions of the Association for Computational Linguistics 12 (2024): 392–410. http://dx.doi.org/10.1162/tacl_a_00651.

Abstract:
Abstract Pretrained character-level and byte-level language models have been shown to be competitive with popular subword models across a range of Natural Language Processing tasks. However, there has been little research on their effectiveness for neural machine translation (NMT), particularly within the popular pretrain-then-finetune paradigm. This work performs an extensive comparison across multiple languages and experimental conditions of character- and subword-level pretrained models (ByT5 and mT5, respectively) on NMT. We show the effectiveness of character-level modeling in translation, particularly in cases where fine-tuning data is limited. In our analysis, we show how character models’ gains in translation quality are reflected in better translations of orthographically similar words and rare words. While evaluating the importance of source texts in driving model predictions, we highlight word-level patterns within ByT5, suggesting an ability to modulate word-level and character-level information during generation. We conclude by assessing the efficiency tradeoff of byte models, suggesting their usage in non-time-critical scenarios to boost translation quality.
7

Won, Hyun-Sik, Min-Ji Kim, Dohyun Kim, Hee-Soo Kim, and Kang-Min Kim. "University Student Dropout Prediction Using Pretrained Language Models." Applied Sciences 13, no. 12 (June 13, 2023): 7073. http://dx.doi.org/10.3390/app13127073.

Abstract:
Predicting student dropout from universities is an imperative but challenging task. Numerous data-driven approaches that utilize both student demographic information (e.g., gender, nationality, and high school graduation year) and academic information (e.g., GPA, participation in activities, and course evaluations) have shown meaningful results. Recently, pretrained language models have achieved very successful results in understanding the tasks associated with structured data as well as textual data. In this paper, we propose a novel student dropout prediction framework based on demographic and academic information, using a pretrained language model to capture the relationship between different forms of information. To this end, we first formulate both types of information in natural language form. We then recast the student dropout prediction task as a natural language inference (NLI) task. Finally, we fine-tune the pretrained language models to predict student dropout. In particular, we further enhance the model using a continuous hypothesis. The experimental results demonstrate that the proposed model is effective for the freshmen dropout prediction task. The proposed method exhibits significant improvements of as much as 9.00% in terms of F1-score compared with state-of-the-art techniques.
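The key step, verbalizing structured records into natural-language premise/hypothesis pairs for an NLI-style classifier, can be sketched as follows; the field names and the hypothesis wording are illustrative, not the paper's templates.

```python
# Illustrative verbalization of a student record into an NLI-style premise/hypothesis pair.
def to_nli_pair(record: dict) -> tuple[str, str]:
    premise = (
        f"The student is {record['gender']}, from {record['nationality']}, "
        f"graduated high school in {record['hs_grad_year']}, has a GPA of {record['gpa']}, "
        f"and participated in {record['num_activities']} activities."
    )
    hypothesis = "This student will drop out of university."
    return premise, hypothesis

premise, hypothesis = to_nli_pair({
    "gender": "female", "nationality": "Korea", "hs_grad_year": 2021,
    "gpa": 3.4, "num_activities": 2,
})
# The pair is then fed to a fine-tuned NLI model, which scores entailment vs. contradiction.
```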
8

Zhou, Shengchao, Gaofeng Meng, Zhaoxiang Zhang, Richard Yi Da Xu, and Shiming Xiang. "Robust Feature Rectification of Pretrained Vision Models for Object Recognition." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 3 (June 26, 2023): 3796–804. http://dx.doi.org/10.1609/aaai.v37i3.25492.

Abstract:
Pretrained vision models for object recognition often suffer a dramatic performance drop with degradations unseen during training. In this work, we propose a RObust FEature Rectification module (ROFER) to improve the performance of pretrained models against degradations. Specifically, ROFER first estimates the type and intensity of the degradation that corrupts the image features. Then, it leverages a Fully Convolutional Network (FCN) to rectify the features from the degradation by pulling them back to clear features. ROFER is a general-purpose module that can address various degradations simultaneously, including blur, noise, and low contrast. Besides, it can be plugged into pretrained models seamlessly to rectify the degraded features without retraining the whole model. Furthermore, ROFER can be easily extended to address composite degradations by adopting a beam search algorithm to find the composition order. Evaluations on CIFAR-10 and Tiny-ImageNet demonstrate that the accuracy of ROFER is 5% higher than that of SOTA methods on different degradations. With respect to composite degradations, ROFER improves the accuracy of a pretrained CNN by 10% and 6% on CIFAR-10 and Tiny-ImageNet respectively.
9

Elazar, Yanai, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Eduard Hovy, Hinrich Schütze, and Yoav Goldberg. "Measuring and Improving Consistency in Pretrained Language Models." Transactions of the Association for Computational Linguistics 9 (2021): 1012–31. http://dx.doi.org/10.1162/tacl_a_00410.

Abstract:
Abstract Consistency of a model—that is, the invariance of its behavior under meaning-preserving alternations in its input—is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create ParaRel🤘, a high-quality resource of cloze-style query English paraphrases. It contains a total of 328 paraphrases for 38 relations. Using ParaRel🤘, we show that the consistency of all PLMs we experiment with is poor— though with high variance between relations. Our analysis of the representational spaces of PLMs suggests that they have a poor structure and are currently not suitable for representing knowledge robustly. Finally, we propose a method for improving model consistency and experimentally demonstrate its effectiveness.1
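Consistency as studied here is essentially pairwise agreement of a model's predictions over paraphrases of the same query; a small helper that computes such a score might look like this (a sketch under the assumption that predictions are already extracted as strings).

```python
# Illustrative consistency score: the fraction of paraphrase pairs for which a model
# produces the same prediction for the same underlying fact.
from itertools import combinations

def consistency(predictions: list[str]) -> float:
    pairs = list(combinations(predictions, 2))
    if not pairs:
        return 1.0
    agree = sum(a == b for a, b in pairs)
    return agree / len(pairs)

# Example: predictions for four paraphrases of "X was born in [MASK]".
print(consistency(["Paris", "Paris", "Lyon", "Paris"]))  # 0.5
```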
10

Takeoka, Kunihiro. "Low-resource Taxonomy Enrichment with Pretrained Language Models." Journal of Natural Language Processing 29, no. 1 (2022): 259–63. http://dx.doi.org/10.5715/jnlp.29.259.

11

Si, Chenglei, Zhengyan Zhang, Yingfa Chen, Fanchao Qi, Xiaozhi Wang, Zhiyuan Liu, Yasheng Wang, Qun Liu, and Maosong Sun. "Sub-Character Tokenization for Chinese Pretrained Language Models." Transactions of the Association for Computational Linguistics 11 (May 18, 2023): 469–87. http://dx.doi.org/10.1162/tacl_a_00560.

Abstract:
Abstract Tokenization is fundamental to pretrained language models (PLMs). Existing tokenization methods for Chinese PLMs typically treat each character as an indivisible token. However, they ignore the unique feature of the Chinese writing system where additional linguistic information exists below the character level, i.e., at the sub-character level. To utilize such information, we propose sub-character (SubChar for short) tokenization. Specifically, we first encode the input text by converting each Chinese character into a short sequence based on its glyph or pronunciation, and then construct the vocabulary based on the encoded text with sub-word segmentation. Experimental results show that SubChar tokenizers have two main advantages over existing tokenizers: 1) They can tokenize inputs into much shorter sequences, thus improving the computational efficiency. 2) Pronunciation-based SubChar tokenizers can encode Chinese homophones into the same transliteration sequences and produce the same tokenization output, hence being robust to homophone typos. At the same time, models trained with SubChar tokenizers perform competitively on downstream tasks. We release our code and models at https://github.com/thunlp/SubCharTokenization to facilitate future work.
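A toy version of the pronunciation-based encoding step might look like the following; the character-to-pinyin mapping and the downstream subword step are placeholders, not the released SubChar implementation available at the linked repository.

```python
# Toy sketch of pronunciation-based sub-character encoding: each Chinese character is
# replaced by a transliteration string before ordinary subword segmentation is applied.
# The mapping below is a tiny illustrative stand-in for a full pinyin converter.
PINYIN = {"中": "zhong1", "国": "guo2", "人": "ren2"}

def encode_pronunciation(text: str, sep: str = "#") -> str:
    # Unknown characters are kept as-is; `sep` marks character boundaries so that
    # homophones map to identical encoded spans.
    return sep.join(PINYIN.get(ch, ch) for ch in text)

encoded = encode_pronunciation("中国人")   # "zhong1#guo2#ren2"
# A subword tokenizer (e.g., unigram or BPE) is then trained on such encoded text.
```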
12

Ren, Guanyu. "Monkeypox Disease Detection with Pretrained Deep Learning Models." Information Technology and Control 52, no. 2 (July 15, 2023): 288–96. http://dx.doi.org/10.5755/j01.itc.52.2.32803.

Abstract:
Monkeypox has been recognized as the next global pandemic threat after COVID-19, and its potential damage cannot be neglected. Computer vision-based diagnosis and detection methods with deep learning models proved effective during the COVID-19 period. However, with limited samples, deep learning models are difficult to train fully. In this paper, twelve CNN-based models, including VGG16, VGG19, ResNet152, DenseNet121, DenseNet201, EfficientNetB7, EfficientNetV2B3, EfficientNetV2M, and InceptionV3, are used for monkeypox detection with a limited set of skin images. Numerical results suggest that DenseNet201 achieves the best classification accuracy among the compared models: 98.89% for binary classification, 100% for four-class classification, and 99.94% for six-class classification.
13

Chen, Zhi, Yuncong Liu, Lu Chen, Su Zhu, Mengyue Wu, and Kai Yu. "OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue." Transactions of the Association for Computational Linguistics 11 (2023): 68–84. http://dx.doi.org/10.1162/tacl_a_00534.

Abstract:
Abstract This paper presents an ontology-aware pretrained language model (OPAL) for end-to-end task-oriented dialogue (TOD). Unlike chit-chat dialogue models, task-oriented dialogue models fulfill at least two task-specific modules: Dialogue state tracker (DST) and response generator (RG). The dialogue state consists of the domain-slot-value triples, which are regarded as the user’s constraints to search the domain-related databases. The large-scale task-oriented dialogue data with the annotated structured dialogue state usually are inaccessible. It prevents the development of the pretrained language model for the task-oriented dialogue. We propose a simple yet effective pretraining method to alleviate this problem, which consists of two pretraining phases. The first phase is to pretrain on large-scale contextual text data, where the structured information of the text is extracted by the information extracting tool. To bridge the gap between the pretraining method and downstream tasks, we design two pretraining tasks: ontology-like triple recovery and next-text generation, which simulates the DST and RG, respectively. The second phase is to fine-tune the pretrained model on the TOD data. The experimental results show that our proposed method achieves an exciting boost and obtains competitive performance even without any TOD data on CamRest676 and MultiWOZ benchmarks.
14

Choi, Yong-Seok, Yo-Han Park, and Kong Joo Lee. "Building a Korean morphological analyzer using two Korean BERT models." PeerJ Computer Science 8 (May 2, 2022): e968. http://dx.doi.org/10.7717/peerj-cs.968.

Abstract:
A morphological analyzer plays an essential role in identifying functional suffixes of Korean words. The analyzer input and output differ from each other in their length and strings, which can be dealt with by an encoder-decoder architecture. We adopt a Transformer architecture, which is an encoder-decoder architecture with self-attention rather than a recurrent connection, to implement a Korean morphological analyzer. Bidirectional Encoder Representations from Transformers (BERT) is one of the most popular pretrained representation models; it can present an encoded sequence of input words, considering contextual information. We initialize both the Transformer encoder and decoder with two types of Korean BERT, one of which is pretrained with a raw corpus, and the other is pretrained with a morphologically analyzed dataset. Therefore, implementing a Korean morphological analyzer based on Transformer is a fine-tuning process with a relatively small corpus. A series of experiments proved that parameter initialization using pretrained models can alleviate the chronic problem of a lack of training data and reduce the time required for training. In addition, we can determine the number of layers required for the encoder and decoder to optimize the performance of a Korean morphological analyzer.
15

Kim, Hyunil, Tae-Yeong Kwak, Hyeyoon Chang, Sun Woo Kim, and Injung Kim. "RCKD: Response-Based Cross-Task Knowledge Distillation for Pathological Image Analysis." Bioengineering 10, no. 11 (November 2, 2023): 1279. http://dx.doi.org/10.3390/bioengineering10111279.

Abstract:
We propose a novel transfer learning framework for pathological image analysis, the Response-based Cross-task Knowledge Distillation (RCKD), which improves the performance of the model by pretraining it on a large unlabeled dataset guided by a high-performance teacher model. RCKD first pretrains a student model to predict the nuclei segmentation results of the teacher model for unlabeled pathological images, and then fine-tunes the pretrained model for the downstream tasks, such as organ cancer sub-type classification and cancer region segmentation, using relatively small target datasets. Unlike conventional knowledge distillation, RCKD does not require that the target tasks of the teacher and student models be the same. Moreover, unlike conventional transfer learning, RCKD can transfer knowledge between models with different architectures. In addition, we propose a lightweight architecture, the Convolutional neural network with Spatial Attention by Transformers (CSAT), for processing high-resolution pathological images with limited memory and computation. CSAT exhibited a top-1 accuracy of 78.6% on ImageNet with only 3M parameters and 1.08 G multiply-accumulate (MAC) operations. When pretrained by RCKD, CSAT exhibited average classification and segmentation accuracies of 94.2% and 0.673 mIoU on six pathological image datasets, which is 4% and 0.043 mIoU higher than EfficientNet-B0, and 7.4% and 0.006 mIoU higher than ConvNextV2-Atto pretrained on ImageNet, respectively.
16

Ivgi, Maor, Uri Shaham, and Jonathan Berant. "Efficient Long-Text Understanding with Short-Text Models." Transactions of the Association for Computational Linguistics 11 (2023): 284–99. http://dx.doi.org/10.1162/tacl_a_00547.

Abstract:
Abstract Transformer-based pretrained language models (LMs) are ubiquitous across natural language understanding, but cannot be applied to long sequences such as stories, scientific articles, and long documents due to their quadratic complexity. While a myriad of efficient transformer variants have been proposed, they are typically based on custom implementations that require expensive pretraining from scratch. In this work, we propose SLED: SLiding-Encoder and Decoder, a simple approach for processing long sequences that re-uses and leverages battle-tested short-text pretrained LMs. Specifically, we partition the input into overlapping chunks, encode each with a short-text LM encoder and use the pretrained decoder to fuse information across chunks (fusion-in-decoder). We illustrate through controlled experiments that SLED offers a viable strategy for long text understanding and evaluate our approach on SCROLLS, a benchmark with seven datasets across a wide range of language understanding tasks. We find that SLED is competitive with specialized models that are up to 50x larger and require a dedicated and expensive pretraining step.
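The core preprocessing idea, partitioning a long input into overlapping chunks that a short-text encoder can handle, can be sketched as below; the chunk and stride sizes are illustrative, and the fusion-in-decoder step is only noted in a comment.

```python
# Illustrative overlapping-chunk partitioning in the spirit of SLED: each chunk fits a
# short-text encoder, and the decoder later attends over all encoded chunks (fusion-in-decoder).
def chunk_with_overlap(token_ids: list[int], chunk_size: int = 256, overlap: int = 64) -> list[list[int]]:
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(token_ids) - overlap, 1), stride):
        chunks.append(token_ids[start:start + chunk_size])
    return chunks

chunks = chunk_with_overlap(list(range(1000)))
# Each chunk is encoded independently by the short-text encoder; the pretrained decoder
# then fuses information across the concatenated chunk encodings.
```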
17

Almonacid-Olleros, Guillermo, Gabino Almonacid, David Gil, and Javier Medina-Quero. "Evaluation of Transfer Learning and Fine-Tuning to Nowcast Energy Generation of Photovoltaic Systems in Different Climates." Sustainability 14, no. 5 (March 7, 2022): 3092. http://dx.doi.org/10.3390/su14053092.

Abstract:
Emerging machine learning models are able to nowcast power generation, overtaking formulation-based standards. In this work, the capabilities of deep learning to predict energy generation over three different areas and deployments in the world are discussed. To this end, transfer learning from deep learning models to nowcast output power generation in photovoltaic systems is analyzed. First, data from three photovoltaic systems in different regions of Spain, Italy, and India are unified under a common segmentation stage. Next, pretrained and non-pretrained models are evaluated in the same and different regions to analyze the transfer of knowledge between different deployments and areas. The use of pretrained models provides encouraging results, which can be optimized with rearward learning of local data, providing more accurate models.
18

Lee, Eunchan, Changhyeon Lee, and Sangtae Ahn. "Comparative Study of Multiclass Text Classification in Research Proposals Using Pretrained Language Models." Applied Sciences 12, no. 9 (April 29, 2022): 4522. http://dx.doi.org/10.3390/app12094522.

Abstract:
Recently, transformer-based pretrained language models have demonstrated stellar performance in natural language understanding (NLU) tasks. For example, bidirectional encoder representations from transformers (BERT) have achieved outstanding performance through masked self-supervised pretraining and transformer-based modeling. However, the original BERT may only be effective for English-based NLU tasks, whereas its effectiveness for other languages such as Korean is limited. Thus, the applicability of BERT-based language models pretrained in languages other than English to NLU tasks based on those languages must be investigated. In this study, we comparatively evaluated seven BERT-based pretrained language models and their expected applicability to Korean NLU tasks. We used the climate technology dataset, which is a Korean-based large text classification dataset, in research proposals involving 45 classes. We found that the BERT-based model pretrained on the most recent Korean corpus performed the best in terms of Korean-based multiclass text classification. This suggests the necessity of optimal pretraining for specific NLU tasks, particularly those in languages other than English.
19

Mutreja, G., and K. Bittner. "EVALUATING CONVNET AND TRANSFORMER BASED SELF-SUPERVISED ALGORITHMS FOR BUILDING ROOF FORM CLASSIFICATION." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-1/W2-2023 (December 13, 2023): 315–21. http://dx.doi.org/10.5194/isprs-archives-xlviii-1-w2-2023-315-2023.

Abstract:
This research paper presents a comprehensive evaluation of various self-supervised learning models for building roof type classification. We conduct linear evaluation experiments for models pretrained on both the ImageNet1K dataset and a custom building roof type dataset to assess their performance on the roof type classification task. The results demonstrate the effectiveness of the ViT-based BEiTV2 model, which outperforms the other models on both datasets, achieving an accuracy of 96.8% when pretrained on the ImageNet1K dataset and 92.67% when pretrained on the building roof type dataset. The class activation maps further validate the strong performance of the MoCoV3, BarlowTwins, and DenseCL models. These findings emphasize the potential of self-supervised learning for accurate building roof type classification, with the ViT-based BEiTV2 model showcasing state-of-the-art results.
20

Malyala, Sohith Sai, Janardhan Reddy Guntaka, Sai Vignesh Chintala, Lohith Vattikuti, and SrinivasaRao Tummalapalli. "Exploring How AI Answering Models Understand and Respond in Context." International Journal for Research in Applied Science and Engineering Technology 11, no. 9 (September 30, 2023): 224–28. http://dx.doi.org/10.22214/ijraset.2023.55597.

Abstract:
Abstract: Question answering (QA) is an important capability for artificial intelligence systems to assist humans by providing relevant information. In recent years, large pretrained language models like BERT and GPT have shown promising results on QA tasks. This paper explores how two state-of-the-art models, BERT and GPT-4, understand questions and generate answers in conversational contexts. We first provide an overview of the architectures and pretrained objectives of both models. Then we conduct experiments on two QA datasets to evaluate each model's ability to reason about questions, leverage context and background knowledge, and provide natural and logically consistent responses. Quantitative results reveal the strengths and weaknesses of each model, with BERT demonstrating stronger reasoning abilities but GPT-4 generating more human-like responses. Through qualitative error analysis, we identify cases where each model fails and propose explanations grounded in their underlying architectures and pretraining approaches. This analysis provides insights into the current capabilities and limitations of large pretrained models for open-domain conversational QA. The results suggest directions for improving both types of models, including combining their complementary strengths, increasing reasoning ability, and incorporating more conversational context. This work highlights important considerations in developing AI systems that can intelligently understand and respond to natural language questions.
21

Demircioğlu, Aydin. "Deep Features from Pretrained Networks Do Not Outperform Hand-Crafted Features in Radiomics." Diagnostics 13, no. 20 (October 20, 2023): 3266. http://dx.doi.org/10.3390/diagnostics13203266.

Abstract:
In radiomics, utilizing features extracted from pretrained deep networks could result in models with a higher predictive performance than those relying on hand-crafted features. This study compared the predictive performance of models trained with either deep features, hand-crafted features, or a combination of these features in terms of the area under the receiver-operating characteristic curve (AUC) and other metrics. We trained models on ten radiological datasets using five feature selection methods and three classifiers. Our results indicate that models based on deep features did not show an improved AUC compared to those utilizing hand-crafted features (deep: AUC 0.775, hand-crafted: AUC 0.789; p = 0.28). Including morphological features alongside deep features led to overall improvements in prediction performance for all models (+0.02 gain in AUC; p < 0.001); however, the best model did not benefit from this (+0.003 gain in AUC; p = 0.57). Using all hand-crafted features in addition to the deep features resulted in a further overall improvement (+0.034 in AUC; p < 0.001), but only a minor improvement could be observed for the best model (deep: AUC 0.798, hand-crafted: AUC 0.789; p = 0.92). Furthermore, our results show that models based on deep features extracted from networks pretrained on medical data have no advantage in predictive performance over models relying on features extracted from networks pretrained on ImageNet data. Our study contributes a benchmarking analysis of models trained on hand-crafted and deep features from pretrained networks across multiple datasets. It also provides a comprehensive understanding of their applicability and limitations in radiomics. Our study shows, in conclusion, that models based on features extracted from pretrained deep networks do not outperform models trained on hand-crafted ones.
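The "deep features" compared in such studies are typically activations from a pretrained network's penultimate layer. A minimal extraction sketch follows, assuming a torchvision ResNet-18 as a stand-in for the networks actually benchmarked.

```python
# Illustrative deep-feature extraction: take the pooled penultimate-layer activations
# of an ImageNet-pretrained CNN as a fixed feature vector for a downstream classifier.
import torch
from torchvision import models

cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
feature_extractor = torch.nn.Sequential(*list(cnn.children())[:-1])   # drop the final fc layer
feature_extractor.eval()

with torch.no_grad():
    image_batch = torch.randn(4, 3, 224, 224)                    # placeholder preprocessed images
    deep_features = feature_extractor(image_batch).flatten(1)    # shape: (4, 512)
# These vectors are then fed to conventional feature selection and classifiers,
# just like hand-crafted radiomics features.
```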
22

Kotei, Evans, and Ramkumar Thirunavukarasu. "A Systematic Review of Transformer-Based Pre-Trained Language Models through Self-Supervised Learning." Information 14, no. 3 (March 16, 2023): 187. http://dx.doi.org/10.3390/info14030187.

Abstract:
Transfer learning is a technique utilized in deep learning applications to transmit learned inference to a different target domain. The approach is mainly to solve the problem of a few training datasets resulting in model overfitting, which affects model performance. The study was carried out on publications retrieved from various digital libraries such as SCOPUS, ScienceDirect, IEEE Xplore, ACM Digital Library, and Google Scholar, which formed the Primary studies. Secondary studies were retrieved from Primary articles using the backward and forward snowballing approach. Based on set inclusion and exclusion parameters, relevant publications were selected for review. The study focused on transfer learning pretrained NLP models based on the deep transformer network. BERT and GPT were the two elite pretrained models trained to classify global and local representations based on larger unlabeled text datasets through self-supervised learning. Pretrained transformer models offer numerous advantages to natural language processing models, such as knowledge transfer to downstream tasks that deal with drawbacks associated with training a model from scratch. This review gives a comprehensive view of transformer architecture, self-supervised learning and pretraining concepts in language models, and their adaptation to downstream tasks. Finally, we present future directions to further improvement in pretrained transformer-based language models.
23

Jackson, Richard G., Erik Jansson, Aron Lagerberg, Elliot Ford, Vladimir Poroshin, Timothy Scrivener, Mats Axelsson, Martin Johansson, Lesly Arun Franco, and Eliseo Papa. "Ablations over transformer models for biomedical relationship extraction." F1000Research 9 (July 16, 2020): 710. http://dx.doi.org/10.12688/f1000research.24552.1.

Abstract:
Background: Masked language modelling approaches have enjoyed success in improving benchmark performance across many general and biomedical domain natural language processing tasks, including biomedical relationship extraction (RE). However, the recent surge in both the number of novel architectures and the volume of training data they utilise may lead us to question whether domain specific pretrained models are necessary. Additionally, recent work has proposed novel classification heads for RE tasks, further improving performance. Here, we perform ablations over several pretrained models and classification heads to try to untangle the perceived benefits of each. Methods: We use a range of string preprocessing strategies, combined with Bidirectional Encoder Representations from Transformers (BERT), BioBERT and RoBERTa architectures to perform ablations over three RE datasets pertaining to drug-drug and chemical protein interactions, and general domain relationship extraction. We explore the use of the RBERT classification head, compared to a simple linear classification layer across all architectures and datasets. Results: We observe a moderate performance benefit in using the BioBERT pretrained model over the BERT base cased model, although there appears to be little difference when comparing BioBERT to RoBERTa large. In addition, we observe a substantial benefit of using the RBERT head on the general domain RE dataset, but this is not consistently reflected in the biomedical RE datasets. Finally, we discover that randomising the token order of training data does not result in catastrophic performance degradation in our selected tasks. Conclusions: We find a recent general domain pretrained model performs approximately the same as a biomedical specific one, suggesting that domain specific models may be of limited use given the tendency of recent model pretraining regimes to incorporate ever broader sets of data. In addition, we suggest that care must be taken in RE model training, to prevent fitting to non-syntactic features of datasets.
24

Sahel, S., M. Alsahafi, M. Alghamdi, and T. Alsubait. "Logo Detection Using Deep Learning with Pretrained CNN Models." Engineering, Technology & Applied Science Research 11, no. 1 (February 6, 2021): 6724–29. http://dx.doi.org/10.48084/etasr.3919.

Abstract:
Logo detection in images and videos is considered a key task for various applications, such as vehicle logo detection for traffic-monitoring systems, copyright infringement detection, and contextual content placement. The main contribution of this work is the application of emerging deep learning techniques to perform brand and logo recognition tasks through the use of multiple modern convolutional neural network models. In this work, pre-trained object detection models are utilized to enhance the performance of logo detection when only a portion of labeled training images taken in a realistic context is available, avoiding extensive manual annotation costs. Superior logo detection results were obtained. This study used the FlickrLogos-32 dataset, a common public dataset for logo detection and brand recognition from real-world product images. Model evaluation considered both the efficiency of building the model and its accuracy.
25

Jiang, Shengyi, Sihui Fu, Nankai Lin, and Yingwen Fu. "Pretrained models and evaluation data for the Khmer language." Tsinghua Science and Technology 27, no. 4 (August 2022): 709–18. http://dx.doi.org/10.26599/tst.2021.9010060.

26

Zeng, Zhiyuan, and Deyi Xiong. "Unsupervised and few-shot parsing from pretrained language models." Artificial Intelligence 305 (April 2022): 103665. http://dx.doi.org/10.1016/j.artint.2022.103665.

27

Saravagi, Deepika, Shweta Agrawal, Manisha Saravagi, Jyotir Moy Chatterjee, and Mohit Agarwal. "Diagnosis of Lumbar Spondylolisthesis Using Optimized Pretrained CNN Models." Computational Intelligence and Neuroscience 2022 (April 13, 2022): 1–12. http://dx.doi.org/10.1155/2022/7459260.

Abstract:
Spondylolisthesis refers to the slippage of one vertebral body over the adjacent one. It is a chronic condition that requires early detection to prevent unpleasant surgery. The paper presents an optimized deep learning model for detecting spondylolisthesis in X-ray radiographs. The dataset contains a total of 299 X-ray radiographs, of which 156 images show a spine with spondylolisthesis and 143 images show a normal spine. An image augmentation technique is used to increase the number of data samples. In this study, VGG16 and InceptionV3 models were used for the image classification task. The developed model is optimized using the TFLite model optimization technique. The experimental results show that the VGG16 model achieved a 98% accuracy rate, which is higher than InceptionV3’s 96% accuracy rate. The size of the implemented model is reduced by up to four times so that it can be used on small devices. The compressed VGG16 and InceptionV3 models achieved 100% and 96% accuracy rates, respectively. Our findings show that the implemented models outperformed the model suggested by Varcin et al. (which had a maximum accuracy rate of 93%) in the diagnosis of lumbar spondylolisthesis. The developed quantized model also achieved a higher accuracy rate than Zebin and Rezvy’s (VGG16 + TFLite) model, which reached 90% accuracy. Furthermore, by evaluating the model’s performance on other publicly available datasets, we have generalized our approach.
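The TFLite optimization step mentioned above generally follows a pattern like the one below; this is a generic sketch with default post-training quantization, and the helper function, file name, and defaults are illustrative rather than the authors' exact configuration.

```python
# Generic post-training optimization of a Keras model with TensorFlow Lite:
# the converter applies default optimizations (e.g., quantization), shrinking model size
# so the classifier can run on small devices.
import tensorflow as tf

def compress_to_tflite(keras_model: tf.keras.Model, out_path: str = "model.tflite") -> None:
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enables post-training quantization
    tflite_bytes = converter.convert()
    with open(out_path, "wb") as f:
        f.write(tflite_bytes)

# Usage (assuming `model` is a trained VGG16- or InceptionV3-based Keras classifier):
# compress_to_tflite(model)
```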
28

Elazar, Yanai, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Eduard Hovy, Hinrich Schütze, and Yoav Goldberg. "Erratum: Measuring and Improving Consistency in Pretrained Language Models." Transactions of the Association for Computational Linguistics 9 (2021): 1407. http://dx.doi.org/10.1162/tacl_x_00455.

Abstract:
During production of this paper, an error was introduced to the formula at the bottom of the right column of page 1020. In the last two terms of the formula, the n and m subscripts were swapped. The correct formula is:

$$\mathcal{L}_c = \sum_{n=1}^{k}\sum_{m=n+1}^{k} D_{KL}\!\left(Q_n^{r_i} \,\|\, Q_m^{r_i}\right) + D_{KL}\!\left(Q_m^{r_i} \,\|\, Q_n^{r_i}\right)$$

The paper has been updated.
29

Al-Sarem, Mohammed, Mohammed Al-Asali, Ahmed Yaseen Alqutaibi, and Faisal Saeed. "Enhanced Tooth Region Detection Using Pretrained Deep Learning Models." International Journal of Environmental Research and Public Health 19, no. 22 (November 21, 2022): 15414. http://dx.doi.org/10.3390/ijerph192215414.

Abstract:
The rapid development of artificial intelligence (AI) has led to the emergence of many new technologies in the healthcare industry. In dentistry, the patient’s panoramic radiographic or cone beam computed tomography (CBCT) images are used for implant placement planning to find the correct implant position and eliminate surgical risks. This study aims to develop a deep learning-based model that detects missing teeth’s position on a dataset segmented from CBCT images. Five hundred CBCT images were included in this study. After preprocessing, the datasets were randomized and divided into 70% training, 20% validation, and 10% test data. A total of six pretrained convolutional neural network (CNN) models were used in this study, which includes AlexNet, VGG16, VGG19, ResNet50, DenseNet169, and MobileNetV3. In addition, the proposed models were tested with/without applying the segmentation technique. Regarding the normal teeth class, the performance of the proposed pretrained DL models in terms of precision was above 0.90. Moreover, the experimental results showed the superiority of DenseNet169 with a precision of 0.98. In addition, other models such as MobileNetV3, VGG19, ResNet50, VGG16, and AlexNet obtained a precision of 0.95, 0.94, 0.94, 0.93, and 0.92, respectively. The DenseNet169 model performed well at the different stages of CBCT-based detection and classification with a segmentation accuracy of 93.3% and classification of missing tooth regions with an accuracy of 89%. As a result, the use of this model may represent a promising time-saving tool serving dental implantologists with a significant step toward automated dental implant planning.
30

Xu, Canwen, and Julian McAuley. "A Survey on Model Compression and Acceleration for Pretrained Language Models." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (June 26, 2023): 10566–75. http://dx.doi.org/10.1609/aaai.v37i9.26255.

Abstract:
Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and long inference delay prevent Transformer-based pretrained language models (PLMs) from seeing broader adoption including for edge and mobile computing. Efficient NLP research aims to comprehensively consider computation, time and carbon emission for the entire life-cycle of NLP, including data preparation, model training and inference. In this survey, we focus on the inference stage and review the current state of model compression and acceleration for pretrained language models, including benchmarks, metrics and methodology.
31

Lee, Chanhee, Kisu Yang, Taesun Whang, Chanjun Park, Andrew Matteson, and Heuiseok Lim. "Exploring the Data Efficiency of Cross-Lingual Post-Training in Pretrained Language Models." Applied Sciences 11, no. 5 (February 24, 2021): 1974. http://dx.doi.org/10.3390/app11051974.

Abstract:
Language model pretraining is an effective method for improving the performance of downstream natural language processing tasks. Even though language modeling is unsupervised and thus collecting data for it is relatively less expensive, it is still a challenging process for languages with limited resources. This results in great technological disparity between high- and low-resource languages for numerous downstream natural language processing tasks. In this paper, we aim to make this technology more accessible by enabling data efficient training of pretrained language models. It is achieved by formulating language modeling of low-resource languages as a domain adaptation task using transformer-based language models pretrained on corpora of high-resource languages. Our novel cross-lingual post-training approach selectively reuses parameters of the language model trained on a high-resource language and post-trains them while learning language-specific parameters in the low-resource language. We also propose implicit translation layers that can learn linguistic differences between languages at a sequence level. To evaluate our method, we post-train a RoBERTa model pretrained in English and conduct a case study for the Korean language. Quantitative results from intrinsic and extrinsic evaluations show that our method outperforms several massively multilingual and monolingual pretrained language models in most settings and improves the data efficiency by a factor of up to 32 compared to monolingual training.
32

Zhang, Wenbo, Xiao Li, Yating Yang, Rui Dong, and Gongxu Luo. "Keeping Models Consistent between Pretraining and Translation for Low-Resource Neural Machine Translation." Future Internet 12, no. 12 (November 27, 2020): 215. http://dx.doi.org/10.3390/fi12120215.

Abstract:
Recently, the pretraining of models has been successfully applied to unsupervised and semi-supervised neural machine translation. A cross-lingual language model uses a pretrained masked language model to initialize the encoder and decoder of the translation model, which greatly improves the translation quality. However, because of a mismatch in the number of layers, the pretrained model can only initialize part of the decoder’s parameters. In this paper, we use a layer-wise coordination transformer and a consistent pretraining translation transformer instead of a vanilla transformer as the translation model. The former has only an encoder, and the latter has an encoder and a decoder, but the encoder and decoder have exactly the same parameters. Both models can guarantee that all parameters in the translation model can be initialized by the pretrained model. Experiments on the Chinese–English and English–German datasets show that compared with the vanilla transformer baseline, our models achieve better performance with fewer parameters when the parallel corpus is small.
33

Lobo, Fernando, Maily Selena González, Alicia Boto, and José Manuel Pérez de la Lastra. "Prediction of Antifungal Activity of Antimicrobial Peptides by Transfer Learning from Protein Pretrained Models." International Journal of Molecular Sciences 24, no. 12 (June 17, 2023): 10270. http://dx.doi.org/10.3390/ijms241210270.

Abstract:
Peptides with antifungal activity have gained significant attention due to their potential therapeutic applications. In this study, we explore the use of pretrained protein models as feature extractors to develop predictive models for antifungal peptide activity. Various machine learning classifiers were trained and evaluated. Our AFP predictor achieved comparable performance to current state-of-the-art methods. Overall, our study demonstrates the effectiveness of pretrained models for peptide analysis and provides a valuable tool for predicting antifungal peptide activity and potentially other peptide properties.
34

Zhang, Tianyu, Jake Gu, Omid Ardakanian, and Joyce Kim. "Addressing data inadequacy challenges in personal comfort models by combining pretrained comfort models." Energy and Buildings 264 (June 2022): 112068. http://dx.doi.org/10.1016/j.enbuild.2022.112068.

35

Yang, Xi, Jiang Bian, William R. Hogan, and Yonghui Wu. "Clinical concept extraction using transformers." Journal of the American Medical Informatics Association 27, no. 12 (October 29, 2020): 1935–42. http://dx.doi.org/10.1093/jamia/ocaa189.

Abstract:
Objective: The goal of this study is to explore transformer-based models (eg, Bidirectional Encoder Representations from Transformers [BERT]) for clinical concept extraction and develop an open-source package with pretrained clinical models to facilitate concept extraction and other downstream natural language processing (NLP) tasks in the medical domain. Methods: We systematically explored 4 widely used transformer-based architectures, including BERT, RoBERTa, ALBERT, and ELECTRA, for extracting various types of clinical concepts using 3 public datasets from the 2010 and 2012 i2b2 challenges and the 2018 n2c2 challenge. We examined general transformer models pretrained using general English corpora as well as clinical transformer models pretrained using a clinical corpus and compared them with a long short-term memory conditional random fields (LSTM-CRFs) model as a baseline. Furthermore, we integrated the 4 clinical transformer-based models into an open-source package. Results and Conclusion: The RoBERTa-MIMIC model achieved state-of-the-art performance on 3 public clinical concept extraction datasets with F1-scores of 0.8994, 0.8053, and 0.8907, respectively. Compared to the baseline LSTM-CRFs model, RoBERTa-MIMIC remarkably improved the F1-score by approximately 4% and 6% on the 2010 and 2012 i2b2 datasets. This study demonstrated the efficiency of transformer-based models for clinical concept extraction. Our methods and systems can be applied to other clinical tasks. The clinical transformer package with 4 pretrained clinical models is publicly available at https://github.com/uf-hobi-informatics-lab/ClinicalTransformerNER. We believe this package will improve current practice on clinical concept extraction and other tasks in the medical domain.
36

De Coster, Mathieu, and Joni Dambre. "Leveraging Frozen Pretrained Written Language Models for Neural Sign Language Translation." Information 13, no. 5 (April 23, 2022): 220. http://dx.doi.org/10.3390/info13050220.

Abstract:
We consider neural sign language translation: machine translation from signed to written languages using encoder–decoder neural networks. Translating sign language videos to written language text is especially complex because of the difference in modality between source and target language and, consequently, the required video processing. At the same time, sign languages are low-resource languages, their datasets dwarfed by those available for written languages. Recent advances in written language processing and success stories of transfer learning raise the question of how pretrained written language models can be leveraged to improve sign language translation. We apply the Frozen Pretrained Transformer (FPT) technique to initialize the encoder, decoder, or both, of a sign language translation model with parts of a pretrained written language model. We observe that the attention patterns transfer in zero-shot to the different modality and, in some experiments, we obtain higher scores (from 18.85 to 21.39 BLEU-4). Especially when gloss annotations are unavailable, FPTs can increase performance on unseen data. However, current models appear to be limited primarily by data quality and only then by data quantity, limiting potential gains with FPTs. Therefore, in further research, we will focus on improving the representations used as inputs to translation models.
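The Frozen Pretrained Transformer recipe referenced above roughly amounts to freezing most of the pretrained language model while leaving a small set of parameters (typically layer norms and any newly added input/output layers) trainable. A hedged PyTorch sketch of the freezing step follows; which parameter names count as trainable is an assumption here, not the paper's exact list.

```python
# Illustrative FPT-style freezing: keep the pretrained transformer weights fixed and
# fine-tune only layer-normalization parameters (plus any newly added input/output layers).
import torch.nn as nn

def apply_fpt_freezing(pretrained_lm: nn.Module) -> None:
    for name, param in pretrained_lm.named_parameters():
        # Heuristic: treat layer-norm weights and biases as the trainable subset.
        param.requires_grad = ("layernorm" in name.lower()) or ("layer_norm" in name.lower())

# After freezing, the sign language video encoder feeding this model is trained normally,
# while the written-language model itself stays mostly frozen.
```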
37

AlOyaynaa, Sarah, and Yasser Kotb. "Arabic Grammatical Error Detection Using Transformers-based Pretrained Language Models." ITM Web of Conferences 56 (2023): 04009. http://dx.doi.org/10.1051/itmconf/20235604009.

Abstract:
This paper presents a new study on using transformer-based pre-trained language models for Arabic grammatical error detection (GED). We propose fine-tuned language models based on the pre-trained models AraBERT and M-BERT to perform Arabic GED at two levels: the token level and the sentence level. Fine-tuning was done with different publicly available Arabic datasets. The proposed models outperform similar studies, with an F1 value of 0.87, recall of 0.90, and precision of 0.83 at the token level, and an F1 of 0.98, recall of 0.99, and precision of 0.97 at the sentence level, whereas other studies in the same field report lower results (e.g., an F0.5 of 69.21). Moreover, the current study shows that fine-tuned language models built on monolingual pre-trained language models perform better than those built on multilingual pre-trained language models for Arabic.
38

Kalyan, Katikapalli Subramanyam, Ajit Rajasekharan, and Sivanesan Sangeetha. "AMMU: A survey of transformer-based biomedical pretrained language models." Journal of Biomedical Informatics 126 (February 2022): 103982. http://dx.doi.org/10.1016/j.jbi.2021.103982.

39

Silver, Tom, Soham Dan, Kavitha Srinivas, Joshua B. Tenenbaum, Leslie Kaelbling, and Michael Katz. "Generalized Planning in PDDL Domains with Pretrained Large Language Models." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 18 (March 24, 2024): 20256–64. http://dx.doi.org/10.1609/aaai.v38i18.30006.

Abstract:
Recent work has considered whether large language models (LLMs) can function as planners: given a task, generate a plan. We investigate whether LLMs can serve as generalized planners: given a domain and training tasks, generate a program that efficiently produces plans for other tasks in the domain. In particular, we consider PDDL domains and use GPT-4 to synthesize Python programs. We also consider (1) Chain-of-Thought (CoT) summarization, where the LLM is prompted to summarize the domain and propose a strategy in words before synthesizing the program; and (2) automated debugging, where the program is validated with respect to the training tasks, and in case of errors, the LLM is re-prompted with four types of feedback. We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines. Overall, we find that GPT-4 is a surprisingly powerful generalized planner. We also conclude that automated debugging is very important, that CoT summarization has non-uniform impact, that GPT-4 is far superior to GPT-3.5, and that just two training tasks are often sufficient for strong generalization.
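The automated debugging loop described above can be summarized in pseudocode-style Python; `llm`, `validate_on_tasks`, and the feedback format are hypothetical placeholders standing in for the GPT-4 calls and PDDL plan validation used in the paper, so only the control flow is meant to be illustrative.

```python
# Sketch of a generate-validate-repair loop for LLM-synthesized generalized planners.
def synthesize_planner(llm, domain_text, train_tasks, max_rounds=4):
    prompt = f"Summarize this PDDL domain, propose a strategy, then write a Python planner.\n{domain_text}"
    program = llm(prompt)                                  # CoT summarization + initial program
    for _ in range(max_rounds):
        feedback = validate_on_tasks(program, train_tasks) # e.g., crash, wrong plan, timeout, or "success"
        if feedback == "success":
            return program
        # Re-prompt with the error feedback so the LLM can repair its own program.
        program = llm(f"{prompt}\n\nYour previous program failed:\n{feedback}\nPlease fix it.")
    return program

def validate_on_tasks(program, tasks):
    raise NotImplementedError  # placeholder: run the program on each task and validate the resulting plans
```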
40

Ahmad, Muhammad Shahrul Zaim, Nor Azlina Ab. Aziz, and Anith Khairunnisa Ghazali. "Development of Automated Attendance System Using Pretrained Deep Learning Models." 6, no. 1 (April 30, 2024): 6–12. http://dx.doi.org/10.33093/ijoras.2024.6.1.2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Smart classrooms enable a better learning experience for students and aid efficient campus management. Many studies have shown a positive correlation between attendance and student performance: the higher the attendance, the better the performance. Therefore, many higher-learning institutions make class attendance compulsory and record students' attendance. A technological solution for an advanced attendance system, such as face recognition, is highly desirable, as it ensures the authenticity of recorded attendance. In this work, an artificial intelligence-based face recognition system is used for attendance recording: the recognized face confirms a student's presence in class. Six pretrained face recognition models are evaluated for adoption in the developed system. FaceNet is adopted, achieving an accuracy of more than 95%. The automation system is supported by IoT.
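The matching step of such a system can be sketched as below; the embedding function stands in for a pretrained model such as FaceNet, and the similarity threshold is an assumption.

```python
# Sketch of embedding-based attendance marking: compare the embedding of a
# captured face against enrolled student embeddings and record the closest
# match if it is confident enough. `embed_face` is a placeholder for a
# pretrained face-embedding model; the threshold value is an assumption.

import numpy as np

def embed_face(image: np.ndarray) -> np.ndarray:
    """Placeholder for a pretrained face-embedding model (e.g. FaceNet)."""
    raise NotImplementedError

def mark_attendance(image, enrolled: dict[str, np.ndarray], threshold=0.7):
    query = embed_face(image)
    query = query / np.linalg.norm(query)
    best_id, best_sim = None, -1.0
    for student_id, ref in enrolled.items():
        sim = float(np.dot(query, ref / np.linalg.norm(ref)))  # cosine similarity
        if sim > best_sim:
            best_id, best_sim = student_id, sim
    if best_sim >= threshold:
        return best_id            # record this student as present
    return None                   # unknown face: do not mark attendance
```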
41

Yulianto, Rudy, Faqihudin, Meika Syahbana Rusli, Adhitio Satyo Bayangkari Karno, Widi Hastomo, Aqwam Rosadi Kardian, Vany Terisia, and Tri Surawan. "Innovative UNET-Based Steel Defect Detection Using 5 Pretrained Models." Evergreen 10, no. 4 (December 2023): 2365–78. http://dx.doi.org/10.5109/7160923.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Yin, Yi, Weiming Zhang, Nenghai Yu, and Kejiang Chen. "Steganalysis of neural networks based on parameter statistical bias." Journal of University of Science and Technology of China 52, no. 1 (2022): 1. http://dx.doi.org/10.52396/justc-2021-0197.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Many pretrained deep learning models have been released to help engineers and researchers develop deep learning-based systems or conduct research with minimal effort. Previous work has shown that a secret message can be embedded in neural network parameters without compromising the accuracy of the model. Malicious developers can therefore hide malware or other harmful information in pretrained models, causing harm to society. Hence, reliable detection of these malicious pretrained models is urgently needed. We analyze existing approaches for hiding messages and find that they inevitably cause biases in the parameter statistics. Therefore, we propose steganalysis methods for steganography on neural network parameters that extract statistics from benign and malicious models and build classifiers based on the extracted statistics. To the best of our knowledge, this is the first study on neural network steganalysis. The experimental results reveal that our proposed algorithm can effectively detect a model with an embedded message. Notably, our detection methods remain valid even when the payload of the stego model is low.
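A simplified version of this kind of detector might look as follows; the specific statistics and classifier are illustrative choices, not the paper's exact feature set.

```python
# Sketch of steganalysis based on parameter statistics: summarize each model's
# weights with a few global statistics and train a classifier on models known
# to be benign or to carry an embedded payload. Feature set and classifier are
# illustrative.

import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier

def weight_statistics(weights: dict) -> np.ndarray:
    """weights: mapping of parameter names to numpy arrays for one model."""
    w = np.concatenate([np.asarray(v).ravel() for v in weights.values()])
    return np.array([w.mean(), w.std(), stats.skew(w), stats.kurtosis(w)])

def train_detector(benign_models, stego_models):
    # benign_models / stego_models: lists of weight dicts collected offline.
    X = np.stack([weight_statistics(m) for m in benign_models + stego_models])
    y = np.array([0] * len(benign_models) + [1] * len(stego_models))
    return RandomForestClassifier(n_estimators=200).fit(X, y)
```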
43

AlZahrani, Fetoun Mansour, and Maha Al-Yahya. "A Transformer-Based Approach to Authorship Attribution in Classical Arabic Texts." Applied Sciences 13, no. 12 (June 18, 2023): 7255. http://dx.doi.org/10.3390/app13127255.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Authorship attribution (AA) is a field of natural language processing that aims to attribute text to its author. Although the literature includes several studies on Arabic AA in general, applying AA to classical Arabic texts has not gained similar attention. This study focuses on investigating recent Arabic pretrained transformer-based models in a rarely studied domain with limited research contributions: the domain of Islamic law. We adopt an experimental approach to investigate AA. Because no dataset has been designed specifically for this task, we design and build our own dataset using Islamic law digital resources. We conduct several experiments on fine-tuning four Arabic pretrained transformer-based models: AraBERT, AraELECTRA, ARBERT, and MARBERT. Results of the experiments indicate that for the task of attributing a given text to its author, ARBERT and AraELECTRA outperform the other models with an accuracy of 96%. We conclude that pretrained transformer models, specifically ARBERT and AraELECTRA, fine-tuned using the Islamic legal dataset, show significant results in applying AA to Islamic legal texts.
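A hedged sketch of authorship attribution as multi-class sequence classification with one of the studied checkpoints is shown below; the number of authors, training arguments, and dataset objects are placeholders, not the paper's configuration.

```python
# Sketch of authorship attribution as multi-class sequence classification with
# a pretrained Arabic encoder. Checkpoint, class count, hyperparameters, and
# datasets are placeholders for illustration.

from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

NUM_AUTHORS = 10                                   # assumed number of authors
checkpoint = "UBC-NLP/ARBERT"                      # one of the studied models

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=NUM_AUTHORS)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

# train_dataset / eval_dataset: datasets with "text" and "label" columns,
# prepared elsewhere and mapped through `tokenize`.
args = TrainingArguments(output_dir="aa-arbert", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```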
44

Albashish, Dheeb. "Ensemble of adapted convolutional neural networks (CNN) methods for classifying colon histopathological images." PeerJ Computer Science 8 (July 5, 2022): e1031. http://dx.doi.org/10.7717/peerj-cs.1031.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Deep convolutional neural networks (CNNs) show potential for computer-aided diagnosis systems (CADs) by learning features directly from images rather than relying on traditional feature extraction methods. Nevertheless, due to limited sample sizes and heterogeneity in tumor presentation in medical images, CNN models suffer from training issues: training from scratch leads to overfitting. Alternatively, transfer learning (TL) from a neural network pretrained for non-medical applications is used to derive tumor knowledge from medical image datasets, alleviating the need for large datasets. This study proposes two ensemble learning techniques, E-CNN (product rule) and E-CNN (majority voting), based on adapting pretrained CNN models to classify colon cancer histopathology images into various classes. The individual members of these ensembles are constructed by adapting pretrained DenseNet121, MobileNetV2, InceptionV3, and VGG16 models using a block-wise fine-tuning policy, in which a set of dense and dropout layers is appended to the pretrained models to capture the variation in the histology images. The models' decisions are then fused via product-rule and majority-voting aggregation. The proposed approach was validated against the standard pretrained models and the most recent works on two publicly available benchmark colon histopathological image datasets, Stoean (357 images) and Kather colorectal histology (5,000 images), achieving accuracies of 97.20% and 91.28%, respectively. These results outperform the state-of-the-art studies and confirm that the proposed E-CNNs could be extended to various medical image applications.
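The two fusion rules can be sketched as follows, assuming per-model softmax outputs are already available; this is an illustration of the aggregation step only, not the authors' full pipeline.

```python
# Sketch of the two fusion rules: the product rule multiplies the class
# probabilities produced by each adapted CNN, while majority voting counts
# each model's argmax prediction. Inputs are assumed to be per-model softmax
# outputs of shape (n_models, n_samples, n_classes).

import numpy as np

def product_rule(probs: np.ndarray) -> np.ndarray:
    fused = np.prod(probs, axis=0)                # multiply model probabilities
    return fused.argmax(axis=-1)                  # predicted class per sample

def majority_vote(probs: np.ndarray) -> np.ndarray:
    votes = probs.argmax(axis=-1)                 # (n_models, n_samples)
    n_classes = probs.shape[-1]
    counts = np.apply_along_axis(
        lambda v: np.bincount(v, minlength=n_classes), 0, votes)
    return counts.argmax(axis=0)                  # class with the most votes

# Example: 4 models, 2 samples, 3 classes
probs = np.random.dirichlet(np.ones(3), size=(4, 2))
print(product_rule(probs), majority_vote(probs))
```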
45

Pan, Yu, Ye Yuan, Yichun Yin, Jiaxin Shi, Zenglin Xu, Ming Zhang, Lifeng Shang, Xin Jiang, and Qun Liu. "Preparing Lessons for Progressive Training on Language Models." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (March 24, 2024): 18860–68. http://dx.doi.org/10.1609/aaai.v38i17.29851.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The rapid progress of Transformers in artificial intelligence has come at the cost of increased resource consumption and greenhouse gas emissions due to growing model sizes. Prior work suggests using pretrained small models to improve training efficiency, but this approach may not be suitable for new model structures. On the other hand, training from scratch can be slow, and progressively stacking layers often fails to achieve significant acceleration. To address these challenges, we propose a novel method called Apollo, which prepares lessons for expanding operations by learning high-layer functionality during training of low layers. Our approach involves low-value-prioritized sampling (LVPS) to train different depths and weight sharing to facilitate efficient expansion. We also introduce an interpolation method for stable model depth extension. Experiments demonstrate that Apollo achieves state-of-the-art acceleration ratios, even rivaling methods using pretrained models, making it a universal and efficient solution for training deep models while reducing time, financial, and environmental costs.
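As a generic illustration of progressive depth expansion (not Apollo's actual LVPS or interpolation scheme), the sketch below grows a Transformer stack by warm-starting new layers from already-trained ones via weight copying.

```python
# Generic sketch of progressive depth expansion: grow a stack of Transformer
# layers by interleaving copies of trained layers, so training can resume from
# a sensible starting point. This illustrates the general idea only; Apollo's
# low-value-prioritized sampling and interpolation method are more involved.

import copy
import torch.nn as nn

def expand_depth(layers: nn.ModuleList) -> nn.ModuleList:
    """Double the depth by interleaving copies of each trained layer."""
    expanded = []
    for layer in layers:
        expanded.append(layer)                 # keep the trained layer
        expanded.append(copy.deepcopy(layer))  # new layer warm-started from it
    return nn.ModuleList(expanded)

shallow = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
    for _ in range(6))
deep = expand_depth(shallow)                   # 12 layers, warm-started
print(len(shallow), len(deep))                 # 6 12
```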
46

Anupriya, Anupriya. "Fine-tuning Pretrained Transformers for Sentiment Analysis on Twitter Data." Mathematical Statistician and Engineering Applications 70, no. 2 (February 26, 2021): 1344–52. http://dx.doi.org/10.17762/msea.v70i2.2326.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Sentiment analysis on Twitter is difficult because of the noise and informal language present in the data. In recent years, a number of transformer models have been developed that perform well on this type of task, and this study analyzes their performance on Twitter data. It uses a publicly available dataset of tweets labeled with neutral, positive, or negative sentiment, preprocesses the data, and tokenizes it using WordPiece. Three transformer models, initialized with weights from large-scale language model pretraining, are then fine-tuned on the labeled tweets over five training phases. Performance was evaluated using metrics such as accuracy, recall, and F1 score. The results show that the models performed well overall, with ELECTRA leading at 85.8% accuracy, followed by BERT and XLNet at 84.5% and 84.3%, respectively. The study also examined the impact of hyperparameters and found that batch size and learning rate have a significant effect on performance, with larger batch sizes and lower learning rates yielding better results. The study concludes that the three pretrained transformer models, XLNet, ELECTRA, and BERT, perform well on sentiment analysis of Twitter data. These findings can benefit practitioners working on sentiment analysis for social media platforms.
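The kind of hyperparameter comparison the study reports can be sketched as a simple grid over batch size and learning rate; the checkpoint, grid values, and dataset objects below are placeholders, not the study's exact setup.

```python
# Sketch of a batch-size / learning-rate grid for fine-tuning one pretrained
# checkpoint on three-class tweet sentiment. All names and values below are
# illustrative placeholders.

from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments)

checkpoint = "google/electra-base-discriminator"   # assumed ELECTRA checkpoint
grid = [(bs, lr) for bs in (16, 32, 64) for lr in (5e-5, 3e-5, 2e-5)]

def run_one(batch_size, learning_rate, train_dataset, eval_dataset):
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=3)                  # negative / neutral / positive
    args = TrainingArguments(output_dir=f"tw-{batch_size}-{learning_rate}",
                             per_device_train_batch_size=batch_size,
                             learning_rate=learning_rate,
                             num_train_epochs=3)
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_dataset, eval_dataset=eval_dataset)
    trainer.train()
    return trainer.evaluate()                      # accuracy, F1, etc.

# results = {(bs, lr): run_one(bs, lr, train_dataset, eval_dataset)
#            for bs, lr in grid}
```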
47

Zhang, Zhanhao. "The transferability of transfer learning model based on ImageNet for medical image classification tasks." Applied and Computational Engineering 18, no. 1 (October 23, 2023): 143–51. http://dx.doi.org/10.54254/2755-2721/18/20230980.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Transfer learning with pretrained weights is commonly based on the ImageNet dataset. However, ImageNet does not contain medical images, leaving the transferability of these pretrained weights to medical image classification an open question. The core purpose of this study is to investigate the impact of transfer learning on the accuracy of medical image classification, using four of the most widely used neural network models: ResNet18, VGG11, AlexNet, and MobileNet. Specifically, the study aims to determine whether transfer learning leads to significant improvements in image classification performance compared with training that does not use it. The dataset consists of approximately 4,000 chest X-ray images labeled as healthy, COVID, or viral pneumonia. The number of output neurons in each network's final layer was set to three to accommodate the ternary classification task. Preprocessing includes downsampling and normalization of pixel values. Keeping the dataset and preprocessing fixed, the study compares the performance of the models with and without pretrained weights. The results demonstrate that, compared with training from scratch, all four models converge more quickly and achieve higher validation accuracy in the initial epochs when transfer learning is employed, and they also exhibit higher prediction accuracy on the final test set. This study suggests that transfer learning with ImageNet-pretrained weights can boost the efficiency of medical image classification tasks.
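The comparison can be sketched with torchvision as follows; the normalization constants are the standard ImageNet values, and the training loop itself is omitted.

```python
# Sketch of the comparison: the same ResNet18 architecture with and without
# ImageNet-pretrained weights, with the final fully connected layer replaced
# by a 3-way head (healthy / COVID / viral pneumonia).

import torch.nn as nn
from torchvision import models, transforms

def build_model(pretrained: bool) -> nn.Module:
    weights = models.ResNet18_Weights.IMAGENET1K_V1 if pretrained else None
    model = models.resnet18(weights=weights)
    model.fc = nn.Linear(model.fc.in_features, 3)   # ternary classification head
    return model

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                  # downsample the X-ray images
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

transfer_model = build_model(pretrained=True)       # warm-started from ImageNet
scratch_model = build_model(pretrained=False)       # random initialization
```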
48

Anton, Jonah, Liam Castelli, Mun Fai Chan, Mathilde Outters, Wan Hee Tang, Venus Cheung, Pancham Shukla, Rahee Walambe, and Ketan Kotecha. "How Well Do Self-Supervised Models Transfer to Medical Imaging?" Journal of Imaging 8, no. 12 (December 1, 2022): 320. http://dx.doi.org/10.3390/jimaging8120320.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Self-supervised learning approaches have seen success transferring between similar medical imaging datasets; however, there has been no large-scale attempt to compare the transferability of self-supervised models against each other on medical images. In this study, we compare the generalisability of seven self-supervised models, two of which were trained in-domain, against supervised baselines across nine different medical datasets. We find that ImageNet-pretrained self-supervised models are more generalisable than their supervised counterparts, scoring up to 10% better on medical classification tasks. The two in-domain pretrained models outperformed other models by over 20% on in-domain tasks, but they suffered a significant loss of accuracy on all other tasks. Our investigation of the feature representations suggests that this trend may be due to the models learning to focus too heavily on specific areas.
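A common way to run such comparisons is linear evaluation on frozen features, sketched below with placeholder backbones and data loaders; the study also reports other evaluation settings.

```python
# Sketch of a linear-evaluation transfer protocol: freeze a pretrained
# backbone, extract features on a medical dataset, and fit a linear classifier
# on top. The backbone and data loaders are placeholders.

import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def extract_features(backbone, loader):
    backbone.eval()
    feats, labels = [], []
    for images, targets in loader:                  # any torch DataLoader
        feats.append(backbone(images).flatten(1).cpu().numpy())
        labels.append(targets.numpy())
    return np.concatenate(feats), np.concatenate(labels)

def linear_probe(backbone, train_loader, test_loader) -> float:
    X_train, y_train = extract_features(backbone, train_loader)
    X_test, y_test = extract_features(backbone, test_loader)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return clf.score(X_test, y_test)                # transfer accuracy
```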
49

Siahkoohi, Ali, Mathias Louboutin, and Felix J. Herrmann. "The importance of transfer learning in seismic modeling and imaging." GEOPHYSICS 84, no. 6 (November 1, 2019): A47—A52. http://dx.doi.org/10.1190/geo2019-0056.1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Accurate forward modeling is essential for solving inverse problems in exploration seismology. Unfortunately, it is often not affordable to be physically or numerically accurate. To overcome this conundrum, we make use of raw and processed data from nearby surveys. We use these data, consisting of shot records or velocity models, to pretrain a neural network to correct for the effects of, for instance, the free surface or numerical dispersion, both of which can be considered proxies for incomplete or inaccurate physics. Given this pretrained neural network, we apply transfer learning to fine-tune it so that it performs well on its task of mapping low-cost but low-fidelity solutions to high-fidelity solutions for the current survey. As long as fine-tuning requires only a small fraction of high-fidelity data, we can process the current survey at reduced cost while leveraging information from nearby surveys. We examine this principle by removing surface-related multiples and ghosts from shot records and the effects of numerical dispersion from migrated images and wave simulations.
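The fine-tuning step can be sketched as follows, assuming a pretrained image-to-image network and paired low-/high-fidelity data from the current survey; the fraction, optimizer, and loss below are illustrative choices, not the authors' configuration.

```python
# Sketch of fine-tuning a pretrained low-to-high-fidelity mapping network on
# only a small fraction of paired data from the current survey. The model,
# data tensors, and hyperparameters are placeholders.

import torch
from torch.utils.data import DataLoader, TensorDataset

def fine_tune(pretrained_model, low_fi, high_fi, fraction=0.05, epochs=10):
    n = low_fi.shape[0]
    idx = torch.randperm(n)[: max(1, int(fraction * n))]   # small labeled subset
    loader = DataLoader(TensorDataset(low_fi[idx], high_fi[idx]),
                        batch_size=4, shuffle=True)
    opt = torch.optim.Adam(pretrained_model.parameters(), lr=1e-4)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(pretrained_model(x), y)          # match high-fidelity
            loss.backward()
            opt.step()
    return pretrained_model
```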
50

Chen, Die, Hua Zhang, Zeqi Chen, Bo Xie, and Ye Wang. "Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins." Computational and Mathematical Methods in Medicine 2022 (June 28, 2022): 1–14. http://dx.doi.org/10.1155/2022/5847242.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The interaction between DNA and proteins is vital for the development of a living organism. Numerous previous in silico studies on the identification of DNA-binding proteins (DBPs) include features extracted from the alignment-based (pseudo) position-specific scoring matrix (PSSM), which limits their application because PSSM generation is time-consuming. Few researchers have explored applying pretrained, evolution-scale language models to the identification of DBPs. To this end, we present a comprehensive comparison of alignment-based PSSM features and pretrained evolutionary scale modeling (ESM) representations for DBP classification. The comparison is conducted by extracting information from PSSM and ESM representations using four unified averaging operations and by applying various feature selection (FS) methods. Experimental results demonstrate that the pretrained ESM representation outperforms the PSSM-derived features in a fair comparison. The pretrained feature representation deserves wide application to in silico DBP identification as well as other function annotation problems. Finally, we also confirm that an ensemble scheme aggregating various trained FS models can significantly improve the classification performance for DBPs.
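A hedged sketch of the pretrained-representation branch of this comparison is given below; the embedding function is a placeholder for an ESM model, and the averaging, feature selection, and classifier choices are illustrative only.

```python
# Sketch of the pretrained-representation pipeline: average per-residue
# embeddings from a pretrained protein language model into one fixed-length
# vector per protein, apply feature selection, and train a classifier to
# separate DNA-binding from non-binding proteins. Placeholders throughout.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def per_residue_embeddings(sequence: str) -> np.ndarray:
    """Placeholder: returns an (L, d) array from a pretrained ESM model."""
    raise NotImplementedError

def protein_feature(sequence: str) -> np.ndarray:
    emb = per_residue_embeddings(sequence)
    return emb.mean(axis=0)                     # simple averaging operation

def build_classifier(sequences, labels, k_features=300):
    # k_features must not exceed the embedding dimension d.
    X = np.stack([protein_feature(s) for s in sequences])
    y = np.asarray(labels)                      # 1 = DNA-binding, 0 = not
    model = make_pipeline(StandardScaler(),
                          SelectKBest(f_classif, k=k_features),
                          SVC(kernel="rbf", probability=True))
    return model.fit(X, y)
```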
