Journal articles on the topic 'Transformer Architecture'

Consult the top 50 journal articles for your research on the topic 'Transformer Architecture.'

1

Rahali, Abir, and Moulay A. Akhloufi. "End-to-End Transformer-Based Models in Textual-Based NLP." AI 4, no. 1 (January 5, 2023): 54–110. http://dx.doi.org/10.3390/ai4010004.

Abstract:
Transformer architectures are highly expressive because they use self-attention mechanisms to encode long-range dependencies in the input sequences. In this paper, we present a literature review on Transformer-based (TB) models, providing a detailed overview of each model in comparison to the Transformer’s standard architecture. This survey focuses on TB models used in the field of Natural Language Processing (NLP) for textual-based tasks. We begin with an overview of the fundamental concepts at the heart of the success of these models. Then, we classify them based on their architecture and training mode. We compare the advantages and disadvantages of popular techniques in terms of architectural design and experimental value. Finally, we discuss open research, directions, and potential future work to help solve current TB application challenges in NLP.
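A minimal sketch of the self-attention mechanism the survey refers to, written for this listing rather than taken from the paper: plain-Python scaled dot-product attention, in which every query is scored against every key, so any two positions interact in a single step regardless of their distance. Learned projection matrices and multiple heads are omitted for brevity.

```python
import math

def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(q, k, v):
    """Scaled dot-product attention over lists of vectors (one vector per token)."""
    d_k = len(k[0])
    out = []
    for qi in q:
        # Each query is compared with every key, so token i can draw on any
        # token j, however far apart they are in the sequence.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = self_attention(tokens, tokens, tokens)
```

Because each output row is a convex combination of the value vectors, every entry of `result` stays within the range of the corresponding value column.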
2

Chi, Ye, Haikun Liu, Ganwei Peng, Xiaofei Liao, and Hai Jin. "Transformer: An OS-Supported Reconfigurable Hybrid Memory Architecture." Applied Sciences 12, no. 24 (December 18, 2022): 12995. http://dx.doi.org/10.3390/app122412995.

Abstract:
Non-volatile memories (NVMs) have aroused vast interest in hybrid memory systems due to their promising features of byte-addressability, high storage density, low cost per byte, and near-zero standby energy consumption. However, since NVMs have limited write endurance, high write latency, and high write energy consumption, it is still challenging to directly replace traditional dynamic random access memory (DRAM) with NVMs. Many studies propose to utilize NVM and DRAM in a hybrid memory system, and explore sophisticated memory management schemes to alleviate the impact of slow NVM on the performance of applications. A few studies architected DRAM and NVM in a cache/memory hierarchy. However, the storage and performance overhead of the cache metadata (i.e., tags) management is rather expensive in this hierarchical architecture. Some other studies architected NVM and DRAM in a single (flat) address space to form a parallel architecture. However, the hot page monitoring and migration are critical for the performance of applications in this architecture. In this paper, we propose Transformer, an OS-supported reconfigurable hybrid memory architecture to efficiently use DRAM and NVM without redesigning the hardware architecture. To identify frequently accessed (hot) memory pages for migration, we propose to count the number of page accesses in OSes by sampling the access bit of pages periodically. We further migrate the identified hot pages from NVM to DRAM to improve the performance of hybrid memory system. More importantly, Transformer can simulate a hierarchical hybrid memory architecture while DRAM and NVM are physically managed in a flat address space, and can dynamically shift the logical memory architecture between parallel and hierarchical architectures according to applications’ memory access patterns. 
Experimental results show that Transformer can improve the application performance by 62% on average (up to 2.7×) compared with an NVM-only system, and can also improve performance by up to 79% and 42% (21% and 24% on average) compared with hierarchical and parallel architectures, respectively.
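The hot-page identification step can be sketched as follows. This is an illustrative model, not the authors' OS code: the function names, the representation of each sampling period as the set of pages whose access bit was found set (assumed cleared after each read), and the fixed migration threshold are all assumptions.

```python
from collections import Counter

def count_accesses(samples):
    """samples: one set per sampling period, holding the page numbers whose
    access bit was set during that period (hypothetical representation)."""
    counts = Counter()
    for accessed_pages in samples:
        counts.update(accessed_pages)  # +1 for each page seen in this period
    return counts

def select_hot_pages(counts, nvm_pages, threshold):
    # Only NVM-resident pages are migration candidates; hottest pages first,
    # ties broken by page number for a deterministic order.
    hot = [p for p in nvm_pages if counts[p] >= threshold]
    return sorted(hot, key=lambda p: (-counts[p], p))

samples = [{1, 2, 7}, {2, 7}, {2, 3, 7}, {7}]
counts = count_accesses(samples)
to_migrate = select_hot_pages(counts, nvm_pages={2, 3, 7, 9}, threshold=2)
```

Here pages 7 and 2 would be moved from NVM to DRAM; page 3 was sampled only once and page 9 never.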
3

Cui, Liyuan, Guoqiang Zhong, Xiang Liu, and Hongwei Xu. "A Compact Object Detection Architecture with Transformer Enhancing." Journal of Physics: Conference Series 2278, no. 1 (May 1, 2022): 012034. http://dx.doi.org/10.1088/1742-6596/2278/1/012034.

Abstract:
With the rapid advances in computer vision, the Transformer has attracted increasing interest in this field. However, its use is limited by its unprecedented storage requirements, heavy reliance on dataset size, and intolerable computational power consumption. Lightweight networks sit at the other extreme, pursuing compact architectures at the cost of performance. In this paper, we enhance the backbone of object detection networks by incorporating a right-sized Transformer, i.e., a Vision Transformer module. Specifically, starting from GhostNet, a well-known lightweight neural network structure, we embed this Vision Transformer module at the end of GhostNet and use a slicing design on the input data to reduce the computational burden of the network. The Vision Transformer-enhanced architecture serves as the backbone of the object detection network, with the well-known YOLOv5 as the baseline. We conduct multi-metric comparison experiments on two medium-scale object detection datasets with large, medium, and small networks. Results show that, without relying on ultra-large datasets or pre-trained models, the proposed Transformer-enhanced architecture achieves comparable or even higher mAP with only half the model size and floating-point computation of the baseline.
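The abstract does not spell out the input slicing design. One plausible reading, labeled here as an assumption, is the space-to-depth "Focus" slice used at the input of YOLOv5 (the baseline named above): the image is split into four half-resolution sub-images by taking every other pixel, so later layers operate on a smaller grid with four times as many channels. A plain-Python sketch on a single-channel grid:

```python
def slice_input(image):
    """Split an H x W grid (H and W even) into four H/2 x W/2 sub-grids by
    taking every other pixel, in the style of a YOLOv5 Focus slice (assumed)."""
    return [
        [row[0::2] for row in image[0::2]],  # even rows, even columns
        [row[0::2] for row in image[1::2]],  # odd rows, even columns
        [row[1::2] for row in image[0::2]],  # even rows, odd columns
        [row[1::2] for row in image[1::2]],  # odd rows, odd columns
    ]

# A 4 x 4 test image whose pixel values are their row-major indices.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
parts = slice_input(img)
```

No pixel is discarded: the four sub-grids together contain all 16 input values, rearranged into channels.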
4

Lorenzo, Javier, Ignacio Parra Alonso, Rubén Izquierdo, Augusto Luis Ballardini, Álvaro Hernández Saz, David Fernández Llorca, and Miguel Ángel Sotelo. "CAPformer: Pedestrian Crossing Action Prediction Using Transformer." Sensors 21, no. 17 (August 24, 2021): 5694. http://dx.doi.org/10.3390/s21175694.

Abstract:
Anticipating pedestrian crossing behavior in urban scenarios is a challenging task for autonomous vehicles. Early this year, a benchmark comprising the JAAD and PIE datasets was released. In the benchmark, several state-of-the-art methods have been ranked. However, most of the ranked temporal models rely on recurrent architectures. We propose what is, to the best of our knowledge, the first self-attention alternative, based on the transformer architecture, which has had enormous success in natural language processing (NLP) and recently in computer vision. Our architecture is composed of various branches which fuse video and kinematic data. The video branch is based on two possible architectures: RubiksNet and TimeSformer. The kinematic branch is based on different configurations of the transformer encoder. Several experiments have been performed, mainly focusing on pre-processing the input data, highlighting problems with two kinematic data sources: pose keypoints and ego-vehicle speed. Our proposed model's results are comparable to those of PCPA, the best-performing model in the benchmark, with an F1 score of nearly 0.78 against 0.77. Furthermore, using only bounding box coordinates and image data, our model surpasses PCPA by a larger margin (F1 = 0.75 vs. F1 = 0.72). Our model has proven to be a valid alternative to recurrent architectures, providing advantages such as parallelization and whole-sequence processing, and learning relationships between samples that are not possible with recurrent architectures.
5

Shao, Ran, Xiao-Jun Bi, and Zheng Chen. "A novel hybrid transformer-CNN architecture for environmental microorganism classification." PLOS ONE 17, no. 11 (November 11, 2022): e0277557. http://dx.doi.org/10.1371/journal.pone.0277557.

Abstract:
The success of vision transformers (ViTs) has given rise to their application in classification tasks on small environmental microorganism (EM) datasets. However, due to the lack of multi-scale feature maps and local feature extraction capabilities, the pure transformer architecture cannot achieve good results on small EM datasets. In this work, a novel hybrid model is proposed by combining the transformer with a convolutional neural network (CNN). Compared to traditional ViTs and CNNs, the proposed model achieves state-of-the-art performance when trained on small EM datasets. This is accomplished in two ways. 1) Instead of the original fixed-size feature maps of the transformer-based designs, a hierarchical structure is adopted to obtain multi-scale feature maps. 2) Two new blocks are introduced to the transformer’s two core sections, namely the convolutional parameter sharing multi-head attention block and the local feed-forward network block. These changes allow the model to extract more local features than traditional transformers. In particular, for classification on the sixth version of the EM dataset (EMDS-6), the proposed model outperforms the baseline Xception by 6.7 percentage points, while being 60 times smaller in parameter size. In addition, the proposed model also generalizes well on the WHOI dataset (accuracy of 99%) and constitutes a fresh approach to the use of transformers for visual classification tasks based on small EM datasets.
6

Ibrahem, Hatem, Ahmed Salem, and Hyun-Soo Kang. "RT-ViT: Real-Time Monocular Depth Estimation Using Lightweight Vision Transformers." Sensors 22, no. 10 (May 19, 2022): 3849. http://dx.doi.org/10.3390/s22103849.

Abstract:
The latest research in computer vision has highlighted the effectiveness of vision transformers (ViTs) in performing several computer vision tasks; they can efficiently understand and process the image globally, unlike convolution, which processes the image locally. ViTs outperform convolutional neural networks in terms of accuracy in many computer vision tasks, but the speed of ViTs is still an issue due to the excessive use of transformer layers that include many fully connected layers. Therefore, we propose a real-time ViT-based monocular depth estimation (depth estimation from a single RGB image) method with encoder-decoder architectures for indoor and outdoor scenes. The main architecture of the proposed method consists of a vision transformer encoder and a convolutional neural network decoder. We started by training the base vision transformer (ViT-b16) with 12 transformer layers, then reduced the transformer layers to six, namely ViT-s16 (the Small ViT), and to four, namely ViT-t16 (the Tiny ViT), to obtain real-time processing. We also try four different configurations of the CNN decoder network. The proposed architectures can learn the task of depth estimation efficiently and, taking advantage of the multi-head self-attention module, produce more accurate depth predictions than fully convolutional methods. We train the proposed encoder-decoder architecture end-to-end on the challenging NYU-depthV2 and CITYSCAPES benchmarks, then evaluate the trained models on the validation and test sets of the same benchmarks, showing that it outperforms many state-of-the-art methods on depth estimation while performing the task in real time (∼20 fps). We also present a fast 3D reconstruction experiment (∼17 fps) based on the depth estimated by our method, as a real-world application of the method.
7

Lee, Jaewoo, Sungjun Lee, Wonki Cho, Zahid Ali Siddiqui, and Unsang Park. "Vision Transformer-Based Tailing Detection in Videos." Applied Sciences 11, no. 24 (December 7, 2021): 11591. http://dx.doi.org/10.3390/app112411591.

Abstract:
Tailing is defined as an event where a suspicious person follows someone closely. We define the problem of tailing detection from videos as an anomaly detection problem, where the goal is to find abnormalities in the walking pattern of the pedestrians (victim and follower). We therefore propose a modified Time-Series Vision Transformer (TSViT), a method for anomaly detection in video, specifically for tailing detection with a small dataset. We introduce an effective way to train TSViT with a small dataset by regularizing the prediction model. To do so, we first encode the spatial information of the pedestrians into 2D patterns and then pass them as tokens to the TSViT. Through a series of experiments, we show that tailing detection on a small dataset using TSViT outperforms popular CNN-based architectures, as CNN architectures tend to overfit on a small dataset of time-series images. We also show that, on time-series images, the performance of CNN-based architectures gradually drops as network depth is increased to raise capacity. On the other hand, a Vision Transformer architecture with fewer heads performs well on time-series images, and its performance increases further as the input resolution of the images is increased. Experimental results demonstrate that TSViT performs better than the handcrafted rule-based method and the CNN-based method for tailing detection. TSViT can be used in many applications of video anomaly detection, even with a small dataset.
8

He, Ju, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, and Changhu Wang. "TransFG: A Transformer Architecture for Fine-Grained Recognition." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 852–60. http://dx.doi.org/10.1609/aaai.v36i1.19967.

Abstract:
Fine-grained visual classification (FGVC), which aims at recognizing objects from subcategories, is a very challenging task due to the inherently subtle inter-class differences. Most existing works tackle this problem by reusing the backbone network to extract features of detected discriminative regions. However, this strategy inevitably complicates the pipeline and pushes the proposed regions to contain most parts of the objects, and thus fails to locate the really important parts. Recently, the vision transformer (ViT) has shown strong performance in the traditional classification task. The self-attention mechanism of the transformer links every patch token to the classification token. In this work, we first evaluate the effectiveness of the ViT framework in the fine-grained recognition setting. Then, motivated by the observation that the strength of an attention link can intuitively be considered an indicator of token importance, we further propose a novel Part Selection Module that can be applied to most transformer architectures: it integrates all raw attention weights of the transformer into an attention map that guides the network to effectively and accurately select discriminative image patches and compute their relations. A contrastive loss is applied to enlarge the distance between feature representations of confusing classes. We name the augmented transformer-based model TransFG and demonstrate its value through experiments on five popular fine-grained benchmarks, where we achieve state-of-the-art performance. Qualitative results are presented for a better understanding of our model.
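A rough sketch of the Part Selection idea, under stated assumptions: the raw attention matrices of each head are chained across layers by matrix multiplication (attention-rollout style, with later layers multiplied on the left), and for each head the patch with the strongest link to the classification token (index 0) is selected. The helper names and the exact product order are assumptions for illustration, not the authors' code.

```python
def matmul(a, b):
    # Plain-Python product of two matrices given as lists of rows.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def select_parts(layer_attn):
    """layer_attn[l][h] is the N x N raw attention matrix of head h in layer l,
    with token 0 as the classification token. Returns one patch index per head."""
    num_heads = len(layer_attn[0])
    selected = []
    for h in range(num_heads):
        combined = layer_attn[0][h]
        for layer in layer_attn[1:]:
            # Integrate raw attention weights across layers (assumed order).
            combined = matmul(layer[h], combined)
        cls_row = combined[0][1:]  # attention from the CLS token to each patch
        best = max(range(len(cls_row)), key=cls_row.__getitem__)
        selected.append(best + 1)  # shift back past the CLS token
    return selected

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
head_a = [[0.2, 0.7, 0.1], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
head_b = [[0.1, 0.2, 0.7], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
picks = select_parts([[head_a, head_b], [identity, identity]])
```

Each head nominates its own patch, which gives the module several candidate discriminative regions rather than a single one.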
9

Gao, Shuguo, Jun Zhao, Yunpeng Liu, Ziqiang Xu, Zhe Li, Lu Sun, and Yuan Tian. "Research into Power Transformer Health Assessment Technology Based on Uncertainty of Information and Deep Architecture Design." Mathematical Problems in Engineering 2021 (April 2, 2021): 1–12. http://dx.doi.org/10.1155/2021/8831872.

Abstract:
When conducting a health evaluation of a power transformer, uncertainty in the evaluation information is likely to affect the accuracy of the evaluation. A multilevel health assessment method for power transformers is proposed that addresses three aspects: indicator criterion uncertainty, weight uncertainty, and fusion uncertainty. First, indicators are selected based on transformer guidelines and engineering experience to establish a multilevel model of transformers that can reflect the defect type and defect location. Then, a Gaussian cloud model is used to resolve the uncertainty of the indicator criterion boundaries. Based on association rules, AHP, and variable weights, the processed weights are calculated from the update module to obtain comprehensive weights, which overcomes the uncertainty of the weights. An improved DSmT theory is used for multiple-evidence fusion to solve the high-conflict and uncertainty problems in the fusion process. Finally, through actual case analysis, the defect type, defect location, and overall state of the transformer are obtained. Comparison with many defect cases in a case-study library shows an evaluation accuracy of 96.21%, which verifies the practicability and efficiency of the method.
10

Xu, Zhen, David R. So, and Andrew M. Dai. "MUFASA: Multimodal Fusion Architecture Search for Electronic Health Records." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10532–40. http://dx.doi.org/10.1609/aaai.v35i12.17260.

Abstract:
One important challenge of applying deep learning to electronic health records (EHR) is the complexity of their multimodal structure. EHR usually contains a mixture of structured (codes) and unstructured (free-text) data with sparse and irregular longitudinal features -- all of which doctors utilize when making decisions. In the deep learning regime, determining how different modality representations should be fused together is a difficult problem, which is often addressed by handcrafted modeling and intuition. In this work, we extend state-of-the-art neural architecture search (NAS) methods and propose MUltimodal Fusion Architecture SeArch (MUFASA) to simultaneously search across multimodal fusion strategies and modality-specific architectures for the first time. We demonstrate empirically that our MUFASA method outperforms established unimodal NAS on public EHR data with comparable computation costs. In addition, MUFASA produces architectures that outperform Transformer and Evolved Transformer. Compared with these baselines on CCS diagnosis code prediction, our discovered models improve top-5 recall from 0.88 to 0.91 and demonstrate the ability to generalize to other EHR tasks. Studying our top architecture in depth, we provide empirical evidence that MUFASA's improvements are derived from its ability to both customize modeling for each modality and find effective fusion strategies.
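For readers unfamiliar with the metric, top-5 recall on a multi-label task such as diagnosis code prediction is commonly computed as the fraction of an example's true codes that appear among its five highest-scoring predictions, averaged over examples. The sketch below uses that common definition with hypothetical inputs; the paper's exact averaging may differ.

```python
def top_k_recall(scores, true_labels, k=5):
    """scores: one dict per example mapping label -> predicted score;
    true_labels: one non-empty set of correct labels per example."""
    recalls = []
    for example_scores, truth in zip(scores, true_labels):
        # Labels ranked by descending score; keep the k best.
        top_k = sorted(example_scores, key=example_scores.get, reverse=True)[:k]
        recalls.append(len(truth.intersection(top_k)) / len(truth))
    return sum(recalls) / len(recalls)  # macro-average over examples
```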
11

Wei, Lixing. "A Transformer Network Architecture for Dermoscopy Image Segmentation." Journal of Physics: Conference Series 2303, no. 1 (July 1, 2022): 012043. http://dx.doi.org/10.1088/1742-6596/2303/1/012043.

Abstract:
To address the irregular shapes and blurred boundaries of skin lesions in skin lesion images, this paper proposes a skin lesion segmentation algorithm combining a CNN and a Transformer. First, ResNet is used as the backbone feature extraction network, and the extracted feature map sequence is used as the input of the Transformer. A new structural boundary attention gate is added to the Transformer to extract enough local detail to handle fuzzy boundaries. Finally, DenseASPP is used to enhance the feature representation and process multi-scale information, and an improved loss function is proposed to make the model attend to the boundary region when computing the loss. Experimental results show that the Dice and JI values of the network are 0.854534 and 0.767901 on the ISIC2017 dataset, and 0.908548 and 0.843689 on the ISIC2018 dataset, respectively, which compare well with other advanced models. Its effectiveness is demonstrated by comparison with different models and by visualizing the results.
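The Dice and JI (Jaccard index) values reported above can be computed per mask as in this sketch. It assumes flattened binary masks and is not the paper's evaluation code. For a single mask the two scores satisfy Dice = 2J / (1 + J); averages taken over many images need not satisfy that identity exactly.

```python
def dice_and_jaccard(pred, target):
    """pred, target: equal-length sequences of 0/1 pixel labels (flattened masks)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)        # |P| + |T|
    union = total - intersection           # |P ∪ T|
    # Two empty masks agree perfectly, so both scores default to 1.0.
    dice = 2 * intersection / total if total else 1.0
    jaccard = intersection / union if union else 1.0
    return dice, jaccard

dice, jac = dice_and_jaccard([1, 1, 0, 0], [1, 0, 1, 0])
```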
12

Wang, Zhixue, Yu Zhang, Lin Luo, and Nan Wang. "TransCD: scene change detection via transformer-based architecture." Optics Express 29, no. 25 (November 30, 2021): 41409. http://dx.doi.org/10.1364/oe.440720.

13

Choi, Yong-Seok, Yo-Han Park, and Kong Joo Lee. "Building a Korean morphological analyzer using two Korean BERT models." PeerJ Computer Science 8 (May 2, 2022): e968. http://dx.doi.org/10.7717/peerj-cs.968.

Abstract:
A morphological analyzer plays an essential role in identifying functional suffixes of Korean words. The analyzer input and output differ from each other in their length and strings, which can be dealt with by an encoder-decoder architecture. We adopt a Transformer architecture, which is an encoder-decoder architecture with self-attention rather than a recurrent connection, to implement a Korean morphological analyzer. Bidirectional Encoder Representations from Transformers (BERT) is one of the most popular pretrained representation models; it can present an encoded sequence of input words, considering contextual information. We initialize both the Transformer encoder and decoder with two types of Korean BERT, one of which is pretrained with a raw corpus, and the other is pretrained with a morphologically analyzed dataset. Therefore, implementing a Korean morphological analyzer based on Transformer is a fine-tuning process with a relatively small corpus. A series of experiments proved that parameter initialization using pretrained models can alleviate the chronic problem of a lack of training data and reduce the time required for training. In addition, we can determine the number of layers required for the encoder and decoder to optimize the performance of a Korean morphological analyzer.
14

Shi, Hao, Bingqian Chai, Yupei Wang, and Liang Chen. "A Local-Sparse-Information-Aggregation Transformer with Explicit Contour Guidance for SAR Ship Detection." Remote Sensing 14, no. 20 (October 20, 2022): 5247. http://dx.doi.org/10.3390/rs14205247.

Abstract:
Ship detection in synthetic aperture radar (SAR) images has developed rapidly in recent years, especially after the adoption of convolutional neural network (CNN)-based methods. Recently, the transformer, which uses self-attention and a feed-forward neural network in an encoder-decoder structure, has received much attention from researchers due to its intrinsic characteristics of global-relation modeling between pixels and an enlarged global receptive field. However, when adapting transformers to SAR ship detection, one challenging issue cannot be ignored: background clutter, such as a coast, an island, or a sea wave, has made previous object detectors prone to missing ships with blurred contours. Therefore, in this paper, we propose a local-sparse-information-aggregation transformer with explicit contour guidance for ship detection in SAR images. Based on the Swin Transformer architecture, and in order to effectively aggregate sparse meaningful cues of small-scale ships, a deformable attention mechanism is incorporated in place of the original self-attention mechanism. Moreover, a novel contour-guided shape-enhancement module is proposed to explicitly enforce contour constraints on the one-dimensional transformer architecture. Experimental results show that our proposed method achieves superior performance on the challenging HRSID and SSDD datasets.
15

Young, Paul, Nima Ebadi, Arun Das, Mazal Bethany, Kevin Desai, and Peyman Najafirad. "Can Hierarchical Transformers Learn Facial Geometry?" Sensors 23, no. 2 (January 13, 2023): 929. http://dx.doi.org/10.3390/s23020929.

Abstract:
Human faces are a core part of our identity and expression, and thus, understanding facial geometry is key to capturing this information. Automated systems that seek to make use of this information must have a way of modeling facial features in a way that makes them accessible. Hierarchical, multi-level architectures have the capability of capturing the different resolutions of representation involved. In this work, we propose using a hierarchical transformer architecture as a means of capturing a robust representation of facial geometry. We further demonstrate the versatility of our approach by using this transformer as a backbone to support three facial representation problems: face anti-spoofing, facial expression representation, and deepfake detection. The combination of effective fine-grained details alongside global attention representations makes this architecture an excellent candidate for these facial representation problems. We conduct numerous experiments first showcasing the ability of our approach to address common issues in facial modeling (pose, occlusions, and background variation) and capture facial symmetry, then demonstrating its effectiveness on three supplemental tasks.
16

Bai, He, Peng Shi, Jimmy Lin, Yuqing Xie, Luchen Tan, Kun Xiong, Wen Gao, and Ming Li. "Segatron: Segment-Aware Transformer for Language Modeling and Understanding." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 14 (May 18, 2021): 12526–34. http://dx.doi.org/10.1609/aaai.v35i14.17485.

Abstract:
Transformers are powerful for sequence modeling. Nearly all state-of-the-art language models and pre-trained language models are based on the Transformer architecture. However, it distinguishes sequential tokens only with the token position index. We hypothesize that better contextual representations can be generated from the Transformer with richer positional information. To verify this, we propose a segment-aware Transformer (Segatron), by replacing the original token position encoding with a combined position encoding of paragraph, sentence, and token. We first introduce the segment-aware mechanism to Transformer-XL, which is a popular Transformer-based language model with memory extension and relative position encoding. We find that our method can further improve the Transformer-XL base model and large model, achieving 17.1 perplexity on the WikiText-103 dataset. We further investigate the pre-training masked language modeling task with Segatron. Experimental results show that BERT pre-trained with Segatron (SegaBERT) can outperform BERT with vanilla Transformer on various NLP tasks, and outperforms RoBERTa on zero-shot sentence representation learning. Our code is available on GitHub.
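The combined paragraph/sentence/token position encoding can be sketched as below. This is an illustration under assumptions, not Segatron's code: it assumes token indices restart at each sentence, sentence indices restart at each paragraph, and the three embeddings are summed elementwise; the tiny embedding tables are hypothetical stand-ins for learned ones.

```python
def segment_positions(document):
    """document: list of paragraphs; each paragraph is a list of sentences;
    each sentence is a list of tokens. Returns (token, para, sent, tok) tuples."""
    positions = []
    for p_idx, paragraph in enumerate(document):
        for s_idx, sentence in enumerate(paragraph):      # restarts per paragraph
            for t_idx, token in enumerate(sentence):      # restarts per sentence
                positions.append((token, p_idx, s_idx, t_idx))
    return positions

def combined_position_encoding(pos, para_emb, sent_emb, tok_emb):
    # Sum one embedding per granularity, elementwise (hypothetical tables).
    _, p, s, t = pos
    return [a + b + c for a, b, c in zip(para_emb[p], sent_emb[s], tok_emb[t])]

doc = [[["The", "cat", "sat", "."], ["It", "slept", "."]], [["Next", "day", "."]]]
pos = segment_positions(doc)
para_emb = [[1.0, 0.0], [2.0, 0.0]]
sent_emb = [[0.0, 1.0], [0.0, 2.0]]
tok_emb = [[0.5, 0.5]] * 4
encoding_it = combined_position_encoding(pos[4], para_emb, sent_emb, tok_emb)
```

Unlike a single token index, the triple lets the model tell that "It" opens the second sentence of the first paragraph.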
17

Burn, G. L. "Implementing the evaluation transformer model of reduction on parallel machines." Journal of Functional Programming 1, no. 3 (July 1991): 329–66. http://dx.doi.org/10.1017/s0956796800000137.

Abstract:
The evaluation transformer model of reduction generalizes lazy evaluation in two ways: it can start the evaluation of expressions before their first use, and it can evaluate expressions further than weak head normal form. Moreover, the amount of evaluation required of an argument to a function may depend on the amount of evaluation required of the function application. It is a suitable candidate model for implementing lazy functional languages on parallel machines. In this paper we explore the implementation of lazy functional languages on parallel machines, both shared and distributed memory architectures, using the evaluation transformer model of reduction. We will see that the same code can be produced for both styles of architecture, and the definition of the instruction set is virtually the same for each style. The essential difference is that a distributed memory architecture has one extra node type for non-local pointers, and instructions which involve the value of such nodes need their definitions extended to cover this new type of node. To make our presentation accessible, we base our description on a variant of the well-known G-machine, an abstract machine for executing lazy functional programs.
18

Iliadis, Lazaros, Spyridon Nikolaidis, Panagiotis Sarigiannidis, Shaohua Wan, and Sotirios Goudos. "Artwork Style Recognition Using Vision Transformers and MLP Mixer." Technologies 10, no. 1 (December 28, 2021): 2. http://dx.doi.org/10.3390/technologies10010002.

Abstract:
Through the extensive study of transformers, attention mechanisms have emerged as potentially more powerful than sequential recurrent processing and convolution. In this realm, Vision Transformers have gained much research interest, since their architecture changes the dominant paradigm in Computer Vision. An interesting and difficult task in this field is the classification of artwork styles, since the artistic style of a painting is a descriptor that captures rich information about the painting. In this paper, two different Deep Learning architectures—Vision Transformer and MLP Mixer (Multi-layer Perceptron Mixer)—are trained from scratch in the task of artwork style recognition, achieving over 39% prediction accuracy for 21 style classes on the WikiArt paintings dataset. In addition, a comparative study between the most common optimizers was conducted obtaining useful information for future studies.
19

Obuchowski, Aleksander, and Michał Lew. "Transformer-Capsule Model for Intent Detection (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 10 (April 3, 2020): 13885–86. http://dx.doi.org/10.1609/aaai.v34i10.7215.

Abstract:
Intent recognition is one of the most crucial tasks in NLU systems, which are nowadays especially important for designing intelligent conversation. We propose a novel approach to intent recognition which involves combining transformer architecture with capsule networks. Our results show that such an architecture performs better than the original capsule-NLU network implementations and achieves state-of-the-art results on datasets such as ATIS, AskUbuntu, and WebApp.
20

Vasilevskij, V. V., and M. O. Poliakov. "Reproducing of the humidity curve of power transformers oil using adaptive neuro-fuzzy systems." Electrical Engineering & Electromechanics, no. 1 (February 23, 2021): 10–14. http://dx.doi.org/10.20998/2074-272x.2021.1.02.

Abstract:
Introduction. One of the parameters that determine the state of the insulation of power transformers is the moisture content of the cellulose insulation and the transformer oil. Modern systems for continuous monitoring of transformer equipment can accumulate data that can be used to reproduce the dynamics of moisture content in the insulation. The purpose of the work is to reproduce the humidity curve of transformer oil from measurements of the temperature of the upper and lower layers of oil, without the need to measure moisture content directly with special devices. Methodology. The fuzzy neural networks are built on the adaptive neuro-fuzzy inference system ANFIS. The networks were generated using the Grid Partition algorithm without clustering and using Subtractive Clustering. Results. The paper presents a comparative analysis of fuzzy neural networks of various architectures in terms of increasing the accuracy of reproducing the moisture content of transformer oil. The results of continuous monitoring of the temperature of the upper and lower layers of transformer oil during two months of operation were used for training and testing the fuzzy neural networks. Twenty-four variants of the ANFIS model architecture were considered, differing in the membership functions, the number of terms of each input quantity, and the number of training cycles. The results of using the constructed fuzzy neural networks to reproduce the dynamics of moisture content of transformer oil during a month of transformer operation are presented. The reproduction accuracy was assessed using the root mean square error (RMSE) and the coefficient of determination. The test results indicate that the proposed models are sufficiently adequate. In particular, the RMSE was 0.49 for the network constructed using the Grid Partition method and 0.40509 for the network built using the Subtractive Clustering method.
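The two accuracy measures named in the abstract, the root mean square error and the coefficient of determination (R²), can be computed as below. This is a generic sketch of the standard formulas, not the study's code, and it uses no data from the paper.

```python
import math

def rmse(actual, predicted):
    # Square root of the mean squared residual.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    # 1 - (residual sum of squares) / (total sum of squares about the mean).
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

RMSE is in the units of the measured quantity, whereas R² is dimensionless, which is why studies often report both.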
21

Kim, Ki Jin, Tae Ho Lim, S. H. Park, and K. H. Ahn. "A High Efficiency CMOS Power Amplifier with a Diode Linearizer and Voltage Combining Transformers." Applied Mechanics and Materials 110-116 (October 2011): 5500–5504. http://dx.doi.org/10.4028/www.scientific.net/amm.110-116.5500.

Abstract:
This paper proposes a high-efficiency power amplifier with a diode linearizer and voltage-combining transformers in a standard 0.13-μm TSMC CMOS technology. The 3-D simulated transformer adopts a multi-finger architecture, which provides low insertion loss and allows high current capacity in the transformer. With four differential, cascade-connected multi-finger transformers, the amplifier delivers more than 1 W of output power from a 1.8 V supply. To enhance the linearity of the power amplifier, a diode-configured bias circuit is used. With the transformers, balun, diode bias circuits, and four identical differential amplifiers fully integrated, the prototype Class AB power amplifier shows 32 dBm saturation power at 2.4 GHz. Owing to the diode linearizer, the output P1dB is 30.8 dBm with 28% power-added efficiency (PAE).
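The power figures above are quoted in dBm. A minimal Python sketch of the standard conversion to watts and of the usual power-added-efficiency definition, PAE = (P_out − P_in) / P_DC, follows; the input-drive and DC-supply values in the last line are hypothetical, since the abstract does not give them.

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert dBm to watts: P[W] = 10 ** (P[dBm] / 10) / 1000."""
    return 10 ** (p_dbm / 10) / 1000


def power_added_efficiency(p_out_w: float, p_in_w: float, p_dc_w: float) -> float:
    """PAE = (P_out - P_in) / P_DC, returned as a fraction (0.28 means 28 %)."""
    if p_dc_w <= 0:
        raise ValueError("DC supply power must be positive")
    return (p_out_w - p_in_w) / p_dc_w


# 32 dBm saturation power is roughly 1.58 W, consistent with "more than 1 W".
print(f"P_sat = {dbm_to_watts(32.0):.3f} W")
print(f"P_1dB = {dbm_to_watts(30.8):.3f} W")
# Hypothetical drive (0.1 W) and DC supply (4.0 W) values, for illustration only.
print(f"PAE   = {power_added_efficiency(dbm_to_watts(30.8), 0.1, 4.0):.1%}")
```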
22

Han, Jianhua, Xiajun Deng, Xinyue Cai, Zhen Yang, Hang Xu, Chunjing Xu, and Xiaodan Liang. "Laneformer: Object-Aware Row-Column Transformers for Lane Detection." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 799–807. http://dx.doi.org/10.1609/aaai.v36i1.19961.

Abstract:
We present Laneformer, a conceptually simple yet powerful transformer-based architecture tailored for lane detection, a long-standing research topic for visual perception in autonomous driving. The dominant paradigms rely on purely CNN-based architectures, which often fail to incorporate the relations between distant lane points and the global context induced by surrounding objects (e.g., pedestrians, vehicles). Inspired by recent advances of the transformer encoder-decoder architecture in various vision tasks, we design a new end-to-end Laneformer architecture that reworks conventional transformers to better capture the shape and semantic characteristics of lanes, with minimal overhead in latency. First, in addition to deformable pixel-wise self-attention in the encoder, Laneformer introduces two new row and column self-attention operations to efficiently mine point context along the lane shapes. Second, motivated by the observation that objects appearing in the scene can affect the prediction of lane segments, Laneformer further includes the detected object instances as extra inputs to the multi-head attention blocks in the encoder and decoder, facilitating lane point detection by sensing semantic context. Specifically, the bounding box locations of objects are added to the Key module to provide interaction with each pixel and query, while the ROI-aligned features are inserted into the Value module. Extensive experiments demonstrate that Laneformer achieves state-of-the-art performance on the CULane benchmark, with a 77.1% F1 score. We hope our simple and effective Laneformer will serve as a strong baseline for future research in self-attention models for lane detection.
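Row and column self-attention restrict each pixel to attending within its own row or column of the feature map, so the cost per layer grows like H·W·(H + W) instead of (H·W)². The NumPy sketch below shows only that restriction; it assumes a single head with queries, keys, and values all equal to the input features (no learned projections), and it omits Laneformer's deformable attention and object-aware Key/Value inputs. Summing the row and column outputs at the end is likewise an illustrative assumption.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def row_attention(feat: np.ndarray) -> np.ndarray:
    """Self-attention within each row of an (H, W, C) feature map.

    Simplification: queries, keys and values are the raw features (no projections).
    """
    _, _, c = feat.shape
    scores = np.einsum("hic,hjc->hij", feat, feat) / np.sqrt(c)  # (H, W, W) scores per row
    return np.einsum("hij,hjc->hic", softmax(scores, axis=-1), feat)


def column_attention(feat: np.ndarray) -> np.ndarray:
    """Self-attention within each column: transpose, attend along rows, transpose back."""
    return row_attention(feat.transpose(1, 0, 2)).transpose(1, 0, 2)


rng = np.random.default_rng(0)
feature_map = rng.normal(size=(6, 8, 4))  # hypothetical H=6, W=8, C=4
# Combining the two outputs by summation is an assumption for this sketch.
out = row_attention(feature_map) + column_attention(feature_map)
print(out.shape)  # (6, 8, 4)
```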
23

Chen, Shichuan, Kunfeng Qiu, Shilian Zheng, Qi Xuan, and Xiaoniu Yang. "Radio–Image Transformer: Bridging Radio Modulation Classification and ImageNet Classification." Electronics 9, no. 10 (October 9, 2020): 1646. http://dx.doi.org/10.3390/electronics9101646.

Abstract:
Radio modulation classification is widely used in the field of wireless communication. In this paper, in order to perform radio modulation classification with the help of existing ImageNet classification models, we propose a radio–image transformer that extracts the instantaneous amplitude, instantaneous phase, and instantaneous frequency from the received radio complex baseband signals and then converts the signals into images using the proposed signal rearrangement method or convolution mapping method. Finally, we use existing ImageNet classification network models to classify the modulation type of the signal. The experimental results show that the proposed signal rearrangement and convolution mapping methods outperform methods using constellation diagrams and time–frequency images. In addition, a comparison of seven ImageNet classification network models shows that, except for the relatively poor performance of MNASNet1_0, the modulation classification performance of the other six architectures is similar, indicating that the proposed methods do not place high demands on the architecture of the selected ImageNet classification network model. Moreover, the experimental results show that our method achieves good classification performance on signal datasets with different sampling rates, on Orthogonal Frequency Division Multiplexing (OFDM) signals, and on real measured signals.
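The first step described above, extracting instantaneous amplitude, phase, and frequency from complex baseband samples, can be sketched in NumPy as follows. Frequency is taken as the first difference of the unwrapped phase scaled by fs / 2π, and `rearrange_to_image` is a hypothetical stand-in that simply reshapes a sequence into a square array; it is not the paper's signal rearrangement or convolution mapping method.

```python
import numpy as np


def instantaneous_features(iq: np.ndarray, fs: float):
    """Return amplitude, unwrapped phase (rad) and frequency (Hz) of complex baseband samples."""
    amplitude = np.abs(iq)
    phase = np.unwrap(np.angle(iq))
    # Phase derivative / (2*pi) via first differences, so frequency has N - 1 samples.
    frequency = np.diff(phase) * fs / (2 * np.pi)
    return amplitude, phase, frequency


def rearrange_to_image(x: np.ndarray, side: int) -> np.ndarray:
    """Hypothetical rearrangement: truncate a 1-D sequence and reshape it to side x side."""
    if x.size < side * side:
        raise ValueError("sequence too short for the requested image size")
    return x[: side * side].reshape(side, side)


fs = 1_000.0                                   # hypothetical sampling rate, Hz
t = np.arange(1024) / fs
iq = 2.0 * np.exp(1j * 2 * np.pi * 50.0 * t)   # synthetic 50 Hz tone with amplitude 2
amp, ph, freq = instantaneous_features(iq, fs)
image = rearrange_to_image(amp, 32)
print(image.shape, freq.mean())
```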
24

Sun, Zeyu, Qihao Zhu, Yingfei Xiong, Yican Sun, Lili Mou, and Lu Zhang. "TreeGen: A Tree-Based Transformer Architecture for Code Generation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8984–91. http://dx.doi.org/10.1609/aaai.v34i05.6430.

Abstract:
A code generation system generates programming language code based on an input natural language description. State-of-the-art approaches rely on neural networks for code generation. However, these code generators suffer from two problems. One is the long dependency problem, where a code element often depends on another far-away code element. A variable reference, for example, depends on its definition, which may appear quite a few lines before. The other problem is structure modeling, as programs contain rich structural information. In this paper, we propose a novel tree-based neural architecture, TreeGen, for code generation. TreeGen uses the attention mechanism of Transformers to alleviate the long-dependency problem, and introduces a novel AST reader (encoder) to incorporate grammar rules and AST structures into the network. We evaluated TreeGen on a Python benchmark, HearthStone, and two semantic parsing benchmarks, ATIS and GEO. TreeGen outperformed the previous state-of-the-art approach by 4.5 percentage points on HearthStone, and achieved the best accuracy among neural network-based approaches on ATIS (89.1%) and GEO (89.6%). We also conducted an ablation test to better understand each component of our model.
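The abstract describes incorporating grammar rules and AST structure into the network. As a loose illustration only, the Python sketch below uses the standard `ast` module to list parent-to-children productions of a program's abstract syntax tree in pre-order; the function name `grammar_rules` is hypothetical, and this is a simplification, not TreeGen's AST reader or its grammar.

```python
import ast
from typing import List


def grammar_rules(source: str) -> List[str]:
    """Return 'Parent -> Child ...' productions of a Python AST in pre-order."""
    rules: List[str] = []

    def visit(node: ast.AST) -> None:
        children = list(ast.iter_child_nodes(node))
        if children:
            rhs = " ".join(type(child).__name__ for child in children)
            rules.append(f"{type(node).__name__} -> {rhs}")
        for child in children:
            visit(child)

    visit(ast.parse(source))
    return rules


for rule in grammar_rules("total = price * qty"):
    print(rule)
```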
25

Zhao, Qian, Hao Yang, Dongming Zhou, and Jinde Cao. "Rethinking Image Deblurring via CNN-Transformer Multiscale Hybrid Architecture." IEEE Transactions on Instrumentation and Measurement 72 (2023): 1–15. http://dx.doi.org/10.1109/tim.2022.3230482.

26

Wu, Sitong, Tianyi Wu, Haoru Tan, and Guodong Guo. "Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 2731–39. http://dx.doi.org/10.1609/aaai.v36i3.20176.

Abstract:
Recently, Transformers have shown promising performance in various vision tasks. To reduce the quadratic computational complexity caused by global self-attention, various methods constrain attention to a local region to improve efficiency. Consequently, their receptive fields in a single attention layer are not large enough, resulting in insufficient context modeling. To address this issue, we propose Pale-Shaped self-Attention (PS-Attention), which performs self-attention within a pale-shaped region. Compared to global self-attention, PS-Attention significantly reduces computation and memory costs. Meanwhile, it captures richer contextual information at a computational complexity similar to that of previous local self-attention mechanisms. Based on PS-Attention, we develop a general Vision Transformer backbone with a hierarchical architecture, named Pale Transformer, which achieves 83.4%, 84.3%, and 84.9% Top-1 accuracy with model sizes of 22M, 48M, and 85M, respectively, for 224×224 ImageNet-1K classification, outperforming previous Vision Transformer backbones. For downstream tasks, the Pale Transformer backbone outperforms the recent state-of-the-art CSWin Transformer by a large margin on ADE20K semantic segmentation and on COCO object detection and instance segmentation. The code will be released at https://github.com/BR-IDL/PaddleViT.
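To see why a pale-shaped region is cheaper than global attention but wider than a local window, the NumPy sketch below builds the region for one pale as a boolean mask and counts its tokens. The interlacing scheme (rows and columns spaced H/s_r and W/s_c apart) and the function name `pale_mask` are assumptions for illustration; the paper's exact partitioning and channel handling may differ.

```python
import numpy as np


def pale_mask(h: int, w: int, s_r: int, s_c: int, k: int = 0) -> np.ndarray:
    """Boolean (h, w) mask of the tokens in pale k.

    Assumed layout: s_r interlaced rows spaced h // s_r apart and
    s_c interlaced columns spaced w // s_c apart, offset by k.
    """
    row_stride, col_stride = h // s_r, w // s_c
    rows = [(k % row_stride) + i * row_stride for i in range(s_r)]
    cols = [(k % col_stride) + j * col_stride for j in range(s_c)]
    mask = np.zeros((h, w), dtype=bool)
    mask[rows, :] = True
    mask[:, cols] = True
    return mask


h = w = 56                          # assumed feature-map size for illustration
mask = pale_mask(h, w, s_r=7, s_c=7)
pale_tokens = int(mask.sum())       # equals s_r*w + s_c*h - s_r*s_c (overlaps counted once)
print(pale_tokens, h * w, pale_tokens / (h * w))
```

Under these assumptions each token in a pale attends to under a quarter of the tokens that global attention would cover, while still spanning the full height and width of the map.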
27

Osolo, Raymond Ian, Zhan Yang, and Jun Long. "An Attentive Fourier-Augmented Image-Captioning Transformer." Applied Sciences 11, no. 18 (September 9, 2021): 8354. http://dx.doi.org/10.3390/app11188354.

Abstract:
Many vision–language models that output natural language, such as image-captioning models, use image features merely to ground the captions, and most of the model's good performance can be attributed to the language model, which does all the heavy lifting. This phenomenon has persisted even as transformer-based architectures have become the preferred base architecture of recent state-of-the-art vision–language models. In this paper, we make the images matter more by using fast Fourier transforms to further break down the input features and extract more of their intrinsic salient information, resulting in more detailed yet concise captions. This is achieved by performing a 1D Fourier transform on the image features, first in the hidden dimension and then in the sequence dimension. These extracted features, together with the region proposal image features, yield a richer image representation that can then be queried to produce the associated captions, which show a deeper understanding of image–object–location relationships than those of similar models. Extensive experiments on the MSCOCO benchmark dataset yield CIDEr-D, BLEU-1, and BLEU-4 scores of 130, 80.5, and 39, respectively.
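The two-step transform described above, a 1D FFT over the hidden dimension followed by one over the sequence dimension, is equivalent to a 2D FFT over the feature matrix. A minimal NumPy sketch, assuming features arrive as a (sequence length, hidden size) array and that only the real part is kept (an FNet-style choice made here, not confirmed by the abstract):

```python
import numpy as np


def fourier_features(features: np.ndarray) -> np.ndarray:
    """1-D FFT over the hidden dimension, then over the sequence dimension.

    features: real array of shape (seq_len, hidden). Keeping only the real
    part is an assumption made here so the output stays real-valued.
    """
    mixed = np.fft.fft(features, axis=-1)  # hidden dimension
    mixed = np.fft.fft(mixed, axis=-2)     # sequence dimension
    return mixed.real


rng = np.random.default_rng(0)
region_features = rng.normal(size=(50, 512))  # hypothetical: 50 regions, 512-d features
fourier = fourier_features(region_features)
print(fourier.shape)  # (50, 512)
```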
28

Paik, Incheon, and Jun-Wei Wang. "Improving Text-to-Code Generation with Features of Code Graph on GPT-2." Electronics 10, no. 21 (November 5, 2021): 2706. http://dx.doi.org/10.3390/electronics10212706.

Abstract:
Code generation, a very active application area of deep learning models for text, consists of two different fields: code-to-code and text-to-code. A recent approach, GraphCodeBERT, uses a code graph called data flow and showed good performance improvements. Its base model architecture is bidirectional encoder representations from transformers (BERT), which uses the encoder part of a transformer. On the other hand, the generative pre-trained transformer (GPT), another transformer-based architecture, uses the decoder part and shows great performance in the multilayer perceptron model. In this study, we investigate the improvement brought by several variants of code graphs on GPT-2, referring to the abstract semantic tree used to collect the features of variables in the code. Here, we mainly focus on GPT-2 with additional code-graph features that allow the model to learn the effect of the data stream. The experimental phase is divided into two parts: fine-tuning the existing GPT-2 model, and pre-training from scratch using code data. When we pre-train a new model from scratch, the model produces a better result compared with using the code graph with enough data.
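One concrete difference between the encoder (BERT) and decoder (GPT) sides mentioned above is the attention mask: BERT's encoder lets every token attend to every other token, while GPT's decoder applies a causal mask so each position sees only itself and earlier positions. A minimal NumPy sketch of the two weightings, taking raw attention scores as input (no projections or model weights):

```python
import numpy as np


def attention_weights(scores: np.ndarray, causal: bool) -> np.ndarray:
    """Row-wise softmax over key positions.

    If causal is True, position i may only attend to positions j <= i
    (decoder-style, as in GPT); otherwise attention is bidirectional
    (encoder-style, as in BERT).
    """
    n = scores.shape[-1]
    if causal:
        future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True strictly above the diagonal
        scores = np.where(future, -np.inf, scores)
    # The diagonal is never masked, so every row max is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


scores = np.zeros((4, 4))  # equal scores make the effect of the mask easy to read
print(attention_weights(scores, causal=False))
print(attention_weights(scores, causal=True))
```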
29

Özdemir, Özgür, Emre Salih Akın, Rıza Velioğlu, and Tuğba Dalyan. "A comparative study of neural machine translation models for Turkish language." Journal of Intelligent & Fuzzy Systems 42, no. 3 (February 2, 2022): 2103–13. http://dx.doi.org/10.3233/jifs-211453.

Abstract:
Machine translation (MT) is an important challenge in the field of computational linguistics. In this study, we conducted neural machine translation (NMT) experiments on two different architectures. First, the Sequence-to-Sequence (Seq2Seq) architecture, along with a variant that uses an attention mechanism, is applied to the translation task. Second, an architecture based entirely on the self-attention mechanism, namely the Transformer, is employed for a comprehensive comparison. In addition, the contributions of Byte Pair Encoding (BPE) and Gumbel Softmax distributions are examined for both architectures. The experiments are conducted on two different datasets: TED Talks, one of the popular benchmark datasets for NMT, especially for morphologically rich languages like Turkish, and the WMT18 News dataset provided by the Third Conference on Machine Translation (WMT) for shared tasks on various aspects of machine translation. The evaluation of the Turkish-to-English translation results demonstrates that the Transformer model combined with BPE and Gumbel Softmax achieved a BLEU score of 22.4 on TED Talks and 38.7 on the WMT18 News dataset. The empirical results indicate that using the Gumbel Softmax distribution improves translation quality for both architectures.
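Gumbel Softmax, mentioned above, perturbs logits with Gumbel(0, 1) noise and applies a temperature-scaled softmax, giving a differentiable relaxation of sampling from a categorical distribution. The NumPy sketch below shows only that sampling step; the abstract does not describe how the authors integrate it into their Seq2Seq and Transformer decoders, so nothing here reflects their configuration, and the token distribution is made up.

```python
import numpy as np


def gumbel_softmax(logits: np.ndarray, temperature: float = 1.0, rng=None) -> np.ndarray:
    """Sample relaxed one-hot vectors: softmax((logits + g) / temperature), g ~ Gumbel(0, 1)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float)
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)  # keep u away from 0 to avoid log(0)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / temperature
    y = y - y.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
probs = np.array([0.7, 0.2, 0.1])  # hypothetical token distribution
samples = gumbel_softmax(np.tile(np.log(probs), (20_000, 1)), temperature=0.5, rng=rng)
print(np.bincount(samples.argmax(axis=-1), minlength=3) / len(samples))
```

The argmax of each relaxed sample follows `probs` (the Gumbel-max trick), so the printed frequencies should be close to 0.7, 0.2, and 0.1; lowering `temperature` pushes each sample closer to a one-hot vector.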
30

Reva, I. V., O. V. Bialobrzheskyi, O. V. Todorov, and M. A. Bezzub. "Review of electric methods and systems for monitoring power transformers in the SMART GRID environment." Electrical Engineering and Power Engineering, no. 1 (March 30, 2022): 30–41. http://dx.doi.org/10.15588/1607-6761-2022-1-3.

Abstract:
Purpose. To analytically review the available methods of monitoring power transformers in order to classify and systematize the available information and to identify rational choices of electrical measuring equipment for transformer substations. Methodology. Analytical classification and systematization of existing monitoring methods, drawing on practical research and field results. Findings. Power transformers remain the heart of the power grid and of Smart Grid networks at every level of the structural and architectural hierarchy. As a rule, a transformer, as an element of the network, is commissioned once and kept under working load, with alternating monitoring and scheduled restoration work, until it completely loses the working condition required for operation. Therefore, most transformers remain in operation beyond the regulated period of more than 20 years. This creates a need for flexible analytical assessment and classification of existing power transformer monitoring methods, and for systematizing known information for a wider range of specialists in the energy sector. Originality. It is established that, because modern monitoring methods are complex, the time required to select and apply them for a given transformer structure decreases as the relevant methodological material becomes more systematized and classified. The presented systematization reduces the time and material resources needed to choose a power transformer monitoring method. Practical value. A systematic classification of available monitoring methods by method family and by monitoring mounting zone, to support the search for signals of transformer failures.
31

Sun, Tao, and Hai Bo Liu. "Design of Fault Diagnosis Expert System of Transformer." Applied Mechanics and Materials 291-294 (February 2013): 2557–61. http://dx.doi.org/10.4028/www.scientific.net/amm.291-294.2557.

Abstract:
The key issues in designing a transformer fault diagnosis expert system are knowledge representation and the reasoning mechanism. Based on the characteristics of transformer fault diagnosis as performed by human experts, and learning from how those experts diagnose transformer faults, this paper builds a transformer fault diagnosis expert system and analyzes and discusses its system architecture, knowledge representation, and reasoning mechanisms in detail.
32

Lu, Kevin, Aditya Grover, Pieter Abbeel, and Igor Mordatch. "Frozen Pretrained Transformers as Universal Computation Engines." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 7628–36. http://dx.doi.org/10.1609/aaai.v36i7.20729.

Abstract:
We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning, in particular, without finetuning of the self-attention and feedforward layers of the residual blocks. We consider such a model, which we call a Frozen Pretrained Transformer (FPT), and study finetuning it on a variety of sequence classification tasks spanning numerical computation, vision, and protein fold prediction. In contrast to prior works, which investigate finetuning on the same modality as the pretraining dataset, we show that pretraining on natural language can improve performance and compute efficiency on non-language downstream tasks. Additionally, we perform an analysis of the architecture, comparing the performance of a randomly initialized transformer to a randomly initialized LSTM. Combining the two insights, we find that language-pretrained transformers can obtain strong performance on a variety of non-language tasks.
33

Zhang, Zizhao, Han Zhang, Long Zhao, Ting Chen, Sercan Ö. Arik, and Tomas Pfister. "Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 3417–25. http://dx.doi.org/10.1609/aaai.v36i3.20252.

Abstract:
Hierarchical structures are popular in recent vision transformers; however, they require sophisticated designs and massive datasets to work well. In this paper, we explore the idea of nesting basic local transformers on non-overlapping image blocks and aggregating them in a hierarchical way. We find that the block aggregation function plays a critical role in enabling cross-block non-local information communication. This observation leads us to design a simplified architecture that requires only minor code changes to the original vision transformer. The benefits of the proposed judiciously selected design are threefold: (1) NesT converges faster and requires much less training data to achieve good generalization on both ImageNet and small datasets like CIFAR; (2) when extending our key ideas to image generation, NesT leads to a strong decoder that is 8 times faster than previous transformer-based generators; and (3) we show that decoupling the feature learning and abstraction processes via this nested hierarchy enables a novel method (named GradCAT) for visually interpreting the learned model. Source code is available at https://github.com/google-research/nested-transformer.
34

Yang, Xin, and Tao Su. "EFA-Trans: An Efficient and Flexible Acceleration Architecture for Transformers." Electronics 11, no. 21 (October 31, 2022): 3550. http://dx.doi.org/10.3390/electronics11213550.

Abstract:
Transformers are rapidly emerging as one of the most important primitives in neural networks. Unfortunately, most hardware designs for transformers are deficient, either barely considering the configurability of the design or failing to support the complete inference process of transformers. Specifically, few studies have paid attention to the compatibility of different computing paradigms. This paper therefore presents EFA-Trans, a highly efficient and flexible hardware accelerator architecture for transformers. To reach high performance, we propose a configurable matrix computing array and leverage on-chip memory optimizations. In addition, with the design of nonlinear modules and fine-grained scheduling, our architecture can perform complete transformer inference. EFA-Trans is also compatible with dense and sparse patterns, which further expands its application scenarios. Moreover, a performance analytic model is abstracted to guide the choice of architecture parameter sets. Finally, our designs are developed in RTL and evaluated on a Xilinx ZCU102. Experimental results demonstrate that EFA-Trans provides 23.74× and 7.58× improvements in energy efficiency compared with a CPU and a GPU, respectively. Its DSP efficiency is also 3.59× to 21.07× higher than that of other designs, outperforming existing advanced works.
35

Bacco, Luca, Andrea Cimino, Felice Dell’Orletta, and Mario Merone. "Explainable Sentiment Analysis: A Hierarchical Transformer-Based Extractive Summarization Approach." Electronics 10, no. 18 (September 8, 2021): 2195. http://dx.doi.org/10.3390/electronics10182195.

Abstract:
In recent years, the explainable artificial intelligence (XAI) paradigm has been gaining wide research interest. The natural language processing (NLP) community is also approaching this paradigm shift: building a suite of models that explain their decisions on some main task without affecting performance. This is not an easy task, especially when poorly interpretable models are involved, such as the transformers that have become almost ubiquitous (at least in the NLP literature of recent years). Here, we propose two different transformer-based methodologies that exploit the inner hierarchy of documents to perform a sentiment analysis task while extracting the most important sentences (with regard to the model decision) to build a summary that serves as the explanation of the output. For the first architecture, we placed two transformers in cascade and leveraged the attention weights of the second one to build the summary. For the other architecture, we employed a single transformer to classify the individual sentences in the document and then combined the probability scores of each to perform the classification and build the summary. We compared the two methodologies on the IMDB dataset, in terms of both classification and explainability performance. To assess explainability, we propose two kinds of metrics based on benchmarking the models' summaries against human annotations. We recruited four independent operators to annotate a few documents retrieved from the original dataset. Furthermore, we conducted an ablation study to highlight how implementing some strategies leads to important improvements in the explainability performance of the cascade transformers model.
36

Dai, Yaonan, Jiuyang Yu, Dean Zhang, Tianhao Hu, and Xiaotao Zheng. "RODFormer: High-Precision Design for Rotating Object Detection with Transformers." Sensors 22, no. 7 (March 29, 2022): 2633. http://dx.doi.org/10.3390/s22072633.

Abstract:
To address Transformers' lack of a local spatial receptive field and the discontinuous boundary loss in rotating object detection, this paper proposes a Transformer-based high-precision rotating object detection model (RODFormer). First, RODFormer uses a structured transformer architecture to collect feature information at different resolutions, widening the range of collected feature information. Second, a new feed-forward network (spatial-FFN) is constructed. Spatial-FFN fuses the local spatial features of 3 × 3 depthwise separable convolutions with the global channel features of a multilayer perceptron (MLP) to remedy the deficiencies of FFN in local spatial modeling. Finally, based on the spatial-FFN architecture, a detection head is built using the CIoU-smooth L1 loss function, which returns to the horizontal bounding box only when the rotated bounding box is close to horizontal, so as to alleviate the loss discontinuity of rotated boxes. Ablation experiments of RODFormer on the DOTA dataset show that the Transformer-structured module, the spatial-FFN module, and the CIoU-smooth L1 loss function module are all effective in improving the detection accuracy of RODFormer. Compared with 12 rotating object detection models on the DOTA dataset, RODFormer has the highest average detection accuracy (up to 75.60%), making it more competitive in rotating object detection accuracy.
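The spatial-FFN idea, mixing channels with an MLP while adding local 3 × 3 context through a depthwise convolution, can be sketched in NumPy as below. The ordering (pointwise expansion, depthwise 3 × 3 convolution, GELU, pointwise projection) is an assumption borrowed from common convolutional FFN designs, and all weights are random placeholders; the abstract does not specify RODFormer's exact layout.

```python
import numpy as np


def depthwise_conv3x3(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Per-channel 3x3 convolution (cross-correlation, as in DL frameworks).

    x: (H, W, C); kernels: (3, 3, C); zero padding keeps the spatial size.
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros(x.shape, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += padded[di:di + h, dj:dj + w, :] * kernels[di, dj, :]
    return out


def gelu(x: np.ndarray) -> np.ndarray:
    """tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def spatial_ffn(x, w_in, dw_kernels, w_out):
    """(H, W, C) -> (H, W, C): channel MLP with per-channel 3x3 spatial mixing in between."""
    hidden = x @ w_in                                      # pointwise expansion C -> D
    hidden = gelu(depthwise_conv3x3(hidden, dw_kernels))   # local 3x3 context per channel
    return hidden @ w_out                                  # pointwise projection D -> C


rng = np.random.default_rng(0)
c, d = 8, 32                                               # hypothetical channel sizes
x = rng.normal(size=(14, 14, c))
y = spatial_ffn(x, rng.normal(size=(c, d)) * 0.1,
                rng.normal(size=(3, 3, d)) * 0.1,
                rng.normal(size=(d, c)) * 0.1)
print(y.shape)  # (14, 14, 8)
```

The depthwise step followed by the pointwise projection is what makes the pair "depthwise separable"; the MLP layers on either side carry the channel mixing.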
37

Zhu, Xiaoning, Yannan Jia, Sun Jian, Lize Gu, and Zhang Pu. "ViTT: Vision Transformer Tracker." Sensors 21, no. 16 (August 20, 2021): 5608. http://dx.doi.org/10.3390/s21165608.

Abstract:
This paper presents a new model for multi-object tracking (MOT) with a transformer. MOT is a spatiotemporal correlation task among objects of interest and one of the crucial technologies for multiple unmanned aerial vehicles (Multi-UAV). The transformer is a self-attentional encoder-decoder architecture that has been successfully used in natural language processing and is emerging in computer vision. This study proposes the Vision Transformer Tracker (ViTT), which uses a transformer encoder as the backbone and takes images directly as input. Compared with convolutional networks, it can model global context at every encoder layer from the beginning, which addresses the challenges of occlusion and complex scenarios. The model simultaneously outputs object locations and the corresponding appearance embeddings in a shared network through multi-task learning. Our work demonstrates the superiority and effectiveness of transformer-based networks in complex computer vision tasks and paves the way for applying the pure transformer in MOT. We evaluated the proposed model on the MOT16 dataset, achieving 65.7% MOTA, a competitive result compared with other typical multi-object trackers.
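The 65.7% figure above is MOTA (multiple object tracking accuracy), which in the CLEAR MOT metrics combines three error types relative to the number of ground-truth objects: MOTA = 1 − Σ(FN + FP + IDSW) / Σ GT, summed over all frames. A small Python sketch with made-up per-frame counts (not the paper's):

```python
from typing import Iterable, Tuple


def mota(frames: Iterable[Tuple[int, int, int, int]]) -> float:
    """MOTA = 1 - sum(FN + FP + IDSW) / sum(GT) over all frames.

    Each frame is (false_negatives, false_positives, id_switches, ground_truth_objects).
    Note that MOTA can be negative when errors outnumber ground-truth objects.
    """
    errors = 0
    gt_total = 0
    for fn, fp, idsw, gt in frames:
        errors += fn + fp + idsw
        gt_total += gt
    if gt_total == 0:
        raise ValueError("no ground-truth objects")
    return 1.0 - errors / gt_total


# Hypothetical per-frame counts for a three-frame clip.
clip = [(1, 0, 0, 10), (2, 1, 1, 10), (0, 1, 0, 10)]
print(f"MOTA = {mota(clip):.1%}")
```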
38

Bedair, Sarah S., Jeffrey S. Pulskamp, Ryan Rudy, Ronald Polcawich, Ryan Cable, and Lee Griffin. "Boosting MEMS Piezoelectric Transformer Figures of Merit via Architecture Optimization." IEEE Electron Device Letters 39, no. 3 (March 2018): 428–31. http://dx.doi.org/10.1109/led.2018.2799864.

39

Ramos-Pérez, Eduardo, Pablo J. Alonso-González, and José Javier Núñez-Velázquez. "Multi-Transformer: A New Neural Network-Based Architecture for Forecasting S&P Volatility." Mathematics 9, no. 15 (July 28, 2021): 1794. http://dx.doi.org/10.3390/math9151794.

Abstract:
Events such as the Financial Crisis of 2007–2008 or the COVID-19 pandemic caused significant losses to banks and insurance entities. They also demonstrated the importance of using accurate equity risk models and having a risk management function able to implement effective hedging strategies. Stock volatility forecasts play a key role in the estimation of equity risk and, thus, in the management actions carried out by financial institutions. This paper therefore aims to propose more accurate stock volatility models based on novel machine and deep learning techniques. It introduces a neural network-based architecture called Multi-Transformer, a variant of the Transformer models that have already been successfully applied in the field of natural language processing. The paper also adapts traditional Transformer layers for use in volatility forecasting models. The empirical results suggest that hybrid models based on Multi-Transformer and Transformer layers are more accurate and therefore lead to more appropriate risk measures than other autoregressive algorithms or hybrid models based on feed-forward layers or long short-term memory cells.
40

Meng, Fandong, and Jinchao Zhang. "DTMT: A Novel Deep Transition Architecture for Neural Machine Translation." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 224–31. http://dx.doi.org/10.1609/aaai.v33i01.3301224.

Abstract:
Past years have witnessed rapid developments in Neural Machine Translation (NMT). Most recently, with advanced modeling and training techniques, RNN-based NMT (RNMT) has shown its potential strength, even compared with the well-known Transformer (self-attentional) model. Although an RNMT model can have a very deep architecture by stacking layers, the transition depth between consecutive hidden states along the sequential axis is still shallow. In this paper, we further enhance RNN-based NMT by increasing the transition depth between consecutive hidden states and build a novel Deep Transition RNN-based Architecture for Neural Machine Translation, named DTMT. This model enhances the hidden-to-hidden transition with multiple non-linear transformations, while maintaining a linear transformation path throughout this deep transition through a well-designed linear transformation mechanism to alleviate the gradient vanishing problem. Experiments show that with the specially designed deep transition modules, DTMT achieves remarkable improvements in translation quality. Experimental results on the Chinese⇒English translation task show that DTMT outperforms the Transformer model by +2.09 BLEU points and achieves the best results ever reported on the same dataset. On the WMT14 English⇒German and English⇒French translation tasks, DTMT shows superior quality to state-of-the-art NMT systems, including the Transformer and RNMT+.
41

Chaudhry, Parinnay. "Bidirectional Encoder Representations from Transformers for Modelling Stock Prices." International Journal for Research in Applied Science and Engineering Technology 10, no. 2 (February 28, 2022): 896–901. http://dx.doi.org/10.22214/ijraset.2022.40406.

Abstract:
Bidirectional Encoder Representations from Transformers (BERT) is a transformer neural network architecture designed for natural language processing (NLP). The model's architecture allows for an efficient, contextual understanding of words in sentences. Empirical evidence has shown that BERT achieves a high degree of accuracy in NLP tasks such as sentiment analysis and next sentence classification. This study uses BERT's sentiment analysis capability to propose and test a framework that models a quantitative relation between the news and reports about a company and the movement of its stock price. The study also aims to explore the nature of human psychology in modelling risk and opportunity and to gain insight into the subjectivity of the human mind. Keywords: natural language processing, BERT, sentiment analysis, stock price modelling, transformers, neural networks, self-attention
42

Sykiotis, Stavros, Maria Kaselimi, Anastasios Doulamis, and Nikolaos Doulamis. "ELECTRIcity: An Efficient Transformer for Non-Intrusive Load Monitoring." Sensors 22, no. 8 (April 11, 2022): 2926. http://dx.doi.org/10.3390/s22082926.

Abstract:
Non-Intrusive Load Monitoring (NILM) is the process of inferring the consumption patterns of appliances with access only to the aggregated household signal. Sequence-to-sequence deep learning models have been firmly established as state-of-the-art approaches for NILM, aiming to identify the pattern of each appliance's power consumption signal within the aggregated power signal. To overcome the limitations of recurrent models, which have been widely used in sequential modeling, this paper proposes a transformer-based architecture for NILM. Our approach, called ELECTRIcity, uses transformer layers to accurately estimate the power signal of domestic appliances, relying entirely on attention mechanisms to extract global dependencies between the aggregate and the domestic appliance signals. Another advantage of the proposed model is that ELECTRIcity works with minimal dataset pre-processing and does not require data balancing. Furthermore, ELECTRIcity introduces a training routine that is more efficient than those of other traditional transformer-based architectures. In this routine, ELECTRIcity splits model training into unsupervised pre-training and downstream task fine-tuning, which both improves predictive accuracy and reduces training time. Experimental results indicate ELECTRIcity's superiority over several state-of-the-art methods.
APA, Harvard, Vancouver, ISO, and other styles
43

Xu, Yuxin, Yuyao Yan, Yiming Lin, Xi Yang, and Kaizhu Huang. "Sketch Based Image Retrieval for Architecture Images with Siamese Swin Transformer." Journal of Physics: Conference Series 2278, no. 1 (May 1, 2022): 012035. http://dx.doi.org/10.1088/1742-6596/2278/1/012035.

Abstract:
Sketch-based image retrieval (SBIR) is an image retrieval task that takes a sketch as input and outputs colour images matching the sketch. Most recent SBIR methods use deep learning with complicated network designs, which are resource-intensive in practical use. This paper proposes a novel compact framework for SBIR on architecture images: a siamese network that also takes the image's view-angle information as input. In particular, the proposed siamese network uses a compact SwinTiny transformer as the backbone encoder. The view angle of each architecture image is fed to the model to further improve search accuracy. To cope with the shortage of sketches, training uses simulated building sketches generated by a pre-trained edge extractor. Experiments show that our model achieves a top-1 accuracy of 0.859 on an architecture retrieval task, exceeding many baseline models.
44

Wang, Bingting, Ziping Cao, Zhen Luan, and Bo Zhou. "Design and Evaluation of Band-Pass Matching Coupler for Narrow-Band DC Power Line Communications." Journal of Circuits, Systems and Computers 28, no. 07 (June 27, 2019): 1950119. http://dx.doi.org/10.1142/s0218126619501196.

Abstract:
In power line communication (PLC), coupling transformers are usually required for coupling, band-pass filtering, and impedance matching. However, coupling transformer design involves so many parameters that it is typically an imprecise and experimental procedure. In addition, the cost and size of transformers prevent them from being an economical and compact solution for PLC couplers. This paper first analyzes a simplified distributed-parameter model of the power line, which can be used to calculate power line impedance easily and accurately. Next, a low-cost band-pass matching coupler with a compact architecture is designed to replace the coupling transformer for direct current PLC (DC-PLC); it ensures impedance matching based on an accurate power line impedance rather than an average value. Finally, simulations and laboratory tests conducted at 95–125 kHz (CENELEC B-band) confirm the new coupler's excellent band-pass filtering and impedance matching performance.
45

Yang, Jingxiang (杨靖翔). "Research on Chinese Text Error Correction Based on Transformer Enhanced Architecture." Computer Science and Application 12, no. 03 (2022): 565–71. http://dx.doi.org/10.12677/csa.2022.123057.

46

Raisi, Zobeir, Mohamed A. Naiel, Paul Fieguth, Steven Wardell, and John Zelek. "2D Positional Embedding-based Transformer for Scene Text Recognition." Journal of Computational Vision and Imaging Systems 6, no. 1 (January 15, 2021): 1–4. http://dx.doi.org/10.15353/jcvis.v6i1.3533.

Abstract:
Recent state-of-the-art scene text recognition methods are primarily based on Recurrent Neural Networks (RNNs); however, these methods require one-dimensional (1D) features and are not designed for recognizing irregular-text instances, because the spatial information present in the original two-dimensional (2D) images is lost. In this paper, we leverage a Transformer-based architecture for recognizing both regular and irregular text in images captured in the wild. The proposed method uses a 2D positional encoder with the Transformer architecture to preserve the spatial information of 2D image features better than previous methods. Experiments on popular benchmarks, including the challenging COCO-Text dataset, demonstrate that the proposed scene text recognition method outperforms the state of the art in most cases, especially on irregular-text recognition.
47

Wang, Zeji, Xiaowei He, Yi Li, and Qinliang Chuai. "EmbedFormer: Embedded Depth-Wise Convolution Layer for Token Mixing." Sensors 22, no. 24 (December 15, 2022): 9854. http://dx.doi.org/10.3390/s22249854.

Abstract:
Vision Transformers (ViTs) have shown impressive performance due to their strong ability to capture spatial and channel information. MetaFormer provides a general transformer architecture consisting of a token mixer and a channel mixer, which offers a general way to understand how transformers work. It has been shown that the general architecture of ViTs matters more to model performance than the self-attention mechanism itself. Accordingly, depth-wise convolution (DwConv) layers are widely accepted as a replacement for local self-attention in transformers. In this work, a pure convolutional "transformer" is designed. We revisit the difference between self-attention and DwConv and find that a self-attention layer, together with its embedding layer, unavoidably affects channel information, whereas DwConv only mixes token information within each channel. To address this difference, we place an embedding layer before DwConv to form the token mixer of a MetaFormer block, yielding a model named EmbedFormer. In addition, an SEBlock is applied in the channel mixer to improve performance. On the ImageNet-1K classification task, EmbedFormer achieves a top-1 accuracy of 81.7% without additional training images, surpassing the Swin transformer by +0.4% at similar complexity. EmbedFormer is also evaluated on downstream tasks, where its results consistently exceed those of PoolFormer, ResNet, and DeiT. Compared with PoolFormer-S24, another instance of MetaFormer, EmbedFormer improves the score by +3.0% box AP and +2.3% mask AP on the COCO dataset, and by +1.3% mIoU on ADE20K.
48

Pan, Hang, Lun Xie, and Zhiliang Wang. "Plant and Animal Species Recognition Based on Dynamic Vision Transformer Architecture." Remote Sensing 14, no. 20 (October 20, 2022): 5242. http://dx.doi.org/10.3390/rs14205242.

Abstract:
Automatic prediction of the plant and animal species most likely to be observed at a given geo-location is useful in many scenarios related to biodiversity management and conservation. However, the sparseness of aerial images results in only small discrepancies between the image appearances of different species categories. In this paper, we propose a novel Dynamic Vision Transformer (DViT) architecture that reduces the effect of these small image discrepancies on plant and animal species recognition from aerial images and geo-location environment information. We extract the latent representation by sampling a subset of patches with low attention weights in the transformer encoder, using a learnable mask token for multimodal aerial images. At the same time, the geo-location environment information is incorporated into the extraction of the latent representation from aerial images and fused with the high-attention-weight tokens by a dynamic attention fusion model, improving the distinguishability of the representation. The proposed DViT method is evaluated on the GeoLifeCLEF 2021 and 2022 datasets, achieving state-of-the-art performance. The experimental results show that fusing aerial images with multimodal geo-location environment information benefits plant and animal species recognition.
49

Prakash, PKS, Srinivas Chilukuri, Nikhil Ranade, and Shankar Viswanathan. "RareBERT: Transformer Architecture for Rare Disease Patient Identification using Administrative Claims." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 1 (May 18, 2021): 453–60. http://dx.doi.org/10.1609/aaai.v35i1.16122.

Abstract:
A rare disease is any disease that affects a very small percentage of the population (1 in 1,500). It is estimated that there are nearly 7,000 rare diseases affecting 30 million patients in the U.S. alone. Most patients suffering from rare diseases experience multiple misdiagnoses and may never be diagnosed correctly. This is largely driven by the low prevalence of these diseases, which results in a lack of awareness among healthcare providers. Machine learning researchers have made efforts to develop predictive models that help diagnose patients using healthcare datasets such as electronic health records and administrative claims. Most recently, transformer models such as BEHRT, G-BERT, and Med-BERT have been applied to predict diseases. However, these were developed specifically for electronic health records (EHR) and were not designed to address rare-disease challenges such as class imbalance, partial longitudinal data capture, and noisy labels. As a result, they perform poorly in predicting rare diseases compared with baselines. Moreover, EHR datasets are generally confined to the hospital systems that use them and do not capture a wider sample of patients, which limits the number of rare-disease patients available in the dataset. To address these challenges, we introduce RareBERT, an extension of the BERT model tailored for rare disease diagnosis and trained on administrative claims datasets. RareBERT extends Med-BERT by adding a context embedding and a temporal reference embedding. We also introduce a novel adaptive loss function to handle class imbalance. In this paper, we present experiments on diagnosing X-Linked Hypophosphatemia (XLH), a rare genetic disease.
RareBERT performs significantly better than the baseline models (79.9% AUPRC versus 30% AUPRC for Med-BERT). Owing to the transformer architecture, it is also robust to partial longitudinal data capture caused by poor capture of claims, with a drop in performance of only 1.35% AUPRC, compared with 12% for Med-BERT, 33.0% for LSTM, and 67.4% for a boosting-trees baseline.
50

Chernyshov, Artem, Valentin Klimov, Anita Balandina, and Boris Shchukin. "The Application of Transformer Model Architecture for the Dependency Parsing Task." Procedia Computer Science 190 (2021): 142–45. http://dx.doi.org/10.1016/j.procs.2021.06.018.
