Journal articles on the topic 'Neural Sequence Models'

Consult the top 50 journal articles for your research on the topic 'Neural Sequence Models.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Shi, Tian, Yaser Keneshloo, Naren Ramakrishnan, and Chandan K. Reddy. "Neural Abstractive Text Summarization with Sequence-to-Sequence Models." ACM/IMS Transactions on Data Science 2, no. 1 (January 3, 2021): 1–37. http://dx.doi.org/10.1145/3419106.

2

Liu, Bowen, Bharath Ramsundar, Prasad Kawthekar, Jade Shi, Joseph Gomes, Quang Luu Nguyen, Stephen Ho, Jack Sloane, Paul Wender, and Vijay Pande. "Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models." ACS Central Science 3, no. 10 (September 5, 2017): 1103–13. http://dx.doi.org/10.1021/acscentsci.7b00303.

3

Phua, Yeong Tsann, Sujata Navaratnam, Chon-Moy Kang, and Wai-Seong Che. "Sequence-to-sequence neural machine translation for English-Malay." IAES International Journal of Artificial Intelligence (IJ-AI) 11, no. 2 (June 1, 2022): 658. http://dx.doi.org/10.11591/ijai.v11.i2.pp658-665.

Abstract:
Machine translation aims to translate text from one language into another using computer software. In this work, we performed neural machine translation with an attention implementation on an English-Malay parallel corpus. We attempt to improve model performance with rectified linear unit (ReLU) attention alignment. Different sequence-to-sequence models were trained, including long short-term memory (LSTM), gated recurrent unit (GRU), bidirectional LSTM (Bi-LSTM) and bidirectional GRU (Bi-GRU). In the experiment, both bidirectional models, Bi-LSTM and Bi-GRU, converged within 30 epochs. Our study shows that ReLU attention alignment improves the bilingual evaluation understudy (BLEU) translation score by between 0.26 and 1.12 across all the models compared to the original tanh models.
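
The ReLU attention alignment idea above amounts to swapping the nonlinearity inside the additive alignment score. The NumPy snippet below is only a rough sketch of that swap (the Bahdanau-style score, weight shapes and names are assumptions of this illustration, not the authors' code):

```python
import numpy as np

def attention_scores(decoder_state, encoder_states, W, v, align="relu"):
    """Additive alignment: score(s, h_i) = v . f(W [s; h_i]).

    `align` selects the nonlinearity f: the conventional tanh or the ReLU
    variant studied in the paper.
    """
    f = np.tanh if align == "tanh" else lambda x: np.maximum(x, 0.0)
    scores = []
    for h in encoder_states:                       # one score per source position
        energy = f(W @ np.concatenate([decoder_state, h]))
        scores.append(v @ energy)
    scores = np.array(scores)
    weights = np.exp(scores - scores.max())        # softmax over source positions
    weights /= weights.sum()
    context = (weights[:, None] * encoder_states).sum(axis=0)
    return weights, context

# Toy usage: 4 source positions, hidden size 8.
rng = np.random.default_rng(0)
enc = rng.normal(size=(4, 8))
dec = rng.normal(size=8)
W = rng.normal(size=(8, 16)) * 0.1
v = rng.normal(size=8) * 0.1
w_relu, _ = attention_scores(dec, enc, W, v, align="relu")
w_tanh, _ = attention_scores(dec, enc, W, v, align="tanh")
print(w_relu.round(3), w_tanh.round(3))
```
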
4

Demeester, Thomas. "System Identification with Time-Aware Neural Sequence Models." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 3757–64. http://dx.doi.org/10.1609/aaai.v34i04.5786.

Abstract:
Established recurrent neural networks are well-suited to solve a wide variety of prediction tasks involving discrete sequences. However, they do not perform as well in the task of dynamical system identification, when dealing with observations from continuous variables that are unevenly sampled in time, for example due to missing observations. We show how such neural sequence models can be adapted to deal with variable step sizes in a natural way. In particular, we introduce a ‘time-aware’ and stationary extension of existing models (including the Gated Recurrent Unit) that allows them to deal with unevenly sampled system observations by adapting to the observation times, while facilitating higher-order temporal behavior. We discuss the properties and demonstrate the validity of the proposed approach, based on samples from two industrial input/output processes.
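
One common way to make a recurrent cell "time-aware" for unevenly sampled observations is to decay the hidden state as a function of the elapsed time before the usual gated update. The sketch below is an illustrative variant of that idea with learnable per-unit time constants; it is not the stationary extension proposed in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TimeAwareGRUCell:
    """GRU cell whose state is decayed by the elapsed time before each update.

    Follows the common 'decay the memory by exp(-dt / tau)' recipe for
    irregularly sampled inputs; an illustrative variant only.
    """
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        self.Wz = rng.uniform(-s, s, (hidden_size, input_size + hidden_size))
        self.Wr = rng.uniform(-s, s, (hidden_size, input_size + hidden_size))
        self.Wh = rng.uniform(-s, s, (hidden_size, input_size + hidden_size))
        self.log_tau = np.zeros(hidden_size)         # learnable per-unit time constants

    def step(self, x, h, dt):
        h = h * np.exp(-dt / np.exp(self.log_tau))   # time-aware decay of the state
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                    # update gate
        r = sigmoid(self.Wr @ xh)                    # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde

# Unevenly sampled toy sequence: (observation, time since previous observation).
cell = TimeAwareGRUCell(input_size=1, hidden_size=4)
h = np.zeros(4)
for x, dt in [(0.3, 0.1), (0.1, 2.5), (0.7, 0.4)]:
    h = cell.step(np.array([x]), h, dt)
print(h.round(3))
```
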
5

Halim, Calvin Janitra, and Kazuhiko Kawamoto. "2D Convolutional Neural Markov Models for Spatiotemporal Sequence Forecasting." Sensors 20, no. 15 (July 28, 2020): 4195. http://dx.doi.org/10.3390/s20154195.

Abstract:
Recent approaches to time series forecasting, especially forecasting spatiotemporal sequences, have leveraged the approximation power of deep neural networks to model the complexity of such sequences, specifically approaches that are based on recurrent neural networks. Still, as spatiotemporal sequences that arise in the real world are noisy and chaotic, modeling approaches that utilize probabilistic temporal models, such as deep Markov models (DMMs), are favorable because of their ability to model uncertainty, increasing their robustness to noise. However, approaches based on DMMs do not maintain the spatial characteristics of spatiotemporal sequences, with most of the approaches converting the observed input into 1D data halfway through the model. To solve this, we propose a model that retains the spatial aspect of the target sequence with a DMM that consists of 2D convolutional neural networks. We then show the robustness of our method to data with large variance compared with naive forecast, vanilla DMM, and convolutional long short-term memory (LSTM) using synthetic data, even outperforming the DNN models over a longer forecast period. We also point out the limitations of our model when forecasting real-world precipitation data and the possible future work that can be done to address these limitations, along with additional future research potential.
6

Kalm, Kristjan, and Dennis Norris. "Sequence learning recodes cortical representations instead of strengthening initial ones." PLOS Computational Biology 17, no. 5 (May 24, 2021): e1008969. http://dx.doi.org/10.1371/journal.pcbi.1008969.

Abstract:
We contrast two computational models of sequence learning. The associative learner posits that learning proceeds by strengthening existing association weights. Alternatively, recoding posits that learning creates new and more efficient representations of the learned sequences. Importantly, both models propose that humans act as optimal learners but capture different statistics of the stimuli in their internal model. Furthermore, these models make dissociable predictions as to how learning changes the neural representation of sequences. We tested these predictions by using fMRI to extract neural activity patterns from the dorsal visual processing stream during a sequence recall task. We observed that only the recoding account can explain the similarity of neural activity patterns, suggesting that participants recode the learned sequences using chunks. We show that associative learning can theoretically store only a very limited number of overlapping sequences, such as those common in ecological working memory tasks, and hence an efficient learner should recode initial sequence representations.
7

Tan, Zhixing, Jinsong Su, Boli Wang, Yidong Chen, and Xiaodong Shi. "Lattice-to-sequence attentional Neural Machine Translation models." Neurocomputing 284 (April 2018): 138–47. http://dx.doi.org/10.1016/j.neucom.2018.01.010.

8

Nam, Hyoungwook, Segwang Kim, and Kyomin Jung. "Number Sequence Prediction Problems for Evaluating Computational Powers of Neural Networks." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 4626–33. http://dx.doi.org/10.1609/aaai.v33i01.33014626.

Abstract:
Inspired by number series tests to measure human intelligence, we suggest number sequence prediction tasks to assess neural network models’ computational powers for solving algorithmic problems. We define the complexity and difficulty of a number sequence prediction task with the structure of the smallest automaton that can generate the sequence. We suggest two types of number sequence prediction problems: the number-level and the digit-level problems. The number-level problems format sequences as 2-dimensional grids of digits and the digit-level problems provide a single digit input per time step. The complexity of a number-level sequence prediction can be defined with the depth of an equivalent combinatorial logic, and the complexity of a digit-level sequence prediction can be defined with an equivalent state automaton for the generation rule. Experiments with number-level sequences suggest that CNN models are capable of learning the compound operations of sequence generation rules, but the depths of the compound operations are limited. For the digit-level problems, simple GRU and LSTM models can solve some problems with the complexity of finite state automata. Memory augmented models such as Stack-RNN, Attention, and Neural Turing Machines can solve the reverse-order task, which has the complexity of a simple pushdown automaton. However, none of the above can solve general Fibonacci, arithmetic, or geometric sequence generation problems that represent the complexity of queue automata or Turing machines. The results show that our number sequence prediction problems effectively evaluate machine learning models’ computational capabilities.
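
To make the digit-level setting concrete, a prediction example can be built by streaming the decimal expansion of a sequence one character at a time and asking the model for the next character. The encoding below (comma separators, next-character targets) is an assumption of this sketch, not necessarily the paper's exact format.

```python
def digit_level_arithmetic_task(start, step, n_terms, sep=","):
    """Build a digit-level prediction example for an arithmetic progression.

    The model would see one character per time step (digits and separators)
    and predict the next character, mirroring the digit-level setting
    described in the abstract.
    """
    terms = [start + i * step for i in range(n_terms)]
    stream = sep.join(str(t) for t in terms)
    inputs = list(stream[:-1])      # characters fed one per time step
    targets = list(stream[1:])      # next-character prediction targets
    return inputs, targets

xs, ys = digit_level_arithmetic_task(start=7, step=12, n_terms=5)
print("".join(xs))   # 7,19,31,43,5
print("".join(ys))   # ,19,31,43,55
```
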
9

Yousuf, Hana, Michael Lahzi, Said A. Salloum, and Khaled Shaalan. "A systematic review on sequence-to-sequence learning with neural network and its models." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 3 (June 1, 2021): 2315. http://dx.doi.org/10.11591/ijece.v11i3.pp2315-2326.

Abstract:
We present a systematic survey of sequence-to-sequence learning with neural networks and its models. The primary aim of this report is to deepen understanding of sequence-to-sequence neural networks and to identify the best ways of implementing them. Three models are most commonly used in sequence-to-sequence neural network applications, namely recurrent neural networks (RNN), connectionist temporal classification (CTC), and the attention model. The survey methodology consisted of deriving keywords from the research questions and using them to search for peer-reviewed papers, articles, and books in academic databases. Initial searches returned 790 papers and scholarly works; after applying the selection criteria and the PRISMA methodology, the number of papers reviewed was reduced to 16. Each of the 16 articles was categorized by its contribution to each research question and analyzed. Finally, the papers underwent a quality appraisal, with resulting scores ranging from 83.3% to 100%. The proposed systematic review enabled us to collect, evaluate, analyze, and explore different approaches to implementing sequence-to-sequence neural network models and pointed out the most common uses in machine learning. We followed a methodology that shows the potential of applying these models to real-world applications.
10

Buckman, Jacob, and Graham Neubig. "Neural Lattice Language Models." Transactions of the Association for Computational Linguistics 6 (December 2018): 529–41. http://dx.doi.org/10.1162/tacl_a_00036.

Abstract:
In this work, we propose a new language modeling paradigm that has the ability to perform both prediction and moderation of information flow at multiple granularities: neural lattice language models. These models construct a lattice of possible paths through a sentence and marginalize across this lattice to calculate sequence probabilities or optimize parameters. This approach allows us to seamlessly incorporate linguistic intuitions — including polysemy and the existence of multiword lexical items — into our language model. Experiments on multiple language modeling tasks show that English neural lattice language models that utilize polysemous embeddings are able to improve perplexity by 9.95% relative to a word-level baseline, and that a Chinese model that handles multi-character tokens is able to improve perplexity by 20.94% relative to a character-level baseline.
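
The central computation the abstract describes is marginalizing sequence probability over all paths through a lattice, which can be done with a forward pass over the DAG. The sketch below uses a fixed per-token probability table in place of the neural language model, so it only illustrates the marginalization, not the paper's model.

```python
from collections import defaultdict

def marginalize_lattice(edges, n_nodes, edge_prob):
    """Sum path probabilities over a lattice (DAG) from node 0 to node n_nodes-1.

    `edges` maps a start node to a list of (end_node, token) arcs; `edge_prob`
    stands in for the neural language model and simply returns a probability
    for a token.
    """
    alpha = defaultdict(float)
    alpha[0] = 1.0                          # forward probability of reaching each node
    for node in range(n_nodes):
        for end, token in edges.get(node, []):
            alpha[end] += alpha[node] * edge_prob(token)
    return alpha[n_nodes - 1]

# Tiny lattice: "the cat" vs. the multiword unit "the_cat", then "sat".
edges = {0: [(1, "the"), (2, "the_cat")], 1: [(2, "cat")], 2: [(3, "sat")]}
p = marginalize_lattice(edges, 4, edge_prob=lambda tok: {"the": 0.5, "cat": 0.4,
                                                         "the_cat": 0.3, "sat": 0.6}[tok])
print(p)   # 0.5*0.4*0.6 + 0.3*0.6 = 0.3
```
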
11

Eriguchi, Akiko, Kazuma Hashimoto, and Yoshimasa Tsuruoka. "Incorporating Source-Side Phrase Structures into Neural Machine Translation." Computational Linguistics 45, no. 2 (June 2019): 267–92. http://dx.doi.org/10.1162/coli_a_00348.

Abstract:
Neural machine translation (NMT) has shown great success as a new alternative to the traditional Statistical Machine Translation model in multiple languages. Early NMT models are based on sequence-to-sequence learning that encodes a sequence of source words into a vector space and generates another sequence of target words from the vector. In those NMT models, sentences are simply treated as sequences of words without any internal structure. In this article, we focus on the role of the syntactic structure of source sentences and propose a novel end-to-end syntactic NMT model, which we call a tree-to-sequence NMT model, extending a sequence-to-sequence model with the source-side phrase structure. Our proposed model has an attention mechanism that enables the decoder to generate a translated word while softly aligning it with phrases as well as words of the source sentence. We have empirically compared the proposed model with sequence-to-sequence models in various settings on Chinese-to-Japanese and English-to-Japanese translation tasks. Our experimental results suggest that the use of syntactic structure can be beneficial when the training data set is small, but is not as effective as using a bi-directional encoder. As the size of the training data set increases, the benefits of using a syntactic tree tend to diminish.
12

Hahn, Michael. "Theoretical Limitations of Self-Attention in Neural Sequence Models." Transactions of the Association for Computational Linguistics 8 (July 2020): 156–71. http://dx.doi.org/10.1162/tacl_a_00306.

Abstract:
Transformers are emerging as the new workhorse of NLP, showing great success across tasks. Unlike LSTMs, transformers process input sequences entirely through self-attention. Previous work has suggested that the computational capabilities of self-attention to process hierarchical structures are limited. In this work, we mathematically investigate the computational power of self-attention to model formal languages. Across both soft and hard attention, we show strong theoretical limitations of the computational abilities of self-attention, finding that it cannot model periodic finite-state languages, nor hierarchical structure, unless the number of layers or heads increases with input length. These limitations seem surprising given the practical success of self-attention and the prominent role assigned to hierarchical structure in linguistics, suggesting that natural language can be approximated well with models that are too weak for the formal languages typically assumed in theoretical linguistics.
13

Duarte, Manuel, and Armando Pinho. "Bacterial DNA Sequence Compression Models Using Artificial Neural Networks." Entropy 15, no. 12 (August 30, 2013): 3435–48. http://dx.doi.org/10.3390/e15093435.

14

Jehl, Laura, Carolin Lawrence, and Stefan Riezler. "Learning Neural Sequence-to-Sequence Models from Weak Feedback with Bipolar Ramp Loss." Transactions of the Association for Computational Linguistics 7 (November 2019): 233–48. http://dx.doi.org/10.1162/tacl_a_00265.

Abstract:
In many machine learning scenarios, supervision by gold labels is not available and consequently neural models cannot be trained directly by maximum likelihood estimation. In a weak supervision scenario, metric-augmented objectives can be employed to assign feedback to model outputs, which can be used to extract a supervision signal for training. We present several objectives for two separate weakly supervised tasks, machine translation and semantic parsing. We show that objectives should actively discourage negative outputs in addition to promoting a surrogate gold structure. This notion of bipolarity is naturally present in ramp loss objectives, which we adapt to neural models. We show that bipolar ramp loss objectives outperform other non-bipolar ramp loss objectives and minimum risk training on both weakly supervised tasks, as well as on a supervised machine translation task. Additionally, we introduce a novel token-level ramp loss objective, which is able to outperform even the best sequence-level ramp loss on both weakly supervised tasks.
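
A sequence-level bipolar ramp loss can be sketched as selecting a "hope" output (high model score and high metric) and a "fear" output (high model score but low metric) from a candidate set and pushing their scores apart. The snippet below is a simplified rendering of that idea; the paper's concrete objectives and its token-level variant differ in the details.

```python
def bipolar_ramp_loss(candidates, model_score, metric):
    """Sequence-level bipolar ramp loss over a set of candidate outputs.

    hope = candidate with high model score AND high task metric (surrogate gold)
    fear = candidate with high model score BUT low task metric (negative output)
    Minimizing score(fear) - score(hope) promotes the hope output and actively
    discourages the fear output; this is a simplified sketch, not the paper's
    exact formulation.
    """
    hope = max(candidates, key=lambda y: model_score(y) + metric(y))
    fear = max(candidates, key=lambda y: model_score(y) - metric(y))
    return model_score(fear) - model_score(hope)

# Toy example with three candidate translations.
scores = {"y1": 2.0, "y2": 1.5, "y3": 1.8}     # model log-scores
bleu = {"y1": 0.2, "y2": 0.9, "y3": 0.1}       # task metric (e.g., sentence BLEU)
loss = bipolar_ramp_loss(["y1", "y2", "y3"],
                         model_score=lambda y: scores[y],
                         metric=lambda y: bleu[y])
print(loss)   # 0.5: y1 is the fear output, y2 the hope output
```
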
15

Han, Xu-Wang, Hai-Tao Zheng, Jin-Yuan Chen, and Cong-Zhi Zhao. "Diverse Decoding for Abstractive Document Summarization." Applied Sciences 9, no. 3 (January 23, 2019): 386. http://dx.doi.org/10.3390/app9030386.

Abstract:
Recently, neural sequence-to-sequence models have made impressive progress in abstractive document summarization. Unfortunately, as neural abstractive summarization research is in a primitive stage, the performance of these models is still far from ideal. In this paper, we propose a novel method called Neural Abstractive Summarization with Diverse Decoding (NASDD). This method augments the standard attentional sequence-to-sequence model in two aspects. First, we introduce a diversity-promoting beam search approach in the decoding process, which alleviates the serious diversity issue caused by standard beam search and hence increases the possibility of generating summary sequences that are more informative. Second, we creatively utilize the attention mechanism combined with the key information of the input document as an estimation of the salient information coverage, which aids in finding the optimal summary sequence. We carry out the experimental evaluation with state-of-the-art methods on the CNN/Daily Mail summarization dataset, and the results demonstrate the superiority of our proposed method.
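
Diversity-promoting beam search generally works by penalizing candidate continuations that repeat what other hypotheses in the beam have already chosen at the current step. The single decoding step below shows a generic version of that penalty; NASDD's actual diversity term and its coverage-based selection of the final summary are not reproduced here.

```python
import math

def diverse_beam_step(beams, next_logprobs, beam_size, diversity_strength=0.5):
    """Expand one decoding step with a simple diversity penalty.

    `beams` is a list of (prefix, score); `next_logprobs[i]` maps a token to
    its log-probability under hypothesis i. Tokens already selected by earlier
    hypotheses in this step are penalized, pushing the beam toward different
    continuations. This is a generic sketch, not the exact NASDD procedure.
    """
    chosen_tokens = []
    new_beams = []
    for (prefix, score), logprobs in zip(beams, next_logprobs):
        penalized = {tok: lp - diversity_strength * chosen_tokens.count(tok)
                     for tok, lp in logprobs.items()}
        tok, _ = max(penalized.items(), key=lambda kv: kv[1])
        chosen_tokens.append(tok)
        new_beams.append((prefix + [tok], score + logprobs[tok]))
    return sorted(new_beams, key=lambda b: b[1], reverse=True)[:beam_size]

beams = [(["the"], -0.1), (["a"], -0.3)]
step_logprobs = [{"cat": math.log(0.6), "dog": math.log(0.3)},
                 {"cat": math.log(0.55), "dog": math.log(0.35)}]
print(diverse_beam_step(beams, step_logprobs, beam_size=2))
```
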
16

Lim, Dongjoon, and Mathieu Blanchette. "EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM." Bioinformatics 36, Supplement_1 (July 1, 2020): i353—i361. http://dx.doi.org/10.1093/bioinformatics/btaa447.

Abstract:
Motivation: Accurate probabilistic models of sequence evolution are essential for a wide variety of bioinformatics tasks, including sequence alignment and phylogenetic inference. The ability to realistically simulate sequence evolution is also at the core of many benchmarking strategies. Yet, mutational processes have complex context dependencies that remain poorly modeled and understood.
Results: We introduce EvoLSTM, a recurrent neural network-based evolution simulator that captures mutational context dependencies. EvoLSTM uses a sequence-to-sequence long short-term memory model trained to predict mutation probabilities at each position of a given sequence, taking into consideration the 14 flanking nucleotides. EvoLSTM can realistically simulate mammalian and plant DNA sequence evolution and reveals unexpectedly strong long-range context dependencies in mutation probabilities. EvoLSTM brings modern machine-learning approaches to bear on sequence evolution. It will serve as a useful tool to study and simulate complex mutational processes.
Availability and implementation: Code and dataset are available at https://github.com/DongjoonLim/EvoLSTM.
Supplementary information: Supplementary data are available at Bioinformatics online.
17

Liu, Ming, and Jinxu Zhang. "Chinese Neural Question Generation: Augmenting Knowledge into Multiple Neural Encoders." Applied Sciences 12, no. 3 (January 19, 2022): 1032. http://dx.doi.org/10.3390/app12031032.

Abstract:
Neural question generation (NQG) is the task of automatically generating a question from a given passage and answering it with sequence-to-sequence neural models. Passage compression has been proposed to address the challenge of generating questions from a long passage text by only extracting relevant sentences containing the answer. However, it may not work well if the discarded irrelevant sentences contain the contextual information for the target question. Therefore, this study investigated how to incorporate knowledge triples into the sequence-to-sequence neural model to reduce such contextual information loss and proposed a multi-encoder neural model for Chinese question generation. This approach has been extensively evaluated in a large Chinese question and answer dataset. The study results showed that our approach outperformed the state-of-the-art NQG models by 5.938 points on the BLEU score and 7.120 points on the ROUGE-L score on the average since the proposed model is answer focused, which is helpful to produce an interrogative word matching the answer type. In addition, augmenting the information from the knowledge graph improves the BLEU score by 10.884 points. Finally, we discuss the challenges remaining for Chinese NQG.
18

Welleck, Sean, and Kyunghyun Cho. "MLE-Guided Parameter Search for Task Loss Minimization in Neural Sequence Modeling." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (May 18, 2021): 14032–40. http://dx.doi.org/10.1609/aaai.v35i16.17652.

Abstract:
Neural autoregressive sequence models are used to generate sequences in a variety of natural language processing (NLP) tasks, where they are evaluated according to sequence-level task losses. These models are typically trained with maximum likelihood estimation, which ignores the task loss, yet empirically performs well as a surrogate objective. Typical approaches to directly optimizing the task loss such as policy gradient and minimum risk training are based around sampling in the sequence space to obtain candidate update directions that are scored based on the loss of a single sequence. In this paper, we develop an alternative method based on random search in the parameter space that leverages access to the maximum likelihood gradient. We propose maximum likelihood guided parameter search (MGS), which samples from a distribution over update directions that is a mixture of random search around the current parameters and around the maximum likelihood gradient, with each direction weighted by its improvement in the task loss. MGS shifts sampling to the parameter space, and scores candidates using losses that are pooled from multiple sequences. Our experiments show that MGS is capable of optimizing sequence-level losses, with substantial reductions in repetition and non-termination in sequence completion, and similar improvements to those of minimum risk training in machine translation.
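
In outline, MGS-style search samples candidate update directions partly around the current parameters and partly around the maximum-likelihood gradient step, then weights them by their task-loss improvement. The toy sketch below uses a softmax weighting and a contrived one-dimensional problem, both assumptions made purely for illustration; it is not the paper's estimator.

```python
import numpy as np

def mgs_step(theta, mle_grad, task_loss, n_samples=8, sigma=0.1, lr=0.5, seed=0):
    """One maximum-likelihood-guided parameter search step (illustrative only).

    Half of the candidate directions are random perturbations of the current
    parameters, half are perturbations of the MLE gradient step; each is
    weighted by how much it improves the (possibly non-differentiable) task
    loss. The softmax weighting is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    base = task_loss(theta)
    directions, improvements = [], []
    for i in range(n_samples):
        center = -lr * mle_grad(theta) if i % 2 else np.zeros_like(theta)
        d = center + sigma * rng.normal(size=theta.shape)
        directions.append(d)
        improvements.append(base - task_loss(theta + d))
    w = np.exp(np.array(improvements) - max(improvements))
    w /= w.sum()
    return theta + sum(wi * di for wi, di in zip(w, directions))

# Toy problem: the MLE surrogate is a quadratic centred at 1.0, while the
# task loss is a stepped (non-differentiable) distance to 1.2.
mle_grad = lambda th: 2 * (th - 1.0)
task_loss = lambda th: float(np.round(np.abs(th - 1.2).sum(), 1))
theta = np.array([3.0])
for _ in range(20):
    theta = mgs_step(theta, mle_grad, task_loss, seed=None)
print(theta.round(2))   # moved from 3.0 toward the low task-loss region
```
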
19

Averbeck, Bruno B., James Kilner, and Christopher D. Frith. "Neural Correlates of Sequence Learning with Stochastic Feedback." Journal of Cognitive Neuroscience 23, no. 6 (June 2011): 1346–57. http://dx.doi.org/10.1162/jocn.2010.21436.

Abstract:
Although much is known about decision making under uncertainty when only a single step is required in the decision process, less is known about sequential decision making. We carried out a stochastic sequence learning task in which subjects had to use noisy feedback to learn sequences of button presses. We compared flat and hierarchical behavioral models and found that although both models predicted the choices of the group of subjects equally well, only the hierarchical model correlated significantly with learning-related changes in the magneto-encephalographic response. The significant modulations in the magneto-encephalographic signal occurred 83 msec before button press and 67 msec after button press. We also localized the sources of these effects and found that the early effect localized to the insula, whereas the late effect localized to the premotor cortex.
20

Dabre, Raj, and Atsushi Fujita. "Recurrent Stacking of Layers for Compact Neural Machine Translation Models." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6292–99. http://dx.doi.org/10.1609/aaai.v33i01.33016292.

Abstract:
In encoder-decoder based sequence-to-sequence modeling, the most common practice is to stack a number of recurrent, convolutional, or feed-forward layers in the encoder and decoder. While the addition of each new layer improves the sequence generation quality, this also leads to a significant increase in the number of parameters. In this paper, we propose to share parameters across all layers thereby leading to a recurrently stacked sequence-to-sequence model. We report on an extensive case study on neural machine translation (NMT) using our proposed method, experimenting with a variety of datasets. We empirically show that the translation quality of a model that recurrently stacks a single-layer 6 times, despite its significantly fewer parameters, approaches that of a model that stacks 6 different layers. We also show how our method can benefit from a prevalent way for improving NMT, i.e., extending training data with pseudo-parallel corpora generated by back-translation. We then analyze the effects of recurrently stacked layers by visualizing the attentions of models that use recurrently stacked layers and models that do not. Finally, we explore the limits of parameter sharing where we share even the parameters between the encoder and decoder in addition to recurrent stacking of layers.
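
Recurrent stacking simply reuses one layer's parameters at every depth. A minimal PyTorch sketch with stock transformer layers (not the authors' NMT code; sizes are illustrative) makes the parameter saving visible:

```python
import torch
import torch.nn as nn

class RecurrentlyStackedEncoder(nn.Module):
    """Apply one encoder layer N times with shared (tied) parameters.

    This mirrors the idea of recurrently stacking a single layer instead of
    stacking N distinct layers.
    """
    def __init__(self, d_model=256, nhead=4, n_repeats=6):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.n_repeats = n_repeats

    def forward(self, x):
        for _ in range(self.n_repeats):   # the same weights are reused at every "depth"
            x = self.layer(x)
        return x

shared = RecurrentlyStackedEncoder()
distinct = nn.TransformerEncoder(nn.TransformerEncoderLayer(256, 4, batch_first=True),
                                 num_layers=6)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(shared), count(distinct))     # roughly six times fewer parameters with sharing
out = shared(torch.randn(2, 10, 256))     # (batch, sequence, features)
print(out.shape)
```
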
21

Silva, Milton, Diogo Pratas, and Armando J. Pinho. "AC2: An Efficient Protein Sequence Compression Tool Using Artificial Neural Networks and Cache-Hash Models." Entropy 23, no. 5 (April 26, 2021): 530. http://dx.doi.org/10.3390/e23050530.

Abstract:
Recently, the scientific community has witnessed a substantial increase in the generation of protein sequence data, triggering emergent challenges of increasing importance, namely efficient storage and improved data analysis. For both applications, data compression is a straightforward solution. However, in the literature, the number of specific protein sequence compressors is relatively low. Moreover, these specialized compressors marginally improve the compression ratio over the best general-purpose compressors. In this paper, we present AC2, a new lossless data compressor for protein (or amino acid) sequences. AC2 uses a neural network to mix experts with a stacked generalization approach and individual cache-hash memory models to the highest-context orders. Compared to the previous compressor (AC), we show gains of 2–9% and 6–7% in reference-free and reference-based modes, respectively. These gains come at the cost of three times slower computations. AC2 also improves memory usage against AC, with requirements about seven times lower, without being affected by the sequences’ input size. As an analysis application, we use AC2 to measure the similarity between each SARS-CoV-2 protein sequence with each viral protein sequence from the whole UniProt database. The results consistently show higher similarity to the pangolin coronavirus, followed by the bat and human coronaviruses, contributing with critical results to a current controversial subject. AC2 is available for free download under GPLv3 license.
22

Rahul, Kodithala. "Neural Machine Translation." International Journal for Research in Applied Science and Engineering Technology 10, no. 7 (July 31, 2022): 2027–30. http://dx.doi.org/10.22214/ijraset.2022.45669.

Abstract:
The project's novelty is not merely importing modules, preparing data, and feeding the data to the model, but understanding how real language translation works, implementing the logic underlying each method used, and creating every function from scratch, resulting in the creation of a Neural Machine Translation model. Initially, translation was accomplished by simply substituting words from one language for those from another. However, because languages are essentially different, a greater degree of knowledge (e.g., phrases/sentences) is required to achieve effective results. With the introduction of deep learning, modern software now employs statistical and neural techniques that have been shown to be more effective when translating. We are essentially translating German to English utilizing sequence-to-sequence models with attention and transformer models. Of course, everyone has access to Google Translate's power, but if you want to learn how to implement translation in code, this project will show you how. We are writing our code from scratch, without using any libraries, in order to understand how each model works. While this design is a little out of date, it is still a great project to work on if you want to learn more about attention mechanisms before moving on to Transformers. It is based on Effective Approaches to Attention-based Neural Machine Translation and is a sequence-to-sequence (seq2seq) model for German to English translation.
23

Schwaller, Philippe, Théophile Gaudin, Dávid Lányi, Costas Bekas, and Teodoro Laino. "“Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models." Chemical Science 9, no. 28 (2018): 6091–98. http://dx.doi.org/10.1039/c8sc02339e.

24

Zhang, Hao, Richard Sproat, Axel H. Ng, Felix Stahlberg, Xiaochang Peng, Kyle Gorman, and Brian Roark. "Neural Models of Text Normalization for Speech Applications." Computational Linguistics 45, no. 2 (June 2019): 293–337. http://dx.doi.org/10.1162/coli_a_00349.

Abstract:
Machine learning, including neural network techniques, have been applied to virtually every domain in natural language processing. One problem that has been somewhat resistant to effective machine learning solutions is text normalization for speech applications such as text-to-speech synthesis (TTS). In this application, one must decide, for example, that 123 is verbalized as one hundred twenty three in 123 pages but as one twenty three in 123 King Ave. For this task, state-of-the-art industrial systems depend heavily on hand-written language-specific grammars. We propose neural network models that treat text normalization for TTS as a sequence-to-sequence problem, in which the input is a text token in context, and the output is the verbalization of that token. We find that the most effective model, in accuracy and efficiency, is one where the sentential context is computed once and the results of that computation are combined with the computation of each token in sequence to compute the verbalization. This model allows for a great deal of flexibility in terms of representing the context, and also allows us to integrate tagging and segmentation into the process. These models perform very well overall, but occasionally they will predict wildly inappropriate verbalizations, such as reading 3 cm as three kilometers. Although rare, such verbalizations are a major issue for TTS applications. We thus use finite-state covering grammars to guide the neural models, either during training and decoding, or just during decoding, away from such “unrecoverable” errors. Such grammars can largely be learned from data.
25

Li, Yurui, Mingjing Du, and Sheng He. "Attention-Based Sequence-to-Sequence Model for Time Series Imputation." Entropy 24, no. 12 (December 9, 2022): 1798. http://dx.doi.org/10.3390/e24121798.

Abstract:
Time series data are usually characterized by having missing values, high dimensionality, and large data volume. To solve the problem of high-dimensional time series with missing values, this paper proposes an attention-based sequence-to-sequence model to impute missing values in time series (ASSM), which is a sequence-to-sequence model based on the combination of feature learning and data computation. The model consists of two parts, an encoder and a decoder. The encoder part is a BiGRU recurrent neural network and incorporates a self-attentive mechanism to make the model more capable of handling long-range time series; the decoder part is a GRU recurrent neural network and incorporates a cross-attentive mechanism to associate with the encoder part. The relationship weights between the generated sequences in the decoder part and the known sequences in the encoder part are calculated to achieve the purpose of focusing on the sequences with a high degree of correlation. In this paper, we conduct comparison experiments with four evaluation metrics and six models on four real datasets. The experimental results show that the model proposed in this paper outperforms the six comparative missing value interpolation algorithms.
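
The architecture described above can be sketched as a bidirectional GRU encoder over the observed series and a GRU decoder that cross-attends to the encoder states when producing values at missing positions. The PyTorch sketch below keeps that overall shape but omits the encoder self-attention and uses illustrative sizes, so it is not the ASSM implementation.

```python
import torch
import torch.nn as nn

class Seq2SeqImputer(nn.Module):
    """Rough shape of an ASSM-style imputer: BiGRU encoder, GRU decoder,
    dot-product cross-attention from decoder steps to encoder states.
    The encoder self-attention from the abstract is omitted for brevity.
    """
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, bidirectional=True, batch_first=True)
        self.decoder = nn.GRU(n_features, 2 * hidden, batch_first=True)
        self.out = nn.Linear(4 * hidden, n_features)

    def forward(self, observed, to_fill):
        enc, _ = self.encoder(observed)                            # (B, T_obs, 2H)
        dec, _ = self.decoder(to_fill)                             # (B, T_miss, 2H)
        attn = torch.softmax(dec @ enc.transpose(1, 2), dim=-1)    # cross-attention weights
        context = attn @ enc                                       # (B, T_miss, 2H)
        return self.out(torch.cat([dec, context], dim=-1))

model = Seq2SeqImputer(n_features=3)
observed = torch.randn(4, 20, 3)          # known part of the series
queries = torch.zeros(4, 5, 3)            # placeholder inputs at missing positions
print(model(observed, queries).shape)     # torch.Size([4, 5, 3])
```
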
26

Zarrieß, Sina, Henrik Voigt, and Simeon Schüz. "Decoding Methods in Neural Language Generation: A Survey." Information 12, no. 9 (August 30, 2021): 355. http://dx.doi.org/10.3390/info12090355.

Abstract:
Neural encoder-decoder models for language generation can be trained to predict words directly from linguistic or non-linguistic inputs. When generating with these so-called end-to-end models, however, the NLG system needs an additional decoding procedure that determines the output sequence, given the infinite search space over potential sequences that could be generated with the given vocabulary. This survey paper provides an overview of the different ways of implementing decoding on top of neural network-based generation models. Research into decoding has become a real trend in the area of neural language generation, and numerous recent papers have shown that the choice of decoding method has a considerable impact on the quality and various linguistic properties of the generation output of a neural NLG system. This survey aims to contribute to a more systematic understanding of decoding methods across different areas of neural NLG. We group the reviewed methods with respect to the broad type of objective that they optimize in the generation of the sequence—likelihood, diversity, and task-specific linguistic constraints or goals—and discuss their respective strengths and weaknesses.
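
The survey's likelihood-versus-diversity distinction is easy to see on a single next-token distribution: greedy decoding takes the argmax, while top-k and nucleus (top-p) sampling restrict and renormalize the distribution before sampling. The snippet below is a minimal sketch of these three standard methods, not of any specific system from the survey.

```python
import numpy as np

def decode_next(probs, method="greedy", k=3, p=0.9, rng=None):
    """Pick the next token id from a probability distribution `probs`.

    greedy  -- argmax (pure likelihood objective)
    top-k   -- sample among the k most probable tokens
    nucleus -- sample among the smallest set whose mass exceeds p
    """
    rng = rng or np.random.default_rng()
    if method == "greedy":
        return int(np.argmax(probs))
    order = np.argsort(probs)[::-1]
    if method == "top-k":
        keep = order[:k]
    else:                                   # nucleus / top-p
        cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
        keep = order[:cutoff]
    renorm = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=renorm))

vocab_probs = np.array([0.02, 0.5, 0.3, 0.1, 0.08])
print(decode_next(vocab_probs, "greedy"))
print(decode_next(vocab_probs, "top-k", k=2))
print(decode_next(vocab_probs, "nucleus", p=0.8))
```
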
27

Colombo, Pierre, Emile Chapuis, Matteo Manica, Emmanuel Vignon, Giovanna Varni, and Chloe Clavel. "Guiding Attention in Sequence-to-Sequence Models for Dialogue Act Prediction." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 7594–601. http://dx.doi.org/10.1609/aaai.v34i05.6259.

Abstract:
The task of predicting dialog acts (DA) based on conversational dialog is a key component in the development of conversational agents. Accurately predicting DAs requires a precise modeling of both the conversation and the global tag dependencies. We leverage seq2seq approaches widely adopted in Neural Machine Translation (NMT) to improve the modelling of tag sequentiality. Seq2seq models are known to learn complex global dependencies while currently proposed approaches using linear conditional random fields (CRF) only model local tag dependencies. In this work, we introduce a seq2seq model tailored for DA classification using: a hierarchical encoder, a novel guided attention mechanism and beam search applied to both training and inference. Compared to the state of the art our model does not require handcrafted features and is trained end-to-end. Furthermore, the proposed approach achieves an unmatched accuracy score of 85% on SwDA, and state-of-the-art accuracy score of 91.6% on MRDA.
28

Li, Yangming, Kaisheng Yao, Libo Qin, Shuang Peng, Yijia Liu, and Xiaolong Li. "Span-Based Neural Buffer: Towards Efficient and Effective Utilization of Long-Distance Context for Neural Sequence Models." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8277–84. http://dx.doi.org/10.1609/aaai.v34i05.6343.

Abstract:
Neural sequence model, though widely used for modeling sequential data such as the language model, has sequential recency bias (Kuncoro et al. 2018) to the local context, limiting its full potential to capture long-distance context. To address this problem, this paper proposes augmenting sequence models with a span-based neural buffer that efficiently represents long-distance context, allowing a gate policy network to make interpolated predictions from both the neural buffer and the underlying sequence model. Training this policy network to utilize long-distance context is however challenging due to the simple sentence dominance problem (Marvin and Linzen 2018). To alleviate this problem, we propose a novel training algorithm that combines an annealed maximum likelihood estimation with an intrinsic reward-driven reinforcement learning. Sequence models with the proposed span-based neural buffer significantly improve the state-of-the-art perplexities on the benchmark Penn Treebank and WikiText-2 datasets to 43.9 and 35.2 respectively. We conduct extensive analysis and confirm that the proposed architecture and the training algorithm both contribute to the improvements.
29

Helcl, Jindřich, and Jindřich Libovický. "Neural Monkey: An Open-source Tool for Sequence Learning." Prague Bulletin of Mathematical Linguistics 107, no. 1 (April 1, 2017): 5–17. http://dx.doi.org/10.1515/pralin-2017-0001.

Abstract:
Abstract In this paper, we announce the development of Neural Monkey – an open-source neural machine translation (NMT) and general sequence-to-sequence learning system built over the TensorFlow machine learning library. The system provides a high-level API tailored for fast prototyping of complex architectures with multiple sequence encoders and decoders. Models’ overall architecture is specified in easy-to-read configuration files. The long-term goal of the Neural Monkey project is to create and maintain a growing collection of implementations of recently proposed components or methods, and therefore it is designed to be easily extensible. Trained models can be deployed either for batch data processing or as a web service. In the presented paper, we describe the design of the system and introduce the reader to running experiments using Neural Monkey.
30

Jahier Pagliari, Daniele, Francesco Daghero, and Massimo Poncino. "Sequence-To-Sequence Neural Networks Inference on Embedded Processors Using Dynamic Beam Search." Electronics 9, no. 2 (February 15, 2020): 337. http://dx.doi.org/10.3390/electronics9020337.

Abstract:
Sequence-to-sequence deep neural networks have become the state of the art for a variety of machine learning applications, ranging from neural machine translation (NMT) to speech recognition. Many mobile and Internet of Things (IoT) applications would benefit from the ability of performing sequence-to-sequence inference directly in embedded devices, thereby reducing the amount of raw data transmitted to the cloud, and obtaining benefits in terms of response latency, energy consumption and security. However, due to the high computational complexity of these models, specific optimization techniques are needed to achieve acceptable performance and energy consumption on single-core embedded processors. In this paper, we present a new optimization technique called dynamic beam search, in which the inference complexity is tuned to the difficulty of the processed input sequence at runtime. Results based on measurements on a real embedded device, and on three state-of-the-art deep learning models, show that our method is able to reduce the inference time and energy by up to 25% without loss of accuracy.
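
The gist of dynamic beam search is to spend a large beam only on difficult inputs. The sketch below approximates "difficulty" by the entropy of the model's first-step output distribution and maps it to a beam width; both the criterion and the thresholds are assumptions of this illustration, not the paper's policy.

```python
import math

def dynamic_beam_width(first_step_probs, max_beam=8, thresholds=(0.5, 1.5)):
    """Pick a beam width from the entropy of the model's first-step distribution.

    Low entropy (confident model) -> small beam, saving time and energy;
    high entropy (difficult input) -> larger beam.
    """
    entropy = -sum(p * math.log(p) for p in first_step_probs if p > 0)
    if entropy < thresholds[0]:
        return 1                       # effectively greedy decoding
    if entropy < thresholds[1]:
        return max(2, max_beam // 2)
    return max_beam

print(dynamic_beam_width([0.97, 0.01, 0.01, 0.01]))   # confident -> beam width 1
print(dynamic_beam_width([0.4, 0.3, 0.2, 0.1]))       # uncertain -> wider beam
```
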
31

Zech, John, Jessica Forde, Joseph J. Titano, Deepak Kaji, Anthony Costa, and Eric Karl Oermann. "Detecting insertion, substitution, and deletion errors in radiology reports using neural sequence-to-sequence models." Annals of Translational Medicine 7, no. 11 (June 2019): 233. http://dx.doi.org/10.21037/atm.2018.08.11.

32

Xia, Min, W. K. Wong, and Zhijie Wang. "Sequence Memory Based on Coherent Spin-Interaction Neural Networks." Neural Computation 26, no. 12 (December 2014): 2944–61. http://dx.doi.org/10.1162/neco_a_00663.

Abstract:
Sequence information processing, for instance sequence memory, plays an important role in many functions of the brain. In the workings of the human brain, the steady-state period is alterable. However, in the existing sequence memory models using heteroassociations, the steady-state period cannot be changed during sequence recall. In this work, a novel neural network model for sequence memory with a controllable steady-state period based on coherent spin-interaction is proposed. In the proposed model, neurons fire collectively in a phase-coherent manner, which lets a neuron group respond differently to different patterns and also lets different neuron groups respond differently to one pattern. The simulation results demonstrating the performance of the sequence memory are presented. By introducing a new coherent spin-interaction sequence memory model, the steady-state period can be controlled by dimension parameters and the overlap between the input pattern and the stored patterns. The sequence storage capacity is enlarged by coherent spin interaction compared with the existing sequence memory models. Furthermore, the sequence storage capacity has an exponential relationship to the dimension of the neural network.
33

Azha Javed and Muhammad Javed Iqbal. "Classification of Biological Data using Deep Learning Technique." NUML International Journal of Engineering and Computing 1, no. 1 (April 27, 2022): 13–26. http://dx.doi.org/10.52015/nijec.v1i1.10.

Abstract:
A huge amount of newly sequenced proteins is being discovered on a daily basis. The main concern is how to extract the useful characteristics of sequences as the input features for the network. These sequences are increasing exponentially over the decades. However, it is very expensive to characterize functions through biological experiments, and it is also necessary to find associations between the information in datasets to create and improve medical tools. Recently, machine learning algorithms have received huge attention and are widely used. These algorithms are based on deep learning architectures and data-driven models. Previous work failed to properly address issues related to the classification of biological sequences, i.e., proteins, including efficient encoding of variable-length biological sequence data and implementation of deep learning-based neural network models to enhance the performance of classification/recognition systems. To overcome these issues, we have proposed a deep learning-based neural network architecture so that the classification performance of the system can be increased. In our work, we have proposed a 1D convolutional neural network which classifies protein sequences into the 10 most common classes. The model extracted features from the labeled protein sequences and learned from the dataset. We have trained and evaluated our model on protein sequences downloaded from the Protein Data Bank (PDB). The model achieves an accuracy rate of up to 96%.
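
A 1D convolutional classifier of the kind described can be sketched in PyTorch as an embedding of integer-encoded amino acids followed by temporal convolutions, global pooling, and a 10-way output head. The layer sizes below are illustrative, not those of the paper, and the PDB preprocessing is not reproduced.

```python
import torch
import torch.nn as nn

class ProteinCNN(nn.Module):
    """1D CNN over integer-encoded amino-acid sequences, 10 output classes."""
    def __init__(self, vocab_size=26, embed_dim=32, n_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Sequential(
            nn.Conv1d(embed_dim, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),            # pool over the variable sequence length
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, tokens):                  # tokens: (batch, seq_len) integer codes
        x = self.embed(tokens).transpose(1, 2)  # -> (batch, embed_dim, seq_len)
        x = self.conv(x).squeeze(-1)            # -> (batch, 64)
        return self.classifier(x)

model = ProteinCNN()
batch = torch.randint(1, 26, (8, 300))          # 8 padded sequences of length 300
print(model(batch).shape)                       # torch.Size([8, 10])
```
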
34

Liu, Zuozhu, Thiparat Chotibut, Christopher Hillar, and Shaowei Lin. "Biologically Plausible Sequence Learning with Spiking Neural Networks." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 02 (April 3, 2020): 1316–23. http://dx.doi.org/10.1609/aaai.v34i02.5487.

Abstract:
Motivated by the celebrated discrete-time model of nervous activity outlined by McCulloch and Pitts in 1943, we propose a novel continuous-time model, the McCulloch-Pitts network (MPN), for sequence learning in spiking neural networks. Our model has a local learning rule, such that the synaptic weight updates depend only on the information directly accessible by the synapse. By exploiting asymmetry in the connections between binary neurons, we show that MPN can be trained to robustly memorize multiple spatiotemporal patterns of binary vectors, generalizing the ability of the symmetric Hopfield network to memorize static spatial patterns. In addition, we demonstrate that the model can efficiently learn sequences of binary pictures as well as generative models for experimental neural spike-train data. Our learning rule is consistent with spike-timing-dependent plasticity (STDP), thus providing a theoretical ground for the systematic design of biologically inspired networks with large and robust long-range sequence storage capacity.
35

Bouktif, Salah, Ali Fiaz, Ali Ouni, and Mohamed Adel Serhani. "Single and Multi-Sequence Deep Learning Models for Short and Medium Term Electric Load Forecasting." Energies 12, no. 1 (January 2, 2019): 149. http://dx.doi.org/10.3390/en12010149.

Abstract:
Time series analysis using long short term memory (LSTM) deep learning is a very attractive strategy to achieve accurate electric load forecasting. Although it outperforms most machine learning approaches, the LSTM forecasting model still reveals a lack of validity because it neglects several characteristics of the electric load exhibited by time series. In this work, we propose a load-forecasting model based on enhanced-LSTM that explicitly considers the periodicity characteristic of the electric load by using multiple sequences of inputs time lags. An autoregressive model is developed together with an autocorrelation function (ACF) to regress consumption and identify the most relevant time lags to feed the multi-sequence LSTM. Two variations of deep neural networks, LSTM and gated recurrent unit (GRU) are developed for both single and multi-sequence time-lagged features. These models are compared to each other and to a spectrum of data mining benchmark techniques including artificial neural networks (ANN), boosting, and bagging ensemble trees. France Metropolitan’s electricity consumption data is used to train and validate our models. The obtained results show that GRU- and LSTM-based deep learning model with multi-sequence time lags achieve higher performance than other alternatives including the single-sequence LSTM. It is demonstrated that the new models can capture critical characteristics of complex time series (i.e., periodicity) by encompassing past information from multiple timescale sequences. These models subsequently achieve predictions that are more accurate.
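
The lag-selection step can be illustrated by computing the autocorrelation function (ACF) of the load series and keeping the most correlated lags (for hourly data, typically the daily and weekly lags) as extra input sequences for the multi-sequence LSTM. The simple top-n rule below is an assumption of this sketch rather than the paper's exact ACF-based procedure.

```python
import numpy as np

def top_acf_lags(series, max_lag, n_lags=3):
    """Return the `n_lags` most autocorrelated lags of a load series.

    The selected lags would feed the multi-sequence LSTM as extra
    time-lagged input sequences.
    """
    x = np.asarray(series, dtype=float) - np.mean(series)
    acf = np.array([np.dot(x[:-lag], x[lag:]) / np.dot(x, x)
                    for lag in range(1, max_lag + 1)])
    return list(np.argsort(acf)[::-1][:n_lags] + 1)   # +1: lags are 1-indexed

# Synthetic hourly load with daily (24 h) and weekly (168 h) periodicity.
t = np.arange(24 * 7 * 8)
load = 10 + 3 * np.sin(2 * np.pi * t / 24) + 1.5 * np.sin(2 * np.pi * t / 168)
print(top_acf_lags(load, max_lag=200))   # daily and weekly lags rank highly
```
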
36

Kaselimi, M., N. Doulamis, A. Doulamis, and D. Delikaraoglou. "A SEQUENCE-TO-SEQUENCE TEMPORAL CONVOLUTIONAL NEURAL NETWORK FOR IONOSPHERE PREDICTION USING GNSS OBSERVATIONS." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B3-2020 (August 21, 2020): 813–20. http://dx.doi.org/10.5194/isprs-archives-xliii-b3-2020-813-2020.

Abstract:
This paper proposes a model suitable for predicting the ionosphere delay at different locations of receiver stations using a temporal 1D convolutional neural network (CNN) model. A CNN can optimally address non-linearity and model complex data through the creation of powerful representations at hierarchical levels of abstraction. To be able to predict ionosphere values for each visible satellite at a given station, sequence-to-sequence (seq2seq) models are introduced. These models are commonly used for solving sequential problems. In seq2seq models, a sequential input is entered into the model and the output also has a sequential form. Adopting this structure helps us to predict ionosphere values for all satellites in view at every epoch. As experimental data, we used global navigation satellite system (GNSS) observations from selected sites in central Europe, part of the global International GNSS Service (IGS) network. The data used are part of the multi-GNSS experiment (MGEX) project, which provides observations from multiple navigation satellite systems. After processing with the precise point positioning (PPP) technique as implemented in the GAMP software, the slant total electron content (STEC) data were obtained. The proposed CNN uses as input the ionosphere pierce point (IPP) coordinates per visible satellite. Then, based on outcomes of the ionosphere parameters, the temporal CNN is deployed to predict future TEC variations.
37

McCoy, R. Thomas, Robert Frank, and Tal Linzen. "Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks." Transactions of the Association for Computational Linguistics 8 (July 2020): 125–40. http://dx.doi.org/10.1162/tacl_a_00304.

Abstract:
Learners that are exposed to the same training data might generalize differently due to differing inductive biases. In neural network models, inductive biases could in theory arise from any aspect of the model architecture. We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks, English question formation and English tense reinflection. For both tasks, the training set is consistent with a generalization based on hierarchical structure and a generalization based on linear order. All architectural factors that we investigated qualitatively affected how models generalized, including factors with no clear connection to hierarchical structure. For example, LSTMs and GRUs displayed qualitatively different inductive biases. However, the only factor that consistently contributed a hierarchical bias across tasks was the use of a tree-structured model rather than a model with sequential recurrence, suggesting that human-like syntactic generalization requires architectural syntactic structure.
38

Lourentzou, Ismini, Kabir Manghnani, and ChengXiang Zhai. "Adapting Sequence to Sequence Models for Text Normalization in Social Media." Proceedings of the International AAAI Conference on Web and Social Media 13 (July 6, 2019): 335–45. http://dx.doi.org/10.1609/icwsm.v13i01.3234.

Abstract:
Social media offer an abundant source of valuable raw data; however, informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot explicitly handle noise found in short online posts. Moreover, the variety of frequently occurring linguistic variations presents several challenges, even for humans who might not be able to comprehend the meaning of such posts, especially when they contain slang and abbreviations. Text normalization aims to transform online user-generated text to a canonical form. Current text normalization systems rely on string or phonetic similarity and classification models that work in a local fashion. We argue that processing contextual information is crucial for this task and introduce a social media text normalization hybrid word-character attention-based encoder-decoder model that can serve as a pre-processing step for NLP applications to adapt to noisy text in social media. Our character-based component is trained on synthetic adversarial examples that are designed to capture errors commonly found in online user-generated text. Experiments show that our model surpasses neural architectures designed for text normalization and achieves comparable performance with state-of-the-art related work.
39

Madi, Nora, and Hend Al-Khalifa. "Error Detection for Arabic Text Using Neural Sequence Labeling." Applied Sciences 10, no. 15 (July 30, 2020): 5279. http://dx.doi.org/10.3390/app10155279.

Abstract:
The English language has, thus far, received the most attention in research concerning automatic grammar error correction and detection. However, these tasks have been less investigated for other languages. In this paper, we present the first experiments using neural network models for the task of error detection for Modern Standard Arabic (MSA) text. We investigate several neural network architectures and report the evaluation results acquired by applying cross-validation on the data. All experiments involve a corpus we created and augmented. The corpus has 494 sentences and 620 sentences after augmentation. Our models achieved a maximum precision of 78.09%, recall of 83.95%, and F0.5 score of 79.62% in the error detection task using SimpleRNN. Using an LSTM, we achieved a maximum precision of 79.21%, recall of 93.8%, and F0.5 score of 79.16%. Finally, the best results were achieved using a BiLSTM with a maximum precision of 80.74%, recall of 85.73%, and F0.5 score of 81.55%. We compared the results of the three models to a baseline, which is a commercially available Arabic grammar checker (Microsoft Word 2007). LSTM, BiLSTM, and SimpleRNN all outperformed the baseline in precision and F0.5. Our work shows preliminary results, demonstrating that neural network architectures for error detection through sequence labeling can successfully be applied to Arabic text.
40

Yonglan, Li, and He Wenjia. "English-Chinese Machine Translation Model Based on Bidirectional Neural Network with Attention Mechanism." Journal of Sensors 2022 (March 17, 2022): 1–11. http://dx.doi.org/10.1155/2022/5199248.

Abstract:
In recent years, with the development of deep learning, machine translation using neural network has gradually become the mainstream method in industry and academia. The existing Chinese-English machine translation models generally adopt the deep neural network architecture based on attention mechanism. However, it is still a challenging problem to model short and long sequences simultaneously. Therefore, a bidirectional LSTM model integrating attention mechanism is proposed. Firstly, by using the word vector as the input data of the translation model, the linguistic symbols used in the translation process are mathematized. Secondly, two attention mechanisms are designed: local attention mechanism and global attention mechanism. The local attention mechanism is mainly used to learn which words or phrases in the input sequence are more important for modeling, while the global attention mechanism is used to learn which layer of expression vector in the input sequence is more critical. Bidirectional LSTM can better fuse the feature information in the input sequence, while bidirectional LSTM with attention mechanism can simultaneously model short and long sequences. The experimental results show that compared with many existing translation models, the bidirectional LSTM model with attention mechanism can effectively improve the quality of machine translation.
41

Yaish, Ofir, and Yaron Orenstein. "Computational modeling of mRNA degradation dynamics using deep neural networks." Bioinformatics 38, no. 4 (November 26, 2021): 1087–101. http://dx.doi.org/10.1093/bioinformatics/btab800.

Abstract:
Motivation: Messenger RNA (mRNA) degradation plays critical roles in post-transcriptional gene regulation. A major component of mRNA degradation is determined by 3′-UTR elements. Hence, researchers are interested in studying mRNA dynamics as a function of 3′-UTR elements. A recent study measured the mRNA degradation dynamics of tens of thousands of 3′-UTR sequences using a massively parallel reporter assay. However, the computational approach used to model mRNA degradation was based on a simplifying assumption of a linear degradation rate. Consequently, the underlying mechanism of 3′-UTR elements is still not fully understood.
Results: Here, we developed deep neural networks to predict mRNA degradation dynamics and interpreted the networks to identify regulatory elements in the 3′-UTR and their positional effect. Given an input of a 110 nt-long 3′-UTR sequence and an initial mRNA level, the model predicts mRNA levels of eight consecutive time points. Our deep neural networks significantly improved prediction performance of mRNA degradation dynamics compared with extant methods for the task. Moreover, we demonstrated that models predicting the dynamics of two identical 3′-UTR sequences, differing by their poly(A) tail, performed better than single-task models. On the interpretability front, by using Integrated Gradients, our convolutional neural network (CNN) models identified known and novel cis-regulatory sequence elements of mRNA degradation. By applying a novel systematic evaluation of model interpretability, we demonstrated that the recurrent neural network models are inferior to the CNN models in terms of interpretability and that a random initialization ensemble improves both prediction and interpretability performance. Moreover, using a mutagenesis analysis, we newly discovered the positional effect of various 3′-UTR elements.
Availability and implementation: All the code developed through this study is available at github.com/OrensteinLab/DeepUTR/.
Supplementary information: Supplementary data are available at Bioinformatics online.
APA, Harvard, Vancouver, ISO, and other styles
42

Suszyński, Marcin, and Katarzyna Peta. "Assembly Sequence Planning Using Artificial Neural Networks for Mechanical Parts Based on Selected Criteria." Applied Sciences 11, no. 21 (November 5, 2021): 10414. http://dx.doi.org/10.3390/app112110414.

Full text
Abstract:
The proposed neural network model addresses the task of planning the assembly sequence on the basis of predicting the optimal assembly time of mechanical parts. In the proposed neural approach, the k-means clustering algorithm is used. In order to find the most effective network, 10,000 network models were built using various training methods, including the steepest descent method, the conjugate gradient method, and the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm. Changes to network parameters also included the following activation functions: linear, logistic, tanh, exponential, and sine. The simulation results suggest that the neural predictor can serve as the predictor in an assembly sequence planning system. This paper discusses a new modeling scheme based on artificial neural networks, taking into account selected criteria for the evaluation of assembly sequences based on data that can be automatically downloaded from CAx systems.
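An illustrative sketch (not the authors' code) of the overall pipeline suggested by the abstract: cluster part descriptors with k-means, then train a small neural network to predict assembly time, here with scikit-learn. The feature dimensions and hyperparameters are assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))            # stand-in for CAx-derived part features
y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=200)  # assembly time target

# k-means group label used as an extra input feature, as in the neural approach.
cluster_id = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
X_aug = np.column_stack([X, cluster_id])

# BFGS-family training ('lbfgs') and tanh/logistic activations mirror the kinds
# of options compared in the paper.
mlp = MLPRegressor(hidden_layer_sizes=(32, 16), activation='tanh',
                   solver='lbfgs', max_iter=2000, random_state=0).fit(X_aug, y)
print("R^2 on training data:", round(mlp.score(X_aug, y), 3))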
APA, Harvard, Vancouver, ISO, and other styles
43

Liu, Pengfei, Shuaichen Chang, Xuanjing Huang, Jian Tang, and Jackie Chi Kit Cheung. "Contextualized Non-Local Neural Networks for Sequence Learning." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6762–69. http://dx.doi.org/10.1609/aaai.v33i01.33016762.

Full text
Abstract:
Recently, a large number of neural mechanisms and models have been proposed for sequence learning, of which self-attention, as exemplified by the Transformer model, and graph neural networks (GNNs) have attracted much attention. In this paper, we propose an approach that combines and draws on the complementary strengths of these two methods. Specifically, we propose contextualized non-local neural networks (CN3), which can both dynamically construct a task-specific structure of a sentence and leverage rich local dependencies within a particular neighbourhood. Experimental results on ten NLP tasks in text classification, semantic matching, and sequence labelling show that our proposed model outperforms competitive baselines and discovers task-specific dependency structures, thus providing better interpretability to users.
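A hedged sketch of the core idea behind contextualized non-local operations: build a soft, task-specific graph over token representations with attention and mix each token with its weighted neighbours. Names and sizes are illustrative; this is not the CN3 reference implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalMixing(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, h):
        # h: (batch, seq_len, dim) contextual token representations.
        adj = F.softmax(self.query(h) @ self.key(h).transpose(1, 2)
                        / h.size(-1) ** 0.5, dim=-1)          # soft sentence "graph"
        neighbours = adj @ h                                   # non-local aggregation
        return self.out(torch.cat([h, neighbours], dim=-1))    # fuse local and global

h = torch.randn(2, 10, 128)
print(NonLocalMixing()(h).shape)   # torch.Size([2, 10, 128])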
APA, Harvard, Vancouver, ISO, and other styles
44

Cheng, Minhao, Jinfeng Yi, Pin-Yu Chen, Huan Zhang, and Cho-Jui Hsieh. "Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 3601–8. http://dx.doi.org/10.1609/aaai.v34i04.5767.

Full text
Abstract:
Crafting adversarial examples has become an important technique to evaluate the robustness of deep neural networks (DNNs). However, most existing works focus on attacking the image classification problem since its input space is continuous and its output space is finite. In this paper, we study the much more challenging problem of crafting adversarial examples for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and whose outputs have an almost infinite number of possibilities. To address the challenges caused by the discrete input space, we propose a projected gradient method combined with group lasso and gradient regularization. To handle the almost infinite output space, we design novel loss functions to conduct non-overlapping and targeted keyword attacks. We apply our algorithm to machine translation and text summarization tasks, and verify the effectiveness of the proposed algorithm: by changing fewer than 3 words, we can make a seq2seq model produce the desired outputs with high success rates. We also use an external sentiment classifier to verify that our generated adversarial examples preserve semantic meaning. On the other hand, we recognize that, compared with the well-evaluated CNN-based classifiers, seq2seq models are intrinsically more robust to adversarial attacks.
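A rough, hypothetical illustration of the attack recipe the abstract outlines: optimize a continuous relaxation of the input embeddings with gradient steps plus a group-lasso penalty on per-token perturbations, then project each position back to the nearest vocabulary embedding. The "model" below is a trivial stand-in, not a real seq2seq system, and all constants are assumptions.

import torch

torch.manual_seed(0)
vocab, dim, seq_len = 50, 16, 6
E = torch.randn(vocab, dim)                 # embedding table
x_ids = torch.randint(0, vocab, (seq_len,)) # original input tokens
target = torch.randn(dim)                   # stand-in for a targeted output

def model(emb):                             # toy differentiable "seq2seq" score
    return emb.mean(dim=0)

z = E[x_ids].clone().requires_grad_(True)   # continuous relaxation of the input
for _ in range(50):
    loss = ((model(z) - target) ** 2).sum()
    # Group lasso: one group per token position, encouraging few edited words.
    loss = loss + 0.05 * (z - E[x_ids]).norm(dim=1).sum()
    loss.backward()
    with torch.no_grad():
        z -= 0.1 * z.grad
    z.grad = None

# Projection step: snap every position to its nearest embedding, i.e. a real word.
adv_ids = torch.cdist(z.detach(), E).argmin(dim=1)
print("changed positions:", int((adv_ids != x_ids).sum()))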
APA, Harvard, Vancouver, ISO, and other styles
45

Feinauer, Christoph, Barthelemy Meynard-Piganeau, and Carlo Lucibello. "Interpretable pairwise distillations for generative protein sequence models." PLOS Computational Biology 18, no. 6 (June 23, 2022): e1010219. http://dx.doi.org/10.1371/journal.pcbi.1010219.

Full text
Abstract:
Many different types of generative models for protein sequences have been proposed in the literature. Their uses include the prediction of mutational effects, protein design and the prediction of structural properties. Neural network (NN) architectures have shown great performance, commonly attributed to their capacity to extract non-trivial higher-order interactions from the data. In this work, we analyze two different NN models and assess how close they are to simple pairwise distributions, which have been used in the past for similar problems. We present an approach for extracting pairwise models from more complex ones using an energy-based modeling framework. We show that for the tested models the extracted pairwise models can replicate the energies of the original models and are also close in performance in tasks like mutational effect prediction. In addition, we show that even simpler, factorized models often come close in performance to the original models.
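A simplified, hypothetical illustration of the distillation idea: probe a black-box sequence "energy" with single and double substitutions around a reference sequence and read off effective pairwise couplings from the double-mutant cycle. The energy function here is a random toy model; the paper's actual extraction procedure is more involved.

import numpy as np

rng = np.random.default_rng(0)
L, A = 8, 4                                  # sequence length, alphabet size
M = rng.normal(size=(16, L * A))             # toy "neural" energy parameters
v = rng.normal(size=16)

def energy(seq):
    # Stand-in for a neural energy model with a tanh hidden layer (higher-order).
    onehot = np.eye(A)[seq].ravel()
    return float(v @ np.tanh(M @ onehot))

ref = rng.integers(0, A, size=L)             # reference (background) sequence

def coupling(i, a, j, b):
    """Effective pairwise coupling J_ij(a,b) estimated from a double-mutant cycle."""
    s_ab, s_a, s_b = ref.copy(), ref.copy(), ref.copy()
    s_ab[[i, j]] = [a, b]; s_a[i] = a; s_b[j] = b
    return energy(s_ab) - energy(s_a) - energy(s_b) + energy(ref)

print(round(coupling(0, 1, 3, 2), 4))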
APA, Harvard, Vancouver, ISO, and other styles
46

Nair, Viswajit Vinod, Sonaal Pathlai Pradeep, Vaishnavi Sudheer Nair, P. N. Pournami, G. Gopakumar, and P. B. Jayaraj. "Deep Sequence Models for Ligand-Based Virtual Screening." Journal of Computational Biophysics and Chemistry 21, no. 02 (February 4, 2022): 207–17. http://dx.doi.org/10.1142/s2737416522500107.

Full text
Abstract:
The past few years have witnessed machine learning techniques take the limelight in multiple research domains. One such domain that has reaped the benefits of machine learning is computer-aided drug discovery, where the search space for candidate drug molecules is decreased using methods such as virtual screening. Current state-of-the-art sequential neural network models have shown promising results and we would like to replicate similar results with virtual screening using the encoded molecular information known as simplified molecular-input line-entry system (SMILES). Our work includes the use of attention-based sequential models — the long short-term memory with attention and an optimized version of the transformer network specifically designed to deal with SMILES (ChemBERTa). We also propose the “Overall Screening Efficacy”, an averaging metric that aggregates and encapsulates the model performance over multiple datasets. We found an overall improvement of about [Formula: see text] over the benchmark model, which relied on parallelized random forests.
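A minimal, hypothetical sketch of an attention-pooled LSTM over SMILES tokens for active/inactive screening, in the spirit of the sequential models the paper compares. The character-level tokenizer and all sizes are simplifying assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SmilesLSTMAttention(nn.Module):
    def __init__(self, vocab_size=64, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim, 1)   # active / inactive logit

    def forward(self, token_ids):
        h, _ = self.lstm(self.embedding(token_ids))        # (B, L, H)
        w = F.softmax(self.attn(h).squeeze(-1), dim=1)     # attention over tokens
        pooled = torch.bmm(w.unsqueeze(1), h).squeeze(1)   # weighted sum of states
        return self.classifier(pooled).squeeze(-1)

# Toy character-level encoding of one SMILES string ("CCO" = ethanol).
vocab = {ch: i for i, ch in enumerate("CONcn()=123[]")}
ids = torch.tensor([[vocab[ch] for ch in "CCO"]])
print(SmilesLSTMAttention()(ids).shape)   # torch.Size([1])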
APA, Harvard, Vancouver, ISO, and other styles
47

La Quatra, Moreno, and Luca Cagliero. "BART-IT: An Efficient Sequence-to-Sequence Model for Italian Text Summarization." Future Internet 15, no. 1 (December 27, 2022): 15. http://dx.doi.org/10.3390/fi15010015.

Full text
Abstract:
The emergence of attention-based architectures has led to significant improvements in the performance of neural sequence-to-sequence models for text summarization. Although these models have proved to be effective in summarizing English-written documents, their portability to other languages is limited thus leaving plenty of room for improvement. In this paper, we present BART-IT, a sequence-to-sequence model, based on the BART architecture that is specifically tailored to the Italian language. The model is pre-trained on a large corpus of Italian-written pieces of text to learn language-specific features and then fine-tuned on several benchmark datasets established for abstractive summarization. The experimental results show that BART-IT outperforms other state-of-the-art models in terms of ROUGE scores in spite of a significantly smaller number of parameters. The use of BART-IT can foster the development of interesting NLP applications for the Italian language. Beyond releasing the model to the research community to foster further research and applications, we also discuss the ethical implications behind the use of abstractive summarization models.
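A hedged usage sketch with Hugging Face transformers for abstractive summarization with a BART-style Italian model. The checkpoint name "morenolq/bart-it" is an assumption based on the authors' handle; check the model hub for the exact identifier before running.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "morenolq/bart-it"                     # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

testo = "Il modello BART-IT è stato addestrato su un ampio corpus italiano..."
inputs = tokenizer(testo, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, num_beams=4, max_length=80)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))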
APA, Harvard, Vancouver, ISO, and other styles
48

Nataraj, Sathees Kumar, M. P. Paulraj, Ahmad Nazri Bin Abdullah, and Sazali Bin Yaacob. "A systematic approach for segmenting voiced/unvoiced signals using fuzzy-logic system and general fusion of neural network models for phonemes-based speech recognition." Journal of Intelligent & Fuzzy Systems 39, no. 5 (November 19, 2020): 7411–29. http://dx.doi.org/10.3233/jifs-200780.

Full text
Abstract:
In this paper, a speech-to-text translation model has been developed for Malaysian speakers based on 41 classes of phonemes. A simple data acquisition algorithm has been used to develop a MATLAB graphical user interface (GUI) for recording isolated-word speech signals from 35 non-native Malaysian speakers. The collected database consists of 86 words covering 41 phoneme classes: affricates, diphthongs, fricatives, liquids, nasals, semivowels and glides, stops, and vowels. The speech samples are preprocessed to eliminate undesirable artifacts, and a fuzzy voice classifier is employed to classify the samples into voiced and unvoiced sequences. The voiced sequences are divided into frame segments, and for each frame the linear predictive coefficient (LPC) features are obtained. The feature sets are then formed by deriving the LPC features from all the extracted voiced sequences and are used for classification. The isolated words, chosen based on the phonemes, are associated with the extracted features to establish the input-output mapping of the classification system. The data are then normalized and randomized to rearrange the values into a definite range. A multilayer neural network (MLNN) model has been developed with four combinations of input and hidden activation functions. The neural network models are trained with 60%, 70% and 80% of the total data samples over 25 trials, and the trained models are validated by simulating the networks with the remaining 40%, 30% and 20% of the set. The reliability of the trained network models is compared by measuring the true-positive rate, the false-negative rate, and the overall classification accuracy. The LPC features show better discrimination, and the MLNN models trained using the LPC spectral band features give better recognition.
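An illustrative sketch of the feature pipeline described in the abstract: frame a (here synthetic) voiced segment, extract linear predictive coefficients per frame, and train a multilayer perceptron on the resulting feature vectors. librosa and scikit-learn are assumed as tooling; the frame sizes, LPC order, and labels are guesses, not the paper's settings.

import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
sr = 16000
# Synthetic stand-in for one second of a voiced segment.
signal = np.sin(2 * np.pi * 180 * np.arange(sr) / sr) + 0.05 * rng.normal(size=sr)

# 25 ms frames with a 10 ms hop, then order-12 LPC per frame (leading 1 dropped).
frames = librosa.util.frame(signal, frame_length=400, hop_length=160)
lpc_feats = np.stack([librosa.lpc(frame, order=12)[1:] for frame in frames.T])

labels = rng.integers(0, 41, size=len(lpc_feats))     # stand-in phoneme labels
clf = MLPClassifier(hidden_layer_sizes=(64,), activation='logistic',
                    max_iter=500).fit(lpc_feats, labels)
print(lpc_feats.shape, clf.score(lpc_feats, labels))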
APA, Harvard, Vancouver, ISO, and other styles
49

Gupta, Divam, Tanmoy Chakraborty, and Soumen Chakrabarti. "GIRNet: Interleaved Multi-Task Recurrent State Sequence Models." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6497–504. http://dx.doi.org/10.1609/aaai.v33i01.33016497.

Full text
Abstract:
In several natural language tasks, labeled sequences are available in separate domains (say, languages), but the goal is to label sequences with mixed domain (such as code-switched text). Or, we may have available models for labeling whole passages (say, with sentiments), which we would like to exploit toward better position-specific label inference (say, target-dependent sentiment annotation). A key characteristic shared across such tasks is that different positions in a primary instance can benefit from different ‘experts’ trained from auxiliary data, but labeled primary instances are scarce, and labeling the best expert for each position entails unacceptable cognitive burden. We propose GIRNet, a unified position-sensitive multi-task recurrent neural network (RNN) architecture for such applications. Auxiliary and primary tasks need not share training instances. Auxiliary RNNs are trained over auxiliary instances. A primary instance is also submitted to each auxiliary RNN, but their state sequences are gated and merged into a novel composite state sequence tailored to the primary inference task. Our approach is in sharp contrast to recent multi-task networks like the cross-stitch and sluice networks, which do not control state transfer at such fine granularity. We demonstrate the superiority of GIRNet using three applications: sentiment classification of code-switched passages, part-of-speech tagging of code-switched text, and target position-sensitive annotation of sentiment in monolingual passages. In all cases, we establish new state-of-the-art performance beyond recent competitive baselines.
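A hedged sketch of the central idea: run auxiliary RNNs over a primary instance and gate/merge their state sequences, position by position, into a composite sequence for the primary task. The sizes and the simple softmax gate are simplifying assumptions, not the published GIRNet architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedInterleaving(nn.Module):
    def __init__(self, emb_dim=64, hidden_dim=64, n_aux=2, n_labels=5):
        super().__init__()
        self.aux_rnns = nn.ModuleList(
            [nn.GRU(emb_dim, hidden_dim, batch_first=True) for _ in range(n_aux)])
        self.gate = nn.Linear(emb_dim, n_aux)          # position-wise expert gate
        self.primary = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_labels)

    def forward(self, emb):
        # emb: (batch, seq_len, emb_dim) embeddings of the primary instance.
        aux_states = torch.stack([rnn(emb)[0] for rnn in self.aux_rnns], dim=2)
        gates = F.softmax(self.gate(emb), dim=-1).unsqueeze(-1)   # (B, L, n_aux, 1)
        composite = (gates * aux_states).sum(dim=2)               # merged state sequence
        h, _ = self.primary(composite)
        return self.out(h)                                        # per-position labels

emb = torch.randn(3, 9, 64)
print(GatedInterleaving()(emb).shape)   # torch.Size([3, 9, 5])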
APA, Harvard, Vancouver, ISO, and other styles
50

Zhao, Zhengqiao, Stephen Woloszynek, Felix Agbavor, Joshua Chang Mell, Bahrad A. Sokhansanj, and Gail L. Rosen. "Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network." PLOS Computational Biology 17, no. 9 (September 22, 2021): e1009345. http://dx.doi.org/10.1371/journal.pcbi.1009345.

Full text
Abstract:
Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a python package) and https://github.com/EESI/seq2att (a command line tool).
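A minimal, hypothetical sketch of the kind of architecture the abstract outlines: convolution over one-hot nucleotides, a bidirectional recurrent layer, and an attention layer that both pools the read and exposes which positions drove the prediction. This is not the Read2Pheno implementation; all sizes are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ReadClassifier(nn.Module):
    def __init__(self, n_classes=10, channels=64, hidden=64):
        super().__init__()
        self.conv = nn.Conv1d(4, channels, kernel_size=7, padding=3)
        self.rnn = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, one_hot_read):
        # one_hot_read: (batch, 4, read_len), A/C/G/T one-hot channels.
        h = F.relu(self.conv(one_hot_read)).transpose(1, 2)   # (B, L, C)
        h, _ = self.rnn(h)                                    # (B, L, 2H)
        w = F.softmax(self.attn(h).squeeze(-1), dim=1)        # per-base attention
        read_embedding = torch.bmm(w.unsqueeze(1), h).squeeze(1)
        return self.fc(read_embedding), w   # logits plus interpretable weights

reads = torch.randn(5, 4, 150)             # stand-in for one-hot 150 bp reads
logits, attn = ReadClassifier()(reads)
print(logits.shape, attn.shape)            # torch.Size([5, 10]) torch.Size([5, 150])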
APA, Harvard, Vancouver, ISO, and other styles
