Academic literature on the topic 'Bottleneck auto-encoder'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Bottleneck auto-encoder.'

Journal articles on the topic "Bottleneck auto-encoder"

1

Bous, Frederik, and Axel Roebel. "A Bottleneck Auto-Encoder for F0 Transformations on Speech and Singing Voice." Information 13, no. 3 (February 23, 2022): 102. http://dx.doi.org/10.3390/info13030102.

Abstract:
In this publication, we present a deep learning-based method to transform the f0 in speech and singing voice recordings. f0 transformation is performed by training an auto-encoder on the voice signal’s mel-spectrogram and conditioning the auto-encoder on the f0. Inspired by AutoVC/F0, we apply an information bottleneck to it to disentangle the f0 from its latent code. The resulting model successfully applies the desired f0 to the input mel-spectrograms and adapts the speaker identity when necessary, e.g., if the requested f0 falls out of the range of the source speaker/singer. Using the mean f0 error in the transformed mel-spectrograms, we define a disentanglement measure and perform a study over the required bottleneck size. The study reveals that to remove the f0 from the auto-encoder’s latent code, the bottleneck size should be smaller than four for singing and smaller than nine for speech. Through a perceptive test, we compare the audio quality of the proposed auto-encoder to f0 transformations obtained with a classical vocoder. The perceptive test confirms that the audio quality is better for the auto-encoder than for the classical vocoder. Finally, a visual analysis of the latent code for the two-dimensional case is carried out. We observe that the auto-encoder encodes phonemes as repeated discontinuous temporal gestures within the latent code.
2

Ullmann, Denis, Shideh Rezaeifar, Olga Taran, Taras Holotyak, Brandon Panos, and Slava Voloshynovskiy. "Information Bottleneck Classification in Extremely Distributed Systems." Entropy 22, no. 11 (October 30, 2020): 1237. http://dx.doi.org/10.3390/e22111237.

Abstract:
We present a new decentralized classification system based on a distributed architecture. This system consists of distributed nodes, each possessing its own dataset and computing modules, along with a centralized server, which provides probes for classification and aggregates the responses of the nodes for a final decision. Each node, with access to its own training dataset of a given class, is trained based on an auto-encoder system consisting of a fixed data-independent encoder, a pre-trained quantizer and a class-dependent decoder. Hence, these auto-encoders are highly dependent on the class probability distribution for which the reconstruction distortion is minimized. Conversely, when an encoding–quantizing–decoding node observes data from different distributions, unseen at training, there is a mismatch, and such a decoding is not optimal, leading to a significant increase in the reconstruction distortion. The final classification is performed at the centralized classifier, which votes for the class with the minimum reconstruction distortion. In addition to the system's applicability to applications facing big-data communication problems and/or requiring private classification, the above distributed scheme creates a theoretical bridge to the information bottleneck principle. The proposed system demonstrates very promising performance on basic datasets such as MNIST and Fashion-MNIST.
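The decision rule described in this abstract (each node reconstructs the probe through its own class-specific bottleneck, and the server picks the class with the smallest reconstruction distortion) can be illustrated with a minimal sketch. This is not the authors' system: the fixed encoder, pre-trained quantizer, and class-dependent decoder are swapped here for a one-dimensional linear auto-encoder fitted per class by power iteration, and the names `Node`, `distortion`, and `classify`, as well as the toy data, are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class Node:
    """Hypothetical node: a 1-D linear bottleneck auto-encoder for one class."""

    def __init__(self, label, samples, iters=50):
        self.label = label
        dim = len(samples[0])
        self.mean = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
        centered = [[s[i] - self.mean[i] for i in range(dim)] for s in samples]
        # Power iteration: estimate the direction of largest variance in the class data.
        d = [1.0 / (i + 1) for i in range(dim)]
        norm = math.sqrt(dot(d, d))
        d = [v / norm for v in d]
        for _ in range(iters):
            new = [0.0] * dim
            for c in centered:
                w = dot(c, d)
                new = [n + w * ci for n, ci in zip(new, c)]
            norm = math.sqrt(dot(new, new))
            if norm == 0.0:  # degenerate data: keep the current direction
                break
            d = [v / norm for v in new]
        self.direction = d

    def distortion(self, x):
        """Squared reconstruction error after passing x through the bottleneck."""
        c = [xi - mi for xi, mi in zip(x, self.mean)]
        z = dot(c, self.direction)                 # encode: a single scalar code
        recon = [z * di for di in self.direction]  # decode (in centered coordinates)
        return sum((ci - ri) ** 2 for ci, ri in zip(c, recon))


def classify(nodes, probe):
    """Server side: return the label of the node with minimum distortion."""
    return min(nodes, key=lambda n: n.distortion(probe)).label


# Toy data (assumed): class A lies along the x-axis, class B along the line x = 5.
nodes = [
    Node("A", [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]),
    Node("B", [(5.0, 0.0), (5.0, 1.0), (5.0, 2.0), (5.0, 3.0)]),
]
print(classify(nodes, (2.5, 0.1)))  # prints A
print(classify(nodes, (5.2, 4.0)))  # prints B
```

A probe drawn from a class the node was not fitted on falls far from that node's one-dimensional subspace, so its distortion is large; this is the mismatch effect the abstract relies on.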
3

Nguyen, Bao Quoc, Thang Tat Vu, and Mai Chi Luong. "Improving bottleneck features for Vietnamese large vocabulary continuous speech recognition system using deep neural networks." Journal of Computer Science and Cybernetics 31, no. 4 (January 3, 2016): 267. http://dx.doi.org/10.15625/1813-9663/31/4/5944.

Abstract:
In this paper, a pre-training method based on the denoising auto-encoder is investigated and shown to be a good model for initializing the bottleneck networks of a Vietnamese speech recognition system, resulting in better recognition performance than the previously reported base bottleneck features. The experiments are carried out on a dataset of speech from the Voice of Vietnam (VOV) channel. The results show that DBNF extraction for Vietnamese recognition reduces the word error rate by a relative 14% and 39% compared to the base bottleneck features and the MFCC baseline, respectively.
4

Wang, Mou, Xiao-Lei Zhang, and Susanto Rahardja. "An Unsupervised Deep Learning System for Acoustic Scene Analysis." Applied Sciences 10, no. 6 (March 19, 2020): 2076. http://dx.doi.org/10.3390/app10062076.

Abstract:
Acoustic scene analysis has attracted a lot of attention recently. Existing methods are mostly supervised, which requires well-predefined acoustic scene categories and accurate labels. In practice, there exists a large amount of unlabeled audio data, but labeling large-scale data is not only costly but also time-consuming. Unsupervised acoustic scene analysis on the other hand does not require manual labeling but is known to have significantly lower performance and therefore has not been well explored. In this paper, a new unsupervised method based on deep auto-encoder networks and spectral clustering is proposed. It first extracts a bottleneck feature from the original acoustic feature of audio clips by an auto-encoder network, and then employs spectral clustering to further reduce the noise and unrelated information in the bottleneck feature. Finally, it conducts hierarchical clustering on the low-dimensional output of the spectral clustering. To fully utilize the spatial information of stereo audio, we further apply the binaural representation and conduct joint clustering on that. To the best of our knowledge, this is the first time that a binaural representation is being used in unsupervised learning. Experimental results show that the proposed method outperforms the state-of-the-art competing methods.
5

Nguyen, VietHung, and V.T. Pham. "Gear fault monitoring based on unsupervised feature dimensional reduction and optimized LSSVM-BSOA machine learning model." Journal of Mechanical Engineering and Sciences 16, no. 1 (March 23, 2022): 8653–61. http://dx.doi.org/10.15282/jmes.16.1.2022.01.0684.

Abstract:
With the development of Industry 4.0, big data from system operation is significant for analyzing, predicting, or identifying possible problems. This study proposes a new diagnosis technique for identifying the vibration signal, which combines a feature dimensional reduction method with an optimized classifier. Firstly, an auto-encoder feature dimensional reduction (AE-FDR) method is constructed with a bottleneck hidden layer to extract low-dimensional features. Secondly, a supervised classifier is formed to carry out fine-tuning and classification. The least squares support vector machine (LSSVM) is used as the base classifier, with its parameters optimized by the backtracking search optimisation algorithm (BSOA). This LSSVM-BSOA is used to identify the gear fault based on the original vibration data. The proposed AE-FDR-LSSVM-BSOA diagnosis technique shows a good ability to identify the gear fault. A helical gear with three fault states is tested to evaluate this method. The diagnosis achieves a high accuracy of 93.3%.
6

Mahesh T R, V Vivek, and Vinoth Kumar. "Implementation of Machine Learning-Based Data Mining Techniques for IDS." International Journal of Information Technology, Research and Applications 2, no. 1 (March 31, 2023): 7–13. http://dx.doi.org/10.59461/ijitra.v2i1.23.

Abstract:
The internet is essential for ongoing contact in the modern world, yet its effectiveness might lessen the effect known as intrusions. Any action that negatively affects the targeted system is considered an intrusion. Network security has grown to be a major issue as a result of the Internet's rapid expansion. The network intrusion detection system (IDS), which is widely used, is the primary security defence mechanism against such hostile assaults. Data mining and machine learning technologies have been extensively employed in network intrusion detection and prevention systems to extract user behaviour patterns from network traffic data. Association rules and sequence rules are the main foundations of data mining used for intrusion detection. Given the bottleneck of frequent itemset mining in the traditional auto-encoder algorithm, we provide a Length-Decreasing Support to Identify Intrusion based on Data Mining, an upgraded data mining technique based on machine learning for IDS. Test results suggest that the proposed strategy is successful.
7

Kadam, Sanjay Shahaji, Jotiram Krishna Deshmukh, Ankush Madhukar Gund, Sudam Vasant Nikam, Vijay Balaso Mane, and Dayanand Raghoba Ingle. "Secure Multi-Path Selection with Optimal Controller Placement Using Hybrid Software-Defined Networks with Optimization Algorithm." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 8 (September 20, 2023): 145–59. http://dx.doi.org/10.17762/ijritcc.v11i8.7932.

Abstract:
The Internet's growth in popularity requires computer networks to be both agile and resilient. Traditional networking systems have recently been unable to satisfy these needs. Software-Defined Networking (SDN) is regarded as a paradigm shift in the networking industry. Many organizations use SDN because of its transmission efficiency. Striking the right balance between SDN and legacy switching capabilities will enable successful network scenarios in architecture networks. Therefore, this work considers a scenario for a hybrid network in which the external perimeter transport device is replaced with an SDN device in the service provider network. With the move away from older networks to SDN, hybrid SDN includes both legacy and SDN switches. Existing models of SDN have limitations such as overfitting, local-optimum trapping, and poor path selection efficiency. This paper proposes a Deep Kronecker Neural Network (DKNN) with a moderate optimization method to improve the efficiency of multipath selection in SDN. Dynamic resource scheduling is used for the reward function, and the learning performance is improved by the deep reinforcement learning (DRL) technique. The controller of a centralised SDN acts as the network brain in the control plane. Selecting the best SDN controller is among the most important duties. The controller is vulnerable to intrusions and can become a network bottleneck. This study presents an intrusion detection system (IDS) based on the SDN model that runs as an application module within the controller. To this end, the study proposes feature extraction and classification using a contractive auto-encoder with a triple attention-based classifier. Additionally, this study leverages OpenDayLight (ODL), one of the best-performing SDN controllers and the basis of many other SDN controllers, which provides an open northbound API and supports multiple southbound protocols.
The multi-controller placement problem (CPP) is therefore one of the main issues that must be addressed in an SDN setting, specifically when aspects such as interruption, ability, authenticity, and load distribution are considered. Introducing the scenario concept, CPP is formulated as a robust optimization problem that considers changes in network status due to power outages, controller capacity, load fluctuations, and changes in switch demand. To improve network performance, the optimal number of controller placements is determined by simulated annealing with the modified Dragonfly optimization algorithm (MDOA) on different topologies.
8

ÇETİN, Yarkın Deniz, and Ramazan Gökberk CİNBİŞ. "Attentive Sequential Auto-Encoding Towards Unsupervised Object-centric Scene Modeling." Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji, November 15, 2022. http://dx.doi.org/10.29109/gujsc.1139701.

Abstract:
This paper describes an unsupervised sequential auto-encoding model targeting multi-object scenes. The proposed model uses an attention-based formulation, with reconstruction-driven losses. The main model relies on iteratively writing regions onto a canvas, in a differentiable manner. To enforce attention to objects and/or parts, the model uses a convolutional localization network, a region level bottleneck auto-encoder and a loss term that encourages reconstruction within a limited number of iterations. An extended version of the model incorporates a background modeling component that aims at handling scenes with complex backgrounds. The model is evaluated on two separate datasets: a synthetic dataset that is constructed by composing MNIST digit instances together, and the MS-COCO dataset. The model achieves high reconstruction ability on MNIST based scenes. The extended model shows promising results on the complex and challenging MS-COCO scenes.
9

Song, Junjie, Rong Huang, Yujia Tian, and Aihua Dong. "Pre-Activating Semantic Information for Image Aesthetic Assessment." AATCC Journal of Research, February 5, 2023, 247234442211479. http://dx.doi.org/10.1177/24723444221147971.

Abstract:
Automatic image aesthetic evaluation is an attractive and challenging visual task. Recently, methods based on convolutional neural networks have achieved remarkable performance. However, semantic information, an intuitive prerequisite for evaluating image aesthetics, has not received enough attention regarding its importance in previous methods. How to efficiently extract semantic information and make better use of it to assist the aesthetic evaluation task remains unsolved. In this article, we propose to utilize the self-supervised model Auto-Encoder to extract semantic information in the form of multi-task learning. Then, a fusing module is prepended at the bottleneck layer to explicitly combine semantic information with aesthetic information in a pre-activated manner. Specifically, we implement a customized pooling operation to pool the semantic features extracted by Auto-Encoder and apply a weak constraint between the pooled semantic features and aesthetic information to realize the combination. The following regressor can complete aesthetic evaluation based on the semantic–aesthetic combined features. In addition, to enable our model to adapt to arbitrary aspect ratios of images, another pooling strategy called spatial pyramid pooling is adopted to obtain the image features of a fixed length. Our method achieves competitive performance on the public image aesthetic evaluation benchmark. Especially on the most commonly used metric Spearman rank-order correlation coefficient, the proposed model achieved the best performance compared with some state-of-the-art methods. Extensive ablation studies and visualization experiments were conducted to demonstrate the effectiveness of our method.

Dissertations / Theses on the topic "Bottleneck auto-encoder"

1

Bous, Frederik. "A neural voice transformation framework for modification of pitch and intensity." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS382.

Abstract:
Human voice has been a great source of fascination and an object of research for over 100 years. During that time numerous technologies have sprouted around the voice, such as the vocoder, which provides a parametric representation of the voice, commonly used for voice transformation. From this tradition, the limitations of purely signal processing based approaches are evident: To create meaningful transformations the codependencies between different voice properties have to be understood well and modelled precisely. Modelling these correlations with heuristics obtained by empiric studies is not sufficient to create natural results. It is necessary to extract information about the voice systematically and use this information during the transformation process automatically. Recent advances in computer hardware permit this systematic analysis of data by means of machine learning. This thesis thus uses machine learning to create a neural voice transformation framework. The proposed neural voice transformation framework works in two stages: First a neural vocoder allows mapping between a raw audio and a mel-spectrogram representation of voice signals. Secondly, an auto-encoder with information bottleneck allows disentangling various voice properties from the remaining information. The auto-encoder allows changing one voice property while automatically adjusting the remaining voice properties. In the first part of this thesis, we discuss different approaches to neural vocoding and reason why the mel-spectrogram is better suited for neural voice transformations than conventional parametric vocoder spaces. In the second part we discuss the information bottleneck auto-encoder. The auto-encoder creates a latent code that is independent of its conditional input. Using the latent code the synthesizer can perform the transformation by combining the original latent code with a modified parameter curve. 
We transform the voice using two control parameters: the fundamental frequency and the voice level. Transformation of the fundamental frequency is an objective with a long history. Using the fundamental frequency allows us to compare our approach to existing techniques and study how the auto-encoder models the dependency on other properties in a well-known environment. For the voice level, we face the problem that annotations hardly exist. Therefore, we first provide a new estimation technique for voice level in large voice databases, and subsequently use the voice level annotations to train a bottleneck auto-encoder that allows changing the voice level.

Conference papers on the topic "Bottleneck auto-encoder"

1

Sainath, Tara N., Brian Kingsbury, and Bhuvana Ramabhadran. "Auto-encoder bottleneck features using deep belief networks." In ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2012. http://dx.doi.org/10.1109/icassp.2012.6288833.

2

Koike-Akino, Toshiaki, and Ye Wang. "Stochastic Bottleneck: Rateless Auto-Encoder for Flexible Dimensionality Reduction." In 2020 IEEE International Symposium on Information Theory (ISIT). IEEE, 2020. http://dx.doi.org/10.1109/isit44484.2020.9174523.

3

Abolhasanzadeh, Bahareh. "Nonlinear dimensionality reduction for intrusion detection using auto-encoder bottleneck features." In 2015 7th Conference on Information and Knowledge Technology (IKT). IEEE, 2015. http://dx.doi.org/10.1109/ikt.2015.7288799.
