Dissertations / Theses on the topic 'Machines de Boltzmann restreintes'
Consult the top 44 dissertations / theses for your research on the topic 'Machines de Boltzmann restreintes.'
Fissore, Giancarlo. "Generative modeling : statistical physics of Restricted Boltzmann Machines, learning with missing information and scalable training of Linear Flows." Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG028.
Full text
Neural network models able to approximate and sample high-dimensional probability distributions are known as generative models. In recent years this class of models has received tremendous attention due to its potential for automatically learning meaningful representations of the vast amount of data that we produce and consume daily. This thesis presents theoretical and algorithmic results pertaining to generative models and is divided into two parts. In the first part, we focus our attention on the Restricted Boltzmann Machine (RBM) and its statistical physics formulation. Historically, statistical physics has played a central role in studying the theoretical foundations of neural network models and in providing inspiration for them. The first neural implementation of an associative memory (Hopfield, 1982) is a seminal work in this context. The RBM can be regarded as a development of the Hopfield model, and it is of particular interest due to its role at the forefront of the deep learning revolution (Hinton et al., 2006). Exploiting its statistical physics formulation, we derive a mean-field theory of the RBM that lets us characterize both its functioning as a generative model and the dynamics of its training procedure. This analysis proves useful in deriving a robust mean-field imputation strategy that makes it possible to use the RBM to learn empirical distributions in the challenging case in which the dataset to model is only partially observed and presents high percentages of missing information. In the second part we consider a class of generative models known as Normalizing Flows (NF), whose distinguishing feature is the ability to model complex high-dimensional distributions by employing invertible transformations of a simple, tractable distribution.
The invertibility of the transformation allows the probability density to be expressed through a change of variables whose optimization by Maximum Likelihood (ML) is rather straightforward but computationally expensive. The common practice is to impose architectural constraints on the class of transformations used for NF in order to make the ML optimization efficient. Proceeding from geometrical considerations, we propose a stochastic gradient descent optimization algorithm that exploits the matrix structure of fully connected neural networks without imposing any constraints on their structure other than the fixed dimensionality required by invertibility. This algorithm is computationally efficient and can scale to very high-dimensional datasets. We demonstrate its effectiveness in training a multilayer nonlinear architecture employing fully connected layers.
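As a concrete illustration of the change-of-variables principle this thesis builds on, the following is a minimal sketch of maximum-likelihood training of a single invertible linear map z = Wx with a standard-normal base density. It is only an illustration, not the stochastic-gradient algorithm proposed in the thesis; all sizes, seeds and learning rates are assumptions.

```python
import numpy as np

# Hedged sketch: fit an invertible linear flow z = W x by maximizing
#   log p(x) = log N(Wx; 0, I) + log|det W|   (change of variables).
rng = np.random.default_rng(1)
d = 2
A = np.array([[2.0, 0.0], [1.5, 0.5]])    # ground-truth mixing matrix
x = rng.standard_normal((2000, d)) @ A.T  # correlated training data

W = np.eye(d)
lr = 0.05
for _ in range(300):
    z = x @ W.T
    # Gradient of the mean log-likelihood with respect to W:
    #   -E[z x^T] + W^{-T}
    grad = -(z.T @ x) / len(x) + np.linalg.inv(W).T
    W += lr * grad                        # gradient ascent on the LL

z = x @ W.T
cov = z.T @ z / len(z)  # approaches the identity once the flow whitens x
```

At the optimum the flow whitens the data (W Σ Wᵀ = I), which is exactly the maximum-likelihood solution for a linear flow with a Gaussian base.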
Hasasneh, Ahmad. "Robot semantic place recognition based on deep belief networks and a direct use of tiny images." Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00960289.
Full text
Svoboda, Jiří. "Multi-modální "Restricted Boltzmann Machines"." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236426.
Full text
TICKNOR, ANTHONY JAMES. "OPTICAL COMPUTING IN BOLTZMANN MACHINES." Diss., The University of Arizona, 1987. http://hdl.handle.net/10150/184169.
Full text
Camilli, Francesco. "Statistical mechanics perspectives on Boltzmann machines." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19302/.
Full text
CRUZ, FELIPE JOAO PONTES DA. "RECOMMENDER SYSTEMS USING RESTRICTED BOLTZMANN MACHINES." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2016. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=30285@1.
Full textCOORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
Recommender systems can be used in many real-world problems. Many models have been proposed to solve the problem of predicting missing entries in a specific dataset. Two of the most common approaches are neighborhood-based collaborative filtering and latent factor models. A more recent alternative was proposed in 2007 by Salakhutdinov, using Restricted Boltzmann Machines. This model belongs to the family of latent factor models, in which we model latent factors over the data using hidden binary units. RBMs have been shown to approximate solutions trained with a traditional matrix factorization model. In this work we will revisit this model and carefully detail how to model and train RBMs for the problem of missing-rating prediction.
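To make the latent-factor picture concrete, here is a minimal, hypothetical sketch of a binary RBM trained with one step of Contrastive Divergence (CD-1) on a toy "liked / not liked" matrix. It is not the thesis' code; the sizes, names and toy data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.a = np.zeros(n_visible)  # visible (item) biases
        self.b = np.zeros(n_hidden)   # hidden (latent factor) biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.a)

    def cd1_step(self, v0, lr=0.1):
        ph0 = self.hidden_probs(v0)                      # positive phase
        h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hiddens
        pv1 = self.visible_probs(h0)                      # one Gibbs step
        ph1 = self.hidden_probs(pv1)                      # negative phase
        n = len(v0)
        # CD-1 update: data statistics minus one-step model statistics.
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.a += lr * (v0 - pv1).mean(axis=0)
        self.b += lr * (ph0 - ph1).mean(axis=0)

# Toy users-by-items matrix: items 0 and 3 are popular, 2 and 5 are not.
item_popularity = np.array([0.9, 0.8, 0.1, 0.9, 0.2, 0.1])
data = (rng.random((50, 6)) < item_popularity).astype(float)

rbm = RBM(n_visible=6, n_hidden=3)
for _ in range(200):
    rbm.cd1_step(data)

# "Prediction": reconstruct each user's ratings from the latent factors.
recon = rbm.visible_probs(rbm.hidden_probs(data))
```

The reconstruction probabilities play the role of predicted ratings: a missing entry is scored by how strongly the learned latent factors support it.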
Moody, John Matali. "Process monitoring with restricted Boltzmann machines." Thesis, Stellenbosch : Stellenbosch University, 2014. http://hdl.handle.net/10019.1/86467.
Full text
ENGLISH ABSTRACT: Process monitoring and fault diagnosis are used to detect abnormal events in processes. The early detection of such events or faults is crucial to continuous process improvement. Although principal component analysis and partial least squares are widely used for process monitoring and fault diagnosis in the metallurgical industries, these models are linear in principle; nonlinear approaches should provide more compact and informative models. The use of auto-associative neural networks, or autoencoders, provides a principled approach for process monitoring. However, until very recently, these multiple-layer neural networks have been difficult to train and have therefore not been used to any significant extent in process monitoring. With newly proposed algorithms based on the pre-training of the layers of the neural networks, it is now possible to train neural networks with very complex structures, i.e. deep neural networks. These neural networks can be used as autoencoders to extract features from high-dimensional data. In this study, the application of deep autoencoders in the form of Restricted Boltzmann Machines (RBMs) to the extraction of features from process data is considered. These networks have mostly been used for data visualization to date and have not yet been applied in the context of fault diagnosis or process monitoring. The objective of this investigation is therefore to assess the feasibility of using Restricted Boltzmann Machines in various fault detection schemes. The use of RBMs in process monitoring schemes is discussed, together with the application of these models in automated control frameworks.
LACAILLE, JEROME. "Machines de Boltzmann. Théorie et applications." Paris 11, 1992. http://www.theses.fr/1992PA112213.
Full text
Swersky, Kevin. "Inductive principles for learning Restricted Boltzmann Machines." Thesis, University of British Columbia, 2010. http://hdl.handle.net/2429/27816.
Full textFarguell, Matesanz Enric. "A new approach to Decimation in High Order Boltzmann Machines." Doctoral thesis, Universitat Ramon Llull, 2011. http://hdl.handle.net/10803/9155.
Full text
The Boltzmann Machine (BM) is a stochastic neural network with the ability of both learning and extrapolating probability distributions. However, it has never been as widely used as other neural networks such as the perceptron, due to the complexity of both the learning and recalling algorithms, and to the high computational cost required in the learning process: the quantities that are needed at the learning stage are usually estimated by Monte Carlo (MC) through the Simulated Annealing (SA) algorithm. This has led to a situation where the BM is rather considered as an evolution of the Hopfield Neural Network or as a parallel implementation of the Simulated Annealing algorithm.
Despite this relative lack of success, the neural network community has continued to progress in the analysis of the dynamics of the model. One remarkable extension is the High Order Boltzmann Machine (HOBM), where weights can connect more than two neurons at a time. Although the learning capabilities of this model have already been discussed by other authors, a formal equivalence between the weights in a standard BM and the high order weights in a HOBM has not yet been established.
We analyze this latter equivalence between a second-order BM and a HOBM by proposing an extension of the method known as decimation. Decimation is a common tool in statistical physics that may be applied to some kinds of BM to obtain analytical expressions for the n-unit correlation elements required in the learning process. In this way, decimation avoids using the time-consuming Simulated Annealing algorithm. However, as it was first conceived, it could only deal with sparsely connected neural networks. The extension that we define in this thesis allows the same quantities to be computed irrespective of the topology of the network. This method is based on adding enough high-order weights to a standard BM to guarantee that the system can be solved.
Next, we establish a direct equivalence between the weights of a HOBM model, the probability distribution to be learnt and Hadamard matrices. The properties of these matrices can be used to easily calculate the value of the weights of the system. Finally, we define a standard BM with a very specific topology that helps us better understand the exact equivalence between hidden units in a BM and high order weights in a HOBM.
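A sketch of the two energy functions whose equivalence is at stake here, for binary units s_i with standard weights w_ij and high-order weights w_ijk (notation assumed, not taken from the thesis):

```latex
E_{\mathrm{BM}}(\mathbf{s}) = -\sum_{i<j} w_{ij}\, s_i s_j - \sum_i b_i s_i ,
\qquad
E_{\mathrm{HOBM}}(\mathbf{s}) = -\sum_{i<j} w_{ij}\, s_i s_j
  - \sum_{i<j<k} w_{ijk}\, s_i s_j s_k - \cdots - \sum_i b_i s_i ,
```

with equilibrium distribution p(s) ∝ exp(-E(s)) in both cases; decimation relates hidden units acting through second-order weights in the BM to explicit high-order weights in the HOBM.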
Bertholds, Alexander, and Emil Larsson. "An intelligent search for feature interactions using Restricted Boltzmann Machines." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-202208.
Full text
Klarna uses logistic regression to estimate the probability that an e-commerce customer will not pay their invoices after being granted credit. Logistic regression is a linear model and therefore cannot detect non-linearities in the data. The goal of this project has been to develop a program that can be used to find suitable non-linear interaction variables. By introducing these into the logistic regression it becomes possible to capture non-linearities in the data and thereby improve the probability estimates. The developed program uses Restricted Boltzmann Machines, a type of unsupervised neural network, whose hidden nodes can be used to model the distribution of the data. By using the hidden nodes in the logistic regression it is possible to see which parts of the distribution are most important for the probability estimates. The content of the hidden nodes, which correspond to different parts of the data distribution, can be used to find suitable interaction variables. It was possible to capture the distribution of the data using a Restricted Boltzmann Machine, and its hidden nodes improved the probability estimates of the logistic regression. The hidden nodes could be used to create interaction variables that improve Klarna's internal credit risk models.
Tubiana, Jérôme. "Restricted Boltzmann machines : from compositional representations to protein sequence analysis." Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE039/document.
Full text
Restricted Boltzmann machines (RBM) are graphical models that jointly learn a probability distribution and a representation of data. Despite their simple architecture, they can learn very complex data distributions, such as the MNIST handwritten digit database. Moreover, they are empirically known to learn compositional representations of data, i.e. representations that effectively decompose configurations into their constitutive parts. However, not all variants of RBM perform equally well, and few theoretical arguments exist for these empirical observations. In the first part of this thesis, we ask how such a simple model can learn such complex probability distributions and representations. By analyzing an ensemble of RBM with random weights using the replica method, we have characterized a compositional regime for RBM, and shown under which conditions (statistics of weights, choice of transfer function) it can and cannot arise. Both the qualitative and quantitative predictions obtained with our theoretical analysis are in agreement with observations from RBM trained on real data. In the second part, we present an application of RBM to protein sequence analysis and design. Owing to their large size, it is very difficult to run physical simulations of proteins, and to predict their structure and function. It is however possible to infer information about a protein's structure from the way its sequence varies across organisms. For instance, Boltzmann Machines can leverage correlations of mutations to predict spatial proximity of the sequence amino-acids. Here, we have shown on several synthetic and real protein families that, provided a compositional regime is enforced, RBM can go beyond structure and extract extended motifs of coevolving amino-acids that reflect phylogenetic, structural and functional constraints within proteins. Moreover, RBM can be used to design new protein sequences with putative functional properties by recombining these motifs at will.
Lastly, we have designed new training algorithms and model parametrizations that significantly improve RBM generative performance, to the point where they can compete with state-of-the-art generative models such as Generative Adversarial Networks or Variational Autoencoders on medium-scale data.
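For reference, the RBM recurring throughout these abstracts is defined by a joint energy over visible units v and hidden units h, whose bipartite structure gives factorized conditionals (standard textbook notation, not any one thesis' exact conventions):

```latex
E(\mathbf{v},\mathbf{h}) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j ,
\qquad
P(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_i W_{ij} v_i\Big),
```

with σ(x) = 1/(1+e^{-x}) and P(v, h) ∝ exp(-E(v, h)); the hidden units h_j play the role of the composable parts in the compositional regime discussed above.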
Huhnstock, Nikolas. "Evaluation of label incorporated recommender systems : Based on restricted boltzmann machines." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-12609.
Full text
Desai, Soham Jayesh. "Hardware implementation of re-configurable Restricted Boltzmann Machines for image recognition." Thesis, Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/53548.
Full text
Tran, Son. "Representation decomposition for knowledge extraction and sharing using restricted Boltzmann machines." Thesis, City University London, 2016. http://openaccess.city.ac.uk/14423/.
Full text
Berg, Markus. "Modeling the Term Structure of Interest Rates with Restricted Boltzmann Machines." Thesis, KTH, Matematisk statistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-229486.
Full text
This thesis investigates whether Gaussian restricted Boltzmann machines can be used to model the yield curve based on Swedish data. The tested models are evaluated on their ability to predict the next day's yield curve and on their ability to generate long-term yield curve scenarios. The results are compared with simple benchmark models, such as assuming a random walk. The effect of preprocessing the input data with principal component analysis is also examined. The results show that the ability to predict the next day's yield curve, measured by mean squared error, is comparable to assuming a random walk, both in-sample and out-of-sample. The ability to generate long-term scenarios shows promising results, based on visual properties and on the ability to make one-year predictions for semi-out-of-sample data. The main focus of the thesis is not to optimize the performance of the models, but rather to serve as an introduction to how the yield curve can be modeled with Gaussian restricted Boltzmann machines.
McCoppin, Ryan R. "An Evolutionary Approximation to Contrastive Divergence in Convolutional Restricted Boltzmann Machines." Wright State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=wright1418750414.
Full text
Hultin, Hanna. "Image Classification Using a Combination of Convolutional Layers and Restricted Boltzmann Machines." Thesis, KTH, Skolan för teknikvetenskap (SCI), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-168005.
Full text
This study aims to investigate what impact restricted Boltzmann machines (RBMs) have when combined with a convolutional neural network (CNN) used for image classification. This is an interesting area of research which combines supervised and unsupervised training of neural networks and has not yet been thoroughly examined. Different versions of neural networks were trained and tested using two datasets consisting of 70,000 handwritten digits and 60,000 natural images. The starting point was a regular CNN whose first layer was then replaced by two different kinds of RBMs. To evaluate the effect of the RBMs, the error rates and training times were compared. The results show that the combination of RBMs and CNNs can work if implemented right and can be used in different applications. There is still much left to investigate, since this study was limited by the available computational power.
Reichert, David Paul. "Deep Boltzmann machines as hierarchical generative models of perceptual inference in the cortex." Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/8300.
Full text
Taskin, Kemal. "A Study On Identifying Makams With A Modified Boltzmann Machine." Master's thesis, METU, 2005. http://etd.lib.metu.edu.tr/upload/12606296/index.pdf.
Full text
Santos, Daniel Felipe Silva [UNESP]. "Reconhecimento de veículos em imagens coloridas utilizando máquinas de Boltzmann profundas e projeção bilinear." Universidade Estadual Paulista (UNESP), 2017. http://hdl.handle.net/11449/151478.
Full textApproved for entry into archive by Luiz Galeffi (luizgaleffi@gmail.com) on 2017-08-29T20:19:13Z (GMT) No. of bitstreams: 1 santos_dfs_me_sjrp.pdf: 3800862 bytes, checksum: 46f12ff55f4e0680833b9b1b184ad505 (MD5)
Made available in DSpace on 2017-08-29T20:19:13Z (GMT). No. of bitstreams: 1 santos_dfs_me_sjrp.pdf: 3800862 bytes, checksum: 46f12ff55f4e0680833b9b1b184ad505 (MD5) Previous issue date: 2017-08-14
In this work a vehicle recognition method for color images is proposed, based on a Multilayer Perceptron neural network pre-trained through deep learning techniques (one technique composed of Deep Boltzmann Machines and bilinear projections, the other composed of Multinomial Deep Boltzmann Machines and bilinear projections). This proposition is justified by the increasing demand in the Traffic Engineering area for Intelligent Transportation Systems. In order to create a robust vehicle recognizer, the proposal is to use the unsupervised inferential training method of Contrastive Divergence together with the Mean Field inferential method to train multiple instances of deep models. In the local pre-training phase of the proposed method, bilinear projections are used to reduce the number of nodes of the neural network. The combination of the separately trained deep models constitutes the final recognizer's architecture, which is then globally pre-trained through Mean Field. In the last phase of training, the Multilayer Perceptron neural network is initialized with the globally pre-trained parameters and, from this point, a supervised training process starts using second-order conjugate gradient. The proposed method was evaluated on the BIT-Vehicle database of frontal images of vehicles collected from a real road traffic environment. The best results obtained by the proposed method using multinomial deep models were 81.83% mean accuracy on the augmented version of the original database and 91.10% on the augmented version of the combined database (Cars, Trucks and Buses). For the non-multinomial deep models, the best results were 81.42% on the augmented version of the original database and 91.13% on the augmented version of the combined database.
A significant decrease in the training times of the multinomial and non-multinomial deep models was also observed with the application of the bilinear projection; in the best case, the execution time of the proposed method was 5.5 times lower than that of the deep models that did not use bilinear projection.
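The node reduction described above can be illustrated with a tiny sketch of a bilinear projection: a 2-D activation map X is shrunk on both sides by two thin matrices, X' = UᵀXV. The shapes below are illustrative assumptions, not the thesis' configuration.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p, q = 32, 32, 8, 8
X = rng.standard_normal((m, n))  # one activation map: 32 x 32 = 1024 nodes
U = rng.standard_normal((m, p))  # left projection
V = rng.standard_normal((n, q))  # right projection
X_small = U.T @ X @ V            # 8 x 8 = 64 nodes after the projection
```

Because the map is projected row-wise and column-wise separately, the parameter count grows as m·p + n·q rather than (m·n)·(p·q), which is what makes pre-training the reduced layers cheaper.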
Klein, Jacques-Olivier. "Contribution à l'étude de l'adéquation algorithme-architecture : machines de Boltzmann et circuits analogiques cellulaires." Paris 11, 1995. http://www.theses.fr/1995PA112009.
Full text
Gardella, Christophe. "Structure et sensibilité des réponses de populations de neurones dans la rétine." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066603/document.
Full text
Ganglion cells form the output of the retina: they transfer visual information from the eye to the brain. How they represent information is still debated. Their responses to visual stimuli are highly nonlinear, exhibit strong correlations between neurons, and some information is only present at the population level. I first study the structure of population responses. Recent studies have shown that cortical cells are influenced by the summed activity of neighboring neurons. However, a model for these interactions was still lacking. I describe a model of population activity that reproduces the coupling between each cell and the population activity. Neurons in the salamander retina are found to depend in unexpected ways on the population activity. I then describe a method to characterize the sensitivity of rat retinal neurons to perturbations of a stimulus. Closed-loop experiments are used to explore selectively the space of perturbations around a given stimulus. I show that responses to small perturbations can be described by a local linearization of their probability, and that their sensitivity exhibits signatures of efficient coding. Finally, I show how the sensitivity of neural populations can be estimated from response structure. I show that Restricted Boltzmann Machines (RBMs) are accurate models of neural correlations. To measure the discrimination power of neural populations, I search for a neural metric such that responses to different stimuli are far apart and responses to the same stimulus are close. I show that RBMs provide such neural metrics, and outperform classical metrics at discriminating small stimulus perturbations.
Wang, Nan [Verfasser], Laurenz [Akademischer Betreuer] Wiskott, and Sen [Akademischer Betreuer] Cheng. "Learning natural image statistics with variants of restricted Boltzmann machines / Nan Wang. Gutachter: Laurenz Wiskott ; Sen Cheng." Bochum : Ruhr-Universität Bochum, 2016. http://d-nb.info/1089006179/34.
Lafargue, Vincent. "Contribution a la realisation de reseaux de neurones formels : integration mixte de l'apprentissage des machines de boltzmann." Paris 11, 1993. http://www.theses.fr/1993PA112012.
ZHU, YIMING. "Contribution a la realisation electronique de reseaux de neurones formels : integration analogique de l'apprentissage des machines de boltzmann." Paris 11, 1995. http://www.theses.fr/1995PA112008.
Schneider, C. "Using unsupervised machine learning for fault identification in virtual machines." Thesis, University of St Andrews, 2015. http://hdl.handle.net/10023/7327.
Kivinen, Jyri Juhani. "Statistical models for natural scene data." Thesis, University of Edinburgh, 2014. http://hdl.handle.net/1842/8879.
Silva, Luis Alexandre da [UNESP]. "Aprendizado não-supervisionado de características para detecção de conteúdo malicioso." Universidade Estadual Paulista (UNESP), 2016. http://hdl.handle.net/11449/144635.
Made available in DSpace on 2016-11-16 (GMT). Previous issue date: 2016-08-25
Feature learning has been one of the main challenges for techniques based on Artificial Neural Networks (ANNs), especially when dealing with a large number of samples and of the features that describe them. Techniques derived from Restricted Boltzmann Machines (RBMs) remain little explored in this field, particularly in the area of computer network security. This work explores such techniques for unsupervised feature learning aimed at detecting malicious content, specifically in computer network security. Experiments were conducted using RBM-based techniques for unsupervised feature learning, combined with meta-heuristics based on optimization algorithms, targeting the detection of spam in e-mail messages. The results show that, with a smaller number of features, accuracy similar to that obtained on the original datasets can be achieved with a shorter training time, indicating that RBM-based learning techniques are well suited to feature learning in the context of this work.
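As a rough illustration of the kind of RBM-based feature learner this abstract describes (a minimal sketch, not the thesis's actual implementation; data, layer sizes and hyper-parameters are invented), the following NumPy code trains a small Bernoulli-Bernoulli RBM with one-step contrastive divergence (CD-1) and uses the hidden-unit probabilities as a reduced feature representation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible bias
        self.c = np.zeros(n_hidden)    # hidden bias
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0, lr=0.1):
        # positive phase: hidden probabilities and a binary sample
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        # negative phase: one Gibbs step back down and up
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        # gradient approximation, averaged over the mini-batch
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b += lr * (v0 - pv1).mean(axis=0)
        self.c += lr * (ph0 - ph1).mean(axis=0)

# toy binary data standing in for bag-of-words spam features
rng = np.random.default_rng(1)
data = (rng.random((16, 8)) < 0.3).astype(float)
rbm = RBM(n_visible=8, n_hidden=3)
for _ in range(100):
    rbm.cd1_step(data)
features = rbm.hidden_probs(data)  # compressed 3-dimensional representation
```

Replacing the toy matrix with real message features would yield the kind of lower-dimensional representation that the experiments feed to a classifier.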
Upadhya, Vidyadhar. "Efficient Algorithms for Learning Restricted Boltzmann Machines." Thesis, 2020. https://etd.iisc.ac.in/handle/2005/4840.
Schneider, Roland. "Deterministic Boltzmann machines : learning instabilities and hardware implications." 1993. http://hdl.handle.net/1993/9699.
Ly, Daniel Le. "A High-performance, Reconfigurable Architecture for Restricted Boltzmann Machines." Thesis, 2009. http://hdl.handle.net/1807/18805.
Desjardins, Guillaume. "Improving sampling, optimization and feature extraction in Boltzmann machines." Thèse, 2013. http://hdl.handle.net/1866/10550.
Full textDespite the current widescale success of deep learning in training large scale hierarchical models through supervised learning, unsupervised learning promises to play a crucial role towards solving general Artificial Intelligence, where agents are expected to learn with little to no supervision. The work presented in this thesis tackles the problem of unsupervised feature learning and density estimation, using a model family at the heart of the deep learning phenomenon: the Boltzmann Machine (BM). We present contributions in the areas of sampling, partition function estimation, optimization and the more general topic of invariant feature learning. With regards to sampling, we present a novel adaptive parallel tempering method which dynamically adjusts the temperatures under simulation to maintain good mixing in the presence of complex multi-modal distributions. When used in the context of stochastic maximum likelihood (SML) training, the improved ergodicity of our sampler translates to increased robustness to learning rates and faster per epoch convergence. Though our application is limited to BM, our method is general and is applicable to sampling from arbitrary probabilistic models using Markov Chain Monte Carlo (MCMC) techniques. While SML gradients can be estimated via sampling, computing data likelihoods requires an estimate of the partition function. Contrary to previous approaches which consider the model as a black box, we provide an efficient algorithm which instead tracks the change in the log partition function incurred by successive parameter updates. Our algorithm frames this estimation problem as one of filtering performed over a 2D lattice, with one dimension representing time and the other temperature. On the topic of optimization, our thesis presents a novel algorithm for applying the natural gradient to large scale Boltzmann Machines. 
Up until now, its application had been constrained by the computational and memory requirements of computing the Fisher Information Matrix (FIM), whose size is quadratic in the number of parameters. The Metric-Free Natural Gradient algorithm (MFNG) avoids computing the FIM altogether by combining a linear solver with an efficient matrix-vector operation. The method shows promise in that the resulting updates yield faster per-epoch convergence, despite being slower in terms of wall clock time. Finally, we explore how invariant features can be learnt through modifications to the BM energy function. We study the problem in the context of the spike & slab Restricted Boltzmann Machine (ssRBM), which we extend to handle both binary and sparse input distributions. By associating each spike with several slab variables, latent variables can be made invariant to a rich, high-dimensional subspace, resulting in increased invariance in the learnt representation. When using the expected model posterior as input to a classifier, increased invariance translates to improved classification accuracy in the low-label data regime. We conclude by showing a connection between invariance and the more powerful concept of disentangling factors of variation. While invariance can be achieved by pooling over subspaces, disentangling can be achieved by learning multiple complementary views of the same subspace. In particular, we show how this can be achieved using third-order BMs featuring multiplicative interactions between pairs of random variables.
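The matrix-free idea behind MFNG can be sketched as follows: the natural-gradient direction solves (F + λI)x = g, and a conjugate-gradient solver only ever touches the empirical Fisher F = GᵀG/n through Fisher-vector products, so F is never materialized. This is a hypothetical simplification for illustration, not the thesis's algorithm:

```python
import numpy as np

def solve_natural_gradient(per_sample_grads, grad, damping=1e-3, iters=50):
    """Solve (F + damping*I) x = grad by conjugate gradient, where the
    empirical Fisher F = G^T G / n is accessed only through the
    matrix-vector product G^T (G v) / n and is never formed explicitly."""
    G = per_sample_grads
    n = G.shape[0]

    def fvp(v):  # Fisher-vector product plus damping
        return G.T @ (G @ v) / n + damping * v

    x = np.zeros_like(grad)
    r = grad - fvp(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = fvp(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < 1e-10:  # residual small enough, stop early
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
G = rng.standard_normal((32, 10))   # hypothetical per-sample gradients
g = rng.standard_normal(10)         # mean gradient
x = solve_natural_gradient(G, g)    # approximate natural-gradient direction
```

Because only matrix-vector products are needed, the memory cost stays linear in the number of parameters rather than quadratic.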
(10276277), Monika Kamma. "Information Retrieval using Markov random Fields and Restricted Boltzmann Machines." Thesis, 2021.
Tsai, Bing-Chen, and 蔡秉宸. "A Study on Training Deep Neural Nets Based on Restricted Boltzmann Machines." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/50717890520952729454.
國立交通大學
電子工程學系 電子研究所
102
In this thesis, we discuss how to train a deep architecture efficiently; the parameters play an important role in such a model. We examine the influence of different parameter settings, including the initial weights and the learning rate. The deep architecture is trained in two parts: unsupervised pre-training and supervised fine-tuning. In unsupervised pre-training, we use an efficient learning method, the Restricted Boltzmann Machine, to extract features from the data. In supervised fine-tuning, we use the wake-sleep algorithm to build a deep neural net.
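The two-stage scheme this abstract describes can be sketched in a few lines: each RBM layer is trained with CD-1 on the mean-field hidden activations of the layer below, and the resulting weights would then initialize an ordinary feed-forward net for fine-tuning. This is a minimal illustrative sketch with invented sizes, not the thesis's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(v, n_hidden, lr=0.1, epochs=50, seed=0):
    """Train one Bernoulli RBM layer with CD-1; return (weights, hidden bias)."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((v.shape[1], n_hidden))
    b = np.zeros(v.shape[1])
    c = np.zeros(n_hidden)
    n = v.shape[0]
    for _ in range(epochs):
        ph0 = sigmoid(v @ W + c)                       # positive phase
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + b)                    # one Gibbs step down
        ph1 = sigmoid(pv1 @ W + c)                     # and back up
        W += lr * (v.T @ ph0 - pv1.T @ ph1) / n
        b += lr * (v - pv1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
    return W, c

def pretrain_dbn(data, layer_sizes):
    """Greedy layer-wise pre-training: each RBM is trained on the
    mean-field hidden activations of the layer below."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, c = train_rbm(x, n_hidden)
        layers.append((W, c))
        x = sigmoid(x @ W + c)  # propagate up to feed the next RBM
    return layers, x

rng = np.random.default_rng(2)
data = (rng.random((20, 12)) < 0.5).astype(float)
layers, top = pretrain_dbn(data, [8, 4])  # two stacked RBM layers
```

The `(W, c)` pairs would serve as initial weights and biases of the corresponding feed-forward layers before supervised fine-tuning.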
Larochelle, Hugo. "Étude de techniques d'apprentissage non-supervisé pour l'amélioration de l'entraînement supervisé de modèles connexionnistes." Thèse, 2008. http://hdl.handle.net/1866/6435.
Lajoie, Isabelle. "Apprentissage de représentations sur-complètes par entraînement d’auto-encodeurs." Thèse, 2009. http://hdl.handle.net/1866/3768.
Full textProgress in the machine learning domain allows computational system to address more and more complex tasks associated with vision, audio signal or natural language processing. Among the existing models, we find the Artificial Neural Network (ANN), whose popularity increased suddenly with the recent breakthrough of Hinton et al. [22], that consists in using Restricted Boltzmann Machines (RBM) for performing an unsupervised, layer by layer, pre-training initialization, of a Deep Belief Network (DBN), which enables the subsequent successful supervised training of such architecture. Since this discovery, researchers studied the efficiency of other similar pre-training strategies such as the stacking of traditional auto-encoder (SAE) [5, 38] and the stacking of denoising auto-encoder (SDAE) [44]. This is the context in which the present study started. After a brief introduction of the basic machine learning principles and of the pre-training methods used until now with RBM, AE and DAE modules, we performed a series of experiments to deepen our understanding of pre-training with SDAE, explored its different proprieties and explored variations on the DAE algorithm as alternative strategies to initialize deep networks. We evaluated the sensitivity to the noise level, and influence of number of layers and number of hidden units on the generalization error obtained with SDAE. We experimented with other noise types and saw improved performance on the supervised task with the use of pepper and salt noise (PS) or gaussian noise (GS), noise types that are more justified then the one used until now which is masking noise (MN). Moreover, modifying the algorithm by imposing an emphasis on the corrupted components reconstruction during the unsupervised training of each different DAE showed encouraging performance improvements. 
Our work also revealed that the DAE is capable of learning, on natural images, filters similar to those found in the V1 cells of the visual cortex, which are in essence edge detectors. In addition, we verified that the representations learned by the SDAE are very good features to feed to a linear or Gaussian support vector machine (SVM), considerably enhancing its generalization performance. We also observed that, like the DBN and unlike the SAE, the SDAE has the potential to be used as a good generative model. Finally, we opened the door to novel pre-training strategies and discovered the potential of one of them: the stacking of renoising auto-encoders (SRAE).
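The three corruption schemes the abstract compares (masking, salt-and-pepper, Gaussian) are simple to state in code. The sketch below is illustrative only; the exact noise levels and conventions used in the thesis may differ:

```python
import numpy as np

def corrupt(x, kind="mask", level=0.25, rng=None):
    """Apply one of the input-corruption schemes compared in the thesis:
    masking noise (MN), salt-and-pepper noise (PS), or Gaussian noise (GS)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = x.copy()
    if kind == "mask":            # zero out a random fraction of components
        x[rng.random(x.shape) < level] = 0.0
    elif kind == "salt_pepper":   # set a random fraction to 0 or 1
        flip = rng.random(x.shape) < level
        x[flip] = rng.integers(0, 2, x.shape)[flip].astype(float)
    elif kind == "gaussian":      # additive isotropic Gaussian noise
        x += level * rng.standard_normal(x.shape)
    return x

# a constant toy input makes the effect of each corruption visible
data = np.full((4, 6), 0.5)
masked = corrupt(data, "mask")
noisy = corrupt(data, "gaussian")
```

A denoising auto-encoder is then trained to reconstruct the clean `data` from the corrupted version, which is what forces it to learn robust features.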
Chen, Ying-Tsen, and 陳映岑. "Applying the Method of Deep Belief Network Pre-trained by Restricted Boltzmann Machines on High Confused Mandarin Vowel Recognition." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/8pukp3.
國立中興大學
統計學研究所
106
This thesis mainly uses a deep belief network (DBN) pre-trained by restricted Boltzmann machines (RBM) to recognize highly confusable Mandarin vowel pairs such as (ㄢ, ㄤ), (ㄛ, ㄨㄛ) and (ㄥ, ㄣ). First, we record the phonetic data of 20 speakers and perform a series of pre-processing steps such as digital sampling, endpoint detection, frame cutting, and windowing. We then take Mel-frequency cepstral coefficients (MFCC) as the features of the phonetic data and use these features as input to train the model. Unlike a multilayer perceptron (MLP), which uses random initial weights and biases, the DBN uses RBMs to pre-train the initial parameters in order to obtain a better starting point. After pre-training, these parameters serve as the initial weights and biases of an MLP and are fine-tuned by gradient descent. Since the DBN obtains better initial parameters through pre-training, the model converges faster than a plain MLP during fine-tuning, and the recognition result is better as well. This research uses vowel data with 25 frames per vowel and 39 features per frame, and the model is a DBN pre-trained by RBMs with one or two hidden layers. The identification rate of this method is at least 0.67% higher than that of the MLP, and can be up to 9.61% higher; on average, the RBM-pre-trained DBN achieves a 4.59% higher identification rate than the MLP.
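The front end the abstract describes (frame cutting, windowing, MFCC extraction) can be sketched in NumPy. This is a generic, simplified MFCC pipeline under assumed settings (16 kHz audio, 25 ms frames, 10 ms hop, 26 mel bands, 13 cepstra), not the thesis's exact configuration, which uses 39 features per frame (typically 13 MFCCs plus delta and delta-delta):

```python
import numpy as np

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC front end: pre-emphasis, framing, Hamming windowing,
    power spectrum, mel filter bank, log, and DCT-II."""
    # pre-emphasis to boost high frequencies
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # frame cutting + Hamming windowing
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # power spectrum
    nfft = 512
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # triangular mel filter bank
    def hz2mel(f): return 2595 * np.log10(1 + f / 700)
    def mel2hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = mel2hz(np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2))
    bins = np.floor((nfft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, nfft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    # DCT-II to decorrelate; keep the first n_ceps coefficients
    k = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * k + 1)) / (2 * n_mels))
    return logmel @ basis.T

# one second of a synthetic vowel-like tone as a stand-in for recorded speech
t = np.arange(16000) / 16000
y = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
feats = mfcc(y)  # one 13-dimensional MFCC vector per frame
```

The resulting per-frame coefficient matrix is the kind of feature input that would be flattened (frames × features) and fed to the pre-trained DBN.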
KUMAR, KARAN. "HANDWRITTEN DIGIT CLASSIFICATION USING DEEP LEARNING." Thesis, 2016. http://dspace.dtu.ac.in:8080/jspui/handle/repository/14801.
Dauphin, Yann. "Advances in scaling deep learning algorithms." Thèse, 2015. http://hdl.handle.net/1866/13710.
Taylor, Graham William. "Composable, Distributed-state Models for High-dimensional Time Series." Thesis, 2009. http://hdl.handle.net/1807/19238.
Lemieux, Simon. "Espaces de timbre générés par des réseaux profonds convolutionnels." Thèse, 2011. http://hdl.handle.net/1866/6294.
Full textThis thesis presents a novel way of modelling timbre using machine learning algorithms. More precisely, we have attempted to build a timbre space by extracting audio features using deep-convolutional Boltzmann machines. We first present an overview of machine learning with an emphasis on convolutional Boltzmann machines as well as models from which they are derived. We also present a summary of the literature relevant to timbre spaces and highlight their limitations, such as the small number of timbres used to build them. To address this problem, we have developed a sound generation tool that can generate as many sounds as we wish. At the system's core are plug-ins that are parameterizable and that we can combine to create a virtually infinite range of sounds. We use it to build a massive randomly generated timbre dataset that is made up of real and synthesized instruments. We then train deep-convolutional Boltzmann machines on those timbres in an unsupervised way and use the produced feature space as a timbre space. The timbre space we obtain is a better space than a similar space built using MFCCs. We consider it as better in the sense that the distance between two timbres in that space is more similar to the one perceived by a human listener. However, we are far from reaching the performance of a human. We finish by proposing possible improvements that could be tried to close our performance gap.
Goodfellow, Ian. "Deep learning of representations and its application to computer vision." Thèse, 2014. http://hdl.handle.net/1866/11674.
(7551479), Brian Matthew Sutton. "On Spin-inspired Realization of Quantum and Probabilistic Computing." Thesis, 2019.