Dissertations / Theses on the topic 'Dati multimodali'

Consult the top 50 dissertations / theses for your research on the topic 'Dati multimodali.'

1

GIANSANTI, VALENTINA. "Integration of heterogeneous single cell data with Wasserstein Generative Adversarial Networks." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2023. https://hdl.handle.net/10281/404516.

Abstract:
Tissues, organs and organisms are complex biological systems, the subject of studies that aim to characterize their biological processes. Understanding how they function and interact in healthy and diseased samples makes it possible to intervene in, correct and prevent the dysfunctions from which diseases may develop. Recent developments in single-cell sequencing technologies are expanding the ability to profile, at the level of the single cell, several molecular layers (transcriptome, genome, epigenome, proteome). The number, size and variety of modalities of the datasets produced are continuously growing. This drives the development of robust methods for integrating multi-omic datasets, whether or not they describe the same cells. Integrating multiple sources of information yields a broader and more complete description of the whole system under analysis. Most integration methods available today allow the simultaneous analysis of a limited number of omics (generally two) and require prior knowledge of their relationships. These methods often require translating one modality into the variables expressed by another (for example, ATAC peaks are converted into a gene activity matrix). This step introduces a level of approximation into the data that could compromise downstream analyses. Hence MOWGAN (Multi Omic Wasserstein Generative Adversarial Network), a deep-learning-based framework for simulating paired multimodal data, able to support a large number of datasets (more than two) and agnostic about the relationships between them (no assumption is imposed). Each modality is projected into a reduced feature space whose dimensionality is fixed for all datasets. This process avoids translation between modalities.
The cells, described by vectors in the reduced space, are sorted by the first component of their Laplacian Eigenmap. A Bayesian regressor is then applied to select the mini-batches used to train a particular deep-learning architecture, the Wasserstein Generative Adversarial Network with gradient penalty. The generative component of the network outputs a new, paired dataset that is used as a bridge for transferring information between the original datasets. MOWGAN was developed using public data for which RNA and ATAC observations were available both for the same cells and for different cells. Results were evaluated on the ability of the generated data to be integrated with the original data. In addition, the synthetic data must carry information shared across the different omics. This must respect the biological nature of the data: associations must not occur between cells representing different cell types. Organizing the data in mini-batches gives MOWGAN a network architecture that is independent of the number of modalities considered. Indeed, the framework was also applied to integrate three modalities (RNA, ATAC and proteins; RNA, ATAC and histone modifications) and four modalities (RNA, ATAC, proteins and histone modifications). MOWGAN's performance was therefore assessed in terms of computational scalability (integration of multiple datasets) and biological meaning, the latter being the more important in order to avoid erroneous conclusions in the study at hand. A comparison with other methods already available in the literature showed MOWGAN's greater ability to create inter-modal associations between cells that are truly related.
In conclusion, MOWGAN is a powerful tool for integrating multimodal single-cell data that addresses many of the problems encountered in the field.
Tissues, organs and organisms are complex biological systems and the object of many studies that aim to characterize their biological processes. Understanding how they work and how they interact in healthy and diseased samples makes it possible to intervene, correcting and preventing dysfunctions that may lead to disease. Recent advances in single-cell technologies are expanding our ability to profile various molecular layers at single-cell resolution, by targeting the transcriptome, the genome, the epigenome and the proteome. The number of single-cell datasets, their size and the diversity of modalities they describe are continuously increasing, prompting the need for robust methods to integrate multiomic datasets, whether paired from the same cells or, most challenging, from unpaired separate experiments. Integrating different sources of information results in a more comprehensive description of the whole system. Most published methods allow the integration of a limited number of omics (generally two) and make assumptions about their inter-relationships. They often impose the conversion of one data modality into another (e.g., ATAC peaks converted into a gene activity matrix). This step introduces a substantial level of approximation, which could affect downstream analysis. Here we propose MOWGAN (Multi Omic Wasserstein Generative Adversarial Network), a deep-learning-based framework to simulate paired multimodal data that supports a high number of modalities (more than two) and is agnostic about their relationships (no assumption is imposed). Each modality is embedded into a feature space with the same dimensionality across all modalities. This step avoids any conversion between data modalities. The embeddings are sorted based on the first Laplacian Eigenmap. Mini-batches are selected by a Bayesian ridge regressor to train a Wasserstein Generative Adversarial Network with gradient penalty.
The output of the generative network is used to bridge real unpaired data. MOWGAN was prototyped on public data for which paired and unpaired RNA and ATAC experiments exist. Evaluation was based on the ability to produce data that can be integrated with the original data, on the amount of shared information between synthetic layers, and on the ability to impose associations between molecular layers that are truly connected. Organizing the embeddings in mini-batches gives MOWGAN a network architecture that is independent of the number of modalities evaluated. Indeed, the framework was also successfully applied to integrate three modalities (e.g., RNA, ATAC and protein or histone modification data) and four modalities (e.g., RNA, ATAC, protein, histone modifications). MOWGAN's performance was evaluated in terms of both computational scalability and biological meaning, the latter being the more important to avoid erroneous conclusions. A comparison with published methods concluded that MOWGAN performs better at retrieving the correct biological identity (e.g., cell types) and associations. In conclusion, MOWGAN is a powerful tool for multi-omics data integration in single-cell, which addresses most of the critical issues observed in the field.
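For readers unfamiliar with the training objective named in the abstract, the critic loss of a Wasserstein GAN with gradient penalty is usually written as below. This is the standard formulation from the general literature, not a formula taken from the thesis; the symbols ($D$ for the critic, $P_r$ and $P_g$ for the real and generated distributions, $\lambda$ for the penalty weight) are the conventional ones and are assumptions here, and the exact loss used in MOWGAN may differ.

```latex
L_{\text{critic}} =
  \mathbb{E}_{\tilde{x} \sim P_g}\big[ D(\tilde{x}) \big]
  - \mathbb{E}_{x \sim P_r}\big[ D(x) \big]
  + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
    \Big[ \big( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \big)^2 \Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The last term penalizes the critic whenever the norm of its gradient at points interpolated between real and generated samples departs from 1, which softly enforces the Lipschitz constraint that the Wasserstein formulation requires.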
2

Medjahed, Hamid. "Distress situation identification by multimodal data fusion for home healthcare telemonitoring." Thesis, Evry, Institut national des télécommunications, 2010. http://www.theses.fr/2010TELE0002/document.

Abstract:
Today, the proportion of elderly people in the overall population is becoming significant, and hospital admission capacity is limited. As a result, several medical telemonitoring systems have been developed, but few commercial solutions exist. These systems focus either on implementing a generic architecture for integrating medical information systems, or on improving patients' daily lives using various automatic devices with alarms, or on providing care services to patients suffering from certain conditions such as asthma, diabetes, heart or lung problems, or Alzheimer's disease. In this context, an automatic system for medical telemonitoring at home is a solution for addressing these problems and thus allowing elderly people to live safely and independently at home. This thesis, which falls within the field of medical telemonitoring, presents a new multi-modality medical telemonitoring system named EMUTEM (Environnement Multimodale pour la Télévigilance Médicale). It combines and synchronizes several modalities or sensors through a multimodal data fusion technique based on fuzzy logic. This system can provide continuous monitoring of elderly people's health. The originality of this system, with its new fusion approach, is its flexibility in combining several medical telemonitoring modalities. It offers a great benefit to elderly people by continuously monitoring their state of health and detecting possible distress situations.
The population is aging in all societies throughout the world. In Europe, for example, life expectancy is about 71 years for men and about 79 years for women. In North America, life expectancy is currently about 75 for men and 81 for women. Moreover, the elderly prefer to preserve their independence, autonomy and way of life by living at home for as long as possible. The current healthcare infrastructures in these countries are widely considered inadequate to meet the needs of an increasingly older population. Home healthcare monitoring is a solution to this problem, ensuring that elderly people can live safely and independently in their own homes for as long as possible. Automatic in-home healthcare monitoring is a technological approach that helps people age in place through continuous telemonitoring. In this thesis, we explore automatic in-home healthcare monitoring by conducting a study of professionals who currently perform in-home healthcare monitoring, and by combining and synchronizing various telemonitoring modalities under a data synchronization and multimodal data fusion platform, FL-EMUTEM (Fuzzy Logic Multimodal Environment for Medical Remote Monitoring). This platform incorporates algorithms that process each modality and provides a multimodal data fusion technique, based on fuzzy logic, that can ensure pervasive in-home health monitoring for elderly people. The originality of this thesis, which is the combination of various modalities in the home, about its inhabitant and their surroundings, will constitute an interesting benefit and impact for elderly people suffering from loneliness. This work complements the stationary smart home environment by bringing to bear its capability for integrative continuous observation and detection of critical situations.
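To make the idea of fuzzy-logic fusion of several telemonitoring modalities concrete, here is a minimal sketch in Python. It is not the FL-EMUTEM implementation: the three modalities, the triangular membership functions, every numeric breakpoint and the two rules are hypothetical choices made only for illustration, and none of them are clinical values.

```python
def triangular(x, left, peak, right):
    """Degree (0..1) to which x belongs to a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def distress_degree(heart_rate_bpm, sound_level_db, minutes_inactive):
    """Fuse three hypothetical modalities into one distress score in [0, 1]."""
    # Fuzzification: all breakpoints are illustrative assumptions.
    hr_abnormal = max(
        triangular(heart_rate_bpm, 20, 35, 50),     # too slow
        triangular(heart_rate_bpm, 100, 140, 220),  # too fast
    )
    loud_event = triangular(sound_level_db, 60, 90, 130)
    long_inactive = triangular(minutes_inactive, 10, 60, 600)

    # Rule 1: abnormal heart rate AND long inactivity (AND = min).
    rule1 = min(hr_abnormal, long_inactive)
    # Rule 2: loud event AND long inactivity, e.g. a possible fall.
    rule2 = min(loud_event, long_inactive)
    # The rules are aggregated with OR = max.
    return max(rule1, rule2)
```

Calling `distress_degree(70, 40, 5)` describes a resting heart rate, a quiet room and recent activity, so no rule fires and the score is 0. Using min and max for AND and OR is the classic Zadeh choice; other t-norms would work in the same structure.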
3

Vielzeuf, Valentin. "Apprentissage neuronal profond pour l'analyse de contenus multimodaux et temporels." Thesis, Normandie, 2019. http://www.theses.fr/2019NORMC229/document.

Abstract:
Our perception is by nature multimodal, i.e. it draws on several of our senses. To solve certain tasks, it is therefore relevant to use different modalities, such as sound or image. This thesis examines this notion in the context of deep neural learning. To that end, it seeks to answer one question in particular: how can the different modalities be fused within a neural network? We first propose to study a concrete application problem: automatic emotion recognition in audio-visual content. This leads us to various considerations concerning the modeling of emotions and, more particularly, of facial expressions. We thus propose an analysis of the facial expression representations learned by a deep neural network. Moreover, this shows that each multimodal problem seems to require a different fusion strategy. This is why we then propose and validate two methods for automatically obtaining an efficient neural fusion architecture for a given multimodal problem: the first is based on a central fusion model and aims to preserve some interpretability of the adopted fusion strategy, while the second adapts a neural architecture search method to the fusion case, exploring a larger number of strategies and thus achieving better performance. Finally, we turn to a multimodal view of knowledge transfer. Indeed, we detail a non-traditional method for transferring knowledge from several sources, i.e. several pre-trained models.
To that end, a more general neural representation is obtained from a single model, which gathers the knowledge contained in the pre-trained models and leads to state-of-the-art performance on a variety of face analysis tasks.
Our perception is by nature multimodal, i.e. it appeals to many of our senses. To solve certain tasks, it is therefore relevant to use different modalities, such as sound or image. This thesis focuses on this notion in the context of deep learning. To that end, it seeks to answer a particular question: how can the different modalities be merged within a deep neural network? We first propose to study a concrete application problem: the automatic recognition of emotion in audio-visual content. This leads us to different considerations concerning the modeling of emotions and more particularly of facial expressions. We thus propose an analysis of the representations of facial expression learned by a deep neural network. In addition, we observe that each multimodal problem appears to require a different fusion strategy. This is why we propose and validate two methods to automatically obtain an efficient neural fusion architecture for a given multimodal problem. The first is based on a central fusion network and aims to preserve an easy interpretation of the adopted fusion strategy, while the second adapts a neural architecture search method to the case of multimodal fusion, exploring a greater number of strategies and therefore achieving better performance. Finally, we are interested in a multimodal view of knowledge transfer. Indeed, we detail a non-traditional method to transfer knowledge from several sources, i.e. from several pre-trained models. To that end, a more general neural representation is obtained from a single model, which brings together the knowledge contained in the pre-trained models and leads to state-of-the-art performance on a variety of facial analysis tasks.
4

Lazarescu, Mihai M. "Incremental learning for querying multimodal symbolic data." Thesis, Curtin University, 2000. http://hdl.handle.net/20.500.11937/1660.

Abstract:
In this thesis we present an incremental learning algorithm for learning and classifying the pattern of movement of multiple objects in a dynamic scene. The method that we describe is based on symbolic representations of the patterns. The typical representation has a spatial component that describes the relationships of the objects and a temporal component that describes the ordering of the actions of the objects in the scene. The incremental learning algorithm (ILF) uses evidence-based forgetting, generates compact concept structures and can track concept drift. We also present two novel algorithms that combine incremental learning and image analysis. The first algorithm is used in an American Football application and shows how natural language parsing can be combined with image processing and expert background knowledge to address the difficult problem of classifying and learning American Football plays. We present in detail the model developed to represent American Football plays, the parser used to process the transcript of the American Football commentary and the algorithms developed to label the players and classify the queries. The second algorithm is used in a cricket application. It combines incremental machine learning and camera motion estimation to classify and learn common cricket shots. We describe the method used to extract and convert the camera motion parameter values to symbolic form and the processing involved in learning the shots. Finally, we explore the issues that arise from combining incremental learning with incremental recognition. Two methods that combine incremental recognition and incremental learning are presented along with a comparison between the algorithms.
5

Lazarescu, Mihai M. "Incremental learning for querying multimodal symbolic data." Curtin University of Technology, School of Computing, 2000. http://espace.library.curtin.edu.au:80/R/?func=dbin-jump-full&object_id=10010.

6

DA, CRUZ GARCIA NUNO RICARDO. "Learning with Privileged Information using Multimodal Data." Doctoral thesis, Università degli studi di Genova, 2020. http://hdl.handle.net/11567/997636.

Abstract:
Computer vision is the science of teaching machines to see and understand digital images or videos. During the last decade, computer vision has seen tremendous progress on perception tasks such as object detection, semantic segmentation, and video action recognition, which led to the development and improvement of important industrial applications such as self-driving cars and medical image analysis. These advances are mainly due to the fast computation offered by GPUs, the development of high-capacity models such as deep neural networks, and the availability of large datasets, often composed of a variety of modalities. In this thesis, we explore how multimodal data can be used to train deep convolutional neural networks. Humans perceive the world through multiple senses, and reason over the multimodal space of stimuli to act and understand the environment. One way to improve the perception capabilities of deep learning methods is to use different modalities as input, as they offer different and complementary information about the scene. Recent multimodal datasets for computer vision tasks include modalities such as depth maps, infrared, skeleton coordinates, and others, besides the traditional RGB. This thesis investigates deep learning systems that learn from multiple visual modalities. In particular, we are interested in a very practical scenario in which an input modality is missing at test time. The question we address is the following: how can we take advantage of multimodal datasets for training our model, knowing that, at test time, a modality might be missing or too noisy? The case of having access to more information at training time than at test time is referred to as learning using privileged information. In this work, we develop methods to address this challenge, with special focus on the tasks of action and object recognition, and on the modalities of depth, optical flow, and RGB, that we use for inference at test time.
This thesis advances the art of multimodal learning in three different ways. First, we develop a deep learning method for video classification that is trained on RGB and depth data, and is able to hallucinate depth features and predictions at test time. Second, we build on this method and propose a more generic mechanism, based on adversarial learning, that learns to mimic the predictions originating from the depth modality and can automatically switch from true depth features to generated depth features in the case of a noisy sensor. Third, we develop a method that learns a single network trained on RGB data, that is enriched with additional supervision information from other modalities such as depth and optical flow at training time, and that outperforms an ensemble of networks trained independently on these modalities.
7

Xin, Bowen. "Multimodal Data Fusion and Quantitative Analysis for Medical Applications." Thesis, The University of Sydney, 2021. https://hdl.handle.net/2123/26678.

Abstract:
Medical big data is not only enormous in size, but also heterogeneous and complex in its data structure, which makes it difficult for conventional systems or algorithms to process. These heterogeneous medical data include imaging data (e.g., Positron Emission Tomography (PET), Computerized Tomography (CT), Magnetic Resonance Imaging (MRI)), and non-imaging data (e.g., laboratory biomarkers, electronic medical records, and hand-written doctor notes). Multimodal data fusion is an emerging and vital field that addresses this urgent challenge, aiming to process and analyze complex, diverse and heterogeneous multimodal data. Fusion algorithms bring great potential to medical data analysis, by 1) taking advantage of complementary information from different sources (such as the functional-structural complementarity of PET/CT images) and 2) exploiting consensus information that reflects the intrinsic essence (such as the genetic essence underlying medical imaging and clinical symptoms). Thus, multimodal data fusion benefits a wide range of quantitative medical applications, including personalized patient care, better medical operation planning, and preventive public health. Though there has been extensive research on computational approaches for multimodal fusion, there are three major challenges of multimodal data fusion in quantitative medical applications, which are summarized as feature-level fusion, information-level fusion and knowledge-level fusion:
• Feature-level fusion. The first challenge is to mine multimodal biomarkers from high-dimensional small-sample multimodal medical datasets, whose nature hinders the effective discovery of informative multimodal biomarkers. Specifically, efficient dimension reduction algorithms are required to alleviate the "curse of dimensionality" problem and address the criteria for discovering interpretable, relevant, non-redundant and generalizable multimodal biomarkers.
• Information-level fusion. The second challenge is to exploit and interpret inter-modal and intra-modal information for precise clinical decisions. Although radiomics and multi-branch deep learning have been used for implicit information fusion guided by the supervision of labels, there is a lack of methods to explicitly explore inter-modal relationships in medical applications. Unsupervised multimodal learning is able to mine inter-modal relationships as well as reduce the usage of labor-intensive data and explore potential undiscovered biomarkers; however, mining discriminative information without label supervision is an upcoming challenge. Furthermore, the interpretation of complex non-linear cross-modal associations, especially in deep multimodal learning, is another critical challenge in information-level fusion, which hinders the exploration of multimodal interaction in disease mechanisms.
• Knowledge-level fusion. The third challenge is quantitative knowledge distillation from multi-focus regions on medical imaging. Although characterizing imaging features from single lesions using either feature engineering or deep learning methods has been investigated in recent years, both methods neglect the importance of inter-region spatial relationships. Thus, a topological profiling tool for multi-focus regions is in high demand, yet it is missing from current feature engineering and deep learning methods. Furthermore, incorporating domain knowledge with distilled knowledge from multi-focus regions is another challenge in knowledge-level fusion.
To address the three challenges in multimodal data fusion, this thesis provides a multi-level fusion framework for multimodal biomarker mining, multimodal deep learning, and knowledge distillation from multi-focus regions. Specifically, our major contributions in this thesis include:
• To address the challenges in feature-level fusion, we propose an Integrative Multimodal Biomarker Mining framework to select interpretable, relevant, non-redundant and generalizable multimodal biomarkers from high-dimensional small-sample imaging and non-imaging data for diagnostic and prognostic applications. The feature selection criteria, including representativeness, robustness, discriminability, and non-redundancy, are exploited by consensus clustering, Wilcoxon filter, sequential forward selection, and correlation analysis, respectively. The SHapley Additive exPlanations (SHAP) method and nomograms are employed to further enhance feature interpretability in machine learning models.
• To address the challenges in information-level fusion, we propose an Interpretable Deep Correlational Fusion framework, based on canonical correlation analysis (CCA), for 1) cohesive multimodal fusion of medical imaging and non-imaging data, and 2) interpretation of complex non-linear cross-modal associations. Specifically, two novel loss functions are proposed to optimize the discovery of informative multimodal representations in both supervised and unsupervised deep learning, by jointly learning inter-modal consensus and intra-modal discriminative information. An interpretation module is proposed to decipher the complex non-linear cross-modal association by leveraging interpretation methods in both deep learning and multimodal consensus learning.
• To address the challenges in knowledge-level fusion, we propose a Dynamic Topological Analysis (DTA) framework, based on persistent homology, for knowledge distillation from inter-connected multi-focus regions in medical imaging and incorporation of domain knowledge. Unlike conventional feature engineering and deep learning, our DTA framework is able to explicitly quantify inter-region topological relationships, including global-level geometric structure and community-level clusters. A K-simplex Community Graph is proposed to construct the dynamic community graph for representing community-level multi-scale graph structure. The constructed dynamic graph is subsequently tracked with a novel Decomposed Persistence algorithm. Domain knowledge is incorporated into the Adaptive Community Profile, which summarizes the tracked multi-scale community topology with additional customizable clinically important factors.
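Sequential forward selection, one of the feature selection steps named in the abstract, is a generic greedy procedure that can be sketched in a few lines of Python. The sketch below is an illustration of the general technique only, not the thesis's framework: the function names, the stopping rule and the toy scoring function (a stand-in for something like cross-validated accuracy) are all assumptions.

```python
def sequential_forward_selection(n_features, score, max_features):
    """Greedily add the feature that most improves score(subset)."""
    selected = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every one-feature extension of the current subset.
        scored = [(score(selected + [f]), f) for f in candidates]
        top_score, top_feature = max(scored)
        if top_score <= best_score:
            break  # no candidate improves the subset, so stop early
        selected.append(top_feature)
        best_score = top_score
    return selected, best_score


def toy_score(subset):
    """Hypothetical score: features 0 and 2 help, each feature costs 0.05."""
    informative = {0: 0.5, 2: 0.3}
    return sum(informative.get(f, 0.0) for f in subset) - 0.05 * len(subset)


selected, best = sequential_forward_selection(4, toy_score, max_features=3)
```

With this toy score the procedure picks feature 0 first, then feature 2, and then stops because adding either remaining feature would lower the score. In practice the score would be computed from data, and the greedy search does not guarantee a globally optimal subset.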
8

POLSINELLI, MATTEO. "Modelli di Intelligenza Artificiale per l'analisi di dati da neuroimaging multimodale." Doctoral thesis, Università degli Studi dell'Aquila, 2022. http://hdl.handle.net/11697/192072.

Abstract:
Medical imaging (MI) refers to several technologies that provide images of organs and tissues of the human body for diagnostic and scientific purposes. Furthermore, the technologies that allow us to capture medical images and signals are advancing rapidly, providing higher quality images of previously unmeasured biological features at decreasing costs. This has mainly occurred for highly specialized applications, such as cardiology and neurology. Artificial Intelligence (AI), which to date has largely focused on non-medical applications, such as computer vision, promises to be an instrumental toolkit that will help unleash the potential of MI. In fact, the significant variability in anatomy across individuals, the lack of specificity of the imaging techniques, the unpredictability of the diseases, the weakness of the biological signals, the presence of noise and artifacts and the complexities of the underlying biology often make it impossible to derive deterministic algorithmic solutions for the problems encountered in neurology. The aim of this thesis was to develop AI models capable of carrying out quantitative, objective, accurate and reliable analyses of the imaging tools used in neurology, EEG and MRI. Beyond the development of AI models, attention was focused on the quality of data, which can be lowered by the "uncertainty" produced by the issues cited above. Further, the uncertainty affecting data was also described, discussed and addressed. The main results have been the proposal of innovative AI-based strategies for signal and image improvement through artifact reduction and data stabilization in both EEG and MRI. This has made it possible to apply EEG to the recognition and interpretation of weak signals (infant 3M patients), and to provide effective strategies for dealing with MRI variability and uncertainty in multiple sclerosis segmentation, for both single-source and multiple-source MRI.
According to the evaluation criteria used, the results obtained are comparable with those obtained by human experts. Future developments will concern the generalization of the proposed strategies to cope with different diseases or with different applications of MI. Particular attention will be paid to the optimization of the models and to understanding the processes underlying their behavior. To this end, specific strategies for checking the deep structures of the proposed architectures will be studied. In this way, besides model optimization, it would be possible to obtain the functional relationships among the features generated by the model and use them to improve human knowledge (a sort of inverse transfer learning).
9

Khan, Mohd Tauheed. "Multimodal Data Fusion Using Voice and Electromyography Data for Robotic Control." University of Toledo / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=toledo156440368925597.

10

Oztarak, Hakan. "Structural And Event Based Multimodal Video Data Modeling." Master's thesis, METU, 2005. http://etd.lib.metu.edu.tr/upload/12606919/index.pdf.

Abstract:
Investments in multimedia technology enable us to store many more reflections of the real world in the digital world as videos. By recording videos of real-world entities, we carry a lot of information into the digital world directly. To store and efficiently query this information, a video database system (VDBS) is necessary. In this thesis, we propose a structural, event-based and multimodal (SEBM) video data model for VDBSs. The SEBM video data model supports three modalities (visual, auditory and textual), and we propose that all three can be handled with a single SEBM video data model. This proposal is supported by the way humans interpret video data. Hence we can answer the user's content-based, spatio-temporal and fuzzy queries more easily, since we store the video data in the way that the user interprets real-world data. We follow a divide-and-conquer technique when answering very complicated queries. We have implemented the SEBM video data model in a Java-based system that uses XML for representing the SEBM data model and Berkeley XML DBMS for storing the data based on the SEBM prototype system.
11

McLaughlin, N. R. "Robust multimodal person identification given limited training data." Thesis, Queen's University Belfast, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.579747.

Abstract:
This thesis presents a novel method of audio-visual fusion, known as multimodal optimal feature fusion (MOFF), for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, it is assumed there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new multimodal feature representation and a modified cosine similarity are introduced for combining and comparing bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Similarity-based optimal feature selection and multi-condition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Low-level feature fusion is performed using optimal feature selection, which automatically changes the weighting given to each modality based on the level of corruption. The framework for robust person identification is also applied to noise-robust speaker identification, given very limited training data. Experiments have been carried out on a bimodal data set created from the SPIDRE speaker recognition database and the AR face recognition database, with variable noise corruption of speech and occlusion in the face images. Combining both modalities using MOFF leads to significantly improved identification accuracy compared to the component unimodal systems, even with simultaneous corruption of both modalities. A novel piecewise-constant illumination model (PCIM) is then introduced for illumination-invariant facial recognition. This method can be used given a single training facial image for each person, and assuming no prior knowledge of the illumination conditions of both the training and testing images.
Small areas of the face are represented using magnitude Fourier features, which take advantage of the shift-invariance of the magnitude Fourier representation to increase robustness to small misalignment errors and small facial expression changes. Finally, cosine similarity is used as an illumination-invariant similarity measure to compare small facial areas. Experiments have been carried out on the YaleB, extended YaleB and CMU-PIE facial illumination databases. Facial identification accuracy using PCIM is comparable to or exceeds that of the literature.
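The shift-invariance property mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's PCIM method: it uses a hypothetical 1-D "patch" instead of a 2-D facial area, a plain discrete Fourier transform, and an unmodified cosine similarity. All names and values below are assumptions made for illustration.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return the magnitude of each DFT coefficient of a 1-D signal."""
    n = len(signal)
    mags = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        mags.append(abs(coeff))
    return mags


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 1-D patch and a circularly shifted (misaligned) copy of it.
patch = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 0.0, 0.0]
shifted = patch[-2:] + patch[:-2]

# Comparing raw samples is sensitive to the shift ...
raw_sim = cosine_similarity(patch, shifted)
# ... while comparing DFT magnitudes is not, because a circular shift
# only changes the phase of each coefficient.
mag_sim = cosine_similarity(dft_magnitudes(patch), dft_magnitudes(shifted))
```

Cosine similarity is also unchanged when either vector is multiplied by a positive constant, which is one reason it is a natural choice when overall brightness varies.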
12

Käshammer, Philipp Florian. "A Semantic Interpreter for Multimodal and Multirobot Data." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-36896.

Abstract:
Huge natural disaster events can be so devastating that they often overwhelm human rescuers, and yet they seem to occur more often. The TRADR (Long-Term Human-Robot Teaming for Robot Assisted Disaster Response) research project aims at developing methodology for heterogeneous teams composed of human rescuers as well as ground and aerial robots. While the robots swarm the disaster sites, equipped with advanced sensors, they collect a huge amount of raw data that cannot be processed efficiently by humans. Therefore, in the work presented here, a semantic interpreter has been developed that crawls through the raw data, using state-of-the-art object detection algorithms to identify victim targets, and extracts all kinds of information that is relevant for rescuers to plan their missions. Subsequently, this information is restructured by a reasoning process and then stored in a high-level database that can be queried accordingly and ensures data consistency.
13

Rao, Dushyant. "Multimodal learning from visual and remotely sensed data." Thesis, The University of Sydney, 2015. http://hdl.handle.net/2123/15535.

Abstract:
Autonomous vehicles are often deployed to perform exploration and monitoring missions in unseen environments. In such applications, there is often a compromise between the information richness and the acquisition cost of different sensor modalities. Visual data is usually very information-rich, but requires in-situ acquisition with the robot. In contrast, remotely sensed data has a larger range and footprint, and may be available prior to a mission. In order to effectively and efficiently explore and monitor the environment, it is critical to make use of all of the sensory information available to the robot. One important application is the use of an Autonomous Underwater Vehicle (AUV) to survey the ocean floor. AUVs can take high resolution in-situ photographs of the sea floor, which can be used to classify different regions into various habitat classes that summarise the observed physical and biological properties. This is known as benthic habitat mapping. However, since AUVs can only image a tiny fraction of the ocean floor, habitat mapping is usually performed with remotely sensed bathymetry (ocean depth) data, obtained from shipborne multibeam sonar. With the recent surge in unsupervised feature learning and deep learning techniques, a number of previous techniques have investigated the concept of multimodal learning: capturing the relationship between different sensor modalities in order to perform classification and other inference tasks. This thesis proposes related techniques for visual and remotely sensed data, applied to the task of autonomous exploration and monitoring with an AUV. Doing so enables more accurate classification of the benthic environment, and also assists autonomous survey planning. The first contribution of this thesis is to apply unsupervised feature learning techniques to marine data. 
The proposed techniques are used to extract features from image and bathymetric data separately, and the performance is compared to that with more traditionally used features for each sensor modality. The second contribution is the development of a multimodal learning architecture that captures the relationship between the two modalities. The model is robust to missing modalities, which means it can extract better features for large-scale benthic habitat mapping, where only bathymetry is available. The model is used to perform classification with various combinations of modalities, demonstrating that multimodal learning provides a large performance improvement over the baseline case. The third contribution is an extension of the standard learning architecture using a gated feature learning model, which enables the model to better capture the ‘one-to-many’ relationship between visual and bathymetric data. This opens up further inference capabilities, with the ability to predict visual features from bathymetric data, which allows image-based queries. Such queries are useful for AUV survey planning, especially when supervised labels are unavailable. The final contribution is the novel derivation of a number of information-theoretic measures to aid survey planning. The proposed measures predict the utility of unobserved areas, in terms of the amount of expected additional visual information. As such, they are able to produce utility maps over a large region that can be used by the AUV to determine the most informative locations from a set of candidate missions. The models proposed in this thesis are validated through extensive experiments on real marine data. Furthermore, the introduced techniques have applications in various other areas within robotics. As such, this thesis concludes with a discussion on the broader implications of these contributions, and the future research directions that arise as a result of this work.
14

Nourbakhsh, Nargess. "Multimodal Physiological Cognitive Load Measurement." Thesis, The University of Sydney, 2015. http://hdl.handle.net/2123/14294.

Abstract:
Monitoring users’ cognitive load in real-time allows the system to adjust its interface and improve the interaction experience and user performance. Physiological signals are relatively reliable, real-time measures of cognitive load. Measurement robustness can be improved by taking account of confounding factors, and multimodality has the potential to enhance mental load prediction. This thesis investigates cognitive load measurement by means of physiological data and machine learning techniques. Skin response and eye blink are economical, conveniently-captured physiological measures that were studied. Multiple datasets were used to increase the reliability of the results which confirmed that the explored features can significantly measure different cognitive load levels. Confounding factors can distort cognitive load measurement results. Emotional fluctuations have profound impacts on physiological signals. Therefore to examine the robustness of the explored features, they were evaluated for cognitive load measurement with affective interference in different datasets. The results showed that they can measure multiple cognitive load levels with high accuracy even under emotional changes. Different modalities can impart complementary information. Hence we tried to improve the accuracy by means of multimodal cognitive load measurement. Two fusion techniques were used and different combinations of classifiers and features were examined. Multimodality proved to improve the cognitive load classification accuracy and the studied feature fusions performed well both in the absence and presence of affective stimuli. This thesis proposes frameworks for monitoring cognitive load using physiological data and machine learning techniques. The system was tested during affective fluctuations and modality fusion was performed. The outcomes of this research show that the explored features and methods could be used as means for objective, robust, accurate cognitive load measurement.
15

He, Linbo. "Improving 3D Point Cloud Segmentation Using Multimodal Fusion of Projected 2D Imagery Data." Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157705.

Abstract:
Semantic segmentation is a key approach to comprehensive image data analysis. It can be applied to analyze 2D images, videos, and even point clouds that contain 3D data points. On the first two problems, CNNs have achieved remarkable progress, but on point cloud segmentation, the results are less satisfactory due to challenges such as limited memory resources and difficulties in 3D point annotation. One of the research studies carried out by the Computer Vision Lab at Linköping University aimed to ease the semantic segmentation of 3D point clouds. The idea is that by first projecting 3D data points to 2D space and then focusing only on the analysis of 2D images, we can reduce the overall workload for the segmentation process as well as exploit the existing well-developed 2D semantic segmentation techniques. In order to improve the performance of CNNs for 2D semantic segmentation, the study used input data derived from different modalities. However, how different modalities can be optimally fused is still an open question. Based on the above-mentioned study, this thesis aims to improve the multistream framework architecture. More concretely, we investigate how different singlestream architectures impact the multistream framework with a given fusion method, and how different fusion methods contribute to the overall performance of a given multistream framework. As a result, our proposed fusion architecture outperformed all the investigated traditional fusion methods. Along with the best singlestream candidate and a few additional training techniques, our final proposed multistream framework obtained a relative gain of 7.3% mIoU compared to the baseline on the Semantic3D point cloud test set, increasing the ranking from 12th to 5th position on the benchmark leaderboard.
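For readers unfamiliar with the mIoU metric quoted in this abstract, the following minimal sketch shows how a mean intersection-over-union score is commonly computed from per-point class labels. The function name, the toy labels, and the choice to skip classes absent from both labellings are assumptions for illustration; the benchmark's exact evaluation protocol is not reproduced here.

```python
def mean_iou(predicted, target, num_classes):
    """Mean intersection-over-union over classes present in either labelling."""
    ious = []
    for c in range(num_classes):
        pairs = list(zip(predicted, target))
        intersection = sum(1 for p, t in pairs if p == c and t == c)
        union = sum(1 for p, t in pairs if p == c or t == c)
        if union > 0:  # skip classes that occur in neither labelling
            ious.append(intersection / union)
    # Return 0.0 when no class occurs at all, to avoid dividing by zero.
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical labels for four points and two classes.
predicted = [0, 0, 1, 1]
target = [0, 1, 1, 1]

# Class 0: intersection 1, union 2. Class 1: intersection 2, union 3.
score = mean_iou(predicted, target, num_classes=2)
```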
16

Sperandeo, Marco. "Smart mobility: percorsi multimodali in ambiente urbano." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16178/.

Abstract:
The thesis deals with smart mobility and its effects in the urban planning, social and economic spheres. It then explains how wayfinding and open data are important and necessary elements for the development of multimodal transport. Finally, it analyses two software products that offer a travel planner service and compares them.
17

Han, Bote. "The Multimodal Interaction through the Design of Data Glove." Thesis, Université d'Ottawa / University of Ottawa, 2015. http://hdl.handle.net/10393/32529.

Abstract:
In this thesis, we propose and present a multimodal interaction system that provides a natural way for human-computer interaction. The core idea of this system is to help users interact with the machine naturally by recognizing various user gestures from a wearable device. To achieve this goal, we have implemented a system that includes both a hardware solution and gesture recognition approaches. For the hardware solution, we designed and implemented a data-glove-based interaction device with multiple kinds of sensors to detect finger formations, touch commands and hand postures. We also modified and implemented two gesture recognition approaches, one based on a support vector machine (SVM) and one on a lookup table. The detailed design and information are presented in this thesis. In the end, the system supports over 30 kinds of touch commands, 18 kinds of finger formations, and 10 kinds of hand postures, as well as combinations of finger formation and hand posture, with a recognition rate of 86.67% and accurate touch command detection. We also evaluated the system in terms of subjective user experience.
18

Sun, Feng-Tso. "Nonparametric Discovery of Human Behavior Patterns from Multimodal Data." Research Showcase @ CMU, 2014. http://repository.cmu.edu/dissertations/359.

Abstract:
Recent advances in sensor technologies and the growing interest in context-aware applications, such as targeted advertising and location-based services, have led to a demand for understanding human behavior patterns from sensor data. People engage in routine behaviors. Automatic routine discovery goes beyond low-level activity recognition such as sitting or standing and analyzes human behaviors at a higher level (e.g., commuting to work). The goal of the research presented in this thesis is to automatically discover high-level semantic human routines from low-level sensor streams. One recent line of research is to mine human routines from sensor data using parametric topic models. The main shortcoming of parametric models is that they assume a fixed, pre-specified parameter regardless of the data. Choosing an appropriate parameter usually requires an inefficient trial-and-error model selection process. Furthermore, it is even more difficult to find optimal parameter values in advance for personalized applications. The research presented in this thesis offers a novel nonparametric framework for human routine discovery that can infer high-level routines without knowing the number of latent low-level activities beforehand. More specifically, the framework automatically finds the size of the low-level feature vocabulary from sensor feature vectors at the vocabulary extraction phase. At the routine discovery phase, the framework further automatically selects the appropriate number of latent low-level activities and discovers latent routines. Moreover, we propose a new generative graphical model to incorporate multimodal sensor streams for the human activity discovery task. The hypothesis and approaches presented in this thesis are evaluated on public datasets in two routine domains: two daily-activity datasets and a transportation mode dataset.
Experimental results show that our nonparametric framework can automatically learn the appropriate model parameters from multimodal sensor data without any form of manual model selection procedure and can outperform traditional parametric approaches for human routine discovery tasks.
19

Damoni, Arben. "Multimodal segmentation for data mining applications in multimedia engineering." Thesis, London South Bank University, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.631732.

Abstract:
This project describes a novel approach to the development of a multimodal video segmentation system for the analysis of multimedia data. Current practices of multimedia data analysis rely either solely on one of the video and audio components or on the presence of both together. The proposed approach makes use of both the video and audio inputs in parallel, complementing each other during the video processing stage, to optimise both the accuracy and the speed of the method. Unlike other commonly established methods, the video analysis here is carried out using both the luminance and the chrominance values of the colour images, instead of relying on only one of them. The proposed method of video cut detection primarily uses a modified luminance-based histogram analysis algorithm, supported by additional sub-sampling and median filtering options, which improve the efficiency of the method by enhancing its speed and its detection accuracy respectively. The algorithm uses a progressively varying threshold to indicate a significant variation between successive histograms over a window length of 2 image frames. The method worked successfully for the investigated videos, which had varying frame rates and frame sizes. Because chrominance histogram analysis degrades the processing speed, its use is kept to a minimum: it is restricted to verifying the existence of possible cuts that the luminance analysis failed to identify. The indication of such cuts could be obtained through audio classification analysis.
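The core idea of histogram-based cut detection over a window of two successive frames can be sketched as follows. This is a simplified illustration, not the thesis's algorithm: it uses a fixed threshold rather than a progressively varying one, omits sub-sampling, median filtering and the chrominance and audio checks, and assumes each frame is a non-empty flat list of 8-bit luminance values. Function names, bin count and threshold are hypothetical.

```python
def luminance_histogram(frame, bins=8, max_value=256):
    """Normalized histogram of the luminance values in one frame."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // max_value] += 1
    total = len(frame)
    return [count / total for count in counts]


def detect_cuts(frames, threshold=0.5):
    """Return indices i where frame i differs sharply from frame i - 1."""
    cuts = []
    previous = luminance_histogram(frames[0])
    for i in range(1, len(frames)):
        current = luminance_histogram(frames[i])
        # Half the L1 distance lies in [0, 1]; 1 means disjoint histograms.
        distance = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            cuts.append(i)
        previous = current
    return cuts


# Hypothetical frames: two dark frames followed by two bright ones.
dark = [10, 20, 15, 30]
dark_again = [12, 18, 25, 28]
bright = [200, 220, 240, 210]
cuts = detect_cuts([dark, dark_again, bright, bright])
```

In this toy sequence the only cut is reported at index 2, the first bright frame.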
20

Balakrishnan, Arjun. "Integrity Analysis of Data Sources in Multimodal Localization System." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASG060.

Abstract:
Intelligent vehicles are a key component in humanity's vision for safer, efficient, and accessible transportation systems across the world. Due to the multitude of data sources and processes associated with intelligent vehicles, the reliability of the total system is greatly dependent on the possibility of errors or poor performance observed in its components. In our work, we focus on the critical task of localization of intelligent vehicles and address the challenges in monitoring the integrity of data sources used in localization. The primary contribution of our research is the proposition of a novel protocol for integrity by combining integrity concepts from information systems with the existing integrity concepts in the field of Intelligent Transport Systems (ITS). An integrity monitoring framework based on the theorized integrity protocol that can handle multimodal localization problems is formalized. As the first step, a proof of concept for this framework is developed based on cross-consistency estimation of data sources using polynomial models. Based on the observations from the first step, a 'Feature Grid' data representation is proposed in the second step and a generalized prototype for the framework is implemented. The framework is tested on highways as well as in complex urban scenarios to demonstrate that it is capable of providing continuous integrity estimates of the multimodal data sources used in intelligent vehicle localization.
21

Spechler, Philip. "Predictive Modeling of Adolescent Cannabis Use From Multimodal Data." ScholarWorks @ UVM, 2017. http://scholarworks.uvm.edu/graddis/690.

Abstract:
Predicting teenage drug use is key to understanding the etiology of substance abuse. However, classic predictive modeling procedures are prone to overfitting and fail to generalize to independent observations. To mitigate these concerns, cross-validated logistic regression with elastic-net regularization was used to predict cannabis use by age 16 from a large sample of fourteen-year-olds (N = 1,319). High-dimensional data (p = 2,413) including parent and child psychometric data, child structural and functional MRI data, and genetic data (candidate single-nucleotide polymorphisms, "SNPs") collected at age 14 were used to predict the initiation of cannabis use (minimum six occasions) by age 16. Analyses were conducted separately for males and females to uncover sex-specific predictive profiles. The performance of the predictive models was assessed using the area under the receiver-operating characteristic curve ("ROC AUC"). Final models returned high predictive performance (generalization mean ROC AUC = .71 for males and .81 for females) and contained psychometric features common to both sexes. These common psychometric predictors included greater stressful life events, novelty-seeking personality traits of both the parent and child, and parental cannabis use. In contrast, males exhibited distinct functional neurobiological predictors related to a response-inhibition fMRI task, whereas females exhibited distinct neurobiological predictors related to a social processing fMRI task. Furthermore, the brain predictors exhibited sex-specific effects, as the brain predictors of cannabis use for one sex failed to predict cannabis use for the opposite sex. These sex-specific brain predictors also exhibited drug-specific effects, as they failed to predict binge-drinking by age 16 in an independent sample of youths. When collapsed across sex, a gene-specific analysis suggested that opioid receptor genetic variation also predicted cannabis use by age 16.
Two SNPs on the gene coding for the primary mu-opioid receptor exhibited genetic risk effects, while one SNP on the gene coding for the primary delta-opioid receptor exhibited genetic protective effects. Taken together, these results demonstrate that adolescent cannabis use is reliably predicted in males and females from shared and unique biobehavioral features. These analyses also underscore the need for refined predictive modeling procedures as well as sex-specific inquiries into the etiology of substance abuse. The sex-specific risk-profiles uncovered from these analyses might inform potential etiological mechanisms contributing to substance abuse in adolescence as all predictors were measured prior to the onset of cannabis use.
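The ROC AUC figures reported in this abstract can be read as the probability that a randomly chosen user receives a higher predicted score than a randomly chosen non-user. A minimal sketch of that pairwise definition follows; it is not the study's pipeline (no elastic-net model or cross-validation is fitted here), and the labels and scores are made-up values for illustration only.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half.
    Assumes at least one positive (1) and one negative (0) label.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = initiated use) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Three of the four positive-negative pairs are ordered correctly.
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives a scale for interpreting the values of .71 and .81 above.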
22

Nobis, Claudia. "Multimodale Vielfalt." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 2015. http://dx.doi.org/10.18452/17194.

Abstract:
Multimodality, the use of several modes of transportation during a specified time period, is a general term for a wide variety of everyday mobility behaviors. It is perceived as an alternative to one-sided use of private cars, and one which has attracted great hopes for the future development of transportation. Previous studies almost always restrict the group of persons considered to one particular form of multimodal behavior, above all the use of cars and public transportation. The starting point of the present thesis is to present and examine the various facets of multimodal behavior in their entirety. To this end, a classification is developed which is derived from the choice of modes of transportation. The analysis of mobility behavior is based on the data of the German Mobility Panel from 1999 to 2008 and the Mobility in Germany study from the years 2002 and 2008. Subjects are assigned to modal groups depending on which of the modes of transportation (motorized individual transport, public transportation and bicycle) are used in the course of a week. The analysis reveals the enormously diverse nature of multimodal behavior. In general, multimodal behavior is an urban phenomenon which increasingly characterizes everyday life, especially for younger persons. In aggregate, multimodal persons drive fewer kilometers by car than monomodal car drivers. Their carbon footprint is 20 to 34 percent smaller than that of exclusive car drivers, depending on the data set. Nevertheless, many multimodal persons do use cars for a considerable portion of their trips. How the relative shares of the various modes of transportation change, especially with respect to long-distance travel, and what impact the currently observable changes in supply and demand have will be decisive in the future.
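The assignment of subjects to modal groups described in this abstract can be sketched as a simple rule over the set of modes used within one week. The group labels, mode names and the handling of untracked modes below are assumptions for illustration; the thesis's actual classification scheme is not reproduced here.

```python
# The three tracked modes named in the abstract (English labels assumed).
TRACKED_MODES = {"car", "public_transport", "bicycle"}


def modal_group(modes_used):
    """Assign a person to a modal group from the modes used in one week."""
    used = sorted(TRACKED_MODES & set(modes_used))
    if not used:
        return "none"  # no tracked mode was used, e.g. walking only
    if len(used) == 1:
        return "monomodal:" + used[0]
    return "multimodal:" + "+".join(used)


# Hypothetical one-week trip diaries, one entry per trip.
exclusive_driver = modal_group(["car", "car", "car"])
mixed_user = modal_group(["bicycle", "car", "public_transport", "car"])
```

With three tracked modes this rule yields three monomodal groups and four multimodal combinations, which hints at the variety of behaviors the abstract refers to.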
23

Fernández, Carbonell Marcos. "Automated Multimodal Emotion Recognition." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-282534.

Abstract:
Being able to read and interpret affective states plays a significant role in human society. However, this is difficult in some situations, especially when information is limited to either vocal or visual cues. Many researchers have investigated the so-called basic emotions in a supervised way. This thesis presents the results of a multimodal supervised and unsupervised study of a more realistic number of emotions. To that end, audio and video features are extracted from the GEMEP dataset employing openSMILE and OpenFace, respectively. The supervised approach includes the comparison of multiple solutions and shows that multimodal pipelines can outperform unimodal ones, even with a higher number of affective states. The unsupervised approach comprises a traditional and an exploratory method to find meaningful patterns in the multimodal dataset. It also contains an innovative procedure to better understand the output of clustering techniques.
24

Woodcock, Anna, and Elin Salemyr. "Kampen om kommunikationen : En kvalitativ studie av Försvarsmaktens kommunikation och uppdrag." Thesis, Uppsala universitet, Medier och kommunikation, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-434366.

Abstract:
During the last decade, the Swedish Armed Forces have been struggling to achieve recruitment goals and to retain military personnel. To attract more people to join the agency, and to increase the knowledge about the agency’s role and function in society, the Swedish Armed Forces continuously run campaigns. Furthermore, the Swedish Armed Forces is a government agency with a mission that is determined by the Swedish government through government decisions. It can therefore be argued that it is important that the campaigns portray the agency’s mission in a correct and truthful way. The purpose of this study has been to investigate how the Swedish Armed Forces portray their mission in their campaign films, and to what extent it corresponds with the agency’s official mission. The research questions are thereby: (1) How is the mission of the Swedish Armed Forces portrayed in two campaign films from 2018 and 2020? (2) To what extent does the Swedish Armed Forces' communication about the agency's mission correspond with their official mission presented in the government decision? (3) Based on identified semiotic resources, and with a neo-institutional perspective on strategic communication, what type of communication has been allowed to take place in the Swedish Armed Forces' campaign films? To answer the research questions, a qualitative data analysis was conducted on a government decision constituting the overall direction of the Swedish Armed Forces, and a multimodal critical discourse analysis was conducted on two campaign films produced by the agency. The results were compared and analyzed through the lens of a neo-institutional perspective on strategic communication. In summary, the study finds that the Swedish Armed Forces, in the two campaign films, portray their mission in a way that greatly corresponds with their official mission as presented in the government decision.
25

Kulikova, Sofya. "Integration of multimodal imaging data for investigation of brain development." Thesis, Sorbonne Paris Cité, 2015. http://www.theses.fr/2015PA05T021/document.

Abstract:
L’Imagerie par résonance magnétique (IRM) est un outil fondamental pour l’exploration in vivo du développement du cerveau chez le fœtus, le bébé et l’enfant. Elle fournit plusieurs paramètres quantitatifs qui reflètent les changements des propriétés tissulaires au cours du développement en fonction de différents processus de maturation. Cependant, l’évaluation fiable de la maturation de la substance blanche est encore une question ouverte: d'une part, aucun de ces paramètres ne peut décrire toute la complexité des changements sous-jacents; d'autre part, aucun d'eux n’est spécifique d’un processus de développement ou d’une propriété tissulaire particulière. L’implémentation d’approches multiparamétriques combinant les informations complémentaires issues des différents paramètres IRM devrait permettre d’améliorer notre compréhension du développement du cerveau. Dans ce travail de thèse, je présente deux exemples de telles approches et montre leur pertinence pour l'étude de la maturation des faisceaux de substance blanche. La première approche fournit une mesure globale de la maturation basée sur la distance de Mahalanobis calculée à partir des différents paramètres IRM (temps de relaxation T1 et T2, diffusivités longitudinale et transverse du tenseur de diffusion DTI) chez des nourrissons (âgés de 3 à 21 semaines) et des adultes. Cette approche offre une meilleure description de l’asynchronisme de maturation à travers les différents faisceaux que les approches uniparamétriques. De plus, elle permet d'estimer les délais relatifs de maturation entre faisceaux. La seconde approche vise à quantifier la myélinisation des tissus cérébraux, en calculant la fraction de molécules d’eau liées à la myéline (MWF) en chaque voxel des images. Cette approche est basée sur un modèle tissulaire avec trois composantes ayant des caractéristiques de relaxation spécifiques, lesquelles ont été pré-calibrées sur trois jeunes adultes sains. 
Elle permet le calcul rapide des cartes MWF chez les nourrissons et semble bien révéler la progression de la myélinisation à l’échelle cérébrale. La robustesse de cette approche a également été étudiée en simulations. Une autre question cruciale pour l'étude du développement de la substance blanche est l'identification des faisceaux dans le cerveau des enfants. Dans ce travail de thèse, je décris également la création d'un atlas préliminaire de connectivité structurelle chez des enfants âgés de 17 à 81 mois, permettant l'extraction automatique des faisceaux à partir des données de tractographie. Cette approche a démontré sa pertinence pour l'évaluation régionale de la maturation de la substance blanche normale chez l’enfant. Pour finir, j’envisage dans la dernière partie du manuscrit les applications potentielles des différentes méthodes précédemment décrites pour l’étude fine des réseaux de substance blanche dans le cadre de deux exemples spécifiques de pathologies : les épilepsies focales et la leucodystrophie métachromatique
Magnetic Resonance Imaging (MRI) is a fundamental tool for in vivo investigation of brain development in newborns, infants and children. It provides several quantitative parameters that reflect changes in tissue properties during development depending on different undergoing maturational processes. However, reliable evaluation of the white matter maturation is still an open question: on one side, none of these parameters can describe the whole complexity of the undergoing changes; on the other side, neither of them is specific to any particular developmental process or tissue property. Developing multiparametric approaches combining complementary information from different MRI parameters is expected to improve our understanding of brain development. In this PhD work, I present two examples of such approaches and demonstrate their relevancy for investigation of maturation across different white matter bundles. The first approach provides a global measure of maturation based on the Mahalanobis distance calculated from different MRI parameters (relaxation times T1 and T2, longitudinal and transverse diffusivities from Diffusion Tensor Imaging, DTI) in infants (3-21 weeks) and adults. This approach provides a better description of the asynchronous maturation across the bundles than univariate approaches. Furthermore, it allows estimating the relative maturational delays between the bundles. The second approach aims at quantifying myelination of brain tissues by calculating Myelin Water Fraction (MWF) in each image voxel. This approach is based on a 3-component tissue model, with each model component having specific relaxation characteristics that were pre-calibrated in three healthy adult subjects. This approach allows fast computing of the MWF maps from infant data and could reveal progression of the brain myelination. The robustness of this approach was further investigated using computer simulations. 
Another important issue for studying white matter development in children is bundle identification. In this work I also describe the creation of a preliminary atlas of white matter structural connectivity in children aged 17-81 months. This atlas allows automatic extraction of the bundles from tractography datasets. This approach demonstrated its relevance for evaluation of regional maturation of normal white matter in children. Finally, in the last part of the manuscript I describe potential future applications of the previously developed methods to investigation of the white matter in cases of two specific pathologies: focal epilepsy and metachromatic leukodystrophy.
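The Mahalanobis-distance measure mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the choice of an adult group as the reference distribution, and all numeric values below are assumptions made up for illustration. The sketch computes how far one infant bundle's parameter vector (T1, T2, longitudinal and transverse diffusivity) lies from the adult reference, scaled by the adult covariance.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length parameter vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Unbiased sample covariance (divides by n - 1)."""
    n = len(rows)
    mu = mean_vector(rows)
    dim = len(mu)
    cov = [[0.0] * dim for _ in range(dim)]
    for row in rows:
        d = [x - m for x, m in zip(row, mu)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis(x, reference_rows):
    """Distance of vector x from the reference group's distribution."""
    mu = mean_vector(reference_rows)
    cov = covariance_matrix(reference_rows)
    diff = [a - b for a, b in zip(x, mu)]
    # Solve cov @ y = diff instead of inverting cov explicitly.
    y = solve(cov, diff)
    return math.sqrt(sum(d * v for d, v in zip(diff, y)))


if __name__ == "__main__":
    # Made-up values for one bundle: (T1 ms, T2 ms, longitudinal and
    # transverse diffusivity in 1e-3 mm^2/s). Not data from the thesis.
    adults = [
        (850, 70, 1.20, 0.50),
        (870, 72, 1.25, 0.52),
        (830, 68, 1.18, 0.47),
        (860, 74, 1.22, 0.55),
        (845, 69, 1.27, 0.49),
        (880, 71, 1.21, 0.51),
    ]
    infant = (1900, 150, 1.60, 1.00)
    print(mahalanobis(infant, adults))
```

Under these assumptions, a larger distance would indicate a bundle further from the adult reference, which is one way a single multiparametric maturation score could be read.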
26

Gunapati, Venkat Yashwanth. "CLOUD BASED DISTRIBUTED COMPUTING PLATFORM FOR MULTIMODAL ENERGY DATA STREAMS." Case Western Reserve University School of Graduate Studies / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=case1399373847.

27

Jayapandian, Catherine Praveena. "Cloudwave: A Cloud Computing Framework for Multimodal Electrophysiological Big Data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=case1405516626.

28

Diehn, Sabrina Maria. "Analysis of data from multimodal chemical characterizations of plant tissues." Doctoral thesis, Humboldt-Universität zu Berlin, 2021. http://dx.doi.org/10.18452/23065.

Abstract:
Die Vorverarbeitung und Analyse von spektrometrischen und spektroskopischen Daten von Pflanzengewebe sind in den unterschiedlichsten Forschungsbereichen wie der Pflanzenbiologie, Agrarwissenschaften und Klimaforschung von großer Bedeutung. Der Schwerpunkt dieser Arbeit liegt auf der optimierten Nutzung von Daten von Pflanzengeweben, insbesondere der Daten gewonnen durch Matrix–Assistierte Laser–Desorption–Ionisierung Massenspektrometrie, Raman-Spektroskopie und Fourier-Transform-Infrarotspektroskopie. Die Klassifizierungsfähigkeit mit diesen Methoden wird insbesondere nach Kombination der Daten untereinander und mit zusätzlichen chemischen und biologischen Informationen verglichen. Die diskutierten Beispiele befassen sich mit der Untersuchung und Einordnung innerhalb einer bestimmten Pflanzenart, beispielsweise der Unterscheidung von Proben aus unterschiedlichen Populationen, Wachstumsbedingungen oder Gewebeunterstrukturen. Die Daten wurden sowohl mit explorativen Werkzeugen wie der Hauptkomponentenanalyse und der hierarchischen Clusteranalyse als auch mit Methoden des maschinellen Lernens wie der Diskriminanzanalyse oder künstlichen neuronalen Netzwerken analysiert. Konkret zeigen die Ergebnisse, dass die Kombination der Methoden mit zusätzlichen pflanzenbezogenen Informationen in einer Konsensus-Hauptkomponentenanalyse zu einer umfassenden Charakterisierung der Proben führt. Es werden verschiedene Strategien zur Datenvorbehandlung diskutiert, um nicht relevante spektrale Information zu reduzieren, z.B. aus Karten von Pflanzengeweben oder eingebetteten Pollenkörnern. Die Ergebnisse dieser Arbeit weisen auf die Relevanz der gezielten Nutzung spektrometrischer und spektroskopischer Daten hin und lassen sich nicht nur auf pflanzenbezogene Themen, sondern auch auf andere analytische Klassifizierungsprobleme übertragen.
The pre-processing and analysis of spectrometric and spectroscopic data of plant tissue are important in a wide variety of research areas, such as plant biology, agricultural science, and climate research. The focus of the thesis is the optimized utilization of data from plant tissues, which includes data from matrix-assisted laser desorption/ionization time-of-flight mass spectrometry, Raman spectroscopy, and Fourier transform infrared spectroscopy. The classification ability of these methods is compared, in particular after combination of the data with each other and with additional chemical and biological information. The discussed examples are concerned with the investigation and classification within a particular plant species, such as the distinction of samples from different populations, growth conditions, or tissue substructures. The data were analyzed by exploratory tools such as principal component analysis and hierarchical cluster analysis, as well as by predictive tools that included partial least squares discriminant analysis and machine learning approaches. Specifically, the results show that combination of the methods with additional plant-related information in a consensus principal component analysis leads to a comprehensive characterization of the samples. Different data pre-treatment strategies are discussed to reduce non-relevant spectral information, e.g., from maps of plant tissues or embedded pollen grains. The results in this work indicate the relevance of the targeted utilization of spectrometric and spectroscopic data and could be applied not only to plant-related topics but also to other analytical classification problems.
29

Ming, Joy Carol. "#Autism Versus 299.0: Topic Model Exploration of Multimodal Autism Data." Thesis, Harvard University, 2015. http://nrs.harvard.edu/urn-3:HUL.InstRepos:14398542.

Abstract:
Though the prevalence and awareness of Autism Spectrum Disorder (ASD) have steadily increased, a true understanding is hard to reach because of the behavior-based nature of the diagnosis and the heterogeneity of its manifestations. Parents and caregivers often informally discuss symptoms and behaviors they observe in their children with autism through online medical forums, in contrast to the more traditional and structured text of electronic medical records collected by doctors. We modify an anchor-word-driven topic model algorithm originally proposed by Arora et al. (2012a) to elicit and compare the medical concept topics, or “themes”, from both modes of data: the novel data set of posts from autism-specific online medical forums and electronic medical records. We present methods to extract relevant medical concepts from colloquially written forum posts through the use of selected sections of the consumer health vocabulary and other filtering techniques. In order to account for the sparsity of concept data, we propose and evaluate a more robust approach to selecting anchor words that takes into account variance and inclusivity. This approach, which combines concept and anchor word selection, seeds the discussion about how unstructured text can influence and expand understanding of the enigmatic disorder, autism, and how these methods can be applied to similar sources of text to solve other problems.
30

Rastgoo, Mohammad Naim. "Driver stress level detection based on multimodal measurements." Thesis, Queensland University of Technology, 2019. https://eprints.qut.edu.au/134144/1/Mohammad%20Naim%20Rastgoo%20Thesis_Redacted.pdf.

Abstract:
Successful driver performance is fundamental in preventing vehicle crashes. Stress can negatively affect driver performance and significantly increase the risk of a crash. Therefore, an in-vehicle warning system for driver stress levels is needed to continuously predict dangerous driving situations and proactively alert drivers to ensure safe and comfortable driving. As a result of the recent developments in sensing technologies and context recognition, driver stress can be detected using multimodal measurements. This thesis proposes a general framework for building a driver stress level detection system based on multimodal measurements and adopts different approaches to maximise the performance of the system.
31

Böckmann, Christine, Jens Biele, Roland Neuber, and Jenny Niebsch. "Retrieval of multimodal aerosol size distribution by inversion of multiwavelength data." Universität Potsdam, 1997. http://opus.kobv.de/ubp/volltexte/2007/1436/.

Abstract:
The ill-posed problem of aerosol size distribution determination from a small number of backscatter and extinction measurements was solved successfully with a mollifier method, which is advantageous since the ill-posed part is performed on exactly given quantities and the points r where n(r) is evaluated may be freely selected. A new two-dimensional model for the troposphere is proposed.
32

Salami, Alireza. "Decoding the complex brain : multivariate and multimodal analyses of neuroimaging data." Doctoral thesis, Umeå universitet, Institutionen för integrativ medicinsk biologi (IMB), 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-51842.

Abstract:
Functional brain images are extraordinarily rich data sets that reveal distributed brain networks engaged in a wide variety of cognitive operations. It is a substantial challenge both to create models of cognition that mimic behavior and underlying cognitive processes and to choose a suitable analytic method to identify underlying brain networks. Most of the contemporary techniques used in analyses of functional neuroimaging data are based on univariate approaches in which single image elements (i.e. voxels) are considered to be computationally independent measures. Beyond univariate methods (e.g. statistical parametric mapping), multivariate approaches, which identify a network across all regions of the brain rather than a tessellation of regions, are potentially well suited for analyses of brain imaging data. A multivariate method (e.g. partial least squares) is a computational strategy that determines time-varying distributed patterns of the brain (as a function of a cognitive task). Compared to its univariate counterparts, a multivariate approach provides greater levels of sensitivity and reflects cooperative interactions among brain regions. Thus, by considering information across more than one measuring point, additional information on brain function can be revealed. Similarly, by considering information across more than one measuring technique, the nature of underlying cognitive processes becomes better understood. Cognitive processes have been investigated in conjunction with multiple neuroimaging modalities (e.g. fMRI, sMRI, EEG, DTI), whereas the typical method has been to analyze each modality separately. Accordingly, little work has been carried out to examine the relation between different modalities. Indeed, due to the interconnected nature of brain processing, it is plausible that changes in one modality locally or distally modulate changes in another modality. 
This thesis focuses on multivariate and multimodal methods of image analysis applied to various cognitive questions. These methods are used in order to extract features that are inaccessible using univariate / unimodal analytic approaches. To this end, I implemented multivariate partial least squares analysis in studies I and II in order to identify neural commonalities and differences between the available and accessible information in memory (study I), and also between episodic encoding and episodic retrieval (study II). Study I provided evidence of qualitative differences between availability and accessibility signals in memory by linking memory access to modality-independent brain regions, and availability in memory to elevated activity in modality-specific brain regions. Study II provided evidence in support of general and specific memory operations during encoding and retrieval by linking general processes to the joint demands on attentional, executive, and strategic processing, and a process-specific network to core episodic memory function. In studies II, III, and IV, I explored whether the age-related changes/differences in one modality were driven by age-related changes/differences in another modality. To this end, study II investigated whether age-related functional differences in hippocampus during an episodic memory task could be accounted for by age-related structural differences. I found that age-related local structural deterioration could partially but not entirely account for age-related diminished hippocampal activation. In study III, I sought to explore whether age-related changes in the prefrontal and occipital cortex during a semantic memory task were driven by local and/or distal gray matter loss. I found that age-related diminished prefrontal activation was driven, at least in part, by local gray matter atrophy, whereas the age-related decline in occipital cortex was accounted for by distal gray matter atrophy. 
Finally, in study IV, I investigated whether white matter (WM) microstructural differences mediated age-related decline in different cognitive domains. The findings implicated WM as one source of age-related decline on tasks measuring processing speed, but they did not support the view that age-related differences in episodic memory, visuospatial ability, or fluency were strongly driven by age-related differences in white-matter pathways. Taken together, the architecture of different aspects of episodic memory (e.g. encoding vs. retrieval; availability vs. accessibility) was characterized using multivariate partial least squares analysis. This finding highlights the usefulness of multivariate techniques in guiding cognitive theories of episodic memory. Additionally, competing theories of cognitive aging were investigated by multimodal integration of age-related changes in brain structure, function, and behavior. The structure-function relationships were specific to brain regions and cognitive domains. Finally, we urge that contemporary theories of cognitive aging be extended to longitudinal measures for further validation.
33

Vukotic, Verdran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.

Abstract:
La thèse porte sur le développement d'architectures neuronales profondes permettant d'analyser des contenus textuels ou visuels, ou la combinaison des deux. De manière générale, le travail tire parti de la capacité des réseaux de neurones à apprendre des représentations abstraites. Les principales contributions de la thèse sont les suivantes: 1) Réseaux récurrents pour la compréhension de la parole: différentes architectures de réseaux sont comparées pour cette tâche sur leurs facultés à modéliser les observations ainsi que les dépendances sur les étiquettes à prédire. 2) Prédiction d’image et de mouvement : nous proposons une architecture permettant d'apprendre une représentation d'une image représentant une action humaine afin de prédire l'évolution du mouvement dans une vidéo ; l'originalité du modèle proposé réside dans sa capacité à prédire des images à une distance arbitraire dans une vidéo. 3) Encodeurs bidirectionnels multimodaux : le résultat majeur de la thèse concerne la proposition d'un réseau bidirectionnel permettant de traduire une modalité en une autre, offrant ainsi la possibilité de représenter conjointement plusieurs modalités. L'approche a été étudiée principalement en structuration de collections de vidéos, dans le cadre d'évaluations internationales où l'approche proposée s'est imposée comme l'état de l'art. 4) Réseaux adverses pour la fusion multimodale: la thèse propose d'utiliser les architectures génératives adverses pour apprendre des représentations multimodales en offrant la possibilité de visualiser les représentations dans l'espace des images.
In this dissertation, the thesis that deep neural networks are suited for analysis of visual, textual and fused visual and textual content is discussed. This work evaluates the ability of deep neural networks to learn automatic multimodal representations in either unsupervised or supervised manners and brings the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image. The architecture is evaluated on videos, using only one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis consists of a neural architecture that translates from one modality to the other and conversely, and offers an improved multimodal representation space where the initially disjoint representations can be translated and fused. This enables improved fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks within the task of video hyperlinking, where it defined the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing multimodal representations, generative adversarial networks make it possible to visualize the learned model directly in the image domain.
34

Zhu, Meng. "Cross-modal semantic-associative labelling, indexing and retrieval of multimodal data." Thesis, University of Reading, 2010. http://centaur.reading.ac.uk/24828/.

35

Samper, González Jorge Alberto. "Learning from multimodal data for classification and prediction of Alzheimer's disease." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS361.

Abstract:
La maladie d’Alzheimer (MA) est la première cause de démence dans le monde, touchant plus de 20 millions de personnes. Son diagnostic précoce est essentiel pour assurer une prise en charge adéquate des patients ainsi que pour développer et tester de nouveaux traitements. La MA est une maladie complexe qui nécessite différentes mesures pour être caractérisée : tests cognitifs et cliniques, neuroimagerie, notamment l’imagerie par résonance magnétique (IRM) et la tomographie par émission de positons (TEP), génotypage, etc. Il y a un intérêt à explorer les capacités discriminatoires et prédictives à un stade précoce de ces différents marqueurs, qui reflètent différents aspects de la maladie et peuvent apporter des informations complémentaires. L’objectif de cette thèse de doctorat était d’évaluer le potentiel et d’intégrer différentes modalités à l’aide de méthodes d’apprentissage statistique, afin de classifier automatiquement les patients atteints de la MA et de prédire l’évolution de la maladie dès ses premiers stades. Plus précisément, nous visions à progresser vers une future application de ces approches à la pratique clinique. La thèse comprend trois études principales. La première porte sur le diagnostic différentiel entre différentes formes de démence à partir des données IRM. Cette étude a été réalisée à l’aide de données de routine clinique, ce qui a permis d’obtenir un scénario d’évaluation plus réaliste. La seconde propose un nouveau cadre pour l’évaluation reproductible des algorithmes de classification de la MA à partir des données IRM et TEP. En effet, bien que de nombreuses approches aient été proposées dans la littérature pour la classification de la MA, elles sont difficiles à comparer et à reproduire. 
La troisième partie est consacrée à la prédiction de l’évolution de la maladie d’Alzheimer chez les patients atteints de troubles cognitifs légers par l’intégration de données multimodales, notamment l’IRM, la TEP, des évaluations cliniques et cognitives, et le génotypage. En particulier, nous avons systématiquement évalué la valeur ajoutée de la neuroimagerie par rapport aux seules données cliniques/cognitives. Comme la neuroimagerie est plus coûteuse et moins répandue, il est important de justifier son utilisation dans les algorithmes de classification
Alzheimer's disease (AD) is the leading cause of dementia worldwide, affecting over 20 million people. Its diagnosis at an early stage is essential to ensure proper care of patients, and to develop and test novel treatments. AD is a complex disease that has to be characterized by the use of different measurements: cognitive and clinical tests, neuroimaging including magnetic resonance imaging (MRI) and positron emission tomography (PET), genotyping, etc. There is an interest in exploring the discriminative and predictive capabilities of these diverse markers, which reflect different aspects of the disease and potentially carry complementary information, from an early stage of the disease. The objective of this PhD thesis was thus to assess the potential of multiple modalities and to integrate them using machine learning methods, in order to automatically classify patients with AD and predict the development of the disease from the earliest stages. More specifically, we aimed to make progress toward the translation of such approaches to clinical practice. The thesis comprises three main studies. The first one tackles the differential diagnosis between different forms of dementia from MRI data. This study was performed using clinical routine data, thereby providing a more realistic evaluation scenario. The second one proposes a new framework for reproducible evaluation of AD classification algorithms from MRI and PET data. Indeed, while numerous approaches have been proposed for AD classification in the literature, they are difficult to compare and to reproduce. The third part is devoted to the prediction of progression to AD in patients with mild cognitive impairment through the integration of multimodal data, including MRI, PET, clinical/cognitive evaluations and genotyping. In particular, we systematically assessed the added value of neuroimaging over clinical/cognitive data only. 
Since neuroimaging is more expensive and less widely available, it is important to justify its use as input to classification algorithms.
36

Bao, Guoqing. "End-to-End Machine Learning Models for Multimodal Medical Data Analysis." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/28153.

Abstract:
The pathogenesis of infectious and severe diseases including COVID-19, metabolic disorders, and cancer can be highly complicated because it involves abnormalities at genetic, metabolic, anatomical as well as functional levels. The deteriorative changes can be quantitatively monitored through biochemical markers, genome-wide assays as well as different imaging modalities including radiographic and pathological data. Multimodal medical data, involving three common and essential diagnostic disciplines, i.e., pathology, radiography, and genomics, are increasingly utilized to unravel the complexity of the diseases. High-throughput and deep features can be extracted from different types of medical data to characterize diseases in various quantitative aspects, e.g., compactness and flatness of tumors, and heterogeneity of tissues. State-of-the-art deep learning methods including convolutional neural networks (CNNs) and Transformers have achieved impressive results in analyses of natural image, text, and voice data through an intrinsic and latent manner. However, there are many obstacles and challenges when applying existing machine learning models that were initially tuned on natural image and language data to clinical practice, such as shortage of labeled data, distribution and domain discrepancy, data heterogeneity and imbalance, etc. Moreover, those methods are not designed to harness multimodal data under a unified and end-to-end learning paradigm, making them rely heavily on expert involvement and more prone to intra- and inter-observer variability. To address those limitations, in this thesis, we present novel end-to-end machine learning methods to learn fused feature representations from multimodal medical data, and perform quantitative analyses to identify significant higher-level features from raw medical data in explanation of the characteristics and outcomes of the infectious and severe diseases. 
• Starting from gold standard pathology images, we propose a bifocal weakly-supervised method which is able to complementarily and simultaneously capture two types of discriminative regions from both shorter and longer image tiles under a small amount of sparsely labeled data to improve recognition and cross-modality analyses of complex morphological and immunohistochemical structures in entire and adjacent multimodal histological slides.
• Then, expanding our research to data collected by non-invasive approaches, we present an end-to-end multitask learning model for automated and simultaneous diagnosis and severity assessment of infectious disease, which obviates the need for expert involvement. Shift3D and the Random-weighted multitask loss function are two novel algorithm components proposed to learn shift-invariant and shareable representations from fused radiographic imaging and high-throughput numerical data, in order to accelerate model convergence, improve joint learning performance, and resist the influence of intra- and inter-observer variability.
• Next, we further involve time-dimension data and devise a machine learning-based method to locate representative imaging features, tackling the side effects of non-invasive diagnostics, i.e., radiation. The resulting low-radiation, non-invasive solution can be used for progression analysis of metabolic disorders over time and for evaluation of surgery-induced weight loss effects.
• Lastly, because genetic disorders can lead to diverse diseases, we investigate genomic data and build a machine learning pipeline for processing genomic data and analyzing disease prognosis by incorporating statistical power, biological rationale, and machine learning algorithms as a unified prognostic feature extractor.
We carried out rigorous and extensive experiments on two large public datasets and two private cohorts covering various forms of medical data, e.g., biochemical markers, genomic profiles, radiomic features, radiological and pathological imaging data. The experiments demonstrated that our proposed machine learning approaches are able to achieve better performances compared to corresponding state-of-the-art methods, and subsequently improve the diagnostic and/or prognostic workflows of infectious and severe diseases including COVID-19, metabolic disorders, and cancer.
37

Faghihi, Reza. "Mise en correspondance SPECT-CT par conditions de consistance." Université Joseph Fourier (Grenoble), 2002. http://www.theses.fr/2002GRE19011.

Abstract:
Nuclear imaging is one of the best tools for studying the function of human organs. In nuclear imaging, the distribution of a radioactive tracer is reconstructed from external measurements. Physicians use nuclear imaging to study the function of patients' organs qualitatively and, where possible, quantitatively. Like all other medical imaging methods, nuclear imaging is limited in spatial resolution and also in the accuracy with which the activity function reconstructed from the measurements can be quantified. Single-photon imaging (SPECT) has the advantage of being fairly simple to implement and is widely available in hospitals. One of the fundamental limitations of SPECT imaging is the attenuation and scattering of the rays emitted by the internal tracers by the patient tissues they pass through. To correct accurately for attenuation (and scatter), several teams have proposed imaging systems that jointly measure SPECT emission data and transmission data; the latter make it possible to reconstruct an attenuation map, which is then used in the reconstruction of the SPECT images. In this thesis, our goal is to find an attenuation correction method that does not increase the dose delivered to patients and does not require complex additional equipment (which often reduces the practical and economic feasibility of attenuation correction). After a review of the state of the art in attenuation correction methods, we describe Natterer's method for estimating an attenuation function in SPECT from the consistency equations that SPECT data must satisfy after scatter correction.
Assuming that patients who undergo a SPECT examination have generally already had a CT examination, we can adapt Natterer's consistency-condition method to the registration of a patient's CT data and SPECT data. In our work, we study the possibility of translating an attenuation map at X-ray energy into an attenuation map at the SPECT energy. We study numerically the optimization methods for SPECT/CT registration by minimization of the consistency conditions of the SPECT data, in particular the influence of noise in the data, of errors in the attenuation function, and of the initialization parameters of the optimization. The results we obtain show that this idea is feasible. Clinical research on the quantitative results is nevertheless still needed to validate it.
38

Molins, Jiménez Antonio. "Multimodal integration of EEG and MEG data using minimum ℓ₂-norm estimates." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/40528.

Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.
Includes bibliographical references (leaves 69-74).
The aim of this thesis was to study the effects of multimodal integration of electroencephalography (EEG) and magnetoencephalography (MEG) data on the minimum ℓ₂-norm estimates of cortical current densities. We investigated analytically the effect of including EEG recordings in MEG studies versus the addition of new MEG channels. To further confirm these results, clinical datasets comprising concurrent MEG/EEG acquisitions were analyzed. Minimum ℓ₂-norm estimates were computed using MEG alone, EEG alone, and the combination of the two modalities. Localization accuracy of responses to median-nerve stimulation was evaluated to study the utility of combining MEG and EEG.
by Antonio Molins Jiménez.
S.M.
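For readers unfamiliar with the estimator named in the abstract above, a common textbook form of the minimum ℓ₂-norm estimate is sketched here. The symbols (y for the measurements, L for the lead-field matrix, j for the cortical current densities, n for noise, λ for the regularization weight) and the stacking of the MEG and EEG measurements into one system are assumptions chosen for illustration; the thesis may use different notation, noise weighting, or regularization.

```latex
\begin{aligned}
y &= L\,j + n, \qquad
y = \begin{bmatrix} y_{\mathrm{MEG}} \\ y_{\mathrm{EEG}} \end{bmatrix}, \quad
L = \begin{bmatrix} L_{\mathrm{MEG}} \\ L_{\mathrm{EEG}} \end{bmatrix}, \\
\hat{j} &= \arg\min_{j}\; \lVert y - L j \rVert_2^2 + \lambda \lVert j \rVert_2^2
        = L^{\top}\left(L L^{\top} + \lambda I\right)^{-1} y .
\end{aligned}
```

Under these assumptions, using a single modality amounts to keeping only the corresponding block rows of y and L, so the three estimates compared in the abstract differ only in which rows enter the same formula.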
39

Bießmann, Felix [Verfasser], and Klaus-Robert Müller [Akademischer Betreuer]. "Data-driven analysis for multimodal neuroimaging / Felix Bießmann. Betreuer: Klaus-Robert Müller." Berlin : Universitätsbibliothek der Technischen Universität Berlin, 2012. http://d-nb.info/1018985220/34.

40

Young, J. M. "Probabilistic prediction of Alzheimer's disease from multimodal image data with Gaussian processes." Thesis, University College London (University of London), 2015. http://discovery.ucl.ac.uk/1461115/.

Abstract:
Alzheimer’s disease, the most common form of dementia, is an extremely serious health problem, and one that will become even more so in the coming decades as the global population ages. This has led to a massive effort to develop both new treatments for the condition and new methods of diagnosis; in fact, the two are intimately linked, as future treatments will depend on earlier diagnosis, which in turn requires the development of biomarkers that can be used to identify and track the disease. This is made possible by studies such as the Alzheimer’s Disease Neuroimaging Initiative, which provides previously unimaginable quantities of imaging and other data freely to researchers. It is the task of early diagnosis that this thesis focuses on. We do so by borrowing modern machine learning techniques and applying them to image data. In particular, we use Gaussian processes (GPs), a previously neglected tool, and show they can be used in place of the more widely used support vector machine (SVM). As combinations of complementary biomarkers have been shown to be more useful than the individual biomarkers, we go on to show that GPs can also be applied to integrate different types of image and non-image data, and that, thanks to their properties, this improves results further than it does with SVMs. In the final two chapters, we also look at different ways to formulate the prediction of conversion to Alzheimer’s disease as a machine learning problem and at the way image data can be used to generate features for input to a machine learning algorithm. Both of these show how unconventional approaches may improve results. The result is an advance in the state of the art for a very clinically important problem, which may prove useful in practice and show a direction for future research to further increase the usefulness of such methods.
41

Gimenes, Gabriel Perri. "Advanced techniques for graph analysis: a multimodal approach over planetary-scale data." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-26062015-105026/.

Abstract:
Applications such as electronic commerce, computer networks, social networks, and biology (protein interaction), to name a few, have led to the production of graph-like data at planetary scale, possibly with millions of nodes and billions of edges. These applications pose challenging problems when the task is to use their data to support decision-making processes by means of non-obvious and potentially useful patterns. In order to process such data for pattern discovery, researchers and practitioners have used distributed processing resources organized in computational clusters. However, building and managing such clusters can be complex, bringing technical and financial issues that can be prohibitive in a variety of scenarios. As an alternative, it is desirable to process large-scale graphs using only one computational node. To do so, we developed processes and algorithms according to three different approaches, building up towards an analytical toolset capable of revealing patterns, supporting comprehension, and helping with decision making over planetary-scale graphs.
Aplicações como comércio eletrônico, redes de computadores, redes sociais e biologia (interação proteica), entre outras, levaram à produção de dados que podem ser representados como grafos à escala planetária, podendo possuir milhões de nós e bilhões de arestas. Tais aplicações apresentam problemas desafiadores quando a tarefa consiste em usar as informações contidas nos grafos para auxiliar processos de tomada de decisão através da descoberta de padrões não triviais e potencialmente úteis. Para processar esses grafos em busca de padrões, tanto pesquisadores como a indústria têm usado recursos de processamento distribuído organizado em clusters computacionais. Entretanto, a construção e manutenção desses clusters pode ser complexa, trazendo tanto problemas técnicos como financeiros que podem ser proibitivos em diversos casos. Por isso, torna-se desejável a capacidade de se processar grafos em larga escala usando somente um nó computacional. Para isso, foram desenvolvidos processos e algoritmos seguindo três abordagens diferentes, visando a definição de um arcabouço de análise capaz de revelar padrões, compreensão e auxiliar na tomada de decisão sobre grafos em escala planetária.
42

Quack, Till. "Large scale mining and retrieval of visual data in a multimodal context." Konstanz Hartung-Gorre, 2009. http://d-nb.info/993614620/04.

43

Labourey, Quentin. "Fusions multimodales pour la recherche d'humains par un robot mobile." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM020/document.

Abstract:
Dans ce travail, nous considérons le cas d'un robot mobile d'intérieur dont l'objectif est de détecter les humains présents dans l'environnement et de se positionner physiquement par rapport à eux, dans le but de mieux percevoir leur état. Pour cela, le robot dispose de différents capteurs (capteur RGB-Depth, microphones, télémètre laser). Des contributions de natures variées ont été effectuées :
Classification d'événements sonores en environnement intérieur : La méthode de classification proposée repose sur une taxonomie de petite taille et est destinée à différencier les marqueurs de la présence humaine. L'utilisation de fonctions de croyance permet de prendre en compte l'incertitude de la classification, et de labelliser un son comme « inconnu ».
Fusion audiovisuelle pour la détection de locuteurs successifs dans une conversation : Une méthode de détection de locuteurs est proposée dans le cas du robot immobile, placé comme témoin d'une interaction sociale. Elle repose sur une fusion audiovisuelle probabiliste. Cette méthode a été testée sur des vidéos acquises par le robot.
Navigation dédiée à la détection d'humains à l'aide d'une fusion multimodale : À partir d'informations provenant des capteurs hétérogènes, le robot cherche des humains de manière autonome dans un environnement connu. Les informations sont fusionnées au sein d'une grille de perception multimodale. Cette grille permet au robot de prendre une décision quant à son prochain déplacement, à l'aide d'un automate reposant sur des niveaux de priorité des informations perçues. Ce système a été implémenté et testé sur un robot Q.bo.
Modélisation crédibiliste de l'environnement pour la navigation : La construction de la grille de perception multimodale est améliorée à l'aide d'un mécanisme de fusion reposant sur la théorie des fonctions de croyance. Ceci permet au robot de maintenir une grille « évidentielle » dans le temps comprenant l'information perçue et son incertitude. Ce système a d'abord été évalué en simulation, puis sur le robot Q.bo.
In this work, we consider the case of a mobile robot that aims to detect humans in its environment and to position itself with respect to them. To fulfill this mission, the robot is equipped with various sensors (RGB-Depth, microphones, laser telemeter). This thesis contains contributions of various natures:
Sound classification in indoor environments: A small taxonomy is proposed in a classification method designed to enable a robot to detect human presence. Uncertainty of classification is taken into account through the use of belief functions, allowing us to label a sound as "unknown".
Speaker tracking through audiovisual data fusion: The robot is witness to a social interaction and tracks the successive speakers with probabilistic audiovisual data fusion. The proposed method was tested on videos acquired by the robot's sensors.
Navigation dedicated to human detection through multimodal fusion: The robot autonomously navigates a known environment to detect humans using heterogeneous sensors. The data is fused to create a multimodal perception grid. This grid enables the robot to choose its destinations, depending on the priority of the perceived information. This system was implemented and tested on a Q.bo robot.
Credibilist modeling of the environment for navigation: The creation of the multimodal perception grid is improved by the use of credibilist fusion. This enables the robot to maintain an evidential grid over time, containing the perceived information and its uncertainty. This system was implemented first in simulation, and then on a Q.bo robot.
44

Masri, Ali. "Multi-Network integration for an Intelligent Mobility." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLV091/document.

Abstract:
Les systèmes de transport sont un des leviers puissants du progrès de toute société. Récemment les modes de déplacement ont évolué significativement et se diversifient. Les distances quotidiennement parcourues par les citoyens ne cessent d'augmenter au cours de ces dernières années. Cette évolution impacte l'attractivité et la compétitivité mais aussi la qualité de vie grandement dépendante de l'évolution des mobilités des personnes et des marchandises. Les gouvernements et les collectivités territoriales développent de plus en plus des politiques d'incitation à l'éco-mobilité. Dans cette thèse nous nous concentrons sur les systèmes de transport public. Ces derniers évoluent continuellement et offrent de nouveaux services couvrant différents modes de transport pour répondre à tous les besoins des usagers. Outre les systèmes de transports en commun, prévus pour le transport de masse, de nouveaux services de mobilité ont vu le jour, tels que le transport à la demande, le covoiturage planifié ou dynamique et l'autopartage ou les vélos en libre-service. Ils offrent des solutions alternatives de mobilité et pourraient être complémentaires aux services traditionnels. Cependant, ces services sont à l'heure actuelle isolés du reste des modes de transport et des solutions multimodales. Ils sont proposés comme une alternative mais sans intégration réelle aux plans proposés par les outils existants. Pour permettre la multimodalité, le principal challenge de cette thèse est l'intégration de données et/ou de services provenant de systèmes de transports hétérogènes. Par ailleurs, le concept de données ouvertes est aujourd'hui adopté par de nombreuses organisations publiques et privées, leur permettant de publier leurs sources de données sur le Web et de gagner ainsi en visibilité. On se place dans le contexte des données ouvertes et des méthodes et outils du web sémantique pour réaliser cette intégration, en offrant une vue unifiée des réseaux et des services de transport. 
Les verrous scientifiques auxquels s'intéresse cette thèse sont liés aux problèmes d'intégration à la fois des données et des services informatiques des systèmes de transport sous-jacents.
Multimodality requires the integration of heterogeneous transportation data and services to construct a broad view of the transportation network. Many new transportation services (e.g. ridesharing, car-sharing, bike-sharing) are emerging and gaining popularity, since in some cases they provide better trip solutions. However, these services are still isolated from the existing multimodal solutions and are proposed as alternative plans without being truly integrated into the suggested plans. The concept of open data is on the rise and is being adopted by many companies, which publish their data sources on the web in order to gain visibility. The goal of this thesis is to use these data to enable multimodality by constructing an extended transportation network that links these new services to existing ones. The challenges we face arise mainly from the integration problem in both transportation services and transportation data.
45

Heyder, Jakob Wendelin. "Knowledge Base Augmentation from Spreadsheet Data : Combining layout inference with multimodal candidate classification." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-278824.

Abstract:
Spreadsheets compose a valuable and notably large dataset of documents within many enterprise organizations and on the Web. Although spreadsheets are intuitive to use and equipped with powerful functionalities, extraction and transformation of the data remain a cumbersome and mostly manual task. The great flexibility they provide to the user results in data that is arbitrarily structured and hard to process for other applications. In this paper, we propose a novel architecture that combines supervised layout inference and multimodal candidate classification to allow knowledge base augmentation from arbitrary spreadsheets. In our design, we consider the need for repairing misclassifications and allow for verification and ranking of ambiguous candidates. We evaluate the performance of our system on two datasets, one with single-table spreadsheets, another with spreadsheets of arbitrary format. The evaluation result shows that the proposed system achieves similar performance on single-table spreadsheets compared to state-of-the-art rule-based solutions. Additionally, the flexibility of the system allows us to process arbitrary spreadsheet formats, including horizontally and vertically aligned tables, multiple worksheets, and contextualizing metadata. This was not possible with existing purely text-based or table-based solutions. The experiments demonstrate that it can achieve high effectiveness with an F1 score of 95.71 on arbitrary spreadsheets that require the interpretation of surrounding metadata. The precision of the system can be further increased by applying candidate schema-matching based on semantic similarity of column headers.
Kalkylblad består av ett värdefullt och särskilt stort datasätt av dokument inom många företagsorganisationer och på webben. Även om kalkylblad är intuitivt att använda och är utrustad med kraftfulla funktioner, utvinning och transformation av data är fortfarande en besvärlig och manuell uppgift. Den stora flexibiliteten som de ger användaren resulterar i data som är godtyckligt strukturerade och svåra att bearbeta för andra applikationer. I det här förslaget föreslår vi en ny arkitektur som kombinerar övervakad layoutinferens och multimodal kandidatklassificering för att tillåta kunskapsbasförstärkning från godtyckliga kalkylblad. I vår design överväger vi behovet av att reparera felklassificeringar och möjliggöra verifiering och rangordning av tvetydiga kandidater. Vi utvärderar systemets utförande på två datasätt, en med singeltabellkalkylblad, en annan med kalkylblad av godtyckligt format. Utvärderingsresultatet visar att det föreslagna systemet uppnår liknande prestanda på singel-tabellkalkylblad jämfört med state-of-the-art regelbaserade lösningar. Dessutom tillåter systemets flexibilitet oss att bearbeta godtyckliga kalkylark format, inklusive horisontella och vertikala inriktade tabeller, flera kalkylblad och sammanhangsförande metadata. Detta var inte möjligt med existerande rent textbaserade eller tabellbaserade lösningar. Experimenten visar att det kan uppnå hög effektivitet med en F1-poäng på 95.71 på godtyckliga kalkylblad som kräver tolkning av omgivande metadata. Systemets precision kan ökas ytterligare genom att applicera schema-matchning av kandidater baserat på semantisk likhet mellan kolumnrubriker.
46

Gómez, Bruballa Raúl Álamo. "Exploiting the Interplay between Visual and Textual Data for Scene Interpretation." Doctoral thesis, Universitat Autònoma de Barcelona, 2020. http://hdl.handle.net/10803/670533.

Abstract:
L'experimentació en aprenentatge automàtic en escenaris controlats i amb bases de dades estàndards és necessària per a comparar el rendiment entre algoritmes avaluant-los sota les mateixes condicions. Però també és necessària l'experimentació en com es comporten aquests algoritmes quan són entrenats amb dades menys controlades i aplicats a problemes reals per indagar en com els avanços en recerca poden contribuir a la nostra societat. En aquesta tesi, experimentem amb els algoritmes més recents de visió per ordinador i processament del llenguatge natural aplicant-los a la interpretació d'escenes multimodals. En particular, investiguem en com la interpretació automàtica d'imatges i text es pot explotar conjuntament per resoldre problemes reals, enfocant-nos en aprendre de dades de xarxes socials. Encarem diverses tasques que impliquen informació visual i textual, discutim les seves particularitats i reptes i exposem les nostres conclusions experimentals. Primer treballem en la detecció de text en imatges. A continuació, treballem amb publicacions de xarxes socials, fent servir els subtítols textuals associats a imatges com a supervisió per aprendre característiques visuals, que apliquem a la cerca d'imatges semàntica amb consultes multimodals. Després, treballem amb imatges de xarxes socials geolocalitzades amb etiquetes textuals associades, experimentant en com fer servir les etiquetes com a supervisió, en cerca d'imatges sensible a la localització, i en explotar la localització per l'etiquetatge d'imatges. Finalment, encarem un problema de classificació específic de publicacions de xarxes socials formades per una imatge i un text: Classificació de discurs de l'odi multimodal.
La experimentación en aprendizaje automático en escenarios controlados y con bases de datos estándares es necesaria para comparar el desempeño entre algoritmos evaluándolos en las mismas condiciones. Sin embargo, también es necesaria la experimentación en cómo se comportan estos algoritmos cuando son entrenados con datos menos controlados y aplicados a problemas reales para indagar en cómo los avances en investigación pueden contribuir a nuestra sociedad. En esta tesis experimentamos con los algoritmos más recientes de visión por ordenador y procesado del lenguaje natural aplicándolos a la interpretación de escenas multimodales. En particular, investigamos en cómo la interpretación automática de imagen y texto se puede explotar conjuntamente para resolver problemas reales, enfocándonos en aprender de datos de redes sociales. Encaramos diversas tareas que implican información visual y textual, discutimos sus características y retos y exponemos nuestras conclusiones experimentales. Primeramente trabajamos en la detección de texto en imágenes. A continuación, trabajamos con publicaciones de redes sociales, usando las leyendas textuales de imágenes como supervisión para aprender características visuales, que aplicamos a la búsqueda de imágenes semántica con consultas multimodales. Después, trabajamos con imágenes de redes sociales geolocalizadas con etiquetas textuales asociadas, experimentando en cómo usar las etiquetas como supervisión, en búsqueda de imágenes sensible a localización, y en explotar la localización para el etiquetado de imágenes. Finalmente, encaramos un problema de clasificación específico de publicaciones de redes sociales formadas por una imagen y un texto: Clasificación de discurso del odio multimodal.
Machine learning experimentation under controlled scenarios and on standard datasets is necessary to compare the performance of algorithms by evaluating all of them in the same setup. However, experimentation on how those algorithms perform on unconstrained data and on applied tasks that solve real-world problems is also a must, to ascertain how that research can contribute to our society. In this dissertation we experiment with the latest computer vision and natural language processing algorithms, applying them to multimodal scene interpretation. In particular, we investigate how image and text understanding can be jointly exploited to address real-world problems, focusing on learning from Social Media data. We address several tasks that involve image and textual information, discuss their characteristics, and present our experimental conclusions. First, we work on the detection of scene text in images. Then, we work with Social Media posts, exploiting the captions associated with images as supervision to learn visual features, which we apply to multimodal semantic image retrieval. Subsequently, we work with geolocated Social Media images with associated tags, experimenting with how to use the tags as supervision, with location-sensitive image retrieval, and with exploiting location information for image tagging. Finally, we work on a specific classification problem for Social Media publications consisting of an image and a text: multimodal hate speech classification.
47

Kirchler, Dominik. "Routage efficace sur réseaux de transport multimodaux." Phd thesis, Ecole Polytechnique X, 2013. http://pastel.archives-ouvertes.fr/pastel-00877450.

Abstract:
Mobility is an important aspect of modern societies. Consequently, there is a growing demand for software solutions for route calculation. In this thesis, multimodal routing and the Dial-a-Ride system are studied. They contribute to a more efficient use of the available transport infrastructure, a key element from the perspective of sustainable development. Planning multimodal routes is complex because of the different transport modes that have to be combined. A generalization of Dijkstra's algorithm can be used to find shortest paths on a multimodal network. However, its performance is not sufficient for industrial applications. This thesis therefore introduces a new algorithm called SDALT, an adaptation of the ALT speed-up technique. To evaluate the performance of SDALT, a graph was built from a real multimodal network based on transport data from the French region Ile-de-France. It includes walking, public transport, car, and bicycle, as well as information on timetables and traffic conditions. Performance tests show that SDALT works well, with computation time reduced by a factor of between 1.5 and 60 compared with the basic algorithm. In a multimodal context, beyond the question of determining the shortest path, there is the question of finding an optimal multimodal round trip between a starting point and a destination. A private vehicle (car or bicycle) used for the first part of the outward journey must be picked up during the return journey and brought back to the starting point. For this reason, the parking location must be chosen so as to optimize the combined travel times of the outward and return journeys.
The algorithm proposed here solves this problem faster than current techniques. The Dial-a-Ride system offers passengers the comfort and flexibility of private cars and taxis at a lower cost and with greater eco-efficiency, because it groups similar transport requests. It works as follows: passengers request the service by calling an operator. They give their departure point, their destination, the number of passengers, and some details about the service times. An algorithm then computes the routes and schedules of the vehicles. This thesis proposes a new, efficient, and fast heuristic of the Granular Tabu Search type, capable of producing good solutions within short times (up to 3 minutes). Compared with other methods, and on the test instances from the literature, this algorithm gives good results.
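The ALT speed-up technique mentioned in the abstract above combines A* search with landmark distances and the triangle inequality. The following is a minimal sketch of plain ALT on a small undirected, single-mode graph; it is not SDALT and not the thesis's implementation. The function names, the adjacency-list format, and the assumption of a connected graph with non-negative weights are all chosen here for illustration.

```python
import heapq

INF = float("inf")


def dijkstra(graph, source):
    """Shortest distances from source; graph maps node -> [(neighbor, weight)]."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def alt_shortest_path(graph, source, target, landmarks):
    """A* with landmark lower bounds (ALT) on an undirected, connected graph.

    Returns (distance, number_of_settled_nodes).
    """
    # Preprocessing (hypothetical setup): one distance table per landmark.
    tables = [dijkstra(graph, lm) for lm in landmarks]

    def lower_bound(v):
        # Triangle inequality: |d(L, target) - d(L, v)| <= d(v, target).
        return max((abs(t[target] - t[v]) for t in tables), default=0.0)

    dist = {source: 0.0}
    heap = [(lower_bound(source), source)]
    closed = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u in closed:
            continue
        closed.add(u)
        if u == target:
            return dist[u], len(closed)
        for v, w in graph[u]:
            nd = dist[u] + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd + lower_bound(v), v))
    return INF, len(closed)
```

In this sketch the landmark tables are recomputed on every call for brevity; in practice they would be computed once and reused across queries, which is where the speed-up over plain Dijkstra comes from.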
48

Saragiotis, Panagiotis. "Cross-modal classification and retrieval of multimodal data using combinations of neural networks." Thesis, University of Surrey, 2006. http://epubs.surrey.ac.uk/843338/.

Abstract:
Current neurobiological thinking supported, in part, by experimentation stresses the importance of cross-modality. Uni-modal cognitive tasks, language and vision, for example, are performed with the help of many networks working simultaneously or sequentially; and for cross-modal tasks, like picture / object naming and word illustration, the output of these networks is combined to produce higher cognitive behaviour. The notion of multi-net processing is used typically in the pattern recognition literature, where ensemble networks of weak classifiers - typically supervised - appear to outperform strong classifiers. We have built a system, based on combinations of neural networks, that demonstrates how cross-modal classification can be used to retrieve multi-modal data using one of the available modalities of information. Two multi-net systems were used in this work: one comprising Kohonen SOMs that interact with each other via a Hebbian network and a fuzzy ARTMAP network where the interaction is through the embedded map field. The multi-nets were used for the cross-modal retrieval of images given keywords and for finding the appropriate keywords for an image. The systems were trained on two publicly available image databases that had collateral annotations on the images. The Hemera collection, comprising images of pre-segmented single objects, and the Corel collection with images of multiple objects were used for automatically generating various sets of input vectors. We have attempted to develop a method for evaluating the performance of multi-net systems using a monolithic network trained on modally-undifferentiated vectors as an intuitive bench-mark. To this extent single SOM and fuzzy ART networks were trained using a concatenated visual / linguistic vector to test the performance of multi-net systems with typical monolithic systems. 
Both multi-nets outperform the respective monolithic systems in terms of information retrieval measures of precision and recall on test images drawn from both datasets; the SOM multi-net outperforms the fuzzy ARTMAP both in terms of convergence and precision-recall. The performance of the SOM-based multi-net in retrieval, classification and auto-annotation is on a par with that of state of the art systems like "ALIP" and "Blobworld". Much of the neural network based simulations reported in the literature use supervised learning algorithms. Such algorithms are suited when classes of objects are predefined and objects in themselves are quite unique in terms of their attributes. We have compared the performance of our multi-net systems with that of a multi-layer perceptron (MLP). The MLP does show substantially greater precision and recall on a (fixed) class of objects when compared with our unsupervised systems. However when 'lesioned' -the network connectivity 'damaged' deliberately- the multi-net systems show a greater degree of robustness. Cross-modal systems appear to hold considerable intellectual and commercial potential and the multi-net approach facilitates the simulation of such systems.
49

Lu, Pascal. "Statistical Learning from Multimodal Genetic and Neuroimaging data for prediction of Alzheimer's Disease." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS636.

Abstract:
Alzheimer's disease (AD) is nowadays the main cause of dementia worldwide. It causes memory and behavioural problems in elderly people. The early diagnosis of AD is an active topic of research. Three types of data play a major role in its diagnosis: clinical tests, neuroimaging data and genetic data. The first two modalities provide information about the patient's current state. Genetic data, by contrast, help to identify whether a patient is at risk and could develop AD in the future. Furthermore, during the past decade, researchers have created longitudinal databases on AD, and important advances have been made in the processing and analysis of complex, high-dimensional data. The first contribution of this thesis is to study how to combine different modalities in order to increase their predictive power in the context of classification. We explore multilevel (hierarchical) models that capture potential interactions between modalities. Moreover, we model the structure of each modality (genetic structure, spatial structure of the brain) through the use of adapted penalties, such as the ridge penalty for images and the group lasso penalty for genetic data. The second contribution of this thesis is to explore models for predicting the date of conversion to Alzheimer's disease for patients with mild cognitive impairment. Such problems have been highlighted by challenges such as TADPOLE. We mainly use the framework provided by survival models. Starting from classical models, such as the Cox proportional hazards model, the additive Aalen model and the log-logistic model, we develop other survival models for combining modalities, such as a multilevel log-logistic model or a multilevel Cox model.
50

Diehn, Sabrina Maria. "Analysis of data from multimodal chemical characterizations of plant tissues." Berlin: Humboldt-Universität zu Berlin, 2021. http://d-nb.info/1238074006/34.
