Academic literature on the topic 'Multimodal data processing'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multimodal data processing.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Multimodal data processing":

1

Kyselova, A. H., G. D. Kiselov, A. A. Serhyeyev, and A. V. Shalaginov. "Processing input data in multimodal applications." Electronics and Communications 16, no. 2 (March 28, 2011): 86–92. http://dx.doi.org/10.20535/2312-1807.2011.16.2.268253.

2

Boyko, Nataliya. "Models and Algorithms for Multimodal Data Processing." WSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS 20 (March 14, 2023): 87–97. http://dx.doi.org/10.37394/23209.2023.20.11.

Abstract:
Information technologies and computer equipment are used in almost all areas of activity, which is why new areas of their use are emerging, and the level of ICT implementation is deepening, with more and more functions that were the prerogative of humans being assigned to computers. As science and technology develop, new technologies and technical means are emerging that enable a human-centered approach to software development, better adaptation of human-machine interfaces to user needs, and an increase in the ergonomics of software products. These measures contribute to the formation of fundamentally new opportunities for presenting and processing information about real-world objects with which an individual interacts in production, educational and everyday activities in computer systems. The article aims to identify current models and algorithms for processing multimodal data in computer systems based on a survey of company employees, and to analyze these models and algorithms to determine the benefits of using them. Research methods: comparative analysis; systematization; generalization; survey. Results. It has been established that the recommended multimodal data representation models (the mixed model, the spatiotemporal linked model, and the multilevel ontological model) allow for representing the digital twin of the object under study at differentiated levels of abstraction, and these multimodal data processing models can be combined to obtain the most informative way to describe the physical twin. As a result of the study, it was found that the "general judgment of the experience of using models and algorithms for multimodal data processing" was noted by the respondents in the item "Personally, I would say that models and algorithms for multimodal data processing are practical" with an average value of 8.16 (SD = 1.70), and in the item "Personally, I would say that models and algorithms for multimodal data processing are understandable (not confusing)" with an average value of 7.52. It has been determined that respondents positively evaluate (with scores above 5.0) models and algorithms for processing multimodal data in work environments as practical, understandable, manageable, and original.
3

Parsons, Aaron D., Stephen W. T. Price, Nicola Wadeson, Mark Basham, Andrew M. Beale, Alun W. Ashton, J. Frederick W. Mosselmans, and Paul D. Quinn. "Automatic processing of multimodal tomography datasets." Journal of Synchrotron Radiation 24, no. 1 (January 1, 2017): 248–56. http://dx.doi.org/10.1107/s1600577516017756.

Abstract:
With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data that will be collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of multimodal and multidimensional scientific datasets, such as those output by chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.
4

Qi, Qingfu, Liyuan Lin, and Rui Zhang. "Feature Extraction Network with Attention Mechanism for Data Enhancement and Recombination Fusion for Multimodal Sentiment Analysis." Information 12, no. 9 (August 24, 2021): 342. http://dx.doi.org/10.3390/info12090342.

Abstract:
Multimodal sentiment analysis and emotion recognition represent a major research direction in natural language processing (NLP). With the rapid development of online media, people often express their emotions on a topic in the form of video, and the signals these videos transmit are multimodal, including language, visual, and audio cues. Therefore, the traditional unimodal sentiment analysis method is no longer applicable, which requires the establishment of a fusion model of multimodal information to obtain sentiment understanding. In previous studies, scholars used the feature vector cascade method when fusing multimodal data at each time step in the middle layer. This method places every modality in the same position and does not distinguish between strong and weak modal information among multiple modalities. At the same time, it does not pay attention to the embedding characteristics of multimodal signals across the time dimension. In response to the above problems, this paper proposes a new method and model for processing multimodal signals, which takes into account the delay and hysteresis characteristics of multimodal signals across the time dimension. The purpose is to obtain a multimodal fusion feature representation for sentiment analysis. We evaluate our method on the multimodal sentiment analysis benchmark dataset CMU Multimodal Opinion Sentiment and Emotion Intensity Corpus (CMU-MOSEI). We compare our proposed method with state-of-the-art models and show excellent results.
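The contrast the abstract draws between per-time-step feature concatenation (the "feature vector cascade") and a fusion that weights modalities differently can be made concrete with a small sketch. The module names, dimensions, and attention form below are illustrative assumptions, not the paper's architecture.

```python
# Illustrative sketch only: contrasts simple per-time-step feature concatenation
# ("cascade") with attention-weighted fusion of modality features. Dimensions and
# module names are assumptions, not taken from the paper.
import torch
import torch.nn as nn

class CascadeFusion(nn.Module):
    """Concatenate text/audio/visual features at each time step (the baseline)."""
    def __init__(self, d_text, d_audio, d_visual, d_out):
        super().__init__()
        self.proj = nn.Linear(d_text + d_audio + d_visual, d_out)

    def forward(self, text, audio, visual):          # each: (batch, time, d_*)
        return self.proj(torch.cat([text, audio, visual], dim=-1))

class AttentionFusion(nn.Module):
    """Weight each modality by a learned attention score before summing."""
    def __init__(self, d_model):
        super().__init__()
        self.score = nn.Linear(d_model, 1)

    def forward(self, text, audio, visual):          # each: (batch, time, d_model)
        stacked = torch.stack([text, audio, visual], dim=2)   # (B, T, 3, d)
        weights = torch.softmax(self.score(stacked), dim=2)   # (B, T, 3, 1)
        return (weights * stacked).sum(dim=2)                 # (B, T, d)

# Quick shape check with random tensors
B, T, d = 2, 10, 64
t, a, v = torch.randn(B, T, d), torch.randn(B, T, d), torch.randn(B, T, d)
print(CascadeFusion(d, d, d, d)(t, a, v).shape)   # torch.Size([2, 10, 64])
print(AttentionFusion(d)(t, a, v).shape)          # torch.Size([2, 10, 64])
```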
5

Chen, Mujun. "Automatic Image Processing Algorithm for Light Environment Optimization Based on Multimodal Neural Network Model." Computational Intelligence and Neuroscience 2022 (June 3, 2022): 1–12. http://dx.doi.org/10.1155/2022/5156532.

Abstract:
In this paper, we conduct an in-depth study and analysis of an automatic image processing algorithm based on a multimodal Recurrent Neural Network (m-RNN) for light environment optimization. By analyzing the structure of the m-RNN and combining the current research frontiers of image processing and natural language processing, we identify the problem of the m-RNN's ineffectiveness for some image generation descriptions, starting from both the image feature extraction part and text sequence data processing. Unlike traditional automatic image processing algorithms, this algorithm does not need complex manually added rules. Still, it evaluates and filters through the training image collection and finally generates automatic image processing models by m-RNN. An image semantic segmentation algorithm is proposed based on multimodal attention and adaptive feature fusion. The main idea of the algorithm is to combine multimodal attention with adaptive feature fusion, extracting the importance between images through multimodal attention and then introducing data enhancement for small-scale multimodal light environment datasets. The model proposed in this paper can span the semantic differences of different modalities and construct feature relationships between different modalities to achieve an inferable, interpretable, and scalable feature representation of multimodal data. The automatic processing of light environment images using multimodal neural networks based on traditional algorithms eliminates manual processing and greatly reduces the time and effort of image processing.
6

BASYSTIUK, Oleh, and Nataliia MELNYKOVA. "MULTIMODAL SPEECH RECOGNITION BASED ON AUDIO AND TEXT DATA." Herald of Khmelnytskyi National University. Technical sciences 313, no. 5 (October 27, 2022): 22–25. http://dx.doi.org/10.31891/2307-5732-2022-313-5-22-25.

Abstract:
Systems of machine translation of texts from one language to another simulate the work of a human translator. Their performance depends on the ability to understand the grammar rules of the language. In translation, the basic units are not individual words, but word combinations or phraseological units that express different concepts. Only by using them can more complex ideas be expressed through the translated text. The main feature of machine translation is the different length of input and output. The ability to work with different lengths of input and output is provided by the approach of recurrent neural networks. A recurrent neural network (RNN) is a class of artificial neural network that has connections between nodes; in this case, a connection refers to a connection from a more distant node to a less distant node. The presence of connections allows the RNN to remember and reproduce the entire sequence of reactions to one stimulus. From the point of view of programming, such networks are analogous to cyclic execution, and from the point of view of the system, such networks are equivalent to a state machine. RNNs are commonly used to process word sequences in natural language processing. Traditionally, a hidden Markov model (HMM) and an N-gram language model are used to process a sequence of words. Deep learning has completely changed the approach to machine translation: researchers in the deep learning field have created simple solutions based on machine learning that outperform the best expert systems. This paper reviews the main features of machine translation based on recurrent neural networks. The advantages of systems based on RNNs using the sequence-to-sequence model over statistical translation systems are also highlighted. Two machine translation systems based on the sequence-to-sequence model were constructed using the Keras and PyTorch machine learning libraries. Based on the obtained results, the libraries were analysed and their performance compared.
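Since the abstract mentions that two sequence-to-sequence translation systems were built with Keras and PyTorch, a minimal encoder-decoder skeleton may help readers picture the setup. The PyTorch sketch below is illustrative only; vocabulary sizes, dimensions, and the GRU choice are assumptions, not the authors' code.

```python
# Minimal sequence-to-sequence skeleton in PyTorch, sketched for illustration.
# Vocabulary sizes and dimensions are placeholders.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=8000, tgt_vocab=8000, d_emb=256, d_hid=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_emb)
        self.encoder = nn.GRU(d_emb, d_hid, batch_first=True)
        self.decoder = nn.GRU(d_emb, d_hid, batch_first=True)
        self.out = nn.Linear(d_hid, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sentence; keep only the final hidden state.
        _, h = self.encoder(self.src_emb(src_ids))
        # Decode conditioned on the encoder state (teacher forcing with tgt_ids).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h)
        return self.out(dec_out)          # (batch, tgt_len, tgt_vocab) logits

# Variable-length input and output are handled naturally by the recurrent layers.
model = Seq2Seq()
src = torch.randint(0, 8000, (4, 12))     # batch of 4 source sentences, length 12
tgt = torch.randint(0, 8000, (4, 9))      # target sentences, length 9
logits = model(src, tgt)                  # torch.Size([4, 9, 8000])
```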
7

Basystiuk, Oleh, and Nataliya Melnykova. "Development of the Multimodal Handling Interface Based on Google API." Computer Design Systems. Theory and Practice 6, no. 1 (2024): 216–23. http://dx.doi.org/10.23939/cds2024.01.216.

Abstract:
Today, Artificial Intelligence is a daily routine, becoming deeply entrenched in our lives. One of the most popular and rapidly advancing technologies is speech recognition, which forms an integral part of the broader concept of multimodal data handling. Multimodal data encompasses voice, audio, and text data, constituting a multifaceted approach to understanding and processing information. This paper presents the development of a multimodal handling interface leveraging Google API technologies. The interface aims to facilitate seamless integration and management of diverse data modalities, including text, audio, and video, within a unified platform. Through the utilization of Google API functionalities, such as natural language processing, speech recognition, and video analysis, the interface offers enhanced capabilities for processing, analysing, and interpreting multimodal data. The paper discusses the design and implementation of the interface, highlighting its features and functionalities. Furthermore, it explores potential applications and future directions for utilizing the interface in various domains, including healthcare, education, and multimedia content creation. Overall, the development of the multimodal handling interface based on Google API represents a significant step towards advancing multimodal data processing and enhancing user experience in interacting with diverse data sources.
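The abstract describes handling audio through Google's speech recognition services. As a hedged illustration, the sketch below follows the publicly documented google-cloud-speech Python quickstart; it is an assumption about how such an interface could be wired, not the authors' actual code.

```python
# Hedged sketch: transcribing an audio file with the google-cloud-speech client.
# Field values and the surrounding structure are assumptions for illustration.
from google.cloud import speech

def transcribe(path: str, language: str = "en-US") -> str:
    client = speech.SpeechClient()
    with open(path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code=language,
    )
    response = client.recognize(config=config, audio=audio)
    # Concatenate the top alternative of each recognized segment.
    return " ".join(r.alternatives[0].transcript for r in response.results)

# print(transcribe("sample.wav"))
```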
8

Sulema, Yevgeniya. "MULTIMODAL DATA PROCESSING BASED ON ALGEBRAIC SYSTEM OF AGGREGATES RELATIONS." Radio Electronics, Computer Science, Control, no. 1 (May 15, 2020): 169–80. http://dx.doi.org/10.15588/1607-3274-2020-1-17.

9

Ren, Jinchang, Junwei Han, and Mauro Dalla Mura. "Special issue on multimodal data fusion for multidimensional signal processing." Multidimensional Systems and Signal Processing 27, no. 4 (August 8, 2016): 801–5. http://dx.doi.org/10.1007/s11045-016-0441-0.

10

Chen, Shu-Ching. "Embracing Multimodal Data in Multimedia Data Analysis." IEEE MultiMedia 28, no. 3 (July 1, 2021): 5–7. http://dx.doi.org/10.1109/mmul.2021.3104911.


Dissertations / Theses on the topic "Multimodal data processing":

1

Cadène, Rémi. "Deep Multimodal Learning for Vision and Language Processing." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS277.

Abstract:
Digital technologies have become instrumental in transforming our society. Recent statistical methods have been successfully deployed to automate the processing of the growing amount of images, videos, and texts we produce daily. In particular, deep neural networks have been adopted by the computer vision and natural language processing communities for their ability to perform accurate image recognition and text understanding once trained on big sets of data. Advances in both communities built the groundwork for new research problems at the intersection of vision and language. Integrating language into visual recognition could have an important impact on human life through the creation of real-world applications such as next-generation search engines or AI assistants. In the first part of this thesis, we focus on systems for cross-modal text-image retrieval. We propose a learning strategy to efficiently align both modalities while structuring the retrieval space with semantic information. In the second part, we focus on systems able to answer questions about an image. We propose a multimodal architecture that iteratively fuses the visual and textual modalities using a factorized bilinear model while modeling pairwise relationships between each region of the image. In the last part, we address the issues related to biases in the modeling. We propose a learning strategy to reduce the language biases which are commonly present in visual question answering systems.
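The "factorized bilinear model" used to fuse visual and textual modalities can be illustrated with a generic low-rank bilinear pooling layer. The dimensions, rank, and names below are assumptions for illustration rather than the thesis implementation.

```python
# Generic low-rank (factorized) bilinear fusion of an image-region feature and a
# question embedding, in the spirit described in the abstract. Ranks, dimensions,
# and names are illustrative assumptions.
import torch
import torch.nn as nn

class FactorizedBilinearFusion(nn.Module):
    def __init__(self, d_visual, d_text, d_out, rank=16):
        super().__init__()
        self.U = nn.Linear(d_visual, d_out * rank, bias=False)
        self.V = nn.Linear(d_text, d_out * rank, bias=False)
        self.rank, self.d_out = rank, d_out

    def forward(self, v, q):                       # v: (B, d_visual), q: (B, d_text)
        joint = self.U(v) * self.V(q)              # element-wise product of projections
        joint = joint.view(-1, self.d_out, self.rank)
        return joint.sum(dim=-1)                   # sum over rank factors -> (B, d_out)

fusion = FactorizedBilinearFusion(d_visual=2048, d_text=768, d_out=512)
z = fusion(torch.randn(3, 2048), torch.randn(3, 768))   # torch.Size([3, 512])
```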
2

Lizarraga, Gabriel M. "A Neuroimaging Web Interface for Data Acquisition, Processing and Visualization of Multimodal Brain Images." FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3855.

Abstract:
Structural and functional brain images are generated as essential modalities for medical experts to learn about the different functions of the brain. These images are typically visually inspected by experts. Many software packages are available to process medical images, but they are complex and difficult to use. The software packages are also hardware intensive. As a consequence, this dissertation proposes a novel Neuroimaging Web Services Interface (NWSI) as a series of processing pipelines for a common platform to store, process, visualize and share data. The NWSI system is made up of password-protected interconnected servers accessible through a web interface. The web-interface driving the NWSI is based on Drupal, a popular open source content management system. Drupal provides a user-based platform, in which the core code for the security and design tools are updated and patched frequently. New features can be added via modules, while maintaining the core software secure and intact. The webserver architecture allows for the visualization of results and the downloading of tabulated data. Several forms are available to capture clinical data. The processing pipeline starts with a FreeSurfer (FS) reconstruction of T1-weighted MRI images. Subsequently, PET, DTI, and fMRI images can be uploaded. The Webserver captures uploaded images and performs essential functionalities, while processing occurs in supporting servers. The computational platform is responsive and scalable. The current pipeline for PET processing calculates all regional Standardized Uptake Value ratios (SUVRs). The FS and SUVR calculations have been validated using Alzheimer's Disease Neuroimaging Initiative (ADNI) results posted at Laboratory of Neuro Imaging (LONI). The NWSI system provides access to a calibration process through the centiloid scale, consolidating Florbetapir and Florbetaben tracers in amyloid PET images. The interface also offers onsite access to machine learning algorithms, and introduces new heat maps that augment expert visual rating of PET images. NWSI has been piloted using data and expertise from Mount Sinai Medical Center, the 1Florida Alzheimer’s Disease Research Center (ADRC), Baptist Health South Florida, Nicklaus Children's Hospital, and the University of Miami. All results were obtained using our processing servers in order to maintain data validity, consistency, and minimal processing bias.
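The pipeline "calculates all regional Standardized Uptake Value ratios (SUVRs)", i.e., the mean tracer uptake in each target region divided by the mean uptake in a reference region. A minimal sketch of that arithmetic follows; region names, values, and the input format are placeholders, not the NWSI pipeline itself.

```python
# Minimal sketch of a regional SUVR computation: mean uptake in each target region
# divided by mean uptake in a reference region. Region names and values are placeholders.
import numpy as np

def regional_suvr(uptake_by_region, reference="cerebellum"):
    """uptake_by_region: dict mapping region name -> array of voxel uptake values."""
    ref_mean = float(np.mean(uptake_by_region[reference]))
    return {region: float(np.mean(values)) / ref_mean
            for region, values in uptake_by_region.items()
            if region != reference}

suvrs = regional_suvr({
    "cerebellum": np.array([1.0, 1.1, 0.9]),   # reference region (made-up values)
    "precuneus": np.array([1.4, 1.5, 1.3]),    # target region (made-up values)
})
print(suvrs)   # {'precuneus': 1.4}
```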
3

Gimenes, Gabriel Perri. "Advanced techniques for graph analysis: a multimodal approach over planetary-scale data." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-26062015-105026/.

Abstract:
Applications such as electronic commerce, computer networks, social networks, and biology (protein interaction), to name a few, have led to the production of graph-like data on a planetary scale, possibly with millions of nodes and billions of edges. These applications pose challenging problems when the task is to use their data to support decision-making processes by means of non-obvious and potentially useful patterns. In order to process such data for pattern discovery, researchers and practitioners have used distributed processing resources organized in computational clusters. However, building and managing such clusters can be complex, bringing technical and financial issues that can be prohibitive in a variety of scenarios. Alternatively, it is desirable to process large-scale graphs using only one computational node. To do so, we developed processes and algorithms according to three different approaches, building up towards an analytical set capable of revealing patterns, supporting comprehension, and helping with the decision-making process over planetary-scale graphs.
4

Rabhi, Sara. "Optimized deep learning-based multimodal method for irregular medical timestamped data." Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAS003.

Abstract:
The wide adoption of Electronic Health Records in hospitals' information systems has led to the definition of large databases grouping various types of data such as textual notes, longitudinal medical events, and tabular patient information. However, the records are only filled during consultations or hospital stays, which depend on the patient's state and local habits. A system that can leverage the different types of data collected at different time scales is critical for reconstructing the patient's health trajectory, analyzing his history, and consequently delivering more adapted care. This thesis work addresses two main challenges of medical data processing: learning to represent the sequence of medical observations with irregular elapsed time between consecutive visits, and optimizing the extraction of medical events from clinical notes. Our main goal is to design a multimodal representation of the patient's health trajectory to solve clinical prediction problems. Our first work built a framework for modeling irregular medical time series to evaluate the importance of considering the time gaps between medical episodes when representing a patient's health trajectory. To that end, we conducted a comparative study of sequential neural networks and irregular time representation techniques. The clinical objective was to predict retinopathy complications for type 1 diabetes patients in the French database CaRéDIAB (Champagne Ardenne Réseau Diabète) using their history of HbA1c measurements. The study results showed that the attention-based model combined with the soft one-hot representation of time gaps led to an AUROC score of 88.65% (specificity of 85.56%, sensitivity of 83.33%), an improvement of 4.3% when compared to the LSTM-based model. Motivated by these results, we extended our framework to shorter multivariate time series and predicted in-hospital mortality for critical care patients of the MIMIC-III dataset. The proposed architecture, HiTT, improved the AUC score by 5% over the Transformer baseline. In the second step, we focused on extracting relevant medical information from clinical notes to enrich the patient's health trajectories. In particular, Transformer-based architectures showed encouraging results in medical information extraction tasks. However, these complex models require a large, annotated corpus. This requirement is hard to achieve in the medical field as it necessitates access to private patient data and expert annotators. To reduce annotation cost, we explored active learning strategies that have been shown to be effective in tasks such as text classification, information extraction, and speech recognition. In addition to existing methods, we defined a Hybrid Weighted Uncertainty Sampling active learning strategy that takes advantage of the contextual embeddings learned by the Transformer-based approach to measure the representativeness of samples. A simulated study using the i2b2-2010 challenge dataset showed that our proposed metric reduces the annotation cost by 70% to achieve the same score as passive learning. Lastly, we combined multivariate medical time series and medical concepts extracted from clinical notes of the MIMIC-III database to train a multimodal transformer-based architecture. The test results of the in-hospital mortality task showed an improvement of 5.3% when considering additional text data.
This thesis contributes to patient health trajectory representation by alleviating the burden of episodic medical records and the manual annotation of free-text notes.
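The abstract reports that a "soft one-hot representation of time gaps" combined with an attention-based model worked best. One plausible reading is that each elapsed-time value is spread over neighbouring bins instead of a single hard bin; the sketch below encodes that interpretation and is an assumption, not the encoding defined in the thesis.

```python
# One possible "soft one-hot" encoding of irregular time gaps: each gap activates
# nearby bins with Gaussian weights instead of a single hard bin. Bin edges and
# width are placeholders; this is an interpretation sketched for illustration.
import numpy as np

def soft_one_hot(gaps_days, bin_edges_days, width=30.0):
    """gaps_days: (T,) elapsed days between visits; returns (T, n_bins) weights."""
    gaps_days = np.asarray(gaps_days, dtype=float)
    bin_edges_days = np.asarray(bin_edges_days, dtype=float)
    centers = (bin_edges_days[:-1] + bin_edges_days[1:]) / 2.0        # (n_bins,)
    dist = gaps_days[:, None] - centers[None, :]                       # (T, n_bins)
    weights = np.exp(-0.5 * (dist / width) ** 2)
    return weights / weights.sum(axis=1, keepdims=True)                # rows sum to 1

bins = [0, 30, 90, 180, 365, 730]
encoding = soft_one_hot([15.0, 200.0], bins)
print(encoding.round(3))   # two visits, each spread across the five bins
```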
5

Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/17689/3/David_Dean_Thesis.pdf.

Abstract:
Both human perceptual studies and automatic machine-based experiments have shown that visual information from a speaker's mouth region can improve the robustness of automatic speech processing tasks, especially in the presence of acoustic noise. By taking advantage of the complementary nature of the acoustic and visual speech information, audio-visual speech processing (AVSP) applications can work reliably in more real-world situations than would be possible with traditional acoustic speech processing applications. The two most prominent applications of AVSP for viable human-computer-interfaces involve the recognition of the speech events themselves, and the recognition of speaker's identities based upon their speech. However, while these two fields of speech and speaker recognition are closely related, there has been little systematic comparison of the two tasks under similar conditions in the existing literature. Accordingly, the primary focus of this thesis is to compare the suitability of general AVSP techniques for speech or speaker recognition, with a particular focus on synchronous hidden Markov models (SHMMs). The cascading appearance-based approach to visual speech feature extraction has been shown to work well in removing irrelevant static information from the lip region to greatly improve visual speech recognition performance. This thesis demonstrates that these dynamic visual speech features also provide for an improvement in speaker recognition, showing that speakers can be visually recognised by how they speak, in addition to their appearance alone. This thesis investigates a number of novel techniques for training and decoding of SHMMs that improve the audio-visual speech modelling ability of the SHMM approach over the existing state-of-the-art joint-training technique. Novel experiments are conducted within to demonstrate that the reliability of the two streams during training is of little importance to the final performance of the SHMM. Additionally, two novel techniques of normalising the acoustic and visual state classifiers within the SHMM structure are demonstrated for AVSP. Fused hidden Markov model (FHMM) adaptation is introduced as a novel method of adapting SHMMs from existing well-performing acoustic hidden Markov models (HMMs). This technique is demonstrated to provide improved audio-visual modelling over the jointly-trained SHMM approach at all levels of acoustic noise for the recognition of audio-visual speech events. However, the close coupling of the SHMM approach will be shown to be less useful for speaker recognition, where a late integration approach is demonstrated to be superior.
6

Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing." Queensland University of Technology, 2008. http://eprints.qut.edu.au/17689/.

Abstract:
Both human perceptual studies and automatic machine-based experiments have shown that visual information from a speaker's mouth region can improve the robustness of automatic speech processing tasks, especially in the presence of acoustic noise. By taking advantage of the complementary nature of the acoustic and visual speech information, audio-visual speech processing (AVSP) applications can work reliably in more real-world situations than would be possible with traditional acoustic speech processing applications. The two most prominent applications of AVSP for viable human-computer-interfaces involve the recognition of the speech events themselves, and the recognition of speaker's identities based upon their speech. However, while these two fields of speech and speaker recognition are closely related, there has been little systematic comparison of the two tasks under similar conditions in the existing literature. Accordingly, the primary focus of this thesis is to compare the suitability of general AVSP techniques for speech or speaker recognition, with a particular focus on synchronous hidden Markov models (SHMMs). The cascading appearance-based approach to visual speech feature extraction has been shown to work well in removing irrelevant static information from the lip region to greatly improve visual speech recognition performance. This thesis demonstrates that these dynamic visual speech features also provide for an improvement in speaker recognition, showing that speakers can be visually recognised by how they speak, in addition to their appearance alone. This thesis investigates a number of novel techniques for training and decoding of SHMMs that improve the audio-visual speech modelling ability of the SHMM approach over the existing state-of-the-art joint-training technique. Novel experiments are conducted within to demonstrate that the reliability of the two streams during training is of little importance to the final performance of the SHMM. Additionally, two novel techniques of normalising the acoustic and visual state classifiers within the SHMM structure are demonstrated for AVSP. Fused hidden Markov model (FHMM) adaptation is introduced as a novel method of adapting SHMMs from existing well-performing acoustic hidden Markov models (HMMs). This technique is demonstrated to provide improved audio-visual modelling over the jointly-trained SHMM approach at all levels of acoustic noise for the recognition of audio-visual speech events. However, the close coupling of the SHMM approach will be shown to be less useful for speaker recognition, where a late integration approach is demonstrated to be superior.
7

Ouenniche, Kaouther. "Multimodal deep learning for audiovisual production." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAS020.

Abstract:
Within the dynamic landscape of television content, the critical need to automate the indexing and organization of archives has emerged as a paramount objective. In response, this research explores the use of deep learning techniques to automate the extraction of diverse metadata from television archives, improving their accessibility and reuse. The first contribution of this research revolves around the classification of camera motion types. This is a crucial aspect of content indexing as it allows for efficient categorization and retrieval of video content based on the visual dynamics it exhibits. The novel approach proposed employs 3D convolutional neural networks with residual blocks, a technique inspired by action recognition methods. A semi-automatic approach for constructing a reliable camera motion dataset from publicly available videos is also presented, minimizing the need for manual intervention. Additionally, the creation of a challenging evaluation dataset, comprising real-life videos shot with professional cameras at varying resolutions, underlines the robustness and generalization power of the proposed technique, achieving an average accuracy rate of 94%. The second contribution centers on the demanding task of Video Question Answering. In this context, we explore the effectiveness of attention-based transformers for facilitating grounded multimodal learning. The challenge here lies in bridging the gap between the visual and textual modalities and mitigating the quadratic complexity of transformer models. To address these issues, a novel framework is introduced, which incorporates a lightweight transformer and a cross-modality module. This module leverages cross-correlation to enable reciprocal learning between text-conditioned visual features and video-conditioned textual features. Furthermore, an adversarial testing scenario with rephrased questions highlights the model's robustness and real-world applicability. Experimental results on benchmark datasets, such as MSVD-QA and MSRVTT-QA, validate the proposed methodology, with an average accuracy of 45% and 42%, respectively, which represents notable improvements over existing approaches. The third contribution of this research addresses the multimodal video captioning problem, a critical aspect of content indexing. The introduced framework incorporates a modality-attention module that captures the intricate relationships between visual and textual data using cross-correlation. Moreover, the integration of temporal attention enhances the model's ability to produce meaningful captions, considering the temporal dynamics of video content. Our work also incorporates an auxiliary task employing a contrastive loss function, which promotes model generalization and a deeper understanding of inter-modal relationships and underlying semantics. The utilization of a transformer architecture for encoding and decoding significantly enhances the model's capacity to capture interdependencies between text and video data. The research validates the proposed methodology through rigorous evaluation on the MSRVTT benchmark, achieving BLEU4, ROUGE, and METEOR scores of 0.4408, 0.6291, and 0.3082, respectively.
In comparison to state-of-the-art methods, this approach consistently outperforms them, with performance gains ranging from 1.21% to 1.52% across the three metrics considered. In conclusion, this manuscript offers a holistic exploration of deep learning-based techniques to automate television content indexing, addressing the labor-intensive and time-consuming nature of manual indexing. The contributions encompass camera motion type classification, VideoQA, and multimodal video captioning, collectively advancing the state of the art and providing valuable insights for researchers in the field. These findings not only have practical applications for content retrieval and indexing but also contribute to the broader advancement of deep learning methodologies in the multimodal context.
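The cross-modality module described above lets text-conditioned visual features and video-conditioned textual features learn from each other via cross-correlation. A hedged sketch of one way such a reciprocal step can be written is given below; shapes and names are assumptions, and the thesis' actual module may differ.

```python
# Sketch of a reciprocal cross-modality step: each modality attends to the other
# through a correlation (dot-product) matrix. Shapes and names are assumptions
# made for illustration only.
import torch
import torch.nn as nn

class CrossModalityModule(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.scale = d_model ** -0.5
        self.norm_v = nn.LayerNorm(d_model)
        self.norm_t = nn.LayerNorm(d_model)

    def forward(self, video, text):                 # video: (B, Nv, d), text: (B, Nt, d)
        corr = torch.bmm(video, text.transpose(1, 2)) * self.scale     # (B, Nv, Nt)
        # Video tokens attend over text tokens, and vice versa.
        video_out = self.norm_v(video + torch.bmm(corr.softmax(dim=-1), text))
        text_out = self.norm_t(text + torch.bmm(corr.softmax(dim=1).transpose(1, 2), video))
        return video_out, text_out

m = CrossModalityModule(256)
v, t = torch.randn(2, 20, 256), torch.randn(2, 12, 256)
v2, t2 = m(v, t)     # shapes preserved: (2, 20, 256) and (2, 12, 256)
```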
8

Bernardi, Dario. "A feasibility study on pairing a smartwatch and a mobile device through multi-modal gestures." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254387.

Abstract:
Pairing is the process of establishing an association between two personal devices. Although such a process is intuitively very simple, achieving a straightforward and secure association is challenging due to several possible attacks and usability-related issues. Indeed, malicious attackers might want to spoof the communication between devices in order to gather sensitive information or harm them. Moreover, offering users simple and usable schemes which attain a high level of security remains a major issue. In addition, due to the great diversity of pairing scenarios and equipment, achieving a single, usable, secure association for all possible devices and use cases is simply not possible. In this thesis, we study the feasibility of a novel pairing scheme based on multi-modal gestures, namely, gestures involving drawing supported by accelerometer data. In particular, a user can pair a smartwatch on his wrist and a mobile device (e.g., a smartphone) by simply drawing with a finger on the screen of the device. For this purpose, we developed mobile applications for the smartwatch and the smartphone to sample and process sensed data in support of a secure commitment-based protocol. Furthermore, we performed experiments to verify whether encoded matching movements have a clear similarity compared to non-matching movements. The results proved that it is feasible to implement such a scheme, which also offers users a natural way to perform secure pairing. This innovative scheme may be adopted by a large number of mobile devices (e.g., smartwatches, smartphones, tablets, etc.) in different scenarios.
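The study checks whether the accelerometer trace recorded by the watch and the touch trace drawn on the phone are similar enough to accept pairing. A hedged sketch of one simple way to score such similarity on resampled, z-normalized 1-D traces is shown below; the threshold, resampling length, and reduction of 3-axis accelerometer data to one dimension are assumptions for illustration, not the thesis protocol.

```python
# Toy similarity check between a smartwatch accelerometer trace (e.g., the
# acceleration magnitude) and a touchscreen drawing trace: resample both to a
# common length, z-normalize, and compare with a correlation score. Threshold
# and preprocessing are assumptions; the thesis wraps such a comparison in a
# commitment-based protocol.
import numpy as np

def znorm(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-8)

def similarity(watch_signal, touch_signal, n=128):
    # Resample both 1-D traces to n samples so they can be compared point-wise.
    t = np.linspace(0.0, 1.0, n)
    w = np.interp(t, np.linspace(0, 1, len(watch_signal)), watch_signal)
    p = np.interp(t, np.linspace(0, 1, len(touch_signal)), touch_signal)
    return float(np.mean(znorm(w) * znorm(p)))      # Pearson-style score in [-1, 1]

def is_matching(watch_signal, touch_signal, threshold=0.7):
    return similarity(watch_signal, touch_signal) >= threshold
```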
9

Mozaffari, Maaref Mohammad Hamed. "A Real-Time and Automatic Ultrasound-Enhanced Multimodal Second Language Training System: A Deep Learning Approach." Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/40477.

Abstract:
The critical role of language pronunciation in communicative competence is significant, especially for second language learners. Despite renewed awareness of the importance of articulation, it remains a challenge for instructors to handle the pronunciation needs of language learners. There are relatively scarce pedagogical tools for pronunciation teaching and learning, such as inefficient, traditional pronunciation instructions like listening and repeating. Recently, electronic visual feedback (EVF) systems (e.g., medical ultrasound imaging) have been exploited in new approaches in such a way that they could be effectively incorporated in a range of teaching and learning contexts. Evaluation of ultrasound-enhanced methods for pronunciation training, such as multimodal methods, has asserted that visualizing the articulatory system as biofeedback to language learners might improve the efficiency of articulation learning. Despite the recent successful usage of multimodal techniques for pronunciation training, manual work and human manipulation are inevitable in many stages of those systems. Furthermore, recognizing tongue shape in noisy and low-contrast ultrasound images is a challenging job, especially for non-expert users in real-time applications. On the other hand, our user study revealed that users could not comfortably perceive the placement of their tongue inside the mouth just by watching pre-recorded videos. Machine learning is a subset of Artificial Intelligence (AI), where machines can learn by experiencing and acquiring skills without human involvement. Inspired by the functionality of the human brain, deep artificial neural networks learn from large amounts of data to perform a task repeatedly. Deep learning-based methods have emerged as the dominant paradigm in many computer vision tasks in recent years. Deep learning methods are powerful in automatic learning of a new task, and unlike traditional image processing methods, they are capable of dealing with many challenges such as object occlusion, transformation variants, and background artifacts. In this dissertation, we implemented a guided language pronunciation training system that benefits from the strengths of deep learning techniques. Our modular system attempts to provide a fully automatic and real-time language pronunciation training tool using ultrasound-enhanced augmented reality. Qualitative and quantitative assessments indicate exceptional performance for our system in terms of flexibility, generalization, robustness, and autonomy, outperforming previous techniques. Using our ultrasound-enhanced system, a language learner can observe her/his tongue movements during real-time speech, superimposed on her/his face automatically.
10

Benmoussat, Mohammed Seghir. "Hyperspectral imagery algorithms for the processing of multimodal data : application for metal surface inspection in an industrial context by means of multispectral imagery, infrared thermography and stripe projection techniques." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4347/document.

Abstract:
The work presented in this thesis deals with the quality control and inspection of industrial metallic surfaces. The purpose is the generalization and application of hyperspectral imagery methods to multimodal data such as multi-channel optical images and multi-temporal thermographic images. In the first application, data cubes are built from multi-component images to detect surface defects within flat metallic parts. The best performances are obtained with multi-wavelength illuminations in the visible and near infrared ranges, and detection using the spectral angle mapper with the mean spectrum as a reference. The second application concerns the use of thermography imaging for the inspection of nuclear metal components to detect surface and subsurface defects. A 1D approach is proposed based on using the kurtosis to select one principal component (PC) from the first PCs obtained after reducing the original data cube with the principal component analysis (PCA) algorithm. The proposed PCA-1PC method gives good performances with non-noisy and homogeneous data, while SVD with anomaly detection algorithms gives the most consistent results and is quite robust to perturbations such as an inhomogeneous background. Finally, an approach based on fringe analysis and structured light techniques in the case of deflectometric recordings is presented for the inspection of free-form metal surfaces. After determining the parameters describing the sinusoidal stripe patterns, the proposed approach consists in projecting a list of phase-shifted patterns and calculating the corresponding phase-images. Defect location is based on detecting and analyzing the stripes within the phase-images.
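Defect detection in the first application uses the spectral angle mapper (SAM) with the mean spectrum as the reference, i.e., the angle between each pixel spectrum and the scene-mean spectrum. A small sketch of that computation follows; the band count and defect threshold are placeholders, not values from the thesis.

```python
# Spectral angle mapper (SAM) against the mean spectrum, as described in the
# abstract: pixels whose spectral angle to the scene-mean spectrum is large are
# flagged as defect candidates. Band count and threshold are placeholders.
import numpy as np

def spectral_angle_map(cube):
    """cube: (rows, cols, bands) multi-wavelength image; returns angles in radians."""
    pixels = cube.reshape(-1, cube.shape[-1])                 # (N, bands)
    reference = pixels.mean(axis=0)                           # mean spectrum
    cos = pixels @ reference / (
        np.linalg.norm(pixels, axis=1) * np.linalg.norm(reference) + 1e-12
    )
    return np.arccos(np.clip(cos, -1.0, 1.0)).reshape(cube.shape[:2])

cube = np.random.rand(64, 64, 8)                              # toy 8-band cube
angles = spectral_angle_map(cube)
defect_mask = angles > np.percentile(angles, 99)              # flag the most deviant pixels
```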

Books on the topic "Multimodal data processing":

1

Adams, Teresa M. Guidelines for the implementation of multimodal transportation location referencing systems. Washington, D.C: National Academy Press, 2001.

2

Grifoni, Patrizia, ed. Multimodal human computer interaction and pervasive services. Hershey, PA: Information Science Reference, 2009.

3

Toselli, Alejandro Héctor. Multimodal Interactive Pattern Recognition and Applications. London: Springer-Verlag London Limited, 2011.

4

Biswas, Pradipta. A Multimodal End-2-End Approach to Accessible Computing. London: Springer London, 2013.

5

International Evaluation Workshop on Classification of Events, Activities and Relationships (1st 2006 Southampton, England). Multimodal technologies for perception of humans: First International Evaluation Workshop on Classification of Events, Activities and Relationships, CLEAR 2006, Southampton, UK, April 6-7, 2006 : revised selected papers. Berlin: Springer, 2007.

6

Vieten, Andrea. Monomodale und multimodale Registrierung von autoradiographischen und histologischen Bilddaten. Jülich: Forschungszentrum Jülich, Zentralbibliothek, 2005.

7

Masson, Paul R. MULTIMOD Mark II: A revised and extended model. Washington, D.C: International Monetary Fund, 1990.

8

Dey, Somnath, and Debasis Samanta. Unimodal and Multimodal Biometric Data Indexing. De Gruyter, Inc., 2014.

9

Dey, Somnath, and Debasis Samanta. Unimodal and Multimodal Biometric Data Indexing. de Gruyter GmbH, Walter, 2014.

10

Dey, Somnath, and Debasis Samanta. Unimodal and Multimodal Biometric Data Indexing. de Gruyter GmbH, Walter, 2015.


Book chapters on the topic "Multimodal data processing":

1

Huang, Lihe. "Collecting and processing multimodal data." In Toward Multimodal Pragmatics, 99–108. London: Routledge, 2021. http://dx.doi.org/10.4324/9781003251774-5.

2

Naït-Ali, Amine, Emre Zeybek, and Xavier Drouot. "Introduction to Multimodal Compression of Biomedical Data." In Advanced Biosignal Processing, 353–74. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-540-89506-0_17.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Singh, Archana, and Kavita Sahu. "Emotion Recognition Using Multimodal Fusion Models." In Multimedia Data Processing and Computing, 21–31. Boca Raton: CRC Press, 2023. http://dx.doi.org/10.1201/9781003391272-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Janiak, Mateusz, Marek Kulbacki, Wojciech Knieć, Jerzy Paweł Nowacki, and Aldona Drabik. "Data Flow Processing Framework for Multimodal Data Environment Software." In New Trends in Intelligent Information and Database Systems, 353–62. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-16211-9_36.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Sathio, Anwar Ali, Muhammad Malook Rind, and Abdullah Lakhan. "Deep Learning Algorithms and Architectures for Multimodal Data Analysis." In Deep Learning for Multimedia Processing Applications, 74–113. Boca Raton: CRC Press, 2023. http://dx.doi.org/10.1201/9781032646268-5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Meng, Lei, Ah-Hwee Tan, and Donald C. Wunsch II. "Online Multimodal Co-indexing and Retrieval of Social Media Data." In Advanced Information and Knowledge Processing, 155–74. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-02985-2_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Smith, Rebecca, and Frank Pollick. "The role of dance experience, visual processing strategies, and quantitative movement features in recognition of emotion from whole-body movements." In Dance Data, Cognition, and Multimodal Communication, 274–94. London: Routledge, 2022. http://dx.doi.org/10.4324/9781003106401-22.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Jin, Peiquan, Jianchuan Li, Lin Mu, Jingren Zhou, and Jie Zhao. "Effective Sentiment Analysis for Multimodal Review Data on the Web." In Algorithms and Architectures for Parallel Processing, 623–38. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60248-2_43.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Andriyanov, Nikita. "Multimodal Data Processing Based on Text Classifiers and Image Recognition." In Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges, 414–23. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-37742-6_31.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Patel, Daniel. "Multimodal Summed Area Tables—A Proof of Concept." In Interactive Data Processing and 3D Visualization of the Solid Earth, 179–207. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-90716-7_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Multimodal data processing":

1. Yang, Lixin, Genshe Chen, Ronghua Xu, Sherry Chen, and Yu Chen. "Decentralized autonomous imaging data processing using blockchain." In Multimodal Biomedical Imaging XIV, edited by Fred S. Azar, Xavier Intes, and Qianqian Fang. SPIE, 2019. http://dx.doi.org/10.1117/12.2513243.

2. Kharinov, Mikhail V., and Aleksandr N. Bykov. "Data Structure for Multimodal Signal Processing." In 2019 International Russian Automation Conference. IEEE, 2019. http://dx.doi.org/10.1109/rusautocon.2019.8867769.

3. Chen, Jia, and Ioannis D. Schizas. "Distributed efficient multimodal data clustering." In 2017 25th European Signal Processing Conference (EUSIPCO). IEEE, 2017. http://dx.doi.org/10.23919/eusipco.2017.8081621.

4. Fedorov, Igor, Bhaskar D. Rao, and Truong Q. Nguyen. "Multimodal sparse Bayesian dictionary learning applied to multimodal data classification." In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017. http://dx.doi.org/10.1109/icassp.2017.7952554.

5. Mukherjee, Arpan, Ali Tajer, Pin-Yu Chen, and Payel Das. "Active Estimation From Multimodal Data." In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021. http://dx.doi.org/10.1109/icassp39728.2021.9414772.

6. Ehlen, Patrick, Michael Johnston, and Gunaranjan Vasireddy. "Collecting mobile multimodal data for MATCH." In 7th International Conference on Spoken Language Processing (ICSLP 2002). ISCA, 2002. http://dx.doi.org/10.21437/icslp.2002-616.

7. Cavalheiro, Laís C. L., Matheus C. Pavan, and Ivandré Paraboni. "Stance Prediction from Multimodal Social Media Data." In Proceedings of the International Conference Recent Advances in Natural Language Processing. Shoumen, Bulgaria: INCOMA Ltd., 2023. http://dx.doi.org/10.26615/978-954-452-092-2_027.

8. Liu, Jingwei. "A New Theory of Data Processing: Applying Artificial Intelligence to Cognition and Humanity." In ICMI '23: International Conference on Multimodal Interaction. New York, NY: ACM, 2023. http://dx.doi.org/10.1145/3577190.3616123.

9. Zhang, Hong, Li Chen, Jun Liu, and Junsong Yuan. "Hierarchical multi-feature fusion for multimodal data analysis." In 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 2014. http://dx.doi.org/10.1109/icip.2014.7026195.

10. Basit, Mohammad, Bashir Alam, Zubaida Fatima, and Salman Shaikh. "Natural Disaster Tweets Classification Using Multimodal Data." In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.emnlp-main.471.

Reports on the topic "Multimodal data processing":

1. Hamlin, Alexandra, Erik Kobylarz, James Lever, Susan Taylor, and Laura Ray. Assessing the feasibility of detecting epileptic seizures using non-cerebral sensor. Engineer Research and Development Center (U.S.), December 2021. http://dx.doi.org/10.21079/11681/42562.
Abstract:
This paper investigates the feasibility of using non-cerebral, time-series data to detect epileptic seizures. Data were recorded from fifteen patients (7 male, 5 female, 3 not noted, mean age 36.17 years), five of whom had a total of seven seizures. Patients were monitored in an inpatient setting using standard video electroencephalography (vEEG), while also wearing sensors monitoring electrocardiography, electrodermal activity, electromyography, accelerometry, and audio signals (vocalizations). A systematic and detailed study was conducted to identify the sensors and the features derived from the non-cerebral sensors that contribute most significantly to the separability of data acquired during seizures from non-seizure data. Post-processing of the data using linear discriminant analysis (LDA) shows that seizure data are strongly separable from non-seizure data based on features derived from the recorded signals. The mean area under the receiver operating characteristic (ROC) curve for each individual patient who experienced a seizure during data collection, calculated using LDA, was 0.9682. The features that contribute most significantly to seizure detection differ for each patient. The results show that a multimodal approach to seizure detection using the specified sensor suite is promising in detecting seizures with both sensitivity and specificity. Moreover, the study provides a means to quantify the contribution of each sensor and feature to separability. Development of a non-electroencephalography (EEG) based seizure detection device would give doctors a more accurate seizure count outside of the clinical setting, improving treatment and the quality of life of epilepsy patients.
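The LDA post-processing and per-patient ROC analysis described in this abstract can be sketched roughly as follows; the feature matrix, window counts, and class balance below are synthetic placeholders (assumptions), not the report's actual sensor features or pipeline.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score

def seizure_separability(features, labels):
    """Project multimodal features with LDA and score seizure/non-seizure
    separability with the area under the ROC curve.

    features: (n_windows, n_features) array of per-window features derived
    from the non-cerebral sensors (ECG, EDA, EMG, accelerometry, audio).
    labels:   (n_windows,) binary array, 1 = seizure, 0 = non-seizure.
    """
    lda = LinearDiscriminantAnalysis()
    scores = lda.fit(features, labels).decision_function(features)
    return roc_auc_score(labels, scores)

# Hypothetical per-patient evaluation on synthetic data:
rng = np.random.default_rng(0)
for patient in range(5):                      # five patients had seizures
    X = rng.normal(size=(200, 12))            # 12 made-up sensor features
    y = (rng.random(200) < 0.1).astype(int)   # sparse seizure windows
    X[y == 1] += 1.5                          # make the seizure class separable
    print(patient, round(seizure_separability(X, y), 4))
```

A real evaluation would compute features from the sensor streams and use held-out data rather than scoring on the training windows, but the sketch shows how a per-patient ROC AUC of the kind reported above is obtained.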
