Academic literature on the topic 'Multimodal data processing'
Journal articles on the topic "Multimodal data processing":
Kyselova, A. H., G. D. Kiselov, A. A. Serhyeyev, and A. V. Shalaginov. "Processing input data in multimodal applications." Electronics and Communications 16, no. 2 (March 28, 2011): 86–92. http://dx.doi.org/10.20535/2312-1807.2011.16.2.268253.
Boyko, Nataliya. "Models and Algorithms for Multimodal Data Processing." WSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS 20 (March 14, 2023): 87–97. http://dx.doi.org/10.37394/23209.2023.20.11.
Parsons, Aaron D., Stephen W. T. Price, Nicola Wadeson, Mark Basham, Andrew M. Beale, Alun W. Ashton, J. Frederick W. Mosselmans, and Paul D. Quinn. "Automatic processing of multimodal tomography datasets." Journal of Synchrotron Radiation 24, no. 1 (January 1, 2017): 248–56. http://dx.doi.org/10.1107/s1600577516017756.
Qi, Qingfu, Liyuan Lin, and Rui Zhang. "Feature Extraction Network with Attention Mechanism for Data Enhancement and Recombination Fusion for Multimodal Sentiment Analysis." Information 12, no. 9 (August 24, 2021): 342. http://dx.doi.org/10.3390/info12090342.
Chen, Mujun. "Automatic Image Processing Algorithm for Light Environment Optimization Based on Multimodal Neural Network Model." Computational Intelligence and Neuroscience 2022 (June 3, 2022): 1–12. http://dx.doi.org/10.1155/2022/5156532.
Basystiuk, Oleh, and Nataliia Melnykova. "Multimodal Speech Recognition Based on Audio and Text Data." Herald of Khmelnytskyi National University. Technical Sciences 313, no. 5 (October 27, 2022): 22–25. http://dx.doi.org/10.31891/2307-5732-2022-313-5-22-25.
Basystiuk, Oleh, and Nataliya Melnykova. "Development of the Multimodal Handling Interface Based on Google API." Computer Design Systems. Theory and Practice 6, no. 1 (2024): 216–23. http://dx.doi.org/10.23939/cds2024.01.216.
Sulema, Yevgeniya. "Multimodal Data Processing Based on Algebraic System of Aggregates Relations." Radio Electronics, Computer Science, Control, no. 1 (May 15, 2020): 169–80. http://dx.doi.org/10.15588/1607-3274-2020-1-17.
Ren, Jinchang, Junwei Han, and Mauro Dalla Mura. "Special issue on multimodal data fusion for multidimensional signal processing." Multidimensional Systems and Signal Processing 27, no. 4 (August 8, 2016): 801–5. http://dx.doi.org/10.1007/s11045-016-0441-0.
Chen, Shu-Ching. "Embracing Multimodal Data in Multimedia Data Analysis." IEEE MultiMedia 28, no. 3 (July 1, 2021): 5–7. http://dx.doi.org/10.1109/mmul.2021.3104911.
Dissertations / Theses on the topic "Multimodal data processing":
Cadène, Rémi. "Deep Multimodal Learning for Vision and Language Processing." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS277.
Digital technologies have become instrumental in transforming our society. Recent statistical methods have been successfully deployed to automate the processing of the growing amount of images, videos, and texts we produce daily. In particular, deep neural networks have been adopted by the computer vision and natural language processing communities for their ability to perform accurate image recognition and text understanding once trained on large datasets. Advances in both communities built the groundwork for new research problems at the intersection of vision and language. Integrating language into visual recognition could have an important impact on human life through the creation of real-world applications such as next-generation search engines or AI assistants. In the first part of this thesis, we focus on systems for cross-modal text-image retrieval. We propose a learning strategy to efficiently align both modalities while structuring the retrieval space with semantic information. In the second part, we focus on systems able to answer questions about an image. We propose a multimodal architecture that iteratively fuses the visual and textual modalities using a factorized bilinear model while modeling pairwise relationships between each region of the image. In the last part, we address the issues related to biases in the modeling. We propose a learning strategy to reduce the language biases which are commonly present in visual question answering systems.
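The factorized bilinear fusion mentioned in this abstract can be illustrated with a minimal NumPy sketch. The dimensions, projection matrices, and pooling size below are illustrative assumptions, not the thesis's actual architecture: instead of a full bilinear map, each modality is projected into a low-rank joint space, combined by an element-wise product, and sum-pooled.

```python
import numpy as np

def factorized_bilinear_fusion(v, q, Wv, Wq, rank_pool=4):
    """Fuse a visual feature v and a textual feature q.

    A full bilinear map needs d_v * d_q * d_out parameters; instead,
    project each modality to a (d_out * rank_pool)-dimensional space,
    interact them element-wise, and sum-pool chunks of size rank_pool.
    """
    zv = Wv @ v                    # (d_out * rank_pool,)
    zq = Wq @ q
    z = zv * zq                    # element-wise (Hadamard) interaction
    return z.reshape(-1, rank_pool).sum(axis=1)   # (d_out,)

rng = np.random.default_rng(0)
d_v, d_q, d_out, k = 8, 6, 5, 4
Wv = rng.normal(size=(d_out * k, d_v))
Wq = rng.normal(size=(d_out * k, d_q))
fused = factorized_bilinear_fusion(rng.normal(size=d_v),
                                   rng.normal(size=d_q), Wv, Wq, k)
print(fused.shape)  # (5,)
```

In a trained model the projections would be learned jointly with the rest of the network; the sketch only shows the parameter-saving structure of the fusion.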
Lizarraga, Gabriel M. "A Neuroimaging Web Interface for Data Acquisition, Processing and Visualization of Multimodal Brain Images." FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3855.
Gimenes, Gabriel Perri. "Advanced techniques for graph analysis: a multimodal approach over planetary-scale data." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-26062015-105026/.
Applications such as e-commerce, computer networks, social networks, and biology (protein interaction), among others, have led to the production of data that can be represented as planetary-scale graphs, with millions of nodes and billions of edges. Such applications pose challenging problems when the task is to use the information contained in these graphs to support decision-making through the discovery of non-trivial and potentially useful patterns. To mine these graphs for patterns, both researchers and industry have relied on distributed processing resources organized into computational clusters. However, building and maintaining such clusters can be complex, raising technical and financial issues that may be prohibitive in many cases. It is therefore desirable to be able to process large-scale graphs using a single computational node. To this end, processes and algorithms were developed following three different approaches, aiming to define an analysis framework capable of revealing patterns, providing insight, and supporting decision-making over planetary-scale graphs.
Rabhi, Sara. "Optimized deep learning-based multimodal method for irregular medical timestamped data." Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAS003.
The wide adoption of Electronic Health Records in hospitals' information systems has led to large databases grouping various types of data, such as textual notes, longitudinal medical events, and tabular patient information. However, the records are only filled in during consultations or hospital stays, which depend on the patient's state and on local habits. A system that can leverage the different types of data collected at different time scales is critical for reconstructing the patient's health trajectory, analyzing their history, and consequently delivering more adapted care. This thesis addresses two main challenges of medical data processing: learning to represent sequences of medical observations with irregular elapsed time between consecutive visits, and optimizing the extraction of medical events from clinical notes. Our main goal is to design a multimodal representation of the patient's health trajectory to solve clinical prediction problems. Our first work built a framework for modeling irregular medical time series to evaluate the importance of considering the time gaps between medical episodes when representing a patient's health trajectory. To that end, we conducted a comparative study of sequential neural networks and irregular time representation techniques. The clinical objective was to predict retinopathy complications for type 1 diabetes patients in the French database CaRéDIAB (Champagne Ardenne Réseau Diabetes) using their history of HbA1c measurements. The study showed that an attention-based model combined with a soft one-hot representation of time gaps reached an AUROC score of 88.65% (specificity of 85.56%, sensitivity of 83.33%), an improvement of 4.3% over the LSTM-based model. Motivated by these results, we extended our framework to shorter multivariate time series and predicted in-hospital mortality for critical care patients in the MIMIC-III dataset.
The proposed architecture, HiTT, improved the AUC score by 5% over the Transformer baseline. In the second step, we focused on extracting relevant medical information from clinical notes to enrich the patients' health trajectories. Transformer-based architectures in particular have shown encouraging results on medical information extraction tasks. However, these complex models require a large annotated corpus, which is hard to obtain in the medical field as it requires access to private patient data and highly qualified expert annotators. To reduce annotation cost, we explored active learning strategies, which have been shown to be effective in tasks such as text classification, information extraction, and speech recognition. In addition to existing methods, we defined a Hybrid Weighted Uncertainty Sampling active learning strategy that takes advantage of the contextual embeddings learned by the Transformer-based approach to measure the representativeness of samples. A simulated study using the i2b2-2010 challenge dataset showed that our proposed metric reduces the annotation cost by 70% while achieving the same score as passive learning. Lastly, we combined multivariate medical time series and medical concepts extracted from clinical notes of the MIMIC-III database to train a multimodal transformer-based architecture. The test results on the in-hospital mortality task showed an improvement of 5.3% when considering additional text data. This thesis contributes to patient health trajectory representation by alleviating the burden of episodic medical records and the manual annotation of free-text notes.
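The general idea behind representativeness-weighted uncertainty sampling, as described in this abstract, can be sketched in a few lines. The scoring formula below (predictive entropy weighted by mean cosine similarity to the unlabeled pool) is an illustrative assumption, not the thesis's exact metric:

```python
import numpy as np

def hybrid_weighted_scores(probs, embeddings):
    """Score unlabeled samples for active-learning selection.

    probs:      (n, c) predicted class probabilities per sample
    embeddings: (n, d) contextual embeddings of the same samples
    Score = predictive entropy (uncertainty) weighted by the sample's
    mean cosine similarity to the pool (representativeness), so the
    model queries samples that are both uncertain and typical.
    """
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    representativeness = (normed @ normed.T).mean(axis=1)
    return entropy * representativeness

probs = np.array([[0.5, 0.5],    # maximally uncertain
                  [0.9, 0.1],    # confident
                  [0.6, 0.4]])
emb = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.2]])
scores = hybrid_weighted_scores(probs, emb)
print(int(np.argmax(scores)))  # sample 0 is queried first
```

In a real loop, the top-scoring samples would be sent to annotators, the model retrained, and the scores recomputed on the remaining pool.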
Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/17689/3/David_Dean_Thesis.pdf.
Ouenniche, Kaouther. "Multimodal deep learning for audiovisual production." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAS020.
Within the dynamic landscape of television content, the critical need to automate the indexing and organization of archives has emerged as a paramount objective. In response, this research explores the use of deep learning techniques to automate the extraction of diverse metadata from television archives, improving their accessibility and reuse. The first contribution of this research revolves around the classification of camera motion types. This is a crucial aspect of content indexing, as it allows for efficient categorization and retrieval of video content based on the visual dynamics it exhibits. The novel approach proposed employs 3D convolutional neural networks with residual blocks, a technique inspired by action recognition methods. A semi-automatic approach for constructing a reliable camera motion dataset from publicly available videos is also presented, minimizing the need for manual intervention. Additionally, the creation of a challenging evaluation dataset, comprising real-life videos shot with professional cameras at varying resolutions, underlines the robustness and generalization power of the proposed technique, which achieves an average accuracy rate of 94%. The second contribution centers on the demanding task of Video Question Answering. In this context, we explore the effectiveness of attention-based transformers for facilitating grounded multimodal learning. The challenge here lies in bridging the gap between the visual and textual modalities and mitigating the quadratic complexity of transformer models. To address these issues, a novel framework is introduced, which incorporates a lightweight transformer and a cross-modality module. This module leverages cross-correlation to enable reciprocal learning between text-conditioned visual features and video-conditioned textual features. Furthermore, an adversarial testing scenario with rephrased questions highlights the model's robustness and real-world applicability.
Experimental results on benchmark datasets such as MSVD-QA and MSRVTT-QA validate the proposed methodology, with average accuracies of 45% and 42%, respectively, notable improvements over existing approaches. The third contribution of this research addresses multimodal video captioning, a critical aspect of content indexing. The introduced framework incorporates a modality-attention module that captures the intricate relationships between visual and textual data using cross-correlation. Moreover, the integration of temporal attention enhances the model's ability to produce meaningful captions that account for the temporal dynamics of video content. Our work also incorporates an auxiliary task employing a contrastive loss function, which promotes model generalization and a deeper understanding of inter-modal relationships and underlying semantics. The use of a transformer architecture for encoding and decoding significantly enhances the model's capacity to capture interdependencies between text and video data. The research validates the proposed methodology through rigorous evaluation on the MSRVTT benchmark, achieving BLEU4, ROUGE, and METEOR scores of 0.4408, 0.6291, and 0.3082, respectively. This approach consistently outperforms state-of-the-art methods, with performance gains ranging from 1.21% to 1.52% across the three metrics considered. In conclusion, this manuscript offers a holistic exploration of deep learning-based techniques to automate television content indexing, addressing the labor-intensive and time-consuming nature of manual indexing. The contributions encompass camera motion type classification, VideoQA, and multimodal video captioning, collectively advancing the state of the art and providing valuable insights for researchers in the field. These findings not only have practical applications for content retrieval and indexing but also contribute to the broader advancement of deep learning methodologies in the multimodal context.
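The cross-modality module described in this abstract pairs the two modalities through a cross-correlation matrix. A minimal sketch of such reciprocal attention follows; the feature sizes and the plain dot-product correlation are assumptions for illustration, not the thesis's actual module:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modality_attend(visual, textual):
    """Reciprocal attention via a cross-correlation matrix.

    visual:  (n_regions, d) visual features
    textual: (n_tokens,  d) textual features
    The correlation matrix couples the two modalities; each one is
    then re-expressed as an attention-weighted mixture of the other.
    """
    corr = visual @ textual.T                       # (n_regions, n_tokens)
    text_aware_visual = softmax(corr, axis=1) @ textual
    visual_aware_text = softmax(corr.T, axis=1) @ visual
    return text_aware_visual, visual_aware_text

rng = np.random.default_rng(1)
v, t = rng.normal(size=(3, 4)), rng.normal(size=(5, 4))
tav, vat = cross_modality_attend(v, t)
print(tav.shape, vat.shape)  # (3, 4) (5, 4)
```

Because only one correlation matrix is computed and reused in both directions, this kind of coupling stays cheaper than stacking full self-attention over the concatenated sequences.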
Bernardi, Dario. "A feasibility study on pairing a smartwatch and a mobile device through multi-modal gestures." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254387.
Pairing is the process of establishing a connection between two personal devices. Although this process seems intuitively simple, doing it securely is challenging because of a multitude of attack vectors and usability-related problems. Indeed, attackers may want to eavesdrop on the communication between the devices to gather information, or to harm the devices. In addition, there remains the problem of offering the user a simple and user-friendly way to pair devices that maintains a high level of security. Because of the variety of devices and pairing scenarios, no single pairing scheme can securely cover them all. In this thesis we study the feasibility of a new pairing scheme based on combined gestures, namely a drawing gesture supported by accelerometer data. In particular, a user can pair a smartwatch worn on the wrist with a mobile phone by drawing with a finger on the phone's screen. For this purpose, we developed an application for smartwatches and mobile phones to collect and process the captured data in support of a secure commitment-based protocol. We then conducted a number of experiments to verify whether synchronized gestures show clear similarities compared with non-synchronized ones. The results showed that implementing such a system is feasible and that it offers the user a natural way to perform secure pairing. This innovative scheme could be used across a wide range of mobile devices (e.g., smartwatches, phones, tablets) in various scenarios.
Mozaffari, Maaref Mohammad Hamed. "A Real-Time and Automatic Ultrasound-Enhanced Multimodal Second Language Training System: A Deep Learning Approach." Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/40477.
Benmoussat, Mohammed Seghir. "Hyperspectral imagery algorithms for the processing of multimodal data : application for metal surface inspection in an industrial context by means of multispectral imagery, infrared thermography and stripe projection techniques." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4347/document.
The work presented in this thesis deals with the quality control and inspection of industrial metallic surfaces. The purpose is the generalization and application of hyperspectral imagery methods to multimodal data such as multi-channel optical images and multi-temporal thermographic images. In the first application, data cubes are built from multi-component images to detect surface defects on flat metallic parts. The best performances are obtained with multi-wavelength illumination in the visible and near-infrared ranges, and with detection using the spectral angle mapper with the mean spectrum as a reference. The second application concerns the use of thermographic imaging for the inspection of nuclear metal components to detect surface and subsurface defects. A 1D approach is proposed that uses the kurtosis to select one principal component (PC) among the first PCs obtained after reducing the original data cube with the principal component analysis (PCA) algorithm. The proposed PCA-1PC method performs well on non-noisy, homogeneous data, while SVD combined with anomaly detection algorithms gives the most consistent results and is quite robust to perturbations such as an inhomogeneous background. Finally, an approach based on fringe analysis and structured-light techniques for deflectometric recordings is presented for the inspection of free-form metal surfaces. After determining the parameters describing the sinusoidal stripe patterns, the proposed approach consists in projecting a list of phase-shifted patterns and computing the corresponding phase images. Defect localization is based on detecting and analyzing the stripes within the phase images.
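The spectral angle mapper with a mean reference spectrum, as used in the first application above, reduces to an angular distance per pixel. The following NumPy sketch uses a small synthetic cube (the data and cube dimensions are illustrative assumptions):

```python
import numpy as np

def spectral_angle(cube, reference):
    """Angle (radians) between each pixel spectrum and a reference.

    cube:      (rows, cols, bands) multispectral data cube
    reference: (bands,) reference spectrum, e.g. the image mean
    Small angles mean the pixel matches the reference; large angles
    flag spectrally anomalous pixels (candidate defects).
    """
    dots = cube @ reference                          # per-pixel dot product
    norms = np.linalg.norm(cube, axis=2) * np.linalg.norm(reference)
    cos = np.clip(dots / norms, -1.0, 1.0)
    return np.arccos(cos)

# Synthetic 4x4 cube: flat background spectrum plus one "defect" pixel
bands = 6
cube = np.ones((4, 4, bands))
cube[2, 3] = np.linspace(1.0, 3.0, bands)            # spectrally different pixel
mean_spectrum = cube.reshape(-1, bands).mean(axis=0)
angles = spectral_angle(cube, mean_spectrum)
r, c = np.unravel_index(np.argmax(angles), angles.shape)
print(int(r), int(c))  # 2 3
```

Because the angle ignores overall magnitude, this detector is insensitive to uniform illumination changes, which is one reason it pairs well with multi-wavelength acquisitions.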
Books on the topic "Multimodal data processing":
Adams, Teresa M. Guidelines for the implementation of multimodal transportation location referencing systems. Washington, D.C: National Academy Press, 2001.
Grifoni, Patrizia, ed. Multimodal Human Computer Interaction and Pervasive Services. Hershey, PA: Information Science Reference, 2009.
Toselli, Alejandro Héctor. Multimodal Interactive Pattern Recognition and Applications. London: Springer-Verlag London Limited, 2011.
Biswas, Pradipta. A Multimodal End-2-End Approach to Accessible Computing. London: Springer London, 2013.
International Evaluation Workshop on Classification of Events, Activities and Relationships (1st, 2006, Southampton, England). Multimodal Technologies for Perception of Humans: First International Evaluation Workshop on Classification of Events, Activities and Relationships, CLEAR 2006, Southampton, UK, April 6–7, 2006, Revised Selected Papers. Berlin: Springer, 2007.
Vieten, Andrea. Monomodale und multimodale Registrierung von autoradiographischen und histologischen Bilddaten. Jülich: Forschungszentrum Jülich, Zentralbibliothek, 2005.
Masson, Paul R. MULTIMOD Mark II: A revised and extended model. Washington, D.C: International Monetary Fund, 1990.
Dey, Somnath, and Debasis Samanta. Unimodal and Multimodal Biometric Data Indexing. De Gruyter, Inc., 2014.
Book chapters on the topic "Multimodal data processing":
Huang, Lihe. "Collecting and processing multimodal data." In Toward Multimodal Pragmatics, 99–108. London: Routledge, 2021. http://dx.doi.org/10.4324/9781003251774-5.
Naït-Ali, Amine, Emre Zeybek, and Xavier Drouot. "Introduction to Multimodal Compression of Biomedical Data." In Advanced Biosignal Processing, 353–74. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-540-89506-0_17.
Singh, Archana, and Kavita Sahu. "Emotion Recognition Using Multimodal Fusion Models." In Multimedia Data Processing and Computing, 21–31. Boca Raton: CRC Press, 2023. http://dx.doi.org/10.1201/9781003391272-2.
Janiak, Mateusz, Marek Kulbacki, Wojciech Knieć, Jerzy Paweł Nowacki, and Aldona Drabik. "Data Flow Processing Framework for Multimodal Data Environment Software." In New Trends in Intelligent Information and Database Systems, 353–62. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-16211-9_36.
Sathio, Anwar Ali, Muhammad Malook Rind, and Abdullah Lakhan. "Deep Learning Algorithms and Architectures for Multimodal Data Analysis." In Deep Learning for Multimedia Processing Applications, 74–113. Boca Raton: CRC Press, 2023. http://dx.doi.org/10.1201/9781032646268-5.
Meng, Lei, Ah-Hwee Tan, and Donald C. Wunsch II. "Online Multimodal Co-indexing and Retrieval of Social Media Data." In Advanced Information and Knowledge Processing, 155–74. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-02985-2_7.
Smith, Rebecca, and Frank Pollick. "The role of dance experience, visual processing strategies, and quantitative movement features in recognition of emotion from whole-body movements." In Dance Data, Cognition, and Multimodal Communication, 274–94. London: Routledge, 2022. http://dx.doi.org/10.4324/9781003106401-22.
Jin, Peiquan, Jianchuan Li, Lin Mu, Jingren Zhou, and Jie Zhao. "Effective Sentiment Analysis for Multimodal Review Data on the Web." In Algorithms and Architectures for Parallel Processing, 623–38. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60248-2_43.
Andriyanov, Nikita. "Multimodal Data Processing Based on Text Classifiers and Image Recognition." In Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges, 414–23. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-37742-6_31.
Patel, Daniel. "Multimodal Summed Area Tables—A Proof of Concept." In Interactive Data Processing and 3D Visualization of the Solid Earth, 179–207. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-90716-7_5.
Conference papers on the topic "Multimodal data processing":
Yang, Lixin, Genshe Chen, Ronghua Xu, Sherry Chen, and Yu Chen. "Decentralized autonomous imaging data processing using blockchain." In Multimodal Biomedical Imaging XIV, edited by Fred S. Azar, Xavier Intes, and Qianqian Fang. SPIE, 2019. http://dx.doi.org/10.1117/12.2513243.
Kharinov, Mikhail V., and Aleksandr N. Bykov. "Data Structure for Multimodal Signal Processing." In 2019 International Russian Automation Conference. IEEE, 2019. http://dx.doi.org/10.1109/rusautocon.2019.8867769.
Chen, Jia, and Ioannis D. Schizas. "Distributed efficient multimodal data clustering." In 2017 25th European Signal Processing Conference (EUSIPCO). IEEE, 2017. http://dx.doi.org/10.23919/eusipco.2017.8081621.
Fedorov, Igor, Bhaskar D. Rao, and Truong Q. Nguyen. "Multimodal sparse Bayesian dictionary learning applied to multimodal data classification." In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017. http://dx.doi.org/10.1109/icassp.2017.7952554.
Mukherjee, Arpan, Ali Tajer, Pin-Yu Chen, and Payel Das. "Active Estimation From Multimodal Data." In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021. http://dx.doi.org/10.1109/icassp39728.2021.9414772.
Ehlen, Patrick, Michael Johnston, and Gunaranjan Vasireddy. "Collecting mobile multimodal data for MATCH." In 7th International Conference on Spoken Language Processing (ICSLP 2002). ISCA, 2002. http://dx.doi.org/10.21437/icslp.2002-616.
Cavalheiro, Laís C. L., Matheus C. Pavan, and Ivandré Paraboni. "Stance Prediction from Multimodal Social Media Data." In International Conference Recent Advances in Natural Language Processing. INCOMA Ltd., Shoumen, Bulgaria, 2023. http://dx.doi.org/10.26615/978-954-452-092-2_027.
Liu, Jingwei. "A New Theory of Data Processing: Applying Artificial Intelligence to Cognition and Humanity." In ICMI '23: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION. New York, NY, USA: ACM, 2023. http://dx.doi.org/10.1145/3577190.3616123.
Zhang, Hong, Li Chen, Jun Liu, and Junsong Yuan. "Hierarchical multi-feature fusion for multimodal data analysis." In 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 2014. http://dx.doi.org/10.1109/icip.2014.7026195.
Basit, Mohammad, Bashir Alam, Zubaida Fatima, and Salman Shaikh. "Natural Disaster Tweets Classification Using Multimodal Data." In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.emnlp-main.471.
Reports on the topic "Multimodal data processing":
Hamlin, Alexandra, Erik Kobylarz, James Lever, Susan Taylor, and Laura Ray. Assessing the feasibility of detecting epileptic seizures using non-cerebral sensor. Engineer Research and Development Center (U.S.), December 2021. http://dx.doi.org/10.21079/11681/42562.