Selection of scholarly literature on the topic "Deep multi-Modal learning"
Cite a source in APA, MLA, Chicago, Harvard, and other citation styles
Table of contents
Consult the lists of current articles, books, dissertations, reports, and other scholarly sources on the topic "Deep multi-Modal learning".
Next to every work in the list of references there is an "Add to bibliography" option. Use it, and the bibliographic reference for the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).
You can also download the full text of the scholarly publication as a PDF and read an online annotation of the work, if the relevant parameters are available in the metadata.
Journal articles on the topic "Deep multi-Modal learning"
Shetty D S, Radhika. "Multi-Modal Fusion Techniques in Deep Learning". International Journal of Science and Research (IJSR) 12, no. 9 (September 5, 2023): 526–32. http://dx.doi.org/10.21275/sr23905100554.
Roostaiyan, Seyed Mahdi, Ehsan Imani, and Mahdieh Soleymani Baghshah. "Multi-modal deep distance metric learning". Intelligent Data Analysis 21, no. 6 (November 15, 2017): 1351–69. http://dx.doi.org/10.3233/ida-163196.
Zhu, Xinghui, Liewu Cai, Zhuoyang Zou, and Lei Zhu. "Deep Multi-Semantic Fusion-Based Cross-Modal Hashing". Mathematics 10, no. 3 (January 29, 2022): 430. http://dx.doi.org/10.3390/math10030430.
Du, Lin, Xiong You, Ke Li, Liqiu Meng, Gong Cheng, Liyang Xiong, and Guangxia Wang. "Multi-modal deep learning for landform recognition". ISPRS Journal of Photogrammetry and Remote Sensing 158 (December 2019): 63–75. http://dx.doi.org/10.1016/j.isprsjprs.2019.09.018.
Wang, Wei, Xiaoyan Yang, Beng Chin Ooi, Dongxiang Zhang, and Yueting Zhuang. "Effective deep learning-based multi-modal retrieval". VLDB Journal 25, no. 1 (July 19, 2015): 79–101. http://dx.doi.org/10.1007/s00778-015-0391-4.
Jeong, Changhoon, Sung-Eun Jang, Sanghyuck Na, and Juntae Kim. "Korean Tourist Spot Multi-Modal Dataset for Deep Learning Applications". Data 4, no. 4 (October 12, 2019): 139. http://dx.doi.org/10.3390/data4040139.
Yang, Yang, Yi-Feng Wu, De-Chuan Zhan, Zhi-Bin Liu, and Yuan Jiang. "Deep Robust Unsupervised Multi-Modal Network". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 5652–59. http://dx.doi.org/10.1609/aaai.v33i01.33015652.
Hua, Yan, Yingyun Yang, and Jianhe Du. "Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval". Electronics 9, no. 3 (March 10, 2020): 466. http://dx.doi.org/10.3390/electronics9030466.
Han, Dong, Hong Nie, Jinbao Chen, Meng Chen, Zhen Deng, and Jianwei Zhang. "Multi-modal haptic image recognition based on deep learning". Sensor Review 38, no. 4 (September 17, 2018): 486–93. http://dx.doi.org/10.1108/sr-08-2017-0160.
Pyrovolakis, Konstantinos, Paraskevi Tzouveli, and Giorgos Stamou. "Multi-Modal Song Mood Detection with Deep Learning". Sensors 22, no. 3 (January 29, 2022): 1065. http://dx.doi.org/10.3390/s22031065.
Dissertations on the topic "Deep multi-Modal learning"
Feng, Xue. "Multi-modal and deep learning for robust speech recognition". Ph.D. thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/113999.
Automatic speech recognition (ASR) decodes speech signals into text. While ASR can produce accurate word recognition in clean environments, system performance can degrade dramatically when noise and reverberation are present. In this thesis, speech denoising and model adaptation for robust speech recognition were studied, and four novel methods were introduced to improve ASR robustness. First, we developed an ASR system using multi-channel information from microphone arrays via accurate speaker tracking with Kalman filtering and subsequent beamforming. The system was evaluated on the publicly available Reverb Challenge corpus, and placed second (out of 49 submitted systems) in the recognition task on real data. Second, we explored a speech feature denoising and dereverberation method via deep denoising autoencoders (DDA). The method was evaluated on the CHiME2-WSJ0 corpus and achieved a 16% to 25% absolute improvement in word error rate (WER) compared to the baseline. Third, we developed a method to incorporate heterogeneous multi-modal data with a deep neural network (DNN) based acoustic model. Our experiments on a noisy vehicle-based speech corpus demonstrated that WERs can be reduced by 6.3% relative to the baseline system. Finally, we explored the use of a low-dimensional environmentally-aware feature derived from the total acoustic variability space. Two extraction methods are presented: one via linear discriminant analysis (LDA) projection, and the other via a bottleneck deep neural network (BN-DNN). Our evaluations showed that by adapting ASR systems with the proposed feature, ASR performance was significantly improved. We also demonstrated that the proposed feature yielded promising results on environment identification tasks.
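To make the feature-denoising approach concrete, here is a minimal sketch of a deep denoising autoencoder (DDA) that maps spliced noisy feature frames to clean center frames. PyTorch is assumed, and the feature dimension, context window, and layer widths are illustrative choices, not values taken from the thesis.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Maps a window of noisy/reverberant feature frames to a clean center frame."""
    def __init__(self, feat_dim=40, context=5, hidden=1024):
        super().__init__()
        in_dim = feat_dim * (2 * context + 1)  # spliced context window
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.decoder = nn.Linear(hidden, feat_dim)

    def forward(self, noisy_spliced):
        return self.decoder(self.encoder(noisy_spliced))

# Training pairs noisy inputs with parallel clean targets (random here).
model = DenoisingAutoencoder()
noisy = torch.randn(32, 40 * 11)  # batch of spliced noisy frames
clean = torch.randn(32, 40)       # corresponding clean center frames
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
```

Once trained on a parallel noisy/clean corpus such as CHiME2-WSJ0, the denoised features replace the raw ones as input to the recognizer.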
Mali, Shruti Atul. "Multi-Modal Learning for Abdominal Organ Segmentation". Thesis, KTH, Skolan för kemi, bioteknologi och hälsa (CBH), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-285866.
Ben-Younes, Hedi. "Multi-modal representation learning towards visual reasoning". Electronic thesis or dissertation, Sorbonne université, 2019. http://www.theses.fr/2019SORUS173.
Der volle Inhalt der QuelleThe quantity of images that populate the Internet is dramatically increasing. It becomes of critical importance to develop the technology for a precise and automatic understanding of visual contents. As image recognition systems are becoming more and more relevant, researchers in artificial intelligence now seek for the next generation vision systems that can perform high-level scene understanding. In this thesis, we are interested in Visual Question Answering (VQA), which consists in building models that answer any natural language question about any image. Because of its nature and complexity, VQA is often considered as a proxy for visual reasoning. Classically, VQA architectures are designed as trainable systems that are provided with images, questions about them and their answers. To tackle this problem, typical approaches involve modern Deep Learning (DL) techniques. In the first part, we focus on developping multi-modal fusion strategies to model the interactions between image and question representations. More specifically, we explore bilinear fusion models and exploit concepts from tensor analysis to provide tractable and expressive factorizations of parameters. These fusion mechanisms are studied under the widely used visual attention framework: the answer to the question is provided by focusing only on the relevant image regions. In the last part, we move away from the attention mechanism and build a more advanced scene understanding architecture where we consider objects and their spatial and semantic relations. All models are thoroughly experimentally evaluated on standard datasets and the results are competitive with the literature
Ahmedt Aristizabal, David Esteban. "Multi-modal analysis for the automatic evaluation of epilepsy". Thesis, Queensland University of Technology, 2019. https://eprints.qut.edu.au/132537/1/David_Ahmedt%20Aristizabal_Thesis.pdf.
Ouenniche, Kaouther. "Multimodal deep learning for audiovisual production". Electronic thesis or dissertation, Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAS020.
Within the dynamic landscape of television content, the need to automate the indexing and organization of archives has emerged as a paramount objective. In response, this research explores the use of deep learning techniques to automate the extraction of diverse metadata from television archives, improving their accessibility and reuse.

The first contribution revolves around the classification of camera-motion types. This is a crucial aspect of content indexing, as it allows video content to be efficiently categorized and retrieved based on the visual dynamics it exhibits. The proposed approach employs 3D convolutional neural networks with residual blocks, a technique inspired by action-recognition methods. A semi-automatic approach for constructing a reliable camera-motion dataset from publicly available videos is also presented, minimizing the need for manual intervention. Additionally, the creation of a challenging evaluation dataset, comprising real-life videos shot with professional cameras at varying resolutions, underlines the robustness and generalization power of the proposed technique, which achieves an average accuracy of 94%.

The second contribution centers on the demanding task of Video Question Answering. In this context, we explore the effectiveness of attention-based transformers for grounded multimodal learning. The challenge lies in bridging the gap between the visual and textual modalities and mitigating the quadratic complexity of transformer models. To address these issues, a novel framework is introduced that incorporates a lightweight transformer and a cross-modality module. This module leverages cross-correlation to enable reciprocal learning between text-conditioned visual features and video-conditioned textual features. Furthermore, an adversarial testing scenario with rephrased questions highlights the model's robustness and real-world applicability. Experimental results on benchmark datasets such as MSVD-QA and MSRVTT-QA validate the proposed methodology, with average accuracies of 45% and 42%, respectively, a notable improvement over existing approaches.

The third contribution addresses multimodal video captioning, a critical aspect of content indexing. The introduced framework incorporates a modality-attention module that captures the intricate relationships between visual and textual data using cross-correlation. Moreover, the integration of temporal attention enhances the model's ability to produce meaningful captions that account for the temporal dynamics of video content. The work also incorporates an auxiliary task employing a contrastive loss function, which promotes model generalization and a deeper understanding of inter-modal relationships and underlying semantics. The use of a transformer architecture for encoding and decoding significantly enhances the model's capacity to capture interdependencies between text and video data. The methodology is validated through rigorous evaluation on the MSRVTT benchmark, achieving BLEU4, ROUGE, and METEOR scores of 0.4408, 0.6291, and 0.3082, respectively. In comparison to state-of-the-art methods, this approach consistently outperforms them, with performance gains ranging from 1.21% to 1.52% across the three metrics considered.

In conclusion, this manuscript offers a holistic exploration of deep learning-based techniques for automating television content indexing, addressing the labor-intensive and time-consuming nature of manual indexing. The contributions encompass camera-motion classification, VideoQA, and multimodal video captioning, collectively advancing the state of the art and providing valuable insights for researchers in the field. These findings not only have practical applications for content retrieval and indexing but also contribute to the broader advancement of deep learning methodologies in the multimodal context.
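For orientation, the following sketch shows the reciprocal-conditioning idea behind such a cross-modality module, here approximated with standard multi-head cross-attention rather than the cross-correlation operation described in the abstract. PyTorch is assumed and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class CrossModalityModule(nn.Module):
    """Each modality attends to the other: text-conditioned visual features
    and video-conditioned textual features are produced in one pass."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.txt_to_vid = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.vid_to_txt = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vid, txt):
        vid_cond, _ = self.txt_to_vid(vid, txt, txt)  # video queries, text keys/values
        txt_cond, _ = self.vid_to_txt(txt, vid, vid)  # text queries, video keys/values
        return vid_cond, txt_cond

module = CrossModalityModule()
video_tokens = torch.randn(2, 32, 256)  # 32 frame features per clip
text_tokens = torch.randn(2, 12, 256)   # 12 token embeddings per question
v_cond, t_cond = module(video_tokens, text_tokens)
```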
Tahoun, Mohamed. "Object Shape Perception for Autonomous Dexterous Manipulation Based on Multi-Modal Learning Models". Electronic thesis or dissertation, Bourges, INSA Centre Val de Loire, 2021. http://www.theses.fr/2021ISAB0003.
This thesis proposes 3D object-reconstruction methods based on multimodal deep learning strategies, targeting applications in robotic manipulation. First, the thesis proposes a method for 3D visual reconstruction from a single view of the object obtained by an RGB-D sensor. Then, to improve the quality of single-view 3D reconstruction, a new method combining visual and tactile information is proposed on the basis of a learned reconstruction model. The proposed method was validated on a visual-tactile dataset that respects the kinematic constraints of a multi-fingered robotic hand and was created in the framework of this PhD work; this dataset is unique in the literature and is itself a contribution of the thesis. The validation results show that tactile information can contribute substantially to predicting the complete shape of an object, especially the part that is not visible to the RGB-D sensor. They also show that the proposed model obtains better results than the best-performing state-of-the-art methods.
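As a rough intuition for visual-tactile fusion ahead of shape completion, the sketch below concatenates a visual embedding of the partial view with a tactile embedding before projecting to a shared latent code. PyTorch is assumed; this simple late-fusion encoder and every dimension in it are hypothetical, not the architecture of the thesis.

```python
import torch
import torch.nn as nn

class VisuoTactileEncoder(nn.Module):
    """Fuses a partial-view visual embedding with tactile contact features
    into one latent code for a downstream shape-completion decoder (omitted)."""
    def __init__(self, vis_dim=512, tac_dim=64, latent=256):
        super().__init__()
        self.vis_mlp = nn.Sequential(nn.Linear(vis_dim, 256), nn.ReLU())
        self.tac_mlp = nn.Sequential(nn.Linear(tac_dim, 64), nn.ReLU())
        self.fuse = nn.Linear(256 + 64, latent)

    def forward(self, vis_feat, tac_feat):
        fused = torch.cat([self.vis_mlp(vis_feat), self.tac_mlp(tac_feat)], dim=-1)
        return self.fuse(fused)

vis = torch.randn(4, 512)  # embedding of the RGB-D partial view
tac = torch.randn(4, 64)   # embedding of fingertip contact readings
latent = VisuoTactileEncoder()(vis, tac)  # (4, 256)
```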
Dickens, James. "Depth-Aware Deep Learning Networks for Object Detection and Image Segmentation". Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42619.
Husseini Orabi, Ahmed. "Multi-Modal Technology for User Interface Analysis including Mental State Detection and Eye Tracking Analysis". Thesis, Université d'Ottawa / University of Ottawa, 2017. http://hdl.handle.net/10393/36451.
Siddiqui, Mohammad Faridul Haque. "A Multi-modal Emotion Recognition Framework Through The Fusion Of Speech With Visible And Infrared Images". University of Toledo / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1556459232937498.
Zhang, Yifei. "Real-time multimodal semantic scene understanding for autonomous UGV navigation". Thesis, Bourgogne Franche-Comté, 2021. http://www.theses.fr/2021UBFCK002.
Robust semantic scene understanding is challenging due to complex object types as well as environmental changes caused by varying illumination and weather conditions. This thesis studies the problem of deep semantic segmentation with multimodal image inputs. Multimodal images captured by different sensory modalities provide complementary information for complete scene understanding. We provide effective solutions for fully supervised multimodal image segmentation and for few-shot semantic segmentation of outdoor road scenes. For the former, we propose a multi-level fusion network to integrate RGB and polarimetric images; a central fusion framework is also introduced to adaptively learn joint representations of modality-specific features and to reduce model uncertainty via statistical post-processing. For semi-supervised semantic scene understanding, we first propose a novel few-shot segmentation method based on the prototypical network, which employs multiscale feature enhancement and an attention mechanism; we then extend the RGB-centric algorithms to take advantage of supplementary depth cues. Comprehensive empirical evaluations on different benchmark datasets demonstrate that all the proposed algorithms achieve superior accuracy and confirm the effectiveness of complementary modalities for outdoor scene understanding in autonomous navigation.
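To picture what multi-level fusion of RGB and polarimetric streams can look like, here is a compact sketch that sums modality-specific feature maps at two encoder levels before a segmentation head. PyTorch is assumed; the three-channel polarimetric input, the layer widths, and fusion by summation are assumptions for illustration, not the network proposed in the thesis.

```python
import torch
import torch.nn as nn

def block(cin, cout):
    """Strided conv block: halves spatial resolution."""
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU())

class MultiLevelFusionSeg(nn.Module):
    """Two parallel encoders fused by summation at each level, then decoded
    to per-pixel class scores."""
    def __init__(self, classes=19):
        super().__init__()
        self.rgb1, self.rgb2 = block(3, 32), block(32, 64)
        self.pol1, self.pol2 = block(3, 32), block(32, 64)
        self.head = nn.Sequential(
            nn.Conv2d(64, classes, 1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False))

    def forward(self, rgb, pol):
        p1 = self.pol1(pol)
        f1 = self.rgb1(rgb) + p1            # level-1 fusion
        f2 = self.rgb2(f1) + self.pol2(p1)  # level-2 fusion
        return self.head(f2)

net = MultiLevelFusionSeg()
rgb = torch.randn(1, 3, 128, 256)
pol = torch.randn(1, 3, 128, 256)  # e.g. stacked polarization channels
logits = net(rgb, pol)             # (1, 19, 128, 256)
```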
Book chapters on the topic "Deep multi-Modal learning"
Hiriyannaiah, Srinidhi, G. M. Siddesh, and K. G. Srinivasa. "Overview of Deep Learning". In Cloud-based Multi-Modal Information Analytics, 39–55. Boca Raton: Chapman and Hall/CRC, 2023. http://dx.doi.org/10.1201/9781003215974-4.
Hiriyannaiah, Srinidhi, G. M. Siddesh, and K. G. Srinivasa. "Cloud and Deep Learning". In Cloud-based Multi-Modal Information Analytics, 19–38. Boca Raton: Chapman and Hall/CRC, 2023. http://dx.doi.org/10.1201/9781003215974-3.
Hiriyannaiah, Srinidhi, G. M. Siddesh, and K. G. Srinivasa. "Deep Learning Platforms and Cloud". In Cloud-based Multi-Modal Information Analytics, 57–70. Boca Raton: Chapman and Hall/CRC, 2023. http://dx.doi.org/10.1201/9781003215974-5.
Yang, Yang, Yi-Feng Wu, De-Chuan Zhan, and Yuan Jiang. "Deep Multi-modal Learning with Cascade Consensus". In Lecture Notes in Computer Science, 64–72. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-97310-4_8.
Varsavsky, Thomas, Zach Eaton-Rosen, Carole H. Sudre, Parashkev Nachev, and M. Jorge Cardoso. "PIMMS: Permutation Invariant Multi-modal Segmentation". In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, 201–9. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-00889-5_23.
Li, Cheng, Hui Sun, Zaiyi Liu, Meiyun Wang, Hairong Zheng, and Shanshan Wang. "Learning Cross-Modal Deep Representations for Multi-Modal MR Image Segmentation". In Lecture Notes in Computer Science, 57–65. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-32245-8_7.
Lin, Yu. "Sentiment Analysis of Painting Based on Deep Learning". In Application of Intelligent Systems in Multi-modal Information Analytics, 651–55. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-51556-0_96.
Yang, Liang, Huajun Wang, and Xiaolin Zhang. "A Deep Learning Method for Salient Object Detection". In Application of Intelligent Systems in Multi-modal Information Analytics, 894–99. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-05484-6_118.
Luo, Yanling, Jiawei Wan, and Shengqin She. "Software Security Vulnerability Mining Based on Deep Learning". In Application of Intelligent Systems in Multi-modal Information Analytics, 536–43. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-05237-8_66.
Zhang, Sen, Changzheng Zhang, Lanjun Wang, Cixing Li, Dandan Tu, Rui Luo, Guojun Qi, and Jiebo Luo. "MSAFusionNet: Multiple Subspace Attention Based Deep Multi-modal Fusion Network". In Machine Learning in Medical Imaging, 54–62. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-32692-0_7.
Der volle Inhalt der QuelleKonferenzberichte zum Thema "Deep multi-Modal learning"
Iyer, Vasanth, Alex J. Aved, Todd B. Howlett, Jeffrey T. Carlo, Asif Mehmood, Niki Pissinou, and S. Sitharama Iyengar. "Fast multi-modal reuse: co-occurrence pre-trained deep learning models". In Real-Time Image Processing and Deep Learning 2019, edited by Nasser Kehtarnavaz and Matthias F. Carlsohn. SPIE, 2019. http://dx.doi.org/10.1117/12.2519546.
Kulkarni, Karthik, Prakash Patil, and Suvarna G. Kanakaraddi. "Multi-Modal Colour Extraction Using Deep Learning Techniques". In 2022 Fourth International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT). IEEE, 2022. http://dx.doi.org/10.1109/icerect56837.2022.10060086.
Müller, K. R., and S. M. Hofmann. "Interpreting Deep Learning Models for Multi-modal Neuroimaging". In 2023 11th International Winter Conference on Brain-Computer Interface (BCI). IEEE, 2023. http://dx.doi.org/10.1109/bci57258.2023.10078502.
Haritha, D., and B. Sandhya. "Multi-modal Medical Data Fusion using Deep Learning". In 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom). IEEE, 2022. http://dx.doi.org/10.23919/indiacom54597.2022.9763296.
You, Bihao, Jiahao Qin, Yitao Xu, Yunfeng Wu, Yize Liu, and Sijia Pan. "Multi-Modal Deep Learning Model for Stock Crises". In 2023 2nd International Conference on Frontiers of Communications, Information System and Data Science (CISDS). IEEE, 2023. http://dx.doi.org/10.1109/cisds61173.2023.00017.
Vijayaraghavan, Prashanth, Soroush Vosoughi, and Deb Roy. "Twitter Demographic Classification Using Deep Multi-modal Multi-task Learning". In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. http://dx.doi.org/10.18653/v1/p17-2076.
Zhang, Xiao, and Xiaoling Liu. "Interference Signal Recognition Based on Multi-Modal Deep Learning". In 2020 7th International Conference on Dependable Systems and Their Applications (DSA). IEEE, 2020. http://dx.doi.org/10.1109/dsa51864.2020.00055.
Liu, Bao-Yun, Yi-Hsin Jen, Shih-Wei Sun, Li Su, and Pao-Chi Chang. "Multi-Modal Deep Learning-Based Violin Bowing Action Recognition". In 2020 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan). IEEE, 2020. http://dx.doi.org/10.1109/icce-taiwan49838.2020.9257995.
Huang, Xin, and Yuxin Peng. "Cross-modal deep metric learning with multi-task regularization". In 2017 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2017. http://dx.doi.org/10.1109/icme.2017.8019340.
Lam, Genevieve, Huang Dongyan, and Weisi Lin. "Context-aware Deep Learning for Multi-modal Depression Detection". In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019. http://dx.doi.org/10.1109/icassp.2019.8683027.