Ready-made bibliography on the topic "Deep multi-Modal learning"
Create an accurate reference in APA, MLA, Chicago, Harvard, and many other styles
Browse lists of current articles, books, dissertations, abstracts, and other scholarly sources on the topic "Deep multi-Modal learning".
Next to every work in the bibliography there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scholarly publication as a ".pdf" file and read its abstract online, whenever such details are available in the source's metadata.
Journal articles on the topic "Deep multi-Modal learning"
Shetty D S, Radhika. "Multi-Modal Fusion Techniques in Deep Learning". International Journal of Science and Research (IJSR) 12, no. 9 (September 5, 2023): 526–32. http://dx.doi.org/10.21275/sr23905100554.
Roostaiyan, Seyed Mahdi, Ehsan Imani, and Mahdieh Soleymani Baghshah. "Multi-modal deep distance metric learning". Intelligent Data Analysis 21, no. 6 (November 15, 2017): 1351–69. http://dx.doi.org/10.3233/ida-163196.
Zhu, Xinghui, Liewu Cai, Zhuoyang Zou, and Lei Zhu. "Deep Multi-Semantic Fusion-Based Cross-Modal Hashing". Mathematics 10, no. 3 (January 29, 2022): 430. http://dx.doi.org/10.3390/math10030430.
Du, Lin, Xiong You, Ke Li, Liqiu Meng, Gong Cheng, Liyang Xiong, and Guangxia Wang. "Multi-modal deep learning for landform recognition". ISPRS Journal of Photogrammetry and Remote Sensing 158 (December 2019): 63–75. http://dx.doi.org/10.1016/j.isprsjprs.2019.09.018.
Wang, Wei, Xiaoyan Yang, Beng Chin Ooi, Dongxiang Zhang, and Yueting Zhuang. "Effective deep learning-based multi-modal retrieval". VLDB Journal 25, no. 1 (July 19, 2015): 79–101. http://dx.doi.org/10.1007/s00778-015-0391-4.
Jeong, Changhoon, Sung-Eun Jang, Sanghyuck Na, and Juntae Kim. "Korean Tourist Spot Multi-Modal Dataset for Deep Learning Applications". Data 4, no. 4 (October 12, 2019): 139. http://dx.doi.org/10.3390/data4040139.
Yang, Yang, Yi-Feng Wu, De-Chuan Zhan, Zhi-Bin Liu, and Yuan Jiang. "Deep Robust Unsupervised Multi-Modal Network". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 5652–59. http://dx.doi.org/10.1609/aaai.v33i01.33015652.
Hua, Yan, Yingyun Yang, and Jianhe Du. "Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval". Electronics 9, no. 3 (March 10, 2020): 466. http://dx.doi.org/10.3390/electronics9030466.
Han, Dong, Hong Nie, Jinbao Chen, Meng Chen, Zhen Deng, and Jianwei Zhang. "Multi-modal haptic image recognition based on deep learning". Sensor Review 38, no. 4 (September 17, 2018): 486–93. http://dx.doi.org/10.1108/sr-08-2017-0160.
Pyrovolakis, Konstantinos, Paraskevi Tzouveli, and Giorgos Stamou. "Multi-Modal Song Mood Detection with Deep Learning". Sensors 22, no. 3 (January 29, 2022): 1065. http://dx.doi.org/10.3390/s22031065.
Pełny tekst źródłaRozprawy doktorskie na temat "Deep multi-Modal learning"
Feng, Xue (Ph. D., Massachusetts Institute of Technology). "Multi-modal and deep learning for robust speech recognition". Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/113999.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 105–115).
Automatic speech recognition (ASR) decodes speech signals into text. While ASR can produce accurate word recognition in clean environments, system performance can degrade dramatically when noise and reverberation are present. In this thesis, speech denoising and model adaptation for robust speech recognition were studied, and four novel methods were introduced to improve ASR robustness. First, we developed an ASR system using multi-channel information from microphone arrays via accurate speaker tracking with Kalman filtering and subsequent beamforming. The system was evaluated on the publicly available Reverb Challenge corpus, and placed second (out of 49 submitted systems) in the recognition task on real data. Second, we explored a speech feature denoising and dereverberation method via deep denoising autoencoders (DDA). The method was evaluated on the CHiME2-WSJ0 corpus and achieved a 16% to 25% absolute improvement in word error rate (WER) compared to the baseline. Third, we developed a method to incorporate heterogeneous multi-modal data with a deep neural network (DNN) based acoustic model. Our experiments on a noisy vehicle-based speech corpus demonstrated that WERs can be reduced by 6.3% relative to the baseline system. Finally, we explored the use of a low-dimensional environmentally-aware feature derived from the total acoustic variability space. Two extraction methods are presented: one via linear discriminant analysis (LDA) projection, and the other via a bottleneck deep neural network (BN-DNN). Our evaluations showed that by adapting ASR systems with the proposed feature, ASR performance was significantly improved. We also demonstrated that the proposed feature yielded promising results on environment identification tasks.
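A note on the second contribution above: feature denoising with a deep denoising autoencoder is straightforward to sketch. Below is a minimal, illustrative PyTorch version trained to regress clean speech features from noisy or reverberant ones; the feature dimension (spliced log-mel frames), layer sizes, and the synthetic tensors are assumptions for the example, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn

# Minimal sketch of a deep denoising autoencoder (DDA) for speech feature
# enhancement. Hypothetical dimensions: 11 spliced frames x 40 log-mel
# coefficients = 440 inputs; the real thesis configuration may differ.
class DenoisingAutoencoder(nn.Module):
    def __init__(self, feat_dim=440, hidden=1024, bottleneck=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, bottleneck), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, noisy_feats):
        return self.decoder(self.encoder(noisy_feats))

# Training pairs: noisy/reverberant features in, clean-speech features out.
model = DenoisingAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
clean = torch.randn(32, 440)                    # stand-in for clean features
noisy = clean + 0.1 * torch.randn_like(clean)   # stand-in for corrupted input
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
optimizer.step()
```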
Mali, Shruti Atul. "Multi-Modal Learning for Abdominal Organ Segmentation". Thesis, KTH, Skolan för kemi, bioteknologi och hälsa (CBH), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-285866.
Ben-Younes, Hedi. "Multi-modal representation learning towards visual reasoning". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS173.
The quantity of images on the Internet is increasing dramatically, and it is becoming critically important to develop technology for precise, automatic understanding of visual content. As image recognition systems become more and more capable, researchers in artificial intelligence now seek next-generation vision systems that can perform high-level scene understanding. In this thesis, we are interested in Visual Question Answering (VQA), which consists of building models that answer any natural-language question about any image. Because of its nature and complexity, VQA is often considered a proxy for visual reasoning. Classically, VQA architectures are designed as trainable systems that are provided with images, questions about them, and their answers. To tackle this problem, typical approaches involve modern Deep Learning (DL) techniques. In the first part, we focus on developing multi-modal fusion strategies to model the interactions between image and question representations. More specifically, we explore bilinear fusion models and exploit concepts from tensor analysis to provide tractable and expressive factorizations of the parameters. These fusion mechanisms are studied under the widely used visual attention framework: the answer to the question is produced by focusing only on the relevant image regions. In the last part, we move away from the attention mechanism and build a more advanced scene-understanding architecture that considers objects and their spatial and semantic relations. All models are thoroughly evaluated on standard datasets, and the results are competitive with the literature.
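The bilinear fusion idea at the heart of this thesis can be illustrated compactly. The sketch below implements a low-rank (Hadamard-product) factorization of the bilinear image-question interaction, one simple member of the family of tensor factorizations the thesis studies; every dimension, the rank, and the tanh nonlinearity are assumptions for the example.

```python
import torch
import torch.nn as nn

# Minimal sketch of low-rank bilinear fusion for VQA: the full bilinear
# interaction between question and image embeddings is approximated by an
# element-wise product of two linear projections.
class LowRankBilinearFusion(nn.Module):
    def __init__(self, q_dim=2400, v_dim=2048, rank=512, n_answers=3000):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, rank)      # question -> rank space
        self.v_proj = nn.Linear(v_dim, rank)      # image    -> rank space
        self.classifier = nn.Linear(rank, n_answers)

    def forward(self, q_emb, v_emb):
        fused = self.q_proj(q_emb) * self.v_proj(v_emb)  # Hadamard product
        return self.classifier(torch.tanh(fused))        # answer scores

q = torch.randn(8, 2400)   # batch of question embeddings
v = torch.randn(8, 2048)   # batch of image embeddings
print(LowRankBilinearFusion()(q, v).shape)  # torch.Size([8, 3000])
```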
Ahmedt Aristizabal, David Esteban. "Multi-modal analysis for the automatic evaluation of epilepsy". Thesis, Queensland University of Technology, 2019. https://eprints.qut.edu.au/132537/1/David_Ahmedt%20Aristizabal_Thesis.pdf.
Ouenniche, Kaouther. "Multimodal deep learning for audiovisual production". Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAS020.
Within the dynamic landscape of television content, the need to automate the indexing and organization of archives has emerged as a paramount objective. In response, this research explores the use of deep learning techniques to automate the extraction of diverse metadata from television archives, improving their accessibility and reuse. The first contribution revolves around the classification of camera-motion types, a crucial aspect of content indexing that allows video content to be categorized and retrieved efficiently according to its visual dynamics. The proposed approach employs 3D convolutional neural networks with residual blocks, a technique inspired by action-recognition methods. A semi-automatic approach for constructing a reliable camera-motion dataset from publicly available videos is also presented, minimizing the need for manual intervention. Additionally, a challenging evaluation dataset, comprising real-life videos shot with professional cameras at varying resolutions, underlines the robustness and generalization power of the proposed technique, which achieves an average accuracy of 94%. The second contribution centers on the demanding task of Video Question Answering, exploring the effectiveness of attention-based transformers for grounded multimodal learning. The challenge lies in bridging the gap between the visual and textual modalities and in mitigating the quadratic complexity of transformer models. To address these issues, a novel framework is introduced that incorporates a lightweight transformer and a cross-modality module; this module leverages cross-correlation to enable reciprocal learning between text-conditioned visual features and video-conditioned textual features. An adversarial testing scenario with rephrased questions further highlights the model's robustness and real-world applicability. Experimental results on benchmark datasets such as MSVD-QA and MSRVTT-QA validate the methodology, with average accuracies of 45% and 42%, respectively, notable improvements over existing approaches. The third contribution addresses multimodal video captioning, another critical aspect of content indexing. The introduced framework incorporates a modality-attention module that captures the intricate relationships between visual and textual data using cross-correlation, while temporal attention enhances the model's ability to produce meaningful captions that account for the temporal dynamics of video content. The work also incorporates an auxiliary task with a contrastive loss function, which promotes model generalization and a deeper understanding of inter-modal relationships and underlying semantics. A transformer architecture for encoding and decoding significantly enhances the model's capacity to capture interdependencies between text and video data. The methodology is validated through rigorous evaluation on the MSRVTT benchmark, achieving BLEU4, ROUGE, and METEOR scores of 0.4408, 0.6291, and 0.3082, respectively.
This approach consistently outperforms state-of-the-art methods, with gains ranging from 1.21% to 1.52% across the three metrics considered. In conclusion, this manuscript offers a holistic exploration of deep learning-based techniques for automating television content indexing, addressing the labor-intensive and time-consuming nature of manual indexing. The contributions encompass camera-motion classification, VideoQA, and multimodal video captioning, collectively advancing the state of the art and providing valuable insights for researchers in the field. These findings not only have practical applications for content retrieval and indexing but also contribute to the broader advancement of deep learning methodologies in the multimodal context.
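The reciprocal conditioning described in the second contribution can be sketched as two cross-attention blocks, one per direction. The version below uses standard multi-head attention in place of the thesis's cross-correlation formulation, and the token counts and embedding size are assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch of a cross-modality module: each stream attends to the
# other, yielding text-conditioned video features and video-conditioned
# text features.
class CrossModalAttention(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.txt_to_vid = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.vid_to_txt = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vid, txt):
        vid_cond, _ = self.txt_to_vid(vid, txt, txt)  # video queried by text
        txt_cond, _ = self.vid_to_txt(txt, vid, vid)  # text queried by video
        return vid_cond, txt_cond

vid = torch.randn(2, 32, 512)   # 32 video tokens per clip
txt = torch.randn(2, 12, 512)   # 12 question tokens
v_out, t_out = CrossModalAttention()(vid, txt)
```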
Tahoun, Mohamed. "Object Shape Perception for Autonomous Dexterous Manipulation Based on Multi-Modal Learning Models". Electronic Thesis or Diss., Bourges, INSA Centre Val de Loire, 2021. http://www.theses.fr/2021ISAB0003.
This thesis proposes 3D object reconstruction methods based on multimodal deep learning strategies, targeting applications in robotic manipulation. First, the thesis proposes a 3D visual reconstruction method from a single view of the object captured by an RGB-D sensor. Then, to improve the quality of single-view 3D reconstruction, a new method combining visual and tactile information is proposed, based on a learned reconstruction model. The method is validated on a visual-tactile dataset that respects the kinematic constraints of a multi-fingered robotic hand; this dataset, created as part of the PhD work, is unique in the literature and is itself a contribution of the thesis. The validation results show that tactile information can contribute substantially to predicting the complete shape of an object, especially the part that is not visible to the RGB-D sensor, and that the proposed model obtains better results than the best-performing state-of-the-art methods.
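As a loose illustration of visual-tactile fusion for shape completion, the snippet below concatenates a visual embedding of a single RGB-D view with a tactile embedding and decodes a voxel occupancy grid. The architecture, embedding sizes, and voxel output are assumptions for the sketch, not the model proposed in the thesis.

```python
import torch
import torch.nn as nn

# Minimal sketch: late fusion of visual and tactile embeddings, decoded
# into a 32^3 occupancy grid covering the unseen side of the object.
class VisuoTactileCompletion(nn.Module):
    def __init__(self, vis_dim=256, tac_dim=64, voxels=32):
        super().__init__()
        self.voxels = voxels
        self.decoder = nn.Sequential(
            nn.Linear(vis_dim + tac_dim, 1024), nn.ReLU(),
            nn.Linear(1024, voxels ** 3),
        )

    def forward(self, vis_emb, tac_emb):
        logits = self.decoder(torch.cat([vis_emb, tac_emb], dim=-1))
        return logits.view(-1, self.voxels, self.voxels, self.voxels)

occ = VisuoTactileCompletion()(torch.randn(4, 256), torch.randn(4, 64))
print(occ.shape)  # torch.Size([4, 32, 32, 32])
```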
Dickens, James. "Depth-Aware Deep Learning Networks for Object Detection and Image Segmentation". Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42619.
Husseini, Orabi Ahmed. "Multi-Modal Technology for User Interface Analysis including Mental State Detection and Eye Tracking Analysis". Thesis, Université d'Ottawa / University of Ottawa, 2017. http://hdl.handle.net/10393/36451.
Siddiqui, Mohammad Faridul Haque. "A Multi-modal Emotion Recognition Framework Through The Fusion Of Speech With Visible And Infrared Images". University of Toledo / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1556459232937498.
Zhang, Yifei. "Real-time multimodal semantic scene understanding for autonomous UGV navigation". Thesis, Bourgogne Franche-Comté, 2021. http://www.theses.fr/2021UBFCK002.
Robust semantic scene understanding is challenging due to complex object types, as well as environmental changes caused by varying illumination and weather conditions. This thesis studies the problem of deep semantic segmentation with multimodal image inputs: images captured from different sensory modalities provide complementary information for complete scene understanding. We provide effective solutions for fully supervised multimodal image segmentation and for few-shot semantic segmentation of outdoor road scenes. For the former, we propose a multi-level fusion network to integrate RGB and polarimetric images, together with a central fusion framework that adaptively learns joint representations of modality-specific features and reduces model uncertainty via statistical post-processing. For the few-shot case, we first propose a novel few-shot segmentation method based on the prototypical network, employing multiscale feature enhancement and an attention mechanism, and then extend the RGB-centric algorithms to take advantage of supplementary depth cues. Comprehensive empirical evaluations on different benchmark datasets demonstrate that the proposed algorithms achieve superior accuracy and confirm the effectiveness of complementary modalities for outdoor scene understanding in autonomous navigation.
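A two-stream fusion segmenter of the kind this abstract describes can be sketched as follows: separate RGB and polarimetric encoders whose features are concatenated once before a segmentation head. The thesis fuses at multiple levels and adds statistical post-processing; the single fusion point, the three-channel polarimetric input, and all layer sizes here are simplifying assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch of two-stream multimodal segmentation with one fusion point.
class TwoStreamSegmenter(nn.Module):
    def __init__(self, classes=19):
        super().__init__()
        def encoder():  # tiny per-modality encoder, downsampling x4
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
        self.rgb_enc, self.pol_enc = encoder(), encoder()
        self.fuse = nn.Conv2d(128, 64, 1)   # merge concatenated features
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, classes, 1),      # per-pixel class logits
        )

    def forward(self, rgb, pol):
        feats = torch.cat([self.rgb_enc(rgb), self.pol_enc(pol)], dim=1)
        return self.head(self.fuse(feats))

rgb = torch.randn(1, 3, 128, 128)
pol = torch.randn(1, 3, 128, 128)   # assumed 3-channel polarimetric input
print(TwoStreamSegmenter()(rgb, pol).shape)  # torch.Size([1, 19, 128, 128])
```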
Book chapters on the topic "Deep multi-Modal learning"
Hiriyannaiah, Srinidhi, G. M. Siddesh, and K. G. Srinivasa. "Overview of Deep Learning". In Cloud-based Multi-Modal Information Analytics, 39–55. Boca Raton: Chapman and Hall/CRC, 2023. http://dx.doi.org/10.1201/9781003215974-4.
Hiriyannaiah, Srinidhi, G. M. Siddesh, and K. G. Srinivasa. "Cloud and Deep Learning". In Cloud-based Multi-Modal Information Analytics, 19–38. Boca Raton: Chapman and Hall/CRC, 2023. http://dx.doi.org/10.1201/9781003215974-3.
Hiriyannaiah, Srinidhi, G. M. Siddesh, and K. G. Srinivasa. "Deep Learning Platforms and Cloud". In Cloud-based Multi-Modal Information Analytics, 57–70. Boca Raton: Chapman and Hall/CRC, 2023. http://dx.doi.org/10.1201/9781003215974-5.
Yang, Yang, Yi-Feng Wu, De-Chuan Zhan, and Yuan Jiang. "Deep Multi-modal Learning with Cascade Consensus". In Lecture Notes in Computer Science, 64–72. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-97310-4_8.
Varsavsky, Thomas, Zach Eaton-Rosen, Carole H. Sudre, Parashkev Nachev, and M. Jorge Cardoso. "PIMMS: Permutation Invariant Multi-modal Segmentation". In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, 201–9. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-00889-5_23.
Li, Cheng, Hui Sun, Zaiyi Liu, Meiyun Wang, Hairong Zheng, and Shanshan Wang. "Learning Cross-Modal Deep Representations for Multi-Modal MR Image Segmentation". In Lecture Notes in Computer Science, 57–65. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-32245-8_7.
Lin, Yu. "Sentiment Analysis of Painting Based on Deep Learning". In Application of Intelligent Systems in Multi-modal Information Analytics, 651–55. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-51556-0_96.
Yang, Liang, Huajun Wang, and Xiaolin Zhang. "A Deep Learning Method for Salient Object Detection". In Application of Intelligent Systems in Multi-modal Information Analytics, 894–99. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-05484-6_118.
Luo, Yanling, Jiawei Wan, and Shengqin She. "Software Security Vulnerability Mining Based on Deep Learning". In Application of Intelligent Systems in Multi-modal Information Analytics, 536–43. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-05237-8_66.
Zhang, Sen, Changzheng Zhang, Lanjun Wang, Cixing Li, Dandan Tu, Rui Luo, Guojun Qi, and Jiebo Luo. "MSAFusionNet: Multiple Subspace Attention Based Deep Multi-modal Fusion Network". In Machine Learning in Medical Imaging, 54–62. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-32692-0_7.
Pełny tekst źródłaStreszczenia konferencji na temat "Deep multi-Modal learning"
Iyer, Vasanth, Alex J. Aved, Todd B. Howlett, Jeffrey T. Carlo, Asif Mehmood, Niki Pissniou, and S. Sitharama Iyengar. "Fast multi-modal reuse: co-occurrence pre-trained deep learning models". In Real-Time Image Processing and Deep Learning 2019, edited by Nasser Kehtarnavaz and Matthias F. Carlsohn. SPIE, 2019. http://dx.doi.org/10.1117/12.2519546.
Kulkarni, Karthik, Prakash Patil, and Suvarna G. Kanakaraddi. "Multi-Modal Colour Extraction Using Deep Learning Techniques". In 2022 Fourth International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT). IEEE, 2022. http://dx.doi.org/10.1109/icerect56837.2022.10060086.
Müller, K. R., and S. M. Hofmann. "Interpreting Deep Learning Models for Multi-modal Neuroimaging". In 2023 11th International Winter Conference on Brain-Computer Interface (BCI). IEEE, 2023. http://dx.doi.org/10.1109/bci57258.2023.10078502.
Haritha, D., and B. Sandhya. "Multi-modal Medical Data Fusion using Deep Learning". In 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom). IEEE, 2022. http://dx.doi.org/10.23919/indiacom54597.2022.9763296.
You, Bihao, Jiahao Qin, Yitao Xu, Yunfeng Wu, Yize Liu, and Sijia Pan. "Multi-Modal Deep Learning Model for Stock Crises". In 2023 2nd International Conference on Frontiers of Communications, Information System and Data Science (CISDS). IEEE, 2023. http://dx.doi.org/10.1109/cisds61173.2023.00017.
Vijayaraghavan, Prashanth, Soroush Vosoughi, and Deb Roy. "Twitter Demographic Classification Using Deep Multi-modal Multi-task Learning". In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. http://dx.doi.org/10.18653/v1/p17-2076.
Zhang, Xiao, and Xiaoling Liu. "Interference Signal Recognition Based on Multi-Modal Deep Learning". In 2020 7th International Conference on Dependable Systems and Their Applications (DSA). IEEE, 2020. http://dx.doi.org/10.1109/dsa51864.2020.00055.
Liu, Bao-Yun, Yi-Hsin Jen, Shih-Wei Sun, Li Su, and Pao-Chi Chang. "Multi-Modal Deep Learning-Based Violin Bowing Action Recognition". In 2020 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan). IEEE, 2020. http://dx.doi.org/10.1109/icce-taiwan49838.2020.9257995.
Huang, Xin, and Yuxin Peng. "Cross-modal deep metric learning with multi-task regularization". In 2017 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2017. http://dx.doi.org/10.1109/icme.2017.8019340.
Lam, Genevieve, Huang Dongyan, and Weisi Lin. "Context-aware Deep Learning for Multi-modal Depression Detection". In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019. http://dx.doi.org/10.1109/icassp.2019.8683027.