Journal articles on the topic 'Deep multi-Modal learning'

To see the other types of publications on this topic, follow the link: Deep multi-Modal learning.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Deep multi-Modal learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Shetty D S, Radhika. "Multi-Modal Fusion Techniques in Deep Learning." International Journal of Science and Research (IJSR) 12, no. 9 (September 5, 2023): 526–32. http://dx.doi.org/10.21275/sr23905100554.

2

Roostaiyan, Seyed Mahdi, Ehsan Imani, and Mahdieh Soleymani Baghshah. "Multi-modal deep distance metric learning." Intelligent Data Analysis 21, no. 6 (November 15, 2017): 1351–69. http://dx.doi.org/10.3233/ida-163196.

3

Zhu, Xinghui, Liewu Cai, Zhuoyang Zou, and Lei Zhu. "Deep Multi-Semantic Fusion-Based Cross-Modal Hashing." Mathematics 10, no. 3 (January 29, 2022): 430. http://dx.doi.org/10.3390/math10030430.

Abstract:
Thanks to its low storage and search costs, cross-modal hashing retrieval has received much research interest in the big data era, and the application of deep learning has markedly improved cross-modal representation capabilities. However, existing deep hashing methods do not consider multi-label semantic learning and cross-modal similarity learning simultaneously: potential semantic correlations among multimedia data are not fully mined from multi-category labels, which in turn weakens the similarity preservation of cross-modal hash codes. To this end, this paper proposes deep multi-semantic fusion-based cross-modal hashing (DMSFH), which uses two deep neural networks to extract cross-modal features and a multi-label semantic fusion method to improve consistent cross-modal semantic discrimination learning. Moreover, graph regularization is combined with inter-modal and intra-modal pairwise losses to preserve nearest-neighbor relationships in the Hamming subspace. Thus, DMSFH not only retains semantic similarity between multi-modal data but also integrates multi-label information into modal learning. Extensive experiments on two commonly used benchmark datasets show that DMSFH is competitive with state-of-the-art methods.
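For orientation, a minimal PyTorch sketch of the general two-branch deep cross-modal hashing recipe this abstract describes is given below. It is not the authors' DMSFH implementation: the layer sizes, 64-bit code length, and toy similarity matrix are assumptions, and the multi-label semantic fusion and graph-regularization terms are omitted, leaving only an inter-modal pairwise loss with a quantization penalty.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashBranch(nn.Module):
    """One modality branch mapping features to relaxed binary codes."""
    def __init__(self, in_dim: int, bits: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, bits), nn.Tanh(),   # relaxed binary codes in (-1, 1)
        )

    def forward(self, x):
        return self.net(x)

def pairwise_hash_loss(img_codes, txt_codes, sim, quant_weight=0.1):
    """Likelihood-style inter-modal pairwise loss plus a quantization term.

    sim[i, j] = 1 if image i and text j share at least one label, else 0.
    """
    theta = 0.5 * img_codes @ txt_codes.t()          # scaled inner products
    nll = (F.softplus(theta) - sim * theta).mean()   # negative log-likelihood
    quant = ((img_codes - img_codes.sign()) ** 2).mean() + \
            ((txt_codes - txt_codes.sign()) ** 2).mean()
    return nll + quant_weight * quant

img_net, txt_net = HashBranch(2048), HashBranch(300)
img_feat, txt_feat = torch.randn(8, 2048), torch.randn(8, 300)
sim = (torch.rand(8, 8) > 0.5).float()               # toy similarity matrix
loss = pairwise_hash_loss(img_net(img_feat), txt_net(txt_feat), sim)
loss.backward()
```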
4

Du, Lin, Xiong You, Ke Li, Liqiu Meng, Gong Cheng, Liyang Xiong, and Guangxia Wang. "Multi-modal deep learning for landform recognition." ISPRS Journal of Photogrammetry and Remote Sensing 158 (December 2019): 63–75. http://dx.doi.org/10.1016/j.isprsjprs.2019.09.018.

5

Wang, Wei, Xiaoyan Yang, Beng Chin Ooi, Dongxiang Zhang, and Yueting Zhuang. "Effective deep learning-based multi-modal retrieval." VLDB Journal 25, no. 1 (July 19, 2015): 79–101. http://dx.doi.org/10.1007/s00778-015-0391-4.

6

Jeong, Changhoon, Sung-Eun Jang, Sanghyuck Na, and Juntae Kim. "Korean Tourist Spot Multi-Modal Dataset for Deep Learning Applications." Data 4, no. 4 (October 12, 2019): 139. http://dx.doi.org/10.3390/data4040139.

Abstract:
Recently, deep learning-based methods for solving multi-modal tasks such as image captioning, multi-modal classification, and cross-modal retrieval have attracted much attention. To apply deep learning for such tasks, large amounts of data are needed for training. However, although there are several Korean single-modal datasets, there are not enough Korean multi-modal datasets. In this paper, we introduce a KTS (Korean tourist spot) dataset for Korean multi-modal deep-learning research. The KTS dataset has four modalities (image, text, hashtags, and likes) and consists of 10 classes related to Korean tourist spots. All data were extracted from Instagram and preprocessed. We performed two experiments, image classification and image captioning with the dataset, and they showed appropriate results. We hope that many researchers will use this dataset for multi-modal deep-learning research.
7

Yang, Yang, Yi-Feng Wu, De-Chuan Zhan, Zhi-Bin Liu, and Yuan Jiang. "Deep Robust Unsupervised Multi-Modal Network." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 5652–59. http://dx.doi.org/10.1609/aaai.v33i01.33015652.

Abstract:
In real-world applications, data often come with multiple modalities, and many multi-modal learning approaches have been proposed for integrating information from different sources. Most previous multi-modal methods exploit modal consistency to reduce the complexity of the learning problem, and therefore require modal completeness to be guaranteed. However, because of data collection failures, self-deficiencies, and various other reasons, multi-modal instances are often incomplete in real applications, and even complete instances can contain inconsistent anomalies; together these cause an inconsistency problem. This degrades multi-modal feature learning performance and ultimately affects generalization across different tasks. In this paper, we propose a novel Deep Robust Unsupervised Multi-modal Network (DRUMN) that solves this practical problem within a unified framework. DRUMN exploits extrinsic heterogeneous information from unlabeled data to counter the insufficiency caused by incompleteness, while the inconsistent-anomaly issue is handled with an adaptive weighted estimation rather than by tuning complex thresholds. Because DRUMN extracts discriminative feature representations for each modality, experiments on real-world multi-modal datasets validate the effectiveness of the proposed method.
8

Hua, Yan, Yingyun Yang, and Jianhe Du. "Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval." Electronics 9, no. 3 (March 10, 2020): 466. http://dx.doi.org/10.3390/electronics9030466.

Abstract:
Multi-modal retrieval is challenging because of the heterogeneity gap and the complex semantic relationships between data of different modalities. Typical studies map different modalities into a common subspace using a one-to-one correspondence or a similarity/dissimilarity relationship between inter-modal data, in which the distances of heterogeneous data can be compared directly; inter-modal retrieval is then achieved by nearest-neighbor search. However, most of these methods ignore intra-modal relations and the complicated semantics between multi-modal data. In this paper, we propose a deep multi-modal metric learning method with multi-scale semantic correlation for retrieval tasks between image and text modalities. A deep model with two branches is designed to nonlinearly map raw heterogeneous data into comparable representations. In contrast to binary similarity, we formulate the semantic relationship as multi-scale similarity to learn fine-grained multi-modal distances. Inter-modal and intra-modal correlations constructed on multi-scale semantic similarity are incorporated to train the deep model end to end. Experiments validate the effectiveness of the proposed method on multi-modal retrieval tasks, where it outperforms state-of-the-art methods on the NUS-WIDE, MIR Flickr, and Wikipedia datasets.
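Below is an illustrative sketch, not the paper's model, of a two-branch image-text metric network trained against a graded ("multi-scale") similarity target derived from shared labels instead of a binary similar/dissimilar signal. All dimensions, the Jaccard-style similarity, and the MSE matching loss are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Branch(nn.Module):
    """Maps one modality's raw features to a shared, unit-length embedding."""
    def __init__(self, in_dim, emb_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                 nn.Linear(512, emb_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def multiscale_similarity(img_labels, txt_labels):
    """Fraction of shared labels in [0, 1]; 1 = identical label sets."""
    inter = (img_labels.unsqueeze(1) * txt_labels.unsqueeze(0)).sum(-1)
    union = ((img_labels.unsqueeze(1) + txt_labels.unsqueeze(0)) > 0).float().sum(-1)
    return inter / union.clamp(min=1)

def metric_loss(img_emb, txt_emb, sim):
    cos = img_emb @ txt_emb.t()              # cosine similarity in [-1, 1]
    return F.mse_loss((cos + 1) / 2, sim)    # push distances toward graded similarity

img_branch, txt_branch = Branch(2048), Branch(300)
img_x, txt_x = torch.randn(16, 2048), torch.randn(16, 300)
labels_i = (torch.rand(16, 10) > 0.7).float()   # toy multi-label annotations
labels_t = (torch.rand(16, 10) > 0.7).float()
loss = metric_loss(img_branch(img_x), txt_branch(txt_x),
                   multiscale_similarity(labels_i, labels_t))
loss.backward()
```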
9

Han, Dong, Hong Nie, Jinbao Chen, Meng Chen, Zhen Deng, and Jianwei Zhang. "Multi-modal haptic image recognition based on deep learning." Sensor Review 38, no. 4 (September 17, 2018): 486–93. http://dx.doi.org/10.1108/sr-08-2017-0160.

Abstract:
Purpose: This paper aims to improve the diversity and richness of haptic perception by recognizing multi-modal haptic images. Design/methodology/approach: First, the multi-modal haptic data collected by BioTac sensors from different objects are pre-processed, and then combined into haptic images. Second, a multi-class and multi-label deep learning model is designed, which can simultaneously learn four haptic features (hardness, thermal conductivity, roughness and texture) from the haptic images, and recognize objects based on these features. The haptic images with different dimensions and modalities are provided for testing the recognition performance of this model. Findings: The results imply that multi-modal data fusion has a better performance than single-modal data on tactile understanding, and the haptic images with larger dimension are conducive to more accurate haptic measurement. Practical implications: The proposed method has important potential application in unknown environment perception, dexterous grasping manipulation and other intelligent robotics domains. Originality/value: This paper proposes a new deep learning model for extracting multiple haptic features and recognizing objects from multi-modal haptic images.
10

Pyrovolakis, Konstantinos, Paraskevi Tzouveli, and Giorgos Stamou. "Multi-Modal Song Mood Detection with Deep Learning." Sensors 22, no. 3 (January 29, 2022): 1065. http://dx.doi.org/10.3390/s22031065.

Abstract:
The production and consumption of music in the contemporary era results in big data generation and creates new needs for automated and more effective management of these data. Automated music mood detection constitutes an active task in the field of MIR (Music Information Retrieval). The first approach to correlating music and mood was made in 1990 by Gordon Burner who researched the way that musical emotion affects marketing. In 2016, Lidy and Schiner trained a CNN for the task of genre and mood classification based on audio. In 2018, Delbouys et al. developed a multi-modal Deep Learning system combining CNN and LSTM architectures and concluded that multi-modal approaches overcome single channel models. This work will examine and compare single channel and multi-modal approaches for the task of music mood detection applying Deep Learning architectures. Our first approach tries to utilize the audio signal and the lyrics of a musical track separately, while the second approach applies a uniform multi-modal analysis to classify the given data into mood classes. The available data we will use to train and evaluate our models comes from the MoodyLyrics dataset, which includes 2000 song titles with labels from four mood classes, {happy, angry, sad, relaxed}. The result of this work leads to a uniform prediction of the mood that represents a music track and has usage in many applications.
11

Priyasad, Darshana, Tharindu Fernando, Simon Denman, Sridha Sridharan, and Clinton Fookes. "Memory based fusion for multi-modal deep learning." Information Fusion 67 (March 2021): 136–46. http://dx.doi.org/10.1016/j.inffus.2020.10.005.

12

Kasa, Kevin, David Burns, Mitchell G. Goldenberg, Omar Selim, Cari Whyne, and Michael Hardisty. "Multi-Modal Deep Learning for Assessing Surgeon Technical Skill." Sensors 22, no. 19 (September 27, 2022): 7328. http://dx.doi.org/10.3390/s22197328.

Abstract:
This paper introduces a new dataset of a surgical knot-tying task, and a multi-modal deep learning model that achieves comparable performance to expert human raters on this skill assessment task. Seventy-two surgical trainees and faculty were recruited for the knot-tying task, and were recorded using video, kinematic, and image data. Three expert human raters conducted the skills assessment using the Objective Structured Assessment of Technical Skill (OSATS) Global Rating Scale (GRS). We also designed and developed three deep learning models: a ResNet-based image model, a ResNet-LSTM kinematic model, and a multi-modal model leveraging the image and time-series kinematic data. All three models demonstrate performance comparable to the expert human raters on most GRS domains. The multi-modal model demonstrates the best overall performance, as measured using the mean squared error (MSE) and intraclass correlation coefficient (ICC). This work is significant since it demonstrates that multi-modal deep learning has the potential to replicate human raters on a challenging human-performed knot-tying task. The study demonstrates an algorithm with state-of-the-art performance in surgical skill assessment. As objective assessment of technical skill continues to be a growing, but resource-heavy, element of surgical education, this study is an important step towards automated surgical skill assessment, ultimately leading to reduced burden on training faculty and institutes.
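The following is a minimal sketch of the kind of late feature fusion described above: a CNN branch for images and an LSTM branch for time-series kinematics feeding a shared regression head that outputs a rating-scale score. The backbone choice, feature sizes, and head are assumptions, not the published architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SkillFusionModel(nn.Module):
    def __init__(self, kin_dim=12, hidden=128):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()                 # expose 512-d image features
        self.image_branch = backbone
        self.kin_branch = nn.LSTM(kin_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(512 + hidden, 128), nn.ReLU(),
                                  nn.Linear(128, 1))   # predicted rating-scale score

    def forward(self, image, kinematics):
        img_feat = self.image_branch(image)                    # (B, 512)
        _, (h_n, _) = self.kin_branch(kinematics)              # h_n: (1, B, hidden)
        fused = torch.cat([img_feat, h_n.squeeze(0)], dim=-1)  # concatenate modalities
        return self.head(fused).squeeze(-1)

model = SkillFusionModel()
score = model(torch.randn(2, 3, 224, 224),   # video frames
              torch.randn(2, 100, 12))       # kinematic time series
```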
13

Niu, Yulei, Zhiwu Lu, Ji-Rong Wen, Tao Xiang, and Shih-Fu Chang. "Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation." IEEE Transactions on Image Processing 28, no. 4 (April 2019): 1720–31. http://dx.doi.org/10.1109/tip.2018.2881928.

14

Wang, Qiuli, Dan Yang, Zhihuan Li, Xiaohong Zhang, and Chen Liu. "Deep Regression via Multi-Channel Multi-Modal Learning for Pneumonia Screening." IEEE Access 8 (2020): 78530–41. http://dx.doi.org/10.1109/access.2020.2990423.

15

Park, Jongchan, Min-Hyun Kim, and Dong-Geol Choi. "Correspondence Learning for Deep Multi-Modal Recognition and Fraud Detection." Electronics 10, no. 7 (March 28, 2021): 800. http://dx.doi.org/10.3390/electronics10070800.

Abstract:
Deep learning-based methods have achieved good performance in various recognition benchmarks mostly by utilizing single modalities. As different modalities contain complementary information to each other, multi-modal based methods are proposed to implicitly utilize them. In this paper, we propose a simple technique, called correspondence learning (CL), which explicitly learns the relationship among multiple modalities. The multiple modalities in the data samples are randomly mixed among different samples. If the modalities are from the same sample (not mixed), then they have positive correspondence, and vice versa. CL is an auxiliary task for the model to predict the correspondence among modalities. The model is expected to extract information from each modality to check correspondence and achieve better representations in multi-modal recognition tasks. In this work, we first validate the proposed method in various multi-modal benchmarks including CMU Multimodal Opinion-Level Sentiment Intensity (CMU-MOSI) and CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) sentiment analysis datasets. In addition, we propose a fraud detection method using the learned correspondence among modalities. To validate this additional usage, we collect a multi-modal dataset for fraud detection using real-world samples for reverse vending machines.
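A small sketch of the correspondence-learning auxiliary task as described here: one modality is shuffled across the batch for roughly half the samples, and a classifier is trained to predict whether a pair of modality features still comes from the same sample. Feature dimensions and the classifier head are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrespondenceHead(nn.Module):
    """Binary classifier over concatenated modality features."""
    def __init__(self, dim_a, dim_b, hidden=128):
        super().__init__()
        self.clf = nn.Sequential(nn.Linear(dim_a + dim_b, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, feat_a, feat_b):
        return self.clf(torch.cat([feat_a, feat_b], dim=-1)).squeeze(-1)

def correspondence_loss(head, feat_a, feat_b):
    """Mix modality B across the batch for about half the samples and train
    the head to detect whether (a, b) still belong to the same sample.
    (A shuffled row may occasionally map back to itself; ignored for brevity.)"""
    batch = feat_a.size(0)
    mixed = torch.rand(batch) < 0.5
    perm = torch.randperm(batch)
    feat_b_mixed = torch.where(mixed.unsqueeze(-1), feat_b[perm], feat_b)
    target = (~mixed).float()                    # 1 = genuine correspondence
    return F.binary_cross_entropy_with_logits(head(feat_a, feat_b_mixed), target)

head = CorrespondenceHead(256, 256)
aux_loss = correspondence_loss(head, torch.randn(32, 256), torch.randn(32, 256))
aux_loss.backward()
```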
16

Dong, Guan-Nan, Chi-Man Pun, and Zheng Zhang. "Deep Collaborative Multi-Modal Learning for Unsupervised Kinship Estimation." IEEE Transactions on Information Forensics and Security 16 (2021): 4197–210. http://dx.doi.org/10.1109/tifs.2021.3098165.

17

Bhatt, Gaurav, Piyush Jha, and Balasubramanian Raman. "Representation learning using step-based deep multi-modal autoencoders." Pattern Recognition 95 (November 2019): 12–23. http://dx.doi.org/10.1016/j.patcog.2019.05.032.

18

Belfedhal, Alaa Eddine. "Multi-Modal Deep Learning for Effective Malicious Webpage Detection." Revue d'Intelligence Artificielle 37, no. 4 (August 31, 2023): 1005–13. http://dx.doi.org/10.18280/ria.370422.

Abstract:
The pervasive threat of malicious webpages, which can lead to financial loss, data breaches, and malware infections, underscores the need for effective detection methods. Conventional techniques for detecting malicious web content primarily rely on URL-based features or features extracted from various webpage components, employing a single feature vector input into a machine learning model for classifying webpages as benign or malicious. However, these approaches insufficiently address the complexities inherent in malicious webpages. To overcome this limitation, a novel Multi-Modal Deep Learning method for malicious webpage detection is proposed in this study. Three types of automatically extracted features, specifically those derived from the URL, the JavaScript code, and the webpage text, are leveraged. Each feature type is processed by a distinct deep learning model, facilitating a comprehensive analysis of the webpage. The proposed method demonstrates a high degree of effectiveness, achieving an accuracy rate of 97.90% and a false negative rate of a mere 2%. The results highlight the advantages of utilizing multi-modal features and deep learning techniques for detecting malicious webpages. By considering various aspects of web content, the proposed method offers improved accuracy and a more comprehensive understanding of malicious activities, thereby enhancing web user security and effectively mitigating the risks associated with malicious webpages.
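The sketch below illustrates the three-branch idea in this abstract, with separate encoders for URL characters, JavaScript features, and page text whose outputs are concatenated for a benign/malicious decision. The specific encoders and dimensions are assumptions rather than the published model.

```python
import torch
import torch.nn as nn

class MultiModalWebpageClassifier(nn.Module):
    def __init__(self, url_vocab=128, js_dim=512, text_dim=768, hidden=128):
        super().__init__()
        self.url_embed = nn.Embedding(url_vocab, 32)   # character-level URL encoder
        self.url_conv = nn.Conv1d(32, hidden, kernel_size=5, padding=2)
        self.js_branch = nn.Sequential(nn.Linear(js_dim, hidden), nn.ReLU())
        self.text_branch = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.head = nn.Linear(3 * hidden, 1)           # malicious-vs-benign logit

    def forward(self, url_chars, js_feat, text_feat):
        u = self.url_embed(url_chars).transpose(1, 2)          # (B, 32, L)
        u = torch.amax(torch.relu(self.url_conv(u)), dim=-1)   # global max pooling
        j = self.js_branch(js_feat)
        t = self.text_branch(text_feat)
        return self.head(torch.cat([u, j, t], dim=-1)).squeeze(-1)

model = MultiModalWebpageClassifier()
logit = model(torch.randint(0, 128, (4, 200)),   # URL character ids
              torch.randn(4, 512),               # JavaScript features
              torch.randn(4, 768))               # page-text features
```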
19

M. Shahzad, H., Sohail Masood Bhatti, Arfan Jaffar, and Muhammad Rashid. "A Multi-Modal Deep Learning Approach for Emotion Recognition." Intelligent Automation & Soft Computing 36, no. 2 (2023): 1561–70. http://dx.doi.org/10.32604/iasc.2023.032525.

20

Zhang, Ning, Huarui Wu, Huaji Zhu, Ying Deng, and Xiao Han. "Tomato Disease Classification and Identification Method Based on Multimodal Fusion Deep Learning." Agriculture 12, no. 12 (November 25, 2022): 2014. http://dx.doi.org/10.3390/agriculture12122014.

Abstract:
Because the occurrence and spread of diseases are closely related to the planting environment, and because the recognition rate achievable from a single RGB image of a tomato disease is limited, a tomato disease diagnosis method based on Multi-ResNet34 multi-modal fusion learning with residual learning is proposed. Building on the ResNet34 backbone network, this paper introduces transfer learning to speed up training, reduce data dependence, and prevent overfitting due to the small amount of sample data; it also integrates multi-source data (tomato disease images and environmental parameters). A feature-level multi-modal data fusion method is used to retain the key identifying information, so that data of different modalities can complement, support, and correct each other and yield a more accurate identification result. First, Mask R-CNN was used to extract partial leaf images from tomato disease images with complex backgrounds, reducing the influence of background regions on disease identification. Then, the resulting image-environment dataset was fed into the multi-modal fusion model to obtain the identified disease types. The proposed multi-modal fusion model, Multi-ResNet34, achieves a classification accuracy of 98.9% for six tomato diseases (bacterial spot, late blight, leaf mold, yellow aspergillosis, gray mold, and early blight), which is 1.1% higher than that of the single-modal model, a clear improvement. The method in this paper can provide an important basis for the analysis and diagnosis of tomato diseases in intelligent greenhouses in the context of agricultural informatization.
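As a rough illustration of the feature-level fusion step described above, the sketch below concatenates ResNet34 image features with an embedding of environmental readings before the disease classifier. Dimensions are assumptions; in practice the backbone would be initialized with pre-trained weights for transfer learning, and the Mask R-CNN leaf extraction stage is not shown.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

class DiseaseFusionNet(nn.Module):
    def __init__(self, env_dim=6, num_classes=6):
        super().__init__()
        backbone = resnet34(weights=None)   # ImageNet weights would be loaded here for transfer learning
        backbone.fc = nn.Identity()         # 512-d leaf-image features
        self.image_branch = backbone
        self.env_branch = nn.Sequential(nn.Linear(env_dim, 64), nn.ReLU())
        self.classifier = nn.Linear(512 + 64, num_classes)

    def forward(self, leaf_image, env_params):
        img_feat = self.image_branch(leaf_image)   # (B, 512)
        env_feat = self.env_branch(env_params)     # (B, 64) encoded sensor readings
        return self.classifier(torch.cat([img_feat, env_feat], dim=-1))

model = DiseaseFusionNet()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 6))
```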
21

Kiela, Douwe, and Stephen Clark. "Learning Neural Audio Embeddings for Grounding Semantics in Auditory Perception." Journal of Artificial Intelligence Research 60 (December 26, 2017): 1003–30. http://dx.doi.org/10.1613/jair.5665.

Abstract:
Multi-modal semantics, which aims to ground semantic representations in perception, has relied on feature norms or raw image data for perceptual input. In this paper we examine grounding semantic representations in raw auditory data, using standard evaluations for multi-modal semantics. After having shown the quality of such auditorily grounded representations, we show how they can be applied to tasks where auditory perception is relevant, including two unsupervised categorization experiments, and provide further analysis. We find that features transferred from deep neural networks outperform bag-of-audio-words approaches. To our knowledge, this is the first work to construct multi-modal models from a combination of textual information and auditory information extracted from deep neural networks, and the first work to evaluate the performance of tri-modal (textual, visual and auditory) semantic models.
22

Yang, Yang, Zhilei Wu, Yuexiang Yang, Shuangshuang Lian, Fengjie Guo, and Zhiwei Wang. "A Survey of Information Extraction Based on Deep Learning." Applied Sciences 12, no. 19 (September 27, 2022): 9691. http://dx.doi.org/10.3390/app12199691.

Abstract:
As a core task and an important link in the fields of natural language understanding and information retrieval, information extraction (IE) can structure and semanticize unstructured multi-modal information. In recent years, deep learning (DL) has attracted considerable research attention to IE tasks. Deep learning-based entity relation extraction techniques have gradually surpassed traditional feature- and kernel-function-based methods in terms of the depth of feature extraction and model accuracy. In this paper, we explain the basic concepts of IE and DL, primarily expounding on the research progress and achievements of DL technologies in the field of IE. At the level of IE tasks, the paper covers entity relationship extraction, event extraction, and multi-modal information extraction, and provides a comparative analysis of the various extraction techniques. We also summarize the prospects and development trends of DL in the field of IE, as well as difficulties requiring further study. At the method level, we believe research can proceed in the directions of multi-model and multi-task joint extraction, knowledge-enhanced information extraction, and multi-modal information fusion. At the model level, further research should strengthen theoretical foundations, make models more lightweight, and improve model generalization ability.
23

Li, Zhe, Yuming Jiang, and Ruijiang Li. "Abstract 2313: Multi-modal deep learning to predict cancer outcomes by integrating radiology and pathology images." Cancer Research 84, no. 6_Supplement (March 22, 2024): 2313. http://dx.doi.org/10.1158/1538-7445.am2024-2313.

Abstract:
Purpose: Cancer patients routinely undergo radiologic and pathologic evaluation for their diagnostic workup. These data modalities represent a valuable and readily available resource for developing new prognostic tools. Given their vast difference in spatial scales, effective methods to integrate the two modalities are currently lacking. Here, we aim to develop a multi-modal approach to integrate radiology and pathology images for predicting outcomes in cancer patients. Methods: We propose a multi-modal weakly-supervised deep learning framework to integrate radiology and pathology images for survival prediction. We first extract multi-scale features from whole-slide H&E-stained pathology images to characterize cellular and tissue phenotypes as well as spatial cellular organization. We then build a hierarchical co-attention transformer to effectively learn the multi-modal interactions between radiology and pathology image features. Finally, a multimodal risk score is derived by combining complementary information from the two image modalities and clinical data for predicting outcome. We evaluate our approach in lung, gastric, and brain cancers with matched radiology and pathology images and clinical data available, each with separate training and external validation cohorts. Results: The multi-modal deep learning models achieved a reasonably high accuracy for predicting survival outcomes in the external validation cohorts (C-index range: 0.72-0.75 across three cancer types). The multi-modal prognostic models significantly improved upon single-modal approaches based on radiology or pathology images or clinical data alone (C-index range: 0.53-0.71, P<0.01). The multi-modal deep learning models were significantly associated with disease-free survival and overall survival (hazard ratio range: 3.23-4.46, P<0.0001). In multivariable analyses, the models remained an independent prognostic factor (P<0.01) after adjusting for clinicopathological variables including cancer stage and tumor differentiation. Conclusions: The proposed multi-modal deep learning approach outperforms traditional methods for predicting survival outcomes by leveraging routinely available radiology and pathology images. With further independent validation, this may afford a promising approach to improve risk stratification and better inform treatment strategies for cancer patients. Citation Format: Zhe Li, Yuming Jiang, Ruijiang Li. Multi-modal deep learning to predict cancer outcomes by integrating radiology and pathology images [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 2313.
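A minimal sketch of a co-attention fusion step of the sort this abstract describes, in which pathology patch features and radiology region features attend to each other before being pooled into a risk score. Token dimensions, head counts, and the risk head are assumptions; the hierarchical structure, clinical data, and survival loss are omitted.

```python
import torch
import torch.nn as nn

class CoAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.path_to_rad = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.rad_to_path = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.risk_head = nn.Linear(2 * dim, 1)

    def forward(self, path_tokens, rad_tokens):
        # path_tokens: (B, Np, dim) pathology patch features
        # rad_tokens:  (B, Nr, dim) radiology region features
        p, _ = self.path_to_rad(path_tokens, rad_tokens, rad_tokens)   # pathology attends to radiology
        r, _ = self.rad_to_path(rad_tokens, path_tokens, path_tokens)  # radiology attends to pathology
        fused = torch.cat([p.mean(dim=1), r.mean(dim=1)], dim=-1)      # pool and concatenate
        return self.risk_head(fused).squeeze(-1)                        # higher value = higher risk

fusion = CoAttentionFusion()
risk = fusion(torch.randn(2, 100, 256), torch.randn(2, 30, 256))
```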
24

Ghoniem, Rania M., Abeer D. Algarni, Basel Refky, and Ahmed A. Ewees. "Multi-Modal Evolutionary Deep Learning Model for Ovarian Cancer Diagnosis." Symmetry 13, no. 4 (April 10, 2021): 643. http://dx.doi.org/10.3390/sym13040643.

Abstract:
Ovarian cancer (OC) is a common cause of mortality among women. Deep learning has recently shown better performance in predicting OC stages and subtypes. However, most state-of-the-art deep learning models employ single-modality data, which may yield low performance due to insufficient representation of important OC characteristics. Furthermore, these deep learning models still lack optimization of the model construction, so they require a high computational cost to train and deploy. In this work, a hybrid evolutionary deep learning model using multi-modal data is proposed. The established multi-modal fusion framework combines the gene modality with the histopathological image modality. A deep feature extraction network is set up for each modality according to its state and form: a predictive antlion-optimized long short-term memory model processes longitudinal gene data, and a predictive antlion-optimized convolutional neural network model processes histopathology images. The topology of each customized feature network is set automatically by the antlion optimization algorithm to achieve better performance. The outputs of the two improved networks are then fused by weighted linear aggregation, and the deep fused features are finally used to predict OC stage. A number of assessment indicators were used to compare the proposed model with nine other multi-modal fusion models constructed using distinct evolutionary algorithms, on one OC benchmark and two benchmarks for breast and lung cancers. The results reveal that the proposed model is more precise and accurate in diagnosing OC and the other cancers.
25

Li, Xuefei, Liangtu Song, Liu Liu, and Linli Zhou. "GSS-RiskAsser: A Multi-Modal Deep-Learning Framework for Urban Gas Supply System Risk Assessment on Business Users." Sensors 21, no. 21 (October 22, 2021): 7010. http://dx.doi.org/10.3390/s21217010.

Abstract:
Gas supply system risk assessment is a serious and important problem in cities. Existing methods tend to manually build mathematical models to predict risk value from single-modal information, i.e., pipeline parameters. In this paper, we attempt to consider this problem from a deep-learning perspective and define a novel task, Urban Gas Supply System Risk Assessment (GSS-RA). To drive deep-learning techniques into this task, we collect and build a domain-specific dataset GSS-20K containing multi-modal data. Accompanying the dataset, we design a new deep-learning framework named GSS-RiskAsser to learn risk prediction. In our method, we design a parallel-transformers Vision Embedding Transformer (VET) and Score Matrix Transformer (SMT) to process multi-modal information, and then propose a Multi-Modal Fusion (MMF) module to fuse the features with a cross-attention mechanism. Experiments show that GSS-RiskAsser could work well on GSS-RA task and facilitate practical applications. Our data and code will be made publicly available.
26

Zheng, Qiushuo, Hao Wen, Meng Wang, and Guilin Qi. "Visual Entity Linking via Multi-modal Learning." Data Intelligence 4, no. 1 (2022): 1–19. http://dx.doi.org/10.1162/dint_a_00114.

Abstract:
Existing visual scene understanding methods mainly focus on identifying coarse-grained concepts about the visual objects and their relationships, largely neglecting fine-grained scene understanding. In fact, many data-driven applications on the Web (e.g., news-reading and e-shopping) require accurate recognition of much finer-grained concepts as entities and properly linking them to a knowledge graph (KG), which can take their performance to the next level. In light of this, in this paper, we identify a new research task: visual entity linking for fine-grained scene understanding. To accomplish the task, we first extract features of candidate entities from different modalities, i.e., visual features, textual features, and KG features. Then, we design a deep modal-attention neural network-based learning-to-rank method which aggregates all features and maps visual objects to entities in the KG. Extensive experimental results on the newly constructed dataset show that our proposed method is effective, as it significantly improves accuracy from 66.46% to 83.16% compared with the baselines.
27

Wilson, Justin C., Suku Nair, Sandro Scielzo, and Eric C. Larson. "Objective Measures of Cognitive Load Using Deep Multi-Modal Learning." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, no. 1 (March 19, 2021): 1–35. http://dx.doi.org/10.1145/3448111.

Abstract:
The capability of measuring human performance objectively is hard to overstate, especially in the context of the instructor and student relationship within the process of learning. In this work, we investigate the automated classification of cognitive load leveraging the aviation domain as a surrogate for complex task workload induction. We use a mixed virtual and physical flight environment, given a suite of biometric sensors utilizing the HTC Vive Pro Eye and the E4 Empatica. We create and evaluate multiple models. And we have taken advantage of advancements in deep learning such as generative learning, multi-modal learning, multi-task learning, and x-vector architectures to classify multiple tasks across 40 subjects inclusive of three subject types --- pilots, operators, and novices. Our cognitive load model can automate the evaluation of cognitive load agnostic to subject, subject type, and flight maneuver (task) with an accuracy of over 80%. Further, this approach is validated with real-flight data from five test pilots collected over two test and evaluation flights on a C-17 aircraft.
28

Sharma, Pulkit, Achut Manandhar, Patrick Thomson, Jacob Katuva, Robert Hope, and David A. Clifton. "Combining Multi-Modal Statistics for Welfare Prediction Using Deep Learning." Sustainability 11, no. 22 (November 11, 2019): 6312. http://dx.doi.org/10.3390/su11226312.

Abstract:
In the context of developing countries, effective groundwater resource management is often hindered by a lack of data integration between resource availability, water demand, and the welfare of water users. As a consequence, drinking water-related policies and investments, while broadly beneficial, are unlikely to be able to target the most in need. To find the households in need, we need to estimate their welfare status first. However, the current practices for estimating welfare need a detailed questionnaire in the form of a survey which is time-consuming and resource-intensive. In this work, we propose an alternate solution to this problem by performing a small set of cost-effective household surveys, which can be collected over a short amount of time. We try to compensate for the loss of information by using other modalities of data. By combining different modalities of data, this work aims to characterize the welfare status of people with respect to their local drinking water resource. This work employs deep learning-based methods to model welfare using multi-modal data from household surveys, community handpump abstraction, and groundwater levels. We employ a multi-input multi-output deep learning framework, where different types of deep learning models are used for different modalities of data. Experimental results in this work have demonstrated that the multi-modal data in the form of a small set of survey questions, handpump abstraction data, and groundwater level can be used to estimate the welfare status of households. In addition, the results show that different modalities of data have complementary information, which, when combined, improves the overall performance of our ability to predict welfare.
29

Glavan, Andreea, and Estefanía Talavera. "InstaIndoor and multi-modal deep learning for indoor scene recognition." Neural Computing and Applications 34, no. 9 (January 22, 2022): 6861–77. http://dx.doi.org/10.1007/s00521-021-06781-2.

30

Liu, Yu, and Tinne Tuytelaars. "A Deep Multi-Modal Explanation Model for Zero-Shot Learning." IEEE Transactions on Image Processing 29 (2020): 4788–803. http://dx.doi.org/10.1109/tip.2020.2975980.

31

Xiang, Lei, Yong Chen, Weitang Chang, Yiqiang Zhan, Weili Lin, Qian Wang, and Dinggang Shen. "Deep-Learning-Based Multi-Modal Fusion for Fast MR Reconstruction." IEEE Transactions on Biomedical Engineering 66, no. 7 (July 2019): 2105–14. http://dx.doi.org/10.1109/tbme.2018.2883958.

32

Xu, Xiangyang, Yuncheng Li, Gangshan Wu, and Jiebo Luo. "Multi-modal deep feature learning for RGB-D object detection." Pattern Recognition 72 (December 2017): 300–313. http://dx.doi.org/10.1016/j.patcog.2017.07.026.

33

Wei, Jie, Huaping Liu, Gaowei Yan, and Fuchun Sun. "Robotic grasping recognition using multi-modal deep extreme learning machine." Multidimensional Systems and Signal Processing 28, no. 3 (March 3, 2016): 817–33. http://dx.doi.org/10.1007/s11045-016-0389-0.

34

Kim, Woo-Hyeon, Geon-Woo Kim, and Joo-Chang Kim. "Multi-Modal Deep Learning based Metadata Extensions for Video Clipping." International Journal on Advanced Science, Engineering and Information Technology 14, no. 1 (February 28, 2024): 375–80. http://dx.doi.org/10.18517/ijaseit.14.1.19047.

Abstract:
General video search and recommendation systems primarily rely on metadata and personal information. Metadata includes file names, keywords, tags, and genres, among others, and is used to describe the video's content. The video platform assesses the relevance of user search queries to the video metadata and presents search results in order of highest relevance. Recommendations are based on videos with metadata judged to be similar to the one the user is currently watching. Most platforms offer search and recommendation services by employing separate algorithms for metadata and personal information. Therefore, metadata plays a vital role in video search. Video service platforms develop various algorithms to provide users with more accurate search results and recommendations. Quantifying video similarity is essential to enhance the accuracy of search results and recommendations. Since content producers primarily provide basic metadata, it can be abused. Additionally, the resemblance between similar video segments may diminish depending on its duration. This paper proposes a metadata expansion model that utilizes object recognition and Speech-to-Text (STT) technology. The model selects key objects by analyzing the frequency of their appearance in the video, extracts audio separately, transcribes it into text, and extracts the script. Scripts are quantified by tokenizing them into words using text-mining techniques. By augmenting metadata with key objects and script tokens, various video content search and recommendation platforms are expected to deliver results closer to user search terms and recommend related content.
35

Althenayan, Albatoul S., Shada A. AlSalamah, Sherin Aly, Thamer Nouh, Bassam Mahboub, Laila Salameh, Metab Alkubeyyer, and Abdulrahman Mirza. "COVID-19 Hierarchical Classification Using a Deep Learning Multi-Modal." Sensors 24, no. 8 (April 20, 2024): 2641. http://dx.doi.org/10.3390/s24082641.

Abstract:
Coronavirus disease 2019 (COVID-19), originating in China, has rapidly spread worldwide. Physicians must examine infected patients and make timely decisions to isolate them. However, completing these processes is difficult due to limited time and availability of expert radiologists, as well as limitations of the reverse-transcription polymerase chain reaction (RT-PCR) method. Deep learning, a sophisticated machine learning technique, leverages radiological imaging modalities for disease diagnosis and image classification tasks. Previous research on COVID-19 classification has encountered several limitations, including binary classification methods, single-feature modalities, small public datasets, and reliance on CT diagnostic processes. Additionally, studies have often utilized a flat structure, disregarding the hierarchical structure of pneumonia classification. This study aims to overcome these limitations by identifying pneumonia caused by COVID-19, distinguishing it from other types of pneumonia and healthy lungs using chest X-ray (CXR) images and related tabular medical data, and demonstrate the value of incorporating tabular medical data in achieving more accurate diagnoses. Resnet-based and VGG-based pre-trained convolutional neural network (CNN) models were employed to extract features, which were then combined using early fusion for the classification of eight distinct classes. We leveraged the hierarchal structure of pneumonia classification within our approach to achieve improved classification outcomes. Since an imbalanced dataset is common in this field, a variety of versions of generative adversarial networks (GANs) were used to generate synthetic data. The proposed approach tested in our private datasets of 4523 patients achieved a macro-avg F1-score of 95.9% and an F1-score of 87.5% for COVID-19 identification using a Resnet-based structure. In conclusion, in this study, we were able to create an accurate deep learning multi-modal to diagnose COVID-19 and differentiate it from other kinds of pneumonia and normal lungs, which will enhance the radiological diagnostic process.
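The early-fusion step described above might look roughly like the sketch below, where pre-extracted CNN features from a chest X-ray are concatenated with encoded tabular medical data before an eight-class classifier. Feature sizes and the tabular encoder are assumptions, and the hierarchical classification and GAN-based augmentation components are not shown.

```python
import torch
import torch.nn as nn

class EarlyFusionClassifier(nn.Module):
    def __init__(self, cxr_feat_dim=2048, tab_dim=20, num_classes=8):
        super().__init__()
        self.tab_encoder = nn.Sequential(nn.Linear(tab_dim, 64), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(cxr_feat_dim + 64, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, cxr_features, tabular):
        # cxr_features: output of a pre-trained ResNet/VGG feature extractor
        fused = torch.cat([cxr_features, self.tab_encoder(tabular)], dim=-1)
        return self.classifier(fused)

model = EarlyFusionClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 20))
```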
36

Siddanna, S. R., and Y. C. Kiran. "Two Stage Multi Modal Deep Learning Kannada Character Recognition Model Adaptive to Discriminative Patterns of Kannada Characters." Indian Journal Of Science And Technology 16, no. 3 (January 22, 2023): 155–66. http://dx.doi.org/10.17485/ijst/v16i3.1904.

37

Williams-Lekuona, Mikel, Georgina Cosma, and Iain Phillips. "A Framework for Enabling Unpaired Multi-Modal Learning for Deep Cross-Modal Hashing Retrieval." Journal of Imaging 8, no. 12 (December 15, 2022): 328. http://dx.doi.org/10.3390/jimaging8120328.

Abstract:
Cross-Modal Hashing (CMH) retrieval methods have garnered increasing attention within the information retrieval research community due to their capability to deal with large amounts of data thanks to the computational efficiency of hash-based methods. To date, the focus of cross-modal hashing methods has been on training with paired data. Paired data refers to samples with one-to-one correspondence across modalities, e.g., image and text pairs where the text sample describes the image. However, real-world applications produce unpaired data that cannot be utilised by most current CMH methods during the training process. Models that can learn from unpaired data are crucial for real-world applications such as cross-modal neural information retrieval where paired data is limited or not available to train the model. This paper provides (1) an overview of the CMH methods when applied to unpaired datasets, (2) proposes a framework that enables pairwise-constrained CMH methods to train with unpaired samples, and (3) evaluates the performance of state-of-the-art CMH methods across different pairing scenarios.
38

Zheng, Ke, and Zhou Li. "An Image-Text Matching Method for Multi-Modal Robots." Journal of Organizational and End User Computing 36, no. 1 (December 8, 2023): 1–21. http://dx.doi.org/10.4018/joeuc.334701.

Abstract:
With the rapid development of artificial intelligence and deep learning, image-text matching has gradually become an important research topic in cross-modal fields. Achieving correct image-text matching requires a strong understanding of the correspondence between visual and textual information. In recent years, deep learning-based image-text matching methods have achieved significant success. However, image-text matching requires a deep understanding of intra-modal information and the exploration of fine-grained alignment between image regions and textual words. How to integrate these two aspects into a single model remains a challenge. Additionally, reducing the internal complexity of the model and effectively constructing and utilizing prior knowledge are also areas worth exploring, therefore addressing the issues of excessive computational complexity in existing fine-grained matching methods and the lack of multi-perspective matching.
39

Juan, Bao, Tuo Min, Hou Meng Ting, Li Xi Yu, and Wang Qun. "Research on Intelligent Medical Engineering Analysis and Decision Based on Deep Learning." International Journal of Web Services Research 19, no. 1 (January 1, 2022): 1–9. http://dx.doi.org/10.4018/ijwsr.314949.

Abstract:
With the increasing amount of medical data, which is multi-source, heterogeneous, high-dimensional, real-time, multi-scale, dynamic, and uncertain, artificial intelligence and machine learning provide a new way to handle this diversified and complex information. Driven by medical and health big data and using deep learning theories and methods, this paper proposes a new mode of "multi-modal fusion, association mining, analysis and prediction, intelligent decision" for intelligent medical analysis and decision making. First, research on a deep learning-based multi-modal fusion method for medical big data explores a new approach to fusing medical big data in complex environments. Second, research on the dynamic change rules of medical big data and deep learning-based analysis and prediction methods explores new approaches to analyzing and predicting such data. Third, research on intelligent medical decision methods explores a new approach to intelligent medical decision making.
40

Song, Kuiyong, Lianke Zhou, and Hongbin Wang. "Deep Coupling Recurrent Auto-Encoder with Multi-Modal EEG and EOG for Vigilance Estimation." Entropy 23, no. 10 (October 9, 2021): 1316. http://dx.doi.org/10.3390/e23101316.

Abstract:
Vigilance estimation of drivers is a hot research field of current traffic safety. Wearable devices can monitor information regarding the driver’s state in real time, which is then analyzed by a data analysis model to provide an estimation of vigilance. The accuracy of the data analysis model directly affects the effect of vigilance estimation. In this paper, we propose a deep coupling recurrent auto-encoder (DCRA) that combines electroencephalography (EEG) and electrooculography (EOG). This model uses a coupling layer to connect two single-modal auto-encoders to construct a joint objective loss function optimization model, which consists of single-modal loss and multi-modal loss. The single-modal loss is measured by Euclidean distance, and the multi-modal loss is measured by a Mahalanobis distance of metric learning, which can effectively reflect the distance between different modal data so that the distance between different modes can be described more accurately in the new feature space based on the metric matrix. In order to ensure gradient stability in the long sequence learning process, a multi-layer gated recurrent unit (GRU) auto-encoder model was adopted. The DCRA integrates data feature extraction and feature fusion. Relevant comparative experiments show that the DCRA is better than the single-modal method and the latest multi-modal fusion. The DCRA has a lower root mean square error (RMSE) and a higher Pearson correlation coefficient (PCC).
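A simplified sketch (not the authors' implementation) of the joint objective described here: each modality contributes a Euclidean reconstruction term, and a coupling term measures the distance between the two latent codes with a learnable Mahalanobis metric. Input and latent dimensions are assumptions, and the multi-layer GRU auto-encoders are reduced to placeholder tensors.

```python
import torch
import torch.nn as nn

class CoupledLoss(nn.Module):
    """Single-modal Euclidean reconstruction loss plus a Mahalanobis coupling loss."""
    def __init__(self, latent_dim=64, coupling_weight=1.0):
        super().__init__()
        self.L = nn.Parameter(torch.eye(latent_dim))   # factor of the metric matrix L^T L
        self.coupling_weight = coupling_weight

    def forward(self, eeg_x, eeg_recon, eog_x, eog_recon, z_eeg, z_eog):
        recon = ((eeg_x - eeg_recon) ** 2).mean() + ((eog_x - eog_recon) ** 2).mean()
        diff = (z_eeg - z_eog) @ self.L.t()            # distance in the learned metric space
        coupling = (diff ** 2).sum(dim=-1).mean()
        return recon + self.coupling_weight * coupling

criterion = CoupledLoss()
loss = criterion(torch.randn(8, 310), torch.randn(8, 310),   # EEG input / reconstruction
                 torch.randn(8, 36),  torch.randn(8, 36),    # EOG input / reconstruction
                 torch.randn(8, 64),  torch.randn(8, 64))    # EEG / EOG latent codes
loss.backward()
```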
41

D’Isanto, A. "Uncertain Photometric Redshifts with Deep Learning Methods." Proceedings of the International Astronomical Union 12, S325 (October 2016): 209–12. http://dx.doi.org/10.1017/s1743921316013090.

Abstract:
The need for accurate photometric redshift estimation is of fundamental importance in astronomy, due to the necessity of efficiently obtaining redshift information without spectroscopic analysis. We propose a method for determining accurate multi-modal photo-z probability density functions (PDFs) using Mixture Density Networks (MDN) and Deep Convolutional Networks (DCN). A comparison with a Random Forest (RF) is performed.
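A minimal sketch of a Gaussian mixture density network of the kind used for multi-modal photo-z PDFs: the network maps photometric features to mixture weights, means, and widths and is trained with the mixture negative log-likelihood. The dense body stands in for a convolutional feature extractor, and all sizes are assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class PhotoZMDN(nn.Module):
    def __init__(self, in_dim=10, hidden=128, components=5):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.pi = nn.Linear(hidden, components)         # mixture weights (logits)
        self.mu = nn.Linear(hidden, components)         # component means
        self.log_sigma = nn.Linear(hidden, components)  # component log-widths

    def forward(self, x):
        h = self.body(x)
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_nll(pi_logits, mu, log_sigma, z):
    """Negative log-likelihood of the spectroscopic redshift z under the mixture."""
    z = z.unsqueeze(-1)
    log_norm = (-0.5 * ((z - mu) / log_sigma.exp()) ** 2
                - log_sigma - 0.5 * math.log(2 * math.pi))
    log_mix = torch.logsumexp(F.log_softmax(pi_logits, dim=-1) + log_norm, dim=-1)
    return -log_mix.mean()

model = PhotoZMDN()
features, z_spec = torch.randn(32, 10), torch.rand(32) * 3
loss = mdn_nll(*model(features), z_spec)
loss.backward()
```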
42

Liang, Chengxu, and Jianshe Dong. "A Survey of Deep Learning-based Facial Expression Recognition Research." Frontiers in Computing and Intelligent Systems 5, no. 2 (September 1, 2023): 56–60. http://dx.doi.org/10.54097/fcis.v5i2.12445.

Abstract:
Facial expression is one of the ways emotion is conveyed. Deep learning is used to analyze facial expressions to understand people's true feelings, with human-computer interaction integrated into the process. However, in natural real-world environments with various sources of interference (such as lighting, age, and ethnicity), facial expression recognition faces many challenges. In recent years, with the development of artificial intelligence, scholars have increasingly studied facial expression recognition under interference, which has both advanced the theory and popularized its applications. Facial expression recognition identifies facial expressions in order to perform emotion analysis, and emotion analysis can draw on facial expressions, speech, text, video, and other signals; facial expression recognition can therefore be regarded as one research direction within emotion analysis, and it is the perspective from which this paper summarizes the field. In practice, researchers usually try to combine multiple modalities of information, such as voice, text, images, and video, for analysis. Owing to the differences between single-modal and multi-modal datasets, this paper analyzes static facial expression recognition, dynamic facial expression recognition, and multi-modal fusion. This research has a wide range of applications, such as smart elderly care, medical research, and fatigue-driving detection.
43

Choi, Sanghyuk Roy, and Minhyeok Lee. "Estimating the Prognosis of Low-Grade Glioma with Gene Attention Using Multi-Omics and Multi-Modal Schemes." Biology 11, no. 10 (October 5, 2022): 1462. http://dx.doi.org/10.3390/biology11101462.

Abstract:
The prognosis estimation of low-grade glioma (LGG) patients with deep learning models using gene expression data has been extensively studied in recent years. However, the deep learning models used in these studies do not utilize the latest deep learning techniques, such as residual learning and ensemble learning. To address this limitation, in this study, a deep learning model using multi-omics and multi-modal schemes, namely the Multi-Prognosis Estimation Network (Multi-PEN), is proposed. When using Multi-PEN, gene attention layers are employed for each datatype, including mRNA and miRNA, thereby allowing us to identify prognostic genes. Additionally, recent developments in deep learning, such as residual learning and layer normalization, are utilized. As a result, Multi-PEN demonstrates competitive performance compared to conventional models for prognosis estimation. Furthermore, the most significant prognostic mRNA and miRNA were identified using the attention layers in Multi-PEN. For instance, MYBL1 was identified as the most significant prognostic mRNA. Such a result accords with the findings in existing studies that have demonstrated that MYBL1 regulates cell survival, proliferation, and differentiation. Additionally, hsa-mir-421 was identified as the most significant prognostic miRNA, and it has been extensively reported that hsa-mir-421 is highly associated with various cancers. These results indicate that the estimations of Multi-PEN are valid and reliable and showcase Multi-PEN’s capacity to present hypotheses regarding prognostic mRNAs and miRNAs.
44

He, Chao, Xinghua Zhang, Dongqing Song, Yingshan Shen, Chengjie Mao, Huosheng Wen, Dingju Zhu, and Lihua Cai. "Mixture of Attention Variants for Modal Fusion in Multi-Modal Sentiment Analysis." Big Data and Cognitive Computing 8, no. 2 (January 29, 2024): 14. http://dx.doi.org/10.3390/bdcc8020014.

Abstract:
With the popularization of better network access and the penetration of personal smartphones in today’s world, the explosion of multi-modal data, particularly opinionated video messages, has created urgent demands and immense opportunities for Multi-Modal Sentiment Analysis (MSA). Deep learning with the attention mechanism has served as the foundation technique for most state-of-the-art MSA models due to its ability to learn complex inter- and intra-relationships among different modalities embedded in video messages, both temporally and spatially. However, modal fusion is still a major challenge due to the vast feature space created by the interactions among different data modalities. To address the modal fusion challenge, we propose an MSA algorithm based on deep learning and the attention mechanism, namely the Mixture of Attention Variants for Modal Fusion (MAVMF). The MAVMF algorithm includes a two-stage process: in stage one, self-attention is applied to effectively extract image and text features, and the dependency relationships in the context of video discourse are captured by a bidirectional gated recurrent neural module; in stage two, four multi-modal attention variants are leveraged to learn the emotional contributions of important features from different modalities. Our proposed approach is end-to-end and has been shown to achieve a superior performance to the state-of-the-art algorithms when tested with two largest public datasets, CMU-MOSI and CMU-MOSEI.
45

Zhang, Huan, and Shunren Xia. "Enhancing Acute Bilirubin Encephalopathy Diagnosis with Multi-Modal MRI: A Deep Learning Approach." Applied Sciences 14, no. 6 (March 14, 2024): 2464. http://dx.doi.org/10.3390/app14062464.

Abstract:
Background: Acute Bilirubin Encephalopathy (ABE) is a major cause of infant mortality and disability, making early detection and treatment essential to prevent further progression and complications. Methods: To enhance the diagnostic capabilities of multi-modal Magnetic Resonance Imaging (MRI) for ABE, we proposed a deep learning model integrating an attention module (AM) with a central network (CentralNet). This model was tested on MRI data from 145 newborns diagnosed with ABE and 140 non-ABE newborns, utilizing both T1-weighted and T2-weighted images. Results: The findings indicated the following: (1) In single-modality experiments, the inclusion of AM significantly improved all the performance metrics compared to the models without AM. Specifically, for T1-weighted MRI, the accuracy was 0.639 ± 0.04, AUC was 0.682 ± 0.037, and sensitivity was 0.688 ± 0.09. For the T2-weighted images, the accuracy was 0.738 ± 0.039 and the AUC was 0.796 ± 0.025. (2) In multi-modal experiments, using T1 + T2 images, our model achieved the best accuracy of 0.845 ± 0.018, AUC of 0.913 ± 0.02, and sensitivity of 0.954 ± 0.069, compared to models without an AM and CentralNet. The specificity remained relatively stable, while the precision and F1 scores significantly increased, reaching 0.792 ± 0.048 and 0.862 ± 0.017, respectively. Conclusions: This study emphasizes the effectiveness of combining attention modules with CentralNet, significantly enhancing the accuracy of multi-modal MRI in classifying ABE. It presents a new perspective and possibility for the clinical application of multi-modal MRI imaging in the diagnosis of ABE.
46

Farahnakian, Fahimeh, and Jukka Heikkonen. "Deep Learning Based Multi-Modal Fusion Architectures for Maritime Vessel Detection." Remote Sensing 12, no. 16 (August 5, 2020): 2509. http://dx.doi.org/10.3390/rs12162509.

Abstract:
Object detection is a fundamental computer vision task for many real-world applications. In the maritime environment, this task is challenging due to varying light, view distances, weather conditions, and sea waves. In addition, light reflection, camera motion, and illumination changes may cause false detections. To address this challenge, we present three fusion architectures to fuse two imaging modalities, visible and infrared, which provide complementary information at different levels: pixel level, feature level, and decision level. All of them employ deep learning to perform fusion and detection. We investigate the performance of the proposed architectures on a real marine image dataset captured by color and infrared cameras on board a vessel in the Finnish archipelago. The cameras are used for developing autonomous ships and collect data in a range of operating and climatic conditions. Experiments show that the feature-level fusion architecture outperforms the other fusion-level architectures.
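The sketch below illustrates feature-level fusion, the variant this abstract reports as strongest: visible and infrared images are encoded by separate CNN backbones whose features are concatenated before a shared head. The backbones and classification head are placeholders, not the published detection architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def backbone(in_channels):
    """ResNet18 feature extractor adapted to the given number of input channels."""
    net = resnet18(weights=None)
    net.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3, bias=False)
    net.fc = nn.Identity()                    # expose 512-d global features
    return net

class FeatureLevelFusion(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.rgb_branch = backbone(3)
        self.ir_branch = backbone(1)          # single-channel thermal image
        self.head = nn.Linear(512 + 512, num_classes)

    def forward(self, rgb, ir):
        fused = torch.cat([self.rgb_branch(rgb), self.ir_branch(ir)], dim=-1)
        return self.head(fused)

model = FeatureLevelFusion()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 224, 224))
```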
47

Hssayeni, Murtadha D., and Behnaz Ghoraani. "Multi-Modal Physiological Data Fusion for Affect Estimation Using Deep Learning." IEEE Access 9 (2021): 21642–52. http://dx.doi.org/10.1109/access.2021.3055933.

48

Arya, Nikhilanand, and Sriparna Saha. "Multi-modal advanced deep learning architectures for breast cancer survival prediction." Knowledge-Based Systems 221 (June 2021): 106965. http://dx.doi.org/10.1016/j.knosys.2021.106965.

49

Yao, Hong-ge (姚红革), Xin-xia Shen (沈新霞), Yu Li (李宇), Jun Yu (喻钧), and Song-ze Lei (雷松泽). "Multi-modal Fusion Brain Tumor Detection Method Based on Deep Learning." Acta Photonica Sinica 48, no. 7 (2019): 717001. http://dx.doi.org/10.3788/gzxb20194807.0717001.

50

Паршин, А. И., М. Н. Аралов, В. Ф. Барабанов, and Н. И. Гребенникова. "RANDOM MULTI-MODAL DEEP LEARNING IN THE PROBLEM OF IMAGE RECOGNITION." ВЕСТНИК ВОРОНЕЖСКОГО ГОСУДАРСТВЕННОГО ТЕХНИЧЕСКОГО УНИВЕРСИТЕТА, no. 4 (October 20, 2021): 21–26. http://dx.doi.org/10.36622/vstu.2021.17.4.003.

Abstract:
The image recognition task is one of the most difficult in machine learning, requiring both deep knowledge and large amounts of time and computational resources from the researcher. When nonlinear and complex data are used, various deep neural network architectures are applied, but choosing the right neural network remains a difficult problem. The main architectures in widespread use are convolutional neural networks (CNN), recurrent neural networks (RNN), and deep neural networks (DNN). Long short-term memory networks (LSTM) and gated recurrent unit networks (GRU) were developed on the basis of recurrent neural networks. Each neural network architecture has its own structure, its own tunable and trainable parameters, and its own advantages and disadvantages. By combining different kinds of neural networks, the quality of prediction in various machine learning problems can be improved substantially. Since choosing the optimal network architecture and its parameters is an extremely difficult task, one method for constructing neural network architectures based on a combination of convolutional, recurrent, and deep neural networks is considered. We show that such architectures outperform classical machine learning algorithms.
