Journal articles on the topic 'Multi-modal Machine Learning'

To see the other types of publications on this topic, follow the link: Multi-modal Machine Learning.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Multi-modal Machine Learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Liang, Haotian, and Zhanqing Wang. "Hierarchical Attention Networks for Multimodal Machine Learning." Journal of Physics: Conference Series 2218, no. 1 (March 1, 2022): 012020. http://dx.doi.org/10.1088/1742-6596/2218/1/012020.

Full text
Abstract:
The Visual Question Answering (VQA) task is to infer the correct answer to a free-form question based on a given image. This task is challenging because it requires the model to handle both visual and textual information. Most successful attempts on the VQA task rely on attention mechanisms, which can capture inter-modal and intra-modal dependencies. In this paper, we propose a new attention-based model to solve VQA. We use question information to guide the model to concentrate on specific regions and attributes and to reason the answer hierarchically. We also propose a multi-modal fusion strategy based on a co-attention method to fuse visual and textual information. Extensive experiments on the VQA-v2.0 dataset show that, under the same experimental conditions, our method outperforms several state-of-the-art methods.
APA, Harvard, Vancouver, ISO, and other styles
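The abstract above describes question-guided co-attention fusion only at a high level. Below is a minimal, hedged sketch of such a block in PyTorch; the dimensions, pooling choice, and class name are illustrative assumptions, not the authors' published model.

```python
# Minimal sketch of question-guided co-attention fusion (illustrative only;
# dimensions, names, and the fusion choice are assumptions, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttentionFusion(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)    # project question token features
        self.v_proj = nn.Linear(dim, dim)    # project image region features
        self.fuse = nn.Linear(2 * dim, dim)  # joint representation

    def forward(self, question, regions):
        # question: (B, Tq, D) token features; regions: (B, Nr, D) region features
        q = self.q_proj(question)
        v = self.v_proj(regions)
        # affinity between every question token and every image region
        affinity = torch.bmm(q, v.transpose(1, 2))              # (B, Tq, Nr)
        att_v = F.softmax(affinity, dim=2) @ v                  # question-guided visual context
        att_q = F.softmax(affinity.transpose(1, 2), dim=2) @ q  # region-guided textual context
        # pool over tokens/regions and fuse both attended summaries
        joint = torch.cat([att_v.mean(dim=1), att_q.mean(dim=1)], dim=-1)
        return torch.tanh(self.fuse(joint))                     # (B, D) fused feature

# Usage with random tensors standing in for real question/region features
fusion = CoAttentionFusion(dim=512)
out = fusion(torch.randn(2, 14, 512), torch.randn(2, 36, 512))
print(out.shape)  # torch.Size([2, 512])
```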
2

Nachiappan, Balusamy, N. Rajkumar, C. Viji, and Mohanraj A. "Artificial and Deceitful Faces Detection Using Machine Learning." Salud, Ciencia y Tecnología - Serie de Conferencias 3 (March 11, 2024): 611. http://dx.doi.org/10.56294/sctconf2024611.

Full text
Abstract:
Security certification is becoming popular for many applications, such as significant financial transactions. PIN and password authentication is the most common method of authentication, but because passwords have finite length, the security level is low and can easily be compromised. This work adds a new sensing modality to state-of-the-art image-based solutions, yielding a multi-modal face recognition system. It combines visual features extracted from a recent facial recognition model with a custom Convolutional Neural Network (CNN) that provides facial authentication and feature extraction capabilities to improve the safety of face recognition. The echo response depends on the geometry and material of the face and cannot be spoofed by pictures or videos, so the multi-modal design is more robust than a purely image-based face recognition system.
APA, Harvard, Vancouver, ISO, and other styles
3

Liu, Ang, Tianying Lin, Hailong Han, Xiaopei Zhang, Ze Chen, Fuwan Gan, Haibin Lv, and Xiaoping Liu. "Analyzing modal power in multi-mode waveguide via machine learning." Optics Express 26, no. 17 (August 10, 2018): 22100. http://dx.doi.org/10.1364/oe.26.022100.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Liu, Huaping, Jing Fang, Xinying Xu, and Fuchun Sun. "Surface Material Recognition Using Active Multi-modal Extreme Learning Machine." Cognitive Computation 10, no. 6 (July 4, 2018): 937–50. http://dx.doi.org/10.1007/s12559-018-9571-z.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Wei, Jie, Huaping Liu, Gaowei Yan, and Fuchun Sun. "Robotic grasping recognition using multi-modal deep extreme learning machine." Multidimensional Systems and Signal Processing 28, no. 3 (March 3, 2016): 817–33. http://dx.doi.org/10.1007/s11045-016-0389-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

A, Mr Balaji. "Extracting Audio from Image Using Machine Learning." International Journal of Scientific Research in Engineering and Management 08, no. 04 (April 24, 2024): 1–5. http://dx.doi.org/10.55041/ijsrem31532.

Full text
Abstract:
This study introduces a new method for extracting sound from pictures by utilizing machine learning. Lately, there has been a lot of excitement around multi-modal learning because of its ability to reveal valuable information from various sources, like images and sound. Our research is centered on using the unique qualities of visual and auditory signals to predict sound content from pictures. This opens up possibilities for enhancing accessibility, creating content, and providing immersive user experiences. We start by exploring previous research in multi-modal learning, audio-visual processing, and tasks like image captioning and sound source localization. Based on this background, we introduce an approach that merges convolutional neural networks (CNNs) for image analysis with recurrent neural networks (RNNs) or transformers for sequence interpretation. The system is trained on a collection of matched images and associated audio tracks, allowing it to grasp the intricate connections between visual and auditory data. We carefully assessed the performance of the proposed method using well-known metrics, comparing it to other methods and showing that it can accurately and quickly extract audio from images. We also show through qualitative analysis that our model can create clear audio representations from a variety of visual inputs. In a thorough discussion, we analyze the findings, pointing out both the advantages and drawbacks of our method. We pinpoint potential areas for further study, such as delving into more advanced structures and incorporating semantic data to enhance audio extraction. To sum up, this study adds to the expanding field of multi-modal learning by introducing a promising model for extracting audio from images through machine learning. Our results emphasize the potential of this technology to improve accessibility, inspire creativity, and increase user engagement in different fields. Key Words: Audio Extraction, Machine Learning, Computer Vision, Deep Learning, Convolutional Neural Networks
APA, Harvard, Vancouver, ISO, and other styles
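The abstract above pairs a CNN image encoder with an RNN/transformer sequence decoder but gives no architecture details. The sketch below illustrates one plausible CNN-encoder / GRU-decoder arrangement in PyTorch; the layer sizes, number of output frames, and class name are assumptions, not the paper's model.

```python
# Illustrative sketch of a CNN-encoder / RNN-decoder image-to-audio pipeline
# (architecture sizes and names are assumptions, not the published model).
import torch
import torch.nn as nn

class ImageToAudio(nn.Module):
    def __init__(self, audio_dim=128, steps=50, hidden=256):
        super().__init__()
        self.steps = steps
        self.encoder = nn.Sequential(            # CNN image encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden),
        )
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)  # sequence decoder
        self.head = nn.Linear(hidden, audio_dim)                 # per-step spectrogram frame

    def forward(self, image):
        ctx = self.encoder(image)                         # (B, hidden) image context
        seq = ctx.unsqueeze(1).repeat(1, self.steps, 1)   # feed the context at every step
        out, _ = self.decoder(seq)
        return self.head(out)                             # (B, steps, audio_dim) predicted frames

model = ImageToAudio()
frames = model(torch.randn(4, 3, 128, 128))
print(frames.shape)  # torch.Size([4, 50, 128])
```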
7

Asim, Yousra, Basit Raza, Ahmad Kamran Malik, Saima Rathore, Lal Hussain, and Mohammad Aksam Iftikhar. "A multi-modal, multi-atlas-based approach for Alzheimer detection via machine learning." International Journal of Imaging Systems and Technology 28, no. 2 (January 10, 2018): 113–23. http://dx.doi.org/10.1002/ima.22263.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

G, Nandhini, and Santosh K. Balivada. "Multi-Modal Feature Integration in Machine Learning Predictions for Cardiovascular Diseases." International Journal of Health Technology and Innovation 2, no. 03 (December 7, 2023): 15–18. http://dx.doi.org/10.60142/ijhti.v2i03.03.

Full text
Abstract:
Early detection and prevention of cardiovascular illnesses rely heavily on the phonocardiogram (PCG) and electrocardiogram (ECG). A novel multi-modal machine learning strategy based on ECG and PCG data is presented in this work for predicting cardiovascular diseases (CVD). ECG and PCG features are combined, and an optimal feature subset is selected using a genetic algorithm (GA). Machine learning classifiers are then applied to classify abnormal and normal signals.
APA, Harvard, Vancouver, ISO, and other styles
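The abstract above combines ECG and PCG features and selects an optimal subset with a genetic algorithm before classification. The following is a minimal sketch of that pattern on synthetic stand-in features; the GA settings, k-NN fitness classifier, and data are all assumptions for illustration, not the authors' pipeline.

```python
# Minimal sketch of genetic-algorithm feature-subset selection followed by a
# classifier, in the spirit of the abstract (toy data; all settings are assumptions).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))                  # stand-in for combined ECG+PCG features
y = (X[:, 0] + X[:, 3] - X[:, 7] > 0).astype(int)

def fitness(mask):
    # Cross-validated accuracy of a k-NN classifier on the selected feature subset
    if mask.sum() == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, X.shape[1]))           # random binary feature masks
for gen in range(15):
    scores = np.array([fitness(m) for m in pop])
    pop = pop[np.argsort(scores)[::-1]][:10]               # selection: keep the best half
    children = []
    for _ in range(10):                                    # single-point crossover + mutation
        a, b = pop[rng.integers(0, 10, size=2)]
        cut = rng.integers(1, X.shape[1])
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05
        child[flip] = 1 - child[flip]
        children.append(child)
    pop = np.vstack([pop, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best), "accuracy:", fitness(best))
```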
9

Liu, Huaping, Fengxue Li, Xinying Xu, and Fuchun Sun. "Multi-modal local receptive field extreme learning machine for object recognition." Neurocomputing 277 (February 2018): 4–11. http://dx.doi.org/10.1016/j.neucom.2017.04.077.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Lamichhane, Bidhan, Dinal Jayasekera, Rachel Jakes, Matthew F. Glasser, Justin Zhang, Chunhui Yang, Derayvia Grimes, et al. "Multi-modal biomarkers of low back pain: A machine learning approach." NeuroImage: Clinical 29 (2021): 102530. http://dx.doi.org/10.1016/j.nicl.2020.102530.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Huang, Haiming, Junhao Lin, Linyuan Wu, Bin Fang, Zhenkun Wen, and Fuchun Sun. "Machine learning-based multi-modal information perception for soft robotic hands." Tsinghua Science and Technology 25, no. 2 (April 2020): 255–69. http://dx.doi.org/10.26599/tst.2019.9010009.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

He, Liqi, Zuchao Li, Xiantao Cai, and Ping Wang. "Multi-Modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 16 (March 24, 2024): 18180–87. http://dx.doi.org/10.1609/aaai.v38i16.29776.

Full text
Abstract:
Chain-of-thought (CoT) reasoning has exhibited impressive performance in language models for solving complex tasks and answering questions. However, many real-world questions require multi-modal information, such as text and images. Previous research on multi-modal CoT has primarily focused on extracting fixed image features from off-the-shelf vision models and then fusing them with text using attention mechanisms. This approach has limitations because these vision models were not designed for complex reasoning tasks and do not align well with language thoughts. To overcome this limitation, we introduce a novel approach for multi-modal CoT reasoning that utilizes latent space learning via diffusion processes to generate effective image features that align with language thoughts. Our method fuses image features and text representations at a deep level and improves the complex reasoning ability of multi-modal CoT. We demonstrate the efficacy of our proposed method on multi-modal ScienceQA and machine translation benchmarks, achieving state-of-the-art performance on ScienceQA. Overall, our approach offers a more robust and effective solution for multi-modal reasoning in language models, enhancing their ability to tackle complex real-world problems.
APA, Harvard, Vancouver, ISO, and other styles
13

Zhang, Lingyu, Xu Geng, Zhiwei Qin, Hongjun Wang, Xiao Wang, Ying Zhang, Jian Liang, Guobin Wu, Xuan Song, and Yunhai Wang. "Multi-Modal Graph Interaction for Multi-Graph Convolution Network in Urban Spatiotemporal Forecasting." Sustainability 14, no. 19 (September 29, 2022): 12397. http://dx.doi.org/10.3390/su141912397.

Full text
Abstract:
Graph convolution network-based approaches have been recently used to model region-wise relationships in region-level prediction problems in urban computing. Each relationship represents a kind of spatial dependency, such as region-wise distance or functional similarity. To incorporate multiple relationships into a spatial feature extraction, we define the problem as a multi-modal machine learning problem on multi-graph convolution networks. Leveraging the advantage of multi-modal machine learning, we propose to develop modality interaction mechanisms for this problem in order to reduce the generalization error by reinforcing the learning of multi-modal coordinated representations. In this work, we propose two interaction techniques for handling features in lower layers and higher layers, respectively. In lower layers, we propose grouped GCN to combine the graph connectivity from different modalities for a more complete spatial feature extraction. In higher layers, we adapt multi-linear relationship networks to GCN by exploring the dimension transformation and freezing part of the covariance structure. The adapted approach, called multi-linear relationship GCN, learns more generalized features to overcome the train–test divergence induced by time shifting. We evaluated our model on a ride-hailing demand forecasting problem using two real-world datasets. The proposed technique outperforms state-of-the-art baselines in terms of prediction accuracy, training efficiency, interpretability and model robustness.
APA, Harvard, Vancouver, ISO, and other styles
14

Ehiabhi, Jolly, and Haifeng Wang. "A Systematic Review of Machine Learning Models in Mental Health Analysis Based on Multi-Channel Multi-Modal Biometric Signals." BioMedInformatics 3, no. 1 (March 1, 2023): 193–219. http://dx.doi.org/10.3390/biomedinformatics3010014.

Full text
Abstract:
With the increase in biosensors and data collection devices in the healthcare industry, artificial intelligence and machine learning have attracted much attention in recent years. In this study, we offered a comprehensive review of the current trends and the state-of-the-art in mental health analysis as well as the application of machine-learning techniques for analyzing multi-variate/multi-channel multi-modal biometric signals. This study reviewed the predominant mental-health-related biosensors, including polysomnography (PSG), electroencephalogram (EEG), electro-oculogram (EOG), electromyogram (EMG), and electrocardiogram (ECG). We also described the processes used for data acquisition, data-cleaning, feature extraction, machine-learning modeling, and performance evaluation. This review showed that support-vector-machine and deep-learning techniques have been well studied, to date. After reviewing over 200 papers, we also discussed the current challenges and opportunities in this field.
APA, Harvard, Vancouver, ISO, and other styles
15

Bhatt, Saachin, Mustansar Ghazanfar, and Mohammad Hossein Amirhosseini. "Sentiment-Driven Cryptocurrency Price Prediction: A Machine Learning Approach Utilizing Historical Data and Social Media Sentiment Analysis." Machine Learning and Applications: An International Journal 10, no. 2/3 (September 28, 2023): 01–15. http://dx.doi.org/10.5121/mlaij.2023.10301.

Full text
Abstract:
This research explores the impact of social media sentiments on predicting Bitcoin prices using machine learning models, integrating on-chain data, and applying a Multi Modal Fusion Model. Historical crypto market, on-chain, and Twitter data from 2014 to 2022 were used to train models including K-Nearest Neighbors, Logistic Regression, Gaussian Naive Bayes, Support Vector Machine, Extreme Gradient Boosting, and Multi Modal Fusion. Performance was compared with and without Twitter sentiment data, which was analysed using the Twitter-roBERTa and VADER models. Inclusion of sentiment data enhanced model performance, with Twitter-roBERTa-based models achieving an average accuracy score of 0.81. The best-performing model was an optimised Multi Modal Fusion model using Twitter-roBERTa, with an accuracy score of 0.90. This research underscores the value of integrating social media sentiment analysis and on-chain data in financial forecasting, providing a robust tool for informed decision-making in cryptocurrency trading.
APA, Harvard, Vancouver, ISO, and other styles
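The abstract above compares models trained with and without sentiment features but does not specify the fusion model itself. The sketch below illustrates the general with/without-sentiment comparison using simple concatenation and a gradient-boosting classifier on synthetic data; the features, model, and labels are stand-ins, not the paper's setup.

```python
# Hedged sketch: fusing market features with a sentiment feature before a
# classifier, mimicking the with/without-sentiment comparison (synthetic data;
# the actual fusion model and features in the paper are not specified here).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 1000
market = rng.normal(size=(n, 5))     # stand-ins for returns, volume, on-chain metrics
sentiment = rng.normal(size=(n, 1))  # stand-in for a daily mean Twitter sentiment score
y = (0.8 * sentiment[:, 0] + market[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X = np.hstack([market, sentiment])   # simple early fusion by concatenation
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("accuracy with sentiment:", accuracy_score(y_te, clf.predict(X_te)))

# Ablation: drop the sentiment column to mimic the paper's with/without comparison
clf2 = GradientBoostingClassifier().fit(X_tr[:, :5], y_tr)
print("accuracy without sentiment:", accuracy_score(y_te, clf2.predict(X_te)))
```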
16

Islam, Kazi Aminul, Mohammad Shahab Uddin, Chiman Kwan, and Jiang Li. "Flood Detection Using Multi-Modal and Multi-Temporal Images: A Comparative Study." Remote Sensing 12, no. 15 (July 30, 2020): 2455. http://dx.doi.org/10.3390/rs12152455.

Full text
Abstract:
Natural disasters such as flooding can severely affect human life and property. To provide rescue through an emergency response team, we need an accurate flooding assessment of the affected area after the event. Traditionally, it requires a lot of human resources to obtain an accurate estimation of a flooded area. In this paper, we compared several traditional machine-learning approaches for flood detection, including multi-layer perceptron (MLP), support vector machine (SVM), and deep convolutional neural network (DCNN), with recent domain adaptation-based approaches, based on a multi-modal and multi-temporal image dataset. Specifically, we used SPOT-5 and RADAR images from the flood event that occurred in November 2000 in Gloucester, UK. Experimental results show that the domain adaptation-based approach, semi-supervised domain adaptation (SSDA) with 20 labeled data samples, achieved slightly better values of the area under the precision-recall (PR) curve (AUC) of 0.9173 and F1 score of 0.8846 than those obtained by the traditional machine-learning approaches. However, SSDA required much less labor for ground-truth labeling and should be recommended in practice.
APA, Harvard, Vancouver, ISO, and other styles
17

Li, Xiong, Yangping Qiu, Juan Zhou, and Ziruo Xie. "Applications and Challenges of Machine Learning Methods in Alzheimer's Disease Multi-Source Data Analysis." Current Genomics 22, no. 8 (December 2021): 564–82. http://dx.doi.org/10.2174/1389202923666211216163049.

Full text
Abstract:
Background: Recent developments in neuroimaging and genetic testing technologies have made it possible to measure pathological features associated with Alzheimer's disease (AD) in vivo. Mining potential molecular markers of AD from high-dimensional, multi-modal neuroimaging and omics data will provide a new basis for early diagnosis and intervention in AD. In order to discover the real pathogenic mutation and even understand the pathogenic mechanism of AD, many machine learning methods have been designed and successfully applied to the analysis and processing of large-scale AD biomedical data. Objective: To introduce and summarize the applications and challenges of machine learning methods in Alzheimer's disease multi-source data analysis. Methods: The literature selected in the review is obtained from Google Scholar, PubMed, and Web of Science. The keywords of literature retrieval include Alzheimer's disease, bioinformatics, image genetics, genome-wide association research, molecular interaction network, multi-omics data integration, and so on. Conclusion: This study comprehensively introduces machine learning-based processing techniques for AD neuroimaging data and then shows the progress of computational analysis methods in omics data, such as the genome, proteome, and so on. Subsequently, machine learning methods for AD imaging analysis are also summarized. Finally, we elaborate on the current emerging technology of multi-modal neuroimaging, multi-omics data joint analysis, and present some outstanding issues and future research directions.
APA, Harvard, Vancouver, ISO, and other styles
18

Doan, H. G., and N. T. Nguyen. "Fusion Machine Learning Strategies for Multi-modal Sensor-based Hand Gesture Recognition." Engineering, Technology & Applied Science Research 12, no. 3 (June 6, 2022): 8628–33. http://dx.doi.org/10.48084/etasr.4913.

Full text
Abstract:
Hand gesture recognition has attracted the attention of many scientists because of its high applicability in fields such as sign language expression and human-machine interaction. Many approaches have been deployed to detect and recognize hand gestures, using wearable devices, image information, and/or a combination of sensors and computer vision. The method of using wearable sensors brings much higher accuracy and is less affected by occlusion, lighting conditions, and complex backgrounds. However, existing solutions utilize sensor information separately and/or rely only on conventional threshold-comparison algorithms for processing and decision making, without analyzing the data or applying machine learning algorithms. In this paper, a multi-modal solution is proposed that combines sensors measuring the curvature of the fingers with sensors measuring angular velocity and acceleration. The information provided by the sensors is normalized and analyzed, and various fusion strategies are used. Then, the most suitable algorithm for these sensor-based modalities is proposed. The proposed system also analyzes the differences between intended gestures and movements that appear almost identical but are, in fact, just ordinary motions.
APA, Harvard, Vancouver, ISO, and other styles
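One of the fusion strategies the abstract above mentions is simply combining the normalized finger-curvature and inertial features before a single classifier. The sketch below shows that early-fusion baseline on synthetic sensor features; the feature counts, label rule, and classifier are assumptions made for illustration.

```python
# Illustrative sketch of one fusion strategy from the abstract: concatenating
# normalized finger-curvature and IMU (angular velocity / acceleration) features
# before a single classifier (synthetic data; feature counts are assumptions).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 600
flex = rng.normal(size=(n, 5))    # curvature of five fingers
imu = rng.normal(size=(n, 6))     # 3-axis gyroscope + 3-axis accelerometer summaries
# Toy gesture labels that depend on both modalities (4 synthetic classes)
y = np.digitize(flex[:, 0] + 0.5 * imu[:, 0], bins=[-1.0, 0.0, 1.0])

# Early fusion: normalize each modality, then concatenate into one feature vector
fused = np.hstack([StandardScaler().fit_transform(flex),
                   StandardScaler().fit_transform(imu)])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("fused CV accuracy:", cross_val_score(clf, fused, y, cv=5).mean())
```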
19

Паршин, А. И., М. Н. Аралов, В. Ф. Барабанов, and Н. И. Гребенникова. "RANDOM MULTI-MODAL DEEP LEARNING IN THE PROBLEM OF IMAGE RECOGNITION." ВЕСТНИК ВОРОНЕЖСКОГО ГОСУДАРСТВЕННОГО ТЕХНИЧЕСКОГО УНИВЕРСИТЕТА, no. 4 (October 20, 2021): 21–26. http://dx.doi.org/10.36622/vstu.2021.17.4.003.

Full text
Abstract:
The image recognition task is one of the most difficult in machine learning, requiring both deep knowledge and large amounts of time and computational resources from the researcher. When nonlinear and complex data are used, various deep neural network architectures are applied, but choosing a neural network remains a difficult problem. The main architectures in widespread use are convolutional neural networks (CNN), recurrent neural networks (RNN), and deep neural networks (DNN). Based on recurrent neural networks (RNNs), Long Short-Term Memory networks (LSTMs) and Gated Recurrent Unit networks (GRUs) were developed. Each neural network architecture has its own structure, its own customizable and trainable parameters, and its own advantages and disadvantages. By combining different types of neural networks, the quality of prediction can be significantly improved in various machine learning problems. Given that choosing an optimal network architecture and its parameters is an extremely difficult task, one method for constructing neural network architectures based on a combination of convolutional, recurrent, and deep neural networks is considered. We show that such architectures are superior to classical machine learning algorithms.
APA, Harvard, Vancouver, ISO, and other styles
20

Irfan, Bahar, Michael Garcia Ortiz, Natalia Lyubova, and Tony Belpaeme. "Multi-modal Open World User Identification." ACM Transactions on Human-Robot Interaction 11, no. 1 (March 31, 2022): 1–50. http://dx.doi.org/10.1145/3477963.

Full text
Abstract:
User identification is an essential step in creating a personalised long-term interaction with robots. This requires learning the users continuously and incrementally, possibly starting from a state without any known user. In this article, we describe a multi-modal incremental Bayesian network with online learning, which is the first method that can be applied in such scenarios. Face recognition is used as the primary biometric, and it is combined with ancillary information, such as gender, age, height, and time of interaction to improve the recognition. The Multi-modal Long-term User Recognition Dataset is generated to simulate various human-robot interaction (HRI) scenarios and evaluate our approach in comparison to face recognition, soft biometrics, and a state-of-the-art open world recognition method (Extreme Value Machine). The results show that the proposed methods significantly outperform the baselines, with an increase in the identification rate up to 47.9% in open-set and closed-set scenarios, and a significant decrease in long-term recognition performance loss. The proposed models generalise well to new users, provide stability, improve over time, and decrease the bias of face recognition. The models were applied in HRI studies for user recognition, personalised rehabilitation, and customer-oriented service, which showed that they are suitable for long-term HRI in the real world.
APA, Harvard, Vancouver, ISO, and other styles
21

Belfedhal, Alaa Eddine. "Multi-Modal Deep Learning for Effective Malicious Webpage Detection." Revue d'Intelligence Artificielle 37, no. 4 (August 31, 2023): 1005–13. http://dx.doi.org/10.18280/ria.370422.

Full text
Abstract:
The pervasive threat of malicious webpages, which can lead to financial loss, data breaches, and malware infections, underscores the need for effective detection methods. Conventional techniques for detecting malicious web content primarily rely on URL-based features or features extracted from various webpage components, employing a single feature vector as input to a machine learning model that classifies webpages as benign or malicious. However, these approaches insufficiently address the complexities inherent in malicious webpages. To overcome this limitation, a novel Multi-Modal Deep Learning method for malicious webpage detection is proposed in this study. Three types of automatically extracted features, specifically those derived from the URL, the JavaScript code, and the webpage text, are leveraged. Each feature type is processed by a distinct deep learning model, facilitating a comprehensive analysis of the webpage. The proposed method demonstrates a high degree of effectiveness, achieving an accuracy rate of 97.90% and a false negative rate of a mere 2%. The results highlight the advantages of utilizing multi-modal features and deep learning techniques for detecting malicious webpages. By considering various aspects of web content, the proposed method offers improved accuracy and a more comprehensive understanding of malicious activities, thereby enhancing web user security and effectively mitigating the risks associated with malicious webpages.
APA, Harvard, Vancouver, ISO, and other styles
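The abstract above processes URL, JavaScript, and page-text features with separate deep models. Below is a minimal sketch of that three-branch idea with a fused classification head in PyTorch; the branch sizes, feature dimensions, and fusion choice are assumptions, not the published architecture.

```python
# Hedged sketch of a three-branch model: one sub-network per feature type
# (URL, JavaScript, page text), fused before the benign/malicious output
# (sizes and layer choices are assumptions, not the published model).
import torch
import torch.nn as nn

class MultiModalWebpageClassifier(nn.Module):
    def __init__(self, url_dim=64, js_dim=128, text_dim=300, hidden=64):
        super().__init__()
        self.url_branch = nn.Sequential(nn.Linear(url_dim, hidden), nn.ReLU())
        self.js_branch = nn.Sequential(nn.Linear(js_dim, hidden), nn.ReLU())
        self.text_branch = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(3 * hidden, 2)   # benign vs. malicious

    def forward(self, url_feat, js_feat, text_feat):
        # Process each modality separately, then concatenate for the final decision
        h = torch.cat([self.url_branch(url_feat),
                       self.js_branch(js_feat),
                       self.text_branch(text_feat)], dim=-1)
        return self.classifier(h)

model = MultiModalWebpageClassifier()
logits = model(torch.randn(8, 64), torch.randn(8, 128), torch.randn(8, 300))
print(logits.shape)  # torch.Size([8, 2])
```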
22

Toda, Kanon, Kazuya Kishizawa, Yuma Toyoda, Kohei Noda, Heeyoung Lee, Kentaro Nakamura, Koichi Ichige, and Yosuke Mizuno. "Characterization of modal interference in multi-core polymer optical fibers and its application to temperature sensing." Applied Physics Express 15, no. 7 (June 13, 2022): 072002. http://dx.doi.org/10.35848/1882-0786/ac749e.

Full text
Abstract:
Various types of fiber-optic temperature sensors have been developed on the basis of modal interference in multimode fibers, which include not only glass fibers but also polymer optical fibers (POFs). Herein, we investigate the spectral patterns of the modal interference in multi-core POFs (originally developed for imaging) and observe their unique temperature dependencies with no clear frequency shift or critical wavelength. We then show that, by machine learning, the modal interference in the multi-core POFs can be potentially used for highly accurate temperature sensing with an error of ∼0.3 °C.
APA, Harvard, Vancouver, ISO, and other styles
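The machine-learning step the abstract above mentions amounts to regressing temperature from a measured interference spectrum. The sketch below illustrates that setup with synthetic spectra and a random-forest regressor; the spectral model, wavelength range, and regressor are assumptions, not the authors' method.

```python
# Minimal sketch of regressing temperature from an interference spectrum
# (synthetic spectra; the paper's model and preprocessing are not specified here).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(3)
wavelengths = np.linspace(1530, 1570, 200)
temps = rng.uniform(20, 60, size=400)
# Toy spectra whose fringe pattern drifts with temperature, plus noise
spectra = np.array([np.sin(0.5 * wavelengths + 0.05 * t) for t in temps])
spectra += rng.normal(scale=0.05, size=spectra.shape)

X_tr, X_te, y_tr, y_te = train_test_split(spectra, temps, test_size=0.25, random_state=0)
reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("temperature MAE (deg C):", mean_absolute_error(y_te, reg.predict(X_te)))
```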
23

Ma’sum, Muhammad Anwar, Hadaiq Rolis Sanabila, Petrus Mursanto, and Wisnu Jatmiko. "Clustering versus Incremental Learning Multi-Codebook Fuzzy Neural Network for Multi-Modal Data Classification." Computation 8, no. 1 (January 13, 2020): 6. http://dx.doi.org/10.3390/computation8010006.

Full text
Abstract:
One of the challenges in machine learning is classification of multi-modal data. The problem needs a customized method, as the data have features that spread across several areas. This study proposes multi-codebook fuzzy neural network classifiers that use clustering and incremental learning approaches to deal with multi-modal data classification. The clustering methods used are K-Means and GMM clustering. In the experiments, the proposed method achieved the highest performance on a synthetic dataset with 84.76% accuracy, and the highest performance on the benchmark dataset with 79.94% accuracy. The proposed method shows 24.9% and 4.7% improvements on the synthetic and benchmark datasets, respectively, compared to the original version. The proposed classifier also has better accuracy than a popular neural network, with margins of 10% and 4.7% on the synthetic and benchmark datasets, respectively.
APA, Harvard, Vancouver, ISO, and other styles
24

Wróblewska, Anna, Jacek Dąbrowski, Michał Pastuszak, Andrzej Michałowski, Michał Daniluk, Barbara Rychalska, Mikołaj Wieczorek, and Sylwia Sysko-Romańczuk. "Designing Multi-Modal Embedding Fusion-Based Recommender." Electronics 11, no. 9 (April 27, 2022): 1391. http://dx.doi.org/10.3390/electronics11091391.

Full text
Abstract:
Recommendation systems have lately been popularised globally. However, they often need to be adapted to the particular data and use case. We have developed a machine learning-based recommendation system which can be easily applied to almost any item and/or action domain. Contrary to existing recommendation systems, our system supports multiple types of interaction data with various modalities of metadata through a multi-modal fusion of different data representations. We deployed the system in numerous e-commerce stores, e.g., food and beverages, shoes, fashion items, and telecom operators. We present our system and its main algorithms for data representations and multi-modal fusion. We show benchmark results on open datasets that outperform the state-of-the-art prior work. We also demonstrate use cases for different e-commerce sites.
APA, Harvard, Vancouver, ISO, and other styles
25

Xu, Ziqi, Jingwen Zhang, Jacob Greenberg, Madelyn Frumkin, Saad Javeed, Justin K. Zhang, Braeden Benedict, et al. "Predicting Multi-dimensional Surgical Outcomes with Multi-modal Mobile Sensing." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, no. 2 (May 13, 2024): 1–30. http://dx.doi.org/10.1145/3659628.

Full text
Abstract:
Pre-operative prediction of post-surgical recovery for patients is vital for clinical decision-making and personalized treatments, especially with lumbar spine surgery, where patients exhibit highly heterogeneous outcomes. Existing predictive tools mainly rely on traditional Patient-Reported Outcome Measures (PROMs), which fail to capture the long-term dynamics of patient conditions before the surgery. Moreover, existing studies focus on predicting a single surgical outcome. However, recovery from spine surgery is multi-dimensional, including multiple distinctive but interrelated outcomes, such as pain interference, physical function, and quality of recovery. In recent years, the emergence of smartphones and wearable devices has presented new opportunities to capture longitudinal and dynamic information regarding patients' conditions outside the hospital. This paper proposes a novel machine learning approach, Multi-Modal Multi-Task Learning (M3TL), using smartphones and wristbands to predict multiple surgical outcomes after lumbar spine surgeries. We formulate the prediction of pain interference, physical function, and quality of recovery as a multi-task learning (MTL) problem. We leverage multi-modal data to capture the static and dynamic characteristics of patients, including (1) traditional features from PROMs and Electronic Health Records (EHR), (2) Ecological Momentary Assessment (EMA) collected from smartphones, and (3) sensing data from wristbands. Moreover, we introduce new features derived from the correlation of EMA and wearable features measured within the same time frame, effectively enhancing predictive performance by capturing the interdependencies between the two data modalities. Our model interpretation uncovers the complementary nature of the different data modalities and their distinctive contributions toward multiple surgical outcomes. Furthermore, through individualized decision analysis, our model identifies personal high risk factors to aid clinical decision making and approach personalized treatments. In a clinical study involving 122 patients undergoing lumbar spine surgery, our M3TL model outperforms a diverse set of baseline methods in predictive performance, demonstrating the value of integrating multi-modal data and learning from multiple surgical outcomes. This work contributes to advancing personalized peri-operative care with accurate pre-operative predictions of multi-dimensional outcomes.
APA, Harvard, Vancouver, ISO, and other styles
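The M3TL abstract above formulates recovery prediction as multi-task learning over fused multi-modal features. The sketch below shows the generic shared-trunk / per-outcome-head pattern with a joint loss in PyTorch; the dimensions, losses, and head names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of multi-task learning: a shared trunk over fused multi-modal
# features with one regression head per surgical outcome (dimensions, losses,
# and names are illustrative assumptions, not the authors' M3TL code).
import torch
import torch.nn as nn

class MultiTaskOutcomeModel(nn.Module):
    def __init__(self, in_dim=100, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.pain_head = nn.Linear(hidden, 1)       # pain interference
        self.function_head = nn.Linear(hidden, 1)   # physical function
        self.recovery_head = nn.Linear(hidden, 1)   # quality of recovery

    def forward(self, x):
        h = self.trunk(x)
        return self.pain_head(h), self.function_head(h), self.recovery_head(h)

model = MultiTaskOutcomeModel()
x = torch.randn(16, 100)       # stand-in for fused PROM/EHR + EMA + wearable features
targets = [torch.randn(16, 1) for _ in range(3)]
preds = model(x)
# Joint multi-task loss: sum of per-outcome regression losses
loss = sum(nn.functional.mse_loss(p, t) for p, t in zip(preds, targets))
loss.backward()
print("joint loss:", float(loss))
```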
26

Kalyani, BJD, Kopparthi Praneeth Sai, N. M. Deepika, Shaik Shahanaz, and G. Lohitha. "Smart Multi-Model Emotion Recognition System with Deep learning." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 1 (February 6, 2023): 139–44. http://dx.doi.org/10.17762/ijritcc.v11i1.6061.

Full text
Abstract:
Emotion recognition adds a new dimension to sentiment analysis. This paper presents a multi-modal human emotion recognition web application that considers three traits, namely speech, text, and facial expressions, to extract and analyze the emotions of people who are giving interviews. With the rapid development of machine learning, artificial intelligence, and deep learning, emotion recognition is receiving more attention from researchers. Machines are said to be intelligent only if they are able to perform human emotion recognition or sentiment analysis. Emotion recognition helps in detecting spam and blackmail calls, customer service, lie detection, audience engagement, and flagging suspicious behavior. In this paper, facial expression analysis is carried out using deep learning approaches together with speech signals and input text.
APA, Harvard, Vancouver, ISO, and other styles
27

Zhang, Wenyin, Yong Wu, Bo Yang, Shunbo Hu, Liang Wu, and Sahraoui Dhelimd. "Overview of Multi-Modal Brain Tumor MR Image Segmentation." Healthcare 9, no. 8 (August 16, 2021): 1051. http://dx.doi.org/10.3390/healthcare9081051.

Full text
Abstract:
The precise segmentation of brain tumor images is a vital step towards accurate diagnosis and effective treatment of brain tumors. Magnetic Resonance Imaging (MRI) can generate brain images without tissue damage or skull artifacts, providing important discriminant information for clinicians in the study of brain tumors and other brain diseases. In this paper, we survey the field of brain tumor MRI images segmentation. Firstly, we present the commonly used databases. Then, we summarize multi-modal brain tumor MRI image segmentation methods, which are divided into three categories: conventional segmentation methods, segmentation methods based on classical machine learning methods, and segmentation methods based on deep learning methods. The principles, structures, advantages and disadvantages of typical algorithms in each method are summarized. Finally, we analyze the challenges, and suggest a prospect for future development trends.
APA, Harvard, Vancouver, ISO, and other styles
28

Juan, Bao, Tuo Min, Hou Meng Ting, Li Xi Yu, and Wang Qun. "Research on Intelligent Medical Engineering Analysis and Decision Based on Deep Learning." International Journal of Web Services Research 19, no. 1 (January 1, 2022): 1–9. http://dx.doi.org/10.4018/ijwsr.314949.

Full text
Abstract:
With the increasing amount of medical data and its high-dimensional, diversified, and complex information, artificial intelligence and machine learning provide a new way to handle data that are multi-source, heterogeneous, high-dimensional, real-time, multi-scale, dynamic, and uncertain. Driven by medical and health big data and using deep learning theories and methods, this paper proposes a new mode of "multi-modal fusion-association mining-analysis and prediction-intelligent decision" for intelligent medicine analysis and decision making. First, research on "multi-modal fusion methods for medical big data based on deep learning" explores a new method of medical big data fusion in complex environments. Second, research on "dynamic change rules and analysis and prediction methods for medical big data based on deep learning" explores a new method for medical big data fusion in complex environments. Third, research on "intelligent medicine decision methods" explores a new intelligent medicine decision method.
APA, Harvard, Vancouver, ISO, and other styles
29

Huang, Tianhao, Xiaozhi Zhu, and Mo Niu. "An End-to-End Benchmarking Tool for Analyzing the Hardware-Software Implications of Multi-modal DNNs." ACM SIGMETRICS Performance Evaluation Review 51, no. 3 (January 3, 2024): 25–27. http://dx.doi.org/10.1145/3639830.3639841.

Full text
Abstract:
Multi-modal deep neural networks (DNNs) have become increasingly pervasive in many machine learning application domains due to their superior accuracy by fusing various modalities together. However, multi-modal DNNs present many unique characteristics such as multi-stage execution, frequent synchronization and high heterogeneity, which are not well understood in the system and architecture community. In this article, we first present and characterize a set of multi-modal DNN workloads of different sizes from five domains and measure metrics like accuracy to ensure the availability of these applications from the algorithm perspective. We then explore their important hardware-software implications from system and architecture aspects by conducting an in-depth analysis on the unique hardware-software characteristics of multimodal DNNs. We hope that our work can help guide future hardware-software design and optimization for efficient inference of multi-modal DNN applications on both cloud and edge computing platforms.
APA, Harvard, Vancouver, ISO, and other styles
30

Li, Pengpai, Yongmei Hu, and Zhi-Ping Liu. "Prediction of cardiovascular diseases by integrating multi-modal features with machine learning methods." Biomedical Signal Processing and Control 66 (April 2021): 102474. http://dx.doi.org/10.1016/j.bspc.2021.102474.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Sammali, Federica, Celine Blank, Tom G. H. Bakkes, Yizhou Huang, Chiara Rabotti, Benedictus C. Schoot, and Massimo Mischi. "Multi-Modal Uterine-Activity Measurements for Prediction of Embryo Implantation by Machine Learning." IEEE Access 9 (2021): 47096–111. http://dx.doi.org/10.1109/access.2021.3067716.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Yao, Wenfang, Kejing Yin, William K. Cheung, Jia Liu, and Jing Qin. "DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (March 24, 2024): 16416–24. http://dx.doi.org/10.1609/aaai.v38i15.29578.

Full text
Abstract:
The combination of electronic health records (EHR) and medical images is crucial for clinicians in making diagnoses and forecasting prognoses. Strategically fusing these two data modalities has great potential to improve the accuracy of machine learning models in clinical prediction tasks. However, the asynchronous and complementary nature of EHR and medical images presents unique challenges. Missing modalities due to clinical and administrative factors are inevitable in practice, and the significance of each data modality varies depending on the patient and the prediction target, resulting in inconsistent predictions and suboptimal model performance. To address these challenges, we propose DrFuse to achieve effective clinical multi-modal fusion. It tackles the missing modality issue by disentangling the features shared across modalities and those unique within each modality. Furthermore, we address the modal inconsistency issue via a disease-wise attention layer that produces the patient- and disease-wise weighting for each modality to make the final prediction. We validate the proposed method using real-world large-scale datasets, MIMIC-IV and MIMIC-CXR. Experimental results show that the proposed method significantly outperforms the state-of-the-art models.
APA, Harvard, Vancouver, ISO, and other styles
33

Mason, Rachel E., Nicholas R. Vaughn, and Gregory P. Asner. "Mapping Buildings across Heterogeneous Landscapes: Machine Learning and Deep Learning Applied to Multi-Modal Remote Sensing Data." Remote Sensing 15, no. 18 (September 6, 2023): 4389. http://dx.doi.org/10.3390/rs15184389.

Full text
Abstract:
We describe the production of maps of buildings on Hawai’i Island, based on complementary information contained in two different types of remote sensing data. The maps cover 3200 km2 over a highly varied set of landscape types and building densities. A convolutional neural network was first trained to identify building candidates in LiDAR data. To better differentiate between true buildings and false positives, the CNN-based building probability map was then used, together with 400–2400 nm imaging spectroscopy, as input to a gradient boosting model. Simple vector operations were then employed to further refine the final maps. This stepwise approach resulted in detection of 84%, 100%, and 97% of manually labeled buildings, at the 0.25, 0.5, and 0.75 percentiles of true building size, respectively, with very few false positives. The median absolute error in modeled building areas was 15%. This novel integration of deep learning, machine learning, and multi-modal remote sensing data was thus effective in detecting buildings over large scales and diverse landscapes, with potential applications in urban planning, resource management, and disaster response. The adaptable method presented here expands the range of techniques available for object detection in multi-modal remote sensing data and can be tailored to various kinds of input data, landscape types, and mapping goals.
APA, Harvard, Vancouver, ISO, and other styles
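The stepwise approach in the abstract above feeds a CNN-derived building probability, together with imaging-spectroscopy features, into a gradient-boosting model. The sketch below shows that second-stage stacking step on synthetic stand-ins; the feature counts, labels, and classifier settings are assumptions, not the authors' pipeline.

```python
# Hedged sketch of the second stage: a CNN-derived building probability combined
# with spectral features as input to a gradient-boosting classifier
# (synthetic stand-in features; not the authors' pipeline).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(4)
n = 2000
cnn_prob = rng.uniform(0, 1, size=(n, 1))   # building probability from a LiDAR-trained CNN
spectra = rng.normal(size=(n, 10))          # imaging-spectroscopy band summaries
y = ((cnn_prob[:, 0] > 0.6) & (spectra[:, 0] > 0)).astype(int)   # toy building labels

X = np.hstack([cnn_prob, spectra])          # second-stage (stacked) inputs
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
gbm = GradientBoostingClassifier().fit(X_tr, y_tr)
print("building-detection F1:", f1_score(y_te, gbm.predict(X_te)))
```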
34

Zhang, Shuyan, Steve Qing Yang Wu, Melissa Hum, Jayakumar Perumal, Ern Yu Tan, Ann Siew Gek Lee, Jinghua Teng, U. S. Dinish, and Malini Olivo. "Complete characterization of RNA biomarker fingerprints using a multi-modal ATR-FTIR and SERS approach for label-free early breast cancer diagnosis." RSC Advances 14, no. 5 (2024): 3599–610. http://dx.doi.org/10.1039/d3ra05723b.

Full text
Abstract:
With the multi-modal approach combining ATR-FTIR and SERS, we achieved an extended spectral range for molecular fingerprint detection of RNA biomarkers. Machine learning results show 91.6% blind-test accuracy for label-free breast cancer diagnosis.
APA, Harvard, Vancouver, ISO, and other styles
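The multi-modal idea in the abstract above is that concatenating ATR-FTIR and SERS spectra per sample extends the usable spectral range before classification. A minimal sketch of that pattern on synthetic spectra follows; the dimensions, labels, and SVM classifier are assumptions, not the paper's model.

```python
# Minimal sketch of concatenating ATR-FTIR and SERS spectra per sample before a
# classifier (synthetic data; the paper's preprocessing and model are not shown).
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n = 120
ftir = rng.normal(size=(n, 300))   # ATR-FTIR fingerprint region (stand-in)
sers = rng.normal(size=(n, 250))   # SERS spectrum (stand-in)
y = ((ftir[:, 0] + sers[:, 0]) > 0).astype(int)   # toy cancer vs. control labels

X = np.hstack([ftir, sers])        # extended spectral range via concatenation
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```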
35

Ghaffar, M. A. A., T. T. Vu, and T. H. Maul. "MULTI-MODAL REMOTE SENSING DATA FUSION FRAMEWORK." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-4/W2 (July 5, 2017): 85–89. http://dx.doi.org/10.5194/isprs-archives-xlii-4-w2-85-2017.

Full text
Abstract:
The inconsistency between freely available remote sensing datasets and crowd-sourced data from the resolution perspective forms a big challenge in the context of data fusion. In classical classification problems, crowd-sourced data are represented as points that may or may not be located within the same pixel. This discrepancy can result in mixed pixels that are incorrectly classified. Moreover, it leads to a failure to retain a sufficient level of detail in data inferences. In this paper we propose a method that can preserve detailed inferences from remote sensing datasets accompanied by crowd-sourced data. We show that advanced machine learning techniques can be utilized towards this objective. The proposed method relies on two steps: first, we enhance the spatial resolution of the satellite image using Convolutional Neural Networks, and second, we fuse the crowd-sourced data with the upscaled version of the satellite image. However, the scope covered in this paper concerns only the first step. Results show that CNNs can enhance the resolution of Landsat 8 scenes both visually and quantitatively.
APA, Harvard, Vancouver, ISO, and other styles
36

Naseem, Muhammad Tahir, Haneol Seo, Na-Hyun Kim, and Chan-Su Lee. "Pathological Gait Classification Using Early and Late Fusion of Foot Pressure and Skeleton Data." Applied Sciences 14, no. 2 (January 9, 2024): 558. http://dx.doi.org/10.3390/app14020558.

Full text
Abstract:
Classifying pathological gaits is crucial for identifying impairments in specific areas of the human body. Previous studies have extensively employed machine learning and deep learning (DL) methods, using various wearable (e.g., inertial sensors) and non-wearable (e.g., foot pressure plates and depth cameras) sensors. This study proposes early and late fusion methods through DL to categorize one normal and five abnormal (antalgic, lurch, steppage, stiff-legged, and Trendelenburg) pathological gaits. Initially, single-modal approaches were utilized: first, foot pressure data were augmented for transformer-based models; second, skeleton data were applied to a spatiotemporal graph convolutional network (ST-GCN). Subsequently, a multi-modal approach using early fusion by concatenating features from both the foot pressure and skeleton datasets was introduced. Finally, multi-modal fusions, applying early fusion to the feature vector and late fusion by merging outputs from both modalities with and without varying weights, were evaluated. The foot pressure-based and skeleton-based models achieved 99.04% and 78.24% accuracy, respectively. The proposed multi-modal approach using early fusion achieved 99.86% accuracy, whereas the late fusion method achieved 96.95% accuracy without weights and 99.17% accuracy with different weights. Thus, the proposed multi-modal models using early fusion methods demonstrated state-of-the-art performance on the GIST pathological gait database.
APA, Harvard, Vancouver, ISO, and other styles
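The abstract above contrasts early fusion (concatenating modality features) with late fusion (merging per-modality outputs). The sketch below illustrates both strategies with generic classifiers on synthetic foot-pressure and skeleton features; the paper's transformer and ST-GCN backbones are not reproduced, and all sizes are assumptions.

```python
# Hedged sketch contrasting early and late fusion with two generic classifiers
# on synthetic foot-pressure and skeleton features (stand-ins only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(6)
n = 900
pressure = rng.normal(size=(n, 20))   # foot-pressure features (stand-in)
skeleton = rng.normal(size=(n, 30))   # skeleton features (stand-in)
y = (pressure[:, 0] + skeleton[:, 0] > 0).astype(int)
idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.25, random_state=0)

# Early fusion: concatenate features and train a single model
fused = np.hstack([pressure, skeleton])
early = LogisticRegression(max_iter=1000).fit(fused[idx_tr], y[idx_tr])
acc_early = accuracy_score(y[idx_te], early.predict(fused[idx_te]))

# Late fusion: train one model per modality, then average predicted probabilities
m1 = LogisticRegression(max_iter=1000).fit(pressure[idx_tr], y[idx_tr])
m2 = LogisticRegression(max_iter=1000).fit(skeleton[idx_tr], y[idx_tr])
proba = 0.5 * m1.predict_proba(pressure[idx_te]) + 0.5 * m2.predict_proba(skeleton[idx_te])
acc_late = accuracy_score(y[idx_te], proba.argmax(axis=1))

print("early fusion:", acc_early, "late fusion:", acc_late)
```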
37

Chopparapu, SaiTeja, and Joseph Beatrice Seventline. "An Efficient Multi-modal Facial Gesture-based Ensemble Classification and Reaction to Sound Framework for Large Video Sequences." Engineering, Technology & Applied Science Research 13, no. 4 (August 9, 2023): 11263–70. http://dx.doi.org/10.48084/etasr.6087.

Full text
Abstract:
Machine learning-based feature extraction and classification models play a vital role in evaluating and detecting patterns in multivariate facial expressions. Most conventional feature extraction and multi-modal pattern detection models are independent of filters for multi-class classification problems. In traditional multi-modal facial feature extraction models, it is difficult to detect the dependent correlated feature sets and use ensemble classification processes. This study used advanced feature filtering, feature extraction measures, and ensemble multi-class expression prediction to optimize the efficiency of feature classification. A filter-based multi-feature ranking-based voting framework was implemented on different multiple-based classifiers. Experimental results were evaluated on different multi-modal facial features for the automatic emotions listener using a speech synthesis library. The evaluation results showed that the proposed model had better feature classification, feature selection, prediction, and runtime than traditional approaches on heterogeneous facial databases.
APA, Harvard, Vancouver, ISO, and other styles
38

Ma’sum, Muhammad Anwar. "Intelligent Clustering and Dynamic Incremental Learning to Generate Multi-Codebook Fuzzy Neural Network for Multi-Modal Data Classification." Symmetry 12, no. 4 (April 24, 2020): 679. http://dx.doi.org/10.3390/sym12040679.

Full text
Abstract:
Classification in multi-modal data is one of the challenges in the machine learning field. The multi-modal data need special treatment as their features are distributed across several areas. This study proposes multi-codebook fuzzy neural networks by using intelligent clustering and dynamic incremental learning for multi-modal data classification. In this study, we utilized intelligent K-means clustering based on anomalous patterns and intelligent K-means clustering based on histogram information. In this study, clustering is used to generate codebook candidates before the training process, while incremental learning is utilized when the condition to generate a new codebook is sufficient. The condition to generate a new codebook in incremental learning is based on the similarity of the winner class and other classes. The proposed method was evaluated in synthetic and benchmark datasets. The experiment results showed that the proposed multi-codebook fuzzy neural networks that use dynamic incremental learning have significant improvements compared to the original fuzzy neural networks. The improvements were 15.65%, 5.31% and 11.42% on the synthetic dataset, the benchmark dataset, and the average of all datasets, respectively, for incremental version 1. The incremental learning version 2 improved by 21.08%, 4.63%, and 14.35% on the synthetic dataset, the benchmark dataset, and the average of all datasets, respectively. The multi-codebook fuzzy neural networks that use intelligent clustering also had significant improvements compared to the original fuzzy neural networks, achieving 23.90%, 2.10%, and 15.02% improvements on the synthetic dataset, the benchmark dataset, and the average of all datasets, respectively.
APA, Harvard, Vancouver, ISO, and other styles
39

Sen, Atriya, Beckett Sterner, Nico Franz, Caleb Powel, and Nathan Upham. "Combining Machine Learning & Reasoning for Biodiversity Data Intelligence." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 17 (May 18, 2021): 14911–19. http://dx.doi.org/10.1609/aaai.v35i17.17750.

Full text
Abstract:
The current crisis in global natural resource management makes it imperative that we better leverage the vast data sources associated with taxonomic entities (such as recognized species of plants and animals), which are known collectively as biodiversity data. However, these data pose considerable challenges for artificial intelligence: while growing rapidly in volume, they remain highly incomplete for many taxonomic groups, often show conflicting signals from different sources, and are multi-modal and therefore constantly changing in structure. In this paper, we motivate, describe, and present a novel workflow combining machine learning and automated reasoning, to discover patterns of taxonomic identity and change - i.e. “taxonomic intelligence” - leading to scalable and broadly impactful AI solutions within the bio-data realm.
APA, Harvard, Vancouver, ISO, and other styles
40

Zhang, Xue, Fusen Guo, Tao Chen, Lei Pan, Gleb Beliakov, and Jianzhang Wu. "A Brief Survey of Machine Learning and Deep Learning Techniques for E-Commerce Research." Journal of Theoretical and Applied Electronic Commerce Research 18, no. 4 (December 4, 2023): 2188–216. http://dx.doi.org/10.3390/jtaer18040110.

Full text
Abstract:
The rapid growth of e-commerce has significantly increased the demand for advanced techniques to address specific tasks in the e-commerce field. In this paper, we present a brief survey of machine learning and deep learning techniques in the context of e-commerce, focusing on the years 2018–2023 in a Google Scholar search, with the aim of identifying state-of-the-art approaches, main topics, and potential challenges in the field. We first introduce the applied machine learning and deep learning techniques, spanning from support vector machines, decision trees, and random forests to conventional neural networks, recurrent neural networks, generative adversarial networks, and beyond. Next, we summarize the main topics, including sentiment analysis, recommendation systems, fake review detection, fraud detection, customer churn prediction, customer purchase behavior prediction, prediction of sales, product classification, and image recognition. Finally, we discuss the main challenges and trends, which are related to imbalanced data, over-fitting and generalization, multi-modal learning, interpretability, personalization, chatbots, and virtual assistance. This survey offers a concise overview of the current state and future directions regarding the use of machine learning and deep learning techniques in the context of e-commerce. Further research and development will be necessary to address the evolving challenges and opportunities presented by the dynamic e-commerce landscape.
APA, Harvard, Vancouver, ISO, and other styles
41

Shangaranarayanee, N. P., V. Aakashbabu, M. Balamurugan, and R. Gokulraj. "Machine Learning Driven Smart Transportation Sharing." Journal of ISMAC 6, no. 1 (March 2024): 1–12. http://dx.doi.org/10.36548/jismac.2024.1.001.

Full text
Abstract:
In many urban areas, traffic congestion has become one of the most challenging issues of modern life, resulting in detrimental effects on the environment, productivity loss, fuel wastage, and longer travel times. As a solution, people are increasingly turning to shared transportation modes due to the convenience of multi-modal journeys facilitated by smart transportation systems. The last mile problem refers to the fact that, in large cities, buses and trains deliver passengers to transit stations close to retail and job areas, leaving them needing another form of transportation to reach their final destination. By promoting the use of public transportation and addressing this issue, a smart bike-sharing system can contribute to reducing traffic congestion. The study presents a review of various methods associated with the design of bike-sharing systems and suggests a model incorporating several of these methods to derive solutions, with a focus on utilizing clustering algorithms for the analysis of the provided time-series dataset. The study reveals that the application of algorithms such as K-Means, Fuzzy C-means, etc. would be very effective in visualizing the resulting clusters and improving forecasting accuracy.
APA, Harvard, Vancouver, ISO, and other styles
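The clustering step the review above highlights can be illustrated by running K-Means over station-level demand profiles. The sketch below builds toy hourly profiles for "commuter" and "leisure" stations and clusters them; all shapes and station types are assumptions for illustration only.

```python
# Illustrative sketch of K-Means clustering on station-level daily demand
# profiles from a bike-sharing time series (synthetic profiles; all shapes
# and station types are assumptions).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
hours = np.arange(24)
# Toy stations: commuter-shaped (morning/evening peaks) vs. flat leisure profiles
commuter = np.exp(-(hours - 8) ** 2 / 4) + np.exp(-(hours - 18) ** 2 / 4)
leisure = np.exp(-(hours - 14) ** 2 / 20)
profiles = np.vstack([commuter + 0.1 * rng.normal(size=24) for _ in range(30)] +
                     [leisure + 0.1 * rng.normal(size=24) for _ in range(30)])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
print("cluster sizes:", np.bincount(km.labels_))
print("peak hour of each cluster center:", km.cluster_centers_.argmax(axis=1))
```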
42

Bednarek, Michal, Piotr Kicki, and Krzysztof Walas. "On Robustness of Multi-Modal Fusion—Robotics Perspective." Electronics 9, no. 7 (July 16, 2020): 1152. http://dx.doi.org/10.3390/electronics9071152.

Full text
Abstract:
The efficient multi-modal fusion of data streams from different sensors is a crucial ability that a robotic perception system should exhibit to ensure robustness against disturbances. However, as the volume and dimensionality of sensory feedback increase, it might be difficult to manually design a multimodal-data fusion system that can handle heterogeneous data. Nowadays, multi-modal machine learning is an emerging field with research focused mainly on analyzing vision and audio information. From the robotics perspective, however, haptic sensations experienced during interaction with an environment are essential for successfully executing useful tasks. In our work, we compared four learning-based multi-modal fusion methods on three publicly available datasets containing haptic signals, images, and robots' poses. During tests, we considered three tasks involving such data, namely grasp outcome classification, texture recognition, and—most challenging—multi-label classification of haptic adjectives based on haptic and visual data. The conducted experiments focused not only on verifying the performance of each method but mainly on their robustness against data degradation. We focused on this aspect of multi-modal fusion, as it has rarely been considered in the research literature, and such degradation of sensory feedback might occur during robot interaction with its environment. Additionally, we verified the usefulness of data augmentation to increase the robustness of the aforementioned data fusion methods.
APA, Harvard, Vancouver, ISO, and other styles
43

Althenayan, Albatoul S., Shada A. AlSalamah, Sherin Aly, Thamer Nouh, Bassam Mahboub, Laila Salameh, Metab Alkubeyyer, and Abdulrahman Mirza. "COVID-19 Hierarchical Classification Using a Deep Learning Multi-Modal." Sensors 24, no. 8 (April 20, 2024): 2641. http://dx.doi.org/10.3390/s24082641.

Full text
Abstract:
Coronavirus disease 2019 (COVID-19), originating in China, has rapidly spread worldwide. Physicians must examine infected patients and make timely decisions to isolate them. However, completing these processes is difficult due to limited time and availability of expert radiologists, as well as limitations of the reverse-transcription polymerase chain reaction (RT-PCR) method. Deep learning, a sophisticated machine learning technique, leverages radiological imaging modalities for disease diagnosis and image classification tasks. Previous research on COVID-19 classification has encountered several limitations, including binary classification methods, single-feature modalities, small public datasets, and reliance on CT diagnostic processes. Additionally, studies have often utilized a flat structure, disregarding the hierarchical structure of pneumonia classification. This study aims to overcome these limitations by identifying pneumonia caused by COVID-19, distinguishing it from other types of pneumonia and healthy lungs using chest X-ray (CXR) images and related tabular medical data, and to demonstrate the value of incorporating tabular medical data in achieving more accurate diagnoses. Resnet-based and VGG-based pre-trained convolutional neural network (CNN) models were employed to extract features, which were then combined using early fusion for the classification of eight distinct classes. We leveraged the hierarchical structure of pneumonia classification within our approach to achieve improved classification outcomes. Since an imbalanced dataset is common in this field, a variety of versions of generative adversarial networks (GANs) were used to generate synthetic data. The proposed approach, tested on our private dataset of 4523 patients, achieved a macro-avg F1-score of 95.9% and an F1-score of 87.5% for COVID-19 identification using a Resnet-based structure. In conclusion, in this study, we were able to create an accurate multi-modal deep learning model to diagnose COVID-19 and differentiate it from other kinds of pneumonia and normal lungs, which will enhance the radiological diagnostic process.
APA, Harvard, Vancouver, ISO, and other styles
44

Chalumuri, Yekanth Ram, Jacob P. Kimball, Azin Mousavi, Jonathan S. Zia, Christopher Rolfes, Jesse D. Parreira, Omer T. Inan, and Jin-Oh Hahn. "Classification of Blood Volume Decompensation State via Machine Learning Analysis of Multi-Modal Wearable-Compatible Physiological Signals." Sensors 22, no. 4 (February 10, 2022): 1336. http://dx.doi.org/10.3390/s22041336.

Full text
Abstract:
This paper presents a novel computational algorithm to estimate blood volume decompensation state based on machine learning (ML) analysis of multi-modal wearable-compatible physiological signals. To the best of our knowledge, our algorithm may be the first of its kind which can not only discriminate normovolemia from hypovolemia but also classify hypovolemia into absolute hypovolemia and relative hypovolemia. We realized our blood volume classification algorithm by (i) extracting a multitude of features from multi-modal physiological signals including the electrocardiogram (ECG), the seismocardiogram (SCG), the ballistocardiogram (BCG), and the photoplethysmogram (PPG), (ii) constructing two ML classifiers using the features, one to classify normovolemia vs. hypovolemia and the other to classify hypovolemia into absolute hypovolemia and relative hypovolemia, and (iii) sequentially integrating the two to enable multi-class classification (normovolemia, absolute hypovolemia, and relative hypovolemia). We developed the blood volume decompensation state classification algorithm using the experimental data collected from six animals undergoing normovolemia, relative hypovolemia, and absolute hypovolemia challenges. Leave-one-subject-out analysis showed that our classification algorithm achieved an F1 score and accuracy of (i) 0.93 and 0.89 in classifying normovolemia vs. hypovolemia, (ii) 0.88 and 0.89 in classifying hypovolemia into absolute hypovolemia and relative hypovolemia, and (iii) 0.77 and 0.81 in classifying the overall blood volume decompensation state. The analysis of the features embedded in the ML classifiers indicated that many features are physiologically plausible, and that multi-modal SCG-BCG fusion may play an important role in achieving good blood volume classification efficacy. Our work may complement existing computational algorithms to estimate blood volume compensatory reserve as a potential decision-support tool to provide guidance on context-sensitive hypovolemia therapeutic strategy.
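The sequential integration of the two binary classifiers described above can be pictured with a small scikit-learn sketch: one model separates normovolemia from hypovolemia, a second separates absolute from relative hypovolemia, and a cascade turns them into a three-class decision. The synthetic features, labels, and random-forest choice are illustrative assumptions, not the authors' feature set or models.

```python
# Sketch of a two-stage cascade for three-class blood volume state classification.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))      # stand-in for ECG/SCG/BCG/PPG-derived features
y = rng.integers(0, 3, size=300)    # 0 = normovolemia, 1 = absolute, 2 = relative hypovolemia

# Stage 1: normovolemia vs. hypovolemia (binary)
stage1 = RandomForestClassifier(random_state=0).fit(X, (y > 0).astype(int))
# Stage 2: absolute vs. relative hypovolemia, trained on hypovolemic samples only
hypo = y > 0
stage2 = RandomForestClassifier(random_state=0).fit(X[hypo], (y[hypo] == 2).astype(int))

def predict_state(x):
    """Cascade the two binary classifiers into a three-class decision."""
    x = np.atleast_2d(x)
    if stage1.predict(x)[0] == 0:
        return "normovolemia"
    return "relative hypovolemia" if stage2.predict(x)[0] == 1 else "absolute hypovolemia"

print(predict_state(X[0]))
```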
APA, Harvard, Vancouver, ISO, and other styles
45

Jo, Saehan, and Immanuel Trummer. "ThalamusDB: Approximate Query Processing on Multi-Modal Data." Proceedings of the ACM on Management of Data 2, no. 3 (May 29, 2024): 1–26. http://dx.doi.org/10.1145/3654989.

Full text
Abstract:
We introduce ThalamusDB, a novel approximate query processing system that processes complex SQL queries on multi-modal data. ThalamusDB supports SQL queries integrating natural language predicates on visual, audio, and text data. To answer such queries, ThalamusDB exploits a collection of zero-shot models in combination with relational processing. ThalamusDB utilizes deterministic approximate query processing, harnessing the relative efficiency of relational processing to mitigate the computational demands of machine learning inference. For evaluating a natural language predicate, ThalamusDB requests a small number of labels from users. Users can specify their preferences on the performance objective regarding the three relevant metrics: approximation error, computation time, and labeling overhead. The ThalamusDB query optimizer chooses optimized plans according to user preferences, prioritizing data processing and requested labels to maximize impact. Experiments with several real-world data sets, taken from Craigslist, YouTube, and Netflix, show that ThalamusDB achieves an average speedup of 35.0x over MindsDB, an exact processing baseline, and outperforms ABAE, a sampling-based method, in 78.9% of cases.
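The core idea of deterministic approximate processing of an expensive natural-language predicate can be sketched as follows: rows are scored in prioritized batches by a (stand-in) model, and lower and upper bounds on the result are maintained until they meet a user-specified error budget. The predicate function, batching scheme, and error model here are assumptions for illustration and do not reflect ThalamusDB's actual optimizer or query syntax.

```python
# Sketch of deterministic approximate evaluation of a costly predicate.
import random

random.seed(0)
rows = [{"id": i, "caption": f"listing {i}"} for i in range(1000)]

def nl_predicate(row):
    """Stand-in for an expensive zero-shot model scoring a natural language condition."""
    return random.random() < 0.3

def approximate_count(rows, max_error=50, batch_size=100):
    evaluated, matches = 0, 0
    for start in range(0, len(rows), batch_size):
        for row in rows[start:start + batch_size]:
            matches += nl_predicate(row)
            evaluated += 1
        lower = matches                               # assume all unseen rows fail
        upper = matches + (len(rows) - evaluated)     # assume all unseen rows match
        if upper - lower <= max_error:                # deterministic error bound met
            return lower, upper
    return matches, matches

print(approximate_count(rows))
```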
APA, Harvard, Vancouver, ISO, and other styles
46

Zhang, Jianhua, Zhong Yin, Peng Chen, and Stefano Nichele. "Emotion recognition using multi-modal data and machine learning techniques: A tutorial and review." Information Fusion 59 (July 2020): 103–26. http://dx.doi.org/10.1016/j.inffus.2020.01.011.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Mansouri, Nesrin, Daniel Balvay, Omar Zenteno, Caterina Facchin, Thulaciga Yoganathan, Thomas Viel, Joaquin Lopez Herraiz, Bertrand Tavitian, and Mailyn Pérez-Liva. "Machine Learning of Multi-Modal Tumor Imaging Reveals Trajectories of Response to Precision Treatment." Cancers 15, no. 6 (March 14, 2023): 1751. http://dx.doi.org/10.3390/cancers15061751.

Full text
Abstract:
The standard assessment of response to cancer treatments is based on gross tumor characteristics, such as tumor size or glycolysis, which provide very indirect information about the effect of precision treatments on the pharmacological targets of tumors. Several advanced imaging modalities allow for the visualization of targeted tumor hallmarks. Descriptors extracted from these images can help establish new classifications of precision treatment response. We propose a machine learning (ML) framework to analyze metabolic–anatomical–vascular imaging features from positron emission tomography, ultrafast Doppler, and computed tomography in a mouse model of paraganglioma undergoing anti-angiogenic treatment with sunitinib. Imaging features from the follow-up of the sunitinib-treated (n = 8, imaged once per week for 6 weeks) and sham-treated (n = 8, imaged once per week for 3 weeks) groups of mice were dimensionally reduced and analyzed with hierarchical clustering analysis (HCA). The classes extracted from HCA were used with 10 ML classifiers to find a generalized tumor stage prediction model, which was validated with an independent dataset of sunitinib-treated mice. HCA provided three stages of treatment response that were validated using the best-performing ML classifier. The Gaussian naive Bayes classifier showed the best performance, with a training accuracy of 98.7 and an average area under the curve of 100. Our results show that metabolic–anatomical–vascular markers allow defining treatment response trajectories that reflect the efficacy of an anti-angiogenic drug on the tumor target hallmark.
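A compact sketch of the analysis pipeline described above, with synthetic features standing in for the PET/ultrafast-Doppler/CT descriptors: features are dimensionally reduced, hierarchical clustering derives response stages, and a Gaussian naive Bayes classifier is trained to predict those stages. The feature counts, cluster number, and train/test split are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: dimensionality reduction -> hierarchical clustering -> stage classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(2)
features = rng.normal(size=(96, 30))   # imaging sessions x imaging descriptors (synthetic)

reduced = PCA(n_components=5).fit_transform(features)                 # dimensionality reduction
stages = AgglomerativeClustering(n_clusters=3).fit_predict(reduced)   # HCA-derived response stages

X_tr, X_te, y_tr, y_te = train_test_split(reduced, stages, random_state=0)
clf = GaussianNB().fit(X_tr, y_tr)                                    # stage-prediction model
print("held-out stage accuracy:", round(accuracy_score(y_te, clf.predict(X_te)), 3))
```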
APA, Harvard, Vancouver, ISO, and other styles
48

Ullah, Ubaid, Jeong-Sik Lee, Chang-Hyeon An, Hyeonjin Lee, Su-Yeong Park, Rock-Hyun Baek, and Hyun-Chul Choi. "A Review of Multi-Modal Learning from the Text-Guided Visual Processing Viewpoint." Sensors 22, no. 18 (September 8, 2022): 6816. http://dx.doi.org/10.3390/s22186816.

Full text
Abstract:
For decades, correlating different data domains to attain the maximum potential of machines has driven research, especially in neural networks. Similarly, text and visual data (images and videos) are two distinct data domains with extensive research in the past. Recently, using natural language to process 2D or 3D images and videos with the immense power of neural networks has shown a promising future. Despite the diverse range of remarkable work in this field, notably in the past few years, rapid improvements have also solved future challenges for researchers. Moreover, the connection between these two domains has mainly been addressed through GANs, thus limiting the horizons of this field. This review analyzes Text-to-Image (T2I) synthesis within the broader picture of Text-guided Visual output (T2Vo), with the primary goal of highlighting the gaps by proposing a more comprehensive taxonomy. We broadly categorize text-guided visual output into three main divisions and meaningful subdivisions by critically examining an extensive body of literature from top-tier computer vision venues and closely related fields, such as machine learning and human–computer interaction, aiming at state-of-the-art models with a comparative analysis. This study follows up on previous surveys on T2I, adding value by evaluating the diverse range of existing methods, including different generative models and several types of visual output, critically examining various approaches, highlighting their shortcomings, and suggesting future directions of research.
APA, Harvard, Vancouver, ISO, and other styles
49

Jiao, Zhuqing, Siwei Chen, Haifeng Shi, and Jia Xu. "Multi-Modal Feature Selection with Feature Correlation and Feature Structure Fusion for MCI and AD Classification." Brain Sciences 12, no. 1 (January 5, 2022): 80. http://dx.doi.org/10.3390/brainsci12010080.

Full text
Abstract:
Feature selection for multiple types of data has been widely applied in mild cognitive impairment (MCI) and Alzheimer’s disease (AD) classification research. Combining multi-modal data for classification can better exploit the complementarity of valuable information. In order to improve the classification performance of feature selection on multi-modal data, we propose a multi-modal feature selection algorithm using feature correlation and feature structure fusion (FC2FS). First, we construct feature correlation regularization by fusing a similarity matrix between multi-modal feature nodes. Then, based on manifold learning, we employ feature matrix fusion to construct feature structure regularization and learn the local geometric structure of the feature nodes. Finally, the two regularizations are embedded in a multi-task learning model that introduces a low-rank constraint, the multi-modal features are selected, and the final features are linearly fused and input into a support vector machine (SVM) for classification. Different controlled experiments were set up to verify the validity of the proposed method, which was applied to MCI and AD classification. The accuracies for normal controls versus Alzheimer’s disease, normal controls versus late mild cognitive impairment, normal controls versus early mild cognitive impairment, and early mild cognitive impairment versus late mild cognitive impairment reach 91.85 ± 1.42%, 85.33 ± 2.22%, 78.29 ± 2.20%, and 77.67 ± 1.65%, respectively. This method makes up for the shortcomings of traditional subject-based multi-modal feature selection and fully considers the relationships between feature nodes and the local geometric structure of the feature space. Our study not only enhances the interpretability of feature selection but also improves the classification performance, which provides a useful reference for the identification of MCI and AD.
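A heavily simplified stand-in for the pipeline above can be sketched in scikit-learn: features are selected from each modality, the selected features are linearly fused, and an SVM performs the classification. The FC2FS correlation and structure regularizations are not reproduced here; the data shapes, the univariate selector, and the labels are assumptions for illustration only.

```python
# Simplified stand-in: per-modality feature selection, linear fusion, SVM classification.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 120
mri = rng.normal(size=(n, 90))      # stand-in for structural MRI features
pet = rng.normal(size=(n, 90))      # stand-in for PET features
y = rng.integers(0, 2, size=n)      # e.g., normal controls vs. AD labels (synthetic)

# Per-modality feature selection (stand-in for the joint regularized selection in FC2FS)
mri_sel = SelectKBest(f_classif, k=20).fit_transform(mri, y)
pet_sel = SelectKBest(f_classif, k=20).fit_transform(pet, y)

fused = np.hstack([mri_sel, pet_sel])           # linear fusion of selected features
acc = cross_val_score(SVC(kernel="linear"), fused, y, cv=5).mean()
print("cross-validated accuracy:", round(acc, 3))
```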
APA, Harvard, Vancouver, ISO, and other styles
50

Qiu, Chen, Stephan Mandt, and Maja Rudolph. "History Marginalization Improves Forecasting in Variational Recurrent Neural Networks." Entropy 23, no. 12 (November 24, 2021): 1563. http://dx.doi.org/10.3390/e23121563.

Full text
Abstract:
Deep probabilistic time series forecasting models have become an integral part of machine learning. While several powerful generative models have been proposed, we provide evidence that their associated inference models are oftentimes too limited and cause the generative model to predict mode-averaged dynamics. Mode-averaging is problematic since many real-world sequences are highly multi-modal, and their averaged dynamics are unphysical (e.g., predicted taxi trajectories might run through buildings on the street map). To better capture multi-modality, we develop variational dynamic mixtures (VDM): a new variational family to infer sequential latent variables. The VDM approximate posterior at each time step is a mixture density network, whose parameters come from propagating multiple samples through a recurrent architecture. This results in an expressive multi-modal posterior approximation. In an empirical study, we show that VDM outperforms competing approaches on highly multi-modal datasets from different domains.
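The mixture-density posterior described above can be pictured with a small PyTorch sketch: several candidate samples are propagated through a shared recurrent cell, and each resulting hidden state parameterizes one Gaussian component of the approximate posterior. The dimensions, the number of components, and the choice of a GRU cell are illustrative assumptions rather than the authors' reference implementation.

```python
# Sketch of a mixture-density posterior built from multiple recurrent sample paths.
import torch
import torch.nn as nn
import torch.distributions as D

class MixturePosterior(nn.Module):
    def __init__(self, x_dim=3, z_dim=2, h_dim=16):
        super().__init__()
        self.cell = nn.GRUCell(x_dim, h_dim)
        self.mean = nn.Linear(h_dim, z_dim)
        self.log_std = nn.Linear(h_dim, z_dim)
        self.logit = nn.Linear(h_dim, 1)

    def forward(self, x_t, h_samples):
        # h_samples: list of K hidden states, each (batch, h_dim), one per sample path
        h = torch.stack([self.cell(x_t, h_k) for h_k in h_samples], dim=1)  # (B, K, h_dim)
        mix = D.Categorical(logits=self.logit(h).squeeze(-1))               # mixture weights (B, K)
        comps = D.Independent(D.Normal(self.mean(h), self.log_std(h).exp()), 1)
        return D.MixtureSameFamily(mix, comps)

posterior = MixturePosterior()
x_t = torch.randn(5, 3)
h_samples = [torch.zeros(5, 16) for _ in range(4)]   # K = 4 candidate paths
q = posterior(x_t, h_samples)
z_t = q.sample()
print(z_t.shape, q.log_prob(z_t).shape)   # torch.Size([5, 2]) torch.Size([5])
```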
APA, Harvard, Vancouver, ISO, and other styles