Journal articles on the topic "Multimodal data processing"

Follow this link to see other types of publications on the topic: Multimodal data processing.

Cite a source in APA, MLA, Chicago, Harvard, and many other styles

See the top 50 journal articles for research on the topic "Multimodal data processing".

Next to each source in the reference list there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication in .pdf format and read the abstract (summary) of the work online, if it is present in the metadata.

Browse journal articles from many scientific fields and compile a correct bibliography.

1

Kyselova, A. H., G. D. Kiselov, A. A. Serhyeyev, and A. V. Shalaginov. "Processing input data in multimodal applications". Electronics and Communications 16, no. 2 (28 March 2011): 86–92. http://dx.doi.org/10.20535/2312-1807.2011.16.2.268253.

2

Boyko, Nataliya. "Models and Algorithms for Multimodal Data Processing". WSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS 20 (14 marzo 2023): 87–97. http://dx.doi.org/10.37394/23209.2023.20.11.

Testo completo
Gli stili APA, Harvard, Vancouver, ISO e altri
Abstract (sommario):
Information technologies and computer equipment are used in almost all areas of activity, which is why new areas of their use are emerging, and the level of ICT implementation is deepening, with more and more functions that were the prerogative of humans being assigned to computers. As science and technology develop, new technologies and technical means are emerging that enable a human-centered approach to software development, better adaptation of human-machine interfaces to user needs, and an increase in the ergonomics of software products, etc. These measures contribute to the formation of fundamentally new opportunities for presenting and processing information about real-world objects with which an individual interacts in production, educational and everyday activities in computer systems. The article aims to identify current models and algorithms for processing multimodal data in computer systems based on a survey of company employees and to analyze these models and algorithms to determine the benefits of using models and algorithms for processing multimodal data. Research methods: comparative analysis; systematization; generalization; survey. Results. It has been established that the recommended multimodal data representation models (the mixed model, the spatiotemporal linked model, and the multilevel ontological model) allow for representing the digital twin of the object under study at differentiated levels of abstraction, and these multimodal data processing models can be combined to obtain the most informative way to describe the physical twin. As a result of the study, it was found that the "general judgment of the experience of using models and algorithms for multimodal data processing" was noted by the respondents in the item "Personally, I would say that models and algorithms for multimodal data processing are practical" with an average value of 8.16 (SD = 1.70), and in the item "Personally, I would say that models and algorithms for multimodal data processing are understandable (not confusing)" with an average value of 7.52. It has been determined that respondents positively evaluate (with scores above 5.0) models and algorithms for processing multimodal data in work environments as practical, understandable, manageable, and original.
3

Parsons, Aaron D., Stephen W. T. Price, Nicola Wadeson, Mark Basham, Andrew M. Beale, Alun W. Ashton, J. Frederick W. Mosselmans, and Paul D. Quinn. "Automatic processing of multimodal tomography datasets". Journal of Synchrotron Radiation 24, no. 1 (1 January 2017): 248–56. http://dx.doi.org/10.1107/s1600577516017756.

Abstract:
With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data that will be collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of data of multimodal and multidimensional scientific datasets output such as those from chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.
4

Qi, Qingfu, Liyuan Lin, and Rui Zhang. "Feature Extraction Network with Attention Mechanism for Data Enhancement and Recombination Fusion for Multimodal Sentiment Analysis". Information 12, no. 9 (24 August 2021): 342. http://dx.doi.org/10.3390/info12090342.

Abstract:
Multimodal sentiment analysis and emotion recognition represent a major research direction in natural language processing (NLP). With the rapid development of online media, people often express their emotions on a topic in the form of video, and the signals it transmits are multimodal, including language, visual, and audio. Therefore, the traditional unimodal sentiment analysis method is no longer applicable, which requires the establishment of a fusion model of multimodal information to obtain sentiment understanding. In previous studies, scholars used the feature vector cascade method when fusing multimodal data at each time step in the middle layer. This method puts each modal information in the same position and does not distinguish between strong modal information and weak modal information among multiple modalities. At the same time, this method does not pay attention to the embedding characteristics of multimodal signals across the time dimension. In response to the above problems, this paper proposes a new method and model for processing multimodal signals, which takes into account the delay and hysteresis characteristics of multimodal signals across the time dimension. The purpose is to obtain a multimodal fusion feature emotion analysis representation. We evaluate our method on the multimodal sentiment analysis benchmark dataset CMU Multimodal Opinion Sentiment and Emotion Intensity Corpus (CMU-MOSEI). We compare our proposed method with the state-of-the-art model and show excellent results.
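
To make the fusion baseline discussed in this abstract concrete, the following minimal PyTorch sketch (not the authors' code; feature dimensions are assumptions) shows the per-time-step feature-vector cascade, i.e. plain concatenation, which treats all modalities as equally important.

# Illustrative sketch only: concatenation fusion of aligned modality sequences.
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Fuses language, visual and audio features at each time step by simple
    concatenation, without distinguishing strong from weak modalities."""
    def __init__(self, dim_lang=300, dim_vis=35, dim_aud=74, dim_out=128):
        super().__init__()
        self.proj = nn.Linear(dim_lang + dim_vis + dim_aud, dim_out)

    def forward(self, lang, vis, aud):
        # lang/vis/aud: (batch, time, feature_dim) sequences aligned in time
        fused = torch.cat([lang, vis, aud], dim=-1)   # cascade at every step
        return torch.tanh(self.proj(fused))           # (batch, time, dim_out)

# Usage with random stand-in features
lang = torch.randn(8, 50, 300)
vis = torch.randn(8, 50, 35)
aud = torch.randn(8, 50, 74)
print(ConcatFusion()(lang, vis, aud).shape)  # torch.Size([8, 50, 128])

An attention-weighted variant would replace the plain concatenation with learned per-modality weights, which is the direction the paper argues for.
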
5

Chen, Mujun. "Automatic Image Processing Algorithm for Light Environment Optimization Based on Multimodal Neural Network Model". Computational Intelligence and Neuroscience 2022 (3 June 2022): 1–12. http://dx.doi.org/10.1155/2022/5156532.

Abstract:
In this paper, we conduct an in-depth study and analysis of the automatic image processing algorithm based on a multimodal Recurrent Neural Network (m-RNN) for light environment optimization. By analyzing the structure of m-RNN and combining the current research frontiers of image processing and natural language processing, we find out the problem of the ineffectiveness of m-RNN for some image generation descriptions, starting from both the image feature extraction part and text sequence data processing. Unlike traditional image automatic processing algorithms, this algorithm does not need to add complex rules manually. Still, it evaluates and filters through the training image collection and finally generates image automatic processing models by m-RNN. An image semantic segmentation algorithm is proposed based on multimodal attention and adaptive feature fusion. The main idea of the algorithm is to combine adaptive and feature fusion and then introduce data enhancement for small-scale multimodal light environment datasets by extracting the importance between images through multimodal attention. The model proposed in this paper can span the semantic differences of different modalities and construct feature relationships between different modalities to achieve an inferable, interpretable, and scalable feature representation of multimodal data. The automatic processing of light environment images using multimodal neural networks based on traditional algorithms eliminates manual processing and greatly reduces the time and effort of image processing.
6

BASYSTIUK, Oleh, and Nataliia MELNYKOVA. "MULTIMODAL SPEECH RECOGNITION BASED ON AUDIO AND TEXT DATA". Herald of Khmelnytskyi National University. Technical sciences 313, no. 5 (27 October 2022): 22–25. http://dx.doi.org/10.31891/2307-5732-2022-313-5-22-25.

Abstract:
Systems of machine translation of texts from one language to another simulate the work of a human translator. Their performance depends on the ability to understand the grammar rules of the language. In translation, the basic units are not individual words, but word combinations or phraseological units that express different concepts. Only by using them can more complex ideas be expressed through the translated text. A key feature of machine translation is that the input and output sequences differ in length; recurrent neural networks provide an approach for working with such sequences. A recurrent neural network (RNN) is a class of artificial neural network that has connections between nodes; in this case, a connection refers to a connection from a more distant node to a less distant node. The presence of connections allows the RNN to remember and reproduce the entire sequence of reactions to one stimulus. From the point of view of programming, such networks are analogous to cyclic execution, and from the point of view of the system, such networks are equivalent to a state machine. RNNs are commonly used to process word sequences in natural language processing. Traditionally, a hidden Markov model (HMM) and an N-gram language model were used to process a sequence of words. Deep learning has completely changed the approach to machine translation: researchers in the deep learning field have created simple solutions based on machine learning that outperform the best expert systems. This paper reviews the main features of machine translation based on recurrent neural networks. The advantages of systems based on RNNs using the sequence-to-sequence model over statistical translation systems are also highlighted in the article. Two machine translation systems based on the sequence-to-sequence model were constructed using the Keras and PyTorch machine learning libraries. Based on the obtained results, the libraries were analyzed and their performance compared.
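
As an illustration of the sequence-to-sequence idea mentioned above, here is a minimal encoder-decoder sketch in PyTorch; it is not the implementation evaluated in the paper, and the vocabulary sizes and dimensions are arbitrary assumptions.

# Minimal seq2seq sketch: a GRU encoder compresses the source sentence into a
# state, and a GRU decoder generates target-language token logits from it.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=5000, tgt_vocab=6000, emb=256, hidden=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the variable-length source sentence into a fixed-size state.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode the target sentence conditioned on that state (teacher forcing).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)          # (batch, tgt_len, tgt_vocab) logits

model = Seq2Seq()
logits = model(torch.randint(0, 5000, (4, 12)), torch.randint(0, 6000, (4, 15)))
print(logits.shape)  # torch.Size([4, 15, 6000])
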
7

Basystiuk, Oleh, and Nataliya Melnykova. "Development of the Multimodal Handling Interface Based on Google API". Computer Design Systems. Theory and Practice 6, no. 1 (2024): 216–23. http://dx.doi.org/10.23939/cds2024.01.216.

Abstract:
Today, Artificial Intelligence is a daily routine, becoming deeply entrenched in our lives. One of the most popular and rapidly advancing technologies is speech recognition, which forms an integral part of the broader concept of multimodal data handling. Multimodal data encompasses voice, audio, and text data, constituting a multifaceted approach to understanding and processing information. This paper presents the development of a multimodal handling interface leveraging Google API technologies. The interface aims to facilitate seamless integration and management of diverse data modalities, including text, audio, and video, within a unified platform. Through the utilization of Google API functionalities, such as natural language processing, speech recognition, and video analysis, the interface offers enhanced capabilities for processing, analysing, and interpreting multimodal data. The paper discusses the design and implementation of the interface, highlighting its features and functionalities. Furthermore, it explores potential applications and future directions for utilizing the interface in various domains, including healthcare, education, and multimedia content creation. Overall, the development of the multimodal handling interface based on Google API represents a significant step towards advancing multimodal data processing and enhancing user experience in interacting with diverse data sources.
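
For readers unfamiliar with the underlying service, a minimal sketch of the kind of Google Cloud Speech-to-Text call such an interface might wrap is shown below. The file name, audio parameters, and credential setup are assumptions; the paper does not publish its code.

# Hypothetical sketch of a Speech-to-Text request via the google-cloud-speech
# Python client (requires installed package and configured credentials).
from google.cloud import speech

client = speech.SpeechClient()
with open("sample.wav", "rb") as f:                 # assumed 16 kHz mono WAV
    audio = speech.RecognitionAudio(content=f.read())
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)
response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)        # best transcription hypothesis
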
8

Sulema, Yevgeniya. "MULTIMODAL DATA PROCESSING BASED ON ALGEBRAIC SYSTEM OF AGGREGATES RELATIONS". Radio Electronics, Computer Science, Control, no. 1 (15 May 2020): 169–80. http://dx.doi.org/10.15588/1607-3274-2020-1-17.

9

Ren, Jinchang, Junwei Han, and Mauro Dalla Mura. "Special issue on multimodal data fusion for multidimensional signal processing". Multidimensional Systems and Signal Processing 27, no. 4 (8 August 2016): 801–5. http://dx.doi.org/10.1007/s11045-016-0441-0.

10

Chen, Shu-Ching. "Embracing Multimodal Data in Multimedia Data Analysis". IEEE MultiMedia 28, no. 3 (1 July 2021): 5–7. http://dx.doi.org/10.1109/mmul.2021.3104911.

11

Penia, Oleksandr, and Yevgeniya Sulema. "ЗАСТОСУВАННЯ ГЛИБОКИХ ШТУЧНИХ НЕЙРОННИХ МЕРЕЖ ДЛЯ КЛАСИФІКАЦІЇ МУЛЬТИМОДАЛЬНИХ ДАНИХ" [Application of deep artificial neural networks for multimodal data classification]. System technologies 6, no. 149 (1 April 2024): 11–22. http://dx.doi.org/10.34185/1562-9945-6-149-2023-02.

Abstract:
Multimodal data analysis is gaining attention in recent research. Pu Liang et al. (2023) provide a comprehensive overview of multimodal machine learning, highlighting its foundations, challenges and achievements in recent years. More problem-oriented works propose new methods and applications for multimodal ML: for example, Ngiam et al. (2011) propose using joint audio and video data to improve speech recognition accuracy; Sun, Wand and Li (2018) describe the application of multimodal classification for breast cancer prognosis prediction; Mao et al. (2014) propose an architecture of a multimodal recurrent network to generate text descriptions of images, and so on. However, such works usually focus on the task itself and the methods therein, and not on integrating multimodal data processing into other software systems. The goal of this research is to propose a way to conduct multimodal data processing, specifically as a part of a digital twin system, so efficiency and near-real-time operation are required. The paper presents an approach to conduct parallel multimodal data classification, adapting to available computing power. The method is modular and scalable and is intended for use in digital twin applications as a part of analysis and modeling tools. A detailed example of such a software module is then discussed. It uses multimodal data from open datasets to detect and classify the behavior of pets using deep learning models. Videos are processed using two artificial neural networks: the YOLOv3 object detection network to process individual frames of the video and a relatively simple convolutional network to classify sounds based on their frequency spectra. The constructed module uses a producer-consumer parallel processing pattern and processes 5 frames per second of video on the available hardware, which can be further improved by using GPU acceleration or more parallel processing threads.
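
The producer-consumer pattern described in this abstract can be sketched as follows. This is a hypothetical, self-contained illustration: the YOLOv3 and audio-CNN calls are replaced by a stub so the example runs without models or video files.

# Producer-consumer sketch: one producer enqueues frames, several consumer
# threads classify them in parallel.
import queue
import threading

frame_queue = queue.Queue(maxsize=32)
SENTINEL = None

def producer(num_frames=100):
    for i in range(num_frames):
        frame_queue.put(("frame", i))      # stand-in for a decoded video frame
    frame_queue.put(SENTINEL)              # tell consumers to stop

def classify(item):
    kind, idx = item
    return f"{kind} {idx}: label=pet_playing"  # stub for the neural networks

def consumer(results):
    while True:
        item = frame_queue.get()
        if item is SENTINEL:
            frame_queue.put(SENTINEL)      # let the other consumers finish too
            break
        results.append(classify(item))

results = []
workers = [threading.Thread(target=consumer, args=(results,)) for _ in range(4)]
for w in workers:
    w.start()
producer()
for w in workers:
    w.join()
print(len(results), "frames classified")
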
12

Pester, Andreas, Yevgeniya Sulema, Ivan Dychka, and Olga Sulema. "Temporal Multimodal Data-Processing Algorithms Based on Algebraic System of Aggregates". Algorithms 16, no. 4 (29 March 2023): 186. http://dx.doi.org/10.3390/a16040186.

Abstract:
In many tasks related to an object’s observation or real-time monitoring, the gathering of temporal multimodal data is required. Such data sets are semantically connected as they reflect different aspects of the same object. However, data sets of different modalities are usually stored and processed independently. This paper presents an approach based on the application of the Algebraic System of Aggregates (ASA) operations that enable the creation of an object’s complex representation, referred to as multi-image (MI). The representation of temporal multimodal data sets as the object’s MI yields simple data-processing procedures as it provides a solid semantic connection between data describing different features of the same object, process, or phenomenon. In terms of software development, the MI is a complex data structure used for data processing with ASA operations. This paper provides a detailed presentation of this concept.
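
As a rough illustration of the multi-image idea (not an implementation of the ASA formalism itself), the sketch below keeps several modalities aligned on a shared timeline so that observations describing the same object can be retrieved together. Names and modalities are assumptions.

# Simplified "multi-image" container: modality -> {timestamp: observation}.
from dataclasses import dataclass, field

@dataclass
class MultiImage:
    modalities: dict = field(default_factory=dict)

    def add(self, modality, timestamp, value):
        self.modalities.setdefault(modality, {})[timestamp] = value

    def at(self, timestamp):
        """Return every modality's observation recorded at this timestamp."""
        return {m: obs[timestamp] for m, obs in self.modalities.items()
                if timestamp in obs}

mi = MultiImage()
mi.add("temperature", 10.0, 36.6)
mi.add("audio_rms", 10.0, 0.12)
mi.add("temperature", 11.0, 36.7)
print(mi.at(10.0))  # {'temperature': 36.6, 'audio_rms': 0.12}
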
13

Ping, Zou, and Yueyan Liu. "Classification and Visual Design Analysis of Network Expression Based on Big Data Multimodal Intelligence Technology". Discrete Dynamics in Nature and Society 2022 (20 April 2022): 1–7. http://dx.doi.org/10.1155/2022/7542606.

Abstract:
The rapid development of the Internet in modern society has promoted the development of many different network platforms. In the context of big data, many types of multimodal data such as pictures, videos, and texts are generated in the platform. Through the analysis of multimodal data, we can provide better services for users. The traditional big data analysis platform cannot achieve a completely stable state for the analysis of multimodal data. The construction of multimodal intelligent platform can achieve efficient analysis of relevant data, so as to create greater economic benefits for the society. This paper mainly studies the historical development trend of big data multimodal intelligence technology and the data processing method of multimodal intelligence technology applied to network expression classification, including data acquisition, storage, and analysis. Finally, it studied the fusion algorithm between multimodal data and visual design, as well as the classification of network expression and the application result analysis of visual design in big data multimodal intelligence technology.
14

Mingyu, Ji, Zhou Jiawei, and Wei Ning. "AFR-BERT: Attention-based mechanism feature relevance fusion multimodal sentiment analysis model". PLOS ONE 17, no. 9 (9 September 2022): e0273936. http://dx.doi.org/10.1371/journal.pone.0273936.

Abstract:
Multimodal sentiment analysis is an essential task in natural language processing which refers to the fact that machines can analyze and recognize emotions through logical reasoning and mathematical operations after learning multimodal emotional features. For the problem of how to consider the effective fusion of multimodal data and the relevance of multimodal data in multimodal sentiment analysis, we propose an attention-based mechanism feature relevance fusion multimodal sentiment analysis model (AFR-BERT). In the data pre-processing stage, text features are extracted using the pre-trained language model BERT (Bi-directional Encoder Representation from Transformers), and the BiLSTM (Bi-directional Long Short-Term Memory) is used to obtain the internal information of the audio. In the data fusion phase, the multimodal data fusion network effectively fuses multimodal features through the interaction of text and audio information. During the data analysis phase, the multimodal data association network analyzes the data by exploring the correlation of fused information between text and audio. In the data output phase, the model outputs the results of multimodal sentiment analysis. We conducted extensive comparative experiments on the publicly available sentiment analysis datasets CMU-MOSI and CMU-MOSEI. The experimental results show that AFR-BERT improves on the classical multimodal sentiment analysis model in terms of relevant performance metrics. In addition, ablation experiments and example analysis show that the multimodal data analysis network in AFR-BERT can effectively capture and analyze the sentiment features in text and audio.
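
A rough sketch of the pre-processing stage described above is given below; it is not the AFR-BERT release, and the model name and the audio feature size are assumptions.

# Text encoded with a pretrained BERT, frame-level audio features summarized
# with a BiLSTM (requires the transformers and torch packages).
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def encode_text(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    return bert(**inputs).last_hidden_state         # (1, tokens, 768)

audio_bilstm = nn.LSTM(input_size=74, hidden_size=64,
                       bidirectional=True, batch_first=True)

def encode_audio(frames):                            # frames: (1, time, 74)
    out, _ = audio_bilstm(frames)
    return out                                       # (1, time, 128)

text_feat = encode_text("the movie was surprisingly good")
audio_feat = encode_audio(torch.randn(1, 120, 74))
print(text_feat.shape, audio_feat.shape)

The fusion and relevance-analysis networks of the paper would then operate on these two sequences.
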
15

Sohrab, Fahad, Jenni Raitoharju, Alexandros Iosifidis, and Moncef Gabbouj. "Multimodal subspace support vector data description". Pattern Recognition 110 (February 2021): 107648. http://dx.doi.org/10.1016/j.patcog.2020.107648.

16

Kuhnke, Philipp, Markus Kiefer, and Gesa Hartwigsen. "Task-Dependent Functional and Effective Connectivity during Conceptual Processing". Cerebral Cortex 31, no. 7 (3 March 2021): 3475–93. http://dx.doi.org/10.1093/cercor/bhab026.

Abstract:
Conceptual knowledge is central to cognition. Previous neuroimaging research indicates that conceptual processing involves both modality-specific perceptual-motor areas and multimodal convergence zones. For example, our previous functional magnetic resonance imaging (fMRI) study revealed that both modality-specific and multimodal regions respond to sound and action features of concepts in a task-dependent fashion (Kuhnke P, Kiefer M, Hartwigsen G. 2020b. Task-dependent recruitment of modality-specific and multimodal regions during conceptual processing. Cereb Cortex. 30:3938–3959.). However, it remains unknown whether and how modality-specific and multimodal areas interact during conceptual tasks. Here, we asked 1) whether multimodal and modality-specific areas are functionally coupled during conceptual processing, 2) whether their coupling depends on the task, 3) whether information flows top-down, bottom-up or both, and 4) whether their coupling is behaviorally relevant. We combined psychophysiological interaction analyses with dynamic causal modeling on the fMRI data of our previous study. We found that functional coupling between multimodal and modality-specific areas strongly depended on the task, involved both top-down and bottom-up information flow, and predicted conceptually guided behavior. Notably, we also found coupling between different modality-specific areas and between different multimodal areas. These results suggest that functional coupling in the conceptual system is extensive, reciprocal, task-dependent, and behaviorally relevant. We propose a new model of the conceptual system that incorporates task-dependent functional interactions between modality-specific and multimodal areas.
17

Boyko, Nataliya. "Tools for Implementing the Models and Algorithms for Processing Multimodal Data". Computer Science and Information Technology 11, n. 1 (aprile 2023): 1–10. http://dx.doi.org/10.13189/csit.2023.110101.

Testo completo
Gli stili APA, Harvard, Vancouver, ISO e altri
18

Barros, Marcel, Andressa Pinto, Andres Monroy, Felipe Moreno, Jefferson Coelho, Aldomar Pietro Silva, Caio Fabricio Deberaldini Netto et al. "Early Detection of Extreme Storm Tide Events Using Multimodal Data Processing". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 20 (24 March 2024): 21923–31. http://dx.doi.org/10.1609/aaai.v38i20.30194.

Abstract:
Sea-level rise is a well-known consequence of climate change. Several studies have estimated the social and economic impact of the increase in extreme flooding. An efficient way to mitigate its consequences is the development of a flood alert and prediction system, based on high-resolution numerical models and robust sensing networks. However, current models use various simplifying assumptions that compromise accuracy to ensure solvability within a reasonable timeframe, hindering more regular and cost-effective forecasts for various locations along the shoreline. To address these issues, this work proposes a hybrid model for multimodal data processing that combines physics-based numerical simulations, data obtained from a network of sensors, and satellite images to provide refined wave and sea-surface height forecasts, with real results obtained in a critical location within the Port of Santos (the largest port in Latin America). Our approach exhibits faster convergence than data-driven models while achieving more accurate predictions. Moreover, the model handles irregularly sampled time series and missing data without the need for complex preprocessing mechanisms or data imputation while keeping low computational costs through a combination of time encoding, recurrent and graph neural networks. Enabling raw sensor data to be easily combined with existing physics-based models opens up new possibilities for accurate extreme storm tide events forecast systems that enhance community safety and aid policymakers in their decision-making processes.
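
The abstract mentions time encoding as one ingredient for handling irregularly sampled series. One common form is a sinusoidal encoding of raw timestamps, sketched below under our own assumptions rather than the published architecture.

# Sketch: map irregular timestamps to smooth feature vectors a network can ingest.
import numpy as np

def time_encoding(timestamps, dim=8, max_period=24.0 * 3600):
    """timestamps: 1-D array of seconds; returns (len(timestamps), dim)."""
    t = np.asarray(timestamps, dtype=float)[:, None]
    freqs = max_period ** (-np.arange(dim // 2) / (dim // 2))
    angles = t * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# Irregular tide-gauge sampling times (seconds since midnight, hypothetical)
print(time_encoding([0, 370, 1205, 8641]).shape)   # (4, 8)
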
19

Guo, Zhixin, Chaoyang Wang, Jianping Zhou, Guanjie Zheng, Xinbing Wang, and Chenghu Zhou. "GeoKnowledgeFusion: A Platform for Multimodal Data Compilation from Geoscience Literature". Remote Sensing 16, no. 9 (23 April 2024): 1484. http://dx.doi.org/10.3390/rs16091484.

Abstract:
With the advent of big data science, the field of geoscience has undergone a paradigm shift toward data-driven scientific discovery. However, the abundance of geoscience data distributed across multiple sources poses significant challenges to researchers in terms of data compilation, which includes data collection, collation, and database construction. To streamline the data compilation process, we present GeoKnowledgeFusion, a publicly accessible platform for the fusion of text, visual, and tabular knowledge extracted from the geoscience literature. GeoKnowledgeFusion leverages a powerful network of models that provide a joint multimodal understanding of text, image, and tabular data, enabling researchers to efficiently curate and continuously update their databases. To demonstrate the practical applications of GeoKnowledgeFusion, we present two scenarios: the compilation of Sm-Nd isotope data for constructing a domain-specific database and geographic analysis, and the data extraction process for debris flow disasters. The data compilation process for these use cases encompasses various tasks, including PDF pre-processing, target element recognition, human-in-the-loop annotation, and joint multimodal knowledge understanding. The findings consistently reveal patterns that align with manually compiled data, thus affirming the credibility and dependability of our automated data processing tool. To date, GeoKnowledgeFusion has supported forty geoscience research teams within the program by processing over 40,000 documents uploaded by geoscientists.
20

Pawłowski, Maciej, Anna Wróblewska, and Sylwia Sysko-Romańczuk. "Effective Techniques for Multimodal Data Fusion: A Comparative Analysis". Sensors 23, no. 5 (21 February 2023): 2381. http://dx.doi.org/10.3390/s23052381.

Abstract:
Data processing in robotics is currently challenged by the effective building of multimodal and common representations. Tremendous volumes of raw data are available and their smart management is the core concept of multimodal learning in a new paradigm for data fusion. Although several techniques for building multimodal representations have been proven successful, they have not yet been analyzed and compared in a given production setting. This paper explored three of the most common techniques, (1) the late fusion, (2) the early fusion, and (3) the sketch, and compared them in classification tasks. Our paper explored different types of data (modalities) that could be gathered by sensors serving a wide range of sensor applications. Our experiments were conducted on the Amazon Reviews, MovieLens25M, and MovieLens1M datasets. Their outcomes allowed us to confirm that the choice of fusion technique for building multimodal representation is crucial to obtain the highest possible model performance resulting from the proper modality combination. Consequently, we designed criteria for choosing this optimal data fusion technique.
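
Two of the compared strategies, early and late fusion, can be illustrated with a toy scikit-learn sketch on synthetic data; the datasets and models used in the paper are of course different.

# Early fusion: concatenate modality features, train one classifier.
# Late fusion: train one classifier per modality, average the probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
text_x, image_x = rng.normal(size=(200, 20)), rng.normal(size=(200, 10))
y = (text_x[:, 0] + image_x[:, 0] > 0).astype(int)

early = LogisticRegression().fit(np.hstack([text_x, image_x]), y)

clf_text = LogisticRegression().fit(text_x, y)
clf_image = LogisticRegression().fit(image_x, y)
late_proba = (clf_text.predict_proba(text_x) + clf_image.predict_proba(image_x)) / 2

print("early-fusion accuracy:", early.score(np.hstack([text_x, image_x]), y))
print("late-fusion accuracy :", (late_proba.argmax(axis=1) == y).mean())
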
21

Fijalkow, Inbar, Elad Heiman, and Hagit Messer. "Parameter Estimation from Heterogeneous/Multimodal Data Sets". IEEE Signal Processing Letters 23, no. 3 (March 2016): 390–93. http://dx.doi.org/10.1109/lsp.2016.2523886.

22

Katz, Ori, Ronen Talmon, Yu-Lun Lo, and Hau-Tieng Wu. "Alternating diffusion maps for multimodal data fusion". Information Fusion 45 (January 2019): 346–60. http://dx.doi.org/10.1016/j.inffus.2018.01.007.

23

Wu, Laiyun, Jee Eun Kang, Younshik Chung, and Alexander Nikolaev. "Monitoring Multimodal Travel Environment Using Automated Fare Collection Data: Data Processing and Reliability Analysis". Journal of Big Data Analytics in Transportation 1, no. 2-3 (25 November 2019): 123–46. http://dx.doi.org/10.1007/s42421-019-00012-w.

24

Kulvinder Singh, et al. "Enhancing Multimodal Information Retrieval Through Integrating Data Mining and Deep Learning Techniques". International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 9 (30 October 2023): 560–69. http://dx.doi.org/10.17762/ijritcc.v11i9.8844.

Abstract:
Multimodal information retrieval, the task of retrieving relevant information from heterogeneous data sources such as text, images, and videos, has gained significant attention in recent years due to the proliferation of multimedia content on the internet. This paper proposes an approach to enhance multimodal information retrieval by integrating data mining and deep learning techniques. Traditional information retrieval systems often struggle to effectively handle multimodal data due to the inherent complexity and diversity of such data sources. In this study, we leverage data mining techniques to preprocess and structure multimodal data efficiently. Data mining methods enable us to extract valuable patterns, relationships, and features from different modalities, providing a solid foundation for subsequent retrieval tasks. To further enhance the performance of multimodal information retrieval, deep learning techniques are employed. Deep neural networks have demonstrated their effectiveness in various multimedia tasks, including image recognition, natural language processing, and video analysis. By integrating deep learning models into our retrieval framework, we aim to capture complex intermodal dependencies and semantically rich representations, enabling more accurate and context-aware retrieval.
25

Engebretsen, Martin. "From Decoding a Graph to Processing a Multimodal Message". Nordicom Review 41, no. 1 (18 February 2020): 33–50. http://dx.doi.org/10.2478/nor-2020-0004.

Abstract:
Data visualisation – in the forms of graphs, charts, and maps – represents a text type growing in prevalence and impact in many cultural domains; education, journalism, business, PR, and more. Research on data visualisation reception is scarce, particularly that related to interactive and dynamic forms of data visualisation in digital media. Taking an approach inspired by grounded theory, in this article I investigate the ways in which young students interact with data visualisations found in digital news media. Combining observations from reading sessions with ten in-depth interviews, I investigate how the informants read, interpreted, and responded emotionally to data visualisations including visual metaphors, interactivity, and animation.
26

Bergamaschi, Antoine, Kadda Medjoubi, Cédric Messaoudi, Sergio Marco, and Andrea Somogyi. "MMX-I: data-processing software for multimodal X-ray imaging and tomography". Journal of Synchrotron Radiation 23, no. 3 (12 April 2016): 783–94. http://dx.doi.org/10.1107/s1600577516003052.

Abstract:
A new multi-platform freeware has been developed for the processing and reconstruction of scanning multi-technique X-ray imaging and tomography datasets. The software platform aims to treat different scanning imaging techniques: X-ray fluorescence, phase, absorption and dark field and any of their combinations, thus providing an easy-to-use data processing tool for the X-ray imaging user community. A dedicated data input stream copes with the input and management of large datasets (several hundred GB) collected during a typical multi-technique fast scan at the Nanoscopium beamline and even on a standard PC. To the authors' knowledge, this is the first software tool that aims at treating all of the modalities of scanning multi-technique imaging and tomography experiments.
27

Kotus, J., K. Łopatka, A. Czyżewski, and G. Bogdanis. "Processing of acoustical data in a multimodal bank operating room surveillance system". Multimedia Tools and Applications 75, no. 17 (17 October 2014): 10787–805. http://dx.doi.org/10.1007/s11042-014-2264-z.

28

Kawadkar, Pankaj, B. Rebecca, and Puppala Krupa Sagar. "Clustering Techniques for Person Authentication from Online Intelligence Data Inspired by Nature". International Journal of Scientific Methods in Engineering and Management 01, no. 04 (2023): 31–38. http://dx.doi.org/10.58599/ijsmem.2023.1404.

Abstract:
Image processing is essential for the success of image-based authentication, and it falls under multiple "multimodal image classification" subheadings. In this research, we investigate three methods that have been shown to improve the precision of image classification. Pre-processing refers to the phase that precedes extracting and classifying features. Gaussian filters are used for the pre-processing step, while the PSO algorithm is responsible for the feature extraction. Incorporating categorization algorithms is made possible by employing the ECNN. Finally, we evaluate our proposal by contrasting it with state-of-the-art scientific findings.
29

Zhang, Zhenchao, George Vosselman, Markus Gerke, Claudio Persello, Devis Tuia, and Michael Ying Yang. "Detecting Building Changes between Airborne Laser Scanning and Photogrammetric Data". Remote Sensing 11, no. 20 (18 October 2019): 2417. http://dx.doi.org/10.3390/rs11202417.

Abstract:
Detecting topographic changes in an urban environment and keeping city-level point clouds up-to-date are important tasks for urban planning and monitoring. In practice, remote sensing data are often available only in different modalities for two epochs. Change detection between airborne laser scanning data and photogrammetric data is challenging due to the multi-modality of the input data and dense matching errors. This paper proposes a method to detect building changes between multimodal acquisitions. The multimodal inputs are converted and fed into a light-weighted pseudo-Siamese convolutional neural network (PSI-CNN) for change detection. Different network configurations and fusion strategies are compared. Our experiments on a large urban data set demonstrate the effectiveness of the proposed method. Our change map achieves a recall rate of 86.17%, a precision rate of 68.16%, and an F1-score of 76.13%. The comparison between Siamese architecture and feed-forward architecture brings many interesting findings and suggestions to the design of networks for multimodal data processing.
30

Dmytro, Rvach, and Yevgeniya Sulema. "Mulsemedia data consolidation method". System technologies 6, no. 143 (13 November 2023): 69–79. http://dx.doi.org/10.34185/1562-9945-6-143-2022-06.

Abstract:
The synchronization of multimodal data is one of the essential tasks related to mulsemedia data processing. The concept of mulsemedia (MULtiple SEnsorial MEDIA) involves the registration, storage, processing, transmission and reproduction by computer-based tools of multimodal information about a physical object that humans can perceive through their senses. Such information includes audiovisual information (object's appearance, acoustic properties, etc.), tactile information (surface texture, temperature), kinesthetic information (weight, object's centre of gravity), information about its taste, smell, etc. The perception of mulsemedia information by a person is a process that exists over time. Because of this, the registration of mulsemedia data should be carried out with the fixation of the moments of time when the relevant mulsemedia information existed or its perception made sense for a human who supervises the object, as mulsemedia data is temporal. This paper presents a method that enables the consolidation and synchronization of mulsemedia data using the principles of multithreading. The universal method was designed to support combining data of different modalities in parallel threads. The application of the proposed method solves problems associated with integrating data of different modalities and formats in the same time interval. The effectiveness of applying this method increases by using multithreaded distributed computing. This method is designed for use in the development of mulsemedia software systems. The modified JSON format (TJSON – Timeline JSON) was proposed in the paper as well. A TJSON object is a complex data structure for representing synchronized mulsemedia data and their further processing. The proposed method can be further extended with other approaches and technologies. For example, artificial intelligence methods can be applied to assess the correlation between data from different modalities. This can help improve the method's accuracy and the quality of the output files.
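
The paper summary above does not reproduce the TJSON schema, so the structure below is a purely hypothetical illustration of what a timeline-keyed JSON object consolidating several modalities might look like; field names are invented for the example.

# Hypothetical timeline-keyed JSON object for consolidated mulsemedia data.
import json

tjson = {
    "object_id": "sample-001",
    "timeline": [
        {"t": 0.00, "video_frame": "frames/000001.png", "audio_rms": 0.12},
        {"t": 0.04, "video_frame": "frames/000002.png", "audio_rms": 0.15,
         "temperature_c": 21.4},
        {"t": 0.08, "video_frame": "frames/000003.png", "audio_rms": 0.11},
    ],
}
print(json.dumps(tjson, indent=2))
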
31

Ivanko, Denis, Alexey Karpov, Dmitrii Fedotov, Irina Kipyatkova, Dmitry Ryumin, Dmitriy Ivanko, Wolfgang Minker, and Milos Zelezny. "Multimodal speech recognition: increasing accuracy using high speed video data". Journal on Multimodal User Interfaces 12, no. 4 (1 August 2018): 319–28. http://dx.doi.org/10.1007/s12193-018-0267-1.

32

Bauer, Dominik F., Tom Russ, Barbara I. Waldkirch, Christian Tönnes, William P. Segars, Lothar R. Schad, Frank G. Zöllner, and Alena-Kathrin Golla. "Generation of annotated multimodal ground truth datasets for abdominal medical image registration". International Journal of Computer Assisted Radiology and Surgery 16, no. 8 (2 May 2021): 1277–85. http://dx.doi.org/10.1007/s11548-021-02372-7.

Abstract:
Purpose: Sparsity of annotated data is a major limitation in medical image processing tasks such as registration. Registered multimodal image data are essential for the diagnosis of medical conditions and the success of interventional medical procedures. To overcome the shortage of data, we present a method that allows the generation of annotated multimodal 4D datasets. Methods: We use a CycleGAN network architecture to generate multimodal synthetic data from the 4D extended cardiac–torso (XCAT) phantom and real patient data. Organ masks are provided by the XCAT phantom; therefore, the generated dataset can serve as ground truth for image segmentation and registration. Realistic simulation of respiration and heartbeat is possible within the XCAT framework. To underline the usability as a registration ground truth, a proof of principle registration is performed. Results: Compared to real patient data, the synthetic data showed good agreement regarding the image voxel intensity distribution and the noise characteristics. The generated T1-weighted magnetic resonance imaging, computed tomography (CT), and cone beam CT images are inherently co-registered. Thus, the synthetic dataset allowed us to optimize registration parameters of a multimodal non-rigid registration, utilizing liver organ masks for evaluation. Conclusion: Our proposed framework provides not only annotated but also multimodal synthetic data which can serve as a ground truth for various tasks in medical imaging processing. We demonstrated the applicability of synthetic data for the development of multimodal medical image registration algorithms.
33

Sun, Fanglei, and Zhifeng Diao. "Research on Data Fusion Method Based on Multisource Data Awareness of Internet of Things". Journal of Sensors 2022 (4 July 2022): 1–10. http://dx.doi.org/10.1155/2022/5001953.

Abstract:
The diversity of big data in the Internet of Things is one of the important characteristics that distinguish it from traditional big data. Big data of the Internet of Things is often composed of a variety of data with different structural forms. The descriptions of the same thing by these different modal data have a certain independence and a strong mutual relevance. Accurately and efficiently extracting and processing the hidden fusion information in the big data of the Internet of Things helps to solve various current multimodal data analysis tasks. In this paper, a multimodal interactive function fusion model based on an attention mechanism is proposed, which provides more efficient and accurate information for emotion classification tasks. Firstly, a sparse denoising autoencoder is used to extract text features; secondly, image features are extracted by an encoder. Finally, an interactive fusion module is constructed, which makes text features and image features learn their internal information, and the combined features are then applied to the emotion classification task.
34

Paulmann, Silke, Sarah Jessen, and Sonja A. Kotz. "Investigating the Multimodal Nature of Human Communication". Journal of Psychophysiology 23, no. 2 (January 2009): 63–76. http://dx.doi.org/10.1027/0269-8803.23.2.63.

Abstract:
The multimodal nature of human communication has been well established. Yet few empirical studies have systematically examined the widely held belief that this form of perception is facilitated in comparison to unimodal or bimodal perception. In the current experiment we first explored the processing of unimodally presented facial expressions. Furthermore, auditory (prosodic and/or lexical-semantic) information was presented together with the visual information to investigate the processing of bimodal (facial and prosodic cues) and multimodal (facial, lexic, and prosodic cues) human communication. Participants engaged in an identity identification task, while event-related potentials (ERPs) were being recorded to examine early processing mechanisms as reflected in the P200 and N300 component. While the former component has repeatedly been linked to physical property stimulus processing, the latter has been linked to more evaluative “meaning-related” processing. A direct relationship between P200 and N300 amplitude and the number of information channels present was found. The multimodal-channel condition elicited the smallest amplitude in the P200 and N300 components, followed by an increased amplitude in each component for the bimodal-channel condition. The largest amplitude was observed for the unimodal condition. These data suggest that multimodal information induces clear facilitation in comparison to unimodal or bimodal information. The advantage of multimodal perception as reflected in the P200 and N300 components may thus reflect one of the mechanisms allowing for fast and accurate information processing in human communication.
35

Dychka, Ivan A., and Yevgeniya S. Sulema. "Logical Operations in Algebraic System of Aggregates for Multimodal Data Representation and Processing". Research Bulletin of the National Technical University of Ukraine "Kyiv Politechnic Institute", no. 6 (17 December 2018): 44–52. http://dx.doi.org/10.20535/1810-0546.2018.6.151546.

36

Rivière, D., D. Geffroy, I. Denghien, N. Souedet, and Y. Cointepas. "BrainVISA: an extensible software environment for sharing multimodal neuroimaging data and processing tools". NeuroImage 47 (July 2009): S163. http://dx.doi.org/10.1016/s1053-8119(09)71720-3.

37

Pamart, A., O. Guillon, S. Faraci, E. Gattet, M. Genevois, J. M. Vallet, and L. De Luca. "MULTISPECTRAL PHOTOGRAMMETRIC DATA ACQUISITION AND PROCESSING FOR WALL PAINTINGS STUDIES". ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W3 (23 February 2017): 559–66. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w3-559-2017.

Abstract:
In the field of wall paintings studies, different imaging techniques are commonly used for documentation and decision making in terms of conservation and restoration. There are nowadays some challenging issues in merging scientific imaging techniques in a multimodal context (i.e. multi-sensor, multi-dimensional, multi-spectral and multi-temporal approaches). For decades those CH objects have been widely documented with Technical Photography (TP), which gives precious information to understand or retrieve the painting layouts and history. More recently there is an increasing demand for the use of digital photogrammetry in order to provide, as one of the possible outputs, an orthophotomosaic which brings a possibility for metrical quantification of conservators'/restorers' observations and action planning. This paper presents some ongoing experimentations of the LabCom MAP-CICRP relying on the assumption that those techniques can be merged through a common pipeline to share their own benefits and create a more complete documentation.
38

Caschera, Maria Chiara, Patrizia Grifoni, and Fernando Ferri. "Emotion Classification from Speech and Text in Videos Using a Multimodal Approach". Multimodal Technologies and Interaction 6, no. 4 (12 April 2022): 28. http://dx.doi.org/10.3390/mti6040028.

Abstract:
Emotion classification is a research area in which there has been very intensive literature production concerning natural language processing, multimedia data, semantic knowledge discovery, social network mining, and text and multimedia data mining. This paper addresses the issue of emotion classification and proposes a method for classifying the emotions expressed in multimodal data extracted from videos. The proposed method models multimodal data as a sequence of features extracted from facial expressions, speech, gestures, and text, using a linguistic approach. Each sequence of multimodal data is correctly associated with the emotion by a method that models each emotion using a hidden Markov model. The trained model is evaluated on samples of multimodal sentences associated with seven basic emotions. The experimental results demonstrate a good classification rate for emotions.
39

Porta, Alberto, Federico Aletti, Frederic Vallais, and Giuseppe Baselli. "Multimodal signal processing for the analysis of cardiovascular variability". Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 367, no. 1887 (22 October 2008): 391–409. http://dx.doi.org/10.1098/rsta.2008.0229.

Abstract:
Cardiovascular (CV) variability as a primary vital sign carrying information about CV regulation systems is reviewed by pointing out the role of the main rhythms and the various control and functional systems involved. The high complexity of the addressed phenomena fosters a multimodal approach that relies on data analysis models and deals with the ongoing interactions of many signals at a time. The importance of closed-loop identification and causal analysis is remarked upon and basic properties, application conditions and methods are recalled. The need of further integration of CV signals relevant to peripheral and systemic haemodynamics, respiratory mechanics, neural afferent and efferent pathways is also stressed.
40

Decker, Kevin T., and Brett J. Borghetti. "Hyperspectral Point Cloud Projection for the Semantic Segmentation of Multimodal Hyperspectral and Lidar Data with Point Convolution-Based Deep Fusion Neural Networks". Applied Sciences 13, no. 14 (14 July 2023): 8210. http://dx.doi.org/10.3390/app13148210.

Abstract:
The fusion of dissimilar data modalities in neural networks presents a significant challenge, particularly in the case of multimodal hyperspectral and lidar data. Hyperspectral data, typically represented as images with potentially hundreds of bands, provide a wealth of spectral information, while lidar data, commonly represented as point clouds with millions of unordered points in 3D space, offer structural information. The complementary nature of these data types presents a unique challenge due to their fundamentally different representations requiring distinct processing methods. In this work, we introduce an alternative hyperspectral data representation in the form of a hyperspectral point cloud (HSPC), which enables ingestion and exploitation with point cloud processing neural network methods. Additionally, we present a composite fusion-style, point convolution-based neural network architecture for the semantic segmentation of HSPC and lidar point cloud data. We investigate the effects of the proposed HSPC representation for both unimodal and multimodal networks ingesting a variety of hyperspectral and lidar data representations. Finally, we compare the performance of these networks against each other and previous approaches. This study paves the way for innovative approaches to multimodal remote sensing data fusion, unlocking new possibilities for enhanced data analysis and interpretation.
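
The core idea of a hyperspectral point cloud (HSPC) can be sketched in a few lines of NumPy: every pixel becomes a 3D point carrying its full spectrum as per-point attributes. The array shapes and the use of a co-registered lidar elevation model are assumptions; the authors' projection is more involved.

# Sketch: pair each hyperspectral pixel with (x, y, elevation) coordinates.
import numpy as np

def hsi_to_point_cloud(hsi_cube, dsm, pixel_size=1.0):
    """hsi_cube: (H, W, bands) reflectances; dsm: (H, W) elevations."""
    h, w, bands = hsi_cube.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xyz = np.stack([xs.ravel() * pixel_size,
                    ys.ravel() * pixel_size,
                    dsm.ravel()], axis=1)              # (H*W, 3) coordinates
    spectra = hsi_cube.reshape(-1, bands)              # (H*W, bands) attributes
    return np.hstack([xyz, spectra])                   # (H*W, 3 + bands)

cloud = hsi_to_point_cloud(np.random.rand(64, 64, 48), np.random.rand(64, 64) * 30)
print(cloud.shape)  # (4096, 51)
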
41

Goel, Anshika, Saurav Roy, Khushboo Punjabi, Ritwick Mishra, Manjari Tripathi, Deepika Shukla, and Pravat K. Mandal. "PRATEEK: Integration of Multimodal Neuroimaging Data to Facilitate Advanced Brain Research". Journal of Alzheimer's Disease 83, no. 1 (31 August 2021): 305–17. http://dx.doi.org/10.3233/jad-210440.

Abstract:
Background: In vivo neuroimaging modalities such as magnetic resonance imaging (MRI), functional MRI (fMRI), magnetoencephalography (MEG), magnetic resonance spectroscopy (MRS), and quantitative susceptibility mapping (QSM) are useful techniques to understand brain anatomical structure, functional activity, source localization, neurochemical profiles, and tissue susceptibility respectively. Integrating unique and distinct information from these neuroimaging modalities will further help to enhance the understanding of complex neurological diseases. Objective: To develop a processing scheme for multimodal data integration in a seamless manner on healthy young population, thus establishing a generalized framework for various clinical conditions (e.g., Alzheimer’s disease). Methods: A multimodal data integration scheme has been developed to integrate the outcomes from multiple neuroimaging data (fMRI, MEG, MRS, and QSM) spatially. Furthermore, the entire scheme has been incorporated into a user-friendly toolbox- “PRATEEK”. Results: The proposed methodology and toolbox has been tested for viability among fourteen healthy young participants. The data-integration scheme was tested for bilateral occipital cortices as the regions of interest and can also be extended to other anatomical regions. Overlap percentage from each combination of two modalities (fMRI-MRS, MEG-MRS, fMRI-QSM, and fMRI-MEG) has been computed and also been qualitatively assessed for combinations of the three (MEG-MRS-QSM) and four (fMRI-MEG-MRS-QSM) modalities. Conclusion: This user-friendly toolbox minimizes the need of an expertise in handling different neuroimaging tools for processing and analyzing multimodal data. The proposed scheme will be beneficial for clinical studies where geometric information plays a crucial role for advance brain research.
42

Mohd Ali, Maimunah, Norhashila Hashim, Samsuzana Abd Aziz, and Ola Lasekan. "Utilisation of Deep Learning with Multimodal Data Fusion for Determination of Pineapple Quality Using Thermal Imaging". Agronomy 13, no. 2 (30 January 2023): 401. http://dx.doi.org/10.3390/agronomy13020401.

Abstract:
Fruit quality is an important aspect in determining the consumer preference in the supply chain. Thermal imaging was used to determine different pineapple varieties according to the physicochemical changes of the fruit by means of the deep learning method. Deep learning has gained attention in fruit classification and recognition in unimodal processing. This paper proposes a multimodal data fusion framework for the determination of pineapple quality using deep learning methods based on the feature extraction acquired from thermal imaging. Feature extraction was selected from the thermal images that provided a correlation with the quality attributes of the fruit in developing the deep learning models. Three different types of deep learning architectures, including ResNet, VGG16, and InceptionV3, were built to develop the multimodal data fusion framework for the classification of pineapple varieties based on the concatenation of multiple features extracted by the robust networks. The multimodal data fusion coupled with powerful convolutional neural network architectures can remarkably distinguish different pineapple varieties. The proposed multimodal data fusion framework provides a reliable determination of fruit quality that can improve the recognition accuracy and the model performance up to 0.9687. The effectiveness of multimodal deep learning data fusion and thermal imaging has huge potential in monitoring the real-time determination of physicochemical changes of fruit.
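
A hedged sketch of the feature-concatenation fusion described above is shown below; it is not the authors' code, it uses only two of the three backbones mentioned, and the input size and number of classes are assumptions.

# Concatenate pooled features from two pretrained Keras backbones and feed
# them to a small classification head (requires tensorflow).
import tensorflow as tf

inp = tf.keras.Input(shape=(224, 224, 3))
resnet = tf.keras.applications.ResNet50(include_top=False, pooling="avg")
vgg = tf.keras.applications.VGG16(include_top=False, pooling="avg")

features = tf.keras.layers.Concatenate()([resnet(inp), vgg(inp)])  # 2048 + 512
x = tf.keras.layers.Dense(128, activation="relu")(features)
out = tf.keras.layers.Dense(3, activation="softmax")(x)            # assumed 3 varieties

model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
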
43

Mikhnenko, P. A. "Analysis of Multimodal Data in Project Management: Prospects for Using Machine Learning". Management Sciences 13, no. 4 (3 February 2024): 71–89. http://dx.doi.org/10.26794/2304-022x-2023-13-4-71-89.

Abstract:
The modern project environment is characterized by high complexity, uncertainty, speed and depth of changes that affect the project during its life cycle. However, the project’s change management processes do not take into account the need to implement analytical procedures for dynamic processing of multimodal data arrays. The purpose of the study is to determine the content of analytical procedures for project management and substantiate the use of machine learning technologies for their effective implementation. The methodological basis was project management methods, theory of change, concepts of artificial intelligence and machine learning, as well as analytical approaches. Methods of descriptive modeling of the project management process and expert assessments of the prospects for using machine learning technologies were also used in the work. The information base was made up of scientific materials on the topic under consideration, as well as expert assessments. The results of the study allowed us to conclude that for the analysis of multimodal data, natural language processing and intellectual decision support technologies are most in demand, which can serve as the basis for new technological solutions in the field of project management.
44

Yu, Ping, Hongwei Zhao, Kun Yang, Hanlin Chen, Xiaozhong Geng, Ming Hu, and Hui Yan. "Blockchain-Enabled Joint Resource Allocation for Virtualized Video Service Functions". Security and Communication Networks 2022 (16 May 2022): 1–16. http://dx.doi.org/10.1155/2022/4349097.

Abstract:
Power information is an important guarantee of energy security. As a key technical means of safety management and risk control, video monitoring is widely used in the power industry. A power video monitoring system processes multimodal video data efficiently and automatically identifies abnormal events and equipment status, replacing human monitoring with machines. Video monitoring data from power substations usually contain both visual and auditory information, and the data types are diverse. Multimodal video data provide a rich underlying source for intelligent monitoring functions, but they require multiple service forms for efficient processing. Most intelligent edge monitoring devices are equipped only with lightweight computing resources and a limited battery supply, so their local data processing capabilities are weak. A power video monitoring system is distributed, open, interconnected, and intelligent; its edge video devices are widely dispersed, which brings convenience but also risks in terms of data security and reliability. For the outdoor multimodal power video monitoring scenario, this paper adopts an edge-cloud distributed system architecture to address the shortage of resources and proposes service function virtualization (SFV) to handle multimodal video data processing. At the same time, security protection is addressed by introducing a blockchain to establish trust between intelligent video devices and service providers. Under the protection of a virtualized service consortium blockchain (VSCB), virtualization technology is introduced into the service function chain (SFC) to realize SFV and to solve the optimal resource allocation problem for multimodal video data processing. The work mainly involves the joint mapping of virtual and physical resources and the joint optimization of computing and communication resources. Problems such as a large state space and a high-dimensional action space affect resource allocation. The stochastic resource allocation problem is therefore formulated as a Markov decision process (MDP), and SFV is used to optimize cost and delay. A resource allocation optimization algorithm (RAOA-A3C) based on the asynchronous advantage actor-critic algorithm (A3C) is proposed. Simulation experiments show that the proposed RAOA-A3C is well suited to highly dynamic, multidimensional, and distributed power video monitoring scenarios and achieves better results in reducing delay and deployment costs.
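As a rough illustration of the actor-critic formulation underlying such an approach, the sketch below shows a minimal single-step actor-critic update for a discrete allocation action in PyTorch. The state dimension, action set, reward, and one-step advantage are placeholders; the paper's RAOA-A3C additionally runs asynchronous parallel workers with a problem-specific state and reward design.

```python
# Minimal actor-critic update for a discrete resource-allocation MDP (illustrative only).
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, n_actions)  # actor: distribution over placements
        self.value = nn.Linear(hidden, 1)           # critic: state-value estimate

    def forward(self, state):
        h = self.trunk(state)
        return torch.distributions.Categorical(logits=self.policy(h)), self.value(h)

net = ActorCritic(state_dim=16, n_actions=8)        # hypothetical sizes
state = torch.randn(1, 16)
dist, value = net(state)
action = dist.sample()

reward = torch.tensor([1.0])                         # placeholder reward (e.g. -cost - delay)
next_value = torch.tensor([[0.5]])                   # placeholder critic value of next state
advantage = reward + 0.99 * next_value - value       # one-step TD advantage
policy_loss = -(dist.log_prob(action) * advantage.detach()).mean()
critic_loss = advantage.pow(2).mean()
(policy_loss + critic_loss).backward()               # one synchronous gradient step
```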
45

Nie, Wei-Zhi, Wen-Juan Peng, Xiang-yu Wang, Yi-liang Zhao and Yu-Ting Su. "Multimedia venue semantic modeling based on multimodal data". Journal of Visual Communication and Image Representation 48 (October 2017): 375–85. http://dx.doi.org/10.1016/j.jvcir.2016.11.015.

Full text
APA, Harvard, Vancouver, ISO and other styles
46

Haleem, Muhammad Salman, Audrey Ekuban, Alessio Antonini, Silvio Pagliara, Leandro Pecchia and Carlo Allocca. "Deep-Learning-Driven Techniques for Real-Time Multimodal Health and Physical Data Synthesis". Electronics 12, no. 9 (25 April 2023): 1989. http://dx.doi.org/10.3390/electronics12091989.

Full text
APA, Harvard, Vancouver, ISO and other styles
Abstract (summary):
With the advent of Artificial Intelligence for healthcare, data synthesis methods present crucial benefits in facilitating the fast development of AI models while protecting data subjects and bypassing the need to engage with the complexity of data sharing and processing agreements. Existing technologies focus on synthesising real-time physiological and physical records at regular time intervals. Real health data are, however, characterised by irregularities and multimodal variables that are still hard to reproduce while preserving correlations across time and across dimensions. This paper presents two novel techniques for the synthetic generation of real-time multimodal electronic health and physical records: (a) the Temporally Correlated Multimodal Generative Adversarial Network and (b) the Document Sequence Generator. The paper illustrates the need for and use of these techniques through a real use case, the H2020 GATEKEEPER project on AI for healthcare. Furthermore, the paper evaluates both techniques on individual cases and discusses their comparability and the potential applications of synthetic data at the different stages of the software development life-cycle.
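A recurrent GAN is one plausible shape for the first of these techniques; the sketch below is a hypothetical, heavily simplified illustration (not the GATEKEEPER code) of a generator and discriminator that operate on whole multichannel sequences, so that correlations across time and across channels are learned jointly.

```python
# Minimal recurrent GAN for multichannel time series (assumed, simplified architecture).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, noise_dim=8, channels=3, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(noise_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, channels)

    def forward(self, z):                  # z: (batch, time, noise_dim)
        h, _ = self.rnn(z)
        return self.out(h)                 # synthetic record: (batch, time, channels)

class Discriminator(nn.Module):
    def __init__(self, channels=3, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(channels, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, time, channels)
        _, h = self.rnn(x)
        return self.out(h[-1])             # one realism score per whole sequence

G, D = Generator(), Discriminator()
z = torch.randn(4, 50, 8)                  # 4 noise sequences, 50 time steps
fake = G(z)
score = D(fake)
print(fake.shape, score.shape)             # torch.Size([4, 50, 3]) torch.Size([4, 1])
```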
47

Zhao, Jian, Wenhua Dong, Lijuan Shi, Wenqian Qiang, Zhejun Kuang, Dawei Xu and Tianbo An. "Multimodal Feature Fusion Method for Unbalanced Sample Data in Social Network Public Opinion". Sensors 22, no. 15 (25 July 2022): 5528. http://dx.doi.org/10.3390/s22155528.

Full text
APA, Harvard, Vancouver, ISO and other styles
Abstract (summary):
With the wide adoption of social media, public opinion analysis in social networks can no longer rely on text alone, because public opinion information now includes data of various modalities, such as voice, text, and facial expressions. Multi-modal emotion analysis has therefore become the current focus of public opinion analysis, and multi-modal emotion recognition from speech is an important factor limiting it. In this paper, emotion feature retrieval for speech is first explored, and the handling of imbalanced sample data is then analyzed. By comparing different feature fusion methods for text and speech, a multi-modal feature fusion method for imbalanced sample data is proposed to realize multi-modal emotion recognition. Experiments on two publicly available datasets (IEMOCAP and MELD) show that processing multi-modality data with this method yields good fine-grained emotion recognition results, laying a foundation for subsequent analysis of social public opinion.
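A minimal sketch of the two ingredients discussed above, concatenating text and speech feature vectors and weighting the loss to counter class imbalance, might look as follows in PyTorch; the feature dimensions, number of emotion classes, and class counts are illustrative assumptions, not the paper's configuration.

```python
# Feature-level fusion of text and speech with class-weighted cross-entropy (illustrative).
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, n_emotions=7):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(text_dim + audio_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_emotions),
        )

    def forward(self, text_feat, audio_feat):
        # Concatenate per-utterance text and speech feature vectors before classification.
        return self.fuse(torch.cat([text_feat, audio_feat], dim=1))

# Inverse-frequency class weights from hypothetical per-class sample counts.
class_counts = torch.tensor([4500., 1600., 350., 1000., 1500., 280., 1100.])
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)  # rarer classes weigh more in the loss

model = FusionClassifier()
logits = model(torch.randn(8, 768), torch.randn(8, 128))
loss = criterion(logits, torch.randint(0, 7, (8,)))
loss.backward()
```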
48

Zhang, Yingjie. "The current status and prospects of transformer in multimodality". Applied and Computational Engineering 11, no. 1 (25 September 2023): 224–30. http://dx.doi.org/10.54254/2755-2721/11/20230240.

Full text
APA, Harvard, Vancouver, ISO and other styles
Abstract (summary):
At present, the attention mechanism exemplified by the transformer has greatly advanced natural language processing (NLP) and computer vision (CV). In the multimodal field, however, attention is still mainly applied to extracting features from different types of data (such as text and images) and then fusing those features. With the increasing scale of models and the instability of Internet data, feature fusion alone has struggled to address the growing variety of multimodal problems, and the multimodal field has long lacked a model that can uniformly handle all types of data. In this paper, we first take CV and NLP as examples to review the various models derived from the transformer. Then, based on the mechanisms of word embedding and image embedding, we discuss how embeddings of different granularity are handled uniformly under the attention mechanism in multimodal settings. Further, we argue that this mechanism will not be limited to CV and NLP: a truly unified model will be able to handle tasks across data types through pre-training and fine-tuning. Finally, regarding the concrete implementation of such a unified model, this paper lists several cases and analyzes valuable research directions in related fields.
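The unified-handling idea can be illustrated with a small sketch: word tokens and image patches are projected into the same embedding space and passed through a single transformer encoder, so that self-attention operates across both modalities. The dimensions and layer counts below are arbitrary illustrative choices.

```python
# Word tokens and image patches in one shared transformer encoder (illustrative sketch).
import torch
import torch.nn as nn

vocab_size, d_model, patch = 1000, 128, 16
word_embed = nn.Embedding(vocab_size, d_model)                         # text -> token embeddings
patch_embed = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)   # image -> patch embeddings
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)

tokens = word_embed(torch.randint(0, vocab_size, (1, 12)))                    # (1, 12, 128)
patches = patch_embed(torch.randn(1, 3, 64, 64)).flatten(2).transpose(1, 2)   # (1, 16, 128)
sequence = torch.cat([tokens, patches], dim=1)       # one mixed text+image sequence
out = encoder(sequence)                              # joint self-attention across modalities
print(out.shape)                                     # torch.Size([1, 28, 128])
```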
49

Jeong, Dayoung, and Kyungsik Han. "PRECYSE: Predicting Cybersickness using Transformer for Multimodal Time-Series Sensor Data". Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, no. 2 (13 May 2024): 1–24. http://dx.doi.org/10.1145/3659594.

Full text
APA, Harvard, Vancouver, ISO and other styles
Abstract (summary):
Cybersickness, a factor that hinders user immersion in VR, has been the subject of ongoing attempts at AI-based prediction. Previous studies have used CNNs and LSTMs as prediction models and attention mechanisms and XAI for data analysis, yet none explored a transformer, which can better capture the spatial and temporal characteristics of the data and thus benefit both prediction and feature-importance analysis. In this paper, we propose cybersickness prediction models for multimodal time-series sensor data (i.e., eye movement, head movement, and physiological signals) based on a transformer, considering sensor data pre-processing and multimodal data fusion methods. We constructed the MSCVR dataset, consisting of normalized sensor data, spectrogram-formatted sensor data, and cybersickness levels collected from 45 participants in a user study. We propose two methods for embedding multimodal time-series sensor data into the transformer: modality-specific spatial and temporal transformer encoders for normalized sensor data (MS-STTN) and a modality-specific spatial-temporal transformer encoder for spectrograms (MS-STTS). MS-STTN yielded the highest performance in both the ablation study and the comparison with existing models. Furthermore, by analyzing the importance of data features, we determined their relevance to cybersickness over time, in particular the salience of eye movement features. Our results and the insights derived from multimodal time-series sensor data and the transformer model provide a comprehensive understanding of cybersickness and its association with sensor data. Our MSCVR dataset and code are publicly available: https://github.com/dayoung-jeong/PRECYSE.git.
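As a simplified, temporal-only analogue of the modality-specific encoder idea (not the released PRECYSE code, which also models spatial structure), the sketch below gives each sensor modality its own transformer encoder and concatenates the pooled outputs before classification; the input dimensions and number of cybersickness levels are assumptions.

```python
# Per-modality transformer encoders over time-series sensors, fused for classification.
import torch
import torch.nn as nn

def make_encoder(d_model=64):
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=2)

class ModalitySpecificModel(nn.Module):
    def __init__(self, dims=(4, 6, 3), d_model=64, n_levels=4):
        # dims: assumed feature counts for eye, head, and physiological streams.
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in dims])
        self.encoders = nn.ModuleList([make_encoder(d_model) for _ in dims])
        self.head = nn.Linear(d_model * len(dims), n_levels)

    def forward(self, streams):                      # list of (batch, time, dim_m) tensors
        pooled = []
        for x, proj, enc in zip(streams, self.proj, self.encoders):
            h = enc(proj(x))                         # temporal self-attention per modality
            pooled.append(h.mean(dim=1))             # average-pool over time
        return self.head(torch.cat(pooled, dim=1))   # fuse modalities, predict sickness level

model = ModalitySpecificModel()
streams = [torch.randn(2, 100, d) for d in (4, 6, 3)]
print(model(streams).shape)                          # torch.Size([2, 4])
```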
50

Cui, Hang, Liang Hu and Ling Chi. "Advances in Computer-Aided Medical Image Processing". Applied Sciences 13, no. 12 (13 June 2023): 7079. http://dx.doi.org/10.3390/app13127079.

Full text
APA, Harvard, Vancouver, ISO and other styles
Abstract (summary):
The primary objective of this study is to provide an extensive review of deep learning techniques for medical image recognition, highlighting their potential for improving diagnostic accuracy and efficiency. We systematically organize the paper by first discussing the characteristics and challenges of medical imaging techniques, with a particular focus on magnetic resonance imaging (MRI) and computed tomography (CT). Subsequently, we delve into direct image processing methods, such as image enhancement and multimodal medical image fusion, followed by an examination of intelligent image recognition approaches tailored to specific anatomical structures. These approaches employ various deep learning models and techniques, including convolutional neural networks (CNNs), transfer learning, attention mechanisms, and cascading strategies, to overcome challenges related to unclear edges, overlapping regions, and structural distortions. Furthermore, we emphasize the significance of neural network design in medical imaging, concentrating on the extraction of multilevel features using U-shaped structures, dense connections, 3D convolution, and multimodal feature fusion. Finally, we identify and address the key challenges in medical image recognition, such as data quality, model interpretability, generalizability, and computational resource requirements. By proposing future directions in data accessibility, active learning, explainable AI, model robustness, and computational efficiency, this study paves the way for the successful integration of AI in clinical practice and enhanced patient care.
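For readers unfamiliar with the U-shaped structures the review highlights, the sketch below is a deliberately tiny encoder-decoder with a single skip connection, illustrating how high-resolution and deep features are fused for dense prediction; it is an illustrative toy, not a model from the surveyed literature.

```python
# Minimal U-shaped encoder-decoder with one skip connection (illustrative only).
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(16, out_ch, 1)

    def forward(self, x):
        e = self.enc(x)                                   # high-resolution features
        b = self.bottleneck(self.down(e))                 # coarse, deeper features
        d = self.dec(torch.cat([self.up(b), e], dim=1))   # skip connection fuses the two scales
        return self.out(d)                                # per-pixel class scores

seg = TinyUNet()(torch.randn(1, 1, 64, 64))
print(seg.shape)  # torch.Size([1, 2, 64, 64])
```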
