Journal articles on the topic "Multimodal processing"


Cite a source in APA, MLA, Chicago, Harvard, and many other citation styles.


See the top 50 journal articles for research on the topic "Multimodal processing".

Next to every source in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in .pdf format and read the abstract (summary) of the work online if it is included in the metadata.

Browse journal articles from many scholarly disciplines and compile an accurate bibliography.

1

Ng, Vincent, and Shengjie Li. "Multimodal Propaganda Processing". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (26 June 2023): 15368–75. http://dx.doi.org/10.1609/aaai.v37i13.26792.

Abstract:
Propaganda campaigns have long been used to influence public opinion via disseminating biased and/or misleading information. Despite the increasing prevalence of propaganda content on the Internet, few attempts have been made by AI researchers to analyze such content. We introduce the task of multimodal propaganda processing, where the goal is to automatically analyze propaganda content. We believe that this task presents a long-term challenge to AI researchers and that successful processing of propaganda could bring machine understanding one important step closer to human understanding. We discuss the technical challenges associated with this task and outline the steps that need to be taken to address it.
2

Sinke, Christopher, Janina Neufeld, Daniel Wiswede, Hinderk M. Emrich, Stefan Bleich, and Gregor R. Szycik. "Multisensory processing in synesthesia — differences in the EEG signal during uni- and multimodal processing". Seeing and Perceiving 25 (2012): 53. http://dx.doi.org/10.1163/187847612x646749.

Abstract:
Synesthesia is a condition in which stimulation in one processing stream (e.g., letters or music) leads to perception in an unstimulated processing stream (e.g., colors). Behavioral differences in multisensory processing have been shown for multimodal illusions, but the differences in neural processing are still unclear. In the present study, we examined uni- and multimodal processing in 14 people with synesthesia and 13 controls using EEG recordings and a simple detection task. Stimuli were presented either acoustically, visually or multimodally (simultaneous visual and auditory stimulation). In the multimodal condition, auditory and visual stimuli were either matching or mismatching (e.g., a lion either roaring or ringing). The subjects had to press a button as soon as something was presented visually or acoustically. Results: ERPs revealed occipital group differences in the negative amplitude between 100 and 200 ms after stimulus presentation. Relative to controls, synesthetes showed an increased negative component peaking around 150 ms. This group difference is found in all visual conditions. Unimodal acoustical stimulation leads to increased negative amplitude in synesthetes in the same time window over parietal and visual electrodes. Overall, this shows that processing in the occipital lobe is different in synesthetes independent of the stimulated modality. In addition, differences in the negative amplitude between processing of incongruent and congruent multimodal stimuli could be detected in the same time window between synesthetes and controls over left frontal sites. This shows that multimodal integration processes are also different in synesthetes.
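
For readers who want to see what the windowed ERP comparison above amounts to in practice, here is a minimal NumPy sketch of computing the mean amplitude in the 100–200 ms window over a set of channels; the epoch arrays and channel indices are hypothetical placeholders, not the authors' data or pipeline.

```python
import numpy as np

def window_mean_amplitude(epochs, times, t_min=0.100, t_max=0.200, channels=None):
    """Mean ERP amplitude in a latency window.

    epochs : array, shape (n_trials, n_channels, n_samples), in volts
    times  : array, shape (n_samples,), seconds relative to stimulus onset
    """
    mask = (times >= t_min) & (times <= t_max)
    data = epochs if channels is None else epochs[:, channels, :]
    erp = data.mean(axis=0)           # average over trials -> (n_channels, n_samples)
    return erp[:, mask].mean()        # average over channels and window samples

# Hypothetical epoched data: 60 trials, 64 channels, 500 Hz sampling, -0.2..0.8 s.
rng = np.random.default_rng(0)
times = np.arange(-0.2, 0.8, 0.002)
epochs_synesthetes = rng.normal(0.0, 5e-6, (60, 64, times.size))
epochs_controls = rng.normal(0.0, 5e-6, (60, 64, times.size))

occipital = [60, 61, 62, 63]  # hypothetical occipital channel indices
amp_syn = window_mean_amplitude(epochs_synesthetes, times, channels=occipital)
amp_ctl = window_mean_amplitude(epochs_controls, times, channels=occipital)
print(f"Occipital 100-200 ms mean amplitude: synesthetes {amp_syn:.2e} V, controls {amp_ctl:.2e} V")
```
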
3

D'Ulizia, Arianna, Fernando Ferri, and Patrizia Grifoni. "Generating Multimodal Grammars for Multimodal Dialogue Processing". IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans 40, no. 6 (November 2010): 1130–45. http://dx.doi.org/10.1109/tsmca.2010.2041227.

4

Barricelli, Barbara Rita, Piero Mussio, Marco Padula, and Paolo Luigi Scala. "TMS for multimodal information processing". Multimedia Tools and Applications 54, no. 1 (27 April 2010): 97–120. http://dx.doi.org/10.1007/s11042-010-0527-x.

5

Parsons, Aaron D., Stephen W. T. Price, Nicola Wadeson, Mark Basham, Andrew M. Beale, Alun W. Ashton, J. Frederick W. Mosselmans, and Paul D. Quinn. "Automatic processing of multimodal tomography datasets". Journal of Synchrotron Radiation 24, no. 1 (1 January 2017): 248–56. http://dx.doi.org/10.1107/s1600577516017756.

Abstract:
With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data that will be collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of data of multimodal and multidimensional scientific datasets output such as those from chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.
6

Holler, Judith, and Stephen C. Levinson. "Multimodal Language Processing in Human Communication". Trends in Cognitive Sciences 23, no. 8 (August 2019): 639–52. http://dx.doi.org/10.1016/j.tics.2019.05.006.

7

Farzin, Faraz, Eric P. Charles, and Susan M. Rivera. "Development of Multimodal Processing in Infancy". Infancy 14, no. 5 (1 September 2009): 563–78. http://dx.doi.org/10.1080/15250000903144207.

8

Zhang, Ge, Tianxiang Luo, Witold Pedrycz, Mohammed A. El-Meligy, Mohamed Abdel Fattah Sharaf, and Zhiwu Li. "Outlier Processing in Multimodal Emotion Recognition". IEEE Access 8 (2020): 55688–701. http://dx.doi.org/10.1109/access.2020.2981760.

9

Metaxakis, Athanasios, Dionysia Petratou, and Nektarios Tavernarakis. "Multimodal sensory processing in Caenorhabditis elegans". Open Biology 8, no. 6 (June 2018): 180049. http://dx.doi.org/10.1098/rsob.180049.

Abstract:
Multisensory integration is a mechanism that allows organisms to simultaneously sense and understand external stimuli from different modalities. These distinct signals are transduced into neuronal signals that converge into decision-making neuronal entities. Such decision-making centres receive information through neuromodulators regarding the organism's physiological state and accordingly trigger behavioural responses. Despite the importance of multisensory integration for efficient functioning of the nervous system, and also the implication of dysfunctional multisensory integration in the aetiology of neuropsychiatric disease, little is known about the relative molecular mechanisms. Caenorhabditis elegans is an appropriate model system to study such mechanisms and elucidate the molecular ways through which organisms understand external environments in an accurate and coherent fashion.
10

Nock, Harriet J., Giridharan Iyengar, and Chalapathy Neti. "Multimodal processing by finding common cause". Communications of the ACM 47, no. 1 (1 January 2004): 51. http://dx.doi.org/10.1145/962081.962105.

11

Glasbey, C. A., and N. J. Martin. "Multimodal microscopy by digital image processing". Journal of Microscopy 181, no. 3 (March 1996): 225–37. http://dx.doi.org/10.1046/j.1365-2818.1996.91372.x.

12

Kyselova, A. H., G. D. Kiselov, A. A. Serhyeyev, and A. V. Shalaginov. "Processing input data in multimodal applications". Electronics and Communications 16, no. 2 (28 March 2011): 86–92. http://dx.doi.org/10.20535/2312-1807.2011.16.2.268253.

13

Kuhnke, Philipp, Markus Kiefer, and Gesa Hartwigsen. "Task-Dependent Functional and Effective Connectivity during Conceptual Processing". Cerebral Cortex 31, no. 7 (3 March 2021): 3475–93. http://dx.doi.org/10.1093/cercor/bhab026.

Abstract:
Conceptual knowledge is central to cognition. Previous neuroimaging research indicates that conceptual processing involves both modality-specific perceptual-motor areas and multimodal convergence zones. For example, our previous functional magnetic resonance imaging (fMRI) study revealed that both modality-specific and multimodal regions respond to sound and action features of concepts in a task-dependent fashion (Kuhnke P, Kiefer M, Hartwigsen G. 2020b. Task-dependent recruitment of modality-specific and multimodal regions during conceptual processing. Cereb Cortex. 30:3938–3959.). However, it remains unknown whether and how modality-specific and multimodal areas interact during conceptual tasks. Here, we asked 1) whether multimodal and modality-specific areas are functionally coupled during conceptual processing, 2) whether their coupling depends on the task, 3) whether information flows top-down, bottom-up or both, and 4) whether their coupling is behaviorally relevant. We combined psychophysiological interaction analyses with dynamic causal modeling on the fMRI data of our previous study. We found that functional coupling between multimodal and modality-specific areas strongly depended on the task, involved both top-down and bottom-up information flow, and predicted conceptually guided behavior. Notably, we also found coupling between different modality-specific areas and between different multimodal areas. These results suggest that functional coupling in the conceptual system is extensive, reciprocal, task-dependent, and behaviorally relevant. We propose a new model of the conceptual system that incorporates task-dependent functional interactions between modality-specific and multimodal areas.
14

Boyko, Nataliya. "Models and Algorithms for Multimodal Data Processing". WSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS 20 (14 March 2023): 87–97. http://dx.doi.org/10.37394/23209.2023.20.11.

Abstract:
Information technologies and computer equipment are used in almost all areas of activity, which is why new areas of their use are emerging, and the level of ICT implementation is deepening, with more and more functions that were the prerogative of humans being assigned to computers. As science and technology develop, new technologies and technical means are emerging that enable a human-centered approach to software development, better adaptation of human-machine interfaces to user needs, and an increase in the ergonomics of software products, etc. These measures contribute to the formation of fundamentally new opportunities for presenting and processing information about real-world objects with which an individual interacts in production, educational and everyday activities in computer systems. The article aims to identify current models and algorithms for processing multimodal data in computer systems based on a survey of company employees and to analyze these models and algorithms to determine the benefits of using models and algorithms for processing multimodal data. Research methods: comparative analysis; systematization; generalization; survey. Results. It has been established that the recommended multimodal data representation models (the mixed model, the spatiotemporal linked model, and the multilevel ontological model) allow for representing the digital twin of the object under study at differentiated levels of abstraction, and these multimodal data processing models can be combined to obtain the most informative way to describe the physical twin. As a result of the study, it was found that the "general judgment of the experience of using models and algorithms for multimodal data processing" was noted by the respondents in the item "Personally, I would say that models and algorithms for multimodal data processing are practical" with an average value of 8.16 (SD = 1.70), and in the item "Personally, I would say that models and algorithms for multimodal data processing are understandable (not confusing)" with an average value of 7.52. It has been determined that respondents positively evaluate (with scores above 5.0) models and algorithms for processing multimodal data in work environments as practical, understandable, manageable, and original.
15

Chen, Mujun. "Automatic Image Processing Algorithm for Light Environment Optimization Based on Multimodal Neural Network Model". Computational Intelligence and Neuroscience 2022 (3 June 2022): 1–12. http://dx.doi.org/10.1155/2022/5156532.

Abstract:
In this paper, we conduct an in-depth study and analysis of an automatic image processing algorithm based on a multimodal Recurrent Neural Network (m-RNN) for light environment optimization. By analyzing the structure of the m-RNN and combining the current research frontiers of image processing and natural language processing, we identify the problem of the ineffectiveness of the m-RNN for some image description generation, starting from both the image feature extraction part and text sequence data processing. Unlike traditional automatic image processing algorithms, this algorithm does not require complex rules to be added manually; instead, it evaluates and filters the training image collection and finally generates automatic image processing models with the m-RNN. An image semantic segmentation algorithm is proposed based on multimodal attention and adaptive feature fusion. The main idea of the algorithm is to combine adaptive feature fusion with data enhancement for small-scale multimodal light environment datasets, extracting the importance between images through multimodal attention. The model proposed in this paper can span the semantic differences of different modalities and construct feature relationships between different modalities to achieve an inferable, interpretable, and scalable feature representation of multimodal data. The automatic processing of light environment images using multimodal neural networks, compared with traditional algorithms, eliminates manual processing and greatly reduces the time and effort of image processing.
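
To make the "multimodal attention and adaptive feature fusion" idea more concrete, the following is a generic PyTorch sketch in which text features attend to image features and a learned gate adaptively mixes the two streams. The dimensions and module layout are assumptions for illustration, not the m-RNN architecture or the model proposed in the paper.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Cross-modal attention followed by a learned adaptive gate."""

    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, text_feats, image_feats):
        # text_feats: (batch, n_tokens, dim)   image_feats: (batch, n_regions, dim)
        attended, _ = self.cross_attn(query=text_feats, key=image_feats, value=image_feats)
        pooled_text = text_feats.mean(dim=1)   # (batch, dim)
        pooled_img = attended.mean(dim=1)      # image information relevant to the text
        g = self.gate(torch.cat([pooled_text, pooled_img], dim=-1))
        return g * pooled_text + (1.0 - g) * pooled_img   # adaptively fused feature

# Toy usage with random features standing in for extracted text/image representations.
fusion = AttentionFusion()
fused = fusion(torch.randn(8, 12, 256), torch.randn(8, 49, 256))
print(fused.shape)  # torch.Size([8, 256])
```
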
16

Tanaka, Yukari, Hirokata Fukushima, Kazuo Okanoya, and Masako Myowa-Yamakoshi. "Mothers' multimodal information processing is modulated by multimodal interactions with their infants". International Journal of Psychophysiology 94, no. 2 (November 2014): 174. http://dx.doi.org/10.1016/j.ijpsycho.2014.08.744.

17

Chmiel, Agnieszka, Przemysław Janikowski, and Agnieszka Lijewska. "Multimodal processing in simultaneous interpreting with text". Target. International Journal of Translation Studies 32, no. 1 (21 January 2020): 37–58. http://dx.doi.org/10.1075/target.18157.chm.

Abstract:
The present study focuses on (in)congruence of input between the visual and the auditory modality in simultaneous interpreting with text. We asked twenty-four professional conference interpreters to simultaneously interpret an aurally and visually presented text with controlled incongruences in three categories (numbers, names and control words), while measuring interpreting accuracy and eye movements. The results provide evidence for the dominance of the visual modality, which goes against the professional standard of following the auditory modality in the case of incongruence. Numbers enjoyed the greatest accuracy across conditions possibly due to simple cross-language semantic mappings. We found no evidence for a facilitation effect for congruent items, and identified an impeding effect of the presence of the visual text for incongruent items. These results might be interpreted either as evidence for the Colavita effect (in which visual stimuli take precedence over auditory ones) or as strategic behaviour applied by professional interpreters to avoid risk.
18

Brunet, Paul M., Roddy Cowie, Dirk Heylen, Anton Nijholt, and Marc Schröder. "Conceptual frameworks for multimodal social signal processing". Journal on Multimodal User Interfaces 6, no. 3-4 (26 May 2012): 95–99. http://dx.doi.org/10.1007/s12193-012-0099-3.

19

Penia, Oleksandr, and Yevgeniya Sulema. "ЗАСТОСУВАННЯ ГЛИБОКИХ ШТУЧНИХ НЕЙРОННИХ МЕРЕЖ ДЛЯ КЛАСИФІКАЦІЇ МУЛЬТИМОДАЛЬНИХ ДАНИХ" [Application of deep artificial neural networks for the classification of multimodal data]. System technologies 6, no. 149 (1 April 2024): 11–22. http://dx.doi.org/10.34185/1562-9945-6-149-2023-02.

Abstract:
Multimodal data analysis is gaining attention in recent research. Pu Liang et al. (2023) provide a comprehensive overview of multimodal machine learning, highlighting its foundations, challenges and achievements in recent years. More problem-oriented works propose new methods and applications for multimodal ML: for example, Ngiam et al. (2011) propose to use joint audio and video data to improve speech recognition accuracy; Sun, Wang and Li (2018) describe the application of multimodal classification for breast cancer prognosis prediction; Mao et al. (2014) propose an architecture of a multimodal recurrent network to generate text descriptions of images; and so on. However, such works usually focus on the task itself and the methods therein, and not on integrating multimodal data processing into other software systems. The goal of this research is to propose a way to conduct multimodal data processing, specifically as part of a digital twin system, where efficiency and near-real-time operation are required. The paper presents an approach to conducting parallel multimodal data classification that adapts to the available computing power. The method is modular and scalable and is intended for digital twin applications as part of analysis and modeling tools. A detailed example of such a software module is then discussed. It uses multimodal data from open datasets to detect and classify the behavior of pets using deep learning models. Videos are processed using two artificial neural networks: a YOLOv3 object detection network to process individual frames of the video and a relatively simple convolutional network to classify sounds based on their frequency spectra. The constructed module uses a producer-consumer parallel processing pattern and allows processing 5 frames per second of video on the available hardware, which can be substantially improved by using GPU acceleration or more parallel processing threads.
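
The producer–consumer pattern mentioned above can be sketched with the Python standard library. The classify_frame and classify_audio_clip functions below are hypothetical stand-ins for the YOLOv3 detector and the audio convolutional network named in the abstract.

```python
import queue
import threading

def classify_frame(frame):
    # Placeholder for an object-detection model (the YOLOv3 stage in the abstract).
    return {"frame_id": frame, "label": "pet"}

def classify_audio_clip(clip):
    # Placeholder for a spectrum-based audio classifier.
    return {"clip_id": clip, "label": "bark"}

def producer(items, q):
    for item in items:
        q.put(item)
    q.put(None)  # sentinel: no more work

def consumer(q, classify, results):
    while True:
        item = q.get()
        if item is None:
            q.put(None)        # leave the sentinel for any other consumer
            break
        results.append(classify(item))

frame_q, audio_q = queue.Queue(maxsize=16), queue.Queue(maxsize=16)
frame_results, audio_results = [], []

threads = [
    threading.Thread(target=producer, args=(range(25), frame_q)),   # 5 s of video at 5 fps
    threading.Thread(target=producer, args=(range(5), audio_q)),    # 1-second audio clips
    threading.Thread(target=consumer, args=(frame_q, classify_frame, frame_results)),
    threading.Thread(target=consumer, args=(audio_q, classify_audio_clip, audio_results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(frame_results), "frames and", len(audio_results), "clips classified")
```
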
20

Zou, Zhuo. "Performance analysis of using multimodal embedding and word embedding transferred to sentiment classification". Applied and Computational Engineering 5, no. 1 (14 June 2023): 417–22. http://dx.doi.org/10.54254/2755-2721/5/20230610.

Abstract:
Multimodal machine learning is one of artificial intelligence's most important research topics. Contrastive Language-Image Pretraining (CLIP) is one of the applications of multimodal machine learning and is widely applied to computer vision. However, there is a research gap in applying CLIP to natural language processing. Therefore, based on IMDB, this paper applies the multimodal features of CLIP and three other pre-trained word vectors, GloVe, word2vec, and BERT, to compare their effects on sentiment classification in natural language processing and to test the performance of CLIP multimodal feature tuning in natural language processing. The results show that the multimodal feature of CLIP does not produce a significant effect on sentiment classification, while the other features achieve better results. The highest accuracy is produced by BERT, and the CLIP word embedding yields the lowest of the four accuracies; at the same time, GloVe and word2vec are relatively close. The reason may be that the pre-trained CLIP model learns SOTA image representations from pictures and their descriptions, which is unsuitable for sentiment classification tasks. The specific reason remains untested.
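
In outline, a comparison like this feeds frozen document embeddings from different encoders into the same downstream classifier. The sketch below uses scikit-learn with random placeholder matrices where the study would use CLIP, GloVe, word2vec, and BERT encodings of IMDB reviews, so the printed accuracies are meaningless; it only shows the experimental scaffolding.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n_docs, labels = 2000, rng.integers(0, 2, 2000)   # binary sentiment labels

# Hypothetical pre-computed document embeddings (one matrix per encoder).
embeddings = {
    "clip":     rng.normal(size=(n_docs, 512)),
    "glove":    rng.normal(size=(n_docs, 300)),
    "word2vec": rng.normal(size=(n_docs, 300)),
    "bert":     rng.normal(size=(n_docs, 768)),
}

for name, X in embeddings.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"{name:9s} accuracy: {accuracy_score(y_te, clf.predict(X_te)):.3f}")
```
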
21

Zhang, Ye, Diego Frassinelli, Jyrki Tuomainen, Jeremy I. Skipper, and Gabriella Vigliocco. "More than words: word predictability, prosody, gesture and mouth movements in natural language comprehension". Proceedings of the Royal Society B: Biological Sciences 288, no. 1955 (21 July 2021): 20210500. http://dx.doi.org/10.1098/rspb.2021.0500.

Abstract:
The ecology of human language is face-to-face interaction, comprising cues such as prosody, co-speech gestures and mouth movements. Yet, the multimodal context is usually stripped away in experiments as dominant paradigms focus on linguistic processing only. In two studies we presented video-clips of an actress producing naturalistic passages to participants while recording their electroencephalogram. We quantified multimodal cues (prosody, gestures, mouth movements) and measured their effect on a well-established electroencephalographic marker of processing load in comprehension (N400). We found that brain responses to words were affected by informativeness of co-occurring multimodal cues, indicating that comprehension relies on linguistic and non-linguistic cues. Moreover, they were affected by interactions between the multimodal cues, indicating that the impact of each cue dynamically changes based on the informativeness of other cues. Thus, results show that multimodal cues are integral to comprehension, hence, our theories must move beyond the limited focus on speech and linguistic processing.
22

García, Adolfo M., Eugenia Hesse, Agustina Birba, Federico Adolfi, Ezequiel Mikulan, Miguel Martorell Caro, Agustín Petroni et al. "Time to Face Language: Embodied Mechanisms Underpin the Inception of Face-Related Meanings in the Human Brain". Cerebral Cortex 30, no. 11 (24 June 2020): 6051–68. http://dx.doi.org/10.1093/cercor/bhaa178.

Abstract:
In construing meaning, the brain recruits multimodal (conceptual) systems and embodied (modality-specific) mechanisms. Yet, no consensus exists on how crucial the latter are for the inception of semantic distinctions. To address this issue, we combined electroencephalographic (EEG) and intracranial EEG (iEEG) to examine when nouns denoting facial body parts (FBPs) and nonFBPs are discriminated in face-processing and multimodal networks. First, FBP words increased N170 amplitude (a hallmark of early facial processing). Second, they triggered fast (~100 ms) activity boosts within the face-processing network, alongside later (~275 ms) effects in multimodal circuits. Third, iEEG recordings from face-processing hubs allowed decoding ~80% of items before 200 ms, while classification based on multimodal-network activity only surpassed ~70% after 250 ms. Finally, EEG and iEEG connectivity between both networks proved greater in early (0–200 ms) than later (200–400 ms) windows. Collectively, our findings indicate that, at least for some lexico-semantic categories, meaning is construed through fast reenactments of modality-specific experience.
23

Paulmann, Silke, Sarah Jessen, and Sonja A. Kotz. "Investigating the Multimodal Nature of Human Communication". Journal of Psychophysiology 23, no. 2 (January 2009): 63–76. http://dx.doi.org/10.1027/0269-8803.23.2.63.

Abstract:
The multimodal nature of human communication has been well established. Yet few empirical studies have systematically examined the widely held belief that this form of perception is facilitated in comparison to unimodal or bimodal perception. In the current experiment we first explored the processing of unimodally presented facial expressions. Furthermore, auditory (prosodic and/or lexical-semantic) information was presented together with the visual information to investigate the processing of bimodal (facial and prosodic cues) and multimodal (facial, lexical, and prosodic cues) human communication. Participants engaged in an identity identification task, while event-related potentials (ERPs) were being recorded to examine early processing mechanisms as reflected in the P200 and N300 component. While the former component has repeatedly been linked to physical property stimulus processing, the latter has been linked to more evaluative “meaning-related” processing. A direct relationship between P200 and N300 amplitude and the number of information channels present was found. The multimodal-channel condition elicited the smallest amplitude in the P200 and N300 components, followed by an increased amplitude in each component for the bimodal-channel condition. The largest amplitude was observed for the unimodal condition. These data suggest that multimodal information induces clear facilitation in comparison to unimodal or bimodal information. The advantage of multimodal perception as reflected in the P200 and N300 components may thus reflect one of the mechanisms allowing for fast and accurate information processing in human communication.
24

Daulay, Nahdyah Sari, SitiIsma Sari Lubis, and Widya Wulandari. "MULTIMODAL METAPHOR IN ADVERTISEMENT". AICLL: ANNUAL INTERNATIONAL CONFERENCE ON LANGUAGE AND LITERATURE 1, no. 1 (17 April 2018): 170–75. http://dx.doi.org/10.30743/aicll.v1i1.24.

Abstract:
Metaphor, in the cognitive linguistic view, can be defined as a tool which allows us to understand one conceptual domain in terms of another. What usually happens is that we use a physical, concrete domain as the source, while the target domain is what we need to comprehend. This means that human cognition is organized in conceptual schemas. Rodriguez (2015) stated that multimodal metaphor requires a mental comprehension process which differs from processing visual or verbal concepts alone. Metaphor has been used in much advertising, and a metaphor can be interpreted differently by different viewers. This paper presents an analysis of visual metaphors and illustrates the existence of possible multimodal metaphors in advertising. This study focuses only on the analysis of multimodal metaphors found in selected advertisements. In analyzing the multimodal metaphors in commercial advertising, a corpus of static adverts from TV was selected. All of the pictures presented include a verbal part.
25

Salamone, Paula C., Agustina Legaz, Lucas Sedeño, Sebastián Moguilner, Matías Fraile-Vazquez, Cecilia Gonzalez Campo, Sol Fittipaldi et al. "Interoception Primes Emotional Processing: Multimodal Evidence from Neurodegeneration". Journal of Neuroscience 41, no. 19 (7 April 2021): 4276–92. http://dx.doi.org/10.1523/jneurosci.2578-20.2021.

26

Nakamura, S. "Statistical multimodal integration for audio-visual speech processing". IEEE Transactions on Neural Networks 13, no. 4 (July 2002): 854–66. http://dx.doi.org/10.1109/tnn.2002.1021886.

27

Li Deng, Kuansan Wang, A. Acero, Hsiao-Wuen Hon, J. Droppo, C. Boulis, Ye-Yi Wang et al. "Distributed speech processing in miPad's multimodal user interface". IEEE Transactions on Speech and Audio Processing 10, no. 8 (November 2002): 605–19. http://dx.doi.org/10.1109/tsa.2002.804538.

28

Samman, Shatha N., Kay M. Stanney, Joseph Dalton, Ali M. Ahmad, Clint Bowers, and Valerie Sims. "Multimodal Interaction: Multi-Capacity Processing Beyond 7 +/− 2". Proceedings of the Human Factors and Ergonomics Society Annual Meeting 48, no. 3 (September 2004): 386–90. http://dx.doi.org/10.1177/154193120404800324.

29

Bengio, Samy. "Multimodal speech processing using asynchronous Hidden Markov Models". Information Fusion 5, no. 2 (June 2004): 81–89. http://dx.doi.org/10.1016/j.inffus.2003.04.001.

30

Ruan, Ludan, Anwen Hu, Yuqing Song, Liang Zhang, Sipeng Zheng, and Qin Jin. "Accommodating Audio Modality in CLIP for Multimodal Processing". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (26 June 2023): 9641–49. http://dx.doi.org/10.1609/aaai.v37i8.26153.

Abstract:
Multimodal processing has attracted much attention lately especially with the success of pre-training. However, the exploration has mainly focused on vision-language pre-training, as introducing more modalities can greatly complicate model design and optimization. In this paper, we extend the state-of-the-art Vision-Language model CLIP to accommodate the audio modality for Vision-Language-Audio multimodal processing. Specifically, we apply inter-modal and intra-modal contrastive learning to explore the correlation between audio and other modalities in addition to the inner characteristics of the audio modality. Moreover, we further design an audio type token to dynamically learn different audio information type for different scenarios, as both verbal and nonverbal heterogeneous information is conveyed in general audios. Our proposed CLIP4VLA model is validated in different downstream tasks including video retrieval and video captioning, and achieves the state-of-the-art performance on the benchmark datasets of MSR-VTT, VATEX, and Audiocaps. The corresponding code and checkpoints will be released at https://github.com/ludanruan/CLIP4VLA.
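
Inter-modal contrastive learning of the kind described here is commonly implemented as a symmetric InfoNCE loss over a batch of paired embeddings. The PyTorch sketch below is a generic version of that objective, not the released CLIP4VLA code.

```python
import torch
import torch.nn.functional as F

def symmetric_info_nce(audio_emb, other_emb, temperature=0.07):
    """Contrastive loss pulling matched audio/other pairs together.

    audio_emb, other_emb : (batch, dim) embeddings of paired clips
    (e.g., audio vs. video or audio vs. text representations).
    """
    a = F.normalize(audio_emb, dim=-1)
    b = F.normalize(other_emb, dim=-1)
    logits = a @ b.t() / temperature            # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # The i-th audio clip should match the i-th video/text item and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = symmetric_info_nce(torch.randn(32, 512), torch.randn(32, 512))
print(loss.item())
```
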
31

Hecht, David, Miriam Reiner, and Gad Halevy. "Multimodal Virtual Environments: Response Times, Attention, and Presence". Presence: Teleoperators and Virtual Environments 15, no. 5 (1 October 2006): 515–23. http://dx.doi.org/10.1162/pres.15.5.515.

Abstract:
Multimodal virtual environments (VE) succeed better than single-sensory technologies in creating a sense of presence. We hypothesize that the underlying cognitive mechanism is related to a faster mental processing of multimodal events. Comparing simple detection times of unimodal (auditory, visual, and haptic) events, with bimodal and trimodal combinations, we show that mental processing times are in the following order: unimodal > bimodal > trimodal. Given this processing-speed advantage, multimodal VE users start their cognitive process faster, thus, in a similar exposure time they can pay attention to more informative cues and subtle details in the environment and integrate them creatively. This richer, more complete and coherent experience may contribute to an enhanced sense of presence.
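
The reported ordering of detection times (unimodal > bimodal > trimodal) is also what a simple parallel race model predicts, since the fastest of several redundant channels triggers the response. The simulation below illustrates that statistical effect with arbitrary latency distributions; it is an illustrative alternative account, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical single-channel detection latencies (ms), one distribution per modality.
auditory = rng.normal(230, 40, n)
visual = rng.normal(260, 40, n)
haptic = rng.normal(250, 40, n)

unimodal = visual
bimodal = np.minimum(auditory, visual)                       # fastest of two channels wins
trimodal = np.minimum(np.minimum(auditory, visual), haptic)  # fastest of three channels wins

for name, rt in [("unimodal", unimodal), ("bimodal", bimodal), ("trimodal", trimodal)]:
    print(f"{name:8s} mean detection time: {rt.mean():6.1f} ms")
```
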
32

Xu, Kailiang, Feiyao Ling, Honglei Chen, Yifang Li, Tho N. H. T. Tran, Pascal Laugier, Jean-Gabriel Minonzio, and Dean Ta. "Recent advances in phase-array based wideband multimodal dispersion curves extraction and plate-waveguide parameter inversion". Journal of the Acoustical Society of America 152, no. 4 (October 2022): A268. http://dx.doi.org/10.1121/10.0016235.

Abstract:
Multimodal ultrasonic guided waves with rich dispersion information have been widely applied for waveguide evaluation. However, there are still challenges in extracting multimode dispersion curves using the traditional pitch-catch method, including the lack of spatial information and poor signal-to-noise ratio. Phased-array measurement has recently been developed for ultrasonic guided waves and brings many advantages for wideband multimodal dispersion curve extraction and parameter inversion. In recent years, based on multi-emitter and multi-receiver measurements, we have proposed several array dispersive signal processing strategies for analyzing guided waves in long cortical bone, including the sparse singular vector decomposition (sparse-SVD) method for enhancing wavenumber estimation resolution and low-amplitude mode extraction, the Radon transform and dispersive Radon transform (DRT) with the capability to project temporal array dispersive signals onto the space of parameters of interest for solving the inverse problem, and our recent work on deep neural networks for solving the intractable multiparameter inverse problem from the array signals to the waveguide elasticity. In the talk, we present (1) recent advances in array signal processing for extracting wideband multimodal dispersion curves; (2) some new perspectives to retrieve the waveguide parameters; (3) some pilot clinical results of long cortical bone evaluation using ultrasonic guided waves.
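
Before higher-resolution methods such as sparse-SVD or Radon-type transforms are applied, wideband dispersion information is often visualized with a plain two-dimensional Fourier transform from the time–distance domain to the frequency–wavenumber plane, where each guided mode appears as an energy ridge. The sketch below builds that baseline on synthetic single-mode data; all parameters are made up for illustration.

```python
import numpy as np

# Synthetic multi-receiver recording of a single non-dispersive guided mode.
fs = 2e6                  # temporal sampling frequency (Hz)
dx = 1e-3                 # receiver pitch (m)
n_receivers, n_samples = 64, 1024
t = np.arange(n_samples) / fs
x = np.arange(n_receivers) * dx
c = 3000.0                # assumed phase velocity of the mode (m/s)

def pulse(tau, fc=0.5e6, bw=0.3e6):
    # Gaussian-modulated tone burst covering roughly fc +/- bw.
    return np.exp(-(np.pi * bw * tau) ** 2) * np.cos(2 * np.pi * fc * tau)

signals = pulse(t[None, :] - x[:, None] / c)

# 2D Fourier transform: time -> frequency, receiver position -> wavenumber.
spectrum = np.fft.rfft(signals, axis=1)          # (n_receivers, n_freqs)
fk = np.abs(np.fft.fft(spectrum, axis=0))        # frequency-wavenumber magnitude
freqs = np.fft.rfftfreq(n_samples, d=1 / fs)
wavenumbers = np.fft.fftfreq(n_receivers, d=dx)

# Dispersion estimate: wavenumber of maximum energy at each frequency, c = f / k.
k_peak = np.abs(wavenumbers[np.argmax(fk, axis=0)])
band = (freqs > 0.35e6) & (freqs < 0.65e6)
print("median estimated phase velocity:",
      round(float(np.median(freqs[band] / k_peak[band]))), "m/s")
```
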
33

Mingyu, Ji, Zhou Jiawei, and Wei Ning. "AFR-BERT: Attention-based mechanism feature relevance fusion multimodal sentiment analysis model". PLOS ONE 17, no. 9 (9 September 2022): e0273936. http://dx.doi.org/10.1371/journal.pone.0273936.

Abstract:
Multimodal sentiment analysis is an essential task in natural language processing which refers to the fact that machines can analyze and recognize emotions through logical reasoning and mathematical operations after learning multimodal emotional features. For the problem of how to consider the effective fusion of multimodal data and the relevance of multimodal data in multimodal sentiment analysis, we propose an attention-based mechanism feature relevance fusion multimodal sentiment analysis model (AFR-BERT). In the data pre-processing stage, text features are extracted using the pre-trained language model BERT (Bi-directional Encoder Representation from Transformers), and the BiLSTM (Bi-directional Long Short-Term Memory) is used to obtain the internal information of the audio. In the data fusion phase, the multimodal data fusion network effectively fuses multimodal features through the interaction of text and audio information. During the data analysis phase, the multimodal data association network analyzes the data by exploring the correlation of fused information between text and audio. In the data output phase, the model outputs the results of multimodal sentiment analysis. We conducted extensive comparative experiments on the publicly available sentiment analysis datasets CMU-MOSI and CMU-MOSEI. The experimental results show that AFR-BERT improves on the classical multimodal sentiment analysis model in terms of relevant performance metrics. In addition, ablation experiments and example analysis show that the multimodal data analysis network in AFR-BERT can effectively capture and analyze the sentiment features in text and audio.
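
A minimal sketch of the kind of text–audio fusion classifier the abstract outlines is given below: a pooled BERT-style sentence embedding (here a placeholder tensor) is combined with a BiLSTM summary of frame-level acoustic features and passed to a small classifier. It is a generic illustration under assumed feature dimensions, not the AFR-BERT model.

```python
import torch
import torch.nn as nn

class TextAudioFusion(nn.Module):
    """Fuse a sentence-level text embedding with a BiLSTM summary of audio frames."""

    def __init__(self, text_dim=768, audio_dim=74, hidden=128, n_classes=3):
        super().__init__()
        self.audio_lstm = nn.LSTM(audio_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + 2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, text_emb, audio_frames):
        # text_emb: (batch, text_dim), e.g. a pooled BERT embedding
        # audio_frames: (batch, n_frames, audio_dim) frame-level acoustic features
        _, (h_n, _) = self.audio_lstm(audio_frames)
        audio_summary = torch.cat([h_n[0], h_n[1]], dim=-1)   # forward + backward states
        return self.classifier(torch.cat([text_emb, audio_summary], dim=-1))

model = TextAudioFusion()
logits = model(torch.randn(4, 768), torch.randn(4, 200, 74))
print(logits.shape)  # torch.Size([4, 3])
```
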
34

Qi, Qingfu, Liyuan Lin, and Rui Zhang. "Feature Extraction Network with Attention Mechanism for Data Enhancement and Recombination Fusion for Multimodal Sentiment Analysis". Information 12, no. 9 (24 August 2021): 342. http://dx.doi.org/10.3390/info12090342.

Abstract:
Multimodal sentiment analysis and emotion recognition represent a major research direction in natural language processing (NLP). With the rapid development of online media, people often express their emotions on a topic in the form of video, and the signals it transmits are multimodal, including language, visual, and audio. Therefore, the traditional unimodal sentiment analysis method is no longer applicable, which requires the establishment of a fusion model of multimodal information to obtain sentiment understanding. In previous studies, scholars used the feature vector cascade method when fusing multimodal data at each time step in the middle layer. This method puts each modal information in the same position and does not distinguish between strong modal information and weak modal information among multiple modalities. At the same time, this method does not pay attention to the embedding characteristics of multimodal signals across the time dimension. In response to the above problems, this paper proposes a new method and model for processing multimodal signals, which takes into account the delay and hysteresis characteristics of multimodal signals across the time dimension. The purpose is to obtain a multimodal fusion feature emotion analysis representation. We evaluate our method on the multimodal sentiment analysis benchmark dataset CMU Multimodal Opinion Sentiment and Emotion Intensity Corpus (CMU-MOSEI). We compare our proposed method with the state-of-the-art model and show excellent results.
35

Fahrner, Harald, Stefan Kirrmann, Mark Gainey, Marianne Schmucker, Martin Vogel, and Felix Ernst Heinemann. "Multimodal Document Management in Radiotherapy, an Update". Journal of Radiation Oncology Informatics 10, no. 1 (11 February 2019): 9. http://dx.doi.org/10.5166/jroi-10-1-1.

Abstract:
Background: In 2013, we presented a study entitled "Multimodal document management in radiotherapy", demonstrating the excellent routine performance of the system about four years after its initiation by evaluating a sample of n=500 documents. During this time the system saw additional developments and significant improvements: the most important innovative step being the automatic document processing. This has been completely reworked to minimize staff-machine interaction, to increase processing speed and to further simplify the overall document handling. This improved system has been running practically without any problems for several months. Methods: While reworking the automatic document processing, we have developed algorithms that allow us to transfer documents of varying type, within a single scanning procedure, into our departmental system. The system identifies and corrects for any arbitrary order or rotation of scanned pages. Finally, after the transfer into the departmental system, all documents are in the correct order and they are automatically linked to the respective patient record. Results: According to our surveys, the error rate of the system, as in the previous version, is 0%. Compared to manual scanning and mapping of documents, we can quantify a 30-fold increase in processing speed. In spite of these additional and elaborate processes, code optimizations yielded a processing speed increase of 20%. Pre-sorting of the documents (e.g., medical reports or informed consent documents) can be dispensed with completely thanks to the automated correction of jumbled or rotated documents. In this manner 25,000 documents are automatically processed each year in the Department of Radiation Oncology at the University of Freiburg. Conclusion: With the methods presented in this study, together with some additional bug fixes and small improvements, the automatic document processing of our departmental system was significantly improved without compromising the error rate. Keywords: Clinic management, documents, workflow, optimisation, efficiency, automation, Mosaiq, oncology informatics
36

Song, Chunlai. "Enhancing Multimodal Understanding With LIUS". Journal of Organizational and End User Computing 36, no. 1 (12 January 2024): 1–17. http://dx.doi.org/10.4018/joeuc.336276.

Abstract:
VQA (visual question and answer) is the task of enabling a computer to generate accurate textual answers based on given images and related questions. It integrates computer vision and natural language processing and requires a model that is able to understand not only the image content but also the question in order to generate appropriate linguistic answers. However, current limitations in cross-modal understanding often result in models that struggle to accurately capture the complex relationships between images and questions, leading to inaccurate or ambiguous answers. This research aims to address this challenge through a multifaceted approach that combines the strengths of vision and language processing. By introducing the innovative LIUS framework, a specialized vision module was built to process image information and fuse features using multiple scales. The insights gained from this module are integrated with a “reasoning module” (LLM) to generate answers.
37

Rajalingam B. and Priya R. "Enhancement of Hybrid Multimodal Medical Image Fusion Techniques for Clinical Disease Analysis". International Journal of Computer Vision and Image Processing 8, no. 3 (July 2018): 16–40. http://dx.doi.org/10.4018/ijcvip.2018070102.

Abstract:
Multimodal medical image fusion is one of the most significant and useful disease analysis techniques. This research article proposes hybrid multimodal medical image fusion methods and discusses the most essential advantages and disadvantages of these methods. The hybrid multimodal medical image fusion algorithms are used to improve the quality of the fused multimodal medical image. Magnetic resonance imaging, positron emission tomography, and single photon emission computed tomography are the input multimodal therapeutic images used for the fusion process. Experimental results of the proposed hybrid fusion techniques provide fused multimodal medical images of the highest quality, shortest processing time, and best visualization. Both traditional and hybrid multimodal medical image fusion algorithms are evaluated using several quality metrics. Compared with existing techniques, the proposed methods give better processing performance in both qualitative and quantitative evaluation criteria. This is favorable, especially for helping in accurate clinical disease analysis.
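
For orientation, the simplest fusion baselines combine co-registered, intensity-normalized modalities pixel by pixel, e.g. by averaging or maximum selection; the hybrid methods evaluated in the article are considerably more sophisticated. The NumPy sketch below shows only that baseline, with random arrays standing in for registered MRI and PET slices.

```python
import numpy as np

def fuse_images(mri, pet, mode="max"):
    """Pixel-wise fusion of two co-registered, intensity-normalized images."""
    a = (mri - mri.min()) / (np.ptp(mri) + 1e-9)
    b = (pet - pet.min()) / (np.ptp(pet) + 1e-9)
    if mode == "max":
        return np.maximum(a, b)           # keep the locally stronger response
    return 0.5 * (a + b)                  # simple averaging fusion

rng = np.random.default_rng(3)
mri = rng.random((256, 256))              # placeholder anatomical image
pet = rng.random((256, 256))              # placeholder functional image
fused = fuse_images(mri, pet, mode="max")
print(fused.shape, float(fused.min()), float(fused.max()))
```
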
38

Liang, Chu, Jiajie Xu, Jie Zhao, Ying Chen, and Jiwei Huang. "Deep Learning-Based Construction and Processing of Multimodal Corpus for IoT Devices in Mobile Edge Computing". Computational Intelligence and Neuroscience 2022 (5 August 2022): 1–10. http://dx.doi.org/10.1155/2022/2241310.

Abstract:
Dialogue sentiment analysis is a hot topic in the field of artificial intelligence in recent years, in which the construction of multimodal corpus is the key part of dialogue sentiment analysis. With the rapid development of the Internet of Things (IoT), it provides a new means to collect the multiparty dialogues to construct a multimodal corpus. The rapid development of Mobile Edge Computing (MEC) provides a new platform for the construction of multimodal corpus. In this paper, we construct a multimodal corpus on MEC servers to make full use of the storage space distributed at the edge of the network according to the procedure of constructing a multimodal corpus that we propose. At the same time, we build a deep learning model (sentiment analysis model) and use the constructed corpus to train the deep learning model for sentiment on MEC servers to make full use of the computing power distributed at the edge of the network. We carry out experiments based on real-world dataset collected by IoT devices, and the results validate the effectiveness of our sentiment analysis model.
39

Pinto, Inês F., Maria Raquel Aires-Barros, and Ana M. Azevedo. "Multimodal chromatography: debottlenecking the downstream processing of monoclonal antibodies". Pharmaceutical Bioprocessing 3, no. 3 (June 2015): 263–79. http://dx.doi.org/10.4155/pbp.15.7.

40

Engebretsen, Martin. "From Decoding a Graph to Processing a Multimodal Message". Nordicom Review 41, no. 1 (18 February 2020): 33–50. http://dx.doi.org/10.2478/nor-2020-0004.

Abstract:
Data visualisation – in the forms of graphs, charts, and maps – represents a text type growing in prevalence and impact in many cultural domains; education, journalism, business, PR, and more. Research on data visualisation reception is scarce, particularly that related to interactive and dynamic forms of data visualisation in digital media. Taking an approach inspired by grounded theory, in this article I investigate the ways in which young students interact with data visualisations found in digital news media. Combining observations from reading sessions with ten in-depth interviews, I investigate how the informants read, interpreted, and responded emotionally to data visualisations including visual metaphors, interactivity, and animation.
41

Li, Tianyun, and Bicheng Fan. "Attention-Sharing Initiative of Multimodal Processing in Simultaneous Interpreting". International Journal of Translation, Interpretation, and Applied Linguistics 2, no. 2 (July 2020): 42–53. http://dx.doi.org/10.4018/ijtial.20200701.oa4.

Abstract:
This study sets out to describe simultaneous interpreters' attention-sharing initiatives when exposed to input from both a videotaped speech recording and real-time transcriptions. Separation of mental energy in acquiring visual input accords with the human brain's statistical optimization principle, whereby the same property of an object is presented in diverse fashions. In examining professional interpreters' initiatives, the authors invited five professional English-Chinese conference interpreters to simultaneously interpret a videotaped speech with real-time captions generated by a speech recognition engine while monitoring their eye movements. The results indicate the professional interpreters' preferences in referring to visually presented captions along with the speaker's facial expressions, where low-frequency words, proper names, and numbers gained greater attention than words with higher frequency. This phenomenon might be explained by the working memory theory in which the central executive enables redundancy gains retrieved from dual-channel information.
42

Zhang, Tao, and Martin McKinney. "Multimodal signal processing and machine learning for hearing instruments". Journal of the Acoustical Society of America 143, no. 3 (March 2018): 1745. http://dx.doi.org/10.1121/1.5035696.

43

Porta, Alberto, Federico Aletti, Frederic Vallais, and Giuseppe Baselli. "Multimodal signal processing for the analysis of cardiovascular variability". Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 367, no. 1887 (22 October 2008): 391–409. http://dx.doi.org/10.1098/rsta.2008.0229.

Abstract:
Cardiovascular (CV) variability as a primary vital sign carrying information about CV regulation systems is reviewed by pointing out the role of the main rhythms and the various control and functional systems involved. The high complexity of the addressed phenomena fosters a multimodal approach that relies on data analysis models and deals with the ongoing interactions of many signals at a time. The importance of closed-loop identification and causal analysis is remarked upon and basic properties, application conditions and methods are recalled. The need of further integration of CV signals relevant to peripheral and systemic haemodynamics, respiratory mechanics, neural afferent and efferent pathways is also stressed.
44

Deng, L., Y. Wang, K. Wang, A. Acero, H. Hon, J. Droppo, C. Boulis, M. Mahajan, and X. D. Huang. "Speech and Language Processing for Multimodal Human-Computer Interaction". Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 36, no. 2/3 (February 2004): 161–87. http://dx.doi.org/10.1023/b:vlsi.0000015095.19623.73.

45

Jia, Zixi, Yilu Wang, Shengming Li, Meiqi Yang, Zhongyuan Liu, and Huijing Zhang. "MICDnet: Multimodal information processing networks for Crohn’s disease diagnosis". Computers in Biology and Medicine 168 (January 2024): 107790. http://dx.doi.org/10.1016/j.compbiomed.2023.107790.

46

Glinert, E. P., and M. M. Blattner. "Multimodal Interaction". IEEE Multimedia 3, no. 4 (1996): 13. http://dx.doi.org/10.1109/mmul.1996.556455.

47

Blattner, M. M., and E. P. Glinert. "Multimodal integration". IEEE Multimedia 3, no. 4 (1996): 14–24. http://dx.doi.org/10.1109/93.556457.

48

Cooperstock, Jeremy. "Multimodal Telepresence Systems". IEEE Signal Processing Magazine 28, no. 1 (January 2011): 77–86. http://dx.doi.org/10.1109/msp.2010.939040.

49

Улмасова, Хилолахон. "Simultaneous interpreting as a multimodal task". Арабский язык в эпоху глобализации: инновационные подходы и методы обучения 1, no. 1 (29 December 2023): 544–48. http://dx.doi.org/10.47689/atgd:iyom-vol1-iss1-pp544-548-id28645.

Abstract:
The paper examines the process of simultaneous interpreting, its difficulties and its multimodal nature, the importance of visual information in simultaneous interpreting, simultaneous interpreting as a complex task, and visual, auditory and audiovisual stimuli in the brain.
50

Burr, David, Ottavia Silva, Guido Marco Cicchini, Martin S. Banks, and Maria Concetta Morrone. "Temporal mechanisms of multimodal binding". Proceedings of the Royal Society B: Biological Sciences 276, no. 1663 (25 February 2009): 1761–69. http://dx.doi.org/10.1098/rspb.2008.1899.

Abstract:
The simultaneity of signals from different senses—such as vision and audition—is a useful cue for determining whether those signals arose from one environmental source or from more than one. To understand better the sensory mechanisms for assessing simultaneity, we measured the discrimination thresholds for time intervals marked by auditory, visual or auditory–visual stimuli, as a function of the base interval. For all conditions, both unimodal and cross-modal, the thresholds followed a characteristic ‘dipper function’ in which the lowest thresholds occurred when discriminating against a non-zero interval. The base interval yielding the lowest threshold was roughly equal to the threshold for discriminating asynchronous from synchronous presentations. Those lowest thresholds occurred at approximately 5, 15 and 75 ms for auditory, visual and auditory–visual stimuli, respectively. Thus, the mechanisms mediating performance with cross-modal stimuli are considerably slower than the mechanisms mediating performance within a particular sense. We developed a simple model with temporal filters of different time constants and showed that the model produces discrimination functions similar to the ones we observed in humans. Both for processing within a single sense, and for processing across senses, temporal perception is affected by the properties of temporal filters, the outputs of which are used to estimate time offsets, correlations between signals, and more.
