Academic literature on the topic 'Audio-visual attention'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Audio-visual attention.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Audio-visual attention"

1

Chen, Yanxiang, Tam V. Nguyen, Mohan Kankanhalli, Jun Yuan, Shuicheng Yan, and Meng Wang. "Audio Matters in Visual Attention." IEEE Transactions on Circuits and Systems for Video Technology 24, no. 11 (November 2014): 1992–2003. http://dx.doi.org/10.1109/tcsvt.2014.2329380.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Lee, Yong-Hyeok, Dong-Won Jang, Jae-Bin Kim, Rae-Hong Park, and Hyung-Min Park. "Audio–Visual Speech Recognition Based on Dual Cross-Modality Attentions with the Transformer Model." Applied Sciences 10, no. 20 (October 17, 2020): 7263. http://dx.doi.org/10.3390/app10207263.

Full text
Abstract:
Since the attention mechanism was introduced in neural machine translation, attention has been combined with the long short-term memory (LSTM) or has replaced the LSTM in a transformer model to overcome the sequence-to-sequence (seq2seq) problems of the LSTM. In contrast to neural machine translation, audio–visual speech recognition (AVSR) may provide improved performance by learning the correlation between audio and visual modalities. Because the audio carries richer information than the lip-related video, it is hard to train AVSR attentions with balanced modalities. In order to raise the role of the visual modality to the level of the audio modality by fully exploiting the input information in learning attentions, we propose a dual cross-modality (DCM) attention scheme that utilizes both an audio context vector computed with a video query and a video context vector computed with an audio query. Furthermore, we introduce a connectionist-temporal-classification (CTC) loss in combination with our attention-based model to enforce the monotonic alignments required in AVSR. Recognition experiments on the LRS2-BBC and LRS3-TED datasets showed that the proposed model with the DCM attention scheme and the hybrid CTC/attention architecture achieved a relative improvement of at least 7.3% on average in word error rate (WER) compared to competing methods based on the transformer model.
APA, Harvard, Vancouver, ISO, and other styles
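To make the dual cross-modality (DCM) idea from the entry above concrete, here is a minimal PyTorch sketch under stated assumptions: both modalities are already projected to a shared feature dimension and are frame-synchronized, and the class name DualCrossModalityAttention is hypothetical. It illustrates the general mechanism (an audio context computed with a video query, and a video context computed with an audio query), not the authors' implementation or their CTC/attention training setup.

```python
# Illustrative sketch only (not the paper's code): dual cross-modality attention
# built from standard multi-head attention primitives.
import torch
import torch.nn as nn

class DualCrossModalityAttention(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        # Audio context attended with a video query, and vice versa.
        self.audio_from_video = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.video_from_audio = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, audio_feats: torch.Tensor, video_feats: torch.Tensor):
        # audio_feats: (batch, T, d_model), video_feats: (batch, T, d_model);
        # the concatenation at the end assumes both streams have the same length T.
        audio_ctx, _ = self.audio_from_video(query=video_feats,
                                             key=audio_feats, value=audio_feats)
        video_ctx, _ = self.video_from_audio(query=audio_feats,
                                             key=video_feats, value=video_feats)
        # Fuse the two context streams; concatenation is one simple choice.
        return torch.cat([audio_ctx, video_ctx], dim=-1)

if __name__ == "__main__":
    dcm = DualCrossModalityAttention()
    a = torch.randn(2, 120, 256)   # e.g. 120 audio frames
    v = torch.randn(2, 120, 256)   # e.g. 120 lip-region frames
    print(dcm(a, v).shape)         # torch.Size([2, 120, 512])
```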
3

Iwaki, Sunao, Mitsuo Tonoike, Masahiko Yamaguchi, and Takashi Hamada. "Modulation of extrastriate visual processing by audio-visual intermodal selective attention." NeuroImage 11, no. 5 (May 2000): S21. http://dx.doi.org/10.1016/s1053-8119(00)90956-x.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

NAGASAKI, Yoshiki, Masaki HAYASHI, Naoshi KANEKO, and Yoshimitsu AOKI. "Temporal Cross-Modal Attention for Audio-Visual Event Localization." Journal of the Japan Society for Precision Engineering 88, no. 3 (March 5, 2022): 263–68. http://dx.doi.org/10.2493/jjspe.88.263.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Xuan, Hanyu, Zhenyu Zhang, Shuo Chen, Jian Yang, and Yan Yan. "Cross-Modal Attention Network for Temporal Inconsistent Audio-Visual Event Localization." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 279–86. http://dx.doi.org/10.1609/aaai.v34i01.5361.

Full text
Abstract:
In human multi-modality perception systems, the benefits of integrating auditory and visual information are extensive, as they provide plentiful supplementary cues for understanding events. Despite some recent methods proposed for this application, they cannot deal with practical conditions involving temporal inconsistency. Inspired by the human system, which places different focus on specific locations, time segments, and media while performing multi-modality perception, we provide an attention-based method to simulate this process. Similar to the human mechanism, our network can adaptively select “where” to attend, “when” to attend, and “which” to attend for audio-visual event localization. In this way, even with large temporal inconsistency between vision and audio, our network is able to adaptively trade information between different modalities and successfully achieve event localization. Our method achieves state-of-the-art performance on the AVE (Audio-Visual Event) dataset, which was collected in real life. In addition, we systematically investigate audio-visual event localization tasks. The visualization results also help us better understand how our model works.
APA, Harvard, Vancouver, ISO, and other styles
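As a rough illustration of the "where / when / which" idea in the entry above, the sketch below combines audio-guided spatial attention, temporal attention, and a soft modality gate. All module names, dimensions, and fusion choices are assumptions made here for clarity; this is not the network proposed in the paper.

```python
# Hedged sketch: three light-weight attention gates deciding "where" (spatial
# region), "when" (time segment), and "which" (modality) to attend.
import torch
import torch.nn as nn

class WhereWhenWhichAttention(nn.Module):
    def __init__(self, d: int = 128):
        super().__init__()
        self.where = nn.Linear(2 * d, 1)   # scores each spatial region, guided by audio
        self.when = nn.Linear(2 * d, 1)    # scores each time segment
        self.which = nn.Linear(2 * d, 2)   # soft weights over the two modalities

    def forward(self, vis: torch.Tensor, aud: torch.Tensor):
        # vis: (B, T, R, d) region features per segment; aud: (B, T, d) audio features.
        B, T, R, d = vis.shape
        aud_exp = aud.unsqueeze(2).expand(B, T, R, d)
        w_spatial = self.where(torch.cat([vis, aud_exp], dim=-1)).softmax(dim=2)  # (B,T,R,1)
        vis_pooled = (w_spatial * vis).sum(dim=2)                                 # (B,T,d)
        pair = torch.cat([vis_pooled, aud], dim=-1)                               # (B,T,2d)
        w_temporal = self.when(pair).softmax(dim=1)                               # (B,T,1)
        w_modal = self.which(pair).softmax(dim=-1)                                # (B,T,2)
        fused = w_modal[..., :1] * vis_pooled + w_modal[..., 1:] * aud            # (B,T,d)
        return (w_temporal * fused).sum(dim=1)                                    # (B,d) clip-level feature

if __name__ == "__main__":
    m = WhereWhenWhichAttention()
    print(m(torch.randn(2, 10, 49, 128), torch.randn(2, 10, 128)).shape)  # torch.Size([2, 128])
```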
6

Iwaki, Sunao. "Audio-visual intermodal orientation of attention modulates task-specific extrastriate visual processing." Neuroscience Research 68 (January 2010): e269. http://dx.doi.org/10.1016/j.neures.2010.07.1195.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Keitel, Christian, and Matthias M. Müller. "Audio-visual synchrony and feature-selective attention co-amplify early visual processing." Experimental Brain Research 234, no. 5 (August 1, 2015): 1221–31. http://dx.doi.org/10.1007/s00221-015-4392-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Zhu, Hao, Man-Di Luo, Rui Wang, Ai-Hua Zheng, and Ran He. "Deep Audio-visual Learning: A Survey." International Journal of Automation and Computing 18, no. 3 (April 15, 2021): 351–76. http://dx.doi.org/10.1007/s11633-021-1293-0.

Full text
Abstract:
Audio-visual learning, aimed at exploiting the relationship between audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully. Researchers tend to leverage these two modalities to improve the performance of previously considered single-modality tasks or address new challenging problems. In this paper, we provide a comprehensive survey of recent audio-visual learning development. We divide the current audio-visual learning tasks into four different subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual representation learning. State-of-the-art methods, as well as the remaining challenges of each subfield, are further discussed. Finally, we summarize the commonly used datasets and challenges.
APA, Harvard, Vancouver, ISO, and other styles
9

Ran, Yue, Hongying Tang, Baoqing Li, and Guohui Wang. "Self-Supervised Video Representation and Temporally Adaptive Attention for Audio-Visual Event Localization." Applied Sciences 12, no. 24 (December 9, 2022): 12622. http://dx.doi.org/10.3390/app122412622.

Full text
Abstract:
Localizing the audio-visual events in video requires a combined judgment of visual and audio components. To integrate multimodal information, existing methods modeled the cross-modal relationships by feeding unimodal features into attention modules. However, these unimodal features are encoded in separate spaces, resulting in a large heterogeneity gap between modalities. Existing attention modules, on the other hand, ignore the temporal asynchrony between vision and hearing when constructing cross-modal connections, which may lead to the misinterpretation of one modality by another. Therefore, this paper aims to improve event localization performance by addressing these two problems and proposes a framework that feeds audio and visual features encoded in the same semantic space into a temporally adaptive attention module. Specifically, we develop a self-supervised representation method to encode features with a smaller heterogeneity gap by matching corresponding semantic cues between synchronized audio and visual signals. Furthermore, we develop a temporally adaptive cross-modal attention based on a weighting method that dynamically channels attention according to the time differences between event-related features. The proposed framework achieves state-of-the-art performance on the public audio-visual event dataset and the experimental results not only show that our self-supervised method can learn more discriminative features but also verify the effectiveness of our strategy for assigning attention.
APA, Harvard, Vancouver, ISO, and other styles
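One way to picture the temporally adaptive weighting described in the entry above is to penalize cross-modal attention scores by the time gap between segments. The following PyTorch sketch is illustrative only; the learnable decay parameter and all names are assumptions, not the paper's formulation.

```python
# Hedged sketch: cross-modal attention whose scores are down-weighted as the
# time gap between the attending and attended segments grows.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporallyAdaptiveAttention(nn.Module):
    def __init__(self, d_model: int = 128):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Learnable decay rate controlling how fast attention falls off in time.
        self.decay = nn.Parameter(torch.tensor(0.5))

    def forward(self, visual: torch.Tensor, audio: torch.Tensor):
        # visual, audio: (batch, T, d_model); each visual segment attends to all audio segments.
        T = visual.shape[1]
        scores = self.q(visual) @ self.k(audio).transpose(1, 2) / visual.shape[-1] ** 0.5
        # Time-difference penalty: larger |t_i - t_j| -> smaller attention weight.
        t = torch.arange(T, dtype=torch.float32, device=visual.device)
        gap = (t[:, None] - t[None, :]).abs()           # (T, T)
        scores = scores - F.softplus(self.decay) * gap  # broadcast over the batch
        attn = scores.softmax(dim=-1)
        return attn @ self.v(audio)                     # audio context for each visual segment

if __name__ == "__main__":
    taa = TemporallyAdaptiveAttention()
    print(taa(torch.randn(2, 10, 128), torch.randn(2, 10, 128)).shape)  # torch.Size([2, 10, 128])
```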
10

Zhao, Sicheng, Yunsheng Ma, Yang Gu, Jufeng Yang, Tengfei Xing, Pengfei Xu, Runbo Hu, Hua Chai, and Kurt Keutzer. "An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 303–11. http://dx.doi.org/10.1609/aaai.v34i01.5364.

Full text
Abstract:
Emotion recognition in user-generated videos plays an important role in human-centered computing. Existing methods mainly employ traditional two-stage shallow pipeline, i.e. extracting visual and/or audio features and training classifiers. In this paper, we propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs). Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN. Further, we design a special classification loss, i.e. polarity-consistent cross-entropy loss, based on the polarity-emotion hierarchy constraint to guide the attention generation. Extensive experiments conducted on the challenging VideoEmotion-8 and Ekman-6 datasets demonstrate that the proposed VAANet outperforms the state-of-the-art approaches for video emotion recognition. Our source code is released at: https://github.com/maysonma/VAANet.
APA, Harvard, Vancouver, ISO, and other styles
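The polarity-consistent classification loss mentioned in the entry above can be sketched, under assumptions, as a cross-entropy term that is up-weighted whenever the predicted emotion falls in a different polarity than the ground truth. The emotion-to-polarity mapping and the penalty weight lambda_p below are hypothetical illustrations, not the values or exact form defined in the paper.

```python
# Hedged sketch of a polarity-consistent cross-entropy loss.
import torch
import torch.nn.functional as F

# Hypothetical emotion-index -> polarity mapping (0 = negative, 1 = positive).
POLARITY = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])

def polarity_consistent_ce(logits: torch.Tensor, target: torch.Tensor,
                           lambda_p: float = 0.5) -> torch.Tensor:
    polarity = POLARITY.to(logits.device)
    ce = F.cross_entropy(logits, target, reduction="none")
    pred = logits.argmax(dim=1)
    polarity_mismatch = (polarity[pred] != polarity[target]).float()
    # Up-weight samples whose predicted polarity disagrees with the label's polarity.
    return ((1.0 + lambda_p * polarity_mismatch) * ce).mean()

if __name__ == "__main__":
    logits = torch.randn(4, 8)
    target = torch.tensor([0, 3, 5, 7])
    print(polarity_consistent_ce(logits, target))
```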

Dissertations / Theses on the topic "Audio-visual attention"

1

Sharma, Dinkar. "Effects of attention on audio-visual speech." Thesis, University of Reading, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.329379.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Ren, Reede. "Audio-visual football video analysis, from structure detection to attention analysis." Thesis, Connect to e-thesis. Move to record for print version, 2008. http://theses.gla.ac.uk/77/.

Full text
Abstract:
Thesis (Ph.D.) - University of Glasgow, 2008.
Ph.D. thesis submitted to the Faculty of Information and Mathematical Sciences, Department of Computing Science, University of Glasgow, 2008. Includes bibliographical references. Print version also available.
APA, Harvard, Vancouver, ISO, and other styles
3

Song, Guanghan. "Effect of sound in videos on gaze : contribution to audio-visual saliency modelling." Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENT013/document.

Full text
Abstract:
Humans receive a large quantity of information from the environment through sight and hearing. To help us react rapidly and properly, there exist mechanisms in the brain that bias attention towards particular regions, namely the salient regions. This attentional bias is not only influenced by vision but also by audio-visual interaction. According to the existing literature, visual attention can be studied through eye movements; however, the effect of sound on eye movement in videos is little known. The aim of this thesis is to investigate the influence of sound in videos on eye movement and to propose an audio-visual saliency model to predict salient regions in videos more accurately. For this purpose, we designed a first audio-visual eye-tracking experiment. We created a database of short video excerpts selected from various films. These excerpts were viewed by participants either with their original soundtrack (AV condition) or without the soundtrack (V condition). We analyzed the difference in eye positions between participants in the AV and V conditions. The results show that there does exist an effect of sound on eye movement, and the effect is greater for the on-screen speech class. We then designed a second audio-visual experiment with thirteen classes of sound. By comparing the difference in eye positions between participants in the AV and V conditions, we conclude that the effect of sound differs depending on the type of sound, and that the classes with a human voice (i.e. the speech, singer, human noise, and singers classes) have the greatest effect. More precisely, the sound source significantly attracted eye positions only when the sound was a human voice. Moreover, participants in the AV condition had a shorter average duration of fixation than in the V condition. Finally, we proposed a preliminary audio-visual saliency model based on the findings of the above experiments. In this model, two fusion strategies for audio and visual information are described: one for the speech sound class, and one for the musical instrument sound class. The audio-visual fusion strategies defined in the model improve its predictability in the AV condition.
APA, Harvard, Vancouver, ISO, and other styles
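To give a concrete flavor of class-dependent audio-visual saliency fusion as described in the abstract above, here is a minimal sketch assuming hypothetical mixing weights and class names; the thesis defines its own fusion strategies for the speech and musical-instrument sound classes.

```python
# Hedged sketch: blend a visual saliency map with an audio-driven map using a
# mixing weight chosen per sound class. Weights and class names are illustrative.
import numpy as np

def fuse_saliency(visual_map: np.ndarray, audio_map: np.ndarray,
                  sound_class: str) -> np.ndarray:
    """Blend a visual saliency map with an audio-driven map (e.g. centered on the
    estimated sound source), with a class-dependent mixing weight."""
    weights = {"speech": 0.6, "musical_instrument": 0.3}   # hypothetical values
    w = weights.get(sound_class, 0.0)                      # other classes: vision only
    fused = (1.0 - w) * visual_map + w * audio_map
    return fused / (fused.sum() + 1e-8)                    # renormalize to a distribution

if __name__ == "__main__":
    v = np.random.rand(36, 64)
    a = np.random.rand(36, 64)
    print(fuse_saliency(v, a, "speech").sum())  # ~1.0
```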
4

D'AMELIO, ALESSANDRO. "A STOCHASTIC FORAGING MODEL OF ATTENTIVE EYE GUIDANCE ON DYNAMIC STIMULI." Doctoral thesis, Università degli Studi di Milano, 2021. http://hdl.handle.net/2434/816678.

Full text
Abstract:
Understanding human behavioural signals is one of the key ingredients of effective human-human and human-computer interaction (HCI). In this respect, non-verbal communication plays a key role and is composed of a variety of modalities acting jointly to convey a common message. In particular, cues like gesture, facial expression, prosody, etc. have the same importance as spoken words. Gaze behaviour is no exception, being one of the most common, yet unobtrusive, ways of communicating. To this aim, many computational models of visual attention allocation have been proposed; although such models were primarily conceived in the psychological field, in the last couple of decades the problem of predicting attention allocation on a visual stimulus has started to catch the interest of the computer vision and pattern recognition community, pushed by the fast-growing number of possible applications (e.g. autonomous driving, image/video compression, robotics). In this renaissance of attention modelling, some of the key features characterizing eye movements were at best overlooked; in particular, the explicit unrolling in time of eye movements (i.e. their dynamics) has seldom been taken into account. Moreover, the vast majority of the proposed models are only able to deal with static stimuli (images), with few notable exceptions. The main contribution of this work is a novel computational model of attentive eye guidance which derives gaze dynamics in a principled way, by reformulating attention deployment as a stochastic foraging problem. We show how treating a virtual observer attending to a video as a stochastic composite forager searching for valuable patches in a multi-modal landscape leads to simulated gaze trajectories that are not statistically distinguishable from those performed by humans while free-viewing the same scene. Model simulation and experiments are carried out on a publicly available dataset of eye-tracked subjects viewing conversations and social interactions between humans.
APA, Harvard, Vancouver, ISO, and other styles
5

MARINI, FRANCESCO. "Attentional control guides the strategic filtering of potential distraction as revealed by behavior and Fmri." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2014. http://hdl.handle.net/10281/50236.

Full text
Abstract:
When dealing with significant sensory stimuli, performance can be hampered by distracting events. Attention mechanisms lessen such negative effects, enabling selection of relevant information while blocking potential distraction. Recent work shows that preventing the negative impact of forthcoming distraction is actively achieved by attentional selection processes. Thus, I hypothesize that the engagement of a distraction-filtering mechanism to counteract distraction, although indisputably beneficial when distraction occurs, also taxes cognitive-brain systems when distraction is expected but does not occur, leading to performance costs. In my thesis, I seek the behavioral and brain signature of a mechanism for the filtering of potential distraction within and between sensory modalities. I show that, when potential distraction is foreseen in a stimulus-processing context, a cognitive mechanism is engaged to limit the negative impact of irrelevant stimuli on behavioral performance, yet its engagement is resource-demanding and thus incurs a performance cost when distraction does not occur. This cost consists of slower response times to a simple sensory stimulus when presented alone but in a potentially distracting context, as compared to the same stimulus presented in a completely distraction-free context. This cost generalizes across different target and distracter sensory modalities, such as touch, vision and audition, and to both space-based and feature-based attention tasks. The activation of the filtering mechanism relies on both strategic and reactive processes, as shown by its dynamic dependence on probabilistic and cross-trial contingencies. The probability of conflict substantially modulates the magnitude of the filtering cost, which is larger in contexts where the probability of experiencing conflict is high. Crucially, across participants, the observed strategic cost is inversely related to the interference exerted by a distracter on distracter-present trials. The strategic filtering mechanism is predominantly adopted as a longer-term, sustained cognitive set throughout an extended time period. Its activation is associated with sustained brain activity in prefrontal areas and in the frontoparietal attentional network. Sustained brain activity in prefrontal areas correlates across participants with the filtering cost, thus confirming a close relationship between this sustained activation and the observed behavioral cost. I also show that the recruitment of the distraction-filtering mechanism in a potentially distracting context guides attention and behavior through different top-down modulations. In fact, when potential distraction is foreseen, the activation of a filtering mechanism promotes both the attenuation of the sensory representation of distracting stimuli in extrastriate visual cortex and the prevention of involuntary activations of conflict-driven motor responses in the premotor cortex. These results attest to the existence of a system for the monitoring and filtering of potential distraction in the human brain that likely reflects a general mechanism of cognitive-attentional control.
APA, Harvard, Vancouver, ISO, and other styles
6

Kristal-Ern, Alfred. "Can sound be used to effectively direct players' attention in a visual gameplay oriented task?" Thesis, Luleå tekniska universitet, Medier ljudteknik och upplevelseproduktion och teater, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-63621.

Full text
Abstract:
In this study, the understanding of multimodal perception from previous studies has been used to create a perceptually demanding visual search task inside a game. In addition, a subtle multimodal cue was created to be indirectly informative about the visual search target's location by attracting subjects' attention. Twenty subjects were divided equally between the experiment's two conditions: one in which the subjects had no access to the multimodal information and one in which they did. The multimodal information conveyed to the subjects in this experiment was temporal synchrony between a pulsating visual light and a sound being modulated using level and low-pass filtering. Results showed that the subjects who were given the multimodal information improved more on the search task than the group without multimodal information, but the subjects in the multimodal group also perceived the pace of the task as higher. However, it is unclear exactly how the multimodal cue helped the subjects, since the playing subjects did not seem to change their search movement pattern to favor the location of the search target, as was expected. Further, the difficulties and considerations of testing in a game environment are discussed, and it is concluded that the gamer population is a very varied group, which has a big impact on the methodology of in-game experiments. To identify sub-groups, further research could study why different players use different search behaviors.
APA, Harvard, Vancouver, ISO, and other styles
7

Lind, Erik. "The role of an audio-visual attentional stimulus in influencing affective responses during graded cycling exercise." [Ames, Iowa : Iowa State University], 2008.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Chen, Ya-Ching, and 陳雅靖. "The Effect of Audio Rhythm on Visual Attention." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/73567288538165563347.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Applied Arts
92 (ROC academic year, 2003–2004)
Vision and audition are the two major modalities we use to receive outside messages. There is evidence indicating that vision and audition do not function independently. A better understanding of the interaction between these two senses is of great value to a designer. The focus of our study is the effect of audio rhythm on visual attention. We hypothesize that if the visual and the auditory stimuli are synchronized, the viewer can be cued by the auditory rhythm and would pay more attention to the synchronized visual event. The aim of this study is to test this hypothesis. We used rapid serial visual presentation (RSVP) as a means to probe subjects' visual spatial attention on a given spot of the screen. Two RSVP streams of different rhythms were presented to the viewer on each trial. One of the RSVP streams was synchronized with an auditory stimulus while the other was not. If the viewer's attention can be guided by the auditory rhythm, one would predict that performance in the synchronized RSVP stream would be better than that in the other stream. The results show that: 1. The auditory rhythm, while being task-irrelevant by itself, does cue subjects' attention to the synchronized visual event. 2. The power of cueing visual events is critically dependent upon the acoustic properties of the auditory stimulus. 3. Some rhythms are more potent than others in binding visual and auditory events. 4. As most viewers were not aware that one of the RSVP streams was synchronized to the auditory event, we believe the enhancement effect of synchronization occurs at an early, preconscious level.
APA, Harvard, Vancouver, ISO, and other styles
9

Lin, Yan-Bo, and 林彥伯. "Cross-Modality Co-Attention for Audio-Visual Event Localization." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/mamwpe.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Communication Engineering
107 (ROC academic year, 2018–2019)
Audio-visual event localization requires one to identify the event label across video frames by jointly observing visual and audio information. To address this task, we propose a deep neural network named Audio-Visual sequence-to-sequence dual network (AVSDN). By jointly taking both audio and visual features at each time segment as inputs, our proposed model learns global and local event information in a sequence-to-sequence manner. Besides, we also propose a deep learning framework of cross-modality co-attention for audio-visual event localization. The co-attention framework can be applied to existing methods and to AVSDN. Our co-attention model is able to exploit intra- and inter-frame visual information, with audio features jointly observed to perform co-attention over the above three modalities. With visual, temporal, and audio information observed across consecutive video frames, our model achieves promising capability in extracting informative spatial/temporal features for improved event localization. Moreover, our model is able to produce instance-level attention, which identifies image regions at the instance level that are associated with the sound/event of interest. Experiments on a benchmark dataset confirm the effectiveness of our proposed framework, with ablation studies performed to verify the design of our proposed network model.
APA, Harvard, Vancouver, ISO, and other styles
10

ZHOU, YONG-FENG, and 周永豐. "Combining Brainwave Instrument to Develop Visual and Audio Attention Test System: Apply to Children with Attention Deficit Hyperactivity Disorder." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/jsbfv5.

Full text
Abstract:
Master's thesis
Chaoyang University of Technology
Department of Computer Science and Information Engineering
105 (ROC academic year, 2016–2017)
In this study, we focus on the development of a visual and auditory attention test system (VAAT), which is used to evaluate visual and auditory focus. The purpose of this study is to verify the concurrent validity of VAAT and to explore the performance of children with ADHD (attention deficit hyperactivity disorder) on VAAT, the Conners Continuous Performance Test 3rd Edition (CPT 3), and the Conners Kiddie Continuous Performance Test 2nd Edition (K-CPT 2). We also study the correlation between the parameter scores and brain-wave states. The subjects were a preschool group (ages 4 to 7) consisting of 16 children with ADHD and 2 children who were healthy or had other symptoms, and a kid group (ages 8 to 13) consisting of 11 children with ADHD and 6 children with other symptoms. Each child in the preschool group received K-CPT 2 and VAAT; each child in the kid group received CPT 3 and VAAT. The NeuroSky MindWave Mobile was used to measure brain waves. The results of this study show that VAAT and K-CPT 2 were significantly positively correlated on more than half of the parameters, and VAAT and CPT 3 were significantly positively correlated on most of the parameters. Therefore, VAAT has good concurrent validity and can be applied to attention assessment. Comparing the visual and auditory abilities of children with ADHD, the preschool group showed poorer auditory ability to identify targets and non-targets and slower auditory responses than visual ones, while the kid group tended to omit auditory targets and showed slower auditory responses. Regarding brain-wave differences between correct and incorrect responses, preschool children with ADHD showed reduced Beta waves, indicating inattention, and tended to respond incorrectly (mistakenly press) to non-targets; this was not found in the kid group, because the number of cases was too small. In the analysis of the correlation between the parameters and brain-wave performance, it was found that, in the preschool group, identification ability was negatively correlated with the Alpha wave: the lower the identification ability, the lower the Alpha wave, indicating a tense state. The block-change parameter (change in responses across blocks) was negatively correlated with the Delta wave: the worse the block-change performance, the lower the Delta wave, indicating inattention and cognitive decline. In the kid group, the error rate was negatively correlated with the Theta and Delta waves: the higher the error-rate score, the lower the Theta and Delta waves, indicating inattention and cognitive decline. Among preschool children with different ADHD subtypes, K-CPT 2 detected an inattention problem for the inattentive type; for the impulsive type it detected a vigilance problem but no impulsivity problem; for the combined type it detected an inattention problem but no impulsivity problem. Among kid-group children with different ADHD subtypes, CPT 3 detected inattention and sustained-attention problems for the inattentive type; for the impulsive type it detected a sustained-attention problem but no impulsivity problem; for the combined type it detected inattention, sustained-attention, and vigilance problems, but no impulsivity problem.
This study developed VAAT not only to evaluate audio and visual attention but also to explore the significant correlations between test performance and brain-wave statistics. If testing of typically developing children and children with ADHD can be expanded in the future, the reliability and validity of VAAT can continue to be verified and improved. We believe that, with its immediacy and unique features, VAAT can become widely used in children's attention testing and provide a reference for clinical applications in children's medical care.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Audio-visual attention"

1

The power of multisensory preaching and teaching: Increase attention, comprehension, and retention. Grand Rapids, Mich: Zondervan, 2008.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Schacher, Jan C. Algorithmic Spatialization. Edited by Roger T. Dean and Alex McLean. Oxford University Press, 2018. http://dx.doi.org/10.1093/oxfordhb/9780190226992.013.12.

Full text
Abstract:
Beginning with a brief historical overview of spatial audio and music practices, this chapter looks at principles of sound spatialization, algorithms for composing and rendering spatial sound and music, and different techniques of spatial source positioning and sound space manipulation. These operations include composing with abstract objects in a sound scene, creating compound sounds using source clusters, altering spatial characteristics by means of spectral sound decomposition, and the manipulation of artificial acoustic spaces. The chapter goes on to discuss practical issues of live spatialization and, through an example piece, the ways a number of different algorithms collaborate in the constitution of a generative audio-visual installation with surround audio and video. Finally, the challenges and pitfalls of using spatialization and some of the common reasons for failure are brought to attention.
APA, Harvard, Vancouver, ISO, and other styles
3

Colmeiro, José. Peripheral Visions / Global Sounds. Liverpool University Press, 2018. http://dx.doi.org/10.5949/liverpool/9781786940308.001.0001.

Full text
Abstract:
Galician audio/visual culture has experienced an unprecedented period of growth following the process of political and cultural devolution in post-Franco Spain. This creative explosion has occurred in a productive dialogue with global currents and with considerable projection beyond the geopolitical boundaries of the nation and the state, but these seismic changes are only beginning to be the subject of attention of cultural and media studies. This book examines contemporary audio/visual production in Galicia as privileged channels through which modern Galician cultural identities have been imagined, constructed and consumed, both at home and abroad. The cultural redefinition of Galicia in the global age is explored through different media texts (popular music, cinema, video) which cross established boundaries and deterritorialise new border zones where tradition and modernity dissolve, generating creative tensions between the urban and the rural, the local and the global, the real and the imagined. The book aims for the deperipheralization and deterritorialization of the Galician cultural map by overcoming long-established hegemonic exclusions, whether based on language, discipline, genre, gender, origins, or territorial demarcation, while aiming to disjoint the center/periphery dichotomy that has relegated Galician culture to the margins. In essence, it is an attempt to resituate Galicia and Galician studies out of the periphery and open them to the world.
APA, Harvard, Vancouver, ISO, and other styles
4

Cruz, Gabriela. Grand Illusion. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780190915056.001.0001.

Full text
Abstract:
Grand Illusion is a new history of grand opera as an art of illusion facilitated by the introduction of gaslight illumination at the Académie Royale de Musique (Paris) in the 1820s. It contends that gas lighting and the technologies of illusion used in the theater after the 1820s spurred the development of a new lyrical art, attentive to the conditions of darkness and radiance, and inspired by the model of phantasmagoria. Karl Marx, Walter Benjamin, and Theodor Adorno have used the concept of phantasmagoria to arrive at a philosophical understanding of modern life as total spectacle, in which the appearance of things supplants their reality. The book argues that the Académie became an early laboratory for this historical process of commodification, for the transformation of opera into an audio-visual spectacle delivering dream-like images. It shows that this transformation began in Paris and then defined opera after the mid-century. In the hands of Giacomo Meyerbeer (Robert le diable, L’Africaine), Richard Wagner (Der fliegende Holländer, Lohengrin, and Tristan und Isolde), and Giuseppe Verdi (Aida), opera became an expanded form of phantasmagoria.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Audio-visual attention"

1

Fang, Yinghong, Junpeng Zhang, and Cewu Lu. "Attention-Based Audio-Visual Fusion for Video Summarization." In Neural Information Processing, 328–40. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-36711-4_28.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Schauerte, Boris. "Bottom-Up Audio-Visual Attention for Scene Exploration." In Cognitive Systems Monographs, 35–113. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-33796-8_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Pleshkova, Snejana, and Alexander Bekiarski. "Audio Visual Attention Models in the Mobile Robots Navigation." In New Approaches in Intelligent Image Analysis, 253–94. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-32192-9_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Lin, Yan-Bo, and Yu-Chiang Frank Wang. "Audiovisual Transformer with Instance Attention for Audio-Visual Event Localization." In Computer Vision – ACCV 2020, 274–90. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-69544-6_17.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Mercea, Otniel-Bogdan, Thomas Hummel, A. Sophia Koepke, and Zeynep Akata. "Temporal and Cross-modal Attention for Audio-Visual Zero-Shot Learning." In Lecture Notes in Computer Science, 488–505. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-20044-1_28.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Sun, Zhongbo, Yannan Wang, and Li Cao. "An Attention Based Speaker-Independent Audio-Visual Deep Learning Model for Speech Enhancement." In MultiMedia Modeling, 722–28. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-37734-2_60.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Tzinis, Efthymios, Scott Wisdom, Tal Remez, and John R. Hershey. "AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation." In Lecture Notes in Computer Science, 368–85. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-19836-6_21.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Chen, Chin-Ling, Yung-Wen Tang, Yong-Feng Zhou, and Yue-Xun Chen. "Development of Audio and Visual Attention Assessment System in Combination with Brain Wave Instrument: Apply to Children with Attention Deficit Hyperactivity Disorder." In Advances in Intelligent Systems and Computing, 153–61. Singapore: Springer Singapore, 2017. http://dx.doi.org/10.1007/978-981-10-6487-6_19.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Yeromin, Mykola Borysovych, and Igor Charskykh. "Universal and Specific Codes of Cultural Context in Audio-Visual Media." In Cross-Cultural Perspectives on Technology-Enhanced Language Learning, 68–82. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-5463-9.ch004.

Full text
Abstract:
The mission of this chapter is to draw attention to how specific and universal cultural contexts influence the audio-visual media used in technology-enhanced language learning (TELL), and how additional efforts in this area from both faculty and students might yield very satisfying and rich results, both by drawing on cultural differences to ensure mutual enrichment and by appealing to universal basic principles that could be understood more or less similarly in different cultures. As audio-visual media now occupies a large area of the internet, filtering what is suitable for TELL and what might not be depends greatly on the cultural context of the media, which should be chosen wisely depending on the situation and curriculum. Also included are recommendations based on the authors' experience in the field of study, and a wide array of background information.
APA, Harvard, Vancouver, ISO, and other styles
10

Abankwah, Ruth Mpatawuwa. "Managing Audio-Visual Resources in Selected Developed and Developing Countries." In Handbook of Research on Heritage Management and Preservation, 126–49. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-3137-1.ch007.

Full text
Abstract:
This chapter emphasises that audio-visual (AV) resources are very fragile and need to be stored in ideal conditions to preserve them for posterity. It describes different types of AV materials and the conditions under which they should be kept. It is based on a study that was conducted in the Eastern and Southern Africa Regional Branch of the International Council on Archives (ESARBICA) region. Data were gathered using quantitative and qualitative methods. The results revealed lack of equipment to monitor environmental conditions, absence of policies to govern the acquisition, appraisal, access, preservation, retention, digitisation and disposal of AV materials, and failure to apply the records life cycle (or any model) to AV records. The results point to a need for national archives to develop guidelines that apply to AV materials particularly in Africa. Particular attention should be given to training AV archivists in the region using an integrated curriculum.
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Audio-visual attention"

1

Cheng, Shuaiyang, Xing Gao, Liang Song, and Jianbing Xiahou. "Audio-Visual Salieny Network with Audio Attention Module." In ICAIIS 2021: 2021 2nd International Conference on Artificial Intelligence and Information Systems. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3469213.3470254.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Zhang, Wen, and Jie Shao. "Multi-Attention Audio-Visual Fusion Network for Audio Spatialization." In ICMR '21: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3460426.3463624.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Lee, Jong-Seok, Francesca De Simone, and Touradj Ebrahimi. "Video coding based on audio-visual attention." In 2009 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2009. http://dx.doi.org/10.1109/icme.2009.5202435.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Lee, Jiyoung, Sunok Kim, Seungryong Kim, and Kwanghoon Sohn. "Audio-Visual Attention Networks for Emotion Recognition." In MM '18: ACM Multimedia Conference. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3264869.3264873.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Chianese, Angelo, Vincenzo Moscato, Antonio Penta, and Antonio Picariello. "Scene Detection using Visual and Audio Attention." In 1st International ICST Conference on Ambient Media and Systems. ICST, 2008. http://dx.doi.org/10.4108/icst.ambisys2008.2828.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

SUGANO, Y., and S. IWAMIYA. "THE EFFECTS OF AUDIO—VISUAL SYNCHRONIZATION ON THE ATTENTION TO THE AUDIO—VISUAL MATERIALS." In MMM 2000. WORLD SCIENTIFIC, 2000. http://dx.doi.org/10.1142/9789812791993_0001.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Just, N., M. Laabs, E. Unver, B. Gunel, S. Worrall, and A. M. Kondoz. "AVISION Audio and visual attention models applied to 2D and 3D audio-visual content." In 2011 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB). IEEE, 2011. http://dx.doi.org/10.1109/bmsb.2011.5954949.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Li, Chenda, and Yanmin Qian. "Deep Audio-Visual Speech Separation with Attention Mechanism." In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020. http://dx.doi.org/10.1109/icassp40776.2020.9054180.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Wu, Yu, Linchao Zhu, Yan Yan, and Yi Yang. "Dual Attention Matching for Audio-Visual Event Localization." In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2019. http://dx.doi.org/10.1109/iccv.2019.00639.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Guo, Ningning, Huaping Liu, and Linhua Jiang. "Attention-based Visual-Audio Fusion for Video Caption Generation." In 2019 IEEE 4th International Conference on Advanced Robotics and Mechatronics (ICARM). IEEE, 2019. http://dx.doi.org/10.1109/icarm.2019.8834066.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Audio-visual attention"

1

Tarasenko, Rostyslav O., Svitlana M. Amelina, Yuliya M. Kazhan, and Olga V. Bondarenko. The use of AR elements in the study of foreign languages at the university. CEUR Workshop Proceedings, November 2020. http://dx.doi.org/10.31812/123456789/4421.

Full text
Abstract:
The article deals with the analysis of the impact of using AR technology in the study of a foreign language by university students. It is stated that AR technology can be a good tool for learning a foreign language. The use of elements of AR in the course of studying a foreign language, in particular in the form of virtual excursions, is proposed. The advantages of using AR technology in the study of the German language are identified, namely: the possibility of involving different channels of information perception, the integrity of the representation of the studied object, the faster and better memorization of new vocabulary, and the development of communicative foreign language skills. The ease and accessibility of using QR codes to obtain information about the object of study from open Internet sources is shown. The results of a survey of students after virtual tours are presented. A reorientation of the methodological support for the study of a foreign language at universities is proposed. Attention is drawn to the use of AR elements in order to support students with different learning styles (audio, visual, kinesthetic).
APA, Harvard, Vancouver, ISO, and other styles