Journal articles on the topic 'Audio-visual attention'

To see the other types of publications on this topic, follow the link: Audio-visual attention.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic 'Audio-visual attention.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Chen, Yanxiang, Tam V. Nguyen, Mohan Kankanhalli, Jun Yuan, Shuicheng Yan, and Meng Wang. "Audio Matters in Visual Attention." IEEE Transactions on Circuits and Systems for Video Technology 24, no. 11 (November 2014): 1992–2003. http://dx.doi.org/10.1109/tcsvt.2014.2329380.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Lee, Yong-Hyeok, Dong-Won Jang, Jae-Bin Kim, Rae-Hong Park, and Hyung-Min Park. "Audio–Visual Speech Recognition Based on Dual Cross-Modality Attentions with the Transformer Model." Applied Sciences 10, no. 20 (October 17, 2020): 7263. http://dx.doi.org/10.3390/app10207263.

Full text
Abstract:
Since the attention mechanism was introduced in neural machine translation, attention has been combined with long short-term memory (LSTM) networks or has replaced the LSTM in transformer models to overcome the sequence-to-sequence (seq2seq) limitations of the LSTM. In contrast to neural machine translation, audio–visual speech recognition (AVSR) can improve performance by learning the correlation between the audio and visual modalities. Because the audio carries richer information than the video of the lips, it is hard to train AVSR attentions with balanced modalities. To raise the role of the visual modality to the level of the audio modality by fully exploiting the input information when learning attentions, we propose a dual cross-modality (DCM) attention scheme that utilizes both an audio context vector computed from a video query and a video context vector computed from an audio query. Furthermore, we introduce a connectionist temporal classification (CTC) loss in combination with our attention-based model to enforce the monotonic alignments required in AVSR. Recognition experiments on the LRS2-BBC and LRS3-TED datasets showed that the proposed model with the DCM attention scheme and the hybrid CTC/attention architecture achieved at least a 7.3% average relative improvement in word error rate (WER) over competing transformer-based methods.
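For readers who want a concrete picture of the scheme sketched in this abstract, a minimal PyTorch sketch of dual cross-modality attention with a hybrid CTC/attention-style objective follows. The layer sizes, the interpolation used to align the two time scales, the concatenation fusion, and the 0.3/0.7 loss weighting are assumptions for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn

class DualCrossModalityAttention(nn.Module):
    """Minimal sketch of dual cross-modality (DCM) attention: audio attends to
    video and video attends to audio, and the two context streams are fused."""

    def __init__(self, dim=256, heads=4, vocab_size=40):
        super().__init__()
        self.audio_from_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.video_from_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, vocab_size)

    def forward(self, audio_feats, video_feats):
        # audio_feats: (B, Ta, dim), video_feats: (B, Tv, dim)
        # Video query pulls an audio context; audio query pulls a video context.
        audio_ctx, _ = self.audio_from_video(video_feats, audio_feats, audio_feats)
        video_ctx, _ = self.video_from_audio(audio_feats, video_feats, video_feats)
        # Align time scales by interpolating the video-rate context to the audio rate.
        audio_ctx = nn.functional.interpolate(
            audio_ctx.transpose(1, 2), size=video_ctx.size(1)).transpose(1, 2)
        fused = torch.cat([audio_ctx, video_ctx], dim=-1)   # (B, Ta, 2*dim)
        return self.classifier(fused)                       # per-frame logits

def hybrid_loss(logits, targets, input_lens, target_lens, ce_logits, ce_targets, lam=0.3):
    """Hybrid CTC/attention-style loss (weights illustrative, not from the paper).
    ce_logits/ce_targets would come from an attention decoder, which is not shown."""
    log_probs = logits.log_softmax(-1).transpose(0, 1)      # (T, B, V) for CTCLoss
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)(log_probs, targets, input_lens, target_lens)
    ce = nn.functional.cross_entropy(ce_logits, ce_targets)
    return lam * ctc + (1.0 - lam) * ce
```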
APA, Harvard, Vancouver, ISO, and other styles
3

Iwaki, Sunao, Mitsuo Tonoike, Masahiko Yamaguchi, and Takashi Hamada. "Modulation of extrastriate visual processing by audio-visual intermodal selective attention." NeuroImage 11, no. 5 (May 2000): S21. http://dx.doi.org/10.1016/s1053-8119(00)90956-x.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

NAGASAKI, Yoshiki, Masaki HAYASHI, Naoshi KANEKO, and Yoshimitsu AOKI. "Temporal Cross-Modal Attention for Audio-Visual Event Localization." Journal of the Japan Society for Precision Engineering 88, no. 3 (March 5, 2022): 263–68. http://dx.doi.org/10.2493/jjspe.88.263.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Xuan, Hanyu, Zhenyu Zhang, Shuo Chen, Jian Yang, and Yan Yan. "Cross-Modal Attention Network for Temporal Inconsistent Audio-Visual Event Localization." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 279–86. http://dx.doi.org/10.1609/aaai.v34i01.5361.

Full text
Abstract:
In human multi-modality perception, integrating auditory and visual information provides plentiful supplementary cues for understanding events. Despite recent methods proposed for this task, they cannot deal with practical conditions involving temporal inconsistency. Inspired by the human system, which places different focuses on specific locations, time segments, and media while performing multi-modality perception, we provide an attention-based method to simulate this process. Similar to the human mechanism, our network can adaptively select “where” to attend, “when” to attend, and “which” modality to attend to for audio-visual event localization. In this way, even with large temporal inconsistency between vision and audio, our network is able to adaptively trade information between the modalities and successfully achieve event localization. Our method achieves state-of-the-art performance on the AVE (Audio-Visual Event) dataset, which was collected in real-life conditions. In addition, we systematically investigate audio-visual event localization tasks. The visualization results also help us better understand how our model works.
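As a rough illustration of the “where” component described in this abstract, the sketch below lets an audio feature weight the spatial locations of each video segment's feature map, a common audio-guided attention pattern. The dimensions and the additive scoring function are assumptions, not the authors' exact network.

```python
import torch
import torch.nn as nn

class AudioGuidedSpatialAttention(nn.Module):
    """Weights the H*W visual locations of each segment by their affinity
    with the synchronous audio feature ('where' to attend)."""

    def __init__(self, audio_dim=128, visual_dim=512, hidden=256):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.visual_proj = nn.Linear(visual_dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, audio, visual):
        # audio: (B, T, audio_dim); visual: (B, T, H*W, visual_dim)
        a = self.audio_proj(audio).unsqueeze(2)                       # (B, T, 1, hidden)
        v = self.visual_proj(visual)                                  # (B, T, H*W, hidden)
        attn = torch.softmax(self.score(torch.tanh(a + v)), dim=2)    # (B, T, H*W, 1)
        attended = (attn * visual).sum(dim=2)                         # (B, T, visual_dim)
        return attended, attn.squeeze(-1)

# Toy usage: 10 one-second segments, a 7x7 feature map per segment.
att = AudioGuidedSpatialAttention()
segment_visual, weights = att(torch.randn(2, 10, 128), torch.randn(2, 10, 49, 512))
```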
APA, Harvard, Vancouver, ISO, and other styles
6

Iwaki, Sunao. "Audio-visual intermodal orientation of attention modulates task-specific extrastriate visual processing." Neuroscience Research 68 (January 2010): e269. http://dx.doi.org/10.1016/j.neures.2010.07.1195.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Keitel, Christian, and Matthias M. Müller. "Audio-visual synchrony and feature-selective attention co-amplify early visual processing." Experimental Brain Research 234, no. 5 (August 1, 2015): 1221–31. http://dx.doi.org/10.1007/s00221-015-4392-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Zhu, Hao, Man-Di Luo, Rui Wang, Ai-Hua Zheng, and Ran He. "Deep Audio-visual Learning: A Survey." International Journal of Automation and Computing 18, no. 3 (April 15, 2021): 351–76. http://dx.doi.org/10.1007/s11633-021-1293-0.

Full text
Abstract:
Audio-visual learning, aimed at exploiting the relationship between audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully. Researchers tend to leverage these two modalities to improve the performance of previously considered single-modality tasks or address new challenging problems. In this paper, we provide a comprehensive survey of recent audio-visual learning development. We divide the current audio-visual learning tasks into four different subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual representation learning. State-of-the-art methods, as well as the remaining challenges of each subfield, are further discussed. Finally, we summarize the commonly used datasets and challenges.
APA, Harvard, Vancouver, ISO, and other styles
9

Ran, Yue, Hongying Tang, Baoqing Li, and Guohui Wang. "Self-Supervised Video Representation and Temporally Adaptive Attention for Audio-Visual Event Localization." Applied Sciences 12, no. 24 (December 9, 2022): 12622. http://dx.doi.org/10.3390/app122412622.

Full text
Abstract:
Localizing the audio-visual events in video requires a combined judgment of visual and audio components. To integrate multimodal information, existing methods modeled the cross-modal relationships by feeding unimodal features into attention modules. However, these unimodal features are encoded in separate spaces, resulting in a large heterogeneity gap between modalities. Existing attention modules, on the other hand, ignore the temporal asynchrony between vision and hearing when constructing cross-modal connections, which may lead to the misinterpretation of one modality by another. Therefore, this paper aims to improve event localization performance by addressing these two problems and proposes a framework that feeds audio and visual features encoded in the same semantic space into a temporally adaptive attention module. Specifically, we develop a self-supervised representation method to encode features with a smaller heterogeneity gap by matching corresponding semantic cues between synchronized audio and visual signals. Furthermore, we develop a temporally adaptive cross-modal attention based on a weighting method that dynamically channels attention according to the time differences between event-related features. The proposed framework achieves state-of-the-art performance on the public audio-visual event dataset and the experimental results not only show that our self-supervised method can learn more discriminative features but also verify the effectiveness of our strategy for assigning attention.
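The temporally adaptive weighting described above, which channels attention according to the time difference between event-related segments, could be sketched as an attention bias that decays with temporal distance. The linear decay and the learnable rate below are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn

class TemporallyAdaptiveCrossModalAttention(nn.Module):
    """Audio segments attend to visual segments, with attention logits
    penalized in proportion to the time gap between segments."""

    def __init__(self, dim=256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Learnable decay rate controlling how strongly distant segments are discounted.
        self.decay = nn.Parameter(torch.tensor(0.5))

    def forward(self, audio, visual):
        # audio, visual: (B, T, dim), one feature per one-second segment
        B, T, D = audio.shape
        scores = self.q(audio) @ self.k(visual).transpose(1, 2) / D ** 0.5   # (B, T, T)
        t = torch.arange(T, device=audio.device, dtype=torch.float32)
        time_gap = (t[:, None] - t[None, :]).abs()                           # (T, T)
        scores = scores - torch.relu(self.decay) * time_gap                  # penalize asynchrony
        weights = torch.softmax(scores, dim=-1)
        return weights @ self.v(visual)                                      # (B, T, dim)

# Toy usage: 2 videos, 10 segments each.
attn = TemporallyAdaptiveCrossModalAttention()
out = attn(torch.randn(2, 10, 256), torch.randn(2, 10, 256))
```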
APA, Harvard, Vancouver, ISO, and other styles
10

Zhao, Sicheng, Yunsheng Ma, Yang Gu, Jufeng Yang, Tengfei Xing, Pengfei Xu, Runbo Hu, Hua Chai, and Kurt Keutzer. "An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 303–11. http://dx.doi.org/10.1609/aaai.v34i01.5364.

Full text
Abstract:
Emotion recognition in user-generated videos plays an important role in human-centered computing. Existing methods mainly employ a traditional two-stage shallow pipeline, i.e., extracting visual and/or audio features and training classifiers. In this paper, we propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs). Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN. Further, we design a special classification loss, i.e., a polarity-consistent cross-entropy loss, based on the polarity-emotion hierarchy constraint to guide the attention generation. Extensive experiments conducted on the challenging VideoEmotion-8 and Ekman-6 datasets demonstrate that the proposed VAANet outperforms the state-of-the-art approaches for video emotion recognition. Our source code is released at: https://github.com/maysonma/VAANet.
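The polarity-consistent cross-entropy loss mentioned here penalizes predictions whose polarity (positive vs. negative) disagrees with the ground truth. Below is a hedged reconstruction; the up-weighting form, the penalty weight, and the toy polarity grouping are assumptions rather than the exact VAANet loss.

```python
import torch
import torch.nn.functional as F

def polarity_consistent_ce(logits, targets, class_polarity, penalty=0.5):
    """Cross-entropy that is up-weighted when the predicted class falls in the
    wrong polarity group -- an illustrative reconstruction, not the exact paper loss.

    logits:         (B, num_classes) raw scores
    targets:        (B,) ground-truth emotion indices
    class_polarity: (num_classes,) tensor of 0 (negative) / 1 (positive)
    """
    ce = F.cross_entropy(logits, targets, reduction="none")      # per-sample CE, (B,)
    pred_polarity = class_polarity[logits.argmax(dim=1)]
    true_polarity = class_polarity[targets]
    mismatch = (pred_polarity != true_polarity).float()          # (B,)
    return ((1.0 + penalty * mismatch) * ce).mean()

# Toy usage with an assumed polarity split over six emotion classes.
logits = torch.randn(4, 6)
targets = torch.tensor([0, 2, 5, 1])
polarity = torch.tensor([1, 1, 0, 0, 0, 1])   # hypothetical grouping
loss = polarity_consistent_ce(logits, targets, polarity)
```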
APA, Harvard, Vancouver, ISO, and other styles
11

Weijkamp, Janne, and Makiko Sadakata. "Attention to affective audio-visual information: Comparison between musicians and non-musicians." Psychology of Music 45, no. 2 (July 7, 2016): 204–15. http://dx.doi.org/10.1177/0305735616654216.

Full text
Abstract:
Individuals with more musical training repeatedly demonstrate enhanced auditory perception abilities. The current study examined how these enhanced auditory skills interact with attention to affective audio-visual stimuli. A total of 16 participants with more than 5 years of musical training (musician group) and 16 participants with less than 2 years of musical training (non-musician group) took part in a version of the audio-visual emotional Stroop test, using happy, neutral, and sad emotions. Participants were presented with congruent and incongruent combinations of face and voice stimuli while judging the emotion of either the face or the voice. As predicted, musicians were less susceptible to interference from visual information on auditory emotion judgments than non-musicians, as evidenced by musicians being more accurate when judging auditory emotions when presented with congruent and incongruent visual information. Musicians were also more accurate than non-musicians at identifying visual emotions when presented with concurrent auditory information. Thus, musicians were less influenced by congruent/incongruent information in a non-target modality compared to non-musicians. The results suggest that musical training influences audio-visual information processing.
APA, Harvard, Vancouver, ISO, and other styles
12

Bouchara, Tifanie, and Brian F. G. Katz. "Redundancy gains in audio–visual search." Seeing and Perceiving 25 (2012): 181. http://dx.doi.org/10.1163/187847612x648116.

Full text
Abstract:
This study concerns stimuli-driven perceptual processes involved in target search among concurrent distractors, with a focus on comparing auditory, visual, and audio-visual search tasks. Previous work on unimodal search tasks highlighted different preattentive features that can enhance target saliency, making it 'pop out', e.g., a visually sharp target among blurred distractors. A cue from another modality can also help direct attention towards the target. Our study investigates a new kind of search task, where stimuli consist of audio-visual objects presented using both audio and visual modalities simultaneously. Redundancy effects are evaluated, first from the combination of audio and visual modalities, and second from the combination of each unimodal cue in such a bimodal search task. A perceptual experiment was performed in which the task was to identify an audio-visual object from a set of six competing stimuli. We employed static visual blur and developed an auditory blur analogue to cue the search. Results show that both visual and auditory blur render distractors less prominent and automatically attract attention toward a sharp target. The combination of both unimodal blurs, i.e., audio-visual blur, also proved to be an efficient cue for facilitating the bimodal search task. Results also showed that search tasks were performed faster in redundant bimodal conditions than in unimodal ones. That gain was due to a redundant-target effect only, without any redundancy gain from the cue combination: cueing the visual component alone was sufficient, and adding the redundant audio cue brought no further improvement in bimodal search tasks.
APA, Harvard, Vancouver, ISO, and other styles
13

Lee, Byoung-Gi, Jong-Suk Choi, Sang-Suk Yoon, Mun-Taek Choi, Mun-Sang Kim, and Dai-Jin Kim. "Audio-Visual Fusion for Sound Source Localization and Improved Attention." Transactions of the Korean Society of Mechanical Engineers A 35, no. 7 (July 1, 2011): 737–43. http://dx.doi.org/10.3795/ksme-a.2011.35.7.737.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Ikumi, Nara, and Salvador Soto-Faraco. "Selective Attention Modulates the Direction of Audio-Visual Temporal Recalibration." PLoS ONE 9, no. 7 (July 8, 2014): e99311. http://dx.doi.org/10.1371/journal.pone.0099311.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Lee, Jong-Seok, Francesca De Simone, and Touradj Ebrahimi. "Efficient video coding based on audio-visual focus of attention." Journal of Visual Communication and Image Representation 22, no. 8 (November 2011): 704–11. http://dx.doi.org/10.1016/j.jvcir.2010.11.002.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

FAGIOLI, S., A. COUYOUMDJIAN, and F. FERLAZZO. "Audio-visual dynamic remapping in an endogenous spatial attention task." Behavioural Brain Research 173, no. 1 (October 2, 2006): 30–38. http://dx.doi.org/10.1016/j.bbr.2006.05.030.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Chen, Minran, Song Zhao, Jiaqi Yu, Xuechen Leng, Mengdie Zhai, Chengzhi Feng, and Wenfeng Feng. "Audiovisual Emotional Congruency Modulates the Stimulus-Driven Cross-Modal Spread of Attention." Brain Sciences 12, no. 9 (September 10, 2022): 1229. http://dx.doi.org/10.3390/brainsci12091229.

Full text
Abstract:
It has been reported that attending to stimuli in visual modality can spread to task-irrelevant but synchronously presented stimuli in auditory modality, a phenomenon termed the cross-modal spread of attention, which could be either stimulus-driven or representation-driven depending on whether the visual constituent of an audiovisual object is further selected based on the object representation. The stimulus-driven spread of attention occurs whenever a task-irrelevant sound synchronizes with an attended visual stimulus, regardless of the cross-modal semantic congruency. The present study recorded event-related potentials (ERPs) to investigate whether the stimulus-driven cross-modal spread of attention could be modulated by audio-visual emotional congruency in a visual oddball task where emotion (positive/negative) was task-irrelevant. The results first demonstrated a prominent stimulus-driven spread of attention regardless of audio-visual emotional congruency by showing that for all audiovisual pairs, the extracted ERPs to the auditory constituents of audiovisual stimuli within the time window of 200–300 ms were significantly larger than ERPs to the same auditory stimuli delivered alone. However, the amplitude of this stimulus-driven auditory Nd component during 200–300 ms was significantly larger for emotionally incongruent than congruent audiovisual stimuli when their visual constituents’ emotional valences were negative. Moreover, the Nd was sustained during 300–400 ms only for the incongruent audiovisual stimuli with emotionally negative visual constituents. These findings suggest that although the occurrence of the stimulus-driven cross-modal spread of attention is independent of audio-visual emotional congruency, its magnitude is nevertheless modulated even when emotion is task-irrelevant.
APA, Harvard, Vancouver, ISO, and other styles
18

Glumm, Monica M., Kathy L. Kehring, and Timothy L. White. "Effects of Visual and Auditory Cues About Threat Location on Target Acquisition and Attention to Auditory Communications." Proceedings of the Human Factors and Ergonomics Society Annual Meeting 49, no. 3 (September 2005): 347–51. http://dx.doi.org/10.1177/154193120504900328.

Full text
Abstract:
This laboratory study examined the effects of visual, spatial language, and 3-D audio cues about target location on target acquisition performance and the recall of information contained in concurrent radio communications. Two baseline conditions were also included in the analysis: no cues (baseline 1) and target presence cues only (baseline 2). In modes in which target location cues were provided, 100% of the targets presented were acquired compared to 94% in baseline 1 and 95% in baseline 2. On average, targets were acquired 1.4 seconds faster in the visual, spatial language, and 3-D audio modes than in the baseline conditions, with times in the visual and 3-D audio modes being 1 second faster than those in spatial language. Overall workload scores were lower in the 3-D audio mode than in all other conditions except the visual mode. Less information (23%) was recalled from auditory communications in baseline 1 than in the other four conditions where attention could be directed to communications between target presentations.
APA, Harvard, Vancouver, ISO, and other styles
19

Li, Yidi, Hong Liu, and Hao Tang. "Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 1456–63. http://dx.doi.org/10.1609/aaai.v36i2.20035.

Full text
Abstract:
Multi-modal fusion is proven to be an effective method to improve the accuracy and robustness of speaker tracking, especially in complex scenarios. However, how to combine the heterogeneous information and exploit the complementarity of multi-modal signals remains a challenging issue. In this paper, we propose a novel Multi-modal Perception Tracker (MPT) for speaker tracking using both audio and visual modalities. Specifically, a novel acoustic map based on spatial-temporal Global Coherence Field (stGCF) is first constructed for heterogeneous signal fusion, which employs a camera model to map audio cues to the localization space consistent with the visual cues. Then a multi-modal perception attention network is introduced to derive the perception weights that measure the reliability and effectiveness of intermittent audio and video streams disturbed by noise. Moreover, a unique cross-modal self-supervised learning method is presented to model the confidence of audio and visual observations by leveraging the complementarity and consistency between different modalities. Experimental results show that the proposed MPT achieves 98.6% and 78.3% tracking accuracy on the standard and occluded datasets, respectively, which demonstrates its robustness under adverse conditions and outperforms the current state-of-the-art methods.
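The perception attention network described in this abstract assigns reliability weights to intermittently noisy audio and video observations before fusing them. A minimal reliability-weighted fusion sketch follows; the two-layer scorer and the softmax normalization are assumptions, not the MPT implementation.

```python
import torch
import torch.nn as nn

class PerceptionWeightedFusion(nn.Module):
    """Scores the reliability of each modality's observation at every frame
    and fuses them with a softmax-normalized weighted sum."""

    def __init__(self, dim=128):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, audio_obs, visual_obs):
        # audio_obs, visual_obs: (B, T, dim) per-frame localization features
        stacked = torch.stack([audio_obs, visual_obs], dim=2)    # (B, T, 2, dim)
        weights = torch.softmax(self.scorer(stacked), dim=2)     # (B, T, 2, 1)
        fused = (weights * stacked).sum(dim=2)                   # (B, T, dim)
        return fused, weights.squeeze(-1)                        # weights: (B, T, 2)

# Toy usage: when one stream is noisy, its learned weight should drop.
fusion = PerceptionWeightedFusion()
fused, w = fusion(torch.randn(1, 5, 128), torch.randn(1, 5, 128))
```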
APA, Harvard, Vancouver, ISO, and other styles
20

Wang, Chunxiao, Jingjing Zhang, Wei Jiang, and Shuang Wang. "A Deep Multimodal Model for Predicting Affective Responses Evoked by Movies Based on Shot Segmentation." Security and Communication Networks 2021 (September 28, 2021): 1–12. http://dx.doi.org/10.1155/2021/7650483.

Full text
Abstract:
Predicting the emotions evoked in a viewer watching movies is an important research element in affective video content analysis over a wide range of applications. Generally, the emotion of the audience is evoked by the combined effect of the audio-visual messages of the movies. Current research has mainly used rough middle- and high-level audio and visual features to predict experienced emotions, but combining semantic information to refine features to improve emotion prediction results is still not well studied. Therefore, on the premise of considering the time structure and semantic units of a movie, this paper proposes a shot-based audio-visual feature representation method and a long short-term memory (LSTM) model incorporating a temporal attention mechanism for experienced emotion prediction. First, the shot-based audio-visual feature representation defines a method for extracting and combining audio and visual features of each shot clip, and the advanced pretraining models in the related audio-visual tasks are used to extract the audio and visual features with different semantic levels. Then, four components are included in the prediction model: a nonlinear multimodal feature fusion layer, a temporal feature capture layer, a temporal attention layer, and a sentiment prediction layer. This paper focuses on experienced emotion prediction and evaluates the proposed method on the extended COGNIMUSE dataset. The method performs significantly better than the state-of-the-art while significantly reducing the number of calculations, with increases in the Pearson correlation coefficient (PCC) from 0.46 to 0.62 for arousal and from 0.18 to 0.34 for valence in experienced emotion.
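The prediction model outlined above (a multimodal fusion layer, a temporal feature capture layer, a temporal attention layer, and a prediction layer) can be approximated with a standard attention-pooled LSTM. The layer sizes and the joint arousal/valence regression head below are assumptions for illustration, not the paper's exact model.

```python
import torch
import torch.nn as nn

class ShotLSTMWithTemporalAttention(nn.Module):
    """Per-shot audio-visual features -> fusion -> LSTM -> temporal attention
    pooling -> arousal/valence regression (a sketch, not the published model)."""

    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh())  # multimodal fusion layer
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)              # temporal feature capture
        self.attn = nn.Linear(hidden, 1)                                   # temporal attention layer
        self.head = nn.Linear(hidden, 2)                                   # arousal, valence

    def forward(self, shot_feats):
        # shot_feats: (B, num_shots, feat_dim) concatenated audio+visual shot features
        h, _ = self.lstm(self.fuse(shot_feats))              # (B, S, hidden)
        alpha = torch.softmax(self.attn(h), dim=1)           # (B, S, 1) attention over shots
        pooled = (alpha * h).sum(dim=1)                      # (B, hidden)
        return self.head(pooled)                             # (B, 2)

model = ShotLSTMWithTemporalAttention()
pred = model(torch.randn(3, 20, 512))   # 3 movies, 20 shots each
```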
APA, Harvard, Vancouver, ISO, and other styles
21

Patten, Elena, Linda R. Watson, and Grace T. Baranek. "Temporal Synchrony Detection and Associations with Language in Young Children with ASD." Autism Research and Treatment 2014 (2014): 1–8. http://dx.doi.org/10.1155/2014/678346.

Full text
Abstract:
Temporally synchronous audio-visual stimuli serve to recruit attention and enhance learning, including language learning in infants. Although few studies have examined this effect on children with autism, it appears that the ability to detect temporal synchrony between auditory and visual stimuli may be impaired, particularly given social-linguistic stimuli delivered via oral movement and spoken language pairings. However, children with autism can detect audio-visual synchrony given nonsocial stimuli (objects dropping and their corresponding sounds). We tested whether preschool children with autism could detect audio-visual synchrony given video recordings of linguistic stimuli paired with movement of related toys in the absence of faces. As a group, children with autism demonstrated the ability to detect audio-visual synchrony. Further, the amount of time they attended to the synchronous condition was positively correlated with receptive language. Findings suggest that object manipulations may enhance multisensory processing in linguistic contexts. Moreover, associations between synchrony detection and language development suggest that better processing of multisensory stimuli may guide and direct attention to communicative events thus enhancing linguistic development.
APA, Harvard, Vancouver, ISO, and other styles
22

Zhang, Jingran, Xing Xu, Fumin Shen, Huimin Lu, Xin Liu, and Heng Tao Shen. "Enhancing Audio-Visual Association with Self-Supervised Curriculum Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (May 18, 2021): 3351–59. http://dx.doi.org/10.1609/aaai.v35i4.16447.

Full text
Abstract:
The recent success of audio-visual representation learning can be largely attributed to the pervasive concurrency of the two modalities, which can be used as a self-supervision signal to extract correlation information. While most recent works focus on capturing the shared associations between the audio and visual modalities, they rarely consider multiple audio and video pairs at once and pay little attention to exploiting the valuable information within each modality. To tackle this problem, we propose a novel audio-visual representation learning method dubbed self-supervised curriculum learning (SSCL) in a teacher-student learning manner. Specifically, taking advantage of contrastive learning, a two-stage scheme is exploited that transfers cross-modal information between the teacher and student models as a phased process. The proposed SSCL approach regards the pervasive property of audio-visual concurrency as latent supervision and mutually distills structural knowledge from visual to audio data. Notably, the SSCL method can learn discriminative audio and visual representations for various downstream applications. Extensive experiments conducted on both action video recognition and audio sound recognition tasks show the remarkably improved performance of the SSCL method compared with state-of-the-art self-supervised audio-visual representation learning methods.
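The concurrency-as-supervision idea behind SSCL can be illustrated with a standard symmetric InfoNCE contrastive loss that pulls together embeddings of synchronous audio-visual pairs within a batch. The temperature and embedding size are illustrative, and the curriculum and teacher-student stages of the actual method are not shown.

```python
import torch
import torch.nn.functional as F

def audio_visual_infonce(audio_emb, visual_emb, temperature=0.07):
    """Symmetric contrastive loss: the i-th audio clip should match the
    i-th (synchronous) video clip and mismatch all other clips in the batch."""
    a = F.normalize(audio_emb, dim=1)                  # (B, D)
    v = F.normalize(visual_emb, dim=1)                 # (B, D)
    logits = a @ v.t() / temperature                   # (B, B) pairwise similarities
    targets = torch.arange(a.size(0), device=a.device) # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = audio_visual_infonce(torch.randn(8, 128), torch.randn(8, 128))
```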
APA, Harvard, Vancouver, ISO, and other styles
23

Kováčová, Michaela, and Martina Martausová. "Audio-Visual Culture in Textbooks of German as a Foreign Language: A Crossroads Between Media Competence and Subject-Specific Objectives." Studia Universitatis Babeș-Bolyai Philologia 67, no. 2 (June 30, 2022): 311–28. http://dx.doi.org/10.24193/subbphilo.2022.2.18.

Full text
Abstract:
"Audio-visual media have become an intrinsic part of the school curriculum in many countries as a means of complementing teaching and learning processes and ensuring modern and effective teaching in various fields of study, including foreign languages. This study will explore references to audio-visual culture in three internationally available textbook sets for pubescent learners of German as a foreign language: Deutsch.com, Direkt, and Ideen. Audio-vision is used here to refer to audio-visual cultural properties and includes the product and its production, reception, and position within the context of the media culture of the country of origin. Methodologically, the study uses content analysis to examine individual references to audio-vision according to a) the frequency of occurrence at individual levels of language competency, b) the implication of the examined references to audio-visual material with objectives specific to the acquisition of a foreign language (vocabulary, grammar, listening, reading, speaking, intercultural education) and media literacy c) the attention paid to films, genres and other aspects of audio-vision, d) the connection between references to audio-vision and themes and topics discussed in a foreign language, e) the didactic function of the references, and f) their contribution to achieving media literacy goals (audio-visual/film literacy), providing a detailed description of the lessons in each of the studied textbook sets that make use of references to audio-visual material in their curricula. Keywords: audio-visual literacy, audio-vision, German language, textbooks, foreign language teaching "
APA, Harvard, Vancouver, ISO, and other styles
24

Purcell, Kevin P., and Anthony D. Andre. "Effects of Visual and Audio Callouts on Pilot Visual Attention during Electronic Moving Map Use." Proceedings of the Human Factors and Ergonomics Society Annual Meeting 44, no. 13 (July 2000): 108. http://dx.doi.org/10.1177/154193120004401339.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Li, Ning, and Linda Ng Boyle. "Allocation of Driver Attention for Varying In-Vehicle System Modalities." Human Factors: The Journal of the Human Factors and Ergonomics Society 62, no. 8 (December 30, 2019): 1349–64. http://dx.doi.org/10.1177/0018720819879585.

Full text
Abstract:
Objective: This paper examines drivers’ allocation of attention using response time to a tactile detection response task (TDRT) while interacting with an in-vehicle information system (IVIS) over time. Background: Longer TDRT response time is associated with higher cognitive workload. However, it is not clear what role is assumed by the human and system in response to varying in-vehicle environments over time. Method: A driving simulator study with 24 participants was conducted with a restaurant selection task of two difficulty levels (easy and hard) presented in three modalities (audio only, visual only, hybrid). A linear mixed-effects model was applied to identify factors that affect TDRT response time. A nonparametric time-series model was also used to explore the visual attention allocation under the hybrid mode over time. Results: The visual-only mode significantly increased participants’ response time compared with the audio-only mode. Females took longer to respond to the TDRT when engaged with an IVIS. The study showed that participants tend to use the visual component more toward the end of the easy tasks, whereas the visual mode was used more at the beginning of the harder tasks. Conclusion: The visual-only mode of the IVIS increased drivers’ cognitive workload when compared with the auditory-only mode. Drivers showed different visual attention allocation during the easy and hard restaurant selection tasks in the hybrid mode. Application: The findings can help guide the design of automotive user interfaces and help manage cognitive workload.
APA, Harvard, Vancouver, ISO, and other styles
26

Mashannudin, Mashannudin. "PENERAPAN METODE DEMONTRASI BERBANTUAN MEDIA AUDIO VISUAL UNTUK MENINGKATKAN PERHATIAN DAN PRESTASI BELAJAR." Diadik: Jurnal Ilmiah Teknologi Pendidikan 10, no. 1 (September 30, 2021): 93–100. http://dx.doi.org/10.33369/diadik.v10i1.18113.

Full text
Abstract:
This study aims to describe the application of a demonstration method assisted by audio-visual media to increase student attention and learning achievement. The research was classroom action research carried out in three cycles, each consisting of planning, implementation, observation, and reflection stages. The subjects were grade VI students of SD Negeri 36 Lahat. The instruments used were observation sheets and test sheets. Data were collected through observation and testing, while the experimental class was tested with the demonstration method only. Data were analysed using a t-test and the classical average score. Based on the analysis, the study concluded that (1) the application of an audio-visual-assisted demonstration method can increase student attention; (2) the application of an audio-visual-assisted demonstration method can improve student learning achievement; and (3) the audio-visual-assisted demonstration method is effective in improving student achievement in grade VI natural science subjects at SD Negeri 36 Lahat.
APA, Harvard, Vancouver, ISO, and other styles
27

Ramezanzade, Hesam. "Adding Acoustical to Visual Movement Patterns to Retest Whether Imitation Is Goal- or Pattern-Directed." Perceptual and Motor Skills 127, no. 1 (August 29, 2019): 225–47. http://dx.doi.org/10.1177/0031512519870418.

Full text
Abstract:
This study compared two different motor skill modeling presentations (with and without goal display) in visual and audio-visual conditions for learning a complex skill (basketball jump shot) to evaluate the importance of (a) audio information and (b) goal observation in motor performance kinematics. Specifically, we sought to understand whether the simultaneous presentation of auditory and visual patterns could usefully direct the learner’s attention from goal to pattern stimuli. I selected 40 students ( Mage = 20.47 years) who had no prior experience with the basketball jump shot or free throw and randomly assigned them into four groups: Pattern/Visual, Pattern/Audio-Visual, Pattern-Goal/Visual, and Pattern-Goal/Audio-Visual. Participants in the pattern-only groups watched only the skilled motor pattern, while those in the pattern-goal groups watched both the pattern and its outcome. Participants in the visual-only groups simply watched the visual pattern, while those in audio-visual groups saw and heard the pattern; we sonified the angular velocity of the skilled performer’s elbow joint. Participants then performed in two conditions with and without balls. On all dependent variables, the participants’ performance following the audio-visual presentations was better than when following the visual-only presentations. In addition, the participants’ performance in pattern-only groups was better than in pattern-goal groups, but this improved pattern-only performance was far less extensive in the audio-visual than in the visual-only group. In sum, complex motor skill imitation was enhanced by an audio pattern of elbow angular velocity in support of generalist theories of imitation learning.
APA, Harvard, Vancouver, ISO, and other styles
28

Seo, Minji, and Myungho Kim. "Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition." Sensors 20, no. 19 (September 28, 2020): 5559. http://dx.doi.org/10.3390/s20195559.

Full text
Abstract:
Speech emotion recognition (SER) classifies emotions using low-level features or a spectrogram of an utterance. When SER methods are trained and tested using different datasets, they have shown performance reduction. Cross-corpus SER research identifies speech emotion using different corpora and languages. Recent cross-corpus SER research has been conducted to improve generalization. To improve the cross-corpus SER performance, we pretrained the log-mel spectrograms of the source dataset using our designed visual attention convolutional neural network (VACNN), which has a 2D CNN base model with channel- and spatial-wise visual attention modules. To train the target dataset, we extracted the feature vector using a bag of visual words (BOVW) to assist the fine-tuned model. Because visual words represent local features in the image, the BOVW helps VACNN to learn global and local features in the log-mel spectrogram by constructing a frequency histogram of visual words. The proposed method shows an overall accuracy of 83.33%, 86.92%, and 75.00% in the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Berlin Database of Emotional Speech (EmoDB), and Surrey Audio-Visual Expressed Emotion (SAVEE), respectively. Experimental results on RAVDESS, EmoDB, SAVEE demonstrate improvements of 7.73%, 15.12%, and 2.34% compared to existing state-of-the-art cross-corpus SER approaches.
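The bag-of-visual-words component described here can be sketched with scikit-learn: local patches of a log-mel spectrogram are clustered into a codebook, and each utterance becomes a frequency histogram of its nearest codewords. The patch size, codebook size, and the random stand-in spectrograms below are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_patches(logmel, patch=8, stride=4):
    """Slice a (mels, frames) log-mel spectrogram into flattened local patches."""
    mels, frames = logmel.shape
    patches = [logmel[i:i + patch, j:j + patch].ravel()
               for i in range(0, mels - patch + 1, stride)
               for j in range(0, frames - patch + 1, stride)]
    return np.stack(patches)

# 1) Learn a codebook of "visual words" from training spectrograms (source corpus).
train_specs = [np.random.randn(64, 200) for _ in range(10)]    # stand-ins for real log-mels
codebook = KMeans(n_clusters=64, n_init=10, random_state=0)
codebook.fit(np.concatenate([extract_patches(s) for s in train_specs]))

# 2) Represent an utterance as a normalized histogram of codeword occurrences.
def bovw_histogram(logmel, codebook):
    words = codebook.predict(extract_patches(logmel))
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

feature_vector = bovw_histogram(np.random.randn(64, 200), codebook)   # (64,)
```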
APA, Harvard, Vancouver, ISO, and other styles
29

Park, So-Hyun, and Young-Ho Park. "Audio-Visual Tensor Fusion Network for Piano Player Posture Classification." Applied Sciences 10, no. 19 (September 29, 2020): 6857. http://dx.doi.org/10.3390/app10196857.

Full text
Abstract:
Playing the piano in the correct position is important because the correct position helps to produce good sound and prevents injuries. Many studies have been conducted in the field of piano playing posture recognition that combine various techniques. Most of these techniques are based on analyzing visual information. However, in the piano education field, it is essential to utilize audio information in addition to visual information due to the deep relationship between posture and sound. In this paper, we propose an audio-visual tensor fusion network (simply, AV-TFN) for piano performance posture classification. Unlike existing studies that used only visual information, the proposed method uses audio information to improve the accuracy in classifying the postures of professional and amateur pianists. For this, we first introduce a dataset called C3Pap (Classic piano performance postures of amateur and professionals) that contains actual piano performance videos in diverse environments. Furthermore, we propose a data structure that represents audio-visual information. The proposed data structure represents audio information on the color scale and visual information on the black and white scale to represent the relationship between them. We call this data structure an audio-visual tensor. Finally, we compare the performance of the proposed method with state-of-the-art approaches: VN (Visual Network), AN (Audio Network), and AVN (Audio-Visual Network) with concatenation and attention techniques. The experiment results demonstrate that AV-TFN outperforms existing studies and, thus, can be effectively used in the classification of piano playing postures.
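One hedged reading of the audio-visual tensor described above: the visual frame occupies a grayscale plane while audio information is mapped onto the remaining color channels, so their relative intensities encode the relationship between the two. The channel layout and normalization below are assumptions for illustration, not the authors' exact encoding.

```python
import numpy as np

def build_audio_visual_tensor(gray_frame, mel_spectrogram):
    """Stack a grayscale video frame with audio information rendered as
    color channels, producing an (H, W, 3) tensor per time step."""
    H, W = gray_frame.shape
    # Normalize both sources to [0, 1].
    visual = (gray_frame - gray_frame.min()) / (np.ptp(gray_frame) + 1e-8)
    audio = (mel_spectrogram - mel_spectrogram.min()) / (np.ptp(mel_spectrogram) + 1e-8)
    # Resize the spectrogram excerpt to the frame size by nearest-neighbor indexing.
    rows = np.linspace(0, audio.shape[0] - 1, H).astype(int)
    cols = np.linspace(0, audio.shape[1] - 1, W).astype(int)
    audio_plane = audio[np.ix_(rows, cols)]
    # Channel 0: visual (black-and-white scale); channels 1-2: audio (color scale).
    return np.stack([visual, audio_plane, 1.0 - audio_plane], axis=-1)

tensor = build_audio_visual_tensor(np.random.rand(224, 224), np.random.rand(64, 100))
print(tensor.shape)   # (224, 224, 3)
```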
APA, Harvard, Vancouver, ISO, and other styles
30

Mulauzi, Felesia, Phiri Bwalya, Chishimba Soko, Vincent Njobvu, Jane Katema, and Felix Silungwe. "Preservation of audio-visual archives in Zambia." ESARBICA Journal: Journal of the Eastern and Southern Africa Regional Branch of the International Council on Archives 40 (November 6, 2021): 42–59. http://dx.doi.org/10.4314/esarjo.v40i1.4.

Full text
Abstract:
Audio-visual records and archives constitute a fundamental heritage that satisfies multiple needs, including education, training, research and entertainment. As such, there is a need to appropriately preserve and conserve them so they can be accessed for as long as they are needed. In spite of their significant role in safeguarding cultural heritage, audio-visual records and archives are often neglected and accorded less attention than paper-based records, especially in developing countries. Hence, there is a risk of losing information held in audio-visual form. That is why this study looked at how the National Archives of Zambia (NAZ) and the Zambia National Broadcasting Corporation (ZNBC) preserve audio-visual materials to ensure long-term accessibility of the information. The study investigated the types of audio-visual collections held, the storage equipment used, measures put in place to ensure long-term accessibility of audio-visual materials, the disaster preparedness plans in place to safeguard audio-visual archives and the major challenges encountered in the preservation of audio-visual materials. The findings of the study revealed that films (microfilm and microfiche), photographs and manuscripts, and video (video tapes) and sound recordings (compact cassette) constitute the biggest audio-visual collection preserved. The equipment used to store audio-visual materials included open shelves, specialised cabinets, an electronic database for digitised materials, aisle mobiles and cupboards. The measures taken to ensure the long-term accessibility of the audio-visual collection included digitisation and migration of endangered records and archives; fumigation of storage areas; conservation of damaged materials and regulation of temperatures and humidity in the storage areas. The disaster preparedness plans in place mostly covered structure insurance; protection against fire and water by way of installing fire extinguishers; smoke sensors; fire detectors and construction of purpose-built structures. The major challenges faced were financial constraints; technological obsolescence; lack of playback equipment; limited training; lack of strong back-up systems and inadequate storage facilities.
APA, Harvard, Vancouver, ISO, and other styles
32

Motlicek, Petr, Stefan Duffner, Danil Korchagin, Hervé Bourlard, Carl Scheffler, Jean-Marc Odobez, Giovanni Del Galdo, Markus Kallinger, and Oliver Thiergart. "Real-Time Audio-Visual Analysis for Multiperson Videoconferencing." Advances in Multimedia 2013 (2013): 1–21. http://dx.doi.org/10.1155/2013/175745.

Full text
Abstract:
We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components enabling multimodal stream manipulation (e.g., automatic online editing for multiparty videoconferencing applications) in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact, and leave the observable scene with no constraints. They comprise continuous localisation of audio objects and its application for spatial audio object coding, detection, and tracking of faces, estimation of head poses and visual focus of attention, detection and localisation of verbal and paralinguistic events, and the association and fusion of these different events. Combined all together, they represent multimodal streams with audio objects and semantic video objects and provide semantic information for stream manipulation systems (like a virtual director). Various experiments have been performed to evaluate the performance of the system. The obtained results demonstrate the effectiveness of the proposed design, the various algorithms, and the benefit of fusing different modalities in this scenario.
APA, Harvard, Vancouver, ISO, and other styles
33

King, Hannah, and Ioana Chitoran. "Difficult to hear but easy to see: Audio-visual perception of the /r/-/w/ contrast in Anglo-English." Journal of the Acoustical Society of America 152, no. 1 (July 2022): 368–79. http://dx.doi.org/10.1121/10.0012660.

Full text
Abstract:
This paper investigates the influence of visual cues in the perception of the /r/-/w/ contrast in Anglo-English. Audio-visual perception of Anglo-English /r/ warrants attention because productions are increasingly non-lingual, labiodental (e.g., [ʋ]), possibly involving visual prominence of the lips for the post-alveolar approximant [ɹ]. Forty native speakers identified [ɹ] and [w] stimuli in four presentation modalities: auditory-only, visual-only, congruous audio-visual, and incongruous audio-visual. Auditory stimuli were presented in noise. The results indicate that native Anglo-English speakers can identify [ɹ] and [w] from visual information alone with almost perfect accuracy. Furthermore, visual cues dominate the perception of the /r/-/w/ contrast when auditory and visual cues are mismatched. However, auditory perception is ambiguous because participants tend to perceive both [ɹ] and [w] as /r/. Auditory ambiguity is related to Anglo-English listeners' exposure to acoustic variation for /r/, especially to [ʋ], which is often confused with [w]. It is suggested that a specific labial configuration for Anglo-English /r/ encodes the contrast with /w/ visually, compensating for the ambiguous auditory contrast. An audio-visual enhancement hypothesis is proposed, and the findings are discussed with regard to sound change.
APA, Harvard, Vancouver, ISO, and other styles
34

Knobel, Samuel Elia Johannes, Brigitte Charlotte Kaufmann, Nora Geiser, Stephan Moreno Gerber, René M. Müri, Tobias Nef, Thomas Nyffeler, and Dario Cazzoli. "Effects of Virtual Reality–Based Multimodal Audio-Tactile Cueing in Patients With Spatial Attention Deficits: Pilot Usability Study." JMIR Serious Games 10, no. 2 (May 25, 2022): e34884. http://dx.doi.org/10.2196/34884.

Full text
Abstract:
Background: Virtual reality (VR) devices are increasingly being used in medicine and other areas for a broad spectrum of applications. One of the possible applications of VR involves the creation of an environment manipulated in a way that helps patients with disturbances in the spatial allocation of visual attention (so-called hemispatial neglect). One approach to ameliorate neglect is to apply cross-modal cues (ie, cues in sensory modalities other than the visual one, eg, auditory and tactile) to guide visual attention toward the neglected space. So far, no study has investigated the effects of audio-tactile cues in VR on the spatial deployment of visual attention in neglect patients. Objective: This pilot study aimed to investigate the feasibility and usability of multimodal (audio-tactile) cueing, as implemented in a 3D VR setting, in patients with neglect, and to obtain preliminary results concerning the effects of different types of cues on visual attention allocation compared with noncued conditions. Methods: Patients were placed in a virtual environment using a head-mounted display (HMD). The inlay of the HMD was equipped to deliver tactile feedback to the forehead. The task was to find and flag appearing birds. The birds could appear at 4 different presentation angles (lateral and paracentral on the left and right sides), and with (auditory, tactile, or audio-tactile cue) or without (no cue) a spatially meaningful cue. The task usability and feasibility, and 2 simple in-task measures (performance and early orientation), were assessed in 12 right-hemispheric stroke patients with neglect (5 with and 7 without additional somatosensory impairment). Results: The new VR setup showed high usability (mean score 10.2, SD 1.85; maximum score 12) and no relevant side effects (mean score 0.833, SD 0.834; maximum score 21). A repeated measures ANOVA on task performance data, with presentation angle, cue type, and group as factors, revealed a significant main effect of cue type (F3,30=9.863; P<.001) and a significant 3-way interaction (F9,90=2.057; P=.04). Post-hoc analyses revealed that among patients without somatosensory impairment, any cue led to better performance compared with no cue for targets on the left side, and audio-tactile cues did not seem to have additive effects. Among patients with somatosensory impairment, performance was better with both auditory and audio-tactile cueing than with no cue, at every presentation angle; conversely, tactile cueing alone had no significant effect at any presentation angle. Analysis of early orientation data showed that any type of cue triggered better orientation in both groups for lateral presentation angles, possibly reflecting an early alerting effect. Conclusions: Overall, audio-tactile cueing seems to be a promising method to guide patient attention. For instance, in the future, it could be used as an add-on method that supports attentional orientation during established therapeutic approaches.
APA, Harvard, Vancouver, ISO, and other styles
35

Zhang, Weiyu, Se-Hoon Jeong, and Martin Fishbein†. "Situational Factors Competing for Attention." Journal of Media Psychology 22, no. 1 (January 2010): 2–13. http://dx.doi.org/10.1027/1864-1105/a000002.

Full text
Abstract:
This study investigates how multitasking interacts with levels of sexually explicit content to influence an individual’s ability to recognize TV content. A 2 (multitasking vs. nonmultitasking) by 3 (low, medium, and high sexual content) between-subjects experiment was conducted. The analyses revealed that multitasking not only impaired task performance, but also decreased TV recognition. An inverted-U relationship between degree of sexually explicit content and recognition of TV content was found, but only when subjects were multitasking. In addition, multitasking interfered with subjects’ ability to recognize audio information more than their ability to recognize visual information.
APA, Harvard, Vancouver, ISO, and other styles
36

Sugiyanti, Endang. "PENERAPAN MEDIA AUDIO VISUAL DALAM PENINGKATAN PEMAHAMAN HAJI DAN UMRAH." Wawasan: Jurnal Kediklatan Balai Diklat Keagamaan Jakarta 1, no. 1 (November 23, 2020): 79–90. http://dx.doi.org/10.53800/wawasan.v1i1.38.

Full text
Abstract:
Learning design must be arranged systematically. In the learning process, the teacher plays an important role in determining the success or failure of learning achievement. The selection of appropriate learning media is something teachers need to pay attention to; here, audio-visual media were applied to improve students' understanding during lessons in the subject of Jurisprudence (Fiqh) on the hajj and umrah material. The research question was: how does the application of audio-visual media increase the activeness and learning outcomes of students in the subject of Jurisprudence, on the hajj and umrah material, at MTs Negeri 28 Jakarta? Data were collected through tests, observations, and student responses. The results answer the research question: the application of audio-visual media to the learning of the hajj and umrah material was successful. Student learning outcomes in the Fiqh lessons on hajj and umrah reached completeness with an average score of 88.09 and a percentage of 95.6%, a clearly visible improvement in learning outcomes after using audio-visual media.
APA, Harvard, Vancouver, ISO, and other styles
37

Wright, Thomas D., Jamie Ward, Sarah Simonon, and Aaron Margolis. "Where’s Wally? Audio–visual mismatch directs ocular saccades in sensory substitution." Seeing and Perceiving 25 (2012): 61. http://dx.doi.org/10.1163/187847612x646820.

Full text
Abstract:
Sensory substitution is the representation of information from one sensory modality (e.g., vision) within another modality (e.g., audition). We used a visual-to-auditory sensory substitution device (SSD) to explore the effect of incongruous (true-)visual and substituted-visual signals on visual attention. In our multisensory sensory substitution paradigm, both visual and sonified-visual information were presented. By making small alterations to the sonified image, but not the seen image, we introduced audio–visual mismatch. The alterations consisted of the addition of a small image (for instance, the Wally character from the ‘Where’s Wally?’ books) within the original image. Participants were asked to listen to the sonified image and identify which quadrant contained the alteration. Monitoring eye movements revealed the effect of the audio–visual mismatch on covert visual attention. We found that participants consistently fixated more, and dwelled for longer, in the quadrant corresponding to the location (in the sonified image) of the target. This effect was not contingent on the participant reporting the location of the target correctly, which indicates a low-level interaction between an auditory stream and visual attention. We propose that this suggests a shared visual workspace that is accessible by visual sources other than the eyes. If this is indeed the case, it would support the development of other, more esoteric, forms of sensory substitution. These could include an expanded field of view (e.g., rear-view cameras), overlaid visual information (e.g., thermal imaging) or restoration of partial visual field loss (e.g., hemianopsia).
APA, Harvard, Vancouver, ISO, and other styles
38

Yin, Yifang, Harsh Shrivastava, Ying Zhang, Zhenguang Liu, Rajiv Ratn Shah, and Roger Zimmermann. "Enhanced Audio Tagging via Multi- to Single-Modal Teacher-Student Mutual Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10709–17. http://dx.doi.org/10.1609/aaai.v35i12.17280.

Full text
Abstract:
Recognizing ongoing events based on acoustic clues has been a critical yet challenging problem that has attracted significant research attention in recent years. Joint audio-visual analysis can improve the event detection accuracy but may not always be feasible as under many circumstances only audio recordings are available in real-world scenarios. To solve the challenges, we present a novel visual-assisted teacher-student mutual learning framework for robust sound event detection from audio recordings. Our model adopts a multi-modal teacher network based on both acoustic and visual clues, and a single-modal student network based on acoustic clues only. Conventional teacher-student learning performs unsatisfactorily for knowledge transfer from a multi-modality network to a single-modality network. We thus present a mutual learning framework by introducing a single-modal transfer loss and a cross-modal transfer loss to collaboratively learn the audio-visual correlations between the two networks. Our proposed solution takes the advantages of joint audio-visual analysis in training while maximizing the feasibility of the model in use cases. Our extensive experiments on the DCASE17 and the DCASE18 sound event detection datasets show that our proposed method outperforms the state-of-the-art audio tagging approaches.
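The two transfer losses described above, a single-modal term and a cross-modal term tying the audio-only student to the audio-visual teacher, can be sketched as temperature-smoothed KL-divergence terms on the tag posteriors. The symmetric mutual term, the softmax posteriors, and the temperature are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def mutual_transfer_losses(student_audio_logits, teacher_av_logits,
                           teacher_audio_logits, temperature=2.0):
    """Sketch of mutual teacher-student transfer for audio tagging, assuming
    softmax tag posteriors for simplicity:
      - cross-modal transfer: the audio-only student mimics the audio-visual teacher;
      - single-modal transfer: the teacher's audio branch and the student
        regularize each other, so knowledge flows both ways."""
    t = temperature

    def kl(p_logits, q_logits):
        # KL(q || p) with temperature-smoothed distributions.
        return F.kl_div(F.log_softmax(p_logits / t, dim=1),
                        F.softmax(q_logits / t, dim=1),
                        reduction="batchmean") * (t * t)

    cross_modal = kl(student_audio_logits, teacher_av_logits.detach())
    single_modal = kl(student_audio_logits, teacher_audio_logits.detach()) \
                 + kl(teacher_audio_logits, student_audio_logits.detach())
    return cross_modal, single_modal

# Toy usage: 4 clips, 10 tag classes.
cm, sm = mutual_transfer_losses(torch.randn(4, 10), torch.randn(4, 10), torch.randn(4, 10))
```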
APA, Harvard, Vancouver, ISO, and other styles
39

Riza Oktariana and Wiwik Yeni Herlina. "ANALISIS KEMAMPUAN MENYIMAK ANAK KELOMPOK B TK BUNGONG SEULEUPOK SYIAH KUALA BANDA ACEH BERBANTUKAN MEDIA AUDIO VISUAL." Jurnal Buah Hati 7, no. 2 (September 22, 2020): 224–36. http://dx.doi.org/10.46244/buahhati.v7i2.1169.

Full text
Abstract:
Listening ability is the larger process of hearing, recognizing, and interpreting spoken symbols. Audio-visual media are media that contain both sound elements and visible image elements, such as video recordings, films of various formats, sound slides, and so on. The research question in this study was: what is the listening ability of group B children at TK Bungong Seuleupok Syiah Kuala Banda Aceh when assisted by audio-visual media? The researchers used audio-visual media because they can attract children's attention to listening: by playing a video, children can see and hear directly. The aim was to determine the listening ability of group B children at TK Bungong Seuleupok Syiah Kuala Banda Aceh assisted by audio-visual media. The instrument used was interviews, with a sample of four children, and the data were analysed qualitatively. The analysis of children's listening through audio-visual media at TK Bungong Seuleupok Syiah Kuala Banda Aceh showed that all four children had no problem listening. On this basis, the researchers suggest that the children's listening ability fits a visual learning character, and that with the help of audio-visual media, children become interested in listening to the lessons given, in line with their learning character and a comfortable atmosphere for the child. Keywords: listening ability, audio-visual media.
APA, Harvard, Vancouver, ISO, and other styles
40

Kozlova, Elena I. "Ways of Electronic Publications' Classification in the Legal Deposit System." Bibliotekovedenie [Russian Journal of Library Science], no. 2 (April 27, 2012): 28–32. http://dx.doi.org/10.25281/0869-608x-2012-0-2-28-32.

Full text
Abstract:
The article addresses producers' loose interpretation of their obligation to deliver electronic publications and audio-visual products, which leads to significant losses in building the National Library and Information Collection as an object of cultural heritage and in informing libraries about information resources published in the territory of the Russian Federation. It compares the classification features of audio books and audio-visual products as defined by law and by national standards. Particular attention is paid to the expediency of expanding the types of electronic documents covered by legal deposit by amending the existing regulations.
APA, Harvard, Vancouver, ISO, and other styles
41

Singh, Charanjit Kaur Swaran, et al. "Review of Research on the Use of Audio-Visual Aids among Learners’ English Language." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 3 (April 11, 2021): 895–904. http://dx.doi.org/10.17762/turcomat.v12i3.800.

Full text
Abstract:
This paper reviews the literature on the use of audio-visual aids to improve lower-proficiency learners' English. Audio-visual aids make learning interesting for English as a second language (ESL) learners because they facilitate learning. Teachers have to be innovative in their pedagogical approach to make language teaching effective; today they are no longer dependent on traditional teaching methods and have adopted different techniques to teach the language. Past studies report that teachers opted for audio-visual aids because students showed little interest in English, viewed it merely as a subject for fulfilling examination requirements, and were reluctant to speak for fear of making pronunciation and other mistakes. Hence, this paper analyses the need to investigate issues related to the use of audio-visual aids in English language teaching, which require immediate attention.
APA, Harvard, Vancouver, ISO, and other styles
42

Morís Fernández, Luis, Maya Visser, and Salvador Soto-Faraco. "Influence of selective attention to sound in multisensory integration." Seeing and Perceiving 25 (2012): 154. http://dx.doi.org/10.1163/187847612x647856.

Full text
Abstract:
We assessed the role of audiovisual integration in selective attention by testing selective attention to sound. Participants were asked to focus on one of two audio speech streams presented simultaneously at different pitches. We measured recall of words from the cued or the uncued sentence using a 2AFC task at the end of each trial. A video clip of a speaker's mouth was presented in the middle of the display, matching one of the two simultaneous auditory streams (50% of the time it matched the cued sentence, and the rest of the time the uncued one). In Experiment 1 the cue was 75% valid. Recall on valid trials was better than on invalid ones. The critical result, however, was that only in the valid condition did we find differences between audio-visually matching and mismatching sentences; in the invalid condition these differences were not found. In Experiment 2 the cue to the relevant sentence was 100% valid, and we included a control condition in which the lips matched neither of the sentences. When the lips matched the cued sentence, performance was better than when they matched the uncued sentence or neither of them, suggesting a benefit of audiovisual matching rather than a cost of mismatch. Our results indicate that attention to acoustic frequency (pitch) plays an important role in determining which sounds benefit from multisensory integration.
APA, Harvard, Vancouver, ISO, and other styles
43

Fleming, Justin T., Ross K. Maddox, and Barbara G. Shinn-Cunningham. "Spatial alignment between faces and voices improves selective attention to audio-visual speech." Journal of the Acoustical Society of America 150, no. 4 (October 2021): 3085–100. http://dx.doi.org/10.1121/10.0006415.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Loria, Tristan, Joëlle Hajj, Kanji Tanaka, Katsumi Watanabe, and Luc Tremblay. "The deployment of spatial attention during goal-directed action alters audio-visual integration." Journal of Vision 19, no. 10 (September 6, 2019): 111c. http://dx.doi.org/10.1167/19.10.111c.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Chen, Yanxiang, Minglong Song, Lixia Xue, Xiaoxue Chen, and Meng Wang. "An audio–visual human attention analysis approach to abrupt change detection in videos." Signal Processing 110 (May 2015): 143–54. http://dx.doi.org/10.1016/j.sigpro.2014.08.006.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Lee, Jong-Seok, Francesca De Simone, and Touradj Ebrahimi. "Subjective Quality Evaluation of Foveated Video Coding Using Audio-Visual Focus of Attention." IEEE Journal of Selected Topics in Signal Processing 5, no. 7 (November 2011): 1322–31. http://dx.doi.org/10.1109/jstsp.2011.2165199.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Brungart, Douglas S., Alexander J. Kordik, and Brian D. Simpson. "Audio and Visual Cues in a Two-Talker Divided Attention Speech-Monitoring Task." Human Factors: The Journal of the Human Factors and Ergonomics Society 47, no. 3 (September 2005): 562–73. http://dx.doi.org/10.1518/001872005774860023.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Wang, Weixing, Qianqian Li, Jingwen Xie, Ningfeng Hu, Ziao Wang, and Ning Zhang. "Research on emotional semantic retrieval of attention mechanism oriented to audio-visual synesthesia." Neurocomputing 519 (January 2023): 194–204. http://dx.doi.org/10.1016/j.neucom.2022.11.036.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

KHOMSATUN, KHOMSATUN. "MENINGKATKAN KETERAMPILAN BEWUDU MELALUI METODE DEMONSTRASI YANG DIKOMBINASIKAN DENGAN MEDIA AUDIO VISUAL PADA PESERTA DIDIK DI KELAS VII SMP NEGERI 21 PONTIANAK." EDUCATOR : Jurnal Inovasi Tenaga Pendidik dan Kependidikan 2, no. 3 (November 8, 2022): 322–29. http://dx.doi.org/10.51878/educator.v2i3.1636.

Full text
Abstract:
The purpose of the study was to examine the teacher's efforts to improve ablution (wudu) skills through the demonstration method combined with audio-visual media for students in Class VII C of SMP Negeri 21 Pontianak. The study used qualitative description within a Classroom Action Research (CAR) procedure and examined taharah (ablution) learning among the seventh-grade students of SMP Negeri 21. The results showed that using audio-visual media together with the demonstration method had a positive impact on students' ablution skills, marked by an increase in learning mastery in each cycle: 73.52% in the first cycle and 94.11% in the second. This combination also made students feel that they received attention and had the opportunity to express opinions, ideas, and questions, and it made learning Islamic Religious Education more enjoyable.
APA, Harvard, Vancouver, ISO, and other styles
50

Korzeniowska, A. T., H. Root-Gutteridge, J. Simner, and D. Reby. "Audio–visual crossmodal correspondences in domestic dogs (Canis familiaris)." Biology Letters 15, no. 11 (November 2019): 20190564. http://dx.doi.org/10.1098/rsbl.2019.0564.

Full text
Abstract:
Crossmodal correspondences are intuitively held relationships between non-redundant features of a stimulus, such as auditory pitch and visual illumination. While a number of correspondences have been identified in humans to date (e.g. high pitch is intuitively felt to be luminant, angular and elevated in space), their evolutionary and developmental origins remain unclear. Here, we investigated the existence of audio–visual crossmodal correspondences in domestic dogs, and specifically, the known human correspondence in which high auditory pitch is associated with elevated spatial position. In an audio–visual attention task, we found that dogs engaged more with audio–visual stimuli that were congruent with human intuitions (high auditory pitch paired with a spatially elevated visual stimulus) compared to incongruent (low pitch paired with elevated visual stimulus). This result suggests that crossmodal correspondences are not a uniquely human or primate phenomenon and they cannot easily be dismissed as merely lexical conventions (i.e. matching ‘high’ pitch with ‘high’ elevation).
APA, Harvard, Vancouver, ISO, and other styles