Academic literature on the topic 'Active speaker detection'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Active speaker detection.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Active speaker detection"

1

Assunção, Gustavo, Nuno Gonçalves, and Paulo Menezes. "Bio-Inspired Modality Fusion for Active Speaker Detection." Applied Sciences 11, no. 8 (April 10, 2021): 3397. http://dx.doi.org/10.3390/app11083397.

Full text
Abstract:
Human beings have developed fantastic abilities to integrate information from various sensory sources exploring their inherent complementarity. Perceptual capabilities are therefore heightened, enabling, for instance, the well-known "cocktail party" and McGurk effects, i.e., speech disambiguation from a panoply of sound signals. This fusion ability is also key in refining the perception of sound source location, as in distinguishing whose voice is being heard in a group conversation. Furthermore, neuroscience has successfully identified the superior colliculus region in the brain as the one responsible for this modality fusion, with a handful of biological models having been proposed to approach its underlying neurophysiological process. Deriving inspiration from one of these models, this paper presents a methodology for effectively fusing correlated auditory and visual information for active speaker detection. Such an ability can have a wide range of applications, from teleconferencing systems to social robotics. The detection approach initially routes auditory and visual information through two specialized neural network structures. The resulting embeddings are fused via a novel layer based on the superior colliculus, whose topological structure emulates spatial neuron cross-mapping of unimodal perceptual fields. The validation process employed two publicly available datasets, with achieved results confirming and greatly surpassing initial expectations.
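The pipeline sketched in this abstract (two specialized unimodal networks whose embeddings are fused before a speaking/not-speaking decision) follows a common two-stream pattern. Below is a minimal PyTorch sketch of that generic pattern, not of the paper's superior-colliculus layer: the embedding dimensions, the concatenation-based fusion, and the classifier head are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    # Generic fuse-then-classify pattern: project each modality's embedding
    # into a shared space, fuse, and score the face as speaking or not.
    def __init__(self, audio_dim=128, visual_dim=256, fused_dim=64):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, fused_dim)
        self.visual_proj = nn.Linear(visual_dim, fused_dim)
        self.fusion = nn.Sequential(nn.Linear(2 * fused_dim, fused_dim), nn.ReLU())
        self.head = nn.Linear(fused_dim, 1)

    def forward(self, audio_emb, visual_emb):
        a = torch.relu(self.audio_proj(audio_emb))
        v = torch.relu(self.visual_proj(visual_emb))
        fused = self.fusion(torch.cat([a, v], dim=-1))
        return torch.sigmoid(self.head(fused))  # probability the face is speaking

# Usage with dummy inputs: TwoStreamFusion()(torch.randn(8, 128), torch.randn(8, 256))
```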
2

Pu, Jie, Yannis Panagakis, and Maja Pantic. "Active Speaker Detection and Localization in Videos Using Low-Rank and Kernelized Sparsity." IEEE Signal Processing Letters 27 (2020): 865–69. http://dx.doi.org/10.1109/lsp.2020.2996412.

Full text
3

Lindstrom, Fredric, Keni Ren, Kerstin Persson Waye, and Haibo Li. "A comparison of two active‐speaker‐detection methods suitable for usage in noise dosimeter measurements." Journal of the Acoustical Society of America 123, no. 5 (May 2008): 3527. http://dx.doi.org/10.1121/1.2934471.

Full text
4

Zhu, Ying-Xin, and Hao-Ran Jin. "Speaker Localization Based on Audio-Visual Bimodal Fusion." Journal of Advanced Computational Intelligence and Intelligent Informatics 25, no. 3 (May 20, 2021): 375–82. http://dx.doi.org/10.20965/jaciii.2021.p0375.

Full text
Abstract:
The demand for fluency in human–computer interaction is on the increase globally; thus, active localization of the speaker by the machine has become a problem worth exploring. Considering that the stability and accuracy of single-mode localization methods are low, while multi-mode localization methods can exploit the redundancy of information to improve accuracy and resistance to interference, a speaker localization method based on voice and image multimodal fusion is proposed. First, the voice localization method based on time differences of arrival (TDOA) in a microphone array and the face detection method based on the AdaBoost algorithm are presented. Second, a multimodal fusion method based on spatiotemporal fusion of speech and image is proposed; it uses a coordinate system converter and a frame rate tracker. The proposed method was tested by positioning the speaker at 15 different points, with each point tested 50 times. The experimental results demonstrate that accuracy is high when the speaker stands within a certain range in front of the positioning system.
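As a rough illustration of the TDOA step mentioned above, here is a minimal GCC-PHAT sketch; GCC-PHAT is one standard way to estimate the inter-microphone delay, and the function name and parameters are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    # Generalized cross-correlation with phase transform (PHAT):
    # whiten the cross-power spectrum so only the phase (delay) remains.
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    cc = np.fft.irfft(R / (np.abs(R) + 1e-15), n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / float(fs)
    return tau  # inter-microphone delay in seconds; the sign gives the side

# Given microphone spacing d and speed of sound c, the azimuth follows as
# theta = arcsin(c * tau / d).
```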
5

Stefanov, Kalin, Jonas Beskow, and Giampiero Salvi. "Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially Aware Language Acquisition." IEEE Transactions on Cognitive and Developmental Systems 12, no. 2 (June 2020): 250–59. http://dx.doi.org/10.1109/tcds.2019.2927941.

Full text
6

Dai, Hai, Kean Chen, Yang Wang, and Haoxin Yu. "Fault Detection Method of Secondary Sound Source in ANC System Based on Impedance Characteristics." Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University 40, no. 6 (December 2022): 1242–49. http://dx.doi.org/10.1051/jnwpu/20224061242.

Full text
Abstract:
As an indispensable component of an active noise control system, the working states of the secondary sound sources directly affect the noise reduction and robustness of the system. It is therefore crucial to detect the working states of the secondary sound sources in real time during active control. In this study, a real-time fault detection method for secondary sound sources during active control is presented, and the corresponding detection algorithm is numerically derived and experimentally verified. By collecting the input voltage and output current of the speaker unit, a sound quality factor is calculated to comprehensively judge the working state of the secondary sound sources. The simulation and experimental results show that the method is simple, has low computational cost, can accurately reflect the working states of the secondary sound sources in real time, and provides a basis for judging the operation of the active noise control system.
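The measurement idea behind this abstract (inferring the loudspeaker's state from its input voltage and output current) can be sketched as an electrical-impedance estimate. The paper's actual "sound quality factor" is not specified here, so the snippet below shows only the assumed first step: a spectral ratio whose resonance peak shifts when a driver is faulty.

```python
import numpy as np

def impedance_magnitude(voltage, current, fs):
    # voltage and current are synchronously sampled waveforms at rate fs.
    # Their spectral ratio approximates the loudspeaker's electrical
    # impedance; a damaged driver shifts or flattens its resonance peak,
    # which a simple threshold test could then flag.
    w = np.hanning(voltage.size)
    V = np.fft.rfft(voltage * w)
    I = np.fft.rfft(current * w)
    freqs = np.fft.rfftfreq(voltage.size, d=1.0 / fs)
    return freqs, np.abs(V) / (np.abs(I) + 1e-12)
```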
7

Ahmad, Zubair, Alquhayz, and Ditta. "Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model." Sensors 19, no. 23 (November 25, 2019): 5163. http://dx.doi.org/10.3390/s19235163.

Full text
Abstract:
Speaker diarization systems aim to find 'who spoke when?' in multi-speaker recordings. The datasets usually consist of meetings, TV/talk shows, telephone and multi-party interaction recordings. In this paper, we propose a novel multimodal speaker diarization technique, which finds the active speaker through an audio-visual synchronization model. A pre-trained audio-visual synchronization model is used to find the synchronization between a visible person and the respective audio. For that purpose, short video segments comprised of face-only regions are acquired using a face detection technique and are then fed to the pre-trained model. This model is a two-stream network which matches audio frames with their respective visual input segments. On the basis of high-confidence video segments inferred by the model, the respective audio frames are used to train Gaussian mixture model (GMM)-based clusters. This method helps in generating speaker-specific clusters with high probability. We tested our approach on a popular subset of the AMI meeting corpus consisting of 5.4 h of audio recordings and 5.8 h of a different set of multimodal recordings. A significant improvement is noticed with the proposed method in terms of diarization error rate (DER) when compared to conventional and fully supervised audio-based speaker diarization. The results of the proposed technique are very close to those of complex state-of-the-art multimodal diarization systems, which shows the significance of such a simple yet effective technique.
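A minimal sketch of the GMM-clustering step the abstract describes, fitting one model per speaker from frames that the synchronization model labelled with high confidence; the data layout, helper names, and component count are assumptions, not the paper's code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_speaker_gmms(frames_by_speaker, n_components=8):
    # One GMM per speaker, trained only on audio feature frames that a
    # hypothetical upstream AV-sync model assigned with high confidence.
    # frames_by_speaker: dict mapping speaker id -> array of shape (n, d).
    return {spk: GaussianMixture(n_components=n_components).fit(frames)
            for spk, frames in frames_by_speaker.items()}

def label_frames(gmms, frames):
    # Assign each remaining frame to the speaker whose GMM scores it highest.
    names = list(gmms)
    loglik = np.stack([gmms[s].score_samples(frames) for s in names])
    return [names[i] for i in loglik.argmax(axis=0)]
```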
8

Wang, Shaolei, Zhongyuan Wang, Wanxiang Che, Sendong Zhao, and Ting Liu. "Combining Self-supervised Learning and Active Learning for Disfluency Detection." ACM Transactions on Asian and Low-Resource Language Information Processing 21, no. 3 (May 31, 2022): 1–25. http://dx.doi.org/10.1145/3487290.

Full text
Abstract:
Spoken language is fundamentally different from written language in that it contains frequent disfluencies, or parts of an utterance that are corrected by the speaker. Disfluency detection (removing these disfluencies) is desirable to clean the input for use in downstream NLP tasks. Most existing approaches to disfluency detection rely heavily on human-annotated data, which is scarce and expensive to obtain in practice. To tackle the training-data bottleneck, in this work we investigate methods for combining self-supervised learning and active learning for disfluency detection. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled data and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words and (ii) sentence classification to distinguish original sentences from grammatically incorrect sentences. We then combine these two tasks to jointly pre-train a neural network. The pre-trained neural network is then fine-tuned using human-annotated disfluency detection training data. The self-supervised learning method can capture task-specific knowledge for disfluency detection and achieve better performance when fine-tuned on a small annotated dataset compared to other supervised methods. However, because the pseudo training data are generated with simple heuristics and cannot fully cover all the disfluency patterns, there is still a performance gap compared to supervised models trained on the full training dataset. We further explore how to bridge this gap by integrating active learning during the fine-tuning process. Active learning strives to reduce annotation costs by choosing the most critical examples to label and can address the weakness of self-supervised learning with a small annotated dataset. We show that by combining self-supervised learning with active learning, our model is able to match state-of-the-art performance with just about 10% of the original training data on both the commonly used English Switchboard test set and a set of in-house annotated Chinese data.
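The pseudo-data construction described above (randomly deleting words and inserting noise words, with tags marking the insertions and a sentence-level corrupted/original label) can be sketched in a few lines. Everything here, including names, the tag set, and the probabilities, is an illustrative assumption rather than the paper's exact recipe.

```python
import random

def corrupt(sentence, vocab, p=0.15):
    # Build one pseudo training example: randomly delete words and insert
    # noise words drawn from vocab. Each inserted word is tagged so a
    # tagger can learn to detect it; the sentence-level label marks
    # corrupted vs. original for the classification pretext task.
    tokens, tags = [], []
    for word in sentence.split():
        if random.random() < p / 2:
            continue                      # random deletion
        if random.random() < p:
            tokens.append(random.choice(vocab))
            tags.append("ADD")            # inserted noise word to detect
        tokens.append(word)
        tags.append("KEEP")
    is_corrupted = int(tokens != sentence.split())
    return tokens, tags, is_corrupted
```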
9

Maltezou-Papastylianou, Constantina, Riccardo Russo, Denise Wallace, Chelsea Harmsworth, and Silke Paulmann. "Different stages of emotional prosody processing in healthy ageing–evidence from behavioural responses, ERPs, tDCS, and tRNS." PLOS ONE 17, no. 7 (July 21, 2022): e0270934. http://dx.doi.org/10.1371/journal.pone.0270934.

Full text
Abstract:
Past research suggests that the ability to recognise the emotional intent of a speaker decreases as a function of age. Yet, few studies have looked at the underlying cause for this effect in a systematic way. This paper builds on the view that emotional prosody perception is a multi-stage process and explores which step of the recognition processing line is impaired in healthy ageing using time-sensitive event-related brain potentials (ERPs). Results suggest that early processes linked to salience detection as reflected in the P200 component and initial build-up of emotional representation as linked to a subsequent negative ERP component are largely unaffected in healthy ageing. The two groups show, however, emotional prosody recognition differences: older participants recognise emotional intentions of speakers less well than younger participants do. These findings were followed up by two neuro-stimulation studies specifically targeting the inferior frontal cortex to test if recognition improves during active stimulation relative to sham. Overall, results suggest that neither tDCS nor high-frequency tRNS stimulation at 2 mA for 30 minutes improves emotional prosody recognition rates in healthy older adults.
10

Lahemer, Elfituri S. F., and Ahmad Rad. "An Audio-Based SLAM for Indoor Environments: A Robotic Mixed Reality Presentation." Sensors 24, no. 9 (April 27, 2024): 2796. http://dx.doi.org/10.3390/s24092796.

Full text
Abstract:
In this paper, we present a novel approach referred to as the audio-based virtual landmark-based HoloSLAM. This innovative method leverages a single sound source and microphone arrays to estimate the voice-printed speaker's direction. The system allows an autonomous robot equipped with a single microphone array to navigate within indoor environments, interact with specific sound sources, and simultaneously determine its own location while mapping the environment. The proposed method requires neither multiple audio sources in the environment nor sensor fusion to extract pertinent information and make accurate sound-source estimates. Furthermore, the approach incorporates Robotic Mixed Reality using Microsoft HoloLens to superimpose landmarks, effectively mitigating the audio landmark-related issues of conventional audio-based landmark SLAM, particularly in situations where audio landmarks cannot be discerned, are limited in number, or are completely missing. The paper also evaluates an active speaker detection method, demonstrating its ability to achieve high accuracy in scenarios where audio data are the sole input. Real-time experiments validate the effectiveness of this method, emphasizing its precision and comprehensive mapping capabilities. The results of these experiments showcase the accuracy and efficiency of the proposed system, surpassing the constraints associated with traditional audio-based SLAM techniques, ultimately leading to a more detailed and precise mapping of the robot's surroundings.

Dissertations / Theses on the topic "Active speaker detection"

1

Li, Yi. "Speaker Diarization System for Call-center data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-286677.

Full text
Abstract:
To answer the question of who spoke when, speaker diarization (SD) is a critical step for many speech applications in practice. The task of our project is to build an MFCC-vector-based speaker diarization system on top of a speaker verification (SV) system, an existing call-center application that checks a customer's identity from a phone call. Our speaker diarization system uses 13-dimensional MFCCs as features and performs voice activity detection (VAD), segmentation, linear clustering, and hierarchical clustering based on GMMs and the BIC score. By applying it, we decrease the equal error rate (EER) of the SV system from 18.1% in the baseline experiment to 3.26% on general call-center conversations. To better analyze and evaluate the system, we also simulated a set of call-center data based on the publicly available ICSI corpus.
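A compact way to see the BIC-based merge test used in this kind of hierarchical clustering: under a single full-covariance Gaussian per cluster, two MFCC segments are merged when the ΔBIC value below is negative. This is a common textbook variant, assumed here for illustration; the thesis's exact formulation may differ.

```python
import numpy as np

def delta_bic(x, y, lam=1.0):
    # Single-Gaussian, full-covariance BIC merge test for two segments of
    # MFCC frames x (nx, d) and y (ny, d); a negative value favours merging.
    z = np.vstack([x, y])
    d = z.shape[1]

    def logdet(m):
        # log-determinant of the sample covariance, regularized for stability
        return np.linalg.slogdet(np.cov(m, rowvar=False) + 1e-6 * np.eye(d))[1]

    nx, ny, nz = len(x), len(y), len(z)
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(nz)
    return 0.5 * (nz * logdet(z) - nx * logdet(x) - ny * logdet(y)) - lam * penalty
```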
2

Pouthier, Baptiste. "Apprentissage profond et statistique sur données audiovisuelles dédié aux systèmes embarqués pour l'interface homme-machine." Electronic Thesis or Diss., Université Côte d'Azur, 2024. http://www.theses.fr/2024COAZ4019.

Full text
Abstract:
In the rapidly evolving landscape of human-machine interfaces, deep learning has been nothing short of revolutionary. It has ushered in a new era of audio-visual algorithms, which, in turn, have expanded the horizons of potential applications and strengthened the performance of traditional systems. However, these remarkable advancements come with a caveat: many of these algorithms are computationally demanding, rendering their integration onto embedded devices a formidable task. The primary focus of this thesis is to surmount this limitation through a comprehensive optimization effort, addressing the critical factors of latency and accuracy in audio-visual algorithms. Our approach entails a meticulous examination and enhancement of key components in the audio-visual human-machine interaction pipeline; we investigate and make contributions to fundamental aspects of audio-visual technology in Active Speaker Detection and Audio-Visual Speech Recognition tasks. By tackling these critical building blocks, we aim to bridge the gap between the vast potential of audio-visual algorithms and their practical application in embedded systems. Our research introduces efficient models for Active Speaker Detection. On the one hand, our novel audio-visual fusion strategy yields significant improvements over other state-of-the-art systems while featuring a relatively simpler model. On the other hand, we explore neural architecture search, resulting in the development of a compact yet efficient architecture for the Active Speaker Detection problem. Furthermore, we present our work on audio-visual speech recognition, with a specific emphasis on keyword spotting. Our main contribution in this area targets the visual aspect of speech recognition, with a graph-based approach designed to streamline the visual processing pipeline, promising simpler audio-visual recognition systems.

Books on the topic "Active speaker detection"

1

Kelly, Carla. Miss Milton Speaks Her Mind. Seattle, WA: Camel Press, 2014.

Find full text
2

Francis, Dick. Banker. Oxford: Heinemann, 1992.

Find full text
3

Escott, John. Girl on a Motorcycle. Oxford: Oxford University Press, 2008.

Find full text
4

Doyle, Arthur Conan. The Sign of Four. Peterborough, Ont.: Broadview Press, 2001.

Find full text
5

Doyle, Arthur Conan. The Sign of Four = El signo de cuatro. London: Heinemann Educational, 1985.

Find full text
6

Chee, Traci. Speaker. Penguin Young Readers Group, 2018.

Find full text
7

Chee, Traci. The Speaker. 2017.

Find full text
8

Bankir: Detektivnyĭ roman [Banker: a detective novel]. [Moskva]: EKSMO-Press, 1999.

Find full text
9

Doyle, Arthur Conan. Sign of the Four (1890), Also Called the Sign of Four, Is the Second Novel Featuring Sherlock Holmes Written by Sir Arthur Conan Doyle. Doyle Wrote Four Novels and 56 Short Stories Featuring the Fictional Detective. Independently Published, 2021.

Find full text

Book chapters on the topic "Active speaker detection"

1

Alcázar, Juan León, Moritz Cordes, Chen Zhao, and Bernard Ghanem. "End-to-End Active Speaker Detection." In Lecture Notes in Computer Science, 126–43. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-19836-6_8.

Full text
2

Chakravarty, Punarjay, and Tinne Tuytelaars. "Cross-Modal Supervision for Learning Active Speaker Detection in Video." In Computer Vision – ECCV 2016, 285–301. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-46454-1_18.

Full text
3

Min, Kyle, Sourya Roy, Subarna Tripathi, Tanaya Guha, and Somdeb Majumdar. "Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection." In Lecture Notes in Computer Science, 371–87. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-19833-5_22.

Full text
4

Yang, Yatao, and Siyu Yan. "Cross-Modal Active Speaker Detection Algorithm in Video and End-To-End Landing Solution." In Lecture Notes in Electrical Engineering, 313–23. Singapore: Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-2200-6_29.

Full text
5

Pallavi, C., Girija R, Vedhapriyavadhana R, Barnali Dey, and Rajiv Vincent. "A Relative Investigation of Various Algorithms for Online Financial Fraud Detection Techniques." In Recent Trends in Intensive Computing. IOS Press, 2021. http://dx.doi.org/10.3233/apc210174.

Full text
Abstract:
Online financial transactions play a crucial role in today's economy and have become an unavoidable part of business and global activity. Transaction fraud poses a serious threat to e-commerce spending, since an intruder can complete transactions using an honest consumer's credit card information. To counter this problem, the proposed system establishes transaction limits for customers; only the data needed to detect fraudulent behaviour is collected, and only at registration time. The transactions of any individual are not fully known to the Fraud Detection System (FDS) running at the bank that issues credit cards to customers, so Behaviour and Location Analysis (BLA) is employed. The FDS monitors each credit card issued by the bank: all incoming transactions are directed to the FDS for confirmation, authentication, and verification, and the FDS inspects the card details to judge whether an operation is fake or genuine, while the purchased goods remain unknown to it. If a transaction is judged fraudulent, the issuing bank declines it. Identity is verified using spending patterns and geographical area; if a suspicious pattern is detected, the FDS requires further verification against the information registered by the user, and after three invalid attempts the system blocks the user. In this paper, several such algorithms for online financial fraud detection are examined and compared.
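The decision flow this abstract outlines (per-customer transaction limits, behaviour-and-location checks, and blocking after three invalid attempts) reduces to a small rule-based check. The sketch below is a toy illustration only; all names, fields, and thresholds other than the three-strikes rule are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Account:
    # Hypothetical per-customer state for the rule-based checks the
    # abstract outlines: a spending limit, a home region, a strike count.
    limit: float
    home_region: str
    invalid_attempts: int = 0
    blocked: bool = False

def check_transaction(acct: Account, amount: float, region: str) -> bool:
    # Returns True if the transaction is accepted. A transaction that
    # exceeds the limit or originates from an unexpected region counts as
    # a suspicious (invalid) attempt; three such attempts block the user.
    if acct.blocked:
        return False
    if amount > acct.limit or region != acct.home_region:
        acct.invalid_attempts += 1
        if acct.invalid_attempts >= 3:
            acct.blocked = True
        return False
    return True
```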
6

Dalton, Gene, and Ann Devitt. "Gaeilge Gaming." In Computer-Assisted Language Learning, 1093–110. IGI Global, 2019. http://dx.doi.org/10.4018/978-1-5225-7663-1.ch052.

Full text
Abstract:
In the 2011 census almost one in three Irish teenagers claimed to be unable to speak Irish (Central Statistics Office, Ireland, 2012), despite the language being taught daily in school. The challenges facing the Irish language in schools are complex and multifaceted. The research reported here attempts to address some of these challenges by adopting a novel approach to teaching Irish to primary school children using an online detective game. This paper details how a group of 10-year-old children (n = 17) report their experience of the game, and how this compares to its proposed affordances for language learning. Overall, the children responded very positively, and identified significant motivational factors associated with the game, such as rewards, positive team interactions, challenge and active learning. Their feedback demonstrates that this use of gaming technology has the potential to support children's language learning through creating a language community where users are motivated to use Irish in a meaningful way.
7

Wilkins, Heidi. "Talking Back: Voice in Screwball Comedy." In Talkies, Road Movies and Chick Flicks. Edinburgh University Press, 2016. http://dx.doi.org/10.3366/edinburgh/9781474406895.003.0002.

Full text
Abstract:
Film had always been accompanied by sound in one form or another, but the ‘talkies’ introduced the prospect of a wider variety of film genres within mainstream narrative cinema that had not been possible during the silent era: genres that were reliant on language and verbalisation rather than mime and gesture. This development marked a change in film performance and acting style. As noted by Robert B. Ray: ‘Sound and the new indigenous acting style encouraged the flourishing of genres that silence and grandiloquent acting had previously hindered: the musical, the gangster film, the detective story, screwball comedy and humour that depended on language rather than slapstick.’ Although silent slapstick comedy remained in Hollywood, championed by the Marx Brothers, among others, the ‘talkies’ created great demand for a new generation of actors, those who could speak; it also generated a near-panic when these proved to be not that easily obtainable. Writers and directors of screwball comedy seized this opportunity, recognising that the comedy genre needed to incorporate the possibilities offered by synchronised sound.
8

Cripps, Yvonne. "The Public Interest Disclosure Act 1998." In Freedom of Expression and Freedom of Information, 275–87. Oxford: Oxford University Press, 2000. http://dx.doi.org/10.1093/oso/9780198268390.003.0018.

Full text
Abstract:
Disclosures made by public and private sector workers acting in the public interest may be crucially important in protecting the public from a range of hazards including injury, risk to life, fraud, financial malpractice and environmental damage. Those who make such disclosures deserve legal protection. Even jurisdictions in which there is wide-ranging freedom of information legislation will benefit from occasional disclosures by public-spirited individuals who focus initial attention on matters of which the public ought to be aware. Damaging or potentially damaging activities that would otherwise escape detection may, at a relatively early stage, come to the notice of workers who may or may not have the courage to speak out. Notorious examples of such situations include the Piper Alpha disaster, the Clapham Junction train collision, the foundering of the Herald of Free Enterprise, the arms to Iraq affair and the collapse of the Bank of Credit and Commerce International (BCCI).
9

Skulacheva, Tatyana, Natalia Slioussar, Alexander Kostyuk, Anna Lipina, Emil Latypov, and Varvara Koroleva. "The Influence of Verse on Cognitive Processes: A Psycholinguistic Experiment." In Tackling the Toolkit: Plotting Poetry through Computational Literary Studies, 155–66. Institute of Czech Literature of the Czech Academy of Sciences, 2022. http://dx.doi.org/10.51305/icl.cz.9788076580336.10.

Full text
Abstract:
Modern psycho- and neurolinguistics use standards of precision typical of the natural sciences. As verse scholarship also bases its standards on those of the natural sciences, it can be combined fruitfully with the natural sciences, including neuroscience. This may ultimately allow us to answer the fundamental question of how verse and prose are processed in the brain. In this paper, we present the preliminary results of our project that aims to uncover how verse's effects on cognitive processes compare to those of prose. We conducted three experiments with 110 informants who were native speakers of Russian between 18 and 55 years old. These experiments had the same design but involved different stimulus texts and groups of informants (40+40+30). Informants are known to slow down their reading considerably if they detect a textual error. Our aim was to compare the reading times for different verse and prose fragments when they contained errors and when they were error-free. We found that errors in verse remain undetected while the same errors are easily perceived in a corresponding prose text. The observation of this phenomenon in all three experiments is important proof of its validity. We suggest that prose and verse differently activate two ways of processing information in the brain: the first way is logical and relies on critical thinking including error detection, while the second is associative and depends on mental imagery rather than sequential logic.

Conference papers on the topic "Active speaker detection"

1

Roth, Joseph, Sourish Chaudhuri, Ondrej Klejch, Radhika Marvin, Andrew Gallagher, Liat Kaver, Sharadh Ramaswamy, et al. "Ava Active Speaker: An Audio-Visual Dataset for Active Speaker Detection." In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020. http://dx.doi.org/10.1109/icassp40776.2020.9053900.

Full text
2

Stefanov, Kalin, Jonas Beskow, and Giampiero Salvi. "Vision-based Active Speaker Detection in Multiparty Interaction." In GLU 2017 International Workshop on Grounding Language Understanding. ISCA: ISCA, 2017. http://dx.doi.org/10.21437/glu.2017-10.

Full text
3

Huang, Chong, and Kazuhito Koishida. "Improved Active Speaker Detection based on Optical Flow." In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2020. http://dx.doi.org/10.1109/cvprw50498.2020.00483.

Full text
4

Chakravarty, Punarjay, Jeroen Zegers, Tinne Tuytelaars, and Hugo Van hamme. "Active speaker detection with audio-visual co-training." In ICMI '16: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2993148.2993172.

Full text
5

Kheradiya, Jatin, Sandeep Reddy C, and Rajesh Hegde. "Active Speaker Detection using audio-visual sensor array." In 2014 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). IEEE, 2014. http://dx.doi.org/10.1109/isspit.2014.7300636.

Full text
6

Wuerkaixi, Abudukelimu, You Zhang, Zhiyao Duan, and Changshui Zhang. "Rethinking Audio-Visual Synchronization for Active Speaker Detection." In 2022 IEEE 32nd International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2022. http://dx.doi.org/10.1109/mlsp55214.2022.9943352.

Full text
7

Jiang, Yidi, Ruijie Tao, Zexu Pan, and Haizhou Li. "Target Active Speaker Detection with Audio-visual Cues." In INTERSPEECH 2023. ISCA: ISCA, 2023. http://dx.doi.org/10.21437/interspeech.2023-574.

Full text
8

Liao, Junhua, Haihan Duan, Kanghui Feng, Wanbing Zhao, Yanbing Yang, and Liangyin Chen. "A Light Weight Model for Active Speaker Detection." In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023. http://dx.doi.org/10.1109/cvpr52729.2023.02196.

Full text
9

Alcazar, Juan Leon, Fabian Caba Heilbron, Ali K. Thabet, and Bernard Ghanem. "MAAS: Multi-modal Assignation for Active Speaker Detection." In 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2021. http://dx.doi.org/10.1109/iccv48922.2021.00033.

Full text
10

Madrigal, Francisco, Frederic Lerasle, Lionel Pibre, and Isabelle Ferrane. "Audio-Video detection of the active speaker in meetings." In 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021. http://dx.doi.org/10.1109/icpr48806.2021.9412681.

Full text

Reports on the topic "Active speaker detection"

1

Mizrach, Amos, Michal Mazor, Amots Hetzroni, Joseph Grinshpun, Richard Mankin, Dennis Shuman, Nancy Epsky, and Robert Heath. Male Song as a Tool for Trapping Female Medflies. United States Department of Agriculture, December 2002. http://dx.doi.org/10.32747/2002.7586535.bard.

Full text
Abstract:
This interdisciplinary work combines expertise in engineering and entomology in Israel and the US to develop an acoustic trap for mate-seeking female medflies. Medflies are among the world's most economically harmful pests, and monitoring and control efforts cost about $800 million each year in Israel and the US. Efficient traps are vitally important tools for medfly quarantine and pest management activities; they are needed for early detection, for predicting dispersal patterns and for estimating medfly abundance within infested regions. Early detection facilitates rapid response to invasions, in order to contain them. Prediction of dispersal patterns facilitates preemptive action, and estimates of the pests' abundance lead to quantification of medfly infestations and control efforts. Although olfactory attractants and traps exist for capturing male and mated female medflies, there are still no satisfactorily efficient means to attract and trap virgin and remating females (a significant and dangerous segment of the population). We proposed to explore the largely ignored mechanism of female attraction to male song that the flies use in courtship. The potential of such an approach is indicated by studies under this project. Our research involved the identification, isolation, and augmentation of the most attractive components of male medfly songs and the use of these components in the design and testing of traps incorporating acoustic lures. The project combined expertise in acoustic engineering and instrumentation, fruit fly behavior, and integrated pest management. The BARD support was provided for 1 year to enable proof-of-concept studies, aimed to determine: 1) whether mate-seeking female medflies are attracted to male songs; and 2) over what distance such attraction works. Male medfly calling song was recorded during courtship. Multiple acoustic components of male song were examined and tested for synergism with substrate vibrations produced by various surfaces, plates and loudspeakers, with natural and artificial sound playbacks. A speaker-funnel system was developed that focused the playback signal to reproduce as closely as possible the near-field spatial characteristics of the sounds produced by individual males. In initial studies, the system was tested by observing the behavior of females while the speaker system played songs at various intensities. Through morning and early afternoon periods of peak sexual activity, virgin female medflies landed on a sheet of filter paper at the funnel outlet and stayed longer during broadcasting than during the silent part of the cycle. In later studies, females were captured on sticky paper at the funnel outlet. The mean capture rates were 67 and 44%, respectively, during sound emission and silent control periods. The findings confirmed that female trapping was improved if a male calling song was played. The second stage of the research focused on estimating the trapping range. Initial results indicated that the range possibly extended to 70 cm, but additional verification tests remain to be conducted. Further studies are also planned to consider the effects of combining acoustic and pheromonal cues.
