Academic literature on the topic 'Audio speaker'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Audio speaker.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Audio speaker"

1. Burton, Paul. "Audio speaker." Journal of the Acoustical Society of America 89, no. 1 (January 1991): 495. http://dx.doi.org/10.1121/1.400405.

2. Tsuda, Shiro. "Audio speaker and method for assembling an audio speaker." Journal of the Acoustical Society of America 118, no. 2 (2005): 589. http://dx.doi.org/10.1121/1.2040247.

3. Tsuda, Shiro. "Audio speaker and method for assembling an audio speaker." Journal of the Acoustical Society of America 123, no. 2 (2008): 586. http://dx.doi.org/10.1121/1.2857671.

4. Page, Steven L. "Audio speaker system." Journal of the Acoustical Society of America 99, no. 3 (1996): 1277. http://dx.doi.org/10.1121/1.414786.

5. Yagisawa, Toshihiro. "Audio mirror speaker." Journal of the Acoustical Society of America 100, no. 1 (1996): 23. http://dx.doi.org/10.1121/1.415929.

6. Kery, Ervin, and Steve A. Alverson. "Audio speaker system." Journal of the Acoustical Society of America 91, no. 3 (March 1992): 1794. http://dx.doi.org/10.1121/1.403719.

7. Minnerath, Donald L., and Robert J. Minnerath. "Audio speaker apparatus." Journal of the Acoustical Society of America 87, no. 2 (February 1990): 931. http://dx.doi.org/10.1121/1.398815.

8. Babel, Molly. "Adaptation to Social-Linguistic Associations in Audio-Visual Speech." Brain Sciences 12, no. 7 (June 28, 2022): 845. http://dx.doi.org/10.3390/brainsci12070845.

Abstract:
Listeners entertain hypotheses about how social characteristics affect a speaker’s pronunciation. While some of these hypotheses may be representative of a demographic, thus facilitating spoken language processing, others may be erroneous stereotypes that impede comprehension. As a case in point, listeners’ stereotypes of language and ethnicity pairings in varieties of North American English can improve intelligibility and comprehension, or hinder these processes. Using audio-visual speech, this study examines how listeners adapt to speech in noise from four speakers who are representative of selected accent-ethnicity associations in the local speech community: an Asian English-L1 speaker, a white English-L1 speaker, an Asian English-L2 speaker, and a white English-L2 speaker. The results suggest congruent accent-ethnicity associations facilitate adaptation, and that the mainstream local accent is associated with a more diverse speech community.

9. Ballesteros-Larrota, Dora Maria, Diego Renza-Torres, and Steven Andrés Camacho-Vargas. "Blind speaker identification for audio forensic purposes." DYNA 84, no. 201 (June 12, 2017): 259. http://dx.doi.org/10.15446/dyna.v84n201.60407.

Abstract:
This article presents a blind speaker identification method for forensic audio purposes. It is based on a decision system that works with fuzzy rules and on the correlation between the cochleagrams of the questioned recording and those of the suspects' recordings. Our system can return a null output, a single suspect, or a group of suspects. According to the tests performed, the overall accuracy (OA) of the system is 0.97, with an agreement value (kappa index) of 0.75. Additionally, unlike classical systems in which a low false-positive (FP) value implies a high false-negative (FN) value, our system can work with FP and FN values equal to zero simultaneously. Finally, our system performs blind identification, i.e., no training phase or prior knowledge of the recordings is necessary, an important feature for forensic audio.

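To make the matching idea concrete, the sketch below shows one way correlation between cochleagrams can drive a null / single-suspect / group decision. It is a rough illustration only: the cochleagrams are assumed to be precomputed, and the threshold, function names, and decision rule are our stand-ins, not the paper's fuzzy-rule system.

```python
# Rough illustration of correlation-based speaker matching over cochleagrams.
# Cochleagrams are assumed precomputed, equally shaped 2-D arrays; the
# threshold and decision rule are illustrative, not the paper's fuzzy rules.
import numpy as np

def corr_score(a, b):
    """Pearson correlation between two equally shaped time-frequency maps."""
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def identify(questioned, suspects, threshold=0.75):
    """Return None (null output) or the list of matching suspect names."""
    scores = {name: corr_score(questioned, coch) for name, coch in suspects.items()}
    hits = [name for name, s in scores.items() if s >= threshold]
    return hits or None   # one name = single suspect, several = group
```
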
10. Hillerin, Marie Georgescu de. "Speaker Protocol." Consumer Electronics Test & Development 2021, no. 2 (January 2022): 56. http://dx.doi.org/10.12968/s2754-7744(23)70084-5.

Abstract:
DXOMARK does not take audio lightly: the French quality evaluation expert built its own anechoic chamber, commissioned professional musicians, and even bought an apartment to use exclusively for its audio tests. But what exactly goes on behind the soundproof doors?

Dissertations / Theses on the topic "Audio speaker"

1. Khan, Faheem. "Audio-visual speaker separation." Thesis, University of East Anglia, 2016. https://ueaeprints.uea.ac.uk/59679/.

Abstract:
Communication using speech is often an audio-visual experience. Listeners hear what is being uttered by speakers and also see the corresponding facial movements and other gestures. This thesis is an attempt to exploit this bimodal (audio-visual) nature of speech for speaker separation. In addition to the audio speech features, visual speech features are used to achieve the task of speaker separation. An analysis of the correlation between audio and visual speech features is carried out first. This correlation between audio and visual features is then used in the estimation of clean audio features from visual features using Gaussian Mixture Models (GMMs) and Maximum a Posteriori (MAP) estimation. For speaker separation, three methods are proposed that use the estimated clean audio features. Firstly, the estimated clean audio features are used to construct a Wiener filter to separate the mixed speech at various signal-to-noise ratios (SNRs) into target and competing speakers. The Wiener filter gains are modified in several ways in search of improvements in quality and intelligibility of the extracted speech. Secondly, the estimated clean audio features are used in developing a visually-derived binary masking method for speaker separation. The estimated audio features are used to compute time-frequency binary masks that identify the regions where the target speaker dominates. These regions are retained and form the estimate of the target speaker's speech. Experimental results compare the visually-derived binary masks with ideal binary masks, which shows a useful level of accuracy. The effectiveness of the visually-derived binary mask for speaker separation is then evaluated through estimates of speech quality and speech intelligibility and shows substantial gains over the original mixture. Thirdly, the estimated clean audio features and the visually-derived Wiener filtering are used to modify the operation of an effective audio-only method of speaker separation, namely the soft mask method, to allow visual speech information to improve the separation task. Experimental results are presented that compare the proposed audio-visual speaker separation with the audio-only method using both speech quality and intelligibility metrics. Finally, a detailed comparison is made of the proposed and existing methods of speaker separation using objective and subjective measures.

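As a concrete illustration of the binary-masking step this abstract describes, here is a minimal sketch. It assumes the target and competing spectra have already been estimated (in the thesis those estimates come from visual features via GMM/MAP); the function name and parameters are ours, not the thesis's pipeline.

```python
# Minimal sketch of time-frequency binary masking for speaker separation.
# Both "estimates" are stand-ins supplied by the caller; in the thesis they
# are derived from visual speech features.
import numpy as np
from scipy.signal import stft, istft

def binary_mask_separation(mixture, target_est, masker_est, fs=16000, nperseg=512):
    """Recover the target speech from `mixture` with a binary mask.

    mixture, target_est, masker_est: 1-D time-domain signals of equal length.
    """
    _, _, X = stft(mixture, fs=fs, nperseg=nperseg)     # mixed speech
    _, _, T = stft(target_est, fs=fs, nperseg=nperseg)  # target estimate
    _, _, M = stft(masker_est, fs=fs, nperseg=nperseg)  # competing estimate

    # Keep only the time-frequency cells where the target dominates.
    mask = (np.abs(T) > np.abs(M)).astype(float)
    _, separated = istft(X * mask, fs=fs, nperseg=nperseg)
    return separated
```
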
2. Kwon, Patrick (Patrick Ryan). "Speaker spotting: automatic annotation of audio data with speaker identity." Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/47608.

3. Seymour, R. "Audio-visual speech and speaker recognition." Thesis, Queen's University Belfast, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.492489.

Abstract:
In this thesis, a number of important issues relating to the use of both audio and video information for speech and speaker recognition are investigated. A comprehensive comparison of different visual feature types is given, including both geometric and image transformation based features. A new geometric based method for feature extraction is described, as well as the novel use of curvelet based features. Different methods for constructing the feature vectors are compared, as well as feature vector sizes and the use of dynamic features. Each feature type is tested against three types of visual noise: compression, blurring and jitter. A novel method of integrating the audio and video information streams called the maximum stream posterior (MSP) is described. This method is tested in both speaker dependent and speaker independent audio-visual speech recognition (AVSR) systems, and is shown to be robust to noise in either the audio or video streams, given no prior knowledge of the noise. This method is then extended to form the maximum weighted stream posterior (MWSP) method. Finally, both the MSP and MWSP are tested in an audio-visual speaker recognition system (AVSpR). Experiments using the XM2VTS database show that both of these methods can outperform standard methods in terms of recognition accuracy in situations where either stream is corrupted.

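For context, the sketch below shows the plain weighted-stream fusion that stream-integration methods such as MSP and MWSP build on. It is the generic baseline under our own naming, not the thesis's formulation; the weight `lam` and function names are assumptions.

```python
# Generic weighted stream fusion for audio-visual recognition: combine the
# per-class log-likelihoods of an audio and a video classifier with a
# reliability weight lam in [0, 1]. A baseline sketch, not the MSP/MWSP.
import numpy as np

def fuse_streams(audio_loglik, video_loglik, lam=0.5):
    """audio_loglik, video_loglik: arrays of per-class log-likelihoods."""
    return lam * np.asarray(audio_loglik) + (1.0 - lam) * np.asarray(video_loglik)

def classify(audio_loglik, video_loglik, lam=0.5):
    """Index of the class with the highest fused score."""
    return int(np.argmax(fuse_streams(audio_loglik, video_loglik, lam)))
```
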
4. Malegaonkar, Amit. "Speaker-based indexation of conversational audio." Thesis, University of Hertfordshire, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.440175.

5. D'Arca, Eleonora. "Speaker tracking in a joint audio-video network." Thesis, Heriot-Watt University, 2015. http://hdl.handle.net/10399/2972.

Abstract:
Situational awareness is achieved naturally by the human senses of sight and hearing in combination. System-level automatic scene understanding aims at replicating this human ability using cooperative microphones and cameras. In this thesis, we integrate and fuse audio and video signals at different levels of abstraction to detect and track a speaker in a scenario where people are free to move indoors. Despite the low complexity of the system, which consists of just four microphone pairs and one camera, results show that the overall multimodal tracker is more reliable than single-modality systems, tolerating large occlusions and cross-talk. The system evaluation is performed on both single-modality and multimodality tracking. The performance improvement given by the audio-video integration and fusion is quantified in terms of tracking precision and accuracy as well as speaker diarisation error rate and precision-recall recognition metrics. We evaluate our results against the closest works: a 56% improvement in the computational cost of audio-only sound source localisation and an 18% improvement in speaker diarisation error rate over a speaker-only unit are achieved.

6. Lathe, Andrew. "Speaker Prototyping Design." Digital Commons @ East Tennessee State University, 2020. https://dc.etsu.edu/honors/584.

Abstract:
Audio design is a pertinent industry in today's world, with an extremely large market including leaders such as Bose, Harman International, and Sennheiser. This project is designed to explore the processes that are necessary to create a new type of product in this market. The end goal is to have a functioning, high-quality set of speakers to prove various concepts of design and prototyping. The steps involved in this project go through the entire design process from initial choice of product to a finished prototype. Processes include the selection of outsourced components such as drivers and necessary connectors. The design stage will include any design processes necessary to create the enclosure or any electronics. Production will be controlled by shipping dates and any potential issues that lie within the methods chosen for production. The final product will be tested for response. The prototyping process is usually fulfilled by various departments with deep expertise in the respective field.

7. Martí Guerola, Amparo. "Multichannel audio processing for speaker localization, separation and enhancement." Doctoral thesis, Universitat Politècnica de València, 2013. http://hdl.handle.net/10251/33101. https://doi.org/10.4995/Thesis/10251/33101.

Abstract:
This thesis is related to the field of acoustic signal processing and its applications to emerging communication environments. Acoustic signal processing is a very wide research area covering the design of signal processing algorithms involving one or several acoustic signals to perform a given task, such as locating the sound source that originated the acquired signals, improving their signal-to-noise ratio, separating signals of interest from a set of interfering sources or recognizing the type of source and the content of the message. Among the above tasks, Sound Source Localization (SSL) and Automatic Speech Recognition (ASR) have been specially addressed in this thesis. In fact, the localization of sound sources in a room has received a lot of attention in the last decades. Most real-world microphone array applications require the localization of one or more active sound sources in adverse environments (low signal-to-noise ratio and high reverberation). Some of these applications are teleconferencing systems, video-gaming, autonomous robots, remote surveillance, hands-free speech acquisition, etc. Indeed, performing robust sound source localization under high noise and reverberation is a very challenging task. One of the most well-known algorithms for source localization in noisy and reverberant environments is the Steered Response Power - Phase Transform (SRP-PHAT) algorithm, which constitutes the baseline framework for the contributions proposed in this thesis. Another challenge in the design of SSL algorithms is to achieve real-time performance and high localization accuracy with a reasonable number of microphones and limited computational resources. Although the SRP-PHAT algorithm has been shown to be an effective localization algorithm for real-world environments, its practical implementation is usually based on a costly fine grid-search procedure, making the computational cost of the method a real issue. In this context, several modifications and optimizations have been proposed to improve its performance and applicability. An effective strategy that extends the conventional SRP-PHAT functional is presented in this thesis. This approach performs a full exploration of the sampled space rather than computing the SRP at discrete spatial positions, increasing its robustness and allowing for a coarser spatial grid that reduces the computational cost required in a practical implementation with a small hardware cost (reduced number of microphones). This strategy makes it possible to implement real-time applications based on location information, such as automatic camera steering or the detection of speech/non-speech fragments in advanced videoconferencing systems. As stated before, besides the contributions related to SSL, this thesis is also related to the field of ASR. This technology allows a computer or electronic device to identify the words spoken by a person so that the message can be stored or processed in a useful way. ASR is used on a day-to-day basis in a number of applications and services such as natural human-machine interfaces, dictation systems, electronic translators and automatic information desks. However, there are still some challenges to be solved. A major problem in ASR is to recognize people speaking in a room by using distant microphones. In distant-speech recognition, the microphone does not only receive the direct path signal, but also delayed replicas as a result of multi-path propagation. Moreover, there are multiple situations in teleconferencing meetings when multiple speakers talk simultaneously. In this context, when multiple speaker signals are present, Sound Source Separation (SSS) methods can be successfully employed to improve ASR performance in multi-source scenarios. This is the motivation behind the training method for multiple-talk situations proposed in this thesis. This training, which is based on a robust transformed model constructed from separated speech in diverse acoustic environments, makes use of a SSS method as a speech enhancement stage that suppresses the unwanted interferences. The combination of source separation and this specific training has been explored and evaluated under different acoustical conditions, leading to improvements of up to 35% in ASR performance.

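The SRP-PHAT baseline named in this abstract can be sketched briefly: steer GCC-PHAT correlations over a grid of candidate positions and pick the point with the highest summed response. This is a hedged, grid-search illustration under our own naming and defaults, not the thesis's optimized variant.

```python
# Sketch of SRP-PHAT localisation: sum PHAT-weighted cross-correlations,
# steered to each candidate point, and return the best-scoring point.
import numpy as np

def gcc_phat(x1, x2, max_shift):
    """PHAT-weighted cross-correlation of x1, x2 at lags -max_shift..+max_shift."""
    n = len(x1) + len(x2)
    spec = np.fft.rfft(x1, n) * np.conj(np.fft.rfft(x2, n))
    spec /= np.abs(spec) + 1e-12                      # PHAT weighting
    cc = np.fft.irfft(spec, n)
    return np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

def srp_phat(signals, mic_pos, grid, fs, c=343.0):
    """signals: list of 1-D arrays; mic_pos: (M, 3) array; grid: (G, 3) array."""
    pairs = [(i, j) for i in range(len(signals)) for j in range(i + 1, len(signals))]
    max_shift = int(np.ceil(fs * max(
        np.linalg.norm(mic_pos[i] - mic_pos[j]) for i, j in pairs) / c))
    cc = {p: gcc_phat(signals[p[0]], signals[p[1]], max_shift) for p in pairs}
    scores = []
    for point in grid:
        dists = np.linalg.norm(mic_pos - point, axis=1)   # mic-to-point distances
        scores.append(sum(
            cc[(i, j)][max_shift + int(round((dists[i] - dists[j]) * fs / c))]
            for i, j in pairs))
    return grid[int(np.argmax(scores))]                   # most likely position
```
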
8. Lucey, Simon. "Audio-visual speech processing." Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Abstract:
Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (e.g. acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular the tasks of isolated-word speech recognition and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis, the first of which concerns the extraction of pertinent visual features for effective speech reading and visual speaker recognition. Appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in depth using hidden Markov model (HMM) classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so their combined performance is greater than any of those classifiers individually. The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need for combining independent classifiers.

9. Abdelraheem, Mahmoud Fakhry Mahmoud. "Exploiting spatial and spectral information for audio source separation and speaker diarization." Doctoral thesis, University of Trento, 2016. http://eprints-phd.biblio.unitn.it/1876/1/PhD_Thesis.pdf.

Abstract:
The goal of multichannel audio source separation is to produce high quality separated audio signals, observing mixtures of these signals. The difficulty of tackling the problem comes from not only the source propagation through noisy and echoing environments, but also overlapped source signals. Among the different research directions pursued around this problem, the adoption of probabilistic and advanced modeling aims at exploiting the diversity of multichannel propagation, and the redundancy of source signals. Moreover, prior information about the environments or the signals is helpful to improve the quality and to accelerate the separation. In this thesis, we propose methods to increase the effectiveness of model-based audio source separation methods by exploiting prior information, applying spectral and sparse modeling theories. The work is divided into two main parts. In the first part, spectral modeling based on Nonnegative Matrix Factorization is adopted to represent the source signals. The parameters of Gaussian model-based source separation are estimated in the sense of Maximum Likelihood using a Generalized Expectation-Maximization algorithm by applying supervised Nonnegative Matrix and Tensor Factorization, given spectral descriptions of the source signals. Three modalities of making the descriptions available are addressed, i.e. the descriptions are on-line trained during the separation, pre-trained and made directly available, or pre-trained and made indirectly available. In the latter, a detection method is proposed in order to identify the descriptions best representing the signals in the mixtures. In the second part, sparse modeling is adopted to represent the propagation environments. Spatial descriptions of the environments, either deterministic or probabilistic, are pre-trained and made indirectly available. A detection method is proposed in order to identify the deterministic descriptions best representing the environments. The detected descriptions are then used to perform source separation by minimizing a non-convex ℓ0-norm function. For speaker diarization, where the task is to determine "who spoke when" in real meetings, a Watson mixture model is optimized using an Expectation-Maximization algorithm in order to detect the probabilistic descriptions best representing the environments, and to estimate the temporal activity of each source. The performance of the proposed methods is experimentally evaluated using different datasets, both simulated and live-recorded. The results show the superiority of the proposed methods over recently developed methods used as baselines.

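The supervised-NMF idea in the first part of this abstract can be sketched briefly: with pre-trained spectral dictionaries held fixed, only the activations are estimated from the mixture, and each source is reconstructed with a soft mask. Everything below, including the Euclidean-cost updates and function names, is our simplified stand-in for the thesis's Gaussian-model framework.

```python
# Sketch of supervised NMF separation: fixed pre-trained dictionaries W per
# source, activations H learned from the mixture spectrogram.
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-10):
    """Multiplicative updates for H in V ~ W @ H with fixed dictionary W
    (Euclidean/Frobenius cost)."""
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def separate(V_mix, W_sources):
    """Wiener-style reconstruction of each source's magnitude spectrogram.

    V_mix: (freq, time) mixture magnitudes; W_sources: list of dictionaries."""
    W = np.hstack(W_sources)                    # concatenate dictionaries
    H = nmf_activations(V_mix, W)
    parts, k = [], 0
    for Ws in W_sources:
        parts.append(Ws @ H[k:k + Ws.shape[1]])
        k += Ws.shape[1]
    total = sum(parts) + 1e-10
    return [V_mix * p / total for p in parts]   # soft-mask each source
```
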
10. Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/17689/3/David_Dean_Thesis.pdf.

Abstract:
Both human perceptual studies and automatic machine-based experiments have shown that visual information from a speaker's mouth region can improve the robustness of automatic speech processing tasks, especially in the presence of acoustic noise. By taking advantage of the complementary nature of the acoustic and visual speech information, audio-visual speech processing (AVSP) applications can work reliably in more real-world situations than would be possible with traditional acoustic speech processing applications. The two most prominent applications of AVSP for viable human-computer interfaces involve the recognition of the speech events themselves, and the recognition of speakers' identities based upon their speech. However, while these two fields of speech and speaker recognition are closely related, there has been little systematic comparison of the two tasks under similar conditions in the existing literature. Accordingly, the primary focus of this thesis is to compare the suitability of general AVSP techniques for speech or speaker recognition, with a particular focus on synchronous hidden Markov models (SHMMs). The cascading appearance-based approach to visual speech feature extraction has been shown to work well in removing irrelevant static information from the lip region to greatly improve visual speech recognition performance. This thesis demonstrates that these dynamic visual speech features also provide for an improvement in speaker recognition, showing that speakers can be visually recognised by how they speak, in addition to their appearance alone. This thesis investigates a number of novel techniques for training and decoding of SHMMs that improve the audio-visual speech modelling ability of the SHMM approach over the existing state-of-the-art joint-training technique. Novel experiments are conducted to demonstrate that the reliability of the two streams during training is of little importance to the final performance of the SHMM. Additionally, two novel techniques of normalising the acoustic and visual state classifiers within the SHMM structure are demonstrated for AVSP. Fused hidden Markov model (FHMM) adaptation is introduced as a novel method of adapting SHMMs from existing well-performing acoustic hidden Markov models (HMMs). This technique is demonstrated to provide improved audio-visual modelling over the jointly-trained SHMM approach at all levels of acoustic noise for the recognition of audio-visual speech events. However, the close coupling of the SHMM approach will be shown to be less useful for speaker recognition, where a late integration approach is demonstrated to be superior.

Books on the topic "Audio speaker"

1. Consultants, PEER, Center for Environmental Information (U.S.), and Risk Reduction Engineering Laboratory (U.S.), eds. Physical/chemical treatment of hazardous waste sites: Speaker slide copies and supporting information. Cincinnati, OH: U.S. Environmental Protection Agency, Risk Reduction Engineering Laboratory, 1990.

2. Engineers, Society of Automotive, Audio Systems Conference (1986: Detroit, Mich.), and SAE International Congress & Exposition (1986: Detroit, Mich.), eds. Audio systems: Speakers, receivers, non-audio electronics. Warrendale, PA: Society of Automotive Engineers, 1986.

3. Beginner's Chinese with audio CDs. New York: Hippocrene Books, 1997.

4. Beginning Portuguese w/ two audio CD's. New York: McGraw-Hill, 2011.

5. Tyson-Ward, Sue. Beginning Portuguese w/ two audio CD's. New York: McGraw-Hill, 2011.

6. Beginner's Norwegian with 2 audio CDs. New York: Hippocrene Books, 2005.

7. Robert, Niebuhr, ed. Beginner's Croatian with 2 audio CDs. New York: Hippocrene Books, 2009.

8. Speak out!: Creating podcasts and other audio recordings. Ann Arbor, Michigan: Cherry Lake Publishing, 2013.

9. Atkinson, Jane. The Wealthy Speaker Audio Book. Speaker Launcher, 2006.

10. Dufseth, Rhonda. Bluetooth Dual Mode Speaker Audio Application. Microchip Technology Incorporated, 2020.

Book chapters on the topic "Audio speaker"

1. Julia, Luc E., Larry P. Heck, and Adam J. Cheyer. "A speaker identification agent." In Audio- and Video-based Biometric Person Authentication, 261–66. Berlin, Heidelberg: Springer Berlin Heidelberg, 1997. http://dx.doi.org/10.1007/bfb0016003.

2. Jourlin, Pierre, Juergen Luettin, Dominique Genoud, and Hubert Wassner. "Acoustic-labial speaker verification." In Audio- and Video-based Biometric Person Authentication, 319–26. Berlin, Heidelberg: Springer Berlin Heidelberg, 1997. http://dx.doi.org/10.1007/bfb0016011.

3. Furui, Sadaoki. "Recent advances in speaker recognition." In Audio- and Video-based Biometric Person Authentication, 235–52. Berlin, Heidelberg: Springer Berlin Heidelberg, 1997. http://dx.doi.org/10.1007/bfb0016001.

4. Campr, Pavel, Marie Kunešová, Jan Vaněk, Jan Čech, and Josef Psutka. "Audio-Video Speaker Diarization for Unsupervised Speaker and Face Model Creation." In Text, Speech and Dialogue, 465–72. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-10816-2_56.

5. Charlet, D., and D. Jouvet. "Optimizing feature set for speaker verification." In Audio- and Video-based Biometric Person Authentication, 203–10. Berlin, Heidelberg: Springer Berlin Heidelberg, 1997. http://dx.doi.org/10.1007/bfb0015997.

6. Almaadeed, Noor, Amar Aggoun, and Abbes Amira. "Audio-Visual Feature Fusion for Speaker Identification." In Neural Information Processing, 56–67. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-34475-6_8.

7. Khalidov, Vasil, Florence Forbes, Miles Hansard, Elise Arnaud, and Radu Horaud. "Audio-Visual Clustering for 3D Speaker Localization." In Machine Learning for Multimodal Interaction, 86–97. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-85853-9_8.

8. Nguyen, Trung Hieu, Eng Siong Chng, and Haizhou Li. "Speaker Diarization: An Emerging Research." In Speech and Audio Processing for Coding, Enhancement and Recognition, 229–77. New York, NY: Springer New York, 2014. http://dx.doi.org/10.1007/978-1-4939-1456-2_8.

9. Neumaier, Theresa. "New Englishes and Conversation Analysis." In Varieties of English Around the World, 65–83. Amsterdam: John Benjamins Publishing Company, 2023. http://dx.doi.org/10.1075/veaw.g68.04neu.

Abstract:
This study assesses the potential of using conversation analytic methodology to investigate syntactic variation in New Englishes. It analyses transcripts and audio files of face-to-face interactions between speakers of Caribbean and Southeast Asian Englishes and illustrates how syntax provides essential clues allowing interactants to project upcoming places of speaker change. Current speakers might adapt their turns underway to avoid transition to a next speaker, but speaker groups differ when it comes to which syntactic constructions they prefer in this context. As these interactional preferences seem to correlate with linguistic preferences (such as a high frequency of topicalization), the present study suggests that they constitute a case of emergent grammar, and hence should be considered a factor in investigating syntactic variation.

10. Chibelushi, Claude C. "Fuzzy Audio-Visual Feature Maps for Speaker Identification." In Applications and Science in Soft Computing, 317–22. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-45240-9_43.

Conference papers on the topic "Audio speaker"

1. Meignier, Sylvain, Jean-François Bonastre, and Ivan Magrin-Chagnolleau. "Speaker utterances tying among speaker segmented audio documents using hierarchical classification: towards speaker indexing of audio databases." In 7th International Conference on Spoken Language Processing (ICSLP 2002). ISCA, 2002. http://dx.doi.org/10.21437/icslp.2002-196.

2. Chen, Tianxiang, Avrosh Kumar, Parav Nagarsheth, Ganesh Sivaraman, and Elie Khoury. "Generalization of Audio Deepfake Detection." In Odyssey 2020: The Speaker and Language Recognition Workshop. ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-19.

3. Kandasamy, Veera Vignesh, and Anup Bera. "Improving Robustness of Age and Gender Prediction based on Custom Speech Data." In 8th International Conference on Signal, Image Processing and Embedded Systems (SIGEM 2022). Academy and Industry Research Collaboration Center (AIRCC), 2022. http://dx.doi.org/10.5121/csit.2022.122005.

Abstract:
With the increased use of human-machine interaction via voice-enabled smart devices over the years, there are growing demands for better accuracy from speech analytics systems. Several studies show that speech analytics systems exhibit bias towards speaker demographics such as age, gender, race, and accent. To avoid such bias, speaker demographic information can be used to prepare the training dataset for the speech analytics model. Speaker demographic information can also be used for targeted advertisement, recommendation, and forensic science. In this research, we demonstrate algorithms for age and gender prediction from speech data with our custom dataset, which covers speakers from around the world with varying accents. In order to extract speaker age and gender from speech data, we have also included a method for determining the appropriate length of audio to be ingested into the system, which reduces computational time. This study also identifies the most effective padding and cropping mechanism for obtaining the best results from the input audio file. We investigated the impact of various parameters on the performance and end-to-end implementation of a real-time speaker age and gender information extraction system. Our best model has an RMSE value of 4.1 for age prediction and 99.5% for gender prediction on the custom test dataset.

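The fixed-length audio preparation this abstract refers to can be sketched briefly: crop long clips and zero-pad short ones to a target duration before feature extraction. The function name, strategy names, and defaults below are our illustration, not the paper's tuned settings.

```python
# Sketch of fixed-length audio preparation: crop or zero-pad a 1-D signal
# to exactly target_s seconds before feature extraction.
import numpy as np

def fix_length(audio, fs=16000, target_s=5.0, mode="center"):
    """Return `audio` cropped/zero-padded to fs * target_s samples."""
    target = int(fs * target_s)
    n = len(audio)
    if n >= target:                       # crop (centered or from the start)
        start = (n - target) // 2 if mode == "center" else 0
        return audio[start:start + target]
    pad = target - n                      # zero-pad the remainder
    left = pad // 2 if mode == "center" else 0
    return np.pad(audio, (left, pad - left))
```
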
4. Cetnarowicz, Damian, and Adam Dabrowski. "Speaker tracking audio-video system." In 2016 Signal Processing: Algorithms, Architectures, Arrangements and Applications (SPA). IEEE, 2016. http://dx.doi.org/10.1109/spa.2016.7763618.

5. Nwe, Tin Lay, Hanwu Sun, Haizhou Li, and Susanto Rahardja. "Speaker diarization in meeting audio." In ICASSP 2009 - 2009 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2009. http://dx.doi.org/10.1109/icassp.2009.4960523.

6. Wilcox, Lynn D., Don Kimber, and Francine R. Chen. "Audio indexing using speaker identification." In SPIE's 1994 International Symposium on Optics, Imaging, and Instrumentation, edited by Richard J. Mammone and J. David Murley, Jr. SPIE, 1994. http://dx.doi.org/10.1117/12.191878.

7. Nwe, Tin Lay, Minghui Dong, Swe Zin Kalayar Khine, and Haizhou Li. "Multi-speaker meeting audio segmentation." In Interspeech 2008. ISCA, 2008. http://dx.doi.org/10.21437/interspeech.2008-625.

8. Jin, Naigao, Yi Zhang, and Fuliang Yin. "Audio-visual 3D speaker tracking." In IET International Conference on Wireless Mobile and Multimedia Networks Proceedings (ICWMMN 2006). IEE, 2006. http://dx.doi.org/10.1049/cp:20061239.

9. Remes, Ulpu, Janne Pylkkonen, and Mikko Kurimo. "Segregation of Speakers for Speaker Adaptation in TV News Audio." In 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE, 2007. http://dx.doi.org/10.1109/icassp.2007.366954.

10. Jiang, Ziyue, Yi Ren, Ming Lei, and Zhou Zhao. "FedSpeech: Federated Text-to-Speech with Continual Learning." In Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21). California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/527.

Abstract:
Federated learning enables collaborative training of machine learning models under strict privacy restrictions, and federated text-to-speech aims to synthesize natural speech for multiple users with a few audio training samples stored locally on their devices. However, federated text-to-speech faces several challenges: very few training samples are available from each speaker, training samples are all stored on each user's local device, and the global model is vulnerable to various attacks. In this paper, we propose a novel federated learning architecture based on continual learning approaches to overcome the difficulties above. Specifically, 1) we use gradual pruning masks to isolate parameters for preserving speakers' tones; 2) we apply selective masks for effectively reusing knowledge from tasks; 3) a private speaker embedding is introduced to keep users' privacy. Experiments on a reduced VCTK dataset demonstrate the effectiveness of FedSpeech: it nearly matches multi-task training in terms of multi-speaker speech quality; moreover, it sufficiently retains the speakers' tones and even outperforms multi-task training in the speaker similarity experiment.

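The parameter-isolation idea behind the gradual pruning masks mentioned in this abstract can be sketched as follows. The magnitude-based mask and the masked update rule are generic continual-learning stand-ins of our own, not the paper's exact procedure.

```python
# Sketch of parameter isolation via pruning masks: protect one speaker's
# weights, then update only the remaining free capacity for later speakers.
import numpy as np

def prune_mask(weights, keep_fraction=0.3):
    """Mask of 1s on the largest-magnitude `keep_fraction` of weights.

    Assumes 0 < keep_fraction <= 1; ties at the threshold may keep extras."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * (1.0 - keep_fraction))        # number to leave free
    threshold = np.partition(flat, k)[k]
    return (np.abs(weights) >= threshold).astype(weights.dtype)

def masked_update(weights, grad, protected_mask, lr=1e-3):
    """Gradient step that leaves protected (masked) weights untouched."""
    return weights - lr * grad * (1.0 - protected_mask)
```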

Reports on the topic "Audio speaker"

1. Issues in Data Processing and Relevant Population Selection. OSAC Speaker Recognition Subcommittee, November 2022. http://dx.doi.org/10.29325/osac.tg.0006.

Abstract:
In Forensic Automatic Speaker Recognition (FASR), forensic examiners typically compare audio recordings of a speaker whose identity is in question with recordings of known speakers to assist investigators and triers of fact in a legal proceeding. The performance of automated speaker recognition (SR) systems used for this purpose depends largely on the characteristics of the speech samples being compared. Examiners must understand the requirements of specific systems in use as well as the audio characteristics that impact system performance. Mismatch conditions between the known and questioned data samples are of particular importance, but the need for, and impact of, audio pre-processing must also be understood. The data selected for use in a relevant population can also be critical to the performance of the system. This document describes issues that arise in the processing of case data and in the selection of a relevant population for purposes of conducting an examination using a human-supervised automatic speaker recognition approach in a forensic context. The document is intended to comply with the Organization of Scientific Area Committees (OSAC) for Forensic Science Technical Guidance Document.