Journal articles on the topic 'Audio speaker'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Audio speaker.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Burton, Paul. "Audio speaker." Journal of the Acoustical Society of America 89, no. 1 (January 1991): 495. http://dx.doi.org/10.1121/1.400405.

2

Tsuda, Shiro. "Audio speaker and method for assembling an audio speaker." Journal of the Acoustical Society of America 118, no. 2 (2005): 589. http://dx.doi.org/10.1121/1.2040247.

3

Tsuda, Shiro. "Audio speaker and method for assembling an audio speaker." Journal of the Acoustical Society of America 123, no. 2 (2008): 586. http://dx.doi.org/10.1121/1.2857671.

4

Page, Steven L. "Audio speaker system." Journal of the Acoustical Society of America 99, no. 3 (1996): 1277. http://dx.doi.org/10.1121/1.414786.

5

Yagisawa, Toshihiro. "Audio mirror speaker." Journal of the Acoustical Society of America 100, no. 1 (1996): 23. http://dx.doi.org/10.1121/1.415929.

6

Kery, Ervin, and Steve A. Alverson. "Audio speaker system." Journal of the Acoustical Society of America 91, no. 3 (March 1992): 1794. http://dx.doi.org/10.1121/1.403719.

7

Minnerath, Donald L., and Robert J. Minnerath. "Audio speaker apparatus." Journal of the Acoustical Society of America 87, no. 2 (February 1990): 931. http://dx.doi.org/10.1121/1.398815.

8

Babel, Molly. "Adaptation to Social-Linguistic Associations in Audio-Visual Speech." Brain Sciences 12, no. 7 (June 28, 2022): 845. http://dx.doi.org/10.3390/brainsci12070845.

Abstract:
Listeners entertain hypotheses about how social characteristics affect a speaker’s pronunciation. While some of these hypotheses may be representative of a demographic, thus facilitating spoken language processing, others may be erroneous stereotypes that impede comprehension. As a case in point, listeners’ stereotypes of language and ethnicity pairings in varieties of North American English can improve intelligibility and comprehension, or hinder these processes. Using audio-visual speech this study examines how listeners adapt to speech in noise from four speakers who are representative of selected accent-ethnicity associations in the local speech community: an Asian English-L1 speaker, a white English-L1 speaker, an Asian English-L2 speaker, and a white English-L2 speaker. The results suggest congruent accent-ethnicity associations facilitate adaptation, and that the mainstream local accent is associated with a more diverse speech community.
9

Ballesteros-Larrota, Dora Maria, Diego Renza-Torres, and Steven Andrés Camacho-Vargas. "Blind speaker identification for audio forensic purposes." DYNA 84, no. 201 (June 12, 2017): 259. http://dx.doi.org/10.15446/dyna.v84n201.60407.

Abstract:
This article presents a blind speaker-identification method for audio forensic purposes. It is based on a decision system that works with fuzzy rules and on the correlation between the cochleagrams of the questioned audio and the audios of the suspects. Our system can return a null output, a single suspect, or a group of suspects. According to the tests performed, the overall accuracy (OA) of the system is 0.97, with an agreement value (kappa index) of 0.75. Additionally, unlike classical systems in which a low incorrect-selection (FP) value implies a high incorrect-rejection (FN) value, our system can operate with FP and FN values equal to zero simultaneously. Finally, our system performs blind identification, i.e., no training phase or prior knowledge of the audios is required, an important characteristic for forensic audio.
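As a loose illustration of the correlation step described in this abstract, the sketch below compares time-frequency representations of a questioned recording and a suspect recording. A mel spectrogram stands in for the cochleagram and a single correlation score replaces the paper's fuzzy decision rules; both are simplifying assumptions, not the authors' implementation.

```python
# Hedged sketch: correlate a "questioned" recording with a suspect's recording
# using a mel spectrogram as a stand-in for a cochleagram.
import librosa
import numpy as np

def mel_gram(path):
    y, sr = librosa.load(path, sr=16000, duration=10.0)
    m = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    return librosa.power_to_db(m)

def similarity(gram_a, gram_b):
    # Truncate to a common length, then compute a plain Pearson correlation.
    n = min(gram_a.shape[1], gram_b.shape[1])
    a, b = gram_a[:, :n].ravel(), gram_b[:, :n].ravel()
    return np.corrcoef(a, b)[0, 1]
```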
10

Hillerin, Marie Georgescu de. "Speaker Protocol." Consumer Electronics Test & Development 2021, no. 2 (January 2022): 56. http://dx.doi.org/10.12968/s2754-7744(23)70084-5.

Abstract:
DXOMARK does not take audio lightly: the French quality evaluation expert built its own anechoic chamber, commissioned professional musicians, and even bought an apartment to use exclusively for its audio tests. But what exactly goes on behind the soundproof doors?
11

Yaacoub Sahyoun, Joseph. "Low profile audio speaker." Journal of the Acoustical Society of America 118, no. 4 (2005): 2102. http://dx.doi.org/10.1121/1.2125192.

12

Noro, Masao. "Speaker system, audio amplifier, and audio system." Journal of the Acoustical Society of America 131, no. 5 (2012): 4220. http://dx.doi.org/10.1121/1.4712230.

13

Khoma, Volodymyr, Yuriy Khoma, Vitalii Brydinskyi, and Alexander Konovalov. "Development of Supervised Speaker Diarization System Based on the PyAnnote Audio Processing Library." Sensors 23, no. 4 (February 13, 2023): 2082. http://dx.doi.org/10.3390/s23042082.

Abstract:
Diarization is an important task in audio data processing, as it addresses the need to divide one analyzed call recording into several speech recordings, each of which belongs to one speaker. Diarization systems segment audio recordings by defining the time boundaries of utterances, and typically use unsupervised methods to group utterances belonging to individual speakers, but do not answer the question “who is speaking?” On the other hand, there are biometric systems that identify individuals on the basis of their voices, but such systems are designed with the prerequisite that only one speaker is present in the analyzed audio recording. However, some applications involve the need to identify multiple speakers that interact freely in an audio recording. This paper proposes two architectures of speaker identification systems based on a combination of diarization and identification methods, which operate on the basis of segment-level or group-level classification. The open-source PyAnnote framework was used to develop the system. The performance of the speaker identification system was verified through the application of the AMI Corpus open-source audio database, which contains 100 h of annotated and transcribed audio and video data. The research method consisted of four experiments to select the best-performing supervised diarization algorithms on the basis of PyAnnote. The first experiment was designed to investigate how the selection of the distance function between vector embeddings affects the reliability of identification of a speaker’s utterance in a segment-level classification architecture. The second experiment examines the architecture of cluster-centroid (group-level) classification, i.e., the selection of the best clustering and classification methods. The third experiment investigates the impact of different segmentation algorithms on the accuracy of identifying speaker utterances, and the fourth examines embedding window sizes. Experimental results demonstrated that the group-level approach offered better identification results than the segment-level approach, while the latter had the advantage of real-time processing.
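For readers who want to try a PyAnnote-based setup like the one this paper builds on, the snippet below is a minimal sketch using the open-source pyannote.audio library; the pretrained pipeline name, file path, and any access-token handling are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of running a pretrained PyAnnote diarization pipeline
# (pipeline name and file path are illustrative assumptions).
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")

# Diarize one recording, e.g. a meeting from the AMI Corpus.
diarization = pipeline("meeting.wav")

# Each track is a speaker turn with start/end times and a cluster label.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```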
14

Weychan, Radoslaw, Tomasz Marciniak, Agnieszka Stankiewicz, and Adam Dabrowski. "Real Time Recognition Of Speakers From Internet Audio Stream." Foundations of Computing and Decision Sciences 40, no. 3 (September 1, 2015): 223–33. http://dx.doi.org/10.1515/fcds-2015-0014.

Abstract:
In this paper we present an automatic speaker recognition technique that uses lossy (encoded) speech signal streams from Internet radio. We show the influence of the audio encoder (e.g., bitrate) on the speaker model quality. The model of each speaker was calculated with the use of the Gaussian mixture model (GMM) approach. Both the speaker recognition and the further analysis were realized with the use of short utterances to facilitate real-time processing. The neighborhoods of the speaker models were analyzed with the use of the ISOMAP algorithm. The experiments were based on four 1-hour public debates with 7–8 speakers (including the moderator), acquired from Polish Internet radio services. The presented software was developed in the MATLAB environment.
15

Hustad, Katherine C., and Meghan A. Cahill. "Effects of Presentation Mode and Repeated Familiarization on Intelligibility of Dysarthric Speech." American Journal of Speech-Language Pathology 12, no. 2 (May 2003): 198–208. http://dx.doi.org/10.1044/1058-0360(2003/066).

Abstract:
Clinical measures of speech intelligibility are widely used as one means of characterizing the speech of individuals with dysarthria. Many variables associated with both the speaker and the listener contribute to what is actually measured as intelligibility. The present study explored the effects of presentation modality (audiovisual vs. audio-only information) and the effects of speaker-specific familiarization across 4 trials on the intelligibility of speakers with mild and severe dysarthria associated with cerebral palsy. Results revealed that audiovisual information did not enhance intelligibility relative to audio-only information for 4 of the 5 speakers studied. The one speaker whose intelligibility increased when audiovisual information was presented had the most severe dysarthria and concomitant motor impairments. Results for speaker-specific repeated familiarization were relatively homogeneous across speakers, demonstrating significant intelligibility score improvements across 4 trials and, in particular, a significant improvement in intelligibility between the 1st and 4th trials.
16

Vryzas, Nikolaos, Nikolaos Tsipas, and Charalampos Dimoulas. "Web Radio Automation for Audio Stream Management in the Era of Big Data." Information 11, no. 4 (April 11, 2020): 205. http://dx.doi.org/10.3390/info11040205.

Abstract:
Radio is evolving in a changing digital media ecosystem. Audio-on-demand has shaped the landscape of big unstructured audio data available online. In this paper, a framework for knowledge extraction is introduced, to improve discoverability and enrichment of the provided content. A web application for live radio production and streaming is developed. The application offers typical live mixing and broadcasting functionality, while performing real-time annotation as a background process by logging user operation events. For the needs of a typical radio station, a supervised speaker classification model is trained for the recognition of 24 known speakers. The model is based on a convolutional neural network (CNN) architecture. Since not all speakers are known in radio shows, a CNN-based speaker diarization method is also proposed. The trained model is used for the extraction of fixed-size identity d-vectors. Several clustering algorithms are evaluated, having the d-vectors as input. The supervised speaker recognition model for 24 speakers scores an accuracy of 88.34%, while unsupervised speaker diarization scores a maximum accuracy of 87.22%, as tested on an audio file with speech segments from three unknown speakers. The results are considered encouraging regarding the applicability of the proposed methodology.
17

Rottenberg, William B., and Robert S. Robinson. "Audio speaker with radial electromagnet." Journal of the Acoustical Society of America 132, no. 3 (2012): 1867. http://dx.doi.org/10.1121/1.4752132.

18

Goldfarb, Barry S. "Audio bass speaker driver circuit." Journal of the Acoustical Society of America 103, no. 6 (June 1998): 3133. http://dx.doi.org/10.1121/1.423006.

19

Spindler, William E. "Audio speaker with harmonic enclosure." Journal of the Acoustical Society of America 113, no. 2 (2003): 682. http://dx.doi.org/10.1121/1.1560236.

20

Sethuram, V., Ande Prasad, and R. Rajeswara Rao. "Metaheuristic adapted convolutional neural network for Telugu speaker diarization." Intelligent Decision Technologies 15, no. 4 (January 10, 2022): 561–77. http://dx.doi.org/10.3233/idt-211005.

Abstract:
Speaker diarization plays a pivotal role in speech technology. In general, speaker diarization is the mechanism of partitioning the input audio stream into homogeneous segments based on the identity of the speakers. Speaker diarization can improve the readability of automatic transcription by organizing the audio stream into speaker turns and often provides the true speaker identity. In this research work, a novel speaker diarization approach is introduced in three major phases: feature extraction, Speech Activity Detection (SAD), and speaker segmentation and clustering. Initially, Mel Frequency Cepstral Coefficient (MFCC) based features are extracted from the collected input audio stream (Telugu language). Subsequently, in Speech Activity Detection (SAD), the music and silence signals are removed. Then, the acquired speech signals are segmented for each individual speaker. Finally, the segmented signals are subjected to the speaker clustering process, where an optimized Convolutional Neural Network (CNN) is used. To make the clustering more appropriate, the weights and activation function of the CNN are fine-tuned by a new Self Adaptive Sea Lion Algorithm (SA-SLnO). Finally, a comparative analysis is made to exhibit the superiority of the proposed speaker diarization work. Accordingly, the accuracy of the proposed method is 0.8073, which is 5.255, 2.45%, and 0.075 superior to the existing works, respectively.
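As a rough illustration of the first two phases described above (MFCC feature extraction and a speech-activity gate), the sketch below uses librosa; the simple energy-based gate is a stand-in assumption and does not reproduce the paper's SAD or its SA-SLnO-tuned CNN clustering.

```python
# Hedged sketch: MFCC extraction plus a crude energy-based speech-activity gate.
import librosa
import numpy as np

def mfcc_with_energy_gate(path, n_mfcc=13, hop_length=512, energy_quantile=0.3):
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)
    rms = librosa.feature.rms(y=y, hop_length=hop_length)[0]
    # Keep only frames above an energy quantile, discarding silence
    # (and, very roughly, low-energy music).
    keep = rms > np.quantile(rms, energy_quantile)
    n = min(mfcc.shape[1], keep.shape[0])
    return mfcc[:, :n][:, keep[:n]]
```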
21

Pejovic, Jovana, Eiling Yee, and Monika Molnar. "Speaker matters: Natural inter-speaker variation affects 4-month-olds’ perception of audio-visual speech." First Language 40, no. 2 (September 27, 2019): 113–27. http://dx.doi.org/10.1177/0142723719876382.

Abstract:
In the language development literature, studies often make inferences about infants’ speech perception abilities based on their responses to a single speaker. However, there can be significant natural variability across speakers in how speech is produced (i.e., inter-speaker differences). The current study examined whether inter-speaker differences can affect infants’ ability to detect a mismatch between the auditory and visual components of vowels. Using an eye-tracker, 4.5-month-old infants were tested on auditory-visual (AV) matching for two vowels (/i/ and /u/). Critically, infants were tested with two speakers who naturally differed in how distinctively they articulated the two vowels within and across the categories. Only infants who watched and listened to the speaker whose visual articulations of the two vowels were most distinct from one another were sensitive to AV mismatch. This speaker also produced a visually more distinct /i/ as compared to the other speaker. This finding suggests that infants are sensitive to the distinctiveness of AV information across speakers, and that when making inferences about infants’ perceptual abilities, characteristics of the speaker should be taken into account.
22

Rokanatnam, Thurgeaswary, and Hazinah Kutty Mammi. "Study on Gender Identification Based on Audio Recordings Using Gaussian Mixture Model and Mel Frequency Cepstrum Coefficient Technique." International Journal of Innovative Computing 11, no. 2 (October 31, 2021): 35–41. http://dx.doi.org/10.11113/ijic.v11n2.343.

Abstract:
Speaker recognition is the ability to identify a speaker's characteristics from spoken language. The purpose of this study is to identify the gender of speakers based on audio recordings. The objective is to evaluate the accuracy of this technique in differentiating gender and to determine its classification performance even when using self-acquired recordings. Audio forensics uses voice recordings as part of the evidence to solve cases. This study is mainly conducted to provide an easier technique for identifying unknown speaker characteristics in the forensic field. The experiment is carried out by training the pattern classifier using gender-dependent data. In order to train the model, a speech database is obtained from an online speech corpus comprising both male and female speakers. During the testing phase, apart from the data from the speech corpus, audio recordings of UTM students are also used to determine the accuracy of this speaker identification experiment. As for the technique used to run this experiment, the Mel Frequency Cepstrum Coefficient (MFCC) algorithm is used to extract the features from the speech data, while the Gaussian Mixture Model (GMM) is used to model the gender identifier. Noise removal was not applied to any speech data in this experiment. Python is used to extract the MFCC coefficients and to model the behavior with the GMM technique. Experiment results show that the GMM-MFCC technique can identify gender regardless of language, but with a varying accuracy rate.
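The snippet below is a compact sketch of the MFCC + GMM recipe the study describes, using librosa and scikit-learn; the file lists, sampling rate, and mixture size are illustrative assumptions rather than the study's actual configuration.

```python
# Illustrative sketch of MFCC + GMM gender identification (not the authors' code).
# One GMM is trained per gender; a test file is assigned to the higher-likelihood model.
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def mfcc_features(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=16000)
    # Frames as rows, MFCC coefficients as columns.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_gender_model(paths, n_components=16):
    feats = np.vstack([mfcc_features(p) for p in paths])
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(feats)
    return gmm

def predict_gender(path, male_gmm, female_gmm):
    # score() returns the average log-likelihood of the frames under each model.
    feats = mfcc_features(path)
    return "male" if male_gmm.score(feats) > female_gmm.score(feats) else "female"
```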
23

Wang, Suzhen, Lincheng Li, Yu Ding, and Xin Yu. "One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 2531–39. http://dx.doi.org/10.1609/aaai.v36i3.20154.

Abstract:
Audio-driven one-shot talking face generation methods are usually trained on video resources of various persons. However, their created videos often suffer unnatural mouth shapes and asynchronous lips because those methods struggle to learn a consistent speech style from different speakers. We observe that it would be much easier to learn a consistent speech style from a specific speaker, which leads to authentic mouth movements. Hence, we propose a novel one-shot talking face generation framework by exploring consistent correlations between audio and visual motions from a specific speaker and then transferring audio-driven motion fields to a reference image. Specifically, we develop an Audio-Visual Correlation Transformer (AVCT) that aims to infer talking motions represented by keypoint based dense motion fields from an input audio. In particular, considering audio may come from different identities in deployment, we incorporate phonemes to represent audio signals. In this manner, our AVCT can inherently generalize to audio spoken by other identities. Moreover, as face keypoints are used to represent speakers, AVCT is agnostic against appearances of the training speaker, and thus allows us to manipulate face images of different identities readily. Considering different face shapes lead to different motions, a motion field transfer module is exploited to reduce the audio-driven dense motion field gap between the training identity and the one-shot reference. Once we obtained the dense motion field of the reference image, we employ an image renderer to generate its talking face videos from an audio clip. Thanks to our learned consistent speaking style, our method generates authentic mouth shapes and vivid movements. Extensive experiments demonstrate that our synthesized videos outperform the state-of-the-art in terms of visual quality and lip-sync.
24

Chapple, Boo, and William Wong. "Can You Hear the Femur Play? Bone Audio Speakers at the Nanoscale." Leonardo 41, no. 4 (August 2008): 355–59. http://dx.doi.org/10.1162/leon.2008.41.4.355.

Abstract:
This paper describes the research process involved in making audio speakers out of cow bone. The paper begins by discussing the conceptual basis of the work. It goes on to explain the piezoelectric nature of the bone matrix and how this makes it possible for bone to operate as an audio speaker. It then chronicles the process of working from a theoretical possibility to a functional speaker. In the concluding section of the paper, the final artifacts and conceptual outcomes of the process are discussed.
25

Garcia, Jane Mertz, and Paul A. Dagenais. "Dysarthric Sentence Intelligibility." Journal of Speech, Language, and Hearing Research 41, no. 6 (December 1998): 1282–93. http://dx.doi.org/10.1044/jslhr.4106.1282.

Abstract:
This study examined changes in the sentence intelligibility scores of speakers with dysarthria in association with different signal-independent factors (contextual influences). This investigation focused on the presence or absence of iconic gestures while speaking sentences with low or high semantic predictiveness. The speakers were 4 individuals with dysarthria, who varied from one another in terms of their level of speech intelligibility impairment, gestural abilities, and overall level of motor functioning. Ninety-six inexperienced listeners (24 assigned to each speaker) orthographically transcribed 16 test sentences presented in an audio + video or audio-only format. The sentences had either low or high semantic predictiveness and were spoken by each speaker with and without the corresponding gestures. The effects of signal-independent factors (presence or absence of iconic gestures, low or high semantic predictiveness, and audio + video or audio-only presentation formats) were analyzed for individual speakers. Not all signal-independent information benefited speakers similarly. Results indicated that use of gestures and high semantic predictiveness improved sentence intelligibility for 2 speakers. The other 2 speakers benefited from high predictive messages. The audio + video presentation mode enhanced listener understanding for all speakers, although there were interactions related to specific speaking situations. Overall, the contributions of relevant signal-independent information were greater for the speakers with more severely impaired intelligibility. The results are discussed in terms of understanding the contribution of signal-independent factors to the communicative process.
26

Prabhala, Jagat Chaitanya, Venkatnareshbabu K, and Ragoju Ravi. "Optimizing similarity threshold for abstract similarity metric in speech diarization systems: A mathematical formulation." Applied Mathematics and Sciences An International Journal (MathSJ) 10, no. 1/2 (June 26, 2023): 1–10. http://dx.doi.org/10.5121/mathsj.2023.10201.

Abstract:
Speaker diarization is a critical task in speech processing that aims to identify "who spoke when?" in an audio or video recording that contains an unknown amount of speech from an unknown number of speakers. Diarization has numerous applications in speech recognition, speaker identification, and automatic captioning. Supervised and unsupervised algorithms are used to address speaker diarization problems, but providing exhaustive labeling for the training dataset can become costly in supervised learning, while accuracy can be compromised when using unsupervised approaches. This paper presents a novel approach to speaker diarization, which defines loosely labeled data and employs x-vector embeddings and a formalized approach for threshold searching with a given abstract similarity metric to cluster temporal segments into unique user segments. The proposed algorithm uses concepts from graph theory, matrix algebra, and genetic algorithms to formulate and solve the optimization problem. Additionally, the algorithm is applied to English, Spanish, and Chinese audio, and the performance is evaluated using well-known similarity metrics. The results demonstrate the robustness of the proposed approach. The findings of this research have significant implications for speech processing and speaker identification, including languages with tonal differences. The proposed method offers a practical and efficient solution for speaker diarization in real-world scenarios where there are labeling time and cost constraints.
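The sketch below illustrates the general idea of threshold searching for similarity-based clustering of segment embeddings; it substitutes a plain grid search and agglomerative clustering (recent scikit-learn API) for the paper's graph-theoretic and genetic-algorithm formulation, so treat it only as an approximation of the concept.

```python
# Hedged sketch: pick a cosine-distance threshold whose clustering best agrees
# with loosely labeled segments (grid search stands in for the paper's optimizer).
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

def cluster_at_threshold(embeddings, threshold):
    clusterer = AgglomerativeClustering(
        n_clusters=None,
        metric="cosine",
        linkage="average",
        distance_threshold=threshold,
    )
    return clusterer.fit_predict(embeddings)

def search_threshold(embeddings, loose_labels, candidates=np.linspace(0.1, 0.9, 17)):
    """Return the threshold (and score) whose clustering best matches the loose labels."""
    best_t, best_score = None, -1.0
    for t in candidates:
        labels = cluster_at_threshold(embeddings, t)
        score = adjusted_rand_score(loose_labels, labels)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score
```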
27

Rakhmanenko, I. A., A. A. Shelupanov, and E. Y. Kostyuchenko. "Automatic text-independent speaker verification using convolutional deep belief network." Computer Optics 44, no. 4 (August 2020): 596–605. http://dx.doi.org/10.18287/2412-6179-co-621.

Abstract:
This paper is devoted to the use of the convolutional deep belief network as a speech feature extractor for automatic text-independent speaker verification. The paper describes the scope and problems of automatic speaker verification systems. Types of modern speaker verification systems and types of speech features used in speaker verification systems are considered. The structure and learning algorithm of convolutional deep belief networks is described. The use of speech features extracted from three layers of a trained convolution deep belief network is proposed. Experimental studies of the proposed features were performed on two speech corpora: own speech corpus including audio recordings of 50 speakers and TIMIT speech corpus including audio recordings of 630 speakers. The accuracy of the proposed features was assessed using different types of classifiers. Direct use of these features did not increase the accuracy compared to the use of traditional spectral speech features, such as mel-frequency cepstral coefficients. However, the use of these features in the classifiers ensemble made it possible to achieve a reduction of the equal error rate to 0.21% on 50-speaker speech corpus and to 0.23% on the TIMIT speech corpus.
28

Kasai, Junichi, Hiroshi Imai, and Takayuki Yanagishima. "Audio speaker system for automotive vehicle." Journal of the Acoustical Society of America 91, no. 3 (March 1992): 1796. http://dx.doi.org/10.1121/1.403705.

29

Huang, Wan-Fang, and Chang Hua City. "Multi-channel audio center speaker device." Journal of the Acoustical Society of America 119, no. 2 (2006): 686. http://dx.doi.org/10.1121/1.2174496.

30

Stiles, Enrique M., and Richard C. Calderwood. "Thermal chimney equipped audio speaker cabinet." Journal of the Acoustical Society of America 122, no. 2 (2007): 695. http://dx.doi.org/10.1121/1.2771300.

31

Yanagishima, Takayuki. "Driver unit for automotive audio speaker." Journal of the Acoustical Society of America 80, no. 1 (July 1986): 371. http://dx.doi.org/10.1121/1.394076.

32

Sunardi, Ariyawan, Aripin Triyanto, Nurkahfi Irwansyah, Woro Agus Nurtiyanto, Awalludin Saputra, and Muhammad Koiru Ikhsan. "PELATIHAN PEMASANGAN DAN PERAWATAN AUDIO SYSTEM DI MUSHOLA BAITURROHMAN, TAMBORA-JAKBAR." Jurnal Pengabdian Kepada Masyarakat (JPKM) - Aphelion 1, no. 01 (September 14, 2020): 11. http://dx.doi.org/10.32493/jpka.v1i01.6901.

Abstract:
Mosques and musholla (prayer rooms) are places of worship for Muslims. Their supporting equipment includes an audio system, which is used to broadcast the adhan and iqomah. The supporting parts of audio system equipment include the amplifier, speakers, microphone, and cables. The amplifier is used to control the sound that goes out to the speakers; it is equipped with sound balance, bass, and treble controls, each of which is used to clarify the sound. Speakers are used for the audio system both inside and outside the musholla. The microphone is used as the link from the amplifier to the speakers. Musholla Baiturrohman is a place of worship located in the Tambora area of West Jakarta. After conducting a survey at Musholla Baiturrohman, students and lecturers found audio system equipment that no longer worked: the amplifier could not be controlled, the microphone produced no sound through the speakers, and the electrical installation was not safely out of people's reach. We conducted training on audio system installation and maintenance for the congregation of Musholla Baiturrohman. The training is intended to improve the congregation's skills and also to produce installation and maintenance specialists who can start their own businesses. The training methods were lectures, discussion, and hands-on practice installing the audio system at Musholla Baiturrohman. Maintenance training and mentoring were also provided so that the congregation can preserve the lifetime of the installed audio system equipment. The audio system at Musholla Baiturrohman has been installed and is working well, and the congregation followed the training attentively and enthusiastically.
33

Zhang, Xu, and Liguo Weng. "Realistic Speech-Driven Talking Video Generation with Personalized Pose." Complexity 2020 (December 28, 2020): 1–8. http://dx.doi.org/10.1155/2020/6629634.

Abstract:
In this work, we propose a method to transform a speaker’s speech information into a target character’s talking video; the method could make the mouth shape synchronization, expression, and body posture more realistic in the synthesized speaker video. This is a challenging task because changes of mouth shape and posture are coupled with audio semantic information. The model training is difficult to converge, and the model effect is unstable in complex scenes. Existing speech-driven speaker methods cannot solve this problem well. The method proposed in this paper first generates the sequence of key points of the speaker’s face and body postures from the audio signal in real time and then visualizes these key points as a series of two-dimensional skeleton images. Subsequently, we generate the final real speaker video through the video generation network. We take a random sampling of audio clips, encode audio contents and temporal correlations using a more effective network structure, and optimize and iterate network outputs using differential loss and attitude perception loss, so as to obtain a smoother pose key-point sequence and better performance. In addition, by inserting a specified action frame into the synthesized human pose sequence window, action poses of the synthesized speaker are enriched, making the synthesis effect more realistic and natural. Then, the final speaker video is generated by the obtained gesture key points through the video generation network. In order to generate realistic and high-resolution pose detail videos, we insert a local attention mechanism into the key point network of the generated pose sequence and give higher attention to the local details of the characters through spatial weight masks. In order to verify the effectiveness of the proposed method, we used the objective evaluation index NME and user subjective evaluation methods, respectively. Experiment results showed that our method could vividly use audio contents to generate corresponding speaker videos, and its lip-matching accuracy and expression postures are better than those of previous work. Compared with existing methods in the NME index and user subjective evaluation, our method showed better results.
34

Poojary, Nigam R., and K. H. Ashish. "Text To Speech with Custom Voice." International Journal for Research in Applied Science and Engineering Technology 11, no. 4 (April 30, 2023): 4523–30. http://dx.doi.org/10.22214/ijraset.2023.51217.

Abstract:
The Text to Speech with Custom Voice system described in this work has vast applicability in numerous industries, including entertainment, education, and accessibility. The proposed text-to-speech (TTS) system is capable of generating speech audio in custom voices, even those not included in the training data. The system comprises a speaker encoder, a synthesizer, and a WaveRNN vocoder. Multiple speakers from a dataset of clean speech without transcripts are used to train the speaker encoder for a speaker verification process. The reference speech of the target speaker is used to create a fixed-dimensional embedding vector. Using the speaker embedding, the synthesizer network based on Tacotron2 creates a mel spectrogram from text, and the WaveRNN vocoder transforms the mel spectrogram into time-domain waveform samples. These waveform samples are converted to audio, which is the output of our work. The adaptable modular design enables external users to quickly integrate the Text to Speech with Custom Voice system into their products. Additionally, users can edit specific modules and pipeline phases in this work without changing the source code. To achieve the best performance, the speaker encoder, synthesizer, and vocoder must be trained on a variety of speaker datasets.
35

Tsuchida, Masaru, Takahito Kawanishi, Hiroshi Murase, and Shigeru Takagi. "Joint Audio-Visual Tracking Based on Dynamically Weighted Linear Combination of Probability State Density." Journal of Advanced Computational Intelligence and Intelligent Informatics 8, no. 2 (March 20, 2004): 190–99. http://dx.doi.org/10.20965/jaciii.2004.p0190.

Abstract:
This paper proposes a method that can be applied to speaker tracking under stabilized, continuous conditions using visual and audio information, even when input information is interrupted due to disturbance or occlusion caused by the effects of noise or varying illumination. Using this method, the position of a speaker is expressed based on a likelihood distribution that is obtained through integration of visual information and audio information. First, visual and audio information is integrated as a weighted linear combination of probability density distributions, each estimated from the observation of the visual and audio information. In this case, the weight is taken as a variable, which varies in proportion to the maximum value of the probability density distributions obtained for each type of information. Next, a weighted linear combination of this result and the past distribution is computed, and the result is taken as the likelihood distribution related to the position of the speaker. By changing the weight dynamically, it becomes possible to select the type of information freely or to reweight it and, accordingly, to conduct stabilized, continuous tracking even when the speaker cannot be detected momentarily due to occlusion, voice interruption, or noise. We conducted a series of experiments on speaker tracking using a circular microphone array and an omni-directional camera. In this way, we confirmed that it is possible to perform stabilized, continuous tracking of speakers in spite of occlusion or voice interruption.
36

SUWANNATHAT, Thatsaphan, Jun-ichi IMAI, and Masahide KANEKO. "1P1-K06 Audio-Visual Speaker Detection in Human-Robot Interaction." Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2007 (2007): _1P1-K06_1-_1P1-K06_4. http://dx.doi.org/10.1299/jsmermd.2007._1p1-k06_1.

37

Wilcox, Lynn D. "Unsupervised speaker clustering for automatic speaker indexing of recorded audio data." Journal of the Acoustical Society of America 103, no. 4 (April 1998): 1701. http://dx.doi.org/10.1121/1.421064.

38

Kimber, Donald G. "Method of speaker clustering for unknown speakers in conversation a audio data." Journal of the Acoustical Society of America 102, no. 5 (1997): 2480. http://dx.doi.org/10.1121/1.420370.

39

Hustad, Katherine C., and Jane Mertz Garcia. "Aided and Unaided Speech Supplementation Strategies." Journal of Speech, Language, and Hearing Research 48, no. 5 (October 2005): 996–1012. http://dx.doi.org/10.1044/1092-4388(2005/068).

Abstract:
Purpose: This study compared the influence of speaker-implemented iconic hand gestures and alphabet cues on speech intelligibility scores and strategy helpfulness ratings for 3 adults with cerebral palsy and dysarthria who differed from one another in their overall motor abilities. Method: A total of 144 listeners (48 per speaker) orthographically transcribed sentences spoken with alphabet cues (aided), iconic hand gestures (unaided), and a habitual speech control condition; scores were compared within audio-visual and audio-only listening formats. Results: When listeners were presented with simultaneous audio and visual information, both alphabet cues and hand gestures resulted in higher intelligibility scores and higher helpfulness ratings than the no-cues control condition for each of the 3 speakers. When listeners were presented with only the audio signal, alphabet cues and gestures again resulted in higher intelligibility scores than no cues for 2 of the 3 speakers. Temporal acoustic analyses showed that alphabet cues had consistent effects on speech production, including reduced speech rate, reduced articulation rate, and increased frequency and duration of pauses. Findings for gestures were less consistent, with marked differences noted among speakers. Conclusions: Results illustrate that individual differences play an important role in the value of supplemental augmentative and alternative communication strategies and that aided and unaided strategies can have similar positive effects on the communication of speakers with global motor impairment.
40

Singh, Satyanand. "Forensic and Automatic Speaker Recognition System." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 5 (October 1, 2018): 2804. http://dx.doi.org/10.11591/ijece.v8i5.pp2804-2811.

Abstract:
Current Automatic Speaker Recognition (ASR) systems have emerged as an important medium for confirming identity in many businesses and e-commerce applications, and in forensics and law enforcement as well. Specialists trained in criminological recognition can perform this task far better by examining an arrangement of acoustic, prosodic, and semantic attributes, which has been referred to as structured listening. An algorithm-based system has been developed for the recognition of forensic speakers by physics scientists and forensic linguists to reduce the probability of a contextual bias or pre-centric understanding of a reference model with the validity of an unknown audio sample and any suspicious individual. Many researchers are continuing to develop automatic algorithms in signal processing and machine learning so that improving performance can effectively introduce the speaker’s identity, where the automatic system performs equally with the human audience. In this paper, I examine the literature about the identification of speakers by machines and humans, emphasizing the key technical speaker patterns emerging for the automatic technology in the last decade. I focus on many aspects of automatic speaker recognition (ASR) systems, including speaker-specific features, speaker models, standard assessment data sets, and performance metrics.
41

Ahmad, Zubair, Alquhayz, and Ditta. "Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model." Sensors 19, no. 23 (November 25, 2019): 5163. http://dx.doi.org/10.3390/s19235163.

Abstract:
Speaker diarization systems aim to find ‘who spoke when?’ in multi-speaker recordings. The dataset usually consists of meetings, TV/talk shows, telephone and multi-party interaction recordings. In this paper, we propose a novel multimodal speaker diarization technique, which finds the active speaker through audio-visual synchronization model for diarization. A pre-trained audio-visual synchronization model is used to find the synchronization between a visible person and the respective audio. For that purpose, short video segments comprised of face-only regions are acquired using a face detection technique and are then fed to the pre-trained model. This model is a two streamed network which matches audio frames with their respective visual input segments. On the basis of high confidence video segments inferred by the model, the respective audio frames are used to train Gaussian mixture model (GMM)-based clusters. This method helps in generating speaker specific clusters with high probability. We tested our approach on a popular subset of AMI meeting corpus consisting of 5.4 h of recordings for audio and 5.8 h of different set of multimodal recordings. A significant improvement is noticed with the proposed method in term of DER when compared to conventional and fully supervised audio based speaker diarization. The results of the proposed technique are very close to the complex state-of-the art multimodal diarization which shows significance of such simple yet effective technique.
42

Han, Cong, James O’Sullivan, Yi Luo, Jose Herrero, Ashesh D. Mehta, and Nima Mesgarani. "Speaker-independent auditory attention decoding without access to clean speech sources." Science Advances 5, no. 5 (May 2019): eaav6134. http://dx.doi.org/10.1126/sciadv.aav6134.

Abstract:
Speech perception in crowded environments is challenging for hearing-impaired listeners. Assistive hearing devices cannot lower interfering speakers without knowing which speaker the listener is focusing on. One possible solution is auditory attention decoding in which the brainwaves of listeners are compared with sound sources to determine the attended source, which can then be amplified to facilitate hearing. In realistic situations, however, only mixed audio is available. We utilize a novel speech separation algorithm to automatically separate speakers in mixed audio, with no need for the speakers to have prior training. Our results show that auditory attention decoding with automatically separated speakers is as accurate and fast as using clean speech sounds. The proposed method significantly improves the subjective and objective quality of the attended speaker. Our study addresses a major obstacle in actualization of auditory attention decoding that can assist hearing-impaired listeners and reduce listening effort for normal-hearing subjects.
43

Dong, Yingjun, Neil G. MacLaren, Yiding Cao, Francis J. Yammarino, Shelley D. Dionne, Michael D. Mumford, Shane Connelly, Hiroki Sayama, and Gregory A. Ruark. "Utterance Clustering Using Stereo Audio Channels." Computational Intelligence and Neuroscience 2021 (September 25, 2021): 1–8. http://dx.doi.org/10.1155/2021/6151651.

Abstract:
Utterance clustering is one of the actively researched topics in audio signal processing and machine learning. This study aims to improve the performance of utterance clustering by processing multichannel (stereo) audio signals. Processed audio signals were generated by combining left- and right-channel audio signals in a few different ways and then by extracting the embedded features (also called d-vectors) from those processed audio signals. This study applied the Gaussian mixture model for supervised utterance clustering. In the training phase, a parameter-sharing Gaussian mixture model was obtained to train the model for each speaker. In the testing phase, the speaker with the maximum likelihood was selected as the detected speaker. Results of experiments with real audio recordings of multiperson discussion sessions showed that the proposed method that used multichannel audio signals achieved significantly better performance than a conventional method with mono-audio signals in more complicated conditions.
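To make the stereo-combination idea concrete, here is a small sketch: the left and right channels are merged before embedding extraction, and at test time the speaker whose GMM scores a segment's embedding highest is selected. The embedding model itself is left out (only referenced in a comment), and the combination modes are illustrative assumptions rather than the paper's exact procedure.

```python
# Hedged sketch: combine stereo channels, then pick the speaker whose GMM
# assigns the highest log-likelihood to the segment embedding (d-vector).
import numpy as np
import soundfile as sf
from sklearn.mixture import GaussianMixture

def combine_channels(path, mode="mean"):
    audio, sr = sf.read(path)          # shape: (samples, 2) for a stereo file
    left, right = audio[:, 0], audio[:, 1]
    if mode == "mean":
        return (left + right) / 2.0, sr
    if mode == "concat":               # one possible alternative combination
        return np.concatenate([left, right]), sr
    raise ValueError(mode)

def identify_speaker(dvector, speaker_gmms):
    """speaker_gmms: dict mapping speaker name -> fitted GaussianMixture.
    The d-vector itself would come from a separate embedding model (not shown)."""
    scores = {name: gmm.score(dvector.reshape(1, -1)) for name, gmm in speaker_gmms.items()}
    return max(scores, key=scores.get)
```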
44

Jeyalakshmy, Mrs G. "Connection with the Multiple Bluetooth Speaker with the Single Device." International Journal for Research in Applied Science and Engineering Technology 11, no. 6 (June 30, 2023): 558–62. http://dx.doi.org/10.22214/ijraset.2023.53442.

Abstract:
This paper presents the design and development of an Audio Connector application that enables users to route audio from a single input source to multiple output devices simultaneously. It is able to communicate with a wide range of devices without an interface. Users can concurrently stream audio to two wireless speakers or headphones using the Dual Audio functionality. Additionally, users can independently adjust the media output volume of each audio device. The times when you and your buddies would constantly quarrel about the volume of the audio are long gone. Theoretically, all Bluetooth 5.0 and higher-capable devices can use the dual audio capability. However, other elements like the hardware capabilities of your device and its operating system, for example, the Android version, also affect your ability to use this feature. The Bluetooth signal's maximum range is still 10 metres, despite advancing technical requirements for wireless communication.
45

Linn, Aaron, and Leif Blackmon. "Audio speaker having a removable voice coil." Journal of the Acoustical Society of America 131, no. 3 (2012): 2343. http://dx.doi.org/10.1121/1.3696735.

46

Harrison, Stanley N. "Speaker system with folded audio transmission passage." Journal of the Acoustical Society of America 90, no. 6 (December 1991): 3395. http://dx.doi.org/10.1121/1.401324.

47

Stiles, Enrique M., and Richard C. Calderwood. "Audio speaker with graduated voice coil windings." Journal of the Acoustical Society of America 126, no. 1 (2009): 516. http://dx.doi.org/10.1121/1.3182960.

48

Toh, Hilary, Pei Xuan Lee, Boon Pang Lim, and Nancy F. Chen. "Detecting speaker change in background audio streams." Journal of the Acoustical Society of America 134, no. 5 (November 2013): 4074. http://dx.doi.org/10.1121/1.4830881.

49

Fan, Xing, and John H. L. Hansen. "Speaker Identification Within Whispered Speech Audio Streams." IEEE Transactions on Audio, Speech, and Language Processing 19, no. 5 (July 2011): 1408–21. http://dx.doi.org/10.1109/tasl.2010.2091631.

50

Fedigan, Stephen John. "Apparatus And Method For Monitoring Speaker Cone Displacement In An Audio Speaker." Journal of the Acoustical Society of America 130, no. 6 (2011): 4175. http://dx.doi.org/10.1121/1.3668756.
