Journal articles on the topic 'Speech diarization'

Consult the top 50 journal articles for your research on the topic 'Speech diarization.'

1

Mertens, Robert, Po-Sen Huang, Luke Gottlieb, Gerald Friedland, Ajay Divakaran, and Mark Hasegawa-Johnson. "On the Applicability of Speaker Diarization to Audio Indexing of Non-Speech and Mixed Non-Speech/Speech Video Soundtracks." International Journal of Multimedia Data Engineering and Management 3, no. 3 (2012): 1–19. http://dx.doi.org/10.4018/jmdem.2012070101.

Abstract:
A video’s soundtrack is usually highly correlated to its content. Hence, audio-based techniques have recently emerged as a means for video concept detection complementary to visual analysis. Most state-of-the-art approaches rely on manual definition of predefined sound concepts such as “engine sounds,” “outdoor/indoor sounds.” These approaches come with three major drawbacks: manual definitions do not scale as they are highly domain-dependent, manual definitions are highly subjective with respect to annotators and a large part of the audio content is omitted since the predefined concepts are usu
2

Astapov, Sergei, Aleksei Gusev, Marina Volkova, et al. "Application of Fusion of Various Spontaneous Speech Analytics Methods for Improving Far-Field Neural-Based Diarization." Mathematics 9, no. 23 (2021): 2998. http://dx.doi.org/10.3390/math9232998.

Abstract:
Recently developed methods in spontaneous speech analytics require the use of speaker separation based on audio data, referred to as diarization. It is applied to widespread use cases, such as meeting transcription based on recordings from distant microphones and the extraction of the target speaker’s voice profiles from noisy audio. However, speech recognition and analysis can be hindered by background and point-source noise, overlapping speech, and reverberation, which all affect diarization quality in conjunction with each other. To compensate for the impact of these factors, there are a va
3

Lyu, Ke-Ming, Ren-yuan Lyu, and Hsien-Tsung Chang. "Real-time multilingual speech recognition and speaker diarization system based on Whisper segmentation." PeerJ Computer Science 10 (March 29, 2024): e1973. http://dx.doi.org/10.7717/peerj-cs.1973.

Abstract:
This research presents the development of a cutting-edge real-time multilingual speech recognition and speaker diarization system that leverages OpenAI’s Whisper model. The system specifically addresses the challenges of automatic speech recognition (ASR) and speaker diarization (SD) in dynamic, multispeaker environments, with a focus on accurately processing Mandarin speech with Taiwanese accents and managing frequent speaker switches. Traditional speech recognition systems often fall short in such complex multilingual and multispeaker contexts, particularly in SD. This study, therefore, inte
4

Prabhala, Jagat Chaitanya, Venkatnareshbabu K, and Ragoju Ravi. "OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIARIZATION SYSTEMS: A MATHEMATICAL FORMULATION." Applied Mathematics and Sciences An International Journal (MathSJ) 10, no. 1/2 (2023): 1–10. http://dx.doi.org/10.5121/mathsj.2023.10201.

Abstract:
Speaker diarization is a critical task in speech processing that aims to identify "who spoke when?" in an audio or video recording that contains unknown amounts of speech from unknown speakers and unknown number of speakers. Diarization has numerous applications in speech recognition, speaker identification, and automatic captioning. Supervised and unsupervised algorithms are used to address speaker diarization problems, but providing exhaustive labeling for the training dataset can become costly in supervised learning, while accuracy can be compromised when using unsupervised approaches. This
5

V, Sethuram, Ande Prasad, and R. Rajeswara Rao. "Metaheuristic adapted convolutional neural network for Telugu speaker diarization." Intelligent Decision Technologies 15, no. 4 (2022): 561–77. http://dx.doi.org/10.3233/idt-211005.

Abstract:
In speech technology, a pivotal role is being played by the Speaker diarization mechanism. In general, speaker diarization is the mechanism of partitioning the input audio stream into homogeneous segments based on the identity of the speakers. The automatic transcription readability can be improved with the speaker diarization as it is good in recognizing the audio stream into the speaker turn and often provides the true speaker identity. In this research work, a novel speaker diarization approach is introduced under three major phases: Feature Extraction, Speech Activity Detection (SAD), and
6

Murali, Abhejay, Satwik Dutta, Meena Chandra Shekar, Dwight Irvin, Jay Buzhardt, and John H. Hansen. "Towards developing speaker diarization for parent-child interactions." Journal of the Acoustical Society of America 152, no. 4 (2022): A61. http://dx.doi.org/10.1121/10.0015551.

Abstract:
Daily interactions of children with their parents are crucial for spoken language skills and overall development. Capturing such interactions can help to provide meaningful feedback to parents as well as practitioners. Naturalistic audio capture and developing further speech processing pipeline for parent-child interactions is a challenging problem. One of the first important steps in the speech processing pipeline is Speaker Diarization—to identify who spoke when. Speaker Diarization is the method of separating a captured audio stream into analogous segments that are differentiated by the spe
7

Taha, Thaer Mufeed, Zaineb Ben Messaoud, and Mondher Frikha. "Convolutional Neural Network Architectures for Gender, Emotional Detection from Speech and Speaker Diarization." International Journal of Interactive Mobile Technologies (iJIM) 18, no. 03 (2024): 88–103. http://dx.doi.org/10.3991/ijim.v18i03.43013.

Abstract:
This paper introduces three system architectures for speaker identification that aim to overcome the limitations of diarization and voice-based biometric systems. Diarization systems utilize unsupervised algorithms to segment audio data based on the time boundaries of utterances, but they do not distinguish individual speakers. On the other hand, voice-based biometric systems can only identify individuals in recordings with a single speaker. Identifying speakers in recordings of natural conversations can be challenging, especially when emotional shifts can alter voice characteristics, making g
8

Kothalkar, Prasanna V., John H. L. Hansen, Dwight Irvin, and Jay Buzhardt. "Child-adult speech diarization in naturalistic conditions of preschool classrooms using room-independent ResNet model and automatic speech recognition-based re-segmentation." Journal of the Acoustical Society of America 155, no. 2 (2024): 1198–215. http://dx.doi.org/10.1121/10.0024353.

Abstract:
Speech and language development are early indicators of overall analytical and learning ability in children. The preschool classroom is a rich language environment for monitoring and ensuring growth in young children by measuring their vocal interactions with teachers and classmates. Early childhood researchers are naturally interested in analyzing naturalistic vs controlled lab recordings to measure both quality and quantity of such interactions. Unfortunately, present-day speech technologies are not capable of addressing the wide dynamic scenario of early childhood classroom settings. Due to
9

Kshirod, Kshirod Sarmah. "Speaker Diarization with Deep Learning Techniques." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 11, no. 3 (2020): 2570–82. http://dx.doi.org/10.61841/turcomat.v11i3.14309.

Abstract:
Speaker diarization is a task to identify the speaker when different speakers spoke in an audio or video recording environment. Artificial intelligence (AI) fields have effectively used Deep Learning (DL) to solve a variety of real-world application challenges. With effective applications in a wide range of subdomains, such as natural language processing, image processing, computer vision, speech and speaker recognition, and emotion recognition, cyber security, and many others, DL, a very innovative field of Machine Learning (ML), that is quickly emerging as the most potent machine learning te
10

Lleida, Eduardo, Alfonso Ortega, Antonio Miguel, et al. "Albayzin 2018 Evaluation: The IberSpeech-RTVE Challenge on Speech Technologies for Spanish Broadcast Media." Applied Sciences 9, no. 24 (2019): 5412. http://dx.doi.org/10.3390/app9245412.

Abstract:
The IberSpeech-RTVE Challenge presented at IberSpeech 2018 is a new Albayzin evaluation series supported by the Spanish Thematic Network on Speech Technologies (Red Temática en Tecnologías del Habla (RTTH)). That series was focused on speech-to-text transcription, speaker diarization, and multimodal diarization of television programs. For this purpose, the Corporacion Radio Television Española (RTVE), the main public service broadcaster in Spain, and the RTVE Chair at the University of Zaragoza made more than 500 h of broadcast content and subtitles available for scientists. The dataset includ
11

Ahmad, Rehan, Syed Zubair, and Hani Alquhayz. "Speech Enhancement for Multimodal Speaker Diarization System." IEEE Access 8 (2020): 126671–80. http://dx.doi.org/10.1109/access.2020.3007312.

12

Kothalkar, Prasanna V., Dwight Irvin, Jay Buzhardt, and John H. Hansen. "End-to-end child-adult speech diarization in naturalistic conditions of preschool classrooms." Journal of the Acoustical Society of America 153, no. 3_supplement (2023): A174. http://dx.doi.org/10.1121/10.0018568.

Abstract:
Speech and language development are early indicators of overall analytical and learning ability in pre-school children. Early childhood researchers are interested in analyzing naturalistic versus controlled lab recordings to assess both quality and quantity of such communication interactions between children and adults/teachers. Unfortunately, present-day speech technologies are not capable of addressing the wide dynamic scenario of early childhood classroom settings. Due to diversity of acoustic events/conditions in daylong audio streams, automated speaker diarization technology is limited and
13

Kaur, Sukhvinder, and J. S. Sohal. "Speech Activity Detection and its Evaluation in Speaker Diarization System." INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 16, no. 1 (2017): 7567–72. http://dx.doi.org/10.24297/ijct.v16i1.5893.

Abstract:
In speaker diarization, the speech/voice activity detection is performed to separate speech, non-speech and silent frames. Zero crossing rate and root mean square value of frames of audio clips has been used to select training data for silent, speech and nonspeech models. The trained models are used by two classifiers, Gaussian mixture model (GMM) and Artificial neural network (ANN), to classify the speech and non-speech frames of audio clip. The results of ANN and GMM classifier are compared by Receiver operating characteristics (ROC) curve and Detection Error Tradeoff (DET) graph. It is concl
14

Hansen, John H., Aditya Joglekar, and Meena Chandra Shekar. "Fearless steps Apollo: Advancements in robust speech technologies and naturalistic corpus development from Earth to the Moon." Journal of the Acoustical Society of America 152, no. 4 (2022): A61. http://dx.doi.org/10.1121/10.0015549.

Abstract:
Recent developments in deep learning strategies have revolutionized Speech and Language Technologies (SLT). Deep learning models often rely on massive naturalistic datasets to produce the necessary complexity required for generating superior performance. However, most massive SLT datasets are not publicly available, limiting the potential for academic research. Through this work, we showcase the CRSS-UTDallas led efforts to recover, digitize, and openly distribute over 50,000 hrs of speech data recorded during the 12 NASA Apollo manned missions, and outline our continuing efforts to digitize an
15

Sultan, Wael Ali, Mourad Samir Semary, and Sherif Mahdy Abdou. "An Efficient Speaker Diarization Pipeline for Conversational Speech." Benha Journal of Applied Sciences 9, no. 5 (2024): 141–46. http://dx.doi.org/10.21608/bjas.2024.284482.1414.

16

Kone, Tenon Charly, Sebastian Ghinet, Sayed Ahmed Dana, and Anant Grewal. "Speech detection models for effective communicable disease risk assessment in air travel environments." Journal of the Acoustical Society of America 155, no. 3_Supplement (2024): A277. http://dx.doi.org/10.1121/10.0027492.

Abstract:
In environments characterized by elevated noise levels, such as airports or aircraft cabins, travelers often find themselves involuntarily speaking loudly and drawing closer to one another in an effort to enhance communication and speech intelligibility. Unfortunately, this unintentional behaviour increases the risk of respiratory particles dispersion, potentially carrying infectious agents like bacteria which makes the contagion control more challenging. The accurate characterization of the risk associated to speaking, in such a challenging noise environment with multiple overlapping speech s
17

Zelenak, Martin, Carlos Segura, Jordi Luque, and Javier Hernando. "Simultaneous Speech Detection With Spatial Features for Speaker Diarization." IEEE Transactions on Audio, Speech, and Language Processing 20, no. 2 (2012): 436–46. http://dx.doi.org/10.1109/tasl.2011.2160167.

18

Viñals, Ignacio, Alfonso Ortega, Antonio Miguel, and Eduardo Lleida. "The Domain Mismatch Problem in the Broadcast Speaker Attribution Task." Applied Sciences 11, no. 18 (2021): 8521. http://dx.doi.org/10.3390/app11188521.

Abstract:
The demand of high-quality metadata for the available multimedia content requires the development of new techniques able to correctly identify more and more information, including the speaker information. The task known as speaker attribution aims at identifying all or part of the speakers in the audio under analysis. In this work, we carry out a study of the speaker attribution problem in the broadcast domain. Through our experiments, we illustrate the positive impact of diarization on the final performance. Additionally, we show the influence of the variability present in broadcast data, dep
19

Indu D. "A Methodology for Speaker Diazaration System Based on LSTM and MFCC Coefficients." Journal of Electrical Systems 20, no. 6s (2024): 2938–45. http://dx.doi.org/10.52783/jes.3299.

Abstract:
Research on Speaker Identification is always difficult. A speaker may be automatically identified using by comparing their voice sample with their previously recorded voice, the machine learning strategy has grown in favor in recent years. Convolutional neural networks (CNN), deep neural networks (DNN) are some of the machine learning techniques that has employed recently. The article will discuss a successful speaker verification system based on the d-vector to construct a new approach based on speaker diarization. In particular, in this article, we use the concept of LSTM to cluster the spe
20

Sathyapriya, S., and A. Indhumathi. "An Efficient Speaker Diarization using Privacy Preserving Audio Features Based of Speech/Non Speech Detection." International Journal of Computer Trends and Technology 9, no. 4 (2014): 184–87. http://dx.doi.org/10.14445/22312803/ijctt-v9p136.

21

Huang, Zili, Marc Delcroix, Leibny Paola Garcia, Shinji Watanabe, Desh Raj, and Sanjeev Khudanpur. "Joint speaker diarization and speech recognition based on region proposal networks." Computer Speech & Language 72 (March 2022): 101316. http://dx.doi.org/10.1016/j.csl.2021.101316.

22

Khoma, Volodymyr, Yuriy Khoma, Vitalii Brydinskyi, and Alexander Konovalov. "Development of Supervised Speaker Diarization System Based on the PyAnnote Audio Processing Library." Sensors 23, no. 4 (2023): 2082. http://dx.doi.org/10.3390/s23042082.

Abstract:
Diarization is an important task when work with audiodata is executed, as it provides a solution to the problem related to the need of dividing one analyzed call recording into several speech recordings, each of which belongs to one speaker. Diarization systems segment audio recordings by defining the time boundaries of utterances, and typically use unsupervised methods to group utterances belonging to individual speakers, but do not answer the question “who is speaking?” On the other hand, there are biometric systems that identify individuals on the basis of their voices, but such systems are
23

Jung, Dahae, Min-Kyoung Bae, Man Yong Choi, Eui Chul Lee, and Jinoo Joung. "Speaker diarization method of telemarketer and client for improving speech dictation performance." Journal of Supercomputing 72, no. 5 (2015): 1757–69. http://dx.doi.org/10.1007/s11227-015-1470-4.

24

Zhu, Qiushi, Jie Zhang, Yu Gu, Yuchen Hu, and Lirong Dai. "Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (2024): 19768–76. http://dx.doi.org/10.1609/aaai.v38i17.29951.

Abstract:
Self-supervised speech pre-training methods have developed rapidly in recent years, which show to be very effective for many near-field single-channel speech tasks. However, far-field multichannel speech processing is suffering from the scarcity of labeled multichannel data and complex ambient noises. The efficacy of self-supervised learning for far-field multichannel and multi-modal speech processing has not been well explored. Considering that visual information helps to improve speech recognition performance in noisy scenes, in this work we propose the multichannel multi-modal speech self-s
25

Papala, Gowtham, Aniket Ransing, and Pooja Jain. "Sentiment Analysis and Speaker Diarization in Hindi and Marathi Using using Finetuned Whisper." Scalable Computing: Practice and Experience 24, no. 4 (2023): 835–46. http://dx.doi.org/10.12694/scpe.v24i4.2248.

Abstract:
Automatic Speech Recognition (ASR) is a crucial technology that enables machines to automatically recognize human voices based on audio signals. In recent years, there has been a rigorous growth in the development of ASR models with the emergence of new techniques and algorithms. One such model is the Whisper ASR model developed by OpenAI, which is based on a Transformer encoder-decoder architecture and can handle multiple tasks such as language identification, transcription, and translation. However, there are still limitations to the Whisper ASR model, such as speaker diarization, summarizat
26

Senoussaoui, Mohammed, Patrick Kenny, Themos Stafylakis, and Pierre Dumouchel. "A Study of the Cosine Distance-Based Mean Shift for Telephone Speech Diarization." IEEE/ACM Transactions on Audio, Speech, and Language Processing 22, no. 1 (2014): 217–27. http://dx.doi.org/10.1109/taslp.2013.2285474.

27

Vryzas, Nikolaos, Nikolaos Tsipas, and Charalampos Dimoulas. "Web Radio Automation for Audio Stream Management in the Era of Big Data." Information 11, no. 4 (2020): 205. http://dx.doi.org/10.3390/info11040205.

Abstract:
Radio is evolving in a changing digital media ecosystem. Audio-on-demand has shaped the landscape of big unstructured audio data available online. In this paper, a framework for knowledge extraction is introduced, to improve discoverability and enrichment of the provided content. A web application for live radio production and streaming is developed. The application offers typical live mixing and broadcasting functionality, while performing real-time annotation as a background process by logging user operation events. For the needs of a typical radio station, a supervised speaker classificatio
28

Lleida, Eduardo, Luis Javier Rodriguez-Fuentes, Javier Tejedor, et al. "An Overview of the IberSpeech-RTVE 2022 Challenges on Speech Technologies." Applied Sciences 13, no. 15 (2023): 8577. http://dx.doi.org/10.3390/app13158577.

Abstract:
Evaluation campaigns provide a common framework with which the progress of speech technologies can be effectively measured. The aim of this paper is to present a detailed overview of the IberSpeech-RTVE 2022 Challenges, which were organized as part of the IberSpeech 2022 conference under the ongoing series of Albayzin evaluation campaigns. In the 2022 edition, four challenges were launched: (1) speech-to-text transcription; (2) speaker diarization and identity assignment; (3) text and speech alignment; and (4) search on speech. Different databases that cover different domains (e.g., broadcast
29

Hansen, John H. L., Maryam Najafian, Rasa Lileikyte, Dwight Irvin, and Beth Rous. "Speech and language processing for assessing child–adult interaction based on diarization and location." International Journal of Speech Technology 22, no. 3 (2019): 697–709. http://dx.doi.org/10.1007/s10772-019-09590-0.

30

Cerva, Petr, Jan Silovsky, Jindrich Zdansky, Jan Nouza, and Ladislav Seps. "Speaker-adaptive speech recognition using speaker diarization for improved transcription of large spoken archives." Speech Communication 55, no. 10 (2013): 1033–46. http://dx.doi.org/10.1016/j.specom.2013.06.017.

31

Joglekar, Aditya, Ivan Lopez-Espejo, and John H. Hansen. "Fearless Steps APOLLO: Challenges in keyword spotting and topic detection for naturalistic audio streams." Journal of the Acoustical Society of America 153, no. 3_supplement (2023): A173. http://dx.doi.org/10.1121/10.0018566.

Abstract:
Fearless Steps (FS) APOLLO is a + 50,000 hr audio resource established by CRSS-UTDallas capturing all communications between NASA-MCC personnel, backroom staff, and Astronauts across manned Apollo Missions. Such a massive audio resource without metadata/unlabeled corpus provides limited benefit for communities outside Speech-and-Language Technology (SLT). Supplementing this audio with rich metadata developed using robust automated mechanisms to transcribe and highlight naturalistic communications can facilitate open research opportunities for SLT, speech sciences, education, and historical arc
32

Xiao, Bo, Chewei Huang, Zac E. Imel, David C. Atkins, Panayiotis Georgiou, and Shrikanth S. Narayanan. "A technology prototype system for rating therapist empathy from audio recordings in addiction counseling." PeerJ Computer Science 2 (April 20, 2016): e59. http://dx.doi.org/10.7717/peerj-cs.59.

Abstract:
Scaling up psychotherapy services such as for addiction counseling is a critical societal need. One challenge is ensuring quality of therapy, due to the heavy cost of manual observational assessment. This work proposes a speech technology-based system to automate the assessment of therapist empathy—a key therapy quality index—from audio recordings of the psychotherapy interactions. We designed a speech processing system that includes voice activity detection and diarization modules, and an automatic speech recognizer plus a speaker role matching module to extract the therapist’s language cues.
33

Kalanadhabhatta, Manasa, Mohammad Mehdi Rastikerdar, Tauhidur Rahman, Adam S. Grabell, and Deepak Ganesan. "Playlogue: Dataset and Benchmarks for Analyzing Adult-Child Conversations During Play." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, no. 4 (2024): 1–34. http://dx.doi.org/10.1145/3699775.

Abstract:
There has been growing interest in developing ubiquitous technologies to analyze adult-child speech in naturalistic settings such as free play in order to support children's social and academic development, language acquisition, and parent-child interactions. However, these technologies often rely on off-the-shelf speech processing tools that have not been evaluated on child speech or child-directed adult speech, whose unique characteristics might result in significant performance gaps when using models trained on adult speech. This work introduces the Playlogue dataset containing over 33 hour
34

Di Cesare, Michele Giuseppe, David Perpetuini, Daniela Cardone, and Arcangelo Merla. "Machine Learning-Assisted Speech Analysis for Early Detection of Parkinson’s Disease: A Study on Speaker Diarization and Classification Techniques." Sensors 24, no. 5 (2024): 1499. http://dx.doi.org/10.3390/s24051499.

Abstract:
Parkinson’s disease (PD) is a neurodegenerative disorder characterized by a range of motor and non-motor symptoms. One of the notable non-motor symptoms of PD is the presence of vocal disorders, attributed to the underlying pathophysiological changes in the neural control of the laryngeal and vocal tract musculature. From this perspective, the integration of machine learning (ML) techniques in the analysis of speech signals has significantly contributed to the detection and diagnosis of PD. Particularly, MEL Frequency Cepstral Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients
35

Yella, Sree Harsha, and Herve Bourlard. "Overlapping Speech Detection Using Long-Term Conversational Features for Speaker Diarization in Meeting Room Conversations." IEEE/ACM Transactions on Audio, Speech, and Language Processing 22, no. 12 (2014): 1688–700. http://dx.doi.org/10.1109/taslp.2014.2346315.

36

Ghorbani, Shahram, and John H. L. Hansen. "Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition." Journal of the Acoustical Society of America 155, no. 6 (2024): 3848–60. http://dx.doi.org/10.1121/10.0026235.

Abstract:
The ability to accurately classify accents and assess accentedness in non-native speakers are challenging tasks due primarily to the complexity and diversity of accent and dialect variations. In this study, embeddings from advanced pretrained language identification (LID) and speaker identification (SID) models are leveraged to improve the accuracy of accent classification and non-native accentedness assessment. Findings demonstrate that employing pretrained LID and SID models effectively encodes accent/dialect information in speech. Furthermore, the LID and SID encoded accent information comp
37

Anmella, Gerard, Michele De Prisco, Jeremiah B. Joyce, et al. "Automated Speech Analysis in Bipolar Disorder: The CALIBER Study Protocol and Preliminary Results." Journal of Clinical Medicine 13, no. 17 (2024): 4997. http://dx.doi.org/10.3390/jcm13174997.

Abstract:
Background: Bipolar disorder (BD) involves significant mood and energy shifts reflected in speech patterns. Detecting these patterns is crucial for diagnosis and monitoring, currently assessed subjectively. Advances in natural language processing offer opportunities to objectively analyze them. Aims: To (i) correlate speech features with manic-depressive symptom severity in BD, (ii) develop predictive models for diagnostic and treatment outcomes, and (iii) determine the most relevant speech features and tasks for these analyses. Methods: This naturalistic, observational study involved longitud
38

Zeulner, Tobias, Gerhard Johann Hagerer, Moritz Müller, Ignacio Vazquez, and Peter A. Gloor. "Predicting Individual Well-Being in Teamwork Contexts Based on Speech Features." Information 15, no. 4 (2024): 217. http://dx.doi.org/10.3390/info15040217.

Abstract:
Current methods for assessing individual well-being in team collaboration at the workplace often rely on manually collected surveys. This limits continuous real-world data collection and proactive measures to improve team member workplace satisfaction. We propose a method to automatically derive social signals related to individual well-being in team collaboration from raw audio and video data collected in teamwork contexts. The goal was to develop computational methods and measurements to facilitate the mirroring of individuals’ well-being to themselves. We focus on how speech behavior is per
39

Kaur, Sukhvinder, Chander Prabha, Ravinder Pal Singh, et al. "Optimized technique for speaker changes detection in multispeaker audio recording using pyknogram and efficient distance metric." PLOS ONE 19, no. 11 (2024): e0314073. http://dx.doi.org/10.1371/journal.pone.0314073.

Abstract:
Segmentation process is very popular in Speech recognition, word count, speaker indexing and speaker diarization process. This paper describes the speaker segmentation system which detects the speaker change point in an audio recording of multi speakers with the help of feature extraction and proposed distance metric algorithms. In this new approach, pre-processing of audio stream includes noise reduction, speech compression by using discrete wavelet transform (Daubechies wavelet ‘db40’ at level 2) and framing. It is followed by two feature extraction algorithms pyknogram and nonlinear energy
40

Delgado, Héctor, Anna Matamala, and Javier Serrano. "Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?" Cadernos de Tradução 35, no. 2 (2015): 308. http://dx.doi.org/10.5007/2175-7968.2015v35n2p308.

41

Diez, Mireia, Lukas Burget, Federico Landini, and Jan Cernocky. "Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 355–68. http://dx.doi.org/10.1109/taslp.2019.2955293.

42

Dawalatabad, Nauman, Srikanth Madikeri, C. Chandra Sekhar, and Hema A. Murthy. "Novel Architectures for Unsupervised Information Bottleneck Based Speaker Diarization of Meetings." IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021): 14–27. http://dx.doi.org/10.1109/taslp.2020.3036231.

43

O’Malley, Ronan, Bahman Mirhedari, Kirsty Harkness, et al. "055 The digital doctor: a fully automated stratification and monitoring system for patients with memory complaints." Journal of Neurology, Neurosurgery & Psychiatry 90, no. 12 (2019): A23.2–A23. http://dx.doi.org/10.1136/jnnp-2019-abn-2.76.

Abstract:
Introduction: Referrals to specialist memory clinics have increased out of proportion to the incidence of dementia. Time and financial pressures are consequently exerted on a service striving to deliver high quality care. We have developed a fully automated ‘Digital Doctor’ with the aim of providing pre-clinic risk stratification and ongoing monitoring for patients with memory concerns. Methods: We recruited 15 participants with Functional Memory Disorder (FMD), Mild Cognitive Impairment (MCI) and Alzheimer’s disease each as well as 15 healthy controls. Participants answered 12 questions posed by t
44

Ding, Huitong, Adrian Lister, Cody Karjadi, et al. "EARLY DETECTION OF ALZHEIMER’S DISEASE AND RELATED DEMENTIAS FROM VOICE RECORDINGS: THE FRAMINGHAM HEART STUDY." Innovation in Aging 7, Supplement_1 (2023): 1024. http://dx.doi.org/10.1093/geroni/igad104.3291.

Abstract:
With the aging global population and the increasing prevalence of dementia, there is a growing focus on identifying mild cognitive impairment (MCI), a pre-dementia state, to enable timely interventions that could potentially slow down neurodegeneration. Producing speech is a cognitively complex task that engages various cognitive domains, while the ease of audio data collection underscores the potential cost-effectiveness and noninvasiveness that voice may offer. This study aims to construct a machine learning pipeline that incorporates speaker diarization, feature extraction, feature
45

Praharaj, Sambit, Maren Scheffel, Marcel Schmitz, Marcus Specht, and Hendrik Drachsler. "Towards Automatic Collaboration Analytics for Group Speech Data Using Learning Analytics." Sensors 21, no. 9 (2021): 3156. http://dx.doi.org/10.3390/s21093156.

Abstract:
Collaboration is an important 21st Century skill. Co-located (or face-to-face) collaboration (CC) analytics gained momentum with the advent of sensor technology. Most of these works have used the audio modality to detect the quality of CC. The CC quality can be detected from simple indicators of collaboration such as total speaking time or complex indicators like synchrony in the rise and fall of the average pitch. Most studies in the past focused on “how group members talk” (i.e., spectral, temporal features of audio like pitch) and not “what they talk”. The “what” of the conversations is mor
46

Hershkovich, Leeor, Sabyasachi Bandyopadhyay, Jack Wittmayer, et al. "96 Proof of Principle: Can Paragraph Recall Pauses and Speech Frequencies Correctly Classify Cognitively Compromised Older Adults?" Journal of the International Neuropsychological Society 29, s1 (2023): 767–68. http://dx.doi.org/10.1017/s1355617723009530.

Abstract:
Objective: Recent research has found that machine learning based analysis of patient speech can be used to classify Alzheimer’s Disease. We know of no studies, however, which systematically explore the value of pausing events in speech for detecting cognitive limitations. Using retrospectively acquired voice data from paragraph memory tests, we created two types of pause features: a) the number and duration of pauses, and b) frequency components in speech immediately following pausing. Multiple machine learning models were used to assess how these features could effectively discriminate individ
47

McDonald, Margarethe, Taeahn Kwon, Hyunji Kim, Youngki Lee, and Eon-Suk Ko. "Evaluating the Language ENvironment Analysis System for Korean." Journal of Speech, Language, and Hearing Research 64, no. 3 (2021): 792–808. http://dx.doi.org/10.1044/2020_jslhr-20-00489.

Abstract:
Purpose: The algorithm of the Language ENvironment Analysis (LENA) system for calculating language environment measures was trained on American English; thus, its validity with other languages cannot be assumed. This article evaluates the accuracy of the LENA system applied to Korean. Method: We sampled sixty 5-min recording clips involving 38 key children aged 7–18 months from a larger data set. We establish the identification error rate, precision, and recall of LENA classification compared to human coders. We then examine the correlation between standard LENA measures of adult word count, chi
48

Kumar, Krishna. "Speaker Diarization: A Review." International Journal of Scientific Research in Engineering and Management 07, no. 06 (2023). http://dx.doi.org/10.55041/ijsrem24075.

Abstract:
Speaker diarization is the task of determining "who spoke when?" in an audio or video recording that contains an unknown amount of speech and an unknown number of speakers. It is a challenging task due to the variability of human speech, the presence of overlapping speech, and the lack of prior information about the speakers. It is the process of labeling a speech signal with labels corresponding to the identity of speakers. It is a crucial task in audio signal processing and speech analysis. A recent review of speaker diarization research since 2018 can be found in this paper which discusses
49

Xu, Sean Shensheng, Xiaoquan Ke, Man-Wai Mak, et al. "Speaker-turn aware diarization for speech-based cognitive assessments." Frontiers in Neuroscience 17 (January 16, 2024). http://dx.doi.org/10.3389/fnins.2023.1351848.

Abstract:
Introduction: Speaker diarization is an essential preprocessing step for diagnosing cognitive impairments from speech-based Montreal cognitive assessments (MoCA). Methods: This paper proposes three enhancements to the conventional speaker diarization methods for such assessments. The enhancements tackle the challenges of diarizing MoCA recordings on two fronts. First, multi-scale channel interdependence speaker embedding is used as the front-end speaker representation for overcoming the acoustic mismatch caused by far-field microphones. Specifically, a squeeze-and-excitation (SE) unit and channel-d
50

Sánchez Cárdenas, Roberto, and Marvin Coto-Jiménez. "Application of Fischer semi discriminant analysis for speaker diarization in Costa Rican radio broadcasts." Revista Tecnología en Marcha, November 16, 2022. http://dx.doi.org/10.18845/tm.v35i8.6464.

Abstract:
Automatic segmentation and classification of audio streams is a challenging problem with many applications, such as indexing multimedia digital libraries, information retrieval, and building speech corpora (or spoken corpora) for particular languages and accents. Such a corpus is a database of speech audio files and the corresponding text transcriptions. Among the several steps and tasks required for any of those applications, speaker diarization is one of the most relevant, because it aims to find boundaries in the audio recordings according to who speaks in each fragment. Spe