Journal articles on the topic 'Automated Speech Recognition'

To see the other types of publications on this topic, follow the link: Automated Speech Recognition.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Automated Speech Recognition.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Vucovich, Megan, Rami R. Hallac, Alex A. Kane, Julie Cook, Cortney Van't Slot, and James R. Seaward. "Automated cleft speech evaluation using speech recognition." Journal of Cranio-Maxillofacial Surgery 45, no. 8 (August 2017): 1268–71. http://dx.doi.org/10.1016/j.jcms.2017.05.002.

2

Manikandan, K., Apurva Singh, Sakshi Agarwal, and Ankita Singh. "Automated Scrolling Using Speech Recognition." International Journal of Technology 7, no. 1 (2017): 15. http://dx.doi.org/10.5958/2231-3915.2017.00004.9.

3

Smith, L. A., B. L. Scott, L. S. Lin, and J. M. Newell. "Automated training for speech recognition." Journal of the Acoustical Society of America 86, S1 (November 1989): S78. http://dx.doi.org/10.1121/1.2027652.

4

Koenecke, Allison, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, Zion Mengesha, Connor Toups, John R. Rickford, Dan Jurafsky, and Sharad Goel. "Racial disparities in automated speech recognition." Proceedings of the National Academy of Sciences 117, no. 14 (March 23, 2020): 7684–89. http://dx.doi.org/10.1073/pnas.1915768117.

Abstract:
Automated speech recognition (ASR) systems, which use sophisticated machine-learning algorithms to convert spoken language to text, have become increasingly widespread, powering popular virtual assistants, facilitating automated closed captioning, and enabling digital dictation platforms for health care. Over the last several years, the quality of these systems has dramatically improved, due both to advances in deep learning and to the collection of large-scale datasets used to train the systems. There is concern, however, that these tools do not work equally well for all subgroups of the population. Here, we examine the ability of five state-of-the-art ASR systems—developed by Amazon, Apple, Google, IBM, and Microsoft—to transcribe structured interviews conducted with 42 white speakers and 73 black speakers. In total, this corpus spans five US cities and consists of 19.8 h of audio matched on the age and gender of the speaker. We found that all five ASR systems exhibited substantial racial disparities, with an average word error rate (WER) of 0.35 for black speakers compared with 0.19 for white speakers. We trace these disparities to the underlying acoustic models used by the ASR systems as the race gap was equally large on a subset of identical phrases spoken by black and white individuals in our corpus. We conclude by proposing strategies—such as using more diverse training datasets that include African American Vernacular English—to reduce these performance differences and ensure speech recognition technology is inclusive.
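The word error rate (WER) reported above is the standard edit-distance metric for ASR output: the minimum number of word substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length. A minimal sketch (the transcripts are invented):

# Minimal WER sketch: Levenshtein distance over words, normalized by
# reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quack brown fox jumps"))  # 0.5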
5

Margolis, Robert H., Richard H. Wilson, George L. Saly, Heather M. Gregoire, and Brandon M. Madsen. "Automated Forced-Choice Tests of Speech Recognition." Journal of the American Academy of Audiology 32, no. 09 (October 2021): 606–15. http://dx.doi.org/10.1055/s-0041-1733964.

Abstract:
Purpose: This project was undertaken to develop automated tests of speech recognition, including a speech-recognition threshold (SRT) test and a word-recognition test, using forced-choice responses and computerized scoring of responses. Specific aims were (1) to develop an automated method for measuring SRT for spondaic words that produces scores in close agreement with average pure-tone thresholds and (2) to develop an automated test of word recognition that distinguishes listeners with normal hearing from those with sensorineural hearing loss and informs the hearing aid evaluation process.
Method: An automated SRT protocol was designed to converge on the lowest level at which the listener responds correctly to two out of two spondees presented monaurally. A word-recognition test was conducted with monosyllabic words (female speaker) presented monaurally at a fixed level. For each word, there were three rhyming foils, displayed on a touchscreen with the test word. The listeners touched the word they thought they heard. Participants were young listeners with normal hearing and listeners with sensorineural hearing loss. Words were also presented with nonrhyming foils and in an open-set paradigm. The open-set responses were scored by a graduate student research assistant.
Results: The SRT results agreed closely with the pure-tone average (PTA) obtained by automated audiometry. The agreement was similar to results obtained with the conventional SRT scoring method. Word-recognition scores were highest for the closed-set, nonrhyming lists and lowest for open-set responses. For the hearing loss participants, the scores varied widely. There was a moderate correlation between word-recognition scores and pure-tone thresholds, which increased as more high frequencies were brought into the PTA. Based on the findings of this study, a clinical protocol was designed that determines whether a listener's performance was in the normal range and whether the listener benefited from increasing the level of the stimuli.
Conclusion: SRTs obtained using the automated procedure are comparable to the results obtained by the conventional clinical method that is in common use. The automated closed-set word-recognition test results show clear differentiation between scores for the normal and hearing loss groups. These procedures provide clinical test results that are not dependent on the availability of an audiologist to perform the tests.
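One plausible reading of the convergence rule in the Method section (keep lowering the presentation level while the listener repeats two out of two spondees correctly) can be sketched as follows; present_spondees is a hypothetical callback, and the starting level, step size, and floor are illustrative values, not the study's protocol:

# Illustrative descending-level SRT search. present_spondees(level) is
# assumed to present two spondees at the given level (dB HL) and return
# how many the listener repeated correctly.
def estimate_srt(present_spondees, start_db=40, step_db=5, floor_db=-10):
    level, lowest_pass = start_db, None
    while level >= floor_db:
        if present_spondees(level) == 2:  # 2/2 correct: try a lower level
            lowest_pass = level
            level -= step_db
        else:                             # a miss ends the descent
            break
    return lowest_pass                    # lowest level passed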
6

Foltz, Peter W., Darrell Laham, and Marcia Derr. "Automated Speech Recognition for Modeling Team Performance." Proceedings of the Human Factors and Ergonomics Society Annual Meeting 47, no. 4 (October 2003): 673–77. http://dx.doi.org/10.1177/154193120304700402.

7

Townshend, Brent. "Automated language assessment using speech recognition modeling." Journal of the Acoustical Society of America 120, no. 6 (2006): 3451. http://dx.doi.org/10.1121/1.2409447.

8

Kuzmin, A., and S. Ivanov. "Speech to Text System for Noisy and Quiet Speech." Journal of Physics: Conference Series 2096, no. 1 (November 1, 2021): 012071. http://dx.doi.org/10.1088/1742-6596/2096/1/012071.

Abstract:
This paper examines one of the available and simple methods to develop speech recognition systems capable of recognizing speech from noisy or quiet recordings. Such systems improve the automated operation of call centers, and also bring us closer to creating speech recognition models capable of ignoring the speech deficiencies of speakers.
9

Patil, Vishakha. "Review on Automated Elevator-an Attentive Elevator to Elevate using Speech Recognition." Journal of Advanced Research in Power Electronics and Power Systems 08, no. 1&2 (August 6, 2021): 20–26. http://dx.doi.org/10.24321/2456.1401.202102.

Abstract:
The elevator has over time become an important part of our day-to-day life. It is used as an everyday transport device to move goods as well as people. In the modern world, cities and crowded areas require multi-storey buildings, and according to wheelchair access laws, elevators/lifts are a mandatory requirement in new multi-storey buildings. The main purpose of this project is to operate the elevator by voice command. Because the system is operated by voice, it could help people with disabilities or short stature travel from one floor to another without the help of any other person. A microcontroller is used to control the different devices and integrate each module, namely the voice module, the motor module, and the LCD, which displays the present status of the lift. The leading edge of our project is the voice recognition system, which generates excellent results while recognizing speech.
10

Barry, Timothy P., Kristen K. Liggett, David T. Williamson, and John M. Reising. "Enhanced Recognition Accuracy with the Simultaneous Use of Three Automated Speech Recognition Systems." Proceedings of the Human Factors Society Annual Meeting 36, no. 4 (October 1992): 288–92. http://dx.doi.org/10.1177/154193129203600406.

Abstract:
Two studies were performed to test the efficacy of using three different automated speech recognition devices in parallel to obtain speech recognition accuracies better than those produced by each of the individual systems alone. The first experiment compared the recognition accuracy of each of the three individual systems with the accuracy obtained by combining the data from all three systems using a simple “Majority Rules” algorithm. The second experiment made the same comparison, but used a more sophisticated algorithm developed using the performance data obtained from experiment 1. Results from the first experiment revealed a modest increase in speech recognition accuracy using all three systems in concert along with the Simple Majority Rules (SMR) algorithm. Results from the second experiment showed an even greater improvement in recognition accuracy using the three systems in concert and an Enhanced Majority Rules (EMR) algorithm. The implications of using intelligent software and multiple speech recognition devices to improve speech recognition accuracy are discussed.
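The 'Simple Majority Rules' combination described above amounts to a per-utterance vote across the three recognizers. A minimal sketch, where the sample hypotheses and the fallback to the first recognizer on a three-way disagreement are assumptions for illustration, not the paper's exact rule:

from collections import Counter

# Vote over three recognizer hypotheses for one utterance.
def majority_rules(hyp_a: str, hyp_b: str, hyp_c: str) -> str:
    best, votes = Counter([hyp_a, hyp_b, hyp_c]).most_common(1)[0]
    return best if votes >= 2 else hyp_a  # fallback: trust recognizer A

print(majority_rules("gear up", "gear up", "gear down"))  # "gear up"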
11

Kotkar, Vijay A., et al. "Interactive Robot for Automated Question and Answer System." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 1S (April 11, 2021): 90–95. http://dx.doi.org/10.17762/turcomat.v12i1s.1566.

Abstract:
This paper aims to develop an intelligent interactive robot with multiple functions that provides entertainment and companionship. To obtain information accurately, we have used speech recognition to perform the operations. The various speech recognition results are applied to the robot's behavior, planning, interactions with the voice assistant, and interactions with the user. The robot has a simple design. In this study, we have used a microphone for speech recognition. In addition, we have implemented room automation and a notice display using voice messages as part of the intelligent interaction between humans and robots.
12

Yu, Jing, Nianhua Ye, Xueqin Du, and Lu Han. "Automated English Speech Recognition Using Dimensionality Reduction with Deep Learning Approach." Wireless Communications and Mobile Computing 2022 (March 7, 2022): 1–11. http://dx.doi.org/10.1155/2022/3597347.

Abstract:
Speech recognition technology is a multidisciplinary field, comprising signal processing, pattern recognition, acoustics, artificial intelligence, etc. Presently, speech recognition plays a vital role in human-computer interfaces in information technology. Due to the advancement of deep learning (DL) models, speech recognition systems have received significant attention among researchers in several areas of speech recognition, such as mobile communication, voice recognition, and personal digital assistance. This paper presents an automated English speech recognition using dimensionality reduction and deep learning (AESR-DRDL) approach. The proposed AESR-DRDL technique involves a series of operations, namely feature extraction, preprocessing, dimensionality reduction, and speech recognition. During the feature extraction process, a hybridization of high-dimensional rich feature vectors is derived from the speech and glottal-waveform signals by the use of MFCC, PLPC, and MVDR techniques. The high dimensionality of the features is then reduced with a quasioppositional poor and rich optimization algorithm (QOPROA). Moreover, a Bidirectional Long Short-Term Memory (BiLSTM) network is employed for speech recognition, with its optimal hyperparameters tuned using the Adagrad optimizer. The performance of the AESR-DRDL technique was validated against benchmark datasets, and the results show better performance than recent approaches; the technique also proved superior in terms of recovery time, with an average of 0.50 days. Because of this, the AESR-DRDL approach can be used to recognize English speech.
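As a rough illustration of the recognition stage named in this abstract (a BiLSTM classifier trained with the Adagrad optimizer), here is a minimal Keras sketch; the feature dimension and class count are placeholders, not the paper's configuration:

import tensorflow as tf

N_FEATURES, N_CLASSES = 39, 50  # placeholder dimensions

# Minimal BiLSTM sequence classifier over variable-length frame sequences,
# compiled with the Adagrad optimizer mentioned in the abstract.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, N_FEATURES)),  # frames x features
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.01),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])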
13

Dirks, Ruthann, and Marvin J. Dirks. "Introducing Business Communication Students to Automated Speech Recognition." Journal of Education for Business 72, no. 3 (January 1997): 153–56. http://dx.doi.org/10.1080/08832323.1997.10116846.

14

Wharton, Cathleen, Monica Marics, and George Engelbeck. "Speech Recognition Vocabulary Scoping for Automated Call Routing." Proceedings of the Human Factors and Ergonomics Society Annual Meeting 37, no. 3 (October 1993): 240–43. http://dx.doi.org/10.1177/154193129303700306.

Abstract:
Call routing involves directing incoming telephone calls from a central number to an appropriate person or department. In the course of an ongoing work project, a quick study was performed to scope the vocabulary requirements for a speech recognition automated call routing application for a large department store. Forty-one participants were given 35 sample shopping tasks and were asked which department they would ask for when calling the store. The range of responses for a given task was large. With a 29 item recognition vocabulary consisting of most frequent responses and root phrases (e.g., “sport” for “sporting goods”), 57% of user responses would be covered. Users were also asked to rate the confidence of their department choice. The greater the variety of responses to a task across all participants, the less confident participants were of their responses.
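The reported coverage (57% of responses covered by a 29-item vocabulary that includes root phrases) corresponds to a simple frequency calculation; the response counts below are invented for illustration:

from collections import Counter

# A response is covered if it matches a vocabulary item exactly or begins
# with a root phrase (e.g., "sport" covers "sporting goods").
responses = Counter({"sporting goods": 12, "shoes": 9, "menswear": 4,
                     "sports": 3, "cosmetics": 2})
vocabulary = {"shoes", "menswear"}
roots = {"sport"}

covered = sum(n for resp, n in responses.items()
              if resp in vocabulary or any(resp.startswith(r) for r in roots))
print(covered / sum(responses.values()))  # fraction of responses covered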
15

Goldman, R. E., M. Sanchez-Hernandez, D. Ross-Degnan, J. D. Piette, C. M. Trinacty, and S. R. Simon. "Developing an automated speech-recognition telephone diabetes intervention." International Journal for Quality in Health Care 20, no. 4 (April 10, 2008): 264–70. http://dx.doi.org/10.1093/intqhc/mzn021.

16

Mönnich, G., and T. Wetter. "Requirements for Speech Recognition to Support Medical Documentation." Methods of Information in Medicine 39, no. 01 (2000): 63–69. http://dx.doi.org/10.1055/s-0038-1634252.

Abstract:
Recent advances in the development of automated speech recognition (ASR) have made routine applications for medical documentation possible. To achieve this, ASR has to be optimally integrated into the specific documentation scenario. The classification presented in this paper allows the definition of specification requirements. For two different documentation scenarios, the appropriate product selection has been done according to this classification. Two evaluation studies are presented, addressing the usefulness of applying automated speech recognition.
17

HimaBindu, Gottumukkala, Gondi Lakshmeeswari, Giddaluru Lalitha, and Pedalanka P. S. Subhashini. "Recognition Using DNN with Bacterial Foraging Optimization Using MFCC Coefficients." Journal Européen des Systèmes Automatisés 54, no. 2 (April 27, 2021): 283–87. http://dx.doi.org/10.18280/jesa.540210.

Abstract:
Speech is an important mode of communication for people. For a long time, researchers have been working hard to develop conversational machines that communicate through speech technology. Voice recognition is a part of a science called signal processing. Speech recognition is becoming more successful at providing user authentication, and user recognition is becoming more popular nowadays as a way of providing security by authenticating users. With the rising importance of automated information processing and telecommunications, the usefulness of recognizing an individual from the features of the user's voice is increasing. In this paper, the three stages of speech recognition processing are defined as pre-processing, feature extraction, and decoding. Speech comprehension has been significantly enhanced by using foreign languages. Automatic Speech Recognition (ASR) aims to translate speech to text. Speaker recognition is the method of recognizing an individual through his/her voice signals. A new speaker initially claims an identity for speaker authentication, and then the claimed model is used for identification. The identity claim is approved when the match is above a predefined threshold. The speech used for these tasks may be either text-dependent or text-independent. The article uses the Bacterial Foraging Optimization (BFO) algorithm for accurate speech recognition through a Mel Frequency Cepstral Coefficients (MFCC) model using a DNN. Speech recognition efficiency is compared to that of the conventional system.
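The MFCC front end used here (and in several other entries) is commonly computed with an off-the-shelf library; a minimal sketch, with an illustrative file name and coefficient count rather than the paper's settings:

import numpy as np
import librosa

# Extract 13 MFCCs per frame plus first-order deltas, a common DNN front end.
y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)
delta = librosa.feature.delta(mfcc)                 # first-order deltas
features = np.concatenate([mfcc, delta], axis=0)    # shape: (26, n_frames)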
18

Ivanko, D., and D. Ryumin. "A Novel Task-Oriented Approach toward Automated Lip-Reading System Implementation." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIV-2/W1-2021 (April 15, 2021): 85–89. http://dx.doi.org/10.5194/isprs-archives-xliv-2-w1-2021-85-2021.

Abstract:
Visual information plays a key role in automatic speech recognition (ASR) when audio is corrupted by background noise or even inaccessible. Speech recognition using visual information is called lip-reading. The initial idea of visual speech recognition comes from humans' experience: we are able to recognize spoken words from the observation of a speaker's face without, or with limited, access to the sound part of the voice. Based on the conducted experimental evaluations, as well as on an analysis of the research field, we propose a novel task-oriented approach towards practical lip-reading system implementation. Its main purpose is to be a kind of roadmap for researchers who need to build a reliable visual speech recognition system for their task. In a rough approximation, we can divide the task of lip-reading into two parts, depending on the complexity of the problem: first, recognizing isolated words, numbers, or small phrases (e.g., telephone numbers with a strict grammar, or keywords); second, recognizing continuous speech (phrases or sentences). All these stages are disclosed in detail in this paper. Based on the proposed approach, we implemented from scratch automatic visual speech recognition systems of three different architectures: GMM-CHMM, DNN-HMM, and purely end-to-end. A description of the methodology, tools, step-by-step development, and all necessary parameters is disclosed in detail in the current paper. It is worth noting that for Russian speech recognition, such systems were created for the first time.
19

Saunders, Gabrielle H. "A fully automated response time test of speech recognition." Journal of the Acoustical Society of America 92, no. 4 (October 1992): 2385. http://dx.doi.org/10.1121/1.404790.

20

Yeracaris, Yoryos. "Automated Speech Recognition Proxy System for Natural Language Understanding." Journal of the Acoustical Society of America 135, no. 1 (2014): 573. http://dx.doi.org/10.1121/1.4861513.

21

Erler, Kevin, and George H. Freeman. "An articulatory‐feature‐based HMM for automated speech recognition." Journal of the Acoustical Society of America 95, no. 5 (May 1994): 2873. http://dx.doi.org/10.1121/1.409444.

22

Illner, Vojtěch, Tereza Tykalová, Michal Novotný, Jiří Klempíř, Petr Dušek, and Jan Rusz. "Toward Automated Articulation Rate Analysis via Connected Speech in Dysarthrias." Journal of Speech, Language, and Hearing Research 65, no. 4 (April 4, 2022): 1386–401. http://dx.doi.org/10.1044/2021_jslhr-21-00549.

Abstract:
Purpose: This study aimed to evaluate the reliability of different approaches for estimating the articulation rates in connected speech of Parkinsonian patients with different stages of neurodegeneration compared to healthy controls.
Method: Monologues and reading passages were obtained from 25 patients with idiopathic rapid eye movement sleep behavior disorder (iRBD), 25 de novo patients with Parkinson's disease (PD), 20 patients with multiple system atrophy (MSA), and 20 healthy controls. The recordings were subsequently evaluated using eight syllable localization algorithms, and their performances were compared to a manual transcript used as a reference.
Results: The Google & Pyphen method, based on automatic speech recognition followed by hyphenation, outperformed the other approaches (automated vs. hand transcription: r > .87 for monologues and r > .91 for reading passages, p < .001) in precise feature estimates and resilience to dysarthric speech. The Praat script algorithm achieved sufficient robustness (automated vs. hand transcription: r > .65 for monologues and r > .78 for reading passages, p < .001). Compared to the control group, we detected a slow rate in patients with MSA and a tendency toward a slower rate in patients with iRBD, whereas the articulation rate was unchanged in patients with early untreated PD.
Conclusions: The state-of-the-art speech recognition tool provided the most precise articulation rate estimates. If a speech recognizer is not accessible, the freely available Praat script based on simple intensity thresholding might still provide robust properties even in severe dysarthria. Automated articulation rate assessment may serve as a natural, inexpensive biomarker for monitoring disease severity and a differential diagnosis of Parkinsonism.
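The best-performing approach above (ASR followed by hyphenation) reduces to counting syllable nuclei in the transcript and dividing by speech time. A minimal sketch with the Pyphen library, where the transcript and speech duration are placeholders and the ASR call itself is omitted:

import pyphen

# Approximate syllables per word by hyphenation points, as in the
# ASR-plus-hyphenation scheme described above.
dic = pyphen.Pyphen(lang="en_US")

def syllables(word: str) -> int:
    return dic.inserted(word).count("-") + 1

transcript = "the patient read the passage slowly"  # would come from ASR
speech_seconds = 2.4                                # pause-free speech time
rate = sum(syllables(w) for w in transcript.split()) / speech_seconds
print(round(rate, 2))  # articulation rate in syllables per second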
23

Harikant, Shashidhar, Rakshitha Prasad, Vijaya Lakshmi R, and Sidhramappa H. "Speech Emotion Recognition Using Deep Learning." International Research Journal of Computer Science 9, no. 8 (August 13, 2022): 267–71. http://dx.doi.org/10.26562/irjcs.2022.v0908.22.

Abstract:
Speech emotion recognition (SER) is a current research topic since it has a wide range of applications, and it is a vital part of effective human interaction in speech processing. SER is a domain that has grown rapidly in recent years. Unlike humans, machines lack the ability to perceive and express emotions, but human-computer interaction can be improved by automated SER, thereby reducing the need for human mediation. The primary goal of SER is to improve the man-machine interface. This paper uses deep learning to train the model and Librosa to process the audio data; a CNN is used for classification based on frequency parameters. The paper also covers the recognition of various emotion classes, such as happy, sad, angry, disgust, surprise, and fear.
24

Diaz-Asper, Catherine, Chelsea Chandler, R. Scott Turner, Brigid Reynolds, and Brita Elvevåg. "Acceptability of collecting speech samples from the elderly via the telephone." DIGITAL HEALTH 7 (January 2021): 205520762110021. http://dx.doi.org/10.1177/20552076211002103.

Abstract:
Objective: There is a critical need to develop rapid, inexpensive and easily accessible screening tools for mild cognitive impairment (MCI) and Alzheimer's disease (AD). We report on the efficacy of collecting speech via the telephone to subsequently develop sensitive metrics that may be used as potential biomarkers by leveraging natural language processing methods.
Methods: Ninety-one older individuals who were cognitively unimpaired or diagnosed with MCI or AD participated from home in an audio-recorded telephone interview, which included a standard cognitive screening tool and the collection of speech samples. In this paper we address six questions of interest: (1) Will elderly people agree to participate in a recorded telephone interview? (2) Will they complete it? (3) Will they judge it an acceptable approach? (4) Will the speech that is collected over the telephone be of a good quality? (5) Will the speech be intelligible to human raters? (6) Will transcriptions produced by automated speech recognition accurately reflect the speech produced?
Results: Participants readily agreed to participate in the telephone interview, completed it in its entirety, and rated the approach as acceptable. Good quality speech was produced for further analyses to be applied, and almost all recorded words were intelligible for human transcription. Not surprisingly, human transcription outperformed off-the-shelf automated speech recognition software, but further investigation into automated speech recognition shows promise for its usability in future work.
Conclusion: Our findings demonstrate that collecting speech samples from elderly individuals via the telephone is well tolerated, practical, and inexpensive, and produces good quality data for uses such as natural language processing.
25

Pekarskikh, Svetlana, Evgeny Kostyuchenko, and Lidiya Balatskaya. "Evaluation of Speech Quality Through Recognition and Classification of Phonemes." Symmetry 11, no. 12 (November 25, 2019): 1447. http://dx.doi.org/10.3390/sym11121447.

Abstract:
This paper discusses an approach for assessing the quality of speech while undergoing speech rehabilitation. One of the main reasons for speech quality decrease during the surgical treatment of vocal tract diseases is the loss of the vocal tract's parts and the disruption of its symmetry. In particular, one of the most common oncological diseases of the oral cavity is cancer of the tongue. During surgical treatment, a glossectomy is performed, which leads to the need for speech rehabilitation to eliminate the occurring speech defects, leading to a decrease in speech intelligibility. In this paper, we present an automated approach for conducting the speech quality evaluation. The approach relies on a convolutional neural network (CNN). The main idea of the approach is to train an individual neural network for a patient before having an operation to recognize typical sounding of phonemes for their speech. The neural network will thereby be able to evaluate the similarity between the patient's speech before and after the surgery. The recognition based on the full phoneme set and the recognition by groups of phonemes were considered. The correspondence of assessments obtained through the autorecognition approach with those from the human-based approach is shown. The automated approach is principally applicable to defining boundaries between phonemes. The paper shows that iterative training of the neural network and continuous updating of the training dataset gradually improve the ability of the CNN to define boundaries between different phonemes.
26

Alqaralleh, Bassam A. Y., Fahad Aldhaban, Feras Mohammed A-Matarneh, and Esam A. AlQaralleh. "Automated Handwriting Recognition and Speech Synthesizer for Indigenous Language Processing." Computers, Materials & Continua 72, no. 2 (2022): 3913–27. http://dx.doi.org/10.32604/cmc.2022.026531.

27

Farkhadov, Mais Pasha. "Speech recognition in the automated queuing service systems for users." SPIIRAS Proceedings 4, no. 19 (March 17, 2014): 65. http://dx.doi.org/10.15622/sp.19.4.

28

Johannes, R. S., and D. L. Carr-Locke. "The Role of Automated Speech Recognition in Endoscopic Data Collection." Endoscopy 24, S 2 (July 1992): 493–98. http://dx.doi.org/10.1055/s-2007-1010528.

29

Deller, J. R., D. Hsu, and L. J. Ferrier. "Encouraging results in the automated recognition of cerebral palsy speech." IEEE Transactions on Biomedical Engineering 35, no. 3 (March 1988): 218–20. http://dx.doi.org/10.1109/10.1366.

30

Gardner, Daryle Jean, David DeFruiter, Mark Keith, Mike Kline, Michael Dresel, and Robert Knapp. "Automated Speech Recognition as a Function of Formal Speech Training and Passage of Time between Template Training and Testing." Proceedings of the Human Factors Society Annual Meeting 29, no. 10 (October 1985): 937–41. http://dx.doi.org/10.1177/154193128502901008.

Abstract:
The purpose of the present study was to assess the longevity of templates trained using a speaker-dependent voice recognition system, and to determine whether recognition varies with the degree of formal speech training. Results indicate that vocabulary recognition is fairly stable over a two-month period, and that subjects with formal voice training do not appreciably differ in performance from novice speakers. However, experience with the voice recognition system did result in improved recognition performance for both trained and novice speakers.
31

Alasadi, A. A., T. H. Aldhayni, R. R. Deshmukh, A. H. Alahmadi, and A. S. Alshebami. "Efficient Feature Extraction Algorithms to Develop an Arabic Speech Recognition System." Engineering, Technology & Applied Science Research 10, no. 2 (April 4, 2020): 5547–53. http://dx.doi.org/10.48084/etasr.3465.

Abstract:
This paper studies three feature extraction methods, Mel-Frequency Cepstral Coefficients (MFCC), Power-Normalized Cepstral Coefficients (PNCC), and Modified Group Delay Function (ModGDF) for the development of an Automated Speech Recognition System (ASR) in Arabic. The Support Vector Machine (SVM) algorithm processed the obtained features. These feature extraction algorithms extract speech or voice characteristics and process the group delay functionality calculated straight from the voice signal. These algorithms were deployed to extract audio forms from Arabic speakers. PNCC provided the best recognition results in Arabic speech in comparison with the other methods. Simulation results showed that PNCC and ModGDF were more accurate than MFCC in Arabic speech recognition.
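The classification stage described above (an SVM over extracted cepstral features) can be sketched with scikit-learn; the random matrix below stands in for precomputed per-utterance MFCC or PNCC feature vectors:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))    # placeholder: 200 utterances x 13 features
y = rng.integers(0, 5, size=200)  # placeholder: 5 word classes

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
clf = SVC(kernel="rbf", C=1.0)    # RBF-kernel support vector classifier
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))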
32

Bai, Yunling. "Pronunciation Tutor for Deaf Children based on ASR." Highlights in Science, Engineering and Technology 24 (December 27, 2022): 119–24. http://dx.doi.org/10.54097/hset.v24i.3903.

Abstract:
ASR (Automated Speech Recognition) is a technology that converts human speech into text. Speech recognition, a multidisciplinary field, is closely related to acoustics, phonetics, linguistics, digital signal processing theory, information theory, computer science, and other disciplines. ASR is now applied in educational technology, such as the education of deaf children. This paper previews a project for a computer-aided tutor for deaf children's instruction based on speech recognition technology. The tutor utilizes three effective models and is combined with data mining technology. Two evaluation approaches and an overview of the embedded experiment are also detailed in this paper.
33

Pisoni, David B., and Howard C. Nusbaum. "Developing Methods for Assessing the Performance of Speech Synthesis and Recognition Systems." Proceedings of the Human Factors Society Annual Meeting 30, no. 13 (September 1986): 1344–48. http://dx.doi.org/10.1177/154193128603001324.

Abstract:
As speech I/O technology develops and improves, there is an increased need for standardized methods to systematically assess the performance of these systems. At the present time, speech synthesis and speech recognition technologies are at different levels of maturation and, accordingly, the procedures for testing the performance of these systems are at different stages of development. In the present paper, we describe the results of testing several text-to-speech systems using traditional intelligibility measures. In addition, we outline the design and philosophy of an automated testing procedure for measuring the performance of isolated utterance speaker-dependent speech recognition systems.
34

Hicks, William T., and Robert E. Yantorno. "Determining the threshold for usable speech within co‐channel speech with the SPHINX automated speech recognition system." Journal of the Acoustical Society of America 116, no. 4 (October 2004): 2480. http://dx.doi.org/10.1121/1.4784905.

35

Pyataeva, Anna, and Anton Dzyuba. "Artificial neural network technology for lips reading." E3S Web of Conferences 333 (2021): 01009. http://dx.doi.org/10.1051/e3sconf/202133301009.

Abstract:
The paper presents the use of neural networks for the task of automated speech reading from lip articulation. Speech recognition is performed in two stages. First, a face search is performed and the lip area is selected in a separate frame of the video sequence using Haar features. Then the sequence of frames goes to the input of deep-learning convolutional and recurrent neural networks for speech viseme recognition. Experimental studies were carried out using independently obtained videos of Russian-speaking speakers.
36

Omazić, Marija, and Martina Lekić. "Assessing speech-to-speech translation quality: Case study of the ILA S2S app." Hieronymus : Časopis za istraživanja prevođenja i terminologije 8 (2022): 1–26. http://dx.doi.org/10.17234/hieronymus.8.1.

Abstract:
Machine translation (MT) is becoming qualitatively more successful and quantitatively more productive at an unprecedented pace. It is becoming a widespread solution to the challenges of a constantly rising demand for quick and affordable translations of both text and speech, causing disruption and adjustments of the translation practice and profession, but at the same time making multilingual communication easier than ever before. This paper focuses on the speech-to-speech (S2S) translation app Instant Language Assistant (ILA), which brings together the state-of-the-art translation technology: automatic speech recognition, machine translation and text-to-speech synthesis, and allows for MT-mediated multilingual communication. The aim of the paper is to assess the quality of translations of conversational language produced by the S2S translation app ILA for en-de and en-hr language pairs. The research includes several levels of translation quality analysis: human translation quality assessment by translation experts using the Fluency/Adequacy Metrics, light-post editing, and automated MT evaluation (BLEU). Moreover, the translation output is assessed with respect to language pairs to get an insight into whether they affect the MT output quality and how. The results show a relatively high quality of translations produced by the S2S translation app ILA across all assessment models and a correlation between human and automated assessment results.
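The automated evaluation mentioned above (BLEU) is standardly computed with a library such as sacreBLEU; a minimal sketch with invented sentences:

import sacrebleu

# Corpus-level BLEU between MT hypotheses and reference translations.
hypotheses = ["the meeting starts at nine"]
references = [["the meeting begins at nine"]]  # one reference stream
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)  # 0-100 scale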
37

Schnoor, Tyler T., Matthew C. Kelley, and Benjamin V. Tucker. "Automated accent rating using deep neural networks." Journal of the Acoustical Society of America 150, no. 4 (October 2021): A357. http://dx.doi.org/10.1121/10.0008581.

Abstract:
Automated accentedness rating has the potential to improve many human-computer interactions involving speech, including the adaptation of automatic speech recognition or other artificial intelligence models to the speaker's accent. Accent ratings may also be used as a metric by which language learners can quantify their progress. This study employs bidirectional long short-term memory layers in a neural network to predict human ratings of the accentedness of recorded speech. Speech data are extracted in 5-s segments from over 2000 first- and second-language English speakers from multiple corpora. Human ratings are obtained in an online experiment where participants rate the accentedness of a given speech recording on a 9-point Likert scale. Mel-frequency cepstral coefficients and mel-filterbank energy features are tested as speech input representations for the neural network. When inference is tested using 10-fold cross validation, the mean correlation between the model’s predictions and human ratings is high (r = 0.74). While previous methods attained a similar correlation by automatically comparing speech that has been transcribed [Wieling et al., Lang. Dyn. Chang. 4, 253–269 (2014)] or by making accent-specific Gaussian mixture models [Cheng et al., Interspeech 2013 (2013), pp. 2574–2578], the present model requires no transcription and can perform accent-general inference.
38

Werner, Lauren, Gaojian Huang, and Brandon J. Pitts. "Automated Speech Recognition Systems and Older Adults: A Literature Review and Synthesis." Proceedings of the Human Factors and Ergonomics Society Annual Meeting 63, no. 1 (November 2019): 42–46. http://dx.doi.org/10.1177/1071181319631121.

Abstract:
The number of older adults is growing significantly worldwide. At the same time, technological developments are rapidly evolving, and older populations are expected to interact more frequently with such sophisticated systems. Automated speech recognition (ASR) systems are an example of one technology that is increasingly present in daily life. However, age-related physical changes may alter speech production and limit the effectiveness of ASR systems for older individuals. The goal of this paper was to summarize the current knowledge on ASR systems and older adults. The PRISMA method was employed and 17 studies were compared on the basis of word error rate (WER). Overall, WER was found to be influenced by age, gender, and the number of speech samples used to train ASR systems. This work has implications for the development of future human-machine technologies that will be used by a wide range of age groups.
39

Hill, David R., Andrew Pearce, and Brian Wyvill. "Animating speech: an automated approach using speech synthesised by rules." Visual Computer 3, no. 5 (March 1988): 277–89. http://dx.doi.org/10.1007/bf01914863.

40

Carrier, Michael. "Automated Speech Recognition in language learning: Potential models, benefits and impact." Training Language and Culture 1, no. 1 (February 2017): 46–61. http://dx.doi.org/10.29366/2017tlc.1.1.3.

41

Yasin, Sana, Umar Draz, Tariq Ali, Kashaf Shahid, Amna Abid, Rukhsana Bibi, Muhammad Irfan, et al. "Automated Speech Recognition System to Detect Babies’ Feelings through Feature Analysis." Computers, Materials & Continua 73, no. 2 (2022): 4349–67. http://dx.doi.org/10.32604/cmc.2022.028251.

42

ZHANG, Bin, Akio FUNAKUBO, and Yasuhiro FUKUI. "Research on an Automated Speech Recognition System Based on Lip Movement." Journal of Life Support Engineering 10, no. 3 (1998): 106–10. http://dx.doi.org/10.5136/lifesupport.10.106.

43

Nevatia, Rishabh. "Lip Reading: Delving into Deep Learning." International Journal for Research in Applied Science and Engineering Technology 9, no. 9 (September 30, 2021): 1555–61. http://dx.doi.org/10.22214/ijraset.2021.38216.

Abstract:
Lip reading is the visual task of interpreting phrases from lip movements. While speech is one of the most common ways of communicating among individuals, understanding what a person wants to convey with access only to their lip movements is, to date, a task that has not been fully solved. Various stages are involved in the process of automated lip reading, ranging from the extraction of features to applying neural networks. This paper covers various deep learning approaches that are used for lip reading.
Keywords: Automatic Speech Recognition, Lip Reading, Neural Networks, Feature Extraction, Deep Learning
44

Kaur, Gurpreet, Mohit Srivastava, and Amod Kumar. "Speaker and Speech Recognition using Deep Neural Network." International Journal of Emerging Research in Management and Technology 6, no. 8 (June 25, 2018): 118. http://dx.doi.org/10.23956/ijermt.v6i8.126.

Abstract:
In command and control applications, the feature extraction process is very important for good accuracy and short learning time. In order to address these metrics, we have proposed an automated combined speaker and speech recognition technique. In this paper, five isolated words are recorded from four speakers: two males and two females. We have used the Mel Frequency Cepstral Coefficient (MFCC) feature extraction method with a Genetic Algorithm to optimize the extracted features and generate an appropriate feature set. In the first phase, feature extraction using MFCC is executed, followed by feature optimization using the Genetic Algorithm; in the third and final phase, training is conducted using a Deep Neural Network. Finally, evaluation and validation of the proposed work model are done in a real environment. To check the efficiency of the proposed work, we have calculated parameters like accuracy, precision rate, recall rate, sensitivity, and specificity.
45

Sapiński, Tomasz, Dorota Kamińska, Adam Pelikant, and Gholamreza Anbarjafari. "Emotion Recognition from Skeletal Movements." Entropy 21, no. 7 (June 29, 2019): 646. http://dx.doi.org/10.3390/e21070646.

Abstract:
Automatic emotion recognition has become an important trend in many artificial intelligence (AI) based applications and has been widely explored in recent years. Most research in the area of automated emotion recognition is based on facial expressions or speech signals. Although the influence of the emotional state on body movements is undeniable, this source of expression is still underestimated in automatic analysis. In this paper, we propose a novel method to recognise seven basic emotional states, namely happy, sad, surprise, fear, anger, disgust and neutral, utilising body movement. We analyse motion capture data under seven basic emotional states recorded by professional actors/actresses using a Microsoft Kinect v2 sensor. We propose a new representation of affective movements, based on sequences of body joints. The proposed algorithm creates a sequential model of affective movement based on low-level features inferred from the spatial location and the orientation of joints within the tracked skeleton. In the experimental results, different deep neural networks were employed and compared to recognise the emotional state of the acquired motion sequences. The experimental results conducted in this work show the feasibility of automatic emotion recognition from sequences of body gestures, which can serve as an additional source of information in multimodal emotion recognition.
46

Cho, Sunghye, Naomi Nevler, Sanjana Shellikeri, Natalia Parjane, David J. Irwin, Neville Ryant, Sharon Ash, Christopher Cieri, Mark Liberman, and Murray Grossman. "Lexical and Acoustic Characteristics of Young and Older Healthy Adults." Journal of Speech, Language, and Hearing Research 64, no. 2 (February 17, 2021): 302–14. http://dx.doi.org/10.1044/2020_jslhr-19-00384.

Abstract:
Purpose: This study examines the effect of age on language use with an automated analysis of digitized speech obtained from semistructured, narrative speech samples.
Method: We examined the Cookie Theft picture descriptions produced by 37 older and 76 young healthy participants. Using modern natural language processing and automatic speech recognition tools, we automatically annotated part-of-speech categories of all tokens, calculated the number of tense-inflected verbs, mean length of clause, and vocabulary diversity, and we rated nouns and verbs for five lexical features: word frequency, familiarity, concreteness, age of acquisition, and semantic ambiguity. We also segmented the speech signals into speech and silence and calculated acoustic features, such as total speech time, mean speech and pause segment durations, and pitch values.
Results: Older speakers produced significantly more fillers, pronouns, and verbs and fewer conjunctions, determiners, nouns, and prepositions than young participants. Older speakers' nouns and verbs were more familiar, more frequent (verbs only), and less ambiguous compared to those of young speakers. Older speakers produced shorter clauses with a lower vocabulary diversity than young participants. They also produced shorter speech segments and longer pauses with increased total speech time and total number of words. Lastly, we observed an interaction of age and sex in pitch ranges.
Conclusions: Our results suggest that older speakers' lexical content is less diverse, and these speakers produce shorter clauses than young participants in monologic, narrative speech. Our findings show that lexical and acoustic characteristics of semistructured speech samples can be examined with automated methods.
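The automated part-of-speech annotation step described above is typically done with an off-the-shelf tagger; a minimal NLTK sketch (the sentence alludes to the Cookie Theft picture but is invented here, and the tagger models must be downloaded once):

from collections import Counter
import nltk

# One-time model downloads (uncomment on first run):
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

tokens = nltk.word_tokenize("the boy is taking a cookie from the jar")
pos_counts = Counter(tag for _, tag in nltk.pos_tag(tokens))
print(pos_counts)  # counts of determiners (DT), nouns (NN), verbs (VBZ/VBG), etc.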
47

Schulz, Klaus U., and Tomek Mikołajewski. "Between finite state and Prolog: constraint-based automata for efficient recognition of phrases." Natural Language Engineering 2, no. 4 (December 1996): 365–66. http://dx.doi.org/10.1017/s1351324997001630.

Abstract:
In computational linguistics, efficient recognition of phrases is an important prerequisite for many ambitious goals, such as automated extraction of terminology, part-of-speech disambiguation, and automated translation. If one wants to recognize a certain well-defined set of phrases, the question of which type of computational device to use for this task arises. For sets of phrases that are not too complex, as well as for many subtasks of the recognition process, finite state methods are appropriate and favourable because of their efficiency (Gross and Perrin 1989; Silberztein 1993; Tapanainen 1995). However, if very large sets of possibly complex phrases are considered where correct resolution of grammatical structure requires morphological analysis (e.g. verb argument structure, extraposition of relative clauses, etc.), then the design and implementation of an appropriate finite state automaton might turn out to be infeasible in practice due to the immense number of morphological variants to be captured.
48

Al-Aynati, Maamoun M., and Katherine A. Chorneyko. "Comparison of Voice-Automated Transcription and Human Transcription in Generating Pathology Reports." Archives of Pathology & Laboratory Medicine 127, no. 6 (June 1, 2003): 721–25. http://dx.doi.org/10.5858/2003-127-721-covtah.

Abstract:
Context: Software that can convert spoken words into written text has been available since the early 1980s. Early continuous speech systems were developed in 1994, with the latest commercially available editions having a claimed accuracy of up to 98% of speech recognition at natural speech rates.
Objectives: To evaluate the efficacy of one commercially available voice-recognition software system with pathology vocabulary in generating pathology reports and to compare this with human transcription. To draw cost analysis conclusions regarding human versus computer-based transcription.
Design: Two hundred six routine pathology reports from the surgical pathology material handled at St Joseph's Healthcare, Hamilton, Ontario, were generated simultaneously using computer-based transcription and human transcription. The following hardware and software were used: a desktop 450-MHz Intel Pentium III processor with 192 MB of RAM, a speech-quality sound card (Sound Blaster), noise-canceling headset microphone, and IBM ViaVoice Pro version 8 with pathology vocabulary support (Voice Automated, Huntington Beach, Calif). The cost of the hardware and software used was approximately Can $2250.
Results: A total of 23 458 words were transcribed using both methods with a mean of 114 words per report. The mean accuracy rate was 93.6% (range, 87.4%–96%) using the computer software, compared to a mean accuracy of 99.6% (range, 99.4%–99.8%) for human transcription (P < .001). Time needed to edit documents by the primary evaluator (M.A.) using the computer was on average twice that needed for editing the documents produced by human transcriptionists (range, 1.4–3.5 times). The extra time needed to edit documents was 67 minutes per week (13 minutes per day).
Conclusions: Computer-based continuous speech-recognition systems in pathology can be successfully used in pathology practice even during the handling of gross pathology specimens. The relatively low accuracy rate of this voice-recognition software, with a resultant increased editing burden on pathologists, may not encourage its application on a wide scale in pathology departments with sufficient human transcription services, despite significant potential financial savings. However, computer-based transcription represents an attractive and relatively inexpensive alternative to human transcription in departments where there is a shortage of transcription services, and will no doubt become more commonly used in pathology departments in the future.
49

Toth, Laszlo, Ildiko Hoffmann, Gabor Gosztolya, Veronika Vincze, Greta Szatloczki, Zoltan Banreti, Magdolna Pakaski, and Janos Kalman. "A Speech Recognition-based Solution for the Automatic Detection of Mild Cognitive Impairment from Spontaneous Speech." Current Alzheimer Research 15, no. 2 (January 3, 2018): 130–38. http://dx.doi.org/10.2174/1567205014666171121114930.

Abstract:
Background: Even today the reliable diagnosis of the prodromal stages of Alzheimer's disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive decline in mild cognitive impairment (MCI). Since the presence of language impairment has been reported even in the mild stage of AD, the aim of this study is to develop a sensitive neuropsychological screening method which is based on the analysis of spontaneous speech production during the performance of a memory task. In the future, this can form the basis of an Internet-based interactive screening software for the recognition of MCI.
Methods: Participants were 38 healthy controls and 48 clinically diagnosed MCI patients. We provoked spontaneous speech by asking the patients to recall the content of 2 short black-and-white films (one direct, one delayed), and to answer one question. Acoustic parameters (hesitation ratio, speech tempo, length and number of silent and filled pauses, length of utterance) were extracted from the recorded speech signals, first manually (using the Praat software), and then automatically, with an automatic speech recognition (ASR) based tool. First, the extracted parameters were statistically analyzed. Then we applied machine learning algorithms to see whether the MCI and the control group can be discriminated automatically based on the acoustic features.
Results: The statistical analysis showed significant differences for most of the acoustic parameters (speech tempo, articulation rate, silent pause, hesitation ratio, length of utterance, pause-per-utterance ratio). The most significant differences between the two groups were found in the speech tempo in the delayed recall task, and in the number of pauses for the question-answering task. The fully automated version of the analysis process, that is, using the ASR-based features in combination with machine learning, was able to separate the two classes with an F1-score of 78.8%.
Conclusion: The temporal analysis of spontaneous speech can be exploited in implementing a new, automatic detection-based tool for screening MCI for the community.
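Temporal parameters like those above (silent pauses, speech tempo) can be approximated with energy-based silence detection; a minimal librosa sketch, with an illustrative file name and threshold rather than the study's settings:

import librosa

# Energy-based speech/silence segmentation as a rough analogue of the
# temporal parameters described above.
y, sr = librosa.load("monologue.wav", sr=16000)
intervals = librosa.effects.split(y, top_db=30)  # non-silent [start, end] spans
speech_s = sum(end - start for start, end in intervals) / sr
total_s = len(y) / sr
print("pause ratio:", 1 - speech_s / total_s)
print("mean speech segment (s):", speech_s / max(len(intervals), 1))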
50

Modi, Rohan. "Transcript Anatomization with Multi-Linguistic and Speech Synthesis Features." International Journal for Research in Applied Science and Engineering Technology 9, no. VI (June 20, 2021): 1755–58. http://dx.doi.org/10.22214/ijraset.2021.35371.

Abstract:
Handwriting detection is the process or ability of a computer program to collect and analyze comprehensible input that is written by hand from various types of media, such as photographs, newspapers, and paper reports. Handwritten text recognition is a sub-discipline of pattern recognition, which refers to the classification of datasets or objects into various categories or classes. Handwriting recognition is the process of transforming a handwritten text in a specific language into its digitally expressible script, represented by a set of icons known as letters or characters. Speech synthesis is the artificial production of human speech using machine learning based software and audio-output computer hardware. While there are many systems which convert normal language text into speech, the aim of this paper is to study optical character recognition with speech synthesis technology and to develop a cost-effective, user-friendly, image-based offline text-to-speech conversion system using a CRNN neural network model and a Hidden Markov Model. The automated interpretation of text that has been written by hand can be very useful in various instances where the processing of great amounts of handwritten data is required, such as signature verification, analysis of various types of documents, and recognition of amounts written by hand on bank cheques.
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!
