Academic literature on the topic 'Automated Speech Recognition'

Create an accurate reference in APA, MLA, Chicago, Harvard, and other citation styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Automated Speech Recognition.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Automated Speech Recognition"

1

Vucovich, Megan, Rami R. Hallac, Alex A. Kane, Julie Cook, Cortney Van't Slot, and James R. Seaward. "Automated cleft speech evaluation using speech recognition." Journal of Cranio-Maxillofacial Surgery 45, no. 8 (August 2017): 1268–71. http://dx.doi.org/10.1016/j.jcms.2017.05.002.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Manikandan, K., Apurva Singh, Sakshi Agarwal, and Ankita Singh. "Automated Scrolling Using Speech Recognition." International Journal of Technology 7, no. 1 (2017): 15. http://dx.doi.org/10.5958/2231-3915.2017.00004.9.

3

Smith, L. A., B. L. Scott, L. S. Lin, and J. M. Newell. "Automated training for speech recognition." Journal of the Acoustical Society of America 86, S1 (November 1989): S78. http://dx.doi.org/10.1121/1.2027652.

4

Koenecke, Allison, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, Zion Mengesha, Connor Toups, John R. Rickford, Dan Jurafsky, and Sharad Goel. "Racial disparities in automated speech recognition." Proceedings of the National Academy of Sciences 117, no. 14 (March 23, 2020): 7684–89. http://dx.doi.org/10.1073/pnas.1915768117.

Abstract:
Automated speech recognition (ASR) systems, which use sophisticated machine-learning algorithms to convert spoken language to text, have become increasingly widespread, powering popular virtual assistants, facilitating automated closed captioning, and enabling digital dictation platforms for health care. Over the last several years, the quality of these systems has dramatically improved, due both to advances in deep learning and to the collection of large-scale datasets used to train the systems. There is concern, however, that these tools do not work equally well for all subgroups of the population. Here, we examine the ability of five state-of-the-art ASR systems—developed by Amazon, Apple, Google, IBM, and Microsoft—to transcribe structured interviews conducted with 42 white speakers and 73 black speakers. In total, this corpus spans five US cities and consists of 19.8 h of audio matched on the age and gender of the speaker. We found that all five ASR systems exhibited substantial racial disparities, with an average word error rate (WER) of 0.35 for black speakers compared with 0.19 for white speakers. We trace these disparities to the underlying acoustic models used by the ASR systems as the race gap was equally large on a subset of identical phrases spoken by black and white individuals in our corpus. We conclude by proposing strategies—such as using more diverse training datasets that include African American Vernacular English—to reduce these performance differences and ensure speech recognition technology is inclusive.
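The word error rates reported in this abstract (0.35 vs. 0.19) follow the standard definition: word-level edit distance between reference and hypothesis, divided by the number of reference words. A minimal sketch in Python (illustrative only, not the authors' code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion over six reference words
```

Because insertions also count as errors, WER can exceed 1.0 for very poor hypotheses.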
5

Margolis, Robert H., Richard H. Wilson, George L. Saly, Heather M. Gregoire, and Brandon M. Madsen. "Automated Forced-Choice Tests of Speech Recognition." Journal of the American Academy of Audiology 32, no. 9 (October 2021): 606–15. http://dx.doi.org/10.1055/s-0041-1733964.

Abstract:
Purpose: This project was undertaken to develop automated tests of speech recognition, including a speech-recognition threshold (SRT) and a word-recognition test, using forced-choice responses and computerized scoring of responses. Specific aims were (1) to develop an automated method for measuring SRT for spondaic words that produces scores in close agreement with average pure-tone thresholds, and (2) to develop an automated test of word recognition that distinguishes listeners with normal hearing from those with sensorineural hearing loss and informs the hearing aid evaluation process. Method: An automated SRT protocol was designed to converge on the lowest level at which the listener responds correctly to two out of two spondees presented monaurally. A word-recognition test was conducted with monosyllabic words (female speaker) presented monaurally at a fixed level. For each word there were three rhyming foils, displayed on a touchscreen with the test word; listeners touched the word they thought they heard. Participants were young listeners with normal hearing and listeners with sensorineural hearing loss. Words were also presented with nonrhyming foils and in an open-set paradigm, with open-set responses scored by a graduate student research assistant. Results: The SRT results agreed closely with the pure-tone average (PTA) obtained by automated audiometry; the agreement was similar to results obtained with the conventional SRT scoring method. Word-recognition scores were highest for the closed-set, nonrhyming lists and lowest for open-set responses. For the hearing loss participants, the scores varied widely. There was a moderate correlation between word-recognition scores and pure-tone thresholds, which increased as more high frequencies were brought into the PTA. Based on the findings of this study, a clinical protocol was designed that determines whether a listener's performance is in the normal range and whether the listener benefits from increasing the level of the stimuli. Conclusion: SRTs obtained using the automated procedure are comparable to the results obtained by the conventional clinical method in common use. The automated closed-set word-recognition test results show clear differentiation between scores for the normal and hearing loss groups. These procedures provide clinical test results that are not dependent on the availability of an audiologist to perform the tests.
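The SRT protocol described above converges on the lowest level at which the listener repeats two out of two spondees correctly. A much-simplified descending sketch (the response model, step size, and function names here are assumptions for illustration, not the published procedure):

```python
def automated_srt(respond, start_db=50, step_db=5, floor_db=0):
    """Descend in `step_db` decrements for as long as the listener
    answers two out of two spondees correctly at each level; return
    the lowest passing presentation level (None if none passed)."""
    level = start_db
    srt = None
    while level >= floor_db:
        if respond(level) and respond(level):  # two out of two correct
            srt = level
            level -= step_db
        else:
            break
    return srt

# Hypothetical deterministic listener who hears spondees at or above 20 dB HL
print(automated_srt(lambda db: db >= 20))  # prints 20
```

A clinical implementation would randomize word selection and handle inconsistent responses; this sketch only shows the convergence logic.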
6

Foltz, Peter W., Darrell Laham, and Marcia Derr. "Automated Speech Recognition for Modeling Team Performance." Proceedings of the Human Factors and Ergonomics Society Annual Meeting 47, no. 4 (October 2003): 673–77. http://dx.doi.org/10.1177/154193120304700402.

7

Townshend, Brent. "Automated language assessment using speech recognition modeling." Journal of the Acoustical Society of America 120, no. 6 (2006): 3451. http://dx.doi.org/10.1121/1.2409447.

8

Kuzmin, A., and S. Ivanov. "Speech to Text System for Noisy and Quiet Speech." Journal of Physics: Conference Series 2096, no. 1 (November 1, 2021): 012071. http://dx.doi.org/10.1088/1742-6596/2096/1/012071.

Abstract:
This paper examines one simple, accessible method of developing speech recognition systems capable of recognizing speech from noisy or quiet recordings. Such systems improve the automated operation of call centers and also bring us closer to creating speech recognition models capable of ignoring speakers' speech deficiencies.
9

Patil, Vishakha. "Review on Automated Elevator: An Attentive Elevator to Elevate Using Speech Recognition." Journal of Advanced Research in Power Electronics and Power Systems 8, no. 1–2 (August 6, 2021): 20–26. http://dx.doi.org/10.24321/2456.1401.202102.

Abstract:
The elevator has over time become an important part of our day-to-day life, used as an everyday transport device for moving goods as well as people. In the modern world, cities and crowded areas require multi-storey buildings, and under wheelchair-access laws, elevators are a required feature of new multi-storey buildings. The main purpose of this project is to operate the elevator by voice command. Because it is operated by voice, the system can help people with disabilities or of short stature travel from one floor to another without the help of any other person. A microcontroller is used to control the different devices and integrate each module, namely the voice module, the motor module, and the LCD; the LCD displays the present status of the lift. The leading edge of our project is the voice recognition system, which generates excellent results when recognizing speech.
10

Barry, Timothy P., Kristen K. Liggett, David T. Williamson, and John M. Reising. "Enhanced Recognition Accuracy with the Simultaneous Use of Three Automated Speech Recognition Systems." Proceedings of the Human Factors Society Annual Meeting 36, no. 4 (October 1992): 288–92. http://dx.doi.org/10.1177/154193129203600406.

Abstract:
Two studies were performed to test the efficacy of using three different automated speech recognition devices in parallel to obtain speech recognition accuracies better than those produced by each of the individual systems alone. The first experiment compared the recognition accuracy of each of the three individual systems with the accuracy obtained by combining the data from all three systems using a simple “Majority Rules” algorithm. The second experiment made the same comparison, but used a more sophisticated algorithm developed using the performance data obtained from experiment 1. Results from the first experiment revealed a modest increase in speech recognition accuracy using all three systems in concert along with the Simple Majority Rules (SMR) algorithm. Results from the second experiment showed an even greater improvement in recognition accuracy using the three systems in concert and an Enhanced Majority Rules (EMR) algorithm. The implications of using intelligent software and multiple speech recognition devices to improve speech recognition accuracy are discussed.
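The "Simple Majority Rules" combination described in this abstract can be illustrated as utterance-level voting across the three recognizers' hypotheses. A toy Python sketch (not the paper's implementation; the tie-break rule of deferring to the first system is an assumption):

```python
from collections import Counter

def simple_majority(hypotheses):
    """Pick the transcript proposed by the most recognizers.
    With no agreement, fall back to the first (e.g., most trusted) system."""
    counts = Counter(hypotheses)
    best, votes = counts.most_common(1)[0]
    if votes > 1:
        return best
    return hypotheses[0]  # all three disagree: defer to system 1

print(simple_majority(["set heading 270", "set heading 270", "set heading to seventy"]))
```

An "enhanced" rule in the spirit of the second experiment would weight each system's vote by its per-vocabulary accuracy measured in the first experiment.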

Dissertations / Theses on the topic "Automated Speech Recognition"

1

Davies, David Richard Llewellyn. "Representing Time in Automated Speech Recognition." The Australian National University, Research School of Information Sciences and Engineering, 2003. http://thesis.anu.edu.au./public/adt-ANU20040602.163031.

Abstract:
This thesis explores the treatment of temporal information in Automated Speech Recognition. It reviews the study of time in speech perception and concludes that while some temporal information in the speech signal is of crucial value in the speech decoding process not all temporal information is relevant to decoding. We then review the representation of temporal information in the main automated recognition techniques: Hidden Markov Models and Artificial Neural Networks. We find that both techniques have difficulty representing the type of temporal information that is phonetically or phonologically significant in the speech signal. In an attempt to improve this situation we explore the problem of representation of temporal information in the acoustic vectors commonly used to encode the speech acoustic signal in the front-ends of speech recognition systems. We attempt, where possible, to let the signal provide the temporal structure rather than imposing a fixed, clock-based timing framework. We develop a novel acoustic temporal parameter (the Parameter Similarity Length), a measure of temporal stability, that is tested against the time derivatives of acoustic parameters conventionally used in acoustic vectors.
2

Sooful, Jayren Jugpal. "Automated phoneme mapping for cross-language speech recognition." Diss., Pretoria [s.n.], 2004. http://upetd.up.ac.za/thesis/available/etd-01112005-131128.

3

Layouss, Nizar Gandy Assaf. "A Critical Examination of Deep Learning Approaches to Automated Speech Recognition." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-153681.

Abstract:
Recently, deep learning techniques have been successfully applied to automatic speech recognition (ASR) tasks. Most current speech recognition systems use Hidden Markov Models (HMMs) to deal with the temporal variability of speech, and Gaussian mixture models (GMMs) are exploited to model the emission probability of the HMM. Deep Neural Networks (DNNs) and Deep Belief Networks (DBNs) have recently been shown to outperform GMMs in modeling the emission probability in HMMs. Deep architectures such as DBNs with many hidden layers are useful for multilevel feature representation, building a distributed representation of a given input at different levels. These networks are first pre-trained in unsupervised mode as a multi-layer generative model of a window of feature vectors, without making use of any discriminative information. Once the generative pre-training is complete, discriminative fine-tuning is performed to adjust the model parameters to make them better at predicting. Our aim is to study different levels of representation for speech acoustic features that are produced by the hidden layers of DBNs. To this end, we estimate phoneme recognition error and use classification accuracy evaluated with Support Vector Machines (SVMs) as a measure of separability between the DBN representations of 61 phoneme classes. In addition, we investigate the relation between different subgroups/categories of phonemes at various representation levels using correlation analysis. The tests have been performed on the TIMIT database, and simulations have been developed to run on a graphics processing unit (GPU) cluster at PDC/KTH.
4

Dookhoo, Raul. "Automated Regression Testing Approach to Expansion and Refinement of Speech Recognition Grammars." Master's thesis, University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2634.

Abstract:
This thesis describes an approach to automated regression testing for speech recognition grammars. A prototype Audio Regression Tester called ART has been developed using Microsoft's Speech API and C#. ART allows a user to perform any of three tasks: automatically generate a new XML-based grammar file from standardized SQL database entries, record and cross-reference audio files for use by an underlying speech recognition engine, and perform regression tests with the aid of an oracle grammar. ART takes as input a wave sound file containing speech and a newly created XML grammar file. It then simultaneously executes two tests: one with the wave file and the new grammar file and the other with the wave file and the oracle grammar. The comparison result of the tests is used to determine whether the test was successful or not. This allows rapid exhaustive evaluations of additions to grammar files to guarantee forward progress as the complexity of the voice domain grows. The data used in this research to derive results were taken from the LifeLike project; however, the capabilities of ART extend beyond LifeLike. The results gathered have shown that using a person's recorded voice to do regression testing is as effective as having the person do live testing. A cost-benefit analysis, using two published equations, one for Cost and the other for Benefit, was also performed to determine if automated regression testing is really more effective than manual testing. Cost captures the salaries of the engineers who perform regression testing tasks, and Benefit captures revenue gains or losses related to changes in product release time. ART had a higher benefit of $21,461.08 when compared to manual regression testing, which had a benefit of $21,393.99. Coupled with its excellent error detection rates, ART has proven to be very efficient and cost-effective in speech grammar creation and refinement.
M.S. in Computer Science, School of Electrical Engineering and Computer Science.
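The dual-run comparison ART performs (the same audio tested against both the new grammar and the oracle grammar) amounts to checking that the two recognitions agree. A toy sketch of that logic, with the recognizer, grammars, and phrases entirely hypothetical:

```python
def regression_test(recognize, audio, new_grammar, oracle_grammar):
    """Run the same utterance against both grammars; pass iff the
    new grammar reproduces the oracle grammar's recognition result."""
    new_result = recognize(audio, new_grammar)
    oracle_result = recognize(audio, oracle_grammar)
    return new_result == oracle_result

# Hypothetical recognizer: returns the first grammar phrase found in the audio
def fake_recognize(audio_text, grammar):
    return next((p for p in grammar if p in audio_text), None)

oracle = ["call nurse", "open door"]
new = ["call nurse", "open door", "lights on"]  # expanded grammar under test
print(regression_test(fake_recognize, "please call nurse now", new, oracle))  # True
```

A real harness would additionally log mismatches per audio file so that grammar additions causing regressions can be localized.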
5

Tsuchiya, Shinsuke. "Elicited Imitation and Automated Speech Recognition: Evaluating Differences among Learners of Japanese." BYU ScholarsArchive, 2011. https://scholarsarchive.byu.edu/etd/2782.

Abstract:
This study addresses the usefulness of elicited imitation (EI) and automated speech recognition (ASR) as a tool for second language acquisition (SLA) research by evaluating differences among learners of Japanese. The findings indicate that the EI and ASR grading system used in this study was able to differentiate between beginning- and advanced-level learners as well as instructed and self-instructed learners. No significant difference was found between self-instructed learners with and without post-mission instruction. The procedure, reliability and validity of the ASR-based computerized EI are discussed. Results and discussion will provide insights regarding different types of second language (L2) development, the effects of instruction, implications for teaching, as well as limitations of the EI and ASR grading system.
6

Brashear, Helene Margaret. "Improving the efficacy of automated sign language practice tools." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/34703.

Abstract:
The CopyCat project is an interdisciplinary effort to create a set of computer-aided language learning tools for deaf children. The CopyCat games allow children to interact with characters using American Sign Language (ASL). Through Wizard of Oz pilot studies we have developed a set of games, shown their efficacy in improving young deaf children's language and memory skills, and collected a large corpus of signing examples. Our previous implementation of the automatic CopyCat games uses automatic sign language recognition and verification in the infrastructure of a memory repetition and phrase verification task. The goal of my research is to expand the automatic sign language system to transition the CopyCat games to include the flexibility of a dialogue system. I have created a labeling ontology from analysis of the CopyCat signing corpus, and I have used the ontology to describe the contents of the CopyCat data set. This ontology was used to change and improve the automatic sign language recognition system and to add flexibility to language use in the automatic game.
7

Morton, Hazel. "A scenario based approach to speech-enabled computer assisted language learning based on automated speech recognition and virtual reality graphics." Thesis, University of Edinburgh, 2007. http://hdl.handle.net/1842/15438.

Abstract:
By using speech recognition technology, Computer Assisted Language Learning (CALL) programs can provide learners with opportunities to practise speaking in the target language and develop their oral language skills. This research is a contribution to the emerging and innovative area of speech-enabled CALL applications. It describes a CALL application, SPELL (Spoken Electronic Language Learning), which integrates software for speaker-independent continuous speech recognition with embodied virtual agents and virtual worlds to create an immersive environment in which learners can converse in the target language in contextualized scenarios. The design of the program is based on a communicative approach to second language acquisition, which posits that learning activities should give learners opportunities to communicate in the target language in meaningful contexts. In applying a communicative approach to the design of a CALL program, the speech recogniser is programmed to allow a variety of responses from the learner and to recognise grammatical and ungrammatical utterances, so that the learner can receive relevant and immediate feedback on their utterance. Feedback takes two key forms: reformulations, where the system repeats or reformulates the agent's initial speech, and recasts, where the system repeats the learner's utterance, implicitly correcting any errors. This research claims that speech-enabled CALL systems which employ an open-ended approach to the recognition grammars and which adopt a communicative approach are usable, engaging, and motivating conversational tools for language learners. In addition, by employing implicit feedback strategies in the design, speech recognition errors can be mitigated such that interactions between learners and embodied virtual agents can proceed while providing learners with valuable target language input during the interactions. These claims are based on a series of three empirical studies conducted with end users of the system.
8

Gargett, Ross. "The Use of Automated Speech Recognition in Electronic Health Records in Rural Health Care Systems." Digital Commons @ East Tennessee State University, 2016. https://dc.etsu.edu/honors/340.

Abstract:
Since the HITECH (Health Information Technology for Economic and Clinical Health) Act was enacted, healthcare providers have been required to achieve “Meaningful Use.” CPOE (Clinical Provider Order Entry) is one such requirement. Many providers prefer to dictate their orders rather than typing them, but medical vocabulary is replete with its own terminology and department-specific acronyms, and many ASR (Automated Speech Recognition) systems are not trained to interpret this language. The purpose of this thesis research was to investigate the use and effectiveness of ASR in the healthcare industry. Multiple hospitals and multiple clinicians agreed to be followed through their use of an ASR system to enter patient data into the record. As a result of this research, the effectiveness and use of the ASR were examined, and multiple issues with the use and accuracy of the system were uncovered.
9

Zylich, Brian Matthew. "Training Noise-Robust Spoken Phrase Detectors with Scarce and Private Data: An Application to Classroom Observation Videos." Digital WPI, 2019. https://digitalcommons.wpi.edu/etd-theses/1289.

Abstract:
We explore how to automatically detect specific phrases in audio from noisy, multi-speaker videos using deep neural networks. Specifically, we focus on classroom observation videos that contain a few adult teachers and several small children (< 5 years old). At any point in these videos, multiple people may be talking, shouting, crying, or singing simultaneously. Our goal is to recognize polite speech phrases such as "Good job", "Thank you", "Please", and "You're welcome", as the occurrence of such speech is one of the behavioral markers used in classroom observation coding via the Classroom Assessment Scoring System (CLASS) protocol. Commercial speech recognition services such as Google Cloud Speech are impractical because of data privacy concerns. Therefore, we train and test our own custom models using a combination of publicly available classroom videos from YouTube, as well as a private dataset of real classroom observation videos collected by our colleagues at the University of Virginia. We also crowdsource an additional 1152 recordings of polite speech phrases to augment our training dataset. Our contributions are the following: (1) we design a crowdsourcing task for efficiently labeling speech events in classroom videos, (2) we develop a neural network-based architecture for speech recognition, robust to noise and overlapping speech, and (3) we explore methods to synthesize new and authentic audio data, both to increase the training set size and reduce the class imbalance. Finally, using our trained polite speech detector, (4) we investigate the relationship between polite speech and CLASS scores and enable teachers to visualize their use of polite language.
10

Alcaraz Meseguer, Noelia. "Speech Analysis for Automatic Speech Recognition." Thesis, Norwegian University of Science and Technology, Department of Electronics and Telecommunications, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9092.

Abstract:

The classical front-end analysis in speech recognition is a spectral analysis that parametrizes the speech signal into feature vectors; the most popular set of them is the Mel Frequency Cepstral Coefficients (MFCC). They are based on a standard power spectrum estimate which is first subjected to a log-based transform of the frequency axis (mel-frequency scale), and then decorrelated by using a modified discrete cosine transform. Following a focused introduction on speech production, perception, and analysis, this paper presents a study of the implementation of a speech generative model, whereby the speech is synthesized and recovered back from its MFCC representations. The work has been developed in two steps: first, the computation of the MFCC vectors from the source speech files by using HTK software; and second, the implementation of the generative model itself, which represents the conversion chain from HTK-generated MFCC vectors to speech reconstruction. To assess the quality of the speech coding into feature vectors and to evaluate the generative model, the spectral distance between the original speech signal and the one produced from the MFCC vectors has been computed. For that, spectral models based on Linear Prediction Coding (LPC) analysis have been used. During the implementation of the generative model, some results have been obtained in terms of the reconstruction of the spectral representation and the quality of the synthesized speech.

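The mel-frequency warping this abstract refers to is conventionally computed with the formula below (the 2595/700 constants are the common HTK-style convention, not taken from this thesis):

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Mel scale: approximately linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse mapping, used when placing mel filterbank edges in Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

print(round(hz_to_mel(1000.0)))  # ~1000 mel: the scale is anchored at 1 kHz
```

MFCC extraction then takes log energies of a mel-spaced filterbank over the power spectrum and decorrelates them with a discrete cosine transform, as the abstract describes.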

Books on the topic "Automated Speech Recognition"

1

Yu, Dong, and Li Deng. Automatic Speech Recognition. London: Springer London, 2015. http://dx.doi.org/10.1007/978-1-4471-5779-3.

2

Lee, Kai-Fu. Automatic Speech Recognition. Boston, MA: Springer US, 1989. http://dx.doi.org/10.1007/978-1-4615-3650-5.

3

Huang, X. D. Hidden Markov models for speech recognition. Edinburgh: Edinburgh University Press, 1990.

4

Woelfel, Matthias. Distant speech recognition. Chichester, West Sussex, U.K: Wiley, 2009.

5

Junqua, Jean-Claude, and Jean-Paul Haton. Robustness in Automatic Speech Recognition. Boston, MA: Springer US, 1996. http://dx.doi.org/10.1007/978-1-4613-1297-0.

6

Lee, Chin-Hui, Frank K. Soong, and Kuldip K. Paliwal, eds. Automatic Speech and Speaker Recognition. Boston, MA: Springer US, 1996. http://dx.doi.org/10.1007/978-1-4613-1367-0.

7

Keshet, Joseph, and Samy Bengio, eds. Automatic Speech and Speaker Recognition. Chichester, UK: John Wiley & Sons, Ltd, 2009. http://dx.doi.org/10.1002/9780470742044.

8

Markowitz, Judith A. Using speech recognition. Upper Saddle River, N.J: Prentice Hall PTR, 1996.

9

Ainsworth, W. A. Speech recognition by machine. London: Peregrinus on behalf of the Institution of Electrical Engineers, 1987.

10

Ainsworth, W. A. Speech recognition by machine. London, U.K: P. Peregrinus on behalf of the Institution of Electrical Engineers, 1988.


Book chapters on the topic "Automated Speech Recognition"

1

Suendermann, David, Jackson Liscombe, Roberto Pieraccini, and Keelan Evanini. "“How am I Doing?”: A New Framework to Effectively Measure the Performance of Automated Customer Care Contact Centers." In Advances in Speech Recognition, 155–79. Boston, MA: Springer US, 2010. http://dx.doi.org/10.1007/978-1-4419-5951-5_7.

2

Schmitt, Alexander, Roberto Pieraccini, and Tim Polzehl. "“For Heaven’s Sake, Gimme a Live Person!” Designing Emotion-Detection Customer Care Voice Applications in Automated Call Centers." In Advances in Speech Recognition, 191–219. Boston, MA: Springer US, 2010. http://dx.doi.org/10.1007/978-1-4419-5951-5_9.

3

Rajan, Sai Sathiesh, Sakshi Udeshi, and Sudipta Chattopadhyay. "AequeVox: Automated Fairness Testing of Speech Recognition Systems." In Fundamental Approaches to Software Engineering, 245–67. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-99429-7_14.

Abstract:
Automatic Speech Recognition (ASR) systems have become ubiquitous. They can be found in a variety of form factors and are increasingly important in our daily lives. As such, ensuring that these systems are equitable to different subgroups of the population is crucial. In this paper, we introduce AequeVox, an automated testing framework for evaluating the fairness of ASR systems. AequeVox simulates different environments to assess the effectiveness of ASR systems for different populations. In addition, we investigate whether the chosen simulations are comprehensible to humans. We further propose a fault localization technique capable of identifying words that are not robust to these varying environments. Both components of AequeVox are able to operate in the absence of ground truth data. We evaluate AequeVox on speech from four different datasets using three different commercial ASRs. Our experiments reveal that non-native English, female, and Nigerian English speakers generate 109%, 528.5%, and 156.9% more errors, on average, than native English, male, and UK Midlands speakers, respectively. Our user study also reveals that 82.9% of the simulations (employed through speech transformations) had a comprehensibility rating above seven (out of ten), with the lowest rating being 6.78. This further validates the fairness violations discovered by AequeVox. Finally, we show that the non-robust words, as predicted by the fault localization technique embodied in AequeVox, exhibit 223.8% more errors than the predicted robust words across all ASRs.
4

Gruber, Ivan, Pavel Ircing, Petr Neduchal, Marek Hrúz, Miroslav Hlaváč, Zbyněk Zajíc, Jan Švec, and Martin Bulín. "An Automated Pipeline for Robust Image Processing and Optical Character Recognition of Historical Documents." In Speech and Computer, 166–75. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60276-5_17.

5

Abelardo, Amanda, Washington Silva, and Ginalber Serra. "CPSO Applied in the Optimization of a Speech Recognition System." In Intelligent Data Engineering and Automated Learning – IDEAL 2014, 134–41. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-10840-7_17.

6

Romanovskyi, O., I. Iosifov, O. Iosifova, V. Sokolov, F. Kipchuk, and I. Sukaylo. "Automated Pipeline for Training Dataset Creation from Unlabeled Audios for Automatic Speech Recognition." In Advances in Computer Science for Engineering and Education IV, 25–36. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-80472-5_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Kim, Jung-Hyun, and Kwang-Seok Hong. "Speech and Gesture Recognition-Based Robust Language Processing Interface in Noise Environment." In Intelligent Data Engineering and Automated Learning – IDEAL 2006, 338–45. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11875581_41.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Silva, Washington, and Ginalber Serra. "A Hybrid Approach Based on DCT-Genetic-Fuzzy Inference System for Speech Recognition." In Intelligent Data Engineering and Automated Learning - IDEAL 2012, 52–59. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-32639-4_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Faria, Hugo, Manuel Rodrigues, and Paulo Novais. "An Approach to Authenticity Speech Validation Through Facial Recognition and Artificial Intelligence Techniques." In Intelligent Data Engineering and Automated Learning – IDEAL 2022, 54–63. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-21753-1_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Blanchard, Nathaniel, Michael Brady, Andrew M. Olney, Marci Glaus, Xiaoyi Sun, Martin Nystrand, Borhan Samei, Sean Kelly, and Sidney D’Mello. "A Study of Automatic Speech Recognition in Noisy Classroom Environments for Automated Dialog Analysis." In Lecture Notes in Computer Science, 23–33. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-19773-9_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Automated speech Recognition"

1

Johnstone, Anne, and Gerry Altmann. "Automated speech recognition." In the second conference. Morristown, NJ, USA: Association for Computational Linguistics, 1985. http://dx.doi.org/10.3115/976931.976966.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Alshamsi, Humaid, Veton Kepuska, Hazza Alshamsi, and Hongying Meng. "Automated Speech Emotion Recognition on Smart Phones." In 2018 9th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON). IEEE, 2018. http://dx.doi.org/10.1109/uemcon.2018.8796594.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Rawat, Seema, Parv Gupta, and Praveen Kumar. "Digital life assistant using automated speech recognition." In 2014 Innovative Applications of Computational Intelligence on Power, Energy and Controls with their impact on Humanity (CIPECH). IEEE, 2014. http://dx.doi.org/10.1109/cipech.2014.7019075.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Shyry, S. Prayla, K. Kaja Kartheek, and K. N. RR Aravind. "Election Prediction with Automated Speech Emotion Recognition." In 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI). IEEE, 2020. http://dx.doi.org/10.1109/icoei48184.2020.9143050.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Patel, Sunny, Ujjayan Dhar, Suraj Gangwani, Rohit Lad, and Pallavi Ahire. "Hand-gesture recognition for automated speech generation." In 2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT). IEEE, 2016. http://dx.doi.org/10.1109/rteict.2016.7807817.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Zhang, Xinlei, Takashi Miyaki, and Jun Rekimoto. "WithYou: Automated Adaptive Speech Tutoring With Context-Dependent Speech Recognition." In CHI '20: CHI Conference on Human Factors in Computing Systems. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3313831.3376322.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Iwama, Futoshi, and Takashi Fukuda. "Automated Testing of Basic Recognition Capability for Speech Recognition Systems." In 2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST). IEEE, 2019. http://dx.doi.org/10.1109/icst.2019.00012.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Anithadevi, N., P. Gokul, S. Muhil Nandan, R. Magesh, and S. Shiddharth. "Automated Speech Recognition System For Speaker Emotion Classification." In 2020 5th International Conference on Computing, Communication and Security (ICCCS). IEEE, 2020. http://dx.doi.org/10.1109/icccs49678.2020.9277228.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Chan, David, and Shalini Ghosh. "Content-Context Factorized Representations for Automated Speech Recognition." In Interspeech 2022. ISCA: ISCA, 2022. http://dx.doi.org/10.21437/interspeech.2022-390.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Patel, Ibrahim, and Y. Srinivasa Rao. "Technologies automated speech recognition approach to finger spelling." In 2010 International Conference on Computing, Communication and Networking Technologies (ICCCNT'10). IEEE, 2010. http://dx.doi.org/10.1109/icccnt.2010.5591724.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Automated speech Recognition"

1

Clements, Mark A., John H. Hansen, Kathleen E. Cummings, and Sungjae Lim. Automatic Recognition of Speech in Stressful Environments. Fort Belvoir, VA: Defense Technical Information Center, August 1991. http://dx.doi.org/10.21236/ada242917.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Brown, Peter F. The Acoustic-Modeling Problem in Automatic Speech Recognition. Fort Belvoir, VA: Defense Technical Information Center, December 1987. http://dx.doi.org/10.21236/ada188529.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Vergyri, Dimitra, and Katrin Kirchhoff. Automatic Diacritization of Arabic for Acoustic Modeling in Speech Recognition. Fort Belvoir, VA: Defense Technical Information Center, January 2004. http://dx.doi.org/10.21236/ada457846.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Bass, James D. Advancing Noise Robust Automatic Speech Recognition for Command and Control Applications. Fort Belvoir, VA: Defense Technical Information Center, March 2006. http://dx.doi.org/10.21236/ada461436.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Stevenson, G. Analysis of Pre-Trained Deep Neural Networks for Large-Vocabulary Automatic Speech Recognition. Office of Scientific and Technical Information (OSTI), July 2016. http://dx.doi.org/10.2172/1289367.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Fatehifar, Mohsen, Josef Schlittenlacher, David Wong, and Kevin Munro. Applications Of Automatic Speech Recognition And Text-To-Speech Models To Detect Hearing Loss: A Scoping Review Protocol. INPLASY - International Platform of Registered Systematic Review and Meta-analysis Protocols, January 2023. http://dx.doi.org/10.37766/inplasy2023.1.0029.

Full text
Abstract:
Review question / Objective: This scoping review aims to identify published methods that have used automatic speech recognition or text-to-speech technologies to detect hearing loss, and to report on their accuracy and limitations. Condition being studied: Hearing enables us to communicate with the surrounding world. According to reports by the World Health Organization, 1.5 billion people suffer from some degree of hearing loss, of which 430 million require medical attention. It is estimated that by 2050, 1 in every 4 people will experience some form of hearing disability. Hearing loss can significantly impact people’s ability to communicate and makes social interactions a challenge. In addition, it can result in anxiety, isolation, depression, hindered learning, and a decrease in general quality of life. A hearing assessment is usually done in hospitals and clinics with special equipment and trained staff. However, these services are not always available in less developed countries. Even in developed countries, like the UK, access to these facilities can be a challenge in rural areas. Moreover, during a crisis like the Covid-19 pandemic, accessing the required healthcare can become dangerous and challenging even in large cities.
APA, Harvard, Vancouver, ISO, and other styles
7

Oran, D. Requirements for Distributed Control of Automatic Speech Recognition (ASR), Speaker Identification/Speaker Verification (SI/SV), and Text-to-Speech (TTS) Resources. RFC Editor, December 2005. http://dx.doi.org/10.17487/rfc4313.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Tao, Yang, Amos Mizrach, Victor Alchanatis, Nachshon Shamir, and Tom Porter. Automated imaging broiler chicksexing for gender-specific and efficient production. United States Department of Agriculture, December 2014. http://dx.doi.org/10.32747/2014.7594391.bard.

Full text
Abstract:
Extending the previous two years of research results (Mizrach et al., 2012; Tao, 2011, 2012), the third year’s efforts in both Maryland and Israel were directed towards the engineering of the system. The activities included robust chick handling and conveyor system development, optical system improvement, online dynamic motion imaging of chicks, optimal feather extraction and detection from multi-image sequences, and pattern recognition. Mechanical System Engineering: The third model of the mechanical chick handling system with a high-speed imaging system was built as shown in Fig. 1. This system has improved chick holding cups and motion mechanisms that enable chicks to open their wings through the view section. The mechanical system achieved a speed of 4 chicks per second, which exceeds the design spec of 3 chicks per second. In the center of the conveyor, a high-speed camera with a UV-sensitive optical system, shown in Fig. 2, was installed that captures multiple frames of each chick (45 images, system-selectable) as the chick passes through the view area. Through intensive discussions and efforts, the PIs of Maryland and ARO created a joint hardware and software protocol that uses sequential images of the chick in its falling motion to capture the opening wings and extract the optimal opening positions. This approach enables reliable feather feature extraction in dynamic motion and pattern recognition. Improving Chick Wing Deployment: The mechanical system for chick conveying, especially the section that causes chicks to deploy their wings wide open under the fast video camera and the UV light, was investigated during the third study year. As a natural behavior, chicks tend to deploy their wings as a means of balancing their body when a sudden change in vertical movement is applied. In the previous two years, this was achieved by causing the chicks to move in free fall, under earth gravity (g), along a short vertical distance.
The chicks always tended to deploy their wings, but not always in a wide, horizontally open position. Such a position is required in order to get a successful image under the video camera. In addition, the cells holding the chicks bumped suddenly at the end of the free-fall path, which caused the chicks’ legs to collapse inside the cells and the wing image to blur. To improve the movement and prevent the chicks’ legs from collapsing, a slowing-down mechanism was designed and tested. This was done by installing a plastic block, printed with a predesigned variable slope (Fig. 3), at the end of the path of the falling cells (Fig. 4). The cells move down at variable velocity according to the block slope and reach zero velocity at the end of the path. The slope was designed so that the deceleration becomes 0.8g instead of the free-fall gravity (g) experienced without the block. The tests showed better wing deployment and wider wing opening, as well as better balance along the movement. Blocks with additional slopes are under investigation: slopes that create decelerations of 0.7g and 0.9g, as well as variable decelerations, are being designed to further improve the movement path and the images.
APA, Harvard, Vancouver, ISO, and other styles
9

Issues in Data Processing and Relevant Population Selection. OSAC Speaker Recognition Subcommittee, November 2022. http://dx.doi.org/10.29325/osac.tg.0006.

Full text
Abstract:
In Forensic Automatic Speaker Recognition (FASR), forensic examiners typically compare audio recordings of a speaker whose identity is in question with recordings of known speakers to assist investigators and triers of fact in a legal proceeding. The performance of automated speaker recognition (SR) systems used for this purpose depends largely on the characteristics of the speech samples being compared. Examiners must understand the requirements of specific systems in use as well as the audio characteristics that impact system performance. Mismatch conditions between the known and questioned data samples are of particular importance, but the need for, and impact of, audio pre-processing must also be understood. The data selected for use in a relevant population can also be critical to the performance of the system. This document describes issues that arise in the processing of case data and in the selections of a relevant population for purposes of conducting an examination using a human supervised automatic speaker recognition approach in a forensic context. The document is intended to comply with the Organization of Scientific Area Committees (OSAC) for Forensic Science Technical Guidance Document.
APA, Harvard, Vancouver, ISO, and other styles