A selection of scholarly literature on the topic "Syllable-based ASR"


Browse the lists of current articles, books, dissertations, conference papers, and other scholarly sources on the topic "Syllable-based ASR".

Next to every work in the list there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of a publication in .pdf format and read its abstract online whenever these are available in the source metadata.

Journal articles on the topic "Syllable-based ASR"

1. Galatang, Danny Henry, and Suyanto Suyanto. "Syllable-Based Indonesian Automatic Speech Recognition." International Journal on Electrical Engineering and Informatics 12, no. 4 (December 31, 2020): 720–28. http://dx.doi.org/10.15676/ijeei.2020.12.4.2.

Abstract:
Syllable-based automatic speech recognition (ASR) systems commonly perform better than phoneme-based ones. This paper focuses on developing an Indonesian monosyllable-based ASR (MSASR) system using an ASR engine called SPRAAK and comparing it to a phoneme-based one. The Mozilla DeepSpeech-based end-to-end ASR (MDS-E2EASR), one of the state-of-the-art models based on characters (similar to the phoneme-based model), is also investigated to confirm the result. In addition, a novel Kaituoxu SpeechTransformer (KST) E2EASR is examined. Testing on an Indonesian speech corpus of 5,439 words shows that the proposed MSASR produces much higher word accuracy (76.57%) than the monophone-based one (63.36%). Its performance is comparable to the character-based MDS-E2EASR, which produces 76.90%, and the character-based KST-E2EASR (78.00%). In the future, this monosyllable-based ASR could be extended to a bisyllable-based one to achieve higher word accuracy, although the resulting large set of bisyllable acoustic models would have to be handled with an advanced method.
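
To make the unit choice concrete: a monosyllable-based recognizer needs a syllable inventory and a syllabified lexicon. Below is a minimal, rule-of-thumb syllabifier for Indonesian orthography, assuming simple C*V+C? syllable shapes and ignoring digraphs such as "ng" and "ny"; it is an illustrative sketch, not the lexicon-building procedure the paper actually used.

```python
import re

def syllabify_id(word: str) -> list[str]:
    """Very rough Indonesian syllabifier: grab C*V+ cores and attach a
    trailing consonant only when no vowel follows it. Digraphs (ng, ny,
    kh, sy) and diphthongs are ignored, so this is a sketch only."""
    return re.findall(r"[^aeiou]*[aeiou]+(?:[^aeiou](?![aeiou]))?", word.lower())

print(syllabify_id("selamat"))  # ['se', 'la', 'mat']
print(syllabify_id("makan"))    # ['ma', 'kan']
```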

2. Valizada, Alakbar. "Development of a Real-Time Speech Recognition System for the Azerbaijani Language." Problems of Information Society 14, no. 2 (July 5, 2023): 55–60. http://dx.doi.org/10.25045/jpis.v14.i2.07.

Abstract:
This paper investigates the development of a real-time automatic speech recognition system for the Azerbaijani language, addressing the prevalent gap in speech recognition systems for underrepresented languages. Our research integrates a hybrid acoustic modeling approach that combines Hidden Markov Models and Deep Neural Networks to interpret the complexities of Azerbaijani acoustic patterns effectively. Recognizing the agglutinative nature of Azerbaijani, the ASR system employs a syllable-based n-gram model for language modeling, ensuring the system accurately captures the syntax and semantics of Azerbaijani speech. To enable real-time capabilities, we incorporate WebSocket technology, which facilitates efficient bidirectional communication between the client and server, necessary for processing streaming speech data instantly. The Kaldi and SRILM toolkits are used for training the acoustic and language models, respectively, contributing to the system's robust performance and adaptability. We have conducted comprehensive experiments to test the effectiveness of our system, the results of which strongly corroborate the utility of the syllable-based subword modeling approach for Azerbaijani speech recognition. Our proposed ASR system shows superior performance in terms of recognition accuracy and rapid response times, outperforming other systems tested on the same language data. The system's success not only proves beneficial for Azerbaijani language recognition but also provides a valuable framework for future applications in other agglutinative languages, thereby contributing to the promotion of linguistic diversity in automatic speech recognition technology.
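
The syllable-level n-gram language model at the heart of this system is trained with SRILM in the paper; the underlying idea can be sketched in plain Python as trigram counting with crude add-one smoothing (SRILM would apply proper Kneser-Ney smoothing, and the tiny syllable corpus here is hypothetical):

```python
import math
from collections import Counter

def train_trigram(corpus):
    """corpus: iterable of sentences, each a list of syllable tokens."""
    tri, bi, vocab = Counter(), Counter(), set()
    for syls in corpus:
        s = ["<s>", "<s>"] + list(syls) + ["</s>"]
        vocab.update(s)
        for i in range(2, len(s)):
            tri[tuple(s[i - 2:i + 1])] += 1
            bi[tuple(s[i - 2:i])] += 1
    return tri, bi, len(vocab)

def logprob(syls, tri, bi, v):
    """Add-one-smoothed trigram log-probability of a syllable sequence."""
    s = ["<s>", "<s>"] + list(syls) + ["</s>"]
    return sum(math.log((tri[tuple(s[i - 2:i + 1])] + 1) /
                        (bi[tuple(s[i - 2:i])] + v))
               for i in range(2, len(s)))

tri, bi, v = train_trigram([["sa", "lam"], ["sa", "bah"]])
print(logprob(["sa", "lam"], tri, bi, v))
```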

3. Mahesha, P., and D. S. Vinod. "Gaussian Mixture Model Based Classification of Stuttering Dysfluencies." Journal of Intelligent Systems 25, no. 3 (July 1, 2016): 387–99. http://dx.doi.org/10.1515/jisys-2014-0140.

Abstract:
The classification of dysfluencies is one of the important steps in the objective measurement of the stuttering disorder. In this work, the focus is on investigating the applicability of automatic speaker recognition (ASR) methods to stuttering dysfluency recognition. The system designed for this task relies on the Gaussian mixture model (GMM), the most widely used probabilistic modeling technique in speaker recognition. The GMM parameters are estimated from Mel frequency cepstral coefficients (MFCCs). This statistical speaker-modeling technique represents the fundamental characteristic sounds of the speech signal. Using this model, we build a dysfluency recognizer that is capable of recognizing dysfluencies irrespective of the speaker as well as of what is being said. The performance of the system is evaluated for different types of dysfluencies, such as syllable repetition, word repetition, prolongation, and interjection, using speech samples from the University College London Archive of Stuttered Speech (UCLASS).
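
The modeling recipe described here, one GMM per dysfluency class over MFCC frames with classification by maximum average log-likelihood, looks roughly as follows. This is a sketch using librosa and scikit-learn; the exact feature settings, mixture counts, and file handling are assumptions, not the paper's configuration.

```python
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

def train_class_gmms(files_by_class, n_components=16):
    """Fit one diagonal-covariance GMM per dysfluency type."""
    return {label: GaussianMixture(n_components, covariance_type="diag",
                                   random_state=0)
            .fit(np.vstack([mfcc_frames(f) for f in files]))
            for label, files in files_by_class.items()}

def classify(path, gmms):
    x = mfcc_frames(path)
    # score() returns the mean per-frame log-likelihood; pick the best class
    return max(gmms, key=lambda label: gmms[label].score(x))
```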

4. Perrine, Brittany L., Ronald C. Scherer, and Jason A. Whitfield. "Signal Interpretation Considerations When Estimating Subglottal Pressure From Oral Air Pressure." Journal of Speech, Language, and Hearing Research 62, no. 5 (May 21, 2019): 1326–37. http://dx.doi.org/10.1044/2018_jslhr-s-17-0432.

Abstract:
Purpose Oral air pressure measurements during lip occlusion for /pVpV/ syllable strings are used to estimate subglottal pressure during the vowel. Accuracy of this method relies on smoothly produced syllable repetitions. The purpose of this study was to investigate the oral air pressure waveform during the /p/ lip occlusions and propose physiological explanations for nonflat shapes. Method Ten adult participants were trained to produce the “standard condition” and were instructed to produce nonstandard tasks. Results from 8 participants are included. The standard condition required participants to produce /pːiːpːiː.../ syllables smoothly at approximately 1.5 syllables/s. The nonstandard tasks included an air leak between the lips, faster syllable repetition rates, an initial voiced consonant, and 2-syllable word productions. Results Eleven oral air pressure waveform shapes were identified during the lip occlusions, and plausible physiological explanations for each shape are provided based on the tasks in which they occurred. Training in the standard condition, the initial voiced consonant condition, and the 2-syllable word production increased the likelihood of rectangular oral air pressure waveform shapes. Increasing the rate beyond 1.5 syllables/s improved the probability of producing rectangular oral air pressure signal shapes in some participants. Conclusions Visual and verbal feedback improved the likelihood of producing rectangular oral air pressure signal shapes. The physiological explanations of variations in the oral air pressure waveform shape may provide direction to the clinician or researcher when providing feedback to increase the accuracy of estimating subglottal pressure from oral air pressure.
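
The estimation logic under scrutiny can be reduced to a small numerical sketch: pick the oral-pressure maxima during the /p/ occlusions and interpolate between successive peaks to estimate subglottal pressure during the intervening vowel. The signal units and peak-picking parameters below are hypothetical; the paper's point is precisely that the waveform shape must be inspected before trusting such numbers.

```python
import numpy as np
from scipy.signal import find_peaks

def estimate_psub(oral_pressure, fs, min_gap_s=0.4, prominence=1.0):
    """oral_pressure: 1-D array (e.g., cm H2O). Returns one subglottal
    pressure estimate per vowel, as the mean of the two neighboring
    /p/ occlusion peaks (linear interpolation at the vowel midpoint)."""
    peaks, _ = find_peaks(oral_pressure, distance=int(min_gap_s * fs),
                          prominence=prominence)
    return [(oral_pressure[a] + oral_pressure[b]) / 2.0
            for a, b in zip(peaks, peaks[1:])]
```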

5. Rui, Xian Yi, Yi Biao Yu, and Ying Jiang. "Connected Mandarin Digit Speech Recognition Using Two-Layer Acoustic Universal Structure." Advanced Materials Research 846-847 (November 2013): 1380–83. http://dx.doi.org/10.4028/www.scientific.net/amr.846-847.1380.

Abstract:
Because of the monosyllabic nature of Chinese words and the confusability of Chinese pronunciation, connected Mandarin digit speech recognition (CMDSR) is a challenging task in the field of speech recognition. This paper applies a novel acoustic representation of speech, called the acoustic universal structure (AUS), in which non-linguistic variations such as vocal tract length, transmission lines, and noise are well removed. A two-layer matching strategy based on the AUS models of speech, including the digit and string AUS models, is proposed for connected Mandarin digit speech recognition. The speech recognition system for connected Mandarin digits is described in detail, and the experimental results show that the proposed method achieves a higher recognition rate.

6. Rong, Panying. "Neuromotor Control of Speech and Speechlike Tasks: Implications From Articulatory Gestures." Perspectives of the ASHA Special Interest Groups 5, no. 5 (October 23, 2020): 1324–38. http://dx.doi.org/10.1044/2020_persp-20-00070.

Abstract:
Purpose This study aimed to provide a preliminary examination of the articulatory control of speech and speechlike tasks based on a gestural framework and identify shared and task-specific articulatory factors in speech and speechlike tasks. Method Ten healthy participants performed two speechlike tasks (i.e., alternating motion rate [AMR] and sequential motion rate [SMR]) and three speech tasks (i.e., reading of “clever Kim called the cat clinic” at the regular, fast, and slow rates) that varied in phonological complexity and rate. Articulatory kinematics were recorded using an electromagnetic kinematic tracking system (Wave, Northern Digital Inc.). Based on the gestural framework for articulatory phonology, the gestures of tongue body and lips were derived from the kinematic data. These gestures were subjected to a fine-grained analysis, which extracted (a) four gestural features (i.e., range of magnitude [ROM], frequency [Freq], acceleration time, and maximum speed [maxSpd]) for the tongue body gesture; (b) three intergestural measures including the peak intergestural coherence (InterCOH), frequency at which the peak intergestural coherence occurs (Freq_InterCOH), and the mean absolute relative phase between the tongue body and lip gestures; and (c) three intragestural (i.e., interarticulator) measures including the peak intragestural coherence (IntraCOH), Freq_IntraCOH, and mean absolute relative phase between the tongue body and the jaw, which are the component articulators that underlie the tongue body gesture. In addition, the performance rate for each task was also derived. The effects of task and sex on all the articulatory and behavioral measures were examined using mixed-design analysis of variance followed by post hoc pairwise comparisons across tasks. Results Task had a significant effect on performance rate, ROM, Freq, maxSpd, InterCOH, Freq_InterCOH, IntraCOH, and Freq_IntraCOH. Compared to the speech tasks, the AMR task showed a decrease in ROM and increases in Freq, InterCOH, Freq_InterCOH, IntraCOH, and Freq_IntraCOH. The SMR task showed similar ROM, Freq, maxSpd, InterCOH, and IntraCOH as the fast and regular speech tasks. Conclusions The simple phonological structure and demand for rapid syllable rate for the AMR task may elicit a distinct articulatory control mechanism. Despite being a rapid nonsense syllable repetition task, the relatively complex phonological structure of the SMR task appeared to elicit a similar articulatory control mechanism as that of speech production. Based on these shared and task-specific articulatory features between speech and speechlike tasks, the clinical implications for articulatory assessment were discussed.
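
The intergestural coherence measures used here (InterCOH and Freq_InterCOH) can be approximated with the standard magnitude-squared coherence between two kinematic traces. A sketch with scipy, where the tongue-body and lip displacement signals and the window length are assumptions, not values from the study:

```python
import numpy as np
from scipy.signal import coherence

def peak_coherence(x, y, fs, nperseg=256):
    """Return (InterCOH, Freq_InterCOH): the peak magnitude-squared
    coherence between two articulator signals and its frequency."""
    f, cxy = coherence(x, y, fs=fs, nperseg=nperseg)
    i = int(np.argmax(cxy[1:])) + 1  # skip the DC bin
    return cxy[i], f[i]
```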

7. Van der Burg, Erik, and Patrick T. Goodbourn. "Rapid, Generalized Adaptation to Asynchronous Audiovisual Speech." Proceedings of the Royal Society B: Biological Sciences 282, no. 1804 (April 7, 2015): 20143083. http://dx.doi.org/10.1098/rspb.2014.3083.

Abstract:
The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an actor pronouncing a single syllable. The voice was either advanced or delayed relative to the corresponding lip movements, and participants were asked to make a synchrony judgement. Although we did not use an explicit adaptation procedure, we demonstrate rapid recalibration based on a single audiovisual event. We find that the point of subjective simultaneity on each trial is highly contingent upon the modality order of the preceding trial. We find compelling evidence that rapid recalibration generalizes across different stimuli, and different actors. Finally, we demonstrate that rapid recalibration occurs even when auditory and visual events clearly belong to different actors. These results suggest that rapid temporal recalibration to audiovisual speech is primarily mediated by basic temporal factors, rather than higher-order factors such as perceived simultaneity and source identity.

8. Truong Tien, Toan. "ASR - VLSP 2021: An Efficient Transformer-based Approach for Vietnamese ASR Task." VNU Journal of Science: Computer Science and Communication Engineering 38, no. 1 (June 30, 2022). http://dx.doi.org/10.25073/2588-1086/vnucsce.325.

Abstract:
Various techniques have been applied to enhance automatic speech recognition during the last few years. Strong performance in natural language processing has made the Transformer architecture the de facto standard in numerous domains. This paper first presents our effort to collect a 3,000-hour Vietnamese speech corpus. We then introduce the system used for VLSP 2021 ASR task 2, which is based on the Transformer. Our simple method achieves a favorable syllable error rate of 6.72%, taking second place on the private test. Experimental results indicate that the proposed approach dominates traditional methods, with lower syllable error rates on general-domain evaluation sets. Finally, we show that applying Vietnamese word segmentation to the labels does not improve the performance of the ASR system.
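
Since written Vietnamese has exactly one syllable per whitespace-separated token, the syllable error rate reported here is token-level edit distance against the reference, i.e., WER computed on syllables. A minimal implementation (the example strings are made up):

```python
def syllable_error_rate(ref: str, hyp: str) -> float:
    """Levenshtein distance over whitespace-separated syllables,
    normalized by the reference length."""
    r, h = ref.split(), hyp.split()
    d = list(range(len(h) + 1))           # one-row dynamic program
    for i, rs in enumerate(r, 1):
        prev, d[0] = d[0], i
        for j, hs in enumerate(h, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (rs != hs))
    return d[len(h)] / len(r)

print(syllable_error_rate("xin chào các bạn", "xin chao các bạn"))  # 0.25
```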

9. Thanh, Pham Viet, Le Duc Cuong, Dao Dang Huy, Luu Duc Thanh, Nguyen Duc Tan, Dang Trung Duc Anh, and Nguyen Thi Thu Trang. "ASR - VLSP 2021: Semi-supervised Ensemble Model for Vietnamese Automatic Speech Recognition." VNU Journal of Science: Computer Science and Communication Engineering 38, no. 1 (June 30, 2022). http://dx.doi.org/10.25073/2588-1086/vnucsce.332.

Abstract:
Automatic speech recognition (ASR) is making huge advances with the arrival of end-to-end architectures. Semi-supervised learning methods, which can utilize unlabeled data, have contributed largely to the success of ASR systems, giving them the ability to surpass human performance. However, most research focuses on developing these techniques for English speech recognition, which raises concerns about their performance in other languages, especially in low-resource scenarios. In this paper, we propose a Vietnamese ASR system for the VLSP 2021 Automatic Speech Recognition Shared Task. The system is based on the Wav2vec 2.0 framework, along with the application of self-training and several data augmentation techniques. Experimental results show that on the ASR-T1 test set of the shared task, our proposed model achieved a remarkable result, ranking second with a Syllable Error Rate (SyER) of 11.08%.
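
The self-training component can be sketched as pseudo-labeling with a confidence filter. This assumes a Hugging Face Wav2Vec2ForCTC model and its processor; the confidence heuristic and threshold are illustrative, not the team's actual recipe.

```python
import torch

def pseudo_label(model, processor, unlabeled_audio, threshold=0.9):
    """unlabeled_audio: 1-D float arrays at 16 kHz. Transcribe each clip
    with a CTC model and keep only confident ones as new training pairs."""
    kept = []
    for wav in unlabeled_audio:
        inputs = processor(wav, sampling_rate=16000, return_tensors="pt")
        with torch.no_grad():
            logits = model(inputs.input_values).logits   # (1, T, vocab)
        probs = logits.softmax(-1)
        confidence = probs.max(-1).values.mean().item()  # mean frame max-prob
        if confidence >= threshold:
            ids = probs.argmax(-1)
            kept.append((wav, processor.batch_decode(ids)[0]))
    return kept  # merge with the labeled set and fine-tune again
```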

10. Haley, Katarina L., Adam Jacks, Soomin Kim, Marcia Rodriguez, and Lorelei P. Johnson. "Normative Values for Word Syllable Duration With Interpretation in a Large Sample of Stroke Survivors With Aphasia." American Journal of Speech-Language Pathology, August 18, 2023, 1–13. http://dx.doi.org/10.1044/2023_ajslp-22-00300.

Abstract:
Purpose: Slow speech rate and abnormal temporal prosody are primary diagnostic criteria for differentiating between people with aphasia who do and do not have apraxia of speech. We sought to identify appropriate cutoff values for abnormal word syllable duration (WSD) in a word repetition task, interpret them relative to a data set of people with chronic aphasia, and evaluate the extent to which manually derived measures could be approximated through an automated process that relied on commercial speech recognition technology. Method: Fifty neurotypical participants produced 49 multisyllabic words during a repetition task. Audio recordings were submitted to an automated speech recognition (ASR) service (IBM Watson) to measure word duration and generate an orthographic transcription. The transcribed words were compared to a lexical database, and the number of syllables was identified. Automatic and manual measures were compared for 50% of the sample. Results were interpreted relative to WSD scores from an existing data set of 195 people with mostly chronic aphasia. Results: ASR correctly identified 83% of target words and 98% of target syllable counts. Automated word duration calculations were longer than manual measures due to imprecise cursor placement. Upon applying regression coefficients to the automated measures and examining the frequency distributions for both manual and estimated measures, a WSD of 303–316 ms was found to indicate longer-than-normal performance (corresponding to the 95th percentile). With this cutoff, 40%–45% of participants with aphasia in our comparison sample had an abnormally long WSD. Conclusions: We recommend using a rounded WSD cutoff score between 303 and 316 ms for manual measures. Future research will focus on customizing automated WSD methods to speech samples from people with aphasia, identifying target words that maximize production and measurement reliability, and developing WSD standard scores based on a large participant sample with and without aphasia.
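
Word syllable duration is simply word duration divided by syllable count, so applying the recommended cutoff is a one-liner. In the study, durations came from manual measurement or IBM Watson timings and syllable counts from a lexical database; both are treated as inputs here.

```python
def word_syllable_duration_ms(duration_s: float, n_syllables: int) -> float:
    """WSD in milliseconds: word duration divided by syllable count."""
    return 1000.0 * duration_s / n_syllables

def abnormally_long(wsd_ms: float, cutoff_ms: float = 316.0) -> bool:
    # The study recommends a rounded cutoff between 303 and 316 ms,
    # corresponding to the 95th percentile of neurotypical speakers.
    return wsd_ms > cutoff_ms

wsd = word_syllable_duration_ms(1.25, 4)   # 312.5 ms
print(wsd, abnormally_long(wsd))           # within normal limits at 316 ms
```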

Dissertations on the topic "Syllable-based ASR"

1. David Sarwono and 林立成. "A Syllable Cluster Based Weighted Kernel Feature Matrix for ASR Substitution Error Correction." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/28550786350424171660.

Abstract:
Master's thesis, National Cheng Kung University, Department of Computer Science and Information Engineering, academic year 100 (2011–2012).

In recent years, automatic speech recognition (ASR) technology has become one of the fastest-growing technologies in engineering science and research. However, the performance of ASR is still limited in adverse environments, and errors in ASR output degrade the performance of downstream speech applications, so correction techniques for these errors benefit any application that relies on ASR output. In this study, a syllable cluster based weighted kernel feature matrix built on context-dependent syllable clusters (CDSC) is proposed for the generation of correction candidates. For candidate selection in the second stage, an n-gram language model is used to determine the final corrected sentence output, thus improving the recognition results. Experiments show that, compared with the baseline speech recognizer, the proposed method reduced the word error rate from 48.50% to 45.31% and the syllable error rate from 15.37% to 10.31%. Keywords: context-dependent syllable, automatic speech recognizer, error correction, natural language processing.
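
The two-stage pipeline described above (candidate generation from syllable confusions, then n-gram selection) can be sketched as follows. A plain syllable confusion table stands in for the thesis's weighted kernel feature matrix, and the language-model scorer is passed in; both are illustrative assumptions.

```python
from itertools import product

def candidates(syllables, confusion, max_alt=3):
    """Expand each recognized syllable into its top confusable
    alternatives and enumerate candidate corrections (in practice
    a beam or pruning step is needed to tame the combinatorics)."""
    alts = [[s] + confusion.get(s, [])[:max_alt] for s in syllables]
    return [list(c) for c in product(*alts)]

def best_correction(syllables, confusion, lm_score):
    """Stage 2: pick the candidate the n-gram language model likes best."""
    return max(candidates(syllables, confusion), key=lm_score)
```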

2. Anoop, C. S. "Automatic Speech Recognition for Low-Resource Indian Languages." Thesis, 2023. https://etd.iisc.ac.in/handle/2005/6195.

Abstract:
Building good models for automatic speech recognition (ASR) requires large amounts of annotated speech data. Recent advancements in end-to-end speech recognition have aggravated the need for data. However, most Indian languages are low-resourced and lack enough training data to build robust and efficient ASR systems. Despite the challenges associated with the scarcity of data, Indian languages offer some unique characteristics that can be utilized to improve speech recognition in low-resource settings. Most languages have an overlapping phoneme set and a strong correspondence between their character sets and pronunciations. Though the writing systems are different, the Unicode tables are organized so that similar-sounding characters occur at the same offset in the range assigned to each language.

In the first part of the thesis, we try to exploit the pronunciation similarities among multiple Indian languages by using a shared set of pronunciation-based tokens. We evaluate the ASR performance for four choices of tokens: Epitran, the Indian language speech sound label set (ILSL12), the Sanskrit phonetic library encoding (SLP1), and SLP1-M (SLP1 modified to include some contextual pronunciation rules). Using Sanskrit as a representative Indian language, we conduct monolingual experiments to evaluate their ASR performance. Conventional Gaussian mixture model (GMM) - hidden Markov model (HMM) approaches, and neural network models leveraging the alignments from the conventional models, benefit from the stringent pronunciation modeling in SLP1-M. However, end-to-end (E2E) trained time-delay neural networks (TDNN) yield the best results with SLP1.

Most Indian languages are spoken in units of syllables, yet to the best of our knowledge syllables have never been used for E2E speech recognition in Indian languages. We therefore compare token units such as native-script characters, SLP1, and syllables in monolingual settings for multiple Indian languages. We also evaluate the performance of sub-word units generated with the byte pair encoding (BPE) and unigram language model (ULM) algorithms on these basic units. We find that syllable-based sub-word units are promising alternatives to graphemes in monolingual speech recognition if the dataset fairly covers the syllables in the language. The benefits of syllable sub-words in E2E speech recognition may be attributed to the reduced effective length of the token sequences. We also investigate whether models trained on different token units can complement each other in a pretraining and fine-tuning setup; however, the performance improvements in such a setup with syllable-BPE and SLP1 character tokens are minor compared to the syllable-BPE-trained model. We also investigate the suitability of syllable-based units in a cross-lingual training setup for a low-resource target language, but the model faces convergence issues. SLP1 characters are a better choice for cross-lingual transfer learning than syllable sub-words.

In the first part, we also verify the effectiveness of SpecAugment in an extremely low-resource setting. We apply SpecAugment to the log-mel spectrogram for data augmentation on a limited dataset of just 5.5 hours, assuming that the target language has no closely related high-resource source language and that only very limited data is available. SpecAugment provides an absolute improvement of 13.86% in WER on a connectionist temporal classification (CTC) based E2E system with weighted finite-state transducer (WFST) decoding. Based on this result, we use SpecAugment extensively in our experiments with E2E models.

In the second part of the thesis, we address strategies for improving the performance of ASR systems in low-resource scenarios (target languages) by exploiting annotated data from high-resource languages (source languages). Based on the results of the first part, we extensively use SLP1 tokens in multilingual experiments on E2E networks. We specifically explore the following settings:

(a) No labeled audio data is available in the target language, only a limited amount of unlabeled data. We propose using unsupervised domain adaptation (UDA) approaches in a hybrid DNN (deep neural network)-HMM setting to build ASR systems for low-resource languages sharing a common acoustic space with high-resource languages. We explore two architectures: (i) domain adversarial training using a gradient reversal layer (GRL) and (ii) a domain separation network (DSN). The GRL and DSN architectures give absolute improvements of 6.71% and 7.32%, respectively, in word error rate (WER) over the baseline DNN with Hindi in the source domain and Sanskrit in the target domain. We also find that a judicious selection of the source language yields further improvements.

(b) The target language has only a small amount of labeled data and some text data for building language models. We try to benefit from the available data in high-resource languages through a common shared label set to build unified acoustic models (AM) and language models (LM). We study and compare the performance of these unified models with that of monolingual models in low-resource conditions. The unified language-agnostic AM + LM performs better than the monolingual AM + LM when (a) only limited speech data is available for training the acoustic models and (b) the speech data is from domains different from that used in training. Multilingual AM + monolingual LM performs best in general. However, the results suggest that applying unified models directly (without fine-tuning) to unseen languages is not a good choice.

(c) There are N target languages with limited training data and several source languages with large training sets. We explore the usefulness of model-agnostic meta-learning (MAML) pretraining for Indian languages and study the importance of the selection of source languages. We find that MAML beats joint multilingual pretraining by an average of 5.4% in CER and 20.3% in WER with just five epochs of fine-tuning. Moreover, MAML achieves performance similar to joint multilingual training with just 25% of the training data. Similarity with the source languages impacts the target language's ASR performance, and we propose a text-similarity-based loss-weighting scheme to exploit this effect, finding absolute improvements of 1% (on average) in WER.

The main contributions of the thesis are:
1. Finding that the use of SLP1 tokens as a common label set for Indian languages helps to remove the redundancy involved in pooling the characters from multiple languages.
2. Exploring for the first time (to the best of our knowledge) syllable-based token units for E2E speech recognition in Indian languages. We find that they are suitable only for monolingual ASR systems.
3. Formulating ASR in a low-resource language lacking labeled data (for the first time) as an unsupervised domain adaptation problem from a related high-resource language.
4. Exploring for the first time both unified acoustic and language models in a multilingual ASR for Indian languages. The scheme has shown success in cases where the data for acoustic modeling is limited and in settings where the test data is out-of-domain.
5. Proposing a textual-similarity-based loss-weighting scheme for MAML pretraining, which improves the performance of vanilla MAML models.
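
Among the techniques in this thesis, SpecAugment is the most self-contained to illustrate: randomly mask frequency bands and time spans of the log-mel spectrogram during training. A minimal numpy sketch of the two masking operations, with mask sizes chosen arbitrarily rather than taken from the thesis:

```python
import numpy as np

def spec_augment(log_mel, n_freq_masks=2, max_f=8,
                 n_time_masks=2, max_t=40, seed=None):
    """log_mel: array of shape (n_mels, n_frames). Returns a masked copy."""
    rng = np.random.default_rng(seed)
    x = log_mel.copy()
    n_mels, n_frames = x.shape
    for _ in range(n_freq_masks):                   # frequency masking
        f = int(rng.integers(0, max_f + 1))         # band width in mel bins
        f0 = int(rng.integers(0, n_mels - f + 1))   # band start
        x[f0:f0 + f, :] = x.mean()
    for _ in range(n_time_masks):                   # time masking
        t = int(rng.integers(0, min(max_t, n_frames) + 1))
        t0 = int(rng.integers(0, n_frames - t + 1))
        x[:, t0:t0 + t] = x.mean()
    return x
```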

Books on the topic "Syllable-based ASR"

1. Stein, Gabriele. Peter Levins’ Description of Word-Formation (1570). Oxford University Press, 2017. http://dx.doi.org/10.1093/oso/9780198807377.003.0008.

Abstract:
One of the most original English lexicographical ventures in the sixteenth century was Peter Levins’ Manipulus vocabulorum (1570). This is the first English rhyming dictionary. Some nine thousand English words were arranged in the alphabetical order of their last syllable and then translated into Latin. Levins’ word selection will thus have been largely based on the sound structure of the lexical items. The long years spent by Levins on assembling and arranging the dictionary material inevitably drew his attention to English suffixes like -able, -er, -ish, -less, and -ness and such second elements in compounds as fold, garth, house, man, and yard. The column arrangement of the dictionary is thus often interrupted by explicit specifications of synchronic English word-formation patterns (and Latin correspondences).

2. Uffmann, Christian. World Englishes and Phonological Theory. Edited by Markku Filppula, Juhani Klemola, and Devyani Sharma. Oxford University Press, 2015. http://dx.doi.org/10.1093/oxfordhb/9780199777716.013.32.

Abstract:
The relationship between phonological theory and World Englishes is generally characterized by a mutual lack of interest. This chapter argues for a greater engagement of both fields with each other, looking at constraint-based theories of phonology, especially Optimality Theory (OT), as a case in point. Contact varieties of English provide strong evidence for synchronically active constraints, as it is substrate or L1 constraints that are regularly transferred to the contact variety, not rules. Additionally, contact varieties that have properties that are in some way ‘in between’ the substrate and superstrate systems provide evidence for constraint hierarchies or implicational relationships between constraints, illustrated here primarily with examples from syllable structure. Conversely, for a scholar working on the description of World Englishes, OT can offer an explanation of where the patterns found in a contact variety come from, namely from the transfer of substrate constraint rankings (and subsequent gradual constraint demotion).

Book chapters on the topic "Syllable-based ASR"

1. Kim, Byeongchang, Junhwi Choi, and Gary Geunbae Lee. "ASR Error Management Using RNN Based Syllable Prediction for Spoken Dialog Applications." In Advances in Parallel and Distributed Computing and Ubiquitous Services, 99–106. Singapore: Springer Singapore, 2016. http://dx.doi.org/10.1007/978-981-10-0068-3_12.

2. Schneider, Anne H., Johannes Hellrich, and Saturnino Luz. "Word, Syllable and Phoneme Based Metrics Do Not Correlate with Human Performance in ASR-Mediated Tasks." In Advances in Natural Language Processing, 392–99. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-10888-9_39.

3. Heslop, Kate. "A Poetry Machine." In Viking Mediologies, 160–84. Fordham University Press, 2022. http://dx.doi.org/10.5422/fordham/9780823298242.003.0007.

Abstract:
Analysis of the two manuscript versions of the Second Grammatical Treatise reveals a common interest in musical performance, which is also reflected in the treatise’s tripartite division of sound, indebted to medieval music theory. Music and grammar meet in the ars rithmica, an analytical tradition devoted to syllable-counting, often rhyming kinds of poetry usually performed to musical accompaniment. The “new poetics” of the late twelfth and thirteenth centuries, influenced by ars rithmica, posits meter as a tool for the renewal of poetry based on the best of old traditions. The influence of ars rithmica is apparent in the grammatical treatises, and its characteristic style of analysis can be traced in Háttatal’s account of the end-rhymed runhent meter. The “poetry machine” of the Codex Upsaliensis version of the Second Grammatical Treatise, a diagrammatical representation of poetic rhyme based on the conceit of a hurdy-gurdy with letter-annotated keys, demonstrates that the interrelationship between rhyme, music, and meter was virulent in a pedagogical context in late medieval Iceland, while its manuscript link with Háttatal suggests a reading of the Prose Edda compilation as an Icelandic “new poetics.”

Conference papers on the topic "Syllable-based ASR"

1. Ryu, Hyuksu, Minsu Na, and Minhwa Chung. "Pronunciation Modeling of Loanwords for Korean ASR Using Phonological Knowledge and Syllable-Based Segmentation." In 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). IEEE, 2015. http://dx.doi.org/10.1109/apsipa.2015.7415308.

2. Liu, Chao-Hong, Chung-Hsien Wu, David Sarwono, and Jhing-Fa Wang. "Candidate Generation for ASR Output Error Correction Using a Context-Dependent Syllable Cluster-Based Confusion Matrix." In Interspeech 2011. ISCA, 2011. http://dx.doi.org/10.21437/interspeech.2011-488.

3. Hromada, Daniel, and Hyungjoong Kim. "Digital Primer Implementation of Human-Machine Peer Learning for Reading Acquisition: Introducing Curriculum 2." In 10th International Conference on Human Interaction and Emerging Technologies (IHIET 2023). AHFE International, 2023. http://dx.doi.org/10.54941/ahfe1004027.

Abstract:
The aim of the Digital Primer project is cognitive enrichment and the fostering of basic literacy and numeracy in 5-10-year-old children. Here, we focus on the Primer's ability to accurately process child speech, which is fundamental to its reading-acquisition component. We first note that automatic speech recognition (ASR) and speech-to-text for child speech is a challenging task even for large-scale, cloud-based ASR systems. Given that the Primer is an embedded AI artefact that aims to perform all computations on edge devices like Raspberry Pi or Nvidia Jetson, the task is even more challenging, and special tricks and hacks need to be implemented to execute all necessary inferences in quasi-real-time. One such trick explored in this article is the transformation of a generic ASR problem into a much more constrained multiclass classification problem by means of task-specific language models / scorers. Another relates to the adoption of a "human-machine peer learning" (HMPL) strategy whereby the DeepSpeech model behind the ASR system is supposed to gradually adapt its parameters to the particular characteristics of the child using it. In this article, we describe the first, syllable-oriented exercise by means of which the Primer aimed to assist one 5-year-old pre-schooler in increasing her reading competence. The pupil went through a sequence of exercises composed of evaluation and learning tasks. Consistent with the previous HMPL study, we observe an increase both in the child's reading skill and in the machine's ability to accurately process the child's speech.
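
The trick of collapsing open-vocabulary ASR into a closed-set decision can be illustrated independently of DeepSpeech: score the recognizer's raw transcript against the small set of expected answers (here, target syllables) and return the nearest one. The candidate set below is hypothetical; the Primer's actual constraint is implemented with task-specific language models / scorers.

```python
def nearest_target(asr_text: str, targets: list[str]) -> str:
    """Map a noisy ASR hypothesis onto a closed set of expected
    syllables via character-level edit distance."""
    def edit(a: str, b: str) -> int:
        d = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            prev, d[0] = d[0], i
            for j, cb in enumerate(b, 1):
                prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (ca != cb))
        return d[-1]
    return min(targets, key=lambda t: edit(asr_text, t))

print(nearest_target("baa", ["ba", "be", "bi", "bo", "bu"]))  # 'ba'
```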

4. Qu, Zhongdi, Parisa Haghani, Eugene Weinstein, and Pedro Moreno. "Syllable-Based Acoustic Modeling with CTC-SMBR-LSTM." In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2017. http://dx.doi.org/10.1109/asru.2017.8268932.

5. Moungsri, Decha, Tomoki Koriyama, and Takao Kobayashi. "Enhanced F0 Generation for GPR-Based Speech Synthesis Considering Syllable-Based Prosodic Features." In 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2017. http://dx.doi.org/10.1109/apsipa.2017.8282285.

6. Sakamoto, Nagisa, Kazumasa Yamamoto, and Seiichi Nakagawa. "Combination of Syllable-Based N-gram Search and Word Search for Spoken Term Detection through Spoken Queries and IV/OOV Classification." In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE, 2015. http://dx.doi.org/10.1109/asru.2015.7404795.
