Journal articles on the topic 'Speech waveform analysis'

Consult the top 50 journal articles for your research on the topic 'Speech waveform analysis.'
1

Askenfelt, Anders G., and Britta Hammarberg. "Speech Waveform Perturbation Analysis." Journal of Speech, Language, and Hearing Research 29, no. 1 (March 1986): 50–64. http://dx.doi.org/10.1044/jshr.2901.50.

Abstract:
The performance of seven acoustic measures of cycle-to-cycle variations (perturbations) in the speech waveform was compared. All measures were calculated automatically and applied to running speech. Three of the measures refer to the frequency of occurrence and severity of waveform perturbations in specially selected parts of the speech, identified by means of the rate of change in the fundamental frequency. Three other measures refer to statistical properties of the distribution of the relative frequency differences between adjacent pitch periods. One perturbation measure refers to the percentage of consecutive pitch period differences with alternating signs. The acoustic measures were tested on tape-recorded speech samples from 41 voice patients, before and after successful therapy. Scattergrams of acoustic waveform perturbation data versus an average of perceived deviant voice qualities, as rated by voice clinicians, are presented. The perturbation measures were compared with regard to the acoustic-perceptual correlation and their ability to discriminate between normal and pathological voice status. The standard deviation of the distribution of the relative frequency differences was suggested as the most useful acoustic measure of waveform perturbations for clinical applications.
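The perturbation statistics described above are easy to prototype. Below is a minimal sketch, assuming pitch periods have already been extracted by a pitch tracker; it computes the relative F0 differences between adjacent cycles, the standard deviation of that distribution (the measure the study recommends for clinical use), and the percentage of alternating-sign period differences. The function name and test values are hypothetical.

```python
import numpy as np

def perturbation_measures(periods_s):
    """Sketch of cycle-to-cycle perturbation measures from pitch periods (s).

    `periods_s` is a 1-D array of consecutive pitch-period durations,
    e.g. produced by a pitch tracker run on voiced stretches of speech.
    """
    f0 = 1.0 / np.asarray(periods_s)              # per-cycle fundamental frequency
    rel_diff = 100.0 * np.diff(f0) / f0[:-1]      # relative F0 difference between adjacent cycles (%)

    # Standard deviation of the relative-difference distribution
    # (the measure the study recommends for clinical applications).
    sd_rel = np.std(rel_diff)

    # Percentage of consecutive pitch-period differences with alternating signs.
    d = np.diff(periods_s)
    alternating = np.mean(np.sign(d[1:]) * np.sign(d[:-1]) < 0) * 100.0
    return sd_rel, alternating

# Example: ~8 ms periods (F0 near 125 Hz) with small random perturbations.
rng = np.random.default_rng(0)
periods = 0.008 + rng.normal(0.0, 5e-5, size=200)
print(perturbation_measures(periods))
```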
2

Yohanes, Banu W. "Linear Prediction and Long Term Predictor Analysis and Synthesis." Techné : Jurnal Ilmiah Elektroteknika 16, no. 01 (April 3, 2017): 49–58. http://dx.doi.org/10.31358/techne.v16i01.158.

Abstract:
Spectral analysis may not provide an accurate description of speech articulation. This article presents an experimental setup for representing the speech waveform directly in terms of time-varying parameters related to the transfer function of the vocal tract. Linear Prediction (LP) and Long Term Predictor (LTP) analysis and synthesis filters are designed and implemented, and the theory behind them is introduced. The workflow of each filter is explained in detail alongside its code. Original waveforms are framed with a Hamming window, the filters are applied to each frame, and the reconstructed speech is compared to the original waveform. The results show that LP and LTP analysis can be used in DSPs owing to the periodic character of speech, although some distortion may be introduced, as examined in the experiments.
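As an illustration of the frame-level LP analysis workflow (Hamming windowing followed by coefficient estimation), here is a sketch using the autocorrelation method with a Levinson-Durbin recursion; the article does not publish its code, so the exact formulation, the order, and the synthetic test frame are assumptions.

```python
import numpy as np

def lp_analysis(frame, order=12):
    """Sketch of frame-level LP analysis (autocorrelation method + Levinson-Durbin)."""
    w = frame * np.hamming(len(frame))                     # Hamming-windowed analysis frame
    r = np.correlate(w, w, mode="full")[len(w) - 1:len(w) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for i in range(1, order + 1):                          # Levinson-Durbin recursion
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / e    # reflection coefficient
        a[1:i], a[i] = a[1:i] + k * a[i - 1:0:-1], k       # update predictor coefficients
        e *= 1.0 - k * k                                   # residual (prediction error) energy
    return a, e                                            # A(z) coefficients, error power

# Hypothetical usage on a 30 ms frame sampled at 8 kHz:
fs = 8000
t = np.arange(int(0.03 * fs)) / fs
frame = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 700 * t)
a, err = lp_analysis(frame)
```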
3

Tadic, Predrag, Zeljko Djurovic, and Branko Kovacevic. "Analysis of speech waveform quantization methods." Journal of Automatic Control 18, no. 1 (2008): 19–22. http://dx.doi.org/10.2298/jac0801019t.

Abstract:
Digitalization, consisting of sampling and quantization, is the first step in any digital signal processing algorithm. In most cases, the quantization is uniform. However, having knowledge of certain stochastic attributes of the signal (namely, the probability density function, or pdf), quantization can be made more efficient, in the sense of achieving a greater signal-to-quantization-noise ratio. This means that narrower channel bandwidths are required for transmitting a signal of the same quality. Alternatively, if signal storage is of interest rather than transmission, considerable savings in memory space can be made. This paper presents several available methods for speech signal pdf estimation and quantizer optimization in the sense of minimizing the quantization error power.
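To make the quantizer-optimization idea concrete, the following sketch trains a Lloyd-Max (pdf-optimized) scalar quantizer on empirical samples and compares its signal-to-quantization-noise ratio against a uniform quantizer on Laplacian data, a common model for speech amplitudes. This is an illustrative implementation, not the paper's.

```python
import numpy as np

def lloyd_max(samples, levels=16, iters=50):
    """Sketch of a pdf-optimized (Lloyd-Max) scalar quantizer trained on data.

    Alternates nearest-level partitioning with centroid updates, which
    minimizes mean-squared quantization error for the empirical pdf.
    """
    x = np.sort(samples)
    q = np.quantile(x, (np.arange(levels) + 0.5) / levels)  # initial levels on quantiles
    for _ in range(iters):
        edges = (q[1:] + q[:-1]) / 2.0            # decision boundaries = midpoints
        idx = np.digitize(x, edges)               # assign samples to quantizer cells
        for j in range(levels):                   # centroid (conditional mean) update
            cell = x[idx == j]
            if cell.size:
                q[j] = cell.mean()
    return q, edges

# Speech-like samples are often modeled as Laplacian; compare SNRs:
rng = np.random.default_rng(1)
x = rng.laplace(0.0, 1.0, 100_000)
q, edges = lloyd_max(x, levels=16)
xq = q[np.digitize(x, edges)]                              # Lloyd-Max reconstruction
uq = np.round((x - x.min()) / (x.max() - x.min()) * 15)    # 16-level uniform quantizer
xu = uq / 15 * (x.max() - x.min()) + x.min()
snr = lambda s, n: 10 * np.log10(np.mean(s**2) / np.mean((s - n)**2))
print(snr(x, xq), snr(x, xu))   # the pdf-optimized quantizer should beat uniform
```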
4

Read, Charles, Eugene H. Buder, and Raymond D. Kent. "Speech Analysis Systems." Journal of Speech, Language, and Hearing Research 35, no. 2 (April 1992): 314–32. http://dx.doi.org/10.1044/jshr.3502.314.

Abstract:
Performance characteristics are reviewed for seven systems marketed for acoustic speech analysis: CSpeech, CSRE, ILS-PC, Kay Elemetrics model 5500 Sona-Graph, MacSpeech Lab II, MSL, and Signalyze. The characteristics reviewed include system components, basic capabilities (signal acquisition, waveform operations, analysis, and other functions), documentation, user interface, data formats and journaling, speed and precision of spectral analysis, and speed and precision of fundamental frequency analysis. Basic capabilities are also tabulated for three recently introduced systems: the Sensimetrics SpeechStation, the Kay Elemetrics Computerized Speech Lab (CSL), and the LSI Speech Workstation. In addition to the capability and performance summaries, this article offers suggestions for continued development of speech analysis systems, particularly in data exchange, journaling, display features, spectral analysis, and fundamental frequency analysis.
5

Debruyne, F., P. Delaere, J. Wouters, and P. Uwents. "Acoustic analysis of tracheo-oesophageal versus oesophageal speech." Journal of Laryngology & Otology 108, no. 4 (April 1994): 325–28. http://dx.doi.org/10.1017/s0022215100126660.

Abstract:
In order to evaluate the vocal quality of tracheo-oesophageal and oesophageal speech, several objective acoustic parameters were measured in the acoustic waveform (fundamental frequency, waveform perturbation) and in the frequency spectrum (harmonic prominence, spectral slope). Twelve patients using tracheo-oesophageal speech (with the Provox® valve) and 12 patients using oesophageal speech for at least two months participated. The main results were that tracheo-oesophageal voices more often showed a detectable fundamental frequency, and that this fundamental frequency was fairly stable; there was also a tendency toward more clearly defined harmonics in tracheo-oesophageal speech. This suggests a more regular vibratory pattern in the pharyngo-oesophageal segment, due to the more efficient respiratory drive in tracheo-oesophageal speech. Thus, better voice quality can be expected, in addition to the longer phonation time and higher maximal intensity.
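The "detectable fundamental frequency" criterion suggests a periodicity test of the following kind: a basic autocorrelation F0 estimator that reports no F0 when the periodicity peak is weak, as can happen in oesophageal voices. The peak-strength threshold and function name are assumptions, not parameters from the paper.

```python
import numpy as np

def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Sketch of autocorrelation-based F0 detection for one voiced frame.

    Returns F0 in Hz, or None when no clear periodicity is found.
    The frame should span at least two pitch periods at `fmin`.
    """
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)          # admissible pitch-lag range
    lag = lo + int(np.argmax(ac[lo:hi]))
    # Require a reasonably strong periodicity peak before reporting an F0.
    if ac[lag] < 0.3 * ac[0]:
        return None
    return fs / lag
```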
6

Boggs, George J., and Michael D. Connelly. "WFORM: A graphical speech-waveform editing and analysis system." Behavior Research Methods, Instruments, & Computers 18, no. 1 (January 1986): 25–31. http://dx.doi.org/10.3758/bf03200989.

7

Balakrishnan Sivakumar and Praveen Kadakola Biligirirangaiah. "Analysis of vowel addition or deletion in Continuous Speech." Global Journal of Engineering and Technology Advances 7, no. 3 (June 30, 2021): 136–43. http://dx.doi.org/10.30574/gjeta.2021.7.3.0084.

Abstract:
Accurate articulation of the transcription is very important during training if recognition performance is to improve. In continuous speech, pronunciation varies across speakers, and over-emphasized or inadequately emphasized words can misalign the waveform at sub-word unit boundaries. Deviations in articulation lead to mismatches against the pronunciation dictionary, so sub-word units must sometimes be deleted or inserted; this happens because the transcription is not precise for every utterance. This paper presents corrections to the transcription at the sub-word level using acoustic cues present in the waveform. The transcription of a word is corrected using sentence-level transcriptions with reference to the phonemes that make up the word; specifically, vowels are either deleted or inserted. To support the proposed approach, errors in continuous speech are validated using machine learning and signal processing tools. An automatic data-driven annotator exploiting the inferences drawn from the analysis is used to correct transcription errors. The results show that corrected pronunciations lead to higher likelihood for training utterances in the TIMIT corpus.
8

Zhang, Fawen, Chelsea Benson, and Steven J. Cahn. "Cortical Encoding of Timbre Changes in Cochlear Implant Users." Journal of the American Academy of Audiology 24, no. 01 (January 2013): 046–58. http://dx.doi.org/10.3766/jaaa.24.1.6.

Abstract:
Background: Most cochlear implant (CI) users describe music as a noise-like and unpleasant sound. Using behavioral tests, most prior studies have shown that perception of pitch-based melody and timbre is poor in CI users. Purpose: This article focuses on cortical encoding of timbre changes in CI users, which may allow us to find solutions to further improve CI benefits. The study also illustrates the value of objective measures in revealing the neural encoding of timbre changes. Research Design: A case-control study of the mismatch negativity (MMN) using an electrophysiological technique was conducted. To derive MMNs, three randomly arranged oddball paradigms were presented, each consisting of a standard/deviant instrument pair: saxophone/piano, cello/trombone, and flute/French horn, respectively. Study Sample: Ten CI users and ten normal-hearing (NH) listeners participated in this study. Data Collection and Analysis: After filtering, epoching, and baseline correction, independent component analysis (ICA) was performed to remove artifacts. The averaged waveforms in response to the standard stimuli (STANDARD waveform) and the deviant stimuli (DEVIANT waveform) in each condition were derived separately. The responses from nine electrodes in the fronto-central area were averaged to form one waveform. The STANDARD waveform was subtracted from the DEVIANT waveform to derive the difference waveform, for which the MMN was judged to be present or absent. The measures used to evaluate the MMN included the MMN peak latency and amplitude as well as the MMN duration. Results: The MMN, which reflects the ability to automatically detect acoustic changes, was present in all NH listeners but only in approximately half of the CI users. In CI users with present MMNs, the MMN peak amplitude was significantly smaller and the MMN duration significantly shorter compared to those in NH listeners. Conclusions: Our electrophysiological results were consistent with prior behavioral results showing that CI users' performance in timbre perception is significantly poorer than that of NH listeners. Our results suggest that timbre information is poorly registered in the auditory cortex of CI users and that the capability of automatic detection of timbre changes is degraded in CI users. Despite some limitations of the MMN in CI users, the MMN, along with other objective auditory evoked potential tools, may be a useful objective indicator of the extent of sound registration in the auditory cortex in future efforts to improve CI design and speech strategies.
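The MMN derivation described under Data Collection and Analysis reduces to a few array operations. Here is a schematic sketch, assuming epochs are already filtered, epoched, baseline-corrected, and ICA-cleaned; array shapes, names, and the peak-search window are hypothetical rather than the study's exact parameters.

```python
import numpy as np

def mismatch_negativity(epochs, labels, fronto_central_idx):
    """Sketch of MMN derivation: DEVIANT minus STANDARD average waveform.

    `epochs`: (n_trials, n_channels, n_samples) cleaned EEG;
    `labels`: per-trial tags 'std' or 'dev';
    `fronto_central_idx`: indices of the nine fronto-central channels.
    """
    labels = np.asarray(labels)
    std = epochs[labels == "std"].mean(axis=0)        # STANDARD waveform per channel
    dev = epochs[labels == "dev"].mean(axis=0)        # DEVIANT waveform per channel
    diff = dev - std                                  # difference waveform
    return diff[fronto_central_idx].mean(axis=0)      # average over fronto-central sites

def mmn_peak(diff_wave, fs, t0=0.100, t1=0.250):
    """Peak latency (s) and amplitude in an assumed 100-250 ms search window."""
    i0, i1 = int(t0 * fs), int(t1 * fs)
    i = i0 + int(np.argmin(diff_wave[i0:i1]))         # MMN is a negativity
    return i / fs, diff_wave[i]
```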
9

Ingram, Kelly, Ferenc Bunta, and David Ingram. "Digital Data Collection and Analysis." Language, Speech, and Hearing Services in Schools 35, no. 2 (April 2004): 112–21. http://dx.doi.org/10.1044/0161-1461(2004/013).

Abstract:
Technology for digital speech recording and speech analysis is now readily available for all clinicians who use a computer. This article discusses some advantages of moving from analog to digital recordings and outlines basic recording procedures. The purpose of this article is to familiarize speech-language pathologists with computerized audio files and the benefits of working with those sound files as opposed to using analog recordings. This article addresses transcription issues and offers practical examples of various functions, such as playback, editing sound files, using waveform displays, and extracting utterances. An appendix is provided that describes step-by-step how digital recording can be done. It also provides some editing examples and a list of useful computer programs for audio editing and speech analyses. In addition, this article includes suggestions for clinical uses in both the assessment and the treatment of various speech and language disorders.
10

Chi-Sang Jung, Young-Sun Joo, and Hong-Goo Kang. "Waveform Interpolation-Based Speech Analysis/Synthesis for HMM-Based TTS Systems." IEEE Signal Processing Letters 19, no. 12 (December 2012): 809–12. http://dx.doi.org/10.1109/lsp.2012.2221703.

11

Heo, Hee-Soo, Byung-Min So, IL-Ho Yang, and Ha-Jin Yu. "A Speech Waveform Forgery Detection Algorithm Based on Frequency Distribution Analysis." Phonetics and Speech Sciences 7, no. 4 (December 31, 2015): 35–40. http://dx.doi.org/10.13064/ksss.2015.7.4.035.

12

Palmer, Shannon B., and Frank E. Musiek. "N1-P2 Recordings to Gaps in Broadband Noise." Journal of the American Academy of Audiology 24, no. 01 (January 2013): 037–45. http://dx.doi.org/10.3766/jaaa.24.1.5.

Abstract:
Background: Normal temporal processing is important for the perception of speech in quiet and in difficult listening situations. Temporal resolution is commonly measured using a behavioral gap detection task, where the patient or subject must participate in the evaluation process. This is difficult to achieve with subjects who cannot reliably complete a behavioral test. However, recent research has investigated the use of evoked potential measures to evaluate gap detection. Purpose: The purpose of the current study was to record N1-P2 responses to gaps in broadband noise in normal-hearing young adults. Comparisons were made of the N1 and P2 latencies, amplitudes, and morphology for different-length gaps in noise in an effort to quantify the changing responses of the brain to these stimuli. It was the goal of this study to show that electrophysiological recordings can be used to evaluate temporal resolution and measure the influence of short and long gaps on the N1-P2 waveform. Research Design: This study used a repeated-measures design. All subjects completed a behavioral gap detection procedure to establish their behavioral gap detection threshold (BGDT). N1-P2 waveforms were recorded to the gap in a broadband noise. Gap durations were 20 msec, 2 msec above their BGDT, and 2 msec. These durations were chosen to represent a suprathreshold gap, a near-threshold gap, and a subthreshold gap. Study Sample: Fifteen normal-hearing young adult females were evaluated. Subjects were recruited from the local university community. Data Collection and Analysis: Latencies and amplitudes for N1 and P2 were compared across gap durations for all subjects using a repeated-measures analysis of variance. A qualitative description of responses was also included. Results: Most subjects did not display an N1-P2 response to a 2 msec gap, but all subjects had clear evoked potential responses to 20 msec and 2+ msec gaps. Decreasing gap duration toward threshold resulted in decreasing waveform amplitude. However, N1 and P2 latencies remained stable as gap duration changed. Conclusions: N1-P2 waveforms can be elicited by gaps in noise in young normal-hearing adults. The responses are present as low as 2 msec above the BGDT. Gaps that are below the BGDT do not generally evoke an electrophysiological response. These findings indicate that when a waveform is present, the gap duration is likely above the listener's BGDT. Waveform amplitude is also a good index of gap detection, since amplitude decreases with decreasing gap duration. Future studies in this area will focus on various age groups and individuals with auditory disorders.
13

Enders, Jörg, Weihua Geng, Peijun Li, Michael W. Frazier, and David J. Scholl. "The shift-invariant discrete wavelet transform and application to speech waveform analysis." Journal of the Acoustical Society of America 117, no. 4 (April 2005): 2122–33. http://dx.doi.org/10.1121/1.1869732.

14

Ikuma, Takeshi, Melda Kunduk, and Andrew J. McWhorter. "Advanced Waveform Decomposition for High-Speed Videoendoscopy Analysis." Journal of Voice 27, no. 3 (May 2013): 369–75. http://dx.doi.org/10.1016/j.jvoice.2013.01.004.

15

Nisha, K. V., and U. Ajith Kumar. "Pre-Attentive Neural Signatures of Auditory Spatial Processing in Listeners With Normal Hearing and Sensorineural Hearing Impairment: A Comparative Study." American Journal of Audiology 28, no. 2S (August 28, 2019): 437–49. http://dx.doi.org/10.1044/2018_aja-ind50-18-0099.

Abstract:
Purpose This study was carried out to understand the neural intricacies of auditory spatial processing in listeners with sensorineural hearing impairment (SNHI) and compare it with normal hearing (NH) listeners using both local and global measures of waveform analysis. Method A standard group comparison research design was adopted in this study. Participants were assigned to 2 groups. Group I consisted of 13 participants with mild–moderate flat or sloping SNHI, while Group II consisted of 13 participants with NH sensitivity. Electroencephalographic data using virtual acoustic stimuli (spatially loaded stimuli played in center, right, and left hemifields) were recorded from 64 electrode sites in a passive oddball paradigm. Both local (electrode-wise waveform analysis) and global (dissimilarity index, electric field strength, and topographic pattern analyses) measures were performed on the electroencephalographic data. Results Results of local waveform analyses marked the appearance of mismatch negativity in an earlier time window, relative to those reported conventionally, in both groups. The global measures of electric field strength and topographic modulations (dissimilarity index) revealed differences between the 2 groups in different time periods, indicating multiple phases (integration and consolidation) of spatial processing. Further, the topographic pattern analysis showed the emergence of different scalp maps for SNHI and NH in the time window corresponding to mismatch negativity (78–150 ms), suggestive of differential spatial processing between the groups at the cortical level. Conclusions The findings of this study highlight the differential allotment of neural generators, denoting variations in spatial processing between SNHI and NH individuals.
16

Davis, Tara, Nicholas Stanley, and Lori Foran. "Age-Related Effects of Dichotic Attentional Mode on Interaural Asymmetry: An AERP Study with Independent Component Analysis." Journal of the American Academy of Audiology 26, no. 05 (May 2015): 461–77. http://dx.doi.org/10.3766/jaaa.14094.

Abstract:
Background: The degree of interaural asymmetry (IA) obtained on a dichotic listening task is affected by attentional demands attributable to the mode of test administration. Previous research has shown that IA in the elderly is more influenced by increased attentional demands than young adults (YAs), but the effect of attentional mode on IA in middle-aged adults (MAs) has not been established. Auditory event-related potentials (AERPs), such as the N400, allow the evaluation of subtle differences in linguistic and cognitive processing between YAs and MAs that are imperceptible by behavioral means. Purpose: To determine the effect of dichotic attentional mode on IA in the N400 in YA and MA listeners. Research Design: Participants listened to groups of words that consisted of a reference word followed by dichotic probe words. Participants judged whether probe words were semantically related or unrelated to the reference word. This semantic judgment task was elicited in both divided-attention (DIV) and directed-attention (DIR) modes. Study Sample: Twenty-three YA (19–25 yr) and twenty-three MA (47–59 yr) females participated in the study. Data Collection and Analysis: Individual, as well as grand-averaged, AERP waveforms, scalp topographies, and event-related potential-image plots were analyzed. A mixed design analysis of variance was used to compare the N400 amplitude and latency response between ears in both attentional modes. Independent component analysis was used to isolate temporally overlapping neural sources that contributed to the negativity in the latency range of the N400 component. Results: N400 amplitude was significantly more negative in the DIV mode than DIR in both age groups. IA differences between age groups were evident only in DIV, as indicated by an age-related shift in the direction of IA in the N400 from greater asymmetry on the right in YAs to greater asymmetry on the left in MAs. ICA revealed that the age-related difference in IA in the AERP waveform reflected differences between YAs and MAs primarily in an electroencephalographic source process consistent with attentional processing. Conclusions: IA differences between YAs and MAs were revealed in the N400 only in DIV, which was the result of an increased information-processing load. ICA successfully separated temporally overlapping EEG sources that contributed to the N400 component, allowing a refined interpretation of differences in the AERP waveform among groups.
17

Saloni, Saloni, Rajender K. Sharma, and Anil K. Gupta. "Human Voice Waveform Analysis for Categorization of Healthy and Parkinson Subjects." International Journal of Healthcare Information Systems and Informatics 11, no. 1 (January 2016): 21–35. http://dx.doi.org/10.4018/ijhisi.2016010102.

Abstract:
Parkinson disease (PD) is a neurological disorder in which control over the body muscles is disturbed. In almost 90% of cases, people suffering from PD have speech disorders. The goal of this paper is to differentiate healthy persons from PD-affected persons using voice analysis. There are no well-developed lab techniques available for Parkinson detection. Parkinson detection using voice analysis is a noninvasive, reliable, and economical method, and the patient need not visit the clinic. The authors recorded 155 phonations from 25 healthy and 22 PD-affected persons. Classification is performed using two proposed parameters: local angular frequency and instantaneous deviation in the waveform. A support vector machine is used as the classifier. A maximum classification accuracy of 86.8% is achieved using a linear kernel function.
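A sketch of the classification stage, assuming the two proposed features have already been extracted for each phonation. It uses scikit-learn's linear-kernel SVM on synthetic placeholder data, not the authors' recordings, and reports cross-validated accuracy.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: one row per phonation with the two proposed features
# (local angular frequency, instantaneous waveform deviation);
# y: 0 = healthy, 1 = PD. Values below are placeholders, not the paper's data.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (80, 2)), rng.normal(1.0, 1.0, (75, 2))])
y = np.r_[np.zeros(80), np.ones(75)]

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
print(cross_val_score(clf, X, y, cv=5).mean())   # mean classification accuracy
```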
18

Childers, D. G., D. M. Hicks, G. P. Moore, L. Eskenazi, and A. L. Lalwani. "Electroglottography and Vocal Fold Physiology." Journal of Speech, Language, and Hearing Research 33, no. 2 (June 1990): 245–54. http://dx.doi.org/10.1044/jshr.3302.245.

Abstract:
The electroglottogram (EGG) is known to be related to vocal fold motion. A major hypothesis undergoing examination in several research centers is that the EGG is related to the area of contact of the vocal folds. This hypothesis is difficult to substantiate with direct measurements using human subjects. However, other supporting evidence can be offered. For this study we made measurements from synchronized ultra high-speed laryngeal films and from EGG waveforms collected from subjects with normal larynges and patients with vocal disorders. We compare certain features of the EGG waveform to (a) the instant of the opening of the glottis, (b) the instant of the closing of the glottis, and (c) the instant of the maximum opening of the glottis. In addition, we compare both the open quotient and the relative average perturbation measured from the glottal area to that estimated from the EGG. All of these comparisons indicate that vocal fold vibratory characteristics are reflected by features of the EGG waveform. This makes the EGG useful for speech analysis and synthesis as well as for modeling laryngeal behavior. The limitations of the EGG are discussed.
19

Pan, Hui, Mei Gao, and Yan Pan. "Research on Signal Acquisition and Analysis System Based on Virtual Instrument Technology." Applied Mechanics and Materials 556-562 (May 2014): 4321–24. http://dx.doi.org/10.4028/www.scientific.net/amm.556-562.4321.

Abstract:
In this paper, we used the LabVIEW development platform to design a virtual signal acquisition and analysis system based on the soundcard, and we studied its signal acquisition performance. The system provides signal acquisition, analysis, waveform display, and storage. It is easy to use, has a good interface, and additional functions can be appended on user request. It offers a low-cost DAQ solution with broad application prospects and can be extended to fields such as speech recognition, ambient noise monitoring, and laboratory measurement.
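The paper's system is built in LabVIEW; as a loose analogue only, this Python sketch acquires a signal from the default soundcard with the third-party sounddevice package, runs a trivial analysis step, and stores the waveform. The duration, file name, and analysis are illustrative.

```python
import numpy as np
import sounddevice as sd          # third-party package for soundcard I/O

fs, seconds = 44100, 2.0
# Acquire one channel from the default soundcard input.
signal = sd.rec(int(seconds * fs), samplerate=fs, channels=1)
sd.wait()                         # block until the recording finishes
signal = signal.ravel()

# Simple "analysis" stage: RMS level and magnitude spectrum, then storage.
rms = np.sqrt(np.mean(signal ** 2))
spectrum = np.abs(np.fft.rfft(signal))
np.save("capture.npy", signal)    # waveform storage
print(f"RMS level: {rms:.4f}")
```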
20

Gonzalez, Jennifer E., and Frank E. Musiek. "The Onset–Offset N1–P2 Auditory Evoked Response in Individuals With High-Frequency Sensorineural Hearing Loss: Responses to Broadband Noise." American Journal of Audiology 30, no. 2 (June 14, 2021): 423–32. http://dx.doi.org/10.1044/2021_aja-20-00113.

Abstract:
Purpose Clinical use of electrophysiologic measures has been limited to use of brief stimuli to evoke responses. While brief stimuli elicit onset responses in individuals with normal hearing and normal central auditory nervous system (CANS) function, responses represent the integrity of a fraction of the mainly excitatory central auditory neurons. Longer stimuli could provide information regarding excitatory and inhibitory CANS function. Our goal was to measure the onset–offset N1–P2 auditory evoked response in subjects with normal hearing and subjects with moderate high-frequency sensorineural hearing loss (HFSNHL) to determine whether the response can be measured in individuals with moderate HFSNHL and, if so, whether waveform components differ between participant groups. Method Waveforms were obtained from 10 participants with normal hearing and seven participants with HFSNHL aged 40–67 years using 2,000-ms broadband noise stimuli with 40-ms rise–fall times presented at 50 dB SL referenced to stimulus threshold. Amplitudes and latencies were analyzed via repeated-measures analysis of variance (ANOVA). N1 and P2 onset latencies were compared to offset counterparts via repeated-measures ANOVA after subtracting 2,000 ms from the offset latencies to account for stimulus duration. Offset-to-onset trough-to-peak amplitude ratios between groups were compared using a one-way ANOVA. Results Responses were evoked from all participants. There were no differences between participant groups for the waveform components measured. Response × Participant Group interactions were not significant. Offset N1–P2 latencies were significantly shorter than onset counterparts after adjusting for stimulus duration (normal hearing: 43 ms shorter; HFSNHL: 47 ms shorter). Conclusions Onset–offset N1–P2 responses were resistant to moderate HFSNHL. It is likely that the onset was elicited by the presentation of a sound in silence and the offset by the change in stimulus envelope from plateau to fall, suggesting an excitatory onset response and an inhibitory-influenced offset response. Results indicated this protocol can be used to investigate CANS function in individuals with moderate HFSNHL. Supplemental Material https://doi.org/10.23641/asha.14669007
21

He, Dong Kang, You Cai Xu, Xin Shi Li, Ran Tao, Shu Guo, Min Gou, and Kun Li. "Research on Applied-Information Technology in Digital Speech Based on LMD Algorithm." Advanced Materials Research 1014 (July 2014): 447–51. http://dx.doi.org/10.4028/www.scientific.net/amr.1014.447.

Abstract:
As a new nonlinear and non-stationary signal analysis method, local mean decomposition (LMD) has good adaptability. We decompose the original non-stationary acceleration vibration signals into several stationary production functions (PFs). Performing LMD, however, produces end effects that distort the results. A hidden Markov model (HMM)-based speech recognition system for Chinese spelling is considered. After analyzing the causes of the end effects of LMD in detail, a new method based on weighted matching of similar waveforms is proposed. In speech recognition experiments that use the production functions as the training model, the approach achieves higher recognition rates than more traditional identification methods. LMD is an effective method, and it is feasible to extract features from speech signals with LMD.
22

Carney, Arlene Earley. "Vibrotactile Perception of Segmental Features of Speech." Journal of Speech, Language, and Hearing Research 31, no. 3 (September 1988): 438–48. http://dx.doi.org/10.1044/jshr.3103.438.

Abstract:
This experiment compared the recognition performance of artificially deafened listeners for segmental stimuli presented through a single-channel tactile device and through a 24-channel vocoder. Both consonant and vowel stimuli were tested under visual only, tactile only, and visual + tactile conditions. Each subject received a pretest, eight 2-hr training sessions, and a posttest. Results indicated no significant differences between subjects' overall recognition performance with two different tactile devices. Analysis of consonant confusions showed that both devices transmit the features of voicing, manner, and place of articulation in a similar fashion. In contrast to an earlier study on suprasegmental features by Carney and Beachler (1986), these results do not support the notion that preservation of the waveform envelope of speech is necessary for the transmission of segmental features of speech. These results also suggest that tactile perception of segmental features may not be altered significantly by the tactile array chosen.
23

Nishiguchi, Masayuki. "Method and apparatus for speech encoding and decoding by sinusoidal analysis and waveform encoding with phase reproducibility." Journal of the Acoustical Society of America 125, no. 5 (2009): 3486. http://dx.doi.org/10.1121/1.3139576.

24

Milenkovic, Paul, and Feng Mo. "Effect of the vocal tract yielding sidewall on inverse filter analysis of the glottal waveform." Journal of Voice 2, no. 4 (1988): 271–78. http://dx.doi.org/10.1016/s0892-1997(88)80019-5.

25

Jacob, Agnes, and P. Mythili. "Developing a Child Friendly Text-to-Speech System." Advances in Human-Computer Interaction 2008 (2008): 1–6. http://dx.doi.org/10.1155/2008/597971.

Abstract:
This paper discusses the implementation details of a child-friendly, good-quality English text-to-speech (TTS) system that is phoneme-based, concatenative, and easy to set up and use with little memory. Direct waveform concatenation and linear prediction coding (LPC) are used. Most existing TTS systems are unit-selection based and use standard speech databases available in neutral adult voices. Here, reduced memory is achieved by the concatenation of phonemes and by replacing phonetic wave files with their LPC coefficients. Linguistic analysis was used to reduce the algorithmic complexity instead of signal processing techniques. A sufficient degree of customization and generalization has been included through vocabulary and voice selection to suit the needs of the child user. Prosody has also been incorporated. This inexpensive TTS system was implemented in MATLAB, with the synthesis presented by means of a graphical user interface (GUI), thus making it child friendly. It can be used not only as an interesting language-learning aid for the normal child but also as a speech aid for the vocally disabled child. The quality of the synthesized speech was evaluated using the mean opinion score (MOS).
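A minimal sketch of the direct-waveform-concatenation step, assuming per-phoneme recordings at a common sampling rate, each longer than the crossfade region. The linear crossfade is one plausible joining rule; the paper does not publish its code, so all names here are hypothetical.

```python
import numpy as np

def concatenate_phonemes(units, fs, fade_ms=10.0):
    """Sketch of direct waveform concatenation with short linear crossfades.

    `units` is a list of 1-D arrays, one recorded waveform per phoneme,
    e.g. loaded from per-phoneme wave files in the unit inventory.
    Assumes every unit is longer than the crossfade region.
    """
    n = int(fs * fade_ms / 1000.0)                # crossfade length in samples
    ramp = np.linspace(0.0, 1.0, n)
    out = units[0].astype(float)
    for u in units[1:]:
        u = u.astype(float)
        overlap = out[-n:] * (1 - ramp) + u[:n] * ramp   # smooth the join
        out = np.concatenate([out[:-n], overlap, u[n:]])
    return out
```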
26

Xu, Xiaona, Li Yang, Yue Zhao, and Hui Wang. "End-to-End Speech Synthesis for Tibetan Multidialect." Complexity 2021 (January 25, 2021): 1–8. http://dx.doi.org/10.1155/2021/6682871.

Abstract:
Research on Tibetan speech synthesis technology has mainly focused on a single dialect, and thus there is a lack of research on Tibetan multidialect speech synthesis. This paper presents an end-to-end Tibetan multidialect speech synthesis model to realize a speech synthesis system that can synthesize different Tibetan dialects. Firstly, the Wylie transliteration scheme is used to convert the Tibetan text into the corresponding Latin letters, which effectively reduces the size of the training corpus and the workload of front-end text processing. Secondly, a shared feature prediction network with a cyclic sequence-to-sequence structure is built, which maps the Latin transliteration vector of a Tibetan character to Mel spectrograms and learns the relevant features of multidialect speech data. Thirdly, two dialect-specific WaveNet vocoders are combined with the feature prediction network; they synthesize the Mel spectrograms of the Lhasa-Ü-Tsang and Amdo pastoral dialects into time-domain waveforms, respectively. The model avoids relying on extensive Tibetan dialect expertise for time-consuming tasks such as phonetic analysis and phonological annotation. Additionally, it can directly synthesize Lhasa-Ü-Tsang and Amdo pastoral speech from the existing text annotation. The experimental results show that the synthesized speech of the Lhasa-Ü-Tsang and Amdo pastoral dialects based on our proposed method has better clarity and naturalness than the Tibetan monolingual model.
27

Dahmani, Mohamed, and Mhania Guerti. "Recurrence Quantification Analysis of Glottal Signal as non Linear Tool for Pathological Voice Assessment and Classification." International Arab Journal of Information Technology 17, no. 6 (November 1, 2020): 857–66. http://dx.doi.org/10.34028/iajit/17/6/4.

Abstract:
Automatic detection and assessment of vocal fold pathologies using signal processing techniques is an extensively studied challenge in the voice and speech research community. This paper applies Recurrence Quantification Analysis (RQA) to the glottal signal waveform in order to evaluate the dynamic behavior of the vocal folds (VFs) and to diagnose and classify voice disorders. The proposed solution starts by extracting the glottal waveform from the voice signal through an inverse filtering algorithm. In the next step, the RQA parameters are determined via the Recurrence Plot (RP) structure of the glottal signal, with the normal voice taken as a reference. Finally, these parameters are used as the input feature set of a hybrid Particle Swarm Optimization-Support Vector Machines (PSO-SVM) algorithm to discriminate between normal and pathological voices. For validation, we adopted the Saarbrucken Voice Database (SVD), selecting the long vowel /a:/ from 133 normal samples and 260 pathological samples uttered by four groups of subjects: persons suffering from vocal fold paralysis, persons with vocal fold polyps, persons with spasmodic dysphonia, and normal voices. The obtained results show the effectiveness of RQA applied to the glottal signal as a feature extraction technique. Indeed, PSO-SVM as a classification method is an effective tool for the assessment and diagnosis of pathological voices, with an accuracy of 97.41%.
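To illustrate the RQA stage, the sketch below time-delay-embeds a short glottal-signal segment, builds a recurrence plot, and derives two standard RQA measures (recurrence rate and determinism). The embedding parameters, threshold heuristic, and minimum diagonal length are common defaults rather than the authors' settings, and the O(n²) distance matrix limits this to short segments.

```python
import numpy as np

def rqa_features(x, dim=3, tau=5, eps=None):
    """Sketch of RQA on a short (glottal) signal segment."""
    # Time-delay embedding of the waveform.
    n = len(x) - (dim - 1) * tau
    emb = np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    if eps is None:
        eps = 0.2 * d.std()                       # common heuristic threshold
    rp = (d < eps).astype(int)                    # recurrence plot matrix
    rr = rp.mean()                                # recurrence rate

    # Determinism: fraction of recurrent points lying on diagonal
    # line structures of length >= 2 (upper triangle, doubled by symmetry).
    det_pts = 0
    for k in range(1, n):
        diag = np.diagonal(rp, offset=k)
        runs = np.split(diag, np.where(diag == 0)[0])
        det_pts += sum(r.sum() for r in runs if r.sum() >= 2)
    det = 2 * det_pts / max(rp.sum() - np.trace(rp), 1)
    return rr, det
```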
28

Davis, Tara M., James Jerger, and Jeffrey Martin. "Electrophysiological Evidence of Augmented Interaural Asymmetry in Middle-Aged Listeners." Journal of the American Academy of Audiology 24, no. 03 (March 2013): 159–73. http://dx.doi.org/10.3766/jaaa.24.3.3.

Abstract:
Background: Various dimensions of auditory processing, especially the perception of speech in the presence of background competition, have been shown to deteriorate with age. A persistent problem in the assessment of these age-related changes has been the high prevalence of age-related high-frequency hearing loss in elderly persons. Some investigators have suggested that a more fruitful approach to the study of age-related decline might be to study middle-aged, rather than elderly, persons, where confounding high-frequency hearing loss is less prevalent. Purpose: To determine whether an increase in the left-ear disadvantage (LED) in dichotic listening could be demonstrated in a group of middle-aged persons. Research Design: The N400 component of the auditory event-related potential (AERP) was utilized to evaluate interaural asymmetry in a quasi-dichotic competing speech task. Electrophysiological responses were obtained on a word-pair semantic categorization task presented through a front loudspeaker while the listener ignored competing speech presented through either left (competition left [CL]) or right (competition right [CR]) loudspeakers. Study Samples: Twenty young (18–24 yr) and 20 middle-aged (44–57 yr) females with normal hearing sensitivity. Data Collection and Analysis: Individual, as well as grand-averaged, AERP waveforms and scalp topographies were analyzed for the word pairs. Peak amplitude and latency measures of the N400 component were subjected to a mixed design analysis of variance (ANOVA). Results: No significant interaural asymmetry was found in the AERP waveform for the reference word condition in either age group. In response to the second word of the pair, however, middle-aged females showed significantly greater N400 negativity in the CR condition than in the CL condition. No significant laterality effect was found in the young females. Conclusions: The study of young versus middle-aged participants may be an effective way of avoiding the confound of high-frequency hearing loss in elderly persons when studying age effects on auditory processing.
29

Tan, Jingqian, Jia Luo, Xin Wang, Yanbing Jiang, Xiangli Zeng, Shixiong Chen, and Peng Li. "Analysis of Click and Swept-Tone Auditory Brainstem Response Results for Moderate and Severe Sensorineural Hearing Loss." Audiology and Neurotology 25, no. 6 (2020): 336–44. http://dx.doi.org/10.1159/000507691.

Abstract:
Introduction: Auditory brainstem response (ABR) is one of the commonly used methods in clinical settings to evaluate hearing sensitivity and auditory function. The current ABR measurement usually adopts a click sound as the stimulus. However, there may be partial ABR amplitude attenuation due to the delay characteristics of the cochlear traveling wave along the basilar membrane. To solve that problem, a swept-tone method was proposed, in which the show-up time of different frequency components was adjusted to compensate for the delay characteristics of the cochlear basilar membrane; therefore, the different ABR subcomponents of different frequencies were synchronized. Methods: The normal hearing group, moderate sensorineural hearing loss group, and severe sensorineural hearing loss group underwent click ABR and swept-tone ABR with different stimulus intensities. The latencies and amplitudes of waves I, III, and V in the 2 detections were recorded. Results: It was found that the latency of each of the recorded I, III, and V waves detected by swept-tone ABR was shorter than that by click ABR in both the control group and the experimental groups. In addition, the amplitude of each of the recorded I, III, and V waves, except V waves under 60 dB nHL in the moderate sensorineural hearing loss group, detected by swept-tone ABR was larger than that by click ABR. The results also showed that swept-tone ABR could measure visible V waves at lower stimulus levels in the severe sensorineural hearing loss group. Conclusion: Swept-tone improves the ABR waveforms and helps to obtain a more accurate threshold to some extent. Therefore, the proposed swept-tone ABR may provide a new solution for better morphology of the ABR waveform, which can help to make a more accurate diagnosis about hearing functionality in the clinic.
30

Miller, Christi W., Joshua G. W. Bernstein, Xuyang Zhang, Yu-Hsiang Wu, Ruth A. Bentler, and Kelly Tremblay. "The Effects of Static and Moving Spectral Ripple Sensitivity on Unaided and Aided Speech Perception in Noise." Journal of Speech, Language, and Hearing Research 61, no. 12 (December 10, 2018): 3113–26. http://dx.doi.org/10.1044/2018_jslhr-h-17-0373.

Abstract:
Purpose This study evaluated whether certain spectral ripple conditions were more informative than others in predicting ecologically relevant unaided and aided speech outcomes. Method A quasi-experimental study design was used to evaluate 67 older adult hearing aid users with bilateral, symmetrical hearing loss. Speech perception in noise was tested under conditions of unaided and aided, auditory-only and auditory–visual, and 2 types of noise. Predictors included age, audiometric thresholds, audibility, hearing aid compression, and modulation depth detection thresholds for moving (4-Hz) or static (0-Hz) 2-cycle/octave spectral ripples applied to carriers of broadband noise or 2000-Hz low- or high-pass filtered noise. Results A principal component analysis of the modulation detection data found that broadband and low-pass static and moving ripple detection thresholds loaded onto the first factor whereas high-pass static and moving ripple detection thresholds loaded onto a second factor. A linear mixed model revealed that audibility and the first factor (reflecting broadband and low-pass static and moving ripples) were significantly associated with speech perception performance. Similar results were found for unaided and aided speech scores. The interactions between speech conditions were not significant, suggesting that the relationship between ripples and speech perception was consistent regardless of visual cues or noise condition. High-pass ripple sensitivity was not correlated with speech understanding. Conclusions The results suggest that, for hearing aid users, poor speech understanding in noise and sensitivity to both static and slow-moving ripples may reflect deficits in the same underlying auditory processing mechanism. Significant factor loadings involving ripple stimuli with low-frequency content may suggest an impaired ability to use temporal fine structure information in the stimulus waveform. Support is provided for the use of spectral ripple testing to predict speech perception outcomes in clinical settings.
31

Perez, Ana, Karin Ziliotto, and Liliane Pereira. "Test-Retest of Long Latency Auditory Evoked Potentials (P300) with Pure Tone and Speech Stimuli." International Archives of Otorhinolaryngology 21, no. 02 (April 26, 2016): 134–39. http://dx.doi.org/10.1055/s-0036-1583527.

Abstract:
Introduction Long latency auditory evoked potentials, especially P300, have been used for clinical evaluation of mental processing. Many factors can interfere with Auditory Evoked Potential - P300 results, suggesting large intra and inter-subject variations. Objective The objective of the study was to identify the reliability of P3 components (latency and amplitude) over 4–6 weeks and the most stable auditory stimulus with the best test-retest agreement. Methods Ten normal-hearing women participated in the study. Only subjects without auditory processing problems were included. To determine the P3 components, we elicited long latency auditory evoked potential (P300) by pure tone and speech stimuli, and retested after 4–6 weeks using the same parameters. We identified P300 latency and amplitude by waveform subtraction. Results We found lower coefficient of variation values in latency than in amplitude, with less variability analysis when speech stimulus was used. There was no significant correlation in latency measures between pure tone and speech stimuli, and sessions. There was a significant intrasubject correlation between measures of latency and amplitude. Conclusion These findings show that amplitude responses are more robust for the speech stimulus when compared with its pure tone counterpart. The P300 indicated stability for latency and amplitude measures when the test-retest was applied. Reliability was higher for amplitude than for latency, with better agreement when the pure tone stimulus was used. However, further research with speech stimulus is needed to clarify how these stimuli are processed by the nervous system.
32

Davis, Tara M., and James Jerger. "The Effect of Middle Age on the Late Positive Component of the Auditory Event-Related Potential." Journal of the American Academy of Audiology 25, no. 02 (February 2014): 199–209. http://dx.doi.org/10.3766/jaaa.25.2.8.

Abstract:
Background: We recently described a research study in which age-related changes in interaural asymmetry were elicited using the N400 of the auditory event-related potentials (AERP) (Davis et al, 2013). The N400 was the primary focus due to its sensitivity to various aspects of semantic processing (Kutas and Hillyard, 1984), which we measured using a quasi-dichotic semantic category judgment task in competing speech. In this article, we describe age-related changes that occurred in the late positive component (LPC) of the AERP in the same study. The LPC peak occurs subsequent to the N400 peak on the AERP waveform and has been associated with context updating and further evaluation and processing of stimulus meaning (Juottonen et al, 1996). Neither age group showed significant interaural asymmetry in the LPC. However, a robust age-related difference in LPC scalp topography was observed. Purpose: The LPC of the auditory event-related potentials was utilized to evaluate age-related differences in language processing in a quasi-dichotic competing speech task. Research Design: Electrophysiological responses were obtained on a word-pair semantic categorization task presented through a front loudspeaker while ignoring competing speech that was presented through either left (competition left [CL]) or right (competition right [CR]) loudspeakers. The LPC was compared between young and middle-aged groups in three conditions: side of competition, semantic judgment, and electrode position. Study Sample: Twenty young (18–24 yr) and twenty middle-aged (44–57 yr) females with normal hearing sensitivity participated in this study. Data Collection and Analysis: Individual, as well as grand-averaged, AERP waveforms and scalp topographies were analyzed in response to the word pairs. The LPC component was subjected to a mixed design analysis of variance (ANOVA) for peak latency and amplitude measures in the latency range of 700–800 msec. Since statistical analyses showed little difference in the LPC component as a function of side of competition, the AERP data were collapsed for the CL and CR conditions. The LPC was analyzed in two ways: first at mid-parieto-central electrode locations, second across midline electrodes from PZ to FZ. Results: Analysis of the mid-parieto-central electrodes showed no amplitude or latency differences for either group or side of competition. The second analysis (across midline electrodes), however, showed a significant amplitude interaction between electrode position and group, indicating that the two age groups were equivalent in the posterior region of the scalp but divergent as electrode site moved frontally. Significant age-related scalp topography differences were found in both semantic judgment conditions. No significant latency differences were found in any condition. Conclusions: The middle-aged group showed substantially greater LPC peak amplitude in the frontal regions of the scalp than young adults. These results were in concert with N400 results, which suggested that the middle-aged group required more attentional/cognitive resources than young adults in order to maintain a high performance level on a linguistic task in the presence of competing linguistic stimuli (Davis et al, 2013).
33

Mozaffarilegha, Marjan, Ali Esteki, Mohsen Ahadi, and Ahmadreza Nazeri. "Identification of Dynamic Patterns of Speech-Evoked Auditory Brainstem Response Based on Ensemble Empirical Mode Decomposition and Nonlinear Time Series Analysis Methods." International Journal of Bifurcation and Chaos 26, no. 12 (November 2016): 1650202. http://dx.doi.org/10.1142/s0218127416502023.

Abstract:
The speech-evoked auditory brainstem response (sABR) shows how complex sounds such as speech and music are processed in the auditory system. Speech-ABR could be used to evaluate particular impairments and improvements in the auditory processing system. Many researchers have used linear approaches for characterizing different components of the sABR signal, whereas nonlinear techniques are applied less commonly. The primary aim of the present study is to examine the underlying dynamics of normal sABR signals. The secondary goal is to evaluate whether some chaotic features exist in this signal. We present a methodology for determining various components of sABR signals by performing Ensemble Empirical Mode Decomposition (EEMD) to get the intrinsic mode functions (IMFs). Then, composite multiscale entropy (CMSE), the largest Lyapunov exponent (LLE), and deterministic nonlinear prediction are computed for each extracted IMF. EEMD decomposes the sABR signal into five modes and a residue. The CMSE results of sABR signals obtained from 40 healthy people showed that the 1st and 2nd IMFs were similar to white noise, IMF-3 to a synthetic chaotic time series, and the 4th and 5th IMFs to a sine waveform. LLE analysis showed positive values for the 3rd IMF. Moreover, the 1st and 2nd IMFs showed overlap with surrogate data, while the 3rd, 4th, and 5th IMFs showed no overlap with the corresponding surrogate data. The results showed the presence of noisy, chaotic, and deterministic components in the signal, corresponding respectively to the 1st and 2nd IMFs, IMF-3, and the 4th and 5th IMFs. While these findings provide supportive evidence of the chaos conjecture for the 3rd IMF, they do not confirm any such claims. However, they provide a first step towards an understanding of the nonlinear behavior of auditory system dynamics at the brainstem level.
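A compact sketch of the composite multiscale entropy (CMSE) computation that would be applied to each IMF, assuming a plain sample-entropy estimator with tolerance r·std. Parameter values are conventional defaults, not those of the study, and the pairwise distance matrix restricts this to short series.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Plain sample entropy with Chebyshev distance and tolerance r * std(x)."""
    x = np.asarray(x, float)
    tol = r * x.std()

    def count(mm):
        # All templates of length mm; count ordered pairs within tolerance.
        templ = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        d = np.max(np.abs(templ[:, None] - templ[None, :]), axis=-1)
        return (np.sum(d <= tol) - len(templ)) / 2     # exclude self-matches

    b, a = count(m), count(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def composite_mse(x, scale, m=2, r=0.2):
    """Composite MSE: average sample entropy over all coarse-graining offsets."""
    vals = []
    for k in range(scale):                             # each starting offset
        n = (len(x) - k) // scale
        cg = np.asarray(x[k:k + n * scale]).reshape(n, scale).mean(axis=1)
        vals.append(sample_entropy(cg, m, r))
    return float(np.mean(vals))
```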
34

Dzulkarnain, Ahmad Aidil Arafat, Wan Mahirah Wan Mhd Pandi, Wayne J. Wilson, Andrew P. Bradley, and Faizah Sapian. "A preliminary investigation into the use of an auditory brainstem response (ABR) simulator for training audiology students in waveform analysis." International Journal of Audiology 53, no. 8 (April 4, 2014): 514–21. http://dx.doi.org/10.3109/14992027.2014.897763.

35

Sussman, Harvey M., David Fruchter, Jon Hilbert, and Joseph Sirosh. "Linear correlates in the speech signal: The orderly output constraint." Behavioral and Brain Sciences 21, no. 2 (April 1998): 241–59. http://dx.doi.org/10.1017/s0140525x98001174.

Abstract:
Neuroethological investigations of mammalian and avian auditory systems have documented species-specific specializations for processing complex acoustic signals that could, if viewed in abstract terms, have an intriguing and striking relevance for human speech sound categorization and representation. Each species forms biologically relevant categories based on combinatorial analysis of information-bearing parameters within the complex input signal. This target article uses known neural models from the mustached bat and barn owl to develop, by analogy, a conceptualization of human processing of consonant plus vowel sequences that offers a partial solution to the noninvariance dilemma – the nontransparent relationship between the acoustic waveform and the phonetic segment. Critical input sound parameters used to establish species-specific categories in the mustached bat and barn owl exhibit high correlation and linearity due to physical laws. A cue long known to be relevant to the perception of stop place of articulation is the second formant (F2) transition. This article describes an empirical phenomenon – the locus equations – that describes the relationship between the F2 of a vowel and the F2 measured at the onset of a consonant-vowel (CV) transition. These variables, F2 onset and F2 vowel within a given place category, are consistently and robustly linearly correlated across diverse speakers and languages, and even under perturbation conditions as imposed by bite blocks. A functional role for this category-level extreme correlation and linearity (the “orderly output constraint”) is hypothesized based on the notion of an evolutionarily conserved auditory-processing strategy. High correlation and linearity between critical parameters in the speech signal that help to cue place of articulation categories might have evolved to satisfy a preadaptation by mammalian auditory systems for representing tightly correlated, linearly related components of acoustic signals.
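A locus equation is simply a per-category linear regression of F2 at the CV-transition onset on F2 at the vowel nucleus; a few lines make this explicit. The numbers below are placeholders, not measurements from the article.

```python
import numpy as np

# Locus equation: regress F2 at CV-transition onset against F2 at the vowel
# nucleus across many CV tokens sharing one stop place of articulation.
f2_vowel = np.array([ 900, 1200, 1500, 1800, 2100, 2400], float)   # Hz (placeholder)
f2_onset = np.array([1450, 1600, 1750, 1900, 2050, 2200], float)   # Hz (placeholder)

slope, intercept = np.polyfit(f2_vowel, f2_onset, 1)
r = np.corrcoef(f2_vowel, f2_onset)[0, 1]
print(f"F2onset = {slope:.2f} * F2vowel + {intercept:.0f} Hz  (r = {r:.3f})")
# High linearity (r near 1) within a place category is the "orderly output" effect.
```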
36

Ringelienė, Živilė, and Mark Filipovič. "Žodžių atpažinimo, grįsto paslėptaisiais Markovo modeliais, vizualizavimo ir analizės programinė įranga." Informacijos mokslai 56 (January 1, 2011): 63–72. http://dx.doi.org/10.15388/im.2011.0.3150.

Abstract:
The paper presents a prototype of an isolated word recognition system based on hidden Markov models. The software is intended for investigating Lithuanian word recognition: the developed prototype of a speaker-independent Lithuanian isolated word recognition system is handy for recognition experiments and for analyzing their results, and the information it presents about the recognition process helps identify the causes of errors. The user is provided with numeric and visual information on the recognition results. Word recognition pivots on the precision of word-boundary determination. The main window contains the recognized word and its logarithmic likelihood, the waveform of the speech signal, the signal's energy, the identified word boundaries, and the energy detection thresholds. If the system misrecognized a word, such visualization makes it easier to identify whether this resulted from wrong end-point detection; in that case, the values of the system parameters that influence boundary-detection accuracy can be changed, which in certain cases improves the recognition results. The segmentation window provides a list of the words whose acoustic models best fit the given speech signal, their likelihood scores, and a diagram of the most likely sequence of phoneme models aligned with the speech signal. Such visualization helps to analyze recognition errors and the impact of each phoneme model on recognition accuracy. Preliminary experiments have shown that changing the transcriptions of some words can increase the recognition accuracy.
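Since recognition here pivots on the precision of word-boundary determination, a sketch of the kind of energy-threshold end-point detection the tool visualizes may help. The frame length, threshold rule, and all names are assumptions rather than the system's actual parameters.

```python
import numpy as np

def detect_word_boundaries(x, fs, frame_ms=20.0, k=4.0):
    """Sketch of energy-threshold end-point detection for isolated words.

    Frames the signal, computes short-time energy, and marks the first and
    last frames whose energy exceeds a noise-floor-based threshold.
    """
    n = int(fs * frame_ms / 1000.0)
    frames = x[:len(x) // n * n].reshape(-1, n)
    energy = (frames ** 2).sum(axis=1)
    threshold = k * np.median(energy)             # noise-floor estimate times margin
    active = np.where(energy > threshold)[0]
    if active.size == 0:
        return None                               # no word detected
    start, end = active[0] * n, (active[-1] + 1) * n
    return start / fs, end / fs                   # word boundaries in seconds
```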
APA, Harvard, Vancouver, ISO, and other styles
37

Barker, Matthew D., Abin Kuruvilla-Mathew, and Suzanne C. Purdy. "Cortical Auditory-Evoked Potential and Behavioral Evidence for Differences in Auditory Processing between Good and Poor Readers." Journal of the American Academy of Audiology 28, no. 06 (June 2017): 534–45. http://dx.doi.org/10.3766/jaaa.16054.

Full text
Abstract:
Background: The relationship between auditory processing (AP) and reading is thought to be significant; however our understanding of this relationship is somewhat limited. Previous studies have investigated the relation between certain electrophysiological and behavioral measures of AP and reading abilities in children. This study attempts to further understand that relation. Purpose: Differences in AP between good and poor readers were investigated using electrophysiological and behavioral measures. Study Sample: Thirty-two children (15 female) aged 9–11 yr were placed in either a good reader group or poor reader group, based on the scores of a nationally normed reading test in New Zealand. Research Design: Children were initially tested using an automated behavioral measuring system that runs through a tablet computer known as “Feather Squadron.” Following the administration of Feather Squadron, cortical auditory-evoked potentials (CAEPs) were recorded using a speech stimulus (/m/) with the HEARLab® Cortical Auditory Evoked Potential Analyzer. Data Collection and Analysis: The children were evaluated on eight subsections of the Feather Squadron, and CAEP waveform peaks were visually identified and averaged. Separate Kruskal–Wallis analyses were performed for the behavioral and electrophysiological variables, with group (good versus poor readers) serving as the between-group independent variable and scores from the Feather Squadron AP tasks as well as CAEP latencies and amplitudes as dependent variables. After the children’s AP status was determined, the entire group was further divided into three groups: typically developing, auditory processing disorder + reading difficulty (APD + RD), and RDs only. Statistical analyses were repeated for these subgroups. Results: Poorer readers showed significantly worse scores than the good readers for the Tonal Pattern 1, Tonal Pattern 2, and Word Double Dichotic Right tasks. CAEP differences observed across groups indicated comorbid effects of RD and AP difficulties. N2 amplitude was significantly smaller for the poor readers. Conclusions: The current study found altered AP in poor readers using behavioral Feather Squadron measures and speech-evoked cortical potentials. These results provide further evidence that intact central auditory function is fundamental for reading development.
APA, Harvard, Vancouver, ISO, and other styles
38

Kist, Andreas M., Pablo Gómez, Denis Dubrovskiy, Patrick Schlegel, Melda Kunduk, Matthias Echternach, Rita Patel, et al. "A Deep Learning Enhanced Novel Software Tool for Laryngeal Dynamics Analysis." Journal of Speech, Language, and Hearing Research 64, no. 6 (June 4, 2021): 1889–903. http://dx.doi.org/10.1044/2021_jslhr-20-00498.

Full text
Abstract:
Purpose High-speed videoendoscopy (HSV) is an emerging, but barely used, endoscopy technique in the clinic for assessing and diagnosing voice disorders, largely because of the lack of dedicated software to analyze the data. HSV allows the vocal fold oscillations to be quantified by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine. Method We developed a user-friendly software tool in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation. Results We freely provide our software Glottis Analysis Tools (GAT). Using GAT, we provide a general threshold-based region growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to allow the fully automatic glottis segmentation needed for application by untrained personnel. GAT evaluates video and audio data in parallel and is able to extract various features from the video data, among them the glottal area waveform, that is, the changing glottal area over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software. Conclusion GAT is a unique tool to process HSV and audio data to determine quantitative, clinically relevant parameters for research, diagnosis, and treatment of laryngeal disorders. Supplemental Material https://doi.org/10.23641/asha.14575533
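As an illustration of the glottal area waveform (GAW) named above, the sketch below sums segmented glottis pixels per video frame; the mask array is a random placeholder standing in for a segmentation network's output, not GAT's actual API.

import numpy as np

masks = np.random.rand(50, 64, 128) > 0.9            # placeholder per-frame glottis masks
gaw = masks.reshape(masks.shape[0], -1).sum(axis=1)  # glottal area (pixels) per frame
gaw_norm = gaw / gaw.max()                           # normalized glottal area over time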
APA, Harvard, Vancouver, ISO, and other styles
39

Atcherson, Samuel R., and Page C. Moore. "Are Chirps Better than Clicks and Tonebursts for Evoking Middle Latency Responses?" Journal of the American Academy of Audiology 25, no. 06 (June 2014): 576–83. http://dx.doi.org/10.3766/jaaa.25.6.7.

Full text
Abstract:
Background: The middle latency response (MLR) is considered a valid clinical tool for assessing the integrity of cortical and subcortical structures. Several investigators have demonstrated that a rising frequency chirp stimulus is capable of eliciting not only larger wave V amplitudes but larger MLR components as well. However, the chirp has never been specifically examined in a hemispheric electrode montage setup that is typical for neurodiagnostic application and site-of-lesion testing. Purpose: The purpose of this study was to examine the effect of chirp, click, and toneburst stimuli on MLR waveform peak latency and peak-to-peak amplitude in a hemispheric electrode montage setup. Research Design: This study used a repeated-measures design. Study Sample: A total of 10 young adult participants (3 males, 7 females) with normal hearing were recruited; all had negative histories of audiologic, otologic, and neurologic involvement, and no reported language or learning difficulties. Data Collection and Analysis: MLR latencies (Na, Pa, Nb, and Pb) and peak-to-peak amplitudes (Na-Pa, Pa-Nb, and Nb-Pb) were measured for all conditions and were statistically evaluated for left hemisphere-right ear (C3-A2) and right hemisphere-left ear (C4-A1) recordings. Results: Statistical analyses revealed no significant difference between C3-A2 and C4-A1 peak-to-peak amplitudes; therefore, data were collapsed. Stimulus comparisons revealed that Na latencies evoked by tonebursts were statistically prolonged compared with both chirp and click, and that both Na-Pa and Pa-Nb peak-to-peak amplitudes were statistically larger for chirps compared with both clicks and tonebursts, and for clicks compared with tonebursts. Conclusions: The results of this study support the hypothesis that a chirp would offer a clinical advantage over the click and toneburst in overall peak-to-peak amplitude. As expected, normal-hearing participants did not exhibit hemispheric differences when comparing C3-A2 and C4-A1 peak-to-peak amplitudes, demonstrating symmetric auditory brain function. However, chirp-evoked MLRs will require further study to determine their usefulness in clinical practice.
APA, Harvard, Vancouver, ISO, and other styles
40

Bae, In-Ho, Soo-Geun Wang, Soon-Bok Kwon, Seong-Tae Kim, Eui-Suk Sung, and Jin-Choon Lee. "Clinical Application of Two-Dimensional Scanning Digital Kymography in Discrimination of Diplophonia." Journal of Speech, Language, and Hearing Research 62, no. 10 (October 25, 2019): 3643–54. http://dx.doi.org/10.1044/2019_jslhr-s-18-0175.

Full text
Abstract:
Purpose The purpose of this study was to investigate the characteristics of diplophonia using an auditory perception and multimodal simultaneous examination, which included sound waveform analysis, electroglottography (EGG), digital kymography (DKG), and 2-dimensional scanning digital kymography (2D DKG). Additionally, we compared the diagnostic accuracy of each method using a binary classifier in confusion matrix and convenience of discrimination, based on the time required for interpretation. Method One normophonic male, 12 patients with diplophonia, and 12 dysphonia patients without diplophonia were enrolled. A multimodal simultaneous evaluation was used to analyze the vibration pattern of diplophonia. Sensitivity, specificity, accuracy, area under the curve, and interpretation time were used to compare the various diagnostic methods. Discrimination was determined by 3 raters. Results There are 3 types of asymmetric vibratory patterns in diplophonia. The types are based on the oscillators vibrating at different frequencies: asymmetry of the left and right cords (6 subjects with unilateral palsy and 1 subject with vocal polyps), asymmetry of anterior and posterior cords (2 subjects with vocal polyps), and asymmetry of true and false cords (3 subjects with muscle tension dysphonia). All evaluation methods were useful as diagnostic tools, with all areas under the curve > .70. The diagnostic accuracy was highest with DKG (95.83%), followed by 2D DKG (83.33%), EGG (81.94%), auditory-perceptual evaluation (80.56%), and sound waveform (77.78%). The interpretation time was the shortest for auditory-perceptual evaluation (6.07 ± 1.34 s), followed by 2D DKG (10.04 ± 3.00 s), EGG (12.49 ± 2.76 s), and DKG (13.53 ± 2.60 s). Conclusions Auditory-perceptual judgment was the easiest and fastest method for experienced raters, but its diagnostic accuracy was lower than that of DKG or 2D DKG. The diagnostic accuracy of DKG was the highest, but 2D DKG allowed rapid interpretation and showed relatively high diagnostic accuracy, except in cases with space-occupying lesions. Supplemental Material https://doi.org/10.23641/asha.9911786
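For reference, the binary-classifier figures reported above follow directly from a confusion matrix; the counts in this small Python example are invented, not the study's data.

tp, fn, tn, fp = 11, 1, 58, 2                   # hypothetical rater decisions
sensitivity = tp / (tp + fn)                    # 0.917
specificity = tn / (tn + fp)                    # 0.967
accuracy = (tp + tn) / (tp + fn + tn + fp)      # 0.958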
APA, Harvard, Vancouver, ISO, and other styles
41

Walliker, J. R., and A. J. Fourcin. "Signal-Processing Hearing Aids for the Totally and Profoundly Deaf." Annals of Otology, Rhinology & Laryngology 96, no. 1_suppl (January 1987): 74–76. http://dx.doi.org/10.1177/00034894870960s137.

Full text
Abstract:
We have developed a family of single-channel signal-processing aids for the profoundly and totally deaf. Common to them all are the analysis of speech into the components most important to the deaf lipreader; the synthesis of stimuli which make the best use of the patient's sensory abilities; and facilities to ensure accurate matching of the aid to the patient. The totally deaf are electrically stimulated by electrodes on the promontory or on the round window of the cochlea using charge-balanced controlled current square waves automatically adjusted to be at a comfortable level. Many potential candidates for electrocochlear stimulation have significant low frequency residual hearing, but do not find conventional hearing aids to be useful. We have found that they can often make very effective use of the voice fundamental frequency presented as an acoustic sinusoid. Our approach to these patients avoids the need for implant surgery but preserves that option should total loss of hearing occur in the future. Both electrocochlear and acoustic methods of signal presentation are implemented with similar hardware. The speech signal from a microphone or other source is analyzed by a voice fundamental frequency extractor and a voiceless sound detector. Their outputs are processed by a single chip microcomputer that synthesizes the output waveform. In both devices the aid is tailored to the patient using a desktop computer that stores amplitude-frequency characteristics and frequency mapping tables into a read-only memory.
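A minimal sketch of the acoustic presentation strategy described above, re-synthesizing the voice fundamental frequency as a sinusoid, is given below in Python; the F0 contour is invented, and the actual aids performed the analysis and synthesis in dedicated hardware.

import numpy as np

fs = 16000
f0 = np.linspace(120.0, 180.0, fs)        # hypothetical 1 s rising F0 contour (Hz)
phase = 2 * np.pi * np.cumsum(f0) / fs    # integrate F0 to obtain the phase
sinusoid = 0.5 * np.sin(phase)            # acoustic output presented to residual hearing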
APA, Harvard, Vancouver, ISO, and other styles
42

Rong, Panying. "Automated Acoustic Analysis of Oral Diadochokinesis to Assess Bulbar Motor Involvement in Amyotrophic Lateral Sclerosis." Journal of Speech, Language, and Hearing Research 63, no. 1 (January 22, 2020): 59–73. http://dx.doi.org/10.1044/2019_jslhr-19-00178.

Full text
Abstract:
Purpose The purpose of this article was to validate a novel acoustic analysis of oral diadochokinesis (DDK) in assessing bulbar motor involvement in amyotrophic lateral sclerosis (ALS). Method An automated acoustic DDK analysis was developed, which filtered out the voice features and extracted the envelope of the acoustic waveform reflecting the temporal pattern of syllable repetitions during an oral DDK task (i.e., repetitions of /tɑ/ at the maximum rate on 1 breath). Cycle-to-cycle temporal variability (cTV) of envelope fluctuations and syllable repetition rate (sylRate) were derived from the envelope and validated against 2 kinematic measures, which are tongue movement jitter (movJitter) and alternating tongue movement rate (AMR) during the DDK task, in 16 individuals with bulbar ALS and 18 healthy controls. After the validation, cTV, sylRate, movJitter, and AMR, along with an established clinical speech measure, that is, speaking rate (SR), were compared in their ability to (a) differentiate individuals with ALS from healthy controls and (b) detect early-stage bulbar declines in ALS. Results cTV and sylRate were significantly correlated with movJitter and AMR, respectively, across individuals with ALS and healthy controls, confirming the validity of the acoustic DDK analysis in extracting the temporal DDK pattern. Among all the acoustic and kinematic DDK measures, cTV showed the highest diagnostic accuracy (i.e., 0.87) with 80% sensitivity and 94% specificity in differentiating individuals with ALS from healthy controls, which outperformed the SR measure. Moreover, cTV showed a large increase during the early disease stage, which preceded the decline of SR. Conclusions This study provided preliminary validation of a novel automated acoustic DDK analysis in extracting a useful measure, namely, cTV, for early detection of bulbar ALS. This analysis overcame a major barrier in the existing acoustic DDK analysis, which is continuous voicing between syllables that interferes with syllable structures. This approach has potential clinical applications as a novel bulbar assessment.
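The envelope-based analysis described above can be sketched as follows: rectify the waveform, low-pass filter it to obtain the envelope, locate syllable peaks, and derive the repetition rate and cycle-to-cycle temporal variability from the inter-peak intervals. The cutoff frequency, peak spacing, and variability formula below are illustrative assumptions, not the article's exact algorithm.

import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def ddk_measures(x, fs):
    b, a = butter(4, 10.0 / (fs / 2), btype="low")      # ~10 Hz envelope smoothing
    env = filtfilt(b, a, np.abs(x))                     # rectify, then low-pass
    peaks, _ = find_peaks(env, distance=int(0.1 * fs))  # one peak per syllable
    periods = np.diff(peaks) / fs                       # syllable periods (s)
    syl_rate = 1.0 / periods.mean()                     # sylRate analogue
    ctv = 100.0 * np.abs(np.diff(periods)).mean() / periods.mean()  # cTV analogue
    return syl_rate, ctv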
APA, Harvard, Vancouver, ISO, and other styles
43

Bhargava, Saurabh, Florian Blättler, Sepp Kollmorgen, Shih-Chii Liu, and Richard H. R. Hahnloser. "Linear Methods for Efficient and Fast Separation of Two Sources Recorded with a Single Microphone." Neural Computation 27, no. 10 (October 2015): 2231–59. http://dx.doi.org/10.1162/neco_a_00776.

Full text
Abstract:
This letter addresses the problem of separating two speakers from a single microphone recording. Three linear methods are tested for source separation, all of which operate directly on sound spectrograms: (1) eigenmode analysis of covariance difference to identify spectro-temporal features associated with large variance for one source and small variance for the other source; (2) maximum likelihood demixing in which the mixture is modeled as the sum of two gaussian signals and maximum likelihood is used to identify the most likely sources; and (3) suppression-regression, in which autoregressive models are trained to reproduce one source and suppress the other. These linear approaches are tested on the problem of separating a known male from a known female speaker. The performance of these algorithms is assessed in terms of the residual error of estimated source spectrograms, waveform signal-to-noise ratio, and perceptual evaluation of speech quality scores. This work shows that the algorithms compare favorably to nonlinear approaches such as nonnegative sparse coding in terms of simplicity, performance, and suitability for real-time implementations, and they provide benchmark solutions for monaural source separation tasks.
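Method (1) above, eigenmode analysis of the covariance difference, can be sketched in a few lines of Python; the spectrogram frames below are random placeholders rather than real speech.

import numpy as np

X1 = np.random.randn(1000, 257)   # speaker 1 spectrogram frames (frames x bins)
X2 = np.random.randn(1000, 257)   # speaker 2 spectrogram frames
C1 = np.cov(X1, rowvar=False)
C2 = np.cov(X2, rowvar=False)
w, V = np.linalg.eigh(C1 - C2)    # symmetric difference, so eigh applies
# Eigenvectors at the extreme eigenvalues capture spectro-temporal features with
# large variance for one source and small variance for the other; mixture frames
# projected onto them favor one speaker over the other.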
APA, Harvard, Vancouver, ISO, and other styles
44

Durai, Mithila, Mary G. O’Keeffe, and Grant D. Searchfield. "A Review of Auditory Prediction and Its Potential Role in Tinnitus Perception." Journal of the American Academy of Audiology 29, no. 06 (June 2018): 533–47. http://dx.doi.org/10.3766/jaaa.17025.

Full text
Abstract:
Background: The precise mechanisms underlying tinnitus perception and distress are still not fully understood. A recent proposition is that auditory prediction errors and related memory representations may play a role in driving tinnitus perception, and it is of interest to explore this further. Purpose: To obtain a comprehensive narrative synthesis of current research in relation to auditory prediction and its potential role in tinnitus perception and severity. Research Design: A narrative review methodological framework was followed. Data Collection and Analysis: The key words Prediction Auditory, Memory Prediction Auditory, Tinnitus AND Memory, and Tinnitus AND Prediction were extensively searched in article titles, abstracts, and keywords on four databases: PubMed, Scopus, SpringerLink, and PsychINFO. All study types from 2000 to the end of 2016 were selected, with the following exclusion criteria applied: minimum age of participants <18, nonhuman participants, and articles not available in English. Reference lists of articles were reviewed to identify any further relevant studies, and articles were short-listed based on title relevance. After reading the abstracts, and with consensus between coauthors, a total of 114 studies were selected for charting data. Results: The hierarchical predictive coding model based on the Bayesian brain hypothesis, attentional modulation, and top-down feedback serves as the fundamental framework in the current literature for how auditory prediction may occur. Predictions are integral to speech and music processing, as well as to sequential processing and the identification of auditory objects during auditory streaming. Although deviant responses are observable from middle-latency time ranges, the mismatch negativity (MMN) waveform is the most commonly studied electrophysiological index of auditory irregularity detection. However, limitations may apply when interpreting findings because of the debatable origin of the MMN and its restricted ability to model more complex, real-life auditory phenomena. Cortical oscillatory band activity may act as a neurophysiological substrate for auditory prediction. Tinnitus has been modeled as an auditory object that may demonstrate incomplete processing during auditory scene analysis, resulting in tinnitus salience and therefore difficulty in habituation. Within the electrophysiological domain, there is currently mixed evidence regarding oscillatory band changes in tinnitus. Conclusions: There are theoretical proposals for a relationship between prediction error and tinnitus but few published empirical studies.
APA, Harvard, Vancouver, ISO, and other styles
45

Fong, Simon, Kun Lan, and Raymond Wong. "Classifying Human Voices by Using Hybrid SFX Time-Series Preprocessing and Ensemble Feature Selection." BioMed Research International 2013 (2013): 1–27. http://dx.doi.org/10.1155/2013/720834.

Full text
Abstract:
Voice is one kind of physiological biometric characteristic: each individual's voice is different. Due to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotional state, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features for training a classification model, based on a piecewise transformation that treats an audio waveform as a time series. Using SFX we can faithfully remodel the statistical characteristics of the time series; together with spectral analysis, a substantial number of features are extracted in combination. An ensemble is utilized to select only the influential features to be used in classification model induction. We focus on comparing the effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely, Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, such as wavelets and LPC-to-CC.
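In the spirit of the piecewise SFX transformation described above, the sketch below segments a waveform and computes summary statistics per segment; the segment count and the statistic set are assumptions, not the paper's exact feature list.

import numpy as np
from scipy.stats import skew, kurtosis

def piecewise_features(x, n_segments=10):
    feats = []
    for seg in np.array_split(x, n_segments):
        feats += [seg.mean(), seg.std(), skew(seg), kurtosis(seg)]
    return np.array(feats)  # fixed-length statistical feature vector per recording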
APA, Harvard, Vancouver, ISO, and other styles
46

Gottermeier, Linda, and Carol De Filippo. "Patterns of Aided Loudness Growth in Experienced Adult Listeners with Early-Onset Severe–Profound Hearing Loss." Journal of the American Academy of Audiology 29, no. 06 (June 2018): 457–76. http://dx.doi.org/10.3766/jaaa.16060.

Full text
Abstract:
Background: Individuals with early-onset severe–profound bilateral hearing loss (S/PHL) manifest diverse levels of benefit and satisfaction with hearing aids (HAs), even with prescriptive HA fitting. Such fittings incorporate normal loudness values, but little is known about aided loudness outcomes in this population and how those outcomes affect benefit or satisfaction. Purpose: To describe aided loudness growth and satisfaction with aided listening in experienced adult HA users with S/PHL. Research Design: The Contour Test of loudness perception was administered to listeners with S/PHL in the aided sound field using broadband speech, band-limited speech, and warble tones. Patterns and slopes of the resultant loudness growth functions were referenced to sound field results from listeners with normal hearing (NH). S/PHL listeners also rated their aided listening satisfaction. It was expected that (1) most S/PHL listeners would demonstrate steeper than normal aided loudness growth, (2) loudness normalization would be associated with better high-frequency detection thresholds and speech recognition, and (3) closer approximation to normal would yield greater satisfaction. Study Sample: Participants were paid college-student volunteers: 23 with S/PHL, long-term aided listening experience, and new HAs; 15 with NH. Data Collection and Analysis: Participants rated loudness on four ascending runs per stimulus (5-dB increments) using categories defined in 1997 by Cox and colleagues. The region between the 10th and 90th percentiles of the NH distribution constituted local norms against which the location and slope of the S/PHL functions were examined over the range from Quiet to Loud-but-OK. S/PHL functions were categorized on the basis of their configurations (locations/slopes) relative to the norms. Results: The pattern of aided loudness was normalized, or within 5 dB of the normal region, on 37% of trials with sufficient data for analysis. Only one of the 23 S/PHL listeners did not demonstrate Normal/Near-normal loudness on any trials. Four nonnormal patterns were identified: Steep (recruitment-like; 38% of trials); Shifted right, with normal growth rate (10%); Hypersensitive, with most intensities louder than normal (10%); and Shallow, with decreasing growth rate (7%). Listeners with high-frequency average thresholds above 100 dB hearing loss or no phonemic-based speech-discrimination skill were less likely to display normalized loudness. Slope was within norms for 52% of S/PHL trials, most also having a Normal/Near-normal growth pattern. Regardless of measured loudness results, all but four listeners with S/PHL reported satisfactory hearing almost always or most of the time with their HAs in designated priority need areas. Conclusions: The variety of aided loudness growth patterns identified reflects the diversity known to characterize individuals with early-onset S/PHL. Loudness rating at the validation stage of HA fitting with these listeners is likely to reveal nonnormal loudness, signaling a need for further HA adjustment. High satisfaction despite nonnormal loudness growth, however, suggests that listeners with poor auditory speech recognition may benefit more from aided loudness that supports pattern perception (via the time-intensity waveform of speech), which differs from most current-day prescription fits.
APA, Harvard, Vancouver, ISO, and other styles
47

Henkin, Yael, Yifat Yaar-Soffer, Lihi Givon, and Minka Hildesheimer. "Hearing with Two Ears: Evidence for Cortical Binaural Interaction during Auditory Processing." Journal of the American Academy of Audiology 26, no. 04 (April 2015): 384–92. http://dx.doi.org/10.3766/jaaa.26.4.6.

Full text
Abstract:
Background: Integration of information presented to the two ears has been shown to manifest in binaural interaction components (BICs) that occur along the ascending auditory pathways. In humans, BICs have been studied predominantly at the brainstem and thalamocortical levels; however, understanding of higher cortically driven mechanisms of binaural hearing is limited. Purpose: To explore whether BICs are evident in auditory event-related potentials (AERPs) during the advanced perceptual and postperceptual stages of cortical processing. Research Design: The AERPs N1, P3, and a late negative component (LNC) were recorded from multiple site electrodes while participants performed an oddball discrimination task that consisted of natural speech syllables (/ka/ vs. /ta/) that differed by place-of-articulation. Participants were instructed to respond to the target stimulus (/ta/) while performing the task in three listening conditions: monaural right, monaural left, and binaural. Study Sample: Fifteen (21–32 yr) young adults (6 females) with normal hearing sensitivity. Data Collection and Analysis: By subtracting the response to target stimuli elicited in the binaural condition from the sum of responses elicited in the monaural right and left conditions, the BIC waveform was derived and the latencies and amplitudes of the components were measured. The maximal interaction was calculated by dividing BIC amplitude by the summed right and left response amplitudes. In addition, the latencies and amplitudes of the AERPs to target stimuli elicited in the monaural right, monaural left, and binaural listening conditions were measured and subjected to analysis of variance with repeated measures testing the effect of listening condition and laterality. Results: Three consecutive BICs were identified at a mean latency of 129, 406, and 554 msec, and were labeled N1-BIC, P3-BIC, and LNC-BIC, respectively. Maximal interaction increased significantly with progression of auditory processing from perceptual to postperceptual stages and amounted to 51%, 55%, and 75% of the sum of monaural responses for N1-BIC, P3-BIC, and LNC-BIC, respectively. Binaural interaction manifested in a decrease of the binaural response compared to the sum of monaural responses. Furthermore, listening condition affected P3 latency only, whereas laterality effects manifested in enhanced N1 amplitudes at the left (T3) vs. right (T4) scalp electrode and in a greater left–right amplitude difference in the right compared to left listening condition. Conclusions: The current AERP data provides evidence for the occurrence of cortical BICs during perceptual and postperceptual stages, presumably reflecting ongoing integration of information presented to the two ears at the final stages of auditory processing. Increasing binaural interaction with the progression of the auditory processing sequence (N1 to LNC) may support the notion that cortical BICs reflect inherited interactions from preceding stages of upstream processing together with discrete cortical neural activity involved in binaural processing. Clinically, an objective measure of cortical binaural processing has the potential of becoming an appealing neural correlate of binaural behavioral performance.
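The BIC derivation described above is a simple waveform subtraction, computed here on placeholder AERP arrays to make the arithmetic explicit; the peak-based maximal-interaction calculation is an illustrative simplification.

import numpy as np

monaural_right = np.random.randn(512)   # placeholder AERP waveforms
monaural_left = np.random.randn(512)
binaural = np.random.randn(512)

summed = monaural_right + monaural_left
bic = summed - binaural                 # BIC waveform
max_interaction = 100.0 * np.abs(bic).max() / np.abs(summed).max()  # percent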
APA, Harvard, Vancouver, ISO, and other styles
48

Wilson, Richard H. "Amplitude (vu and rms) and Temporal (msec) Measures of Two Northwestern University Auditory Test No. 6 Recordings." Journal of the American Academy of Audiology 26, no. 04 (April 2015): 346–54. http://dx.doi.org/10.3766/jaaa.26.4.3.

Full text
Abstract:
Background: In 1940, a cooperative effort by the radio networks and Bell Telephone produced the volume unit (vu) meter that has been the mainstay instrument for monitoring the level of speech signals in commercial broadcasting and research laboratories. With the use of computers, today the amplitude of signals can be quantified easily using the root mean square (rms) algorithm. Researchers had previously reported that amplitude estimates of sentences and running speech were 4.8 dB higher when measured with a vu meter than when calculated with rms. This study addresses the vu–rms relation as applied to the carrier phrase and target word paradigm used to assess word-recognition abilities, the premise being that by definition the word-recognition paradigm is a special and different case from that described previously. Purpose: The purpose was to evaluate the vu and rms amplitude relations for the carrier phrases and target words commonly used to assess word-recognition abilities. In addition, the relations between the rms levels of the target words and recognition performance were examined. Research Design: Descriptive and correlational. Study Sample: Two recorded versions of the Northwestern University Auditory Test No. 6 were evaluated, the Auditec of St. Louis (Auditec) male speaker and the Department of Veterans Affairs (VA) female speaker. Data Collection and Analysis: Using both visual and auditory cues from a waveform editor, the temporal onsets and offsets were defined for each carrier phrase and each target word. The rms amplitudes for those segments were then computed and expressed in decibels with reference to the maximum digitization range. The data were maintained for each of the four Northwestern University Auditory Test No. 6 word lists. Descriptive analyses were used, with linear regressions to evaluate the reliability of the measurement technique and the relation between the rms levels of the target words and recognition performances. Results: Although there was a 1.3 dB difference between the calibration tones, the mean levels of the carrier phrases for the two recordings were −14.8 dB (Auditec) and −14.1 dB (VA), with standard deviations <1 dB. For the target words, the mean amplitudes were −19.9 dB (Auditec) and −18.3 dB (VA), with standard deviations ranging from 1.3 to 2.4 dB. The mean durations for the carrier phrases of both recordings were 593–594 msec, with the mean durations of the target words a little different, 509 msec (Auditec) and 528 msec (VA). Random relations were observed between the recognition performances and rms levels of the target words. Amplitude and temporal data for the individual words are provided. Conclusions: The rms levels of the carrier phrases closely approximated (±1 dB) the rms levels of the calibration tones, both of which were set to 0 vu (dB). The rms levels of the target words were 5–6 dB below the levels of the carrier phrases and were substantially more variable than the levels of the carrier phrases. The relation between the rms levels of the target words and recognition performances on the words was random.
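For concreteness, the rms measure used above can be computed as below, with the segment level expressed in decibels relative to the maximum digitization range; the sample values are placeholders, not the study's recordings.

import numpy as np

full_scale = 32768.0                       # 16-bit digitization range
segment = np.random.randn(8000) * 2000.0   # hypothetical samples of a target word
rms = np.sqrt(np.mean(segment ** 2))
level_db = 20.0 * np.log10(rms / full_scale)  # dB re maximum digitization range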
APA, Harvard, Vancouver, ISO, and other styles
49

Terzopoulos, D. "Co-occurrence analysis of speech waveforms." IEEE Transactions on Acoustics, Speech, and Signal Processing 33, no. 1 (February 1985): 5–30. http://dx.doi.org/10.1109/tassp.1985.1164511.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Wilson, Richard H. "Variables that Influence the Recognition Performance of Interrupted Words: Rise-Fall Shape and Temporal Location of the Interruptions." Journal of the American Academy of Audiology 25, no. 07 (July 2014): 688–96. http://dx.doi.org/10.3766/jaaa.25.7.8.

Full text
Abstract:
Background: The abrupt transition of a signal from off to on and vice versa typically produces spectral splatter that can mask other signals that are spectrally removed from the nominal signal frequency. Both the Miller and Licklider (1950) and Cherry (1953) studies, of interrupted speech and alternated speech, respectively, acknowledged the generation of extraneous noise by the rapid on and off characteristics of their unshaped signals but noted that, for slower interruption rates (e.g., 10 interruptions per second), the masking effects were minimal. Recent studies of interrupted speech have avoided this issue by shaping the rise-fall times with a digital algorithm (e.g., Jin and Nelson, 2010; Wang and Humes, 2010). A second variable in the interrupted speech paradigm is the temporal location or placement of the interruptions (i.e., where in the waveform the interruptions occur). Here the issue is this: what parts of an utterance are necessary to enable intelligibility (e.g., Fogerty and Kewley-Port, 2009)? Interruptions may or may not disturb these necessary cues. Purpose: Here is the prompting question: do shaped and unshaped rise-fall characteristics of the on-segments of interrupted speech produce the same or different recognition performances? A second question arises: are recognition performances on complementary halves of an interrupted signal the same or different? Research Design: This study used a mixed-model design with two within-subject variables (unshaped and shaped rise-fall characteristics, complementary halves) and one between-subjects variable (listener group). Study Sample: A total of 12 young listeners (age range: 19–29 yr) with normal hearing and 12 older listeners (age range: 53–80 yr) with hearing loss for pure tones participated. Data Collection and Analysis: A total of 95 consonant-vowel nucleus-consonant words were interrupted (10 interruptions per second; 50% duty cycle) by parsing alternate 50 msec segments to separate files, which provided complementary temporal halves of the target word referenced to word onset; the first on-segment of the 0 msec condition started at word onset, whereas the first on-segment of the 50 msec condition started 50 msec after word onset. The interruption routine either applied no shaping to the 4 msec rise-fall times or applied a cos² shape. Each listener received 25 practice words, then a unique randomization of 280 interrupted words (70 words, 2 rise-fall shapes, and 2 interrupt onset conditions). Results: The listeners with normal hearing performed 8–16% better on the various comparable conditions than did the older listeners with hearing loss. The mean performance differences between shaped and unshaped rise-fall characteristics ranged from <1% to 3% and were not significant. Performance was significantly better, by 10–17%, on the 0 msec condition than on the 50 msec condition. There was no significant interaction between the two main variables, rise-fall shape and onset time of the interruptions. Conclusions: The rise-fall shape of the onset and offset of the on-segment of the interruption cycle does not affect recognition performance for words. The location of the interruptions in a word can have a significant effect on recognition performance.
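A sketch of the interruption paradigm described above follows: a 10-per-second, 50% duty-cycle gate with optional cos²-shaped 4 msec rise-fall ramps. The function structure and parameter names are illustrative, not the study's software.

import numpy as np

def interrupt(x, fs, rate=10, duty=0.5, ramp_s=0.004, shaped=True):
    period = int(fs / rate)
    on_len = int(period * duty)
    ramp = np.sin(np.linspace(0.0, np.pi / 2, int(ramp_s * fs))) ** 2  # cos²-type ramp
    gate = np.zeros(len(x))
    for start in range(0, len(x), period):
        seg = np.ones(min(on_len, len(x) - start))
        if shaped and len(seg) >= 2 * len(ramp):
            seg[:len(ramp)] *= ramp           # shaped onset
            seg[-len(ramp):] *= ramp[::-1]    # shaped offset
        gate[start:start + len(seg)] = seg
    return x * gate  # unshaped gating (shaped=False) leaves abrupt transitions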
APA, Harvard, Vancouver, ISO, and other styles
