Academic literature on the topic "Speech and audio signals"

Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles


Consult the topical lists of articles, books, theses, conference proceedings, and other scholarly sources on the topic "Speech and audio signals".

Next to every source in the list of references there is an "Add to bibliography" button. Press the button, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Speech and audio signals"

1

Rao, G. Manmadha, Raidu Babu D. N., Krishna Kanth P. S. L., Vinay B., and Nikhil V. "Reduction of Impulsive Noise from Speech and Audio Signals by using Sd-Rom Algorithm". International Journal of Recent Technology and Engineering 10, no. 1 (May 30, 2021): 265–68. http://dx.doi.org/10.35940/ijrte.a5943.0510121.

Full text
Abstract
Noise removal is central to speech and audio signal processing, and impulse noise is one of the most damaging types of noise corrupting speech and audio signals. To remove it, this work proposes the signal-dependent rank-ordered mean (SD-ROM) method in its recursive version. The technique replaces impulse-noise samples based on neighbouring samples: it detects corrupted samples by comparing rank-ordered differences with threshold values, without altering the features or tonal quality of the signal. Once a sample is flagged as corrupted, it is replaced with the rank-ordered mean value, which depends on the sliding-window size and the neighbouring samples. The technique shows good results in terms of signal-to-noise ratio (SNR) and peak signal-to-noise ratio (PSNR) when compared with other techniques, and is mainly used for removing impulse noise from speech and audio signals.
APA, Harvard, Vancouver, ISO, and other styles
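
For readers who want to experiment with the idea described in the abstract above, here is a minimal Python sketch of recursive SD-ROM filtering. The window size, threshold values, and 16-bit-style amplitude scale are illustrative assumptions, not the settings published by the authors.

```python
import numpy as np

def sd_rom_denoise(x, t1=8.0, t2=20.0):
    """Recursive SD-ROM over a 5-sample sliding window (2 neighbours per side).
    Thresholds assume 16-bit-style integer amplitudes; tune for your scale."""
    y = x.astype(float).copy()
    for i in range(2, len(y) - 2):
        # neighbours exclude the current sample; earlier repairs feed forward
        r = np.sort([y[i - 2], y[i - 1], y[i + 1], y[i + 2]])
        rom = 0.5 * (r[1] + r[2])                  # rank-ordered mean
        if y[i] <= rom:                            # rank-ordered differences
            d1, d2 = r[0] - y[i], r[1] - y[i]
        else:
            d1, d2 = y[i] - r[3], y[i] - r[2]
        if d1 > t1 or d2 > t2:                     # flagged as an impulse
            y[i] = rom                             # replace with the ROM
    return y

# rng = np.random.default_rng(0)
# clean = 3000 * np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
# noisy = clean.copy(); noisy[rng.choice(8000, 100)] = rng.uniform(-2e4, 2e4, 100)
# restored = sd_rom_denoise(noisy)
```
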
2

S. Ashwin, J., and N. Manoharan. "Audio Denoising Based on Short Time Fourier Transform". Indonesian Journal of Electrical Engineering and Computer Science 9, no. 1 (January 1, 2018): 89. http://dx.doi.org/10.11591/ijeecs.v9.i1.pp89-92.

Full text
Abstract
This paper presents a novel audio denoising scheme for a given speech signal. Recovering the original signal from a communication channel without any noise is a difficult task, and many denoising techniques have been proposed for removing noise from digital signals. In this paper, an audio denoising technique based on the Short-Time Fourier Transform (STFT) is implemented. The proposed architecture uses a novel approach to estimate environmental noise from speech adaptively. Original speech signals are given as the input, noise is added using AWGN, and the noised signals are then denoised using STFT techniques. Finally, signal-to-noise ratio (SNR) and peak signal-to-noise ratio (PSNR) values are obtained for the noised and denoised signals.
APA, Harvard, Vancouver, ISO, and other styles
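
The pipeline described above (add AWGN, denoise in the STFT domain, then score with SNR/PSNR) can be prototyped in a few lines. The sketch below uses SciPy's stft/istft with a median-based noise-floor estimate; the window length and subtraction factor are illustrative assumptions, not the paper's parameters.

```python
import numpy as np
from scipy.signal import stft, istft

def add_awgn(x, snr_db):
    """Corrupt x with white Gaussian noise at a target SNR in dB."""
    noise_power = np.mean(x ** 2) / 10 ** (snr_db / 10)
    return x + np.sqrt(noise_power) * np.random.default_rng(0).standard_normal(len(x))

def stft_denoise(noisy, fs, nperseg=512, alpha=1.5):
    """Subtract a median-based noise-floor estimate from the STFT magnitude."""
    _, _, Z = stft(noisy, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    noise_floor = np.median(mag, axis=1, keepdims=True)   # crude per-bin estimate
    mag = np.maximum(mag - alpha * noise_floor, 0.0)      # spectral subtraction
    _, y = istft(mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return y

def snr_db(ref, est):
    """Output SNR; PSNR is analogous, with the squared peak in the numerator."""
    est = est[: len(ref)]
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))
```
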
3

Kacur, Juraj, Boris Puterka, Jarmila Pavlovicova, and Milos Oravec. "Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications". Sensors 22, no. 16 (August 22, 2022): 6304. http://dx.doi.org/10.3390/s22166304.

Full text
Abstract
There are many speech and audio processing applications and their number is growing. They may cover a wide range of tasks, each having different requirements on the processed speech or audio signals and, therefore, indirectly, on the audio sensors as well. This article reports on tests and evaluation of the effect of basic physical properties of speech and audio signals on the recognition accuracy of major speech/audio processing applications, i.e., speech recognition, speaker recognition, speech emotion recognition, and audio event recognition. A particular focus is on frequency ranges, time intervals, a precision of representation (quantization), and complexities of models suitable for each class of applications. Using domain-specific datasets, eligible feature extraction methods and complex neural network models, it was possible to test and evaluate the effect of basic speech and audio signal properties on the achieved accuracies for each group of applications. The tests confirmed that the basic parameters do affect the overall performance and, moreover, this effect is domain-dependent. Therefore, accurate knowledge of the extent of these effects can be valuable for system designers when selecting appropriate hardware, sensors, architecture, and software for a particular application, especially in the case of limited resources.
APA, Harvard, Vancouver, ISO, and other styles
4

Nittrouer, Susan, and Joanna H. Lowenstein. "Beyond Recognition: Visual Contributions to Verbal Working Memory". Journal of Speech, Language, and Hearing Research 65, no. 1 (January 12, 2022): 253–73. http://dx.doi.org/10.1044/2021_jslhr-21-00177.

Full text
Abstract
Purpose: It is well recognized that adding the visual to the acoustic speech signal improves recognition when the acoustic signal is degraded, but how that visual signal affects postrecognition processes is not so well understood. This study was designed to further elucidate the relationships among auditory and visual codes in working memory, a postrecognition process. Design: In a main experiment, 80 young adults with normal hearing were tested using an immediate serial recall paradigm. Three types of signals were presented (unprocessed speech, vocoded speech, and environmental sounds) in three conditions (audio-only, audio–video with dynamic visual signals, and audio–picture with static visual signals). Three dependent measures were analyzed: (a) magnitude of the recency effect, (b) overall recall accuracy, and (c) response times, to assess cognitive effort. In a follow-up experiment, 30 young adults with normal hearing were tested largely using the same procedures, but with a slight change in order of stimulus presentation. Results: The main experiment produced three major findings: (a) unprocessed speech evoked a recency effect of consistent magnitude across conditions; vocoded speech evoked a recency effect of similar magnitude to unprocessed speech only with dynamic visual (lipread) signals; environmental sounds never showed a recency effect. (b) Dynamic and static visual signals enhanced overall recall accuracy to a similar extent, and this enhancement was greater for vocoded speech and environmental sounds than for unprocessed speech. (c) All visual signals reduced cognitive load, except for dynamic visual signals with environmental sounds. The follow-up experiment revealed that dynamic visual (lipread) signals exerted their effect on the vocoded stimuli by enhancing phonological quality. Conclusions: Acoustic and visual signals can combine to enhance working memory operations, but the source of these effects differs for phonological and nonphonological signals. Nonetheless, visual information can support better postrecognition processes for patients with hearing loss.
APA, Harvard, Vancouver, ISO, and other styles
5

B, Nagesh, and M. Uttara Kumari. "A Review on Machine Learning for Audio Applications". Journal of University of Shanghai for Science and Technology 23, no. 07 (June 30, 2021): 62–70. http://dx.doi.org/10.51201/jusst/21/06508.

Full text
Abstract
Audio processing is an important branch of the signal processing domain. It deals with the manipulation of audio signals to achieve tasks like filtering, data compression, speech processing, and noise suppression, all of which improve the quality of the audio signal. For applications such as natural language processing, speech generation, and automatic speech recognition, conventional algorithms are not sufficient; machine learning or deep learning algorithms are needed so that audio signal processing can be achieved with good results and accuracy. This paper reviews the various algorithms used by researchers in the past and indicates the appropriate algorithm for each application.
APA, Harvard, Vancouver, ISO, and other styles
6

Kubanek, M., J. Bobulski, and L. Adrjanowicz. "Characteristics of the use of coupled hidden Markov models for audio-visual polish speech recognition". Bulletin of the Polish Academy of Sciences: Technical Sciences 60, no. 2 (October 1, 2012): 307–16. http://dx.doi.org/10.2478/v10175-012-0041-6.

Full text
Abstract
This paper focuses on combining audio-visual signals for Polish speech recognition under conditions of a highly disturbed audio speech signal. Recognition of audio-visual speech was based on coupled hidden Markov models (CHMM). The described methods were developed for single isolated commands, but their effectiveness indicates that they would also work in continuous audio-visual speech recognition. Visual speech analysis is very difficult and computationally demanding, mostly because of the extreme amount of data that needs to be processed, so audio-video speech recognition is used only when the audio speech signal is exposed to a considerable level of distortion. The authors propose their own methods for lip-edge detection and visual feature extraction, and a method for fusing the audio and video speech characteristics is proposed and tested. A significant increase in recognition effectiveness and processing speed was noted during tests, given properly selected CHMM parameters, an adequate codebook size, and an appropriate fusion of audio-visual characteristics. The experimental results were very promising and close to those achieved by leading scientists in the field of audio-visual speech recognition.
APA, Harvard, Vancouver, ISO, and other styles
7

Timmermann, Johannes, Florian Ernst, and Delf Sachau. "Speech enhancement for helicopter headsets with an integrated ANC-system for FPGA-platforms". INTER-NOISE and NOISE-CON Congress and Conference Proceedings 265, no. 5 (February 1, 2023): 2720–30. http://dx.doi.org/10.3397/in_2022_0382.

Full text
Abstract
During flights, helicopter pilots are exposed to high noise levels caused by rotor, engine, and wind. To protect the health of passengers and crew, noise-dampening headsets are used, and modern active noise control (ANC) headsets can further reduce the noise exposure of humans in helicopters. Internal and external voice transmission in the helicopter must be adapted to the noisy environment, so speech signals are heavily amplified. To improve the quality of communication in helicopters, speech and background noise in the transmitted audio signals should be separated, and the noise components of the signal then eliminated. One established method for this type of speech enhancement is spectral subtraction. In this study, audio files recorded with an artificial head during a helicopter flight are used to evaluate a speech enhancement system with additional ANC capabilities on a rapid prototyping platform. Since both spectral subtraction and the ANC algorithm are computationally intensive, an FPGA is used. The results show a significant enhancement in the quality of the speech signals, which leads to improved communication. Furthermore, the enhanced audio signals can be used for voice recognition algorithms.
APA, Harvard, Vancouver, ISO, and other styles
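
Spectral subtraction was sketched after entry 2 above; the ANC component can be illustrated separately. Below is a minimal least-mean-squares (LMS) adaptive noise canceller of the general kind the abstract refers to. The filter order and step size are illustrative assumptions, and the paper's FPGA implementation is not reproduced here.

```python
import numpy as np

def lms_anc(primary, reference, order=32, mu=1e-4):
    """LMS adaptive noise cancellation: an FIR filter shapes the noise
    reference (e.g. a cabin-noise microphone) to match the noise in the
    primary (headset) channel; the residual e[n] is the enhanced speech."""
    w = np.zeros(order)
    enhanced = np.zeros(len(primary))
    for n in range(order, len(primary)):
        x = reference[n - order:n][::-1]   # most recent reference samples first
        e = primary[n] - w @ x             # cancel the predicted noise
        w += 2 * mu * e * x                # LMS weight update
        enhanced[n] = e
    return enhanced
```
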
8

Abdallah, Hanaa A., and Souham Meshoul. "A Multilayered Audio Signal Encryption Approach for Secure Voice Communication". Electronics 12, no. 1 (December 20, 2022): 2. http://dx.doi.org/10.3390/electronics12010002.

Full text
Abstract
In this paper, multilayer cryptosystems for encrypting audio communications are proposed. These cryptosystems combine audio signals with other active concealing signals, such as speech signals, by continuously fusing the audio signal with a speech signal without silent periods. The goal of these cryptosystems is to prevent unauthorized parties from listening to encrypted audio communications. Preprocessing is performed on both the speech signal and the audio signal before they are combined, as this is necessary to get the signals ready for fusion. Instead of encoding and decoding methods, the cryptosystems rely on the values of audio samples, which allows for saving time while increasing their resistance to hackers and environments with a noisy background. The main feature of the proposed approach is to consider three levels of encryption namely fusion, substitution, and permutation where various combinations are considered. The resulting cryptosystems are compared to the one-dimensional logistic map-based encryption techniques and other state-of-the-art methods. The performance of the suggested cryptosystems is evaluated by the use of the histogram, structural similarity index, signal-to-noise ratio (SNR), log-likelihood ratio, spectrum distortion, and correlation coefficient in simulated testing. A comparative analysis in relation to the encryption of logistic maps is given. This research demonstrates that increasing the level of encryption results in increased security. It is obvious that the proposed salting-based encryption method and the multilayer DCT/DST cryptosystem offer better levels of security as they attain the lowest SNR values, −25 dB and −2.5 dB, respectively. In terms of the used evaluation metrics, the proposed multilayer cryptosystem achieved the best results in discrete cosine transform and discrete sine transform, demonstrating a very promising performance.
APA, Harvard, Vancouver, ISO, and other styles
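
As a toy illustration of the three layers named in the abstract (permutation, substitution, fusion), the sketch below scrambles, masks, and fuses an audio vector with keyed pseudo-random streams. It is not the authors' cryptosystem and offers no real security; every constant in it is a hypothetical choice.

```python
import numpy as np

def encrypt(audio, key):
    rng = np.random.default_rng(key)           # keyed streams (toy key schedule)
    n = len(audio)
    perm = rng.permutation(n)                  # layer 1: permute sample order
    mask = rng.uniform(-1.0, 1.0, n)           # layer 2: additive substitution
    carrier = 0.1 * np.sin(2 * np.pi * 0.05 * np.arange(n))
    return audio[perm] + mask + carrier        # layer 3: fuse a masking signal

def decrypt(cipher, key):
    rng = np.random.default_rng(key)           # regenerate the same streams
    n = len(cipher)
    perm = rng.permutation(n)
    mask = rng.uniform(-1.0, 1.0, n)
    carrier = 0.1 * np.sin(2 * np.pi * 0.05 * np.arange(n))
    audio = np.empty(n)
    audio[perm] = cipher - carrier - mask      # undo fusion, mask, permutation
    return audio
```
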
9

Yin, Shu Hua. "Design of the Auxiliary Speech Recognition System of Super-Short-Range Reconnaissance Radar". Applied Mechanics and Materials 556-562 (May 2014): 4830–34. http://dx.doi.org/10.4028/www.scientific.net/amm.556-562.4830.

Full text
Abstract
To improve the usability and operability of the hybrid-identification reconnaissance radar for individual use, a voice identification system was designed. With an SPCE061A audio-signal microprocessor as its core, digital signal processing technology was used to obtain the audio-band Doppler radar signal over an audio cable. A/D acquisition was then conducted to obtain digital signals, and the acquired data were preprocessed and adaptively filtered to eliminate background noise. Segmented FFT transforms were then used to identify the types of the signals, completing the overall design of radar voice recognition for the individual soldier. Actual measurements showed that the circuit design improved radar resolution and the accuracy of radar identification.
APA, Harvard, Vancouver, ISO, and other styles
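
The segmented-FFT step described above can be sketched as follows: split the audio-band Doppler signal into windows, take an FFT of each, and track the dominant frequency per segment. The segment length and windowing are illustrative assumptions.

```python
import numpy as np

def dominant_freqs(x, fs, seg_len=1024):
    """Dominant frequency of each segment of the audio-band Doppler signal;
    decision rules on these values would drive target-type identification."""
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    win = np.hanning(seg_len)
    out = []
    for k in range(len(x) // seg_len):
        spec = np.abs(np.fft.rfft(x[k * seg_len:(k + 1) * seg_len] * win))
        out.append(freqs[np.argmax(spec[1:]) + 1])   # skip the DC bin
    return np.array(out)
```
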
10

Moore, Brian C. J. "Binaural sharing of audio signals". Hearing Journal 60, no. 11 (November 2007): 46–48. http://dx.doi.org/10.1097/01.hj.0000299172.13153.6f.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Theses on the topic "Speech and audio signals"

1

Mason, Michael. "Hybrid coding of speech and audio signals". Thesis, Queensland University of Technology, 2001.

Search full text
APA, Harvard, Vancouver, ISO, and other styles
2

Trinkaus, Trevor R. "Perceptual coding of audio and diverse speech signals". Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/13883.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Mészáros, Tomáš. "Speech Analysis for Processing of Musical Signals". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234974.

Full text
Abstract
The main goal of this work is to enrich musical signals with characteristics of human speech. The work covers the creation of an audio effect inspired by the talk-box: analysis of the vocal tract with a suitable algorithm such as linear prediction, and application of the estimated filter to a musical audio signal. Emphasis is placed on high output quality, low latency, and low computational cost for real-time use. The outcome is a software plugin usable in professional audio-editing applications and, given a suitable hardware platform, also for live performance. The plugin emulates a real talk-box device and provides comparable output quality with a unique sound.
APA, Harvard, Vancouver, ISO, and other styles
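
The talk-box analysis/synthesis loop described in the abstract (LPC analysis of the voice, then filtering the music through the estimated vocal-tract filter) can be sketched as below. Frame size, hop, and LPC order are illustrative assumptions, and this plain overlap-add version ignores the latency and real-time constraints the thesis addresses.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order=12):
    """Autocorrelation-method LPC: solve the Toeplitz normal equations."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    r[0] += 1e-9                                   # guard against silent frames
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))             # denominator of 1/A(z)

def talkbox(voice, music, frame=512, hop=256, order=12):
    """Overlap-add: excite the vocal-tract filter estimated from the voice
    with the music signal."""
    out = np.zeros(len(music))
    win = np.hanning(frame)
    for s in range(0, min(len(voice), len(music)) - frame, hop):
        a = lpc(voice[s:s + frame] * win, order)
        out[s:s + frame] += lfilter([1.0], a, music[s:s + frame]) * win
    return out / (np.max(np.abs(out)) + 1e-12)
```
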
4

Choi, Hyung Keun. "Blind source separation of the audio signals in a real world". Thesis, Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/14986.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Lucey, Simon. "Audio-visual speech processing". Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Full text
Abstract
Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (eg. acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular the tasks of isolated word speech and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis, the first of which concerns the extraction of pertinent visual features for effective speech reading and visual speaker recognition. Appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in-depth using hidden Markov model(HMM)classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so their combined performance is greater than any of those classifiers individually. The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need for combining independent classifiers.
APA, Harvard, Vancouver, ISO, and other styles
6

Anderson, David Verl. "Audio signal enhancement using multi-resolution sinusoidal modeling". Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/15394.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Zeghidour, Neil. "Learning representations of speech from the raw waveform". Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEE004/document.

Full text
Abstract
While deep neural networks are now used in almost every component of a speech recognition system, from acoustic to language modeling, the input to such systems are still fixed, handcrafted, spectral features such as mel-filterbanks. This contrasts with computer vision, in which a deep neural network is now trained on raw pixels. Mel-filterbanks contain valuable and documented prior knowledge from human auditory perception as well as signal processing, and are the input to state-of-the-art speech recognition systems that are now on par with human performance in certain conditions. However, mel-filterbanks, as any fixed representation, are inherently limited by the fact that they are not fine-tuned for the task at hand. We hypothesize that learning the low-level representation of speech with the rest of the model, rather than using fixed features, could push the state of the art even further. We first explore a weakly-supervised setting and show that a single neural network can learn to separate phonetic information and speaker identity from mel-filterbanks or the raw waveform, and that these representations are robust across languages. Moreover, learning from the raw waveform provides significantly better speaker embeddings than learning from mel-filterbanks. These encouraging results lead us to develop a learnable alternative to mel-filterbanks, that can be directly used in replacement of these features. In the second part of this thesis we introduce Time-Domain filterbanks, a lightweight neural network that takes the waveform as input, can be initialized as an approximation of mel-filterbanks, and then learned with the rest of the neural architecture. Across extensive and systematic experiments, we show that Time-Domain filterbanks consistently outperform mel-filterbanks and can be integrated into a new state-of-the-art speech recognition system, trained directly from the raw audio signal. Fixed speech features being also used for non-linguistic classification tasks for which they are even less optimal, we perform dysarthria detection from the waveform with Time-Domain filterbanks and show that it significantly improves over mel-filterbanks or low-level descriptors. Finally, we discuss how our contributions fall within a broader shift towards fully learnable audio understanding systems.
APA, Harvard, Vancouver, ISO, and other styles
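
A minimal sketch of the idea behind a learnable waveform front end, in the spirit of the Time-Domain filterbanks described above: a strided 1-D convolution whose weights could be initialized from a fixed filterbank and then trained with the rest of the network. The filter count, kernel size, stride, and log compression are illustrative assumptions, not the thesis' exact architecture.

```python
import torch
import torch.nn as nn

class LearnableFilterbank(nn.Module):
    """A strided 1-D convolution over the raw waveform followed by magnitude
    compression. Weights could be initialized to approximate mel filters and
    then refined by back-propagation with the rest of the model."""
    def __init__(self, n_filters=40, kernel_size=401, stride=160):
        super().__init__()
        self.conv = nn.Conv1d(1, n_filters, kernel_size, stride=stride,
                              padding=kernel_size // 2, bias=False)

    def forward(self, wav):                      # wav: (batch, samples)
        x = self.conv(wav.unsqueeze(1))          # (batch, n_filters, frames)
        return torch.log1p(x.abs())              # log-compressed "energies"

# features = LearnableFilterbank()(torch.randn(2, 16000))  # two 1 s clips at 16 kHz
```
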
8

Bando, Yoshiaki. "Robust Audio Scene Analysis for Rescue Robots". Kyoto University, 2018. http://hdl.handle.net/2433/232410.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Moghimi, Amir Reza. "Array-based Spectro-temporal Masking For Automatic Speech Recognition". Research Showcase @ CMU, 2014. http://repository.cmu.edu/dissertations/334.

Full text
Abstract
Over the years, a variety of array processing techniques have been applied to the problem of enhancing degraded speech to improve automatic speech recognition. In this context, linear beamforming has long been the approach of choice, for reasons including good performance, robustness, and analytical simplicity. While various non-linear techniques, typically based to some extent on the study of auditory scene analysis, have also been of interest, they tend to lag behind their linear counterparts in terms of simplicity, scalability, and flexibility. Nonlinear techniques are also more difficult to analyze and lack the systematic descriptions available in the study of linear beamformers. This work focuses on a class of nonlinear processing known as time-frequency (T-F) masking, a.k.a. spectro-temporal masking, whose variants comprise a significant portion of the existing techniques. T-F masking is based on accepting or rejecting individual time-frequency cells based on some estimate of local signal quality. Analyses are developed that attempt to mirror the beam patterns used to describe linear processing, leading to a view of T-F masking as "nonlinear beamforming". Two distinct formulations of these "nonlinear beam patterns" are developed, based on different metrics of the algorithm's behavior; these formulations are modeled in a variety of scenarios to demonstrate the flexibility of the idea. While these patterns are not quite as simple or all-encompassing as traditional beam patterns in microphone-array processing, they do accurately represent the behavior of masking algorithms in analogous and intuitive ways. In addition to analyzing this class of nonlinear masking algorithms, we also attempt to improve their performance in a variety of ways. Improvements are proposed to the baseline two-channel version of masking by addressing both the mask estimation and the signal reconstruction stages, the latter more successfully than the former. Furthermore, while these approaches have been shown to outperform linear beamforming in two-sensor arrays, extensions to larger arrays have been few and unsuccessful. We find that combining beamforming and masking is a viable method of bringing the benefits of masking to larger arrays. As a result, a hybrid beamforming-masking approach, called "post-masking", is developed that improves upon the performance of MMSE beamforming (and can be used with any beamforming technique), with the potential for even greater improvement in the future.
APA, Harvard, Vancouver, ISO, and other styles
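
T-F masking of the kind analyzed in this dissertation can be illustrated with a two-microphone phase-difference mask: cells whose interchannel phase difference is small (consistent with a broadside target) are kept, the rest are zeroed. This is a generic sketch, not the author's algorithm; the phase tolerance and STFT settings are assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def tf_mask_broadside(left, right, fs, phase_tol=0.35, nperseg=512):
    """Keep T-F cells whose interchannel phase difference is small (energy
    arriving from broadside), zero the rest, and resynthesize."""
    _, _, L = stft(left, fs=fs, nperseg=nperseg)
    _, _, R = stft(right, fs=fs, nperseg=nperseg)
    dphi = np.angle(L * np.conj(R))              # per-cell phase difference
    mask = (np.abs(dphi) < phase_tol).astype(float)
    _, y = istft(0.5 * (L + R) * mask, fs=fs, nperseg=nperseg)
    return y
```
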
10

Brangers, Kirstin M. "Perceptual Ruler for Quantifying Speech Intelligibility in Cocktail Party Scenarios". UKnowledge, 2013. http://uknowledge.uky.edu/ece_etds/31.

Full text
Abstract
Systems designed to enhance intelligibility of speech in noise are difficult to evaluate quantitatively because intelligibility is subjective and often requires feedback from large populations for consistent evaluations. Attempts to quantify the evaluation have included related measures such as the Speech Intelligibility Index (SII). These require separating speech and noise signals, which precludes their use on experimental recordings. This thesis develops a procedure using an Intelligibility Ruler (IR) for efficiently quantifying intelligibility. A calibrated Mean Opinion Score (MOS) method is also implemented in order to compare repeatability over a population of 24 subjective listeners. Results showed that subjects using the IR consistently estimated SII values of the test samples with an average standard deviation of 0.0867 between subjects on a scale from zero to one and R² = 0.9421. After a calibration procedure with a subset of subjects, the MOS method yielded similar results with an average standard deviation of 0.07620 and R² = 0.9181. While results suggest good repeatability of the IR method over a broad range of subjects, the calibrated MOS method is capable of producing results more closely related to actual SII values and is a simpler procedure for human subjects.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Speech and audio signals"

1

Gold, Ben, Nelson Morgan, and Dan Ellis. Speech and Audio Signal Processing. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2011. http://dx.doi.org/10.1002/9781118142882.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Deller, John R. Discrete-time processing of speech signals. New York: Institute of Electrical and Electronics Engineers, 2000.

Search full text
APA, Harvard, Vancouver, ISO, and other styles
3

Proakis, John G., and John H. L. Hansen, eds. Discrete-time processing of speech signals. New York: Macmillan Pub. Co., 1993.

Search full text
APA, Harvard, Vancouver, ISO, and other styles
4

Madisetti, V., ed. Video, speech, and audio signal processing and associated standards. Boca Raton, FL: CRC Press, 2009.

Search full text
APA, Harvard, Vancouver, ISO, and other styles
5

Madisetti, V., ed. Video, speech, and audio signal processing and associated standards. Boca Raton, FL: CRC Press, 2009.

Search full text
APA, Harvard, Vancouver, ISO, and other styles
6

Gold, Bernard. Speech and audio signal processing: Processing and perception of speech and music. 2nd ed. Hoboken, N.J.: Wiley, 2011.

Search full text
APA, Harvard, Vancouver, ISO, and other styles
7

Morgan, Nelson, ed. Speech and audio signal processing: Processing and perception of speech and music. New York: John Wiley, 2000.

Search full text
APA, Harvard, Vancouver, ISO, and other styles
8

Madisetti, V. Video, speech, and audio signal processing and associated standards. 2nd ed. Boca Raton, FL: CRC Press, 2010.

Search full text
APA, Harvard, Vancouver, ISO, and other styles
9

Madisetti, V. Video, speech, and audio signal processing and associated standards. 2nd ed. Boca Raton, FL: CRC Press, 2010.

Search full text
APA, Harvard, Vancouver, ISO, and other styles
10

Atal, Bishnu S., Vladimir Cuperman, and Allen Gersho, eds. Speech and audio coding for wireless and network applications. Boston: Kluwer Academic Publishers, 1993.

Search full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Speech and audio signals"

1

Buchanan, William J. "Speech and Audio Signals". In Advanced Data Communications and Networks, 111–27. Boston, MA: Springer US, 1997. http://dx.doi.org/10.1007/978-1-4419-8670-2_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Buchanan, Bill. "Speech and Audio Signals". In Handbook of Data Communications and Networks, 96–109. Boston, MA: Springer US, 1999. http://dx.doi.org/10.1007/978-1-4757-0905-6_9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Buchanan, W. J. "Speech and Audio Signals". In The Handbook of Data Communications and Networks, 359–72. Boston, MA: Springer US, 2004. http://dx.doi.org/10.1007/978-1-4020-7870-5_19.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Buchanan, W. "Speech and Audio Signals". In Advanced Data Communications and Networks, 111–27. London: CRC Press, 2023. http://dx.doi.org/10.1201/9781003420415-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Richter, Michael M., Sheuli Paul, Veton Këpuska, and Marius Silaghi. "Audio Signals and Speech Recognition". In Signal Processing and Machine Learning with Applications, 345–68. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-319-45372-9_18.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Buchner, Herbert, and Walter Kellermann. "TRINICON for Dereverberation of Speech and Audio Signals". In Speech Dereverberation, 311–85. London: Springer London, 2010. http://dx.doi.org/10.1007/978-1-84996-056-4_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Douglas, Scott C., and Malay Gupta. "Convolutive Blind Source Separation for Audio Signals". In Blind Speech Separation, 3–45. Dordrecht: Springer Netherlands, 2007. http://dx.doi.org/10.1007/978-1-4020-6479-1_1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Herre, Jürgen, and Manfred Lutzky. "Perceptual Audio Coding of Speech Signals". In Springer Handbook of Speech Processing, 393–410. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-49127-9_18.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Kellermann, Walter. "Beamforming for Speech and Audio Signals". In Handbook of Signal Processing in Acoustics, 691–702. New York, NY: Springer New York, 2008. http://dx.doi.org/10.1007/978-0-387-30441-0_35.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Shamsi, Meysam, Nelly Barbot, Damien Lolive, and Jonathan Chevelu. "Mixing Synthetic and Recorded Signals for Audio-Book Generation". In Speech and Computer, 479–89. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60276-5_46.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Speech and audio signals"

1

Shriver, Stefanie, Alan W. Black, and Ronald Rosenfeld. "Audio signals in speech interfaces". In 6th International Conference on Spoken Language Processing (ICSLP 2000). ISCA: ISCA, 2000. http://dx.doi.org/10.21437/icslp.2000-35.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

"Signal Processing. Speech and Audio Processing". In 2022 29th International Conference on Systems, Signals and Image Processing (IWSSIP). IEEE, 2022. http://dx.doi.org/10.1109/iwssip55020.2022.9854416.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Phan, Huy, Lars Hertel, Marco Maass, Radoslaw Mazur, and Alfred Mertins. "Representing nonspeech audio signals through speech classification models". In Interspeech 2015. ISCA: ISCA, 2015. http://dx.doi.org/10.21437/interspeech.2015-682.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Kammerl, Julius, Neil Birkbeck, Sasi Inguva, Damien Kelly, A. J. Crawford, Hugh Denman, Anil Kokaram, and Caroline Pantofaru. "Temporal synchronization of multiple audio signals". In ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014. http://dx.doi.org/10.1109/icassp.2014.6854474.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

"Main Track: Speech and Audio Processing". In 2020 International Conference on Systems, Signals and Image Processing (IWSSIP). IEEE, 2020. http://dx.doi.org/10.1109/iwssip48289.2020.9145083.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Movassagh, Mahmood, Joachim Thiemann, and Peter Kabal. "Joint entropy-scalable coding of audio signals". In ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2012. http://dx.doi.org/10.1109/icassp.2012.6288537.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

van der Waal, R. G., and R. N. J. Veldhuis. "Subband coding of stereophonic digital audio signals". In [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing. IEEE, 1991. http://dx.doi.org/10.1109/icassp.1991.151053.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Rajagopalan, R., and B. Subramanian. "Removal of impulse noise from audio and speech signals". In International Symposium on Signals, Circuits and Systems, 2003. SCS 2003. IEEE, 2003. http://dx.doi.org/10.1109/scs.2003.1226973.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Ziolko, Mariusz, Bartosz Ziolko, and Rafal Samborski. "Dual-Microphone Speech Extraction from Signals with Audio Background". In 2009 Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP). IEEE, 2009. http://dx.doi.org/10.1109/iih-msp.2009.34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Braun, Jerome J., and Haim Levkowitz. "Internet-oriented visualization with audio presentation of speech signals". In Photonics West '98 Electronic Imaging, edited by Robert F. Erbacher and Alex Pang. SPIE, 1998. http://dx.doi.org/10.1117/12.309555.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Speech and audio signals"

1

DeLeon, Phillip L. Techniques for Preprocessing Speech Signals for More Effective Audio Interfaces. Fort Belvoir, VA: Defense Technical Information Center, December 2001. http://dx.doi.org/10.21236/ada412195.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Mammone, Richard J., Khaled Assaleh, Kevin Farrell, Ravi Ramachandran, and Mihailo Zilovic. A Modulation Model for Characterizing Speech Signals. Fort Belvoir, VA: Defense Technical Information Center, March 1996. http://dx.doi.org/10.21236/ada311661.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Spittka, J., and K. Vos. RTP Payload Format for the Opus Speech and Audio Codec. RFC Editor, June 2015. http://dx.doi.org/10.17487/rfc7587.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Herrnstein, A. Start/End Delays of Voiced and Unvoiced Speech Signals. Office of Scientific and Technical Information (OSTI), September 1999. http://dx.doi.org/10.2172/15006006.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Chan, A. D., K. Englehart, B. Hudgins, and D. F. Lovely. Hidden Markov Model Classification of Myoelectric Signals in Speech. Fort Belvoir, VA: Defense Technical Information Center, October 2001. http://dx.doi.org/10.21236/ada410037.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

STANDARD OBJECT SYSTEMS INC. Advanced Audio Interface for Phonetic Speech Recognition in a High Noise Environment. Fort Belvoir, VA: Defense Technical Information Center, January 2000. http://dx.doi.org/10.21236/ada373461.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Nelson, W. T., Robert S. Bolia, Mark A. Ericson, and Richard L. McKinley. Spatial Audio Displays for Speech Communications: A Comparison of Free Field and Virtual Acoustic Environments. Fort Belvoir, VA: Defense Technical Information Center, January 1999. http://dx.doi.org/10.21236/ada430289.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Nelson, W. T., Robert S. Bolia, Mark A. Ericson, and Richard L. McKinley. Monitoring the Simultaneous Presentation of Spatialized Speech Signals in a Virtual Acoustic Environment. Fort Belvoir, VA: Defense Technical Information Center, January 1998. http://dx.doi.org/10.21236/ada430284.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Nelson, W. T., Robert S. Bolia, Mark A. Ericson, and Richard L. McKinley. Monitoring the Simultaneous Presentation of Multiple Spatialized Speech Signals in the Free Field. Fort Belvoir, VA: Defense Technical Information Center, January 1998. http://dx.doi.org/10.21236/ada430298.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Hamlin, Alexandra, Erik Kobylarz, James Lever, Susan Taylor, and Laura Ray. Assessing the feasibility of detecting epileptic seizures using non-cerebral sensor. Engineer Research and Development Center (U.S.), December 2021. http://dx.doi.org/10.21079/11681/42562.

Full text
Abstract
This paper investigates the feasibility of using non-cerebral, time-series data to detect epileptic seizures. Data were recorded from fifteen patients (7 male, 5 female, 3 not noted, mean age 36.17 yrs), five of whom had a total of seven seizures. Patients were monitored in an inpatient setting using standard video electroencephalography (vEEG), while also wearing sensors monitoring electrocardiography, electrodermal activity, electromyography, accelerometry, and audio signals (vocalizations). A systematic and detailed study was conducted to identify the sensors and the features derived from the non-cerebral sensors that contribute most significantly to separability of data acquired during seizures from non-seizure data. Post-processing of the data using linear discriminant analysis (LDA) shows that seizure data are strongly separable from non-seizure data based on features derived from the signals recorded. The mean area under the receiver operator characteristic (ROC) curve for each individual patient that experienced a seizure during data collection, calculated using LDA, was 0.9682. The features that contribute most significantly to seizure detection differ for each patient. The results show that a multimodal approach to seizure detection using the specified sensor suite is promising in detecting seizures with both sensitivity and specificity. Moreover, the study provides a means to quantify the contribution of each sensor and feature to separability. Development of a non-electroencephalography (EEG) based seizure detection device would give doctors a more accurate seizure count outside of the clinical setting, improving treatment and the quality of life of epilepsy patients.
APA, Harvard, Vancouver, ISO, and other styles
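
The post-processing step described in the abstract (LDA followed by an ROC analysis) can be reproduced in outline with scikit-learn. The data below are synthetic stand-ins; the feature dimensionality and class balance are illustrative assumptions, not the study's dataset.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score

# Synthetic stand-ins for windowed features from ECG, EDA, EMG,
# accelerometry, and audio channels; y = 1 marks seizure windows.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
y = (rng.random(500) < 0.1).astype(int)
X[y == 1] += 1.5                       # inject separability for the demo

lda = LinearDiscriminantAnalysis().fit(X, y)
print("ROC AUC:", roc_auc_score(y, lda.decision_function(X)))
```
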
