Academic literature on the topic 'Speech and audio signals'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Speech and audio signals.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Speech and audio signals"

1

Rao, G. Manmadha, Raidu Babu D.N., Krishna Kanth P.S.L., Vinay B., and Nikhil V. "Reduction of Impulsive Noise from Speech and Audio Signals by using Sd-Rom Algorithm." International Journal of Recent Technology and Engineering 10, no. 1 (May 30, 2021): 265–68. http://dx.doi.org/10.35940/ijrte.a5943.0510121.

Full text
Abstract:
Noise removal is at the heart of speech and audio signal processing. Impulse noise is one of the most significant types of noise that corrupt different parts of speech and audio signals. To remove this type of noise, the technique proposed in this work is the signal-dependent rank-ordered mean (SD-ROM) method in its recursive version. The technique replaces impulse noise samples based on the neighbouring samples: it detects corrupted samples from the rank-ordered differences compared against threshold values, and it does not change the features or tonal quality of the signal. Once a sample is detected as corrupted, it is replaced with the rank-ordered mean value, which depends on the sliding window size and the neighbouring samples. The technique shows good results in terms of signal-to-noise ratio (SNR) and peak signal-to-noise ratio (PSNR) when compared with other techniques. It is mainly used for the removal of impulse noise from speech and audio signals.
APA, Harvard, Vancouver, ISO, and other styles
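
The SD-ROM procedure summarized in the entry above can be illustrated with a short sketch. This is a minimal, simplified rank-ordered-mean filter, not the authors' exact recursive implementation; the window length, threshold values, and function name are illustrative assumptions and depend on the signal scale.

```python
import numpy as np

def sd_rom_denoise(x, window=5, thresholds=(0.1, 0.2, 0.3, 0.4)):
    """Simplified signal-dependent rank-ordered mean (SD-ROM) impulse removal.

    For each sample, its neighbours inside a sliding window (excluding the
    sample itself) are rank-ordered. If the sample deviates from the ranked
    neighbours by more than the increasing thresholds, it is flagged as an
    impulse and replaced by the rank-ordered mean of those neighbours.
    """
    y = np.asarray(x, dtype=float).copy()
    half = window // 2
    for n in range(half, len(y) - half):
        neigh = np.concatenate([y[n - half:n], y[n + 1:n + 1 + half]])
        ranked = np.sort(neigh)
        rom = 0.5 * (ranked[half - 1] + ranked[half])   # rank-ordered mean
        diffs = np.sort(np.abs(y[n] - ranked))          # rank-ordered differences
        if np.any(diffs[-len(thresholds):] > np.asarray(thresholds)):
            y[n] = rom  # recursive: the replacement is seen by later windows
    return y
```
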
2

Ashwin, J. S., and N. Manoharan. "Audio Denoising Based on Short Time Fourier Transform." Indonesian Journal of Electrical Engineering and Computer Science 9, no. 1 (January 1, 2018): 89. http://dx.doi.org/10.11591/ijeecs.v9.i1.pp89-92.

Full text
Abstract:
This paper presents a novel audio de-noising scheme for a given speech signal. Recovering the original signal from the communication channel without any noise is a difficult task, and many de-noising techniques have been proposed for the removal of noise from a digital signal. In this paper, an audio de-noising technique based on the Short Time Fourier Transform (STFT) is implemented. The proposed architecture uses a novel approach to estimate environmental noise from speech adaptively. Original speech signals are given as the input, and noise is added to them as AWGN. The noisy signals are then de-noised using the STFT technique. Finally, signal-to-noise ratio (SNR) and peak signal-to-noise ratio (PSNR) values are obtained for the noisy and de-noised signals.
APA, Harvard, Vancouver, ISO, and other styles
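
A small sketch of the evaluation loop described in the entry above: corrupt a clean signal with additive white Gaussian noise (AWGN) at a chosen SNR, then score the de-noised estimate with SNR and PSNR. The function names and the peak-based PSNR definition are assumptions for illustration, not the paper's code.

```python
import numpy as np

def add_awgn(signal, target_snr_db):
    """Add white Gaussian noise so the result has roughly the requested SNR in dB."""
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10.0 ** (target_snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def snr_db(reference, estimate):
    """Signal-to-noise ratio (dB) of an estimate against the clean reference."""
    err = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(err ** 2))

def psnr_db(reference, estimate):
    """Peak signal-to-noise ratio (dB), using the reference's peak amplitude."""
    mse = np.mean((reference - estimate) ** 2)
    return 10.0 * np.log10(np.max(np.abs(reference)) ** 2 / mse)
```
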
3

Kacur, Juraj, Boris Puterka, Jarmila Pavlovicova, and Milos Oravec. "Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications." Sensors 22, no. 16 (August 22, 2022): 6304. http://dx.doi.org/10.3390/s22166304.

Full text
Abstract:
There are many speech and audio processing applications and their number is growing. They may cover a wide range of tasks, each having different requirements on the processed speech or audio signals and, therefore, indirectly, on the audio sensors as well. This article reports on tests and evaluation of the effect of basic physical properties of speech and audio signals on the recognition accuracy of major speech/audio processing applications, i.e., speech recognition, speaker recognition, speech emotion recognition, and audio event recognition. A particular focus is on frequency ranges, time intervals, a precision of representation (quantization), and complexities of models suitable for each class of applications. Using domain-specific datasets, eligible feature extraction methods and complex neural network models, it was possible to test and evaluate the effect of basic speech and audio signal properties on the achieved accuracies for each group of applications. The tests confirmed that the basic parameters do affect the overall performance and, moreover, this effect is domain-dependent. Therefore, accurate knowledge of the extent of these effects can be valuable for system designers when selecting appropriate hardware, sensors, architecture, and software for a particular application, especially in the case of limited resources.
APA, Harvard, Vancouver, ISO, and other styles
4

Nittrouer, Susan, and Joanna H. Lowenstein. "Beyond Recognition: Visual Contributions to Verbal Working Memory." Journal of Speech, Language, and Hearing Research 65, no. 1 (January 12, 2022): 253–73. http://dx.doi.org/10.1044/2021_jslhr-21-00177.

Full text
Abstract:
Purpose: It is well recognized that adding the visual to the acoustic speech signal improves recognition when the acoustic signal is degraded, but how that visual signal affects postrecognition processes is not so well understood. This study was designed to further elucidate the relationships among auditory and visual codes in working memory, a postrecognition process. Design: In a main experiment, 80 young adults with normal hearing were tested using an immediate serial recall paradigm. Three types of signals were presented (unprocessed speech, vocoded speech, and environmental sounds) in three conditions (audio-only, audio–video with dynamic visual signals, and audio–picture with static visual signals). Three dependent measures were analyzed: (a) magnitude of the recency effect, (b) overall recall accuracy, and (c) response times, to assess cognitive effort. In a follow-up experiment, 30 young adults with normal hearing were tested largely using the same procedures, but with a slight change in order of stimulus presentation. Results: The main experiment produced three major findings: (a) unprocessed speech evoked a recency effect of consistent magnitude across conditions; vocoded speech evoked a recency effect of similar magnitude to unprocessed speech only with dynamic visual (lipread) signals; environmental sounds never showed a recency effect. (b) Dynamic and static visual signals enhanced overall recall accuracy to a similar extent, and this enhancement was greater for vocoded speech and environmental sounds than for unprocessed speech. (c) All visual signals reduced cognitive load, except for dynamic visual signals with environmental sounds. The follow-up experiment revealed that dynamic visual (lipread) signals exerted their effect on the vocoded stimuli by enhancing phonological quality. Conclusions: Acoustic and visual signals can combine to enhance working memory operations, but the source of these effects differs for phonological and nonphonological signals. Nonetheless, visual information can support better postrecognition processes for patients with hearing loss.
APA, Harvard, Vancouver, ISO, and other styles
5

Nagesh, B., and M. Uttara Kumari. "A Review on Machine Learning for Audio Applications." Journal of University of Shanghai for Science and Technology 23, no. 07 (June 30, 2021): 62–70. http://dx.doi.org/10.51201/jusst/21/06508.

Full text
Abstract:
Audio processing is an important branch of the signal processing domain. It deals with the manipulation of audio signals to achieve tasks such as filtering, data compression, speech processing, and noise suppression, which improve the quality of the audio signal. For applications such as natural language processing, speech generation, and automatic speech recognition, conventional algorithms are not sufficient; machine learning or deep learning algorithms are needed so that audio signal processing can be achieved with good results and accuracy. This paper reviews the various algorithms used by researchers in the past and indicates the appropriate algorithm for each application.
APA, Harvard, Vancouver, ISO, and other styles
6

Kubanek, M., J. Bobulski, and L. Adrjanowicz. "Characteristics of the use of coupled hidden Markov models for audio-visual Polish speech recognition." Bulletin of the Polish Academy of Sciences: Technical Sciences 60, no. 2 (October 1, 2012): 307–16. http://dx.doi.org/10.2478/v10175-012-0041-6.

Full text
Abstract:
This paper focuses on combining audio-visual signals for Polish speech recognition in conditions where the audio speech signal is highly disturbed. Recognition of audio-visual speech was based on coupled hidden Markov models (CHMM). The described methods were developed for single isolated commands; nevertheless, their effectiveness indicates that they would also work similarly in continuous audio-visual speech recognition. The problem of visual speech analysis is very difficult and computationally demanding, mostly because of the extreme amount of data that needs to be processed. Therefore, the audio-video speech recognition method is used only while the audio speech signal is exposed to a considerable level of distortion. The authors' own methods of lip edge detection and visual feature extraction are proposed in this paper. Moreover, a method of fusing speech characteristics for an audio-video signal was proposed and tested. A significant increase in recognition effectiveness and processing speed was noted during tests, for properly selected CHMM parameters and an adequate codebook size, together with an appropriate fusion of audio-visual characteristics. The experimental results were very promising and close to those achieved by leading scientists in the field of audio-visual speech recognition.
APA, Harvard, Vancouver, ISO, and other styles
7

Timmermann, Johannes, Florian Ernst, and Delf Sachau. "Speech enhancement for helicopter headsets with an integrated ANC-system for FPGA-platforms." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 265, no. 5 (February 1, 2023): 2720–30. http://dx.doi.org/10.3397/in_2022_0382.

Full text
Abstract:
During flights, helicopter pilots are exposed to high noise levels caused by the rotor, engine, and wind. To protect the health of passengers and crew, noise-dampening headsets are used. Modern active noise control (ANC) headsets can further reduce the noise exposure for humans in helicopters. Internal or external voice transmission in the helicopter must be adapted to the noisy environment, and speech signals are therefore heavily amplified. To improve the quality of communication in helicopters, speech and background noise in the transmitted audio signals should be separated, and the noise components of the signal then eliminated. One established method for this type of speech enhancement is spectral subtraction. In this study, audio files recorded with an artificial head during a helicopter flight are used to evaluate a speech enhancement system with additional ANC capabilities on a rapid prototyping platform. Since both spectral subtraction and the ANC algorithm are computationally intensive, an FPGA is used. The results show a significant enhancement in the quality of the speech signals, which leads to improved communication. Furthermore, the enhanced audio signals can be used for voice recognition algorithms.
APA, Harvard, Vancouver, ISO, and other styles
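
Spectral subtraction, the enhancement method named in the entry above, can be sketched compactly in the STFT domain. This is a generic textbook variant assuming the first few frames are noise-only; it is not the authors' FPGA implementation, and the frame length, noise-frame count, and spectral floor are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, noise_frames=10, nperseg=512, floor=0.02):
    """Estimate the noise magnitude spectrum from the first frames, subtract it
    from every frame, and resynthesize with the noisy phase."""
    _, _, Z = stft(noisy, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)  # noise estimate
    clean_mag = np.maximum(mag - noise_mag, floor * mag)           # spectral floor
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return enhanced[:len(noisy)]
```
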
8

Abdallah, Hanaa A., and Souham Meshoul. "A Multilayered Audio Signal Encryption Approach for Secure Voice Communication." Electronics 12, no. 1 (December 20, 2022): 2. http://dx.doi.org/10.3390/electronics12010002.

Full text
Abstract:
In this paper, multilayer cryptosystems for encrypting audio communications are proposed. These cryptosystems combine audio signals with other active concealing signals, such as speech signals, by continuously fusing the audio signal with a speech signal without silent periods. The goal of these cryptosystems is to prevent unauthorized parties from listening to encrypted audio communications. Preprocessing is performed on both the speech signal and the audio signal before they are combined, as this is necessary to get the signals ready for fusion. Instead of encoding and decoding methods, the cryptosystems rely on the values of audio samples, which allows for saving time while increasing their resistance to hackers and environments with a noisy background. The main feature of the proposed approach is to consider three levels of encryption namely fusion, substitution, and permutation where various combinations are considered. The resulting cryptosystems are compared to the one-dimensional logistic map-based encryption techniques and other state-of-the-art methods. The performance of the suggested cryptosystems is evaluated by the use of the histogram, structural similarity index, signal-to-noise ratio (SNR), log-likelihood ratio, spectrum distortion, and correlation coefficient in simulated testing. A comparative analysis in relation to the encryption of logistic maps is given. This research demonstrates that increasing the level of encryption results in increased security. It is obvious that the proposed salting-based encryption method and the multilayer DCT/DST cryptosystem offer better levels of security as they attain the lowest SNR values, −25 dB and −2.5 dB, respectively. In terms of the used evaluation metrics, the proposed multilayer cryptosystem achieved the best results in discrete cosine transform and discrete sine transform, demonstrating a very promising performance.
APA, Harvard, Vancouver, ISO, and other styles
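
The entry above layers fusion, substitution, and permutation. As a toy illustration of the permutation layer alone (not the proposed cryptosystem, and not cryptographically strong by itself), audio samples can be scrambled and restored with a keyed pseudo-random permutation:

```python
import numpy as np

def permute_samples(audio, key_seed):
    """Scramble the sample order with a permutation derived from a key/seed."""
    rng = np.random.default_rng(key_seed)
    perm = rng.permutation(len(audio))
    return audio[perm], perm

def unpermute_samples(scrambled, perm):
    """Restore the original sample order given the same permutation."""
    restored = np.empty_like(scrambled)
    restored[perm] = scrambled
    return restored
```
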
9

Yin, Shu Hua. "Design of the Auxiliary Speech Recognition System of Super-Short-Range Reconnaissance Radar." Applied Mechanics and Materials 556-562 (May 2014): 4830–34. http://dx.doi.org/10.4028/www.scientific.net/amm.556-562.4830.

Full text
Abstract:
To improve the usability and operability of the hybrid-identification reconnaissance radar for individual use, a voice identification system was designed. Using the SPCE061A audio signal microprocessor as its core, digital signal processing was applied to the audio-band Doppler radar signals obtained over an audio cable. A/D acquisition was then performed to obtain digital signals, and the acquired data were preprocessed and adaptively filtered to eliminate background noise. Segmented FFT transforms were then used to identify the types of the signals. The overall design of radar voice recognition for an individual soldier was thereby fulfilled. Actual measurements showed that the circuit design improved radar resolution and the accuracy of radar identification.
APA, Harvard, Vancouver, ISO, and other styles
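
The segmented-FFT step mentioned in the entry above amounts to splitting the digitized Doppler audio into blocks and inspecting each block's spectrum. Below is a minimal sketch with an illustrative block length and a simple dominant-frequency readout, not the radar's actual classifier.

```python
import numpy as np

def segment_dominant_freqs(x, fs, segment_len=1024):
    """Return the dominant FFT frequency (Hz) of each fixed-length segment."""
    n_seg = len(x) // segment_len
    freqs = np.fft.rfftfreq(segment_len, d=1.0 / fs)
    dominant = []
    for i in range(n_seg):
        seg = x[i * segment_len:(i + 1) * segment_len] * np.hanning(segment_len)
        mag = np.abs(np.fft.rfft(seg))
        dominant.append(freqs[np.argmax(mag)])  # crude per-segment spectral peak
    return np.array(dominant)
```
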
10

Moore, Brian C. J. "Binaural sharing of audio signals." Hearing Journal 60, no. 11 (November 2007): 46–48. http://dx.doi.org/10.1097/01.hj.0000299172.13153.6f.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Speech and audio signals"

1

Mason, Michael. "Hybrid coding of speech and audio signals." Thesis, Queensland University of Technology, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Trinkaus, Trevor R. "Perceptual coding of audio and diverse speech signals." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/13883.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Mészáros, Tomáš. "Speech Analysis for Processing of Musical Signals." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234974.

Full text
Abstract:
The main goal of this work is to enrich musical signals with characteristics of human speech. The work includes the creation of an audio effect inspired by the talk-box: analysis of the vocal tract with a suitable algorithm such as linear prediction, and application of the estimated filter to a musical audio signal. Emphasis is placed on high output quality, low latency, and low computational cost for real-time use. The output of the work is a software plugin usable in professional audio-editing applications and, given a suitable hardware platform, also for live performance. The plugin emulates a real talk-box device and provides comparable output quality with a unique sound.
APA, Harvard, Vancouver, ISO, and other styles
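
The talk-box effect described in the entry above rests on linear prediction: estimate the vocal-tract filter from a speech frame and impose it on a music frame. Below is a frame-level sketch using the autocorrelation method; the LPC order, windowing, and function names are illustrative assumptions, not the thesis's plugin code.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=16):
    """LPC analysis via the autocorrelation method; returns A(z) = [1, -a1, ..., -ap]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])  # Toeplitz normal equations
    return np.concatenate(([1.0], -a))

def talkbox_frame(speech_frame, music_frame, order=16):
    """Impose the speech frame's vocal-tract envelope onto a music frame."""
    a_speech = lpc_coefficients(speech_frame * np.hanning(len(speech_frame)), order)
    a_music = lpc_coefficients(music_frame * np.hanning(len(music_frame)), order)
    excitation = lfilter(a_music, [1.0], music_frame)  # flatten the music's own envelope
    return lfilter([1.0], a_speech, excitation)        # apply the speech envelope 1/A(z)
```
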
4

Choi, Hyung Keun. "Blind source separation of the audio signals in a real world." Thesis, Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/14986.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Lucey, Simon. "Audio-visual speech processing." Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Full text
Abstract:
Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic-based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (e.g., acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular the tasks of isolated word speech and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis, the first of which concerns the extraction of pertinent visual features for effective speech reading and visual speaker recognition. Appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in depth using hidden Markov model (HMM) classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so their combined performance is greater than any of those classifiers individually. The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need for combining independent classifiers.
APA, Harvard, Vancouver, ISO, and other styles
6

Anderson, David Verl. "Audio signal enhancement using multi-resolution sinusoidal modeling." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/15394.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Zeghidour, Neil. "Learning representations of speech from the raw waveform." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEE004/document.

Full text
Abstract:
While deep neural networks are now used in almost every component of a speech recognition system, from acoustic to language modeling, the input to such systems is still fixed, handcrafted, spectral features such as mel-filterbanks. This contrasts with computer vision, in which a deep neural network is now trained on raw pixels. Mel-filterbanks contain valuable and documented prior knowledge from human auditory perception as well as signal processing, and are the input to state-of-the-art speech recognition systems that are now on par with human performance in certain conditions. However, mel-filterbanks, as any fixed representation, are inherently limited by the fact that they are not fine-tuned for the task at hand. We hypothesize that learning the low-level representation of speech with the rest of the model, rather than using fixed features, could push the state of the art even further. We first explore a weakly-supervised setting and show that a single neural network can learn to separate phonetic information and speaker identity from mel-filterbanks or the raw waveform, and that these representations are robust across languages. Moreover, learning from the raw waveform provides significantly better speaker embeddings than learning from mel-filterbanks. These encouraging results lead us to develop a learnable alternative to mel-filterbanks that can be directly used in replacement of these features. In the second part of this thesis we introduce Time-Domain filterbanks, a lightweight neural network that takes the waveform as input, can be initialized as an approximation of mel-filterbanks, and then learned with the rest of the neural architecture. Across extensive and systematic experiments, we show that Time-Domain filterbanks consistently outperform mel-filterbanks and can be integrated into a new state-of-the-art speech recognition system, trained directly from the raw audio signal. Fixed speech features are also used for non-linguistic classification tasks, for which they are even less optimal; we therefore perform dysarthria detection from the waveform with Time-Domain filterbanks and show that this significantly improves over mel-filterbanks or low-level descriptors. Finally, we discuss how our contributions fall within a broader shift towards fully learnable audio understanding systems.
APA, Harvard, Vancouver, ISO, and other styles
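
The "initialize as mel-filterbanks, then learn" idea in the entry above can be illustrated with the standard triangular mel filterbank construction; such a matrix (or its time-domain equivalent) can serve as the starting weights of a learnable front-end that is afterwards fine-tuned by back-propagation with the rest of the network. This is a generic construction, not the thesis's Time-Domain filterbanks code; the sampling rate, FFT size, and filter count are illustrative.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=40, n_fft=512, sr=16000):
    """Triangular mel filterbank matrix of shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)    # rising slope
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)  # falling slope
    return fb
```
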
8

Bando, Yoshiaki. "Robust Audio Scene Analysis for Rescue Robots." Kyoto University, 2018. http://hdl.handle.net/2433/232410.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Moghimi, Amir Reza. "Array-based Spectro-temporal Masking For Automatic Speech Recognition." Research Showcase @ CMU, 2014. http://repository.cmu.edu/dissertations/334.

Full text
Abstract:
Over the years, a variety of array processing techniques have been applied to the problem of enhancing degraded speech to improve automatic speech recognition. In this context, linear beamforming has long been the approach of choice, for reasons including good performance, robustness and analytical simplicity. While various non-linear techniques - typically based to some extent on the study of auditory scene analysis - have also been of interest, they tend to lag behind their linear counterparts in terms of simplicity, scalability and flexibility. Nonlinear techniques are also more difficult to analyze and lack the systematic descriptions available in the study of linear beamformers. This work focuses on a class of nonlinear processing, known as time-frequency (T-F) masking, also called spectro-temporal masking, whose variants comprise a significant portion of the existing techniques. T-F masking is based on accepting or rejecting individual time-frequency cells based on some estimate of local signal quality. Analyses are developed that attempt to mirror the beam patterns used to describe linear processing, leading to a view of T-F masking as "nonlinear beamforming". Two distinct formulations of these "nonlinear beam patterns" are developed, based on different metrics of the algorithm's behavior; these formulations are modeled in a variety of scenarios to demonstrate the flexibility of the idea. While these patterns are not quite as simple or all-encompassing as traditional beam patterns in microphone-array processing, they do accurately represent the behavior of masking algorithms in analogous and intuitive ways. In addition to analyzing this class of nonlinear masking algorithm, we also attempt to improve its performance in a variety of ways. Improvements are proposed to the baseline two-channel version of masking, by addressing both the mask estimation and the signal reconstruction stages; the latter more successfully than the former. Furthermore, while these approaches have been shown to outperform linear beamforming in two-sensor arrays, extensions to larger arrays have been few and unsuccessful. We find that combining beamforming and masking is a viable method of bringing the benefits of masking to larger arrays. As a result, a hybrid beamforming-masking approach, called "post-masking", is developed that improves upon the performance of MMSE beamforming (and can be used with any beamforming technique), with the potential for even greater improvement in the future.
APA, Harvard, Vancouver, ISO, and other styles
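
The accept-or-reject-per-cell idea analyzed in the entry above can be sketched for a two-microphone pair: take the STFT of both channels, keep the time-frequency cells whose inter-channel phase difference is consistent with a source in front of the array, and attenuate the rest. The fixed phase threshold and mask floor are deliberate simplifications for illustration, not the dissertation's mask estimator; a practical threshold would scale with frequency and microphone spacing.

```python
import numpy as np
from scipy.signal import stft, istft

def two_channel_tf_mask(left, right, fs, nperseg=512, phase_threshold=0.2):
    """Binary time-frequency masking driven by the inter-channel phase difference."""
    _, _, L = stft(left, fs=fs, nperseg=nperseg)
    _, _, R = stft(right, fs=fs, nperseg=nperseg)
    ipd = np.angle(L * np.conj(R))                        # inter-channel phase difference
    mask = (np.abs(ipd) < phase_threshold).astype(float)  # accept/reject each T-F cell
    mask = np.maximum(mask, 0.05)                         # floor to limit musical noise
    _, out = istft(mask * L, fs=fs, nperseg=nperseg)
    return out
```
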
10

Brangers, Kirstin M. "Perceptual Ruler for Quantifying Speech Intelligibility in Cocktail Party Scenarios." UKnowledge, 2013. http://uknowledge.uky.edu/ece_etds/31.

Full text
Abstract:
Systems designed to enhance intelligibility of speech in noise are difficult to evaluate quantitatively because intelligibility is subjective and often requires feedback from large populations for consistent evaluations. Attempts to quantify the evaluation have included related measures such as the Speech Intelligibility Index (SII). These require separating speech and noise signals, which precludes their use on experimental recordings. This thesis develops a procedure using an Intelligibility Ruler (IR) for efficiently quantifying intelligibility. A calibrated Mean Opinion Score (MOS) method is also implemented in order to compare repeatability over a population of 24 subjective listeners. Results showed that subjects using the IR consistently estimated SII values of the test samples with an average standard deviation of 0.0867 between subjects on a scale from zero to one and R2 = 0.9421. After a calibration procedure with a subset of subjects, the MOS method yielded similar results with an average standard deviation of 0.07620 and R2 = 0.9181. While the results suggest good repeatability of the IR method over a broad range of subjects, the calibrated MOS method is capable of producing results more closely related to actual SII values and is a simpler procedure for human subjects.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Speech and audio signals"

1

Gold, Ben, Nelson Morgan, and Dan Ellis. Speech and Audio Signal Processing. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2011. http://dx.doi.org/10.1002/9781118142882.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Deller, John R. Discrete-time processing of speech signals. New York: Institute of Electrical and Electronics Engineers, 2000.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Proakis, John G., and John H. L. Hansen, eds. Discrete-time processing of speech signals. New York: Macmillan Pub. Co., 1993.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Madisetti, V., ed. Video, speech, and audio signal processing and associated standards. Boca Raton, FL: CRC Press, 2009.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Madisetti, V., ed. Video, speech, and audio signal processing and associated standards. Boca Raton, FL: CRC Press, 2009.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Gold, Bernard. Speech and audio signal processing: Processing and perception of speech and music. 2nd ed. Hoboken, N.J: Wiley, 2011.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Morgan, Nelson, ed. Speech and audio signal processing: Processing and perception of speech and music. New York: John Wiley, 2000.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Madisetti, V. Video, speech, and audio signal processing and associated standards. 2nd ed. Boca Raton, FL: CRC Press, 2010.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Madisetti, V. Video, speech, and audio signal processing and associated standards. 2nd ed. Boca Raton, FL: CRC Press, 2010.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Atal, Bishnu S., Vladimir Cuperman, and Allen Gersho, eds. Speech and audio coding for wireless and network applications. Boston: Kluwer Academic Publishers, 1993.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Speech and audio signals"

1

Buchanan, William J. "Speech and Audio Signals." In Advanced Data Communications and Networks, 111–27. Boston, MA: Springer US, 1997. http://dx.doi.org/10.1007/978-1-4419-8670-2_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Buchanan, Bill. "Speech and Audio Signals." In Handbook of Data Communications and Networks, 96–109. Boston, MA: Springer US, 1999. http://dx.doi.org/10.1007/978-1-4757-0905-6_9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Buchanan, W. J. "Speech and Audio Signals." In The Handbook of Data Communications and Networks, 359–72. Boston, MA: Springer US, 2004. http://dx.doi.org/10.1007/978-1-4020-7870-5_19.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Buchanan, W. "Speech and Audio Signals." In Advanced Data Communications and Networks, 111–27. London: CRC Press, 2023. http://dx.doi.org/10.1201/9781003420415-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Richter, Michael M., Sheuli Paul, Veton Këpuska, and Marius Silaghi. "Audio Signals and Speech Recognition." In Signal Processing and Machine Learning with Applications, 345–68. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-319-45372-9_18.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Buchner, Herbert, and Walter Kellermann. "TRINICON for Dereverberation of Speech and Audio Signals." In Speech Dereverberation, 311–85. London: Springer London, 2010. http://dx.doi.org/10.1007/978-1-84996-056-4_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Douglas, Scott C., and Malay Gupta. "Convolutive Blind Source Separation for Audio Signals." In Blind Speech Separation, 3–45. Dordrecht: Springer Netherlands, 2007. http://dx.doi.org/10.1007/978-1-4020-6479-1_1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Herre, Jürgen, and Manfred Lutzky. "Perceptual Audio Coding of Speech Signals." In Springer Handbook of Speech Processing, 393–410. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-49127-9_18.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Kellermann, Walter. "Beamforming for Speech and Audio Signals." In Handbook of Signal Processing in Acoustics, 691–702. New York, NY: Springer New York, 2008. http://dx.doi.org/10.1007/978-0-387-30441-0_35.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Shamsi, Meysam, Nelly Barbot, Damien Lolive, and Jonathan Chevelu. "Mixing Synthetic and Recorded Signals for Audio-Book Generation." In Speech and Computer, 479–89. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60276-5_46.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Speech and audio signals"

1

Shriver, Stefanie, Alan W. Black, and Ronald Rosenfeld. "Audio signals in speech interfaces." In 6th International Conference on Spoken Language Processing (ICSLP 2000). ISCA: ISCA, 2000. http://dx.doi.org/10.21437/icslp.2000-35.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

"Signal Processing. Speech and Audio Processing." In 2022 29th International Conference on Systems, Signals and Image Processing (IWSSIP). IEEE, 2022. http://dx.doi.org/10.1109/iwssip55020.2022.9854416.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Phan, Huy, Lars Hertel, Marco Maass, Radoslaw Mazur, and Alfred Mertins. "Representing nonspeech audio signals through speech classification models." In Interspeech 2015. ISCA: ISCA, 2015. http://dx.doi.org/10.21437/interspeech.2015-682.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Kammerl, Julius, Neil Birkbeck, Sasi Inguva, Damien Kelly, A. J. Crawford, Hugh Denman, Anil Kokaram, and Caroline Pantofaru. "Temporal synchronization of multiple audio signals." In ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014. http://dx.doi.org/10.1109/icassp.2014.6854474.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

"Main Track: Speech and Audio Processing." In 2020 International Conference on Systems, Signals and Image Processing (IWSSIP). IEEE, 2020. http://dx.doi.org/10.1109/iwssip48289.2020.9145083.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Movassagh, Mahmood, Joachim Thiemann, and Peter Kabal. "Joint entropy-scalable coding of audio signals." In ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2012. http://dx.doi.org/10.1109/icassp.2012.6288537.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

van der Waal, R. G., and R. N. J. Veldhuis. "Subband coding of stereophonic digital audio signals." In [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing. IEEE, 1991. http://dx.doi.org/10.1109/icassp.1991.151053.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Rajagopalan, R., and B. Subramanian. "Removal of impulse noise from audio and speech signals." In International Symposium on Signals, Circuits and Systems, 2003. SCS 2003. IEEE, 2003. http://dx.doi.org/10.1109/scs.2003.1226973.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Ziolko, Mariusz, Bartosz Ziolko, and Rafal Samborski. "Dual-Microphone Speech Extraction from Signals with Audio Background." In 2009 Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP). IEEE, 2009. http://dx.doi.org/10.1109/iih-msp.2009.34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Braun, Jerome J., and Haim Levkowitz. "Internet-oriented visualization with audio presentation of speech signals." In Photonics West '98 Electronic Imaging, edited by Robert F. Erbacher and Alex Pang. SPIE, 1998. http://dx.doi.org/10.1117/12.309555.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Speech and audio signals"

1

DeLeon, Phillip L. Techniques for Preprocessing Speech Signals for More Effective Audio Interfaces. Fort Belvoir, VA: Defense Technical Information Center, December 2001. http://dx.doi.org/10.21236/ada412195.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Mammone, Richard J., Khaled Assaleh, Kevin Farrell, Ravi Ramachandran, and Mihailo Zilovic. A Modulation Model for Characterizing Speech Signals. Fort Belvoir, VA: Defense Technical Information Center, March 1996. http://dx.doi.org/10.21236/ada311661.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Spittka, J., and K. Vos. RTP Payload Format for the Opus Speech and Audio Codec. RFC Editor, June 2015. http://dx.doi.org/10.17487/rfc7587.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Herrnstein, A. Start/End Delays of Voiced and Unvoiced Speech Signals. Office of Scientific and Technical Information (OSTI), September 1999. http://dx.doi.org/10.2172/15006006.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Chan, A. D., K. Englehart, B. Hudgins, and D. F. Lovely. Hidden Markov Model Classification of Myoelectric Signals in Speech. Fort Belvoir, VA: Defense Technical Information Center, October 2001. http://dx.doi.org/10.21236/ada410037.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

STANDARD OBJECT SYSTEMS INC. Advanced Audio Interface for Phonetic Speech Recognition in a High Noise Environment. Fort Belvoir, VA: Defense Technical Information Center, January 2000. http://dx.doi.org/10.21236/ada373461.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Nelson, W. T., Robert S. Bolia, Mark A. Ericson, and Richard L. McKinley. Spatial Audio Displays for Speech Communications: A Comparison of Free Field and Virtual Acoustic Environments. Fort Belvoir, VA: Defense Technical Information Center, January 1999. http://dx.doi.org/10.21236/ada430289.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Nelson, W. T., Robert S. Bolia, Mark A. Ericson, and Richard L. McKinley. Monitoring the Simultaneous Presentation of Spatialized Speech Signals in a Virtual Acoustic Environment. Fort Belvoir, VA: Defense Technical Information Center, January 1998. http://dx.doi.org/10.21236/ada430284.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Nelson, W. T., Robert S. Bolia, Mark A. Ericson, and Richard L. McKinley. Monitoring the Simultaneous Presentation of Multiple Spatialized Speech Signals in the Free Field. Fort Belvoir, VA: Defense Technical Information Center, January 1998. http://dx.doi.org/10.21236/ada430298.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Hamlin, Alexandra, Erik Kobylarz, James Lever, Susan Taylor, and Laura Ray. Assessing the feasibility of detecting epileptic seizures using non-cerebral sensor. Engineer Research and Development Center (U.S.), December 2021. http://dx.doi.org/10.21079/11681/42562.

Full text
Abstract:
This paper investigates the feasibility of using non-cerebral, time-series data to detect epileptic seizures. Data were recorded from fifteen patients (7 male, 5 female, 3 not noted, mean age 36.17 yrs), five of whom had a total of seven seizures. Patients were monitored in an inpatient setting using standard video electroencephalography (vEEG), while also wearing sensors monitoring electrocardiography, electrodermal activity, electromyography, accelerometry, and audio signals (vocalizations). A systematic and detailed study was conducted to identify the sensors and the features derived from the non-cerebral sensors that contribute most significantly to separability of data acquired during seizures from non-seizure data. Post-processing of the data using linear discriminant analysis (LDA) shows that seizure data are strongly separable from non-seizure data based on features derived from the signals recorded. The mean area under the receiver operator characteristic (ROC) curve for each individual patient that experienced a seizure during data collection, calculated using LDA, was 0.9682. The features that contribute most significantly to seizure detection differ for each patient. The results show that a multimodal approach to seizure detection using the specified sensor suite is promising in detecting seizures with both sensitivity and specificity. Moreover, the study provides a means to quantify the contribution of each sensor and feature to separability. Development of a non-electroencephalography (EEG) based seizure detection device would give doctors a more accurate seizure count outside of the clinical setting, improving treatment and the quality of life of epilepsy patients.
APA, Harvard, Vancouver, ISO, and other styles
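
The report above separates seizure from non-seizure windows with linear discriminant analysis (LDA) and reports performance as the area under the ROC curve. The snippet below is a minimal sketch of that evaluation pattern with scikit-learn; the feature matrix is random stand-in data, not the study's sensor-derived features.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in data: one row per time window; the columns would be features derived
# from ECG, electrodermal activity, EMG, accelerometry, and audio in the real study.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))
y = (rng.random(1000) < 0.1).astype(int)  # 1 = seizure window (rare), 0 = non-seizure

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
scores = lda.decision_function(X_te)      # projection onto the discriminant axis
print("ROC AUC:", roc_auc_score(y_te, scores))
```
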