Journal articles on the topic 'Signal processing; Voice recognition'

Consult the top 50 journal articles for your research on the topic 'Signal processing; Voice recognition.'

1

Hu, J., C. C. Cheng, and W. H. Liu. "Processing of speech signals using a microphone array for intelligent robots." Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering 219, no. 2 (March 1, 2005): 133–43. http://dx.doi.org/10.1243/095965105x9461.

Abstract:
For intelligent robots to interact with people, an efficient human-robot communication interface is very important (e.g. voice command). However, recognizing voice command or speech represents only part of speech communication. The physics of speech signals includes other information, such as speaker direction. Secondly, a basic element of processing the speech signal is recognition at the acoustic level. However, the performance of recognition depends greatly on the reception. In a noisy environment, the success rate can be very poor. As a result, prior to speech recognition, it is important to process the speech signals to extract the needed content while rejecting others (such as background noise). This paper presents a speech purification system for robots to improve the signal-to-noise ratio of reception and an algorithm with a multidirection calibration beamformer.
2

Uzdy, Z. "Human speaker recognition performance of LPC voice processors." IEEE Transactions on Acoustics, Speech, and Signal Processing 33, no. 3 (June 1985): 752–53. http://dx.doi.org/10.1109/tassp.1985.1164606.

3

Tasbolatov, M., N. Mekebayev, O. Mamyrbayev, M. Turdalyuly, and D. Oralbekova. "Algorithms and architectures of speech recognition systems." Psychology and Education Journal 58, no. 2 (February 20, 2021): 6497–501. http://dx.doi.org/10.17762/pae.v58i2.3182.

Abstract:
Digital processing of the speech signal and the voice recognition algorithm are very important for fast and accurate automatic speech recognition. A voice signal carries a wealth of information, and direct analysis and synthesis of the complex speech signal are difficult because that information is embedded in the signal itself. Speech is the most natural way for people to communicate. The task of speech recognition is to convert speech into a sequence of words using a computer program. This article presents an algorithm for extracting MFCC features for speech recognition, implemented in Matlab; the MFCC algorithm reduces the required processing power by 53% compared to the conventional algorithm.
4

Furui, Sadaoki. "Recent Advances in Voice Signal Processing. Application Technologies. Speaker Recognition." Journal of the Institute of Television Engineers of Japan 47, no. 12 (1993): 1600–1603. http://dx.doi.org/10.3169/itej1978.47.1600.

5

Mahalakshmi, P. "A REVIEW ON VOICE ACTIVITY DETECTION AND MEL-FREQUENCY CEPSTRAL COEFFICIENTS FOR SPEAKER RECOGNITION (TREND ANALYSIS)." Asian Journal of Pharmaceutical and Clinical Research 9, no. 9 (December 1, 2016): 360. http://dx.doi.org/10.22159/ajpcr.2016.v9s3.14352.

Abstract:
Objective: The objective of this review article is to give a complete review of various techniques that have been used for speech recognition purposes over two decades. Methods: VAD (Voice Activity Detection) and SAD (Speech Activity Detection) techniques, which are used to distinguish voiced from unvoiced signals, are discussed, along with the MFCC (Mel Frequency Cepstral Coefficient) technique, which detects specific features. Results: The review shows that research on MFCC has been dominant in signal processing in comparison to VAD and other existing techniques. Conclusion: Speaker recognition techniques used previously and those in current research are compared, and a clear idea of the better technique is identified through a review of the literature spanning over two decades. Keywords: Cepstral analysis, Mel-frequency cepstral coefficients, signal processing, speaker recognition, voice activity detection.
6

Mühl, Constanze, and Patricia EG Bestelmeyer. "Assessing susceptibility to distraction along the vocal processing hierarchy." Quarterly Journal of Experimental Psychology 72, no. 7 (October 31, 2018): 1657–66. http://dx.doi.org/10.1177/1747021818807183.

Abstract:
Recent models of voice perception propose a hierarchy of steps leading from a more general, “low-level” acoustic analysis of the voice signal to a voice-specific, “higher-level” analysis. We aimed to engage two of these stages: first, a more general detection task in which voices had to be identified amid environmental sounds, and, second, a more voice-specific task requiring a same/different decision about unfamiliar speaker pairs (Bangor Voice Matching Test [BVMT]). We explored how vulnerable voice recognition is to interfering distractor voices, and whether performance on the aforementioned tasks could predict resistance against such interference. In addition, we manipulated the similarity of distractor voices to explore the impact of distractor similarity on recognition accuracy. We found moderate correlations between voice detection ability and resistance to distraction (r = .44), and between BVMT and resistance to distraction (r = .57). A hierarchical regression revealed both tasks as significant predictors of the ability to tolerate distractors (R² = .36). The first stage of the regression (BVMT as sole predictor) already explained 32% of the variance. Descriptively, the “higher-level” BVMT was a better predictor (β = .47) than the more general detection task (β = .25), although further analysis revealed no significant difference between both beta weights. Furthermore, distractor similarity did not affect performance on the distractor task. Overall, our findings suggest the possibility to target specific stages of the voice perception process. This could help explore different stages of voice perception and their contributions to specific auditory abilities, possibly also in forensic and clinical settings.
7

Djara, Tahirou, Abdoul Matine Ousmane, and Antoine Vianou. "Emotional State Recognition Using Facial Expression, Voice, and Physiological Signal." International Journal of Robotics Applications and Technologies 6, no. 1 (January 2018): 1–20. http://dx.doi.org/10.4018/ijrat.2018010101.

Abstract:
Emotion recognition is an important aspect of affective computing, one of whose aims is the study and development of behavioral and emotional interaction between human and machine. In this context, another important point concerns acquisition devices and signal processing tools which lead to an estimation of the emotional state of the user. This article presents a survey of concepts around emotion, multimodality in recognition, physiological activities and emotional induction, and methods and tools for acquisition and signal processing, with a focus on processing algorithms and their degree of reliability.
8

Ramadevi, P. "A Novel User Interface for Text Dependent Human Voice Recognition System." International Journal of Engineering & Technology 7, no. 4.6 (September 25, 2018): 285. http://dx.doi.org/10.14419/ijet.v7i4.6.20714.

Abstract:
In an effort to provide a more efficient representation of the speech signal, the application of wavelet analysis is considered. This research presents an effective and robust method for extracting features for speech processing. Here, we propose a novel user interface for a Text Dependent Human Voice Recognition (TD-HVR) system. The proposed HVR model utilizes a decimated bi-orthogonal wavelet transform (DBT) approach to extract the low-level features from the given input voice signal; noise elimination is then done by band-pass filtering followed by normalization for better voice signal quality, and finally the formants of the training and test voices are calculated using the Additive Prognostication (AP) algorithm. Simulation results have been compared with existing HVR schemes and show that the proposed user interface system performs better than conventional HVR systems, with an accuracy rate of approximately 99%.
9

Ramadevi, P. "A Novel User Interface for Text Dependent Human Voice Recognition System." International Journal of Engineering & Technology 7, no. 4.6 (September 25, 2018): 258. http://dx.doi.org/10.14419/ijet.v7i4.6.21193.

Abstract:
In an effort to provide a more efficient representation of the speech signal, the application of wavelet analysis is considered. This research presents an effective and robust method for extracting features for speech processing. Here, we propose a novel user interface for a Text Dependent Human Voice Recognition (TD-HVR) system. The proposed HVR model utilizes a decimated bi-orthogonal wavelet transform (DBT) approach to extract the low-level features from the given input voice signal; noise elimination is then done by band-pass filtering followed by normalization for better voice signal quality, and finally the formants of the training and test voices are calculated using the Additive Prognostication (AP) algorithm. Simulation results have been compared with existing HVR schemes and show that the proposed user interface system performs better than conventional HVR systems, with an accuracy rate of approximately 99%.
10

Wei, Yan Ping, and Hai Liu Xiao. "Design of Voice Signal Visualization Acquisition System Based on Sound Card and MATLAB." Applied Mechanics and Materials 716-717 (December 2014): 1272–76. http://dx.doi.org/10.4028/www.scientific.net/amm.716-717.1272.

Abstract:
With the development of computer technology and information technology, voice interaction has become a necessary means of human-computer interaction, and voice signal acquisition and processing are its precondition and foundation. This paper introduces the MATLAB visualization method into a voice signal acquisition system and uses MATLAB programming to drive the sound card directly, which realizes the identification and acquisition of the voice signal and yields a new voice signal visualization acquisition system. In order to optimize the system, the paper introduces a variance analysis algorithm into the design of the visualization system, which optimizes the voice signal recognition model at different level parameters. Finally, the paper presents a numerical simulation of the speech signal acquisition system; through signal acquisition, 2D and 3D visualizations of voice signals are obtained and individual signal characteristics are extracted, which provides a theoretical reference for the design of signal acquisition systems.
11

Sani, Dian Ahkam, and Muchammad Saifulloh. "Speech to Text Processing for Interactive Agent of Virtual Tour Navigation." International Journal of Artificial Intelligence & Robotics (IJAIR) 1, no. 1 (October 31, 2019): 31. http://dx.doi.org/10.25139/ijair.v1i1.2030.

Abstract:
The development of science and technology has provided new ways for humans to interact with computers, one of which is voice input. Conversion of sound into text with the backpropagation method can be realized through feature extraction, including the use of Linear Predictive Coding (LPC). Linear Predictive Coding is one way to represent the signal in obtaining the features of each sound pattern. In brief, this speech recognition system works by inputting a human voice through a microphone (analog signal), which is then sampled at a rate of 8000 Hz so that it becomes a digital signal, with the assistance of the sound card on the computer. The digital signal from the samples then enters the initial processing stage using LPC, from which several LPC coefficients are obtained. The LPC outputs are then trained using the backpropagation learning method, and the results of the learning are classified as words and stored in a database. The result of the test is a recognition program that can display the voice plots; the speech recognition rate over the 100 test items in the database is 80% in real time.
12

Watanabe, Takao. "Recent Advances in Voice Signal Processing. Fundamental Technologies. Continuous Speech Recognition." Journal of the Institute of Television Engineers of Japan 47, no. 12 (1993): 1583–87. http://dx.doi.org/10.3169/itej1978.47.1583.

13

Yin, Shu Hua. "Design of the Auxiliary Speech Recognition System of Super-Short-Range Reconnaissance Radar." Applied Mechanics and Materials 556-562 (May 2014): 4830–34. http://dx.doi.org/10.4028/www.scientific.net/amm.556-562.4830.

Abstract:
To improve the usability and operability of the hybrid-identification reconnaissance radar for individual use, a voice identification system was designed. Using an SPCE061A audio signal microprocessor as the core, digital signal processing technology was used to obtain the audio segments of Doppler radar signals over an audio cable. A/D acquisition was then conducted to acquire digital signals, and the data obtained were preprocessed and adaptively filtered to eliminate background noise. Moreover, segmented FFT transforms were used to identify the types of the signals. The overall design of radar voice recognition for an individual soldier was thereby fulfilled. Actual measurements showed that the circuit design improved radar resolution and the accuracy of radar identification.
14

HimaBindu, Gottumukkala, Gondi Lakshmeeswari, Giddaluru Lalitha, and Pedalanka P. S. Subhashini. "Recognition Using DNN with Bacterial Foraging Optimization Using MFCC Coefficients." Journal Européen des Systèmes Automatisés 54, no. 2 (April 27, 2021): 283–87. http://dx.doi.org/10.18280/jesa.540210.

Abstract:
Speech is an important mode of communication for people. For a long time, researchers have been working hard to develop conversational machines that communicate with speech technology. Voice recognition is a part of the science of signal processing. Speech recognition is becoming more successful for providing user authentication, and user recognition is becoming more popular nowadays for providing security by authenticating users. With the rising importance of automated information processing and telecommunications, the usefulness of recognizing an individual from the features of the user's voice is increasing. In this paper, the three stages of speech recognition processing are defined as pre-processing, feature extraction and decoding. Speech comprehension has also been significantly enhanced by the use of foreign languages. Automatic Speech Recognition (ASR) aims to translate speech to text. Speaker recognition is the method of recognizing an individual through his or her voice signals. A new speaker initially claims an identity for speaker authentication, and then the stated model is used for identification; the identity claim is accepted when the match is above a predefined threshold. The speech used for these tasks may be either text-dependent or text-independent. The article uses the Bacterial Foraging Optimization Algorithm (BFO) for accurate speech recognition through a Mel Frequency Cepstral Coefficients (MFCC) model using a DNN. Speech recognition efficiency is compared to that of the conventional system.
15

Mispagel, Karen M., and Michael Valente. "Effect of Multichannel Digital Signal Processing on Loudness Comfort, Sentence Recognition, and Sound Quality." Journal of the American Academy of Audiology 17, no. 10 (November 2006): 681–707. http://dx.doi.org/10.3766/jaaa.17.10.2.

Abstract:
This study evaluated the effect of increasing the number of processing channels from 32 to 64 signal processing channels on subjects' loudness comfort and satisfaction, sentence recognition, and the sound quality of their own voice. Ten experienced hearing aid users with mild-to-moderate sensorineural hearing loss wore behind-the-ear (BTE) hearing aids with Adaptive Dynamic Range Optimization (ADRO™) signal processing for a period of six weeks in the 32-channel and 64-channel conditions. Results revealed no significant differences in loudness comfort or satisfaction for the majority of sound samples as measured by the Subjective Loudness Test and Environmental Sounds Questionnaire. No significant differences in sentence recognition between the two processing conditions were found as measured by the Hearing In Noise Test (HINT). Additionally, no subjective differences in the sound quality of subjects' own voice were determined by the Listening Tasks Questionnaire.
16

Nair, Vani, Pooja Pillai, Anupama Subramanian, Sarah Khalife, and Dr Madhu Nashipudimath. "Voice Feature Extraction for Gender and Emotion Recognition." International Journal on Recent and Innovation Trends in Computing and Communication 9, no. 5 (May 31, 2021): 17–22. http://dx.doi.org/10.17762/ijritcc.v9i5.5463.

Abstract:
Voice recognition plays a key role in spoken communication that helps to identify the emotions of a person that reflect in the voice. Gender classification through speech is a widely used Human Computer Interaction (HCI) approach, as it is not easy to identify gender by computer. This led to the development of a model for “Voice feature extraction for Emotion and Gender Recognition”. The speech signal consists of semantic information and speaker information (gender, age, emotional state), accompanied by noise. Females and males have different voice characteristics due to their acoustical and perceptual differences, along with a variety of emotions which convey their own unique perceptions. In order to explore this area, feature extraction requires pre-processing of data, which is necessary for increasing the accuracy. The proposed model follows steps such as data extraction, pre-processing using a Voice Activity Detector (VAD), feature extraction using Mel-Frequency Cepstral Coefficients (MFCC), feature reduction by Principal Component Analysis (PCA) and a Support Vector Machine (SVM) classifier. The proposed combination of techniques produced better results which can be useful in the healthcare sector, virtual assistants, security purposes and other fields related to the Human Machine Interaction domain.
17

Nashipudimath, Madhu M., Pooja Pillai, Anupama Subramanian, Vani Nair, and Sarah Khalife. "Voice Feature Extraction for Gender and Emotion Recognition." ITM Web of Conferences 40 (2021): 03008. http://dx.doi.org/10.1051/itmconf/20214003008.

Abstract:
Voice recognition plays a key function in spoken communication that facilitates identifying the emotions of a person that reflect within the voice. Gender classification through speech is a popular Human Computer Interaction (HCI) method, since determining gender through a computer is hard. This led to the development of a model for "Voice feature extraction for Emotion and Gender Recognition". The speech signal consists of semantic information and speaker information (gender, age, emotional state), accompanied by noise. Females and males have specific vocal traits because of their acoustical and perceptual variations, along with a variety of emotions which bring their own specific perceptions. In order to explore this area, feature extraction requires pre-processing of data, which is necessary for increasing the accuracy. The proposed model follows steps such as data extraction, pre-processing using a Voice Activity Detector (VAD), feature extraction using Mel-Frequency Cepstral Coefficients (MFCC), feature reduction by Principal Component Analysis (PCA) and a Support Vector Machine (SVM) classifier. The proposed combination of techniques produced better results which can be useful in the healthcare sector, virtual assistants, security purposes and other fields related to the Human Machine Interaction domain.
18

Mohd Hanifa, Rafizah, Khalid Isa, Shamsul Mohamad, Shaharil Mohd Shah, Shelena Soosay Nathan, Rosni Ramle, and Mazniha Berahim. "Voiced and unvoiced separation in Malay speech using zero crossing rate and energy." Indonesian Journal of Electrical Engineering and Computer Science 16, no. 2 (November 1, 2019): 775. http://dx.doi.org/10.11591/ijeecs.v16.i2.pp775-780.

Abstract:
This paper contributes to the literature on voice recognition in the context of a non-English language. Specifically, it aims to validate the techniques used to present the basic characteristics of speech, viz. voiced and unvoiced, that need to be evaluated when analysing speech signals. Zero Crossing Rate (ZCR) and Short Time Energy (STE) are used in this paper to perform signal pre-processing of continuous Malay speech to separate the voiced and unvoiced parts. The study is based on non-real-time data developed from a collection of audio speeches. The signal is assessed using ZCR and STE for comparison purposes. The results revealed that ZCR is low for the voiced part and high for the unvoiced part, whereas STE is high for the voiced part and low for the unvoiced part. Thus, these two techniques can be used effectively for separating voiced and unvoiced segments in continuous Malay speech.
19

Shi, Li Juan, Ping Feng, Jian Zhao, Li Rong Wang, and Na Che. "Study on Dual Mode Fusion Method of Video and Audio." Applied Mechanics and Materials 734 (February 2015): 412–15. http://dx.doi.org/10.4028/www.scientific.net/amm.734.412.

Abstract:
Hearing-impaired students in class rely only on sign language and therefore receive less classroom information. To address this, this paper studies a video and audio dual-mode fusion algorithm combining lip reading, speech recognition and information fusion technology. First, speech features are extracted and the speech signal is processed so that the speech is output synchronously as text. At the same time, video features are extracted, and the voice and video signals are fused, turning voice information into visual information that hearing-impaired students can receive. The students thus receive text messages as visual information, which improves the speech recognition rate and meets the needs of classroom teaching for hearing-impaired students.
20

Hekiert, Daniela, and Magdalena Igras-Cybulska. "Capturing emotions in voice: A comparative analysis of methodologies in psychology and digital signal processing." Roczniki Psychologiczne 22, no. 1 (November 19, 2019): 15–34. http://dx.doi.org/10.18290/rpsych.2019.22.1-2.

Abstract:
People use their voices to communicate not only verbally but also emotionally. This article presents theories and methodologies that concern emotional vocalizations at the intersection of psychology and digital signal processing. Specifically, it demonstrates the encoding (production) and decoding (recognition) of emotional sounds, including the review and comparison of strategies in database design, parameterization, and classification. Whereas psychology predominantly focuses on the subjective recognition of emotional vocalizations, digital signal processing relies on automated and thus more objective vocal affect measures. The article aims to compare these two approaches and suggest methods of combining them to achieve a more complex insight into the vocal communication of emotions.
21

Xu, Yang, Zhe Zhang, and Zhi Yu Huang. "Vehicle Embedded Speech Recognition and Control System Research and Implementation." Applied Mechanics and Materials 494-495 (February 2014): 104–7. http://dx.doi.org/10.4028/www.scientific.net/amm.494-495.104.

Abstract:
Because it is inconvenient for a driver to manually operate vehicle electronics while driving, and because of related issues such as the monopoly of foreign technology, a vehicle speech recognition and control system based on a DSP + MCU framework is designed. According to the embedded application environment, a suitable recognition algorithm and the DSP + MCU hardware architecture are chosen, in which the DSP is mainly responsible for voice signal processing, while the MCU communicates with the DSP to obtain the recognition results after speech signal processing and uses them as the final control instructions. The experimental results show that the hardware platform runs normally and can control the car body lights on an experimental bench.
22

Kang, Sang-Ick, and Sangmin Lee. "Improvement of Speech/Music Classification for 3GPP EVS Based on LSTM." Symmetry 10, no. 11 (November 7, 2018): 605. http://dx.doi.org/10.3390/sym10110605.

Abstract:
Competition in speech recognition technology related to smartphones is now getting into full swing with the widespread adoption of internet of things (IoT) devices. For robust speech recognition, it is necessary to detect speech signals in various acoustic environments. Speech/music classification, which facilitates optimized signal processing based on the classification results, has been extensively adopted as an essential part of various electronics applications, such as multi-rate audio codecs, automatic speech recognition, and multimedia document indexing. In this paper, we propose a new technique to improve the robustness of the speech/music classifier for the enhanced voice service (EVS) codec, adopted as a voice-over-LTE (VoLTE) speech codec, using long short-term memory (LSTM). For effective speech/music classification, feature vectors implemented with the LSTM are chosen from the features of the EVS. To overcome the diversity of music data, a large-scale dataset is used for learning. Experiments show that LSTM-based speech/music classification provides better results than the conventional EVS speech/music classification algorithm in various conditions and types of speech/music data, especially at lower signal-to-noise ratios (SNR) than the conventional EVS algorithm.
23

Czap, Laszlo, and Judit Pinter. "Noise Reduction in Voice Controlled Logistic Systems." Applied Mechanics and Materials 309 (February 2013): 260–67. http://dx.doi.org/10.4028/www.scientific.net/amm.309.260.

Abstract:
The most comfortable way of human communication is speech, which is also a possible channel for a human-machine interface; moreover, a voice-driven system can be controlled with busy hands. The performance of a speech recognition system is strongly degraded by the presence of noise. Logistic systems typically work in noisy environments, so noise reduction is crucial in industrial speech processing systems. Traditional noise reduction procedures (e.g. Wiener and Kalman filters) are effective on stationary or Gaussian noise. The noise of a real workplace can be captured by an additional microphone: the voice microphone picks up both speech and noise, while the noise microphone picks up only the noise signal. Because of the phase shift between the two signals, simple subtraction in the time domain is ineffective. In this paper, we discuss a spectral representation modeling the noise and voice signals. A frequency spectrum based noise cancellation method is proposed and verified in a real industrial environment.
24

de Abreu, Caio Cesar Enside, Marco Aparecido Queiroz Duarte, Bruno Rodrigues de Oliveira, Jozue Vieira Filho, and Francisco Villarreal. "Regression-Based Noise Modeling for Speech Signal Processing." Fluctuation and Noise Letters 20, no. 03 (January 30, 2021): 2150022. http://dx.doi.org/10.1142/s021947752150022x.

Abstract:
Speech processing systems are very important in different applications involving speech and voice quality, such as automatic speech recognition, forensic phonetics and speech enhancement, among others. In most of them, acoustic environmental noise is added to the original signal, decreasing the signal-to-noise ratio (SNR) and, by consequence, the speech quality. Therefore, estimating noise is one of the most important steps in speech processing, whether to reduce it before processing or to design robust algorithms. In this paper, a new approach to estimate noise from speech signals is presented and its effectiveness is tested in the speech enhancement context. For this purpose, partial least squares (PLS) regression is used to model the acoustic environment (AE), and a Wiener filter based on a priori SNR estimation is implemented to evaluate the proposed approach. Six noise types are used to create seven acoustically modeled noises. The basic idea is to use the AE model to identify the noise type and estimate its power, to be used in a speech processing system. Speech signals processed using the proposed method and classical noise estimators are evaluated through objective measures. Results show that the proposed method produces better speech quality than state-of-the-art noise estimators, enabling it to be used in real-time applications in the fields of robotics, telecommunications and acoustic analysis.
25

Xue, Lei, Zhi Zhang, Xiaoyang Zhang, and Yiwen Zhang. "Research and Implementation of Children’s Speech Signal Processing System." Open Biomedical Engineering Journal 9, no. 1 (August 31, 2015): 188–93. http://dx.doi.org/10.2174/1874120701509010188.

Abstract:
As people's living standards and the degree of mass culture have constantly improved, many families care more about healthy growth in early childhood. According to the research of domestic and foreign experts and scholars, appropriate intervention by guardians (such as parents) at an early stage can effectively promote the development of children's language and cognitive ability, and such intervention has an obvious effect on autism spectrum disorders in children. This paper presents a system for analyzing children's speech signals that uses voice signal processing and pattern recognition technology to count the guardian's spoken words, the number of the child's spoken words, and the number of guardian-child dialogue turns. Related personnel can use these indicators to analyze the development of children's language and cognitive ability and then adopt appropriate measures, providing a basis for decision-making criteria and promoting the development of children's language and cognitive status.
26

Manoharan, Samuel, and Narain Ponraj. "Analysis of Complex Non-Linear Environment Exploration in Speech Recognition by Hybrid Learning Technique." December 2020 2, no. 4 (February 19, 2021): 202–9. http://dx.doi.org/10.36548//jiip.2020.4.005.

Abstract:
Recently, voice-controlled interfaces have come to play a major role in many real-time environments such as cars, smart homes and mobile phones. In signal processing, the accuracy of speech recognition remains a thought-provoking challenge. Filter designs assist speech recognition systems in improving accuracy through parameter tuning, but narrowed filter specifications lead to complex nonlinear problems in speech recognition. This research analyzes and explores this complex nonlinear environment with recent techniques combining statistical design and Support Vector Machine (SVM) based learning. The dynamic Bayes network, a dominant technique in speech processing for characterizing stack co-occurrences, is derived from mathematical and statistical formalism; it is used to predict word sequences, together with the posterior probability method, with the help of phonetic word-unit recognition. The study combines sentences with various types of noise at different signal-to-noise ratios (SNR) and compares the two techniques.
27

Manoharan, Samuel, and Narain Ponraj. "Analysis of Complex Non-Linear Environment Exploration in Speech Recognition by Hybrid Learning Technique." December 2020 2, no. 4 (February 19, 2021): 202–9. http://dx.doi.org/10.36548/jiip.2020.4.005.

Abstract:
Recently, voice-controlled interfaces have come to play a major role in many real-time environments such as cars, smart homes and mobile phones. In signal processing, the accuracy of speech recognition remains a thought-provoking challenge. Filter designs assist speech recognition systems in improving accuracy through parameter tuning, but narrowed filter specifications lead to complex nonlinear problems in speech recognition. This research analyzes and explores this complex nonlinear environment with recent techniques combining statistical design and Support Vector Machine (SVM) based learning. The dynamic Bayes network, a dominant technique in speech processing for characterizing stack co-occurrences, is derived from mathematical and statistical formalism; it is used to predict word sequences, together with the posterior probability method, with the help of phonetic word-unit recognition. The study combines sentences with various types of noise at different signal-to-noise ratios (SNR) and compares the two techniques.
28

Benítez-Guijarro, Callejas, Noguera, and Benghazi. "Coordination of Speech Recognition Devices in Intelligent Environments with Multiple Responsive Devices." Proceedings 31, no. 1 (November 20, 2019): 54. http://dx.doi.org/10.3390/proceedings2019031054.

Abstract:
Devices with oral interfaces are enabling new interesting interaction scenarios and ways of interaction in ambient intelligence settings. The use of several of such devices in the same environment opens up the possibility to compare the inputs gathered from each one of them and perform a more accurate recognition and processing of user speech. However, the combination of multiple devices presents coordination challenges, as the processing of one voice signal by different speech processing units may result in conflicting outputs and it is necessary to decide which is the most reliable source. This paper presents an approach to rank several sources of spoken input in multi-device environments in order to give preference to the input with the highest estimated quality. The voice signals received by the multiple devices are assessed in terms of their calculated acoustic quality and the reliability of the speech recognition hypotheses produced. After this assessment, each input is assigned a unique score that allows the audio sources to be ranked so as to pick the best to be processed by the system. In order to validate this approach, we have performed an evaluation using a corpus of 4608 audios recorded in a two-room intelligent environment with 24 microphones. The experimental results show that our ranking approach makes it possible to successfully orchestrate an increasing number of acoustic inputs, obtaining better recognition rates than considering a single input, both in clear and noisy settings.
29

Ganesh, Venkateshwaran, and C. Sujatha. "Ingenious Traffic Control System with Green Signal Timings Using Image Processing." Advanced Science, Engineering and Medicine 12, no. 3 (March 1, 2020): 337–41. http://dx.doi.org/10.1166/asem.2020.2502.

Abstract:
In metropolises, traffic congestion affects the daily routine of passengers, and in the long run there will be a decline in productivity if the situation is left unaddressed. If an ambulance is unfortunately stuck in the middle of a congested road, any delay can endanger the life of the patient, and such cases require an intelligent, powerful and reliable traffic control system. In this paper, Infra-Red (IR) sensors keep track of the vehicle density across the lanes, and the micro-controller in turn generates the control signals to alter the traffic accordingly. During each transition phase, the Voice Recognition (VR) modules installed on the lanes sense an emergency siren and temporarily allow passage by turning the signal green for the corresponding lane while the others remain red. Using image processing analysis, the exact count of vehicles can be visualized in the Graphical User Interface (GUI) tool and the green light timings for the consecutive turns can be estimated.
30

Wheatley, Barbara, and Joseph Picone. "Voice across America: Toward robust speaker-independent speech recognition for telecommunications applications." Digital Signal Processing 1, no. 2 (April 1991): 45–63. http://dx.doi.org/10.1016/1051-2004(91)90095-3.

31

Navakauskas, Dalius, and Šarūnas Paulikas. "Autonomous Robot in an Adverse Environment: Intelligent Control by Voice." Solid State Phenomena 113 (June 2006): 325–28. http://dx.doi.org/10.4028/www.scientific.net/ssp.113.325.

Abstract:
In this paper we investigate the voice control of an autonomous robot in the presence of impulsive noise. We propose an original structure for the intelligent voice control system, present an experimental investigation of its separate modules, and outline the performance of the system through a simulation example. Our approach differs from others in two ways: the noise detection is carried out by a specialized artificial neural network, and the restoration of the missing speech signal is performed using an intelligent multirate-processing scheme. The simplicity of the neural network's employment and the fact that no a priori knowledge of noise characteristics is required are the main merits of this approach. The employment of nonlinear re-sampling improved the precision of speech signal restoration and consequently increased the overall recognition performance.
32

Gao, Mei Juan, and Zhi Xin Yang. "Research and Realization on the Voice Command Recognition System for Robot Control Based on ARM9." Applied Mechanics and Materials 44-47 (December 2010): 1422–26. http://dx.doi.org/10.4028/www.scientific.net/amm.44-47.1422.

Abstract:
In this paper, based on the study of two speech recognition algorithms, two designs of a speech recognition system are given to realize an isolated-word speech recognition mobile robot control system based on an ARM9 processor. The speech recognition process includes pretreatment of the speech signal, feature extraction, pattern matching and post-processing. Mel-frequency cepstrum coefficients (MFCC) and linear prediction cepstrum coefficients (LPCC) are the two most common parameters. Analysis and comparison of the parameters show that MFCC has more noise immunity than LPCC, so MFCC is selected as the characteristic parameter. Both dynamic time warping (DTW) and hidden Markov models (HMM) are commonly used algorithms. Given the different characteristics of the DTW and HMM recognition algorithms, two different programs were designed for the mobile robot control system, and the effect and speed of the two speech recognition systems were analyzed and compared.
33

Lutsenko, K., and K. Nikulin. "VOICE SPEAKER IDENTIFICATION AS ONE OF THE CURRENT BIOMETRIC METHODS OF IDENTIFICATION OF A PERSON." Theory and Practice of Forensic Science and Criminalistics 19, no. 1 (April 2, 2020): 239–55. http://dx.doi.org/10.32353/khrife.1.2019.18.

Abstract:
The article deals with the most widespread biometric identification systems for individuals, including voice recognition of the speaker on video and sound recordings. The urgency of the topic of personal identification is due to the active informatization of modern society and the increase in flows of confidential information. The branches of use of biometric technologies and their general characteristics are given, along with an overview of the identification groups that characterize the voice and the division of voice identification systems into the corresponding classes. The main advantages of voice biometrics are considered, such as the simplicity of system realization; low cost (the lowest among all biometric methods); and no need for contact, since voice biometrics allows for long-range verification, unlike other biometric technologies. Existing methods of speech recognition for identifying a person by a combination of unique voice characteristics were analyzed and their weak and strong points determined, on the basis of which the most appropriate method for solving the problem of text-independent recognition, namely the Gaussian mixture model, was chosen. A prerequisite for the development of speech technologies is a significant increase in computing capabilities and memory capacity together with a significant reduction in the size of computer systems. The development of mathematical methods that make it possible to perform the necessary processing of an audio signal by isolating informative features from it should also be noted. It has been established that the development of information technologies and the set of practical applications which use voice recognition technologies make this area relevant for further theoretical and practical research.
34

Ma, Lina, and Yanjie Lei. "Optimization of Computer Aided English Pronunciation Teaching System Based on Speech Signal Processing Technology." Computer-Aided Design and Applications 18, S3 (October 20, 2020): 129–40. http://dx.doi.org/10.14733/cadaps.2021.s3.129-140.

Abstract:
As speech signal processing technology has matured, various language learning tools have begun to emerge. Speech signal processing technology has many functions, such as standard tape reading, making audio teaching aids, synthesizing speech, and performing speech evaluation. Therefore, the adoption of speech signal processing technology in English pronunciation teaching can meet different teaching needs. Voice signal processing technology can present teaching information in different forms and promote multi-form communication between teachers and students and among students, which helps stimulate students' interest in learning English and improve the overall teaching level of English pronunciation. This research first investigates the current level of English pronunciation mastery. After combining the relevant principles of speech signal processing technology, it identifies the areas that need to be optimized in the design of an English pronunciation teaching system. Through demand analysis and function analysis of the system, this research uses speech signal processing technology to extract the characteristics of the speech signal, the Mel Frequency Cepstrum Coefficients (MFCC), and optimizes the system's speech signal preprocessing, speech signal feature extraction and dynamic time warping (DTW) recognition algorithms. At the same time, this research combines multimedia teaching resources such as text, pronunciation video and excellent courses to study the realization process of each function of the system.
35

ESPOSITO, ANNA, VOJTĚCH STEJSKAL, and ZDENĚK SMÉKAL. "COGNITIVE ROLE OF SPEECH PAUSES AND ALGORITHMIC CONSIDERATIONS FOR THEIR PROCESSING." International Journal of Pattern Recognition and Artificial Intelligence 22, no. 05 (August 2008): 1073–88. http://dx.doi.org/10.1142/s0218001408006508.

Abstract:
This study investigates pausing strategies, focusing the attention on empty speech pauses. A cross-modal analysis (video and audio) of spontaneous narratives produced by male and female children and adults showed that a remarkable amount of empty speech pauses was used to signal new concepts in the speech flow and to segment discourse units such as clauses and paragraphs. Based on these results, an adaptive mathematical model for pause distribution was suggested, that exploits, as pause features, the absence of signal and/or the changes of energy over different acoustic dimensions strongly related to the auditory perception. These considerations inspired the formulation and the implementation of two pause detection procedures that proved to be more effective than the Likelihood Ratio Test (LRT) and Long-Term Spectral Divergence (LTSD) algorithms recently proposed in literature and applied for Voice Activity Detection (VAD).
36

Yang, Hai, Yunfei Xu, Houjun Huang, Ruohua Zhou, and Yonghong Yan. "Voice biometrics using linear Gaussian model." IET Biometrics 3, no. 1 (March 2014): 9–15. http://dx.doi.org/10.1049/iet-bmt.2013.0027.

37

Ting, Liu, and Luo Xinwei. "An improved voice activity detection method based on spectral features and neural network." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 263, no. 2 (August 1, 2021): 4570–80. http://dx.doi.org/10.3397/in-2021-2747.

Abstract:
The recognition accuracy of speech and noise signals is greatly affected at low signal-to-noise ratios. A neural network with parameters obtained from a training set can achieve good results on existing data but performs poorly on samples with different environmental noises. The proposed method first extracts features based on the physical characteristics of the speech signal, which have good robustness. It takes 3-second segments as samples, judges whether there is a speech component in the data at low signal-to-noise ratios, and assigns a decision tag to each segment. If a reasonable trajectory resembling that of speech is found, the 3-second segment is judged to contain speech. Dynamic double-threshold processing is then used for preliminary detection, after which a global double threshold is obtained by K-means clustering. Finally, the detection results are obtained by sequential decision. The method has low complexity, strong robustness, and adaptability to multiple national languages. The experimental results show that the performance of the method is better than that of traditional methods at various signal-to-noise ratios, and it adapts well to multiple languages.
38

Muhaxov, Harhenbek, Zhong Lou, Wang Li, Tolewbek Samet, and Aheyeh Harhenbek. "Experimental Research on Signal Recognition Algorithm of Wireless Sensor Language." International Journal of Online Engineering (iJOE) 12, no. 10 (October 31, 2016): 38. http://dx.doi.org/10.3991/ijoe.v12i10.6203.

Abstract:
In the past several decades, much research has been carried out on the wireless sensor network, which is widely used in the fields of national defense and national economy within China. The main function of the language sensor is to transfer the voice signal into an electrical signal so as to facilitate subsequent analysis and processing. Combined with the wireless network signal, it is widely used in banks, shopping malls, examination rooms, prisons, important military locations and other places. This paper makes a detailed introduction of some theoretical knowledge of language recognition and puts forward a recognition algorithm in which the language signal is abstracted by language features. Afterwards, Matlab is used for in-depth simulation research and classification based on Support Vector Machine. Finally, a large number of samples are collected for the experiment so as to research the effect of weighted feature values and the structure of classification on the speech recognition rate. The conclusions of the paper provide a basis for further theoretical study.
39

Badr, Ameer, and Alia Abdul-Hassan. "A Review on Voice-based Interface for Human-Robot Interaction." Iraqi Journal for Electrical and Electronic Engineering 16, no. 2 (November 12, 2020): 1–12. http://dx.doi.org/10.37917/ijeee.16.2.10.

Abstract:
With the recent developments of technology and the advances in artificial intelligence and machine learning techniques, it has become possible for the robot to understand and respond to voice as part of Human-Robot Interaction (HRI). The voice-based interface robot can recognize the speech information from humans so that it will be able to interact more naturally with its human counterpart in different environments. In this work, a review of the voice-based interface for HRI systems has been presented. The review focuses on voice-based perception in HRI systems from three facets, which are: feature extraction, dimensionality reduction, and semantic understanding. For feature extraction, numerous types of features have been reviewed in various domains, such as time, frequency, cepstral (i.e. implementing the inverse Fourier transform for the signal spectrum logarithm), and deep domains. For dimensionality reduction, subspace learning can be used to eliminate the redundancies of high-dimensional features by further processing extracted features to reflect their semantic information better. For semantic understanding, the aim is to infer from the extracted features the objects or human behaviors. Numerous types of semantic understanding have been reviewed, such as speech recognition, speaker recognition, speaker gender detection, speaker gender and age estimation, and speaker localization. Finally, some of the existing voice-based interface issues and recommendations for future works have been outlined.
40

Cheng, Xie Feng, Ye Wei Tao, and Zheng Jiang Huang. "Heart Sound Recognition - A Prospective Candidate for Biometric Identification." Advanced Materials Research 225-226 (April 2011): 433–36. http://dx.doi.org/10.4028/www.scientific.net/amr.225-226.433.

Abstract:
Based on the principles of human heart auscultation and the associated signal processing technology, we designed and manufactured a double-header two-way voice auscultation detection device. The paper introduces a human feature extraction method based on an improved circle convolution (ICC) slicing algorithm combined with independent sub-band functions (ISF). We then adopt a new classification technique, the s1 and s2 model, which obtains different people's heart sound features through two recognition steps to assure validity, and use similarity distance to carry out heart sound pattern matching. The method was verified using 10 recorded heart sounds. The results show that the identification accuracy is 85.7% in the two-step mode, the false acceptance rate is less than 7%, and the false rejection rate is less than 10% for normal people.
41

Dejonckere, P. H., A. Giordano, J. Schoentgen, S. Fraj, L. Bocchi, and C. Manfredi. "To what degree of voice perturbation are jitter measurements valid? A novel approach with synthesized vowels and visuo-perceptual pattern recognition." Biomedical Signal Processing and Control 7, no. 1 (January 2012): 37–42. http://dx.doi.org/10.1016/j.bspc.2011.05.002.

42

Wu, Jian-Tong, Shinichi Tamura, Hiroshi Mitsumoto, Hideo Kawai, Kenji Kurosu, and Kozo Okazaki. "Neural network vowel-recognition jointly using voice features and mouth shape image." Pattern Recognition 24, no. 10 (January 1991): 921–27. http://dx.doi.org/10.1016/0031-3203(91)90089-n.

43

Li, Haowei. "A Speaker Recognition System Based on Deep Learning." Journal of Electronic Research and Application 3, no. 6 (December 31, 2019): 1–6. http://dx.doi.org/10.26689/jera.v3i6.1056.

Abstract:
This paper lies in the field of digital signal processing, presenting a speech recognition system that identifies different speakers based on deep learning. The system consists of the following steps. Firstly, we collect voice data from different people. Secondly, the selected data are preprocessed by extracting their Mel Frequency Cepstral Coefficients (MFCC) and divided randomly into a training set and a test set. Thirdly, we cut the training set into batches and put them into a convolutional neural network consisting of convolutional layers, max pooling layers and fully connected layers. After repeatedly adjusting the parameters of the network, such as the learning rate, dropout rate and decay rate, the model reaches its optimal performance. Finally, the test set is also cut into batches and put into the trained neural network. The final recognition accuracy rate is 70.23%. In brief, this research can automatically recognize different speakers efficiently.
44

Sledevič, Tomyslav, and Liudas Stašionis. "FPGA-BASED IMPLEMENTATION OF LITHUANIAN ISOLATED WORD RECOGNITION ALGORITHM / LIETUVIŲ KALBOS PAVIENIŲ ŽODŽIŲ ATPAŽINIMO ALGORITMO ĮGYVENDINIMAS LAUKU PROGRAMUOJAMA LOGINE MATRICA." Mokslas - Lietuvos ateitis 5, no. 2 (May 24, 2013): 101–4. http://dx.doi.org/10.3846/mla.2013.18.

Abstract:
The paper describes an FPGA-based implementation of a Lithuanian isolated word recognition algorithm. An FPGA is selected for parallel process implementation using VHDL to ensure fast signal processing at a low-rate clock signal. Cepstrum analysis was applied to feature extraction in voice. The dynamic time warping algorithm was used to compare the vectors of cepstrum coefficients. A library of features for 100 words was created and stored in the internal FPGA BRAM memory. Experimental testing with speaker-dependent records demonstrated a recognition rate of 94%; a recognition rate of 58% was achieved for speaker-independent records. Calculation of the cepstrum coefficients lasted 8.52 ms at a 50 MHz clock, while 100 DTWs took 66.56 ms at a 25 MHz clock. Article in Lithuanian.
APA, Harvard, Vancouver, ISO, and other styles
45

Li, Jing Jiao, Dong An, Jiao Wang, and Chao Qun Rong. "Speech Endpoint Detection in Noisy Environment Based on the Ensemble Empirical Mode Decomposition." Advanced Engineering Forum 2-3 (December 2011): 135–39. http://dx.doi.org/10.4028/www.scientific.net/aef.2-3.135.

Full text
Abstract:
Speech endpoint detection is one of the key problems in the practical application of speech recognition systems. In this paper, a speech signal contaminated with chirp noise is decomposed into several intrinsic mode functions (IMFs) using ensemble empirical mode decomposition (EEMD), which eliminates the mode-mixing phenomenon that usually arises when speech signals are processed with plain empirical mode decomposition (EMD). The IMFs carrying most of the noise are then selected by an adaptive algorithm. Finally, these IMFs and the contaminated speech signal are fed into independent component analysis (ICA), and the pure voice signal is separated out. In this way the accuracy of speech endpoint detection can be improved. The results show that the proposed endpoint detection method is effective and strongly noise-robust, making it especially suitable for speech endpoint detection at low SNR.
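
A rough software sketch of the EEMD-then-ICA separation described above, assuming the PyEMD and scikit-learn packages. The energy-based IMF selection here is only a crude stand-in for the paper's adaptive selection algorithm.

```python
# Sketch of EEMD decomposition followed by ICA unmixing (assumed details).
import numpy as np
from PyEMD import EEMD
from sklearn.decomposition import FastICA

def separate_speech(noisy, n_keep=3):
    """noisy: 1-D noisy speech signal. Returns ICA source estimates,
    one column of which should approximate the clean speech."""
    imfs = EEMD().eemd(noisy)                  # decompose into IMFs
    # Placeholder rule: treat the highest-energy IMFs as noise carriers.
    energies = np.sum(imfs ** 2, axis=1)
    noise_imfs = imfs[np.argsort(energies)[-n_keep:]]
    # Stack the noisy mixture with the selected IMFs and unmix with ICA.
    X = np.vstack([noisy, noise_imfs]).T       # shape (n_samples, n_keep + 1)
    sources = FastICA(n_components=X.shape[1],
                      random_state=0).fit_transform(X)
    return sources
```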
APA, Harvard, Vancouver, ISO, and other styles
46

Wu, Jian Da, Pang Yi Liu, and Guan Long Hong. "Driver Voice Identification System Using Auto-Correlation Function and Average Magnitude Difference Function." Applied Mechanics and Materials 490-491 (January 2014): 1287–92. http://dx.doi.org/10.4028/www.scientific.net/amm.490-491.1287.

Full text
Abstract:
This study presents a driver identification system using voice analysis for a vehicle security system. The proposed system has three parts: speech pre-processing, feature extraction from the sound signals, and classification of the driver's voice. Initially, a database of sound signals from several drivers was established. The volume and zero-crossing rate (ZCR) of the sound are used to detect voice endpoints and so reduce the amount of computation. The autocorrelation function (ACF) and average magnitude difference function (AMDF) methods are then applied to extract voice pitch features. Finally, these features are used to identify the drivers with a general regression neural network (GRNN). The experimental results show that the voice identification system achieves a good recognition rate with only a few pitch feature vectors.
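
The two pitch estimators named in the abstract, ACF and AMDF, are simple enough to sketch directly. The frame length requirement, search range, and 16 kHz sampling rate below are assumptions, not values from the paper.

```python
# Illustrative NumPy versions of the ACF and AMDF pitch estimators.
# The input frame must be longer than sr // fmin samples (~267 at 16 kHz).
import numpy as np

def pitch_acf(frame, sr=16000, fmin=60, fmax=400):
    """Pitch via the autocorrelation function: the lag of the ACF peak."""
    lags = np.arange(sr // fmax, sr // fmin)
    acf = np.array([np.sum(frame[:-lag] * frame[lag:]) for lag in lags])
    return sr / lags[np.argmax(acf)]

def pitch_amdf(frame, sr=16000, fmin=60, fmax=400):
    """Pitch via the average magnitude difference function: the lag of
    the AMDF valley."""
    lags = np.arange(sr // fmax, sr // fmin)
    amdf = np.array([np.mean(np.abs(frame[:-lag] - frame[lag:]))
                     for lag in lags])
    return sr / lags[np.argmin(amdf)]
```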
APA, Harvard, Vancouver, ISO, and other styles
47

Laptev, O., V. Sobchuk, and V. Savchenko. "A METHOD OF INCREASING THE IMMUNITY OF A SYSTEM FOR DETECTING, RECOGNIZING AND LOCALIZING DIGITAL SIGNALS IN THE INFORMATION SYSTEMS." Collection of scientific works of the Military Institute of Kyiv National Taras Shevchenko University, no. 66 (2019): 90–104. http://dx.doi.org/10.17721/2519-481x/2020/66-09.

Full text
Abstract:
In the process of detecting, recognizing, and localizing individual means of silent information retrieval (covert listening devices) in information systems, increasing noise immunity is an urgent issue. The article explores the use of low-pass filters whose response depends on the input signal quadratically or linearly. The principle of operation of these filters is summation: the useful signal is summed coherently while the interference is summed incoherently, so the useful signal grows and the interference decreases. When a rectangular pulse simulating the signal of a modern digital covert device is applied to the inputs of the linear and quadratic filters, the parameters needed for further use are obtained: mathematical expectation, correlation coefficient, variance, root mean square, and the signal-to-interference ratio in the time and spectral domains. A payoff ratio is determined that shows the efficiency of using the low-pass filters. Graphs are given of the envelope voltage at the output of an ideal bandpass filter when rectangular pulses of different durations, representing the signals of covert devices, act on its input. The filtering process was simulated at different correlation coefficients, which confirmed that the signal of a covert device can be isolated by determining the two-dimensional probability density of the interference signal against the background of the overall signal. Considering the noise immunity of the system as a whole, it is shown that a 23% increase in the noise immunity of the detection, recognition, and localization system is achieved by using narrow-band low-pass filters during signal processing.
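
The coherent-versus-incoherent summation argument can be illustrated with a toy low-pass filtering experiment: a slowly varying rectangular pulse survives a narrow-band filter while wide-band noise is attenuated. The Butterworth design, cutoff, and sample rate below are arbitrary assumptions, not the paper's filters.

```python
# Toy demonstration of SNR gain from narrow-band low-pass filtering.
import numpy as np
from scipy.signal import butter, filtfilt

sr = 10_000                                    # assumed sample rate, Hz
t = np.arange(sr) / sr
pulse = ((t > 0.3) & (t < 0.7)).astype(float)  # rectangular test pulse
noisy = pulse + 0.8 * np.random.randn(t.size)  # add wide-band noise

b, a = butter(4, 50 / (sr / 2))                # 4th-order low-pass, 50 Hz cutoff
filtered = filtfilt(b, a, noisy)               # zero-phase filtering

def snr_db(clean, observed):
    return 10 * np.log10(np.sum(clean**2) / np.sum((observed - clean)**2))

print(f"SNR before: {snr_db(pulse, noisy):.1f} dB, "
      f"after: {snr_db(pulse, filtered):.1f} dB")
```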
APA, Harvard, Vancouver, ISO, and other styles
48

Shackleton, Trevor M., Ray Meddis, and Michael J. Hewitt. "The Role of Binaural and Fundamental Frequency Difference cues in the Identification of Concurrently Presented Vowels." Quarterly Journal of Experimental Psychology Section A 47, no. 3 (August 1994): 545–63. http://dx.doi.org/10.1080/14640749408401127.

Full text
Abstract:
The relative importance of voice pitch and interaural difference cues in facilitating the recognition of both of two concurrently presented synthetic vowels was measured. The interaural difference cues used were an interaural time difference (400 μsec ITD), two magnitudes of interaural level difference (15 dB and infinite ILD), and a combination of ITD and ILD (400 μsec plus 15 dB). The results are analysed separately for those cases where both vowels are identical and those where they are different. When the two vowels are different, a voice pitch difference of one semitone is found to improve the percentage of correct reports of both vowels by 35.8% on average. However, the use of interaural difference cues results in an improvement of 11.5% on average when there is a voice pitch difference of one semitone, but only a non-significant 0.1% when there is no voice pitch difference. When the two vowels are identical, imposition of either a voice pitch difference or binaural difference reduces performance, in a subtractive manner. It is argued that the smaller size of the interaural difference effect is not due to a “ceiling effect” but is characteristic of the relative importance of the two kinds of cues in this type of experiment. The possibility that the improvement due to interaural difference cues may in fact be due to monaural processing is discussed. A control experiment is reported for the ITD condition, which suggests binaural processing does occur for this condition. However, it is not certain whether the improvement in the ILD condition is due to binaural processing or use of the improvement in signal-to-noise ratio for a single vowel at each ear.
APA, Harvard, Vancouver, ISO, and other styles
49

Beiderman, Yevgeny, Yaniv Azani, Yoni Cohen, Chen Nisankoren, Mina Teicher, Ehud Rivlin, Vicente Mico, Javier Garcia, and Zeev Zalevsky. "Spatial Processing for Improved Quality Recognition of Optically Recorded Voice Signals and Illumination Varied Scenery." Recent Patents on Signal Processing 1, no. 2 (December 21, 2011): 91–100. http://dx.doi.org/10.2174/1877612411101020091.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Beiderman, Yevgeny, Yaniv Azani, Yoni Cohen, Chen Nisankoren, Mina Teicher, Ehud Rivlin, Vicente Mico, Javier Garcia, and Zeev Zalevsky. "Spatial Processing for Improved Quality Recognition of Optically Recorded Voice Signals and Illumination Varied Scenery." Recent Patents on Signal Processing 1, no. 2 (December 1, 2011): 91–100. http://dx.doi.org/10.2174/2210686311101020091.

Full text
APA, Harvard, Vancouver, ISO, and other styles