Dissertations / Theses on the topic 'Microphone arrays'

Listed below are the top 50 dissertations and theses on the topic 'Microphone arrays.'

1

Gillett, Philip Winslow. "Head Mounted Microphone Arrays." Diss., Virginia Tech, 2009. http://hdl.handle.net/10919/28867.

Abstract:
Microphone arrays are becoming increasingly integrated into every facet of life. From sonar to gunshot detection systems to hearing aids, the performance of each system is enhanced when multi-sensor processing is implemented in lieu of single sensor processing. Head mounted microphone arrays have a broad spectrum of uses that follow the rigorous demands of human hearing. From noise cancellation to focused listening, from localization to classification of sound sources, any and all attributes of human hearing may be augmented through the use of microphone arrays and signal processing algorithms. Placing a set of headphones on a human provides several desirable features such as hearing protection, control over the acoustic environment (via headphone speakers), and a means of communication. The shortcoming of headphones is the complete occlusion of the pinnae (the ears), disrupting auditory cues utilized by humans for sound localization. This thesis presents the underlying theory in designing microphone arrays placed on diffracting bodies, specifically the human head. A progression from simple to complex geometries chronicles the effect of diffracting structures on array manifold matrices. Experimental results validate theoretical and computational models showing that arrays mounted on diffracting structures provide better beamforming and localization performance than arrays mounted in the free field. Data independent, statistically optimal, and adaptive beamforming methods are presented to cover a broad range of goals present in array applications. A framework is developed to determine the performance potential of microphone array designs regardless of geometric complexity. Directivity index, white noise gain, and singular value decomposition are all utilized as performance metrics for array comparisons. The biological basis for human hearing is presented as a fundamental attribute of headset array optimization methods. A method for optimizing microphone locations for the purpose of the recreation of HRTFs is presented, allowing transparent hearing (also called natural hearing restoration) to be performed. Results of psychoacoustic testing with a prototype headset array are presented and examined. Subjective testing shows statistically significant improvements over occluded localization when equipped with this new transparent hearing system prototype.
Ph. D.
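The directivity index and white noise gain mentioned above are standard array metrics that can be computed directly from a weight vector and a steering vector. A minimal numpy sketch follows; it assumes a spherically isotropic (diffuse) noise field with free-field coherence between sensors, whereas the thesis obtains the array manifold from models or measurements of the diffracting head, and the function name is illustrative.

    import numpy as np

    def wng_and_di(w, d, mic_pos, freq, c=343.0):
        # White noise gain: array gain against spatially white sensor noise.
        wng = np.abs(np.vdot(w, d))**2 / np.real(np.vdot(w, w))
        # Diffuse-field (spherically isotropic) noise coherence: sinc(k * r_ij).
        k = 2 * np.pi * freq / c
        r = np.linalg.norm(mic_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
        gamma = np.sinc(k * r / np.pi)          # np.sinc(x) = sin(pi*x)/(pi*x)
        di = np.abs(np.vdot(w, d))**2 / np.real(w.conj() @ gamma @ w)
        return 10 * np.log10(wng), 10 * np.log10(di)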
2

Barnes, Hugh. "Speech enhancement using microphone arrays." Thesis, Imperial College London, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.409581.

3

Lustberg, Robert Jack. "Acoustic beamforming using microphone arrays." Thesis, Massachusetts Institute of Technology, 1993. http://hdl.handle.net/1721.1/12338.

Abstract:
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1993.
Includes bibliographical references (leaves 71-72).
by Robert Jack Lustberg.
M.S.
4

Mošner, Ladislav. "Microphone Arrays for Speaker Recognition." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-363803.

Abstract:
This master's thesis addresses the problem of distant speaker recognition. When data are captured by a far-field microphone, the accuracy of standard recognition degrades considerably, so two approaches are proposed to improve the results. The first is the use of a microphone array (a deliberately arranged set of microphones) capable of steering a virtual "beam" toward the speaker's position. The second is adaptation of the system components (the PLDA scoring and the i-vector extractor). Using simulation of room conditions, training and test data were synthesized from the standard NIST 2010 dataset. Both techniques, and their combination, are shown to yield significant improvements. The thesis also investigates joint determination of the speaker's identity and position. While results in a simulated outdoor (echo-free) environment are promising, indoor (reverberant) results are mixed and require further investigation. Finally, a limited amount of real data, obtained by replaying and re-recording utterances in an actual room, was evaluated with the system. While the results for male recordings match the simulation, the results for female recordings are inconclusive and require further analysis.
5

Moore, Darren C. "Speech enhancement using microphone arrays." Thesis, Queensland University of Technology, 2000. https://eprints.qut.edu.au/36141/1/36141_Moore_2000.pdf.

Abstract:
This thesis presents a comparative analysis of baseline microphone array speech enhancement techniques that are prominent in current literature. Delay-sum beamforming, sub-array beamforming, near- and far-field superdirectivity, the generalised sidelobe canceller and the adaptive system for microphone-array noise reduction (AMNOR) are evaluated in varying noise conditions and for different array geometries. The effect of complementing each technique with a postfilter is also assessed. A novel beamformer, termed the near-field adaptive beamformer (NFAB), is introduced and then shown to provide superior enhancement performance over any of the baseline techniques assessed. A description of the design and implementation of a high-speed, multi-channel speech data acquisition system is also presented.
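As a point of reference for the baseline techniques compared above, a delay-and-sum beamformer can be sketched in a few lines by phase-aligning the microphone signals in the STFT domain. This is a minimal illustration assuming far-field propagation and a known look direction; the sign of the phase shift depends on the geometry convention, and the function name is not from the thesis.

    import numpy as np
    from scipy.signal import stft, istft

    def delay_and_sum(x, mic_pos, look_dir, fs, c=343.0, nperseg=512):
        # x: (num_mics, num_samples); look_dir: unit vector from the array towards the source.
        delays = mic_pos @ look_dir / c                    # relative arrival-time advances (s)
        f, _, X = stft(x, fs=fs, nperseg=nperseg)          # X: (num_mics, num_bins, num_frames)
        align = np.exp(-2j * np.pi * f[None, :] * delays[:, None])
        Y = np.mean(align[:, :, None] * X, axis=0)         # phase-align, then average
        _, y = istft(Y, fs=fs, nperseg=nperseg)
        return y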
6

Ryan, James G. "Near-field beamforming using microphone arrays." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape8/PQDD_0015/NQ48335.pdf.

7

Goh, Boon Aik. "Adaptive subband beamforming for microphone arrays." Thesis, University of Leeds, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.424021.

8

Cardoso, Clara Ferreira. "Signal processing for circular microphone arrays." Thesis, University of Southampton, 2007. https://eprints.soton.ac.uk/421465/.

9

McCowan, Iain A. "Robust speech recognition using microphone arrays." Thesis, Queensland University of Technology, 2001.

Abstract:
The performance of state-of-the-art automatic speech recognition has recently attained levels sufficient for deployment in practical applications. As speech recognition technology undergoes the transition from research laboratories to the consumer market, much work remains to be done in the research domain in order to produce recognition systems that will perform well in practical configurations and realistic noise conditions. A major problem facing speech recognition researchers is the presence of undesired noise in the input speech signal. Systems that provide high performance levels in clean laboratory conditions often degrade dramatically in more realistic noise conditions. A varying level of noise exists in the majority of speech recognition applications, whether it be conflicting speech and computer noise in an office, engine and wind noise in vehicles, machine noise in factories, or any other source of undesired sound. For a speech recognition system to be practical, it must be robust to a variety of noisy conditions. As well as the problem of noise reduction, another issue in many applications of speech recognition is the desire for hands-free acquisition of the speech signal. Currently, most speech recognition systems require a close-talking head-set microphone to provide the input speech signal, as the performance degrades markedly when a distant microphone is used. An emerging topic of research is the use of microphone arrays in speech processing applications. A microphone array consists of multiple microphones placed at different spatial locations. Built upon a knowledge of sound propagation principles, the multiple inputs can be manipulated to enhance or attenuate signals emanating from particular directions. In this way, microphone arrays provide a means of enhancing a desired signal in the presence of corrupting noise sources. Moreover, this enhancement is based purely on knowledge of the source location, and so microphone array techniques are applicable to a wide variety of noise types. This thesis investigates the use of microphone arrays to improve the robustness of hands-free speech recognition systems in noisy conditions. Microphone arrays have great potential in practical applications of speech recognition, due to their ability to provide both noise robustness and hands-free signal acquisition. As well as investigating the use of microphone arrays as a speech enhancement stage prior to recognition, this thesis also examines a closer integration of the multi-channel input with other robust speech recognition techniques. In addition to an experimental evaluation of key microphone array beamforming methods, several novel techniques are proposed. These include a near-field adaptive beamforming algorithm, an adaptive parameter compensation algorithm, and a multi-channel sub-band recognition system. Each of the proposed techniques is shown to offer significant performance improvements in speech recognition experiments in high noise conditions.
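Since several of the techniques investigated exploit knowledge of the talker's location rather than just a direction, the near-field (spherical-wave) steering vector is the basic building block. The sketch below, which assumes free-field propagation and uses an illustrative function name, compensates each microphone for its distinct delay and 1/r attenuation from a known source position.

    import numpy as np

    def nearfield_weights(mic_pos, src_pos, freq, c=343.0):
        r = np.linalg.norm(mic_pos - src_pos, axis=1)      # microphone-to-source distances
        d = np.exp(-2j * np.pi * freq * r / c) / r         # spherical-wave steering vector
        return d / np.real(np.vdot(d, d))                  # distortionless delay-and-sum weights (w^H d = 1)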
10

Hua, Thanh Phong. "Adaptation mode controllers for adaptive microphone arrays." Rennes 1, 2006. http://www.theses.fr/2006REN1S136.

Abstract:
Array processing with a microphone array enables the extraction of a target signal in a noisy environment. In this work, an automatic calibration is proposed to remove the gain mismatch between microphones while keeping the same average power at the output of the fixed beamformer. Two new adaptation mode controllers (AMCs) are proposed for updating the filter coefficients according to the detected situation (presence of the target signal or of interference). These AMCs are based on an estimate of the signal-to-interference ratio. Evaluation results in a real environment show that the proposed AMCs contribute to better output signal quality and to an increase in speech recognition rate of up to 31% compared with a conventional AMC. These systems are integrated into the PaPeRo robot developed by NEC, which is designed to live in interaction with humans.
11

Himawan, Ivan. "Speech recognition using ad-hoc microphone arrays." Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/34461/1/Ivan_Himawan_Thesis.pdf.

Abstract:
While close talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by microphone array has been shown to be an effective alternative in a noisy environment. The use of microphone arrays in contrast to close talking microphones alleviates the feeling of discomfort and distraction to the user. For this reason, microphone arrays are popular and have been used in a wide range of applications such as teleconferencing, hearing aids, speaker tracking, and as the front-end to speech recognition systems. With advances in sensor and sensor network technology, there is considerable potential for applications that employ ad-hoc networks of microphone-equipped devices collaboratively as a virtual microphone array. By allowing such devices to be distributed throughout the users’ environment, the microphone positions are no longer constrained to traditional fixed geometrical arrangements. This flexibility in the means of data acquisition allows different audio scenes to be captured to give a complete picture of the working environment. In such ad-hoc deployment of microphone sensors, however, the lack of information about the location of devices and active speakers poses technical challenges for array signal processing algorithms which must be addressed to allow deployment in real-world applications. While not an ad-hoc sensor network, conditions approaching this have in effect been imposed in recent National Institute of Standards and Technology (NIST) ASR evaluations on distant microphone recordings of meetings. The NIST evaluation data comes from multiple sites, each with different and often loosely specified distant microphone configurations. This research investigates how microphone array methods can be applied for ad-hoc microphone arrays. A particular focus is on devising methods that are robust to unknown microphone placements in order to improve the overall speech quality and recognition performance provided by the beamforming algorithms. In ad-hoc situations, microphone positions and likely source locations are not known and beamforming must be achieved blindly. There are two general approaches that can be employed to blindly estimate the steering vector for beamforming. The first is direct estimation without regard to the microphone and source locations. An alternative approach is instead to first determine the unknown microphone positions through array calibration methods and then to use the traditional geometrical formulation for the steering vector. Following these two major approaches investigated in this thesis, a novel clustered approach which includes clustering the microphones and selecting the clusters based on their proximity to the speaker is proposed. Novel experiments are conducted to demonstrate that the proposed method to automatically select clusters of microphones (ie, a subarray), closely located both to each other and to the desired speech source, may in fact provide a more robust speech enhancement and recognition than the full array could.
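One simple way to approximate the proposed cluster selection without knowing any positions is to treat pairwise coherence between channels as a proximity measure and then keep the cluster that carries the most signal energy. The sketch below only illustrates that idea under those assumptions; it is not the specific algorithm developed in the thesis, and all names are illustrative.

    import numpy as np
    from scipy.signal import coherence
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def select_subarray(x, fs, n_clusters=2):
        # x: (num_mics, num_samples) captured by ad-hoc devices at unknown positions.
        m = x.shape[0]
        dist = np.zeros((m, m))
        for i in range(m):
            for j in range(i + 1, m):
                _, cxy = coherence(x[i], x[j], fs=fs, nperseg=1024)
                dist[i, j] = dist[j, i] = 1.0 - np.mean(cxy)   # low coherence ~ far apart
        labels = fcluster(linkage(squareform(dist), method='average'),
                          t=n_clusters, criterion='maxclust')
        energy = [np.mean(x[labels == k] ** 2) for k in range(1, n_clusters + 1)]
        return np.where(labels == (np.argmax(energy) + 1))[0]  # indices of the selected subarray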
12

Varada, Vijay K. "Acoustic Localization Employing Polar Directivity Patterns of Bidirectional Microphones Enabling Minimum Aperture Microphone Arrays." University of Toledo / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1290118825.

13

Hughes, Ashley. "Acoustic source localisation and tracking using microphone arrays." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/19524.

Abstract:
This thesis considers the domain of acoustic source localisation and tracking in an indoor environment. Acoustic tracking has applications in security, human-computer interaction, and the diarisation of meetings. Source localisation and tracking is typically a computationally expensive task, making it hard to process on-line, especially as the number of speakers to track increases. Much of the literature considers single-source localisation; however, a practical system must be able to cope with multiple speakers, possibly active simultaneously, without knowing beforehand how many speakers are present. Techniques are explored for reducing the computational requirements of an acoustic localisation system. Techniques to localise and track multiple active sources are also explored, and developed to be more computationally efficient than the current state-of-the-art algorithms, whilst being able to track more speakers. The first contribution is the modification of a recent single-speaker source localisation technique, which improves the localisation speed. This is achieved by formalising the modified algorithm's implicit assumption that speaker height is uniformly distributed on the vertical axis. Estimating height information effectively reduces the search space around locations where speakers have previously been detected, since they may have moved in the horizontal plane but are unlikely to have significantly changed height. This is developed to allow multiple non-simultaneously active sources to be located. This is applicable when the system is given information from a secondary source such as a set of cameras allowing the efficient identification of active speakers rather than just the locations of people in the environment. The next contribution of the thesis is the application of a particle swarm technique to significantly further decrease the computational cost of localising a single source in an indoor environment, compared to the state of the art. Several variants of the particle swarm technique are explored, including novel variants designed specifically for localising acoustic sources. Each method is characterised in terms of its computational complexity as well as the average localisation error. The techniques' responses to acoustic noise are also considered, and they are found to be robust. A further contribution is made by using multi-optima swarm techniques to localise multiple simultaneously active sources. This makes use of techniques which extend the single-source particle swarm techniques to finding multiple optima of the acoustic objective function. Several techniques are investigated and their performance in terms of localisation accuracy and computational complexity is characterised. Consideration is also given to how these metrics change when an increasing number of active speakers are to be localised. Finally, the application of the multi-optima localisation methods as an input to a multi-target tracking system is presented. Tracking multiple speakers is a more complex task than tracking a single acoustic source, as observations of audio activity must be associated in some way with distinct speakers. The tracker used is known to be a relatively efficient technique, and the nature of the multi-optima output format is modified to allow the application of this technique to the task of speaker tracking.
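The particle swarm idea above amounts to treating localisation as the maximisation of a steered-response-power style objective over candidate room positions. A minimal, generic swarm (not one of the thesis's specific variants) might look like the following, where srp is any callable returning the acoustic objective for a 3-D position and the parameter values are illustrative.

    import numpy as np

    def pso_localise(srp, bounds, n_particles=30, iters=50, w=0.7, c1=1.5, c2=1.5):
        # Maximise srp(xyz) inside the axis-aligned box given by bounds = (lower, upper).
        lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
        pos = lo + (hi - lo) * np.random.rand(n_particles, 3)
        vel = np.zeros_like(pos)
        pbest, pbest_val = pos.copy(), np.array([srp(p) for p in pos])
        gbest = pbest[np.argmax(pbest_val)]
        for _ in range(iters):
            r1, r2 = np.random.rand(n_particles, 1), np.random.rand(n_particles, 1)
            vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
            pos = np.clip(pos + vel, lo, hi)
            val = np.array([srp(p) for p in pos])
            improved = val > pbest_val
            pbest[improved], pbest_val[improved] = pos[improved], val[improved]
            gbest = pbest[np.argmax(pbest_val)]
        return gbest                                        # estimated source position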
14

Kuntz, Achim. "Wave field analysis using virtual circular microphone arrays." München Verl. Dr. Hut, 2008. http://d-nb.info/993260292/04.

15

Cohen, Zachary Gideon. "Noise Reduction with Microphone Arrays for Speaker Identification." DigitalCommons@CalPoly, 2012. https://digitalcommons.calpoly.edu/theses/884.

Abstract:
The presence of acoustic noise in audio recordings is an ongoing issue that plagues many applications. This ambient background noise is difficult to reduce due to its unpredictable nature. Many single channel noise reduction techniques exist but are limited in that they may distort the desired speech signal due to overlapping spectral content of the speech and noise. It is therefore of interest to investigate the use of multichannel noise reduction algorithms to further attenuate noise while attempting to preserve the speech signal of interest. Specifically, this thesis investigates the use of microphone arrays in conjunction with multichannel noise reduction algorithms to aid in speaker identification. Recording a speaker in the presence of acoustic background noise ultimately limits the performance and confidence of speaker identification algorithms. In situations where it is impossible to control the noise environment where the speech sample is taken, noise reduction algorithms must be developed and applied to clean the speech signal in order to give speaker identification software a chance at a positive identification. Due to the limitations of single channel techniques, it is of interest to see if spatial information provided by microphone arrays can be exploited to aid in speaker identification. This thesis provides an exploration of several time domain multichannel noise reduction techniques including delay sum beamforming, multi-channel Wiener filtering, and Spatial-Temporal Prediction filtering. Each algorithm is prototyped and filter performance is evaluated using various simulations and experiments. A three-dimensional noise model is developed to simulate and compare the performance of the above methods, and experimental results of three data collections are presented and analyzed. The algorithms are compared and recommendations are given for the use of each technique. Finally, ideas for future work are discussed to improve performance and implementation of these multichannel algorithms. Possible applications for this technology include audio surveillance, identity verification, video chatting, conference calling and sound source localization.
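For context on the multichannel Wiener filter listed among the techniques, a common narrowband formulation estimates the speech covariance by subtracting a noise-only covariance from the noisy one and then solves for the filter that recovers the speech component at a reference microphone. The thesis works with time-domain filters; the per-frequency sketch below is only an illustrative variant with assumed inputs.

    import numpy as np

    def mwf_weights(Phi_y, Phi_v, ref=0):
        # Phi_y: noisy-speech spatial covariance, Phi_v: noise-only covariance (one frequency bin).
        Phi_s = Phi_y - Phi_v                       # estimated speech covariance
        w = np.linalg.solve(Phi_y, Phi_s[:, ref])   # w = Phi_y^{-1} Phi_s e_ref
        return w                                    # apply per bin as y = w^H x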
16

Chakraborty, Rupayan. "Acoustic event detection and localization using distributed microphone arrays." Doctoral thesis, Universitat Politècnica de Catalunya, 2013. http://hdl.handle.net/10803/134364.

Abstract:
Automatic acoustic scene analysis is a complex task that involves several functionalities: detection (time), localization (space), separation, recognition, etc. This thesis focuses on both acoustic event detection (AED) and acoustic source localization (ASL), when several sources may be simultaneously present in a room. In particular, the experimentation work is carried out with a meeting-room scenario. Unlike previous works that either employed models of all possible sound combinations or additionally used video signals, in this thesis, the time overlapping sound problem is tackled by exploiting the signal diversity that results from the usage of multiple microphone array beamformers. The core of this thesis work is a rather computationally efficient approach that consists of three processing stages. In the first, a set of (null) steering beamformers is used to carry out diverse partial signal separations, by using multiple arbitrarily located linear microphone arrays, each of them composed of a small number of microphones. In the second stage, each of the beamformer outputs goes through a classification step, which uses models for all the targeted sound classes (HMM-GMM, in the experiments). Then, in a third stage, the classifier scores, either being intra- or inter-array, are combined using a probabilistic criterion (like MAP) or a machine learning fusion technique (fuzzy integral (FI), in the experiments). The above-mentioned processing scheme is applied in this thesis to a set of complexity-increasing problems, which are defined by the assumptions made regarding identities (plus time endpoints) and/or positions of sounds. In fact, the thesis report starts with the problem of unambiguously mapping the identities to the positions, continues with AED (positions assumed) and ASL (identities assumed), and ends with the integration of AED and ASL in a single system, which does not need any assumption about identities or positions. The evaluation experiments are carried out in a meeting-room scenario, where two sources are temporally overlapped; one of them is always speech and the other is an acoustic event from a pre-defined set. Two different databases are used, one that is produced by merging signals actually recorded in the UPC's department smart-room, and the other consists of overlapping sound signals directly recorded in the same room and in a rather spontaneous way. From the experimental results with a single array, it can be observed that the proposed detection system performs better than either the model based system or a blind source separation based system. Moreover, the product rule based combination and the FI based fusion of the scores resulting from the multiple arrays improve the accuracies further. On the other hand, the posterior position assignment is performed with a very small error rate. Regarding ASL and assuming an accurate AED system output, the 1-source localization performance of the proposed system is slightly better than that of the widely-used SRP-PHAT system, working in an event-based mode, and it even performs significantly better than the latter one in the more complex 2-source scenario. Finally, though the joint system suffers from a slight degradation in terms of classification accuracy with respect to the case where the source positions are known, it shows the advantage of carrying out the two tasks, recognition and localization, with a single system, and it allows the inclusion of information about the prior probabilities of the source positions.
It is worth noticing also that, although the acoustic scenario used for experimentation is rather limited, the approach and its formalism were developed for a general case, where the number and identities of sources are not constrained.
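The probabilistic score combination described above can be illustrated with the product rule, which in the log domain reduces to summing the per-beamformer class log-likelihoods (optionally adding log priors for a MAP decision). This sketch is a simplified stand-in; the fuzzy-integral fusion used in the experiments is not shown, and the names are illustrative.

    import numpy as np

    def fuse_scores(log_likelihoods, priors=None):
        # log_likelihoods: (num_beamformers, num_classes) scores for one segment.
        fused = log_likelihoods.sum(axis=0)          # product rule in the log domain
        if priors is not None:
            fused = fused + np.log(priors)           # MAP: include class priors
        return int(np.argmax(fused))                 # index of the decided sound class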
17

Huang, Yiteng (Arden). "Real-time acoustic source localization with passive microphone arrays." Diss., Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/15024.

18

Scharrer, Roman [Verfasser]. "Acoustic field analysis in small microphone arrays / Roman Scharrer." Aachen : Hochschulbibliothek der Rheinisch-Westfälischen Technischen Hochschule Aachen, 2014. http://d-nb.info/1050618939/34.

19

Tontiwattanakul, Khemapat. "Signal processing for microphone arrays with novel geometrical design." Thesis, University of Southampton, 2016. https://eprints.soton.ac.uk/400599/.

Abstract:
This research aims to propose a novel technique to design rigid baffle microphone arrays with complex geometry, together with their signal processing. The proposed technique relies on the use of the boundary element method (BEM), which makes it possible to obtain a numerical model of the acoustic pressure captured by the sensors of microphone arrays with any geometry. The beamforming strategy proposed in this study is based on the singular value decomposition (SVD), which is regarded as the general idea of modal beamforming. In this research, the microphone array problem is set up in an inverse problem framework, and is formulated as a Fredholm integral equation of the first kind. The beamformer is then realised and the discrete version of the problem is analysed and discussed. The proposed design technique is then applied to two case studies: firstly, to study a spheroidal microphone array and, secondly, to study spherical microphone arrays with acoustic waveguides added to the surface of the arrays with the aim of extending their operating frequency bandwidth. A number of numerical simulations were carried out in both studies. The most significant results are presented in the second case study. It is proved via numerical simulations that the cavities on the spherical array allow for the reduction of spatial aliasing error. Moreover, the relation between spherical harmonic beamforming and the SVD beamformer is discussed. An experiment was set up in order to demonstrate that the array with waveguides achieves the desired effect and is realisable. The experimental results indicate that the proposed technique is practical and can be implemented for real-life microphone array design.
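The SVD-based beamforming described above can be read as a regularised inversion of the numerically modelled transfer matrix between candidate source directions and the array sensors. A minimal sketch of that inverse-problem step is given below, assuming a BEM-computed matrix G of size (microphones x directions) and illustrative parameter names; truncation and Tikhonov regularisation are shown as the two usual ways of taming small singular values.

    import numpy as np

    def svd_inverse_beamformer(G, p, n_keep=None, reg=1e-3):
        # G: modelled transfer matrix, p: measured sensor pressures at one frequency.
        U, s, Vh = np.linalg.svd(G, full_matrices=False)
        if n_keep is not None:                       # discard weak singular values
            U, s, Vh = U[:, :n_keep], s[:n_keep], Vh[:n_keep]
        s_inv = s / (s ** 2 + reg)                   # Tikhonov-regularised inverse
        return Vh.conj().T @ (s_inv * (U.conj().T @ p))   # estimated direction amplitudes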
20

Allred, Daniel Jackson. "Evaluation and Comparison of Beamforming Algorithms for Microphone Array Speech Processing." Thesis, Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/11606.

Abstract:
Recent years have brought many new developments in the processing of speech and acoustic signals. Yet, despite this, the process of acquiring signals has gone largely unchanged. Adding spatial diversity to the repertoire of signal acquisition has long been known to offer advantages for further processing. The processing capabilities of mobile devices had not previously been able to handle the computation required for these additional streams of information, but current processing capabilities are such that the extra workload introduced by the addition of multiple sensors on a mobile device is not over-burdensome. How these extra data streams can best be handled is still an open question. The present work deals with the examination of one type of spatial processing technique, known as beamforming. A microphone array test platform is constructed and verified through a number of beamforming algorithms. Issues related to speech acquisition through microphone arrays are discussed. The algorithms used for verification are presented in detail and compared to one another.
21

Jasti, Srichandana. "Design of randomly placed microphone array." Birmingham, Ala. : University of Alabama at Birmingham, 2006. http://www.mhsl.uab.edu/dt/2006m/jasti.pdf.

22

Roper, Simon Edward. "A room acoustics measurement system using non-invasive microphone arrays." Thesis, University of Birmingham, 2010. http://etheses.bham.ac.uk//id/eprint/891/.

Abstract:
This thesis summarises research into adaptive room correction for small rooms and pre-recorded material, for example music or films. A measurement system to predict the sound at a remote location within a room, without a microphone at that location, was investigated. This would allow the sound within a room to be adaptively manipulated to ensure that all listeners received optimum sound, therefore increasing their enjoyment. The solution presented used small microphone arrays, mounted on the room's walls. A unique geometry and processing system was designed, incorporating three processing stages: temporal, spatial and spectral. The temporal processing identifies individual reflection arrival times from the recorded data. Spatial processing estimates the angles of arrival of the reflections so that the three-dimensional coordinates of the reflections' origin can be calculated. The spectral processing then estimates the frequency response of the reflection. These estimates allow a mathematical model of the room to be calculated, based on the acoustic measurements made in the actual room. The model can then be used to predict the sound at different locations within the room. A simulated model of a room was produced to allow fast development of algorithms. Measurements in real rooms were then conducted and analysed to verify the theoretical models developed and to aid further development of the system. Results from these measurements and simulations, for each processing stage, are presented.
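The spatial stage described above rests on turning a time-difference of arrival between closely spaced wall-mounted microphones into an angle. For a single pair with known spacing, that step can be sketched as below (far-field assumption, illustrative function name); the thesis combines several such estimates from its array geometry to obtain three-dimensional reflection origins.

    import numpy as np
    from scipy.signal import correlate

    def arrival_angle(x1, x2, fs, spacing, c=343.0):
        xc = correlate(x1, x2, mode='full')
        lag = np.argmax(np.abs(xc)) - (len(x2) - 1)    # sample lag of the strongest peak
        sin_theta = np.clip(c * (lag / fs) / spacing, -1.0, 1.0)
        return np.degrees(np.arcsin(sin_theta))        # angle relative to broadside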
23

Achi, Peter Y. "Speech Enhancement Techniques for Large Space Habitats Using Microphone Arrays." Thesis, University of Louisiana at Lafayette, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10813016.

Abstract:

The astronauts' ability to communicate easily among themselves or with the ship's computer should be a high priority for the success of missions. Long-duration space habitats--whether spaceships or surface bases--will likely be larger than present-day Earth-to-orbit/Moon transfer ships. Hence an efficient approach would be to free the crew members from the relative burden of having to wear headsets throughout the spacecraft. This can be achieved by placing microphone arrays in all crew-accessible parts of the habitat. Processing algorithms would first localize the speaker and then perform speech enhancement. The background "noise" in a spacecraft is typically fan and duct noise (hum, drone), valve opening/closing (click, hiss), pumps, etc. We simulate such interfering sources by a number of loudspeakers broadcasting various sounds: real ISS sounds, a continuous radio stream, and a poem read by one author. To test the concept, we use a linear 30-microphone array driven by a zero-latency professional audio interface. Speaker localization is obtained by time-domain processing. To enhance the speech-to-noise ratio, a frequency-domain minimum-variance approach is used.
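The frequency-domain minimum-variance step mentioned at the end is commonly written as w = R^{-1} d / (d^H R^{-1} d), with R the noise (or noisy-signal) covariance at one frequency bin and d the steering vector toward the localised talker. A minimal sketch with diagonal loading for numerical robustness (names illustrative, not taken from the thesis):

    import numpy as np

    def mvdr_weights(R, d, diag_load=1e-3):
        m = R.shape[0]
        R_dl = R + diag_load * np.real(np.trace(R)) / m * np.eye(m)   # diagonal loading
        Rinv_d = np.linalg.solve(R_dl, d)
        return Rinv_d / np.vdot(d, Rinv_d)           # distortionless toward d, minimum output power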

24

Morgan, Joshua P. "Time-Frequency Masking Performance for Improved Intelligibility with Microphone Arrays." UKnowledge, 2017. http://uknowledge.uky.edu/ece_etds/101.

Abstract:
Time-Frequency (TF) masking is an audio processing technique useful for isolating an audio source from interfering sources. TF masking has been applied and studied in monaural and binaural applications, but has only recently been applied to distributed microphone arrays. This work focuses on evaluating the TF masking technique's ability to isolate human speech and improve speech intelligibility in an immersive "cocktail party" environment. In particular, an upper-bound on TF masking performance is established and compared to the traditional delay-sum and general sidelobe canceler (GSC) beamformers. Additionally, the novel technique of combining the GSC with TF masking is investigated and its performance evaluated. This work presents a resource-efficient method for studying the performance of these isolation techniques and evaluates their performance using both virtually simulated data and data recorded in a real-life acoustical environment. Further, methods are presented to analyze speech intelligibility post-processing, and automated objective intelligibility measurements are applied alongside informal subjective assessments to evaluate the performance of these processing techniques. Finally, the causes for subjective/objective intelligibility measurement disagreements are discussed, and it was shown that TF masking did enhance intelligibility beyond delay-sum beamforming and that the utilization of adaptive beamforming can be beneficial.
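An oracle (upper-bound) time-frequency mask of the kind evaluated above can be built directly from the known target and mixture spectrograms; one common choice is a clipped ratio mask, sketched below with scipy's STFT. The exact mask definition used in the thesis may differ.

    import numpy as np
    from scipy.signal import stft, istft

    def oracle_ratio_mask(target, mixture, fs, nperseg=512):
        _, _, S = stft(target, fs=fs, nperseg=nperseg)
        _, _, Y = stft(mixture, fs=fs, nperseg=nperseg)
        mask = np.clip(np.abs(S) / (np.abs(Y) + 1e-12), 0.0, 1.0)   # keep bins dominated by the target
        _, y = istft(mask * Y, fs=fs, nperseg=nperseg)
        return y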
25

Noohi, Tahereh. "Sound Field Decomposition with Spherical Microphone Arrays Using Sparse Recovery Techniques." Thesis, The University of Sydney, 2016. http://hdl.handle.net/2123/16102.

Abstract:
This dissertation describes research regarding sound field decomposition with spherical microphone arrays using sparse recovery techniques. Recently, sound field decomposition using sparse recovery has demonstrated the ability to achieve a surprisingly high-resolution spatial analysis of the sound field. The focus of this thesis is on improving the accuracy of the sound field decomposition in non-sparse conditions with multiple sources and reverberation. In particular, we develop and characterise two new sparse recovery techniques, which improve the spatial accuracy of the sound field decomposition in non-sparse sound conditions. The first method incorporates information related to the expected location of the sound sources into the sparse recovery problem. The second method takes advantage of both independence and sparsity by serially combining the methods of independent component analysis and sparse recovery for sound field decomposition. We then go on to examine the issue of resolving sources based on their distance from the spherical microphone array. We develop a new sparse recovery method to resolve sources located in the same direction, but located at different distances. We then very briefly examine the decomposition of sound fields based on signal content. In other words, instead of trying to decompose the sound field based on spatial location, we seek to decompose the sound field based on phoneme or word content.
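The sparse recovery step at the heart of this decomposition is typically posed as an l1-regularised least-squares fit of plane-wave (or point-source) amplitudes to the spherical-array pressures. The sketch below solves that problem with plain ISTA iterations, assuming a precomputed complex dictionary A of steering vectors; it illustrates the principle rather than the specific algorithms proposed in the thesis.

    import numpy as np

    def sparse_decomposition(A, b, lam=0.1, iters=200):
        # min_x 0.5*||A x - b||^2 + lam*||x||_1  (complex-valued ISTA)
        x = np.zeros(A.shape[1], dtype=complex)
        step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant of the gradient
        for _ in range(iters):
            z = x - step * (A.conj().T @ (A @ x - b))
            mag = np.abs(z)
            shrink = np.maximum(mag - step * lam, 0.0)
            x = np.where(mag > 0, z / np.maximum(mag, 1e-12) * shrink, 0.0)
        return x                                      # sparse direction/distance amplitudes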
26

Hart, Patrick Hammel. "FPAA realization of a controlled directional microphone." Diss., Online access via UMI:, 2009.

Abstract:
Thesis (M.S.)--State University of New York at Binghamton, Thomas J. Watson School of Engineering and Applied Science, Department of Electrical and Computer Engineering, 2009.
Includes bibliographical references.
27

Abhayapala, P. Thushara D. "Modal Analysis and Synthesis of Broadband Nearfield Beamforming Arrays." The Australian National University, Telecommunications Engineering Group, 2000. http://thesis.anu.edu.au./public/adt-ANU20010905.121231.

Abstract:
This thesis considers the design of a beamformer which can enhance desired signals in an environment consisting of broadband nearfield and/or farfield sources. The thesis contains: a formulation of a set of analysis tools which can provide insight into the intrinsic structure of array processing problems; a methodology for nearfield beamforming; theory and design of a general broadband beamformer; and a consideration of a coherent nearfield broadband adaptive beamforming problem. To a lesser extent, the source localization problem and background noise modeling are also treated.
A set of analysis tools called modal analysis techniques, which can be used to solve a wider class of array signal processing problems, is first formulated. The solution to the classical wave equation is studied in detail and exploited in order to develop these techniques.
Three novel methods of designing a beamformer having a desired nearfield broadband beampattern are presented. The first method uses the modal analysis techniques to transform the desired nearfield beampattern to an equivalent farfield beampattern. A farfield beamformer is then designed for a transformed farfield beampattern which, if achieved, gives the desired nearfield pattern exactly. The second method establishes an asymptotic equivalence, up to complex conjugation, of two problems: (i) determining the nearfield performance of a farfield beampattern specification, and (ii) determining the equivalent farfield beampattern corresponding to a nearfield beampattern specification. Using this reciprocity relationship, a computationally simple nearfield beamforming procedure is developed. The third method uses the modal analysis techniques to find a linear transformation between the array weights required to have the desired beampattern for farfield and nearfield, respectively.
An efficient parameterization for the general broadband beamforming problem is introduced, with a single parameter to focus the beamformer to a desired operating radius and another set of parameters to control the actual broadband beampattern shape. This parameterization is derived using the modal analysis techniques and the concept of the theoretical continuous aperture.
A design of an adaptive beamformer to operate in a signal environment consisting of broadband nearfield sources, where some of the interfering signals may be correlated with the desired signal, is also considered. Application of modal analysis techniques to noise modeling and broadband coherent source localization concludes the thesis.
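The modal analysis referred to throughout expands the sound field in spherical harmonics weighted by radial functions obtained from the wave equation. For the simplest case of a plane wave observed on an open sphere of radius r, the mode strengths are b_n(kr) = 4*pi*(i^n)*j_n(kr); nearfield sources additionally introduce spherical Hankel terms. A small sketch of the farfield coefficients only, for illustration:

    import numpy as np
    from scipy.special import spherical_jn

    def plane_wave_mode_strength(n_max, k, r):
        # b_n(kr) for an open sphere of radius r at wavenumber k.
        kr = k * r
        return np.array([4 * np.pi * (1j ** n) * spherical_jn(n, kr)
                         for n in range(n_max + 1)])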
28

Teutsch, Heinz. "Wavefield decomposition using microphone arrays and its application to acoustic scene analysis." [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=97902806X.

29

Matzumoto, Andres Esteban Perez. "A study of microphone arrays for the location of vibrational sound sources." Thesis, University of Southampton, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.305576.

30

Unnikrishnan, Harikrishnan. "AUDIO SCENE SEGEMENTATION USING A MICROPHONE ARRAY AND AUDITORY FEATURES." UKnowledge, 2010. http://uknowledge.uky.edu/gradschool_theses/622.

Abstract:
Auditory stream denotes the abstract effect a source creates in the mind of the listener. An auditory scene consists of many streams, which the listener uses to analyze and understand the environment. Computer analyses that attempt to mimic human analysis of a scene must first perform Audio Scene Segmentation (ASS). ASS finds applications in surveillance, automatic speech recognition and human-computer interfaces. Microphone arrays can be employed for extracting streams corresponding to spatially separated sources. However, when a source moves to a new location during a period of silence, such a system loses track of the source. This results in multiple spatially localized streams for the same source. This thesis proposes to identify local streams associated with the same source using auditory features extracted from the beamformed signal. ASS using the spatial cues is first performed. Then auditory features are extracted and segments are linked together based on similarity of the feature vector. An experiment was carried out with two simultaneous speakers. A classifier is used to classify the localized streams as belonging to one speaker or the other. The best performance was achieved when pitch appended to Gammatone Frequency Cepstral Coefficients (GFCC) was used as the feature vector. An accuracy of 96.2% was achieved.
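The GFCC-plus-pitch feature vector used for linking the localized streams can be sketched as a log-compressed gammatone filterbank followed by a DCT, with the pitch track appended per frame. The gammatone filterbank itself is assumed to be computed elsewhere, and some GFCC variants use a cubic-root rather than a log compression; the names below are illustrative.

    import numpy as np
    from scipy.fft import dct

    def gfcc_features(fbank_energies, num_ceps=13, pitch=None):
        # fbank_energies: (num_frames, num_gammatone_bands) from a precomputed filterbank.
        ceps = dct(np.log(fbank_energies + 1e-10), type=2, norm='ortho', axis=1)[:, :num_ceps]
        if pitch is not None:
            ceps = np.hstack([ceps, np.asarray(pitch).reshape(-1, 1)])   # append pitch per frame
        return ceps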
31

Massé, Pierre. "Analysis, Treatment, and Manipulation Methods for Spatial Room Impulse Responses Measured with Spherical Microphone Arrays." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS079.

Abstract:
The use of spatial room impulse responses (SRIR) for the reproduction of three-dimensional reverberation effects through multi-channel convolution over immersive surround-sound loudspeaker systems has become commonplace within the last few years, thanks in large part to the commercial availability of various spherical microphone arrays (SMA) as well as a constant increase in computing power. This use has in turn created a demand for analysis and treatment techniques not only capable of ensuring the faithful reproduction of the measured reverberation effect, but which could also be used to control various modifications of the SRIR in a more "creative" approach, as is often encountered in the production of immersive musical performances and installations. Within this context, the principal objective of the current thesis is the definition of a complete space-time-frequency framework for the analysis, treatment, and manipulation of SRIRs. The analysis tools should lead to an in-depth model allowing for measurements to first be treated with respect to their inherent limitations (measurement conditions, background noise, etc.), as well as offering the ability to modify different characteristics of the final reverberation effect described by the SRIR. These characteristics can be either completely objective, even physical, or otherwise informed by knowledge of human auditory perception with regard to room acoustics. The theoretical work in this research project is therefore presented in two main parts. First, the underlying SRIR signal model is described, heavily inspired by the historical approaches from the fields of artificial reverberation synthesis and SMA signal processing, while at the same time (incrementally) extending both. The signal model is then used to define the analysis methods that form the core of the final framework; these focus particularly on (a) identifying the "mixing time" that defines the moment of transition between the early reflection and late reverberation regimes, (b) obtaining a space-time cartography of the early reflections, and (c) estimating the frequency- and direction-dependent properties of the late reverberation's exponential energy decay envelope. In order to account for the directional dependence of these properties, a procedure for generating directional SRIR representations (i.e. directional room impulse responses, DRIR) that guarantee the preservation of certain fundamental reverberation properties must also be defined. In the second part, the model parameters made explicit by the analysis methods are exploited in order to either treat (i.e. attempt to correct some of the inevitable limitations inherent to the SMA measurement process) or more creatively manipulate and modify the SRIR. Two treatment methods in particular are developed in this thesis: (1) a pre-analysis procedure acting directly on repeated exponential sweep method (ESM) SMA measurement signals in an attempt to simultaneously increase the resulting SRIR's signal-to-noise ratio (SNR) while reducing its vulnerability to non-stationary noise events, and (2) a post-analysis denoising technique based on replacing the SRIR's background noise floor with a resynthesized extrapolation of the late reverberation tail. 
The theoretical descriptions thus complete, the main analysis methods as well as the DRIR generation and the denoising treatment procedures are then subjected to a series of validation tests, wherein simulated SRIRs (or parts thereof) are used to evaluate the performance, discuss the limitations, and parameterize the implementation of the different techniques. These sub-studies allow each method to be individually verified, resulting in a comprehensive investigation into the inner workings of the analysis toolbox (as well as the denoising process). Finally, to provide a concluding overview of the complete analysis-treatment-manipulation framework, similar studies are carried out using examples of real-world [...]
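The frequency- and direction-dependent exponential decay mentioned in point (c) is conventionally estimated per band from the Schroeder energy decay curve by a straight-line fit over a prescribed dynamic range. A minimal single-band sketch follows (parameter names illustrative, and simpler than the estimators developed in the thesis):

    import numpy as np

    def decay_time(h_band, fs, fit_range_db=(-5.0, -35.0)):
        # h_band: one (frequency-band, direction) channel of the impulse response.
        edc = np.cumsum(h_band[::-1] ** 2)[::-1]                 # Schroeder backward integration
        edc_db = 10 * np.log10(edc / edc[0] + 1e-30)
        t = np.arange(len(edc_db)) / fs
        sel = (edc_db <= fit_range_db[0]) & (edc_db >= fit_range_db[1])
        slope, _ = np.polyfit(t[sel], edc_db[sel], 1)            # decay slope in dB/s
        return -60.0 / slope                                     # extrapolated T60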
32

Gergen, Sebastian [Verfasser], Rainer [Akademischer Betreuer] Martin, and Simon [Akademischer Betreuer] Doclo. "Classification of audio sources using ad-hoc microphone arrays / Sebastian Gergen. Gutachter: Rainer Martin ; Simon Doclo." Bochum : Ruhr-Universität Bochum, 2016. http://d-nb.info/1089006322/34.

33

Koutrouli, Eleni. "Low Complexity Beamformer structures for application in Hearing Aids." Thesis, Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-17612.

Abstract:
Background noise is particularly damaging to speech intelligibility for people with hearing loss. The problem of reducing noise in hearing aids is one of great importance and great difficulty. Over the years, many solutions and different algorithms have been implemented in order to provide the optimal solution to the problem. Beamforming has been used for a long time and has therefore been extensively researched. Studying the performance of Minimum Variance Distortionless Response (MVDR) beamforming with a three- and four- microphone array compared to the conventional two-microphone array, the aim is to implement a speech signal enhancement and a noise reduction algorithm. By using multiple microphones, it is possible to achieve spatial selectivity, which is the ability to select certain signals based on the angle of incidence, and improve the performance of noise reduction beamformers. This thesis proposes the use of beamforming, an existing technique in order to create a new way to reduce noise transmitted by hearing aids. In order to reduce the complexity of that system, we use hybrid cascades, which are simpler beamformers of two inputs each and connected in series. The configurations that we consider are a three-microphone linear array (monaural beamformer), a three-microphone configuration with a two-microphone linear array and the 3rd microphone in the ear (monaural beamformer), a three-microphone configuration with a two-microphone linear array and the 3rd microphone on contra-lateral ear (binaural beamformer), and finally four-microphone configurations. We also investigate the performance improvement of the beamformer with more than two microphones for the different configurations, against the two-microphone beamformer reference. This can be measured by using objective measurements, such as the amount of noise suppression, target energy loss, output SNR, speech intelligibility index and speech quality evaluation. These objective measurements are good indicators of subjective performance. In this project, we prove that most hybrid structures can perform satisfyingly well compared to the full complexity beamformer. The low complexity beamformer is designed with a fixed target location (azimuth), where its weights are calibrated with respect to a target signal located in front of the listener and for a diffuse noise field. Both second- and third- order beamformers are tested in different acoustic scenarios, such as a car environment, a meeting room, a party occasion and a restaurant place. In those scenarios, the target signal is not arriving at the hearing aid directly from the front side of the listener and the noise field is not always diffuse. We thoroughly investigate what are the performance limitations in that case and how well the different cascades can perform. It is proven that there are some very critical factors, which can affect the performance of the fixed beamformer, concerning all the hybrid structures that were examined. Finally, we show that lower complexity cascades for both second- and third- order beamformers can perform similarly well as the full complexity beamformers when tested for a set of multiple Head Related Transfer Functions (HRTFs) that correspond to a real head shape.
34

Abad, Gareta Alberto. "A multi-microphone approach to speech processing in a smart-room environment." Doctoral thesis, Universitat Politècnica de Catalunya, 2007. http://hdl.handle.net/10803/6906.

Abstract:
Recent advances in computer technology and in speech and language processing, among others, have made new modes of communication between people and machines begin to seem feasible. In particular, interest in developing new applications for indoor environments equipped with multiple multimodal sensors, also known as smart rooms, has grown considerably in recent years.
In general, it is well known that the quality of speech signals captured by microphones that may be several metres away from the speakers is severely degraded by acoustic noise and room reverberation. In the context of developing speech applications for smart-room environments, the use of non-intrusive sensors is a common requirement; that is, close-talking or lapel microphones are usually not allowed or not feasible, so the speech technologies developed must rely on the signals captured by distant microphones. In these situations, speech technologies that normally work reasonably well in noise- and reverberation-free environments suffer a drastic drop in performance.
This thesis investigates multi-microphone methods to address the problems caused by the use of distant microphones in the speech applications typically deployed in smart rooms. Specifically, microphone array processing is studied as a possible way to exploit the availability of multiple microphones to obtain enhanced speech signals. By properly combining the signals impinging on a group of microphones, array processing makes it possible to steer toward specific spatial directions while rejecting others.
For speech enhancement with microphone arrays, the thesis proposes a new robust beamforming scheme that integrates an adaptive beamformer and a Wiener post-filtering stage in a single stage. The results obtained show that the proposed beamformer is a suitable solution for very noisy environments and that, in general, it is preferable to the conventional use of post-filtering stages at the output of an adaptive beamformer. However, the beamformer exhibits a certain degradation of the speech signal that may affect its usefulness for speech recognition applications, especially when the noise is not very significant.
The specific use of microphone arrays for speech recognition in smart-room environments is then investigated. It is shown that the conventional use of microphone arrays for speech recognition, applied as two independent stages, does not bring a significant improvement over single-channel techniques, especially if the recogniser is adapted to the actual acoustic conditions of the environment. The thesis emphasises the need for speech recognition to incorporate information from microphone array beamforming, or alternatively for beamformers to incorporate information from speech recognition. More concretely, it is proposed to build the acoustic models from data first captured by a microphone array and then processed by a beamformer, in order to obtain greater benefit from the arrays. Applying the proposed adaptation scheme with beamformed array data yields a considerable improvement for a speaker-dependent recognition system, whereas for a speaker-independent system only a very limited improvement is obtained, due in part to the use of simulated array data.
On the other hand, a common limitation of microphone array processing is that a reliable estimate of the speaker's position is needed in order to point correctly toward the position of interest. Moreover, knowledge of the position of the acoustic sources present in a room is information that can be exploited by other services deployed in smart rooms, for instance to automatically steer a camera in video conferencing. Fortunately, there are numerous methods for solving the acoustic source tracking problem based on the signals captured by multiple microphones.
Concretely, the thesis develops a robust speaker localisation system based on one of the most successful current algorithms, which computes the likelihood of each candidate position from estimates of the generalised cross-correlations between microphone pairs. The proposed system incorporates two main novelties. First, the cross-correlations are computed adaptively based on the estimated velocities of the sources; this adaptive computation is carried out so as to minimise the effect that the different dynamics of the sources present in the room have on the localisation result. Second, an accelerated position computation method is proposed, based on coarse-to-fine search strategies in both the spatial and the frequency domains. In fact, it is shown that the relation between spatial resolution and the bandwidth considered in the cross-correlation computation is a fundamental aspect to take into account for the proper application of this kind of fast strategy. These two novelties allow the proposed system to achieve reasonably good results when evaluated in relatively controlled scenarios with few non-overlapping speakers. Furthermore, the suitability of the proposed acoustic localisation system is confirmed by the notable results obtained in an international evaluation.
Finally, the thesis also studies the problem of estimating the speaker's head orientation from the signals received by multiple microphones, in the context of developing new technologies that can provide additional information to the systems that may operate in smart rooms. Two completely different approaches are proposed and compared. On the one hand, sophisticated methods based on the joint estimation of position and orientation achieve acceptable estimates at the expense of a high computational cost. On the other hand, simpler methods based on considerations about the radiation pattern of speech, although unable to match the performance of the sophisticated methods, can also be adequate in some cases, for instance when the position is known beforehand or when the computational budget is limited. In both cases, the results obtained give grounds for optimism regarding the future development of new algorithms for speaker orientation estimation.
Los avances recientes en tecnología informática y procesado del habla y del lenguaje, entre otros, han hecho posible que nuevos modos de comunicación entre las personas y las máquinas empiecen a parecer factibles. Concretamente, el interés en el desarrollo de nuevas aplicaciones en entornos cerrados equipados con múltiples sensores multimodales, también conocidos como salas inteligentes, ha aumentado considerablemente en los últimos tiempos.
En general, es bien sabido que la calidad de las señales de habla capturadas por micrófonos que pueden encontrarse a varios metros de distancia de los locutores se ve severamente degradada por el ruido acústico y por la reverberación de la sala. En el contexto del desarrollo de aplicaciones del habla en entornos de salas inteligentes, el uso de sensores que no sean molestos es un requisito habitual. Es decir, normalmente no está permitido o no es posible usar micrófonos cercanos o de solapa, y por lo tanto, las tecnologías del habla desarrolladas tienen que basarse en las señales capturadas por micrófonos lejanos. En estas situaciones, las tecnologías del habla que habitualmente funcionan razonablemente bien en entornos libres de ruido y reverberación sufren un descenso drástico en sus prestaciones.
En esta tesis se investigan métodos multi micrófono para solventar los problemas que provoca el uso de micrófonos lejanos en las aplicaciones del habla que habitualmente se desarrollan en salas inteligentes. Concretamente, se estudia el procesado de arrays de micrófonos como un método posible de aprovechar la disponibilidad de múltiples micrófonos para obtener señales de voz mejoradas. Mediante la correcta combinación de las señales que inciden en una agrupación de micrófonos, el procesado de arrays permite apuntar direcciones espaciales concretas a la vez que otras se rechazan.
Para la mejora del habla con arrays de micrófonos, en la tesis se propone el uso de un nuevo esquema robusto de conformación que integra en una sóla etapa un conformador adaptativo y una etapa de post-filtrado de Wiener. Los resultados obtenidos muestran que el conformador propuesto es una solución adecuada para entornos muy ruidosos y que, en general, es preferible al uso convencional de etapas de post-filtrado a la salida de un conformador adaptativo. Sin embargo, el conformador muestra cierta degradación de la señal de voz que puede afectar a su utilidad para aplicaciones de reconocimiento del habla, especialmente cuando el ruido no es demasiado importante.
A continuación se investiga el uso específico de arrays de micrófonos para el reconocimiento del habla en entornos de salas inteligentes. Se demuestra que el uso convencional de arrays de micrófonos para reconocimiento del habla, que consiste en su aplicación en dos etapas independientes, no aporta una mejora significativa respecto al uso de técnicas mono canal, especialmente, si el reconocedor está adaptado a las condiciones reales del entorno acústico. En la tesis se hace énfasis en la necesidad de que el reconocimiento del habla incorpore información de la conformación con arrays de micrófonos, o alternativamente, que los conformadores incorporen información del reconocimiento del habla. Más concretamente, se propone el uso de datos capturados por un array de micrófonos y luego procesados por un conformador para la construcción de los modelos acústicos, para de esta manera, obtener un mayor beneficio de los arrays. La aplicación del esquema propuesto de adaptación con datos conformados de un array de micrófonos permite obtener una mejora considerable en un sistema de reconocimiento dependiente de locutor, mientras que en el caso de un sistema independiente de locutor sólo se obtiene una mejora muy limitada, debido en parte al uso de datos de array simulados.
Por otro lado, una limitación habitual del procesado de arrays de micrófonos es que se necesita una estimación verosímil de la posición del locutor para poder apuntar correctamente hacia la posición de interés. Además, el conocimiento de la posición de las fuentes acústicas que puedan estar presentes en una sala es una información que puede ser aprovechada por otros servicios que se desarrollan en las salas inteligentes, como por ejemplo para apuntar automáticamente una cámara en vídeo-conferencias. Afortunadamente, existen numerosos métodos que permiten resolver el problema del seguimiento de fuentes acústicas basándose en las señales capturadas por múltiples micrófonos.
Concretamente, en la tesis se desarrolla un sistema robusto de localización de locutor basado en uno de los algoritmos actuales de mayor éxito consistente en el cómputo de la verosimilitud de cada posible posición basándose en las estimaciones de las correlaciones cruzadas generalizadas entre pares de micrófonos. El sistema propuesto incorpora principalmente dos novedades. Primero, las correlaciones cruzadas se calculan de forma adaptativa basándose en las velocidades estimadas de las fuentes. Este cálculo adaptativo se hace de manera que se minimice el efecto de las diferentes dinámicas de las fuentes presentes en la sala en el resultado de la localización. Segundo, se propone el uso de un método acelerado para el cálculo de la posición basado en estrategias de búsqueda de menor a mayor resolución tanto en el dominio espacial como frecuencial. De hecho, se muestra que la relación entre resolución espacial y el ancho de banda considerado en el cálculo de las correlaciones cruzadas es un aspecto fundamental a tener en cuenta en la aplicación adecuada de este tipo de estrategias rápidas. Las dos novedades comentadas permiten que el sistema propuesto alcance unos resultados razonablemente buenos cuando se evalúa en escenarios relativamente controlados y con pocos locutores que no se solapan. Además, la conveniencia del sistema de localización acústica propuesto queda de manifiesto si se atiende a los destacados resultados que se obtuvieron en una evaluación internacional.
Finalmente, en la tesis también se estudia el problema de la estimación de la orientación del locutor en base a las señales capturadas por múltiples micrófonos en el contexto del desarrollo de nuevas tecnologías que puedan aportar información adicional para los sistemas que potencialmente pueden actuar en salas inteligentes. En concreto, se proponen y comparan dos métodos completamente diferentes. Por un lado, métodos sofisticados basados en la estimación conjunta de la posición y de la orientación que permiten obtener estimaciones aceptables a cambio de un elevado coste computacional. Por otro lado, los métodos más simples que se basan en consideraciones sobre el diagrama de radiación del habla aunque no son capaces de igualar las prestaciones de los métodos sofisticados, también pueden resultar adecuados en algunos casos, como cuando se sabe la posición de antemano o cuando la complejidad computacional está limitada. En ambos casos, los resultados obtenidos permiten ser optimistas de cara al futuro desarrollo de nuevos algoritmos dedicados a la estimación de la orientación del locutor.
Recent advances in computer technology and in speech and language processing, among others, have made new ways of person-machine communication and of computer assistance to human activities appear feasible. Concretely, interest in the development of challenging new applications in indoor environments equipped with multiple multimodal sensors, also known as smart-rooms, has grown considerably.
In general, it is well known that the quality of speech signals captured by microphones located several meters away from the speakers is severely distorted by acoustic noise and room reverberation. In the context of developing hands-free speech applications in smart-room environments, the use of obtrusive sensors such as close-talking microphones is usually not allowed, and consequently speech technologies must operate on distant-talking recordings. In such conditions, speech technologies that usually perform reasonably well in noise- and reverberation-free environments show a dramatic drop in performance.
This thesis investigates multi-microphone approaches to the problems introduced by far-field microphones in speech applications deployed in smart-rooms. Concretely, microphone array processing is studied as a way to take advantage of the availability of multiple microphones in order to obtain enhanced speech signals. By appropriately combining the signals impinging on the array, microphone array beamforming steers towards desired spatial directions while rejecting others.
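As a concrete illustration of the principle described above, the following minimal delay-and-sum beamformer aligns and averages the microphone signals for a chosen look direction. This is a generic sketch, not code from the thesis; the array geometry, sample rate and steering direction in the example are illustrative assumptions.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, look_dir, fs, c=343.0):
    """Steer an array towards `look_dir` (unit vector) by compensating
    inter-microphone delays in the frequency domain.

    signals       : (n_mics, n_samples) time-domain recordings
    mic_positions : (n_mics, 3) microphone coordinates in metres
    look_dir      : (3,) unit vector pointing towards the desired source
    """
    n_mics, n_samples = signals.shape
    spectra = np.fft.rfft(signals, axis=1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    # Each microphone receives the plane wave earlier by (p . u) / c seconds,
    # so it is delayed by that amount before averaging.
    delays = mic_positions @ look_dir / c
    phase = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft((spectra * phase).mean(axis=0), n=n_samples)

# Example: a hypothetical 4-microphone linear array steered broadside.
fs = 16000
mics = np.array([[i * 0.05, 0.0, 0.0] for i in range(4)])
recordings = np.random.randn(4, fs)                 # placeholder signals
enhanced = delay_and_sum(recordings, mics, np.array([0.0, 1.0, 0.0]), fs)
```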
A new robust beamforming scheme that integrates an adaptive beamformer and a Wiener post-filter in a single stage is proposed for speech enhancement. Experimental results show that the proposed beamformer is an appropriate solution for high-noise environments and that it is preferable to conventional post-filtering of the output of an adaptive beamformer. However, the beamformer introduces some distortion into the speech signal that can limit its usefulness for speech recognition applications, particularly in low-noise conditions.
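For reference, the conventional two-stage arrangement that the proposed single-stage scheme is compared against applies a per-frequency Wiener gain to the beamformer output. The sketch below assumes a noise power spectrum estimated elsewhere (for instance during speech pauses); the frame length, hop and gain floor are illustrative choices, not values from the thesis.

```python
import numpy as np

def wiener_postfilter(beamformed, noise_psd, frame=512, hop=256, floor=0.1):
    """Apply the Wiener gain G = PSD_speech / (PSD_speech + PSD_noise)
    frame by frame; `noise_psd` must have length frame // 2 + 1."""
    window = np.hanning(frame)   # Hann at 50 % overlap sums to one, so plain
                                 # overlap-add reconstructs the signal
    out = np.zeros(len(beamformed))
    for start in range(0, len(beamformed) - frame + 1, hop):
        spec = np.fft.rfft(beamformed[start:start + frame] * window)
        speech_psd = np.maximum(np.abs(spec) ** 2 - noise_psd, 0.0)
        gain = np.maximum(speech_psd / (speech_psd + noise_psd + 1e-12), floor)
        out[start:start + frame] += np.fft.irfft(gain * spec)
    return out
```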
Then, the use of microphone arrays for specific speech recognition purposes in smart-room environments is investigated. It is shown that conventional microphone array based speech recognition, consisting of two independent stages, does not provide a significant improvement over single-microphone approaches, especially if the recognizer is adapted to the actual acoustic environmental conditions. The thesis points out that speech recognition needs to incorporate information about microphone array beamforming, or alternatively, that beamformers need to incorporate speech recognition information. Concretely, it is proposed to use microphone array beamformed data for acoustic model construction in order to obtain more benefit from microphone arrays. The results obtained with the proposed adaptation scheme based on beamformed enrollment data show a remarkable improvement for a speaker-dependent recognition system, while only a limited improvement is achieved for a speaker-independent system, partially due to the use of simulated microphone array data.
On the other hand, a common limitation of microphone array processing is that a reliable estimate of the speaker position is needed to steer the beamformer correctly towards the position of interest. Additionally, knowledge of the location of the audio sources present in a room can be exploited by other smart-room services, such as automatic camera steering in video-conference applications. Fortunately, audio source tracking can be addressed on the basis of multi-microphone recordings by means of several different approaches.
In the thesis, a robust speaker tracking system is developed based on the successful state-of-the-art SRP-PHAT algorithm, which computes the likelihood of each potential source position from generalized cross-correlation estimates between pairs of microphones. The proposed system incorporates two main novelties. Firstly, cross-correlations are computed adaptively based on the estimated velocities of the sources; this adaptive computation minimizes the influence of the varying dynamics of the speakers in a room on overall localization performance. Secondly, an accelerated method for computing the source position is proposed, based on coarse-to-fine search strategies in both the spatial and frequency domains. It is shown that the relation between spatial resolution and cross-correlation bandwidth is of major importance in this kind of fast search strategy. Experimental assessment shows that the two novelties permit reasonably good tracking performance in relatively controlled environments with few non-overlapping speakers. Additionally, the remarkable results obtained by the proposed audio tracker in an international evaluation confirm the suitability of the algorithm developed.
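The generalized cross-correlation with phase transform (GCC-PHAT) referred to above is the pairwise measure on which SRP-PHAT accumulates its position likelihood. The following generic sketch estimates the time difference of arrival for one microphone pair; the interpolation factor and frame handling are illustrative choices, not the settings used in the thesis.

```python
import numpy as np

def gcc_phat(x, y, fs, interp=4, max_tau=None):
    """Return the TDOA estimate (seconds) between signals x and y."""
    n = len(x) + len(y)
    cross = np.fft.rfft(x, n=n) * np.conj(np.fft.rfft(y, n=n))
    # The PHAT weighting keeps only phase information, which makes the
    # correlation peak sharper and more robust to reverberation.
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=interp * n)
    max_shift = interp * n // 2
    if max_tau is not None:
        max_shift = min(int(interp * fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / float(interp * fs)

# SRP-PHAT scores a candidate position by summing, over all microphone pairs,
# the cross-correlation values at the delays that position would produce.
```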
Finally, in the context of developing novel technologies that can provide additional cues to the potential services deployed in smart-room environments, acoustic head orientation estimation based on multiple microphones is also investigated in the thesis. Two completely different approaches are proposed and compared. On the one hand, sophisticated methods based on the joint estimation of speaker position and orientation are shown to provide superior performance in exchange for large computational requirements. On the other hand, simple and computationally cheap approaches based on speech radiation considerations are suitable in some cases, such as when computational complexity is limited or when the source position is known beforehand. In both cases, the results obtained are encouraging for future research on new algorithms addressing the head orientation estimation problem.
APA, Harvard, Vancouver, ISO, and other styles
35

Žmolíková, Kateřina. "Far-Field Speech Recognition." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255331.

Full text
Abstract:
Speech recognition systems nowadays achieve fairly high accuracy. However, when speech is captured by a distant microphone and is therefore corrupted by noise and reverberation, recognition accuracy degrades considerably. This problem can be mitigated by using microphone arrays. This thesis deals with techniques for combining signals from multiple microphones so as to improve the quality of the resulting signal and, in turn, the recognition accuracy. The thesis first summarizes the theory of speech recognition and reviews the most widely used microphone array processing algorithms. It then demonstrates and analyses the results of two beamforming methods and of a multichannel dereverberation method. Finally, an alternative beamforming approach based on neural networks is explored.
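One of the conventional multichannel methods such far-field recognition front-ends typically rely on is the MVDR beamformer, which minimizes output noise power subject to a distortionless response in the target direction. The sketch below computes the weights for a single frequency bin; the covariance matrix and steering vector in the example are random placeholders, not data from the thesis.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """MVDR weights for one frequency bin: w = R^{-1} d / (d^H R^{-1} d)."""
    r_inv_d = np.linalg.solve(noise_cov, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)

# Example with 4 channels; diagonal loading keeps the inversion well conditioned.
rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 100)) + 1j * rng.standard_normal((4, 100))
R = frames @ frames.conj().T / 100 + 1e-3 * np.eye(4)
d = np.exp(-2j * np.pi * np.arange(4) * 0.1)   # hypothetical steering vector
w = mvdr_weights(R, d)
output_bin = w.conj() @ frames                 # beamformed bin across frames
```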
APA, Harvard, Vancouver, ISO, and other styles
36

Shaffer, Irena Marie. "Effects of Echolocation Calls on the Interactions of Bat Pairs using Transfer Entropy Analysis." Thesis, Virginia Tech, 2020. http://hdl.handle.net/10919/98672.

Full text
Abstract:
Many animal species, including many species of bats, exhibit collective behavior where groups of individuals coordinate their motion. Most bats are unique among these animals in that they use the active sensing mechanism of echolocation as their primary means of navigation. Due to their use of echolocation in large groups, bats run the risk of signal interference from sonar jamming. However, several species of bats have developed various strategies to prevent interference which may lead to different behavior when flying with conspecifics than when flying alone. This thesis seeks to explore the role of this sensing on the behavior of bat pairs flying together. Field data from a maternity colony of gray bats (Myotis grisescens) were collected using an array of cameras and microphones. These data were analyzed using the information theoretic measure of transfer entropy in order to quantify the interaction between pairs of bats and to determine the effect echolocation calls have on this interaction. Results show that there is evidence of information transfer in both the speed of the bats and their turning behavior, and that such evidence is absent when we consider their heading directions. Unidirectional information transfer was found in some subsets of the data which could be evidence of a leader-follower interaction.
Master of Science
Many animal species exhibit collective behavior where groups of animals coordinate their motion, as in flocking or schooling. Many species of bats also demonstrate this behavior. Bats are unique among these animals in that they use echolocation as their primary means of navigation. Bats produce ultrasonic pulses or calls and listen to the returning echo to "visualize" their environment. Bats using echolocation in large groups run the risk of other bat calls interfering with their ability to hear their own calls. They have developed various ways to prevent interference which may lead to different behavior when flying with other bats than when flying alone. Field data from a maternity colony of gray bats were collected using a system of cameras and microphones. These data were analyzed to quantify the interaction between pairs of bats and to determine the effect echolocation calls have on this interaction. Results show that there is evidence of information transfer about both the speed of the bats and their turning behavior. There was also evidence of a possible leader-follower interaction in some subsets of the data.
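Transfer entropy, the measure used in this analysis, quantifies how much the past of one time series reduces uncertainty about the next value of another. The following histogram-based estimator is a generic sketch with one-sample histories and quantile binning; it is not the pipeline used in the thesis, and the bin count is an illustrative assumption.

```python
import numpy as np

def transfer_entropy(source, target, n_bins=8):
    """Estimate TE(source -> target) in bits using one-sample histories."""
    def discretize(x):
        edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
        return np.digitize(x, edges)

    s, t = discretize(np.asarray(source)), discretize(np.asarray(target))
    x_next, x_past, y_past = t[1:], t[:-1], s[:-1]

    # Joint distribution over (x_{t+1}, x_t, y_t) and its marginals.
    joint, _ = np.histogramdd(np.column_stack([x_next, x_past, y_past]),
                              bins=(n_bins, n_bins, n_bins))
    p_xyz = joint / joint.sum()
    p_xy = p_xyz.sum(axis=2, keepdims=True)      # p(x_{t+1}, x_t)
    p_yz = p_xyz.sum(axis=0, keepdims=True)      # p(x_t, y_t)
    p_y = p_xyz.sum(axis=(0, 2), keepdims=True)  # p(x_t)

    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p_xyz > 0,
                         p_xyz * np.log2(p_xyz * p_y / (p_xy * p_yz)), 0.0)
    return float(np.nansum(terms))
```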
APA, Harvard, Vancouver, ISO, and other styles
37

Hoffman, Jeffrey Dean. "Using Blind Source Separation and a Compact Microphone Array to Improve the Error Rate of Speech Recognition." PDXScholar, 2016. https://pdxscholar.library.pdx.edu/open_access_etds/3367.

Full text
Abstract:
Automatic speech recognition has become a standard feature on many consumer electronics and automotive products, and the accuracy of the decoded speech has improved dramatically over time. Often, designers of these products achieve accuracy by employing microphone arrays and beamforming algorithms to reduce interference. However, beamforming microphone arrays are too large for small form factor products such as smart watches. Yet these small form factor products, which have precious little space for tactile user input (i.e. knobs, buttons and touch screens), would benefit immensely from a user interface based on reliably accurate automatic speech recognition. This thesis proposes a solution for interference mitigation that employs blind source separation with a compact array of commercially available unidirectional microphone elements. Such an array provides adequate spatial diversity to enable blind source separation and would easily fit in a smart watch or similar small form factor product. The solution is characterized using publicly available speech audio clips recorded for the purpose of testing automatic speech recognition algorithms. The proposal is modelled in different interference environments and the efficacy of the solution is evaluated. Factors affecting the performance of the solution are identified and their influence quantified. An expectation is presented for the quality of separation as well as the resulting improvement in word error rate that can be achieved from decoding the separated speech estimate versus the mixture obtained from a single unidirectional microphone element. Finally, directions for future work are proposed, which have the potential to improve the performance of the solution thereby making it a commercially viable product.
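As a toy illustration of blind source separation on a two-element recording, the sketch below unmixes an instantaneous synthetic mixture with FastICA. Real acoustic mixtures are convolutive, so a practical system would operate per frequency band; the mixing matrix and source signals here are placeholders, not the speech material used in the thesis.

```python
import numpy as np
from sklearn.decomposition import FastICA

fs = 16000
t = np.arange(2 * fs) / fs
sources = np.vstack([np.sin(2 * np.pi * 220 * t),          # stand-in for speech
                     np.sign(np.sin(2 * np.pi * 3 * t))])   # stand-in for interference
mixing = np.array([[1.0, 0.6],
                   [0.4, 1.0]])                              # hypothetical mixing matrix
mixture = mixing @ sources                                   # (2 microphones, n samples)

# FastICA recovers the sources up to permutation and scaling.
estimates = FastICA(n_components=2, random_state=0).fit_transform(mixture.T).T
```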
APA, Harvard, Vancouver, ISO, and other styles
38

Rasumow, Eugen [Verfasser], Simon [Akademischer Betreuer] Doclo, Matthias [Akademischer Betreuer] Blau, and Dorte [Akademischer Betreuer] Hammershoi. "Synthetic reproduction of head-related transfer functions by using microphone arrays / Eugen Rasumow. Betreuer: Simon Doclo ; Matthias Blau ; Dorte Hammershoi." Oldenburg : BIS der Universität Oldenburg, 2015. http://d-nb.info/1071947257/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Bernschütz, Benjamin [Verfasser], Stefan [Akademischer Betreuer] Weinzierl, Stefan [Gutachter] Weinzierl, Christoph [Gutachter] Pörschmann, and Sascha [Gutachter] Spors. "Microphone arrays and sound field decomposition for dynamic binaural recording / Benjamin Bernschütz ; Gutachter: Stefan Weinzierl, Christoph Pörschmann, Sascha Spors ; Betreuer: Stefan Weinzierl." Berlin : Technische Universität Berlin, 2016. http://d-nb.info/1156013852/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Kern, Alexander Marco. "Quantification of the performance of 3D sound field reconstruction algorithms using high-density loudspeaker arrays and 3rd order sound field microphone measurements." Thesis, Virginia Tech, 2017. http://hdl.handle.net/10919/77516.

Full text
Abstract:
The development and improvement of 3D immersive audio is gaining momentum through the growing interest in virtual reality. Possible applications range from recreating real-world environments to immersive concerts and performances and to exploiting big data acoustically. Several measures can be taken to improve the immersive experience; recording the sound field, spatialization, and the development of the loudspeaker arrays are among the greatest challenges. This thesis explores these challenges. First, it gives a short introduction to 3D audio and a review of the state of the art in technology and research. Next, it introduces 3D loudspeaker arrays and describes the systems used during this research. Furthermore, the development of a new 16-element 3rd-order sound field microphone is described. Afterwards, different spatial audio algorithms such as higher-order ambisonics, wave field synthesis and vector base amplitude panning are described, analyzed and compared. For each spatialization algorithm, the quality of sound field reproduction is quantified using listener perception tests for clarity and sound source localization.
Master of Science
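Of the spatialization methods compared in this work, vector base amplitude panning (VBAP) is the simplest to illustrate: the gains for the loudspeaker pair bracketing the source are obtained by inverting the matrix of loudspeaker direction vectors. The 2D sketch below is generic, and the angles in the example are illustrative, not the array layout used in the thesis.

```python
import numpy as np

def vbap_2d(source_az_deg, speaker_az_deg):
    """Return power-normalized gains for a loudspeaker pair bracketing the source."""
    unit = lambda a: np.array([np.cos(np.radians(a)), np.sin(np.radians(a))])
    p = unit(source_az_deg)                          # desired panning direction
    L = np.column_stack([unit(a) for a in speaker_az_deg])
    g = np.linalg.solve(L, p)                        # solve L @ g = p
    return g / np.linalg.norm(g)                     # constant-power normalization

gains = vbap_2d(20.0, [0.0, 45.0])                   # source between the two speakers
```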
APA, Harvard, Vancouver, ISO, and other styles
41

Townsend, Phil. "Enhancements to the Generalized Sidelobe Canceller for Audio Beamforming in an Immersive Environment." UKnowledge, 2009. http://uknowledge.uky.edu/gradschool_theses/645.

Full text
Abstract:
The Generalized Sidelobe Canceller is an adaptive algorithm for optimally estimating the parameters for beamforming, the signal processing technique of combining data from an array of sensors to improve SNR at a point in space. This work focuses on the algorithm’s application to widely-separated microphone arrays with irregular distributions used for human voice capture. Methods are presented for improving the performance of the algorithm’s blocking matrix, a stage that creates a noise reference for elimination, by proposing a stochastic model for amplitude correction and enhanced use of cross correlation for phase correction and time-difference of arrival estimation via a correlation coefficient threshold. This correlation technique is also applied to a multilateration algorithm for an efficient method of explicit target tracking. In addition, the underlying microphone array geometry is studied with parameters and guidelines for evaluation proposed. Finally, an analysis of the stability of the system is performed with respect to its adaptation parameters.
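To make the structure concrete, the sketch below implements a Griffiths-Jim style GSC: a fixed delay-and-sum branch, a blocking matrix formed by differencing adjacent channels so the target cancels out, and an NLMS multichannel canceller that removes the remaining noise. It assumes the channels are already time-aligned on the target; the filter length and step size are illustrative, and none of the enhancements proposed in the thesis (amplitude correction, correlation thresholding) are shown.

```python
import numpy as np

def gsc(aligned, n_taps=32, mu=0.1):
    """aligned: (n_mics, n_samples) target-aligned signals. Returns enhanced output."""
    n_mics, n_samples = aligned.shape
    fixed = aligned.mean(axis=0)                   # fixed (delay-and-sum) branch
    blocked = np.diff(aligned, axis=0)             # blocking matrix output: target cancels
    weights = np.zeros((n_mics - 1, n_taps))
    out = np.zeros(n_samples)
    for n in range(n_taps, n_samples):
        frame = blocked[:, n - n_taps:n][:, ::-1]  # most recent samples first
        out[n] = fixed[n] - np.sum(weights * frame)
        weights += (mu / (np.sum(frame ** 2) + 1e-8)) * out[n] * frame  # NLMS step
    return out
```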
APA, Harvard, Vancouver, ISO, and other styles
42

Otsuka, Takuma. "Bayesian Microphone Array Processing." 京都大学 (Kyoto University), 2014. http://hdl.handle.net/2433/188871.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Cho, Jaeyoun. "Speech enhancement using microphone array." Columbus, Ohio : Ohio State University, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1132239060.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Bakir, Tariq Saad. "Blind adaptive dereverberation of speech signals using a microphone array." Diss., Available online, Georgia Institute of Technology, 2004:, 2004. http://etd.gatech.edu/theses/available/etd-06072004-131047/unrestricted/bakir%5Ftariq%5Fs%5F200405%5Fphd.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Yu, Jingjing. "MICROPHONE ARRAY OPTIMIZATION IN IMMERSIVE ENVIRONMENTS." UKnowledge, 2013. http://uknowledge.uky.edu/ece_etds/19.

Full text
Abstract:
The complex relationship between array gain patterns and microphone distributions limits the application of traditional optimization algorithms on irregular arrays, which show enhanced beamforming performance for human speech capture in immersive environments. This work analyzes the relationship between irregular microphone geometries and spatial filtering performance with statistical methods. Novel geometry descriptors are developed to capture the properties of irregular microphone distributions showing their impact on array performance. General guidelines and optimization methods for regular and irregular array design are proposed in immersive (near-field) environments to obtain superior beamforming ability for speech applications. Optimization times are greatly reduced through the objective function rules using performance-based geometric descriptions of microphone distributions that circumvent direct array gain computations over the space of interest. In addition, probabilistic descriptions of acoustic scenes are introduced to incorporate various levels of prior knowledge for the source distribution. To verify the effectiveness of the proposed optimization methods, simulated gain patterns and real SNR results of the optimized arrays are compared to corresponding traditional regular arrays and arrays obtained from direct exhaustive searching methods. Results show large SNR enhancements for the optimized arrays over arbitrary randomly generated arrays and regular arrays, especially at low microphone densities. The rapid convergence and acceptable processing times observed during the experiments establish the feasibility of proposed optimization methods for array geometry design in immersive environments where rapid deployment is required with limited knowledge of the acoustic scene, such as in mobile platforms and audio surveillance applications.
APA, Harvard, Vancouver, ISO, and other styles
46

Hill, Jeffrey R. "Development of a Weatherproof Windscreen for a Microphone Array." Diss., CLICK HERE for online access, 2005. http://contentdm.lib.byu.edu/ETD/image/etd948.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Furnon, Nicolas. "Apprentissage profond pour le rehaussement de la parole dans les antennes acoustiques ad-hoc." Electronic Thesis or Diss., Université de Lorraine, 2021. http://www.theses.fr/2021LORR0277.

Full text
Abstract:
More and more devices we use in our daily life are embedded with one or more microphones so that they can be voice controlled. Put together, these devices can form a so-called ad-hoc microphone array (AHMA). A speech enhancement step is often applied to the recorded signals to optimise the execution of the voice commands. To this effect, AHMAs are of high interest because of their flexible usage, their wide spatial coverage and the diversity of their recordings. However, it is challenging to exploit the potential of AHMAs because the devices that compose them may move and have limited power and bandwidth capacity. Because of these limits, the speech enhancement solutions deployed in "classic" microphone arrays, which rely on a fusion center and high processing loads, cannot be afforded. This thesis combines the modelling power of deep neural networks (DNNs) with the flexibility of use of AHMAs. To this end, we introduce a distributed speech enhancement system, which does not rely on a fusion center. So-called compressed signals are sent among the nodes and convey the spatial information recorded by the whole AHMA, while reducing the bandwidth requirements. DNNs are used to estimate the coefficients of a multichannel Wiener filter. We conduct an empirical analysis of this system, both on synthesized and real data, in order to validate its efficiency and to highlight the benefits of jointly using DNNs and distributed speech enhancement algorithms. We show that our system performs on par with a state-of-the-art solution, while being more flexible and significantly reducing the computation cost. Besides, we develop our solution to adapt it to the typical usage conditions of AHMAs. We study its behaviour when the number of devices in the AHMA varies. We introduce and compare a spatial attention mechanism and a self-attention mechanism. Both mechanisms make our system robust to a varying number of devices. We show that the weights of the self-attention mechanism reveal the utility of the information carried by each signal. We also analyse our system when the signals recorded by different devices are not synchronised. We propose a solution to improve its performance in such conditions by introducing a temporal attention mechanism. We show that this mechanism can help estimate the sampling time offset between the several devices of the AHMA. Lastly, we show that our system is also efficient for source separation. It can efficiently process the spatial information recorded by the whole AHMA in a typical meeting scenario and alleviates the need for a complex DNN architecture.
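A widely used way of letting a neural network drive a multichannel Wiener filter is to have the network output a time-frequency speech mask from which the spatial covariance matrices are estimated. The sketch below shows this for a single frequency bin; it is a generic mask-based MWF, not the distributed node-specific filtering scheme of the thesis, and the reference channel and array shapes are assumptions.

```python
import numpy as np

def mask_based_mwf(stft_bin, speech_mask, ref_channel=0):
    """stft_bin: (n_mics, n_frames) complex STFT values of one frequency bin.
    speech_mask: (n_frames,) speech-presence values in [0, 1] from a DNN."""
    noise_mask = 1.0 - speech_mask
    # Mask-weighted spatial covariance estimates of speech and noise.
    phi_s = (stft_bin * speech_mask) @ stft_bin.conj().T / (speech_mask.sum() + 1e-8)
    phi_n = (stft_bin * noise_mask) @ stft_bin.conj().T / (noise_mask.sum() + 1e-8)
    # Multichannel Wiener filter: w = (Phi_s + Phi_n)^{-1} Phi_s e_ref.
    e_ref = np.zeros(stft_bin.shape[0]); e_ref[ref_channel] = 1.0
    mix_cov = phi_s + phi_n + 1e-6 * np.eye(stft_bin.shape[0])
    w = np.linalg.solve(mix_cov, phi_s @ e_ref)
    return w.conj() @ stft_bin                     # (n_frames,) enhanced bin
```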
APA, Harvard, Vancouver, ISO, and other styles
48

Legg, Mathew. "Microphone phased array 3D beamforming and deconvolution." Thesis, University of Auckland, 2012. http://hdl.handle.net/2292/17820.

Full text
Abstract:
Microphone phased arrays are used to generate acoustic maps showing the position and magnitude of sound sources. Deconvolution of these acoustic maps, which are generated using beamforming, is commonly performed to remove sidelobe artifacts so that the position and magnitude of the sound source distribution can be described accurately. Traditionally, beamforming and deconvolution have used a 2D scanning surface oriented perpendicular to the array axis, but errors can arise when imaging 3D objects. The work in this thesis investigates the use of a deconvolution algorithm for 3D beamformed maps and compares the results to those obtained using traditional 2D acoustic scanning surfaces. Microphone phased array hardware and 3D objects were designed and built. Acoustic maps were generated by attaching mini speakers onto the surface of an object and performing beamforming and deconvolution for both traditional 2D scanning surfaces and 3D scanning surfaces corresponding to the 3D surface geometry of the object. The 3D surface geometry was obtained using computer vision techniques. For more complex objects, or where no CAD model of the object existed, structured light scanning was used to obtain an accurate scan of the 3D surface of the object. The scan points obtained using the above two methods were in the reference frame of the primary optical camera in the array. To enable these scan points to be used for beamforming and deconvolution, a microphone position calibration technique was developed which automatically found the coordinates of the microphones, in the reference frame of the primary camera, using computer vision techniques and audio time-of-flight measurements. This technique was extended to enable dense point clouds of experimental deconvolution errors to be obtained automatically as a function of the frequency and location of the sound sources. These point clouds were used to analyse the deconvolution errors for 3D and traditional 2D scanning surfaces. The data obtained showed that using the 3D scanning surface corresponding to the surface geometry of the object gave more accurate sound pressure levels and, at higher frequencies, more accurate positioning of sound sources than the 2D case.
APA, Harvard, Vancouver, ISO, and other styles
49

Zeng, Qingning. "Speech enhancement using a small microphone array." Thesis, University of Auckland, 2010. http://hdl.handle.net/2292/5690.

Full text
Abstract:
Microphone array based speech enhancement has wide applications. However, a large array aperture may greatly limit those applications, so speech enhancement based on small microphone arrays is of great value, yet remains a challenging objective. In this thesis, several algorithms and methods for small microphone array based speech enhancement are first proposed, and then two main algorithms that synthesize the proposed algorithms and methods are presented. Firstly, the Multichannel Crosstalk Resistant Adaptive Noise Cancellation (MCRANC) algorithm is proposed. The algorithm employs only two adaptive filters and has good stability and low computational complexity. Secondly, three combinations of MCRANC with other existing algorithms are presented: the cascade of MCRANC with improved spectral subtraction, the combination with delay-and-sum beamforming, and the combination with Wiener post-filtering. These combined algorithms may achieve better results than any single algorithm alone. Thirdly, four improvements are made to MCRANC itself: improving its second-stage filter with multichannel distorted-signal filtering, employing multiple sampling rates for the array signals, adding fixed beamforming with partial-channel processing, and introducing subband processing. Fourthly, two improved MGSC algorithms based on multichannel crosstalk resistant adaptive signal cancellation (MCRASC) are proposed. One uses every channel of the array signal as the main channel signal, with the others as reference signals, for an ACRASC to obtain the noise estimates; the other obtains the noise estimates by establishing a shared distorted speech signal. It is proved that the essence of the two proposed improved MGSC algorithms is to extend the common blocking matrix to a time-variable vector blocking matrix. Finally, two hybrid algorithms that employ several of the above-mentioned algorithms and methods are presented. Both can be used in different environments and both are suitable for real-time implementation. For all of the algorithms, simulations and experiments are carried out to verify their effectiveness.
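The adaptive noise cancellation structure that MCRANC builds on can be illustrated with the classic two-channel canceller below, where an NLMS filter estimates the noise leaking into the primary (speech-plus-noise) channel from a noise reference. This is a generic sketch; the filter length and step size are illustrative, and the crosstalk-resistance mechanisms of the thesis are not shown.

```python
import numpy as np

def nlms_noise_canceller(primary, reference, n_taps=64, mu=0.1):
    """Subtract an adaptively filtered noise reference from the primary channel."""
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for n in range(n_taps, len(primary)):
        u = reference[n - n_taps:n][::-1]          # most recent reference samples first
        out[n] = primary[n] - w @ u                # error signal = enhanced speech estimate
        w += (mu / (u @ u + 1e-8)) * out[n] * u    # NLMS weight update
    return out
```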
APA, Harvard, Vancouver, ISO, and other styles
50

Greenberg, Julie Elise. "Improved design of microphone-array hearing aids." Thesis, Massachusetts Institute of Technology, 1994. http://hdl.handle.net/1721.1/11631.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Whitaker College of Health Sciences and Technology, 1994.
Includes bibliographical references (p. 197-204).
by Julie Elise Greenberg.
Ph.D.
APA, Harvard, Vancouver, ISO, and other styles
