Dissertations / Theses on the topic 'Microphone arrays'
Consult the top 50 dissertations / theses for your research on the topic 'Microphone arrays.'
Gillett, Philip Winslow. "Head Mounted Microphone Arrays." Diss., Virginia Tech, 2009. http://hdl.handle.net/10919/28867.
Ph. D.
Barnes, Hugh. "Speech enhancement using microphone arrays." Thesis, Imperial College London, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.409581.
Lustberg, Robert Jack. "Acoustic beamforming using microphone arrays." Thesis, Massachusetts Institute of Technology, 1993. http://hdl.handle.net/1721.1/12338.
Includes bibliographical references (leaves 71-72).
by Robert Jack Lustberg.
M.S.
Mošner, Ladislav. "Microphone Arrays for Speaker Recognition." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-363803.
Moore, Darren C. "Speech enhancement using microphone arrays." Thesis, Queensland University of Technology, 2000. https://eprints.qut.edu.au/36141/1/36141_Moore_2000.pdf.
Ryan, James G. "Near-field beamforming using microphone arrays." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape8/PQDD_0015/NQ48335.pdf.
Goh, Boon Aik. "Adaptive subband beamforming for microphone arrays." Thesis, University of Leeds, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.424021.
Cardoso, Clara Ferreira. "Signal processing for circular microphone arrays." Thesis, University of Southampton, 2007. https://eprints.soton.ac.uk/421465/.
Full textMcCowan, Iain A. "Robust speech recognition using microphone arrays." Thesis, Queensland University of Technology, 2001.
Hua, Thanh Phong. "Adaptation mode controllers for adaptive microphone arrays." Rennes 1, 2006. http://www.theses.fr/2006REN1S136.
Himawan, Ivan. "Speech recognition using ad-hoc microphone arrays." Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/34461/1/Ivan_Himawan_Thesis.pdf.
Varada, Vijay K. "Acoustic Localization Employing Polar Directivity Patterns of Bidirectional Microphones Enabling Minimum Aperture Microphone Arrays." University of Toledo / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1290118825.
Hughes, Ashley. "Acoustic source localisation and tracking using microphone arrays." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/19524.
Kuntz, Achim. "Wave field analysis using virtual circular microphone arrays." München Verl. Dr. Hut, 2008. http://d-nb.info/993260292/04.
Cohen, Zachary Gideon. "Noise Reduction with Microphone Arrays for Speaker Identification." DigitalCommons@CalPoly, 2012. https://digitalcommons.calpoly.edu/theses/884.
Chakraborty, Rupayan. "Acoustic event detection and localization using distributed microphone arrays." Doctoral thesis, Universitat Politècnica de Catalunya, 2013. http://hdl.handle.net/10803/134364.
Huang, Yiteng (Arden). "Real-time acoustic source localization with passive microphone arrays." Diss., Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/15024.
Scharrer, Roman [Verfasser]. "Acoustic field analysis in small microphone arrays / Roman Scharrer." Aachen : Hochschulbibliothek der Rheinisch-Westfälischen Technischen Hochschule Aachen, 2014. http://d-nb.info/1050618939/34.
Tontiwattanakul, Khemapat. "Signal processing for microphone arrays with novel geometrical design." Thesis, University of Southampton, 2016. https://eprints.soton.ac.uk/400599/.
Allred, Daniel Jackson. "Evaluation and Comparison of Beamforming Algorithms for Microphone Array Speech Processing." Thesis, Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/11606.
Jasti, Srichandana. "Design of randomly placed microphone array." Birmingham, Ala. : University of Alabama at Birmingham, 2006. http://www.mhsl.uab.edu/dt/2006m/jasti.pdf.
Roper, Simon Edward. "A room acoustics measurement system using non-invasive microphone arrays." Thesis, University of Birmingham, 2010. http://etheses.bham.ac.uk//id/eprint/891/.
Achi, Peter Y. "Speech Enhancement Techniques for Large Space Habitats Using Microphone Arrays." Thesis, University of Louisiana at Lafayette, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10813016.
The astronauts' ability to communicate easily among themselves or with the ship's computer should be a high priority for the success of missions. Long-duration space habitats--whether spaceships or surface bases--will likely be larger than present-day Earth-to-orbit/Moon transfer ships. Hence an efficient approach would be to free the crew members from the relative burden of having to wear headsets throughout the spacecraft. This can be achieved by placing microphone arrays in all crew-accessible parts of the habitat. Processing algorithms would first localize the speaker and then perform speech enhancement. The background "noise" in a spacecraft is typically fan and duct noise (hum, drone), valve opening/closing (click, hiss), pumps, etc. We simulate such interfering sources by a number of loudspeakers broadcasting various sounds: real ISS sounds, a continuous radio stream, and a poem read by one author. To test the concept, we use a linear 30-microphone array driven by a zero-latency professional audio interface. Speaker localization is obtained by time-domain processing. To enhance the speech-to-noise ratio, a frequency-domain minimum-variance approach is used.
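The abstract above does not say which time-domain method is used for speaker localization. As an illustration only, the sketch below shows one common time-domain idea: cross-correlate two microphone signals to find the inter-microphone delay, then convert that delay to a far-field arrival angle. The function names, the 343 m/s speed of sound, and the two-microphone setup are assumptions made for this example and are not taken from the thesis.

```python
import math

def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) at which sig best matches ref.

    A positive result means sig is a delayed copy of ref.
    Brute-force time-domain cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):
                score += r * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_to_angle(lag, fs, spacing, c=343.0):
    """Convert a lag between two microphones to a far-field angle in degrees.

    fs is the sample rate in Hz, spacing the microphone distance in metres,
    c an assumed speed of sound in m/s. 0 degrees is broadside to the pair.
    """
    s = max(-1.0, min(1.0, lag / fs * c / spacing))  # clamp to asin's domain
    return math.degrees(math.asin(s))

# Hypothetical toy signals: the second microphone hears the pulse 3 samples later.
mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
lag = estimate_delay(mic_a, mic_b, max_lag=5)   # -> 3
angle = delay_to_angle(lag, fs=48000, spacing=0.05)
```

With a 30-microphone array, as in the abstract, such pairwise delays would normally be combined across many pairs; that step is left out here.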
Morgan, Joshua P. "Time-Frequency Masking Performance for Improved Intelligibility with Microphone Arrays." UKnowledge, 2017. http://uknowledge.uky.edu/ece_etds/101.
Noohi, Tahereh. "Sound Field Decomposition with Spherical Microphone Arrays Using Sparse Recovery Techniques." Thesis, The University of Sydney, 2016. http://hdl.handle.net/2123/16102.
Hart, Patrick Hammel. "FPAA realization of a controlled directional microphone." Diss., Online access via UMI:, 2009.
Includes bibliographical references.
Abhayapala, P. Thushara D. "Modal Analysis and Synthesis of Broadband Nearfield Beamforming Arrays." The Australian National University. Telecommunications Engineering Group, 2000. http://thesis.anu.edu.au./public/adt-ANU20010905.121231.
Teutsch, Heinz. "Wavefield decomposition using microphone arrays and its application to acoustic scene analysis." [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=97902806X.
Matzumoto, Andres Esteban Perez. "A study of microphone arrays for the location of vibrational sound sources." Thesis, University of Southampton, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.305576.
Unnikrishnan, Harikrishnan. "AUDIO SCENE SEGEMENTATION USING A MICROPHONE ARRAY AND AUDITORY FEATURES." UKnowledge, 2010. http://uknowledge.uky.edu/gradschool_theses/622.
Massé, Pierre. "Analysis, Treatment, and Manipulation Methods for Spatial Room Impulse Responses Measured with Spherical Microphone Arrays." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS079.
The use of spatial room impulse responses (SRIR) for the reproduction of three-dimensional reverberation effects through multi-channel convolution over immersive surround-sound loudspeaker systems has become commonplace within the last few years, thanks in large part to the commercial availability of various spherical microphone arrays (SMA) as well as a constant increase in computing power. This use has in turn created a demand for analysis and treatment techniques not only capable of ensuring the faithful reproduction of the measured reverberation effect, but which could also be used to control various modifications of the SRIR in a more "creative" approach, as is often encountered in the production of immersive musical performances and installations. Within this context, the principal objective of the current thesis is the definition of a complete space-time-frequency framework for the analysis, treatment, and manipulation of SRIRs. The analysis tools should lead to an in-depth model allowing for measurements to first be treated with respect to their inherent limitations (measurement conditions, background noise, etc.), as well as offering the ability to modify different characteristics of the final reverberation effect described by the SRIR. These characteristics can be either completely objective, even physical, or otherwise informed by knowledge of human auditory perception with regard to room acoustics. The theoretical work in this research project is therefore presented in two main parts. First, the underlying SRIR signal model is described, heavily inspired by the historical approaches from the fields of artificial reverberation synthesis and SMA signal processing, while at the same time (incrementally) extending both.
The signal model is then used to define the analysis methods that form the core of the final framework; these focus particularly on (a) identifying the "mixing time" that defines the moment of transition between the early reflection and late reverberation regimes, (b) obtaining a space-time cartography of the early reflections, and (c) estimating the frequency- and direction-dependent properties of the late reverberation's exponential energy decay envelope. In order to account for the directional dependence of these properties, a procedure for generating directional SRIR representations (i.e. directional room impulse responses, DRIR) that guarantee the preservation of certain fundamental reverberation properties must also be defined. In the second part, the model parameters made explicit by the analysis methods are exploited in order to either treat (i.e. attempt to correct some of the inevitable limitations inherent to the SMA measurement process) or more creatively manipulate and modify the SRIR. Two treatment methods in particular are developed in this thesis: (1) a pre-analysis procedure acting directly on repeated exponential sweep method (ESM) SMA measurement signals in an attempt to simultaneously increase the resulting SRIR's signal-to-noise ratio (SNR) while reducing its vulnerability to non-stationary noise events, and (2) a post-analysis denoising technique based on replacing the SRIR's background noise floor with a resynthesized extrapolation of the late reverberation tail. The theoretical descriptions thus complete, the main analysis methods as well as the DRIR generation and the denoising treatment procedures are then subjected to a series of validation tests, wherein simulated SRIRs (or parts thereof) are used to evaluate the performance, discuss the limitations, and parameterize the implementation of the different techniques. 
These sub-studies allow each method to be individually verified, resulting in a comprehensive investigation into the inner workings of the analysis toolbox (as well as the denoising process). Finally, to provide a concluding overview of the complete analysis-treatment-manipulation framework, similar studies are carried out using examples of real-world [...]
Gergen, Sebastian [Verfasser], Rainer [Akademischer Betreuer] Martin, and Simon [Akademischer Betreuer] Doclo. "Classification of audio sources using ad-hoc microphone arrays / Sebastian Gergen. Gutachter: Rainer Martin ; Simon Doclo." Bochum : Ruhr-Universität Bochum, 2016. http://d-nb.info/1089006322/34.
Koutrouli, Eleni. "Low Complexity Beamformer structures for application in Hearing Aids." Thesis, Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-17612.
Abad, Gareta Alberto. "A multi-microphone approach to speech processing in a smart-room environment." Doctoral thesis, Universitat Politècnica de Catalunya, 2007. http://hdl.handle.net/10803/6906.
Recent advances in computer technology and in speech and language processing have made new forms of person-machine communication and computer assistance to human activities begin to appear feasible. In particular, interest in developing challenging new applications for indoor environments equipped with multiple multimodal sensors, also known as smart-rooms, has grown considerably.
It is well known that the quality of speech signals captured by microphones located several meters away from the speakers is severely degraded by acoustic noise and room reverberation. In the development of hands-free speech applications for smart-room environments, obtrusive sensors such as close-talking microphones are usually not allowed, and consequently speech technologies must operate on distant-talking recordings. In such conditions, speech technologies that usually perform reasonably well in environments free of noise and reverberation show a dramatic drop in performance.
This thesis investigates a multi-microphone approach to the problems introduced by far-field microphones in speech applications deployed in smart-rooms. Specifically, microphone array processing is investigated as a way to exploit the availability of multiple microphones in order to obtain enhanced speech signals. By appropriately combining the signals impinging on a microphone array, beamforming makes it possible to target specific spatial directions while rejecting others.
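To make the idea of "combining the signals" concrete, here is a minimal sketch of the simplest fixed beamformer, delay-and-sum. This is not the adaptive scheme proposed in the thesis; it is a simpler, swapped-in technique, and the function name, the integer-sample delays, and the toy signals are assumptions made for illustration.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average the result.

    channels: list of equal-length sample lists, one per microphone.
    delays:   arrival delay of the target at each microphone, in samples
              (non-negative integers, assumed known from the geometry).
    """
    n = len(channels[0])
    out = [0.0] * n
    for x, d in zip(channels, delays):
        for t in range(n - d):
            out[t] += x[t + d]  # advance the channel by d samples
    return [v / len(channels) for v in out]

# Hypothetical example: the same pulse reaches the second microphone 2 samples later.
mic0 = [1, 2, 3, 0, 0, 0]
mic1 = [0, 0, 1, 2, 3, 0]
steered = delay_and_sum([mic0, mic1], delays=[0, 2])
# steered == [1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
```

Signals arriving from the steered direction add coherently after alignment, while signals from other directions are misaligned and partially average out. Adaptive beamformers such as the one described in the abstract go further by adjusting the combination weights to the observed noise.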
A new robust beamforming scheme that integrates an adaptive beamformer and a Wiener post-filter in a single stage is proposed for speech enhancement. Experimental results show that the proposed beamformer is an appropriate solution for high noise environments and that it is preferable to conventional post-filtering of the output of an adaptive beamformer. However, the beamformer introduces some distortion to the speech signal that can affect its usefulness for speech recognition applications, particularly in low noise conditions.
Then, the use of microphone arrays for specific speech recognition purposes in smart-room environments is investigated. It is shown that conventional microphone array based speech recognition, consisting of two independent stages, does not provide a significant improvement with respect to single microphone approaches, especially if the recognizer is adapted to the actual acoustic environmental conditions. In the thesis, it is pointed out that speech recognition needs to incorporate information about microphone array beamformers, or otherwise, beamformers need to incorporate speech recognition information. Specifically, it is proposed to use microphone array beamformed data for acoustic model construction in order to benefit more from microphone arrays. The proposed adaptation scheme with beamformed enrollment data yields a remarkable improvement in a speaker-dependent recognition system, while only a limited improvement is achieved in a speaker-independent recognition system, partly due to the use of simulated microphone array data.
On the other hand, a common limitation of microphone array processing is that a reliable estimate of the speaker position is needed to steer the beamformer correctly towards the position of interest. Additionally, knowledge about the location of the audio sources present in a room is information that can be exploited by other smart-room services, such as automatic video steering in conference applications. Fortunately, audio source tracking can be solved on the basis of multiple microphone captures by means of several different approaches.
In the thesis, a robust speaker tracking system is developed based on the successful state-of-the-art SRP-PHAT algorithm, which computes the likelihood of each potential source position on the basis of the generalized cross-correlation estimates between pairs of microphones. The proposed system mainly incorporates two novelties. Firstly, cross-correlations are adaptively computed based on the estimated velocities of the sources. The adaptive computation minimizes the influence of the varying dynamics of the speakers present in a room on the overall localization performance. Secondly, an accelerated method for the computation of the source position based on coarse-to-fine search strategies in both the spatial and frequency dimensions is proposed. It is shown that the relation between spatial resolution and cross-correlation bandwidth is a matter of major importance in this kind of fast search strategy. Experimental assessment shows that the two novelties introduced achieve a reasonably good tracking performance in relatively controlled environments with few non-overlapping speakers. Additionally, the remarkable results obtained by the proposed audio tracker in an international evaluation confirm the suitability of the algorithm developed.
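The generalized cross-correlation with phase transform (GCC-PHAT) that SRP-PHAT builds on can be sketched for a single microphone pair as follows. This is a generic textbook form, not the adaptive variant developed in the thesis; the naive DFT, the function names, and the toy signals are assumptions chosen so the example needs only the standard library.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def gcc_phat(ref, sig, max_lag):
    """Estimate the delay of sig relative to ref (in samples) with GCC-PHAT."""
    n = len(ref) + len(sig)  # zero-pad so the correlation does not wrap around
    X = dft(list(ref) + [0.0] * (n - len(ref)))
    Y = dft(list(sig) + [0.0] * (n - len(sig)))
    cross = []
    for a, b in zip(X, Y):
        c = b * a.conjugate()
        mag = abs(c)
        # PHAT weighting: keep only the phase of the cross-spectrum
        cross.append(c / mag if mag > 1e-12 else 0j)
    r = dft(cross, inverse=True)  # r[lag % n] is the correlation at that lag
    return max(range(-max_lag, max_lag + 1), key=lambda lag: r[lag % n].real)

# Hypothetical toy signals: sig is ref delayed by 3 samples.
ref = [1.0, -0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
sig = [0.0, 0.0, 0.0, 1.0, -0.5, 0.25, 0.0, 0.0]
delay = gcc_phat(ref, sig, max_lag=5)   # -> 3
```

An SRP-PHAT style search would then, for each candidate source position, sum these correlation values over all microphone pairs at the lags that position implies, and pick the position with the largest total.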
Finally, in the context of the development of novel technologies that can provide additional information cues to the potential services deployed in smart-room environments, acoustic head orientation estimation based on multiple microphones is also investigated in the thesis. Two completely different approaches are proposed and compared. On the one hand, sophisticated methods based on the joint estimation of speaker position and orientation are shown to provide superior performance in exchange for large computational requirements. On the other hand, simple and computationally cheap approaches based on speech radiation considerations are suitable in some cases, such as when computational complexity is limited or when the source position is known beforehand. In both cases, the results obtained are encouraging for future research on new algorithms for the head orientation estimation problem.
Žmolíková, Kateřina. "Far-Field Speech Recognition." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255331.
Shaffer, Irena Marie. "Effects of Echolocation Calls on the Interactions of Bat Pairs using Transfer Entropy Analysis." Thesis, Virginia Tech, 2020. http://hdl.handle.net/10919/98672.
Master of Science
Many animal species exhibit collective behavior where groups of animals coordinate their motion, as in flocking or schooling. Many species of bats also demonstrate this behavior. Bats are unique among these animals in that they use echolocation as their primary means of navigation. Bats produce ultrasonic pulses or calls and listen to the returning echo to "visualize" their environment. Bats using echolocation in large groups run the risk of other bat calls interfering with their ability to hear their own calls. They have developed various ways to prevent interference which may lead to different behavior when flying with other bats than when flying alone. Field data from a maternity colony of gray bats were collected using a system of cameras and microphones. These data were analyzed to quantify the interaction between pairs of bats and to determine the effect echolocation calls have on this interaction. Results show that there is evidence of information transfer about both the speed of the bats and their turning behavior. There was also evidence of a possible leader-follower interaction in some subsets of the data.
Hoffman, Jeffrey Dean. "Using Blind Source Separation and a Compact Microphone Array to Improve the Error Rate of Speech Recognition." PDXScholar, 2016. https://pdxscholar.library.pdx.edu/open_access_etds/3367.
Rasumow, Eugen [Verfasser], Simon [Akademischer Betreuer] Doclo, Matthias [Akademischer Betreuer] Blau, and Dorte [Akademischer Betreuer] Hammershoi. "Synthetic reproduction of head-related transfer functions by using microphone arrays / Eugen Rasumow. Betreuer: Simon Doclo ; Matthias Blau ; Dorte Hammershoi." Oldenburg : BIS der Universität Oldenburg, 2015. http://d-nb.info/1071947257/34.
Bernschütz, Benjamin [Verfasser], Stefan [Akademischer Betreuer] Weinzierl, Stefan [Gutachter] Weinzierl, Christoph [Gutachter] Pörschmann, and Sascha [Gutachter] Spors. "Microphone arrays and sound field decomposition for dynamic binaural recording / Benjamin Bernschütz ; Gutachter: Stefan Weinzierl, Christoph Pörschmann, Sascha Spors ; Betreuer: Stefan Weinzierl." Berlin : Technische Universität Berlin, 2016. http://d-nb.info/1156013852/34.
Kern, Alexander Marco. "Quantification of the performance of 3D sound field reconstruction algorithms using high-density loudspeaker arrays and 3rd order sound field microphone measurements." Thesis, Virginia Tech, 2017. http://hdl.handle.net/10919/77516.
Master of Science
Townsend, Phil. "Enhancements to the Generalized Sidelobe Canceller for Audio Beamforming in an Immersive Environment." UKnowledge, 2009. http://uknowledge.uky.edu/gradschool_theses/645.
Otsuka, Takuma. "Bayesian Microphone Array Processing." 京都大学 (Kyoto University), 2014. http://hdl.handle.net/2433/188871.
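0048
New system, course-based doctorate
Doctor of Informatics
Kō No. 18412
Jōhaku No. 527
新制||情||93 (University Library)
31270
Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
(Chief examiner) Professor Hiroshi Okuno, Professor Tatsuya Kawahara, Associate Professor Marco Cuturi Cameto, Lecturer Kazuyoshi Yoshii
Pursuant to Article 4, Paragraph 1 of the Degree Regulations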
Cho, Jaeyoun. "Speech enhancement using microphone array." Columbus, Ohio : Ohio State University, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1132239060.
Bakir, Tariq Saad. "Blind adaptive dereverberation of speech signals using a microphone array." Diss., Georgia Institute of Technology, 2004. http://etd.gatech.edu/theses/available/etd-06072004-131047/unrestricted/bakir%5Ftariq%5Fs%5F200405%5Fphd.pdf.
Yu, Jingjing. "MICROPHONE ARRAY OPTIMIZATION IN IMMERSIVE ENVIRONMENTS." UKnowledge, 2013. http://uknowledge.uky.edu/ece_etds/19.
Hill, Jeffrey R. "Development of a Weatherproof Windscreen for a Microphone Array." Diss., 2005. http://contentdm.lib.byu.edu/ETD/image/etd948.pdf.
Furnon, Nicolas. "Apprentissage profond pour le rehaussement de la parole dans les antennes acoustiques ad-hoc." Electronic Thesis or Diss., Université de Lorraine, 2021. http://www.theses.fr/2021LORR0277.
More and more devices we use in our daily life are embedded with one or more microphones so that they can be voice controlled. Put together, these devices can form a so-called ad-hoc microphone array (AHMA). A speech enhancement step is often applied to the recorded signals to optimise the execution of the voice commands. To this end, AHMAs are of high interest because of their flexible usage, their wide spatial coverage and the diversity of their recordings. However, it is challenging to exploit the potential of AHMAs because the devices that compose them may move and have limited power and bandwidth capacity. Because of these limits, the speech enhancement solutions deployed in "classic" microphone arrays, which rely on a fusion center and high processing loads, cannot be afforded.
This thesis combines the modelling power of deep neural networks (DNNs) with the flexibility of use of AHMAs. To this end, we introduce a distributed speech enhancement system, which does not rely on a fusion center. So-called compressed signals are sent among the nodes and convey the spatial information recorded by the whole AHMA, while reducing the bandwidth requirements. DNNs are used to estimate the coefficients of a multichannel Wiener filter. We conduct an empirical analysis of this system, both on synthesized and real data, in order to validate its efficiency and to highlight the benefits of jointly using DNNs and distributed speech enhancement algorithms. We show that our system performs comparably to a state-of-the-art solution, while being more flexible and significantly reducing the computation cost.
In addition, we develop our solution to adapt it to the typical usage conditions of AHMAs. We study its behaviour when the number of devices in the AHMA varies. We introduce and compare a spatial attention mechanism and a self-attention mechanism. Both mechanisms make our system robust to a varying number of devices. We show that the weights of the self-attention mechanism reveal the utility of the information carried by each signal.
We also analyse our system when the signals recorded by different devices are not synchronised. We propose a solution to improve its performance in such conditions by introducing a temporal attention mechanism. We show that this mechanism can help estimate the sampling time offset between the devices of the AHMA.
Lastly, we show that our system is also efficient for source separation. It can efficiently process the spatial information recorded by the whole AHMA in a typical meeting scenario and alleviate the need for a complex DNN architecture.
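The multichannel Wiener filter mentioned in this abstract has a well-known closed form when the speech and noise spatial covariance matrices are available. The sketch below shows only that textbook centralized form for a single frequency bin. It is not the distributed, DNN-driven system of the thesis, and the names `mwf_weights` and `mwf_apply`, as well as the assumption that the covariances are already estimated, are illustrative choices.

```python
import numpy as np

def mwf_weights(R_s, R_n, ref=0):
    """Multichannel Wiener filter weights for one frequency bin.

    Hypothetical helper: R_s and R_n are (M, M) speech and noise spatial
    covariance matrices, assumed to be estimated elsewhere (for example
    from masks predicted by a DNN). ref selects the reference microphone.
    Returns w = (R_s + R_n)^{-1} R_s e_ref.
    """
    R_y = R_s + R_n                          # covariance of the noisy mixture
    return np.linalg.solve(R_y, R_s[:, ref])

def mwf_apply(w, y):
    """Estimate the speech at the reference microphone as w^H y."""
    return np.vdot(w, y)                     # vdot conjugates its first argument
```

With equal, uncorrelated speech and noise power on every channel, the filter simply halves the reference channel, which gives a quick sanity check of the formula.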
Legg, Mathew. "Microphone phased array 3D beamforming and deconvolution." Thesis, University of Auckland, 2012. http://hdl.handle.net/2292/17820.
Zeng, Qingning. "Speech enhancement using a small microphone array." Thesis, University of Auckland, 2010. http://hdl.handle.net/2292/5690.
Greenberg, Julie Elise. "Improved design of microphone-array hearing aids." Thesis, Massachusetts Institute of Technology, 1994. http://hdl.handle.net/1721.1/11631.
Includes bibliographical references (p. 197-204).
by Julie Elise Greenberg.
Ph.D.