Academic literature on the topic 'Multichannel audio'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multichannel audio.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Multichannel audio":

1. Ono, Kazuho. "2. Multichannel Audio." Journal of the Institute of Image Information and Television Engineers 68, no. 8 (2014): 604–7. http://dx.doi.org/10.3169/itej.68.604.

2. Holbrook, Kyle A., and Michael J. Yacavone. "Multichannel audio reproduction system." Journal of the Acoustical Society of America 82, no. 2 (August 1987): 728. http://dx.doi.org/10.1121/1.395373.

3. Emmett, John. "Metering for Multichannel Audio." SMPTE Journal 110, no. 8 (August 2001): 532–36. http://dx.doi.org/10.5594/j17765.

4. Zhu, Qiushi, Jie Zhang, Yu Gu, Yuchen Hu, and Lirong Dai. "Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (March 24, 2024): 19768–76. http://dx.doi.org/10.1609/aaai.v38i17.29951.

Abstract:
Self-supervised speech pre-training methods have developed rapidly in recent years and have proven very effective for many near-field single-channel speech tasks. However, far-field multichannel speech processing suffers from the scarcity of labeled multichannel data and complex ambient noise. The efficacy of self-supervised learning for far-field multichannel and multi-modal speech processing has not been well explored. Considering that visual information helps to improve speech recognition performance in noisy scenes, in this work we propose AV-wav2vec2, a multichannel multi-modal speech self-supervised learning framework that uses video and multichannel audio data as inputs. First, we propose a multi-path structure to process multichannel audio streams and a visual stream in parallel, with intra- and inter-channel contrastive losses as training targets to fully exploit the rich information in multichannel speech data. Second, based on contrastive learning, we use additional single-channel audio data, trained jointly, to improve the performance of the multichannel multi-modal representation. Finally, we use a Chinese multichannel multi-modal dataset recorded in real scenarios to validate the effectiveness of the proposed method on audio-visual speech recognition (AVSR), automatic speech recognition (ASR), visual speech recognition (VSR) and audio-visual speaker diarization (AVSD) tasks.
5. Martyniuk, Tetiana, Maksym Mykytiuk, and Mykola Zaitsev. "Features of Analysis of Multichannel Audio Signals." ГРААЛЬ НАУКИ, no. 2-3 (April 9, 2021): 302–5. http://dx.doi.org/10.36074/grail-of-science.02.04.2021.061.

Abstract:
The rapid growth of audio content has created a need for tools that analyse and control the quality of audio signals in software and hardware modules. The fastest-growing area is software and programming languages. The Python programming language currently offers the richest operational and visualisation capabilities for working with sound. When developing programs for computational signal analysis, it provides an optimal balance of high- and low-level programming features. Compared to Matlab or other similar solutions, Python is free and allows you to create standalone applications without the need for large, permanently installed files and a virtual environment.
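The suitability of Python for computational signal analysis claimed above can be pictured with a small standard-library-only sketch (not taken from the article; the function name and the interleaved-buffer layout are illustrative assumptions): per-channel RMS and peak levels for a multichannel buffer.

```python
import math

def channel_stats(frames, n_channels):
    """Per-channel RMS and peak from interleaved sample frames.

    `frames` is a flat list of interleaved samples, e.g. [L0, R0, L1, R1, ...].
    """
    channels = [frames[c::n_channels] for c in range(n_channels)]
    stats = []
    for ch in channels:
        rms = math.sqrt(sum(x * x for x in ch) / len(ch))
        peak = max(abs(x) for x in ch)
        stats.append({"rms": rms, "peak": peak})
    return stats

# Example: a stereo buffer whose right channel is twice as loud as the left
stereo = [0.5, 1.0, -0.5, -1.0, 0.5, 1.0, -0.5, -1.0]
stats = channel_stats(stereo, 2)
```

In practice one would read `frames` from a WAV file (e.g. with the standard `wave` module) rather than hard-coding a buffer.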
6. Gao, Xue Fei, Guo Yang, Jing Wang, Xiang Xie, and Jing Ming Kuang. "A Backward Compatible Multichannel Audio Compression Method." Advanced Materials Research 756-759 (September 2013): 977–81. http://dx.doi.org/10.4028/www.scientific.net/amr.756-759.977.

Abstract:
This paper proposes a backward-compatible multichannel audio codec based on downmix and upmix operations. The codec represents a multichannel audio input signal with a downmixed mono signal and spatial parametric data. The encoding method consists of three parts: spatio-temporal analysis of the audio signal, compression of the multichannel audio into mono audio, and encoding of the mono signal. The proposed codec combines high audio quality with a low parameter coding rate, and the method is simpler and more effective than conventional methods. With this method, it is possible to transmit or store multichannel audio signals as mono audio signals.
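The downmix-plus-parameters idea described in the abstract can be sketched minimally as follows (an illustration, not the paper's codec: here the "spatial parametric data" is reduced to a single broadband RMS gain per channel, which reconstructs strongly correlated channels well but ignores inter-channel phase):

```python
import math

def encode_downmix(channels):
    """Downmix N equal-length channels to mono, plus one level gain per
    channel as a toy stand-in for spatial parametric data."""
    n = len(channels[0])
    mono = [sum(ch[i] for ch in channels) / len(channels) for i in range(n)]
    rms = lambda s: math.sqrt(sum(x * x for x in s) / len(s))
    mono_rms = rms(mono) or 1.0
    gains = [rms(ch) / mono_rms for ch in channels]
    return mono, gains

def decode_upmix(mono, gains):
    """Rebuild each channel by re-scaling the mono downmix."""
    return [[g * x for x in mono] for g in gains]

left, right = [1.0, -1.0, 1.0, -1.0], [0.5, -0.5, 0.5, -0.5]
mono, gains = encode_downmix([left, right])
restored = decode_upmix(mono, gains)
```

Real parametric codecs transmit time- and frequency-dependent level and correlation cues rather than one broadband gain, but the encode/decode split is the same.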
7. Gunawan, Teddy Surya, and Mira Kartiwi. "Performance Evaluation of Multichannel Audio Compression." Indonesian Journal of Electrical Engineering and Computer Science 10, no. 1 (April 1, 2018): 146. http://dx.doi.org/10.11591/ijeecs.v10.i1.pp146-153.

Abstract:
In recent years, multichannel audio systems have become widely used in modern sound devices, as they provide a more realistic and engaging experience for the listener. This paper focuses on the performance evaluation of three lossy codecs (AAC, Ogg Vorbis, and Opus) and three lossless codecs (FLAC, TrueAudio, and WavPack) for multichannel audio signals, including stereo, 5.1 and 7.1 channel configurations. Experiments were conducted on the same three audio files with different channel configurations. The performance of each encoder was evaluated based on its encoding time (averaged over 100 runs), data reduction, and audio quality. There is usually a trade-off between these three metrics. To simplify the evaluation, a new integrated performance metric was proposed that combines all three. Using the new measure, FLAC was found to be the best lossless codec, while Ogg Vorbis or Opus was the best lossy codec, depending on the channel configuration. This result can be used to determine the proper audio format for multichannel audio systems.
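The abstract does not give the paper's formula for the integrated metric, so the sketch below shows one plausible construction (an assumption, with made-up numbers): min-max normalise each metric across the candidate codecs, invert the "lower is better" one, and take a weighted average.

```python
def integrated_score(codecs, weights=(1/3, 1/3, 1/3)):
    """Fuse encoding time (lower is better), data reduction and audio
    quality (both higher is better) into one score per codec in [0, 1]."""
    def norm(values, higher_is_better):
        lo, hi = min(values), max(values)
        if hi == lo:
            return [1.0] * len(values)
        scaled = [(v - lo) / (hi - lo) for v in values]
        return scaled if higher_is_better else [1.0 - s for s in scaled]

    t = norm([c["time"] for c in codecs], higher_is_better=False)
    r = norm([c["reduction"] for c in codecs], higher_is_better=True)
    q = norm([c["quality"] for c in codecs], higher_is_better=True)
    wt, wr, wq = weights
    return [wt * a + wr * b + wq * c for a, b, c in zip(t, r, q)]

# Made-up numbers purely for illustration, not measurements from the paper:
candidates = [
    {"name": "flac",    "time": 1.2, "reduction": 0.45, "quality": 1.0},
    {"name": "wavpack", "time": 2.0, "reduction": 0.40, "quality": 1.0},
]
scores = integrated_score(candidates)
```

With equal weights, the codec that is faster and compresses more while matching on quality ends up with the higher score, which is the behaviour the paper's comparison relies on.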
8. Dong, Yingjun, Neil G. MacLaren, Yiding Cao, Francis J. Yammarino, Shelley D. Dionne, Michael D. Mumford, Shane Connelly, Hiroki Sayama, and Gregory A. Ruark. "Utterance Clustering Using Stereo Audio Channels." Computational Intelligence and Neuroscience 2021 (September 25, 2021): 1–8. http://dx.doi.org/10.1155/2021/6151651.

Abstract:
Utterance clustering is one of the actively researched topics in audio signal processing and machine learning. This study aims to improve the performance of utterance clustering by processing multichannel (stereo) audio signals. Processed audio signals were generated by combining left- and right-channel audio signals in a few different ways and then by extracting the embedded features (also called d-vectors) from those processed audio signals. This study applied the Gaussian mixture model for supervised utterance clustering. In the training phase, a parameter-sharing Gaussian mixture model was obtained to train the model for each speaker. In the testing phase, the speaker with the maximum likelihood was selected as the detected speaker. Results of experiments with real audio recordings of multiperson discussion sessions showed that the proposed method that used multichannel audio signals achieved significantly better performance than a conventional method with mono-audio signals in more complicated conditions.
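The abstract's "combining left- and right-channel audio signals in a few different ways" can be pictured with a tiny helper (illustrative only; the study's actual combinations and its d-vector extractor are not specified here):

```python
def combine_stereo(left, right, mode="mid"):
    """Combine stereo channels into one signal before feature
    (d-vector) extraction. Three simple options: the channel mean
    ('mid'), the half-difference ('side'), or concatenation."""
    if mode == "mid":
        return [(l + r) / 2 for l, r in zip(left, right)]
    if mode == "side":
        return [(l - r) / 2 for l, r in zip(left, right)]
    if mode == "concat":
        return list(left) + list(right)
    raise ValueError("unknown mode: " + mode)

left, right = [1.0, 0.0, -1.0], [0.5, 0.0, -0.5]
mid = combine_stereo(left, right, "mid")
```

Each combined signal would then be fed to the embedding network, and the resulting d-vectors clustered with the Gaussian mixture model described above.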
9. Fujimori, Kazuki, Bisser Raytchev, Kazufumi Kaneda, Yasufumi Yamada, Yu Teshima, Emyo Fujioka, Shizuko Hiryu, and Toru Tamaki. "Localization of Flying Bats from Multichannel Audio Signals by Estimating Location Map with Convolutional Neural Networks." Journal of Robotics and Mechatronics 33, no. 3 (June 20, 2021): 515–25. http://dx.doi.org/10.20965/jrm.2021.p0515.

Abstract:
We propose a method that uses ultrasound audio signals from a multichannel microphone array to estimate the positions of flying bats. The proposed model uses a deep convolutional neural network that takes multichannel signals as input and outputs the probability maps of the locations of bats. We present experimental results using two ultrasound audio clips of different bat species and show numerical simulations with synthetically generated sounds.
10. Hotho, Gerard, Lars F. Villemoes, and Jeroen Breebaart. "A Backward-Compatible Multichannel Audio Codec." IEEE Transactions on Audio, Speech, and Language Processing 16, no. 1 (January 2008): 83–93. http://dx.doi.org/10.1109/tasl.2007.910768.


Dissertations / Theses on the topic "Multichannel audio":

1. Romoli, Laura. "Advanced application for multichannel teleconferencing audio systems." Doctoral thesis, Università Politecnica delle Marche, 2011. http://hdl.handle.net/11566/242000.

Abstract:
Nowadays, there is great interest in multimedia teleconferencing systems as a consequence of the increasing demand for efficient communications and the development of advanced digital signal processing techniques. A teleconferencing system should provide a realistic representation of the visual and sound fields, allowing natural communication among participants anywhere in the world as if they were all in the same room. In this context, many systems have been developed, ranging from PC-based applications intended for single-user communications up to complex systems equipped with large video screens that display the remote room as if it were a continuation of the local room. In teleconferencing systems, the undesired echo due to coupling between the loudspeaker and the microphone can be reduced using an acoustic echo canceller (AEC). In the presence of more than one participant, multichannel systems have to be considered for speaker localization. More realistic performance can already be obtained with stereophonic systems, since listeners receive spatial information that helps them identify the speaker's position. However, more adaptive filters have to be used, and the linear relationship between the two channels generated by the same source brings additional problems: the solution of the adaptive algorithm is not unique and depends on the speaker's position in the transmission room, which is not stationary, causing possible convergence problems. Moreover, the choice of the adaptive algorithm becomes extremely important, because performance depends on the condition number of the input signal, which is very high in the multichannel scenario. In this thesis, novel contributions to stereophonic acoustic echo cancellation are presented, based on the "missing fundamental" phenomenon. The novelty of the solutions lies in the large interchannel coherence reduction obtained without affecting speech quality or stereo perception.
Moreover, a solution for improving the convergence speed of adaptive filters is discussed, based on a variable step-size method: the approach is applied to stereophonic acoustic echo cancellation but can in fact be used with generic adaptive algorithms. At the same time, there has been increasing interest in the design of systems that reproduce sounds as realistically as possible, so that the listener, immersed in the virtual audio scene and surrounded by a large number of loudspeakers, does not notice that they have been produced artificially. Conventional systems are designed to achieve the optimal acoustic sensation at a particular position in the listening environment, the so-called sweet spot. Furthermore, it is impossible to achieve correct source localization with a limited number of loudspeakers. Hence, several research efforts have been devoted to optimizing these systems, focusing on new recording and reproduction techniques, namely Wave Field Analysis (WFA) and Wave Field Synthesis (WFS). The former is a sound field recording technique based on microphone arrays, and the latter allows sound field synthesis through loudspeaker arrays. To use these techniques in real-world applications (e.g., teleconferencing systems, cinemas, home theatres), it is necessary to apply multichannel digital signal processing algorithms already developed for traditional systems. This led to the introduction of Wave Domain Adaptive Filtering (WDAF), a spatio-temporal generalization of the Fast Least Mean Squares adaptive algorithm, allowing a considerable reduction in computational complexity. Efficient solutions for real-time implementation, and possible phase approximations of the driving functions used to manage the loudspeakers, are discussed in this thesis.
Furthermore, a Weighted-Overlap-Add-based (WOLA-based) approach for WDAF and a WFS-based digital pointing technique for line arrays are presented: the objective of these studies is to apply these concepts in real scenarios, such as a teleconferencing system. Indeed, the aforementioned immersive audio reproduction techniques can be exploited to enhance the performance of life-sized teleconferencing systems, combining temporal and spatial requirements. Moreover, audio rendering algorithms are needed to improve the perceived audio quality and make the listening environment more pleasant by taking into account some specific features of the environment. More specifically, equalization represents a powerful tool capable of dealing with frequency response irregularities: an equalizer can compensate for loudspeaker placement and listening room characteristics, and it can be applied in a teleconferencing system to make communication as natural as possible. The evaluation of a multipoint equalizer and a mixed-phase solution with a suitably designed room group delay are discussed in this work.
2. De Sena, Enzo. "Analysis, design and implementation of multichannel audio systems." Thesis, King's College London (University of London), 2013. https://kclpure.kcl.ac.uk/portal/en/theses/analysis-design-and-implementation-of-multichannel-audio-systems(2667506b-f58e-44f1-858a-bcb67d341720).html.

Abstract:
This thesis is concerned with the analysis, design and implementation of multichannel audio systems. The design objective is to reconstruct a given sound field such that it is perceptually equivalent to the recorded one. A framework for the design of circular microphone arrays is proposed. This framework is based on fitting of psychoacoustic data and enables the design of both coincident and quasi-coincident arrays. Results of formal listening experiments suggest that the proposed methodology performs on a par with state of the art methods, albeit with a more graceful degradation away from the centre of the loudspeaker array. A computational model of auditory perception is also developed to estimate the subjects' response in a broader class of conditions than the ones considered in the listening experiments. The model predictions suggest that quasi-coincident microphone arrays result in auditory events that are easier to localise for off centre listeners. Two technologies are developed to enable using the proposed framework for recording of real sound fields (e.g. live concert) and virtual ones (e.g. video-games). Differential microphones are identified as desirable candidates for the case of real sound fields and are adapted to suit the framework requirements. Their robustness to self-noise is assessed and measurements of a third-order prototype are presented. Finally, a scalable and interactive room acoustic simulator is proposed to enable virtual recordings in simulated sound fields.
3. Daniel, Adrien. "Spatial Auditory Blurring and Applications to Multichannel Audio Coding." PhD thesis, Université Pierre et Marie Curie - Paris VI, 2011. http://tel.archives-ouvertes.fr/tel-00623670.

Abstract:
This work is set in a telecommunications context and concerns, more specifically, the transmission of multichannel audio signals. Four psychoacoustic experiments were conducted to study the spatial resolution of the auditory system, also known as localization blur, in the presence of distracting sounds. They show that localization blur increases when such distractors are present, revealing what we call the auditory "spatial blurring" phenomenon. These experiments estimate the effect of several variables on spatial blurring: the frequency of the sound source under consideration as well as those of the distracting sources, their sound level, their spatial position, and the number of distracting sources. Except for the position of the distracting sources, all of these variables showed a significant effect on spatial blurring. This thesis also addresses the modeling of this phenomenon, so that auditory spatial resolution can be predicted from the characteristics of the sound scene (the number of sources present, their frequencies, and their levels). Finally, two multichannel audio coding schemes that exploit this model to reduce the amount of information to transmit are proposed: one based on a parametric representation (downmix plus spatial parameters) of the multichannel signal, and the other on the Higher-Order Ambisonics (HOA) representation. Both schemes rest on the original idea of dynamically adjusting the precision of the spatial representation of the multichannel signal so as to keep the resulting spatial distortions within the localization blur, making them undetectable.
4. George, Sunish. "Objective models for predicting selected multichannel audio quality attributes." Thesis, University of Surrey, 2009. http://epubs.surrey.ac.uk/844426/.

Abstract:
This thesis discusses research conducted to contribute towards the development of a generic model that predicts multichannel audio quality. The review in this thesis evaluated existing objective models for predicting audio quality and concluded that most models existing today are not capable of predicting multichannel audio quality in their current form. Therefore, important multichannel audio quality attributes were identified, and an attempt was made to predict some of them using features derived from the recordings themselves. The project was completed in two phases. The attributes selected in the first phase were basic audio quality, timbral fidelity, frontal spatial fidelity and surround spatial fidelity, since they were the most frequently reported. Envelopment was selected in the second phase, since it was reported as an important attribute of multichannel audio in several elicitation experiments. The listening tests in the first phase were conducted according to the ITU-R BS.1534-1 recommendation, and a novel test paradigm was employed for evaluating envelopment. The models were calibrated using regression analysis techniques. The first-phase models were of the double-ended type, and the features IACC measurements, spectral centroid, spectral rolloff and centroid of coherence proved useful for the predictions. The model for predicting envelopment was of the single-ended type, and the features IACC measurements, spectral rolloff, area of sound distribution, inter-channel coherence and extent of coverage angle proved important for prediction. The calibrated models were validated using scores obtained from independent listening tests. The predicted scores from the validation experiments showed high correlation with the actual scores, and the accuracy of the models was comparable to the inter-listener errors encountered in typical listening tests. The developed models could either be used as independent applications or act as building blocks of a generic model that predicts multichannel audio quality.
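The regression calibration mentioned in the abstract can be illustrated in its simplest form (a one-feature ordinary least-squares fit with hypothetical numbers; the thesis's actual models combine several features):

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y ~ a*x + b: the one-feature case of
    calibrating an objective feature against listening-test scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical feature values vs. subjective quality scores:
a, b = fit_linear([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

Validation then amounts to applying the fitted (a, b) to features from a held-out listening test and correlating predictions with the actual scores.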
5. Martí Guerola, Amparo. "Multichannel audio processing for speaker localization, separation and enhancement." Doctoral thesis, Universitat Politècnica de València, 2013. http://hdl.handle.net/10251/33101.

Abstract:
This thesis is related to the field of acoustic signal processing and its applications to emerging communication environments. Acoustic signal processing is a very wide research area covering the design of signal processing algorithms involving one or several acoustic signals to perform a given task, such as locating the sound source that originated the acquired signals, improving their signal-to-noise ratio, separating signals of interest from a set of interfering sources, or recognizing the type of source and the content of the message. Among these tasks, Sound Source Localization (SSL) and Automatic Speech Recognition (ASR) have been specially addressed in this thesis. The localization of sound sources in a room has received a lot of attention in recent decades. Most real-world microphone array applications require the localization of one or more active sound sources in adverse environments (low signal-to-noise ratio and high reverberation). Some of these applications are teleconferencing systems, video gaming, autonomous robots, remote surveillance, and hands-free speech acquisition. Performing robust sound source localization under high noise and reverberation is a very challenging task. One of the best-known algorithms for source localization in noisy and reverberant environments is the Steered Response Power - Phase Transform (SRP-PHAT) algorithm, which constitutes the baseline framework for the contributions proposed in this thesis. Another challenge in the design of SSL algorithms is to achieve real-time performance and high localization accuracy with a reasonable number of microphones and limited computational resources. Although the SRP-PHAT algorithm has been shown to be effective in real-world environments, its practical implementation is usually based on a costly fine grid-search procedure, making the computational cost of the method a real issue.
In this context, several modifications and optimizations have been proposed to improve its performance and applicability. An effective strategy that extends the conventional SRP-PHAT functional is presented in this thesis. This approach performs a full exploration of the sampled space rather than computing the SRP at discrete spatial positions, increasing its robustness and allowing for a coarser spatial grid that reduces the computational cost required in a practical implementation with a small hardware cost (a reduced number of microphones). This strategy makes it possible to implement real-time applications based on location information, such as automatic camera steering or the detection of speech/non-speech fragments in advanced videoconferencing systems. As stated before, besides the contributions related to SSL, this thesis is also related to the field of ASR. This technology allows a computer or electronic device to identify the words spoken by a person so that the message can be stored or processed in a useful way. ASR is used on a day-to-day basis in a number of applications and services such as natural human-machine interfaces, dictation systems, electronic translators and automatic information desks. However, there are still challenges to be solved. A major problem in ASR is recognizing people speaking in a room using distant microphones. In distant-speech recognition, the microphone does not only receive the direct-path signal, but also delayed replicas as a result of multi-path propagation. Moreover, there are many situations in teleconferencing meetings when multiple speakers talk simultaneously. In this context, when multiple speaker signals are present, Sound Source Separation (SSS) methods can be successfully employed to improve ASR performance in multi-source scenarios. This is the motivation behind the training method for multiple-talker situations proposed in this thesis.
This training, which is based on a robust transformed model constructed from separated speech in diverse acoustic environments, uses an SSS method as a speech enhancement stage that suppresses the unwanted interferences. The combination of source separation and this specific training has been explored and evaluated under different acoustical conditions, leading to improvements of up to 35% in ASR performance.
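The SRP-PHAT functional discussed above accumulates PHAT-weighted cross-correlations (GCC-PHAT) over microphone pairs and steering positions. Below is a minimal sketch of that pairwise building block only, using a naive DFT so it stays standard-library-only (illustrative; the thesis's contribution concerns the spatial search around this step, not the step itself):

```python
import cmath

def gcc_phat(x, y):
    """PHAT-weighted cross-correlation of two equal-length signals
    (naive O(n^2) DFT, fine for short illustrative buffers)."""
    n = len(x)
    def dft(s, sign):
        return [sum(s[k] * cmath.exp(sign * 2j * cmath.pi * f * k / n)
                    for k in range(n)) for f in range(n)]
    X, Y = dft(x, -1), dft(y, -1)
    cross = [b * a.conjugate() for a, b in zip(X, Y)]   # delay of y vs. x
    phat = [c / (abs(c) or 1.0) for c in cross]         # whiten: keep phase only
    return [c.real / n for c in dft(phat, +1)]

def estimate_delay(x, y):
    """Signed lag (in samples) at which y best aligns with x."""
    corr = gcc_phat(x, y)
    n = len(corr)
    lag = max(range(n), key=lambda i: corr[i])
    return lag if lag <= n // 2 else lag - n            # wrap to signed lag

x = [0.0] * 16; x[3] = 1.0
y = [0.0] * 16; y[5] = 1.0   # the same impulse, delayed by 2 samples
```

SRP-PHAT sums such correlations, evaluated at the inter-microphone delays implied by each candidate source position, and picks the position with the highest steered response power.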
Martí Guerola, A. (2013). Multichannel audio processing for speaker localization, separation and enhancement [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/33101
6. Belloch Rodríguez, José Antonio. "Performance Improvement of Multichannel Audio by Graphics Processing Units." Doctoral thesis, Universitat Politècnica de València, 2014. http://hdl.handle.net/10251/40651.

Abstract:
Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, what is considered as Immersive Audio Schemes. In this phenomenon, several acoustic e ects are involved: 3D spatial sound, room compensation, crosstalk cancelation, sound source localization, among others. However, high computing capacity is required to achieve any of these e ects in a real large-scale system, what represents a considerable limitation for real-time applications. The increase of the computational capacity has been historically linked to the number of transistors in a chip. However, nowadays the improvements in the computational capacity are mainly given by increasing the number of processing units, i.e expanding parallelism in computing. This is the case of the Graphics Processing Units (GPUs), that own now thousands of computing cores. GPUs were traditionally related to graphic or image applications, but new releases in the GPU programming environments, CUDA or OpenCL, allowed that most applications were computationally accelerated in elds beyond graphics. This thesis aims to demonstrate that GPUs are totally valid tools to carry out audio applications that require high computational resources. To this end, di erent applications in the eld of audio processing are studied and performed using GPUs. This manuscript also analyzes and solves possible limitations in each GPU-based implementation both from the acoustic point of view as from the computational point of view. In this document, we have addressed the following problems: Most of audio applications are based on massive ltering. Thus, the rst implementation to undertake is a fundamental operation in the audio processing: the convolution. 
It was first developed as a computational kernel and afterwards used for an application that combines multiple convolutions concurrently: generalized crosstalk cancellation and equalization. The proposed implementation can successfully manage two different and common situations: buffers that are much larger than the filters, and buffers that are much smaller than the filters. Two spatial audio applications that use the GPU as a co-processor have been developed from this massive multichannel filtering. The first application deals with binaural audio. Its main feature is that it is able to synthesize sound sources at spatial positions not included in the HRTF database and to generate smooth movements of sound sources. Both features were designed after different tests (objective and subjective). The performance, in terms of the number of sound sources that could be rendered in real time, was assessed on GPUs with different architectures. A similar performance is measured in a Wave Field Synthesis system (the second spatial audio application) composed of 96 loudspeakers. The proposed GPU-based implementation is able to reduce room effects during sound source rendering. A well-known approach to sound source localization in noisy and reverberant environments is also addressed on a multi-GPU system: the Steered Response Power with Phase Transform (SRP-PHAT) algorithm. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate acoustic localization systems require high computational power. The solutions implemented in this thesis are evaluated both from the localization and from the computational performance points of view, taking into account different acoustic environments, and always from a real-time implementation perspective.
Finally, this manuscript also addresses massive multichannel filtering when the filters have an Infinite Impulse Response (IIR). Two cases are analyzed: 1) IIR filters composed of multiple second-order sections, and 2) IIR filters that present an all-pass response. Both cases are used to develop and accelerate two different applications: 1) executing multiple equalizations in a WFS system, and 2) reducing the dynamic range of an audio signal.
Belloch Rodríguez, JA. (2014). PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651
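The massive filtering this thesis accelerates reduces, at its core, to fast convolution. As a rough single-channel sketch in plain NumPy (not the thesis's CUDA implementation), FFT-based linear convolution — the kernel operation that GPU audio engines replicate across many channels and filters — can look like:

```python
import numpy as np

def fft_convolve(x, h):
    """Linear convolution via the FFT."""
    n = len(x) + len(h) - 1            # length of the linear convolution
    nfft = 1 << (n - 1).bit_length()   # next power of two >= n
    X = np.fft.rfft(x, nfft)           # zero-padded transforms avoid
    H = np.fft.rfft(h, nfft)           # circular-convolution wraparound
    return np.fft.irfft(X * H, nfft)[:n]
```

For long filters, real-time systems partition the filter into blocks (overlap-save), which is why the abstract distinguishes the large-buffer and small-buffer regimes.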
7

Parry, Robert Mitchell. "Separation and Analysis of Multichannel Signals." Diss., Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/19743.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Music recordings contain the mixed contribution of multiple overlapping instruments. In order to better understand the music, it would be beneficial to understand each instrument independently. This thesis focuses on separating the individual instrument recordings within a song. In particular, we propose novel algorithms for separating instrument recordings given only their mixture. When the number of source signals does not exceed the number of mixture signals, we focus on a subclass of source separation algorithms based on joint diagonalization. Each approach leverages a different form of source structure. We introduce repetitive structure as an alternative that leverages unique repetition patterns in music and compare its performance against the other techniques. When the number of source signals exceeds the number of mixtures (i.e. the underdetermined problem), we focus on spectrogram factorization techniques for source separation. We extend single-channel techniques to utilize the additional spatial information in multichannel recordings, and use phase information to improve the estimation of the underlying components.
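As a hedged illustration of the spectrogram-factorization family this abstract refers to, here is a minimal single-channel NMF with the classical Lee–Seung multiplicative updates; the thesis itself extends such factorizations with spatial and phase information, and all shapes and parameters below are illustrative:

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (freq x time) as V ~= W @ H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps    # spectral templates
    H = rng.random((rank, T)) + eps    # temporal activations
    for _ in range(n_iter):
        # Multiplicative updates minimizing the Frobenius error
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each column of W acts as a spectral template and each row of H as its activation in time; multichannel variants add a per-channel spatial mixing model on top of this factorization.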
8

Wille, Joachim Olsen. "Performance of a Multichannel Audio Correction System Outside the Sweetspot. : Further Investigations of the Trinnov Optimizer." Thesis, Norwegian University of Science and Technology, Department of Electronics and Telecommunications, 2008. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8911.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:

This report is a continuation of the student project "Evaluation of Trinnov Optimizer audio reproduction system". It further investigates the properties and function of the Trinnov Optimizer, a correction system for audio reproduction systems. During the student project, measurements were performed in an anechoic lab to provide information on the functionality and abilities of the Trinnov Optimizer. Massive amounts of data were recorded, and that data has also been the foundation of this report. The new work consists of interpreting these results using Matlab. The Optimizer by Trinnov [9] is a standalone system for reproduction of audio over a single or multiple loudspeaker setup. It is designed to correct frequency and phase response, in addition to correcting loudspeaker placements and cancelling simple early reflections in a multiple-loudspeaker setup. The purpose of further investigating this issue was to understand more about the sound field produced around the listening position, and to give more detailed results on the changes in the sound field after correction. The importance of correcting the system not only in the listening position but also in the surrounding area is obvious, because there is often more than one listener. This report gives further insight through physical measurements, rather than subjective statements, on the performance of a room and loudspeaker correction device. WinMLS has been used to measure the system with single- and multiple-microphone setups. Some results from the earlier student project are also included in this report, to verify measurement methods and to show correspondence between the different measuring systems. Accordingly, some of the data have been compared to the Trinnov Optimizer's own measurements and appear similar in this report. Some errors found in the initial report, in the results from the phase response measurements, have also been corrected.
Multiple loudspeakers in a 5.0 setup have been measured with 5 microphones on a rotating boom to measure the sound pressure over an area around the listening position. This allowed the effect of simple reflection cancellation, and the ability to generate virtual sources, to be investigated. For the specific cases investigated in this report, the Optimizer showed the following:

- Frequency and phase response will in every situation be optimized to the extent of the Optimizer's algorithms.
- Every case shows improvement in the frequency and phase response over the whole measured area.
- Direct frontal reflections were deconvolved up to 300 Hz over the whole measured area, with a radius of 56 cm.
- A reflection from the side was deconvolved roughly up to 200 Hz for microphones 1 through 3 (up to a radius of 31.25 cm), and up to 100 Hz for microphones 4 and 5.
- The ability to create virtual sources corresponds fairly well to the theoretical expectations.

The video sequences that were developed give an interesting new angle on the problems investigated. Rather than looking at plots of different angles, which is difficult and time-consuming, the videos showed an intuitive perspective that illuminated the same issues as the commonly presented data of frequency and phase response measurements.

9

Sekiguchi, Kouhei. "A Unified Statistical Approach to Fast and Robust Multichannel Speech Separation and Dereverberation." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/263770.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Gaultier, Clément. "Conception et évaluation de modèles parcimonieux et d'algorithmes pour la résolution de problèmes inverses en audio." Thesis, Rennes 1, 2019. http://www.theses.fr/2019REN1S009/document.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Today's challenges in audio and acoustic signal processing inverse problems are multiform. Addressing these problems often requires appropriate additional signal models due to their inherent ill-posedness. This work focuses on designing and evaluating audio reconstruction algorithms. Building on a versatile generic algorithmic framework, it shows how various forms of sparsity (analysis or synthesis; plain, structured, or "social") are particularly suited to single- and multichannel audio signal reconstruction. The core of the work identifies the limits of state-of-the-art evaluation conditions for audio declipping and proposes a rigorous large-scale evaluation protocol to determine the most appropriate methods depending on the context (music or speech, moderately or highly degraded signals). Experimental results demonstrate substantial quality improvements over the state of the art in some regimes, with configurations that had not previously been considered; considerable speed-ups are also obtained. Finally, part of the work addresses sound source localization with a "virtually supervised" machine learning technique, showing promising results on direction-of-arrival and distance estimation.
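To make the declipping task concrete: clipped samples sit at the saturation level and are treated as missing, while the remaining samples constrain a signal model. The sketch below is a deliberately crude stand-in for the sparse methods evaluated in the thesis — it fits a low-order DCT model to the reliable samples by least squares and fills in the clipped ones; `level` and `k` are illustrative parameters:

```python
import numpy as np

def dct_basis(n, k):
    """First k DCT-II atoms as columns of an (n, k) matrix."""
    t = np.arange(n)
    return np.cos(np.pi * np.outer(t + 0.5, np.arange(k)) / n)

def declip_ls(y, level, k=16):
    """Replace samples stuck at the clipping level with a low-order
    DCT model fitted to the reliable (unclipped) samples."""
    n = len(y)
    B = dct_basis(n, k)
    m = np.abs(y) < level                      # reliable samples
    coef, *_ = np.linalg.lstsq(B[m], y[m], rcond=None)
    out = y.copy()
    out[~m] = (B @ coef)[~m]                   # fill in clipped samples
    return out
```

True sparse declipping additionally enforces that reconstructed samples exceed the clipping level in magnitude; this toy version omits that constraint.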

Books on the topic "Multichannel audio":

1

Society, Audio Engineering. AES recommended practice for digital audio engineering: Serial multichannel audio digital interface (MADI). New York: Audio Engineering Society, 1991.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Grimm, Simon. Directivity Based Multichannel Audio Signal Processing For Microphones in Noisy Acoustic Environments. Wiesbaden: Springer Fachmedien Wiesbaden, 2019. http://dx.doi.org/10.1007/978-3-658-25152-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Meares, D. J. Evaluations of high quality, multichannel audio codecs carried out on behalf of ISO/IEC MPeg. London: British Broadcasting Corporation Research and Development Department, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Kyriakakis, Chris, Dai Tracy Yang, and C. C. Jay Kuo. High-Fidelity Multichannel Audio Coding. Hindawi, 2004.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Grimm, Simon. Directivity Based Multichannel Audio Signal Processing For Microphones in Noisy Acoustic Environments. Springer Vieweg, 2019.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Théberge, Paul, Kyle Devine, and Tom Everrett. Living Stereo: Histories and Cultures of Multichannel Sound. Bloomsbury Academic & Professional, 2015.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Théberge, Paul, Kyle Devine, and Tom Everrett. Living stereo: Histories and cultures of multichannel sound. 2015.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Yang, Dai Tracy, Chris Kyriakakis, and C.-C. Jay Kuo. High-Fidelity Multichannel Audio Coding (EURASIP Book Series on Signal Processing & Communications). 2nd ed. Hindawi Publishing Corporation, 2006.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Yang, Dai Tracy. High-Fidelity Multichannel Audio Coding (EURASIP Book Series on Signal Processing and Communications, Vol. 1). Hindawi Publishing Corporation, 2004.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Huff, W. A. Kelly. Regulating the Future. Praeger, 2001. http://dx.doi.org/10.5040/9798216006695.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This comprehensive study examines the case of AM stereo and subsequent technologies to demonstrate the FCC's evolution from stern to reluctant regulator. It also examines emerging technologies, such as multichannel television sound, digital audio broadcasting, and high definition television, and discusses their impact on the evolution of broadcast regulation. In the 1980s the tension between governmental control and the marketplace resulted in the FCC's deregulation of TV and radio, electing to set only technical operating parameters and allowing legal operation of any system that meets those minimal standards. Huff argues that this approach is likely to influence regulatory approaches to other new developments in broadcast technologies. The extensive overview of the industry and the study of the interrelationships between the technologies will appeal to communication scholars in the fields of radio and television as well as interest industry professionals.

Book chapters on the topic "Multichannel audio":

1

Toole, Floyd E. "Multichannel Audio." In Sound Reproduction, 3rd ed., 397–432. New York; London: Routledge, 2017. http://dx.doi.org/10.4324/9781315686424-15.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Markovich-Golan, Shmulik, Walter Kellermann, and Sharon Gannot. "Multichannel Parameter Estimation." In Audio Source Separation and Speech Enhancement, 219–34. Chichester, UK: John Wiley & Sons Ltd, 2018. http://dx.doi.org/10.1002/9781119279860.ch11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Kameoka, Hirokazu, Hiroshi Sawada, and Takuya Higuchi. "General Formulation of Multichannel Extensions of NMF Variants." In Audio Source Separation, 95–124. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-73031-8_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Nugraha, Aditya Arie, Antoine Liutkus, and Emmanuel Vincent. "Deep Neural Network Based Multichannel Audio Source Separation." In Audio Source Separation, 157–85. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-73031-8_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Mandel, Michael I., Shoko Araki, and Tomohiro Nakatani. "Multichannel Clustering and Classification Approaches." In Audio Source Separation and Speech Enhancement, 235–61. Chichester, UK: John Wiley & Sons Ltd, 2018. http://dx.doi.org/10.1002/9781119279860.ch12.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Ozerov, Alexey, and Hirokazu Kameoka. "Gaussian Model Based Multichannel Separation." In Audio Source Separation and Speech Enhancement, 289–315. Chichester, UK: John Wiley & Sons Ltd, 2018. http://dx.doi.org/10.1002/9781119279860.ch14.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Ozerov, Alexey, Cédric Févotte, and Emmanuel Vincent. "An Introduction to Multichannel NMF for Audio Source Separation." In Audio Source Separation, 73–94. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-73031-8_4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Kornatowski, Eugeniusz. "Monitoring of the Multichannel Audio Signal." In Computational Collective Intelligence. Technologies and Applications, 298–306. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-16732-4_32.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Ito, Nobutaka, Shoko Araki, and Tomohiro Nakatani. "Recent Advances in Multichannel Source Separation and Denoising Based on Source Sparseness." In Audio Source Separation, 279–300. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-73031-8_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Pertilä, Pasi, Alessio Brutti, Piergiorgio Svaizer, and Maurizio Omologo. "Multichannel Source Activity Detection, Localization, and Tracking." In Audio Source Separation and Speech Enhancement, 47–64. Chichester, UK: John Wiley & Sons Ltd, 2018. http://dx.doi.org/10.1002/9781119279860.ch4.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Multichannel audio":

1

Ozerov, Alexey, Cagdas Bilen, and Patrick Perez. "Multichannel audio declipping." In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016. http://dx.doi.org/10.1109/icassp.2016.7471757.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Boltze, Thomas, and Leon van de Kerkhof. "MPEG Multichannel Audio in DVB." In SMPTE Australia Conference. IEEE, 1999. http://dx.doi.org/10.5594/m001173.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Langer, Henrik, and Robert Manzke. "Embedded Multichannel Linux Audiosystem for Musical Applications." In AM '17: Audio Mostly 2017. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3123514.3123523.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Leglaive, Simon, Umut Simsekli, Antoine Liutkus, Roland Badeau, and Gael Richard. "Alpha-stable multichannel audio source separation." In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017. http://dx.doi.org/10.1109/icassp.2017.7952221.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Lyman, Steve. "Contribution and Distribution of Multichannel Audio." In SMPTE Australia Conference. IEEE, 1999. http://dx.doi.org/10.5594/m001174.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Reiss, Joshua D. "Intelligent systems for mixing multichannel audio." In 2011 17th International Conference on Digital Signal Processing (DSP). IEEE, 2011. http://dx.doi.org/10.1109/icdsp.2011.6004988.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Yang, Dai, Hongmei Ai, Christos Kyriakakis, and C. C. Jay Kuo. "Embedded high-quality multichannel audio coding." In Photonics West 2001 - Electronic Imaging, edited by Sethuraman Panchanathan, V. Michael Bove, Jr., and Subramania I. Sudharsanan. SPIE, 2001. http://dx.doi.org/10.1117/12.420793.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Thomas, Mark R. P., Nikolay D. Gaubitch, Jon Gudnason, and Patrick A. Naylor. "A Practical Multichannel Dereverberation Algorithm using Multichannel DYPSA and Spatiotemporal Averaging." In 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. IEEE, 2007. http://dx.doi.org/10.1109/aspaa.2007.4392983.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Arteaga, Daniel, and Jordi Pons. "Multichannel-based Learning for Audio Object Extraction." In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021. http://dx.doi.org/10.1109/icassp39728.2021.9414585.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Wohlmayr, Michael, and Marián Képesi. "Joint position-pitch extraction from multichannel audio." In Interspeech 2007. ISCA: ISCA, 2007. http://dx.doi.org/10.21437/interspeech.2007-454.

Full text
APA, Harvard, Vancouver, ISO, and other styles

To the bibliography