Dissertations on the topic "Multichannel audio"
Cite a source in APA, MLA, Chicago, Harvard, and other citation styles
Browse the top 23 dissertations for research on the topic "Multichannel audio".
Next to every entry in the bibliography there is an "Add to bibliography" option. Use it, and your citation of the selected work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).
You can also download the full text of the scholarly publication as a PDF and read its online abstract, when these are available in the metadata.
Browse dissertations from a wide variety of disciplines and organise your bibliography correctly.
Romoli, Laura. „Advanced application for multichannel teleconferencing audio systems“. Doctoral thesis, Università Politecnica delle Marche, 2011. http://hdl.handle.net/11566/242000.
Nowadays, there is large interest in multimedia teleconferencing systems as a consequence of the increasing demand for efficient communications and the development of advanced digital signal processing techniques. A teleconferencing system should provide a realistic representation of visual and sound fields, allowing natural communication among participants anywhere in the world, as if they were all in the same room. In this context, many systems have been developed, ranging from PC-based applications intended for single-user communications up to complex systems equipped with large video screens that display the remote room as if it were a continuation of the local room. In teleconferencing systems, the undesired echo due to coupling between the loudspeaker and the microphone can be reduced using an acoustic echo canceller (AEC). In the presence of more than one participant, multichannel systems have to be taken into consideration for speaker localization. More realistic performance can already be obtained with stereophonic systems, since listeners receive spatial information that helps them identify the speaker position. However, more adaptive filters have to be used, and the linear relationship between the two channels generated from the same source brings additional problems: the solution of the adaptive algorithm is not unique and depends on the speaker position in the transmission room, which is not stationary, causing possible convergence problems. Moreover, the choice of the adaptive algorithm becomes extremely important, because the performance depends on the condition number of the input signal, which is very high in the multichannel scenario. In this thesis, novel contributions to stereophonic acoustic echo cancellation are given based on the "missing fundamental" phenomenon.
The novelty of the solutions lies in the large interchannel coherence reduction obtained without affecting speech quality and stereo perception. Moreover, a solution for improving the convergence speed of adaptive filters is discussed, based on a variable step-size method: the approach is applied to stereophonic acoustic echo cancellation but can in fact be used with generic adaptive algorithms. At the same time, there has been increasing interest in the design of systems providing a reproduction of sounds that is as realistic as possible, so that the listener, immersed in a virtual audio scene and surrounded by a large number of loudspeakers, does not notice that the sounds have been produced artificially. Conventional systems are designed to obtain the optimal acoustic sensation in one particular position of the listening environment, the so-called sweet spot. Furthermore, it is impossible to achieve correct source localization with a limited number of loudspeakers. Hence, several research efforts have been devoted to the optimization of these systems, focusing on new recording and reproduction techniques, i.e., Wave Field Analysis (WFA) and Wave Field Synthesis (WFS). The former is a sound field recording technique based on microphone arrays, and the latter allows sound field synthesis through loudspeaker arrays. In order to use these techniques in real-world applications (e.g., teleconferencing systems, cinemas, home theatres), it is necessary to apply multichannel digital signal processing algorithms already developed for traditional systems. This led to the introduction of Wave Domain Adaptive Filtering (WDAF), a spatio-temporal generalization of the Fast Least Mean Squares adaptive algorithm that allows a considerable reduction of the computational complexity. Efficient solutions for real-time implementation, and possible phase approximations of the driving functions used to manage the loudspeakers, are discussed in this thesis.
Furthermore, a Weighted-Overlap-Add-based (WOLA-based) approach for WDAF and a WFS-based digital pointing of line arrays are presented; the objective of these studies is to apply these concepts in real scenarios, such as a teleconferencing system. Indeed, the aforementioned immersive audio reproduction techniques can be exploited to enhance the performance of life-sized teleconferencing systems, combining temporal and spatial requirements. Furthermore, audio rendering algorithms are needed to improve the perceived audio quality and make the listening environment more pleasant by taking into account specific features of the environment. More specifically, equalization represents a powerful tool for dealing with frequency-response irregularities: an equalizer can compensate for loudspeaker placement and listening-room characteristics, and it can be applied in a teleconferencing system to make the communication as natural as possible. The evaluation of a multipoint equalizer and a mixed-phase solution with a suitably designed room group delay are discussed in this work.
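The variable step-size idea summarized in this abstract is generic enough to sketch in a few lines. The toy below is a mono NLMS echo canceller with a crude error-driven step size; it is not Romoli's stereo algorithm or step-size rule (both are specific to the thesis), only an illustration of how an adaptive filter identifies an echo path and how the step size can be tied to the residual error.

```python
import numpy as np

def nlms_echo_cancel(x, d, num_taps=8, mu_min=0.05, mu_max=1.0, eps=1e-8):
    """Identify an echo path from far-end signal x and microphone signal d."""
    w = np.zeros(num_taps)              # adaptive estimate of the echo path
    e = np.zeros(len(x))                # error (echo-cancelled) signal
    for n in range(num_taps - 1, len(x)):
        x_vec = x[n - num_taps + 1:n + 1][::-1]   # newest sample first
        y = w @ x_vec                             # estimated echo
        e[n] = d[n] - y
        # crude variable step size: large while the error is large, small after
        mu = np.clip(abs(e[n]) / (abs(d[n]) + eps), mu_min, mu_max)
        w += mu * e[n] * x_vec / (x_vec @ x_vec + eps)   # NLMS update
    return w, e

rng = np.random.default_rng(0)
x = rng.standard_normal(4000)              # far-end (loudspeaker) signal
h = np.array([0.8, 0.0, 0.4, 0.0, -0.2])   # toy echo path
d = np.convolve(x, h)[:len(x)]             # microphone picks up the echo
w, e = nlms_echo_cancel(x, d)
print(np.round(w[:5], 2))                  # recovered taps approach h
```

With the step size clipped between `mu_min` and `mu_max`, adaptation is aggressive while the echo estimate is poor and gentle once it has converged; the recovered taps `w` approach the toy echo path `h`.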
De, Sena Enzo. „Analysis, design and implementation of multichannel audio systems“. Thesis, King's College London (University of London), 2013. https://kclpure.kcl.ac.uk/portal/en/theses/analysis-design-and-implementation-of-multichannel-audio-systems(2667506b-f58e-44f1-858a-bcb67d341720).html.
Daniel, Adrien. „Spatial Auditory Blurring and Applications to Multichannel Audio Coding“. PhD thesis, Université Pierre et Marie Curie - Paris VI, 2011. http://tel.archives-ouvertes.fr/tel-00623670.
George, Sunish. „Objective models for predicting selected multichannel audio quality attributes“. Thesis, University of Surrey, 2009. http://epubs.surrey.ac.uk/844426/.
Der volle Inhalt der QuelleMartí, Guerola Amparo. „Multichannel audio processing for speaker localization, separation and enhancement“. Doctoral thesis, Universitat Politècnica de València, 2013. http://hdl.handle.net/10251/33101.
Martí Guerola, A. (2013). Multichannel audio processing for speaker localization, separation and enhancement [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/33101
Belloch, Rodríguez José Antonio. „PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS“. Doctoral thesis, Universitat Politècnica de València, 2014. http://hdl.handle.net/10251/40651.
Belloch Rodríguez, JA. (2014). PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651
Award-winning
Parry, Robert Mitchell. „Separation and Analysis of Multichannel Signals“. Diss., Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/19743.
Wille, Joachim Olsen. „Performance of a Multichannel Audio Correction System Outside the Sweetspot: Further Investigations of the Trinnov Optimizer“. Thesis, Norwegian University of Science and Technology, Department of Electronics and Telecommunications, 2008. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8911.
This report is a continuation of the student project "Evaluation of Trinnov Optimizer audio reproduction system". It further investigates the properties and function of the Trinnov Optimizer, a correction system for audio reproduction systems. During the student project, measurements were performed in an anechoic lab to provide information on the functionality and abilities of the Trinnov Optimizer. Massive amounts of data were recorded, and they have also been the foundation of this report. The new work consists of interpreting these results using Matlab. The Optimizer by Trinnov [9] is a standalone system for reproduction of audio over a single- or multiple-loudspeaker setup. It is designed to correct frequency and phase response, in addition to correcting loudspeaker placements and cancelling simple early reflections in a multiple-loudspeaker setup. The purpose of further investigating this issue was to understand more about the sound field produced around the listening position, and to give more detailed results on the changes in the sound field after correction. The importance of correcting the system not only in the listening position but also in the surrounding area is obvious, because there is often more than one listener. This report gives further insight through physical measurements, rather than subjective statements, into the performance of a room and loudspeaker correction device. WinMLS has been used to measure the system with single- and multiple-microphone setups. Some results from the earlier student project are also included in this report to verify measurement methods and to show the correspondence between the different measuring systems. For that reason, some of the data have been compared to the Trinnov Optimizer's own measurements and appear similar in this report. Some errors found in the initial report, in the results of the phase response measurements, have also been corrected.
Multiple loudspeakers in a 5.0 setup have been measured with 5 microphones on a rotating boom to measure the sound pressure over an area around the listening position. This allowed the effect of simple reflection cancellation, and the ability to generate virtual sources, to be investigated. For the specific cases investigated in this report, the Optimizer showed the following:
- Frequency and phase response will in every situation be optimized to the extent of the Optimizer's algorithms.
- Every case shows improvement in the frequency and phase response over the whole measured area.
- Direct frontal reflections were deconvolved up to 300 Hz over the whole measured area with a radius of 56 cm.
- A reflection from the side was deconvolved roughly up to 200 Hz for microphones 1 through 3, up to a radius of 31.25 cm, and up to 100 Hz for microphones 4 and 5.
- The ability to create virtual sources corresponds fairly well to the theoretical expectations.
The video sequences that were developed give an interesting new angle on the problems that were investigated. Rather than looking at plots from different angles, which is difficult and time-consuming, the videos offered an intuitive perspective that illuminated the same issues as the commonly presented frequency and phase response measurements.
Sekiguchi, Kouhei. „A Unified Statistical Approach to Fast and Robust Multichannel Speech Separation and Dereverberation“. Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/263770.
Gaultier, Clément. „Conception et évaluation de modèles parcimonieux et d'algorithmes pour la résolution de problèmes inverses en audio“. Thesis, Rennes 1, 2019. http://www.theses.fr/2019REN1S009/document.
Today's challenges in audio and acoustic signal processing inverse problems are multiform. Addressing these problems often requires appropriate additional signal models due to their inherent ill-posedness. This work focuses on designing and evaluating audio reconstruction algorithms. It shows how various sparse models (analysis, synthesis, plain, structured, or "social") are particularly suited to single- or multichannel audio signal reconstruction. Notably, the core of this work identifies the limits of state-of-the-art evaluation methodology for audio declipping and proposes a rigorous large-scale evaluation protocol to determine the most appropriate methods depending on the context (music or speech, moderately or highly degraded signals). Experimental results demonstrate substantial quality improvements for some newly considered testing configurations. We also show the computational efficiency of the different methods and considerable speed improvements. Additionally, a part of this work is dedicated to the sound source localization problem. We address it with a "virtually supervised" machine learning technique. Experiments with this method show promising results on distance and direction-of-arrival estimation.
Lecomte, Pierre. „Ambisonie d'ordre élevé en trois dimensions : captation, transformations et décodage adaptatif de champs sonores“. Thèse, Université de Sherbrooke, 2016. http://hdl.handle.net/11143/9888.
Sound field synthesis is an active research domain with various musical, multimedia, and industrial applications. In the latter case, accurate reconstruction of the sound field is targeted, which involves answering several scientific questions. Using arrays of microphones and loudspeakers, the capture, synthesis, and accurate reconstruction of sound fields are theoretically possible. However, for practical applications, the arrangement of the loudspeakers and the acoustic influence of the restitution room are critical factors to consider in order to ensure accurate reconstruction of the sound field. In this context, this thesis proposes methods and techniques for the capture, transformation, and accurate reconstruction of sound fields in three dimensions based on the Higher Order Ambisonics (HOA) method. A spherical configuration for the array of microphones and loudspeakers is proposed. It follows a fifty-node Lebedev grid, which enables the capture and reconstruction of the sound field up to order 5 in the HOA formalism. The limitations of this approach, such as spatial aliasing, are studied in detail. A transformation operation on the sound field is also proposed. The formulation is established in the spherical harmonics domain and enables directional filtering of the sound field prior to the decoding step. For the reconstruction of the sound field, an original approach, also established in the spherical harmonics domain, can take into account the acoustic influence of the restitution room and the defects of the playback system. This treatment adapts the synthesis of sound fields to the restitution room, maintaining the theoretical formalism established in free field. Finally, an experimental validation of the methods and techniques developed in the thesis is carried out. In this context, a digital signal processing toolkit is developed.
It processes the microphone, ambisonic, and loudspeaker signals in real time for sound field capture, transformation, and decoding.
VISMARA, GIULIA. „Corrispondenze e interazioni tra suono spazio e corpo, strategie per un design sonoro dello spazio“. Doctoral thesis, Università IUAV di Venezia, 2020. http://hdl.handle.net/11578/287422.
Gorlow, Stasnislaw. „Reverse audio engineering for active listening and other applications“. PhD thesis, Université Sciences et Technologies - Bordeaux I, 2013. http://tel.archives-ouvertes.fr/tel-00959329.
Der volle Inhalt der QuelleNugraha, Aditya Arie. „Réseaux de neurones profonds pour la séparation des sources et la reconnaissance robuste de la parole“. Electronic Thesis or Diss., Université de Lorraine, 2017. http://www.theses.fr/2017LORR0212.
This thesis addresses the problem of multichannel audio source separation by exploiting deep neural networks (DNNs). We build upon the classical expectation-maximization (EM) based source separation framework employing a multichannel Gaussian model, in which the sources are characterized by their power spectral densities and their source spatial covariance matrices. We explore and optimize the use of DNNs for estimating these spectral and spatial parameters. Employing the estimated source parameters, we then derive a time-varying multichannel Wiener filter for the separation of each source. We extensively study the impact of various design choices for the spectral and spatial DNNs. We consider different cost functions, time-frequency representations, architectures, and training data sizes. Those cost functions notably include a newly proposed task-oriented signal-to-distortion ratio cost function for spectral DNNs. Furthermore, we present a weighted spatial parameter estimation formula, which generalizes the corresponding exact EM formulation. On a singing-voice separation task, our systems perform remarkably close to the current state-of-the-art method and provide up to 2 dB improvement of the source-to-interference ratio. On a speech enhancement task, our systems outperform the state-of-the-art GEV-BAN beamformer by 14%, 7%, and 1% relative word error rate improvement on 6-channel, 4-channel, and 2-channel data, respectively.
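The separation step described above, a multichannel Wiener filter built from per-source power spectral densities and spatial covariance matrices, can be sketched for a single time-frequency bin. The toy values below stand in for the DNN-estimated parameters of the thesis.

```python
import numpy as np

def multichannel_wiener(x, v, R):
    """Separate one TF-bin mixture x (I channels) into per-source spatial images."""
    # mixture covariance: sum over sources of v_j * R_j
    sigma_x = sum(vj * Rj for vj, Rj in zip(v, R))
    sigma_inv = np.linalg.inv(sigma_x)
    # Wiener estimate of each source's spatial image: v_j R_j Sigma^-1 x
    return [vj * Rj @ sigma_inv @ x for vj, Rj in zip(v, R)]

# two channels, two sources with distinct (toy) spatial signatures
R = [np.array([[1.0, 0.9], [0.9, 1.0]]),     # source 0: correlated across mics
     np.array([[1.0, -0.9], [-0.9, 1.0]])]   # source 1: anti-correlated
v = [2.0, 0.5]                                # per-source PSDs in this bin
x = np.array([1.0 + 0.5j, 0.8 - 0.2j])        # observed STFT mixture bin
estimates = multichannel_wiener(x, v, R)
# the per-source Wiener filters sum to the identity, so the images add to x
print(np.allclose(estimates[0] + estimates[1], x))  # prints True
```

Because the per-source Wiener filters sum to the identity matrix, the estimated source images always add back up to the observed mixture, a useful sanity check for any implementation of this filter.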
Kotouček, Filip. „Vícekanálový přenos zvukových signálů po lokální počítačové síti“. Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2017. http://www.nusl.cz/ntk/nusl-316879.
Lecomte, Pierre. „Ambisonie d'ordre élevé en trois dimensions : captation, transformations et décodage adaptatifs de champs sonores“. Thesis, Paris, CNAM, 2016. http://www.theses.fr/2016CNAM1076/document.
Mahé, Pierre. „Codage ambisonique pour les communications immersives“. Thesis, La Rochelle, 2022. http://www.theses.fr/2022LAROS011.
This thesis takes place in the context of the spread of immersive content. Over the last few years, immersive audio recording and playback technologies have gained momentum and become more and more popular. New codecs are needed to handle these spatial audio formats, especially for communication applications. There are several ways to represent spatial audio scenes; in this thesis, we focus on First Order Ambisonics. The first part of our research focuses on improving multi-mono coding by decorrelating each ambisonic signal component before the multi-mono coding stage. To guarantee signal continuity between frames, new efficient quantization mechanisms are proposed. In the second part of this thesis, we propose a new coding concept that uses a power map to recreate the original spatial image. Based on this concept, we propose two compression methods. The first is a post-processing step that limits the spatial distortion of the decoded signal; the spatial correction is based on the difference between the original and the decoded spatial image. This post-processing is later extended to a parametric coding method. The last part of this thesis presents a more exploratory method, which studies audio signal compression by neural networks inspired by image compression models using variational autoencoders.
Leglaive, Simon. „Modèles de mélange pour la séparation multicanale de sources sonores en milieu réverbérant“. Electronic Thesis or Diss., Paris, ENST, 2017. http://www.theses.fr/2017ENST0068.
This thesis addresses the problem of under-determined audio source separation for multichannel reverberant mixtures. We adopt a probabilistic approach where the source signals are represented as latent random variables in a time-frequency domain. The specific structure of musical signals in this domain is exploited by means of non-negative matrix factorization models. In the literature, the mixing filters are generally treated as deterministic parameters, estimated only from the observed data. However, as these filters correspond to room responses, they exhibit a very particular structure that can be used to guide their estimation. In the first part, the time-domain convolutive mixing process is approximated in the short-time Fourier transform domain, under the assumption that the impulse responses of the mixing filters are short. We develop autoregressive moving average models that aim to transcribe the temporal dynamics of the filters into frequency-domain correlations. These models are then used in a source separation framework to perform maximum a posteriori estimation of the mixing filters by means of an expectation-maximization algorithm. In the second part, we propose to infer the time-frequency source coefficients from the time-domain mixture observations using a variational approach. The convolutive mixing process is here represented exactly. In addition to being suitable for the separation of highly reverberant mixtures, this approach allows us to develop simple priors for the mixing filters in order to guide their estimation. We propose a model based on the Student's t distribution that exploits the exponential decay of reverberation in the time domain.
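The non-negative matrix factorization model mentioned above approximates a power spectrogram V as a product of spectral templates W and temporal activations H. The standalone toy below uses the classical multiplicative updates for the Euclidean cost; the thesis embeds such models in a full probabilistic framework, which is not reproduced here.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9):
    """Factor a non-negative matrix V (freq x time) as W @ H."""
    rng = np.random.default_rng(0)
    F, N = V.shape
    W = rng.random((F, rank)) + eps   # spectral templates
    H = rng.random((rank, N)) + eps   # temporal activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiplicative updates keep
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # W and H non-negative
    return W, H

# toy "power spectrogram" that is exactly rank 2
true_W = np.array([[1.0, 0.0], [0.5, 0.2], [0.0, 1.0]])
true_H = np.array([[1.0, 0.0, 2.0, 0.5], [0.0, 1.5, 1.0, 0.0]])
V = true_W @ true_H
W, H = nmf(V, rank=2)
print(float(np.linalg.norm(W @ H - V)))   # small reconstruction residual
```

The multiplicative form of the updates guarantees that W and H stay element-wise non-negative, which is why these models fit power spectrograms so naturally.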
Mariotte, Théo. „Traitement automatique de la parole en réunion par dissémination de capteurs“. Electronic Thesis or Diss., Le Mans, 2024. http://www.theses.fr/2024LEMA1001.
This thesis work focuses on automatic speech processing, and more specifically on speaker diarization. This task requires the signal to be segmented to identify events such as voice activity, overlapped speech, or speaker changes. This work tackles the scenario where the signal is recorded by a device located in the center of a group of speakers, as in meetings. These conditions lead to a degradation in signal quality due to the distance between the speakers and the device (distant speech). To mitigate this degradation, one approach is to record the signal using a microphone array. The resulting multichannel signal provides information on the spatial distribution of the acoustic field. Two lines of research are explored for speech segmentation using microphone arrays. The first introduces a method combining acoustic features with spatial features. We propose a new set of features based on the circular harmonics expansion. This approach improves segmentation performance under distant-speech conditions while reducing the number of model parameters and improving robustness when the array geometry changes. The second proposes several approaches that combine channels using self-attention. Different models, inspired by an existing architecture, are developed. Combining channels also improves segmentation under distant-speech conditions, and two of these approaches make feature extraction more interpretable. The proposed distant-speech segmentation systems also improve speaker diarization. Channel combination shows poor robustness to changes in the array geometry during inference; to avoid this behavior, a learning procedure is proposed that improves robustness in the case of array mismatch. Finally, we identified a gap in the public datasets available for distant multichannel automatic speech processing.
An acquisition protocol is therefore introduced to build a new dataset, integrating speaker position annotations in addition to speaker diarization. Overall, this work aims to improve the quality of multichannel distant-speech segmentation. The proposed methods exploit the spatial information provided by microphone arrays while improving robustness in the case of array mismatch.
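As a toy analogue of the circular-harmonics features mentioned above: for a uniform circular microphone array, the m-th circular-harmonic component of one time snapshot is simply the spatial DFT of the channel values over the array angles. The thesis's actual feature set (and any radial equalization) is not reproduced here; this only shows the expansion itself.

```python
import cmath
import math

def circular_harmonics(snapshot, max_order):
    """Spatial coefficients b_m, m = -max_order..max_order, for one time sample."""
    Q = len(snapshot)
    angles = [2 * math.pi * q / Q for q in range(Q)]   # uniform array angles
    coeffs = {}
    for m in range(-max_order, max_order + 1):
        coeffs[m] = sum(x * cmath.exp(-1j * m * phi)
                        for x, phi in zip(snapshot, angles)) / Q
    return coeffs

# a pure order-2 spatial pattern on an 8-mic array excites only the m = 2 term
Q = 8
snapshot = [cmath.exp(2j * (2 * math.pi * q / Q)) for q in range(Q)]
b = circular_harmonics(snapshot, max_order=3)
print(round(abs(b[2]), 3))   # → 1.0
print(round(abs(b[0]), 3))   # → 0.0
```

Orders up to Q/2 - 1 are resolved unambiguously by a Q-microphone uniform array; higher orders alias, which is one reason such features stay compact.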
Roussel, Georges. „Contributions à la mise au point de méthodes adaptatives de reproduction de champs sonores multi-zone pour les auditeurs en mouvement : Sound zones pour auditeurs en mouvement“. Thesis, Le Mans, 2019. http://www.theses.fr/2019LEMA1018/document.
The growing number of audio devices raises the problem of sharing the same physical space without sharing the same sound space. Sound Zones make it possible to play independent and spatially separated audio programs through a loudspeaker array in combination with sound field reproduction methods. The problem is then split into two zones: the Bright zone, where the audio content must be reproduced, and the Dark zone, where it must be cancelled. Many methods are available to solve this problem, but most only deal with listeners in a static position. They are based on the direct resolution of adaptive optimization methods, such as the Pressure Matching (PM) method. However, for moving users, these methods have too high a computational cost, making it impossible to apply them to a dynamic problem. The aim of this thesis is to develop a solution offering a level of complexity compatible with dynamic control of Sound Zones, while maintaining the performance of conventional methods. Under the assumption that displacements are slow, an iterative resolution of the PM problem is proposed and assessed. The LMS, NLMS, and APA algorithms are compared on the basis of free-field simulations. The LMS method is the most advantageous in terms of complexity, but it suffers from a reproduction error. A memory effect limiting the reactivity of the algorithms is also highlighted. It is corrected by implementing a leaky variant (Variable Leaky LMS, or VLLMS) that introduces a forgetting factor.
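The Pressure Matching problem and its iterative resolution can be sketched as a plain least-squares fit: given a matrix G mapping loudspeaker weights to pressures at control microphones, gradient iterations drive the reproduced pressures toward a target that is non-zero in the Bright zone and zero in the Dark zone. The toy below uses a random G and the generic steepest-descent (LMS-style) scheme, not the thesis's VLLMS variant or its acoustic transfer functions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_mics, n_spk = 4, 8                        # more loudspeakers than control mics
G = rng.standard_normal((n_mics, n_spk))    # toy plant: speakers -> control mics
p_target = np.array([1.0, 1.0, 0.0, 0.0])   # mics 0-1: Bright zone, 2-3: Dark zone

q = np.zeros(n_spk)                         # loudspeaker driving weights
mu = 0.02                                   # fixed step size (small, for stability)
for _ in range(50000):                      # LMS-style gradient iterations
    e = p_target - G @ q                    # pressure error at the control mics
    q += mu * G.T @ e                       # steepest-descent update

p_rep = G @ q
print(np.round(p_rep, 3))                   # close to the Bright/Dark target
```

With more loudspeakers than control points the target is exactly reachable, so the iteration converges to a solution that reproduces the Bright-zone pressures while driving the Dark-zone pressures toward zero; the thesis's contribution is doing this cheaply enough to track moving listeners.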
Chen, Yi-Han, and 陳禕涵. „Nonlinear Calibration of the Room Transfer Function for Multichannel Audio Systems“. Thesis, 2012. http://ndltd.ncl.edu.tw/handle/45782122218193392058.
National Taiwan University of Science and Technology
Department of Electronic Engineering
100
As listening becomes a group experience, considering acoustic variations in the design of equalizers for multichannel audio systems turns out to be an indispensable process. The room transfer function describes the changes an audio signal undergoes when it propagates via a direct path and multipath reflections around an enclosed room. One of the criteria for a room transfer function design is to verify its capability of cancelling indirect terms and multipath reflections. To perform such equalization, we propose a partial-update adaptive algorithm based on the head-related transfer function (HRTF) and the least-mean-square (LMS) approach to efficiently suppress the indirect terms resulting from the multichannel playback system. Since the HRTF plays a crucial role in spatial audio signal processing, introducing this model is shown to facilitate complexity reduction for a highly nonlinear system. On the other hand, the selection of the crossover frequency between the subwoofer and satellite loudspeakers is essential for reconstructing a nearly perfect sound field in a reverberant room. Apart from keeping the individual loudspeakers distortion-free, the overall spectral variation between the subwoofer and satellite loudspeakers should be minimized around the selected crossover frequency. In this work, we propose a phase compensation scheme based on minimax approximation to flatten the spectral response around the crossover frequency for loudspeaker systems emphasizing the low-frequency effect. The cascaded all-pass filter is shown to yield improved low-frequency performance by flattening the net magnitude response in the crossover region. Finally, several numerical design examples based on reverberant-room recordings are provided to verify the characteristics of the proposed systems.
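The reason cascaded all-pass sections are attractive for crossover correction is that each section changes phase without changing magnitude. The sketch below evaluates first-order sections H(z) = (a + z^-1)/(1 + a*z^-1) with arbitrary stable coefficients (these are not the minimax-designed coefficients of the thesis) and confirms that the cascade's magnitude response stays exactly 1 while the phase varies.

```python
import cmath
import math

def allpass_response(a: float, w: float) -> complex:
    """Frequency response of H(z) = (a + z^-1)/(1 + a z^-1) at normalized angle w."""
    z_inv = cmath.exp(-1j * w)
    return (a + z_inv) / (1 + a * z_inv)

def cascade_response(coeffs, w):
    """Response of a cascade of first-order all-pass sections."""
    h = 1 + 0j
    for a in coeffs:
        h *= allpass_response(a, w)
    return h

coeffs = [0.5, -0.3, 0.7]                 # arbitrary stable sections (|a| < 1)
for w in [0.1, 0.5, 1.0, 2.0, 3.0]:
    h = cascade_response(coeffs, w)
    # magnitude stays 1.0 at every frequency; only the phase (degrees) varies
    print(round(abs(h), 6), round(math.degrees(cmath.phase(h)), 1))
```

Because |a + e^(-jw)| = |1 + a e^(-jw)| for real a, each section is exactly unit-magnitude; a designed cascade can therefore rotate the subwoofer/satellite phase at the crossover without re-introducing magnitude ripple.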
Chen, Yi-Wen, and 陳怡妏. „Robust binaural audio rendering with the time-domain underdetermined multichannel inverse prefilters“. Thesis, 2018. http://ndltd.ncl.edu.tw/handle/kpf4k7.
Aditya, Arie Nugraha. „Réseaux de neurones profonds pour la séparation des sources et la reconnaissance robuste de la parole“. Thesis, 2017. http://www.theses.fr/2017LORR0212/document.