Theses on the topic "Classification de scènes sonores"
Create a precise citation in APA, MLA, Chicago, Harvard, and other styles
Consult the top 45 theses for your research on the topic "Classification de scènes sonores".
Next to every source in the reference list there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication in PDF format and read its abstract online whenever it is available in the metadata.
Explore theses from a wide variety of disciplines and organize your bibliography correctly.
Bisot, Victor. "Apprentissage de représentations pour l'analyse de scènes sonores". Electronic Thesis or Diss., Paris, ENST, 2018. http://www.theses.fr/2018ENST0016.
This thesis focuses on the computational analysis of environmental sound scenes and events. The objective of such tasks is to automatically extract information about the context in which a sound has been recorded. Interest in this area of research has been increasing rapidly in the last few years, leading to constant growth in the number of works and proposed approaches. We explore and contribute to the main families of approaches to sound scene and event analysis, going from feature engineering to deep learning. Our work is centered on representation learning techniques based on nonnegative matrix factorization (NMF), which are particularly suited to analysing multi-source environments such as acoustic scenes. As a first approach, we propose a combination of image processing features with the goal of confirming that spectrograms contain enough information to discriminate sound scenes and events. From there, we leave the world of feature engineering and move towards automatically learning the features. The first step we take in that direction is to study the usefulness of matrix factorization for unsupervised feature learning, especially by relying on variants of NMF. Several of the compared approaches indeed allow us to outperform feature engineering approaches to such tasks. Next, we propose to improve the learned representations by introducing TNMF, a supervised variant of NMF. The proposed TNMF models and algorithms are based on jointly learning nonnegative dictionaries and classifiers by minimising a target classification cost. The last part of our work highlights the links and compatibility between NMF and certain deep neural network systems, by proposing and adapting neural network architectures to the use of NMF as an input representation. The proposed models achieve state-of-the-art performance on scene classification and overlapping event detection tasks. Finally, we explore the possibility of jointly learning the NMF and neural network parameters, grouping the different stages of our systems into one optimisation problem.
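The NMF step at the core of the abstract above can be illustrated with a minimal sketch. This is plain NMF with multiplicative updates on a toy magnitude spectrogram, not Bisot's TNMF algorithm; all variable names and sizes are illustrative.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF via multiplicative updates: V (F x T) ~ W (F x k) @ H (k x T)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, k)) + eps
    H = rng.random((k, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update dictionary
    return W, H

# Toy "spectrogram" (nonnegative, 32 bands x 40 frames)
rng = np.random.default_rng(1)
V = np.abs(rng.normal(size=(32, 40)))
W, H = nmf(V, k=2)
err = np.linalg.norm(V - W @ H)
```

In NMF-based feature learning, the activations H (or their time averages) would serve as the learned representation fed to a classifier; the supervised TNMF variant additionally couples this factorization with the classification cost.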
Olvera Zambrano, Mauricio Michel. "Robust sound event detection". Electronic Thesis or Diss., Université de Lorraine, 2022. http://www.theses.fr/2022LORR0324.
From industry to general-interest applications, computational analysis of sound scenes and events allows us to interpret the continuous flow of everyday sounds. One of the main degradations encountered when moving from lab conditions to the real world is that sound scenes are not composed of isolated events but of multiple simultaneous events. Differences between training and test conditions also often arise due to extrinsic factors, such as the choice of recording hardware and microphone positions, as well as intrinsic factors of sound events, such as their frequency of occurrence, duration and variability. In this thesis, we investigate problems of practical interest for audio analysis tasks to achieve robustness in real scenarios. Firstly, we explore the separation of ambient sounds in a practical scenario in which multiple short-duration sound events with fast-varying spectral characteristics (i.e., foreground sounds) occur simultaneously with stationary background sounds. We introduce the foreground-background ambient sound separation task and investigate whether a deep neural network with auxiliary information about the statistics of the background sound can differentiate between rapidly- and slowly-varying spectro-temporal characteristics. Moreover, we explore the use of per-channel energy normalization (PCEN) as a suitable pre-processing step and the ability of the separation model to generalize to unseen sound classes. Results on mixtures of isolated sounds from the DESED and AudioSet datasets demonstrate the generalization capability of the proposed separation system, which is mainly due to PCEN. Secondly, we investigate how to improve the robustness of audio analysis systems under mismatched training and test conditions. We explore two distinct tasks: acoustic scene classification (ASC) with mismatched recording devices, and training of sound event detection (SED) systems with synthetic and real data. In the context of ASC, without assuming the availability of recordings captured simultaneously by mismatched training and test recording devices, we assess the impact of moment normalization and matching strategies and their integration with unsupervised adversarial domain adaptation. Our results show the benefits and limitations of these adaptation strategies applied at different stages of the classification pipeline. The best strategy matches source-domain performance in the target domain. In the context of SED, we propose a PCEN-based acoustic front-end with learned parameters. Then, we study the joint training of SED with auxiliary classification branches that categorize sounds as foreground or background according to their spectral properties. We also assess the impact of aligning the distributions of synthetic and real data at the frame or segment level based on optimal transport. Finally, we integrate an active learning strategy into the adaptation procedure. Results on the DESED dataset indicate that these methods are beneficial for the SED task and that their combination further improves performance on real sound scenes.
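Per-channel energy normalization (PCEN), which appears in both parts of the abstract above, can be sketched directly from its published definition: a per-band first-order smoother followed by adaptive gain control and root compression. The parameter values below are common defaults, not the learned parameters of the thesis.

```python
import numpy as np

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """PCEN on a (bands x frames) energy spectrogram E.
    A first-order IIR filter M tracks the smoothed energy per band;
    each frame is then gain-normalized by M and root-compressed."""
    M = np.zeros_like(E)
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        M[:, t] = (1 - s) * M[:, t - 1] + s * E[:, t]
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

# Toy mel-like energy spectrogram, 40 bands x 100 frames
E = np.abs(np.random.default_rng(0).normal(size=(40, 100))) ** 2
P = pcen(E)
```

Because the smoother M tracks slowly-varying (background) energy, PCEN boosts transient foreground events relative to stationary background, which is consistent with its role in foreground-background separation; making s, alpha, delta and r trainable yields the learned front-end mentioned above.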
Gontier, Félix. "Analyse et synthèse de scènes sonores urbaines par approches d'apprentissage profond". Thesis, Ecole centrale de Nantes, 2020. http://www.theses.fr/2020ECDN0042.
The advent of the Internet of Things (IoT) has enabled the development of large-scale acoustic sensor networks to continuously monitor sound environments in urban areas. In the soundscape approach, perceptual quality attributes are associated with the activity of sound sources, quantities of importance to better account for human perception of the acoustic environment. With their recent success in acoustic scene analysis, deep learning approaches are uniquely suited to predicting these quantities. However, the annotations necessary to train supervised deep learning models are not easily obtainable, partly because the information content of sensor measurements is limited by privacy constraints. To address this issue, a method is proposed for the automatic annotation of perceived source activity in large datasets of simulated acoustic scenes. On simulated data, trained deep learning models achieve state-of-the-art performance in the estimation of source-specific perceptual attributes and sound pleasantness. Semi-supervised transfer learning techniques are further studied to improve the adaptability of trained models by exploiting knowledge from large amounts of unlabelled sensor data. Evaluations on annotated in situ recordings show that learning latent audio representations of sensor measurements compensates for the limited ecological validity of simulated sound scenes. In a second part, the use of deep learning methods for the synthesis of time-domain signals from privacy-aware sensor measurements is investigated. Two spectral convolutional approaches are developed and evaluated against state-of-the-art methods designed for speech synthesis.
Lafay, Grégoire. "Simulation de scènes sonores environnementales : Application à l’analyse sensorielle et l’analyse automatique". Thesis, Ecole centrale de Nantes, 2016. http://www.theses.fr/2016ECDN0007/document.
This thesis deals with environmental sound scene analysis, the auditory result of mixing separate but concurrent emitting sources. The sound environment is a complex object, which opens the field of possible research beyond the specific areas of speech or music. For a person to make sense of their sonic environment, the process involved relies on both the perceived data and its context. For each experiment, one must be, as much as possible, in control of the evaluated stimuli, whether the field of investigation is perception or machine learning. Nevertheless, the sound environment needs to be studied in an ecological framework, using real recordings of sounds as stimuli rather than synthetic pure tones. We therefore propose a model of sound scenes allowing us to simulate complex sound environments from isolated sound recordings. The high-level structural properties of the simulated scenes -- such as the type of sources, their sound levels or the event density -- are set by the experimenter. Based on knowledge of the human auditory system, the model abstracts the sound environment as a composite object, a sum of sound sources. The usefulness of the proposed model is assessed in two areas of investigation. The first relates to soundscape perception, where the model is used to propose an innovative experimental protocol to study the perceived pleasantness of urban soundscapes. The second tackles the major issue of evaluation in machine listening, for which we consider simulated data in order to rigorously assess the generalization capacities of automatic sound event detection systems.
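The "sum of sound sources" abstraction described above can be sketched as a simple additive simulator: isolated event recordings are placed on a background at experimenter-chosen levels and onsets. This is a hypothetical simplification of Lafay's simulator, for intuition only.

```python
import numpy as np

def simulate_scene(background, events, gains_db, onsets, sr=16000, duration=10.0):
    """Mix a background with isolated event recordings at given levels and onsets."""
    n = int(sr * duration)
    scene = np.resize(background, n).astype(float)  # tile background to scene length
    for ev, g_db, t0 in zip(events, gains_db, onsets):
        gain = 10 ** (g_db / 20)                    # dB level set by the experimenter
        start = int(t0 * sr)
        end = min(start + len(ev), n)
        scene[start:end] += gain * ev[: end - start]
    return scene

rng = np.random.default_rng(0)
bg = 0.01 * rng.normal(size=16000)                          # stationary background
ev = np.sin(2 * np.pi * 440 * np.arange(8000) / 16000)      # one isolated event
scene = simulate_scene(bg, [ev], gains_db=[-6.0], onsets=[2.0])
```

Event density would be controlled by sampling how many onsets fall per unit time; since the scene description (classes, levels, onsets) is known by construction, ground-truth annotations for evaluating detection systems come for free.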
Moussallam, Manuel. "Représentations redondantes et hiérarchiques pour l'archivage et la compression de scènes sonores". Phd thesis, Télécom ParisTech, 2012. http://pastel.archives-ouvertes.fr/pastel-00834272.
Moussallam, Manuel. "Représentations redondantes et hiérarchiques pour l'archivage et la compression de scènes sonores". Electronic Thesis or Diss., Paris, ENST, 2012. http://www.theses.fr/2012ENST0079.
The main goal of this work is the automated processing of large volumes of audio data. More specifically, we are interested in archiving, a process that encompasses at least two distinct problems: data compression and data indexing. Jointly addressing these problems is a difficult task since many of their objectives may conflict. Building a consistent framework for audio archival is therefore the subject of this thesis. Sparse representations of signals in redundant dictionaries have recently been found useful for many sub-problems of the archival task. Sparsity is a desirable property both for compression and for indexing. Methods and algorithms to build such representations are the first topic of this thesis. Given the dimensionality of the considered data, greedy algorithms are studied in particular. A first contribution of this thesis is a variant of the famous Matching Pursuit algorithm that exploits randomness and sub-sampling of very large time-frequency dictionaries. We show that audio compression (especially at low bit rates) can be improved using this method. This new algorithm comes with an original modeling of asymptotic pursuit behaviors, using order statistics and tools from extreme value theory. Other contributions deal with the second member of the archival problem: indexing. The same framework is applied to different layers of signal structure. First, detection of redundancies and musical repetitions is addressed. At a larger scale, we investigate audio fingerprinting schemes and apply them to on-line segmentation of radio broadcasts. Performance was evaluated during an international campaign within the QUAERO project. Finally, the same framework is used to perform source separation informed by redundancy. All these elements validate the proposed framework for the audio archiving task. The layered structures of audio data are accessed hierarchically by greedy decomposition algorithms, allowing the different objectives of archival to be addressed at different steps within the same framework.
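The greedy decomposition idea above can be illustrated by plain matching pursuit with a randomly sub-sampled dictionary at each iteration, in the spirit of (but much simpler than) the thesis's variant for very large time-frequency dictionaries. The random Gaussian dictionary is purely illustrative.

```python
import numpy as np

def subsampled_matching_pursuit(x, D, n_atoms=10, sub=0.5, seed=0):
    """Greedy sparse decomposition of x over dictionary D (columns = unit-norm atoms).
    At each iteration only a random subset of atoms is searched, which cuts the
    cost of the correlation step for very large dictionaries."""
    rng = np.random.default_rng(seed)
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        idx = rng.choice(D.shape[1], size=max(1, int(sub * D.shape[1])), replace=False)
        corr = D[:, idx].T @ residual
        best = idx[np.argmax(np.abs(corr))]     # best atom within the random subset
        c = D[:, best] @ residual
        coeffs[best] += c
        residual -= c * D[:, best]              # subtract the selected component
    return coeffs, residual

rng = np.random.default_rng(1)
D = rng.normal(size=(64, 256))
D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms
x = 2.0 * D[:, 3] + 0.1 * rng.normal(size=64)   # sparse signal plus noise
coeffs, residual = subsampled_matching_pursuit(x, D)
```

Each iteration removes the projection of the residual onto the chosen atom, so the residual energy decreases monotonically; the thesis analyzes how such randomized pursuits behave asymptotically using order statistics.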
Baskind, Alexis. "Modèles et méthodes de description spatiale de scènes sonores : application aux enregistrements binauraux". Paris 6, 2003. http://www.theses.fr/2003PA066407.
Rompré, Louis. "Vers une méthode de classification de fichiers sonores /". Thèse, Trois-Rivières : Université du Québec à Trois-Rivières, 2007. http://www.uqtr.ca/biblio/notice/resume/30024804R.pdf.
Rompré, Louis. "Vers une méthode de classification de fichiers sonores". Thèse, Université du Québec à Trois-Rivières, 2007. http://depot-e.uqtr.ca/2022/1/030024804.pdf.
Perotin, Lauréline. "Localisation et rehaussement de sources de parole au format Ambisonique : analyse de scènes sonores pour faciliter la commande vocale". Thesis, Université de Lorraine, 2019. http://www.theses.fr/2019LORR0124/document.
This work was conducted in the fast-growing context of hands-free voice command. In domestic environments, smart devices are usually placed in a fixed position, while the human speaker gives orders from anywhere, not necessarily next to the device, nor even facing it. This adds difficulties compared to the problem of near-field voice command (typical for mobile phones): strong reverberation, early reflections on furniture around the device, and surrounding noises can degrade the signal. Moreover, other speakers may interfere, which makes understanding the target speaker quite difficult. In order to facilitate speech recognition in such adverse conditions, several preprocessing methods are introduced here. We use a spatialized audio format suitable for audio scene analysis: the Ambisonic format. We first propose a sound source localization method that relies on a convolutional and recurrent neural network. We define an input feature vector inspired by the acoustic intensity vector, which improves localization performance, in particular in real conditions involving several speakers and a microphone array placed on a table. We exploit the visualization technique called layer-wise relevance propagation (LRP) to highlight the time-frequency zones that correlate positively with the network output. This analysis is of paramount importance for establishing the validity of a neural network; in addition, it shows that the network essentially relies on time-frequency zones where direct sound dominates reverberation and background noise. We then present a method to enhance the voice of the main speaker and ease its recognition. We adopt a beamforming framework based on a time-frequency mask estimated by a neural network. To deal with the situation of multiple speakers with similar loudness, we first use a wideband beamformer to enhance the target speaker thanks to the associated localization information. We show that this additional information is not enough for the network when two speakers are close to each other. However, if we also give an enhanced version of the interfering speaker as input to the network, it returns much better masks. The filters generated from those masks greatly improve speech recognition performance. We evaluate this algorithm in various environments, including real ones, with a black-box automatic speech recognition system. Finally, we combine the proposed localization and enhancement systems and evaluate the robustness of the latter to localization errors in real environments.
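A minimal sketch of the mask-based beamforming idea in the abstract above: a time-frequency mask (here given; in the thesis, estimated by a neural network) weights the spatial covariance of the target, and the principal eigenvector of that covariance serves as a per-frequency rank-1 beamformer. This is a generic recipe, not Perotin's exact Ambisonic pipeline.

```python
import numpy as np

def mask_based_beamformer(X, mask):
    """X: (channels, freqs, frames) multichannel STFT; mask: (freqs, frames) in [0, 1].
    Returns an enhanced single-channel (freqs, frames) spectrogram."""
    C, F, T = X.shape
    out = np.zeros((F, T), dtype=complex)
    for f in range(F):
        Xf = X[:, f, :]                                  # (channels, frames)
        w = mask[f]                                      # target presence weights
        # Mask-weighted spatial covariance of the target at this frequency
        cov = (Xf * w) @ Xf.conj().T / max(w.sum(), 1e-9)
        # Rank-1 beamformer: principal eigenvector of the target covariance
        vals, vecs = np.linalg.eigh(cov)                 # Hermitian eigendecomposition
        h = vecs[:, -1]
        out[f] = h.conj() @ Xf
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16, 50)) + 1j * rng.normal(size=(4, 16, 50))
mask = rng.random((16, 50))
Y = mask_based_beamformer(X, mask)
```

In a full system, a noise covariance estimated from (1 - mask) would typically be added to form an MVDR or GEV beamformer; the sketch keeps only the mask-weighted target statistics.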
Perotin, Lauréline. "Localisation et rehaussement de sources de parole au format Ambisonique : analyse de scènes sonores pour faciliter la commande vocale". Electronic Thesis or Diss., Université de Lorraine, 2019. http://www.theses.fr/2019LORR0124.
Boban, Patrick. "Iconographie du sanctuaire d'Eleusis : études des représentations figurées dans l'art grec et romain : essai de classification thématique selon le triptyque scènes mythiques, scènes rituelles, scènes initiatiques". Dijon, 1994. http://www.theses.fr/1994DIJOL008.
Gribonval, Rémi. "Approximations non-linéaires pour l'analyse de signaux sonores". Phd thesis, Université Paris Dauphine - Paris IX, 1999. http://tel.archives-ouvertes.fr/tel-00583662.
Homayouni, Saeid. "Caractérisation des Scènes Urbaines par Analyse des Images Hyperspectrales". Phd thesis, Télécom ParisTech, 2005. http://pastel.archives-ouvertes.fr/pastel-00002521.
Duan, Liuyun. "Modélisation géométrique de scènes urbaines par imagerie satellitaire". Thesis, Université Côte d'Azur (ComUE), 2017. http://www.theses.fr/2017AZUR4025.
Automatic city modeling from satellite imagery is one of the biggest challenges in urban reconstruction. The ultimate goal is to produce compact and accurate 3D city models that benefit many application fields, such as urban planning, telecommunications and disaster management. Compared with aerial acquisition, satellite imagery provides appealing advantages such as low acquisition cost, worldwide coverage and high collection frequency. However, the satellite context also imposes a set of technical constraints, such as a lower pixel resolution, that challenge 3D city reconstruction. In this PhD thesis, we present a set of methodological tools for generating compact, semantically aware and geometrically accurate 3D city models from stereo pairs of satellite images. The proposed pipeline relies on two key ingredients. First, geometry and semantics are retrieved simultaneously, providing robust handling of occlusion areas and low image quality. Second, it operates at the scale of geometric atomic regions, which allows the shape of urban objects to be well preserved, with a gain in scalability and efficiency. Images are first decomposed into convex polygons that capture geometric details via a Voronoi diagram. Semantic classes, elevations, and 3D geometric shapes are then retrieved in a joint classification and reconstruction process operating on polygons. Experimental results on various cities around the world show the robustness, scalability and efficiency of the proposed approach.
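The Voronoi decomposition step above can be illustrated in its simplest form: assigning each pixel to its nearest seed point partitions the image into (discrete) Voronoi cells. The thesis works with exact convex polygons; this raster version is only for intuition, with arbitrary seed positions.

```python
import numpy as np

def voronoi_labels(h, w, seeds):
    """Label each pixel of an h x w grid with the index of its nearest seed."""
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([ys.ravel(), xs.ravel()], axis=1)            # (h*w, 2) pixel coords
    d2 = ((pts[:, None, :] - np.asarray(seeds)[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1).reshape(h, w)                      # nearest-seed index

labels = voronoi_labels(32, 32, seeds=[(4, 4), (28, 28), (4, 28)])
```

Operating on such atomic regions instead of individual pixels is what gives the pipeline its scalability: classification and elevation estimates are made once per polygon rather than once per pixel.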
Blachon, David. "Reconnaissance de scènes multimodale embarquée". Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM001/document.
Context: This PhD takes place in the context of ambient intelligence and (mobile) context/scene awareness. Historically, the project comes from the company ST-Ericsson. It was framed as a need to develop and embed a "context server" on the smartphone that would acquire and provide context information to applications requiring it. One use case was given for illustration: when someone involved in a meeting receives a call, then thanks to its understanding of the current scene (meeting at work), the smartphone is able to act automatically and, in this case, switch to vibrate mode in order not to disturb the meeting. The main problems consist of (i) proposing a definition of what a scene is and which examples of scenes would suit the use case, (ii) acquiring a corpus of data to be exploited with machine learning approaches, and (iii) proposing algorithmic solutions to the problem of scene recognition. Data collection: After a review of existing databases, it appeared that none fitted the criteria I had set (long continuous recordings; multi-source synchronized recordings necessarily including audio; relevant labels). Hence, I developed an Android application for collecting data. The application is called RecordMe and has been successfully tested on 10+ devices running Android 2.3 and 4.0. It has been used for 3 different campaigns, including the one for scenes. This resulted in 500+ hours of recordings involving 25+ volunteers, mostly in the Grenoble area but also abroad (Dublin, Singapore, Budapest).
The application and the collection protocol both include features for protecting volunteers' privacy: for instance, raw audio is not saved; instead, MFCCs are saved, and sensitive strings (GPS coordinates, device ids) are hashed on the phone. Scene definition: The study of existing work on scene recognition, along with the analysis of the annotations provided by the volunteers during data collection, allowed me to propose a definition of a scene. It is defined as a generalisation of a situation, composed of a place and an action performed by one person (the smartphone owner). Examples of scenes include taking transportation, being involved in a work meeting, and walking in the street. The composition allows different kinds of information on the current scene to be provided. However, the definition is still too generic, and I think it might be completed with additional information, integrated as new elements of the composition. Algorithmics: I performed experiments involving machine learning techniques, both supervised and unsupervised. The supervised part is about classification. The method is quite standard: find relevant descriptors of the data through an attribute selection method, then train and test several classifiers (in my case, J48 and Random Forest trees; GMM; HMM; and DNN). I also tried a 2-stage system composed of a first stage of classifiers trained to identify intermediate concepts, whose predictions are merged in order to estimate the most likely scene. The unsupervised part of the work aimed at extracting information from the data without labels. For this purpose, I applied bottom-up hierarchical clustering based on the EM algorithm to acceleration and audio data, taken separately and together. One of the results is the separation of acceleration data into groups based on the amount of agitation.
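The bottom-up clustering mentioned above can be sketched in its most naive form: repeatedly merge the two closest clusters under centroid linkage. The thesis uses an EM-based variant on MFCC and acceleration features; this toy version on 2-D points only shows the merging principle.

```python
import numpy as np

def agglomerative(points, n_clusters):
    """Naive bottom-up clustering: repeatedly merge the two closest clusters
    (centroid linkage) until n_clusters remain. Returns a label per point."""
    clusters = [[i] for i in range(len(points))]
    cents = [points[i].astype(float) for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.linalg.norm(cents[a] - cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)          # merge cluster b into cluster a
        cents.pop(b)
        cents[a] = np.mean([points[i] for i in clusters[a]], axis=0)
    labels = np.empty(len(points), dtype=int)
    for k, idxs in enumerate(clusters):
        labels[idxs] = k
    return labels

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = agglomerative(pts, 2)
```

Applied to acceleration features, such a procedure would group frames by agitation level, as reported in the abstract; the EM-based version replaces the centroid distance with a model-likelihood criterion.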
Baelde, Maxime. "Modèles génératifs pour la classification et la séparation de sources sonores en temps-réel". Thesis, Lille 1, 2019. http://www.theses.fr/2019LIL1I058/document.
This thesis was carried out with A-Volute, a company that publishes audio enhancement software. It offers a radar that translates multi-channel audio information into visual information in real time. This radar, although relevant, lacks intelligence because it only analyses the audio stream in terms of energy and not in terms of separate sound sources. The purpose of this thesis is to develop algorithms for classifying and separating sound sources in real time. On the one hand, audio source classification aims to assign a label (e.g. voice) to a monophonic (one label) or polyphonic (several labels) sound. The developed method uses a specific feature, the normalized power spectrum, which is useful in both monophonic and polyphonic cases due to its additivity with respect to the sound sources. This method uses a generative model from which a decision rule based on non-parametric estimation is derived. The real-time constraint is met by pre-processing the prototypes with hierarchical clustering. The results are encouraging on different databases (proprietary and benchmark), both in terms of accuracy and computation time, especially in the polyphonic case. On the other hand, source separation consists in estimating, in terms of signals, the sources present in a mixture. Two approaches were considered in this thesis. The first considers the signals to be recovered as missing data and estimates them through a generative process and probabilistic modelling. The second consists, from sound examples present in a database, in computing optimal transformations of several examples whose combination tends towards the observed mixture. The two proposals are complementary, each having advantages and drawbacks (computation time for the first, interpretability of the result for the second). The experimental results are promising and open up interesting research perspectives for each of the proposals.
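The additivity property of the normalized power spectrum mentioned above can be checked numerically: for uncorrelated sources, the power spectrum of a mixture is (approximately) the sum of the source power spectra, so a polyphonic spectrum lies near a convex combination of monophonic prototypes. The sinusoidal "prototypes" below are illustrative, not the thesis's database.

```python
import numpy as np

def norm_power_spectrum(x):
    """Power spectrum normalized to sum to one (a distribution over frequency)."""
    p = np.abs(np.fft.rfft(x)) ** 2
    return p / p.sum()

sr = 8000
t = np.arange(sr) / sr
a = np.sin(2 * np.pi * 440 * t)       # "source A" prototype
b = np.sin(2 * np.pi * 1320 * t)      # "source B" prototype
mix = a + b                           # polyphonic observation

pa, pb, pm = map(norm_power_spectrum, (a, b, mix))
convex = 0.5 * pa + 0.5 * pb          # equal-energy convex combination
err = np.abs(pm - convex).sum()       # should be tiny for these orthogonal tones
```

This is what makes the feature convenient for polyphonic labeling: matching a mixture's normalized spectrum against convex combinations of class prototypes can recover several labels at once.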
Dellandréa, Emmanuel. "Analyse de signaux vidéos et sonores : application à l'étude de signaux médicaux". Tours, 2003. http://www.theses.fr/2003TOUR4031.
This work deals with the study of multimedia sequences containing images and sounds. The analysis of image sequences consists in tracking moving objects in order to study their properties. The investigation aims to enable the understanding of sounds when correlated with events in the image sequence. One generic method, based on the combination of region and contour tracking, and one method adapted to homogeneous objects, based on level set theory, are proposed. The analysis of audio data consists in the development of an identification system based on the study of the structure of signals through their coding and Zipf law modeling. These methods have been evaluated on medical sequences within the framework of a study of gastro-oesophageal reflux pathology, in collaboration with the Acoustique et Motricité Digestive research team of the University of Tours.
Vincent, Emmanuel. "Modèles d'instruments pour la séparation de sources et la transcription d'enregistrements musicaux". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2004. http://tel.archives-ouvertes.fr/tel-00544710.
Besbes, Bassem. "Intégration de méthodes de représentation et de classification pour la détection et la reconnaissance d'obstacles dans des scènes routières". Phd thesis, INSA de Rouen, 2011. http://tel.archives-ouvertes.fr/tel-00633109.
Ranta, Radu. "Traitement et analyse de signaux sonores physiologiques : application à la phonoentérographie". Phd thesis, Institut National Polytechnique de Lorraine - INPL, 2003. http://tel.archives-ouvertes.fr/tel-00005906.
Texto completoAlqasir, Hiba. "Apprentissage profond pour l'analyse de scènes de remontées mécaniques : amélioration de la généralisation dans un contexte multi-domaines". Thesis, Lyon, 2020. http://www.theses.fr/2020LYSES045.
This thesis presents our work on chairlift safety using deep learning techniques, as part of the Mivao project, which aims to develop a computer vision system that acquires images of the chairlift boarding station, analyzes the crucial elements, and detects dangerous situations. In this scenario, we have different chairlifts spread over different ski resorts, with a high diversity of acquisition conditions and geometries; thus, each chairlift is considered a domain. When the system is installed for a new chairlift, the objective is to perform an accurate and reliable scene analysis despite the lack of labeled data on this new domain (chairlift). In this context, we mainly concentrate on the chairlift safety bar and propose to classify each image into two categories, depending on whether the safety bar is closed (safe) or open (unsafe). It is thus an image classification problem with three specific features: (i) the image category depends on a small detail (the safety bar) in a cluttered background, (ii) manual annotations are not easy to obtain, and (iii) a classifier trained on some chairlifts should provide good results on a new one (generalization). To guide the classifier towards the important regions of the images, we have proposed two solutions: object detection and Siamese networks. Furthermore, we analyzed the generalization properties of these two approaches. Our solutions are motivated by the need to minimize human annotation effort while improving the accuracy of the chairlift safety problem. However, these contributions are not necessarily limited to this specific application context, and they may be applied to other problems in a multi-domain setting.
Magnier, Caroline. "Production acoustique d'une flottille côtière : Application au suivi environnemental et à l'identification automatisée de sources sonores anthropiques". Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAU040/document.
Marine traffic is the main contributor to anthropogenic underwater noise: since the 1970s, the increase in deep-sea shipping has raised ambient noise by more than 10 dB in some areas. In response to this concern, the Marine Strategy Framework Directive (MSFD) recommends acoustic monitoring. Few studies are concerned with coastal activity and the noise radiated by small craft, even though coastal environments provide 41.7% of the ecosystem services produced by the oceans. Between the academic and industrial worlds, this PhD set out to answer scientific and industrial questions about coastal traffic, in terms of both its influence on the soundscape and the detection and classification of coastal craft. In the absence of information on coastal maritime traffic, a visual identification protocol is proposed using GoPro® image processing, producing the same data as the AIS (position, speed, size and type of craft); it allows maritime traffic maps to be created over a disk of 1.6 km radius. The traffic is characterized by two acoustic descriptors: the SPL, linked to the distance of the nearest boat, and the ANL, linked to the number of boats present in a 500 m radius disk. Spatio-temporal monitoring of these descriptors makes it possible to identify the impact of maritime traffic on the coastal acoustic landscape. Acoustic detection and classification are performed after individual characterization of the noise by a set of acoustic parameters, using supervised machine learning algorithms. A specific protocol for the creation of the classification tree is proposed by comparing the acoustic data with the physical and contextual characteristics of each boat. The methods are applied to the flotilla of coastal boats present in the Bay of Calvi (Corsica) during summer.
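The SPL descriptor above is a standard quantity; a minimal sketch for a calibrated pressure signal follows. The in-air 20 µPa reference is used here for the numeric example; underwater acoustics (as in this thesis) conventionally uses a 1 µPa reference instead, which only shifts the level by a constant. The ANL descriptor depends on boat counts and is not reproduced.

```python
import numpy as np

def spl_db(p, p_ref=20e-6):
    """Sound pressure level in dB of a calibrated pressure signal p (in Pa).
    p_ref = 20 uPa (air convention); use p_ref = 1e-6 for the underwater convention."""
    rms = np.sqrt(np.mean(np.square(p)))
    return 20 * np.log10(rms / p_ref)

# 1 Pa amplitude tone: RMS = 1/sqrt(2) Pa, i.e. about 91 dB SPL re 20 uPa
p = 1.0 * np.sin(2 * np.pi * 100 * np.arange(48000) / 48000)
level = spl_db(p)
```

Computed on successive windows and tracked over time, this is the descriptor whose spatio-temporal monitoring the abstract relates to the distance of the nearest boat.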
Amberg, Virginie. "Analyse de scènes péri-urbaines à partir d'images radar haute résolution : application à l'extraction semi-automatique du réseau routier". Phd thesis, Toulouse, INPT, 2005. http://oatao.univ-toulouse.fr/7452/1/amberg1.pdf.
Gille, Quentin. "Propositions pour un paradigme culturel de la phono-cinématographie: des phono-scènes aux vidéoclips et au-delà". Doctoral thesis, Universite Libre de Bruxelles, 2014. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209309.
Our approach sits at the crossroads of the history of cinema, popular music and television. Drawing on theoretical propositions and concepts from film studies and from performance studies, we pay particular attention to the questions of representation at play in these different phono-cinematographic devices: namely, the first singing films (the Gaumont phono-scènes and the Vitaphone shorts), screen-equipped jukeboxes (the Soundies and the Scopitones) and finally televised music videos (promotional films and video clips).
Doctorate in Information and Communication
Kulikova, Maria. "Shape recognition for image scene analysis". Nice, 2009. http://www.theses.fr/2009NICE4081.
This thesis comprises two main parts. In the first part we address the problem of classifying tree crowns into species using shape features, alone or in combination with radiometric and texture features, to demonstrate that shape information improves classification performance. For this purpose, we first study the shapes of tree crowns extracted from very high resolution aerial infra-red images. We choose a methodology based on the analysis of closed continuous curves on shape spaces using geodesic paths, under the bending metric with the angle-function curve representation and under the elastic metric with the square-root q-function representation. A necessary preliminary step to classification is the extraction of the tree crowns. In the second part, we thus address the problem of extracting multiple objects with complex, arbitrary shapes from very high resolution remote sensing images. We develop a model based on a marked point process. Its originality lies in its use of arbitrarily-shaped objects, as opposed to objects with parametric shapes such as ellipses or rectangles. The shapes considered are obtained by local minimisation of an active-contour-type energy incorporating weak and strong shape prior knowledge. The objects in the final (optimal) configuration are then selected from amongst these candidates by a birth-and-death dynamics embedded in an annealing scheme. The approach is validated on very high resolution images of forests provided by the Swedish University of Agriculture.
Tanquerel, Lucille. "Caractérisation des documents sonores : Etude et conception d'un procédé de calcul rapide de signature audio basée sur une perception limitée du contenu". Caen, 2008. http://www.theses.fr/2008CAEN2056.
Describing the sound characteristics of a document is key for automatic processing of audio data. The objective of our work is to describe a method that rapidly generates a signature of a sound file by extracting physical characteristics over the file (spectral analysis of the signal). The innovation of our proposal lies in how sample extraction is organized and in the analysis mode, so as to quickly provide a signature representative of the musical content. The organization of the extraction defines how samples are taken. Our proposal aims at a minimal sequential statistical sampling distributed over the sound file. It rests on the assumption that collecting a small number of short-duration samples is sufficient to summarize the perceived rhythm effectively. Our validation method is based on an objective recognition error. We show that the signature makes it possible to compare files with one another and to accurately identify identical pieces even when they are incomplete. We also show that it can match two halves of the same song with a significant success rate. The validation further relies on comparing the rhythmic signature with human perception, and on distinguishing sound recordings according to the spoken language. All tests provide interesting results given the computation time.
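The sparse-sampling idea can be sketched in a few lines, assuming numpy; the excerpt count, band count and distance measure below are illustrative choices, not the thesis's actual parameters:

```python
import numpy as np

def audio_signature(signal, sr, n_samples=8, excerpt_ms=50, n_bands=16):
    """Build a compact signature from a few short excerpts spread over the file:
    each excerpt's magnitude spectrum is summarized into coarse bands."""
    excerpt_len = int(sr * excerpt_ms / 1000)
    starts = np.linspace(0, len(signal) - excerpt_len, n_samples).astype(int)
    bands = []
    for s in starts:
        spectrum = np.abs(np.fft.rfft(signal[s:s + excerpt_len]))
        # average the spectrum into n_bands coarse bands
        bands.append([chunk.mean() for chunk in np.array_split(spectrum, n_bands)])
    return np.asarray(bands).ravel()

def signature_distance(a, b):
    """Small distance -> likely the same piece."""
    return float(np.linalg.norm(a - b) / len(a))

sr = 8000
t = np.arange(sr * 2) / sr
piece = np.sin(2 * np.pi * 440 * t)              # stand-in for a music file
sig_full = audio_signature(piece, sr)
sig_half = audio_signature(piece[:sr], sr)       # first half of the same piece
```

For a stationary signal the two signatures nearly coincide, mirroring the thesis's observation that two halves of the same song can be matched from very little sampled material.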
Adeli, Mohammad. "Recherche de caractéristiques sonores et de correspondances audiovisuelles pour des systèmes bio-inspirés de substitution sensorielle de l'audition vers la vision". Thèse, Université de Sherbrooke, 2016. http://hdl.handle.net/11143/8194.
Sensory substitution systems encode one stimulus modality into another. They can provide the means for handicapped people to perceive stimuli of an impaired modality through another modality. The purpose of this study was to investigate auditory-to-visual substitution systems. This type of sensory substitution is not well studied, probably because of the complexity of the auditory system and the difficulties arising from the mismatch between audible sounds, which can change with frequencies up to 20000 Hz, and visual stimuli, which must change very slowly over time to be perceived. Two specific problems of auditory-to-visual substitution systems were targeted in this research: the investigation of audiovisual correspondences and the extraction of auditory features. An audiovisual experiment was conducted online to find associations between auditory features (pitch and timbre) and visual features (shape, color, height). One hundred and nineteen subjects took part in the experiments. A strong association between the timbre of envelope-normalized sounds and visual shapes was observed. Subjects strongly associated soft timbres with blue, green or light gray rounded shapes, harsh timbres with red, yellow or dark gray sharp angular shapes, and timbres combining elements of softness and harshness with a mixture of the previous two shapes. Fundamental frequency was not associated with height, grayscale or color. Given the correspondence between timbre and shapes, in the next step a flexible and multipurpose bio-inspired hierarchical model for analyzing timbre and extracting the important timbral features was developed. Inspired by findings in neuroscience, computational neuroscience and psychoacoustics, the model not only extracts spectral and temporal characteristics of a signal, but also analyzes amplitude modulations on different timescales.
It uses a cochlear filter bank to resolve the spectral components of a sound, lateral inhibition to enhance spectral resolution, and a modulation filter bank to extract the global temporal envelope and the roughness of the sound from its amplitude modulations. To demonstrate its potential for timbre representation, the model was successfully evaluated in three applications: 1) comparison with subjective values of roughness, 2) musical instrument classification, and 3) feature selection for labeled timbres. The correspondence between timbre and shapes revealed by this study, together with the proposed model for timbre analysis, can be used to develop intuitive auditory-to-visual substitution systems that encode timbre into visual shapes.
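The envelope and roughness analysis described above can be caricatured with plain numpy: an FFT band-pass stands in for one cochlear channel, and the 30–150 Hz modulation band is a commonly cited, but here assumed, roughness range — none of this is the thesis's actual model:

```python
import numpy as np

def band_envelope(x, sr, f_lo, f_hi, smooth_ms=2.0):
    """Crude band-pass via FFT masking, then envelope by rectification + smoothing."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    band = np.fft.irfft(spec, len(x))
    kernel = np.ones(int(sr * smooth_ms / 1000))
    kernel /= len(kernel)
    return np.convolve(np.abs(band), kernel, mode="same")

def modulation_energy(envelope, sr, m_lo=30.0, m_hi=150.0):
    """Fraction of envelope fluctuation energy in the roughness range (~30-150 Hz)."""
    env = envelope - envelope.mean()
    spec = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(len(env), 1.0 / sr)
    total = spec[freqs > 0].sum()
    return spec[(freqs >= m_lo) & (freqs <= m_hi)].sum() / total

sr = 16000
t = np.arange(sr) / sr
smooth = np.sin(2 * np.pi * 1000 * t)                      # steady tone
rough = smooth * (1.0 + 0.8 * np.sin(2 * np.pi * 70 * t))  # 70 Hz amplitude modulation
e_smooth = modulation_energy(band_envelope(smooth, sr, 500, 2000), sr)
e_rough = modulation_energy(band_envelope(rough, sr, 500, 2000), sr)
```

The amplitude-modulated tone concentrates its envelope energy in the roughness band, while the steady tone does not, which is the contrast the model's modulation filter bank is built to capture.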
Binet, Karine. "Mise en oeuvre d'un système d'aide à la détection de cibles terrestres camouflées". Rennes 1, 2005. http://www.theses.fr/2005REN1S085.
Krapac, Josip. "Représentations d'images pour la recherche et la classification d'images". Phd thesis, Université de Caen, 2011. http://tel.archives-ouvertes.fr/tel-00650998.
Carlo, Diego Di. "Echo-aware signal processing for audio scene analysis". Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S075.
Most audio signal processing methods regard reverberation, and in particular acoustic echoes, as a nuisance. However, echoes convey important spatial and semantic information about sound sources, and recent echo-aware methods build on this. In this work we focus on two directions. First, we study how to estimate acoustic echoes blindly from microphone recordings. Two approaches are proposed: one leveraging continuous dictionaries, the other using recent deep learning techniques. Then, we extend existing methods in audio scene analysis to echo-aware forms. The multichannel NMF framework for audio source separation, the SRP-PHAT localization method, and the MVDR beamformer for speech enhancement are all extended to their echo-aware versions.
Gebhardt, Lars. "„… eine Abgeburt, welche aus gräulichem Inceste entsteht …“". Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2009. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-23515.
Texto completoLu, Yanyun. "Online classification and clustering of persons using appearance-based features from video images : application to person discovery and re-identification in multicamera environments". Thesis, Lille 1, 2014. http://www.theses.fr/2014LIL10119/document.
Video surveillance is an important topic to address nowadays: it is broadly used for security and raises problems related to big data processing. One part of it is the identification and re-identification of persons in multicamera environments. The objective of this thesis is to design a complete automatic appearance-based human recognition system working in a real-life environment, with two main tasks: person re-identification and new person discovery. The proposed system consists of four modules: video data acquisition; background and silhouette extraction; feature extraction and selection; and person recognition. For evaluation purposes, in addition to the publicly available CASIA database, a more challenging new database has been created under low constraints. Grey-world normalized color features and Haralick texture features are extracted as the initial feature set, then feature selection approaches are tested and compared. These optimized feature subsets are used, first, for person re-identification with the Multi-category Incremental and Decremental SVM (MID-SVM) algorithm, which requires only a few initial training images, and second, for person discovery and classification with the Self-Adaptive Kernel Machine (SAKM) algorithm, which differentiates existing persons, who can be classified, from new persons, who have to be learned and added. The proposed system succeeds in person re-identification with a classification rate of over 95% and achieves satisfactory performance on person discovery with an accuracy rate of over 90%.
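The discovery-versus-re-identification logic can be mimicked with a much simpler open-set rule, nearest class mean with a distance threshold, standing in for the MID-SVM/SAKM pair; the class, its names and the threshold below are illustrative assumptions, assuming numpy:

```python
import numpy as np

class OpenSetRecognizer:
    """Simplified stand-in for the MID-SVM / SAKM pair: known persons are modeled
    by the mean of their feature vectors; a query further than `threshold` from
    every class mean is declared a new person and enrolled incrementally."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.means = {}      # person id -> running mean feature vector
        self.counts = {}
        self.next_id = 0

    def query(self, feat):
        feat = np.asarray(feat, dtype=float)
        if self.means:
            dists = {pid: np.linalg.norm(feat - m) for pid, m in self.means.items()}
            pid = min(dists, key=dists.get)
            if dists[pid] <= self.threshold:     # re-identification
                self.counts[pid] += 1            # incremental mean update
                self.means[pid] += (feat - self.means[pid]) / self.counts[pid]
                return pid, False
        pid = self.next_id                       # discovery: enroll a new person
        self.next_id += 1
        self.means[pid] = feat.copy()
        self.counts[pid] = 1
        return pid, True

rec = OpenSetRecognizer(threshold=1.0)
a1, new_a = rec.query([0.0, 0.0])    # first person -> discovered
b1, new_b = rec.query([5.0, 5.0])    # far away -> second person discovered
a2, again = rec.query([0.1, -0.1])   # close to the first -> re-identified
```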
Nesvadba, Jan. "Segmentation sémantique des contenus audio-visuels". Bordeaux 1, 2007. http://www.theses.fr/2007BOR13456.
Essid, Slim. "Classification automatique des signaux audio-fréquences : reconnaissance des instruments de musique". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2005. http://pastel.archives-ouvertes.fr/pastel-00002738.
Mousse, Ange Mikaël. "Reconnaissance d'activités humaines à partir de séquences multi-caméras : application à la détection de chute de personne". Thesis, Littoral, 2016. http://www.theses.fr/2016DUNK0453/document.
Artificial vision is an evolving field of research. New strategies make it possible to deploy autonomous networks of cameras, which leads to many automatic surveillance applications. The work developed in this thesis concerns the setting up of an intelligent video surveillance system for real-time detection of falls. The first part of our work consists of a robust estimation of the surface area of a person from two cameras with complementary views. This estimation is based on the detections from each camera. To obtain robust detection, we propose two approaches. The first combines a motion detection algorithm based on background modeling with an edge detection algorithm; a fusion approach is proposed to make the detection results more reliable. The second approach is based on the homogeneous regions of the image: a first segmentation finds the homogeneous regions, and the background is then modeled using the regions obtained.
Benabbas, Yassine. "Analyse du comportement humain à partir de la vidéo en étudiant l'orientation du mouvement". Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2012. http://tel.archives-ouvertes.fr/tel-00839699.
Singh, Praveer. "Processing high-resolution images through deep learning techniques". Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1172.
In this thesis, we discuss four application scenarios that fall under the larger umbrella of analyzing and processing high-resolution images with deep learning techniques. The first three chapters deal with remote-sensing (RS) images, captured from airplanes or from satellites hundreds of kilometers away from the Earth. We start by addressing a challenging problem: improving the classification of complex aerial scenes through a deep weakly supervised learning paradigm. We show how, using only image-level labels, we can effectively localize the most distinctive regions in complex scenes and thus remove ambiguities, leading to enhanced classification performance in highly complex aerial scenes. In the second chapter, we deal with refining segmentation labels of building footprints in aerial images. We do this by first detecting errors in the initial segmentation masks and then correcting only those pixels with a high probability of error. The next two chapters are related to Generative Adversarial Networks. In the first, we build a Cloud-GAN model to remove thin films of cloud in Sentinel-2 imagery by adopting a cyclic consistency loss. It uses an adversarial loss function to map cloudy images to non-cloudy images in a fully unsupervised fashion, where the cyclic loss constrains the network to output the cloud-free image corresponding to the input cloudy image rather than an arbitrary image in the target domain. Finally, the last chapter addresses a different kind of high-resolution image, coming not from the RS domain but from High Dynamic Range Imaging (HDRI): 32-bit images which capture the full extent of the luminance present in the scene.
Our goal is to quantize them to 8-bit Low Dynamic Range (LDR) images so that they can be displayed effectively on normal screens while keeping the overall contrast and perceived quality close to that of the HDR originals. We adopt a multi-scale GAN model that attends to both the coarse and the fine-level information needed for high-resolution images. The final tone-mapped outputs have high subjective quality without perceived artifacts.
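For contrast with the learned approach, the classic global tone-mapping baseline (Reinhard's operator, not the thesis's multi-scale GAN) fits in a few lines, assuming numpy; parameter names follow the usual convention:

```python
import numpy as np

def reinhard_tonemap(hdr, key=0.18, eps=1e-6):
    """Reinhard global operator: scale the log-average luminance to `key`,
    compress with L/(1+L), then quantize to 8 bits."""
    lum = 0.2126 * hdr[..., 0] + 0.7152 * hdr[..., 1] + 0.0722 * hdr[..., 2]
    log_avg = np.exp(np.mean(np.log(lum + eps)))
    scaled = key * lum / log_avg
    compressed = scaled / (1.0 + scaled)                 # maps [0, inf) into [0, 1)
    ldr = hdr * (compressed / (lum + eps))[..., None]    # rescale colors by new luminance
    return (np.clip(ldr, 0.0, 1.0) * 255).astype(np.uint8)

rng = np.random.default_rng(0)
hdr = rng.uniform(0.0, 100.0, size=(4, 4, 3))   # toy stand-in for a 32-bit HDR image
ldr = reinhard_tonemap(hdr)
```

A global curve like this preserves overall brightness ordering but flattens local contrast, which is precisely the gap that multi-scale, learned tone mappers aim to close.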
Khlaifi, Hajer. "Preliminary study for detection and classification of swallowing sound". Thesis, Compiègne, 2019. http://www.theses.fr/2019COMP2485/document.
The diseases that affect and alter the swallowing process are multi-faceted, affecting the patient's quality of life and ability to function in society. The exact nature and severity of the pre/post-treatment changes depend on the location of the anomaly. Effective swallowing rehabilitation clinically depends on including a video-fluoroscopic evaluation of the patient's swallowing in the post-treatment assessment. Other means are available, such as fibre-optic endoscopy. The drawback of these approaches is that they are very invasive, although they make it possible to observe the swallowing process and identify areas of dysfunction with high accuracy. "Prevention is better than cure" is a fundamental principle of medicine. In this context, this thesis focuses on the remote monitoring of patients, and more specifically on monitoring the functional evolution of the swallowing process of people at risk of dysphagia, whether at home or in medical institutions, using a minimal number of non-invasive sensors. This motivated monitoring the swallowing process by capturing only its acoustic signature and modeling it as a sequence of acoustic events occurring within a specific time frame. The main problem in such acoustic signal processing is the automatic detection of the relevant sound segments, a crucial step in the automatic classification of sounds during food intake. Detecting the relevant signal reduces the complexity of the subsequent analysis and characterization of a given swallowing process. State-of-the-art algorithms for separating swallowing sounds from environmental noise were not sufficiently accurate, hence the idea of applying an adaptive threshold to the signal resulting from a wavelet decomposition.
The classification of sounds in general, and of swallowing sounds in particular, is addressed with a hierarchical analysis that first identifies the swallowing sound segments and then decomposes them into three characteristic sounds, consistent with the physiology of the process. The coupling between detection and classification is also addressed. The detection algorithm has been implemented in real time, while clinical use of the classification is discussed with a plan for staged deployment subject to the normal processes of clinical approval.
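A minimal version of such an adaptive-threshold detector can be sketched with a hand-rolled single-level Haar decomposition and a MAD-based noise estimate; both are common choices, assumed here rather than taken from the thesis:

```python
import numpy as np

def haar_detail(x):
    """Single-level Haar wavelet decomposition: return the detail coefficients."""
    x = x[: len(x) // 2 * 2]
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def detect_events(x, win=64, k=4.0):
    """Flag windows whose Haar-detail energy exceeds an adaptive threshold:
    a robust noise estimate (median absolute deviation) times a factor k."""
    d = haar_detail(x)
    sigma = np.median(np.abs(d)) / 0.6745          # robust noise scale
    energy = np.array([np.mean(d[i:i + win] ** 2)
                       for i in range(0, len(d) - win + 1, win)])
    return energy > k * sigma ** 2                 # boolean mask, one flag per window

rng = np.random.default_rng(1)
signal = 0.01 * rng.standard_normal(4096)              # background noise
signal[2000:2300] += 0.5 * rng.standard_normal(300)    # a burst-like "swallowing" event
mask = detect_events(signal)
```

Because the threshold tracks the noise level estimated from the data itself, the same detector keeps working when the ambient noise floor changes, which is the point of the adaptive scheme.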
Becker, Udo J., Thilo Becker and Julia Gerlach. "Coûts externes de l'automobile : Aperçu des estimations existantes dans l'Union européenne à 27". Technische Universität Dresden, 2012. https://tud.qucosa.de/id/qucosa%3A29088.
Becker, Udo J., Thilo Becker and Julia Gerlach. "The True Costs of Automobility: External Costs of Cars. Overview on existing estimates in EU-27". Technische Universität Dresden, 2012. https://tud.qucosa.de/id/qucosa%3A30084.
Becker, Udo J., Thilo Becker and Julia Gerlach. "Externe Autokosten in der EU-27 : Überblick über existierende Studien". Technische Universität Dresden, 2012. https://tud.qucosa.de/id/qucosa%3A30128.
Fournier, Alexandre. "Détection et classification de changements sur des scènes urbaines en télédétection". Phd thesis, 2008. http://tel.archives-ouvertes.fr/tel-00463593.
Lacoste, Alexandre. "Apprentissage à base de gradient pour l'extraction de caractéristiques dans les signaux sonores complexes". Thèse, 2006. http://hdl.handle.net/1866/17867.
Bergstra, James. "Algorithms for classifying recorded music by genre". Thèse, 2006. http://hdl.handle.net/1866/16735.