Theses on the topic « Joint clustering with alignment »




Consult the top 50 theses for your research on the topic « Joint clustering with alignment ».


You can also download the full text of each scholarly publication in PDF format and read its abstract online whenever this information is included in the metadata.


1

Arsenteva, Polina. « Statistical modeling and analysis of radio-induced adverse effects based on in vitro and in vivo data ». Electronic Thesis or Diss., Bourgogne Franche-Comté, 2023. http://www.theses.fr/2023UBFCK074.

Full text
Abstract:
In this work we address the problem of adverse effects induced by radiotherapy on healthy tissues. The goal is to propose a mathematical framework for comparing the effects of different irradiation modalities, so as to ultimately choose the treatments that produce the fewest adverse effects for potential clinical use. The adverse effects are studied with two types of data: the in vitro omic response of human endothelial cells, and the adverse effects observed on mice in in vivo experiments. In the in vitro setting, we face the problem of extracting key information from complex temporal data that cannot be treated with the methods available in the literature. We model the radio-induced fold change, the object that encodes the difference in effect between two experimental conditions, in a way that takes into account measurement uncertainties as well as correlations between the observed entities. We construct a distance, later generalized to a dissimilarity measure, that compares fold changes in terms of all their important statistical properties. Finally, we propose a computationally efficient algorithm that performs clustering jointly with temporal alignment of the fold changes. The key features extracted in this way are visualized using two types of network representations, in order to facilitate biological interpretation. In the in vivo setting, the statistical challenge is to establish a predictive link between variables that, owing to the specificities of the experimental design, can never be observed on the same animals. Since the joint distributions are not accessible, we leverage the additional information on the observed groups to infer a linear regression model. We propose two estimators of the regression parameters, one based on the method of moments and the other on optimal transport, as well as confidence-interval estimators based on a stratified bootstrap procedure.
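As a rough illustration of the kind of procedure described above (a minimal sketch, not code from the thesis), the block below clusters a set of fold-change curves with a k-means-style loop in which each curve is compared to a centroid only after choosing its best integer time shift; the names curves, n_clusters and max_shift are illustrative assumptions.

    import numpy as np

    def best_shift(x, c, max_shift=2):
        # integer time shift of curve x that best matches centroid c (circular shift for simplicity)
        shifts = list(range(-max_shift, max_shift + 1))
        errs = [np.sum((np.roll(x, s) - c) ** 2) for s in shifts]
        return shifts[int(np.argmin(errs))]

    def joint_cluster_align(curves, n_clusters=3, max_shift=2, n_iter=20, seed=0):
        curves = np.asarray(curves, dtype=float)
        rng = np.random.default_rng(seed)
        centroids = curves[rng.choice(len(curves), n_clusters, replace=False)].copy()
        labels = np.zeros(len(curves), dtype=int)
        for _ in range(n_iter):
            # assignment step: each curve goes to the cluster it matches best after its own alignment
            for i, x in enumerate(curves):
                errs = [np.sum((np.roll(x, best_shift(x, c, max_shift)) - c) ** 2) for c in centroids]
                labels[i] = int(np.argmin(errs))
            # update step: average the member curves after re-aligning them to the current centroid
            for k in range(n_clusters):
                members = curves[labels == k]
                if len(members) > 0:
                    aligned = [np.roll(x, best_shift(x, centroids[k], max_shift)) for x in members]
                    centroids[k] = np.mean(aligned, axis=0)
        return labels, centroids

The key point of the sketch is only that the alignment (here a simple shift search) happens inside the clustering loop rather than as a separate preprocessing step.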
2

Gao, Zhiming. « Reducing the Search Space of Ontology Alignment Using Clustering Techniques ». Thesis, Linköpings universitet, Databas och informationsteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-141887.

Full text
Abstract:
With the growing amount of information available on the internet, making full use of this information has become a pressing issue. One solution is to use ontology alignment to aggregate different sources of information in order to obtain comprehensive and complete information. Scalability is a problem for ontology alignment, and it can be addressed by reducing the search space of mapping suggestions. In this thesis we propose an automated procedure that mainly uses clustering techniques to prune the search space. The main focus is to evaluate different clustering-related techniques for use in our system. K-means, Chameleon and Birch are studied and evaluated, and every parameter of these clustering algorithms is examined in separate experiments in order to find the best clustering setting for the ontology clustering problem. Four different similarity assignment methods are investigated and analysed as well. Tf-idf vectors and cosine similarity are used to identify similar clusters in the two ontologies, and experiments on the cosine-similarity threshold are carried out to find the most suitable value. Our system builds an automated procedure for generating a reduced search space for ontology alignment: on the one hand, the results show that it cuts the number of comparisons the ontology alignment would otherwise have to make by a factor of twenty to ninety, and precision goes up as well; on the other hand, it needs only one to two minutes of execution time, while recall and F-score drop only slightly. This trade-off is acceptable for an ontology alignment system that would otherwise take tens of minutes to align the same set of ontologies. As a result, large-scale ontology alignment becomes more tractable and feasible.
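For readers unfamiliar with the matching step, the following minimal sketch (illustrative only, not the system described in the thesis) shows how tf-idf vectors and cosine similarity can flag similar clusters across two ontologies; the label strings, the use of scikit-learn's TfidfVectorizer and the 0.3 threshold are assumptions.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def similar_cluster_pairs(clusters_a, clusters_b, threshold=0.3):
        """clusters_a / clusters_b: lists of strings, each concatenating the labels of one cluster.
        Returns index pairs (i, j) whose tf-idf vectors exceed the cosine-similarity threshold."""
        vec = TfidfVectorizer()
        tfidf = vec.fit_transform(clusters_a + clusters_b)   # shared vocabulary for both ontologies
        sims = cosine_similarity(tfidf[:len(clusters_a)], tfidf[len(clusters_a):])
        return [(i, j) for i in range(sims.shape[0]) for j in range(sims.shape[1])
                if sims[i, j] >= threshold]

    # only concept pairs belonging to matched clusters would then be compared by the aligner
    pairs = similar_cluster_pairs(
        ["heart cardiac valve aorta", "bone femur tibia joint"],
        ["joint knee bone cartilage", "valve aortic cardiac muscle"],
    )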
3

Aminu, M. (Mubarak). « Dynamic clustering for coordinated multipoint transmission with joint processing ». Master's thesis, University of Oulu, 2016. http://urn.fi/URN:NBN:fi:oulu-201602111176.

Full text
Abstract:
Coordinated Multipoint (CoMP) transmission has been identified as a promising concept to handle the substantial interference in the LTE-Advanced systems and it is one of the key technology components in the future 5G networks. CoMP transmission involves two coordination schemes: joint processing (JP) and coordinated beamforming (CB). The scope of this thesis is limited to JP. In the CoMP JP scheme, each user is coherently served by multiple base stations (BSs) and consequently, the user’s signal strength is enhanced and the interference is mitigated. The coherent joint processing requires sharing data and channel state information (CSI) of all the users among all the BSs, which leads to high backhaul capacity requirement and high signaling cost especially in large-scale networks. Grouping the BSs into smaller coordination clusters within which a user is served by only the BSs in the cluster will significantly reduce the signaling cost and the backhaul burden. In this thesis, optimal BS clustering and beamformer design for CoMP JP in the downlink of a multi-cell network is studied. The unique aspect of the study is that the BS clustering and the beamformer design are carried out jointly by iteratively solving a series of convex sub-problems. The BSs are dynamically grouped into small coordination clusters whereby each user is served by a few BSs that are in a coordination cluster. The joint BS clustering and beamformer design is performed to maximize a network utility function in the form of the weighted sum rate maximization (WSRM). The weighted sum rate maximization (WSRM) problem is formulated from the perspective of sparse optimization framework where sparsity is induced by penalizing the objective function with a power penalty. The WSRM problem is known to be non-convex and NP-hard. Therefore, it is difficult to solve directly. Two solutions are studied; in the first approach, the WSRM problem is solved via weighted minimum mean square error (WMMSE) minimization and the second approach involves approximation of the WSRM problem as a successive second order cone program (SSOCP). In both approaches, the objective function is penalized with a power penalty and the clusters can be adjusted by a single parameter in the problem. The performance evaluation of the proposed algorithms is carried out via simulation and it is shown that the serving sets in the network can be controlled according to the available backhaul capacity by properly selecting a single parameter in the problem. Finally, an algorithm for a fixed number of active links is proposed.
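As a purely illustrative sketch of such a power-penalized weighted sum rate objective (the symbols and the exact penalty form are assumptions and may differ from the formulation used in the thesis), one may write:

    \max_{\{\mathbf{m}_{b,k}\}} \; \sum_{k} w_k \log_2\!\left(1 + \mathrm{SINR}_k(\{\mathbf{m}_{b,k}\})\right)
    \;-\; \lambda \sum_{b} \sum_{k} \lVert \mathbf{m}_{b,k} \rVert_2^2
    \quad \text{subject to} \quad \sum_{k} \lVert \mathbf{m}_{b,k} \rVert_2^2 \le P_b \;\; \forall b,

where m_{b,k} is the beamformer of base station b for user k, w_k the user weight, P_b the per-BS power budget, and lambda the single parameter that trades sum rate against cluster size: links whose beamformer power is driven towards zero are dropped from the serving cluster.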
4

Costigan, Patrick Allan. « Gait and lower limb alignment in patellofemoral joint pain syndrome ». Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp05/nq22451.pdf.

Full text
5

White, Derek A. « Factors affecting changes in joint alignment following knee osteotomy surgery ». Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp05/MQ63389.pdf.

Full text
6

Nunes, Neuza Filipa Martins. « Algorithms for time series clustering applied to biomedical signals ». Master's thesis, Faculdade de Ciências e Tecnologia, 2011. http://hdl.handle.net/10362/5666.

Full text
Abstract:
Thesis submitted in fulfillment of the requirements for the Degree of Master in Biomedical Engineering.
The increasing number of biomedical systems and applications for human body understanding creates a need for information extraction tools to use on biosignals. It is important to comprehend the changes in the biosignal's morphology over time, as they often contain critical information on the condition of the subject or the status of the experiment. The creation of tools that automatically analyze and extract relevant attributes from biosignals, providing important information to the user, has significant value in the biosignal processing field. The present dissertation introduces new algorithms for time series clustering, where we are able to separate and organize unlabeled data into different groups whose signals are similar to each other. Signal processing algorithms were developed for the detection of a meanwave, which represents the signal's morphology and behavior. The algorithm designed computes the meanwave by separating and averaging all cycles of a cyclic continuous signal. To increase the quality of information given by the meanwave, a set of wave-alignment techniques was also developed and its relevance was evaluated on a real database. To evaluate our algorithm's applicability to time series clustering, a distance metric built from the information of the automatic meanwave was designed, and its measurements were given as input to a K-Means clustering algorithm. For that purpose, we collected a series of data containing two different modes. The produced algorithm successfully separates the two modes in the collected data with 99.3% efficiency. The results of this clustering procedure were compared to a mechanism widely used in this area, which models the data and uses the distance between its cepstral coefficients to measure the similarity between the time series. The algorithms were also validated in different study projects. These projects show the variety of contexts in which our algorithms have high applicability and are suitable answers to overcome the problems of exhaustive signal analysis and expert intervention. The algorithms produced are signal-independent and can therefore be applied to any type of signal, provided it is cyclic. The fact that this approach does not require any prior information, together with the good preliminary performance, makes these algorithms powerful tools for biosignal analysis and classification.
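The meanwave idea can be illustrated with a minimal sketch (not the dissertation's code): cut a cyclic signal into cycles at given onsets, resample each cycle to a common length, average them, and use the distance between meanwaves as a dissimilarity for K-Means; the helper names and n_points are assumptions.

    import numpy as np

    def meanwave(signal, cycle_starts, n_points=100):
        """Average cycle ('meanwave') of a cyclic signal.
        cycle_starts: indices where each cycle begins (e.g. detected peaks or onsets)."""
        cycles = []
        for a, b in zip(cycle_starts[:-1], cycle_starts[1:]):
            cycle = np.asarray(signal[a:b], dtype=float)
            # resample every cycle to a common length so they can be averaged point by point
            resampled = np.interp(np.linspace(0, len(cycle) - 1, n_points),
                                  np.arange(len(cycle)), cycle)
            cycles.append(resampled)
        return np.mean(cycles, axis=0)

    def meanwave_distance(sig_a, starts_a, sig_b, starts_b, n_points=100):
        # simple dissimilarity between two signals: Euclidean distance between their meanwaves,
        # which could then feed a K-Means-style clustering step
        return float(np.linalg.norm(meanwave(sig_a, starts_a, n_points)
                                    - meanwave(sig_b, starts_b, n_points)))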
7

Tachibana, Kanta, Takeshi Furuhashi, Tomohiro Yoshikawa, Eckhard Hitzer et MINH TUAN PHAM. « Clustering of Questionnaire Based on Feature Extracted by Geometric Algebra ». 日本知能情報ファジィ学会, 2008. http://hdl.handle.net/2237/20676.

Full text
Abstract:
Session ID: FR-G2-2
Joint 4th International Conference on Soft Computing and Intelligent Systems and 9th International Symposium on advanced Intelligent Systems, September 17-21, 2008, Nagoya University, Nagoya, Japan
8

Hasnat, Md Abul. « Unsupervised 3D image clustering and extension to joint color and depth segmentation ». Thesis, Saint-Etienne, 2014. http://www.theses.fr/2014STET4013/document.

Full text
Abstract:
Access to 3D images at a reasonable frame rate is now widespread, thanks to recent advances in low-cost depth sensors as well as efficient methods for computing 3D from 2D images. As a consequence, there is a strong demand for enhancing existing computer vision applications by incorporating 3D information. Indeed, numerous studies have demonstrated that the accuracy of different tasks increases when 3D information is included as an additional feature. However, for indoor scene analysis and segmentation, several important issues remain, such as: (a) how can the 3D information itself be exploited? and (b) what is the best way to fuse color and 3D in an unsupervised manner? In this thesis, we address these issues and propose novel unsupervised methods for 3D image clustering and joint color and depth image segmentation. To this end, we consider image normals as the prominent feature of a 3D image and cluster them with methods based on finite statistical mixture models. We use the Bregman Soft Clustering method to ensure computationally efficient clustering. Moreover, we exploit several probability distributions from directional statistics, such as the von Mises-Fisher distribution and the Watson distribution. By combining these, we propose novel model-based clustering methods. We empirically validate these methods on synthetic data and then demonstrate their application to 3D/depth image analysis. Afterwards, we extend these methods to segment synchronized 3D and color images, also called RGB-D images. To this end, we first propose a statistical image generation model for RGB-D images, and then a novel RGB-D segmentation method using joint color-spatial-axial clustering and a statistical planar region merging method. Results show that the proposed method is comparable with state-of-the-art methods while requiring less computation time. Moreover, it opens interesting perspectives for fusing color and geometry in an unsupervised manner. We believe that the methods proposed in this thesis are equally applicable and extendable to clustering other types of data, such as speech or gene expression, and can be used for complex tasks such as joint image-speech data analysis.
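To give a flavour of the directional clustering involved, here is a hypothetical soft-assignment (E-step) sketch for a von Mises-Fisher mixture over unit surface normals in R^3; the function names, the fixed concentration parameters and the use of the three-dimensional normalizing constant are assumptions, not the thesis implementation.

    import numpy as np

    def vmf_logpdf(x, mu, kappa):
        # von Mises-Fisher log-density on the unit sphere in R^3:
        # f(x) = kappa / (4*pi*sinh(kappa)) * exp(kappa * mu.x)   (moderate kappa assumed)
        log_c = np.log(kappa) - np.log(4 * np.pi) - np.log(np.sinh(kappa))
        return log_c + kappa * (x @ np.asarray(mu, dtype=float))

    def soft_assign(normals, mus, kappas, weights):
        """One E-step of a vMF mixture: responsibilities of each cluster for each unit normal."""
        normals = np.asarray(normals, dtype=float)
        log_resp = np.stack([np.log(w) + vmf_logpdf(normals, mu, k)
                             for mu, k, w in zip(mus, kappas, weights)], axis=1)
        log_resp -= log_resp.max(axis=1, keepdims=True)      # numerical stability
        resp = np.exp(log_resp)
        return resp / resp.sum(axis=1, keepdims=True)

    normals = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 0.99], [1.0, 0.0, 0.0]])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    print(soft_assign(normals, mus=[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
                      kappas=[20.0, 20.0], weights=[0.5, 0.5]))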
9

Fahrni, Angela Petra [Verfasser], et Michael [Akademischer Betreuer] Strube. « Joint Discourse-aware Concept Disambiguation and Clustering / Angela Petra Fahrni ; Betreuer : Michael Strube ». Heidelberg : Universitätsbibliothek Heidelberg, 2016. http://d-nb.info/1180614704/34.

Full text
10

Coles, Lisa. « Functional kinematic study of knee replacement : the effect of implant design and alignment on the patellofemoral joint ». Thesis, University of Bath, 2015. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.642032.

Full text
Abstract:
Total knee arthroplasty is a widely used and relatively successful procedure, designed to relieve pain and restore function to patients suffering from osteoarthritis. However, satisfaction following the procedure is low. One of the primary sources of pain and a cause of functional limitations following knee arthroplasty is the patellofemoral joint. Reasons for pain in the patellofemoral joint are not well understood but adverse patellofemoral biomechanics are thought to contribute. Many in vitro methods exist for the investigation of patellofemoral joint biomechanics but there is no consistent standard protocol. It is therefore difficult to draw any general conclusions regarding the effect of specific design or alignment factors on the biomechanics of the patellofemoral joint. The present study aimed to improve current understanding of factors contributing to patellofemoral complications. A knee simulator, which was based on the Oxford Knee Rig and included synthetic models for a number of soft tissue and bony structures, was developed. The simulator was demonstrated to provide a simplified but physiologically relevant model of the human knee, which allowed effective assessment of patellofemoral joint biomechanics under physiological loads. The system eliminated the need for cadaveric tissue and therefore demonstrated reduced variability, enabling the efficient assessment of a number of potentially influencing factors. A number of investigations were carried out using the simulator to assess the effect of patella component design and position, and femoral component alignment on patellofemoral biomechanics using the Scorpio NRG system. The results of these studies indicate the benefit of medialisation of the apex of the patella component and warn against excessive femoral component sagittal plane malalignment. However, in general they indicated the relatively forgiving nature of the Scorpio system to malalignment and highlighted that irrespective of alignment and patella component design, pressures in excess of material limits are frequently achieved in deep flexion.
11

Ogden, Samuel R. « Automatic Content-Based Temporal Alignment of Image Sequences with Varying Spatio-Temporal Resolution ». BYU ScholarsArchive, 2012. https://scholarsarchive.byu.edu/etd/3303.

Full text
Abstract:
Many applications use multiple cameras to simultaneously capture imagery of a scene from different vantage points on a rigid, moving camera system over time. Multiple cameras often provide unique viewing angles but also additional levels of detail of a scene at different spatio-temporal resolutions. However, in order to benefit from this added information the sources must be temporally aligned. As a result of cost and physical limitations it is often impractical to synchronize these sources via an external clock device. Most methods attempt synchronization through the recovery of a constant scale factor and offset with respect to time. This limits the generality of such alignment solutions. We present an unsupervised method that utilizes a content-based clustering mechanism in order to temporally align multiple non-synchronized image sequences of different and varying spatio-temporal resolutions. We show that the use of temporal constraints and dynamic programming adds robustness to changes in capture rates, field of view, and resolution.
12

Weber, Matthias. « Structural Performance Comparison of Parallel Software Applications ». Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-216133.

Full text
Abstract:
With rising complexity of high performance computing systems and their parallel software, performance analysis and optimization has become essential in the development of efficient applications. The comparison of performance data is a key operation required in performance analysis. An analyst may conduct different types of comparisons in order to understand the performance properties of an application. One use case is comparing performance data from multiple measurements. Typical examples for such comparisons are before/after comparisons when applying optimizations or changing code versions. Besides comparing performance between multiple runs, also comparing performance characteristics across the parallel execution streams of an application is essential to detect performance problems. This is typically useful to detect imbalances, outliers, or changing runtime behavior during the execution of an application. While such comparisons are straightforward for the aggregated data in performance profiles, only limited solutions exist for comparing event traces. Trace-based analysis, i.e., the collection of fine-grained information on individual application events with timestamps and application context, has proven to be a powerful technique. The detailed performance information included in event traces make them very suitable for performance analysis. However, this level of detail also presents a challenge because it implies a large and overwhelming amount of data. Currently, users need to perform manual comparison of event traces, which is extremely challenging and time consuming because of the large volume of detailed data and the need to correctly line up trace events. To fill the gap of missing solutions for automatic comparison of event traces, this work proposes a set of techniques that automatically align traces. The alignment allows their structural comparison and the highlighting of differences between them. A set of novel metrics provide the user with an objective measure of the differences between traces, both in terms of differences in the event stream and timing differences across events. An additional important aspect of trace-based analysis is the visualization of performance data in event timelines. This has proven to be a powerful approach for the detection of various types of performance problems. However, visualization of large numbers of event timelines quickly hits the limits of available display resolution. Likewise, identifying performance problems is challenging in the large amount of visualized performance data. To alleviate these problems this work proposes two new approaches for event timeline visualization. First, novel folding strategies for event timelines facilitate visual scalability and provide powerful overviews of performance data at the same time. Second, this work presents an effective approach that automatically identifies and highlights several types of performance critical sections in an application run. This approach identifies time dominant functions of an application and subsequently uses them to analyze runtime imbalances throughout the application run. Intuitive visualizations present the resulting runtime variations and guide the analyst to performance hot spots. Evaluations with benchmarks and real-world applications assess all introduced techniques. 
The effectiveness of the comparison approaches is demonstrated by showing automatically detected performance issues and structural differences between different versions of applications and across parallel execution streams. Case studies showcase the capabilities of the event timeline visualization techniques by demonstrating scalable performance data visualizations and detecting performance problems and code inefficiencies in real-world applications.
13

Mizdrak, Pedrag. « Novel iterative approach to joint sequence alignment and tree inference under maximum likelihood : A critical assessment ». Thesis, University of Ottawa (Canada), 2009. http://hdl.handle.net/10393/28253.

Full text
Abstract:
Multiple sequence alignment (MSA) and phylogenetic tree reconstruction are two important problems in bioinformatics. In some respects, they represent "two sides of the same coin", since solving either of the two problems would be easier if the solution to the other were given. However, most of the currently available algorithms solve only one of these two problems, either completely ignoring the other or assuming that its solution is known in advance. Attempts have been made to solve the two problems simultaneously, but they are either too computationally intensive or ill-suited to analysing divergent sequences. Here we derive a new method that addresses these shortcomings by iteratively improving the starting alignment and its corresponding evolutionary tree based on maximum likelihood scores. We show that the method produces trees with significantly better likelihood scores for fairly to highly divergent sequences. Yet this improvement does not translate directly into an improvement of tree and alignment quality.
14

Park, Y., E. Krause, S. Dodelson, B. Jain, A. Amara, M. R. Becker, S. L. Bridle et al. « Joint analysis of galaxy-galaxy lensing and galaxy clustering : Methodology and forecasts for Dark Energy Survey ». AMER PHYSICAL SOC, 2016. http://hdl.handle.net/10150/621963.

Full text
Abstract:
The joint analysis of galaxy-galaxy lensing and galaxy clustering is a promising method for inferring the growth function of large-scale structure. Anticipating a near future application of this analysis to Dark Energy Survey (DES) measurements of galaxy positions and shapes, we develop a practical approach to modeling the assumptions and systematic effects affecting the joint analysis of small-scale galaxy-galaxy lensing and large-scale galaxy clustering. Introducing parameters that characterize the halo occupation distribution (HOD), photometric redshift uncertainties, and shear measurement errors, we study how external priors on different subsets of these parameters affect our growth constraints. Degeneracies within the HOD model, as well as between the HOD and the growth function, are identified as the dominant source of complication, with other systematic effects being subdominant. The impact of HOD parameters and their degeneracies necessitate the detailed joint modeling of the galaxy sample that we employ. We conclude that DES data will provide powerful constraints on the evolution of structure growth in the Universe, conservatively/optimistically constraining the growth function to 7.9%/4.8% with its first-year data that cover over 1000 square degrees, and to 3.9%/2.3% with its full five-year data that will survey 5000 square degrees, including both statistical and systematic uncertainties.
15

Luu, Vinh Trung. « Using event sequence alignment to automatically segment web users for prediction and recommendation ». Thesis, Mulhouse, 2016. http://www.theses.fr/2016MULH0098/document.

Full text
Abstract:
A large amount of data is collected every day by website operators about the visitors who access their services. The purpose of collecting these data is to better understand usage and to acquire knowledge about visitor behaviour. From this knowledge, site operators can decide to modify their site or to offer visitors personalized content. However, the volume of data collected, as well as the complexity of representing the interactions between a visitor and a website, calls for the development of new data mining tools. In this thesis, we explored the use of sequence alignment methods for web usage mining. These methods are the basis for automatically grouping web users into segments, which makes it possible to discover groups with similar behaviour. We also studied how these groups could be used for page prediction and recommendation. These topics are particularly important given the very rapid growth of online business, which produces a volume of data (big data) that is impossible to process manually.
This thesis explored the application of sequence alignment in web usage mining, including user clustering and web prediction and recommendation. The topic was chosen because online business has developed rapidly and gathers a huge volume of information, while the use of sequence alignment in this field is still limited. In this context, researchers need to build models that rely on sequence alignment methods and to empirically assess their relevance for mining user behaviour. This thesis presents a novel methodological point of view in the area and shows applicable approaches in our quest to improve previous related work. Web usage behaviour analysis has been central to a large number of investigations aimed at maintaining the relation between users and web services. Useful information extraction has been addressed by web content providers to understand users' needs, so that their content can be adapted accordingly. One of the promising approaches to reach this target is pattern discovery using clustering, which groups users who show similar behavioural characteristics. Our research goal is to cluster users, in real time, based on the similarity of their sessions.
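As an illustration of the kind of pairwise score such session clustering can build on (not the thesis code), the sketch below computes a Needleman-Wunsch-style global alignment score between two sessions represented as sequences of page identifiers; the scoring values are arbitrary assumptions.

    def alignment_score(s1, s2, match=2, mismatch=-1, gap=-1):
        """Needleman-Wunsch-style global alignment score between two page-visit sequences."""
        n, m = len(s1), len(s2)
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            dp[i][0] = i * gap
        for j in range(1, m + 1):
            dp[0][j] = j * gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                sub = match if s1[i - 1] == s2[j - 1] else mismatch
                dp[i][j] = max(dp[i - 1][j - 1] + sub,   # align the two pages
                               dp[i - 1][j] + gap,       # skip a page of s1
                               dp[i][j - 1] + gap)       # skip a page of s2
        return dp[n][m]

    # e.g. similarity between two sessions identified by page ids
    print(alignment_score(["home", "search", "product", "cart"],
                          ["home", "product", "cart", "checkout"]))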
16

Soheily-Khah, Saeid. « Generalized k-means-based clustering for temporal data under time warp ». Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM064/document.

Full text
Abstract:
Temporal alignment of multiple time series is an important unresolved problem in many scientific disciplines. The main challenges for accurate temporal alignment include determining and modelling the common and differential characteristics of classes of time series. This thesis is motivated by recent work extending Dynamic Time Warping (DTW) to align multiple time series in several applications, including speech recognition, curve matching, micro-array data analysis, temporal segmentation and human motion analysis. These DTW-based works, however, suffer from several limitations: 1) they address the problem of aligning two time series at a time, regardless of the remaining series; 2) they involve the features of the time series uniformly; 3) the series are aligned globally, over all observations. The aim of this thesis is to explore a generalized dynamic time warping for time series clustering. The work first addresses the problem of prototype extraction, and then the alignment of multiple, multidimensional time series.
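For reference, the classical DTW recurrence that such generalizations build upon can be written as follows (a textbook form, not the generalized criterion proposed in the thesis):

    D(i, j) = d(x_i, y_j) + \min\bigl\{ D(i-1, j),\; D(i, j-1),\; D(i-1, j-1) \bigr\},
    \qquad \mathrm{DTW}(x, y) = D(n, m),

with D(0,0) = 0, D(i,0) = D(0,j) = +\infty for i, j > 0, and d(x_i, y_j) a local ground distance such as the squared Euclidean distance.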
17

Poddar, Sunrita. « Joint recovery of high-dimensional signals from noisy and under-sampled measurements using fusion penalties ». Diss., University of Iowa, 2018. https://ir.uiowa.edu/etd/6623.

Full text
Abstract:
The presence of missing entries pose a hindrance to data analysis and interpretation. The missing entries may occur due to a variety of reasons such as sensor malfunction, limited acquisition time or unavailability of information. In this thesis, we present algorithms to analyze and complete data which contain several missing entries. We consider the recovery of a group of signals, given a few under-sampled and noisy measurements of each signal. This involves solving ill-posed inverse problems, since the number of available measurements are considerably fewer than the dimensionality of the signal that we aim to recover. In this work, we consider different data models to enable joint recovery of the signals from their measurements, as opposed to the independent recovery of each signal. This prior knowledge makes the inverse problems well-posed. While compressive sensing techniques have been proposed for low-rank or sparse models, such techniques have not been studied to the same extent for other models such as data appearing in clusters or lying on a low-dimensional manifold. In this work, we consider several data models arising in different applications, and present some theoretical guarantees for the joint reconstruction of the signals from few measurements. Our proposed techniques make use of fusion penalties, which are regularizers that promote solutions with similarity between certain pairs of signals. The first model that we consider is that of points lying on a low-dimensional manifold, embedded in high dimensional ambient space. This model is apt for describing a collection of signals, each of which is a function of only a few parameters; the manifold dimension is equal to the number of parameters. We propose a technique to recover a series of such signals, given a few measurements for each signal. We demonstrate this in the context of dynamic Magnetic Resonance Imaging (MRI) reconstruction, where only a few Fourier measurements are available for each time frame. A novel acquisition scheme enables us to detect the neighbours of each frame on the manifold. We then recover each frame by enforcing similarity with its neighbours. The proposed scheme is used to enable fast free-breathing cardiac and speech MRI scans. Next, we consider the recovery of curves/surfaces from few sampled points. We model the curves as the zero-level set of a trigonometric polynomial, whose bandwidth controls the complexity of the curve. We present theoretical results for the minimum number of samples required to uniquely identify the curve. We show that the null-space vectors of high dimensional feature maps of these points can be used to recover the curve. The method is demonstrated on the recovery of the structure of DNA filaments from a few clicked points. This idea is then extended to recover data lying on a high-dimensional surface from few measurements. The formulated algorithm has similarities to our algorithm for recovering points on a manifold. Hence, we apply the above ideas to the cardiac MRI reconstruction problem, and are able to show better image quality with reduced computational complexity. Finally, we consider the case where the data is organized into clusters. The goal is to recover the true clustering of the data, even when a few features of each data point is unknown. We propose a fusion-penalty based optimization problem to cluster data reliably in the presence of missing entries, and present theoretical guarantees for successful recovery of the correct clusters. 
We next propose a computationally efficient algorithm to solve a relaxation of this problem. We demonstrate that our algorithm reliably recovers the true clusters in the presence of large fractions of missing entries on simulated and real datasets. This work thus results in several theoretical insights and solutions to different practical problems which involve reconstructing and analyzing data with missing entries. The fusion penalties that are used in each of the above models are obtained directly as a result of model assumptions. The proposed algorithms show very promising results on several real datasets, and we believe that they are general enough to be easily extended to several other practical applications.
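A generic form of a fusion-penalized joint recovery problem, given here only to illustrate the idea (the thesis develops application-specific variants and the corresponding guarantees), is:

    \min_{x_1, \dots, x_N} \; \sum_{i=1}^{N} \lVert A_i x_i - b_i \rVert_2^2
    \;+\; \lambda \sum_{(i,j) \in \mathcal{E}} w_{ij} \, \lVert x_i - x_j \rVert_2,

where A_i are the under-sampled measurement operators, b_i the noisy measurements, E the set of signal pairs expected to be similar (for instance neighbours on the manifold or members of the same cluster), and w_ij optional weights; larger lambda pulls related signals towards each other.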
18

Synnergren, Jane. « Wheat variety identification using genetic variations ». Thesis, University of Skövde, Department of Computer Science, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-821.

Full text
Abstract:

There is a continuous development of different crop varieties in the crop trade. The cultivated crops tend to be more and more alike, which requires an effective method for crop identification. Crop type and crop type purity have become quality measures in crop trade, both nationally and internationally. A number of well-known quality attributes of interest in the crop trade can be correlated to the specific crop type, and it is therefore of great importance to be able to identify different crop varieties reliably. It is well known from the literature that genomic variations exist at the nucleotide level between different crop varieties, and these variations are potentially useful for automated variety identification.

This project deals with crop variety identification, investigating the possibilities of distinguishing between different wheat varieties. Experience from performing wheat variety identification at the protein level has shown unsatisfactory results, and DNA-based techniques are therefore proposed instead. DNA-based techniques depend on the availability of sequence data from the wheat genome, and part of the work has therefore examined the availability of such sequence data. The focus of the work, however, has been on defining a method for the computational detection of single nucleotide variations in ESTs from wheat and on testing that method experimentally. Results from these experiments show that the method defined in this project detects polymorphic variations that can be correlated to variety variations.

19

Binti, Zainul Abidin Fatin Nurzahirah. « Flexible model-based joint probabilistic clustering of binary and continuous inputs and its application to genetic regulation and cancer ». Thesis, University of Leeds, 2017. http://etheses.whiterose.ac.uk/18883/.

Full text
Abstract:
Clustering is used widely in ‘omics’ studies and is often tackled with standard methods such as hierarchical clustering or k-means which are limited to a single data type. In addition, these methods are further limited by having to select a cut-off point at specific level of dendrogram- a tree diagram or needing a pre-defined number of clusters respectively. The increasing need for integration of multiple data sets leads to a requirement for clustering methods applicable to mixed data types, where the straightforward application of standard methods is not necessarily the best approach. A particularly common problem involves clustering entities characterized by a mixture of binary data, for example, presence or absence of mutations, binding, motifs, and/or epigenetic marks and continuous data, for example, gene expression, protein abundance and/or metabolite levels. In this work, we presented a generic method based on a probabilistic model for clustering this mixture of data types, and illustrate its application to genetic regulation and the clustering of cancer samples. It uses penalized maximum likelihood (ML) estimation of mixture model parameters using information criteria (model selection objective function) and meta-heuristic searches for optimum clusters. Compatibility of several information criteria with our model-based joint clustering was tested, including the well-known Akaike Information Criterion (AIC) and its empirically determined derivatives (AICλ), Bayesian Information Criterion (BIC) and its derivative (CAIC), and Hannan-Quinn Criterion (HQC). We have experimentally shown with simulated data that AIC and AIC (λ=2.5) worked well with our method. We show that the resulting clusters lead to useful hypotheses: in the case of genetic regulation these concern regulation of groups of genes by specific sets of transcription factors and in the case of cancer samples combinations of gene mutations are related to patterns of gene expression. The clusters have potential mechanistic significance and in the latter case are significantly linked to survival.
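As a much-simplified, continuous-data-only illustration of information-criterion-based model selection (not the mixed binary/continuous model or the meta-heuristic search of the thesis), one could pick the number of mixture components by minimizing the AIC; the use of scikit-learn's GaussianMixture and the candidate range are assumptions.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def pick_k_by_aic(X, k_range=range(1, 11), seed=0):
        """Fit a mixture for each candidate number of clusters and keep the AIC-minimising one."""
        best_k, best_aic, best_model = None, np.inf, None
        for k in k_range:
            gm = GaussianMixture(n_components=k, random_state=seed).fit(X)
            aic = gm.aic(X)            # AIC = 2 * n_params - 2 * log-likelihood
            if aic < best_aic:
                best_k, best_aic, best_model = k, aic, gm
        return best_k, best_model

    X = np.random.default_rng(0).normal(size=(300, 4))
    k, model = pick_k_by_aic(X)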
20

Choi, Seo Wook. « A directed joint model of fMRI network and response time data : The linear ballistic accumulator model with an economical clustering model ». The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1605888867128103.

Full text
21

Cameron, Michael. « Efficient Homology Search for Genomic Sequence Databases ». RMIT University. Computer Science and Information Technology, 2006. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20070509.162443.

Full text
Abstract:
Genomic search tools can provide valuable insights into the chemical structure, evolutionary origin and biochemical function of genetic material. A homology search algorithm compares a protein or nucleotide query sequence to each entry in a large sequence database and reports alignments with highly similar sequences. The exponential growth of public data banks such as GenBank has necessitated the development of fast, heuristic approaches to homology search. The versatile and popular blast algorithm, developed by researchers at the US National Center for Biotechnology Information (NCBI), uses a four-stage heuristic approach to efficiently search large collections for analogous sequences while retaining a high degree of accuracy. Despite an abundance of alternative approaches to homology search, blast remains the only method to offer fast, sensitive search of large genomic collections on modern desktop hardware. As a result, the tool has found widespread use with millions of queries posed each day. A significant investment of computing resources is required to process this large volume of genomic searches and a cluster of over 200 workstations is employed by the NCBI to handle queries posed through the organisation's website. As the growth of sequence databases continues to outpace improvements in modern hardware, blast searches are becoming slower each year and novel, faster methods for sequence comparison are required. In this thesis we propose new techniques for fast yet accurate homology search that result in significantly faster blast searches. First, we describe improvements to the final, gapped alignment stages where the query and sequences from the collection are aligned to provide a fine-grain measure of similarity. We describe three new methods for aligning sequences that roughly halve the time required to perform this computationally expensive stage. Next, we investigate improvements to the first stage of search, where short regions of similarity between a pair of sequences are identified. We propose a novel deterministic finite automaton data structure that is significantly smaller than the codeword lookup table employed by ncbi-blast, resulting in improved cache performance and faster search times. We also discuss fast methods for nucleotide sequence comparison. We describe novel approaches for processing sequences that are compressed using the byte packed format already utilised by blast, where four nucleotide bases from a strand of DNA are stored in a single byte. Rather than decompress sequences to perform pairwise comparisons, our innovations permit sequences to be processed in their compressed form, four bases at a time. Our techniques roughly halve average query evaluation times for nucleotide searches with no effect on the sensitivity of blast. Finally, we present a new scheme for managing the high degree of redundancy that is prevalent in genomic collections. Near-duplicate entries in sequence data banks are highly detrimental to retrieval performance, however existing methods for managing redundancy are both slow, requiring almost ten hours to process the GenBank database, and crude, because they simply purge highly-similar sequences to reduce the level of internal redundancy. We describe a new approach for identifying near-duplicate entries that is roughly six times faster than the most successful existing approaches, and a novel approach to managing redundancy that reduces collection size and search times but still provides accurate and comprehensive search results. 
Our improvements to blast have been integrated into our own version of the tool. We find that our innovations more than halve average search times for nucleotide and protein searches, and have no significant effect on search accuracy. Given the enormous popularity of blast, this represents a very significant advance in computational methods to aid life science research.
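The byte-packed idea can be sketched as follows (an illustrative toy, not the blast implementation): four 2-bit bases per byte, with comparisons made on the packed bytes directly rather than after decompression; the encoding table and helper names are assumptions.

    BASE_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

    def pack(seq):
        """Pack a DNA string into bytes, four 2-bit bases per byte (length assumed divisible by 4)."""
        out = bytearray()
        for i in range(0, len(seq), 4):
            b = 0
            for ch in seq[i:i + 4]:
                b = (b << 2) | BASE_BITS[ch]
            out.append(b)
        return bytes(out)

    def identical_4mers(packed_a, packed_b):
        # compare the sequences four bases at a time without unpacking them
        return sum(1 for x, y in zip(packed_a, packed_b) if x == y)

    a, b = pack("ACGTACGTACGT"), pack("ACGTACGAACGT")
    print(identical_4mers(a, b))   # 2 of the 3 byte-packed 4-mers match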
22

Curado, Manuel. « Structural Similarity : Applications to Object Recognition and Clustering ». Doctoral thesis, Universidad de Alicante, 2018. http://hdl.handle.net/10045/98110.

Full text
Abstract:
In this thesis, we propose several developments in the context of Structural Similarity. We address both node (local) similarity and graph (global) similarity. Concerning node similarity, we focus on improving the diffusive process used to compute this similarity (e.g. Commute Times) by modifying or rewiring the structure of the graph (Graph Densification), although some advances in Laplacian-based ranking are also included in this document. Graph Densification is a particular case of what we call graph rewiring, i.e. a novel field (similar to image processing) where input graphs are rewired to be better conditioned for subsequent pattern recognition tasks (e.g. clustering). In the thesis, we contribute a scalable and effective method driven by Dirichlet processes. We propose both a completely unsupervised and a semi-supervised approach for Dirichlet densification. We also contribute new random walkers (Return Random Walks) that are useful structural filters as well as asymmetry detectors in directed brain networks used to make early predictions of Alzheimer's disease (AD). Graph similarity is addressed by designing structural information channels as a means of measuring the Mutual Information between graphs. To this end, we first embed the graphs by means of Commute Times. Commute time embeddings have good properties for Delaunay triangulations (the typical representation for Graph Matching in computer vision). This means that these embeddings can act as encoders in the channel as well as decoders (since they are invertible). Consequently, structural noise can be modelled by the deformation introduced in one of the manifolds to fit the other one. This methodology leads to a highly discriminative similarity measure, since the Mutual Information is measured on the manifolds (vectorial domain) through copulas and bypass entropy estimators. This is consistent with the methodology of decoupling the measurement of graph similarity into two steps: a) linearizing the Quadratic Assignment Problem (QAP) by means of the embedding trick, and b) measuring similarity in vector spaces. The QAP problem is also investigated in this thesis. More precisely, we analyze the behaviour of m-best Graph Matching methods. These methods usually start from a couple of best solutions and then expand the search space locally by excluding previously clamped variables. The next variable to clamp is usually selected randomly, but we show that this reduces performance when structural noise arises (outliers). Alternatively, we propose several heuristics for spanning the search space and evaluate all of them, showing that they are usually better than random selection. These heuristics are particularly interesting because they exploit the structure of the affinity matrix, and efficiency is improved as well. Concerning the application domains explored in this thesis, we focus on object recognition (graph similarity), clustering (rewiring), compression/decompression of graphs (links with Extremal Graph Theory), 3D shape simplification (sparsification) and early prediction of AD.
Ministerio de Economía, Industria y Competitividad (Referencia TIN2012-32839 BES-2013-064482)
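Commute times, which this work builds on for node similarity, can be computed from the pseudoinverse of the graph Laplacian; the short sketch below (illustrative only, not the thesis code) uses the standard identity CT(i, j) = vol(G) * (L+_ii + L+_jj - 2 L+_ij).

    import numpy as np

    def commute_times(adjacency):
        """Commute-time distances of an undirected graph from the Laplacian pseudoinverse."""
        A = np.asarray(adjacency, dtype=float)
        degrees = A.sum(axis=1)
        L = np.diag(degrees) - A          # combinatorial Laplacian
        L_pinv = np.linalg.pinv(L)
        vol = degrees.sum()               # vol(G) = 2 * number of edges
        d = np.diag(L_pinv)
        return vol * (d[:, None] + d[None, :] - 2 * L_pinv)

    # commute times of a 4-node path graph
    A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
    print(commute_times(A))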
23

Telha, Cornejo Claudio (Claudio A. ). « Algorithms and hardness results for the jump number problem, the joint replenishment problem, and the optimal clustering of frequency-constrained maintenance jobs ». Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/70446.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2012.
In the first part of this thesis we present a new, geometric interpretation of the jump number problem on 2-dimensional 2-colorable (2D2C) partial orders. We show that the jump number of a 2D2C poset is equivalent to the maximum cardinality of an independent set in a properly defined collection of rectangles in the plane. We then model the geometric problem as a linear program. Even though the underlying polytope may not be integral, we show that one can always find an integral optimal solution. Inspired by this result and by previous work of A. Frank, T. Jordan and L. Vegh [13, 14, 15] on set-pairs, we derive an efficient combinatorial algorithm to find the maximum independent set and its dual, the minimum hitting set, in polynomial time. The combinatorial algorithm solves the jump number problem on convex posets (a subclass of 2D2C posets) significantly faster than current methods. If n is the number of nodes in the partial order, our algorithm runs in O((n log n)^2.5) time, while previous algorithms ran in at least O(n^9) time. In the second part, we present a novel connection between certain sequencing problems that involve the coordination of activities and the problem of factorizing integer numbers. We use this connection to derive hardness results for three different problems: the Joint Replenishment Problem with General Integer Policies, the Joint Replenishment Problem with Correction Factor, and the Problem of Optimal Clustering of Frequency-Constrained Maintenance Jobs. Our hardness results do not follow from a standard type of reduction (e.g., we do not prove NP-hardness), and imply that no polynomial-time algorithm exists for the problems above unless Integer Factorization is solvable in polynomial time.
24

Böhm, Christoph. « Enriching the Web of Data with topics and links ». Phd thesis, Universität Potsdam, 2013. http://opus.kobv.de/ubp/volltexte/2013/6862/.

Full text
Abstract:
This thesis presents novel ideas and research findings for the Web of Data – a global data space spanning many so-called Linked Open Data sources. Linked Open Data adheres to a set of simple principles to allow easy access and reuse for data published on the Web. Linked Open Data is by now an established concept and many (mostly academic) publishers adopted the principles building a powerful web of structured knowledge available to everybody. However, so far, Linked Open Data does not yet play a significant role among common web technologies that currently facilitate a high-standard Web experience. In this work, we thoroughly discuss the state-of-the-art for Linked Open Data and highlight several shortcomings – some of them we tackle in the main part of this work. First, we propose a novel type of data source meta-information, namely the topics of a dataset. This information could be published with dataset descriptions and support a variety of use cases, such as data source exploration and selection. For the topic retrieval, we present an approach coined Annotated Pattern Percolation (APP), which we evaluate with respect to topics extracted from Wikipedia portals. Second, we contribute to entity linking research by presenting an optimization model for joint entity linking, showing its hardness, and proposing three heuristics implemented in the LINked Data Alignment (LINDA) system. Our first solution can exploit multi-core machines, whereas the second and third approach are designed to run in a distributed shared-nothing environment. We discuss and evaluate the properties of our approaches leading to recommendations which algorithm to use in a specific scenario. The distributed algorithms are among the first of their kind, i.e., approaches for joint entity linking in a distributed fashion. Also, we illustrate that we can tackle the entity linking problem on the very large scale with data comprising more than 100 millions of entity representations from very many sources. Finally, we approach a sub-problem of entity linking, namely the alignment of concepts. We again target a method that looks at the data in its entirety and does not neglect existing relations. Also, this concept alignment method shall execute very fast to serve as a preprocessing for further computations. Our approach, called Holistic Concept Matching (HCM), achieves the required speed through grouping the input by comparing so-called knowledge representations. Within the groups, we perform complex similarity computations, relation conclusions, and detect semantic contradictions. The quality of our result is again evaluated on a large and heterogeneous dataset from the real Web. In summary, this work contributes a set of techniques for enhancing the current state of the Web of Data. All approaches have been tested on large and heterogeneous real-world input.
Die vorliegende Arbeit stellt neue Ideen sowie Forschungsergebnisse für das Web of Data vor. Hierbei handelt es sich um ein globales Netz aus sogenannten Linked Open Data (LOD) Quellen. Diese Datenquellen genügen gewissen Prinzipien, um Nutzern einen leichten Zugriff über das Internet und deren Verwendung zu ermöglichen. LOD ist bereits weit verbreitet und es existiert eine Vielzahl von Daten-Veröffentlichungen entsprechend der LOD Prinzipien. Trotz dessen ist LOD bisher kein fester Baustein des Webs des 21. Jahrhunderts. Die folgende Arbeit erläutert den aktuellen Stand der Forschung und Technik für Linked Open Data und identifiziert dessen Schwächen. Einigen Schwachstellen von LOD widmen wir uns in dem darauf folgenden Hauptteil. Zu Beginn stellen wir neuartige Metadaten für Datenquellen vor – die Themen von Datenquellen (engl. Topics). Solche Themen könnten mit Beschreibungen von Datenquellen veröffentlicht werden und eine Reihe von Anwendungsfällen, wie das Auffinden und Explorieren relevanter Daten, unterstützen. Wir diskutieren unseren Ansatz für die Extraktion dieser Metainformationen – die Annotated Pattern Percolation (APP). Experimentelle Ergebnisse werden mit Themen aus Wikipedia Portalen verglichen. Des Weiteren ergänzen wir den Stand der Forschung für das Auffinden verschiedener Repräsentationen eines Reale-Welt-Objektes (engl. Entity Linking). Für jenes Auffinden werden nicht nur lokale Entscheidungen getroffen, sondern es wird die Gesamtheit der Objektbeziehungen genutzt. Wir diskutieren unser Optimierungsmodel, beweisen dessen Schwere und präsentieren drei Ansätze zur Berechnung einer Lösung. Alle Ansätze wurden im LINked Data Alignment (LINDA) System implementiert. Die erste Methode arbeitet auf einer Maschine, kann jedoch Mehrkern-Prozessoren ausnutzen. Die weiteren Ansätze wurden für Rechnercluster ohne gemeinsamen Speicher entwickelt. Wir evaluieren unsere Ergebnisse auf mehr als 100 Millionen Entitäten und erläutern Vor- sowie Nachteile der jeweiligen Ansätze. Im verbleibenden Teil der Arbeit behandeln wir das Linking von Konzepten – ein Teilproblem des Entity Linking. Unser Ansatz, Holistic Concept Matching (HCM), betrachtet abermals die Gesamtheit der Daten. Wir gruppieren die Eingabe um eine geringe Laufzeit bei der Verarbeitung von mehreren Hunderttausenden Konzepten zu erreichen. Innerhalb der Gruppen berechnen wir komplexe Ähnlichkeiten, und spüren semantische Schlussfolgerungen und Widersprüche auf. Die Qualität des Ergebnisses evaluieren wir ebenfalls auf realen Datenmengen. Zusammenfassend trägt diese Arbeit zum aktuellen Stand der Forschung für das Web of Data bei. Alle diskutierten Techniken wurden mit realen, heterogenen und großen Datenmengen getestet.
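As a toy illustration of the linking task described above (not the LINDA or HCM algorithms themselves), the sketch below greedily groups entity records whose token overlap exceeds a threshold; the records and the threshold are made up for the example.

def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b)

def link_entities(records, threshold=0.5):
    # Greedy, order-dependent grouping of mentions that likely denote the same entity.
    clusters = []
    for rec in records:
        for cluster in clusters:
            if any(jaccard(rec, member) >= threshold for member in cluster):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

print(link_entities(["Barack Obama", "Obama Barack H.", "Berlin city", "City of Berlin"]))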
Styles APA, Harvard, Vancouver, ISO, etc.
25

Ghaemmaghami, Houman. « Robust automatic speaker linking and attribution ». Thesis, Queensland University of Technology, 2013. https://eprints.qut.edu.au/60832/4/Houman_Ghaemmaghami_Thesis.pdf.

Texte intégral
Résumé :
This research makes a major contribution which enables efficient searching and indexing of large archives of spoken audio based on speaker identity. It introduces a novel technique dubbed “speaker attribution”, which is the task of automatically determining ‘who spoke when?’ in recordings and then automatically linking the unique speaker identities within each recording across multiple recordings. The outcome of the research will also have a significant impact on improving the performance of automatic speech recognition systems through the extracted speaker identities.
Styles APA, Harvard, Vancouver, ISO, etc.
26

Wang, David I.-Chung. « Speaker diarization : "who spoke when" ». Thesis, Queensland University of Technology, 2012. https://eprints.qut.edu.au/59624/1/David_Wang_Thesis.pdf.

Texte intégral
Résumé :
Speaker diarization is the process of annotating an input audio with information that attributes temporal regions of the audio signal to their respective sources, which may include both speech and non-speech events. For speech regions, the diarization system also specifies the locations of speaker boundaries and assign relative speaker labels to each homogeneous segment of speech. In short, speaker diarization systems effectively answer the question of ‘who spoke when’. There are several important applications for speaker diarization technology, such as facilitating speaker indexing systems to allow users to directly access the relevant segments of interest within a given audio, and assisting with other downstream processes such as summarizing and parsing. When combined with automatic speech recognition (ASR) systems, the metadata extracted from a speaker diarization system can provide complementary information for ASR transcripts including the location of speaker turns and relative speaker segment labels, making the transcripts more readable. Speaker diarization output can also be used to localize the instances of specific speakers to pool data for model adaptation, which in turn boosts transcription accuracies. Speaker diarization therefore plays an important role as a preliminary step in automatic transcription of audio data. The aim of this work is to improve the usefulness and practicality of speaker diarization technology, through the reduction of diarization error rates. In particular, this research is focused on the segmentation and clustering stages within a diarization system. Although particular emphasis is placed on the broadcast news audio domain and systems developed throughout this work are also trained and tested on broadcast news data, the techniques proposed in this dissertation are also applicable to other domains including telephone conversations and meetings audio. Three main research themes were pursued: heuristic rules for speaker segmentation, modelling uncertainty in speaker model estimates, and modelling uncertainty in eigenvoice speaker modelling. The use of heuristic approaches for the speaker segmentation task was first investigated, with emphasis placed on minimizing missed boundary detections. A set of heuristic rules was proposed, to govern the detection and heuristic selection of candidate speaker segment boundaries. A second pass, using the same heuristic algorithm with a smaller window, was also proposed with the aim of improving detection of boundaries around short speaker segments. Compared to single threshold based methods, the proposed heuristic approach was shown to provide improved segmentation performance, leading to a reduction in the overall diarization error rate. Methods to model the uncertainty in speaker model estimates were developed, to address the difficulties associated with making segmentation and clustering decisions with limited data in the speaker segments. The Bayes factor, derived specifically for multivariate Gaussian speaker modelling, was introduced to account for the uncertainty of the speaker model estimates. The use of the Bayes factor also enabled the incorporation of prior information regarding the audio to aid segmentation and clustering decisions. The idea of modelling uncertainty in speaker model estimates was also extended to the eigenvoice speaker modelling framework for the speaker clustering task. 
Building on the application of Bayesian approaches to the speaker diarization problem, the proposed approach takes into account the uncertainty associated with the explicit estimation of the speaker factors. The proposed decision criteria, based on Bayesian theory, were shown to generally outperform their non-Bayesian counterparts.
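The merge decisions described above weigh model evidence against model complexity. The thesis derives a Bayes factor for multivariate Gaussian speaker models; as a rough, commonly used stand-in, the sketch below computes the related delta-BIC criterion for deciding whether two speech segments come from the same speaker (synthetic feature vectors and an assumed penalty weight, not the thesis's derivation).

import numpy as np

def delta_bic(x, y, lam=1.0):
    # Delta-BIC for two segments modelled as full-covariance Gaussians.
    # Negative values favour merging the segments into one speaker cluster.
    z = np.vstack([x, y])
    def logdet(d):
        return np.linalg.slogdet(np.cov(d, rowvar=False))[1]
    n, nx, ny, d = len(z), len(x), len(y), z.shape[1]
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return 0.5 * (n * logdet(z) - nx * logdet(x) - ny * logdet(y)) - penalty

rng = np.random.default_rng(0)
same = rng.normal(0, 1, (200, 5))
print(delta_bic(same[:100], same[100:]))         # typically negative: merge
print(delta_bic(same[:100], same[100:] + 5.0))   # typically positive: keep separate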
Styles APA, Harvard, Vancouver, ISO, etc.
27

Peres, Patrícia Silva. « Alinhamento múltiplo de seqüências através de técnicas de agrupamento ». Universidade Federal do Amazonas, 2006. http://tede.ufam.edu.br/handle/tede/2927.

Texte intégral
Résumé :
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
The simultaneous alignment of many DNA or protein sequences is one of the commonest tasks in computational molecular biology. Multiple alignments are important in many applications, such as predicting the structure of new sequences, demonstrating the relationship between new sequences and existing families of sequences, inferring the evolutionary history of a family of sequences, finding the characteristic motifs (core blocks) between biological sequences, assembling fragments in DNA sequencing, and many others. Currently, the most popular strategy used for solving the multiple sequence alignment problem is progressive alignment. Each step of this strategy may introduce an error, which is expected to be low for closely related sequences but increases as sequences diverge. Therefore, determining the order in which the sequences will be aligned is a key step in the progressive alignment strategy. Traditional approaches take into account, in each iteration of the progressive alignment, only the closest pair or groups of sequences to be aligned. Such a strategy minimizes the error introduced in each step, but may not be the best option to minimize the final error. Based on that hypothesis, this work studies and applies a global clustering technique that performs a preliminary analysis of all sequences in order to separate them into groups according to their similarities. These groups then guide the traditional progressive alignment, as an attempt to minimize the overall error introduced by the steps of the progressive alignment and improve the final result. To assess the reliability of this new strategy, three well-known methods were modified to introduce the new sequence clustering stage. The accuracy of the new versions of the methods was tested using three different reference collections, and the modified methods were compared with their original versions. The results of the conducted experiments show that the new versions of the methods with the global clustering stage obtained better alignments than their original versions on the three reference collections, achieving improvements over the main methods found in the literature, with an increase of only 3% on average in running time.
O alinhamento simultâneo entre várias seqüências de DNA ou proteína é um dos principais problemas em biologia molecular computacional. Alinhamentos múltiplos são importantes em muitas aplicações, tais como, predição da estrutura de novas seqüências, demonstração do relacionamento entre novas seqüências e famílias de seqüências já existentes, inferência da história evolutiva de uma família de seqüências, descobrimento de padrões que sejam compartilhados entre seqüências, montagem de fragmentos de DNA, entre outras. Atualmente, a estratégia mais popular utilizada na resolução do problema do alinhamento múltiplo é o alinhamento progressivo. Cada etapa desta estratégia pode gerar uma taxa de erro que tenderá a ser baixa no caso de seqüências muito similares entre si, porêm tenderá a ser alta na medida em que as seqüências divergirem. Portanto, a determinação da ordem de alinhamento das seqüências constitui-se em um passo fundamental na estratégia de alinhamento progressivo. Estratégias tradicionais levam em consideração, a cada iteração do alinhamento progressivo, apenas o par ou grupo de seqüências mais próximo a ser alinhado. Tal estratégia minimiza a taxa de erro introduzida em cada etapa, porém pode não ser a melhor forma para minimizar a taxa de erro final. Baseado nesta hipótese, este trabalho tem por objetivo o estudo e aplicação de uma técnica de agrupamento global para executar uma análise prévia de todas as seqüências de forma a separálas em grupos de acordo com suas similaridades. Estes grupos, então, guiarão o alinhamento progressivo tradicional, numa tentativa de minimizar a taxa de erro global introduzida pelas etapas do alinhamento progressivo e melhorar o resultado final. Para avaliar a contabilidade desta nova estratégia, três métodos conhecidos foram modificados com o objetivo de agregar a nova etapa de agrupamento de seqüências. A acurácia das novas versões dos métodos foi testada utilizando três diferentes coleções de referências. Além disso, os métodos modificados foram comparadas com suas respectivas versões originais. Os resultados dos experimentos mostram que as novas versões dos métodos com a etapa de agrupamento global realmente obtiveram alinhamentos melhores do que suas versões originais nas três coleções de referência e alcançando melhorias sobre os principais métodos encontrados na literatura, com um aumento de apenas 3% em média no tempo de execução.
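A minimal sketch of the guiding idea described above (toy sequences, standard SciPy routines rather than the modified aligners of the thesis): sequences are grouped globally by k-mer profile similarity before any progressive alignment order is fixed.

from itertools import product
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def kmer_profile(seq, k=3, alphabet="ACGT"):
    # Very coarse similarity signature: normalized counts of all k-mers.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = np.array([seq.count(m) for m in kmers], dtype=float)
    return counts / max(counts.sum(), 1.0)

seqs = ["ACGTACGTAC", "ACGTACGAAC", "TTTTGGGGCC", "TTTTGGGACC"]
profiles = np.array([kmer_profile(s) for s in seqs])
tree = linkage(pdist(profiles, metric="euclidean"), method="average")
groups = fcluster(tree, t=2, criterion="maxclust")
print(groups)  # e.g. [1 1 2 2]: each group would be aligned first, then the groups merged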
Styles APA, Harvard, Vancouver, ISO, etc.
28

Tosi, Alessia. « Adjusting linguistically to others : the role of social context in lexical choices and spatial language ». Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/23557.

Texte intégral
Résumé :
The human brain is highly sensitive to social information and so is our language production system: people adjust not just what they say but also how they say it in response to the social context. For instance, we are sensitive to the presence of others, and our interactional expectations and goals affect how we individually choose to talk about and refer to things. This thesis is an investigation of the social factors that might lead speakers to adapt linguistically to others. The question of linguistic adaptation is conceived and addressed at two levels: as lexical convergence (i.e., interlocutors coordinating their lexical choices with each other), and as spatial perspective taking in language use (i.e., speakers abandoning their self perspective in favour of another's when verbally locating objects in space). What motivated my research was two-fold. First, I aimed to contribute to the understanding of the interplay between the automatic cognitive accounts and the strategic social accounts of linguistic convergence. At the same time, I wanted to explore new analytical tools for the investigation of interpersonal coordination in conversation (cross-recurrence quantification analysis (CRQA)). Second, there are conflicting explanations as to why people often abandon their self spatial perspective when another person is present in the environment. I aimed to clarify this by bringing together insights from different research fields: spatial language production, spatial cognition, joint attention and joint action. A first set of experiments investigated the effects of speakers' deceptive goals on lexical convergence. Given the extensive evidence that one interlocutor's choices of words shapes another's during collaborative interaction, would we still observe this coordination of linguistic behaviour under conditions of no coordination of intents? In two novel interactive priming paradigms, half of the participants deceived their naïve partner in a detective game (Experiment 1) or a picture naming/matching task (Experiment 2-3) in order to jeopardise their partner's performance in resolving the crime or in a related memory task. Crucially, participants were primed by their partner with suitable-yet-unusual names for objects. I did not find any consistent evidence that deceiving led to a different degree of lexical convergence between deceivers and deceived than between truthful interlocutors. I then explored possibilities and challenges of the use of cross-recurrence quantification analysis (CRQA) (a new analytical tool borrowed from dynamical systems) for the study of lexical convergence in conversation. I applied CRQA in Experiment 4, where I focused on the strategic social accounts of linguistic convergence and investigated whether speakers' tendency to match their interlocutors' lexical choices depended on the social impression that they formed of each other in a previous interaction, and whether this tendency was further modulated by the interactional goal. I developed a novel two-stage paradigm: pairs of participants first experienced a collectivist or an individualistic co-player in an economic decision game (in reality, a pre-set computer programme) and then engaged in a discussion of a survival scenario (this time with the real other) divided in an open-ended vs. joint-goal driven part. I found no evidence that the social impression of their interlocutor affected speakers' degree of lexical convergence. 
Greater convergence was observed in the joint-goal dialogues, replicating previous findings at syntactic level. Experiments 5-7 left the interactive framework of the previous two sets of experiments and explored spatial perspective taking in a non-interactive language task. I investigated why the presence of a person in the environment can induce speakers to abandon their self perspective to locate objects: Do speakers adapt their spatial descriptions to the vantage point of the person out of intentionality-mediated simulation or of general attention-orienting mechanisms? In an online paradigm, participants located objects in photographs that sometimes contained a person or a plant in various positions with respect to the to-be-located object. Findings were consistent with the simulated intentional accounts and linked non-self spatial perspective in language to the apprehension of another person’s visual affordance. Experiments 8-9 investigated the role of shared experience on perspective taking in spatial language. Prior to any communicative and interactional demand, do speakers adapt their spatial descriptions to the presumed perspective of someone who is attending to the same environment at the same time as them? And is this tendency further affected by the number of co-attendees? I expanded the previous online paradigm and induced participants into thinking that someone else was doing the task at the same time as them. I found that shared experience reinforced self perspective (via shared perspective) rather than reinforcing non-self perspective (via unshared perspective). I did not find any crowd effect.
Styles APA, Harvard, Vancouver, ISO, etc.
29

Assis, Adriana Alcida Pacheco Ramiro de. « Gestão de pessoas em momento de formação de joint venture : estudo de caso em uma empresa multinacional do segmento de refrigeração ». reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2012. http://hdl.handle.net/10183/56660.

Texte intégral
Résumé :
Este trabalho objetivou fazer uma análise da gestão de pessoas na subsidiária brasileira de uma empresa multinacional do segmento de produção e distribuição de equipamentos de refrigeração diante de situação de transformação dos negócios oriunda da fusão com outro grupo econômico, considerando seu impacto no ambiente interno e no ambiente de negócio em um momento de instabilidade na economia mundial e de importantes alterações no segmento onde a Empresa opera. Para tanto, foi conduzido um estudo de caso, com a realização de 10 entrevistas e consulta a dados e documentos internos, sendo as informações levantadas analisadas com base no referencial teórico e mediante análise de conteúdo. Dentre os principais resultados, destacam-se: a necessidade de revisão dos processos e práticas de gestão de pessoas neste novo cenário, o foco na gestão da transformação com suporte da área de RH, o valor da comunicação interna para assegurar o alinhamento de todas as partes interessadas e a demanda por reforço no desenvolvimento da liderança, preparando-a para gerir a Empresa em tempos de transição e mudanças.
This study aimed to analyze human resource management in the Brazilian subsidiary of a multinational enterprise in the production and distribution of refrigeration equipment segment, facing a business transformation due to the merger with another economic group, and considering the impact on the internal environment and on the business environment at a time of instability in the world economy and of important changes in the field in which the company operates. To that end, a case study was conducted, comprising 10 interviews and consultation of internal documents and data, with the collected information analyzed in the light of the theoretical framework and through content analysis. Among the key findings, we highlight: the need to review people management processes and practices under the new scenario; the focus on managing the transformation with support from the HR area; the value of internal communication in ensuring the alignment of all stakeholders; and the demand for strengthened leadership development, preparing leaders to manage the company in times of transition and change.
Styles APA, Harvard, Vancouver, ISO, etc.
30

Sha, Long. « Representing and predicting multi-agent data in adversarial team sports ». Thesis, Queensland University of Technology, 2018. https://eprints.qut.edu.au/116506/1/Long_Sha_Thesis.pdf.

Texte intégral
Résumé :
This thesis addresses the theoretical challenges of applying Artificial Intelligence (AI) to the domain of sports. The key contribution of this work is a new data representation that allows AI algorithms to understand real-world sports games such as basketball and soccer. The theoretical advances that this thesis has contributed have the potential to make a significant impact on many aspects of sports analytics, such as prediction, retrieval and simulation. Intelligent systems have been developed based upon this method which enable active spectator engagement in sporting events and more effective coaching of athletes.
Styles APA, Harvard, Vancouver, ISO, etc.
31

Vaněčková, Tereza. « Numerické metody pro klasifikaci metagenomických dat ». Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2016. http://www.nusl.cz/ntk/nusl-242014.

Texte intégral
Résumé :
This thesis deals with metagenomics and numerical methods for the classification of metagenomic data. A review of alignment-free methods based on nucleotide word frequency is provided, as they appear to be effective for processing the metagenomic sequence reads produced by next-generation sequencing technologies. To evaluate these methods, selected features based on k-mer analysis were tested on a simulated dataset of metagenomic sequence reads. The data in the original data space were then subjected to hierarchical clustering, and the PCA-processed data were clustered with the K-means algorithm. The analysis was performed for different lengths of nucleotide words and evaluated in terms of classification accuracy.
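A compact sketch of that kind of pipeline, with made-up reads and parameter choices (k = 4, two clusters) and scikit-learn routines; it is only meant to show the shape of the computation, not the thesis's evaluation.

from itertools import product
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def kmer_freqs(read, k=4):
    # Frequency vector over all 4^k nucleotide words.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {m: i for i, m in enumerate(kmers)}
    v = np.zeros(len(kmers))
    for i in range(len(read) - k + 1):
        v[index[read[i:i + k]]] += 1
    return v / max(v.sum(), 1.0)

reads = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGAACGT",
    "TTTTCCCCTTTTCCCCTTTT",
    "TTTTCCCCTTTACCCCTTTT",
]
X = np.array([kmer_freqs(r) for r in reads])
Z = PCA(n_components=2).fit_transform(X)
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z))  # the two read pairs land in different clusters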
Styles APA, Harvard, Vancouver, ISO, etc.
32

Ni, Weiyuan. « Recalage d'images de visage ». Thesis, Grenoble, 2012. http://www.theses.fr/2012GRENT045/document.

Texte intégral
Résumé :
Etude bibliographique sur le recalage d'images de visage et sur le recalage d'images et travail en collaboration avec Son VuS, pour définir la précision nécessaire du recalage en fonction des exigences des méthodes de reconnaissance de visages
Face alignment is an important step in a typical automatic face recognition system. This thesis addresses the alignment of faces for face recognition applications in a video surveillance context. The main challenging factors of this research include the low quality of images (e.g., low resolution, motion blur, and noise), uncontrolled illumination conditions, pose variations, expression changes, and occlusions. In order to deal with these problems, we propose several face alignment methods using different strategies. The first part of our work is a three-stage method for facial point localization which can be used for correcting mis-alignment errors. While existing algorithms mostly rely on a priori knowledge of facial structure and on a training phase, our approach works in an online mode without requiring pre-defined constraints on feature distributions. The proposed method works well on images under expression and lighting variations. The key contributions of this thesis are joint image alignment algorithms where a set of images is simultaneously aligned without a biased template selection. We propose two unsupervised joint alignment algorithms: "Lucas-Kanade entropy congealing" (LKC) and "gradient correlation congealing" (GCC). In LKC, an image ensemble is aligned by minimizing a sum-of-entropy function defined over all images. GCC uses the gradient correlation coefficient as similarity measure. The proposed algorithms perform well on images under different conditions. To further improve the robustness to mis-alignments and the computational speed, we apply a multi-resolution framework to the joint face alignment algorithms. Moreover, our work is not limited to the face alignment stage. Since face alignment and face acquisition are interrelated, we develop an adaptive appearance face tracking method with alignment feedbacks. This closed-loop framework shows its robustness to large variations in the target's state, and it significantly decreases the mis-alignment errors in tracked faces.
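The congealing objective mentioned above scores how consistent the stack of images is at each pixel. The sketch below only computes such a sum-of-entropy score over an image stack (toy gradient images, coarse histogram binning); LKC would iteratively warp the images to drive a score of this kind down, which this sketch does not attempt.

import numpy as np

def stack_entropy(images, bins=16):
    # Sum over pixel positions of the entropy of intensity values across the stack;
    # joint alignment seeks to lower this quantity.
    stack = np.stack(images, axis=0).reshape(len(images), -1)
    total = 0.0
    for pixel_values in stack.T:
        hist, _ = np.histogram(pixel_values, bins=bins, range=(0.0, 1.0))
        p = hist / hist.sum()
        p = p[p > 0]
        total -= np.sum(p * np.log(p))
    return total

aligned = [np.tile(np.linspace(0, 1, 8), (8, 1)) for _ in range(5)]
jittered = [np.roll(img, s, axis=1) for s, img in zip([0, 1, 2, 0, 1], aligned)]
print(stack_entropy(aligned), stack_entropy(jittered))  # the aligned stack scores lower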
Styles APA, Harvard, Vancouver, ISO, etc.
33

Danzì, Paolo. « Mining dei Workflow di un Laboratorio di Anatomia Patologica ». Master's thesis, Alma Mater Studiorum - Università di Bologna, 2011. http://amslaurea.unibo.it/2924/.

Texte intégral
Résumé :
Il presente lavoro nasce dall’obiettivo di individuare strumenti statistici per indagare, sotto diversi aspetti, il flusso di lavoro di un Laboratorio di Anatomia Patologica. Il punto di partenza dello studio è l’ambiente di lavoro di ATHENA, software gestionale utilizzato nell’Anatomia Patologica, sviluppato dalla NoemaLife S.p.A., azienda specializzata nell’informatica per la sanità. A partire da tale applicativo è stato innanzitutto formalizzato il workflow del laboratorio (Capitolo 2), nelle sue caratteristiche e nelle sue possibili varianti, identificando le operazioni principali attraverso una serie di “fasi”. Proprio le fasi, unitamente alle informazioni addizionali ad esse associate, saranno per tutta la trattazione e sotto diversi punti di vista al centro dello studio. L’analisi che presentiamo è stata per completezza sviluppata in due scenari che tengono conto di diversi aspetti delle informazioni in possesso. Il primo scenario tiene conto delle sequenze di fasi, che si presentano nel loro ordine cronologico, comprensive di eventuali ripetizioni o cicli di fasi precedenti alla conclusione. Attraverso l’elaborazione dei dati secondo specifici formati è stata svolta un’iniziale indagine grafica di Workflow Mining (Capitolo 3) grazie all’ausilio di EMiT, un software che attraverso un set di log di processo restituisce graficamente il flusso di lavoro che li rappresenta. Questa indagine consente già di valutare la completezza dell’utilizzo di un applicativo rispetto alle sue potenzialità. Successivamente, le stesse fasi sono state elaborate attraverso uno specifico adattamento di un comune algoritmo di allineamento globale, l’algoritmo Needleman-Wunsch (Capitolo 4). L’utilizzo delle tecniche di allineamento applicate a sequenze di processo è in grado di individuare, nell’ambito di una specifica codifica delle fasi, le similarità tra casi clinici. L’algoritmo di Needleman-Wunsch individua le identità e le discordanze tra due stringhe di caratteri, assegnando relativi punteggi che portano a valutarne la similarità. Tale algoritmo è stato opportunamente modificato affinché possa riconoscere e penalizzare differentemente cicli e ripetizioni, piuttosto che fasi mancanti. Sempre in ottica di allineamento sarà utilizzato l’algoritmo euristico Clustal, che a partire da un confronto pairwise tra sequenze costruisce un dendrogramma rappresentante graficamente l’aggregazione dei casi in funzione della loro similarità. Proprio il dendrogramma, per la sua struttura grafica ad albero, è in grado di mostrare intuitivamente l’andamento evolutivo della similarità di un pattern di casi. Il secondo scenario (Capitolo 5) aggiunge alle sequenze l’informazione temporale in termini di istante di esecuzione di ogni fase. Da un dominio basato su sequenze di fasi, si passa dunque ad uno scenario di serie temporali. I tempi rappresentano infatti un dato essenziale per valutare la performance di un laboratorio e per individuare la conformità agli standard richiesti. Il confronto tra i casi è stato effettuato con diverse modalità, in modo da stabilire la distanza tra tutte le coppie sotto diversi aspetti: le sequenze, rappresentate in uno specifico sistema di riferimento, sono state confrontate in base alla Distanza Euclidea ed alla Dynamic Time Warping, in grado di esprimerne le discordanze rispettivamente temporali, di forma e, dunque, di processo. 
Alla luce dei risultati e del loro confronto, saranno presentate già in questa fase le prime valutazioni sulla pertinenza delle distanze e sulle informazioni deducibili da esse. Il Capitolo 6 rappresenta la ricerca delle correlazioni tra elementi caratteristici del processo e la performance dello stesso. Svariati fattori come le procedure utilizzate, gli utenti coinvolti ed ulteriori specificità determinano direttamente o indirettamente la qualità del servizio erogato. Le distanze precedentemente calcolate vengono dunque sottoposte a clustering, una tecnica che a partire da un insieme eterogeneo di elementi individua famiglie o gruppi simili. L’algoritmo utilizzato sarà l’UPGMA, comunemente applicato nel clustering in quanto, utilizzando, una logica di medie pesate, porta a clusterizzazioni pertinenti anche in ambiti diversi, dal campo biologico a quello industriale. L’ottenimento dei cluster potrà dunque essere finalmente sottoposto ad un’attività di ricerca di correlazioni utili, che saranno individuate ed interpretate relativamente all’attività gestionale del laboratorio. La presente trattazione propone quindi modelli sperimentali adattati al caso in esame ma idealmente estendibili, interamente o in parte, a tutti i processi che presentano caratteristiche analoghe.
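The abstract above applies a modified Needleman-Wunsch alignment to sequences of laboratory workflow phases. For reference, here is the textbook Needleman-Wunsch score computation on phase sequences encoded as characters; the encoding and scores are illustrative assumptions, and the thesis's version penalizes repeated phases and cycles differently from missing ones.

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    # Standard global alignment score via dynamic programming.
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]

# Workflow phases encoded as single characters, e.g. A=acceptance, C=cutting, S=staining, D=diagnosis.
print(needleman_wunsch("ACSSD", "ACSD"))  # one repeated phase costs a single gap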
Styles APA, Harvard, Vancouver, ISO, etc.
34

Tetley, Romain. « Analyse mixte de protéines basée sur la séquence et la structure - applications à l'annotation fonctionnelle ». Thesis, Université Côte d'Azur (ComUE), 2018. http://www.theses.fr/2018AZUR4111/document.

Texte intégral
Résumé :
Dans cette thèse, l'emphase est mise sur la réconciliation de l'analyse de structure et de séquence pour les protéines. L'analyse de séquence brille lorsqu'il s'agit de comparer des protéines présentant une forte identité de séquence (≤ 30\%) mais laisse à désirer pour identifier des homologues lointains. L'analyse de structure est une alternative intéressante. Cependant, les méthodes de résolution de structures sont coûteuses et complexes - lorsque toutefois elles produisent des résultats. Ces observations rendent évident la nécessité de développer des méthodes hybrides, exploitant l'information extraite des structures disponibles pour l'injecter dans des modèles de séquence. Cette thèse produit quatre contributions principales dans ce domaine. Premièrement, nous présentons une nouvelle distance structurale, le RMSDcomb, basée sur des patterns de conservation structurale locale, les motifs structuraux. Deuxièmement, nous avons développé une méthode pour identifier des motifs structuraux entre deux structures exploitant un bootstrap dépendant de filtrations. Notre approche n'est pas un compétiteur direct des aligneurs flexibles mais permet plutôt de produire des analyses multi-échelles de similarités structurales. Troisièmement, nous exploitons les méthodes suscitées pour construire des modèles de Markov cachés hybrides biaisés vers des régions mieux conservées structurellement. Nous utilisons un tel modèle pour caractériser les protéines de fusion virales de classe II, une tâche particulièrement ardue du fait de leur faible identité de séquence et leur conservation structurale moyenne. Ce faisant, nous parvenons à trouver un certain nombre d'homologues distants connues des protéines virales, notamment chez la Drosophile. Enfin, en formalisant un sous-problème rencontré lors de la comparaison de filtrations, nous présentons un nouveau problème théorique - le D-family matching - sur lequel nous démontrons des résultats algorithmiques variés. Nous montrons - d'une façon analogue à la comparaison de régions de deux conformations d'une protéine - comment exploiter ce modèle théorique pour comparer deux clusterings d'un même jeu de données
In this thesis, the focus is set on reconciling the realms of structure and sequence for protein analysis. Sequence analysis tools shine when faced with proteins presenting high sequence identity (≤ 30\%), but are lack - luster when it comes to remote homolog detection. Structural analysis tools present an interesting alternative, but solving structures - when at all possible- is a tedious and expensive process. These observations make the need for hybrid methods - which inject information obtained from available structures in a sequence model - quite clear. This thesis makes four main contributions toward this goal. First we present a novel structural measure, the RMSDcomb, based on local structural conservation patterns - the so called structural motifs. Second, we developed a method to identify structural motifs between two structures using a bootstrap method which relies on filtrations. Our approach is not a direct competitor to flexible aligners but can provide useful to perform a multiscale analysis of structural similarities. Third, we build upon the previous methods to design hybrid Hidden Markov Models which are biased towards regions of increased structural conservation between sets of proteins. We test this tool on the class II fusion viral proteins - particularly challenging because of their low sequence identity and mild structural homology. We find that we are able to recover known remote homologs of the viral proteins in the Drosophila and other organisms. Finally, formalizing a sub - problem encountered when comparing filtrations, we present a new theoretical problem - the D-family matching - on which we present various algorithmic results. We show - in a manner that is analogous to comparing parts of two protein conformations - how it is possible to compare two clusterings of the same data set using such a theoretical model
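Since the D-family matching problem mentioned above is about comparing two clusterings of the same data set, a quick way to get intuition for the task is to score the agreement of two label vectors; the sketch below uses the adjusted Rand index from scikit-learn as a lightweight stand-in for the thesis's combinatorial formulation (the labels are made up).

from sklearn.metrics import adjusted_rand_score

clustering_a = [0, 0, 1, 1, 2, 2]
clustering_b = [1, 1, 0, 0, 0, 2]
# 1.0 means identical partitions, values near 0 mean chance-level agreement.
print(adjusted_rand_score(clustering_a, clustering_b))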
Styles APA, Harvard, Vancouver, ISO, etc.
35

Kühnapfel, Thorsten. « Audio networks for speech enhancement and indexing ». Thesis, Curtin University, 2009. http://hdl.handle.net/20.500.11937/206.

Texte intégral
Résumé :
For humans, hearing is the second most important sense, after sight. Therefore, acoustic information greatly contributes to observing and analysing an area of interest. For this reason, combining audio and video cues for surveillance enhances the understanding of the scene and of the observed events. However, when combining different sensors their measurements need to be correlated, which is done by either knowing the exact relative sensor alignment or learning a mapping function. Most deployed systems assume a known relative sensor alignment, making them susceptible to sensor drifts. Additionally, audio recordings are generally a mixture of several source signals and therefore need to be processed to extract a desired sound source, such as the speech of a target person. In this thesis a generic framework is described that captures, indexes and extracts surveillance events from coordinated audio and video cues. It presents a dynamic joint-sensor calibration approach that uses audio-visual sensor measurements to dynamically and incrementally learn the calibration function, making the sensor calibration resilient to independent drifts in the sensor suite. Experiments demonstrate the use of such a framework for enhancing surveillance. Furthermore, a speech enhancement approach is presented based on a distributed network of microphones, increasing the effectiveness of acoustic surveillance of large areas. This approach is able to detect and enhance speech in the presence of rapidly changing environmental noise. Spectral subtraction, a single channel speech enhancement approach, is modified to adapt quickly to rapid noise changes of two common noise sources by incorporating multiple noise models. The result of the cross-correlation based noise classification approach is also utilised to improve the voice activity detection by minimising false detections based on rapid noise changes. Experiments with real world noise consisting of scooter and café noise have proven the advantage of multiple noise models, especially when the noise changes during speech. The modified spectral subtraction approach is then extended to real world scenarios by introducing more and highly non-stationary noise types. Thus, the focus is directed to implementing a more sophisticated noise classification approach by extracting a variety of acoustic features and applying a PCA transformation to compute the Mahalanobis distance to each noise class. This distance measurement is also included in the voice activity detection algorithm to reduce false detections for highly non-stationary noise types. However, using spectral subtraction in non-stationary noise environments, such as street noise, reduces the performance of the speech enhancement. For that reason the speech enhancement approach is further improved by using the sound information of the entire network to update the noise model of the detected noise type during speech. This adjustment considerably improved the speech enhancement performance in non-stationary noise environments. Experiments conducted under diverse real world conditions including rapid noise changes and non-stationary noise sources demonstrate the effectiveness of the presented method.
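As a rough illustration of the basic technique the abstract builds on (plain single-channel magnitude spectral subtraction with one pre-estimated noise spectrum, not the multi-model, network-updated variant developed in the thesis), the sketch below assumes a synthetic tone-plus-noise signal:

import numpy as np

def spectral_subtraction(noisy, noise_mag, frame=256, floor=0.01):
    # Subtract a fixed noise magnitude estimate frame by frame, keep the noisy phase,
    # and reconstruct by overlap-add with a Hann analysis window.
    out = np.zeros_like(noisy)
    window = np.hanning(frame)
    for start in range(0, len(noisy) - frame + 1, frame // 2):
        spec = np.fft.rfft(noisy[start:start + frame] * window)
        mag = np.maximum(np.abs(spec) - noise_mag, floor * np.abs(spec))
        out[start:start + frame] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)))
    return out

rng = np.random.default_rng(2)
t = np.arange(8000) / 8000.0
clean = np.sin(2 * np.pi * 440 * t)
noise = 0.3 * rng.normal(size=t.size)
noise_mag = np.abs(np.fft.rfft(noise[:256] * np.hanning(256)))  # noise-only estimate
enhanced = spectral_subtraction(clean + noise, noise_mag)
# The residual after enhancement is typically smaller than the original noise level.
print(np.std(noise), np.std(enhanced[256:-256] - clean[256:-256]))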
Styles APA, Harvard, Vancouver, ISO, etc.
36

Ren, Jinchang. « Semantic content analysis for effective video segmentation, summarisation and retrieval ». Thesis, University of Bradford, 2009. http://hdl.handle.net/10454/4251.

Texte intégral
Résumé :
This thesis focuses on four main research themes namely shot boundary detection, fast frame alignment, activity-driven video summarisation, and highlights based video annotation and retrieval. A number of novel algorithms have been proposed to address these issues, which can be highlighted as follows. Firstly, accurate and robust shot boundary detection is achieved through modelling of cuts into sub-categories and appearance based modelling of several gradual transitions, along with some novel features extracted from compressed video. Secondly, fast and robust frame alignment is achieved via the proposed subspace phase correlation (SPC) and an improved sub-pixel strategy. The SPC is proved to be insensitive to zero-mean-noise, and its gradient-based extension is even robust to non-zero-mean noise and can be used to deal with non-overlapped regions for robust image registration. Thirdly, hierarchical modelling of rush videos using formal language techniques is proposed, which can guide the modelling and removal of several kinds of junk frames as well as adaptive clustering of retakes. With an extracted activity level measurement, shot and sub-shot are detected for content-adaptive video summarisation. Fourthly, highlights based video annotation and retrieval is achieved, in which statistical modelling of skin pixel colours, knowledge-based shot detection, and improved determination of camera motion patterns are employed. Within these proposed techniques, one important principle is to integrate various kinds of feature evidence and to incorporate prior knowledge in modelling the given problems. High-level hierarchical representation is extracted from the original linear structure for effective management and content-based retrieval of video data. As most of the work is implemented in the compressed domain, one additional benefit is the achieved high efficiency, which will be useful for many online applications.
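Frame alignment by phase correlation, as referred to above, can be demonstrated in a few lines; this sketch estimates a whole-pixel circular shift between two frames (random test image, integer shifts only, without the subspace projection or sub-pixel refinement proposed in the thesis).

import numpy as np

def phase_correlation_shift(a, b):
    # Peak of the inverse FFT of the normalized cross-power spectrum gives the shift of a relative to b.
    F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    corr = np.fft.ifft2(F / (np.abs(F) + 1e-12)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return int(dy), int(dx)

rng = np.random.default_rng(3)
frame = rng.random((64, 64))
shifted = np.roll(np.roll(frame, 5, axis=0), -3, axis=1)
print(phase_correlation_shift(shifted, frame))  # expected (5, -3)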
Styles APA, Harvard, Vancouver, ISO, etc.
37

Buckingham, Lawrence. « K-mer based algorithms for biological sequence comparison and search ». Thesis, Queensland University of Technology, 2022. https://eprints.qut.edu.au/236377/1/Buckingham%2BThesis%281%29.pdf.

Texte intégral
Résumé :
The present thesis develops novel algorithms for biological sequence comparison and accelerated sequence database search, motivated by the need to work effectively with the rapidly expanding volume of sequence data which is available as the result of continuing advances in sequencing technology. Empirical tests using datasets of realistic size and content demonstrate that these algorithms are approximately an order of magnitude faster than standard sequence database search tools, while attaining higher precision. While the algorithms have been developed and tested in a biological context, they are applicable to any problem involving comparison of sequential data series.
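A minimal sketch of the k-mer idea behind alignment-free database search (toy sequences and a hypothetical ranking by Jaccard similarity of k-mer sets; the algorithms developed in the thesis are considerably more elaborate):

def kmers(seq, k=5):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def search(query, database, k=5):
    # Rank database entries by how much of their k-mer content they share with the query.
    q = kmers(query, k)
    scores = [(len(q & kmers(s, k)) / len(q | kmers(s, k)), name) for name, s in database]
    return sorted(scores, reverse=True)

db = [("geneA", "ACGTACGTGGTACCAT"), ("geneB", "TTTTCCCCGGGGAAAA")]
print(search("ACGTACGTGGTACGAT", db))  # geneA ranks first, geneB scores zero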
Styles APA, Harvard, Vancouver, ISO, etc.
38

Belghiti, Moulay Tayeb. « Modélisation et techniques d'optimisation en bio-informatique et fouille de données ». Thesis, Rouen, INSA, 2008. http://www.theses.fr/2008ISAM0002.

Texte intégral
Résumé :
Cette thèse est particulièrement destinée à traiter deux types de problèmes : clustering et l'alignement multiple de séquence. Notre objectif est de résoudre de manière satisfaisante ces problèmes globaux et de tester l'approche de la Programmation DC et DCA sur des jeux de données réelles. La thèse comporte trois parties : la première partie est consacrée aux nouvelles approches de l'optimisation non convexe. Nous y présentons une étude en profondeur de l'algorithme qui est utilisé dans cette thèse, à savoir la programmation DC et l'algorithme DC (DCA). Dans la deuxième partie, nous allons modéliser le problème clustering en trois sous-problèmes non convexes. Les deux premiers sous-problèmes se distinguent par rapport au choix de la norme utilisée, (clustering via les normes 1 et 2). Le troisième sous-problème utilise la méthode du noyau, (clustering via la méthode du noyau). La troisième partie sera consacrée à la bio-informatique. On va se focaliser sur la modélisation et la résolution de deux sous-problèmes : l'alignement multiple de séquence et l'alignement de séquence d'ARN par structure. Tous les chapitres excepté le premier se terminent par des tests numériques
This Ph.D. thesis addresses two types of problems: clustering and multiple sequence alignment. Our objective is to solve these global problems efficiently and to test the DC programming approach and DCA on real datasets. The thesis is divided into three parts. The first part is devoted to new approaches in nonconvex and global optimization; we present an in-depth study of the framework used throughout the thesis, namely DC programming and the DC algorithm (DCA). In the second part, we model the clustering problem as three nonconvex subproblems. The first two subproblems differ in the choice of the norm used (clustering via the 1-norm and the 2-norm), while the third uses the kernel method (kernel-based clustering). The third part is devoted to bioinformatics and focuses on modeling and solving two subproblems: multiple sequence alignment and structure-based RNA sequence alignment. All chapters except the first end with numerical tests.
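To make the DC programming / DCA idea concrete, here is a toy one-dimensional DCA iteration on an assumed example unrelated to the thesis's clustering or alignment models: f(x) = x^4/4 - x^2 decomposed as g(x) = x^4/4 minus h(x) = x^2, with each step linearizing h and minimizing the resulting convex surrogate.

def dca(x0, iters=30):
    x = x0
    for _ in range(iters):
        y = 2.0 * x  # subgradient of h at x
        # argmin of g(x) - y*x solves x^3 = y, i.e. the cube root of y.
        x = abs(y) ** (1.0 / 3.0) * (1 if y >= 0 else -1)
    return x

print(dca(0.5))  # converges to a critical point of f, here sqrt(2) ~ 1.414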
Styles APA, Harvard, Vancouver, ISO, etc.
39

Schimd, Michele. « Quality value based models and methods for sequencing data ». Doctoral thesis, Università degli studi di Padova, 2015. http://hdl.handle.net/11577/3424144.

Texte intégral
Résumé :
First isolated by Friedrich Miescher in 1869 and then identified by James Watson and Francis Crick in 1953, the double-stranded DeoxyriboNucleic Acid (DNA) molecule of Homo sapiens took fifty years to be completely reconstructed and to finally be at the disposal of researchers for deep studies and analyses. The first technologies for DNA sequencing appeared around the mid-1970s; among them the most successful has been the chain termination method, usually referred to as the Sanger method. It remained the de facto standard for sequencing until, at the beginning of the 2000s, Next Generation Sequencing (NGS) technologies started to be developed. These technologies are able to produce huge amounts of data at competitive costs in terms of dollars per base, but further advances are now appearing in the form of Single Molecule Real Time (SMRT) sequencers, such as those of Pacific Biosciences, which promise to produce fragments of lengths never available before. However, none of the above technologies is able to read an entire DNA molecule; they can only produce short fragments (called reads) of the sample in a process referred to as sequencing. Although all these technologies have different characteristics, one recurrent trend in their evolution has been the constant growth of the fraction of errors injected into the final reads. While Sanger machines produce as few as 1 erroneous base in 1000, the recent PacBio sequencers have an average error rate of 15%; NGS machines place themselves roughly in the middle, with an expected error rate around 1%. With such a heterogeneity of error profiles, and as more and more data is produced every day, algorithms able to cope with different sequencing technologies are becoming fundamental; at the same time, models describing the sequencing process and its error profile are gaining importance. A key feature that can make these approaches really effective is the ability of sequencers to produce quality scores which measure the probability of observing a sequencing error. In this thesis we present a stochastic model for the sequencing process and show its application to the problems of clustering and filtering of reads. The novel idea is to use quality scores to build a probabilistic framework that models the entire process of sequencing. Although relatively straightforward, the development of such a model requires the proper definition of probability spaces and of the events on such spaces. To keep the model simple and tractable, several simplifying hypotheses need to be introduced; each of them, however, must be explicitly stated and extensively discussed. The final result is a model of the sequencing process that can be used to give a probabilistic interpretation of the problems defined on sequencing data and to characterize the corresponding probabilistic answers (i.e., solutions). To experimentally validate the aforementioned model, we apply it to two different problems: read clustering and read filtering. The first set of experiments introduces a set of novel alignment-free measures, Dq2, resulting from the extension of the well-known D2-type measures to incorporate quality values. More precisely, instead of adding a unit contribution to the k-mer count statistic (as for D2 statistics), each k-mer contributes an additive term corresponding to its probability of being correct as defined by our stochastic model. We show that these new measures are effective when applied to the clustering of reads, by employing clusters produced with Dq2 as input to the problems of metagenomic binning and de-novo assembly. In the second set of experiments conducted to validate our stochastic model, we applied the same definition of a correct read to the problem of read filtering. We first define rank filtering, a lossless filtering technique that sorts reads based on a given criterion; then we use the sorted list of reads as input to algorithms for read mapping and de-novo assembly. The idea is that, in the reordered set, reads ranking higher should have better quality than the ones at lower ranks. To test this conjecture, we use such filtering as a pre-processing step for read mapping and de-novo assembly; in both cases we observe improvements when our rank filtering approach is used.
Isolata per la prima volta da Friedrich Miescher nel 1869 ed identificata nel 1953 da James Watson e Francis Crick, la molecola del DNA (acido desossiribonucleico) umano ha richiesto più di 50 anni perchè fosse a disposizione della comunità internazionale per studi e analisi approfondite. Le prime tecnologie di sequenziamento sono apparse attorno alla metà degli anni 70, tra queste quella di maggiore successo è stata la tecnologia denominata Sanger rimasta poi lo standard di fatto per il sequenziamento fino a che, agli inizi degli anni 2000, sequenziatori battezzati di nuova generazione (Next Generation Sequencing (NGS)) sono comparsi sul mercato. Questi ultimi hanno velocemente preso piede grazie ai bassi costi di sequenziamento soprattutto se confrontati con le precedenti macchine Sanger. Oggi tuttavia, nuove tecnologie (ad esempio PacBio di Pacific Biosciences) si stanno facendo strada grazie alla loro capacità di produrre frammenti di lunghezze mai ottenute prima d’ora. Nonostante la continua evoluzione nessuna di queste tecnologie è ancora in grado di produrre letture complete del DNA, ma solo parziali frammenti (chiamati read) come risultato del processo biochimico chiamato sequenziamento. Un trend ricorrente durante l’evoluzione dei sequenziatori è rappresentato dalla crescente presenza di errori di sequenziamento, se nelle read Sanger in media una lettura su mille corrisponde ad un errore, le ultime macchine PacBio sono caratterizzate da un tasso di errore di circa il 15%, una situazione più o meno intermedia è rappresentata dalle read NGS all’interno delle quali questo tasso si attesta su valori attorno al 1%. E’ chiaro quindi che algoritmi in grado di processare dati con diversi caratteristiche in termini di errori di sequenziamento stanno acquisendo maggiore importanza mentre lo sviluppo di modelli ad-hoc che affrontino esplicitamente il problema degli errori di sequenziamento stanno assumendo notevole rilevanza. A supporto di queste tecniche le macchine sequenziatrici producono valori di qualità (quality scores o quality values) che possono esser messi in relazione con la probabilità di osservare un errore di sequenziamento. In questa tesi viene presentato un modello stocastico per descrivere il processo di sequenziamento e ne vengono presentate due applicazioni: clustering di read e il filtraggio di read. L’idea alla base del modello è di utilizzare i valori di qualità come fondamento per la definizione di un modello probabilistico che descriva il processo di sequenziamento. La derivazione di tale modello richiede la definizione rigorosa degli spazi di probabilità coinvolti e degli eventi in essi definiti. Inoltre, allo scopo di sviluppare un modello semplice e trattabile è necessario introdurre ipotesi semplificative che agevolino tale processo, tuttavia tali ipotesi debbono essere esplicitate ed opportunamente discusse. Per fornirne una validazione sperimentale, il modello è stato applicato ai problemi di clustering e filtraggio. Nel primo caso il clustering viene eseguito utilizzando le nuove misure Dq2 ottenute come estensione delle note misure alignment-free D2 attraverso l’introduzione dei valori di qualità. Più precisamente anzichè indurre un contributo unitario al conto della frequenza dei k-mer (come avviene per le statistiche D2), nelle misure Dq2 il contributo di un k-mer coincide con la probabilità dello stesso si essere corretto, calcolata sulla base dei valori di qualità associati. 
I risultati del clustering sono poi utilizzati per risolvere il problema del de-novo assembly (ricostruzione ex-novo di sequenze) e del metagenomic binning (classificazione di read da esperimenti di metagenomica). Una seconda applicazione del modello teorico è rappresentata dal problema del filtraggio di read utilizzando un approccio senza perdita di informazione in cui le read vengono ordinate secondo la loro probabilità di correttezza. L’idea che giustifica l’impiego di tale approccio è che l’ordinamento dovrebbe collocare nelle posizioni più alte le read con migliore qualità retrocedendo quelle con qualità più bassa. Per verificare la validità di questa nostra congettura, il filtraggio è stato utilizzato come fase preliminare di algoritmi per mappaggio di read e de-novo assembly. In entrambi i casi si osserva un miglioramento delle prestazione degli algoritmi quando le read sono presentate nell’ordine indotto dalla nostra misura. La tesi è strutturata nel seguente modo. Nel Capitolo 1 viene fornita una introduzione al sequenziamento e una panoramica dei principali problemi definiti sui dati prodotti. Inoltre vengono dati alcuni cenni sulla rappresentazione di sequenze, read e valori di qualità. Alla fine dello stesso Capitolo 1 si delineano brevemente i principali contributi della tesi e la letteratura correlata. Il Capitolo 2 contiene la derivazione formale del modello probabilistico per il sequenziamento. Nella prima parte viene schematicamente presentato il processo di produzione di una coppia simbolo qualità per poi passare alla definizione di spazi di probabilità per sequenze e sequenziamento. Mentre gli aspetti relativo alla distribuzione di probabilità per la sequenza di riferimento non vengono considerati in questa tesi, la descrizione probabilistica del processo di sequenziamento è trattata in dettaglio nella parte centrale del Capitolo 2 nella cui ultima parte viene presentata la derivazione della probabilità di correttezza di una read che viene poi utilizzata nei capitoli successivi. Il Capitolo 3 presenta le misure Dq2 e gli esperimenti relativi al clustering i cui risultati sono frutto del lavoro svolto in collaborazione con Matto Comin e Andrea Leoni e pubblicato in [CLS14] e [CLS15]. Il Capitolo 4 presenta invece i risultati preliminari fin qui ottenuti per il filtraggio di read basato sui valori di qualità. Infine il Capitolo 5 presenta le conclusioni e delinea le direzioni future che si intendono perseguire a continuamento del lavoro qui presentato.
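The quality-weighted counting idea summarized above (each k-mer contributes its probability of being error-free rather than a unit count) can be sketched in a few lines; the Phred+33 quality encoding and the toy read below are assumptions for the example, not the thesis's data or its full Dq2 statistic.

from collections import defaultdict

def weighted_kmer_counts(read, quals, k=4):
    # Per-base correctness probability from Phred+33 quality characters.
    probs = [1.0 - 10.0 ** (-(ord(q) - 33) / 10.0) for q in quals]
    counts = defaultdict(float)
    for i in range(len(read) - k + 1):
        weight = 1.0
        for p in probs[i:i + k]:
            weight *= p  # a k-mer is correct only if all its bases are correct
        counts[read[i:i + k]] += weight
    return counts

print(weighted_kmer_counts("ACGTACGT", "IIII####"))  # high- versus low-quality bases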
Styles APA, Harvard, Vancouver, ISO, etc.
40

Campbell, Benjamin W. « Supervised and Unsupervised Machine Learning Strategies for Modeling Military Alliances ». The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1558024695617708.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
41

Antonsson, Anna. « Lärarens dubbla uppdrag inom Ekonomiprogrammet ». Thesis, Linnéuniversitetet, Institutionen för utbildningsvetenskap (UV), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-49610.

Texte intégral
Résumé :
Denna studie handlar om lärarens dubbla uppdrag inom gymnasieskolans ekonomiprogram och är ännu en studie som visar på att uppgiftsorienterat/elevaktivt förhållningssätt ger läraren förutsättning att uppfylla Skollagens båda uppdrag, att ge eleverna kunskaper och demokratiska värden, samt känna lust i lärandet samtidigt. Utgångspunkt är Skollagen och Skolverkets texter.  Skollagen initierar att eleverna genom sin utbildning skall erhålla kunskaper och demokratiska värden. Det är detta som är studiens definition av det dubbla uppdraget.   Tanken bakom studien är; Skollagen initierar det dubbla uppdraget, detta realiseras via uppgiftsorienterat/ elevaktivt förhållningssätt vilket ger eleven en dialektisk utveckling. Syftet i studien är att undersöka om eleverna genom arbetssätten uppgiftsorienterat/elevaktivt förhållningssätt utvecklar de medierande metoderna ”förberedelser med informationsintag inför deliberativa samtal” och ”förmåga att delta i deliberativa samtal.”  Om detta skulle visa sig stämma skulle detta vara en användbar modell för att genomföra lärarens dubbla uppdrag samtidigt. Studiens frågeställningar: Utvecklar eleverna bättre översikt eller arbetsplanering av kursen? Upplever eleverna lust i lärandet? Påverkar metoden elevernas kunskapsresultat? Begrepp som är väsentliga för studien är; beslutsprocesser, det deliberativa samtalet, dialektik, elevaktivitet, grundprincipen offentlighet i det deliberativa samtalet, lärarens dubbla uppdrag, medierande metoder, påverka utbildningen till form och innehåll, uppgiftsorienterad pedagogik. Varför det är väsentligt att arbeta med de demokratiska värdena samt ge eleverna arbetssätt för att snabbt ta in information just inom Ekonomiprogrammet? Eleverna inom detta program kommer i sina framtida yrkesroller att delta i beslutsfattande i miljöer som inte präglas av demokratiska värden. Det är därför viktigt att de lär sig demokratiska arbetssätt som det deliberativa samtalet inför sitt yrkesliv. Det handlar om vilka ledare vi skapar inför framtiden. Eleverna måste även kunna arbeta självständigt med informationssökning i sina yrkesroller. Studien är utformad som en enkätstudie där två klasser, EK13 & EK14, som läser marknadsföring parallellt, ombeds utvärdera ett kursmoment (kurs1) som genomförs med ordinarie lärares ”vanliga” arbetssätt. Därefter genomförs ett kursmoment (kurs2) som är utformat i enlighet med uppgiftsorienterat/elaktivt förhållningssätt. Eleverna ombeds utvärdera även denna kurs med samma enkätformulär som för kurs1. Teoretisk referensram för utformning av studien är Egidius, Hansén, Nihlfors, Selberg, Vygotskij med fokus på elevinflytande och hur det kan påverka utveckling av kompetenser i form av kunskaper och demokratiska värden. Undersökningen visar att arbetssätten ger eleverna möjlighet att utveckla medierande metoder för ”förberedelser med informationsintag inför deliberativa samtal” och ”förmåga att delta i deliberativa samtal”. Det finns indikationer på att eleverna utvecklar en bättre arbetsplanering och översikt över kursen. Det finns starka indikationer på att eleverna upplever större lust i lärandet. Resultaten visar även att arbetssätten är effektiva med avseende på elevernas kunskapsutveckling. Därmed konstateras att uppgiftsorienterat/elevaktivt förhållningssätt är en användbar modell för att genomföra lärarens dubbla uppdrag samtidigt inom ekonomiprogrammet.
This study concerns the teacher's dual mission in the upper secondary school Economics Programme and is yet another study showing that the joint-influence-in-learning approach gives teachers the prerequisites to meet the two missions stated in the Education Act at the same time: to give students both knowledge and democratic values, and to let them feel the desire to learn. The starting point of the study is the texts of the Education Act and the National Agency for Education. The Education Act stipulates that students, through their education, shall acquire knowledge and democratic values; this is the study's definition of the dual task. The idea behind the study is that the Education Act initiates the dual task, which is then realized through joint influence in learning, giving pupils a dialectical development. The purpose of the study is to investigate whether, through joint influence in learning, students develop the mediating methods "preparation of information intake before the deliberative dialogue" and "ability to participate in deliberative dialogue." If this proved correct, it would be a useful model for carrying out the teacher's dual mission simultaneously. The study's questions: Do students develop a better overview or work planning of the course? Do students experience desire for learning? Does the method affect the students' academic achievement? Concepts that are essential for the study are decision-making, the deliberative dialogue, dialectic, joint influence in learning, the principle of openness in the deliberative dialogue, the teacher's dual role, mediating practices, influencing the education in form and content, and achievement goal theory. The theoretical framework for the design of the study draws on Egidius, Hansén, Nihlfors, Selberg and Vygotsky, focusing on joint influence in learning and how it can affect the development of competencies in terms of knowledge and democratic values. Why is it essential to work with democratic values in the Economics Programme? Students of this programme will, in their future professional roles, participate in decision making in environments that are not characterized by democratic values. It is therefore important that they learn democratic practices such as the deliberative dialogue before entering professional life; it is about what kind of leaders we create for the future. The students must also be able to work independently with different kinds of information in their professional roles. The study is designed as a survey in which two classes, EK13 and EK14, which both take marketing in parallel, are asked to evaluate a course module (course1) taught with the regular teacher's usual approach. A second module (course2), designed in accordance with joint influence in learning, is then conducted, and the students are asked to evaluate it with the same questionnaire as for course1. The survey shows that the working methods of joint influence in learning give students the opportunity to develop the mediating methods "preparation of information intake before the deliberative dialogue" and "ability to participate in deliberative dialogue". There are indications that students develop better work planning and a better overview of the course, and strong indications that they experience a greater desire for learning. The results also show that the working methods are effective in terms of pupil achievement. It is thus concluded that joint influence in learning is a useful model for carrying out the teacher's dual mission simultaneously within the Economics Programme.
Styles APA, Harvard, Vancouver, ISO, etc.
42

Şentürk, Sertan. « Computational analysis of audio recordings and music scores for the description and discovery of Ottoman-Turkish Makam music ». Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/402102.

Texte intégral
Résumé :
This thesis addresses several shortcomings on the current state of the art methodologies in music information retrieval (MIR). In particular, it proposes several computational approaches to automatically analyze and describe music scores and audio recordings of Ottoman-Turkish makam music (OTMM). The main contributions of the thesis are the music corpus that has been created to carry out the research and the audio-score alignment methodology developed for the analysis of the corpus. In addition, several novel computational analysis methodologies are presented in the context of common MIR tasks of relevance for OTMM. Some example tasks are predominant melody extraction, tonic identification, tempo estimation, makam recognition, tuning analysis, structural analysis and melodic progression analysis. These methodologies become a part of a complete system called Dunya-makam for the exploration of large corpora of OTMM. The thesis starts by presenting the created CompMusic Ottoman- Turkish makam music corpus. The corpus includes 2200 music scores, more than 6500 audio recordings, and accompanying metadata. The data has been collected, annotated and curated with the help of music experts. Using criteria such as completeness, coverage and quality, we validate the corpus and show its research potential. In fact, our corpus is the largest and most representative resource of OTMM that can be used for computational research. Several test datasets have also been created from the corpus to develop and evaluate the specific methodologies proposed for different computational tasks addressed in the thesis. The part focusing on the analysis of music scores is centered on phrase and section level structural analysis. Phrase boundaries are automatically identified using an existing state-of-the-art segmentation methodology. Section boundaries are extracted using heuristics specific to the formatting of the music scores. Subsequently, a novel method based on graph analysis is used to establish similarities across these structural elements in terms of melody and lyrics, and to label the relations semiotically. The audio analysis section of the thesis reviews the state-of-the-art for analysing the melodic aspects of performances of OTMM. It proposes adaptations of existing predominant melody extraction methods tailored to OTMM. It also presents improvements over pitch-distribution-based tonic identification and makam recognition methodologies. The audio-score alignment methodology is the core of the thesis. It addresses the culture-specific challenges posed by the musical characteristics, music theory related representations and oral praxis of OTMM. Based on several techniques such as subsequence dynamic time warping, Hough transform and variable-length Markov models, the audio-score alignment methodology is designed to handle the structural differences between music scores and audio recordings. The method is robust to the presence of non-notated melodic expressions, tempo deviations within the music performances, and differences in tonic and tuning. The methodology utilizes the outputs of the score and audio analysis, and links the audio and the symbolic data. In addition, the alignment methodology is used to obtain score-informed description of audio recordings. 
The score-informed audio analysis not only simplifies the audio feature extraction steps that would require sophisticated audio processing approaches, but also substantially improves the performance compared with results obtained from the state-of-the-art methods solely relying on audio data. The analysis methodologies presented in the thesis are applied to the CompMusic Ottoman-Turkish makam music corpus and integrated into a web application aimed at culture-aware music discovery. Some of the methodologies have already been applied to other music traditions such as Hindustani, Carnatic and Greek music. Following open research best practices, all the created data, software tools and analysis results are openly available. The methodologies, the tools and the corpus itself provide vast opportunities for future research in many fields such as music information retrieval, computational musicology and music education.
Esta tesis aborda varias limitaciones de las metodologías más avanzadas en el campo de recuperación de información musical (MIR por sus siglas en inglés). En particular, propone varios métodos computacionales para el análisis y la descripción automáticas de partituras y grabaciones de audio de música de makam turco-otomana (MMTO). Las principales contribuciones de la tesis son el corpus de música que ha sido creado para el desarrollo de la investigación y la metodología para alineamiento de audio y partitura desarrollada para el análisis del corpus. Además, se presentan varias metodologías nuevas para análisis computacional en el contexto de las tareas comunes de MIR que son relevantes para MMTO. Algunas de estas tareas son, por ejemplo, extracción de la melodía predominante, identificación de la tónica, estimación de tempo, reconocimiento de makam, análisis de afinación, análisis estructural y análisis de progresión melódica. Estas metodologías constituyen las partes de un sistema completo para la exploración de grandes corpus de MMTO llamado Dunya-makam. La tesis comienza presentando el corpus de música de makam turcootomana de CompMusic. El corpus incluye 2200 partituras, más de 6500 grabaciones de audio, y los metadatos correspondientes. Los datos han sido recopilados, anotados y revisados con la ayuda de expertos. Utilizando criterios como compleción, cobertura y calidad, validamos el corpus y mostramos su potencial para investigación. De hecho, nuestro corpus constituye el recurso de mayor tamaño y representatividad disponible para la investigación computacional de MMTO. Varios conjuntos de datos para experimentación han sido igualmente creados a partir del corpus, con el fin de desarrollar y evaluar las metodologías específicas propuestas para las diferentes tareas computacionales abordadas en la tesis. La parte dedicada al análisis de las partituras se centra en el análisis estructural a nivel de sección y de frase. Los márgenes de frase son identificados automáticamente usando uno de los métodos de segmentación existentes más avanzados. Los márgenes de sección son extraídos usando una heurística específica al formato de las partituras. A continuación, se emplea un método de nueva creación basado en análisis gráfico para establecer similitudes a través de estos elementos estructurales en cuanto a melodía y letra, así como para etiquetar relaciones semióticamente. La sección de análisis de audio de la tesis repasa el estado de la cuestión en cuanto a análisis de los aspectos melódicos en grabaciones de MMTO. Se proponen modificaciones de métodos existentes para extracción de melodía predominante para ajustarlas a MMTO. También se presentan mejoras de metodologías tanto para identificación de tónica basadas en distribución de alturas, como para reconocimiento de makam. La metodología para alineación de audio y partitura constituye el grueso de la tesis. Aborda los retos específicos de esta cultura según vienen determinados por las características musicales, las representaciones relacionadas con la teoría musical y la praxis oral de MMTO. Basada en varias técnicas tales como deformaciones dinámicas de tiempo subsecuentes, transformada de Hough y modelos de Markov de longitud variable, la metodología de alineamiento de audio y partitura está diseñada para tratar las diferencias estructurales entre partituras y grabaciones de audio. El método es robusto a la presencia de expresiones melódicas no anotadas, desviaciones de tiempo en las grabaciones, y diferencias de tónica y afinación. 
La metodología utiliza los resultados del análisis de partitura y audio para enlazar el audio y los datos simbólicos. Además, la metodología de alineación se usa para obtener una descripción informada por partitura de las grabaciones de audio. El análisis de audio informado por partitura no sólo simplifica los pasos para la extracción de características de audio que de otro modo requerirían sofisticados métodos de procesado de audio, sino que también mejora sustancialmente su rendimiento en comparación con los resultados obtenidos por los métodos más avanzados basados únicamente en datos de audio. Las metodologías analíticas presentadas en la tesis son aplicadas al corpus de música de makam turco-otomana de CompMusic e integradas en una aplicación web dedicada al descubrimiento culturalmente específico de música. Algunas de las metodologías ya han sido aplicadas a otras tradiciones musicales, como música indostaní, carnática y griega. Siguiendo las mejores prácticas de investigación en abierto, todos los datos creados, las herramientas de software y los resultados de análisis está disponibles públicamente. Las metodologías, las herramientas y el corpus en sí mismo ofrecen grandes oportunidades para investigaciones futuras en muchos campos tales como recuperación de información musical, musicología computacional y educación musical.
Aquesta tesi adreça diverses deficiències en l’estat actual de les metodologies d’extracció d’informació de música (Music Information Retrieval o MIR). En particular, la tesi proposa diverses estratègies per analitzar i descriure automàticament partitures musicals i enregistraments d’actuacions musicals de música Makam Turca Otomana (OTMM en les seves sigles en anglès). Les contribucions principals de la tesi són els corpus musicals que s’han creat en el context de la tesi per tal de dur a terme la recerca i la metodologia de alineament d’àudio amb la partitura que s’ha desenvolupat per tal d’analitzar els corpus. A més la tesi presenta diverses noves metodologies d’anàlisi computacional d’OTMM per a les tasques més habituals en MIR. Alguns exemples d’aquestes tasques són la extracció de la melodia principal, la identificació del to musical, l’estimació de tempo, el reconeixement de Makam, l’anàlisi de la afinació, l’anàlisi de la estructura musical i l’anàlisi de la progressió melòdica. Aquest seguit de metodologies formen part del sistema Dunya-makam per a la exploració de grans corpus musicals d’OTMM. En primer lloc, la tesi presenta el corpus CompMusic Ottoman- Turkish makam music. Aquest inclou 2200 partitures musicals, més de 6500 enregistraments d’àudio i metadata complementària. Les dades han sigut recopilades i anotades amb ajuda d’experts en aquest repertori musical. El corpus ha estat validat en termes de d’exhaustivitat, cobertura i qualitat i mostrem aquí el seu potencial per a la recerca. De fet, aquest corpus és el la font més gran i representativa de OTMM que pot ser utilitzada per recerca computacional. També s’han desenvolupat diversos subconjunts de dades per al desenvolupament i evaluació de les metodologies específiques proposades per a les diverses tasques computacionals que es presenten en aquest tesi. La secció de la tesi que tracta de l’anàlisi de partitures musicals se centra en l’anàlisi estructural a nivell de secció i de frase musical. Els límits temporals de les frases musicals s’identifiquen automàticament gràcies a un metodologia de segmentació d’última generació. Els límits de les seccions s’extreuen utilitzant un seguit de regles heurístiques determinades pel format de les partitures musicals. Posteriorment s’utilitza un nou mètode basat en anàlisi gràfic per establir semblances entre aquest elements estructurals en termes de melodia i text. També s’utilitza aquest mètode per etiquetar les relacions semiòtiques existents. La següent secció de la tesi tracta sobre anàlisi d’àudio i en particular revisa les tecnologies d’avantguardia d’anàlisi dels aspectes melòdics en OTMM. S’hi proposen adaptacions dels mètodes d’extracció de melodia existents que s’ajusten a OTMM. També s’hi presenten millores en metodologies de reconeixement de makam i en identificació de tònica basats en distribució de to. La metodologia d’alineament d’àudio amb partitura és el nucli de la tesi. Aquesta aborda els reptes culturalment específics imposats per les característiques musicals, les representacions de la teoria musical i la pràctica oral particulars de l’OTMM. Utilitzant diverses tècniques tal i com Dynamic Time Warping, Hough Transform o models de Markov de durada variable, la metodologia d’alineament esta dissenyada per enfrontar les diferències estructurals entre partitures musicals i enregistraments d’àudio. 
El mètode és robust inclús en presència d’expressions musicals no anotades en la partitura, desviacions de tempo ocorregudes en les actuacions musicals i diferències de tònica i afinació. La metodologia aprofita els resultats de l’anàlisi de la partitura i l’àudio per enllaçar la informació simbòlica amb l’àudio. A més, la tècnica d’alineament s’utilitza per obtenir descripcions de l’àudio fonamentades en la partitura. L’anàlisi de l’àudio fonamentat en la partitura no només simplifica les fases d’extracció de característiques d’àudio que requeririen de mètodes de processament d’àudio sofisticats, sinó que a més millora substancialment els resultats comparat amb altres mètodes d´ultima generació que només depenen de contingut d’àudio. Les metodologies d’anàlisi presentades s’han utilitzat per analitzar el corpus CompMusic Ottoman-Turkish makam music i s’han integrat en una aplicació web destinada al descobriment musical de tradicions culturals específiques. Algunes de les metodologies ja han sigut també aplicades a altres tradicions musicals com la Hindustani, la Carnàtica i la Grega. Seguint els preceptes de la investigació oberta totes les dades creades, eines computacionals i resultats dels anàlisis estan disponibles obertament. Tant les metodologies, les eines i el corpus en si mateix proporcionen àmplies oportunitats per recerques futures en diversos camps de recerca tal i com la musicologia computacional, la extracció d’informació musical i la educació musical. Traducció d’anglès a català per Oriol Romaní Picas.
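As an editorial illustration of the subsequence dynamic time warping step that the alignment methodology above builds on (a generic sketch, not the Dunya-makam implementation; the pitch values, the absolute-difference cost and the variable names are assumptions), the following Python snippet aligns a short score-derived pitch sequence against a longer audio pitch track, letting the match start and end anywhere in the audio:

import numpy as np

def subsequence_dtw(query, target):
    # Subsequence DTW: find the cheapest alignment of query to any span of target.
    n, m = len(query), len(target)
    dist = np.abs(np.subtract.outer(query, target))   # local cost matrix (n x m)
    D = np.full((n, m), np.inf)
    D[0, :] = dist[0, :]                              # free start anywhere in the target
    for i in range(1, n):
        D[i, 0] = D[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            D[i, j] = dist[i, j] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    end = int(np.argmin(D[-1, :]))                    # free end: best column of the last row
    return float(D[-1, end]), end

# toy usage: a score fragment (pitch in cents) matched against a longer pitch track
score = np.array([0.0, 200.0, 400.0, 500.0, 400.0])
audio = np.array([900.0, 100.0, 10.0, 205.0, 395.0, 505.0, 410.0, 800.0])
print(subsequence_dtw(score, audio))

A real system would, as the thesis describes, operate on predominant-melody contours extracted from the recordings and additionally handle tonic, tuning and structural differences.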
Styles APA, Harvard, Vancouver, ISO, etc.
43

Huang, Kuen-Feng, et 黃崑峰. « Multiple Sequence Alignment Using the Clustering Method ». Thesis, 2001. http://ndltd.ncl.edu.tw/handle/49117729014706086846.

Texte intégral
Résumé :
Master's thesis, National Sun Yat-sen University, Department of Computer Science and Engineering, ROC academic year 89.
Multiple sequence alignment (MSA) is a fundamental technique of molecular biology. Biological sequences are aligned with each other vertically in order to show the similarities and differences among them. Due to its importance, many algorithms have been proposed. With dynamic programming, finding the optimal alignment for a pair of sequences can be done in O(n^2) time, where n is the length of the two sequences. Unfortunately, the general optimization problem of aligning k sequences of length n requires O(n^k) time. In this thesis, we first propose an efficient group alignment method to perform the alignment between two groups of sequences. We then propose a clustering method to build the tree topology for merging, based on the concept that the two sequences with the longest distance should be split into two clusters. In our experiments, both the alignment quality and the running time of our algorithm are better than those of the NJ (neighbor-joining) algorithm and the Clustal W algorithm.
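To make the O(n^2) pairwise dynamic programming step concrete (a minimal sketch with arbitrary match/mismatch/gap scores, not the scoring scheme used in the thesis), a global pairwise alignment score can be computed as follows:

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    # dp[i][j] = best score aligning the first i symbols of a with the first j of b
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,     # gap in b
                           dp[i][j - 1] + gap)     # gap in a
    return dp[n][m]

print(needleman_wunsch("GATTACA", "GCATGCU"))   # each cell costs O(1), hence O(n^2) overall

Group-to-group alignment extends the same recurrence to columns of already-aligned sequences, and the distance-based clustering then decides the order in which groups are merged.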
Styles APA, Harvard, Vancouver, ISO, etc.
44

Liu, Cheng-Hao, et 劉承澔. « Parallel Training of Joint Cascade Face Detection and Alignment ». Thesis, 2017. http://ndltd.ncl.edu.tw/handle/57776317585335133312.

Texte intégral
Résumé :
Master's thesis, National Chi Nan University, Department of Computer Science and Information Engineering, ROC academic year 105.
This thesis aims to implement the technique of joint cascade face detection and alignment (JDA) proposed by Chen et al. at Microsoft Research Asia in 2014. Three modifications are made to improve the performance of JDA. First, a local coordinate system is introduced into the feature extraction process to improve the rotation invariance of the image feature. Second, the negative sample extraction process is modified to increase the diversity of negative samples by including more non-face images. Third, the JDA training process is parallelized using OpenMP and multiple computers. The developed JDA model is tested on three facial image data sets, namely FDDB, CVF, and CelebA. Experimental results show that our JDA model outperforms the other JDA implementations available on the Internet. Furthermore, the parallelized training process reduces the training time considerably. Finally, we analyze why the implemented JDA model is not as accurate as the model in the original paper.
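The thesis parallelizes its training with OpenMP across multiple machines; purely as an illustration of the general pattern (not the author's C++/OpenMP code; the mine_negatives function, patch counts and image list are hypothetical), per-image negative-patch mining over many non-face images could be distributed across worker processes like this:

from multiprocessing import Pool
import random

def mine_negatives(image_id, patches_per_image=8, crop=24):
    # Hypothetical stand-in: return random non-face crop descriptors for one image.
    random.seed(image_id)  # deterministic per image, for reproducibility
    return [(image_id, random.randint(0, 200), random.randint(0, 200), crop)
            for _ in range(patches_per_image)]

if __name__ == "__main__":
    image_ids = list(range(1000))      # stand-in for a list of non-face image identifiers
    with Pool(processes=4) as pool:    # analogous to the thread/machine pool in the thesis
        batches = pool.map(mine_negatives, image_ids)
    negatives = [patch for batch in batches for patch in batch]
    print(len(negatives), "negative patches mined")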
Styles APA, Harvard, Vancouver, ISO, etc.
45

Lo, Chia-Hao, et 駱嘉濠. « Efficient Joint Clustering Algorithms in Optimization and Geography Domains ». Thesis, 2008. http://ndltd.ncl.edu.tw/handle/68401276653610303039.

Texte intégral
Résumé :
Master's thesis, National Chiao Tung University, Institute of Computer Science and Engineering, ROC academic year 96.
Prior works have elaborated on the problem of joint clustering in the optimization and geography domains. However, they neither clearly specify the connected constraint in the geography domain nor propose efficient algorithms. In this paper, we formulate the joint clustering problem in which a connected constraint and the number of clusters must be specified. We propose an algorithm, K-means with Local Search (abbreviated as KLS), consisting of three phases: the transformation phase, the coarse clustering phase and the fine clustering phase. First, the data objects that fulfill the connected constraint are represented as the ConGraph (standing for CONnected Graph). In light of the ConGraph, an algorithm adapting the concepts of K-means and local search is devised to coarsely cluster objects for the sake of efficiency. Then, these coarse clusters are fine-tuned to minimize the dissimilarity of the data objects in the optimization domain. Our experimental results show that KLS can find correct clusters efficiently.
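As a rough illustration of the connected constraint in the geography domain (a generic sketch, not the authors' ConGraph construction or local-search refinement; the toy adjacency graph and cluster labels are assumptions), the following check verifies that every cluster induces a connected subgraph of a neighbourhood graph:

from collections import defaultdict, deque

def clusters_are_connected(adjacency, labels):
    # Return True if every cluster induces a connected subgraph of the adjacency graph.
    members = defaultdict(set)
    for node, lab in labels.items():
        members[lab].add(node)
    for lab, nodes in members.items():
        start = next(iter(nodes))
        seen, queue = {start}, deque([start])
        while queue:                       # BFS restricted to this cluster's nodes
            u = queue.popleft()
            for v in adjacency[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    queue.append(v)
        if seen != nodes:                  # some member unreachable: constraint violated
            return False
    return True

# toy geography graph: 0-1-2 form a chain; 3-4 are adjacent but separate from 0-1-2
adjacency = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
print(clusters_are_connected(adjacency, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}))  # True
print(clusters_are_connected(adjacency, {0: "A", 1: "B", 2: "A", 3: "B", 4: "B"}))  # False

A coarse K-means in the optimization domain followed by such a feasibility check is one simple way to picture why the fine-tuning phase is needed on top of the coarse clustering.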
Styles APA, Harvard, Vancouver, ISO, etc.
46

Li, Yi-Lun, et 李懿倫. « User Clustering and Precoding for Joint Multicast-Unicast Beamforming ». Thesis, 2019. http://ndltd.ncl.edu.tw/handle/xbk8q7.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
47

Chen, Yi-Chen, et 陳奕蓁. « Face Alignment and Recognition Based on Joint Feature Shape Regression ». Thesis, 2016. http://ndltd.ncl.edu.tw/handle/4vuz83.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
48

Li, Wan Ju, et 李婉如. « Patellofemoral Joint Alignment and Muscle Performance of Lower Extremity in Overweight Adults ». Thesis, 2019. http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22107CGU05595012%22.&searchmode=basic.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
49

Yu, Tsai-Ling, et 余彩綾. « Joint Range and Angle Estimation with Signal Clustering in FMCW Radar ». Thesis, 2019. http://ndltd.ncl.edu.tw/handle/u2duwy.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
50

Widjajahakim, Rafael. « Association between patellofemoral joint alignment and morphology to superlateral Hoffa's fat pad edema ». Thesis, 2016. https://hdl.handle.net/2144/19482.

Texte intégral
Résumé :
BACKGROUND: Osteoarthritis is a leading cause of disability in people aged 65 and older. Research has identified several possible factors leading to the development of knee osteoarthritis. Patellofemoral joint maltracking has been thought to be associated with, or to cause, edema in the knee, which is considered an early sign of osteoarthritis. Hoffa's fat pad is an intra-articular component of the knee located under the kneecap. The presence of edema in it on MRI has also been suggested as a marker for osteoarthritis. Recently, edema in the superolateral region of Hoffa's fat pad has been hypothesized to be a distinct signal from edema in other regions, and there is interest in relating this superolateral edema to other factors of osteoarthritis development. OBJECTIVE: This thesis research project aims to assess the relation of kneecap-thighbone (patellofemoral) joint alignment, femoral trochlear morphology, and patellar height to edema in the superolateral region of Hoffa's fat pad, especially in a population with an average age above 65 years. The hypothesis is that flatter trochlear morphology and abnormal patellar alignment carry a higher risk of superolateral edema. METHODS: This is a cross-sectional study using a subset of data from the Multicenter Osteoarthritis (MOST) study, specifically the 60-month visit. The patellofemoral measurements (sulcus angle, lateral and medial trochlear inclination angles, trochlear angle, Insall-Salvati ratio, patellar tilt angle, and bisect offset) were used as the predictor variables, and semiquantitative scoring of MRI edema in the superolateral Hoffa's fat pad as the outcome variable. Logistic regression analyses were performed to identify the patellofemoral measurements most strongly associated with superolateral Hoffa's fat pad edema. RESULTS: In the logistic regression analysis, trochlear angle, Insall-Salvati ratio, and bisect offset were highly associated with superolateral edema. A further analysis, categorizing the measurements into quartiles, found that only the highest quartiles of bisect offset and trochlear angle were associated with superolateral Hoffa's fat pad edema when compared to the reference quartile, whereas all quartiles of the Insall-Salvati ratio were strongly associated with superolateral edema when compared to the reference quartile. CONCLUSION: The current study indicates that people above 65 years old with a high trochlear angle, extreme lateral patellar translation (bisect offset), and a high-riding patella have a high risk of superolateral Hoffa's fat pad edema.
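To illustrate the quartile-based logistic regression described above (a generic sketch on simulated data, not the MOST cohort; the variable names, simulated distribution and effect size are assumptions), such a model could be fitted in Python as follows:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
# hypothetical data: bisect offset (%) and a binary edema indicator loosely related to it
bisect_offset = rng.normal(60, 10, n)
p = 1 / (1 + np.exp(-(0.08 * (bisect_offset - 60) - 1.5)))
edema = rng.binomial(1, p)
df = pd.DataFrame({"edema": edema, "bisect_offset": bisect_offset})

# categorize the predictor into quartiles, with the lowest quartile (Q1) as the reference level
df["bo_quartile"] = pd.qcut(df["bisect_offset"], 4, labels=["Q1", "Q2", "Q3", "Q4"])

model = smf.logit("edema ~ C(bo_quartile)", data=df).fit(disp=False)
print(np.exp(model.params))  # baseline odds (intercept) and odds ratios for Q2-Q4 versus Q1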
Styles APA, Harvard, Vancouver, ISO, etc.