Theses on the topic "Dbsan"

To see other types of publications on this topic, follow the link: Dbsan.

Create a correct reference in APA, MLA, Chicago, Harvard, and various other styles

Consult the top 50 theses for your research on the topic "Dbsan".

Next to every source in the list of references there is an "Add to bibliography" button. Click on it, and we will automatically generate the bibliographic reference for the chosen source in your preferred citation style: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever the metadata include such information.

Browse theses on a wide variety of disciplines and organise your bibliography correctly.

1

Neto, Antônio Cavalcante Araújo. « G2P-DBSCAN : Data Partitioning Strategy and Distributed Processing of DBSCAN with MapReduce ». Universidade Federal do Ceará, 2015. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=15592.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Clustering is a data mining technique that groups the elements of a data set so that elements in the same group are more similar to each other than to elements in other groups. This thesis studies the problem of running the density-based clustering algorithm DBSCAN in a distributed fashion using the MapReduce paradigm. In distributed processing it is important that the partitions to be processed have approximately the same size, since the total processing time is bounded by the time the node with the largest amount of data takes to finish computing the data assigned to it. For this reason we also propose a data partitioning strategy called G2P, which aims to distribute the data set among the partitions in a balanced way and takes the characteristics of the DBSCAN algorithm into account. More specifically, the G2P strategy uses grid and graph structures to help divide the space along low-density regions. The distributed processing of DBSCAN itself consists of two MapReduce phases and an intermediate phase that identifies clusters that may have been split across more than one partition, called merge candidates. The first MapReduce phase applies DBSCAN to each partition individually, and the second verifies and, if necessary, corrects the merge-candidate clusters. Experiments with real data sets show that the G2P-DBSCAN strategy outperforms the adopted baseline in all scenarios considered, both in runtime and in the quality of the partitions obtained.
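The two-phase structure just described, run DBSCAN per partition and then reconcile clusters split across the boundary, can be illustrated on a single machine. The sketch below is a minimal stand-in for the MapReduce pipeline, not the thesis's G2P implementation: the split point, the overlap width and the DBSCAN parameters are all illustrative assumptions.

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
eps = 0.3

# Illustrative split at x = 0, with an overlap band of width eps on each side
left = X[X[:, 0] < eps]
right = X[X[:, 0] >= -eps]

labels_left = DBSCAN(eps=eps, min_samples=5).fit_predict(left)
labels_right = DBSCAN(eps=eps, min_samples=5).fit_predict(right)

# "Merge candidates": clusters from the two partitions that share a point
# in the overlap band are assumed to be fragments of one global cluster.
merged = {}
pos_left = {tuple(p): l for p, l in zip(left, labels_left) if l != -1}
for p, l in zip(right, labels_right):
    if l != -1 and tuple(p) in pos_left:
        merged[('L', pos_left[tuple(p)])] = ('R', l)
print(f"{len(merged)} cluster pairs to merge across the partition boundary")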
2

Araújo, Neto Antônio Cavalcante. « G2P-DBSCAN : Estratégia de Particionamento de Dados e de Processamento Distribuído fazer DBSCAN com MapReduce ». reponame:Repositório Institucional da UFC, 2016. http://www.repositorio.ufc.br/handle/riufc/16372.

Full text
Abstract:
ARAÚJO NETO, Antônio Cavalcante. G2P-DBSCAN: Estratégia de Particionamento de Dados e de Processamento Distribuído fazer DBSCAN com MapReduce. 2016. 63 f. Dissertation (Master's in Computer Science), Universidade Federal do Ceará, Fortaleza-CE, 2016.
Clustering is a data mining technique that groups the elements of a data set so that elements in the same group are more similar to each other than to elements in other groups. This thesis studies the problem of running the density-based clustering algorithm DBSCAN in a distributed fashion using the MapReduce paradigm. In distributed processing it is important that the partitions to be processed have approximately the same size, since the total processing time is bounded by the time the node with the largest amount of data takes to finish computing the data assigned to it. For this reason we also propose a data partitioning strategy called G2P, which aims to distribute the data set among the partitions in a balanced way and takes the characteristics of the DBSCAN algorithm into account. More specifically, the G2P strategy uses grid and graph structures to help divide the space along low-density regions. The distributed processing of DBSCAN itself consists of two MapReduce phases and an intermediate phase that identifies clusters that may have been split across more than one partition, called merge candidates. The first MapReduce phase applies DBSCAN to each partition individually, and the second verifies and, if necessary, corrects the merge-candidate clusters. Experiments with real data sets show that the G2P-DBSCAN strategy outperforms the adopted baseline in all scenarios considered, both in runtime and in the quality of the partitions obtained.
3

Mahmod, Shad. « Deinterleaving pulse trains with DBSCAN and FART ». Thesis, Uppsala universitet, Avdelningen för systemteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-379718.

Full text
Abstract:
Studying radar pulses and looking for certain patterns is critical in order to assess the threat level of the environment around an antenna. In order to study the electromagnetic pulses emitted from a certain radar, one must first register and identify these pulses. Usually there are several active transmitters in an environment, and an antenna will register pulses from various sources. In order to study the different pulse trains, the registered pulses first have to be sorted so that all pulses transmitted from one source are grouped together. This project aims to solve this problem using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and to compare the results with those obtained by Fuzzy Adaptive Resonance Theory (FART). We further dig into these methods and map out how factors such as feature selection and training time affect the results. A solution based on the DBSCAN method is proposed which allows online clustering of new points introduced to the system. The methods are implemented and tested on simulated data consisting of pulse trains from simulated transmitters with unique behaviors. The deployed methods are then tested while varying the parameters of the models as well as the number of pulse trains they are asked to deinterleave. The results of applying the models are evaluated using the adjusted Rand index (ARI). The results indicate that in most cases using all available data (in this case the angle of arrival, radio frequency, pulse width and amplitudes of the pulses) generates the best results. Rescaling the data further improves the results, and tuning the parameters shows that the models work well when the number of emitters increases. The results also indicate that the DBSCAN method can be used to get accurate estimates of the number of transmitting emitters. The online DBSCAN generates a higher ARI than FART on the simulated data set but has a higher worst-case computational cost.
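The evaluation described above can be sketched directly with scikit-learn: cluster simulated pulse descriptor words with DBSCAN and score the result against the known emitter labels using the adjusted Rand index. The four features and the simulation parameters below are stand-ins chosen for the example, not the thesis's data.

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Three simulated emitters, each with its own (AoA, RF, PW, amplitude) signature
centers = rng.uniform(0, 10, size=(3, 4))
true_labels = np.repeat(np.arange(3), 200)
pulses = centers[true_labels] + rng.normal(scale=0.2, size=(600, 4))

# Rescaling the features matters, as the thesis also observes
X = StandardScaler().fit_transform(pulses)
pred = DBSCAN(eps=0.5, min_samples=10).fit_predict(X)

print("ARI:", adjusted_rand_score(true_labels, pred))
print("estimated number of emitters:", len(set(pred) - {-1}))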
4

Кулік, Євгенія Сергіївна, Евгения Сергеевна Кулик, Євгенія Сергіївна Кулік, Захар Вікторович Козлов, Захар Викторович Козлов et Zakhar Viktorovych Kozlov. « Використання SR-дерев у щільнісному методі кластеризації числових просторів DBSCAN ». Thesis, Cумський державний університет, 2016. http://essuir.sumdu.edu.ua/handle/123456789/46528.

Full text
Abstract:
When processing spatial data, the DBSCAN density-based clustering algorithm very often needs to find the ε-neighbourhood of a point in n-dimensional space. This step of the algorithm is performed at least once for every point [1], so improving the efficiency of the ε-neighbourhood search has a significant impact on the performance of the algorithm as a whole.
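SR-trees are not available in the common Python libraries, but the same idea, answering ε-range queries through a spatial index instead of a linear scan, can be shown with a KD-tree. A minimal sketch with illustrative data and parameters:

import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(2)
X = rng.uniform(size=(10_000, 3))
eps = 0.05

tree = KDTree(X)                             # build the index once
neighborhoods = tree.query_radius(X, r=eps)  # one ε-range query per point

# DBSCAN needs exactly these ε-neighbourhoods to identify core points
min_samples = 5
core_mask = np.array([len(nb) >= min_samples for nb in neighborhoods])
print("core points:", core_mask.sum())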
5

Kästel, Arne Morten, et Christian Vestergaard. « Comparing performance of K-Means and DBSCAN on customer support queries ». Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260252.

Full text
Abstract:
In customer support there are often many repeated questions, and questions that do not need novel answers. To increase productivity in the question-answering task within any business, there is clear room for automatic answering to take on some of the workload of customer support functions. We look at clustering corpora of older queries and texts as a method for identifying groups of semantically similar questions, which would allow a system to match new queries to a specific cluster and return an associated, automatic response. The approach compares the performance of K-means and density-based clustering algorithms on three different corpora, using document embeddings encoded with BERT. We also discuss the digital transformation process, why companies are unsuccessful in their implementations, and the possible room for a new, more iterative model.
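A minimal sketch of the comparison described above, assuming the documents have already been encoded into a matrix of BERT embeddings (for example with the sentence-transformers package); here random vectors stand in for real embeddings, and the cluster count, eps and metric are illustrative assumptions:

import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(3)
embeddings = rng.normal(size=(500, 384))   # stand-in for BERT sentence vectors

km_labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(embeddings)
# Cosine distance is a common choice for text embeddings
db_labels = DBSCAN(eps=0.9, min_samples=5, metric="cosine").fit_predict(embeddings)

print("K-means silhouette:", silhouette_score(embeddings, km_labels))
print("DBSCAN clusters:", len(set(db_labels) - {-1}),
      "noise points:", (db_labels == -1).sum())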
6

Legoabe, Reginald Sethole. « An Impact Assessment of the DBSA/ SALGA ICT Internship Programme : A Case Study ». Thesis, North-West University (South Africa), 2010. http://hdl.handle.net/10919/71530.

Full text
Abstract:
The aim of this descriptive and evaluative research study is to assess the impact achieved by the DBSA/SALGA ICT Internship Programme, a national local government internship programme that was undertaken by the South African Local Government Association (SALGA) and the Development Bank of Southern Africa (DBSA) Development Fund in partnership with the South African Communication Forum (SACF), the Department of Provincial and Local Government (DPLG) and the SIEMENS Ltd Training Institute. A supply-side internship programme in nature, its strategic objectives were to train and equip young South Africans with ICT skills, give youth learners workplace experience in the ICT functional area within their respective municipalities, create employment opportunities for youth and economic development for local municipalities, and alleviate the scarce-critical ICT skills shortage so as to capacitate the local government sector. Forty (40) learners from Further Education and Training (FET) Colleges were recruited from various rural municipalities to undertake ICT training with the SIEMENS Training Institute and were given workplace experiential learning with fifteen (15) host municipalities under the banner of the South African Local Government Association. This descriptive and evaluative study is undertaken in a case study format with particular interest in the retention levels of graduate learners endowed with scarce skills, in the context of the skills challenges facing the local government sector. The study also focuses on unique challenges and interventional measures that could be undertaken by designers of public education and training programmes to ensure the efficiency of internship programmes and the optimal benefit of publicly funded internship programmes to youth learners. This research study not only has internal validity in terms of the operational delivery of internship programmes but also external contextual importance for publicly funded learning and placement programmes within the larger human resources development (HRD) domain and the local government sector. In conducting the study, stratified random sampling is utilised due to the multi-stakeholder nature of the programme. A stratified survey sample comprising fifty percent (50%) of the total survey population of forty (40) former ICT learners who participated in the internship programme is selected, while a sample of sixty percent (60%) of the fifteen (15) host municipalities that participated in the programme is also selected using stratified random sampling. The findings of the study indicate that participation in the DBSA/SALGA ICT Internship Programme has positively promoted the employability of former ICT learners. All ICT learner respondents confirmed current employment within the ICT functional area. Research findings indicate that the local government sector has derived short-term retention and benefit from the programme but has not been able to retain the skills of the majority of former ICT learners in the long term. Although most of the former ICT learners have since migrated out of the local government sector, most are still employed in the ICT field within the public sector and, to some extent, in the private sector of the South African economy. The study found that most learners were able to assimilate and find employment within their host municipalities or were able to find ICT-related employment soon after graduation.
The research findings of this impact assessment study indicate that the DBSA/ SALGA ICT Internship Programme has positively transformed young inexperienced graduates into responsible young adults through the development of key life skills and work experiences to enable them to successfully navigate the path between the classroom and the challenging world of work.
Mini-dissertation submitted in partial fulfilment of the requirements for the Master's Degree in Business Administration (MBA), Human Resource Management (HRM), North-West University (NWU) Graduate School of Business & Government.
7

Мельникова, П. А. « Поиск аномалий во временных рядах на основе оценивания их параметров ». Thesis, ХНУРЕ, 2021. https://openarchive.nure.ua/handle/document/16436.

Full text
Abstract:
Anomaly detection is an important topic in every domain, and knowledge of effective ways to perform anomaly detection is an important skill for any data scientist. The problem of anomaly detection in time series is considered. A method is proposed which allows anomalies in time series to be detected efficiently by estimating their parameters.
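The abstract does not spell out the estimation procedure, so the sketch below shows one common parameter-based approach as a stand-in: estimate the local mean and standard deviation over a rolling window and flag points that deviate by more than a threshold. The series, window length and threshold are all assumptions for the example.

import numpy as np

rng = np.random.default_rng(4)
series = np.sin(np.linspace(0, 20, 500)) + rng.normal(scale=0.1, size=500)
series[230] += 2.0                     # inject one point anomaly

window, k = 30, 4.0                    # window length and threshold (assumed)
anomalies = []
for t in range(window, len(series)):
    mu = series[t - window:t].mean()   # parameter estimates from recent history
    sigma = series[t - window:t].std()
    if abs(series[t] - mu) > k * sigma:
        anomalies.append(t)
print("flagged indices:", anomalies)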
8

Крамар, Іван Ігорович. « Кластеризація даних, що збираються з відібраних джерел науково-технічної інформації ». Bachelor's thesis, КПІ ім. Ігоря Сікорського, 2020. https://ela.kpi.ua/handle/123456789/36639.

Full text
Abstract:
The aim of the work is to use clustering of scientific and technical data not only for the visual representation of objects but also for the recognition of new ones. The purpose of document clustering is to automatically detect groups of semantically similar documents within a given fixed set. Groups are formed only on the basis of the pairwise similarity of document descriptions, and no characteristics of these groups are specified in advance. Methods for removing uninformative words are considered: stop-word removal, stemming, N-grams, and case folding. The following methods were used to extract keywords and classify the results: dictionary-based, statistical, and one built on the Y-interpretation of Bradford's law, together with the TF-IDF measure, the F-measure and the method of lexical patterns. The high-level programming language Python (interpreter version 2.7) was chosen to implement the cluster analysis system for scientific and technical data. This program code is easier to read, and its reuse and maintenance are much simpler than with code in other languages.
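The preprocessing and keyword-weighting steps listed above map directly onto standard tooling. A minimal sketch in modern Python (the toy corpus, cluster count and parameters are illustrative; stemming would be applied before vectorization):

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "density based clustering of spatial data",
    "clustering spatial databases with noise",
    "neural machine translation of documents",
    "attention based neural translation models",
]

# Stop-word removal and TF-IDF weighting in one step
vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
X = vec.fit_transform(docs)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)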
9

Kannamareddy, Aruna Sai. « Density and partition based clustering on massive threshold bounded data sets ». Kansas State University, 2017. http://hdl.handle.net/2097/35467.

Full text
Abstract:
Master of Science
Department of Computing and Information Sciences
William H. Hsu
The project explores the possibility of increasing the efficiency of clusters formed from massive data sets using the threshold blocking algorithm. Clusters formed this way are denser and of higher quality. Clusters formed by individual clustering algorithms alone do not necessarily eliminate outliers, and the clusters generated can be complex or improperly distributed over the data set. The threshold blocking algorithm, from recent research by Michael Higgins of the Statistics Department, performs better than existing algorithms at forming dense, distinctive units with a predefined threshold. Developing a hybridized algorithm that applies existing clustering algorithms to re-cluster the units thus formed is part of this project. Clustering on the seeds formed by the threshold blocking algorithm eases the task for the existing algorithm by eliminating the overhead of dealing with outliers. The clusters thus generated are also more representative of the whole. And since the threshold blocking algorithm is proven to be fast and efficient, many more decisions can be predicted from large data sets in less time. Predicting similar songs from the Million Song Data Set with such a hybridized algorithm is used to evaluate this goal.
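The hybrid idea, pre-group points into small dense blocks and then run a conventional algorithm on the block representatives, can be sketched as follows. The grid-based pre-blocking below is a crude stand-in for the threshold blocking algorithm, which the source describes only at a high level; all parameters are illustrative.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(c, 0.3, size=(300, 2)) for c in [(0, 0), (4, 0), (2, 3)]])

# Stand-in pre-blocking: snap points to a coarse grid, keep one centroid per cell
cells = np.floor(X / 0.5).astype(int)
_, inverse = np.unique(cells, axis=0, return_inverse=True)
inverse = inverse.ravel()
seeds = np.array([X[inverse == i].mean(axis=0) for i in range(inverse.max() + 1)])

# Re-cluster the far smaller set of seeds instead of the raw points
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(seeds)
print(f"{len(X)} points reduced to {len(seeds)} seeds in {labels.max() + 1} clusters")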
10

Huo, Shiyin. « Detecting Self-Correlation of Nonlinear, Lognormal, Time-Series Data via DBSCAN Clustering Method, Using Stock Price Data as Example ». The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1321989426.

Full text
11

Lind, Emma, et Mattias Stahre. « Deinterleaving of radar pulses with batch processing to utilize parallelism ». Thesis, KTH, Kommunikationssystem, CoS, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273737.

Full text
Abstract:
The threat level (specifically in this thesis, for aircraft) in an environment can be determined by analyzing radar signals. This task is critical and has to be solved fast and with high accuracy. The received electromagnetic pulses have to be identified in order to classify a radar emitter. Usually, there are several emitters transmitting radar pulses at the same time in an environment. These pulses need to be sorted into groups, where each group contains pulses from the same emitter. This thesis aims to find a fast and accurate solution for sorting the pulses in parallel. The selected approach analyzes batches of pulses in parallel to exploit the advantages of a multi-threaded Central Processing Unit (CPU) or a Graphics Processing Unit (GPU). Firstly, a suitable clustering algorithm had to be selected. Secondly, an optimal batch size had to be determined to achieve high clustering performance and to rapidly process the batches of pulses in parallel. A quantitative method based on experiments was used to measure clustering performance, execution time, system response, and parallelism as a function of batch size when using the selected clustering algorithm. The algorithm selected for clustering the data was Density-based Spatial Clustering of Applications with Noise (DBSCAN) because of its advantages, such as not having to specify the number of clusters in advance, its ability to find arbitrary shapes of a cluster in a data set, and its low time complexity. The evaluation showed that implementing parallel batch processing is possible while still achieving high clustering performance, compared to a sequential implementation that used the maximum likelihood method. An optimal batch size in terms of data points and cutoff time is hard to determine, since the batch size is very dependent on the input data. Therefore, one batch size might not be optimal in terms of clustering performance and system response for all streams of data. A solution could be to determine optimal batch sizes in advance for different streams of data, then adapt the batch size depending on the stream of data. However, with a high level of parallelism, an additional delay is introduced that depends on the difference between the time it takes to collect data points into a batch and the time it takes to process the batch; thus the system will be slower to output its result for a given batch compared to a sequential system. For a time-critical system, a high level of parallelism might therefore be unsuitable, since it leads to slower response times.
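The batch-parallel scheme described above can be imitated on a multi-core CPU with a process pool: split the pulse stream into batches and cluster each batch independently. The batch count, pool size and DBSCAN parameters below are illustrative assumptions, and the stand-in stream replaces real pulse descriptor words.

import numpy as np
from multiprocessing import Pool
from sklearn.cluster import DBSCAN

def cluster_batch(batch):
    # Each worker deinterleaves one batch independently
    return DBSCAN(eps=0.5, min_samples=10).fit_predict(batch)

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    stream = rng.normal(size=(8000, 4))           # stand-in pulse stream
    batches = np.array_split(stream, 8)           # batch size trades delay vs. quality

    with Pool(processes=4) as pool:
        results = pool.map(cluster_batch, batches)
    print([len(set(r) - {-1}) for r in results])  # clusters found per batch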
12

Amlinger, Anton. « An Evaluation of Clustering and Classification Algorithms in Life-Logging Devices ». Thesis, Linköpings universitet, Programvara och system, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-121630.

Full text
Abstract:
Using life-logging devices and wearables is a growing trend in today's society. These yield vast amounts of information, data that is not directly graspable at a glance due to its size. Gathering a qualitative, comprehensible overview of this quantitative information is essential for life-logging services to serve their purpose. This thesis provides a comparison of CLARANS, DBSCAN and SLINK, representing different branches of clustering algorithm types, as tools for activity detection in geo-spatial data sets. These activities are then classified using a simple model with model parameters learned via Bayesian inference, as a demonstration of a different branch of clustering. Results are provided using Silhouettes as evaluation for the geo-spatial clustering and a user study for the final classification. The results are promising as an outline for a framework of classification and activity detection, and shed light on various pitfalls that might be encountered during the implementation of such a service.
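A small sketch of the geo-spatial clustering plus Silhouette evaluation mentioned above; the coordinates and parameters are made up for the example. The haversine metric expects latitude and longitude in radians, so eps is expressed as a distance on the unit sphere (Earth radius ≈ 6371 km):

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(7)
home = np.array([59.33, 18.06])          # lat, lon in degrees (illustrative)
work = np.array([59.35, 17.95])
points = np.vstack([home + rng.normal(scale=0.002, size=(100, 2)),
                    work + rng.normal(scale=0.002, size=(100, 2))])

X = np.radians(points)
eps_km = 0.5
labels = DBSCAN(eps=eps_km / 6371.0, min_samples=5,
                metric="haversine", algorithm="ball_tree").fit_predict(X)

mask = labels != -1                      # Silhouette is defined on clustered points
print("clusters:", len(set(labels[mask])),
      "silhouette:", silhouette_score(X[mask], labels[mask], metric="haversine"))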
13

Roos, Johannes, et Sven Lindqvist. « Identifiering av områden med förhöjd olycksrisk för cyklister baserad på cykelhjälmsdata ». Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20421.

Full text
Abstract:
The number of cyclists in Sweden is expected to increase in the coming years, but despite major efforts in road safety, the number of serious bicycle accidents is not decreasing at the same rate as car accidents. This study has looked at data collected from the customers of the bicycle helmet manufacturer Hövding. The helmet acts as an airbag that is triggered when a strong head movement occurs in the event of an accident. The data consists of GPS positions along with a Support Vector Machine (SVM)-generated value which indicates how close the helmet is to registering an accident and thus being triggered. The purpose of the study was to analyze this data from cyclists in Malmö to see if it is possible to identify places that are over-represented in the number of elevated SVM levels, and whether these places reflect real, potentially dangerous traffic situations. Density-based spatial clustering of applications with noise (DBSCAN) was used to identify clusters of elevated SVM levels. DBSCAN is an unsupervised clustering algorithm widely used for clustering spatial data with noise. From these clusters, the number of unique cycle trips that generated an elevated SVM level in the cluster was calculated, as well as the total number of cycle trips that passed through each cluster. 405 clusters were identified and sorted by the highest number of unique bike rides that generated an elevated SVM level, whereupon the top 30 were selected for further analysis. In order to validate the clusters against registered bicycle accidents, data were obtained from the Swedish Traffic Accident Data Acquisition (STRADA), the national accident database in Sweden. The thirty selected clusters had 0.082% cycling accidents per unique cycle trip in the clusters; for the remaining 375 clusters the figure was 0.041%. The number of accidents per cluster in the selected thirty clusters was 0.46, and the figure for the other clusters was 0.064. The top thirty clusters were then categorized into three categories. The clusters that had a possible explanation for elevated SVM levels, such as speed bumps and cobblestones, were given category 1; Hövding has communicated that such elements in the road surface can generate a low degree of elevated SVM levels. Category 2 was the clusters that had a construction site within the cluster. Category 3 was the clusters that could not be explained by either of the other two categories. The proportion of accidents per unique cycle trip in clusters belonging to category 1 was 0.068%, for category 2 0.071% and for category 3 0.106%. The results indicate that this data is useful for identifying places with increased accident risk for cyclists. The data processed in this study has a number of weaknesses, so the results should be interpreted with caution. For example, the data covers a short period of time, about 6 months, so seasonal cycling behavior is not represented in the data set. The data set is also assumed to contain some noisy data, which may have affected the results. But there is potential in this type of data so that in the future, when more data has been collected, it can be used to identify places with a higher risk of accidents for cyclists with greater accuracy.
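The per-cluster statistics described above (unique trips that triggered an elevated value in each cluster) reduce to a group-by once DBSCAN labels are attached. A sketch with made-up columns, coordinates and thresholds, none of which come from Hövding's actual data:

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(8)
df = pd.DataFrame({
    "trip_id": rng.integers(0, 50, 500),
    "lat": 55.6 + rng.normal(scale=0.01, size=500),
    "lon": 13.0 + rng.normal(scale=0.01, size=500),
    "svm": rng.uniform(0, 1, 500),
})
elevated = df[df["svm"] > 0.8].copy()          # elevation threshold is an assumption

X = np.radians(elevated[["lat", "lon"]])
elevated["cluster"] = DBSCAN(eps=0.5 / 6371.0, min_samples=5,
                             metric="haversine",
                             algorithm="ball_tree").fit_predict(X)

# Unique trips that generated an elevated SVM level in each cluster
per_cluster = (elevated[elevated["cluster"] != -1]
               .groupby("cluster")["trip_id"].nunique()
               .sort_values(ascending=False))
print(per_cluster.head())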
14

Zhang, Xianjie, et Sebastian Bogic. « Datautvinning av klickdata : Kombination av klustring och klassifikation ». Thesis, KTH, Hälsoinformatik och logistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230630.

Full text
Abstract:
Owners of websites and applications usually profit when users click on their links, which can be advertisements or items for sale, among other things. There are many studies in data analysis that predict whether a link will be clicked, but only a few that focus on what needs to be adjusted to get the link clicked. The problem that Flygresor.se has is that they are missing a tool for their customers, travel agencies, that analyses their tickets and then adjusts the attributes of those trips. The requested solution was an application that gives suggestions on how to change the tickets in a way that would make them more clicked and, in that way, generate more sales. A prototype was constructed that makes use of two different data mining methods: clustering with the algorithm DBSCAN and classification with the algorithm k-nearest neighbour. These algorithms were used together with an evaluation process, called DNNA, which analyzes the result from the algorithms and gives suggestions about changes that could be made to the attributes of the links. The combination of the algorithms and DNNA was tested and evaluated as the solution to the problem. The program was able to predict which attributes of the tickets needed to be adjusted to get the tickets more clicks. The recommendations of adjustments were reasonable, but the result could not be compared to similar tools since none had been published.
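The clustering-plus-classification combination can be sketched in a few lines: DBSCAN labels the historical data, then a k-NN classifier trained on the non-noise points assigns new items to an existing cluster. The DNNA evaluation process is specific to the thesis and is not reproduced here; the data and parameters are illustrative.

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(c, 0.2, size=(200, 2)) for c in [(0, 0), (3, 3)]])

labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
core = labels != -1                    # train only on points DBSCAN assigned

knn = KNeighborsClassifier(n_neighbors=5).fit(X[core], labels[core])
new_points = np.array([[0.1, -0.2], [2.9, 3.1]])
print(knn.predict(new_points))         # cluster membership for unseen points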
15

Stefan, Vasic, et Lindgren Nicklas. « Product categorisation using machine learning ». Thesis, KTH, Data- och elektroteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-209031.

Full text
Abstract:
Machine learning is a method in data science for analysing large data sets and extracting hidden patterns and common characteristics in the data. Corporations often have access to databases containing great amounts of data that could contain valuable information. Navetti AB wants to investigate the possibility of automating their product categorisation by evaluating different types of machine learning algorithms; this could increase both time and cost efficiency. This work resulted in three different prototypes, each using a different machine learning algorithm, with the ability to categorise products automatically. The prototypes were tested and evaluated based on their ability to categorise products and their performance in terms of speed. Different techniques used for preprocessing data are also evaluated and tested. An analysis of the tests shows that, when a suitable algorithm is provided with enough data, it is possible to automate the manual categorisation.
16

Rojas, Araya Javier Orlando. « Diseño de procesos para la segmentación de clientes según su comportamiento de compra y hábito de consumo en una empresa de consumo masivo ». Tesis, Universidad de Chile, 2017. http://repositorio.uchile.cl/handle/2250/145586.

Full text
Abstract:
Master's in Business Engineering with Information Technology
The mass-market food industry has evolved over time. The first sales channels for reaching end customers were neighbourhood stores, which were strongly threatened by the proliferation of large supermarket chains. The arrival of the internet also created a new channel that lets end customers order products, pay for them through mobile applications and receive them at home. Despite this evolution in channels, neighbourhood stores refuse to disappear. Many customers still prefer their friendly, personalised service, together with a wide range of products and attractive prices. The company is no stranger to this reality and also sells its products to end customers through the supermarket and neighbourhood-store channels. In the store channel it serves approximately 25,000 customers per month nationwide, with the greatest concentration in the centre of the country. Segmenting these customers to understand their purchasing behaviour and consumption habits has become the central axis of this channel's strategy. Analysing sales reports is no longer enough to increase the performance of the Commercial Area. The objective of this project is to group the company's store-channel customers by purchasing behaviour and consumption habit and to characterise them. To reach this goal, the Business Engineering methodology is used, which runs from the definition of the strategic positioning, the business model, the process architecture, the detailed design of the processes and of the technological support for those processes, through to the construction and roll-out of the solution. Algorithms suited to this type of task, such as DBSCAN and K-Means, are also used. The results obtained segment the customers into seven groups for purchasing behaviour and seven for consumption habit. With this, the questions of when, how much and what the channel's customers buy can be answered. The benefit of the project translates into increased sales, through actions that recover customers in the process of churning and through an increase in the average ticket of customers who buy frequently but with very low billing amounts.
17

Gao, Yang. « Article identification for inventory list in a warehouse environment ». Thesis, Högskolan i Halmstad, Sektionen för Informationsvetenskap, Data– och Elektroteknik (IDE), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-27132.

Full text
Abstract:
In this paper, an object recognition system has been developed that uses local image features. The system can recognize multiple classes of objects in an image, and is basically divided into two parts: object detection and object identification. Object detection is based on SIFT features, which are invariant to image illumination, scaling and rotation. SIFT features extracted from a test image are used to perform reliable matching against a database of SIFT features from known object images. DBSCAN clustering is used for multiple-object detection, and the RANSAC method is used to reduce false detections. Object identification is based on the 'Bag-of-Words' model, a method based on vector quantization of SIFT descriptors of image patches. In this model, K-means clustering and the Support Vector Machine (SVM) classification method are applied.
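The detection stage, extract SIFT keypoints and group them spatially with DBSCAN so that each dense group suggests one object instance, can be sketched with OpenCV (SIFT requires a recent opencv-python or opencv-contrib-python build). The image path, eps in pixels and min_samples are placeholders, not values from the thesis.

import cv2
import numpy as np
from sklearn.cluster import DBSCAN

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder image path
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Group keypoint locations: each dense spatial cluster is a candidate object
pts = np.array([kp.pt for kp in keypoints])
labels = DBSCAN(eps=30, min_samples=10).fit_predict(pts)  # eps in pixels
print("candidate objects:", len(set(labels) - {-1}))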
18

Fumbata, Nandipha. « Industrial policy, institutions and industrial financing in South Africa : the role of the IDC and DBSA, and lessons from Brazil’s BNDES ». Thesis, Rhodes University, 2016. http://hdl.handle.net/10962/d1021278.

Full text
Abstract:
Institutions, particularly development finance institutions (DFIs) have been instrumental in economic development and the implementation of industrial policy throughout history. In 2007, the South African government identified the country’s DFIs as key to the implementation of its new industrial policy framework with the main objective of job creation. This thesis examines the impact that South Africa’s DFIs, particularly the IDC and the DBSA, have had on employment creation from 2010 to 2014. A comparative institutional approach is adopted in a case study analysis examining the role of the state in industrial financing. The financing activities of Brazil’s BNDES are explored by comparison to determine if there are possible lessons for South Africa. An analysis of the DFIs’ financial and annual reports and government policy documents is conducted. The political settlements framework is used as a basis for understanding the balance of power within the country and the impact this has had on the country’s industrial policy and industrial finance. The thesis finds that the financing activities of South Africa’s DFIs, particularly the IDC, have been directed at large scale capital intensive projects, with a large portion of disbursements channelled towards mining and mineral beneficiation. These sectors have also facilitated the most number of jobs. Even though the activities of the country’s DFIs are consistent with South Africa’s industrial policy and have facilitated job creation, it is evident that these efforts have not been on a scale that is large enough to reduce unemployment. Despite the DFIs’ efforts, there has been an increase in the number of unemployed South Africans between 2010 and 2014.
19

Tshabalala, Alfred Mshengu. « Financing public hospitals in South Africa : the case of the Industrial Development Corporation (IDC) and the Development Bank of Southern Africa (DBSA) ». Thesis, Stellenbosch : Stellenbosch University, 2015. http://hdl.handle.net/10019.1/97444.

Full text
Abstract:
Thesis (MDF)--Stellenbosch University, 2015.
ENGLISH ABSTRACT: The research on this topic was motivated by concern about the state of disarray of public hospital infrastructure, and by the fact that, due to budget constraints across the globe, governments can no longer afford to provide public health services alone, without the assistance of the private sector. The South African public healthcare system continues to function in a state of disarray. Public hospitals serve the vast majority of the South African population but are underfunded, and in most cases these hospitals have ailing infrastructure. The study looks at mechanisms for funding public hospitals and examines the role that the Industrial Development Corporation and the Development Bank of Southern Africa can play in addressing the gap that exists in funding public hospitals. It attempts to answer the following questions: how is public healthcare financed in South Africa, what are the major challenges in financing public hospitals, what role do the Industrial Development Corporation and the Development Bank of Southern Africa currently play in funding public hospitals, and what other possible solutions could address these challenges. The findings indicate that, despite government funding of public hospitals, there is a shortfall of funds for hospitals to complete the projects they are engaged in. Chris Hani Baragwanath Academic Hospital and five other hospitals in KwaZulu-Natal were examined, confirming that there is definitely a gap in the funding of public hospitals.
20

Lundstedt, Magnus. « Implementation and Evaluation of Image Retrieval Method Utilizing Geographic Location Metadata ». Thesis, Uppsala universitet, Teknisk-naturvetenskapliga fakulteten, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-171865.

Full text
Abstract:
Multimedia retrieval systems are very important today, with millions of content creators all over the world generating huge multimedia archives. Recent developments allow for content-based image and video retrieval, but these methods are often quite slow, especially when applied to a library of millions of media items. In this research a novel image retrieval method is proposed that utilizes spatial metadata on images. By finding clusters of images based on their geographic location, the spatial metadata, and combining this information with existing content-based image retrieval algorithms, the proposed method enables efficient presentation of high-quality image retrieval results to system users. The clustering methods considered include Vector Quantization, Vector Quantization LBG and DBSCAN. Clustering was performed on three different similarity measures: spatial metadata, histogram similarity and texture similarity. For histogram similarity there are many different distance metrics to use when comparing histograms; Euclidean, Quadratic Form and Earth Mover's Distance were studied, as well as three different color spaces: RGB, HSV and CIE Lab.
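Of the similarity measures listed above, histogram comparison is the simplest to demonstrate. The sketch below builds HSV color histograms for two images and compares them with the Euclidean distance; the image paths, bin counts and the choice of OpenCV are assumptions for the example, not details from the thesis.

import cv2
import numpy as np

def hsv_histogram(path, bins=(8, 8, 8)):
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([img], [0, 1, 2], None, bins, [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()   # normalize so image size doesn't matter

h1 = hsv_histogram("photo_a.jpg")                # placeholder paths
h2 = hsv_histogram("photo_b.jpg")
print("Euclidean distance:", np.linalg.norm(h1 - h2))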
21

Arce, Munoz Samuel. « Optimized 3D Reconstruction for Infrastructure Inspection with Automated Structure from Motion and Machine Learning Methods ». BYU ScholarsArchive, 2020. https://scholarsarchive.byu.edu/etd/8469.

Full text
Abstract:
Infrastructure monitoring is being transformed by advancements in remote sensing, unmanned vehicles and information technology. The broad interaction among these fields and the availability of reliable commercial technology are helping pioneer intelligent inspection methods based on digital 3D models. Commercially available Unmanned Aerial Vehicles (UAVs) have been used to create 3D photogrammetric models of industrial equipment. However, the level of automation of these missions remains low. Limited flight time, wireless transfer of large files and the lack of algorithms to guide a UAV through unknown environments are some of the factors that constrain fully automated UAV inspections. This work demonstrates the use of unsupervised machine learning methods to develop an algorithm capable of constructing a 3D model of an unknown environment in an autonomous, iterative way. The capabilities of this novel approach are tested in a field study, where a municipal water tank is mapped to a level of resolution comparable to that of manual missions by experienced engineers but using 63%. The iterative approach also shows improvements in autonomy and model coverage when compared to reproducible automated flights. Additionally, the use of this algorithm for different terrains is explored through simulation software, exposing the effectiveness of the automated iterative approach in other applications.
22

Peabody, Dustin P. « Detecting Metagame Shifts in League of Legends Using Unsupervised Learning ». ScholarWorks@UNO, 2018. https://scholarworks.uno.edu/td/2482.

Full text
Abstract:
Over the many years since their inception, the complexity of video games has risen considerably. With this increase in complexity comes an increase in the number of possible choices for players and increased difficulty for developers who try to balance the effectiveness of these choices. In this thesis we demonstrate that unsupervised learning can give game developers extra insight into their own games, providing them with a tool that can potentially alert them to problems faster than they would otherwise be able to find them. Specifically, we use DBSCAN to look at League of Legends and the metagame players have formed with their choices, and attempt to detect when the metagame shifts, possibly giving the developer insight into what changes they should effect to achieve a more balanced, fun game.
23

Lindroth, Henriksson Amelia. « Unsupervised Anomaly Detection on Time Series Data : An Implementation on Electricity Consumption Series ». Thesis, KTH, Matematisk statistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301731.

Full text
Abstract:
Digitization of the energy industry, the introduction of smart grids, and increasing regulation of electricity consumption metering have resulted in vast amounts of electricity data. This data presents a unique opportunity to understand electricity usage and make it more efficient, reducing electricity consumption and carbon emissions. An important initial step in analyzing the data is to identify anomalies. In this thesis the problem of anomaly detection in electricity consumption series is addressed using four machine learning methods: density-based spatial clustering of applications with noise (DBSCAN), local outlier factor (LOF), isolation forest (iForest) and one-class support vector machine (OC-SVM). In order to evaluate the methods, synthetic anomalies were introduced into the electricity consumption series and the methods were then evaluated for the two anomaly types: point anomalies and collective anomalies. In addition to electricity consumption data, features describing the prior consumption, outdoor temperature and date-time properties were included in the models. Results indicate that the addition of the temperature feature and the lag features generally impaired anomaly detection performance, while the inclusion of date-time features improved it. Of the four methods, OC-SVM was found to perform best at detecting point anomalies, while LOF performed best at detecting collective anomalies. In an attempt to improve the models' detection power, the electricity consumption series were de-trended and de-seasonalized and the same experiments were carried out. The models did not perform better on the decomposed series than on the non-decomposed ones.
The digitization of the electricity industry, the introduction of smart grids and increased regulation of electricity metering have resulted in large amounts of electricity data. This data creates a unique opportunity to analyse and understand the electricity consumption of properties in order to make it more efficient. An important first step in analysing this data is to identify possible anomalies. In this thesis, four different machine learning methods for detecting anomalies in electricity consumption series are tested: density-based spatial clustering of applications with noise (DBSCAN), local outlier factor (LOF), isolation forest (iForest) and one-class support vector machine (OC-SVM). In order to evaluate the methods, synthetic anomalies were introduced into the electricity consumption series, and the four methods were then evaluated for the two anomaly types point anomaly and collective anomaly. In addition to the electricity consumption data, variables describing prior consumption, outdoor temperature and date-time properties were included in the models. The results indicate that the addition of the temperature variable and the lag variables generally impaired the models' performance, while the introduction of the date-time variables improved it. Of the four methods, OC-SVM proved best at detecting point anomalies, while LOF was best at detecting collective anomalies. In an attempt to improve the models' detection power, the same experiments were carried out after the electricity consumption series had been de-trended and de-seasonalized. The models did not perform better on the decomposed series than on the non-decomposed ones.
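For readers who want to experiment with the four methods compared in this thesis, here is a minimal, self-contained Python sketch (assuming scikit-learn; all parameter values are illustrative, not the thesis's tuned settings) that injects synthetic point anomalies into a toy consumption series and flags outliers with each method:

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.ensemble import IsolationForest
    from sklearn.neighbors import LocalOutlierFactor
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    consumption = np.sin(np.linspace(0, 20 * np.pi, 2000)) + rng.normal(0, 0.1, 2000)
    consumption[::250] += 3.0  # synthetic point anomalies

    X = StandardScaler().fit_transform(consumption.reshape(-1, 1))

    flags = {
        "DBSCAN": DBSCAN(eps=0.3, min_samples=10).fit_predict(X) == -1,
        "LOF": LocalOutlierFactor(n_neighbors=30).fit_predict(X) == -1,
        "iForest": IsolationForest(random_state=0).fit_predict(X) == -1,
        "OC-SVM": OneClassSVM(nu=0.01).fit_predict(X) == -1,
    }
    for name, flagged in flags.items():
        print(name, int(flagged.sum()), "points flagged")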
Styles APA, Harvard, Vancouver, ISO, etc.
24

Bergqvist, Jonathan. « Study of Protein Interfaces with Clustering ». Thesis, Linköpings universitet, Bioinformatik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-152471.

Texte intégral
Résumé :
Protein-protein interactions occur in nature and have different functions. The interacting surface between two interacting proteins contains the respective proteins' interface residues. In this thesis, a series of Python scripts is presented which can perform interface-interface comparisons with the method InterComp, to obtain a distance matrix of different protein interfaces. The distance matrix can be studied with the use of clustering algorithms such as DBSCAN. The result of clustering using DBSCAN shows that, for the 77,017 protein interfaces studied, a majority of the protein interfaces are part of a single cluster while most of the remaining interfaces are noise for the tested parameters Eps and MinPts. The thesis concludes by characterising the effect of the tested parameters Eps and MinPts on the number of clusters produced when performing DBSCAN.
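DBSCAN can consume a precomputed distance matrix directly, which is the natural fit for the interface-interface distances produced by InterComp. A minimal Python sketch (scikit-learn assumed; the random matrix below is only a stand-in for the real InterComp output):

    import numpy as np
    from sklearn.cluster import DBSCAN

    # D stands in for the symmetric interface-interface distance matrix
    # produced by InterComp (here generated from random points).
    rng = np.random.default_rng(1)
    pts = rng.random((200, 5))
    D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

    labels = DBSCAN(eps=0.4, min_samples=5, metric="precomputed").fit_predict(D)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    print(n_clusters, "clusters;", int((labels == -1).sum()), "interfaces labelled noise")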
Styles APA, Harvard, Vancouver, ISO, etc.
25

Jared, Mohammed Iqbal. « The development region as opposed to the "Homeland" as the essential element of regional development policy ». University of the Western Cape, 1991. http://hdl.handle.net/11394/7855.

Texte intégral
Résumé :
Magister Economicae - MEcon
This study is an evaluation of development strategies that have been followed in South Africa. Lebowa is used as a case study for an assessment of the present strategy. The basic question is whether or not it is economically, politically and socially effective to follow the "homeland" development strategy. This approach places "homeland" states within confined political borders. Development policies are also confined to these borders. An alternative is to follow a broader regional development strategy that spans across both political and economic borders. This may provide a more feasible approach to development. The present regional pattern of development, which focuses mainly on industrial decentralization, is discussed. The evaluation of the present strategy explores various other alternatives which may provide for a more effective regional development policy. In this context an assessment of 'backward regions/homelands' is provided. The central problem addressed is the country or 'homeland' versus regional orientation. To understand the problem, the core-periphery view on South Africa's regional growth pattern is utilized. The PWV, Durban/Pinetown and the Cape metropole areas may be taken as "core", where most of the economic activity takes place. One can also distinguish between the "inner-periphery", which is close to the core, and the "outer-periphery", further away from the core and which includes the Black Homelands. This core-periphery approach provides an understanding of the polarisation effect, which results in the "homelands" becoming poorer, whilst the urban areas grow richer. The main criticism of the modernisation or diffusionist approach is that the "trickle-down" or spread-effect from the core to the other regions does not really take place. Thus, regional aspirations are not satisfied. The South African Government's attempts to counter some of the forces of concentration have been questionable. Within the context of the diffusionist paradigm, trickle-down effects have not occurred because of the super-imposition of a political ideology onto this approach. Rather these areas are the result of polarization (re-inforced by political consideration) brought about by the concept of separate development. It is clear that South Africa's approach to regional development is in a process of change. This is mainly due to the failure of the "homelands" strategy. Since the mid 1970s it has become increasingly clear that the "homelands" could not really become economically independent (and internationally recognised), and that a development strategy concentrating on each "homeland" would be uneconomic and inefficient. Economists, planners and politicians critical of this approach have suggested that the whole South African economy should be planned as one unified economy, even if the homelands still maintain political independence. The nine development region mapping of South Africa, Regions A - J, came about as a result of attempting to address South Africa as a more unified economy. Up to now, the proposed role of the regions has not been clearly stated.
Styles APA, Harvard, Vancouver, ISO, etc.
26

Faccioli, Caterina. « Spatial analysis in pathomics : a network based method applied on fluorescence microscopy ». Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25122/.

Texte intégral
Résumé :
Recently, some applications of spatial statistics in histopathology have been explored, thanks also to the development of innovative digital imaging techniques and machine learning algorithms. However, it seems that in all these studies only global spatial measures are considered, usually analysed in combination with other techniques that depart from spatial statistics. In this thesis we developed a new spatial-statistics-based method for histopathological image analysis, which exploits local spatial features derived from the coordinates in space and the area of the cells. These features are mostly based on the reciprocal distances between cells and also include network-related measures. The dataset we analysed consisted of many sections of lymphoid tissue, for which fluorescence measures obtained with a particular multiplexing technique were also available. We performed clustering on these fluorescence features in order to obtain reference labels for our points. Then we applied a supervised learning algorithm in order to predict the fluorescence labels from the spatial features. We measured the performance of our predictions by computing the difference between the accuracy of the classifier we applied and that of a random classifier. The accuracy score of our classifier was greater than that of the dummy classifier in every image. From a qualitative point of view, by comparing the achieved predictions with the clustering of the fluorescence features of our images we obtained good results (verified by a senior histopathologist), often managing to identify the zone around the germinal centres of the lymph nodes and other structures. We consider these results encouraging, since they prove the predictive capability of our spatial features towards biological structures. The potential of this work is considerable: these features could strongly enhance the results obtainable from fluorescence imaging, making it possible to resolve previously indistinguishable structures.
Styles APA, Harvard, Vancouver, ISO, etc.
27

Hodzic, Amer, et Danny Hoang. « Detection of Deviations in Beehives Based on Sound Analysis and Machine Learning ». Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-105316.

Texte intégral
Résumé :
Honeybees are an essential part of our ecosystem as they take care of most of the pollination in the world. They also produce honey, which is the main reason beekeeping was introduced in the first place. As the production of honey is affected by the living conditions of the honeybees, the beekeepers aim to maintain the health of the honeybee societies. TietoEVRY, together with HSB Living Lab, introduced connected beehives in a project named BeeLab. The goal of BeeLab is to provide a service to monitor and gain knowledge about honeybees using the data collected with different sensors. Today they measure weight, temperature, air pressure, and humidity. It is known that honeybees produce different sounds when different events are occurring in the beehive. Therefore BeeLab wants to introduce sound monitoring to their service. This project aims to investigate the possibility of detecting deviations in beehives based on sound analysis and machine learning. This includes recording sound from beehives followed by preprocessing of sound data, feature extraction, and applying a machine learning algorithm on the sound data. An experiment is done using Mel-Frequency Cepstral Coefficients (MFCC) to extract sound features and applying the DBSCAN machine learning algorithm to investigate the possibilities of detecting deviations in the sound data. The experiment showed promising results as deviating sounds used in the experiment were grouped into different clusters.
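A minimal Python sketch of the pipeline described in the abstract, assuming librosa for MFCC extraction and scikit-learn for DBSCAN; the feature summary (per-band mean and standard deviation) and all parameter values are illustrative assumptions, not the thesis's exact configuration:

    import numpy as np
    import librosa  # assumed available for MFCC extraction
    from sklearn.cluster import DBSCAN
    from sklearn.preprocessing import StandardScaler

    def mfcc_features(path, sr=22050, n_mfcc=13):
        # one feature vector per recording: mean and std of each MFCC band
        y, sr = librosa.load(path, sr=sr)
        m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return np.concatenate([m.mean(axis=1), m.std(axis=1)])

    def cluster_recordings(paths, eps=2.0, min_samples=5):
        # paths: hypothetical list of hive recordings; deviating sounds are
        # expected to fall into separate clusters or be labelled noise (-1)
        X = StandardScaler().fit_transform([mfcc_features(p) for p in paths])
        return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)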
Styles APA, Harvard, Vancouver, ISO, etc.
28

Bungula, Wako Tasisa. « Bi-filtration and stability of TDA mapper for point cloud data ». Diss., University of Iowa, 2019. https://ir.uiowa.edu/etd/6918.

Texte intégral
Résumé :
TDA mapper is an algorithm used to visualize and analyze big data. TDA mapper is applied to a dataset, X, equipped with a filter function f from X to R. The output of the algorithm is an abstract graph (or simplicial complex). The abstract graph captures topological and geometric information of the underlying space of X. One of the interests in TDA mapper is to study whether or not a mapper graph is stable. That is, if a dataset X is perturbed by a small value ∂, with the perturbed dataset denoted X∂, we would like to compare the TDA mapper graph of X to the TDA mapper graph of X∂. Given a topological space X, if the cover of the image of f satisfies certain conditions, Tamal Dey, Facundo Memoli, and Yusu Wang proved that the TDA mapper is stable. That is, the mapper graph of X differs from the mapper graph of X∂ by a small value measured via homology. The goal of this thesis is three-fold. The first is to introduce a modified TDA mapper algorithm. The fundamental difference between TDA mapper and the modified version is that the modified version avoids the use of a filter function. In comparing the mapper graph outputs, the proposed modified mapper is shown to capture more geometric and topological features. We discuss the advantages and disadvantages of the modified mapper. Tamal Dey, Facundo Memoli, and Yusu Wang showed that a filtration of covers induces a filtration of simplicial complexes, which in turn induces a filtration of homology groups. While Tamal Dey, Facundo Memoli, and Yusu Wang focused on TDA mapper's application to topological space, the second goal of this thesis is to show that DBSCAN clustering gives a filtration of covers when TDA mapper is applied to a point cloud. Hence, DBSCAN gives a filtration of mapper graphs (simplicial complexes) and homology groups. More importantly, DBSCAN gives a filtration of covers, mapper graphs, and homology groups in three parameter directions: bin size, epsilon, and MinPts. Hence, there is a multi-dimensional filtration of covers, mapper graphs, and homology groups. We also note that single-linkage clustering is a special case of DBSCAN clustering, so the results proved for DBSCAN also hold when single-linkage is used. However, complete-linkage does not give a filtration of covers in the bin-size direction, hence no filtration of simplicial complexes and homology groups exists when complete-linkage is applied to cluster a dataset. In general, the results hold for any clustering algorithm that gives a filtration of covers. The third (and last) goal of this thesis is to prove that two multi-dimensional persistence modules (one: with respect to the original dataset, X; two: with respect to the ∂-perturbation of X) are 2∂-interleaved. In other words, the mapper graphs of X and X∂ differ by a small value as measured by homology.
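The fact that DBSCAN yields a filtration of covers in the epsilon direction can be seen concretely: with MinPts fixed, growing epsilon only merges clusters and absorbs noise points, so each clustering is nested inside the next. A small Python illustration (scikit-learn assumed; the data and parameter grid are invented):

    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(c, 0.3, (50, 2)) for c in ([0, 0], [3, 0], [0, 3])])

    # Increasing eps (MinPts fixed) yields coarser and coarser clusterings:
    # each cluster at a smaller eps is contained in a cluster at a larger eps,
    # which is the cover filtration discussed in the thesis.
    for eps in (0.2, 0.4, 0.8, 1.6):
        labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
        n = len(set(labels)) - (1 if -1 in labels else 0)
        print(f"eps={eps}: {n} clusters, {(labels == -1).sum()} noise points")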
Styles APA, Harvard, Vancouver, ISO, etc.
29

Hanna, Peter, et Erik Swartling. « Anomaly Detection in Time Series Data using Unsupervised Machine Learning Methods : A Clustering-Based Approach ». Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273630.

Texte intégral
Résumé :
For many companies in the manufacturing industry, attempting to find damage in their products is a vital process, especially during the production phase. Since applying different machine learning techniques can further aid the process of damage identification, it has become a popular choice among companies to make use of these methods to enhance the production process even further. For some industries, damage identification can be heavily linked with anomaly detection of different measurements. In this thesis, the aim is to construct unsupervised machine learning models to identify anomalies in unlabeled measurements of pumps, using high-frequency sampled current and voltage time series data. Each measurement can be split into five different phases, namely the startup phase, three duty point phases and lastly the shutdown phase. The approach is based on clustering methods, where the main algorithms used are the density-based algorithms DBSCAN and LOF. Dimensionality reduction techniques, such as feature extraction and feature selection, are applied to the data, and after constructing the five models, one for each phase, it can be seen that the models identify anomalies in the given data set.
For many companies in the manufacturing industry, troubleshooting products is a fundamental task in the production process. Since various machine learning methods have proven to contain useful techniques for finding faults in products, these methods are a popular choice among companies that want to improve the production process further. For some industries, fault detection is strongly linked to anomaly detection of different measurements. The aim of this thesis is to construct unsupervised machine learning models to identify anomalies in time series data. More specifically, the data consists of high-frequency measurements of pumps via current and voltage measurements. The measurements consist of five different phases, namely the startup phase, three duty-point phases and the shutdown phase. The machine learning methods are based on different clustering techniques, and the methods used are the DBSCAN and LOF algorithms. In addition, different dimensionality reduction techniques were applied, and after constructing five different models, one for each phase, it can be concluded that the models succeeded in identifying anomalies in the given dataset.
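As an illustration of the per-phase modelling idea (not the thesis's implementation), the Python sketch below builds one LOF-based anomaly detector per operating phase after a PCA reduction; the dictionary layout, feature shapes and parameter values are assumptions:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import LocalOutlierFactor

    def fit_phase_model(features, n_components=10, n_neighbors=20):
        # one anomaly model per operating phase: PCA reduction + LOF;
        # features must have at least n_components columns
        reduced = PCA(n_components=n_components).fit_transform(features)
        lof = LocalOutlierFactor(n_neighbors=n_neighbors)
        return lof.fit_predict(reduced) == -1  # True where anomalous

    # phases: hypothetical dict mapping phase name (startup, duty points,
    # shutdown) to an (n_measurements, n_features) matrix extracted from
    # the current/voltage series of that phase
    def detect(phases):
        return {name: fit_phase_model(F) for name, F in phases.items()}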
Styles APA, Harvard, Vancouver, ISO, etc.
30

Shreepathi, Subrahmanya. « Dodecylbenzenesulfonic Acid : A Surfactant and Dopant for the Synthesis of Processable Polyaniline and its Copolymers ». Doctoral thesis, Universitätsbibliothek Chemnitz, 2006. http://nbn-resolving.de/urn:nbn:de:swb:ch1-200602029.

Texte intégral
Résumé :
The aim of the present work is better processability of polyaniline (PANI), which has so far been a major drawback among conducting polymers. For this purpose a bulky surfactant and dopant, dodecylbenzenesulfonic acid (DBSA), is used. Two different methods are employed for the synthesis of the PANI, described in two chapters of this dissertation. In the first part, PANI-DBSA suspensions were synthesized in a small reaction volume (250 mL), using a binary mixture of 2-propanol and water as solvent to aid solubility. The micelle-assisted synthesis produces green dispersions which show no visible sedimentation after more than a year. A detailed spectroelectrochemical investigation of the PANI-DBSA nanocolloids was carried out and provides a better explanation of the charge-transfer processes between the PANI colloids and the electrode surface. In an alkaline medium the UV-Vis spectrum depends on the mobility of the anions and on an electrokinetic phenomenon. UV-Vis and pre-resonance Raman spectroscopy were used to demonstrate the metal-to-insulator transition in the PANI colloids, which can occur upon changing the pH of the medium. The second part of the dissertation describes a new polymerization technique for aniline and its copolymers with o-toluidine, carried out via inverse emulsion. It uses benzoyl peroxide, a rather unusual organic oxidant. The PANI obtained is completely soluble in common organic solvents such as chloroform. Metallic surfaces or glass can easily be drop-coated with a clear, transparent green solution of PANI. Cyclic voltammetry and spectroelectrochemical methods were employed to investigate the electroactivity, the UV-Vis behaviour and the metal-to-insulator transitions of the chemically synthesized PANI as a function of the applied electrode potential. The electrical conductivity of the materials is relatively high (R = 10 ). SEM investigations show that the amount of added DBSA strongly influences the morphology of the polymer. In situ UV-Vis spectroscopic measurements reveal good electrochromic reversibility of the polymer. DBSA can dope poly(o-toluidine) (POT) effectively, even though the methyl group exerts steric hindrance. The spectroscopic investigations, such as UV-Vis, FT-IR and Raman spectroscopy, together with cyclic voltammetry, clearly show that true copolymers are formed and that the possibility of composites can be ruled out. The resulting poly(aniline-co-o-toluidine) (PAT) is soluble in weakly polar solvents such as chloroform. As expected, the electrical conductivities of the copolymers are much lower than that of PANI-DBSA.
Styles APA, Harvard, Vancouver, ISO, etc.
31

Zanotti, Andrea. « Supporto a query geografiche efficienti su dati spaziali in ambiente Apache Spark ». Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016.

Trouver le texte intégral
Résumé :
This thesis describes the design and implementation of a support layer based on Apache Spark technology for the analysis of positioning data in a Big Data setting. After analysing three extensions specific to the handling of geographic data, the GeoSpark framework was chosen. The density-based clustering technique DBSCAN, optimized to run on a distributed architecture, was integrated into it. A layer dedicated to the automatic optimization of the parameters governing the partitioning of the database across the cluster is also included. Functional and integration tests were run to verify the correct behaviour of the offered functionality and to demonstrate its integration with the features already present. Finally, a test session dedicated to performance analysis was carried out using the Amazon Web Services cloud computing platform, in particular Amazon EMR. In this part our solution was compared with a previous one based on MongoDB technology in order to compare performance. As the experimental results show, our support layer proves to be computationally faster and better optimized.
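The partition-then-cluster idea behind a distributed DBSCAN can be sketched in a few lines of PySpark (a hedged illustration, not the GeoSpark API used in the thesis): points are bucketed into grid cells, DBSCAN runs locally in each cell, and the merge step for clusters crossing cell borders is deliberately omitted. The input file name and all parameters are hypothetical.

    from pyspark import SparkContext
    from sklearn.cluster import DBSCAN  # assumed available on the workers

    def cell(p, size=1.0):
        # grid key for a 2D point
        return (int(p[0] // size), int(p[1] // size))

    def local_dbscan(points, eps=0.1, min_samples=5):
        # cluster the points of one grid cell locally
        pts = list(points)
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)
        return list(zip(pts, labels))

    sc = SparkContext(appName="grid-dbscan-sketch")
    points = sc.textFile("points.csv").map(lambda l: tuple(map(float, l.split(","))))
    clustered = (points
                 .keyBy(cell)
                 .groupByKey()
                 .flatMap(lambda kv: local_dbscan(kv[1])))
    print(clustered.take(5))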
Styles APA, Harvard, Vancouver, ISO, etc.
32

Bjurenfalk, Jonatan, et August Johnson. « Automated error matching system using machine learning and data clustering : Evaluating unsupervised learning methods for categorizing error types, capturing bugs, and detecting outliers ». Thesis, Linköpings universitet, Programvara och system, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177280.

Texte intégral
Résumé :
For large and complex software systems, it is a time-consuming process to manually inspect the error logs produced by the test suites of such systems. Whether it is for identifying abnormal faults or for finding bugs, it is a process that limits development progress and requires experience. An automated solution for such processes could potentially lead to efficient fault identification and bug reporting, while also enabling developers to spend more time on improving system functionality. Three unsupervised clustering algorithms are evaluated for the task: HDBSCAN, DBSCAN, and X-Means. In addition, HDBSCAN, DBSCAN and an LSTM-based autoencoder are evaluated for outlier detection. The dataset consists of error logs produced by a robotic test system. These logs are cleaned and pre-processed using stopword removal, stemming, term frequency-inverse document frequency (tf-idf) and singular value decomposition (SVD). Two domain experts were tasked with evaluating the results produced by clustering and outlier detection. Results indicate that X-Means outperforms the other clustering algorithms when tasked with automatically categorizing error types and capturing bugs. Furthermore, none of the outlier detection methods yielded sufficient results. However, it was found that X-Means clusters with a size of one data point yielded an accurate representation of outliers occurring in the error log dataset. In conclusion, the domain experts deemed X-Means to be a helpful tool for categorizing error types, capturing bugs, and detecting outliers.
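A condensed Python sketch of the preprocessing-plus-clustering pipeline named in the abstract, assuming scikit-learn; stemming is omitted and KMeans stands in for X-Means (which scikit-learn does not provide), so this is an approximation rather than the thesis's exact setup. Singleton clusters are reported as outliers, mirroring the observation above:

    import numpy as np
    from sklearn.cluster import KMeans  # stand-in for X-Means
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer

    def categorize_errors(logs, n_clusters=20, n_components=50):
        # logs: list of raw error-log strings (hypothetical input);
        # n_components must be smaller than the tf-idf vocabulary size
        X = TfidfVectorizer(stop_words="english").fit_transform(logs)
        X = TruncatedSVD(n_components=n_components).fit_transform(X)
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
        sizes = np.bincount(labels)
        outliers = np.flatnonzero(sizes[labels] == 1)  # points in singleton clusters
        return labels, outliers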
Styles APA, Harvard, Vancouver, ISO, etc.
33

Persson, Pontus. « Identifying Early Usage Patterns That Increase User Retention Rates In A Mobile Web Browser ». Thesis, Linköpings universitet, Databas och informationsteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-137793.

Texte intégral
Résumé :
One of the major challenges for modern technology companies is user retention management. This work focuses on identifying early usage patterns that signify increased retention rates in a mobile web browser. This is done using a targeted parallel implementation of the association rule mining algorithm FP-Growth. Different item subset selection techniques, including clustering and other statistical methods, have been used in order to reduce the mining time and allow for lower support thresholds. A lot of interesting rules have been mined. The best rule retention-wise implies a retention rate of 99.5%. The majority of the rules analyzed in this work imply a retention rate increase between 150% and 200%.
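The rule-mining step can be reproduced in miniature with the mlxtend library (assumed installed; the event names and thresholds below are hypothetical): frequent itemsets are found with FP-Growth and then filtered to rules whose consequent is retention.

    import pandas as pd
    from mlxtend.frequent_patterns import fpgrowth, association_rules

    # rows: users; columns: boolean early-usage events (hypothetical names)
    events = pd.DataFrame(
        [[1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 1, 1]],
        columns=["opened_tab", "used_search", "set_default", "retained"],
    ).astype(bool)

    frequent = fpgrowth(events, min_support=0.5, use_colnames=True)
    rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
    # keep rules that predict retention, i.e. 'retained' in the consequent
    retention_rules = rules[rules["consequents"].apply(lambda c: "retained" in c)]
    print(retention_rules[["antecedents", "consequents", "confidence"]])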
Styles APA, Harvard, Vancouver, ISO, etc.
34

Jiménez, González Daniel. « ALGORITMOS DE CLUSTERING PARALELOS EN SISTEMAS DE RECUPERACIÓN DE INFORMACIÓN DISTRIBUIDOS ». Doctoral thesis, Universitat Politècnica de València, 2011. http://hdl.handle.net/10251/11234.

Texte intégral
Résumé :
Information is useful if it is available when needed and can be used. Availability usually comes easily when the information is well structured and ordered and, in addition, is not very extensive. But this situation is not the most common: the amount of information on offer increasingly tends to grow out of all proportion, to be unstructured and to present no clear order. Manual structuring or ordering is unfeasible given the dimensions of the information to be handled. All this makes clear the usefulness, and even the necessity, of good information retrieval systems (IRS). Moreover, another important characteristic is that information naturally tends to appear in distributed form, which implies the need for IRS that can work in distributed environments and with parallelization techniques. This thesis addresses all these aspects by developing and improving methods to obtain IRS with better performance, both in retrieval quality and in computational efficiency, and which can furthermore operate from the standpoint of already distributed systems. The main objective of an IRS is to provide relevant documents and omit those considered irrelevant with respect to a given query. Some of the most notable problems of IRS are: polysemy and synonymy; related words (words that have one meaning together and another separately); the enormity of the information to be handled; the heterogeneity of the documents; etc. Of all these, this thesis focuses on polysemy and synonymy, on related words (indirectly, through semantic lemmatization) and on the enormity of the information to be handled. The development of an IRS basically comprises four distinct phases: preprocessing, modelling, evaluation and use. Preprocessing, which involves the actions necessary to transform the documents of the collection into a data structure containing the relevant information of the documents, has been an important part of the study in this thesis. In this phase we focused on reducing the data and structures to be handled while maximizing the information contained. Modelling, the phase that defines the structure and behaviour of the IRS, has been the most analysed and developed phase in this thesis.
Jiménez González, D. (2011). ALGORITMOS DE CLUSTERING PARALELOS EN SISTEMAS DE RECUPERACIÓN DE INFORMACIÓN DISTRIBUIDOS [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/11234
Styles APA, Harvard, Vancouver, ISO, etc.
35

Boccali, Matteo. « Tecniche di Machine Learning Non Supervisionato per riconoscimento licenze ». Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Trouver le texte intégral
Résumé :
The work concerns the application of unsupervised machine learning techniques for pattern recognition on a dataset made up of software licence texts. Techniques for transforming the text into vectors are applied, so that the vectors can then be used as input data for unsupervised algorithms. Several dimensionality reduction and transformation techniques were applied in an attempt to increase the quality of the vector representation. Techniques were also applied to try to optimize the results of the unsupervised algorithms.
Styles APA, Harvard, Vancouver, ISO, etc.
36

Behara, Krishna Nikhil Sumanth. « Origin-destination matrix estimation using big traffic data : A structural perspective ». Thesis, Queensland University of Technology, 2019. https://eprints.qut.edu.au/132444/1/Krishna_Behara_Thesis.pdf.

Texte intégral
Résumé :
With ever increasing traffic demand, cities are facing more serious problems from traffic congestion. It is extremely important to have an accurate estimation of travel demand for strategic planning and control. Lack of such knowledge before implementing major transport infrastructure projects could result in huge economic losses. Thus, this research develops new methods using big traffic data, which are thoroughly tested on the Brisbane network. These methods can be readily integrated into the existing state of the art and practice for a better estimation of travel demand.
Styles APA, Harvard, Vancouver, ISO, etc.
37

Fešar, Marek. « Analýza dat na sociálních sítích s využitím dolování dat ». Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236086.

Texte intégral
Résumé :
The thesis presents general principles of data mining and also focuses on the specific needs of social networks. Selected social networks, chosen with respect to their popularity and availability to Czech users, are discussed from various points of view, and the benefits and drawbacks of each are mentioned. Afterwards, one suitable API is selected for further analysis. The project explains harvesting data via the Twitter API and the process of mining data from this particular network. The design of a mining algorithm inspired by density-based clustering methods is described. The implementation is explained in its own chapter, preceded by a thorough explanation of the MVC architectural pattern. In the end, some examples of how the gathered knowledge can be used are shown, as well as possibilities for future extensions.
Styles APA, Harvard, Vancouver, ISO, etc.
38

Hezoučký, Ladislav. « Nástroj pro shlukovou analýzu ». Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237169.

Texte intégral
Résumé :
The master's thesis deals with cluster data analysis. Basic concepts and methods from this domain are explained. The result of the thesis is a cluster analysis tool in which the methods K-Medoids and DBSCAN are implemented. Results obtained on real data are compared with the programs Rapid Miner and SAS Enterprise Miner.
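A minimal Python sketch comparing the two implemented methods on toy data; K-Medoids is taken here from the scikit-learn-extra package (an assumption for illustration, since the thesis tool is a standalone implementation):

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn_extra.cluster import KMedoids  # assumed installed (scikit-learn-extra)

    rng = np.random.default_rng(3)
    X = np.vstack([rng.normal(c, 0.2, (40, 2)) for c in ([0, 0], [2, 2])])

    print("K-Medoids:", KMedoids(n_clusters=2, random_state=0).fit_predict(X)[:10])
    print("DBSCAN:  ", DBSCAN(eps=0.3, min_samples=5).fit_predict(X)[:10])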
Styles APA, Harvard, Vancouver, ISO, etc.
39

Cova, Riccardo. « Analisi di dati citofluorimetrici con tecniche di Data Mining ». Master's thesis, Alma Mater Studiorum - Università di Bologna, 2012. http://amslaurea.unibo.it/4774/.

Texte intégral
Résumé :
The flow cytometer is an instrument used in genetic biology to analyse cell samples: it analyses the cells contained in a sample individually and extracts, for each cell, a series of physical properties (features) that describe it. The objective of this work is to devise an integrated methodology that uses this information by modelling, automating and extending some procedures that are today performed manually by domain experts in the analysis of certain parameters of the ejaculate. This requires the development of biochemical techniques for marking the cells and computational techniques for analysing the data. The first step involves building a classifier that, on the basis of the cells' features, classifies them and thus makes it possible to isolate the cells of interest for a particular examination. The second involves analysing the cells of interest, extracting aggregate features that may be indicators of certain pathologies. The requirement is the generation of an explanatory report that illustrates, in the most suitable way, the conclusions reached and that can serve as a decision support system for the physician/biologist.
Styles APA, Harvard, Vancouver, ISO, etc.
40

Di, Marzo Giuseppe. « Advanced Analytics per il Marketing : clustering dei clienti fidelizzati ». Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Trouver le texte intégral
Résumé :
This thesis describes a cluster analysis project carried out at Iconsulting for a fashion retail company. The aim of the project is to cluster the company's loyalty-card customers and to test various platforms that support advanced analytics, in particular clustering. The clusterings were performed in the R language; several clustering algorithms were tried, taking into account efficiency and quality of the result. K-means was considered the most suitable of the algorithms tested. A small analysis of outliers was carried out, again using R, which led to important reflections on the treatment of monetary quantities. A quality metric based on the silhouette coefficient, with linear time complexity, was developed and implemented in R; this metric was used to choose the clustering with the highest quality from several clusterings obtained with k-means while varying the number of clusters. The loyalty-card customers were clustered several times, each time considering different aspects of their purchasing habits, given the impossibility of treating the customer at the maximum level of detail. Each clustering was produced by repeatedly applying the k-means algorithm to the prepared dataset, varying k from 2 to 10 and then choosing the result with the highest quality. The clusterings obtained are satisfactory and informative and will be used for future marketing and CRM campaigns.
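The linear-complexity quality metric can be approximated by a centroid-based "simplified silhouette", which needs only point-to-centroid distances (O(nk)) instead of all pairwise distances (O(n^2)). A Python sketch of that metric and the k = 2..10 selection loop, assuming scikit-learn (the thesis's own R implementation may differ in detail):

    import numpy as np
    from sklearn.cluster import KMeans

    def simplified_silhouette(X, labels, centers):
        # centroid-based silhouette: O(n*k) instead of the O(n^2) exact score
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        a = d[np.arange(len(X)), labels]          # distance to own centroid
        d[np.arange(len(X)), labels] = np.inf
        b = d.min(axis=1)                         # distance to nearest other centroid
        return float(np.mean((b - a) / np.maximum(a, b)))

    def best_kmeans(X, k_range=range(2, 11)):
        best = None
        for k in k_range:
            km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
            score = simplified_silhouette(X, km.labels_, km.cluster_centers_)
            if best is None or score > best[0]:
                best = (score, k, km)
        return best  # (quality, k, fitted model)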
Styles APA, Harvard, Vancouver, ISO, etc.
41

Hlaváček, Martin. « Snížení paměťové náročnosti stavového zpracování síťového provozu ». Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236547.

Texte intégral
Résumé :
This master thesis deals with the problem of memory reduction in stateful network traffic processing. Its goal is to explore new possibilities for memory reduction during network processing. As an introduction, this thesis provides the motivation and reasons for seeking new methods of memory reduction. The following part contains theoretical analyses of the NetFlow technology and of two basic methods which can, in principle, reduce the memory demands of stateful processing. The design and implementation of a solution that applies these two methods to the NetFlow architecture is then described. The final part of this work summarizes the main properties of this solution during interaction with real data.
Styles APA, Harvard, Vancouver, ISO, etc.
42

Veta, Jacob E. « Analysis and Development of a Lower Extremity Osteological Monitoring Tool Based on Vibration Data ». Miami University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=miami1595879294258019.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
43

Hlosta, Martin. « Modul pro shlukovou analýzu systému pro dolování z dat ». Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237158.

Texte intégral
Résumé :
This thesis deals with the design and implementation of a cluster analysis module for DataMiner, a data mining system currently being developed at FIT BUT. Until now the system lacked a cluster analysis module, so the main objective of the thesis was to extend the system with such a module. I worked on the module together with Pavel Riedl: we created a common part for all the algorithms so that the system can easily be extended with other clustering algorithms. In the second part, I extended the clustering module with three density-based clustering algorithms - DBSCAN, OPTICS and DENCLUE. The algorithms were implemented, and appropriate sample data were chosen to verify their functionality.
Styles APA, Harvard, Vancouver, ISO, etc.
44

Tomešová, Tereza. « Autonomní jednokanálový deinterleaving ». Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2021. http://www.nusl.cz/ntk/nusl-445470.

Texte intégral
Résumé :
This thesis deals with autonomous single-channel deinterleaving, i.e. the separation of a received sequence of impulses originating from more than one emitter into sequences of impulses from a single emitter, without human assistance. Methods used for deinterleaving can be divided into single-parameter and multiple-parameter methods according to the number of parameters used for separation. This thesis primarily deals with multi-parameter methods. DBSCAN and variational Bayes methods were chosen as appropriate methods for autonomous single-channel deinterleaving. The selected methods were adjusted for deinterleaving and implemented in the Python programming language. Their efficiency is examined on simulated and real data.
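As a toy illustration of multi-parameter deinterleaving (not the thesis's implementation), the sketch below standardizes a few pulse descriptors and lets DBSCAN separate them by emitter; the parameter choices and pulse values are invented:

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.preprocessing import StandardScaler

    # each row is one received pulse described by several parameters
    # (hypothetical: carrier frequency [MHz], pulse width [us], angle of arrival [deg])
    pulses = np.array([
        [9400, 1.0, 45.2], [9401, 1.1, 45.0], [9399, 0.9, 45.1],        # emitter A
        [2800, 20.0, 310.5], [2801, 19.8, 310.4], [2799, 20.2, 310.6],  # emitter B
    ])

    X = StandardScaler().fit_transform(pulses)
    labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
    for lab in set(labels):
        print("emitter" if lab >= 0 else "unassigned", lab, np.where(labels == lab)[0])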
Styles APA, Harvard, Vancouver, ISO, etc.
45

Dywili, Nomxolisi Ruth. « Development of Metal Nanoparticle-Doped Polyanilino-Graphene Oxide High Performance Supercapacitor Cells ». University of the Western Cape, 2018. http://hdl.handle.net/11394/6251.

Texte intégral
Résumé :
Philosophiae Doctor - PhD (Chemistry)
Supercapacitors, also known as ultracapacitors or electrochemical capacitors, are considered one of the most important subjects in electricity and energy storage, which has proven to be problematic for South Africa. In this work, graphene oxide (GO) was decorated with platinum, silver and copper nanoparticles and anchored with dodecylbenzenesulphonic acid (DBSA) doped polyaniline (PANI) to form nanocomposites. Their properties were investigated with different characterization techniques. High resolution transmission electron microscopy (HRTEM) revealed the GO nanosheets to be light, flat and transparent, and they appeared to be larger than 1.5 μm in thickness. This was also confirmed by high resolution scanning electron microscopy (HRSEM), which showed smooth surfaces and wrinkled edges, while energy dispersive X-ray analysis (EDX) confirmed the presence of functional groups containing carbon and oxygen. HRTEM analysis of GO decorated with platinum, silver and copper nanoparticles (NPs) revealed small, uniformly dispersed NPs on the surface of GO with mean particle sizes of 2.3 ± 0.2 nm, 2.6 ± 0.3 nm and 3.5 ± 0.5 nm respectively, and the surface of GO showed increasing roughness as observed in HRSEM micrographs. X-ray fluorescence microscopy (XRF) and EDX confirmed the presence of the platinum, silver and copper nanoparticles on the surface of GO, which appeared in abundance in each spectrum. Anchoring the GO with DBSA-doped PANI revealed that single GO sheets were embedded into the polymer latex, causing the DBSA-PANI particles to become adsorbed on their surfaces; this appeared as dark regions in the HRTEM images. Morphological studies by HRSEM also supported that single GO sheets were embedded into the polymer latex, as the composites appeared as aggregated, bounded particles with smooth and toothed edges.
Styles APA, Harvard, Vancouver, ISO, etc.
46

SANTOS, Danilo Abreu. « Recomendação pedagógica para melhoria da aprendizagem em redações ». Universidade Federal de Campina Grande, 2015. http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/550.

Texte intégral
Résumé :
Online education has grown significantly in recent decades all over the world, becoming a viable option both for those who lack the time to pursue their academic training in person and for those who wish to complement it. There are also those who seek to enter higher education through the National Secondary Education Examination (ENEM) and use this form of teaching to complement their studies, aiming to fill gaps left by their schooling. The ENEM consists of objective questions (subdivided into 4 major areas: Languages and Codes; Mathematics; Human Sciences; and Natural Sciences) and a subjective question (the essay). According to data from the Ministry of Education (MEC), more than 50% of the candidates who took the ENEM in 2014 scored below 500 points on the essay. This research uses pedagogical recommendations based on the text genre used by the ENEM, aiming to improve the writing of the argumentative essay. To this end, the online learning environment MeuTutor was used as the experimental tool. The environment has an essay-writing module in which peer assessment is used to correct the texts written by the students; research shows that the assessment results are significant and quite similar to those obtained by expert teachers. However, merely presenting the essay score by itself does not guarantee improvement in the assessed student's text production. Therefore, aiming at a performance gain in essay production, a pedagogical recommendation module was added to MeuTutor, based on 19 profiles resulting from the use of data mining algorithms (DBSCAN and K-Means) on the 2012 ENEM microdata made available by MEC. These profiles were grouped into 6 blocks containing sets of tasks in the areas of writing, grammar, and textual coherence and agreement. The validation of these recommendations was carried out in a 3-cycle experiment, where in each cycle the student: writes the essay; assesses his or her peers; and carries out the pedagogical recommendation received. Statistical analysis of these data showed that the strategic recommendation model used in this research enabled a measurable gain in the quality of text production.
Online education has grown significantly in recent years throughout the world, becoming a viable option for those who do not have the time to pursue traditional technical training or an academic degree. In Brazil, people seek to enter higher education through the National Secondary Education Examination (ENEM) and use online education to complement their studies, aiming to remedy gaps in their school formation. The ENEM consists of objective questions (divided into 4 main areas: Languages and Codes; Mathematics; Social Sciences; and Natural Sciences) and a subjective question (the essay). According to the Brazilian Department of Education (MEC), more than 50% of the candidates who took the ENEM in 2014 obtained performance below 500 points (out of a maximum of 1000 points) for their essays. This research uses educational recommendations based on the five official correction criteria for the ENEM essays to improve writing. Thus, this research used an experimental tool in an online learning environment called MeuTutor. The learning environment has an essay writing/correction module. The correction module uses peer evaluation techniques, for which research shows that the results are significantly similar to those obtained by specialists' correction. However, simply displaying the scores for the criteria does not guarantee an improvement in students' writing. Thus, to promote that, an educational recommendation module was added to MeuTutor. It is based on 19 profiles obtained by mining data from the 2012 ENEM using the algorithms DBSCAN and K-Means, with the profiles grouped into six blocks to which sets of tasks were associated in the areas of writing, grammar and coherence, and textual agreement. The validation of these recommendations was made in an experiment with three cycles, where students should: (1) write the essay; (2) evaluate their peers; (3) perform the pedagogical recommendations received. From the analysis of these data, it was found that the strategic recommendation model used in this study enabled a measurable gain in the quality of textual production.
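One plausible way to combine the two algorithms named above (a hedged sketch, not the thesis's published procedure) is to use DBSCAN to discard sparse, atypical records and K-Means to form the student profiles; the feature layout and parameters are assumptions:

    import numpy as np
    from sklearn.cluster import DBSCAN, KMeans
    from sklearn.preprocessing import StandardScaler

    # scores: hypothetical (n_students, 5) matrix holding the four objective
    # areas plus the essay score, extracted from the ENEM microdata
    def build_profiles(scores, n_profiles=19):
        X = StandardScaler().fit_transform(scores)
        dense = DBSCAN(eps=0.5, min_samples=50).fit_predict(X) != -1  # drop sparse noise
        km = KMeans(n_clusters=n_profiles, n_init=10, random_state=0).fit(X[dense])
        return km  # each centroid is one pedagogical profile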
Styles APA, Harvard, Vancouver, ISO, etc.
47

Málik, Peter. « Získávání znalostí z multimediálních databází ». Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-235525.

Texte intégral
Résumé :
This master"s thesis deals with the knowledge discovery in multimedia databases. It contains general principles of knowledge discovery in databases, especially methods of cluster analysis used for data mining in large and multidimensional databases are described here. The next chapter contains introduction to multimedia databases, focusing on the extraction of low level features from images and video data. The practical part is then an implementation of the methods BIRCH, DBSCAN and k-means for cluster analysis. Final part is dedicated to experiments above TRECVid 2008 dataset and description of achievements.
Styles APA, Harvard, Vancouver, ISO, etc.
48

Shreepathi, Subrahmanya, Hung Van Hoang et Rudolf Holze. « Corrosion Protection Performance and Spectroscopic Investigations of Soluble Conducting Polyaniline-Dodecylbenzenesulfonate Synthesized via Inverse Emulsion Procedure ». Universitätsbibliothek Chemnitz, 2009. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-200900775.

Texte intégral
Résumé :
Corrosion protection performance of a completely soluble polyaniline-dodecylbenzenesulfonic acid salt (PANI-DBSA) on C45 steel has been studied with electrochemical impedance and potentiodynamic measurements. Chloroform is the most suitable solvent to process the pristine PANI-DBSA because of the negligible interaction of the solvent with the polyaniline (PANI) backbone. An anodic shift in the corrosion potential (ΔE ≈ 70 mV), a decrease in the corrosion current and a significant increase in the charge transfer resistance indicate a significant anti-corrosion performance of the soluble PANI deposited on the protected steel surface. Corrosion protection follows the mechanism of formation of a passive oxide layer on the surface of the C45 steel. In situ UV-Vis spectroscopy was used to investigate the differences in the permeability of aqueous anions into PANI-DBSA. Preliminary results of electron diffraction studies show that PANI-DBSA possesses an orthorhombic type of crystal structure. An increase in the feed ratio of DBSA to aniline increases the tendency of the spherical PANI particles to aggregate, as is obvious in transmission electron microscopy. PANI-DBSA slowly loses its electrochemical activity in acid-free electrolyte without undergoing degradation.
Styles APA, Harvard, Vancouver, ISO, etc.
49

Lanzarone, Lorenzo Biagio. « Manutenzione predittiva di macchinari industriali tramite tecniche di intelligenza artificiale : una valutazione sperimentale ». Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22853/.

Texte intégral
Résumé :
Society is undergoing a process of technological evolution which creates a connection between the physical and digital environments in order to exchange data and information. In the context of Industry 4.0, this thesis explores the topic of predictive maintenance of industrial machinery using artificial intelligence techniques, to predict in advance the occurrence of an imminent failure, identifying it even before it can occur. The thesis is divided into two complementary parts: the first part covers the theoretical aspects of the context and the state of the art, while the second covers the practical and design aspects. In particular, the first part provides an overview of Industry 4.0 and of one of its applications, predictive maintenance. It then addresses the topics of artificial intelligence and Data Science, through which predictive maintenance can be applied. The second part presents a practical project, namely the work I carried out during an internship at the software house Open Data in Funo di Argelato (Bologna). The goal of the project was to build a predictive maintenance system for industrial plastic injection moulding machinery using artificial intelligence techniques. The ultimate aim is the integration of this system into the Opera MES software developed by the company.
Styles APA, Harvard, Vancouver, ISO, etc.
50

Boulogne, Fleur Anne. « Building sustainable communities through participation : analysing the transition from participatory planning to implementation in the case of the Grabouw Sustainable Development Initiative ». Thesis, Stellenbosch : University of Stellenbosch, 2010. http://hdl.handle.net/10019.1/4273.

Texte intégral
Résumé :
Thesis (MPhil (Sustainable Development, Planning and Management))--University of Stellenbosch, 2010.
ENGLISH ABSTRACT: Through the development of sustainable communities, a transformation process towards a more sustainable way of life can be incited. An important prerequisite of this transformation process is behavioural change. This thesis is based on the supposition that participation can contribute to behavioural change. Behaviour which supports the functioning of sustainable systems is essential to the long-term success of sustainable communities. To sustain this behaviour and create a sense of ownership, participatory processes need to encompass the initial phases of development (planning) as well as the implementation and management phase (governance). To secure participatory involvement in the implementation phase, anchor points that enable the participation of community members need to be created in the planning phase. By means of a case study, this thesis has analysed the role of participation in the pilot project in Grabouw, a medium-sized town in the Western Cape, South Africa. The key objective was to establish whether, and in what manner, the participatory planning process anticipated the involvement of community members in the implementation phase. Research shows that on some occasions participation is defined as an instrument to effectively manage contingencies and facilitate the implementation of government decisions. However, the case studies of Grabouw and Porto Alegre illustrate that community participation can also be organised in such a way that it enables community members to be involved in a meaningful way in decision-making processes, enabling them to shape their own environment. Defined this way, active participation is not merely an instrument but an integral part of a complex system encompassing opportunities for social learning. Active participation can incite a process of 'conscientization' and empowerment, stimulating people to become aware of sustainability challenges and adapt their behaviour accordingly. This viewpoint on participation is in line with the multi-dimensional nature of sustainable development and is based on the need to facilitate a continuously evolving learning system. Furthermore, it supports the notion that sustainable development is not a fixed objective but a moving target. Within this perspective, sustainable communities need to be flexible entities able to evolve in accordance with increased understanding of the complex, interrelated issues of sustainable development.
AFRIKAANSE OPSOMMING (English translation): A transformation process aimed at a more sustainable way of life can be encouraged through the development of sustainable communities. An important precondition for such a transformation process is behavioural change. Behavioural change is not an individual exercise; it is firmly anchored in, and influenced by, social processes. To stimulate behavioural change on a larger scale, individuals need to act as catalysts of behavioural change. Participation plays a prominent role in setting up sustainable communities as platforms for sustainable behavioural change. The existing variety among the different levels of participation makes it difficult to formulate one clearly defined definition of participation. Government and other community role-players have realised the value of participation, and it has become common practice to involve members of the community in the planning and/or governance of sustainable urban development. Complexity theory offers a valuable perspective in the pursuit of a deeper understanding of the opportunities and limitations of participation. By means of a case study, this thesis analysed the role of participation in the pilot project at Grabouw, a medium-sized town in the Western Cape. The research done for this thesis formed part of an evaluation study commissioned by the Development Bank of Southern Africa and carried out by the Environmental Evaluation Unit at the University of Cape Town (UCT). The research showed that in some cases participation is defined as an instrument to manage circumstances effectively and to smooth the implementation of government decisions. The case studies of Grabouw and Porto Alegre, however, show that participation can also be organised in such a way that it enables members of the community to become involved in decision-making processes in a meaningful way and thereby shape their own environment. Active participation defined in this way is not an instrument but an integral part of a complex system that encompasses opportunities for social learning. Active participation can encourage a process of 'conscientization' and empowerment, stimulating people to become aware of sustainability challenges and to adapt their behaviour accordingly. This view of participation is in line with the multi-dimensional nature of sustainable development and is based on the need to facilitate a continuously evolving learning system. It furthermore supports the notion that sustainable development is not a fixed objective but a moving target. Within this perspective, sustainable communities need to be flexible entities able to evolve in step with a growing understanding of the complex, interrelated issues of sustainable development.
Styles APA, Harvard, Vancouver, ISO, etc.