To see the other types of publications on this topic, follow the link: ML algorithm.

Dissertations / Theses on the topic 'ML algorithm'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 22 dissertations / theses for your research on the topic 'ML algorithm.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Krüger, Franz David, and Mohamad Nabeel. "Hyperparameter Tuning Using Genetic Algorithms : A study of genetic algorithms impact and performance for optimization of ML algorithms." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-42404.

Full text
Abstract:
Maskininlärning har blivit allt vanligare inom näringslivet. Informationsinsamling med Data mining (DM) har expanderats och DM-utövare använder en mängd tumregler för att effektivisera tillvägagångssättet genom att undvika en anständig tid att ställa in hyperparametrarna för en given ML-algoritm för nå bästa träffsäkerhet. Förslaget i denna rapport är att införa ett tillvägagångssätt som systematiskt optimerar ML-algoritmerna med hjälp av genetiska algoritmer (GA), utvärderar om och hur modellen ska konstrueras för att hitta globala lösningar för en specifik datamängd. Genom att implementera genetiska algoritmer på två utvalda ML-algoritmer, K-nearest neighbors och Random forest, med två numeriska datamängder, Iris-datauppsättning och Wisconsin-bröstcancerdatamängd. Modellen utvärderas med träffsäkerhet och beräkningstid som sedan jämförs med sökmetoden exhaustive search. Resultatet har visat att GA fungerar bra för att hitta bra träffsäkerhetspoäng på en rimlig tid. Det finns vissa begränsningar eftersom parameterns betydelse varierar för olika ML-algoritmer.
As machine learning (ML) is becoming more and more frequent in the business world, information gathering through Data mining (DM) is on the rise, and DM practitioners generally use several rules of thumb to avoid spending a considerable amount of time tuning the hyperparameters (parameters that control the learning process) of an ML algorithm to gain a high accuracy score. The proposal in this report is to conduct an approach that systematically optimizes the ML algorithms using genetic algorithms (GA) and to evaluate if and how the model should be constructed to find global solutions for a specific data set. By implementing a GA approach on two ML algorithms, K-nearest neighbors and Random Forest, on two numerical data sets, the Iris data set and the Wisconsin breast cancer data set, the model is evaluated by its accuracy scores as well as the computational time, which is then compared against a search method, specifically exhaustive search. The results suggest that GA works well in finding good accuracy scores in a reasonable amount of time. There are some limitations, as a parameter's significance to an ML algorithm may vary.
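The sketch below illustrates, in broad strokes, the kind of GA-driven hyperparameter search the abstract describes, tuning the number of neighbours of a K-nearest neighbors classifier on the Iris data set and comparing it against exhaustive search. The population size, mutation scheme and fitness definition are illustrative assumptions, not the settings used in the thesis.

# Minimal GA-style hyperparameter search for KNN on Iris (illustrative only).
import random
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

def fitness(k):
    """Mean 5-fold cross-validation accuracy for a given number of neighbours."""
    model = KNeighborsClassifier(n_neighbors=k)
    return cross_val_score(model, X, y, cv=5).mean()

random.seed(0)
population = [random.randint(1, 30) for _ in range(8)]   # candidate k values
for generation in range(10):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:4]                                  # selection: keep the best half
    children = []
    for _ in range(4):                                    # crossover + small mutation
        a, b = random.sample(parents, 2)
        child = (a + b) // 2 + random.choice([-1, 0, 1])
        children.append(min(max(child, 1), 30))
    population = parents + children

best_k = max(population, key=fitness)
print("GA best k:", best_k, "accuracy:", round(fitness(best_k), 3))

# Exhaustive search baseline over the same range, for comparison.
exhaustive_k = max(range(1, 31), key=fitness)
print("Exhaustive best k:", exhaustive_k, "accuracy:", round(fitness(exhaustive_k), 3))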
APA, Harvard, Vancouver, ISO, and other styles
2

Mohammad, Maruf H. "Blind Acquisition of Short Burst with Per-Survivor Processing (PSP)." Thesis, Virginia Tech, 2002. http://hdl.handle.net/10919/46193.

Full text
Abstract:
This thesis investigates the use of Maximum Likelihood Sequence Estimation (MLSE) in the presence of unknown channel parameters. MLSE is a fundamental problem that is closely related to many modern research areas like Space-Time Coding, Overloaded Array Processing and Multi-User Detection. Per-Survivor Processing (PSP) is a technique for approximating MLSE for unknown channels by embedding channel estimation into the structure of the Viterbi Algorithm (VA). In the case of successful acquisition, the convergence rate of PSP is comparable to that of the pilot-aided RLS algorithm. However, the performance of PSP degrades when certain sequences are transmitted. In this thesis, the blind acquisition characteristics of PSP are discussed. The problematic sequences for any joint ML data and channel estimator are discussed from an analytic perspective. Based on the theory of indistinguishable sequences, modifications to conventional PSP are suggested that improve its acquisition performance significantly. The effect of tree search and list-based algorithms on PSP is also discussed. Proposed improvement techniques are compared for different channels. For higher order channels, complexity issues dominate the choice of algorithms, so PSP with state reduction techniques is considered. Typical misacquisition conditions, transients, and initialization issues are reported.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
3

Deyneka, Alexander. "Metody ekvalizace v digitálních komunikačních systémech." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2011. http://www.nusl.cz/ntk/nusl-218963.

Full text
Abstract:
This thesis is written in English and focuses on equalization in digital communication systems. The theoretical part gives a brief survey of different approaches to equalizer design. The practical part deals with the implementation of the most commonly used equalizers and their adaptation algorithms; its aim is to compare their characteristics and identify the factors that affect equalization quality. Three types of equalizers are examined: the linear equalizer, the decision-feedback equalizer and the ML (maximum likelihood) equalizer. Each equalizer was tested on a model simulating a real transmission chain with complex distortion composed of attenuation, intersymbol interference and additive noise. Based on the implementation, the characteristics of the equalizers were determined and it was established that the ML equalizer has the optimal performance. Adaptation algorithms play a significant role in the performance of all the equalizers mentioned. The thesis studies a family of stochastic algorithms, namely least mean squares (LMS), normalized LMS and variable step-size LMS, together with the RLS algorithm as a representative of the deterministic approach. It was found that RLS converges much faster than the LMS-based algorithms. The factors that influence the performance of the described algorithms were studied. One important factor affecting the convergence speed and stability of the LMS algorithms is the step-size parameter. Another very important factor is the choice of training sequence. It was found that a major disadvantage of the LMS-based algorithms compared with RLS is that the equalization quality depends strongly on the power spectral density of the training sequence.
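As a rough companion to the adaptation algorithms mentioned above, the following sketch trains a linear equalizer with the LMS rule on a toy channel exhibiting intersymbol interference and additive noise. The channel taps, step size, filter length and decision delay are arbitrary assumptions chosen only for demonstration.

# Toy LMS linear equalizer for a BPSK signal distorted by ISI and noise (illustrative).
import numpy as np

rng = np.random.default_rng(0)
n = 5000
symbols = rng.choice([-1.0, 1.0], size=n)          # BPSK training sequence
channel = np.array([1.0, 0.5, 0.2])                # assumed ISI channel taps
received = np.convolve(symbols, channel)[:n] + 0.05 * rng.standard_normal(n)

taps = 11                                           # equalizer length
delay = 5                                           # decision delay
w = np.zeros(taps)
mu = 0.01                                           # LMS step size

for k in range(taps, n):
    x = received[k - taps:k][::-1]                  # most recent samples first
    y = w @ x                                       # equalizer output
    e = symbols[k - delay] - y                      # error against the training symbol
    w += mu * e * x                                 # LMS weight update

# Evaluate symbol decisions with the converged equalizer.
decisions = []
for k in range(taps, n):
    x = received[k - taps:k][::-1]
    decisions.append(np.sign(w @ x))
errors = np.mean(np.array(decisions) != symbols[taps - delay:n - delay])
print("symbol error rate after LMS equalization:", errors)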
APA, Harvard, Vancouver, ISO, and other styles
4

Zhang, Dan [Verfasser]. "Iterative algorithms in achieving near-ML decoding performance in concatenated coding systems / Dan Zhang." Aachen : Hochschulbibliothek der Rheinisch-Westfälischen Technischen Hochschule Aachen, 2014. http://d-nb.info/1048607224/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Santos, Helton Saulo Bezerra dos. "Essays on Birnbaum-Saunders models." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/87375.

Full text
Abstract:
Nessa tese apresentamos três diferentes aplicações dos modelos Birnbaum-Saunders. No capítulo 2 introduzimos um novo método por função-núcleo não-paramétrico para a estimação de densidades assimétricas, baseado nas distribuições Birnbaum-Saunders generalizadas assimétricas. Funções-núcleo baseadas nessas distribuições têm a vantagem de fornecer flexibilidade nos níveis de assimetria e curtose. Em adição, os estimadores da densidade por função-núcleo Birnbaum-Saunders gene-ralizadas assimétricas são livres de viés na fronteira e alcançam a taxa ótima de convergência para o erro quadrático integrado médio dos estimadores por função-núcleo-assimétricas-não-negativos da densidade. Realizamos uma análise de dados consistindo de duas partes. Primeiro, conduzimos uma simulação de Monte Carlo para avaliar o desempenho do método proposto. Segundo, usamos esse método para estimar a densidade de três dados reais da concentração de poluentes atmosféricos. Os resultados numéricos favorecem os estimadores não-paramétricos propostos. No capítulo 3 propomos uma nova família de modelos autorregressivos de duração condicional baseados nas distribuições misturas de escala Birnbaum-Saunders (SBS). A distribuição Birnbaum-Saunders (BS) é um modelo que tem recebido considerável atenção recentemente devido às suas boas propriedades. Uma extensão dessa distribuição é a classe de distribuições SBS, a qual (i) herda várias das boas propriedades da distribuição BS, (ii) permite a estimação de máxima verossimilhança em uma forma eficiente usando o algoritmo EM, e (iii) possibilita a obtenção de um procedimento de estimação robusta, entre outras propriedades. O modelo autorregressivo de duração condicional é a família primária de modelos para analisar dados de duração de transações de alta frequência. A metodologia estudada aqui inclui estimação dos parâmetros pelo algoritmo EM, inferência para esses parâmetros, modelo preditivo e uma análise residual. Realizamos simulações de Monte Carlo para avaliar o desempenho da metodologia proposta. Ainda, avalia-mos a utilidade prática dessa metodologia usando dados reais de transações financeiras da bolsa de valores de Nova Iorque. O capítulo 4 trata de índices de capacidade do processo (PCIs), os quais são ferramentas utilizadas pelas empresas para determinar a qualidade de um produto e avaliar o desempenho de seus processos de produção. Estes índices foram desenvolvidos para processos cuja característica de qualidade tem uma distribuição normal. Na prática, muitas destas ca-racterísticas não seguem esta distribuição. Nesse caso, os PCIs devem ser modificados considerando a não-normalidade. O uso de PCIs não-modificados podemlevar a resultados inadequados. De maneira a estabelecer políticas de qualidade para resolver essa inadequação, transformação dos dados tem sido proposta, bem como o uso de quantis de distribuições não-normais. Um distribuição não-normal assimétrica o qual tem tornado muito popular em tempos recentes é a distribuição Birnbaum-Saunders (BS). Propomos, desenvolvemos, implementamos e aplicamos uma metodologia baseada em PCIs para a distribuição BS. Além disso, realizamos um estudo de simulação para avaliar o desempenho da metodologia proposta. Essa metodologia foi implementada usando o software estatístico chamado R. Aplicamos essa metodologia para um conjunto de dados reais de maneira a ilustrar a sua flexibilidade e potencialidade.
In this thesis, we present three different applications of Birnbaum-Saunders models. In Chapter 2, we introduce a new nonparametric kernel method for estimating asymmetric densities based on generalized skew-Birnbaum-Saunders distributions. Kernels based on these distributions have the advantage of providing flexibility in the asymmetry and kurtosis levels. In addition, the generalized skew-Birnbaum-Saunders kernel density estimators are boundary bias free and achieve the optimal rate of convergence for the mean integrated squared error of the nonnegative asymmetric kernel density estimators. We carry out a data analysis consisting of two parts. First, we conduct a Monte Carlo simulation study for evaluating the performance of the proposed method. Second, we use this method for estimating the density of three real air pollutant concentration data sets, whose numerical results favor the proposed nonparametric estimators. In Chapter 3, we propose a new family of autoregressive conditional duration models based on scale-mixture Birnbaum-Saunders (SBS) distributions. The Birnbaum-Saunders (BS) distribution is a model that has received considerable attention recently due to its good properties. An extension of this distribution is the class of SBS distributions, which allows (i) several of its good properties to be inherited; (ii) maximum likelihood estimation to be efficiently formulated via the EM algorithm; (iii) a robust estimation procedure to be obtained; among other properties. The autoregressive conditional duration model is the primary family of models to analyze high-frequency financial transaction data. This methodology includes parameter estimation by the EM algorithm, inference for these parameters, the predictive model and a residual analysis. We carry out a Monte Carlo simulation study to evaluate the performance of the proposed methodology. In addition, we assess the practical usefulness of this methodology by using real data of financial transactions from the New York stock exchange. Chapter 4 deals with process capability indices (PCIs), which are tools widely used by companies to determine the quality of a product and the performance of their production processes. These indices were developed for processes whose quality characteristic has a normal distribution. In practice, many of these characteristics do not follow this distribution. In such a case, the PCIs must be modified considering the non-normality. The use of unmodified PCIs can lead to inadequacy results. In order to establish quality policies to solve this inadequacy, data transformation has been proposed, as well as the use of quantiles from non-normal distributions. An asymmetric non-normal distribution which has become very popular in recent times is the Birnbaum-Saunders (BS) distribution. We propose, develop, implement and apply a methodology based on PCIs for the BS distribution. Furthermore, we carry out a simulation study to evaluate the performance of the proposed methodology. This methodology has been implemented in a noncommercial and open source statistical software called R. We apply this methodology to a real data set to illustrate its flexibility and potentiality.
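For readers unfamiliar with the Birnbaum-Saunders (BS) distribution underlying these chapters, SciPy ships it under the name fatiguelife; the short sketch below draws a BS sample and recovers the shape and scale parameters by maximum likelihood. The sample size and parameter values are arbitrary choices for illustration, not quantities taken from the thesis.

# Fit a Birnbaum-Saunders (fatigue-life) distribution by maximum likelihood (illustrative).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha_true, beta_true = 0.5, 2.0                       # shape and scale
sample = stats.fatiguelife.rvs(alpha_true, loc=0, scale=beta_true,
                               size=2000, random_state=rng)

# Fix loc=0 so only the BS shape (alpha) and scale (beta) are estimated.
alpha_hat, loc_hat, beta_hat = stats.fatiguelife.fit(sample, floc=0)
print(f"alpha: true={alpha_true}, ML estimate={alpha_hat:.3f}")
print(f"beta : true={beta_true}, ML estimate={beta_hat:.3f}")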
APA, Harvard, Vancouver, ISO, and other styles
6

FECCHIO, PIETRO. "High-precision measurement of the hypertriton lifetime and Λ-separation energy exploiting ML algorithms with ALICE at the LHC." Doctoral thesis, Politecnico di Torino, 2022. http://hdl.handle.net/11583/2968462.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Garg, Anushka. "Comparing Machine Learning Algorithms and Feature Selection Techniques to Predict Undesired Behavior in Business Processesand Study of Auto ML Frameworks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-285559.

Full text
Abstract:
In recent years, the scope of Machine Learning algorithms and their techniques has been expanding in every industry (for example, recommendation systems, user behavior analytics, financial applications and many more). In practice, they play an important role in utilizing the power of the vast data we currently generate on a daily basis in our digital world. In this study, we present a comprehensive comparison of different supervised Machine Learning algorithms and feature selection techniques to build the best predictive model as an output. This predictive model helps companies predict unwanted behavior in their business processes. In addition, we have researched the automation of all the steps involved (from understanding data to implementing models) in the complete Machine Learning pipeline, also known as AutoML, and provide a comprehensive survey of the various frameworks introduced in this domain. These frameworks were introduced to solve the problem of CASH (combined algorithm selection and hyperparameter optimization), which is essentially the automation of the various pipelines involved in the process of building a Machine Learning predictive model.
Under de senaste åren har omfattningen av maskininlärnings algoritmer och tekniker tagit ett steg i alla branscher (till exempel rekommendationssystem, beteendeanalyser av användare, finansiella applikationer och många fler). I praktiken spelar de en viktig roll för att utnyttja kraften av den enorma mängd data vi för närvarande genererar dagligen i vår digitala värld.I den här studien presenterar vi en omfattande jämförelse av olika övervakade maskininlärnings algoritmer och funktionsvalstekniker för att bygga en bästa förutsägbar modell som en utgång. Således hjälper denna förutsägbara modell företag att förutsäga oönskat beteende i sina affärsprocesser. Dessutom har vi undersökt automatiseringen av alla inblandade steg (från att förstå data till implementeringsmodeller) i den fullständiga maskininlärning rörledningen, även känd som AutoML, och tillhandahåller en omfattande undersökning av de olika ramarna som introducerats i denna domän. Dessa ramar introducerades för att lösa problemet med CASH (kombinerat algoritmval och optimering av Hyper-parameter), vilket i grunden är automatisering av olika rörledningar som är inblandade i processen att bygga en förutsägbar modell för maskininlärning.
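A toy version of the CASH problem described in the abstract can be written as a joint search over (algorithm, hyperparameter) pairs scored by cross-validation, as in the sketch below; the candidate algorithms, hyperparameter grids and data set are illustrative assumptions, not the frameworks surveyed in the thesis.

# Toy CASH (combined algorithm selection and hyperparameter optimization) search (illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Joint search space: each candidate is an (algorithm, hyperparameters) pair.
candidates = [
    (LogisticRegression, {"C": c, "max_iter": 5000}) for c in (0.1, 1.0, 10.0)
] + [
    (RandomForestClassifier, {"n_estimators": n, "random_state": 0}) for n in (50, 200)
] + [
    (SVC, {"C": c, "gamma": "scale"}) for c in (0.5, 5.0)
]

best = None
for estimator_cls, params in candidates:
    score = cross_val_score(estimator_cls(**params), X, y, cv=5).mean()
    if best is None or score > best[0]:
        best = (score, estimator_cls.__name__, params)

print("best configuration:", best)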
APA, Harvard, Vancouver, ISO, and other styles
8

Protzenko, Jonathan. "Mezzo : a typed language for safe effectful concurrent programs." Paris 7, 2014. http://www.theses.fr/2014PA077159.

Full text
Abstract:
Cette thèse décrit comment obtenir de plus fortes garanties de sûreté pour les programmes en utilisant Mezzo, un langage de programmation inspiré par ML, et muni d'un système de types novateur. Les programmes écrits en Mezzo bénéficient de plus fortes garanties, com¬parés à des programmes équivalents écrits dans un dialecte de ML: absence de séquencements critiques (« race conditions »), suivi des changements d'états au travers du système de types, et une notion de possession qui facilite le raisonnement modulaire et la compréhension des programmes. Mezzo n'est pas la premier langage à s'attaquer à cet objectif louable : une première partie s'efforce donc de situer Mezzo dans son contexte, en présentant des travaux emblématiques de la recherche en langages de programmation, travaux qui ont constitué des sources d'inspiration ou ont servi de points de comparaison. Une seconde partie présente le langage. Tout d'abord, au travers d'une riche palette d'exemples, qui permettent d'illustrer les fonctionnalités du lan¬gage ainsi que les gains de sûreté qui en découlent. Puis, dans une partie suivante, de manière formelle, en détaillant les différentes règles qui gouvernent le système de types de Mezzo. Mezzo n'existe pas seulement sur le papier : une dernière partie décrit la manière dont le lan¬gage est implémenté, en formalisant les algorithmes utilisés dans le typeur et en détaillant les techniques utilisées pour déterminer la validité d'un programme
The present dissertation argues that better programming languages can be designed and implemented, so as to provide greater safety and reliability for computer programs. I sustain my claims through the example of Mezzo, a programming language in the tradition of ML, which I co-designed and implemented. Programs written in Mezzo enjoy stronger properties than programs written in traditional ML languages: they are data-race free; state changes can be tracked by the type system; a central notion of ownership facilitates modular reasoning. Mezzo is not the first attempt at designing a better programming language; hence, a first part strives to position Mezzo relative to other works in the literature. I present landmark results in the field, which served either as sources of inspiration or points of comparison. The subsequent part is about the design of the Mezzo language. Using a variety of examples, I illustrate the language features as well as the safety gains that one obtains by writing their programs in Mezzo. In a subsequent part, I formalize the semantics of the Mezzo language. Mezzo is not just a type system that lives on paper: the final part describes the implementation of a type-checker for Mezzo, by formalizing the algorithms that I designed and the various ways the type-checker ensures that a program is valid.
APA, Harvard, Vancouver, ISO, and other styles
9

Tade, Foluwaso Olunkunle. "Receiver architectures for MIMO wireless communication systems based on V-BLAST and sphere decoding algorithms." Thesis, University of Hertfordshire, 2011. http://hdl.handle.net/2299/6400.

Full text
Abstract:
Modern day technology aspires to always progress. This progression leads to a lot of research in any significant area of improvement. There is a growing number of end-users in the wireless spectrum, which has led to a need for improved bandwidth usage and BER values. In other words, new technologies which would increase the capacity of wireless systems are proving to be a crucial point of research in these modern times. Different combinations of multiuser receivers are evaluated to determine performance under normal working conditions by comparing their BER performance charts. Multiple input, multiple output (MIMO) systems are incorporated into the system to utilise the increased capacity rates achievable using the MIMO configuration. The effect of MIMO on the technologies associated with modern day standards such as CDMA and OFDM has been investigated due to the significant capacity potential these technologies normally exhibit in a single antenna scenario. An in-depth comparison is established before comparison is made with a conventional maximum likelihood (ML) detector. The complexity of the ML detector makes its direct realization impractical, so detection is instead evaluated in a manner that achieves the same or a near-ML solution at lower computational complexity. This was achieved using a proposed modification of the Schnorr-Euchner sphere decoding algorithm (SE-SDA). The proposed sphere decoder (P-SD) adopts a modification of the radius utilised in the SE-SDA to obtain a near-ML solution at a much lower complexity compared to the conventional ML decoder. The P-SD was configured to work in different MIMO antenna configurations. The need for the highest possible data rates from the available limited spectrum led to my research into the multi-user detection scenario and MIMO.
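The exponential cost that motivates sphere decoding can be made concrete with a tiny brute-force maximum likelihood detector for a 2x2 MIMO link with QPSK, as sketched below; the channel, noise level and constellation are invented for the demonstration, and no Schnorr-Euchner pruning is applied.

# Exhaustive ML detection for a 2x2 MIMO channel with QPSK (illustrative).
import itertools
import numpy as np

rng = np.random.default_rng(1)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
x = rng.choice(qpsk, size=2)                       # transmitted symbol vector
noise = 0.1 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
y = H @ x + noise                                  # received vector

# ML detection: minimise ||y - H s||^2 over all 4^2 = 16 candidate symbol vectors.
best_s, best_metric = None, np.inf
for s in itertools.product(qpsk, repeat=2):
    s = np.array(s)
    metric = np.linalg.norm(y - H @ s) ** 2
    if metric < best_metric:
        best_metric, best_s = metric, s

print("transmitted:", x)
print("ML decision:", best_s)
print("candidates examined:", len(qpsk) ** 2)      # grows as M^Nt, hence sphere decoding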
APA, Harvard, Vancouver, ISO, and other styles
10

PIROZZI, MICHELA. "Development of a simulation tool for measurements and analysis of simulated and real data to identify ADLs and behavioral trends through statistics techniques and ML algorithms." Doctoral thesis, Università Politecnica delle Marche, 2020. http://hdl.handle.net/11566/272311.

Full text
Abstract:
Con una popolazione di anziani in crescita, il numero di soggetti a rischio di patologia è in rapido aumento. Molti gruppi di ricerca stanno studiando soluzioni pervasive per monitorare continuamente e discretamente i soggetti fragili nelle loro case, riducendo i costi sanitari e supportando la diagnosi medica. Comportamenti anomali durante l'esecuzione di attività di vita quotidiana (ADL) o variazioni sulle tendenze comportamentali sono di grande importanza.
With a growing population of elderly people, the number of subjects at risk of pathology is rapidly increasing. Many research groups are studying pervasive solutions to continuously and unobtrusively monitor fragile subjects in their homes, reducing health-care costs and supporting the medical diagnosis. Anomalous behaviors while performing activities of daily living (ADLs) or variations in behavioral trends are of great importance. To measure ADLs, a significant number of parameters affecting the measurement need to be considered, such as sensor and environment characteristics or sensor placement. Because it is not possible to study in the real context the best configuration of sensors able to minimize costs and maximize accuracy, simulation tools are being developed as powerful means. This thesis presents several contributions on this topic. In the following research work, a study of a measurement chain aimed at measuring ADLs, represented by PIR sensors and an ML algorithm, is conducted, and a simulation tool in the form of a Web Application has been developed to generate datasets and to simulate how the measurement chain reacts as the configuration of the sensors varies. Starting from eWare project results, the simulation tool has been designed to provide support for technicians, developers and installers, being able to speed up analysis and monitoring times, to allow rapid identification of changes in behavioral trends, to guarantee system performance monitoring and to study the best configuration of the sensor network for a given environment. The UNIVPM Home Care Web App offers the chance to create ad hoc datasets related to ADLs and to conduct analysis thanks to statistical algorithms applied to the data. To measure ADLs, machine learning algorithms have been implemented in the tool. Five different tasks have been identified. To test the validity of the developed instrument, six case studies divided into two categories have been considered. The first category comprises studies that aim to: 1) discover the best configuration of the sensors, keeping environmental characteristics and user behavior constant; 2) define the most performant ML algorithms. The second category aims to prove the stability of the implemented algorithm and its collapse condition by varying user habits. Noise perturbation on the data has been applied to all case studies. Results show the validity of the generated datasets. By maximizing the sensor network it is possible to minimize the ML error to 0.8%. Because cost is a key factor in this scenario, the fourth case study has shown that by minimizing the configuration of the sensors it is possible to reduce the cost drastically with a more than reasonable ML error of around 11.8%. The results in ADL measurement can be considered more than satisfactory.
APA, Harvard, Vancouver, ISO, and other styles
11

Wessman, Filip. "Advanced Algorithms for Classification and Anomaly Detection on Log File Data : Comparative study of different Machine Learning Approaches." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-43175.

Full text
Abstract:
Background: A problematic area in today's large-scale distributed systems is the exponentially growing amount of log data. Finding anomalies by observing and monitoring this data with manual human inspection methods becomes progressively more challenging, complex and time consuming. This is vital for making these systems available around the clock. Aim: The main objective of this study is to determine which Machine Learning (ML) algorithms are most suitable and whether they can live up to the needs and requirements regarding optimization and efficiency in the log data monitoring area, including which specific steps of the overall problem can be improved by using these algorithms for anomaly detection and classification on different real, provided data logs. Approach: An initial pre-study is conducted, logs are collected and then preprocessed with the log parsing tool Drain and regular expressions. The approach consisted of a combination of K-Means + XGBoost and, respectively, Principal Component Analysis (PCA) + K-Means + XGBoost. These were trained, tested and individually evaluated with different metrics against two datasets, one being a server data log and the other an HTTP access log. Results: The results showed that both approaches performed very well on both datasets, being able to classify, detect and make predictions on log data events with high accuracy, high precision and low calculation time. It was further shown that when the data is applied without dimensionality reduction (PCA), the results of the prediction model are slightly better, by a few percent. As for the prediction time, there was a marginally small to no difference when comparing the prediction time with and without PCA. Conclusions: Overall there are very small differences when comparing the results with and without PCA. But in essence, it is better not to use PCA and instead apply the original data to the ML models. The models' performance is generally very dependent on the data being applied, the initial preprocessing steps, and its size and structure, which especially affect the calculation time.
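A compressed sketch of the two pipelines compared above is given below, using synthetic numeric features in place of parsed log events and scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost; the cluster count, PCA dimension and data sizes are illustrative assumptions.

# Sketch of the two pipelines compared above, on synthetic "parsed log" features (illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for XGBoost
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for numeric features extracted from parsed log events.
X, y = make_classification(n_samples=3000, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

def kmeans_then_boost(train, test):
    """Append a K-Means cluster id as an extra feature, then classify with boosting."""
    km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(train)
    train_aug = np.column_stack([train, km.predict(train)])
    test_aug = np.column_stack([test, km.predict(test)])
    clf = GradientBoostingClassifier(random_state=0).fit(train_aug, y_train)
    return accuracy_score(y_test, clf.predict(test_aug))

# Pipeline 1: K-Means + boosting on the raw features.
print("without PCA:", round(kmeans_then_boost(X_train, X_test), 3))

# Pipeline 2: PCA first, then the same K-Means + boosting steps.
pca = PCA(n_components=10, random_state=0).fit(X_train)
print("with PCA   :", round(kmeans_then_boost(pca.transform(X_train), pca.transform(X_test)), 3))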
APA, Harvard, Vancouver, ISO, and other styles
12

Ucci, Graziano. "The Interstellar Medium of Galaxies: a Machine Learning Approach." Doctoral thesis, Scuola Normale Superiore, 2019. http://hdl.handle.net/11384/85928.

Full text
Abstract:
Understanding the structure and physical properties of the Interstellar Medium (ISM) in galaxies, especially at high redshift, is one of the major drivers of galaxy formation studies. Measurements of key properties as gas density, column density, metallicity, ionization parameter, and Habing flux, rely on galaxy spectra obtained through the most advanced telescopes (both earth-based and space-borne) and, in particular, on their emission lines. However, finding diagnostics that are free of significant systematic uncertainties remains an unsolved problem. Several attempts have been made to recover ISM physical properties by mean of diagnostics based on small, pre-selected subsets of emission line ratios. Most of these previous works focused on ionized nebulae, and have obtained diagnostics for the physical properties of galaxies based only on the strongest nebular emission lines coming from extra-galactic HII regions and star-forming galaxies. The main purpose of this work is to reconstruct key ISM physical properties of galaxies from their spectra. The aim is to maximize the information that can be extracted from such data by using not only few specific and pre-selected emission lines, but the full information encoded in the spectra. This is now possible thanks to the combination of powerful Supervised Machine Learning (ml) algorithms and large synthetic spectra libraries. In order to achieve this goal, I have developed a code called game (GAlaxy Machine learning for Emission lines), a new fast method to reconstruct the ISM physical properties by using all the information carried by the emission lines intensities present in the available spectrum. The library included in this code covers a very large range of plausible ISM physical properties to accurately describe the physics both of ionized regions and of other phases (i.e neutral, molecular) of the ISM. The strength of the method relies on the fact that the ml algorithm can learn from all the lines present in a spectrum, including the weakest ones as those coming for example from neutral ISM components. I verified that with ml it is possible to set strong constraints on the properties of the different phases from observed spectra. game has been extensively tested, and shown to deliver excellent predictive performances when applied to synthetic spectra. A ml approach will become fundamental with upcoming high-quality spectra, including also faint lines of high-redshift galaxies, from new facilities such as the James Webb Space Telescope (JWST) and the Extremely Large Telescope (ELT). The astrophysical community will be therefore into an era where ml algorithms and Big Data Analytics will become extremely useful tools in the data-mining process. This is already the case for local observations where Integral Field Units (IFUs) are already able to provide observations containing tens of thousands of spaxels. A notable study case is the ISM of local Blue Compact Galaxies (BCGs), a subclass of dwarf galaxies. In fact, since BCGs are low-metallicity, compact, star-forming systems, they are thought to represent local analogues of early galaxies that will become soon observable in greater detail (e.g. with JWST). Thus, ISM studies of local BCGs can be used as benchmarks for understanding the structure, formation, and evolution of highredshift galaxies. In addition to a general description of the ml algorithm and the game code, I will show the first game results concerning the interpretation of high-quality IFU spectra of BCGs.
APA, Harvard, Vancouver, ISO, and other styles
13

GUPTA, SONALI. "CLASSIFYING FRAUDULENT COMPANIES USING ML ALGORITHM IN PYTHON." Thesis, 2021. http://dspace.dtu.ac.in:8080/jspui/handle/repository/19186.

Full text
Abstract:
This paper is a case study of visiting an external audit company to explore the usefulness of machine learning algorithms for improving the quality of audit work. Annual data of 777 firms from 14 different sectors are collected. With the tremendous growth in financial fraud cases, machine learning will play a big part in improving the quality of audit field work in the future. Purpose: The goal of the research is to help the auditors by building a classification model that can predict the fraudulent firm on the basis of the present risk factors and historical risk factors. The sectors and the counts of firms are listed respectively as Irrigation (114), Public Health (77), Buildings and Roads (82), Forest (70), Corporate (47), Animal Husbandry (95), Communication (1), Electrical (4), Land (5), Science and Technology (3), Tourism (1), Fisheries (41), Industries (37), Agriculture (200). Methodology/Approach: Machine learning algorithms such as the Random Forest classifier and logistic regression are used in this project to classify the fraudulent firms. The exploratory data analysis is done using Python libraries such as matplotlib and plotly. Research Limitations: The dataset is one year of non-confidential data on firms, for the year 2015 to 2016, collected from the Auditor Office of India to build a predictor for classifying suspicious firms. Value: To help the auditors by building a classification model that can predict the fraudulent firm on the basis of the present and historical risk factors.
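The following sketch mirrors the classification setup described above on synthetic audit-style data; the feature names, label construction and data values are hypothetical stand-ins for the 777-firm dataset, which is not publicly reproduced here.

# Sketch of the fraud-classification setup described above, on synthetic audit-style data.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical risk-factor features standing in for the real audit dataset.
rng = np.random.default_rng(0)
n = 777
df = pd.DataFrame({
    "present_risk_score": rng.normal(3, 1, n),
    "historical_risk_score": rng.normal(2, 1, n),
    "audit_discrepancy": rng.exponential(1.0, n),
})
# Synthetic label: firms with higher combined risk are more often "fraudulent".
prob = 1 / (1 + np.exp(-(df.sum(axis=1) - 6)))
df["fraudulent"] = rng.random(n) < prob

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="fraudulent"), df["fraudulent"], test_size=0.3, random_state=0)

for model in (RandomForestClassifier(random_state=0), LogisticRegression(max_iter=1000)):
    model.fit(X_train, y_train)
    print(type(model).__name__)
    print(classification_report(y_test, model.predict(X_test), digits=3))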
APA, Harvard, Vancouver, ISO, and other styles
14

Chung, Hsiang-Han, and 鍾享翰. "A SISO ML Decoding Algorithm and Its Application in Turbo Decoding." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/47608737661188217448.

Full text
Abstract:
Master's thesis
Chang Gung University
Graduate Institute of Electrical Engineering
Academic year 95
For many applications, the component code to be used with iterative decoding is a recursive systematic convolutional (RSC) code. Turbo codes are a class of iteratively decoded codes first proposed by Berrou, Glavieux and Thitimajshima, who reported excellent coding gain results, approaching the theoretical limit predicted by Shannon, in 1993. Because of this excellent performance, the coding technique has been widely used for error control in many communication applications, such as third-generation (3G) mobile radio systems and deep space communications. Two suitable decoding algorithms for turbo codes are the Soft-Output Viterbi Algorithm (SOVA), proposed by Hagenauer and Hoeher, and the Maximum A-Posteriori (MAP) algorithm, proposed by Bahl et al. In this thesis, a new decoding algorithm is proposed for turbo decoding. This algorithm uses the log-ratio of the observation probability in place of the log-ratio of the a-posteriori probability for decoding. Theoretically, the proposed decoding scheme sacrifices no coding gain, i.e., it can achieve the same BER performance as the Log-MAP algorithm. Finally, we offer a decoding structure and compare it with the structure proposed by Berrou et al. The proposed structure can reduce some of the conventionally expensive operations, which makes it suitable for decoding turbo codes in a mobile phone.
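As background to the soft-input soft-output decoding discussed above, the sketch below computes channel log-likelihood ratios for BPSK over an AWGN channel, the kind of soft input such decoders consume; the noise variance and block length are arbitrary assumptions, and no turbo decoding is performed.

# Channel log-likelihood ratios for BPSK over AWGN, the soft inputs a SISO decoder consumes.
import numpy as np

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=10)
symbols = 1.0 - 2.0 * bits            # BPSK mapping: 0 -> +1, 1 -> -1
sigma2 = 0.5                          # assumed noise variance
received = symbols + np.sqrt(sigma2) * rng.standard_normal(bits.size)

# L(y) = log p(y | bit=0) / p(y | bit=1) = 2*y / sigma^2 for this mapping.
llr = 2.0 * received / sigma2
hard_decisions = (llr < 0).astype(int)
print("bits     :", bits)
print("LLRs     :", np.round(llr, 2))
print("decisions:", hard_decisions)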
APA, Harvard, Vancouver, ISO, and other styles
15

Wu, Meng-Lin, and 吳孟霖. "Theory and Performance of ML Decoding for LDPC Codes Using Genetic Algorithm." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/64317738328812502720.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Communication Engineering
Academic year 97
Low-density parity-check (LDPC) codes have drawn considerable attention lately due to their exceptional performance. Typical decoders operate based on the belief-propagation principle. Although these decoding algorithms work remarkably well, it is generally suspected that they do not achieve the performance of ML decoding. The ML performance of LDPC codes remains unknown because efficient ML decoders have not been discovered. Although it has been proved that, for various appropriately chosen ensembles of LDPC codes, low error probability and reliable communication are possible up to channel capacity, we still want to know the actual limit for one specific code. Thus, in this thesis, our goal is to establish the ML performance. At a word error probability (WEP) of 10^{-5} or lower, we find that perturbed decoding (PD) can effectively achieve the ML performance at reasonable complexity. In the higher error probability regime, the complexity of PD becomes prohibitive. In light of this, we propose the use of gifts. Proper gifts can induce high-likelihood decoded codewords. We investigate the feasibility of using gifts in detail and discover that the complexity is dominated by the effort to identify small gifts that can pass the trigger criterion. A greedy concept is proposed to maximize the probability for a receiver to produce such a gift. We also apply the concept of gifts in a genetic algorithm to find the ML bounds of LDPC codes. In the genetic decoding algorithm (GDA), chromosomes are gift sequences with some known gift bits. A conventional SPA decoder is used to assign fitness values to the chromosomes in the population. After evolution over many generations, chromosomes that correspond to decoded codewords of very high likelihood emerge. We also propose a parallel genetic decoding algorithm (P-GDA) based on the greedy concept and the feasibility research on gifts. The most important aspect of GDA, in our opinion, is that one can utilize the ML bounding technique and GDA to empirically determine an effective lower bound on the error probability with ML decoding. Our results show that GDA and P-GDA outperform the conventional decoder by 0.1-0.13 dB, and the two bounds converge at a WEP of 10^{-5}. Our results also indicate that, for a practical block size of thousands of bits, the SNR-error probability relationship of LDPC codes trends smoothly in the same fashion as the sphere packing bound. The abrupt cliff-like error probability curve is actually an artifact due to the ineffectiveness of iterative decoding. If additional complexity is allowed, our methods can be applied to improve on the typical decoders.
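A loose illustration of the perturbed-decoding idea referred to above is sketched below: a deliberately weak stand-in decoder (hard-decision majority voting on a 3x repetition code) is re-run on randomly perturbed inputs and the candidate with the highest likelihood against the unperturbed input is kept. It is not an SPA or LDPC decoder, and the code, noise level and perturbation strength are invented for the example.

# Toy perturbed decoding with a 3x repetition code: rerun a suboptimal decoder on randomly
# perturbed inputs and keep the candidate codeword with the highest likelihood (illustrative).
import numpy as np

rng = np.random.default_rng(3)
k, rep, sigma = 50, 3, 1.0
info = rng.integers(0, 2, size=k)
codeword = np.repeat(info, rep)                       # each bit sent three times
rx = (1.0 - 2.0 * codeword) + sigma * rng.standard_normal(k * rep)

def suboptimal_decode(y):
    """Hard-decision majority vote per triple (weaker than soft combining)."""
    hard = (y < 0).astype(int).reshape(k, rep)
    return (hard.sum(axis=1) >= 2).astype(int)

def log_likelihood(bits, y):
    s = 1.0 - 2.0 * np.repeat(bits, rep)
    return -np.sum((y - s) ** 2)

best = suboptimal_decode(rx)
best_ll = log_likelihood(best, rx)
for _ in range(300):                                  # perturbation rounds
    candidate = suboptimal_decode(rx + 0.7 * rng.standard_normal(rx.size))
    ll = log_likelihood(candidate, rx)                # always scored against the real input
    if ll > best_ll:
        best_ll, best = ll, candidate

print("info-bit errors, plain decoder:", np.sum(suboptimal_decode(rx) != info))
print("info-bit errors, perturbed    :", np.sum(best != info))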
APA, Harvard, Vancouver, ISO, and other styles
16

Hsueh, Tsun-Chih. "Theory and Performance of ML Decoding for Turbo Codes using Genetic Algorithm." 2007. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-3107200702055600.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Hsueh, Tsun-Chih, and 薛存志. "Theory and Performance of ML Decoding for Turbo Codes using Genetic Algorithm." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/51236587751523493601.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Communication Engineering
Academic year 95
Although yielding the lowest error probability, ML decoding of turbo codes has been considered unrealistic so far because efficient ML decoders have not been discovered. In this thesis, we propose an experimental bounding technique for ML decoding and the Genetic Decoding Algorithm (GDA) for turbo codes. The ML bounding technique establishes both lower and upper bounds for ML decoding. GDA combines the principles of perturbed decoding and the genetic algorithm. In GDA, chromosomes are random additive perturbation noises. A conventional turbo decoder is used to assign fitness values to the chromosomes in the population. After generations of evolution, good chromosomes that correspond to decoded codewords of very good likelihood emerge. GDA can be used as a practical decoder for turbo codes in certain contexts. It is also a natural multiple-output decoder. The most important aspect of GDA, in our opinion, is that one can utilize the ML bounding technique and GDA to empirically determine an effective lower bound on the error probability with ML decoding. Our results show that, at a word error probability of 10^{-4}, GDA achieves the performance of ML decoding. Using the ML bounding technique and GDA, we establish that an ML decoder only slightly outperforms a MAP-based iterative decoder at this word error probability for the block size we used and the turbo code defined for WCDMA.
APA, Harvard, Vancouver, ISO, and other styles
18

Sá, Pedro Miguel Martins de Sousa e. "Image reconstruction algorithm implementation for the easyPET: a didactic and pre-clinical PET system." Master's thesis, 2017. http://hdl.handle.net/10451/31700.

Full text
Abstract:
Integrated master's thesis in Biomedical Engineering and Biophysics, presented to the Universidade de Lisboa through the Faculdade de Ciências, 2017
Tomografia por Emissão de Positrões (PET) é uma técnica de imagiologia funcional, utilizada para observar processos biológicos. O conceito de tomografia por emissão foi introduzido durante a década de 1950, sendo que foi apenas com o desenvolvimento de radiofármacos na década de 1970, que esta técnica começou a ser utilizada em medicina. Nos últimos 20 anos, o avanço tecnológico tornou os sistemas PET numa ferramenta altamente qualificada para imagiologia funcional. Neste período, o aparecimento de sistemas PET-CT veio colmatar as deficiências produzidas pela PET ao nível de imagem estrutural, com a combinação desta técnica funcional com a de Tomografia Computadorizada (CT). A evolução da tecnologia PET foi também acompanhada pela evolução da tecnologia para produção de radiofármacos, incluindo os radionuclídeos, bem como do conhecimento médico relativo aos processos biológicos humanos. Aliando esta tecnologia e conhecimento, tornou-se possível traçar moléculas com funções metabólicas nos diversos sistemas do corpo humano e, assim, produzir uma variedade de imagens funcionais. Dado o tipo de imagem produzida pela técnica PET, é bastante comum associar-lhe o diagnóstico de doenças cancerígenas, cuja principal característica é a desregulação metabólica celular no organismo. Tendo em vista o aumento esperado da incidência de cancro em Portugal e na Europa, tendo já sido atingida uma incidência nacional, em 2010, de 444,50 pessoas em cada 100.000 (números avançados pela DGS, 2015), a utilização de técnicas que permitam o diagnóstico precoce destas doenças é de elevada importância. Posto isto, e apesar do constante crescimento do gasto público em cuidados médicos relativos ao diagnóstico e tratamento de cancro, estão a ser postos cada vez mais esforços e fundos para que o processo de Investigação e Desenvolvimento (I&D) relacionado com esta doença seja célere. São constantemente desenvolvidas novas e melhores técnicas de imagiologia, que permitem diagnósticos mais precoces e precisos, enquanto ajudam na aplicação de planos de tratamento mais eficazes que, consequentemente, levam a um gasto público mais eficiente. Os sistemas PET inserem-se neste contexto e, uma vez permitindo imagem altamente sensível a processos funcionais, facilmente se generalizaram no meio médico e académico. Os sistemas direcionados a aplicações relacionadas com a medicina humana têm como função observar processos biológicos, com a finalidade de um diagnóstico médico ou estudo. Sistemas pré-clínicos, direcionados a estudos com animais pequenos, têm o propósito de auxiliar a investigação relacionada com os estudos preliminares de doenças que afetem o ser humano. Finalmente, e sendo o grupo com menor oferta comercial, os sistemas PET didáticos possibilitam uma melhor formação de pessoal responsável pelo futuro uso e I&D relacionados com esta tecnologia. No entanto, a tecnologia utilizada nestes três tipos de sistemas encarece consideravelmente o seu valor comercial sendo que, contrariamente ao que seria de esperar, os preços dos sistemas pré-clínicos não se diferenciam consideravelmente dos sistemas para humanos. O encarecimento destes sistemas deve-se ao facto de que toda a tecnologia a eles associada tem características mais dispendiosas de produzir. No caso dos sistemas didáticos, simplesmente não existe o incentivo necessário à sua produção e compra. É neste contexto que surge o easyPET. 
O design inovador, constituído por apenas duas colunas de detetores opostos, e tirando partido de uma atuação sobre dois eixos de rotação, faz deste sistema ideal para entrar no mercado em duas vertentes. A primeira, constituída apenas por um detetor em cada coluna, está destinada a ter um papel didático. A segunda, tirando partido de colunas com múltiplos detetores, foi desenhada para entrar no mercado de sistemas pré-clínicos. Em ambos os casos, a principal característica do easyPET, e a que o destaca dos restantes sistemas, é o seu reduzido número de detetores, que resulta num reduzido custo de produção. Através da implementação de um número reduzido de detetores e, consequentemente, reduzida eletrónica, é possível obter um custo final da máquina inferior. No entanto, é sempre necessário garantir que os dados obtidos em tal sistema correspondam a imagens com as características necessárias, sendo que o processo de reconstrução de imagem é bastante importante. O trabalho apresentado nesta tese tem como objetivo a implementação de um método de reconstrução de imagem a duas dimensões, dedicado ao sistema easyPET. Para tal, foi considerado um algoritmo estatístico iterativo que se baseia na Maximização da Estimativa da Máxima Verosimilhança (ML-EM), introduzido por Shepp e Vardi em 1982. Desde então, tem sido largamente explorado e, inclusive, dando aso a outras versões bastante comuns em reconstrução de imagem PET, como é caso da Maximização da Espectativa usando Subgrupos Ordenados (OS-EM). A implementação do algoritmo escolhido foi feita no software Matlab. Para computar a unidade básica do algoritmo, a Linha de Resposta (LOR), foi implementado o método ray-driven. Por forma a otimizar a construção da matriz de sistema utilizada neste algoritmo, foram implementadas simetrias de geometria. Esta otimização baseou-se na consideração de que a geometria do sistema easyPET pode ser dividida em quadrantes, sendo que um único quadrante consegue descrever os restantes três. Além disso, foram também implementadas otimizações ao nível estrutural do código escrito em Matlab. Estas foram feitas tendo em conta o aumento na facilidade de acesso à memória através da utilização variáveis para rápido indexamento. Foram também implementados dois métodos de regularização de dados: filtragem gaussiana entre iterações e um root prior baseado na mediana. Por forma a comparar, mais tarde, os resultados obtidos através do algoritmo implementado, foi também implementado o método de reconstrução de Retroprojeção Filtrada (FBP). Por último, foi implementada uma interface para o utilizador, utilizando a aplicação GUIDE do Matlab. Esta interface tem como objetivo servir de ponte entre o sistema didático easyPET e o utilizador, para que a experiência de utilização seja otimizada. Por forma a delinear o teste ao sistema easyPET e ao algoritmo ML-EM implementado, foram seguidas as normas NEMA. Este é um conjunto de normas que tem como objetivo padronizar a análise realizada a sistemas de imagem médica. Para tal, foram adquiridos e simulados ficheiros de dados com uma fonte pontual a 5, 10, 15 e 25 mm do centro do campo de visão do sistema (FOV) e utilizando um par de detetores com 2x2x30 mm3. Para realizar a análise de resultados, os dados foram reconstruídos utilizando a FBP implementada, e foi medida a FWHM e FWTM da fonte reconstruída. 
O mesmo procedimento foi aplicado, mas reconstruindo os dados através do algoritmo ML-EM, utilizando o filtro gaussiano, o MRP, e não utilizando qualquer método de regularização de dados (nativo). Por forma a comparar os métodos de regularização de dados, foi também realizada uma medição do rácio sinal-ruído (SNR). Os resultados foram obtidos para imagens reconstruídas com um pixel de, aproximadamente, 0.25x0.25 mm2, correspondendo a imagens de 230x230 pixéis. Os primeiros resultados foram obtidos a fim de determinar qual a iteração em que se começaria a observar a estabilização das imagens reconstruídas. Para algoritmo ML-EM implementado e o tipo de dados utilizados, foi observado que a partir da 10a iteração o algoritmo ML-EM converge. Através das medidas para a FWHM e FWTM observou-se, também, que os dados obtidos experimentalmente se diferenciam dos resultados obtidos sobre os dados simulados. Isto levou a que, fora dos objetivos deste trabalho, fossem realizados mais testes utilizando dados experimentais e, que daqui em diante, apenas fossem utilizados dados obtidos através de simulação Monte Carlo, por razões de conveniência na precisão da colocação da fonte pontual. De seguida, comparam-se os dados obtidos através da FBP e o algoritmo ML-EM nativo. Para o primeiro caso foram medidas FWHM de 1.5x1.5 mm2, enquanto que para o segundo foram atingidos valores de 1.2x1.2 mm2. Para os métodos de regularização de dados foram medidos valores de resolução semelhantes ou inferiores, sendo que estes resultaram num aumento da qualidade da reconstrução da fonte, observado através do aumento no valor de SNR medido. O trabalho apresentado nesta tese revela, não só a validação do algoritmo de reconstrução proposto, mas também o bom funcionamento e potencialidades do sistema easyPET. Pelos resultados obtidos através das normas NEMA, é possível observar que este sistema vai ao encontro do estado de arte. Mais ainda, através de um método de reconstrução dedicado ao easyPET é possível otimizar os resultados obtidos. Com o avançar do projeto no qual este trabalho esteve inserido, é de esperar que o modelo a três dimensões pré-clínico easyPET irá produzir melhores resultados. De frisar que o sistema easyPET didático se encontra na sua fase final e que os resultados obtidos são bastante satisfatórios tendo em conta a finalidade deste sistema.
The easyPET scanner has an innovative design, comprising only two array columns facing each other, and with an actuation defined by two rotation axes. Using this design, two approaches have been taken. The first concerns to a didactic PET scanner, where the arrays of detectors are comprised of only one detector each, and it is meant to be a simple 2-dimensional PET scanner for educational purposes. The second corresponds to a pre-clinical scanner, with the arrays having multiple detectors, meant to acquire 3-dimensional data. Given the geometry of the system, there is no concern with the effects of not measuring the Depth-of-Interaction (DOI), and a resolution of 1-1.5 mm is expected with the didactic system, improving with the pre-clinical. The work presented in this thesis deals with 2D image reconstruction for the easyPET scanners. The unconventional nature of the acquisition geometry, the large amount of data to be processed, the complexity of implementing a PET image reconstruction algorithm, and the implementation of data regularization methods, gaussian filtering and Median Root Prior (MRP), were addressed in this thesis. For this, the Matlab software was used to implement the ML-EM algorithm. Alongside, several optimizations were also implemented in order to convey a better computational performance to the algorithm. These optimizations refer to using geometry symmetries and fast indexing approaches. Moreover, a user interface was created so as to enhance the user experience for the didactic easyPET system. The validation of the implemented algorithm was performed using Monte Carlo simulated, and acquired data. The first results obtained indicate that the optimizations implemented on the algorithm have successfully reduced the image reconstruction time. On top of that, the system was tested according to the NEMA rules. A comparison was then made between reconstructed images produced by using Filtered Back Projection (FBP), the native ML-EM implementation, the ML-EM algorithm using inter-iteration gaussian filtering, and the ML-EM algorithm implemented with the MRP. This comparison was made through the calculation of FWHM, FWTM, and SNR, at different spatial positions. The results obtained reveal an approximate 1.5x 1.5 mm2 FWHM source resolution in the FOV, when recurring to FBP, and 1.2x 1.2 mm2 for the native ML-EM algorithm. The implemented data regularization methods produced similar or improved spatial resolution results, whilst improving the source’s SNR values. The results obtained show the potential in the easyPET systems. Since the didactic scanner is already on its final stage, the next step will be to further test the pre-clinical system.
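The core ML-EM update used for the reconstructions above can be written in a few lines; the sketch below applies it to a random toy system matrix rather than the ray-driven easyPET matrix, so the dimensions, counts and iteration number are illustrative assumptions only.

# Core ML-EM update for emission tomography on a toy random system matrix (illustrative).
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_lors = 64, 256
A = rng.random((n_lors, n_pixels))            # stand-in system matrix (ray-driven in practice)
x_true = rng.random(n_pixels)                 # unknown activity image (flattened)
y = rng.poisson(A @ x_true)                   # measured counts per line of response

sensitivity = A.sum(axis=0)                   # s_j = sum_i A_ij
x = np.ones(n_pixels)                         # uniform initial estimate
for iteration in range(10):
    forward = A @ x                           # expected counts given the current image
    ratio = y / np.maximum(forward, 1e-12)    # measured / expected, guarded against divide-by-zero
    x = x / sensitivity * (A.T @ ratio)       # ML-EM multiplicative update

print("relative error after 10 iterations:",
      np.linalg.norm(x - x_true) / np.linalg.norm(x_true))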
APA, Harvard, Vancouver, ISO, and other styles
19

"On local and global influence analysis of latent variable models with ML and Bayesian approaches." 2004. http://library.cuhk.edu.hk/record=b6073748.

Full text
Abstract:
Bin Lu.
"September 2004."
Thesis (Ph.D.)--Chinese University of Hong Kong, 2004.
Includes bibliographical references (p. 118-126)
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Mode of access: World Wide Web.
Abstracts in English and Chinese.
APA, Harvard, Vancouver, ISO, and other styles
20

Kaur, Rajvir. "A comparative analysis of selected set of natural language processing (NLP) and machine learning (ML) algorithms for clinical coding using clinical classification standards." Thesis, 2018. http://hdl.handle.net/1959.7/uws:49614.

Full text
Abstract:
In Australia, hospital discharge summaries created at the end of an episode of care contain patient information such as demographic data, medical history, diagnoses, interventions carried out, and medications and drug therapies provided to the patient. These discharge summaries not only serve as a record of the episode of care, but are later converted into a set of clinical codes for statistical analysis purposes. The process of clinical coding refers to assigning alphanumeric codes to discharge summaries. In Australia, clinical coding is done using the International Classification of Diseases, version 10, Australian Modification (ICD-10-AM) and the Australian Classification of Health Interventions (ACHI), as per the Australian Coding Standards (ACS), in acute and subacute care settings in both public and private hospitals. Clinical coding and the subsequent analysis facilitate funding, insurance claims processing and research. The task of assigning codes to an episode of care is a manual process. This poses challenges in terms of the ever-increasing set of codes in ICD-10-AM and ACHI, changing coding standards in the ACS, the complexity of care episodes, and the large training and recruitment costs associated with clinical coders. In addition, the manual clinical coding process is time consuming and prone to errors, leading to financial losses. The use of Natural Language Processing (NLP) and Machine Learning (ML) techniques is considered as a solution to the above problem. In this thesis, four different approaches, namely pattern matching, rule-based, machine learning and a hybrid technique, are compared to identify the most efficient algorithm suitable for clinical coding. The ICD-10-AM and ACHI consist of 22 chapters based on human body organs, where each chapter describes the diseases and interventions of a body system. The aforementioned NLP and ML comparison is carried out on only two chapters, namely diseases of the respiratory system and diseases of the digestive system. Initially, the dataset contained 190 clinical records from these two chapters and was named Data190. Due to the limited number of clinical records, another 45 records were added to the existing dataset, and the resulting dataset was named Data235. The clinical records were cleaned up in the pre-processing stage to extract useful information, including the principal diagnosis, additional diagnoses, diabetes condition, principal procedure, additional procedures and anaesthesia details. In data pre-processing, various NLP techniques such as tokenisation, stop word removal, spelling error detection and correction, negation detection and abbreviation expansion were applied. In the pattern matching approach, text strings were matched character by character against the ICD-10-AM and ACHI coding guide using regular expressions; if a match was found, codes were assigned. In the rule-based approach, 409 rules were defined to avoid coding of wrong patterns. In the machine learning approach, once the unwanted information was removed from the clinical records, the text was represented in vector form for feature extraction using the Bag of Words (BoW) representation (Manning, Raghavan, & Schütze, 2008, p. 117) and the Term Frequency-Inverse Document Frequency (TF-IDF) vectoriser (Manning et al., 2008, p. 118). After feature extraction, classification was done using seven classifiers, namely Support Vector Machine (SVM) (Cortes & Vapnik, 1995), Naïve Bayes (Manning et al., 2008, p. 258), Decision Tree (Kumar, Assistant, & Sahni, 2011), Random Forest (Breiman, 2001), AdaBoost (Freund & Schapire, 1999), Multi-Layer Perceptron (MLP) (Naraei, Abhari, & Sadeghian, 2016) and k-Nearest Neighbour (kNN) (Manning et al., 2008, p. 297). A set of standard metrics, Precision (P), Recall (R), F-score, Accuracy, Hamming Loss (HL) and Jaccard Similarity (JS) (Dalianis, 2018; Aldrees & Chikh, 2016), is used to measure the efficiency of the said NLP and ML algorithms on the two datasets. For both datasets (Data190 and Data235), the machine learning and hybrid approaches performed well in comparison to the pattern matching and rule-based approaches. Among all the classifiers, AdaBoost performed best, followed by Decision Tree and the other classifiers. In the machine learning approach, the Decision Tree technique performed better than all the other classifiers using a 4-gram feature set, achieving a 0.87 F-score, 0.7453 JS and 0.0877 HL. Similarly, on Data235, AdaBoost performed best, achieving a 0.91 F-score, 0.8294 JS and 0.0945 HL.
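The machine-learning pipeline described in this abstract (TF-IDF features over n-grams, several off-the-shelf classifiers, multi-label metrics) can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn; the discharge-summary snippets and ICD-10-AM/ACHI codes are hypothetical placeholders rather than the thesis's Data190/Data235 records, the thesis's pre-processing steps (negation detection, abbreviation expansion, etc.) are omitted, and metrics are computed on the training records purely to show the API.

```python
# Minimal sketch of a TF-IDF + multi-label classifier comparison (assumes scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score, hamming_loss, jaccard_score

# Hypothetical discharge-summary fragments and illustrative codes (not the thesis data).
records = [
    "acute exacerbation of asthma treated with nebulised salbutamol",
    "laparoscopic appendicectomy for acute appendicitis",
    "community acquired pneumonia treated with intravenous antibiotics",
    "gastroscopy performed for upper gastrointestinal bleeding",
]
codes = [["J45.9"], ["K35.8", "30571-00"], ["J18.9"], ["K92.2", "30473-00"]]

y = MultiLabelBinarizer().fit_transform(codes)                   # one binary column per code
X = TfidfVectorizer(ngram_range=(1, 4)).fit_transform(records)   # up to 4-gram features

for name, base in [("Decision Tree", DecisionTreeClassifier()),
                   ("AdaBoost", AdaBoostClassifier())]:
    model = OneVsRestClassifier(base).fit(X, y)                  # one binary classifier per code
    pred = model.predict(X)
    print(f"{name}: F-score={f1_score(y, pred, average='micro'):.2f} "
          f"HL={hamming_loss(y, pred):.4f} "
          f"JS={jaccard_score(y, pred, average='samples'):.4f}")
```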
APA, Harvard, Vancouver, ISO, and other styles
21

Vicente, David José Marques. "Distributed Algorithms for Target Localization in Wireless Sensor Networks Using Hybrid Measurements." Master's thesis, 2017. http://hdl.handle.net/10362/27875.

Full text
Abstract:
This dissertation addresses the target localization problem in wireless sensor networks (WSNs). WSNs are now a widely applicable technology with numerous practical applications and the potential to improve people's lives. A feature required by many functions of a WSN is the ability to indicate where the data reported by each sensor were measured. For this reason, locating each sensor node in a WSN is an essential issue that should be considered. In this dissertation, a performance analysis of two recently proposed distributed localization algorithms for cooperative 3-D WSNs is presented. The tested algorithms rely on distance and angle measurements obtained from received signal strength (RSS) and angle-of-arrival (AoA) information, respectively. The measurements are then used to derive a convex estimator, based on second-order cone programming (SOCP) relaxation techniques, and a non-convex one that can be formulated as a generalized trust region sub-problem (GTRS). Both estimators have shown excellent performance assuming a static network scenario, giving accurate location estimates and converging in a few iterations. The results obtained in this dissertation confirm the novel algorithms' performance and accuracy. Additionally, a change to the algorithms is proposed, allowing the study of a more realistic and challenging scenario where different probabilities of communication failure between neighbor nodes in the broadcast phase are considered. Computational simulations performed within the scope of this dissertation show that the algorithms' performance holds for high probabilities of communication failure and that convergence is still achieved in a reasonable number of iterations.
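For intuition on the RSS side of the hybrid measurements, the sketch below shows how a received-signal-strength reading maps to a distance estimate under the standard log-distance path-loss model, followed by a naive linearised least-squares position fix. It is not the SOCP- or GTRS-based estimator studied in the dissertation; the path-loss parameters, anchor positions and noise level are illustrative assumptions.

```python
# Minimal sketch: RSS-to-distance conversion plus a linearised multilateration fix.
import numpy as np

def rss_to_distance(rss_dbm, p0_dbm=-40.0, gamma=3.0, d0=1.0):
    """Invert the log-distance path-loss model RSS = P0 - 10*gamma*log10(d/d0)."""
    return d0 * 10 ** ((p0_dbm - rss_dbm) / (10 * gamma))

def least_squares_fix(anchors, dists):
    """Linearised multilateration: subtract the first anchor's range equation."""
    a0, r0 = anchors[0], dists[0]
    A = 2 * (anchors[1:] - a0)
    b = (r0**2 - dists[1:]**2
         + np.sum(anchors[1:]**2, axis=1) - np.sum(a0**2))
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Four anchor nodes with known 3-D positions and one target at (4, 6, 1).
anchors = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 2.0],
                    [0.0, 10.0, 2.0], [10.0, 10.0, 0.0]])
target = np.array([4.0, 6.0, 1.0])
true_d = np.linalg.norm(anchors - target, axis=1)

# Simulated noisy RSS readings (dBm) under the assumed path-loss parameters.
rss = -40.0 - 30.0 * np.log10(true_d) + np.random.default_rng(0).normal(0.0, 1.0, 4)
print(least_squares_fix(anchors, rss_to_distance(rss)))  # should be close to [4, 6, 1]
```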
APA, Harvard, Vancouver, ISO, and other styles
22

Rasool, Raihan Ur. "CyberPulse: A Security Framework for Software-Defined Networks." Thesis, 2020. https://vuir.vu.edu.au/42172/.

Full text
Abstract:
Software-Defined Networking (SDN) technology provides a new perspective on traditional network management by separating the infrastructure plane from the control plane, which facilitates a higher level of programmability and management. While centralized control provides lucrative benefits, the control channel becomes a bottleneck and home to numerous attacks. We conduct a detailed study and find that crossfire Link Flooding Attacks (LFAs) are among the most lethal attacks for SDN due to their use of low-rate traffic and their persistent attacking nature. LFAs can be launched by malicious adversaries to congest the control plane with low-rate traffic, which can obstruct flow rule installation and ultimately bring down the whole network. Similarly, an adversary can employ bots to generate low-rate traffic to congest the control channel and ultimately bring down the connection between the control plane and data plane, causing service disruption. We present a systematic and comparative study of the vulnerabilities to LFAs on all the SDN planes, and elaborate in detail the LFA types, techniques, and their behavior in all the variants of SDN. We then illustrate the importance of a defense mechanism employing a distributed strategy against LFAs and propose a Machine Learning (ML) based framework, CyberPulse. Its detailed design, components and their interaction, working principles, implementation, and in-depth evaluation are presented subsequently. This research presents a novel approach to writing anomaly patterns and makes a significant contribution by developing a pattern-matching engine as the first line of defense against known attacks at line speed. The second important contribution is the effective detection and mitigation of LFAs in SDN through deep learning techniques. We perform two sets of experiments to classify and mitigate LFAs. In the initial experimental setup, we utilize the backpropagation technique for Artificial Neural Networks to effectively classify the incoming traffic. In the second set of experiments, we employ a holistic approach in which CyberPulse demonstrates algorithm-agnostic behavior and employs a pre-trained ML repository for precise classification. As an important scientific contribution, the CyberPulse framework has been developed from the ground up using modern software engineering principles and hence incurs very limited bandwidth and computational overhead. It has several useful features such as large-scale network-level monitoring, real-time network status information, and support for a wide variety of ML algorithms. An extensive evaluation is performed using the Floodlight open-source controller, which shows that CyberPulse incurs limited bandwidth and computational overhead and proactively detects and defends against LFAs in real time. This thesis contributes to the state of the art by presenting a novel framework for the defense, detection, and mitigation of LFAs in SDN by utilizing ML-based classification techniques. Existing solutions in the area mandate complex hardware for detection and defense, but the presented solution offers a unique advantage in that it operates on real-time traffic scenarios and utilizes multiple ML classification algorithms for LFA traffic classification without requiring complex and expensive hardware. In the future, we plan to implement it on a large testbed and extend it by training on multiple datasets for multiple types of attacks.
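As a rough illustration of the backpropagation-based traffic classification step described in this abstract, the sketch below trains a small multi-layer perceptron to separate low-rate, persistent LFA-like flows from benign flows. The flow features, synthetic data, and network size are assumptions for illustration only, not CyberPulse's actual feature set, datasets, or architecture.

```python
# Minimal sketch of an MLP (backpropagation-trained) flow classifier (assumes scikit-learn).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical per-flow features: packet rate (pkt/s), mean packet size (bytes),
# flow duration (s), and number of distinct destination prefixes.
benign = np.column_stack([rng.normal(200, 50, n), rng.normal(800, 150, n),
                          rng.exponential(5, n), rng.integers(1, 5, n)])
lfa = np.column_stack([rng.normal(40, 10, n), rng.normal(100, 30, n),
                       rng.exponential(60, n), rng.integers(20, 60, n)])  # low-rate, persistent, many targets
X = np.vstack([benign, lfa])
y = np.array([0] * n + [1] * n)  # 0 = benign, 1 = LFA-like

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=0))
clf.fit(X_tr, y_tr)
print("Held-out accuracy:", clf.score(X_te, y_te))
```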
APA, Harvard, Vancouver, ISO, and other styles