Dissertations on the topic "EFFICIENT CLASSIFICATION"

To see other types of publications on this topic, follow the link: EFFICIENT CLASSIFICATION.

Cite your source in APA, MLA, Chicago, Harvard, and other styles

Browse the top 50 dissertations for your research on the topic "EFFICIENT CLASSIFICATION".

Next to each work in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the publication as a .pdf file and read its abstract online, whenever these are available in the record's metadata.

Browse dissertations across a wide range of disciplines and compile your bibliography correctly.

1

Cisse, Mouhamadou Moustapha. "Efficient extreme classification." Thesis, Paris 6, 2014. http://www.theses.fr/2014PA066594/document.

Full text of the source
Abstract:
We propose in this thesis new methods to tackle classification problems with a very large number of labels, also called extreme classification. The proposed approaches aim at reducing the inference complexity in comparison with classical methods such as one-versus-rest, in order to make learning machines usable in real-life scenarios. We propose two types of methods, for single-label and multilabel classification respectively. The first approach uses existing hierarchical information among the categories to learn low-dimensional binary representations of the categories. The second, dedicated to multilabel problems, adapts the framework of Bloom filters to represent subsets of labels with sparse low-dimensional binary vectors. In both approaches, binary classifiers are learned to predict the new low-dimensional representations of the categories, and several algorithms are proposed to recover the set of relevant labels. Large-scale experiments validate the methods.
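As a toy illustration of the multilabel approach, here is a minimal sketch (not the author's code) of a Bloom-filter-style label encoding: each label sets a few bits of a short binary vector, a label subset is the union of those bits, and candidate labels are recovered by checking their bits. The hash construction, vector length, and number of hashes are arbitrary assumptions; in the thesis, per-bit binary classifiers predict the encoded vector from the input.

```python
import hashlib

def bloom_encode(labels, m=64, k=2):
    """Encode a set of label ids as an m-bit binary vector: each label
    sets k bit positions derived from independent hashes, so a label
    subset is represented by the union of its labels' bits."""
    bits = [0] * m
    for label in labels:
        for seed in range(k):
            h = hashlib.sha1(f"{seed}:{label}".encode()).hexdigest()
            bits[int(h, 16) % m] = 1
    return bits

def bloom_contains(bits, label, m=64, k=2):
    """A label is possibly present iff all of its k bits are set."""
    return all(
        bits[int(hashlib.sha1(f"{seed}:{label}".encode()).hexdigest(), 16) % m]
        for seed in range(k))

code = bloom_encode({3, 17, 42})
recovered = [lab for lab in range(100) if bloom_contains(code, lab)]
# 'recovered' contains {3, 17, 42} plus occasional false positives; in the
# thesis, per-bit binary classifiers predict 'code' from the input instead.
```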
APA, Harvard, Vancouver, ISO, and other styles
2

Monadjemi, Amirhassan. "Towards efficient texture classification and abnormality detection." Thesis, University of Bristol, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.409593.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
3

Alonso, Pedro. "Faster and More Resource-Efficient Intent Classification." Licentiate thesis, Luleå tekniska universitet, EISLAB, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-81178.

Full text of the source
Abstract:
Intent classification is known to be a complex problem in Natural Language Processing (NLP) research. It represents one of the stepping stones towards machines that can understand our language. Several models have recently appeared to tackle the problem, and deep learning has brought a solution within reach, although the goal has not been achieved yet. However, the energy and computational demands of these modern models (especially deep learning ones) are very high, and they should be kept to a minimum if the models are to be deployed efficiently on resource-constrained devices. Such resource savings also help to minimize the environmental impact of NLP. This thesis considers two main questions. First, which deep learning model is optimal for intent classification, i.e., which model can most accurately infer the intent of a short written text (here, whether a text constitutes hate speech)? Second, can we make intent classification models simpler and more resource-efficient than deep learning? Concerning the first question, the work shows that intent classification in written language is still a complex problem for modern models, even though deep learning has shown successful results in every area where it has been applied, and it identifies the model that performed best on short texts. Concerning the second question, the work shows that results similar to those of deep learning models can be achieved by more straightforward solutions, obtained by combining classical machine learning models, pre-processing techniques, and a hyperdimensional computing approach. This thesis thus presents research towards a more resource-efficient machine learning approach to intent classification. It first establishes a strong baseline on tweets containing hate speech using one of the best deep learning models currently available (RoBERTa, as an example), and then describes the steps taken to arrive at the final model based on hyperdimensional computing, which minimizes the required resources. The proposed model, called "hyperembed", is inspired by the hyperdimensional computing paradigm and demonstrates its capabilities; it can make intent classification faster and more resource-efficient by trading a few performance points for the resource savings. With resource efficiency in mind, the proposed models were tested on intent classification of short texts: tweets (where the intent is to offend or not) and questions posed to chatbots. In summary, the work covers two aspects. First, deep learning models have a performance advantage when sufficient data are available, but they tend to fail when the amount of available data is not sufficient, whereas the proposed models work well even on small datasets. Second, deep learning models require substantial resources to train and run, while the models proposed here trade off the computational resources spent on obtaining and running the model against its classification performance.
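For flavour, a generic hyperdimensional computing text classifier can be sketched in a few lines: random bipolar hypervectors for tokens, bundling (summation) for texts and class prototypes, and nearest-prototype classification by cosine similarity. This is a minimal sketch of the paradigm, not the thesis's hyperembed model.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000          # hypervector dimensionality
word_vecs = {}      # one random bipolar hypervector per token

def encode(text):
    """Bundle (sum) the hypervectors of a text's tokens."""
    v = np.zeros(D)
    for tok in text.lower().split():
        if tok not in word_vecs:
            word_vecs[tok] = rng.choice([-1.0, 1.0], size=D)
        v += word_vecs[tok]
    return v

def train(samples):
    """Class prototype = bundle of all training encodings for the class."""
    protos = {}
    for text, label in samples:
        protos[label] = protos.get(label, np.zeros(D)) + encode(text)
    return protos

def classify(protos, text):
    v = encode(text)
    return max(protos, key=lambda c: np.dot(protos[c], v) /
               (np.linalg.norm(protos[c]) * np.linalg.norm(v) + 1e-9))

protos = train([("turn on the light", "device"), ("book a flight", "travel")])
print(classify(protos, "switch the light on"))  # expected: "device" (shared tokens)
```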
APA, Harvard, Vancouver, ISO, and other styles
4

Chatchinarat, Anuchin. "An efficient emotion classification system using EEG." PhD thesis, Murdoch University, 2019. https://researchrepository.murdoch.edu.au/id/eprint/52772/.

Full text of the source
Abstract:
Emotion classification via electroencephalography (EEG) is used to find the relationships between EEG signals and human emotions. Many channels are available, each consisting of electrodes capturing brainwave activity. Some applications may require a reduced number of channels and frequency bands to shorten computation time, facilitate human comprehensibility, and enable a practical wearable device. Prior research has used different sets of channels and frequency bands. In this study, a systematic way of selecting the set of channels and frequency bands was investigated, and the results showed that similar accuracies can be achieved with a reduced number of channels and frequency bands. The study also proposed a method for selecting appropriate features using the ReliefF method, and the experimental results showed that it can reduce and select appropriate features confidently and efficiently. Moreover, the Fuzzy Support Vector Machine (FSVM) was used to improve emotion classification accuracy, as this research found that it handled the outliers typically present in EEG signals better than the standard Support Vector Machine (SVM). Furthermore, since the FSVM is a black-box model and some applications need to provide comprehensible human rules, rules were extracted using the Classification and Regression Trees (CART) approach to give the system human comprehensibility. The FSVM and rule-extraction experiments showed that the FSVM performed better than the SVM in classifying the emotions of interest, and that rule extraction from the FSVM using CART (FSVM-CART) achieved a good trade-off between classification accuracy and human comprehensibility.
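A rough sketch of the fuzzy-SVM idea: give each training sample a membership weight that shrinks for outliers, then train a weighted SVM. The distance-to-centre membership below is one common FSVM heuristic and an assumption here, not necessarily the thesis's definition; sklearn's per-sample weights stand in for the fuzzy-weighted hinge loss.

```python
import numpy as np
from sklearn.svm import SVC

def fuzzy_membership(Xc):
    """Membership shrinks with distance from the class centre, so
    outliers influence the decision boundary less."""
    d = np.linalg.norm(Xc - Xc.mean(axis=0), axis=1)
    return 1.0 - d / (d.max() + 1e-9)

# Toy stand-ins for EEG feature vectors and binary emotion labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 2, size=100)

# Per-sample fuzzy memberships, computed within each class.
w = np.empty(len(y))
for c in (0, 1):
    w[y == c] = fuzzy_membership(X[y == c])

# SVC accepts per-sample weights, which is enough to emulate the
# fuzzy-weighted objective of an FSVM.
clf = SVC(kernel="rbf").fit(X, y, sample_weight=w)
```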
APA, Harvard, Vancouver, ISO, and other styles
5

Duta, Ionut Cosmin. "Efficient and Effective Solutions for Video Classification." Doctoral thesis, Università degli studi di Trento, 2017. https://hdl.handle.net/11572/369314.

Full text of the source
Abstract:
The aim of this PhD thesis is to take a step forward towards teaching computers to understand videos in a similar way as humans do. In this work we tackle the video classification and/or action recognition tasks. This thesis was completed in a period of transition in which the research community was moving from traditional approaches (such as hand-crafted descriptor extraction) to deep learning, and it captures this transition. Unlike image classification, where the state-of-the-art results are dominated by deep learning approaches, in video classification deep learning approaches are not as dominant; in fact, most of the current state-of-the-art results in video classification are based on a hybrid approach in which hand-crafted descriptors are combined with deep features to obtain the best performance. This is due to several factors, such as video being more complex data than images, and therefore more difficult to model, and video datasets not being large enough to train deep models effectively. The pipeline for video classification can be broken down into three main steps: feature extraction, encoding, and classification. While the existing techniques for the classification step are relatively mature, there is still significant room for improvement in feature extraction and encoding. In addition to these main steps, the framework contains pre/post-processing techniques, such as feature dimensionality reduction, feature decorrelation (for instance using Principal Component Analysis, PCA) and normalization, which can considerably influence the performance of the pipeline. One of the bottlenecks of the video classification pipeline is the feature extraction step, where most approaches are extremely computationally demanding, which makes them unsuitable for real-time applications. In this thesis we tackle this issue, propose different speed-ups to improve the computational cost, and introduce a new descriptor that can capture motion information from a video without computing optical flow (which is very expensive to compute). Another important component of video classification is the feature encoding step, which builds the final video representation that serves as input to a classifier. During the PhD we proposed several improvements over the standard approaches for feature encoding, as well as a new approach for encoding deep features. To summarize, the main contributions of this thesis are as follows: (1) We propose several speed-ups for descriptor extraction, providing a version of the standard video descriptors that can run in real time, and we investigate the trade-off between accuracy and computational efficiency; (2) We provide a new descriptor for extracting information from a video, which is very efficient to compute and able to extract motion information without extracting optical flow; (3) We investigate different improvements over the standard encoding approaches for boosting the performance of the video classification pipeline; (4) We propose a new feature encoding approach specifically designed for encoding local deep features, providing a more robust video representation.
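The three-step pipeline the abstract describes can be sketched end to end with stand-in components: local descriptors are decorrelated with PCA, encoded into a fixed-length bag-of-visual-words histogram, and classified with a linear SVM. All data and sizes below are toy assumptions, not the thesis's descriptors or encodings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

# Toy stand-ins: each video is a variable-length set of local descriptors.
rng = np.random.default_rng(0)
videos = [rng.normal(size=(rng.integers(50, 80), 32)) for _ in range(20)]
labels = rng.integers(0, 2, 20)

# 1) feature post-processing: decorrelate/reduce descriptors with PCA
pca = PCA(n_components=16).fit(np.vstack(videos))

# 2) encoding: bag-of-visual-words histogram over a learned codebook
codebook = KMeans(n_clusters=8, n_init=3, random_state=0).fit(
    pca.transform(np.vstack(videos)))

def encode(video):
    words = codebook.predict(pca.transform(video))
    hist = np.bincount(words, minlength=8).astype(float)
    return hist / hist.sum()          # L1-normalised video representation

X = np.array([encode(v) for v in videos])

# 3) classification of the encoded representation
clf = LinearSVC().fit(X, labels)
```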
APA, Harvard, Vancouver, ISO, and other styles
6

Duta, Ionut Cosmin. "Efficient and Effective Solutions for Video Classification." Doctoral thesis, University of Trento, 2017. http://eprints-phd.biblio.unitn.it/2669/1/Duta_PhD-Thesis.pdf.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
7

Stein, David Benjamin. "Efficient homomorphically encrypted privacy-preserving automated biometric classification." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/130608.

Full text of the source
Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September, 2020
Cataloged from the official PDF of thesis.
Includes bibliographical references (pages 87-96).
This thesis investigates whether biometric recognition can be performed on encrypted data without decrypting it. Borrowing a concept from machine learning, we develop approaches that shift as much computation as possible to a pre-computation step, allowing for efficient, homomorphically encrypted biometric recognition. We demonstrate two algorithms: an improved version of the k-ishNN algorithm originally designed by Shaul et al. in [1], and a homomorphically encrypted implementation of an SVM classifier. We provide experimental demonstrations of the accuracy and practical efficiency of both algorithms.
by David Benjamin Stein.
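The pre-computation idea can be illustrated with a linear SVM: once the weights are derived offline, scoring reduces to one inner product plus a bias, i.e., additions and multiplications only, which is the operation profile that homomorphic encryption schemes evaluate well. A plain-numpy sketch under that assumption (a real system would run this on ciphertexts):

```python
import numpy as np

# Offline, in the clear: train/derive the SVM model once and cache it.
w = np.array([0.7, -1.2, 0.3])   # hypothetical precomputed weights
b = 0.1                          # hypothetical precomputed bias

def svm_score(x):
    """Decision value = <w, x> + b: only additions and multiplications,
    the operations HE schemes evaluate efficiently on encrypted x."""
    return float(np.dot(w, x) + b)

print(svm_score(np.array([1.0, 0.5, -0.2])) > 0)  # class decision
```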
APA, Harvard, Vancouver, ISO, and other styles
8

Graham, James T. "Efficient Generation of Reducts and Discerns for Classification." Ohio University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1175639229.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
9

Ekman, Carl. "Traffic Sign Classification Using Computationally Efficient Convolutional Neural Networks." Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157453.

Full text of the source
Abstract:
Traffic sign recognition is an important problem for autonomous cars and driver assistance systems. With recent developments in the field of machine learning, high performance can be achieved, but typically at a large computational cost. This thesis aims to investigate the relation between classification accuracy and computational complexity for the visual recognition problem of classifying traffic signs. In particular, the benefits of partitioning the classification problem into smaller sub-problems using prior knowledge, in the form of the sign's shape or the current region, are investigated. In the experiments, the convolutional neural network (CNN) architecture MobileNetV2 is used, as it is specifically designed to be computationally efficient. To incorporate prior knowledge, separate CNNs are used for the different subsets generated when partitioning the dataset based on region or shape. The separate CNNs are trained from scratch or initialized by pre-training on the full dataset. The results support the intuitive idea that performance initially increases with network size and indicate a network size where the improvement stops. Including shape information using the two investigated methods does not result in a significant improvement. Including region information using pre-trained separate classifiers results in a small improvement for small complexities, for one of the regions in the experiments. In the end, none of the investigated methods of including prior knowledge is considered to yield an improvement large enough to justify the added implementation complexity. However, some other methods are suggested, which would be interesting to study in future work.
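The partitioning scheme can be sketched with stand-in classifiers: one expert model per shape (or region) subset, with each sample routed to the expert for its partition. The logistic regressions below stand in for the per-subset MobileNetV2 networks, and the data are toy assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: features X, class labels y, and a shape tag per sign.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = rng.integers(0, 10, 300)
shape = rng.choice(["circle", "triangle", "square"], 300)

# One classifier per shape partition (stand-in for per-subset CNNs).
experts = {s: LogisticRegression(max_iter=500).fit(X[shape == s], y[shape == s])
           for s in np.unique(shape)}

def predict(x, s):
    """Route a sample to the expert for its (known or detected) shape."""
    return experts[s].predict(x.reshape(1, -1))[0]
```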
APA, Harvard, Vancouver, ISO, and other styles
10

Nurrito, Eugenio. "Scattering networks: efficient 2D implementation and application to melanoma classification." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/12261/.

Full text of the source
Abstract:
Machine learning is an approach to solving complex tasks. Its adoption is growing steadily, and the many research groups active in the field regularly publish new, interesting results. In this work, the scattering network representation is used to transform raw images into a set of features convenient for an image classification task, a fundamental machine learning application. This representation is invariant to translations and stable to small deformations. Moreover, it does not need any sort of training, since its parameters are fixed and only some hyper-parameters must be defined. A novel, efficient code implementation is proposed in this thesis. It leverages the power of the GPU's parallel architecture to achieve performance up to 20× faster than earlier codes, enabling near real-time applications. The source code of the implementation is also released open-source. The scattering network is then applied to a complex dataset of textures to test its behaviour in a general classification task. Given the conceptual complexity of the database, this unspecialized model scores a mere 32.9% accuracy. Finally, the scattering network is applied to a classification task in the medical field. A dataset of images of skin lesions is used to train a model able to classify malignant melanoma against benign lesions. Malignant melanoma is one of the most dangerous skin tumors, but if discovered at an early stage the chances of recovery are good. The trained model was tested and reached an interesting accuracy of 70.5% (sensitivity 72.2%, specificity 70.0%). While these values are not high enough to permit the use of the model in a real application, this result demonstrates the great capabilities of the scattering network representation.
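For context, a fixed (non-learned) 2D scattering transform can be computed with the kymatio package, assuming its Scattering2D interface; the coefficients are then flattened and fed to any ordinary classifier, as done for the melanoma model. This is an illustrative sketch, not the thesis's GPU implementation.

```python
import torch
from kymatio.torch import Scattering2D  # assumes the kymatio package is installed

# Fixed, non-learned transform: J=2 octaves over 32x32 inputs.
scattering = Scattering2D(J=2, shape=(32, 32))

x = torch.randn(4, 1, 32, 32)   # a batch of 4 grayscale images
feats = scattering(x)           # translation-invariant, deformation-stable coefficients
flat = feats.reshape(4, -1)     # flatten for a downstream classifier (SVM, logistic, ...)
print(flat.shape)
```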
APA, Harvard, Vancouver, ISO, and other styles
11

Sundström, Mikael. "Time and space efficient algorithms for packet classification and forwarding." Doctoral thesis, Luleå tekniska universitet, Datavetenskap, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-25804.

Full text of the source
Abstract:
The Internet consists of a mesh of routers (nodes) connected by links (edges), and the traffic through the Internet is divided into flows, where each flow is an ordered sequence of packets, or datagrams. Each packet consists of a header and a piece of data, also referred to as the payload. The header contains information about the source and destination of the packet as well as some additional information. The primary function of an Internet router is to inspect the destination address of a packet, determine in which direction, i.e. on which link, to forward the packet on its next step towards its destination, and then to forward the packet. This is called forwarding and is one of the problems considered in this thesis. Forwarding is essentially a data structuring problem where a local view of the Internet surrounding the router is represented in the form of a forwarding table, in which the destination address can be looked up to determine the forwarding direction. In this thesis we develop a number of forwarding table data structures with different characteristics, both for supporting the current Internet Protocol, IP version 4, which uses 32-bit addressing, and for tomorrow's IP version 6, featuring 128-bit addresses. The secondary function is the ability to determine whether to forward a packet or not based on the information from one or more header fields. While the entries stored in a forwarding table are 1-dimensional intervals, the entries used for packet classification are D-dimensional, where D is typically larger than or equal to 5. As a result, packet classification requires some degree of brute force, either in terms of parallel processing or huge amounts of memory, to achieve guaranteed performance. We have developed efficient algorithms for reducing the number of bits involved in the actual D-dimensional classification. These algorithms can be used to improve the performance of both brute-force hardware classifiers and heuristic software-based classifiers. We first work on a purely theoretical problem called implicit selection, where the solution as such does not have any impact whatsoever on forwarding and packet classification. However, in the process of solving the implicit selection problem, we have worked with numerous in-place techniques that become extremely useful when dealing with some aspects of packet classification and forwarding later on. It is interesting to see how techniques for achieving good performance in Asymptopia can also be used in the real world. The next step is to develop a data structure called the hybrid tree, where the keys are stored with minimal storage overhead and the lookup cost is independent of the number of keys in a non-trivial way. We also show how to engineer both static 128-bit single-field classification without storage overhead and dynamic 128-bit classification with roughly 40% storage overhead that supports reasonably fast update operations. Next we deal with compression state lookup for IPv6 header compression, using a dynamic move-to-root Patricia tree which adapts to the traffic pattern in an on-line fashion, followed by classification of fragmented packets, using a highly dynamic dictionary data structure featuring automated garbage collection. This is followed by two forwarding algorithms with completely different properties. The first algorithm is called XTC and supports fast lookups and good average compression but not incremental updates, whereas the second algorithm is based on hybrid trees and features fast lookups and updates as well as good table compression. Finally, we present a packet classification algorithm which reduces both silicon area and power consumption in a hardware implementation. Our approach is to use hybrid trees to compress the addresses, reducing the total number of bits involved in the final parallel processing. For IPv6 multifield classification, we can reduce the total number of transistors by 50% and the power consumption by over 80% compared to existing technologies for interval matching in hardware.
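The core forwarding-table operation the abstract refers to, longest-prefix matching, can be sketched with a plain binary trie; the thesis's hybrid trees are far more compact, so this is only the baseline idea.

```python
class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = {}      # bit ('0'/'1') -> TrieNode
        self.next_hop = None    # set if a prefix ends at this node

class ForwardingTable:
    """Binary trie over address bits; lookup returns the next hop of
    the longest matching prefix, the core IP forwarding operation."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix_bits, next_hop):
        node = self.root
        for b in prefix_bits:
            node = node.children.setdefault(b, TrieNode())
        node.next_hop = next_hop

    def lookup(self, addr_bits):
        node, best = self.root, None
        for b in addr_bits:
            if node.next_hop is not None:
                best = node.next_hop       # remember longest match so far
            node = node.children.get(b)
            if node is None:
                return best
        return node.next_hop if node.next_hop is not None else best

ft = ForwardingTable()
ft.insert("10", "A")            # prefix 10/2 -> next hop A
ft.insert("1011", "B")          # prefix 1011/4 -> next hop B
print(ft.lookup("10110000"))    # "B": the longest matching prefix wins
```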

APA, Harvard, Vancouver, ISO, and other styles
12

Yoshioka, Atsushi. "Rule hashing for efficient packet classification in network intrusion detection." Online access for everyone, 2007. http://www.dissertations.wsu.edu/Thesis/Fall2007/a_yoshioka_120307.pdf.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
13

Sundström, Mikael. "Time and space efficient algorithms for packet classification and forwarding /." Luleå : Centre for Distance Spanning Technology : Luleå University of Technology, 2007. http://epubl.ltu.se/1402-1544/2007/15/.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
14

Khojandi, Aryan Iden. "Efficient MCMC inference for material detection and classification In tomography." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/113183.

Full text of the source
Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016
Page 106 blank. Cataloged from PDF version of thesis.
Includes bibliographical references (pages 103-105).
Inferring the distribution of material in a volume of interest based on tomographic measurements is a ubiquitous problem. Accurate reconstruction of the configuration is a daunting task, especially when the sensor setup is not sufficiently comprehensive. The inverse problem corresponding to this reconstruction task is almost always ill-posed, but reasoning about the latent state remains possible. We investigate the problem of classifying volumes into object classes, using the latent configuration as an intermediate representation. We use the framework of Probabilistic Inference to implement MCMC sampling of realizations of the latent configuration conditioned on the measurements. We exploit conditional-independence properties of the graphical-model representation to sample many nodes in parallel and thereby render our sampling scheme much more efficient. We then reason over the samples and use a neural network to classify them. We demonstrate that classification is far more robust than reconstruction to the removal of sensors and interrogation angles. We also show the value of using the intermediate representation and a generative physics-based forward model by comparing these classification results with those obtained by foregoing the latent space and training a classifier directly on the sensor readings. The former benefits from regularization of the posterior distribution, allowing it to learn more rapidly and thereby perform significantly better when the number of labeled examples is limited, a reality present in the context of our problem and in many others.
by Aryan Iden Khojandi.
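The parallel-sampling trick the abstract mentions can be illustrated on an Ising-like binary field: under a checkerboard colouring, same-colour sites are conditionally independent given the other colour, so each half-sweep updates half the lattice at once. The sketch below samples the prior only; the thesis additionally conditions on tomographic measurements through the likelihood.

```python
import numpy as np

def gibbs_checkerboard(field, beta=0.8, sweeps=50, rng=None):
    """Gibbs sampling of a binary (Ising-like) material field.

    Nodes of one checkerboard colour are conditionally independent
    given the other colour, so each half-sweep updates many sites in
    parallel: the same property the thesis exploits for efficiency."""
    rng = rng or np.random.default_rng(0)
    H, W = field.shape
    ii, jj = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    for _ in range(sweeps):
        for colour in (0, 1):
            mask = (ii + jj) % 2 == colour
            nbr = (np.roll(field, 1, 0) + np.roll(field, -1, 0) +
                   np.roll(field, 1, 1) + np.roll(field, -1, 1))
            # spins s = 2*field - 1; P(s=+1) = sigmoid(2*beta*sum(neighbours))
            p = 1.0 / (1.0 + np.exp(-2.0 * beta * (2 * nbr - 4)))
            draw = (rng.random(field.shape) < p).astype(field.dtype)
            field[mask] = draw[mask]
    return field

sample = gibbs_checkerboard(np.random.default_rng(1).integers(0, 2, (32, 32)))
```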
APA, Harvard, Vancouver, ISO, and other styles
15

Naoto, Chiche Benjamin. "Video classification with memory and computation-efficient convolutional neural network." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254678.

Full text of the source
Abstract:
Video understanding involves problems such as video classification, which consists in labeling videos based on their contents and frames. In many real-world applications such as robotics, self-driving cars, augmented reality, and the Internet of Things (IoT), video understanding tasks need to be carried out in real time on devices with limited memory resources and computation capabilities, while meeting latency requirements. In this context, whereas memory- and computation-efficient neural networks, i.e., networks that present a reasonable trade-off between accuracy and efficiency with respect to memory size and computational speed, have been developed for image recognition tasks, studies on video classification have not made the most of these networks. To fill this gap, this project answers the following research question: how can video classification pipelines based on memory- and computation-efficient convolutional neural networks (CNNs) be built, and how do they perform? In order to answer this question, the project builds and evaluates video classification pipelines that are new artefacts. The research involves triangulation (i.e., it is qualitative and quantitative at the same time) and uses the empirical research method for the evaluation. The artefacts are based on one of the existing memory- and computation-efficient CNNs, and their evaluation is based on a public video classification dataset and multiclass classification performance metrics. The case study research strategy is adopted: we try to generalize the obtained results as far as possible to other memory- and computation-efficient CNNs and video classification datasets. The abductive research approach is used in order to verify or falsify hypotheses. As a result, the artefacts are built and show satisfactory performance metrics compared both to baseline pipelines that are also developed in this thesis and to metric values reported in other papers that used the same dataset. To conclude, video classification pipelines based on a memory- and computation-efficient CNN can be built by designing and developing artefacts that combine approaches inspired by existing papers with new approaches, and these artefacts show satisfactory performance. In particular, we observe that the drop in accuracy induced by a memory- and computation-efficient CNN when dealing with video frames is, to some extent, compensated by capturing temporal information via consideration of sequences of these frames.
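One way to read the final observation: an efficient 2D CNN embeds frames individually, and temporal information is recovered by aggregating the embeddings over the frame sequence. A minimal sketch with stand-in components (the real pipeline uses a MobileNet-style CNN; all names and sizes below are assumptions):

```python
import numpy as np

def classify_video(frames, frame_embed, classifier, stride=2):
    """Embed a subsampled sequence of frames with an efficient 2D CNN
    and average the embeddings over time before classifying: a cheap
    way to recover temporal information that single-frame models
    discard."""
    feats = np.stack([frame_embed(f) for f in frames[::stride]])
    return classifier(feats.mean(axis=0))

# Toy stand-ins for the CNN embedding and the classification head.
rng = np.random.default_rng(0)
W_embed = rng.normal(size=(1024, 64))   # hypothetical embedding projection
W_head = rng.normal(size=(64, 5))       # 5 hypothetical video classes

video = [rng.normal(size=1024) for _ in range(16)]  # 16 "frames"
pred = classify_video(video,
                      frame_embed=lambda f: np.tanh(f @ W_embed),
                      classifier=lambda z: int(np.argmax(z @ W_head)))
```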
APA, Harvard, Vancouver, ISO, and other styles
16

Bosio, Mattia. "Hierarchical information representation and efficient classification of gene expression microarray data." Doctoral thesis, Universitat Politècnica de Catalunya, 2014. http://hdl.handle.net/10803/145902.

Full text of the source
Abstract:
In the field of computational biology, microarrays are used to measure the activity of thousands of genes at once and create a global picture of cellular function. Microarrays allow scientists to analyze the expression of many genes in a single experiment quickly and efficiently. Even though microarrays are a consolidated research technology nowadays and the trends in high-throughput data analysis are shifting towards new technologies like Next Generation Sequencing (NGS), an optimal method for sample classification has not been found yet. Microarray classification is a complicated task, not only due to the high dimensionality of the feature set, but also to an apparent lack of data structure. This characteristic limits the applicability of processing techniques, such as wavelet filtering or other filtering techniques that take advantage of known structural relations. On the other hand, it is well known that genes are not expressed independently of each other: genes have a high interdependence related to the regulating biological processes involved. This thesis aims to improve the current state of the art in microarray classification and to contribute to understanding how signal processing techniques can be developed and applied to analyze microarray data. Building a classification framework requires exploratory work in which algorithms are constantly tried and adapted to the analyzed data. The algorithms and classification frameworks developed in this thesis tackle the problem with two essential building blocks. The first deals with the lack of a priori structure by inferring a data-driven structure with unsupervised hierarchical clustering tools. The second key element is a proper feature selection tool that produces a precise classifier as output and reduces the risk of overfitting. The main focus of this thesis is binary data classification, a field in which we obtained relevant improvements to the state of the art. The first key element is the data-driven structure, obtained by modifying hierarchical clustering algorithms derived from the Treelets algorithm from the literature. Several alternatives to the original reference algorithm were tested, changing either the similarity metric used to merge features or the way two features are merged. Moreover, the possibility of including external sources of information, from publicly available biological knowledge and ontologies, to improve the structure generation was studied too. Regarding feature selection, two alternative approaches were studied: the first is a modification of the IFFS algorithm as a wrapper feature selection, while the second involves an ensemble learning focus. To obtain good results, the IFFS algorithm was adapted to the data characteristics by introducing new elements into the selection process, such as a reliability measure and a scoring system to better select the best feature at each iteration. The second feature selection approach is based on ensemble learning, taking advantage of the microarrays' feature abundance to implement a different selection scheme. New algorithms were studied in this field, improving state-of-the-art algorithms for the microarray data characteristics of small sample counts and high feature numbers. In addition to the binary classification problem, the multiclass case was addressed too: a new algorithm combining multiple binary classifiers was evaluated, exploiting the redundancy offered by multiple classifiers to obtain better predictions. All the algorithms studied throughout this thesis were evaluated using high-quality publicly available data, following established testing protocols from the literature to offer proper benchmarking with the state of the art. Whenever possible, multiple Monte Carlo simulations were performed to increase the robustness of the obtained results.
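The two building blocks can be caricatured in a few lines: an unsupervised hierarchical clustering of features based on correlation (standing in for the Treelets-derived structure), followed by a greedy wrapper-style forward selection scored by cross-validation (standing in for the modified IFFS). Everything below is a toy assumption, not the thesis's algorithms.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))                 # 60 samples, 40 "genes"
y = (X[:, :3].sum(axis=1) > 0).astype(int)    # labels driven by 3 genes

# 1) data-driven structure: cluster genes by correlation distance
corr = np.corrcoef(X.T)
dist = 1 - np.abs(corr)
Z = linkage(dist[np.triu_indices(40, k=1)], method="average")
groups = fcluster(Z, t=8, criterion="maxclust")   # 8 "metagene" clusters

# 2) summarise each cluster by its mean profile
M = np.stack([X[:, groups == g].mean(axis=1) for g in np.unique(groups)], 1)

# 3) greedy forward selection of metagenes by cross-validated accuracy
chosen = []
for _ in range(3):
    best = max((c for c in range(M.shape[1]) if c not in chosen),
               key=lambda c: cross_val_score(
                   LogisticRegression(), M[:, chosen + [c]], y, cv=5).mean())
    chosen.append(best)
print(chosen)
```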
APA, Harvard, Vancouver, ISO, and other styles
17

Kanumuri, Sai Srilakshmi. "ON EVALUATING MACHINE LEARNING APPROACHES FOR EFFICIENT CLASSIFICATION OF TRAFFIC PATTERNS." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-14985.

Full text of the source
Abstract:
Context. With the increased usage of mobile devices and the internet, cellular network traffic has increased tremendously. This increase in network traffic has led to more frequent communication failures among the network nodes. Each communication failure among the nodes is defined as a bad event, and the occurrence of one such bad event acts as a source of origin for several consecutive bad events. These bad events as a whole may eventually lead to node failures (a node not being able to respond to any data requests). Implementing workarounds for these node failures requires a lot of human effort and cost on the part of the telecom companies, so there is a need to prevent node failures from happening. This can be done by classifying the traffic patterns between nodes in the network, identifying bad events in them, and delivering the verdict immediately after their detection. Objectives. Through this study, we aim to find the most suitable machine learning algorithm for efficiently classifying the traffic patterns of SGSN-MME (SGSN: Serving GPRS (General Packet Radio Service) Support Node; MME: Mobility Management Entity), a network management tool designed to support the functionalities of the SGSN and MME nodes. We do this by evaluating the classification performance of four machine learning classification algorithms, namely Support Vector Machines (SVMs), Naïve Bayes, decision trees, and random forests, on the traffic patterns of SGSN and MME. The selected classification algorithm is developed in such a way that, whenever it detects a bad event, it notifies the user by prompting a message saying, "Something bad is happening". Methods. We conducted an experiment evaluating the classification performance of our four chosen classification algorithms on the dataset provided by Ericsson AB, Gothenburg. The experimental dataset is a combination of three logs, one of which represents traffic patterns in a real network, while the other two contain synthetic traffic patterns generated manually. The dataset is unlabeled, with 720 data instances and 4019 attributes. K-means clustering is performed to divide the data instances into groups and thereby label them as good and bad events. Also, since the number of attributes in the experimental dataset exceeds the number of instances, feature selection is performed to select the subset of relevant attributes that best represents the whole data. All the chosen classification algorithms are trained and tested with ten-fold cross-validation using the selected subset of attributes, and the obtained performance measures, such as classification accuracy, F1 score, and training time, are analyzed and compared to select the most suitable among them. Finally, the chosen algorithm is tested on unlabeled real data and the performance measures are analyzed in order to check whether it is able to detect the bad events correctly. Results. Experimental results showed that random forests outperformed Support Vector Machines, Naïve Bayes, and decision trees when classification accuracy and F1 score are considered, with an average classification accuracy of 99.72% and an average F1 score of 99.6. On the other hand, Naïve Bayes outperformed the other three algorithms when training time is considered, with an average training time of 0.010 seconds. Also, the classification accuracy and F1 score of random forests on unlabeled data were found to be 100% and 100, respectively. Conclusions. Since our study focuses on classifying the traffic patterns of SGSN-MME as accurately as possible, classification accuracy and F1 score are of greater importance than training time. Therefore, based on the experimental results, we conclude that random forests is the most suitable machine learning algorithm for classifying the traffic patterns of SGSN-MME. However, Naïve Bayes can also be used if classification has to be performed in the least time possible and moderate accuracy (around 70%) is acceptable.
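The experimental procedure maps directly onto standard tooling. A condensed sketch with random data shaped like the thesis's dataset (720 instances, 4019 attributes), using k-means labelling, univariate feature selection as a stand-in for the thesis's selection step, and ten-fold cross-validation of a random forest:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(720, 4019))   # toy stand-in shaped like the dataset

# 1) unlabeled traffic -> two clusters, treated as good/bad events
y = KMeans(n_clusters=2, n_init=3, random_state=0).fit_predict(X)

# 2) attributes >> instances, so keep only the most relevant features
X_sel = SelectKBest(f_classif, k=50).fit_transform(X, y)

# 3) ten-fold cross-validated random forest, as in the experiment
scores = cross_val_score(RandomForestClassifier(random_state=0),
                         X_sel, y, cv=10)
print(scores.mean())
```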
APA, Harvard, Vancouver, ISO, and other styles
18

Ambardekar, Amol A. "Efficient vehicle tracking and classification for an automated traffic surveillance system." abstract and full text PDF (free order & download UNR users only), 2007. http://0-gateway.proquest.com.innopac.library.unr.edu/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:1451111.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
19

Harte, T. P. "Efficient neural network classification of magnetic resonance images of the breast." Thesis, University of Cambridge, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.603805.

Full text of the source
Abstract:
This dissertation proposes a new method of automated malignancy recognition in contrast-enhanced magnetic resonance images of the human breast using the multi-layer perceptron (MLP) feed-forward neural network paradigm. The fundamental limitation is identified as the efficiency of such a classifier: the computational budget demanded by multi-dimensional image data sets is immense, and without optimization the MLP flounders. This work proposes a new efficient algorithm for MLP classification of large multi-dimensional data sets based on fast discrete orthogonal transforms. This is possible given the straightforward observation that point-wise mask-processing of image data for classification purposes is linear spatial convolution. The novel observation, then, is that the MLP permits convolution at its input layer due to the linearity of the inner product which it computes. Optimized fast Fourier transforms (FFTs) are investigated and an order-of-magnitude improvement in the execution time of a four-dimensional transform is achieved over commonly implemented FFTs. One of the principal slowdowns in common multi-dimensional FFTs is observed to be the lack of attention paid to memory-hierarchy considerations, and a simple but fast technique for optimizing cache performance is implemented. The abstract mathematical basis for convolution is investigated, and a finite integer number theoretic transform (NTT) approach suggests itself, because such a transform can be defined that is fast, purely real, has parsimonious memory requirements, and has compact hardware realizations. A new method for multi-dimensional convolution with long-length number theoretic transforms is presented. This is an extension of previous work where NTTs were implemented over pseudo-Mersenne and pseudo-Fermat surrogate moduli. A suitable modulus is identified which allows long-length transforms that readily lend themselves to the multi-dimensional convolution problem involved in classifying large magnetic resonance image data sets.
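The central observation, that sliding a first-layer weight mask over every pixel is linear convolution and can therefore be computed with a fast transform, can be sketched with an FFT-based convolution (the thesis's NTT machinery replaces the FFT; sizes below are toy assumptions):

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
image = rng.normal(size=(256, 256))
w = rng.normal(size=(9, 9))   # one hidden unit's input-layer weight mask
b = 0.1                       # its bias

# Sliding the 9x9 mask over every pixel is linear convolution, so the
# first-layer activations for ALL window positions come from a single
# FFT-based convolution instead of one inner product per pixel.
z = fftconvolve(image, w[::-1, ::-1], mode="valid") + b  # flip => correlation
hidden = np.tanh(z)   # activation map, one value per window position
print(hidden.shape)   # (248, 248)
```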
APA, Harvard, Vancouver, ISO, and other styles
20

Lee, Zed Heeje. "A graph representation of event intervals for efficient clustering and classification." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281947.

Full text of the source
Abstract:
Sequences of event intervals occur in several application domains, while their inherent complexity hinders scalable solutions to tasks such as clustering and classification. In this thesis, we propose a novel spectral embedding representation of event interval sequences that relies on bipartite graphs. More concretely, each event interval sequence is represented by a bipartite graph by following three main steps: (1) creating a hash table that can quickly convert a collection of event interval sequences into a bipartite graph representation, (2) creating and regularizing a bi-adjacency matrix corresponding to the bipartite graph, (3) defining a spectral embedding mapping on the bi-adjacency matrix. In addition, we show that substantial improvements can be achieved with regard to classification performance through pruning parameters that capture the nature of the relations formed by the event intervals. We demonstrate through extensive experimental evaluation on five real-world datasets that our approach can obtain runtime speedups of up to two orders of magnitude compared to other state-of-the-art methods and similar or better clustering and classification performance.
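A minimal sketch of step (3), assuming a toy bi-adjacency matrix: degree-normalise, take leading singular vectors, and use them as low-dimensional sequence coordinates. The thesis's hash-table construction of the bipartite graph, its regularization, and the pruning parameters are omitted here.

```python
import numpy as np

def spectral_embed(B, k=2):
    """Spectral embedding of a bipartite graph's bi-adjacency matrix B
    (sequences x event-interval patterns): normalise by degrees, take
    the top singular vectors, use them as low-dimensional coordinates."""
    d1 = np.maximum(B.sum(axis=1), 1e-9)    # sequence degrees
    d2 = np.maximum(B.sum(axis=0), 1e-9)    # pattern degrees
    Bn = B / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, s, Vt = np.linalg.svd(Bn, full_matrices=False)
    return U[:, 1:k + 1]    # skip the trivial leading component

# Toy bi-adjacency: 4 sequences over 5 interval-relation "words".
B = np.array([[2, 1, 0, 0, 0],
              [1, 2, 0, 0, 0],
              [0, 0, 1, 2, 1],
              [0, 0, 2, 1, 1]], float)
emb = spectral_embed(B)   # rows of similar sequences land close together
```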
APA, Harvard, Vancouver, ISO, and other styles
21

Esteve, García Albert. "Design of Efficient TLB-based Data Classification Mechanisms in Chip Multiprocessors." Doctoral thesis, Universitat Politècnica de València, 2017. http://hdl.handle.net/10251/86136.

Full text of the source
Abstract:
Most of the data referenced by sequential and parallel applications running on current chip multiprocessors is referenced by a single thread, i.e., it is private. Recent proposals leverage this observation to improve many aspects of chip multiprocessors, such as reducing coherence overhead or the access latency to distributed caches. The effectiveness of those proposals depends to a large extent on the amount of private data detected. However, the mechanisms proposed so far either do not consider thread migration and the private use of data within different application phases, or entail high overhead. As a result, a considerable amount of private data goes undetected. In order to increase the detection of private data, this thesis proposes a TLB-based mechanism that is able to account for both thread migration and private application phases with low overhead. In the proposed TLB-based classification mechanisms, classification status is determined by the presence of the page translation in other cores' TLBs. The classification schemes are analyzed in multilevel TLB hierarchies, for systems with both private and distributed shared last-level TLBs. This thesis introduces a page classification approach based on inspecting other cores' TLBs upon every TLB miss. In particular, the proposed classification approach is based on the exchange and counting of tokens. Token counting in TLBs is a natural and efficient way of classifying memory pages. It does not require the use of complex and undesirable persistent requests or arbitration, since when two or more TLBs race for access to a page, tokens are appropriately distributed and the page is classified as shared. However, the TLB-based ability to classify private pages is strongly dependent on TLB size, as it relies on the presence of a page translation in the system TLBs. To overcome this, different TLB usage predictors (UP) have been proposed, which allow a page classification unaffected by TLB size. Specifically, this thesis introduces a predictor that obtains system-wide page usage information by either employing a shared last-level TLB structure (SUP) or cooperative TLBs working together (CUP).
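The token-counting classification can be caricatured with a tiny directory simulation: a page whose translation (and tokens) sits in a single TLB is private, and the moment a second core's TLB takes a token the page is reclassified shared, with no persistent requests or arbitration. The structures and token counts below are assumptions for illustration only.

```python
class Page:
    TOKENS = 4                      # total tokens per page (assumed)
    def __init__(self):
        self.held = {}              # core id -> tokens held in its TLB

class Directory(dict):
    def tlb_miss(self, core, page_id):
        """On a TLB miss the core takes a token for the page; all tokens
        in one TLB means private, tokens split across two or more TLBs
        means shared, so a race simply distributes tokens and yields
        the shared classification without arbitration."""
        page = self.setdefault(page_id, Page())
        page.held[core] = page.held.get(core, 0) + 1
        return "private" if len(page.held) == 1 else "shared"

d = Directory()
print(d.tlb_miss(0, 0x1000))   # private: only core 0 holds the translation
print(d.tlb_miss(1, 0x1000))   # shared: a second TLB now holds a token
```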
Esteve García, A. (2017). Design of Efficient TLB-based Data Classification Mechanisms in Chip Multiprocessors [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/86136
THESIS
APA, Harvard, Vancouver, ISO, and other styles
22

Karmakar, Priyabrata. "Effective and efficient kernel-based image representations for classification and retrieval." Thesis, Federation University Australia, 2018. http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/165515.

Full text of the source
Abstract:
Image representation is a challenging task. In particular, in order to obtain better performance in different image processing applications such as video surveillance, autonomous driving, crime scene detection and automatic inspection, effective and efficient image representation is a fundamental need. The performance of these applications usually depends on how accurately images are classified into their corresponding groups or how precisely relevant images are retrieved from a database based on a query. Accuracy in image classification and precision in image retrieval depend on the effectiveness of image representation. Existing image representation methods have some limitations. For example, spatial pyramid matching, which is a popular method incorporating spatial information in image-level representation, has not been fully studied to date. In addition, the strengths of pyramid match kernel and spatial pyramid matching are not combined for better image matching. Kernel descriptors based on gradient, colour and shape overcome the limitations of histogram-based descriptors, but suffer from information loss, noise effects and high computational complexity. Furthermore, the combined performance of kernel descriptors has limitations related to computational complexity, higher dimensionality and lower effectiveness. Moreover, the potential of a global texture descriptor based on human visual perception has not been fully explored to date. Therefore, in this research project, kernel-based effective and efficient image representation methods are proposed to address the above limitations. An enhancement is made to spatial pyramid matching in terms of improved rotation invariance. This is done by investigating different partitioning schemes suitable for achieving rotation-invariant image representation and by proposing a weight function for appropriate level contribution in image matching. In addition, the strengths of pyramid match kernel and spatial pyramid are combined to enhance matching accuracy between images. The existing kernel descriptors are modified and improved to achieve greater effectiveness, minimal noise effects, lower dimensionality and lower computational complexity. A novel fusion approach is also proposed to combine the information related to all pixel attributes before the descriptor extraction stage. Existing kernel descriptors are based only on gradient, colour and shape information. In this research project, a texture-based kernel descriptor is proposed by modifying an existing popular global texture descriptor. Finally, all the contributions are evaluated in an integrated system. The performance of the proposed methods is qualitatively and quantitatively evaluated on two to four different publicly available image databases. The experimental results show that the proposed methods are more effective and efficient in image representation than existing benchmark methods.
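As background for the spatial-pyramid theme of the thesis, here is a small Python sketch of standard spatial pyramid matching via histogram intersection; the bin count, level weights and the quantized-label image encoding are illustrative, and the thesis's rotation-invariant partitioning is not reproduced.

import numpy as np

def spatial_pyramid_match(img_a, img_b, levels=2, bins=8):
    """Histogram-intersection similarity over a spatial pyramid.

    img_a/img_b: 2-D arrays of quantized feature labels in [0, bins).
    Level l splits the image into 2**l x 2**l cells; finer levels get
    larger weights, following the usual geometric weighting scheme.
    """
    score = 0.0
    for l in range(levels + 1):
        cells = 2 ** l
        weight = 1.0 / 2 ** (levels - l)  # coarse levels weigh less
        for ha, hb in zip(_cell_histograms(img_a, cells, bins),
                          _cell_histograms(img_b, cells, bins)):
            score += weight * np.minimum(ha, hb).sum()
    return score

def _cell_histograms(img, cells, bins):
    for rows in np.array_split(img, cells, axis=0):
        for cell in np.array_split(rows, cells, axis=1):
            yield np.bincount(cell.ravel(), minlength=bins)

rng = np.random.default_rng(0)
a, b = rng.integers(0, 8, (16, 16)), rng.integers(0, 8, (16, 16))
print(spatial_pyramid_match(a, b))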
Doctor of Philosophy
APA, Harvard, Vancouver, ISO, and other styles
23

Zhang, Liang. "Classification and ranking of environmental recordings to facilitate efficient bird surveys." Thesis, Queensland University of Technology, 2017. https://eprints.qut.edu.au/107097/1/Liang_Zhang_Thesis.pdf.

Full text of the source
Abstract:
This thesis contributes novel computer-assisted techniques for facilitating bird species surveys based on large numbers of environmental audio recordings. These techniques support both manual and automated recognition of bird species by removing irrelevant audio data and prioritising relevant data for efficient bird species detection. This work also represents a significant step towards using automated techniques to support experts and the general public in exploring and gaining a better understanding of vocal species.
APA, Harvard, Vancouver, ISO, and other styles
24

Loza, Mencía Eneldo [Verfasser], Johannes [Akademischer Betreuer] Fürnkranz, and Hüllermeier [Akademischer Betreuer] Eyke. "Efficient Pairwise Multilabel Classification / Eneldo Loza Mencía. Betreuer: Johannes Fürnkranz ; Hüllermeier Eyke." Darmstadt : Universitäts- und Landesbibliothek Darmstadt, 2013. http://d-nb.info/1107769655/34.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
25

Immaneni, Raghu Nandan. "An efficient approach to machine learning based text classification through distributed computing." Thesis, California State University, Long Beach, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=1603338.

Full text of the source
Abstract:

Text classification is one of the classical problems in computer science, used primarily for categorizing data, spam detection, anonymization, information extraction, text summarization, etc. Given the large amounts of data involved in these applications, automated and accurate training models and approaches for classifying data efficiently are needed.

In this thesis, an extensive study of the interaction between natural language processing, information retrieval and text classification has been performed. A case study named “keyword extraction”, which deals with identifying keywords and tags from millions of text questions, is used as a reference. Different classifiers are implemented on the case study using the MapReduce paradigm, and the experimental results are recorded using two newly built distributed-computing Hadoop clusters. The main aims are to enhance prediction accuracy, to examine the role of text pre-processing in noise elimination, and to reduce computation time and resource utilization on the clusters.
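A minimal single-process Python sketch of the MapReduce flow described above: mappers emit (tag, word) pairs from pre-processed questions and a reducer aggregates the counts so that per-tag word statistics can feed a classifier. The function names, stopword list and corpus are invented for illustration.

from collections import Counter

def preprocess(text, stopwords=("a", "an", "the", "to", "of", "in")):
    # Noise elimination: lower-case, drop stopwords and short tokens.
    return [w for w in text.lower().split()
            if w.isalpha() and len(w) > 2 and w not in stopwords]

def mapper(question, tags):
    for tag in tags:
        for word in preprocess(question):
            yield (tag, word), 1

def reduce_counts(mapped):
    counts = Counter()
    for key, n in mapped:
        counts[key] += n
    return counts

corpus = [("How to merge two dicts in Python", ["python"]),
          ("Java stream groupingBy example", ["java"])]
mapped = (kv for q, tags in corpus for kv in mapper(q, tags))
print(reduce_counts(mapped).most_common(3))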

APA, Harvard, Vancouver, ISO, and other styles
26

Franz, Torsten. "Spatial classification methods for efficient infiltration measurements and transfer of measuring results." Doctoral thesis, Technische Universität Dresden, 2006. https://tud.qucosa.de/id/qucosa%3A24942.

Full text of the source
Abstract:
A comprehensive knowledge of the infiltration situation in a sewer system is required for sustainable operation and cost-effective maintenance. Due to the high cost of infiltration measurements, an optimisation of the necessary measurement campaigns and a reliable transfer of measurement results to comparable areas are essential. Suitable methods were developed to improve the information yield of measurements by identifying appropriate measuring point locations, and to assign measurement results to other potential measuring points by comparing sub-catchments and classifying reaches. The methods are based on the introduced similarity approach “similar sewer conditions lead to similar infiltration/inflow rates” and on modified multivariate statistical techniques. The developed methods have a high degree of freedom with respect to data needs. They were successfully tested on real and generated data. For suitable catchments, it is estimated that the optimisation potential amounts to up to 40 % accuracy improvement compared with non-optimised measuring point configurations. With an acceptable error, the transfer of measurement results was successful for up to 75 % of the investigated sub-catchments. With the proposed methods it is possible to improve the information about the infiltration status of sewer systems and to reduce the measurement-related uncertainty, which results in significant cost savings for the operator.
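A small Python sketch of the similarity principle, transferring measured infiltration rates to an unmeasured sub-catchment from its most similar measured neighbours; the attributes, the nearest-neighbour rule and k are illustrative assumptions, not the thesis's calibrated procedure.

import numpy as np

def transfer_infiltration(measured, unmeasured, k=2):
    """measured: list of (attribute_vector, infiltration_rate) pairs."""
    X = np.array([a for a, _ in measured], dtype=float)
    rates = np.array([r for _, r in measured])
    # Standardize so no attribute (age, length, groundwater level, ...)
    # dominates the distance.
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-9
    d = np.linalg.norm((X - mu) / sd - (np.asarray(unmeasured) - mu) / sd,
                       axis=1)
    nearest = np.argsort(d)[:k]
    return rates[nearest].mean()

measured = [([40, 2.1, 1.5], 0.30),   # [pipe age, length km, gw level m]
            ([12, 0.8, 3.0], 0.05),
            ([35, 1.9, 1.2], 0.25)]
print(transfer_infiltration(measured, [38, 2.0, 1.4]))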
APA, Harvard, Vancouver, ISO, and other styles
27

Franz, Torsten. "Spatial classification methods for efficient infiltration measurements and transfer of measuring results." Doctoral thesis, Dresden : Inst. für Siedlungs- und Industriewasserwirtschaft, Techn. Univ, 2007. http://nbn-resolving.de/urn:nbn:de:swb:14-1181687412171-65072.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
28

Runhem, Lovisa. "Resource efficient travel mode recognition." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217897.

Full text of the source
Abstract:
In this report we attempt to provide insights into how a resource-efficient solution for transportation mode recognition can be implemented on a smartphone, using the accelerometer and magnetometer as sensors for data collection. The proposed system uses a hierarchical classification process where instances are first classified as vehicles or non-vehicles, then as wheel or rail vehicles, and lastly as belonging to one of the transportation modes: bus, car, motorcycle, subway, or train. A virtual gyroscope is implemented as a low-power source of simulated gyroscope data. Features are extracted from the accelerometer, magnetometer and virtual-gyroscope readings, which are sampled at 30 Hz, before they are classified using machine learning algorithms from the WEKA machine learning library. An Android application was developed to classify real-time data, and the resource consumption of the application was measured using the Trepn profiler application. The proposed system achieves an overall accuracy of 82.7% and a vehicular accuracy of 84.9% using a 5 second window with 75% overlap, while having an average power consumption of 8.5 mW.
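A minimal Python sketch of the hierarchical decision process described above. The three stage classifiers are placeholders for whatever WEKA models were trained (any object with a scikit-learn-style predict method works), and the window features are a plausible but invented selection.

import numpy as np

def window_features(accel, mag):
    """accel/mag: (n, 3) arrays from a 5 s window sampled at 30 Hz."""
    a = np.linalg.norm(accel, axis=1)          # acceleration magnitude
    return np.array([a.mean(), a.std(), np.abs(np.diff(a)).mean(),
                     np.linalg.norm(mag, axis=1).std()])

def classify(feat, stage1, stage2, stage3_wheel, stage3_rail):
    # Stage 1: vehicle vs. non-vehicle.
    if stage1.predict([feat])[0] == "non-vehicle":
        return "non-vehicle"
    # Stage 2: wheel vs. rail; stage 3 picks the final mode.
    if stage2.predict([feat])[0] == "rail":
        return stage3_rail.predict([feat])[0]   # subway or train
    return stage3_wheel.predict([feat])[0]      # bus, car, motorcycle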
APA, Harvard, Vancouver, ISO, and other styles
29

Meléndez, Rodríguez Jaime Christian. "Supervised and unsupervised segmentation of textured images by efficient multi-level pattern classification." Doctoral thesis, Universitat Rovira i Virgili, 2010. http://hdl.handle.net/10803/8487.

Full text of the source
Abstract:
This thesis proposes new, efficient methodologies for supervised and unsupervised image segmentation based on texture information. For the supervised case, a technique for pixel classification based on a multi-level strategy that iteratively refines the resulting segmentation is proposed. This strategy utilizes pattern recognition methods based on prototypes (determined by clustering algorithms) and support vector machines. In order to obtain the best performance, an algorithm for automatic parameter selection and methods to reduce the computational cost associated with the segmentation process are also included. For the unsupervised case, the previous methodology is adapted by means of an initial pattern discovery stage, which allows transforming the original unsupervised problem into a supervised one. Several sets of experiments considering a wide variety of images are carried out in order to validate the developed techniques.
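A rough Python sketch of the multi-level idea: a cheap prototype-based pass (cluster centroids per class) labels most pixels, and only ambiguous pixels are refined by an SVM. The confidence rule and cluster counts are assumptions chosen for brevity.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def segment(features, labels, test_features, margin=0.1):
    # Level 1: per-class prototypes from clustering.
    protos, proto_lab = [], []
    for c in np.unique(labels):
        km = KMeans(n_clusters=3, n_init=4).fit(features[labels == c])
        protos.append(km.cluster_centers_)
        proto_lab += [c] * 3
    protos = np.vstack(protos)
    proto_lab = np.array(proto_lab)

    # Nearest-prototype labelling for every test pixel.
    d = np.linalg.norm(test_features[:, None] - protos[None], axis=2)
    order = np.argsort(d, axis=1)
    pred = proto_lab[order[:, 0]]

    # Level 2: refine pixels whose two nearest prototypes are too close.
    idx = np.arange(len(d))
    unsure = (d[idx, order[:, 1]] - d[idx, order[:, 0]]) < margin
    if unsure.any():
        pred[unsure] = SVC().fit(features, labels).predict(
            test_features[unsure])
    return pred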
APA, Harvard, Vancouver, ISO, and other styles
30

He, Yuheng [Verfasser]. "Efficient Positioning Methods and Location-Based Classification in the IP Multimedia Subsystem / Yuheng He." München : Verlag Dr. Hut, 2013. http://d-nb.info/1033041629/34.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
31

Kolb, Dirk [Verfasser], and Elmar [Akademischer Betreuer] Nöth. "Efficient and Trainable Detection and Classification of Radio Signals / Dirk Kolb. Betreuer: Elmar Nöth." Erlangen : Universitätsbibliothek der Universität Erlangen-Nürnberg, 2012. http://d-nb.info/1025963725/34.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
32

Tanaka, Elly M., Dirk Lindemann, Tatiana Sandoval-Guzmán, Nicole Stanke, and Stephanie Protze. "Foamy virus for efficient gene transfer in regeneration studies." BioMed Central, 2013. https://tud.qucosa.de/id/qucosa%3A28877.

Full text of the source
Abstract:
Background: Molecular studies of appendage regeneration have been hindered by the lack of a stable and efficient means of transferring exogenous genes. We therefore sought an efficient integrating virus system that could be used to study limb and tail regeneration in salamanders. Results: We show that replication-deficient foamy virus (FV) vectors efficiently transduce cells in two different regeneration models, in cell culture and in vivo. Injection of EGFP-expressing FV, but not lentivirus, vector particles into regenerating limbs and tail resulted in widespread expression that persisted throughout regeneration and reamputation, pointing to the utility of FV for analyzing adult phenotypes in non-mammalian models. Furthermore, tissue-specific transgene expression is achieved using FV vectors during limb regeneration. Conclusions: FV vectors are an efficient means of transferring genes into the axolotl limb and tail, and infection persists throughout regeneration and reamputation. This is a nontoxic method of delivering genes into axolotls in vivo and in vitro and can potentially be applied to other salamander species.
APA, Harvard, Vancouver, ISO, and other styles
33

Gulbinas, Rimas Viktoras. "Motivating and Quantifying Energy Efficient Behavior among Commercial Building Occupants." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/64867.

Full text of the source
Abstract:
The environmental and economic consequences of climate change are severe and are being exacerbated by increased global carbon emissions. In the United States, buildings account for over 40% of all domestic and 7.4% of all global CO2 emissions and therefore represent an important target for energy conservation initiatives. Even marginal energy savings across all buildings could have a profound effect on carbon emission mitigation. In order to realize the full potential of energy savings in the building sector, it is essential to maximize the energy efficiency of both buildings and the behavior of occupants who occupy them. In this vein, systems that collect and communicate building energy-use information to occupants (i.e. eco-feedback systems) have been demonstrated to motivate building occupants to significantly reduce overall building energy consumption. Furthermore, advancements in building sensor technologies and data processing capabilities have enabled the development of advanced eco-feedback systems that also allow building occupants to share energy-use data with one another and to collectively act to reduce energy consumption. In addition to monitoring building occupant energy-use, these systems are capable of collecting data about specific conservation actions taken by occupants and their interactions with different features of the eco-feedback system. However, despite recent advancements in eco-feedback and building sensor technologies, very few systems have been specifically designed to enable research on the effectiveness of different behavior-based energy conservation strategies in commercial buildings. Consequently, very little research has been conducted on how access to such systems impacts the energy-use behavior of building occupants. In this dissertation, I describe how my research over the past three years has advanced an understanding of how eco-feedback systems can impact the energy-use behavior of commercial building occupants. First, I present a novel eco-feedback system that I developed to connect building occupants over energy-use data and empower them to conserve energy while also collecting data that enables controlled studies to quantify the impacts of a wide variety of energy conservation strategies. Next, I present a commercial building study in which this eco-feedback system was used to investigate the effects of organizational network dynamics on the energy-use of individuals. I then introduce a new set of metrics based on individual energy-use data that enables the classification of individuals and building occupant networks based on their energy-use efficiency and predictability. I describe the principles behind the construction of these metrics and demonstrate how these quantitative measures can be used to increase the efficacy of behavior-based conservation campaigns by enabling targeted interventions. I conclude the dissertation with a discussion about the limitations of my research and the new research avenues that it has enabled.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
34

Park, Sang-Hyeun [Verfasser], Johannes [Akademischer Betreuer] Fürnkranz, and Eyke [Akademischer Betreuer] Hüllermeier. "Efficient Decomposition-Based Multiclass and Multilabel Classification / Sang-Hyeun Park. Betreuer: Johannes Fürnkranz ; Eyke Hüllermeier." Darmstadt : Universitäts- und Landesbibliothek Darmstadt, 2012. http://d-nb.info/1106115678/34.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
35

Papapetrou, Odysseas [Verfasser]. "Approximate algorithms for efficient indexing, clustering, and classification in Peer-to-peer networks / Odysseas Papapetrou." Hannover : Technische Informationsbibliothek und Universitätsbibliothek Hannover (TIB), 2011. http://d-nb.info/1013287142/34.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
36

Makki, Sara. "An Efficient Classification Model for Analyzing Skewed Data to Detect Frauds in the Financial Sector." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1339/document.

Full text of the source
Abstract:
There are different types of risks in the financial domain, such as terrorist financing, money laundering, credit card fraud and insurance fraud, that may result in catastrophic consequences for entities such as banks or insurance companies. These financial risks are usually detected using classification algorithms. In classification problems, the skewed distribution of classes, also known as class imbalance, is a very common challenge in financial fraud detection, where special data mining approaches are used along with the traditional classification algorithms to tackle this issue. The class imbalance problem occurs when one of the classes has more instances than the other, and it is even more pronounced in a big data context: the datasets used to build and train the models contain an extremely small portion of the minority group, known as positives, in comparison to the majority class, known as negatives. In most cases it is more delicate and crucial to correctly classify the minority group than the majority group, as in fraud detection, disease diagnosis, etc. In these examples, the fraud and the disease are the minority groups, and it is more delicate to miss a fraud record, because of its dangerous consequences, than a normal one. These class proportions make it very difficult for a machine learning classifier to learn the characteristics and patterns of the minority group: classifiers are biased towards the majority group because of its many examples in the dataset and learn to classify them much faster than the other group. After conducting a thorough study of the challenges faced in class imbalance cases, we found that an acceptable sensitivity (i.e., good classification of the minority group) still cannot be reached without a significant decrease in accuracy. This leads to another challenge, the choice of performance measures used to evaluate models. In these cases the choice is not straightforward: accuracy or sensitivity alone is misleading, so we use other measures, such as the precision-recall curve or the F1-score, to evaluate the trade-off between accuracy and sensitivity. Our objective is to build an imbalanced classification model that accounts for extreme class imbalance and false alarms in a big data framework. We developed two approaches: a Cost-Sensitive Cosine Similarity K-Nearest Neighbor (CoSKNN) as a single classifier, and a K-modes Imbalance Classification Hybrid Approach (K-MICHA) as an ensemble learning methodology. In CoSKNN, our aim was to tackle the imbalance problem by using cosine similarity as a distance metric and by introducing a cost-sensitive score for the classification using the KNN algorithm. We conducted a comparative validation experiment in which we proved the effectiveness of CoSKNN in terms of accuracy and fraud detection. The aim of K-MICHA, on the other hand, is to cluster data points that are similar in terms of the classifiers' outputs, and then to calculate fraud probabilities in the obtained clusters in order to use them for detecting frauds in new transactions. This approach can be used to detect any type of financial fraud for which labelled data are available. Finally, we applied K-MICHA to credit card, mobile payment and auto insurance fraud data sets. In all three case studies, we compare K-MICHA with stacking using voting, weighted voting, logistic regression and CART. We also compared with Adaboost and random forest. We prove the efficiency of K-MICHA based on these experiments. We also applied K-MICHA in a big data framework using H2O and R, which allowed us to process and analyse much larger data sets in very little time.
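A small Python sketch of the CoSKNN ingredient named in the abstract: k-nearest neighbours under cosine similarity with cost-weighted votes for the minority (fraud) class. The exact cost-sensitive score of the thesis is not reproduced; the cost factor below is an illustrative assumption.

import numpy as np

def cosknn_predict(X_train, y_train, x, k=5, minority_cost=3.0):
    """Predict 1 (fraud) or 0 (normal) for a single instance x."""
    sims = (X_train @ x) / (np.linalg.norm(X_train, axis=1)
                            * np.linalg.norm(x) + 1e-12)
    nn = np.argsort(-sims)[:k]          # k most similar neighbours
    score = 0.0
    for i in nn:
        if y_train[i] == 1:
            score += minority_cost * sims[i]   # minority votes count more
        else:
            score -= sims[i]
    return int(score > 0)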
APA, Harvard, Vancouver, ISO, and other styles
37

Gilman, Ekaterina, Anja Keskinarkaus, Satu Tamminen, Susanna Pirttikangas, Juha Röning, and Jukka Riekki. "Personalised assistance for fuel-efficient driving." Elsevier, 2015. https://publish.fid-move.qucosa.de/id/qucosa%3A72830.

Full text of the source
Abstract:
Recent advances in technology are changing the way everyday activities are performed. Technologies in the traffic domain provide diverse instruments for gathering and analysing data for more fuel-efficient, safe, and convenient travelling for both drivers and passengers. In this article, we propose a reference architecture for a context-aware driving assistant system. Moreover, we exemplify this architecture with a real prototype of a driving assistance system called Driving coach. This prototype collects, fuses and analyses diverse information, such as the digital map, weather, traffic situation, and vehicle information, to provide drivers with in-depth information about their previous trip, along with personalised hints for improving their fuel-efficient driving in the future. The Driving coach system monitors its own performance, as well as driver feedback, to correct itself and serve the driver more appropriately.
APA, Harvard, Vancouver, ISO, and other styles
38

Weickert, J., and T. Steidten. "Efficient time step parallelization of full multigrid techniques." Universitätsbibliothek Chemnitz, 1998. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-199800466.

Full text of the source
Abstract:
This paper deals with parallelization methods for time-dependent problems where the time steps are shared out among the processors. A full multigrid technique serves as the solution algorithm; hence, information from the preceding time step and from the coarser grid is necessary to compute the solution at each new grid level. If the usual extrapolation formula is applied to process this information, the parallelization will not be very efficient. We developed another extrapolation technique which yields a much higher parallelization effect. Test examples show that no essential loss of exactness appears, so that the method presented here is well applicable.
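For orientation, a tiny Python sketch of the generic ingredient: each processor, owning its own time step, needs a start guess extrapolated from preceding steps. Only plain linear extrapolation is shown here; the improved formula developed in the paper is deliberately not guessed at.

import numpy as np

def extrapolate_guess(u_prev, u_prev2, dt_ratio=1.0):
    """Linear extrapolation: u(t+dt) ~ u(t) + dt_ratio * (u(t) - u(t-dt))."""
    return u_prev + dt_ratio * (u_prev - u_prev2)

u0 = np.array([0.0, 1.0, 4.0])   # solution at t - dt
u1 = np.array([0.5, 1.5, 4.5])   # solution at t
print(extrapolate_guess(u1, u0))  # start guess for the next time step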
APA, Harvard, Vancouver, ISO, and other styles
39

Hambsch, Mike, Qianqian Lin, Ardalan Armin, Paul L. Burn, and Paul Meredith. "Efficient, monolithic large area organohalide perovskite solar cells." Royal Society of Chemistry, 2016. https://tud.qucosa.de/id/qucosa%3A36282.

Full text of the source
Abstract:
Solar cells based on organohalide perovskites (PSCs) have made rapid progress in recent years and are a promising emerging technology. An important next evolutionary step for PSCs is their up-scaling to commercially relevant dimensions. The main challenges in scaling PSCs to be compatible with current c-Si cells are related to the limited conductivity of the transparent electrode, and the processing of a uniform and defect-free organohalide perovskite layer over large areas. In this work we present a generic and simple approach to realizing efficient solution-processed, monolithic solar cells based on methylammonium lead iodide (CH₃NH₃PbI₃). Our devices have an aperture area of 25 cm² without relying on an interconnected strip design, therefore reducing the complexity of the fabrication process and enhancing compatibility with the c-Si cell geometry. We utilize simple aluminum grid lines to increase the conductivity of the transparent electrode. These grid lines were exposed to an UV-ozone plasma to grow a thin aluminum oxide layer. This dramatically improves the wetting and film forming of the organohalide perovskite junction on top of the lines, reducing the probability of short circuits between the grid and the top electrode. The best devices employing these modified grids achieved power conversion efficiencies of up to 6.8%.
APA, Harvard, Vancouver, ISO, and other styles
40

Weise, Michael. "A framework for efficient hierarchic plate and shell elements." Technische Universität Chemnitz, 2017. https://monarch.qucosa.de/id/qucosa%3A20867.

Full text of the source
Abstract:
The Mindlin-Reissner plate model is widely used for the elastic deformation simulation of moderately thick plates. Shear locking occurs in the case of thin plates, which means slow convergence with respect to the mesh size. The Kirchhoff plate model does not show locking effects, but is valid only for thin plates. One would like to have a method suitable for both thick and thin plates. Several approaches are known to deal with the shear locking in the Mindlin-Reissner plate model. In addition to the well-known MITC elements and other approaches based on a mixed formulation, hierarchical methods have been developed in recent years. These are based on the Kirchhoff model and add terms to account for shear deformations. We present some of these methods and develop a new hierarchic plate formulation. This new model can be discretised by a combination of C0 and C1 finite elements. Numerical tests show that the new formulation is locking-free and numerically efficient. We also give an extension of the model to a hierarchical Naghdi shell based on a Koiter shell formulation with unknowns in Cartesian coordinates.
APA, Harvard, Vancouver, ISO, and other styles
41

Hönel, Sebastian. "Efficient Automatic Change Detection in Software Maintenance and Evolutionary Processes." Licentiate thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-94733.

Full text of the source
Abstract:
Software maintenance is such an integral part of the software evolution process that it consumes much of the total resources available. Some estimate the costs of maintenance to be up to 100 times the cost of developing the software. Software that is not maintained builds up technical debt, and if no countermeasures are undertaken, the unpaid debt will eventually outweigh the value of the software. A software system must adapt to changes in its environment and to new and changed requirements, and it must receive corrections for emerging faults and vulnerabilities. Constant maintenance can prepare a system for the accommodation of future changes. While there may be plenty of rationale for future changes, the reasons behind historical changes may no longer be accessible. Understanding change in software evolution provides valuable insights into, e.g., the quality of a project or aspects of the underlying development process. These are worth exploiting for, e.g., fault prediction, managing the composition of the development team, or effort estimation models. The size of software is a metric often used in such models, yet it is not well defined. In this thesis, we seek to establish a robust, versatile and computationally cheap metric that quantifies the size of changes made during maintenance. We operationalize this new metric and exploit it for automated and efficient commit classification. Our results show that the density of a commit, that is, the ratio between its net and gross size, is a metric that can replace other, more expensive metrics in existing classification models. Models using this metric represent the current state of the art in automatic commit classification. The density provides a more fine-grained and detailed insight into the types of maintenance activities in a software project. Additional properties of commits, such as their relation or intermediate sojourn times, have not previously been exploited for improved classification of changes. We reason about their potential, and suggest and implement dependent mixture and Bayesian models that exploit joint conditional densities, models that each have their own trade-offs with regard to computational cost, complexity, and prediction accuracy. Such models can outperform well-established classifiers, such as Gradient Boosting Machines. All of our empirical evaluations comprise large datasets, software and experiments, all of which we have published as open access alongside the results. We have reused, extended and created datasets, and released software packages for change detection and for the Bayesian models used in all of the studies conducted.
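A small Python sketch of the density idea, the ratio of a commit's net size to its gross size; the noise filter below (blank lines and line comments) is a simplification of the thesis's definition of net size.

def commit_density(added_lines, removed_lines, noise_filter=None):
    """Density = net size / gross size of a commit's changed lines."""
    noise_filter = noise_filter or (lambda l: l.strip() == ""
                                    or l.lstrip().startswith("//"))
    gross = len(added_lines) + len(removed_lines)
    net = sum(1 for l in added_lines + removed_lines
              if not noise_filter(l))
    return net / gross if gross else 0.0

# Two of the four changed lines are whitespace/comment churn: density 0.5.
print(commit_density(["int x = 1;", "", "// TODO"], ["int x = 0;"]))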
APA, Harvard, Vancouver, ISO, and other styles
42

Schütze, Lars, and Jeronimo Castrillon. "Efficient Late Binding of Dynamic Function Compositions." ACM, 2019. https://tud.qucosa.de/id/qucosa%3A73178.

Full text of the source
Abstract:
Adaptive software becomes more and more important as computing is increasingly context-dependent. Runtime adaptability can be achieved by dynamically selecting and applying context-specific code. Role-oriented programming has been proposed as a paradigm for enabling runtime-adaptive software by design. Roles change the objects' behavior at runtime and thus allow the software to adapt to a given context. However, this increased variability and expressiveness has a direct impact on performance and memory consumption, and we found a high overhead in the steady-state performance of executing compositions of adaptations. This paper presents a new approach that uses run-time information to construct a dispatch plan that can be executed efficiently by the JVM. The concept of late binding is extended to dynamic function compositions. We evaluated the implementation with a benchmark for role-oriented programming languages leveraging context-dependent role semantics, achieving a mean speedup of 2.79× over the regular implementation.
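A minimal Python sketch of late-binding a dispatch plan for a role composition: the plan is resolved once per (composition, method) and reused until the composition changes. The object model is invented for illustration and is far simpler than the JVM-level mechanism of the paper.

class Player:
    """An object that can play roles; toy model for illustration."""
    def __init__(self, core):
        self.core, self.roles, self._plans = core, [], {}

    def bind(self, role):
        self.roles.append(role)
        self._plans.clear()           # composition changed: invalidate plans

    def call(self, name, *args):
        plan = self._plans.get(name)
        if plan is None:              # resolve once, then reuse (late binding)
            plan = [r for r in reversed(self.roles) if hasattr(r, name)]
            plan.append(self.core)
            self._plans[name] = plan
        return getattr(plan[0], name)(*args)

class Core:
    def greet(self): return "hello"

class LoudRole:
    def greet(self): return "HELLO"

p = Player(Core())
print(p.call("greet"))    # -> hello (core behavior)
p.bind(LoudRole())
print(p.call("greet"))    # -> HELLO (role adapts behavior; plan is cached)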
APA, Harvard, Vancouver, ISO, and other styles
43

Loza, Mencía Eneldo. "Efficient Pairwise Multilabel Classification." Phd thesis, 2013. https://tuprints.ulb.tu-darmstadt.de/3226/7/loza12diss.pdf.

Full text of the source
Abstract:
Multilabel classification learning is the task of learning a mapping between objects and sets of possibly overlapping classes and has gained increasing attention in recent times. A prototypical application scenario for multilabel classification is the assignment of a set of keywords to a document, a frequently encountered problem in the text classification domain. With upcoming Web 2.0 technologies, this domain is extended by a wide range of tag suggestion tasks and the trend definitely is moving towards more data points and more labels. This work provides an extended introduction into the topic of multilabel classification, a detailed formalization and a comprehensive overview of the present state-of-the-art approaches. A commonly used solution for solving multilabel tasks is to decompose the original problem into several subproblems. These subtasks are usually easy to solve with conventional techniques. In contrast to the straightforward approach of training one classifier for independently predicting the relevance of each class (binary relevance), this work focuses particularly on the pairwise decomposition of the original problem in which a decision function is learned for each possible pair of classes. The main advantage of this approach, the improvement of the predictive quality, comes at the cost of its main disadvantage, the quadratic number of classifiers needed (with respect to the number of labels). This thesis presents a framework of efficient and scalable solutions for handling hundreds or thousands of labels despite the quadratic dependency. As it turns out, training such a pairwise ensemble of classifiers can be accomplished in linear time and only differs from the straightforward binary relevance approach (BR) by a factor relative to the average number of labels associated to an object, which is usually small. Furthermore, the integration of a smart scheduling technique inspired from sports tournaments safely reduces the quadratic number of base classifier evaluations to log-linear in practice. Combined with a simple yet fast and powerful learning algorithm for linear classifiers, data with a huge number of high dimensional points, which was not amenable to pairwise learning before, can be processed even under real-time conditions. The remaining bottleneck, the exploding memory requirements, is coped by taking advantage of an interesting property of linear classifiers, namely the possibility of dual reformulation as a linear combination of the training examples. The suitability is demonstrated on the novel EUR-Lex text collection, which particularly puts the main scalability issue of pairwise learning to test. With its almost 4,000 labels and 20,000 documents it is one of the most challenging test beds in multilabel learning to date. The dual formulation allows to maintain the mathematical equivalent to 8 million base learners needed for conventionally solving EUR-Lex in almost the same amount of space as binary relevance. Moreover, BR was clearly beaten in the experiments. A further contribution based on hierarchical decomposition and arrangement of the original problem allows to reduce the dependency on the number of labels to even sub-linearity. This approach opens the door to a wide range of new challenges and applications but simultaneously maintains the advantages of pairwise learning, namely the excellent predictive quality. 
In comparison to the flat variant, it was even shown to have a particularly positive effect on balancing recall and precision on datasets with a large number of labels. The improved scalability and efficiency allowed pairwise classification to be applied to a set of large multilabel problems with a parallel base of data points but different domains of labels. A first attempt was made in this parallel-tasks setting to investigate the exploitation of label dependencies by pairwise learning, with first encouraging results. The use of multilabel learning techniques for the automatic annotation of texts constitutes a further obvious but so far missing connection to multi-task and multi-target learning. The presented solution considers the simultaneous tagging of words with different but possibly overlapping annotation schemes as a multilabel problem, and is expected to particularly benefit from approaches which exploit label dependencies. The ability of pairwise learning for this purpose is obviously restricted to pairwise relations; therefore a technique is investigated which explores label constellations that exist only locally, for a subgroup of data points. In addition to the positive effect of the supplemental information, the experimental evaluation demonstrates an interesting insight into the different behavior of several state-of-the-art approaches with respect to the optimization of particular multilabel measures, a controversial topic in multilabel classification.
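A compact Python sketch of the pairwise decomposition itself: one binary learner per label pair, trained only on examples where exactly one of the two labels is relevant, with voting at prediction time. The thesis's tournament-style scheduling and dual reformulation are omitted, and the perceptron base learner is only one possible choice.

from itertools import combinations
import numpy as np
from sklearn.linear_model import Perceptron

def train_pairwise(X, Y, n_labels):
    models = {}
    for a, b in combinations(range(n_labels), 2):
        mask = Y[:, a] != Y[:, b]   # keep only "a XOR b relevant" examples
        if mask.any() and len(np.unique(Y[mask, a])) == 2:
            models[a, b] = Perceptron().fit(X[mask], Y[mask, a])
    return models

def rank_labels(models, x, n_labels):
    votes = np.zeros(n_labels)
    for (a, b), m in models.items():
        winner = a if m.predict([x])[0] == 1 else b
        votes[winner] += 1
    return np.argsort(-votes)       # labels ranked by pairwise votes

X = np.array([[0., 1.], [1., 1.], [.5, 1.], [1., 0.]])
Y = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
print(rank_labels(train_pairwise(X, Y, 3), [0.2, 0.9], 3))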
APA, Harvard, Vancouver, ISO, and other styles
44

Fontenelle-Augustin, Tiffany Natasha, and 蒂芙妮. "Prototype Selection for Efficient Classification." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/4ad4rc.

Full text of the source
Abstract:
Master's thesis
National Tsing Hua University
Institute of Information Systems and Applications
Academic year: 106 (ROC calendar)
Big data has become ubiquitous and of great significance in academia. With the rapid growth in the size of big data, many problems arise when trying to manipulate the data for the purpose of forecasting. In this thesis, we highlight the problem of computational complexity when dealing with big data. We propose a heuristic that helps to solve this problem by altering the existing classification method so that it is more suitable for handling big data, thereby increasing efficiency. Our heuristic is not only better suited to big data but is also faster than traditional classification, while keeping accuracy approximately the same, if not higher. The heuristic combines prototype selection with the traditional classification process: a subset of the training data is selected as prototypes, the remaining data in the training set is discarded, and classification proceeds by training on the set of prototypes as opposed to the conventional method of using the entire training set. The learning algorithm used in our heuristic is the J48 decision tree algorithm. We evaluated the heuristic by comparing the classification accuracy and running time of our algorithm (using prototypes) with the traditional decision tree and naive Bayes algorithms (using the entire training set). We also compared the amount of data used in our training phase with the amount used in the training phases of conventional methods. We tested five data sets ranging in size from small to large. The findings show that, for big data, our heuristic saves memory space and is 100% faster than traditional classification, with only a slight drop in accuracy.
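A rough Python sketch of the heuristic's overall shape: keep a few prototypes per class and train a decision tree on them only. scikit-learn's CART-style tree stands in for WEKA's J48, and centroid-based selection is only one possible prototype-selection rule, assumed here for illustration.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def train_on_prototypes(X, y, per_class=20):
    """Select prototypes per class, then train the tree on them only."""
    Xp, yp = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        k = min(per_class, len(Xc))
        km = KMeans(n_clusters=k, n_init=4).fit(Xc)
        Xp.append(km.cluster_centers_)   # centroids act as prototypes
        yp += [c] * k
    return DecisionTreeClassifier().fit(np.vstack(Xp), yp)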
APA, Harvard, Vancouver, ISO, and other styles
45

Lin, Tien Min, and 林天民. "ABV+: An Efficient Packet Classification Algorithm." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/60951596238716607595.

Full text of the source
Abstract:
Master's thesis
Chang Gung University
Graduate Institute of Computer Science and Information Engineering
Academic year: 97 (ROC calendar)
Packet classification is an important technique for Internet services such as firewalls, intrusion detection systems, and differentiated services. Its main function is to classify incoming packets into different flows according to predefined rules or policies in a router. Since packet classification is an important component of routers, it has received broad attention, and a number of algorithms have been proposed in the past few years. Among them, bit-vector-based algorithms such as Lucent Bit Vector (BV) and Aggregated Bit Vector (ABV) are well known for their simplicity of hardware implementation. However, both BV and ABV do not scale well to large filter databases due to their storage requirements. In this thesis, we propose a new bit-vector-based algorithm named Aggregated Bit Vector Plus (ABV+). The key idea behind ABV+ is to replace each bit vector in the selected trie with two values. Since the length of a bit vector is equal to the number of filter rules, replacing the bit vector with two short, fixed-length fields can significantly reduce the storage requirement. For synthetic databases with 50K filter rules, experimental results show that ABV+ can reduce the storage requirement by 65% and the search time by 42% compared with ABV.
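For context, a minimal Python sketch of the bit-vector family ABV+ belongs to: each header-field lookup yields a bit vector over the rule set, the per-field vectors are ANDed, and the lowest set bit gives the highest-priority match. ABV+'s replacement of each vector by two short fields is not reproduced here; only the baseline BV mechanics are shown, with an invented rule representation.

def build_field_vectors(rules, field):
    """rules: list of dicts mapping each field to a set of accepted values."""
    vectors = {}
    for v in {v for r in rules for v in r[field]}:
        bits = 0
        for i, r in enumerate(rules):
            if v in r[field]:
                bits |= 1 << i            # rule i matches value v on this field
        vectors[v] = bits
    return vectors

def classify(packet, per_field_vectors):
    match = -1                            # acts as an all-ones bit vector
    for field, vecs in per_field_vectors.items():
        match &= vecs.get(packet[field], 0)
    if not match:
        return None
    return (match & -match).bit_length() - 1   # lowest set bit = best rule

rules = [{"proto": {"tcp"}, "port": {80, 443}},
         {"proto": {"tcp", "udp"}, "port": {53}}]
pfv = {f: build_field_vectors(rules, f) for f in ("proto", "port")}
print(classify({"proto": "tcp", "port": 53}, pfv))   # -> 1 (second rule)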
APA, Harvard, Vancouver, ISO, and other styles
46

Lin, Keng-Pei, and 林耕霈. "Efficient Data Classification with Privacy-Preservation." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/47593951590552273335.

Full text of the source
Abstract:
Doctoral dissertation
National Taiwan University
Graduate Institute of Electrical Engineering
Academic year: 99 (ROC calendar)
Data classification is a widely used data mining technique which learns classifiers from labeled data to predict the labels of unlabeled instances. Among data classification algorithms, the support vector machine (SVM) shows the state-of-the-art performance. Data privacy is a critical concern in applying the data mining techniques. In this dissertation, we study how to achieve privacy-preservation in utilizing the SVM as well as how to efficiently generate the SVM classifier. Outsourcing has become popular in current cloud computing trends. Since the training algorithm of the SVM involves intensive computations, outsourcing to external service providers can benefit the data owner who possesses only limited computing resources. In outsourcing, the data privacy is a critical concern since there may be sensitive information contained in the data. In addition to the data, the classifier generated from the data is also private to the data owner. Existing privacy-preserving SVM outsourcing technique is weak in security. In Chapter 2, we propose a secure privacy-preserving SVM outsourcing scheme. In the proposed scheme, the data are perturbed by random linear transformation which is stronger in security than existing works. The service provider generates the SVM classifier from the perturbed data where the classifier is also in perturbed form and cannot be accessed by the service provider. In Chapter 3, we study the inherent privacy violation problem in the SVM classifier. The SVM trains a classifier by solving an optimization problem to decide which instances of the training dataset are support vectors, which are the necessarily informative instances to form the SVM classifier. Since support vectors are intact tuples taken from the training dataset, releasing the SVM classifier for public use or other parties will disclose the private content of support vectors. We propose an approach to post-process the SVM classifier to transform it to a privacy-preserving SVM classifier which does not disclose the private content of support vectors. It precisely approximates the decision function of the Gaussian kernel SVM classifier without exposing the individual content of support vectors. The privacy-preserving SVM classifier is able to release the prediction ability of the SVM classifier without violating the individual data privacy. The efficiency of the SVM is also an important issue since for large-scale data, the SVM solver converges slowly. In Chapter 4, we design an efficient SVM training algorithm based on the kernel approximation technique developed in Chapter 3. The kernel function brings powerful classification ability to the SVM, but it incurs additional computational cost in the training process. In contrast, there exist faster solvers to train the linear SVM. We capitalize the kernel approximation technique to compute the kernel evaluation by the dot product of explicit low-dimensional features to leverage the efficient linear SVM solver for training a nonlinear kernel SVM. In addition to an efficient training scheme, it obtains a privacy-preserving SVM classifier directly, i.e., its classifier does not disclose any individual instance. We conduct extensive experiments over our studies. 
Experimental results show that the privacy-preserving SVM outsourcing scheme, the privacy-preserving SVM classifier, and the efficient kernel-approximation-based training scheme achieve classification accuracy similar to that of a normal SVM classifier, while providing privacy preservation and efficiency, respectively.
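A compact Python sketch of the training idea in Chapter 4: map inputs to explicit low-dimensional features that approximate the Gaussian kernel, then use a fast linear SVM solver. Random Fourier features are one standard such map; the thesis's own approximation may differ, so treat this as an illustration of the general technique.

import numpy as np
from sklearn.svm import LinearSVC

def make_rff(d, n_features=200, gamma=1.0, seed=0):
    """Random Fourier feature map approximating exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, n_features)
    return lambda X: np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Usage: the same feature map must be applied at train and test time.
X = np.random.default_rng(1).normal(size=(100, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
phi = make_rff(d=5)
clf = LinearSVC().fit(phi(X), y)     # linear solver, kernel-like power
print(clf.predict(phi(X[:3])))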
APA, Harvard, Vancouver, ISO, and other styles
47

Ahmed, Omar. "Towards Efficient Packet Classification Algorithms and Architectures." Thesis, 2013. http://hdl.handle.net/10214/7406.

Full text of the source
Abstract:
Packet classification plays an important role in next-generation networks. It is needed to fulfill the requirements of many applications, including firewalls, multimedia services, intrusion detection services, and differentiated services, to name just a few. Hardware solutions such as CAM/TCAM do not scale well in space, and current software-based packet classification algorithms exhibit relatively poor performance, prompting many researchers to concentrate on novel frameworks and architectures that employ both hardware and software components. In this thesis we propose two novel algorithms, Packet Classification with Incremental Update (PCIU) and the Group Based Search packet classification Algorithm (GBSA), that are scalable and demonstrate excellent results in terms of preprocessing and classification. The PCIU algorithm is an innovative and efficient packet classification algorithm with a unique incremental update capability that demonstrates powerful results and is accessible for many different tasks and clients. The algorithm was further improved and made more available for a variety of applications through its implementation in hardware; four such implementations are detailed and discussed in this thesis. A hardware accelerator based on an ESL approach, using Handel-C, resulted in classification 22x faster than a pure software implementation running on a state-of-the-art Xeon processor, and an ASIP implementation achieved on average 21x quicker classification. We also propose another novel algorithm, GBSA, for packet classification that is scalable, fast and efficient. On average the algorithm consumes 0.4 MB of memory for a 10K rule set. In the worst-case scenario, the classification time per packet is 2 μs, and the pre-processing speed is 3M rules/sec, based on a CPU operating at 3.4 GHz. The proposed algorithm was evaluated and compared to state-of-the-art techniques, such as RFC, HiCut, Tuple, and PCIU, using several standard benchmarks; the obtained results indicate that GBSA outperforms these algorithms in terms of speed, memory usage and pre-processing time. The algorithm, furthermore, was improved and made more accessible for a variety of applications through implementation in hardware. Three approaches using this algorithm are detailed and discussed in this thesis: the first was implemented using an Application Specific Instruction Processor (ASIP), while the others were pure RTL implementations using two different ESL flows (Impulse-C and Handel-C). The GBSA ASIP implementation achieved, on average, an 18x faster running speed than a pure software implementation operating on a Xeon processor; conversely, the hardware accelerators (based on the ESL approaches) resulted in 9x faster processing.
APA, Harvard, Vancouver, ISO, and other styles
48

Park, Sang-Hyeun. "Efficient Decomposition-Based Multiclass and Multilabel Classification." Phd thesis, 2012. http://tuprints.ulb.tu-darmstadt.de/2994/1/diss_shpark.pdf.

Full text of the source
Abstract:
Decomposition-based methods are widely used for multiclass and multilabel classification. These approaches transform or reduce the original task to a set of smaller, possibly simpler problems, and thereby often allow the use of many established learning algorithms that are not directly applicable to the original task. Even for directly applicable learning algorithms, the combination with a decomposition scheme may outperform the direct approach, e.g., if the resulting subproblems are simpler (in the sense of learnability). This thesis mainly addresses the efficiency of decomposition-based methods and provides several contributions improving their scalability with respect to the number of classes or labels, the number of classifiers, and the number of instances. Initially, we present two approaches improving the efficiency of the training phase of multiclass classification. The first of them shows that by minimizing redundant learning processes, which can occur in decomposition-based approaches for multiclass problems, the number of operations in the training phase can be significantly reduced. The second approach is tailored to Naive Bayes as the base learner: by a tight coupling of Naive Bayes and arbitrary decompositions, it allows an even greater reduction of the training complexity with respect to the number of classifiers. Moreover, an approach improving the efficiency of the testing phase is also presented, capable of reducing testing effort with respect to the number of classes independently of the base learner. Furthermore, efficient decomposition-based methods for multilabel classification are also addressed in this thesis: besides proposing an efficient prediction method, an approach rebalancing predictive performance, time, and memory complexity is presented. Aside from the efficiency-focused methods, this thesis also contains a study of a special case of the multilabel classification setting, which is elaborated, formalized, and tackled by a prototypical decomposition-based approach.
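A small Python sketch of the Naive Bayes coupling idea: per-class sufficient statistics are computed once, and the binary decision for any pair (or other decomposition element) is derived from them instead of being retrained per classifier. The Gaussian Naive Bayes algebra below is standard, not the thesis's exact construction.

import numpy as np

def class_stats(X, y):
    """Compute per-class sufficient statistics once, for all decompositions."""
    stats = {}
    for c in np.unique(y):
        Xc = X[y == c]
        stats[c] = (len(Xc), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return stats

def pairwise_log_odds(x, stats, a, b):
    """Derive the a-vs-b decision from shared statistics; no retraining."""
    def log_joint(c):
        n, mu, var = stats[c]
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return np.log(n) + ll
    return log_joint(a) - log_joint(b)   # > 0 means class a beats class b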
APA, Harvard, Vancouver, ISO, and other styles
49

Magalhães, Ricardo Manuel Correia. "Energy Efficient Smartphone-based Users Activity Classification." Master's thesis, 2019. https://hdl.handle.net/10216/119355.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
50

Tzou, Yi-ru, and 鄒依儒. "Cache Strategies for Efficient Lazy Associative Classification." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/05991093986242149764.

Full text of the source
Abstract:
Master's thesis
National Chung Cheng University
Institute of Computer Science and Information Engineering
Academic year: 98 (ROC calendar)
Lazy associative classification generates rules from the features of training instances that are closely related to the test instance. When a large data set is mined, a huge number of rules is generated. The rules are frequent itemsets, that is, itemsets whose support exceeds a minimum threshold, and the frequent itemsets originate from combining candidate itemsets generated from the training data. Hence, lazy associative classification spends a lot of time checking every itemset built from the training instances against the threshold. Because these rules are used repeatedly, we apply cache strategies to lazy associative classification to improve speed while preserving accuracy. A newly generated rule is added to the cache; when the cache exceeds its configured size, one of four methods, FIFO, LRU, DLC (discarding the lowest confidence) and DLD (discarding the lowest difference), is used to discard rules from the cache. For each rule, five values are recorded in the cache: the support value, the confidence value, a FIFO index, an LRU index and the difference; these are used to discard excess rules when the cache is full. With these data, lazy associative classification classifies quickly while maintaining accuracy. In this paper, we also set the confidence threshold automatically to the average confidence instead of setting it manually. Two datasets are used: Edoc and ModApte-Top10. In our experiments, accuracy improves by 3.11% and classification is 1.27 times faster on Edoc; on ModApte-Top10, accuracy improves by 2.24% and classification is 3.95 times faster.
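A small Python sketch of the rule cache with the LRU policy; the other policies from the thesis (FIFO, lowest confidence, lowest difference) would only change the eviction choice. The rule representation is invented for illustration.

from collections import OrderedDict

class RuleCache:
    """LRU cache of generated rules, keyed by the rule antecedent."""
    def __init__(self, capacity):
        self.capacity, self.rules = capacity, OrderedDict()

    def get(self, antecedent):
        if antecedent in self.rules:
            self.rules.move_to_end(antecedent)   # mark as recently used
            return self.rules[antecedent]
        return None                              # miss: rule must be mined

    def put(self, antecedent, consequent, support, confidence):
        if antecedent not in self.rules and len(self.rules) >= self.capacity:
            self.rules.popitem(last=False)       # evict least recently used
        self.rules[antecedent] = (consequent, support, confidence)

cache = RuleCache(2)
cache.put(("outlook=sunny",), "play=no", 0.2, 0.9)
print(cache.get(("outlook=sunny",)))   # hit: rule reused without re-mining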
APA, Harvard, Vancouver, ISO, and other styles
