
Theses on the topic "Text Data Streams"



Consult the 28 best theses for your research on the topic "Text Data Streams".


You can also download the full text of each academic publication in PDF format and read its abstract online whenever it is available in the metadata.

Explore theses on a wide variety of disciplines and organize your bibliography correctly.

1

Snowsill, Tristan. "Data mining in text streams using suffix trees". Thesis, University of Bristol, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.556708.

Full text
Abstract
Data mining in text streams, or text stream mining, is an increasingly important topic for a number of reasons, including the recent explosion in the availability of textual data and an increasing need for people and organisations to process and understand as much of that information as possible, from single users to multinational corporations and governments. In this thesis we present a data structure based on a generalised suffix tree which is capable of solving a number of text stream mining tasks. It can be used to detect changes in the text stream, detect when chunks of text are reused and detect events through identifying when the frequencies of phrases change in a statistically significant way. Suffix trees have been used for many years in the areas of combinatorial pattern matching and computational genomics. In this thesis we demonstrate how the suffix tree can become more widely applicable by making it possible to use suffix trees to analyse streams of data rather than static data sets, opening up a number of future avenues for research. The algorithms which we present are designed to be efficient in an on-line setting by having time complexity independent of the total amount of text seen and polynomial in the rate at which text is seen. We demonstrate the effectiveness of our methods on a large text stream comprising thousands of documents every day. This text stream is the stream of text news coming from over 600 online news outlets and the results obtained are of interest to news consumers, journalists and social scientists.
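To make the event-detection idea concrete, here is a minimal, hypothetical sketch in Python: it counts word n-grams in two adjacent windows of the stream and flags n-grams whose rate shifts significantly under a two-proportion z-test. The thesis's generalised suffix tree tracks phrases of every length within one structure; the fixed-n hash table and the z threshold below are simplifying assumptions, not the thesis's method.

```python
import math
from collections import Counter

def ngrams(tokens, n=2):
    """Yield word n-grams from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

def significant_changes(old_docs, new_docs, n=2, z_threshold=3.0):
    """Compare n-gram rates across two windows with a two-proportion z-test."""
    old = Counter(g for d in old_docs for g in ngrams(d.split(), n))
    new = Counter(g for d in new_docs for g in ngrams(d.split(), n))
    n1, n2 = sum(old.values()) or 1, sum(new.values()) or 1
    flagged = []
    for gram in set(old) | set(new):
        x1, x2 = old[gram], new[gram]
        p = (x1 + x2) / (n1 + n2)            # pooled rate across both windows
        se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
        if se > 0:
            z = (x2 / n2 - x1 / n1) / se     # positive z = rising phrase
            if abs(z) >= z_threshold:
                flagged.append((" ".join(gram), z))
    return sorted(flagged, key=lambda t: -abs(t[1]))
```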
2

Mejova, Yelena Aleksandrovna. "Sentiment analysis within and across social media streams". Diss., University of Iowa, 2012. https://ir.uiowa.edu/etd/2943.

Full text
Abstract
Social media offers a powerful outlet for people's thoughts and feelings -- it is an enormous ever-growing source of texts ranging from everyday observations to involved discussions. This thesis contributes to the field of sentiment analysis, which aims to extract emotions and opinions from text. A basic goal is to classify text as expressing either positive or negative emotion. Sentiment classifiers have been built for social media text such as product reviews, blog posts, and even Twitter messages. With the increasing complexity of text sources and topics, it is time to re-examine the standard sentiment extraction approaches, and possibly to re-define and enrich the definition of sentiment. Thus, this thesis begins by introducing a rich multi-dimensional model based on Affect Control Theory and showing its usefulness in sentiment classification. Next, unlike sentiment analysis research to date, we examine sentiment expression and polarity classification within and across various social media streams by building topical datasets. When comparing Twitter, reviews, and blogs on consumer product topics, we show that it is possible, and sometimes even beneficial, to train sentiment classifiers on text sources which are different from the target text. This is not the case, however, when we compare political discussion in YouTube comments to Twitter posts, demonstrating the difficulty of political sentiment classification. We further show that neither discussion volume nor sentiment expressed in these streams corresponds well to national polls, calling into question recent research linking the two. The complexity of political discussion also calls for a more specific re-definition of "sentiment" as agreement with the author's political stance. We conclude that sentiment must be defined, and tools for its analysis designed, within a larger framework of human interaction.
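The cross-stream experiments reduce to a train-on-source, test-on-target loop. A minimal sketch with scikit-learn (assumed available); the corpora and the feature configuration are placeholders, not the thesis's setup:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

def cross_domain_accuracy(source_texts, source_labels, target_texts, target_labels):
    """Train a polarity classifier on the source stream, score it on the target."""
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                        LogisticRegression(max_iter=1000))
    clf.fit(source_texts, source_labels)
    return accuracy_score(target_labels, clf.predict(target_texts))

# A reviews -> Twitter transfer vs. in-domain Twitter training would be
# compared exactly like this, one call per (source, target) pair.
```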
3

Hill, Geoffrey. "Sensemaking in Big Data: Conceptual and Empirical Approaches to Actionable Knowledge Generation from Unstructured Text Streams". Kent State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=kent1433597354.

Full text
4

Pinho, Roberto Dantas de. "Espaço incremental para a mineração visual de conjuntos dinâmicos de documentos". Universidade de São Paulo, 2009. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-14092009-123807/.

Full text
Abstract
Visual representations are often adopted to explore document collections, assisting in knowledge extraction and avoiding the thorough analysis of thousands of documents. Document maps present individual documents in visual spaces in such a way that their placement reflects similarity relations or connections between them. Building these maps requires, among other tasks, placing each document and identifying interesting areas or subsets. A current challenge is to visualize dynamic data sets. In information visualization, adding and removing data elements can strongly impact the underlying visual space. That can prevent a user from preserving a mental map that could assist her or him in understanding the content of a growing collection of documents or tracking changes in the underlying data set. This thesis presents a novel algorithm to create dynamic document maps, capable of maintaining a coherent disposition of elements, even for completely renewed sets. The process is inherently incremental, has low complexity, and places elements on a 2D grid, analogous to a chessboard. Consistent results were obtained compared to (non-incremental) multidimensional scaling solutions, even when applied to visualizing domains other than document collections. Moreover, the resulting visualization is not susceptible to occlusion. To assist users in identifying interesting subsets, a topic extraction technique based on association rule mining was also developed. Together, they create a visual space where topics and interesting subsets are highlighted and constantly updated as the data set changes.
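A toy rendition of the incremental grid placement may help: each arriving document lands on a free cell next to its most similar already-placed neighbour, so the layout stays coherent and occlusion-free (one document per cell) as the set grows. This is an illustrative simplification, not the thesis's algorithm; vectors are assumed L2-normalised so the dot product acts as cosine similarity.

```python
import numpy as np

class GridMap:
    def __init__(self, size=50):
        self.size = size
        self.occupied = {}            # (row, col) -> document vector

    def _nearest_free(self, row, col):
        for radius in range(self.size):   # spiral outward until a free cell
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    cell = ((row + dr) % self.size, (col + dc) % self.size)
                    if cell not in self.occupied:
                        return cell
        raise RuntimeError("grid full")

    def add(self, vec):
        """Place one new document vector; returns its grid cell."""
        if not self.occupied:
            cell = (self.size // 2, self.size // 2)
        else:
            # place next to the most similar document already on the board
            best = max(self.occupied, key=lambda c: float(vec @ self.occupied[c]))
            cell = self._nearest_free(*best)
        self.occupied[cell] = vec
        return cell
```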
5

Wu, Yingyu. "Using Text based Visualization in Data Analysis". Kent State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=kent1398079502.

Full text
6

Young, Tom and Mark Wigent. "Dynamic Formatting of the Test Article Data Stream". International Foundation for Telemetering, 2010. http://hdl.handle.net/10150/605948.

Full text
Abstract
ITC/USA 2010 Conference Proceedings / The Forty-Sixth Annual International Telemetering Conference and Technical Exhibition / October 25-28, 2010 / Town and Country Resort & Convention Center, San Diego, California
7

Crossman, Nathaniel C. "Stream Clustering And Visualization Of Geotagged Text Data For Crisis Management". Wright State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=wright1590957641168863.

Full text
8

Franco, Tom. "Performing Frame Transformations to Correctly Stream Position Data". University of Cincinnati / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1491562251744704.

Full text
9

Vickers, Stephen R. "Examining the Duplication of Flight Test Data Centers". International Foundation for Telemetering, 2011. http://hdl.handle.net/10150/595653.

Full text
Abstract
ITC/USA 2011 Conference Proceedings / The Forty-Seventh Annual International Telemetering Conference and Technical Exhibition / October 24-27, 2011 / Bally's Las Vegas, Las Vegas, Nevada
Aircraft flight test data processing began with on-site data analysis for the very first aircraft designs, and this method of analyzing flight data has continued from the early 1900s to the present day. Today, each new aircraft program builds a separate data center for post-flight processing (PFP), including operations, system administration, and management. Flight Test Engineers (FTEs) are relocated from other geographical areas to ramp up the manpower needed to analyze the PFP data center products; when the first phase of aircraft design and development is completed, the FTE headcount is reduced, with the FTEs either relocated to another program or left to find other employment. This paper is a condensed form of the author's research on how the practice of continuing to build PFP data centers costs aircraft companies millions of dollars in development and millions more in relocation, in addition to relocation stress on FTEs that can hinder productivity. Building new PFP data centers can be avoided by consolidating them using present technology.
10

Yates, James William. "Mixing Staged Data Flow and Stream Computing Techniques in Modern Telemetry Data Acquisition/Processing Architectures". International Foundation for Telemetering, 1999. http://hdl.handle.net/10150/608707.

Full text
Abstract
International Telemetering Conference Proceedings / October 25-28, 1999 / Riviera Hotel and Convention Center, Las Vegas, Nevada
Today’s flight test processing systems must handle many more complex data formats than just the PCM and analog FM data streams of yesterday. Many flight test programs, and their respective test facilities, are looking to leverage their computing assets across multiple customers and programs. Typically, these complex programs require the ability to handle video, packet, and avionics bus data in real time, in addition to handling the more traditional PCM format. Current and future telemetry processing systems must have an architecture that will support the acquisition and processing of these varied data streams. This paper describes various architectural designs of both staged data flow and stream computing architectures, including current and future implementations. Processor types, bus design, and the effects of varying data types, including PCM, video, and packet telemetry, will be discussed.
11

Boppudi, Srimanth. "Further Investigation of a New Traction Stress Based Shear Strength Characterization Method with Test Data". ScholarWorks@UNO, 2014. http://scholarworks.uno.edu/td/1847.

Full text
Abstract
In this thesis, a new traction-stress-based method for characterizing shear strength is investigated by carrying out a series of shear strength tests. The AWS method for calculating shear strength shows significant discrepancies between longitudinal and transverse specimens. The main purpose of the new traction-based definition of shear strength is to demonstrate that a single shear strength value exists regardless of specimen geometry and loading conditions, and with this new approach a better correlation between the shear strength values of transverse and longitudinal specimens is achieved. Special issues arise with multi-pass welds with regard to the failure angle: the AWS equation does not account for different failure angles and simply assumes a 45° failure angle in all cases, whereas the new approach takes the different failure angles into account. Finally, with this method, quantitative weld sizing can be achieved for fillet welds.
12

Kříž, Blažej. "Framework pro tvorbu generátorů dat". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236623.

Full text
Abstract
This master's thesis focuses on the problem of data generation. It begins by presenting several applications for data generation and describing the data generation process, and then deals with the development of a framework for data generators and a demonstration application for validating the framework.
13

Luo, Dan and Yajing Ran. "Micro Drivers behind the Changes of CET1 Capital Ratio: An empirical analysis based on the results of EU-wide stress test". Thesis, Internationella Handelshögskolan, Högskolan i Jönköping, IHH, Företagsekonomi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-44140.

Full text
Abstract
Background: Stress tests have been increasingly used as a supervisory tool by national regulators after the financial crisis; they can be used to determine bank capital levels and to assess the health of a bank. Purpose: The main purpose of this study is to assess whether certain micro factors play important roles in the changes of the Common Equity Tier 1 Capital Ratio (between the bank accounting value and the stress-testing results under the adverse scenarios). Our secondary purpose is to investigate whether our empirical results can provide theoretical suggestions to regulators when they exercise stress tests. Method: An empirical analysis using panel data, introducing a GARCH model to measure volatility. Empirical foundation: The results of EU-wide stress tests and bank financial statements. Conclusion: The coefficient associated with the ratio of non-performing loans to total loans is significantly positive, and the coefficient associated with bank size is significantly negative. In addition, the financial systems of strong banks are better able to absorb financial shocks. These findings are useful: since banks reflect the financial stability of an economic entity, they are another reason to pay attention to the stress-testing process rather than just the stress-testing results.
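For the volatility step, a GARCH(1,1) conditional-variance recursion is the core mechanism. The sketch below filters a return series with fixed, illustrative parameters; in practice omega, alpha, and beta would be estimated by maximum likelihood (for example with the third-party `arch` package), so treat the values as assumptions:

```python
import numpy as np

def garch11_volatility(returns, omega=1e-6, alpha=0.08, beta=0.90):
    """Return the conditional variance series sigma_t^2 for a return series."""
    var = np.empty(len(returns))
    var[0] = np.var(returns)                  # initialise at the sample variance
    for t in range(1, len(returns)):
        var[t] = omega + alpha * returns[t - 1] ** 2 + beta * var[t - 1]
    return var
```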
14

Faleiros, Thiago de Paulo. "Propagação em grafos bipartidos para extração de tópicos em fluxo de documentos textuais". Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-10112016-105854/.

Full text
Abstract
Handling large amounts of data is a requirement for modern text mining algorithms. For some applications, documents are published constantly, which demands a high cost for long-term storage, so easily adaptable methods are needed: an approach that treats documents as a flow and analyzes the data in one pass without requiring high storage costs. Another requirement is that the approach be able to exploit heuristics in order to improve the quality of results. Several models for the automatic extraction of latent information from a collection of documents have been proposed in the literature, among which probabilistic topic models are prominent. Probabilistic topic models achieve good practical results and have been extended into several models incorporating different types of information. However, properly describing these models, deriving them, and then obtaining appropriate inference algorithms are difficult tasks, requiring a rigorous mathematical treatment of the operations performed in the latent-dimension discovery process. Thus, the development of a simple and efficient method to tackle the problem of latent-dimension discovery requires a proper representation of the data. The hypothesis of this thesis is that, by using a bipartite graph representation of textual data, one can address the task of discovering latent patterns in the relationships between documents and words in a simple and intuitive way. To validate this hypothesis, we developed a framework based on a label propagation algorithm over the bipartite graph representation. The framework, called PBG (Propagation in Bipartite Graph), was initially applied in the unsupervised setting to a static collection of documents. Then a semi-supervised version was proposed, which needs only a small amount of labeled documents for the transductive classification task. Finally, it was applied in the dynamic setting, where a flow of textual data is considered. Comparative analyses were performed, and the results indicated that PBG is a viable and competitive alternative for tasks in the unsupervised and semi-supervised settings.
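The propagation idea can be sketched compactly: soft labels alternate between the two sides of the document-word bipartite graph, with labelled documents clamped each round. The normalisation and fixed iteration count below are illustrative assumptions rather than PBG's exact update rule:

```python
import numpy as np

def propagate(A, doc_labels, n_iter=20):
    """A: (docs x words) term matrix; doc_labels: (docs x k) one-hot rows,
    all-zero for unlabelled documents. Returns soft labels for docs and words."""
    labelled = doc_labels.sum(axis=1) > 0
    F_docs = doc_labels.astype(float).copy()
    row = A.sum(axis=1, keepdims=True); row[row == 0] = 1
    col = A.sum(axis=0, keepdims=True); col[col == 0] = 1
    for _ in range(n_iter):
        F_words = (A / col).T @ F_docs           # words inherit from documents
        F_docs = (A / row) @ F_words             # documents inherit from words
        F_docs[labelled] = doc_labels[labelled]  # clamp the supervised rows
    return F_docs, F_words
```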
15

Al-Ajmi, Adel. "Wellbore stability analysis based on a new true-triaxial failure criterion". Doctoral thesis, Stockholm : Department of Land and Water Resources Engineering, Royal Institute of Technology, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4037.

Full text
16

Gonçalves Júnior, Paulo Mauricio. "Multivariate non-parametric statistical tests to reuse classifiers in recurring concept drifting environments". Universidade Federal de Pernambuco, 2013. https://repositorio.ufpe.br/handle/123456789/12226.

Full text
Abstract
Data streams are a recent processing model where data arrive continuously, in large quantities, at high speeds, so that they must be processed on-line. Besides that, several private and public institutions store large amounts of data that also must be processed. Traditional batch classifiers are not well suited to handle huge amounts of data for basically two reasons. First, they usually read the available data several times until convergence, which is impractical in this scenario. Second, they assume that the context represented by the data is stable in time, which may not be true. In fact, context change is a common situation in data streams, and is named concept drift. This thesis presents RCD, a framework that offers an alternative approach to handle data streams that suffer from recurring concept drifts. It creates a new classifier for each context found and stores a sample of the data used to build it. When a new concept drift occurs, RCD compares the new context to old ones using a non-parametric multivariate statistical test to verify whether both contexts come from the same distribution. If so, the corresponding classifier is reused; if not, a new classifier is generated and stored. Three kinds of tests were performed. One compares the RCD framework with several adaptive algorithms (among single and ensemble approaches) on artificial and real data sets, among the most used in the concept drift research area, with abrupt and gradual concept drifts. We observe the ability of the classifiers to represent each context, how they handle concept drift, and the training and testing times needed to evaluate the data sets. Results indicate that RCD had similar or better statistical results compared to the other classifiers; on the real-world data sets, RCD presented accuracies close to the best classifier on each data set. Another test compares two statistical tests (kNN and Cramér) in their capability to represent and identify contexts. Tests were performed using adaptive and batch classifiers as base learners of RCD, on artificial and real-world data sets, with several rates of change. Results indicate that, on average, kNN had better results than the Cramér test and was also faster; independently of the test used, RCD had higher accuracy values than its respective base learners. An improvement of the RCD framework is also presented in which the statistical tests are performed in parallel through the use of a thread pool. Tests were performed on three processors with different numbers of cores. Better results were obtained when there was a high number of detected concept drifts, the buffer size used to represent each data distribution was large, and the test frequency was high; even when none of these conditions apply, parallel and sequential execution still have very similar performance. Finally, a comparison between six different drift detection methods was also performed, comparing predictive accuracies, evaluation times, and drift handling, including false alarm and miss detection rates, as well as the average distance to the drift point and its standard deviation.
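The reuse mechanism at the heart of RCD can be sketched in a few lines: store one (classifier, sample) pair per context and, on drift, reuse the first stored classifier whose sample a two-sample test cannot distinguish from the new data. A per-feature Kolmogorov-Smirnov test stands in below for the multivariate kNN and Cramér tests the thesis actually evaluates, so this is an illustrative stand-in, not the thesis's test:

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.base import clone

def same_distribution(X_new, X_old, alpha=0.05):
    """Crude multivariate check: no feature rejects equality at level alpha."""
    return all(ks_2samp(X_new[:, j], X_old[:, j]).pvalue > alpha
               for j in range(X_new.shape[1]))

class RCDPool:
    def __init__(self, base_learner):
        self.base, self.pool = base_learner, []   # [(classifier, sample), ...]

    def on_drift(self, X, y):
        for clf, sample in self.pool:             # try to reuse an old context
            if same_distribution(X, sample):
                return clf
        clf = clone(self.base).fit(X, y)          # otherwise learn a new one
        self.pool.append((clf, X))
        return clf
```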
17

Gonçalves Júnior, Paulo Mauricio. "Multivariate non-parametric statistical tests to reuse classifiers in recurring concept drifting environments". Universidade Federal de Pernambuco, 2013. https://repositorio.ufpe.br/handle/123456789/12288.

Full text
Abstract
Data streams are a recent processing model where data arrive continuously, in large quantities, at high speeds, so that they must be processed on-line. Besides that, several private and public institutions store large amounts of data that also must be processed. Traditional batch classifiers are not well suited to handle huge amounts of data for basically two reasons. First, they usually read the available data several times until convergence, which is impractical in this scenario. Second, they assume that the context represented by the data is stable in time, which may not be true. In fact, context change is a common situation in data streams, and is named concept drift. This thesis presents RCD, a framework that offers an alternative approach to handle data streams that suffer from recurring concept drifts. It creates a new classifier for each context found and stores a sample of the data used to build it. When a new concept drift occurs, RCD compares the new context to old ones using a non-parametric multivariate statistical test to verify whether both contexts come from the same distribution. If so, the corresponding classifier is reused; if not, a new classifier is generated and stored. Three kinds of tests were performed. One compares the RCD framework with several adaptive algorithms (among single and ensemble approaches) on artificial and real data sets, among the most used in the concept drift research area, with abrupt and gradual concept drifts. We observe the ability of the classifiers to represent each context, how they handle concept drift, and the training and testing times needed to evaluate the data sets. Results indicate that RCD had similar or better statistical results compared to the other classifiers; on the real-world data sets, RCD presented accuracies close to the best classifier on each data set. Another test compares two statistical tests (kNN and Cramér) in their capability to represent and identify contexts. Tests were performed using adaptive and batch classifiers as base learners of RCD, on artificial and real-world data sets, with several rates of change. Results indicate that, on average, kNN had better results than the Cramér test and was also faster; independently of the test used, RCD had higher accuracy values than its respective base learners. An improvement of the RCD framework is also presented in which the statistical tests are performed in parallel through the use of a thread pool. Tests were performed on three processors with different numbers of cores. Better results were obtained when there was a high number of detected concept drifts, the buffer size used to represent each data distribution was large, and the test frequency was high; even when none of these conditions apply, parallel and sequential execution still have very similar performance. Finally, a comparison between six different drift detection methods was also performed, comparing predictive accuracies, evaluation times, and drift handling, including false alarm and miss detection rates, as well as the average distance to the drift point and its standard deviation.
18

Di, Molfetta Sabino. "Studio del modello di vita e di affidabilità di condensatori "Brick" in film per applicazione Automotive per macchine elettriche o ibride". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Search full text
Abstract
My thesis project, titled "Study of the life and reliability model of 'Brick' film capacitors for automotive applications in electric or hybrid machines", was carried out in collaboration with a product company that is a world leader in the design and manufacture of capacitors. The goal of my project was to study, from a reliability standpoint, the life models of the polypropylene film capacitors used in automotive applications. The analyzed tests consist of subjecting the device to given temperature and voltage stresses, of variable duration tn hours, so as to evaluate the reliability and the expected life guaranteed by the capacitor. The activities carried out can be divided into three main blocks: organizing the data, considering a single failure mode; fitting the data with the chosen model; and finally constructing the test plan so as to satisfy the following company requirements: determining the duration of each stress, the number of samples per stress level, and the stress levels themselves.
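Life models of this kind typically combine an inverse-power voltage term with an Arrhenius temperature term. The helper below evaluates such a model with illustrative parameter values; the exponent n and activation energy Ea are assumptions, since the thesis estimates its own parameters from the stress-test data:

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def life_ratio(v_use, t_use_k, v_test, t_test_k, n=7.0, ea_ev=1.0):
    """Expected life multiplier going from test conditions to use conditions."""
    voltage_term = (v_test / v_use) ** n                       # inverse power law
    thermal_term = math.exp((ea_ev / K_BOLTZMANN_EV)
                            * (1 / t_use_k - 1 / t_test_k))    # Arrhenius
    return voltage_term * thermal_term

# e.g. life at 400 V / 358 K predicted from a test at 600 V / 378 K:
accel = life_ratio(400.0, 358.15, 600.0, 378.15)
```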
19

Leboullenger, Déborah. "Essais sur la transition énergétique : enjeux, valorisation, financement et risques". Thesis, Paris 10, 2017. http://www.theses.fr/2017PA100065/document.

Full text
Abstract
This thesis deals with the main challenges that must be addressed to foster the private financing of a low-carbon energy transition. A massive amount of investment in low-carbon assets is needed, and most of the effort must come from final energy consumers such as households. Their ability, as well as the ability of financial intermediation institutions (that is, banks in Europe), to value low-carbon investments and risk profiles is key to a successful low-carbon energy transition in France and in every industrialized country. This research focuses particularly on the housing sector, which represents 44% of final energy consumption and 21% of total greenhouse gas emissions in France. The first chapter takes the viewpoint that only a disaggregated approach can make macroeconomic, nationwide objectives to reduce final energy consumption match microeconomic arbitrages regarding energy spending in the private residential housing sector. Using segmentation and decision-tree econometric techniques, the chapter proposes a typology of energy spending and a segmentation analysis of the energy transition "market" in the housing sector. The second chapter uses frontier function estimation on a local French private housing market to determine whether selling prices contain a "green property value". An empirical analysis is then conducted to determine whether this value can offset the upfront cost of an energy retrofit. The last chapter takes the perspective of the financial institutions. It attempts a first evaluation of the impact of, and exposure to, climate-related risks (physical, transition, liability, and systemic risks) on the banking system and its prudential regulation framework.
20

Bourniquel, Bernard. "Évaluation des déformations mécaniques de surface par diffraction X. Optimisation de la mesure des contraintes résiduelles. Application au contrôle qualité du grenaillage de précontrainte". Nantes, 1988. http://www.theses.fr/1988NANT2037.

Full text
Abstract
Application of residual stress measurement by X-ray diffraction to the non-destructive testing of gears that have undergone shot peening. Development of new experimental procedures.
21

Jakel, Roland. "Grundlagen der Elasto-Plastizität in Creo Simulate - Theorie und Anwendung". Universitätsbibliothek Chemnitz, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-87141.

Full text
Abstract
This presentation describes the basics of elasto-plasticity and its application with the finite element software Creo Simulate (formerly Pro/MECHANICA) from PTC. The first part describes the characteristics of plastic behavior, different plastic material laws, yield criteria for multiaxial stress states, and different hardening models. The second part describes the opportunities and limitations of analyzing elasto-plastic problems with the FEM code and provides user guidance. The last part presents several examples; the behavior of a uniaxial tensile test specimen before and after the onset of necking is treated in depth.
22

Abouelnagah, Younes. "Efficient Temporal Synopsis of Social Media Streams". Thesis, 2013. http://hdl.handle.net/10012/7689.

Full text
Abstract
Search and summarization of streaming social media, such as Twitter, requires the ongoing analysis of large volumes of data with dynamically changing characteristics. Tweets are short and repetitious -- lacking context and structure -- making it difficult to generate a coherent synopsis of events within a given time period. Although some established algorithms for frequent itemset analysis might provide an efficient foundation for synopsis generation, the unmodified application of standard methods produces a complex mass of rules, dominated by common language constructs and many trivial variations on topically related results. Moreover, these results are not necessarily specific to events within the time period of interest. To address these problems, we build upon the Linear time Closed itemset Mining (LCM) algorithm, which is particularly suited to the large and sparse vocabulary of tweets. LCM generates only closed itemsets, providing an immediate reduction in the number of trivial results. To reduce the impact of function words and common language constructs, we apply a filtering step that preserves these terms only when they may form part of a relevant collocation. To further reduce trivial results, we propose a novel strengthening of the closure condition of LCM to retain only those results that exceed a threshold of distinctiveness. Finally, we perform temporal ranking, based on information gain, to identify results that are particularly relevant to the time period of interest. We evaluate our work over a collection of tweets gathered in late 2012, exploring the efficiency and filtering characteristics of each processing step, both individually and collectively. Based on our experience, the resulting synopses from various time periods provide understandable and meaningful pictures of events within those periods, with potential application to tasks such as temporal summarization and query expansion for search.
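The temporal-ranking step admits a compact formulation: score each phrase by the information gain its presence provides about whether a tweet falls inside the period of interest. A self-contained, hypothetical helper (the thesis combines this with the closed-itemset filtering described above):

```python
import math

def entropy(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(n_in, n_out, k_in, k_out):
    """n_in/n_out: tweets inside/outside the period; k_in/k_out: how many of
    each contain the phrase. Returns the phrase's IG w.r.t. the period."""
    n = n_in + n_out
    with_p = k_in + k_out                 # tweets containing the phrase
    without_p = n - with_p
    h = entropy(n_in / n)                 # uncertainty about the period
    h_with = entropy(k_in / with_p) if with_p else 0.0
    h_without = entropy((n_in - k_in) / without_p) if without_p else 0.0
    return h - (with_p / n) * h_with - (without_p / n) * h_without
```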
23

Chou, Cheng-Chieh and 周正杰. "Planning step-stress test plans based on censored data". Thesis, 2015. http://ndltd.ncl.edu.tw/handle/24428386624288427049.

Full text
Abstract
Ph.D. dissertation
Tamkang University
Doctoral Program, Department of Mathematics
Academic year: 103 (ROC calendar)
In this dissertation, we discuss a k-level step-stress accelerated life-testing (ALT) experiment with unequal duration steps. Under Type-I and Type-I hybrid censoring schemes, the working models are the general log-location-scale and exponential lifetime distributions, with mean lives that are a linear function of stress for the former and a log-linear function of stress for the latter, together with a cumulative exposure model. The determination of the optimal unequal duration steps for the exponential, Weibull, and lognormal distributions is addressed using the variance-optimality criterion. Numerical results show that, for the general log-location-scale and exponential distributions, the optimal k-level step-stress ALT model with unequal duration steps reduces to a 2-level step-stress ALT model when the available data are either Type-I or Type-I hybrid censored. Moreover, using an induction argument, we are able to give a theoretical proof of this result for Type-I censored exponential data.
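The cumulative exposure model that links the stress levels has a simple simulation form for exponential lifetimes: draw a lifetime at the first stress and, if it exceeds the step time, rescale the remaining life by the ratio of the two mean lives. A sketch with illustrative parameters for a 2-level test (the case the dissertation shows to be optimal); all numeric values are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def step_stress_lifetimes(n, theta1, theta2, tau, censor_time):
    """Draw n lifetimes under a 2-level step-stress cumulative exposure model
    with exponential means theta1 (before tau) and theta2 (after), then apply
    Type-I censoring at censor_time. Returns (observed times, failure flags)."""
    t = rng.exponential(theta1, size=n)
    late = t > tau                       # failures occurring after the step
    # cumulative exposure: life already "used" at stress 1 carries over
    t[late] = tau + (t[late] - tau) * (theta2 / theta1)
    observed = np.minimum(t, censor_time)
    return observed, t <= censor_time

times, failed = step_stress_lifetimes(100, theta1=50.0, theta2=10.0,
                                      tau=20.0, censor_time=60.0)
```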
24

Lin, Ying-Po and 林英博. "Optimal Step-Stress Test under Progressive Type I Censoring with Grouped Data". Thesis, 2003. http://ndltd.ncl.edu.tw/handle/74396472995776456245.

Full text
Abstract
Master's thesis
Tamkang University
Department of Statistics
Academic year: 91 (ROC calendar)
In the study of product reliability, a life test usually has to be conducted, and there are several types of life-testing experiments. Type I and Type II censoring schemes have been studied extensively by many researchers as ways to obtain product lifetimes, but these two schemes do not allow units to be removed from the test at points other than the final termination point, an allowance that is desirable for some experimenters. A progressive censoring scheme has therefore been proposed to handle this problem. With today's technology, many products are designed to work without failure for years, so some life tests result in few or no failures within a short testing time. One approach to this problem is to accelerate the life of products by increasing the stress levels in order to obtain failures quickly. Moreover, in practice it is often impossible to observe or inspect the testing process continuously, even with censoring; we may only be able to inspect the test units intermittently. In that case we observe only the number of failures within each time period, but not the associated failure times. Data of this type are called grouped data. In this thesis, we combine progressive censoring, accelerated life testing, and grouped data to develop a step-stress accelerated life-testing scheme with Type I progressive group-censoring. We obtain the estimators of the parameter in the proposed model when the failure time distribution is exponential. The problem of choosing the optimal length of the inspection interval is also addressed using the variance and D-optimality criteria.
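For grouped data, the exponential likelihood involves only the interval counts. A direct numerical sketch of the maximum likelihood estimate; the inspection times, counts, and search bounds below are illustrative assumptions, not values from the thesis:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def grouped_exp_mle(t, counts, censored):
    """t: increasing inspection times; counts: failures per interval
    (t[j-1], t[j]]; censored: units still alive at t[-1]. Returns the MLE
    of the exponential mean theta."""
    t = np.asarray(t, dtype=float)
    counts = np.asarray(counts, dtype=float)
    edges_lo = np.concatenate(([0.0], t[:-1]))
    def nll(theta):
        # probability a unit fails inside each inspection interval
        p = np.exp(-edges_lo / theta) - np.exp(-t / theta)
        return -(np.sum(counts * np.log(p)) - censored * t[-1] / theta)
    res = minimize_scalar(nll, bounds=(1e-6, 1e6), method="bounded")
    return res.x

theta_hat = grouped_exp_mle([10, 20, 30], counts=[5, 3, 2], censored=10)
```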
25

Chen, Hsin-Hao and 陳信豪. "Acceptance Sampling Plans under Step-stress Test and Type Ⅰ Interval Censoring Data". Thesis, 2006. http://ndltd.ncl.edu.tw/handle/92860141347330162235.

Full text
Abstract
Master's thesis
National Chengchi University
Graduate Institute of Statistics
Academic year: 94 (ROC calendar)
In life test experiments, interval censoring is used when we cannot inspect the test units continuously, whether because of practical constraints or for convenience. Furthermore, for many long-life components and products it is difficult to observe enough failures; in that case a step-stress life test can be adopted, which makes the test units fail earlier, effectively reducing the required test time and saving cost. In this thesis, acceptance sampling plans are established for Rayleigh lifetime data under a step-stress and Type I interval censoring scheme. The minimum sample sizes and the corresponding critical lifetime values needed for the test plans are found, and tables are provided for the use of the proposed plans.
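Underlying such plans is a simple binomial search for the smallest sample size and acceptance number meeting both risk constraints. The sketch below takes the per-unit failure probabilities under the censored step-stress Rayleigh model as given inputs (the thesis derives them from the lifetime model; the values in the usage line are placeholders):

```python
from scipy.stats import binom

def min_sample_size(p_good, p_bad, alpha=0.05, beta=0.10, n_max=1000):
    """Smallest (n, c): test n units, accept the lot if at most c fail.
    Requires P(reject | p_good) <= alpha (producer's risk) and
    P(accept | p_bad) <= beta (consumer's risk)."""
    for n in range(1, n_max + 1):
        for c in range(n + 1):
            if (1 - binom.cdf(c, n, p_good) <= alpha
                    and binom.cdf(c, n, p_bad) <= beta):
                return n, c
    raise ValueError("no plan within n_max")

n, c = min_sample_size(p_good=0.05, p_bad=0.20)
```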
26

Wang, Ye. "Robust Text Mining in Online Social Network Context". Thesis, 2018. https://vuir.vu.edu.au/38645/.

Full text
Abstract
Text mining is involved in a broad scope of applications in diverse domains that mainly, but not exclusively, serve political, commercial, medical, and academic needs. Along with the rapid development of Internet technology over the past thirty years and the advent of online social media and networks in the past decade, text data has come to exhibit the features of online social data streams: explosive growth, constantly changing content, and huge volume. As a result, text mining is no longer oriented merely to the textual content itself, but must consider its surroundings and combine theories and techniques of stream processing and social network analysis, giving birth to a wide range of applications used for understanding thoughts spread over the world, such as sentiment analysis, mass surveillance, and market prediction. Automatically discovering sequences of words that represent appropriate themes in a collection of documents, topic detection is closely associated with document clustering and classification. These two tasks play integral roles in revealing deep insight into text content in the whole text mining framework. However, most existing detection techniques cannot adapt to the dynamic social context, which exposes bottlenecks in detection performance and deficiencies in topic models. In this thesis, we aim at text data streams, investigating novel techniques and solutions for robust text mining that tackle the challenges of the online social context by incorporating methodologies of stream processing, topic detection, and document clustering and classification. In particular, we have advanced the state of the art by making the following contributions:
1. A Multi-Window based Ensemble Learning (MWEL) framework is proposed for imbalanced streaming data that comprehensively improves classification performance (a simplified sketch follows this abstract). MWEL keeps the ensemble classifier up to date and adaptive to the evolving data distribution by applying a multi-window monitoring mechanism and an efficient updating strategy.
2. A semi-supervised learning method is proposed to detect latent topics from news streams and the corresponding social context, with a constraint propagation scheme to adequately exploit the hidden geometrical structure of the data space as supervised information. A collective learning algorithm is proposed to integrate the textual content with the social context, and a locally weighted scheme is then proposed to improve the algorithm's stability.
3. A Robust Hierarchical Ensemble (RHE) framework is introduced to enhance the robustness of the topic model. It reduces the repercussions caused by outliers and noise on the one hand, and overcomes inherent defects of text data on the other. RHE adapts to the changing distribution of the text stream by constructing a flexible document hierarchy that can be dynamically adjusted. A discussion of how to extract the most valuable social context is conducted, with experiments aimed at removing noise from the surroundings and demonstrating the efficiency of the proposed approach.
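As referenced in contribution 1, here is a minimal multi-window ensemble sketch: one member per recent window, re-weighted by accuracy on the newest window so that members fitted to outdated distributions lose influence. The structure, window policy, and weighting are simplifying assumptions, not MWEL itself; integer labels 0..k-1 present in every window are assumed:

```python
import numpy as np
from sklearn.base import clone

class WindowEnsemble:
    def __init__(self, base, max_windows=5):
        self.base, self.max_windows = base, max_windows
        self.members = []                         # [(classifier, weight), ...]

    def update(self, X, y):
        """Consume one completed window of the stream."""
        # re-weight old members by how well they fit the newest window
        rescored = [(clf, max(clf.score(X, y), 1e-3)) for clf, _ in self.members]
        rescored.append((clone(self.base).fit(X, y), 1.0))
        self.members = rescored[-self.max_windows:]

    def predict(self, X):
        # assumes integer labels 0..k-1 so predict_proba columns line up
        votes = sum(w * clf.predict_proba(X) for clf, w in self.members)
        return np.argmax(votes, axis=1)
```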
27

Haldenwang, Nils. "Reliable General Purpose Sentiment Analysis of the Public Twitter Stream". Doctoral thesis, 2017. https://repositorium.ub.uni-osnabrueck.de/handle/urn:nbn:de:gbv:700-2017092716282.

Full text
Abstract
General purpose Twitter sentiment analysis is a novel field that is closely related to traditional Twitter sentiment analysis but differs in some key aspects. The main difference lies in the fact that the novel approach considers the unfiltered public Twitter stream, while most previous approaches applied various filtering steps that are not feasible for many applications. Another goal is to yield more reliable results by classifying a tweet as positive or negative only if it distinctly carries the respective sentiment, and marking the remaining messages as uncertain; traditional approaches are often not that strict. In the course of this thesis it was verified that the novel approach differs significantly from the traditional one. Moreover, the experimental results indicated that the archetypal approaches can be transferred to the new domain, but related-domain data is consistently subpar compared to high-quality in-domain data. Finally, the viability of the best classification algorithm was qualitatively verified in a real-world setting that was also developed within the course of this thesis.
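The "distinctly positive or negative, else uncertain" policy is, mechanically, a probability threshold with a reject region in between. A minimal sketch; the 0.9 cut-off is an assumed value, not the thesis's tuned setting:

```python
import numpy as np

def reliable_labels(proba_positive, threshold=0.9):
    """Map P(positive) scores to 'positive' / 'negative' / 'uncertain'."""
    proba_positive = np.asarray(proba_positive)
    labels = np.full(proba_positive.shape, "uncertain", dtype=object)
    labels[proba_positive >= threshold] = "positive"
    labels[proba_positive <= 1 - threshold] = "negative"
    return labels
```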
28

Fernandes, Sebastião Cardoso. "How to deal with extreme cases for credit risk monitoring: a case study in a credit risk data science company". Master's thesis, 2018. http://hdl.handle.net/10362/35455.

Full text
Abstract
The Global Financial Crisis triggered a severe hold on credit lending due to financial institutions' inability to assess credit applicants' risk levels properly. Based on U.S. data from Lending Club, we conducted a study to evaluate the consequences of including macroeconomic risk factors in individual credit application observations. Through historical scenario stress testing, we find that this approach increases the performance of credit scoring models developed in a stable economic cycle and applied to a recession. The inclusion of macroeconomic indicators reveals potential for credit institutions to better absorb shocks derived from economic downturns.
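Mechanically, including macroeconomic risk factors means joining loan-level rows to indicator series by origination date before fitting the scorecard. A pandas/scikit-learn sketch with hypothetical column names (all remaining feature columns are assumed numeric):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def scorecard_with_macro(loans: pd.DataFrame, macro: pd.DataFrame):
    """loans: one row per application with 'origination_month' and a binary
    'default' outcome; macro: one row per month with indicators such as
    unemployment or gdp_growth (column names are placeholders)."""
    df = loans.merge(macro, on="origination_month", how="left")
    X = df.drop(columns=["default", "origination_month"])
    return LogisticRegression(max_iter=1000).fit(X, df["default"])
```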
