Dissertations on the topic "SUMMARIZATION ALGORITHMS"


Consult the top 18 dissertations for research on the topic "SUMMARIZATION ALGORITHMS".


1

Kolla, Maheedhar, and University of Lethbridge Faculty of Arts and Science. "Automatic text summarization using lexical chains : algorithms and experiments." Thesis, Lethbridge, Alta. : University of Lethbridge, Faculty of Arts and Science, 2004. http://hdl.handle.net/10133/226.

Abstract:
Summarization is a complex task that requires understanding of the document content to determine the importance of the text. Lexical cohesion is a method to identify connected portions of a text based on the relations between its words. Lexical cohesive relations can be represented using lexical chains: sequences of semantically related words spread over the entire text. Lexical chains are used in a variety of Natural Language Processing (NLP) and Information Retrieval (IR) applications. In this thesis, we propose a lexical chaining method that includes glossary relations in the chaining process. These relations enable us to identify topically related concepts, for instance dormitory and student, and thereby enhance the identification of cohesive ties in the text. We then present methods that use the lexical chains to generate summaries by extracting sentences from the document(s). Headlines are generated by filtering out the portions of the extracted sentences that do not contribute to their meaning. Generated headlines can be used in real-world applications to skim through document collections in a digital library. Multi-document summarization is gaining demand with the explosive growth of online news sources. It requires identification of the several themes present in a collection to attain good compression and avoid redundancy. In this thesis, we propose methods to group portions of the texts of a document collection into meaningful clusters. Clustering enables us to extract the various themes of the document collection. Sentences from the clusters can then be extracted to generate a summary for the multi-document collection. Clusters can also be used to generate summaries with respect to a given query. We designed a system to compute lexical chains for a given text and use them to extract the salient portions of the document. Specific tasks considered are headline generation, multi-document summarization, and query-based summarization. Our experimental evaluation shows that effective summaries can be extracted for the above tasks.
viii, 80 leaves : ill. ; 29 cm.
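The chaining idea in this abstract is easy to see in miniature. Below is a minimal sketch, assuming NLTK's WordNet interface (run nltk.download("wordnet") beforehand): a word joins a chain when it shares a synset with, or stands in a hypernym ancestry relation to, a word already in the chain. A purely WordNet-based chainer misses topical pairs like dormitory/student, which is exactly the gap the thesis's glossary relations are meant to close.

```python
# Minimal lexical-chain sketch (requires: nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

def related(w1, w2):
    s1 = set(wn.synsets(w1, pos=wn.NOUN))
    s2 = set(wn.synsets(w2, pos=wn.NOUN))
    if s1 & s2:                               # identity / synonymy
        return True
    for a in s1:                              # hypernym ancestry, both ways
        if s2 & set(a.closure(lambda s: s.hypernyms())):
            return True
    for b in s2:
        if s1 & set(b.closure(lambda s: s.hypernyms())):
            return True
    return False

def build_chains(nouns):
    chains = []                               # each chain: words in text order
    for word in nouns:
        for chain in chains:
            if any(related(word, member) for member in chain):
                chain.append(word)
                break
        else:
            chains.append([word])
    return chains

# "car" and "vehicle" chain via hypernymy; "dormitory"/"student" do not,
# illustrating why the thesis adds glossary relations on top of WordNet.
print(build_chains(["student", "dormitory", "car", "university", "vehicle"]))
```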
2

Hodulik, George M. "Graph Summarization: Algorithms, Trained Heuristics, and Practical Storage Application." Case Western Reserve University School of Graduate Studies / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=case1482143946391013.

3

Hamid, Fahmida. "Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction." Thesis, University of North Texas, 2016. https://digital.library.unt.edu/ark:/67531/metadc862796/.

Abstract:
Automatic text summarization and keyphrase extraction are two interesting areas of research which extend across natural language processing and information retrieval. They have recently become very popular because of their wide applicability. Devising generic techniques for these tasks is challenging due to several issues. Yet we have a good number of intelligent systems performing the tasks. As different systems are designed with different perspectives, evaluating their performance with a generic strategy is crucial. It has also become immensely important to evaluate the performance with minimal human effort. In our work, we focus on designing a relativized scale for evaluating different algorithms. This is our major contribution, which challenges the traditional approach of working with an absolute scale. We consider the impact of some environment variables (length of the document, references, and system-generated outputs) on the performance. Instead of defining rigid lengths, we show how to adjust to their variations. We derive a mathematically sound baseline that should work for all kinds of documents. We emphasize automatically determining the syntactic well-formedness of the structures (sentences). We also propose defining an equivalence class for each unit (e.g., word) instead of the exact string-matching strategy. We show an evaluation approach that considers the weighted relatedness of multiple references to adjust to the degree of disagreement between the gold standards. We publish the proposed approach as a free tool so that other systems can use it. We have also accumulated a dataset (scientific articles) with a reference summary and keyphrases for each document. Our approach is applicable not only to evaluating single-document tasks but also multiple-document tasks. We have tested our evaluation method on three intrinsic tasks (taken from the DUC 2004 conference), and in all three cases it correlates positively with ROUGE. Based on our experiments on the DUC 2004 question-answering task, it correlates with the human decision (extrinsic task) with 36.008% accuracy. In general, we can state that the proposed relativized scale performs as well as the popular technique (ROUGE), with flexibility for the length of the output. As part of the evaluation we have also devised a new graph-based algorithm focusing on sentiment analysis. The proposed model can extract units (e.g., words or sentences) from the original text belonging either to the positive or to the negative sentiment pole. It embeds both types of sentiment flow into a single text graph, composed of words or phrases as nodes and their relations as edges. By recursively calling two mutually exclusive relations, the model builds the final rank of the nodes. Based on the final rank, it splits two segments from the article: one with highly positive and the other with highly negative sentiment. The output of this model was compared with non-polar TextRank output to quantify how much of the polar summaries actually covers the facts along with the sentiment.
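The equivalence-class matching proposed here can be sketched in a few lines. This is a toy stand-in, not the thesis's tool: the synonym table below is hypothetical, and real equivalence classes would be induced from resources rather than hand-written.

```python
# Soft recall: a reference token counts as matched if any member of its
# equivalence class appears in the system output (toy synonym table).
def equivalence_class(word, synonyms):
    return synonyms.get(word, set()) | {word}

def soft_recall(system_tokens, reference_tokens, synonyms):
    matched = 0
    for ref in reference_tokens:
        cls = equivalence_class(ref, synonyms)
        if any(tok in cls for tok in system_tokens):
            matched += 1
    return matched / len(reference_tokens) if reference_tokens else 0.0

synonyms = {"car": {"automobile", "vehicle"}}   # illustrative only
print(soft_recall(["an", "automobile", "crashed"],
                  ["a", "car", "crashed"], synonyms))   # 2/3 under soft match
```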
4

Chiarandini, Luca. "Characterizing and modeling web sessions with applications." Doctoral thesis, Universitat Pompeu Fabra, 2014. http://hdl.handle.net/10803/283414.

Abstract:
This thesis focuses on the analysis and modeling of web sessions, groups of requests made by a single user for a single navigation purpose. Understanding how people browse through websites is important, helping us to improve interfaces and provide better content. After first conducting a statistical analysis of web sessions, we go on to present algorithms to summarize and model web sessions. Finally, we describe applications that use novel browsing methods, in particular parallel browsing. We observe that people tend to browse images in sequences and that those sequences can be considered units of content in their own right. The session summarization algorithm presented in this thesis tackles a novel pattern mining problem, and can also be applied to other fields, such as information propagation. From the statistical analysis and the models presented, we show that contextual information, such as the referrer domain and the time of day, plays a major role in the evolution of sessions. To understand browsing, one should therefore take into account the context in which it takes place.
5

Bahri, Maroua. "Improving IoT data stream analytics using summarization techniques." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT017.

Abstract:
With the evolution of technology, the use of smart Internet-of-Things (IoT) devices, sensors, and social networks results in an overwhelming volume of IoT data streams, generated daily from several applications, that can be transformed into valuable information through machine learning tasks. In practice, multiple critical issues arise when extracting useful knowledge from these evolving data streams, mainly that the streams need to be efficiently handled and processed. In this context, this thesis aims to improve the performance (in terms of memory and time) of existing data mining algorithms on streams. We focus on the classification task in the streaming framework. The task is challenging on streams, principally due to the high, and increasing, data dimensionality, in addition to the potentially infinite amount of data; these two aspects make the classification task harder. The first part of the thesis surveys the current state of the art of classification and dimensionality reduction techniques as applied to the stream setting, providing an updated view of the most recent work in this vibrant area. In the second part, we detail our contributions to the field of classification in streams, developing novel approaches based on summarization techniques that aim to reduce the computational resources of existing classifiers with no, or minor, loss of classification accuracy. To address high-dimensional data streams and make classifiers efficient, we incorporate an internal preprocessing step that reduces the dimensionality of input data incrementally before feeding them to the learning stage. We present several approaches applied to different classification tasks: Naive Bayes enhanced with sketches and the hashing trick, k-NN using compressed sensing and UMAP, and their integration into ensemble methods.
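The hashing trick mentioned in the last sentence is the simplest of these summaries to illustrate. A minimal sketch, assuming a fixed bucket count of 64 (an arbitrary choice, not from the thesis): each arriving instance is projected into a bounded vector before it reaches the learner, so memory stays constant as dimensionality grows.

```python
# Hashing trick: map arbitrarily many feature names into n_buckets slots.
import hashlib

def hash_features(features, n_buckets=64):
    vec = [0.0] * n_buckets
    for name, value in features.items():
        h = int(hashlib.md5(name.encode()).hexdigest(), 16)  # stable hash
        vec[h % n_buckets] += value
    return vec

# Each stream element is reduced on arrival, keeping memory bounded.
print(hash_features({"word:summarization": 1.0, "word:stream": 2.0}))
```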
6

Santos, Joelson Antonio dos. "Algoritmos rápidos para estimativas de densidade hierárquicas e suas aplicações em mineração de dados." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-25102018-174244/.

Abstract:
Clustering is an unsupervised learning task able to describe a set of objects in clusters, so that objects of the same cluster are more similar to each other than to objects of other clusters. Clustering techniques are divided into two main categories: partitional and hierarchical. Partitional techniques divide a dataset into a number of distinct clusters, while hierarchical techniques provide a nested sequence of partitional clusterings separated by different levels of granularity. Furthermore, hierarchical density-based clustering is a particular clustering paradigm that detects clusters with different concentrations or densities of objects. One of the most popular techniques of this paradigm is known as HDBSCAN*. In addition to providing hierarchies, HDBSCAN* is a framework that provides outlier detection, semi-supervised clustering, and visualization of results. However, most hierarchical techniques, including HDBSCAN*, have high computational complexity, a fact that makes them prohibitive for the analysis of large datasets. In this work, two approximate variations of HDBSCAN* are proposed that are computationally more scalable for clustering large amounts of data. The first variation follows the concept of parallel and distributed computing known as MapReduce; the second follows the context of parallel computing using shared memory. Both variations are based on a concept of efficient data division, known as Recursive Sampling, which allows parallel processing of the data. In a manner similar to HDBSCAN*, the proposed variations are also capable of providing a complete unsupervised analysis of patterns in data, including outlier detection. Experiments were carried out to evaluate the quality of the proposed variations: the MapReduce-based variation was compared to a parallel and exact version of HDBSCAN* known as Random Blocks, while the shared-memory version was compared to the state of the art (HDBSCAN*). In terms of clustering quality and outlier detection, both the MapReduce-based and the shared-memory-based variations showed results close to the exact parallel version of HDBSCAN* and to the state of the art, respectively. In terms of computational time, the proposed variations showed greater scalability and speed in processing large amounts of data than the versions they were compared against.
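The Recursive Sampling idea (divide the data so that the pieces can be clustered in parallel) can be caricatured as follows. This toy recursion only sketches the data-division concept; it is not the thesis's algorithm or HDBSCAN* itself, and the anchor counts and leaf size are arbitrary assumptions.

```python
# Toy data division: sample anchors, assign points to the nearest anchor,
# recurse into each bucket until it is small enough for one parallel worker.
import random

def recursive_partition(points, max_leaf=100, n_anchors=4):
    if len(points) <= max_leaf:
        return [points]                       # small enough: one work unit
    anchors = random.sample(points, n_anchors)
    buckets = [[] for _ in range(n_anchors)]
    for p in points:
        i = min(range(n_anchors),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, anchors[j])))
        buckets[i].append(p)
    if max(len(b) for b in buckets) == len(points):
        return [points]                       # degenerate split: stop here
    parts = []
    for bucket in buckets:
        if bucket:
            parts.extend(recursive_partition(bucket, max_leaf, n_anchors))
    return parts

data = [(random.random(), random.random()) for _ in range(1000)]
print([len(part) for part in recursive_partition(data)])
```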
7

Krübel, Monique. "Analyse und Vergleich von Extraktionsalgorithmen für die Automatische Textzusammenfassung." Master's thesis, Universitätsbibliothek Chemnitz, 2006. http://nbn-resolving.de/urn:nbn:de:swb:ch1-200601180.

Abstract:
Although research in the field of automatic text summarization has been conducted since the 1950s, the usefulness and necessity of such systems were only properly recognized with the boom of the Internet. The World Wide Web provides a daily growing amount of information on nearly every topic. To minimize the time required to find, and to find again, the right information, search engines began their triumphant rise. But to get an overview of a selected topic, a simple listing of all candidate pages is no longer adequate. Additional mechanisms such as extraction algorithms for the automatic generation of summaries can help optimize search engines or web directories, reducing the time spent on research and making search simpler and more convenient. In this diploma thesis, an analysis of extraction algorithms usable for automatic text summarization was carried out. Algorithms rated as promising on the basis of this analysis were implemented in Java, and the summaries produced with these algorithms were compared in an evaluation.
8

Maaloul, Mohamed. "Approche hybride pour le résumé automatique de textes : Application à la langue arabe." Thesis, Aix-Marseille, 2012. http://www.theses.fr/2012AIXM4778.

Abstract:
This thesis falls within the framework of Natural Language Processing. The problem of automatic summarization of Arabic documents addressed in this thesis crystallizes around two points. The first point relates to the criteria used to determine the essential content to extract. The second point focuses on the means to express the extracted essential content in the form of a text targeting the user's potential needs. In order to show the feasibility of our approach, we developed the "L.A.E" system, based on a hybrid approach which combines symbolic analysis with numerical processing. The evaluation results are encouraging and prove the performance of the proposed hybrid approach. These results showed, first, the applicability of the approach in the context of single documents without restriction as to their topic (education, sport, science, politics, reporting, etc.), their content, or their volume. They also showed the importance of machine learning in the phase of ranking and selecting the sentences that form the final extract.
9

Pokorný, Lubomír. "Metody sumarizace textových dokumentů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236443.

Abstract:
This thesis deals with single-document summarization of text data. Part of it is devoted to data preparation, mainly normalization: some stemming algorithms are listed, along with a description of lemmatization. The main part is devoted to Luhn's method for summarization and its extension using the WordNet dictionary. The Oswald summarization method is described and applied as well. The designed and implemented application performs automatic generation of abstracts using these methods. A set of experiments was developed that verified the correct functionality of the application and of the extended Luhn summarization method.
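Luhn's method, the core of this thesis, scores sentences by the density of significant (frequent, non-stop) words. A minimal sketch, with a toy stopword list, naive whitespace tokenization, and thresholds chosen arbitrarily:

```python
# Luhn-style extraction: a sentence scores hits^2 / length, where hits are
# occurrences of "significant" words (frequent and not stopwords).
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}   # toy list

def luhn_summary(sentences, top_n=2, min_freq=2):
    words = [w.lower() for s in sentences for w in s.split()]
    freq = Counter(w for w in words if w not in STOPWORDS)
    significant = {w for w, c in freq.items() if c >= min_freq}

    def score(sentence):
        tokens = [w.lower() for w in sentence.split()]
        hits = sum(1 for w in tokens if w in significant)
        return (hits * hits) / len(tokens) if tokens else 0.0

    ranked = sorted(sentences, key=score, reverse=True)[:top_n]
    return [s for s in sentences if s in ranked]   # keep original order

docs = ["Stream mining handles data streams.",
        "Data streams grow without bound.",
        "The weather was pleasant."]
print(luhn_summary(docs))
```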
10

Hassanlou, Nasrin. "Probabilistic graph summarization." Thesis, 2012. http://hdl.handle.net/1828/4403.

Abstract:
We study group-summarization of probabilistic graphs that naturally arise in social networks, semistructured data, and other applications. Our proposed framework groups the nodes and edges of the graph based on a user selected set of node attributes. We present methods to compute useful graph aggregates without the need to create all of the possible graph-instances of the original probabilistic graph. Also, we present an algorithm for graph summarization based on pure relational (SQL) technology. We analyze our algorithm and practically evaluate its efficiency using an extended Epinions dataset as well as synthetic datasets. The experimental results show the scalability of our algorithm and its efficiency in producing highly compressed summary graphs in reasonable time.
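The "pure relational (SQL) technology" angle can be illustrated with a grouping query: nodes are grouped by an attribute, and by linearity of expectation the expected number of edges between two groups is the sum of the edge probabilities, with no need to enumerate graph instances. The schema below is an assumption for illustration, not the thesis's; sqlite3 stands in for the relational engine.

```python
# Attribute-based summarization of a probabilistic graph in plain SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes(id INTEGER PRIMARY KEY, attr TEXT);
CREATE TABLE edges(src INTEGER, dst INTEGER, prob REAL);
INSERT INTO nodes VALUES (1,'A'),(2,'A'),(3,'B');
INSERT INTO edges VALUES (1,3,0.9),(2,3,0.4),(1,2,0.5);
""")
# One summary edge per attribute pair; SUM(prob) is the expected edge count,
# since each probabilistic edge contributes its probability in expectation.
rows = conn.execute("""
SELECT n1.attr, n2.attr, SUM(e.prob) AS expected_edges
FROM edges e
JOIN nodes n1 ON e.src = n1.id
JOIN nodes n2 ON e.dst = n2.id
GROUP BY n1.attr, n2.attr
""").fetchall()
print(rows)
```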
11

SINGH, SWATI. "ANALYSIS FOR TEXT SUMMARIZATION ALGORITHMS FOR DIFFERENT DATASETS." Thesis, 2017. http://dspace.dtu.ac.in:8080/jspui/handle/repository/15975.

Abstract:
With the exponential increase in data available on the internet for a single domain, it is difficult to grasp the gist of a document without reading it in full. Automatic text summarization reduces the content of a document by presenting the important key points of the data. Extracting the major points from a document is easier and requires less machinery than forming new sentences from the available data. Research in this domain started nearly 50 years ago, beginning with the identification of key features to rank important sentences in a text document. The main aim of text summarization is to obtain human-quality summaries, which is still a distant dream. Abstractive summarization techniques use a dynamic WordNet corpus to produce coherent and succinct summaries. Automatic text summarization has applications in various domains, including medical research, the legal domain, doctoral research, and documents available on the internet. To serve the need for text summarization, numerous algorithms based on different content selection methods and features have been developed over the last half century. Research that started with single-document summarization has shifted to multi-document summarization in recent decades, saving more time by compressing documents of the same domain at once. Here, an analysis of single-document and multi-document summarization algorithms is presented on datasets from different domains.
12

Chen, Chun-Chang, and 陳俊章. "An Ensemble Approach for Multi-document Summarization using Genetic Algorithms." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/26z56t.

Abstract:
Master's thesis, Yuan Ze University, Department of Computer Science and Engineering, academic year 106.
Multi-document summarization is an important research task in text summarization. It helps people reduce the time spent reading articles on the same topics with similar contents. In this study, we propose an ensemble model based on genetic algorithms. Using this model, we construct two ensemble summarization models, one combining four network summarization models and the other combining four probabilistic topic network models. These two ensemble models use genetic algorithms to find the optimal weights. We use the datasets of DUC 2004 to DUC 2007 for performance evaluation. The experimental results show that the two ensemble models achieve better ROUGE-1, ROUGE-2, and ROUGE-SU4 performance than the standalone network models and the standalone probabilistic topic network models, respectively.
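The weight-search step can be sketched as a small genetic algorithm: normalized weight vectors are evolved by crossover and mutation against a fitness function. The fitness below is a placeholder for the ROUGE evaluation of the weighted ensemble that the thesis uses; the target vector and GA parameters are purely illustrative.

```python
# Toy GA for ensemble weights over four summarization models.
import random

def fitness(weights):
    # Placeholder: in the real system this would be the ROUGE score of the
    # ensemble summary produced with these weights.
    target = [0.4, 0.3, 0.2, 0.1]            # illustrative optimum only
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def normalize(w):
    s = sum(w)
    return [x / s for x in w]

def evolve(pop_size=30, generations=100, n_models=4):
    pop = [normalize([random.random() for _ in range(n_models)])
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]        # elitist selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_models)       # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(n_models)            # point mutation
            child[i] += random.uniform(-0.1, 0.1)
            children.append(normalize([max(x, 1e-9) for x in child]))
        pop = parents + children
    return max(pop, key=fitness)

print(evolve())   # weights close to the placeholder optimum
```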
13

Chen, Yong-Jhih, and 陳泳志. "A Text Summarization Model based on Genetic Algorithm." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/40844191853732361706.

14

Tatarko, William. "Sumarizace větvených cyklů." Master's thesis, 2021. http://www.nusl.cz/ntk/nusl-448637.

Abstract:
In this thesis we present a novel algorithm for summarization of loops with multiple branches operating over integers. The algorithm is based on analysis of a so-called state diagram, which reflects the feasibility of various branch interleavings. Summarization can be used to replace loops with equivalent non-iterative statements. This supports examination of reachability and can be used for software verification. Summarization may also be used, for instance, for (compiler) optimizations.
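A one-branch example shows what summarization means here: a terminating integer loop is replaced by an equivalent closed-form assignment. This toy case (assuming c > 0) is far simpler than the multi-branch loops the thesis handles, but the equivalence check is the same idea.

```python
# Summarize "while x < n: x += c" (c > 0) into one non-iterative statement.
import math

def summarized(x, n, c):
    if x >= n:
        return x                        # loop body never runs
    iters = math.ceil((n - x) / c)      # iteration count in closed form
    return x + c * iters

def literal(x, n, c):                   # the original loop, for comparison
    while x < n:
        x += c
    return x

# The summary agrees with the loop on every tested initial state.
assert all(summarized(x, 10, 3) == literal(x, 10, 3) for x in range(-5, 15))
```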
15

CHEN, YU-HSUAN, and 陳玉軒. "Constructing the Multi-Topics and Multi-Document Summarization Algorithm." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/qn8r6c.

Abstract:
Master's thesis, Fu Jen Catholic University, In-service Master's Program in Applied Statistics, Department of Statistics and Information Science, academic year 106.
Document summarization is a very important topic in text mining. In the past, most research focused on single- or multi-document summarization for specific events or topics; no summarization research has addressed multiple documents across multiple topics at the same time. In this study, the proposed algorithm clusters news by analyzing the similarities among 736 news stories from 9 different topics, reaching a clustering accuracy of 0.66. The proposed algorithm not only achieves better ROUGE-N scores than TextRank and LexRank summarization but also finds a suitable threshold to process the summary results efficiently.
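One plausible reading of the clustering step is a similarity threshold over bag-of-words vectors, sketched below. The threshold value and the compare-to-cluster-seed strategy are assumptions for illustration, not the thesis's algorithm.

```python
# Threshold grouping: two stories share a cluster when their bag-of-words
# cosine similarity to the cluster seed exceeds a threshold.
from collections import Counter
from math import sqrt

def cosine(a, b):
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(stories, threshold=0.3):
    clusters = []
    for story in stories:
        for c in clusters:
            if cosine(story, c[0]) >= threshold:   # compare to cluster seed
                c.append(story)
                break
        else:
            clusters.append([story])
    return clusters

print(cluster(["taiwan election results", "election results in taiwan",
               "new phone released today"]))
```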
16

AGGARWAL, TUSHAR. "IMAGE DESCRIPTIVE SUMMARIZATION BY DEEP LEARNING AND ADVANCED LSTM MODEL ARCHITECTURE." Thesis, 2019. http://dspace.dtu.ac.in:8080/jspui/handle/repository/17084.

Abstract:
Automatic image description is becoming a trending point of interest in the current era of research. The research community continues to propose an enhanced list of intuitive algorithms for its problems, yet there is still much room for improvement, which makes the field attractive to many researchers and industries. Among the various image description algorithms, some outperform others in terms of basic requirements such as robustness, invisibility, and processing cost. In this thesis, we study a new hybrid image description scheme which, combined with our proposed model, provides efficient results. The following points describe the thesis in a nutshell and are discussed in detail later:
- First, we train our image in 9x9 kernels using a CNN model. The idea behind the 1024 kernels of the host image is to divide each pixel of the host image according to the lowest human-value-system characteristics, i.e., the lowest entropy and lowest edge-entropy values.
- The host image is further divided into 8x8 pixel blocks, giving 64 rows and 64 columns of such blocks, i.e., 64x64x8 blocks in total for a host image of size 512x512.
- We then attach the pre-trained labels to the pixelated model obtained from the CNN model. These are used in embedding the labels with the LSTM algorithm: the LSTM model assigns a label to each pixelated kernel, which performs the embedding into the host image.
- This embedding and extraction is done with the Long Short-Term Memory algorithm, explained in later chapters.
- Using this embedded image, we compute the Q function of the host image with an embedded RNN network, find its best value, and obtain the best computed label for the host image.
In a nutshell, this project combines four major algorithms to generate the best results possible. The adopted criteria contributed significantly to establishing a scheme with high robustness against attacks without affecting the visual quality of the image. Briefly, the project consists of the following subsections: 1. Convolutional Neural Networks (CNN) 2. Long Short-Term Memory (LSTM) 3. Recurrent Neural Networks (RNN)
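The CNN-encoder / LSTM-decoder combination named in the abstract looks roughly like the following PyTorch skeleton. All sizes and layer choices are illustrative assumptions; the thesis's architecture, with its kernel and block scheme, differs.

```python
# Rough CNN encoder + LSTM decoder skeleton for image description.
import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.cnn = nn.Sequential(                 # toy CNN encoder
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, embed_dim))
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.cnn(images).unsqueeze(1)     # (B, 1, E) image "token"
        words = self.embed(captions)              # (B, T, E)
        seq = torch.cat([feats, words], dim=1)    # image feeds the decoder
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                   # next-word logits

model = CaptionModel(vocab_size=1000)
logits = model(torch.randn(2, 3, 64, 64), torch.randint(0, 1000, (2, 5)))
print(logits.shape)                               # torch.Size([2, 6, 1000])
```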
17

Chester, Sean. "Representative Subsets for Preference Queries." Thesis, 2013. http://hdl.handle.net/1828/4833.

Abstract:
We focus on the two overlapping areas of preference queries and dataset summarization. A (linear) preference query specifies the relative importance of the attributes in a dataset and asks for the tuples that best match those preferences. Dataset summarization is the task of representing an entire dataset by a small, representative subset. Within these areas, we focus on three important sub-problems, significantly advancing the state-of-the-art in each.

We begin with an investigation into a new formulation of preference queries, identifying a neglected and important subclass that we call threshold projection queries. While the literature typically constrains the attribute preferences (which are real-valued weights) such that their sum is one, we show that this introduces bias when querying by threshold rather than cardinality. Using projection, rather than inner product as in that literature, removes the bias. We then give algorithms for building and querying indices for this class of query, based, in the general case, on geometric duality and halfspace range searching, and, in an important special case, on stereographic projection.

In the second part of the dissertation, we investigate the monochromatic reverse top-k (mRTOP) query in two dimensions. A mRTOP query asks for, given a tuple and a dataset, the linear preference queries on the dataset that will include the given tuple. Towards this goal, we consider the novel scenario of building an index to support mRTOP queries, using geometric duality and plane sweep. We show theoretically and empirically that the index is quick to build, small on disk, and very efficient at answering mRTOP queries. As a corollary to these efforts, we defined the top-k rank contour, which encodes the k-ranked tuple for every possible linear preference query. This is tremendously useful in answering mRTOP queries, but also, we posit, of significant independent interest for its relation to myriad related linear preference query problems. Intuitively, the top-k rank contour is the minimum possible representation of knowledge needed to identify the k-ranked tuple for any query, without a priori knowledge of that query.

We also introduce k-regret minimizing sets, a very succinct approximation of a numeric dataset. The purpose of the approximation is to represent the entire dataset by just a small subset that nonetheless will contain a tuple within or near to the top-k for any linear preference query. We show that the problem of finding k-regret minimizing sets (and, indeed, the problem in the literature that it generalizes) is NP-Hard. Still, for the special case of two dimensions, we provide a fast, exact algorithm based on the top-k rank contour. For arbitrary dimension, we introduce a novel greedy algorithm based on linear programming and randomization that does excellently in our empirical investigation.
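The regret notion behind these sets can be stated compactly for the top-1 case: for a weight vector w, the regret ratio of a subset is 1 - best(subset, w) / best(dataset, w), and a regret-minimizing set of size r minimizes the worst ratio over all w. The exhaustive sketch below only illustrates this definition over a finite sample of weight vectors (the problem is NP-Hard in general, as the abstract notes; the thesis's k-regret variant compares against the top-k rather than the top-1).

```python
# Exhaustive (exponential) illustration of regret-minimizing sets.
from itertools import combinations

def best(tuples, w):
    return max(sum(wi * xi for wi, xi in zip(w, t)) for t in tuples)

def max_regret(subset, data, weight_vectors):
    return max(1 - best(subset, w) / best(data, w) for w in weight_vectors)

def min_regret_subset(data, r, weight_vectors):
    return min((set(c) for c in combinations(data, r)),
               key=lambda s: max_regret(s, data, weight_vectors))

data = [(1.0, 0.1), (0.1, 1.0), (0.6, 0.6)]   # toy 2-D dataset
ws = [(1, 0), (0, 1), (0.5, 0.5)]             # sampled preference vectors
print(min_regret_subset(data, 2, ws))         # the two "extreme" tuples win
```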
18

TANG, ZHUO-YUE, and 唐卓悅. "Combining Main Path Analysis and Citation Analysis to Construct an Automatic Summarization Algorithm." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/uckqxq.

Abstract:
Master's thesis, Fu Jen Catholic University, Master's Program in Applied Statistics, Department of Statistics and Information Science, academic year 107.
The process of learning, like the advancement of academic papers, is continuous and evolving. In the past, most research focused on analysis of citation frequency and content. This study combines main path analysis, citation analysis, and text mining techniques to produce an automatic summary of a single topic. Researchers can obtain the most complete abstract citation by automatically summarizing the required quotations and their main content according to the relevance of the quotations; the method even allows researchers to obtain what they need by searching a more general keyword. Applied to the "h-index" field, the total similarity of the abstract citation reaches 0.532, while the precision, recall, and F-measure for h-index reach 0.5, 0.727, and 0.593; the total similarity of the abstract citation for "main path analysis" reaches 0.4853.
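Main path analysis is commonly driven by search path counts (SPC): an edge from u to v in the citation DAG is weighted by the number of source-to-sink paths passing through it, and a main path follows the heaviest edges. A toy sketch on a four-paper citation DAG, assuming SPC weighting and a greedy forward walk (the thesis's exact traversal may differ):

```python
# SPC main path on a tiny citation DAG: A cites nothing is reversed here,
# edges point from earlier papers to the later ones that build on them.
from functools import lru_cache

edges = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}  # toy citations

@lru_cache(maxsize=None)
def paths_to_sink(node):
    return 1 if not edges[node] else sum(paths_to_sink(n) for n in edges[node])

@lru_cache(maxsize=None)
def paths_from_source(node):
    parents = [u for u, vs in edges.items() if node in vs]
    return 1 if not parents else sum(paths_from_source(u) for u in parents)

def spc(u, v):
    # Paths through edge (u, v) = paths reaching u * paths leaving v.
    return paths_from_source(u) * paths_to_sink(v)

def main_path(start):
    path = [start]
    while edges[path[-1]]:
        path.append(max(edges[path[-1]], key=lambda v: spc(path[-1], v)))
    return path

print(main_path("A"))   # e.g. ['A', 'B', 'D']
```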