A selection of scientific literature on the topic "SUMMARIZATION ALGORITHMS"

Format your source in APA, MLA, Chicago, Harvard, and other citation styles


Consult the lists of relevant articles, books, dissertations, theses, and other scholarly sources on the topic "SUMMARIZATION ALGORITHMS".

Next to each work in the bibliography there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of a publication as a .pdf file and read its abstract online, provided the corresponding data are available in the metadata.

Journal articles on the topic "SUMMARIZATION ALGORITHMS"

1

Chang, Hsien-Tsung, Shu-Wei Liu, and Nilamadhab Mishra. "A tracking and summarization system for online Chinese news topics." Aslib Journal of Information Management 67, no. 6 (November 16, 2015): 687–99. http://dx.doi.org/10.1108/ajim-10-2014-0147.

Abstract:
Purpose – The purpose of this paper is to design and implement new tracking and summarization algorithms for Chinese news content. Based on the proposed methods and algorithms, the authors extract the important sentences that are contained in topic stories and list those sentences according to timestamp order to ensure ease of understanding and to visualize multiple news stories on a single screen. Design/methodology/approach – This paper encompasses an investigational approach that implements a new Dynamic Centroid Summarization algorithm in addition to a Term Frequency (TF)-Density algorithm to empirically compute three target parameters, i.e., recall, precision, and F-measure. Findings – The proposed TF-Density algorithm is implemented and compared with the well-known algorithms Term Frequency-Inverse Word Frequency (TF-IWF) and Term Frequency-Inverse Document Frequency (TF-IDF). Three test data sets are configured from Chinese news web sites for use during the investigation, and two important findings are obtained that help the authors provide more precision and efficiency when recognizing the important words in the text. First, the authors evaluate three topic tracking algorithms, i.e., TF-Density, TF-IDF, and TF-IWF, with the said target parameters and find that the recall, precision, and F-measure of the proposed TF-Density algorithm is better than those of the TF-IWF and TF-IDF algorithms. In the context of the second finding, the authors implement a blind test approach to obtain the results of topic summarizations and find that the proposed Dynamic Centroid Summarization process can more accurately select topic sentences than the LexRank process. Research limitations/implications – The results show that the tracking and summarization algorithms for news topics can provide more precise and convenient results for users tracking the news. The analysis and implications are limited to Chinese news content from Chinese news web sites such as Apple Library, UDN, and well-known portals like Yahoo and Google. Originality/value – The research provides an empirical analysis of Chinese news content through the proposed TF-Density and Dynamic Centroid Summarization algorithms. It focusses on improving the means of summarizing a set of news stories to appear for browsing on a single screen and carries implications for innovative word measurements in practice.
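
As a side note for readers, the TF-IDF weighting that the proposed TF-Density algorithm is benchmarked against in this abstract can be sketched in a few lines of Python; the toy documents and the plain tf * log(N/df) formula below are illustrative assumptions, not the authors' implementation.

    import math
    from collections import Counter

    def tf_idf(docs):
        """TF-IDF weights for each tokenized document in `docs`."""
        n = len(docs)
        df = Counter(term for doc in docs for term in set(doc))   # document frequency
        weights = []
        for doc in docs:
            tf = Counter(doc)
            weights.append({term: (count / len(doc)) * math.log(n / df[term])
                            for term, count in tf.items()})
        return weights

    # Toy tokenized "news stories" (illustrative only).
    docs = ["typhoon hits coast evacuation ordered".split(),
            "typhoon weakens coast cleanup begins".split(),
            "election results announced tonight".split()]
    for w in tf_idf(docs):
        print(sorted(w.items(), key=lambda kv: -kv[1])[:3])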
2

Yadav, Divakar, Naman Lalit, Riya Kaushik, Yogendra Singh, Mohit, Dinesh, Arun Kr Yadav, Kishor V. Bhadane, Adarsh Kumar, and Baseem Khan. "Qualitative Analysis of Text Summarization Techniques and Its Applications in Health Domain." Computational Intelligence and Neuroscience 2022 (February 9, 2022): 1–14. http://dx.doi.org/10.1155/2022/3411881.

Abstract:
For the better utilization of the enormous amount of data available on the Internet and in various archives, summarization is a valuable method. Manual summarization by experts is an almost impossible and time-consuming activity, and people could not access, read, or use such a big pile of information for their needs. Therefore, summary generation is essential and beneficial in the current scenario. This paper presents an efficient qualitative analysis of the different algorithms used for text summarization. We implemented five different algorithms, namely term frequency-inverse document frequency (TF-IDF), LexRank, TextRank, BertSum, and PEGASUS, for summary generation. These algorithms were chosen because, according to the state-of-the-art literature, they generate good summaries. The performance of these algorithms is compared on two different datasets, Reddit-TIFU and MultiNews, and their results are measured using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measure to decide the best algorithm among them. After performing this qualitative analysis, we observe that for both datasets PEGASUS had the best average F-score among the abstractive text summarization algorithms, while TextRank had the best average F-score among the extractive ones.
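
For readers who want to see what a graph-based extractive baseline of the kind compared in this paper looks like, here is a minimal TextRank-style sketch (TF-IDF sentence vectors, cosine-similarity graph, PageRank). It assumes scikit-learn and networkx are available and is a generic illustration, not the code used in the study.

    import numpy as np
    import networkx as nx
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def textrank_summary(sentences, k=2):
        """Return the k highest-ranked sentences, in their original order."""
        tfidf = TfidfVectorizer().fit_transform(sentences)
        sim = cosine_similarity(tfidf)       # sentence-to-sentence similarity
        np.fill_diagonal(sim, 0.0)           # drop self-loops
        scores = nx.pagerank(nx.from_numpy_array(sim), weight="weight")
        top = sorted(scores, key=scores.get, reverse=True)[:k]
        return [sentences[i] for i in sorted(top)]

    sentences = [                            # illustrative input only
        "The storm forced thousands of residents to evacuate the coast.",
        "Officials opened shelters in schools across the region.",
        "Meanwhile, the local football team won its opening match.",
        "Forecasters expect the storm to weaken by Friday.",
    ]
    print(textrank_summary(sentences, k=2))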
3

Mall, Shalu, Avinash Maurya, Ashutosh Pandey, and Davain Khajuria. "Centroid Based Clustering Approach for Extractive Text Summarization." International Journal for Research in Applied Science and Engineering Technology 11, no. 6 (June 30, 2023): 3404–9. http://dx.doi.org/10.22214/ijraset.2023.53542.

Abstract:
Extractive text summarization is the process of identifying the most important information in a large text and presenting it in a condensed form. One popular approach to this problem is the use of centroid-based clustering algorithms, which group together similar sentences based on their content and then select representative sentences from each cluster to form a summary. In this research, we present a centroid-based clustering algorithm for email summarization that combines the use of word embeddings with a clustering algorithm. We compare our algorithm to existing summarization techniques. Our results show that our approach comes close to existing methods in terms of summary quality, while also being computationally efficient. Overall, our work demonstrates the potential of centroid-based clustering algorithms for extractive text summarization and suggests avenues for further research in this area.
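
A minimal sketch of the general centroid-plus-embeddings recipe described in this abstract (not the authors' system): embed each sentence, cluster the embeddings with k-means, and keep the sentence closest to each centroid. TF-IDF vectors stand in for word embeddings here only to keep the example self-contained.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    def centroid_summary(sentences, n_clusters=2):
        """Pick one representative sentence per cluster."""
        X = TfidfVectorizer().fit_transform(sentences).toarray()
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
        picks = []
        for c in range(n_clusters):
            idx = np.where(km.labels_ == c)[0]
            dists = np.linalg.norm(X[idx] - km.cluster_centers_[c], axis=1)
            picks.append(idx[np.argmin(dists)])     # closest sentence to the centroid
        return [sentences[i] for i in sorted(picks)]

    emails = [
        "Please review the attached budget before Friday's meeting.",
        "The budget review meeting has been moved to 3 pm on Friday.",
        "Lunch will be provided at the quarterly team outing.",
        "Sign up for the team outing by the end of this week.",
    ]
    print(centroid_summary(emails, n_clusters=2))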
4

BOKAEI, MOHAMMAD HADI, HOSSEIN SAMETI, and YANG LIU. "Extractive summarization of multi-party meetings through discourse segmentation." Natural Language Engineering 22, no. 1 (March 4, 2015): 41–72. http://dx.doi.org/10.1017/s1351324914000199.

Abstract:
In this article we tackle the problem of multi-party conversation summarization. We investigate the role of discourse segmentation of a conversation in meeting summarization. First, an unsupervised function segmentation algorithm is proposed to segment the transcript into functionally coherent parts, such as Monologue_i (which indicates a segment where speaker i is the dominant speaker, e.g., lecturing all the other participants) or Discussion_{x1, x2, ..., xn} (which indicates a segment where speakers x1 to xn are involved in a discussion). Then the salience score for a sentence is computed by leveraging the score of the segment containing the sentence. The performance of our proposed segmentation and summarization algorithms is evaluated using the AMI meeting corpus. We show better summarization performance than other state-of-the-art algorithms according to different metrics.
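
To make the segment labels in this abstract concrete, here is a toy simplification (my own, not the paper's unsupervised algorithm) that tags a window of dialogue turns as Monologue_i when a single speaker produces most of the turns, and as a Discussion over the participating speakers otherwise.

    from collections import Counter

    def label_segment(turns, dominance=0.7):
        """turns: list of (speaker, utterance) pairs within one window."""
        counts = Counter(speaker for speaker, _ in turns)
        speaker, n = counts.most_common(1)[0]
        if n / len(turns) >= dominance:
            return f"Monologue_{speaker}"
        return "Discussion_" + ",".join(sorted(counts))

    window = [("A", "So the main issue is the budget."),
              ("A", "We overspent on prototyping."),
              ("A", "I suggest we cut the casing cost."),
              ("B", "Agreed.")]
    print(label_segment(window))   # -> Monologue_A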
5

Dutta, Soumi, Vibhash Chandra, Kanav Mehra, Asit Kumar Das, Tanmoy Chakraborty, and Saptarshi Ghosh. "Ensemble Algorithms for Microblog Summarization." IEEE Intelligent Systems 33, no. 3 (May 2018): 4–14. http://dx.doi.org/10.1109/mis.2018.033001411.

6

Han, Kai, Shuang Cui, Tianshuai Zhu, Enpei Zhang, Benwei Wu, Zhizhuo Yin, Tong Xu, Shaojie Tang, and He Huang. "Approximation Algorithms for Submodular Data Summarization with a Knapsack Constraint." ACM SIGMETRICS Performance Evaluation Review 49, no. 1 (June 22, 2022): 65–66. http://dx.doi.org/10.1145/3543516.3453922.

Abstract:
Data summarization, a fundamental methodology aimed at selecting a representative subset of data elements from a large pool of ground data, has found numerous applications in big data processing, such as social network analysis [5, 7], crowdsourcing [6], clustering [4], network design [13], and document/corpus summarization [14]. Moreover, it is well acknowledged that the "representativeness" of a dataset in data summarization applications can often be modeled by submodularity - a mathematical concept abstracting the "diminishing returns" property in the real world. Therefore, a lot of studies have cast data summarization as a submodular function maximization problem (e.g., [2]).
7

Han, Kai, Shuang Cui, Tianshuai Zhu, Enpei Zhang, Benwei Wu, Zhizhuo Yin, Tong Xu, Shaojie Tang, and He Huang. "Approximation Algorithms for Submodular Data Summarization with a Knapsack Constraint." Proceedings of the ACM on Measurement and Analysis of Computing Systems 5, no. 1 (February 18, 2021): 1–31. http://dx.doi.org/10.1145/3447383.

Abstract:
Data summarization, i.e., selecting representative subsets of manageable size out of massive data, is often modeled as a submodular optimization problem. Although there exist extensive algorithms for submodular optimization, many of them incur large computational overheads and hence are not suitable for mining big data. In this work, we consider the fundamental problem of (non-monotone) submodular function maximization with a knapsack constraint, and propose simple yet effective and efficient algorithms for it. Specifically, we propose a deterministic algorithm with approximation ratio 6 and a randomized algorithm with approximation ratio 4, and show that both of them can be accelerated to achieve nearly linear running time at the cost of weakening the approximation ratio by an additive factor of ε. We then consider a more restrictive setting without full access to the whole dataset, and propose streaming algorithms with approximation ratios of 8+ε and 6+ε that make one pass and two passes over the data stream, respectively. As a by-product, we also propose a two-pass streaming algorithm with an approximation ratio of 2+ε when the considered submodular function is monotone. To the best of our knowledge, our algorithms achieve the best performance bounds compared to the state-of-the-art approximation algorithms with efficient implementation for the same problem. Finally, we evaluate our algorithms in two concrete submodular data summarization applications for revenue maximization in social networks and image summarization, and the empirical results show that our algorithms outperform the existing ones in terms of both effectiveness and efficiency.
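
The classical cost-benefit greedy for submodular maximization under a knapsack constraint gives a feel for the problem studied in these two papers; the sketch below is a generic monotone-coverage baseline with a toy instance, not the deterministic, randomized, or streaming algorithms proposed by the authors.

    def greedy_knapsack(ground, f, cost, budget):
        """Repeatedly add the affordable element with the best marginal gain per unit cost."""
        S, spent = set(), 0.0
        while True:
            best, best_ratio = None, 0.0
            for e in ground - S:
                if spent + cost[e] > budget:
                    continue
                ratio = (f(S | {e}) - f(S)) / cost[e]
                if ratio > best_ratio:
                    best, best_ratio = e, ratio
            if best is None:
                return S
            S.add(best)
            spent += cost[best]

    # Toy coverage instance: each "document" covers a set of topics.
    covers = {"d1": {1, 2, 3}, "d2": {3, 4}, "d3": {5}, "d4": {1, 4, 5, 6}}
    cost = {"d1": 2.0, "d2": 1.0, "d3": 1.0, "d4": 3.0}

    def coverage(S):
        covered = set()
        for e in S:
            covered |= covers[e]
        return len(covered)

    print(greedy_knapsack(set(covers), coverage, cost, budget=4.0))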
8

Popescu, Claudiu, Lacrimioara Grama, and Corneliu Rusu. "A Highly Scalable Method for Extractive Text Summarization Using Convex Optimization." Symmetry 13, no. 10 (September 30, 2021): 1824. http://dx.doi.org/10.3390/sym13101824.

Abstract:
The paper describes a convex optimization formulation of the extractive text summarization problem and a simple and scalable algorithm to solve it. The optimization program is constructed as a convex relaxation of an intuitive but computationally hard integer programming problem. The objective function is highly symmetric, being invariant under unitary transformations of the text representations. Another key idea is to replace the constraint on the number of sentences in the summary with a convex surrogate. For solving the program we have designed a specific projected gradient descent algorithm and analyzed its performance in terms of execution time and quality of the approximation. Using the datasets DUC 2005 and Cornell Newsroom Summarization Dataset, we have shown empirically that the algorithm can provide competitive results for single document summarization and multi-document query-based summarization. On the Cornell Newsroom Summarization Dataset, it ranked second among the unsupervised methods tested. For the more challenging task of multi-document query-based summarization, the method was tested on the DUC 2005 Dataset. Our algorithm surpassed the other reported methods with respect to the ROUGE-SU4 metric, and it was at less than 0.01 from the top performing algorithms with respect to ROUGE-1 and ROUGE-2 metrics.
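
As a loose illustration of the "convex relaxation plus projected gradient" recipe mentioned in this abstract (the paper's actual objective, constraints, and projection differ), the numpy sketch below relaxes binary sentence-selection variables to the box [0, 1], penalizes deviation from a target summary length, runs projected gradient ascent, and rounds by keeping the top-scoring sentences.

    import numpy as np

    def relaxed_selection(relevance, k, lam=5.0, lr=0.05, steps=200):
        """Maximize r.x - lam*(sum(x) - k)^2 over x in [0,1]^n, then round to k picks."""
        r = np.asarray(relevance, dtype=float)
        x = np.full(r.shape, 0.5)
        for _ in range(steps):
            grad = r - 2.0 * lam * (x.sum() - k)   # gradient of the relaxed objective
            x = np.clip(x + lr * grad, 0.0, 1.0)   # projection onto the box [0,1]^n
        return np.argsort(-x)[:k]                  # rounding: indices of the top-k scores

    relevance = [0.9, 0.1, 0.4, 0.8, 0.3]          # toy per-sentence relevance scores
    print(relaxed_selection(relevance, k=2))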
9

Boussaid, L., A. Mtibaa, M. Abid, and M. Paindavoin. "Real-Time Algorithms for Video Summarization." Journal of Applied Sciences 6, no. 8 (April 1, 2006): 1679–85. http://dx.doi.org/10.3923/jas.2006.1679.1685.

10

Ke, Xiangyu, Arijit Khan, and Francesco Bonchi. "Multi-relation Graph Summarization." ACM Transactions on Knowledge Discovery from Data 16, no. 5 (October 31, 2022): 1–30. http://dx.doi.org/10.1145/3494561.

Abstract:
Graph summarization is beneficial in a wide range of applications, such as visualization, interactive and exploratory analysis, approximate query processing, reducing the on-disk storage footprint, and graph processing in modern hardware. However, the bulk of the literature on graph summarization surprisingly overlooks the possibility of having edges of different types. In this article, we study the novel problem of producing summaries of multi-relation networks, i.e., graphs where multiple edges of different types may exist between any pair of nodes. Multi-relation graphs are an expressive model of real-world activities, in which a relation can be a topic in social networks, an interaction type in genetic networks, or a snapshot in temporal graphs. The first approach that we consider for multi-relation graph summarization is a two-step method based on summarizing each relation in isolation and then aggregating the resulting summaries in some clever way to produce a final unique summary. In doing this, as a side contribution, we provide the first polynomial-time approximation algorithm based on k-Median clustering for the classic problem of lossless single-relation graph summarization. Then, we demonstrate the shortcomings of these two-step methods and propose holistic approaches, both approximate and heuristic algorithms, to compute a summary directly for multi-relation graphs. In particular, we prove that the approximation bound of k-Median clustering for the single-relation solution can be maintained in a multi-relation graph with a proper aggregation operation over the adjacency matrices corresponding to its multiple relations. Experimental results and case studies (on co-authorship networks and brain networks) validate the effectiveness and efficiency of the proposed algorithms.

Dissertations and theses on the topic "SUMMARIZATION ALGORITHMS"

1

Kolla, Maheedhar, and University of Lethbridge Faculty of Arts and Science. "Automatic text summarization using lexical chains : algorithms and experiments." Thesis, Lethbridge, Alta. : University of Lethbridge, Faculty of Arts and Science, 2004, 2004. http://hdl.handle.net/10133/226.

Abstract:
Summarization is a complex task that requires understanding of the document content to determine the importance of the text. Lexical cohesion is a method to identify connected portions of the text based on the relations between the words in the text. Lexical cohesive relations can be represented using lexical chains. Lexical chains are sequences of semantically related words spread over the entire text, and they are used in a variety of Natural Language Processing (NLP) and Information Retrieval (IR) applications. In this thesis, we propose a lexical chaining method that includes glossary relations in the chaining process. These relations enable us to identify topically related concepts, for instance dormitory and student, and thereby enhance the identification of cohesive ties in the text. We then present methods that use the lexical chains to generate summaries by extracting sentences from the document(s). Headlines are generated by filtering out the portions of the extracted sentences that do not contribute to the meaning of the sentence; such headlines can be used in real-world applications to skim through document collections in a digital library. Multi-document summarization is gaining demand with the explosive growth of online news sources. It requires identification of the several themes present in the collection to attain good compression and avoid redundancy. In this thesis, we propose methods to group portions of the texts of a document collection into meaningful clusters. Clustering enables us to extract the various themes of the document collection. Sentences from the clusters can then be extracted to generate a summary for the multi-document collection, and clusters can also be used to generate summaries with respect to a given query. We designed a system to compute lexical chains for a given text and use them to extract the salient portions of the document. Some specific tasks considered are headline generation, multi-document summarization, and query-based summarization. Our experimental evaluation shows that efficient summaries can be extracted for the above tasks.
viii, 80 leaves : ill. ; 29 cm.
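
A very small sketch of the lexical chaining idea, assuming NLTK and its WordNet data are installed (a generic illustration, not the chaining algorithm developed in the thesis): a word joins an existing chain if it shares a synset, or a direct hypernym/hyponym link, with some word already in that chain.

    from nltk.corpus import wordnet as wn   # requires nltk.download('wordnet')

    def related(w1, w2):
        """Shared synset or direct hypernym/hyponym link between two words."""
        s1, s2 = set(wn.synsets(w1)), set(wn.synsets(w2))
        if s1 & s2:
            return True
        neighbours = {h for s in s1 for h in s.hypernyms() + s.hyponyms()}
        return bool(neighbours & s2)

    def build_chains(words):
        """Greedy lexical chaining: attach each word to the first compatible chain."""
        chains = []
        for w in words:
            for chain in chains:
                if any(related(w, member) for member in chain):
                    chain.append(w)
                    break
            else:
                chains.append([w])
        return chains

    print(build_chains(["student", "pupil", "dormitory", "storm", "rain", "lecture"]))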
2

Hodulik, George M. "Graph Summarization: Algorithms, Trained Heuristics, and Practical Storage Application." Case Western Reserve University School of Graduate Studies / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=case1482143946391013.

3

Hamid, Fahmida. "Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction." Thesis, University of North Texas, 2016. https://digital.library.unt.edu/ark:/67531/metadc862796/.

Abstract:
Automatic text summarization and keyphrase extraction are two interesting areas of research which extend along natural language processing and information retrieval. They have recently become very popular because of their wide applicability. Devising generic techniques for these tasks is challenging due to several issues. Yet we have a good number of intelligent systems performing the tasks. As different systems are designed with different perspectives, evaluating their performances with a generic strategy is crucial. It has also become immensely important to evaluate the performances with minimal human effort. In our work, we focus on designing a relativized scale for evaluating different algorithms. This is our major contribution, which challenges the traditional approach of working with an absolute scale. We consider the impact of some of the environment variables (length of the document, references, and system-generated outputs) on the performance. Instead of defining some rigid lengths, we show how to adjust to their variations. We prove a mathematically sound baseline that should work for all kinds of documents. We emphasize automatically determining the syntactic well-formedness of the structures (sentences). We also propose defining an equivalence class for each unit (e.g., word) instead of the exact string matching strategy. We show an evaluation approach that considers the weighted relatedness of multiple references to adjust to the degree of disagreement between the gold standards. We publish the proposed approach as a free tool so that other systems can use it. We have also accumulated a dataset (scientific articles) with a reference summary and keyphrases for each document. Our approach is applicable not only for evaluating single-document tasks but also for evaluating multi-document tasks. We have tested our evaluation method on three intrinsic tasks (taken from the DUC 2004 conference), and in all three cases it correlates positively with ROUGE. Based on our experiments on the DUC 2004 question-answering task, it correlates with the human decision (extrinsic task) with an accuracy of 36.008%. In general, we can state that the proposed relativized scale performs as well as the popular technique (ROUGE), with flexibility for the length of the output. As part of the evaluation we have also devised a new graph-based algorithm focusing on sentiment analysis. The proposed model can extract units (e.g., words or sentences) from the original text belonging either to the positive sentiment pole or to the negative sentiment pole. It embeds both (positive and negative) types of sentiment flow into a single text graph. The text graph is composed of words or phrases as nodes and their relations as edges. By recursively applying two mutually exclusive relations, the model builds the final ranking of the nodes. Based on the final ranking, it extracts two segments from the article: one with highly positive sentiment and the other with highly negative sentiment. The output of this model was tested against the non-polar TextRank output to quantify how well the polar summaries cover the facts along with the sentiment.
4

Chiarandini, Luca. "Characterizing and modeling web sessions with applications." Doctoral thesis, Universitat Pompeu Fabra, 2014. http://hdl.handle.net/10803/283414.

Abstract:
This thesis focuses on the analysis and modeling of web sessions, groups of requests made by a single user for a single navigation purpose. Understanding how people browse through websites is important, helping us to improve interfaces and provide better content. After first conducting a statistical analysis of web sessions, we go on to present algorithms to summarize and model web sessions. Finally, we describe applications that use novel browsing methods, in particular parallel browsing. We observe that people tend to browse images in sequences and that those sequences can be considered as units of content in their own right. The session summarization algorithm presented in this thesis tackles a novel pattern mining problem, and this algorithm can also be applied to other fields, such as information propagation. From the statistical analysis and the models presented, we show that contextual information, such as the referrer domain and the time of day, plays a major role in the evolution of sessions. To understand browsing one should therefore take into account the context in which it takes place.
5

Bahri, Maroua. "Improving IoT data stream analytics using summarization techniques." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT017.

Abstract:
With the evolution of technology, the use of smart Internet-of-Things (IoT) devices, sensors, and social networks results in an overwhelming volume of IoT data streams, generated daily from several applications, that can be transformed into valuable information through machine learning tasks. In practice, multiple critical issues arise when extracting useful knowledge from these evolving data streams, mainly because the stream needs to be handled and processed efficiently. In this context, this thesis aims to improve the performance (in terms of memory and time) of existing data mining algorithms on streams. We focus on the classification task in the streaming framework. The task is challenging on streams, principally due to the high -- and increasing -- data dimensionality, in addition to the potentially infinite amount of data; these two aspects make the classification task harder. The first part of the thesis surveys the current state of the art of classification and dimensionality reduction techniques as applied to the stream setting, providing an updated view of the most recent works in this vibrant area. In the second part, we detail our contributions to the field of classification in streams by developing novel approaches based on summarization techniques that aim to reduce the computational resources of existing classifiers with no -- or minor -- loss of classification accuracy. To address high-dimensional data streams and make classifiers efficient, we incorporate an internal preprocessing step that reduces the dimensionality of input data incrementally before feeding them to the learning stage. We present several approaches applied to several classification tasks: Naive Bayes enhanced with sketches and the hashing trick, k-NN using compressed sensing and UMAP, and their integration into ensemble methods.
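
To illustrate the "hashing trick plus incremental Naive Bayes" combination mentioned in this abstract, here is a generic scikit-learn sketch over invented mini-batches (not the thesis's algorithms): HashingVectorizer maps each text to a fixed-width sparse vector without storing a vocabulary, and MultinomialNB.partial_fit updates the model one batch at a time, which is the usual pattern for streams.

    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Fixed-width, vocabulary-free features; non-negative counts for MultinomialNB.
    vec = HashingVectorizer(n_features=2**12, alternate_sign=False)
    clf = MultinomialNB()
    classes = [0, 1]                        # all labels must be declared for partial_fit

    stream = [                              # toy mini-batches arriving over time
        (["cheap pills online", "meeting moved to friday"], [1, 0]),
        (["win a free prize now", "quarterly report attached"], [1, 0]),
    ]
    for texts, labels in stream:
        X = vec.transform(texts)            # stateless: safe to call batch by batch
        clf.partial_fit(X, labels, classes=classes)

    print(clf.predict(vec.transform(["free pills prize"])))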
6

Santos, Joelson Antonio dos. "Algoritmos rápidos para estimativas de densidade hierárquicas e suas aplicações em mineração de dados." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-25102018-174244/.

Abstract:
Clustering is an unsupervised learning task able to describe a set of objects in clusters, so that objects of the same cluster are more similar to each other than to objects of other clusters. Clustering techniques are divided into two main categories: partitional and hierarchical. Partitional techniques divide a dataset into a number of distinct clusters, while hierarchical techniques provide a nested sequence of partitional clusterings separated by different levels of granularity. Furthermore, hierarchical density-based clustering is a particular clustering paradigm that detects clusters with different concentrations or densities of objects. One of the most popular techniques of this paradigm is known as HDBSCAN*. In addition to providing hierarchies, HDBSCAN* is a framework that provides outlier detection, semi-supervised clustering, and visualization of results. However, most hierarchical techniques, including HDBSCAN*, have a high computational complexity, which makes them prohibitive for the analysis of large datasets. In this work, two approximate variations of HDBSCAN* that are computationally more scalable for clustering large amounts of data are proposed. The first variation follows the concept of parallel and distributed computing known as MapReduce; the second follows the context of parallel computing using shared memory. Both variations are based on a concept of efficient data division, known as Recursive Sampling, which allows parallel processing of the data. Similarly to HDBSCAN*, the proposed variations are also capable of providing a complete unsupervised analysis of patterns in data, including outlier detection. Experiments have been carried out to evaluate the quality of the proposed variations; specifically, the MapReduce-based variation has been compared to a parallel and exact version of HDBSCAN* known as Random Blocks, while the shared-memory version has been compared to the state of the art (HDBSCAN*). In terms of clustering quality and outlier detection, the MapReduce-based and shared-memory-based variations showed results close to the exact parallel version of HDBSCAN* and to the state of the art, respectively. In terms of computational time, the proposed variations showed greater scalability and speed for processing large amounts of data than the compared versions.
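
For reference, the exact sequential HDBSCAN* baseline that the thesis's variants approximate can be run through the open-source hdbscan package, assuming it is installed; the MapReduce and shared-memory variants described above are not part of that package, so this only shows the baseline on synthetic data.

    import hdbscan
    from sklearn.datasets import make_blobs

    # Synthetic data with clusters of different densities.
    X, _ = make_blobs(n_samples=1500, centers=4,
                      cluster_std=[0.4, 1.0, 1.6, 0.6], random_state=42)

    clusterer = hdbscan.HDBSCAN(min_cluster_size=25)
    labels = clusterer.fit_predict(X)       # -1 marks points treated as noise/outliers

    print("clusters found:", labels.max() + 1)
    print("noise points:", int((labels == -1).sum()))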
7

Krübel, Monique. "Analyse und Vergleich von Extraktionsalgorithmen für die Automatische Textzusammenfassung." Master's thesis, Universitätsbibliothek Chemnitz, 2006. http://nbn-resolving.de/urn:nbn:de:swb:ch1-200601180.

Abstract:
Although research in the field of automatic text summarization has been conducted since the 1950s, the usefulness and necessity of such systems were only properly recognized with the boom of the Internet. The World Wide Web provides a daily growing amount of information on almost every topic. To minimize the time needed to find, and also to re-find, the right information, search engines began their triumphant rise. However, to get an overview of a selected topic, a simple listing of all candidate pages is no longer adequate. Additional mechanisms such as extraction algorithms for the automatic generation of summaries can help here to optimize search engines or web catalogs, thereby reducing the time spent on research and making the search simpler and more comfortable. In this diploma thesis, an analysis of extraction algorithms that can be used for automatic text summarization was carried out. Algorithms rated as promising on the basis of this analysis were implemented in Java, and the summaries produced with these algorithms were compared in an evaluation.
8

Maaloul, Mohamed. "Approche hybride pour le résumé automatique de textes : Application à la langue arabe." Thesis, Aix-Marseille, 2012. http://www.theses.fr/2012AIXM4778.

Abstract:
This thesis falls within the framework of Natural Language Processing. The problem of automatic summarization of Arabic documents addressed in this thesis revolves around two points. The first point relates to the criteria used to determine the essential content to extract. The second point focuses on the means to express the extracted essential content in the form of a text targeting the user's potential needs. In order to show the feasibility of our approach, we developed the "L.A.E" system, based on a hybrid approach which combines a symbolic analysis with a numerical processing. The evaluation results are encouraging and prove the performance of the proposed hybrid approach. These results showed, initially, the applicability of the approach in the context of single documents without restriction as to their topics (Education, Sport, Science, Politics, Reportage, etc.), their content, and their volume. They also showed the importance of machine learning in the phase of classifying and selecting the sentences that form the final extract.
9

Pokorný, Lubomír. "Metody sumarizace textových dokumentů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236443.

Abstract:
This thesis deals with single-document summarization of text data. Part of it is devoted to data preparation, mainly normalization: several stemming algorithms are listed and lemmatization is described as well. The main part is devoted to Luhn's method for summarization and its extension using the WordNet dictionary. The Oswald summarization method is described and applied as well. The designed and implemented application performs automatic generation of abstracts using these methods. A set of experiments was developed to verify the correct functionality of the application and of the extended Luhn summarization method.
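
Luhn's heuristic, which this thesis extends, can be sketched in a few lines (a generic Python illustration under assumed thresholds, not the thesis's Java implementation): treat frequent non-stopwords as "significant" and score each sentence by its densest cluster of significant words, i.e., the squared number of significant words divided by the span of the cluster.

    from collections import Counter

    STOPWORDS = {"the", "a", "of", "to", "and", "in", "is", "it", "for", "on", "by"}

    def luhn_scores(sentences, top_n=5, max_gap=4):
        tokenize = lambda s: [w.lower().strip(".,;:") for w in s.split()]
        freq = Counter(w for s in sentences for w in tokenize(s) if w not in STOPWORDS)
        significant = {w for w, _ in freq.most_common(top_n)}
        scores = []
        for s in sentences:
            pos = [i for i, w in enumerate(tokenize(s)) if w in significant]
            best, cluster = 0.0, pos[:1]
            for p in pos[1:] + [None]:      # None acts as an end-of-sentence sentinel
                if p is not None and p - cluster[-1] - 1 <= max_gap:
                    cluster.append(p)
                elif cluster:
                    span = cluster[-1] - cluster[0] + 1
                    best = max(best, len(cluster) ** 2 / span)
                    cluster = [p] if p is not None else []
            scores.append(best)
        return scores

    sentences = [
        "Automatic summarization selects the most important sentences of a text.",
        "The method of Luhn weights sentences by clusters of significant words.",
        "Stemming and lemmatization normalize words before the frequency analysis.",
    ]
    scores = luhn_scores(sentences)
    print(sorted(range(len(sentences)), key=lambda i: -scores[i]))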
10

Hassanlou, Nasrin. "Probabilistic graph summarization." Thesis, 2012. http://hdl.handle.net/1828/4403.

Abstract:
We study group-summarization of probabilistic graphs that naturally arise in social networks, semistructured data, and other applications. Our proposed framework groups the nodes and edges of the graph based on a user selected set of node attributes. We present methods to compute useful graph aggregates without the need to create all of the possible graph-instances of the original probabilistic graph. Also, we present an algorithm for graph summarization based on pure relational (SQL) technology. We analyze our algorithm and practically evaluate its efficiency using an extended Epinions dataset as well as synthetic datasets. The experimental results show the scalability of our algorithm and its efficiency in producing highly compressed summary graphs in reasonable time.

Book chapters on the topic "SUMMARIZATION ALGORITHMS"

1

Tian, Yuanyuan, and Jignesh M. Patel. "Interactive Graph Summarization." In Link Mining: Models, Algorithms, and Applications, 389–409. New York, NY: Springer New York, 2010. http://dx.doi.org/10.1007/978-1-4419-6515-8_15.

2

Javed, Hira, M. M. Sufyan Beg, and Nadeem Akhtar. "Multimodal Summarization: A Concise Review." In Algorithms for Intelligent Systems, 613–23. Singapore: Springer Singapore, 2022. http://dx.doi.org/10.1007/978-981-16-6893-7_54.

3

Komorowski, Artur, Lucjan Janowski, and Mikołaj Leszczuk. "Evaluation of Multimedia Content Summarization Algorithms." In Cryptology and Network Security, 424–33. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-98678-4_43.

4

Zhao, Yu, Songping Huang, Dongsheng Zhou, Zhaoyun Ding, Fei Wang, and Aixin Nian. "CNsum: Automatic Summarization for Chinese News Text." In Wireless Algorithms, Systems, and Applications, 539–47. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-19214-2_45.

5

Sharma, Arjun Datt, and Shaleen Deep. "Too Long-Didn’t Read: A Practical Web Based Approach towards Text Summarization." In Applied Algorithms, 198–208. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-04126-1_17.

6

Gokul Amuthan, S., and S. Chitrakala. "CESumm: Semantic Graph-Based Approach for Extractive Text Summarization." In Algorithms for Intelligent Systems, 89–100. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-3246-4_8.

7

Chen, Chen, Cindy Xide Lin, Matt Fredrikson, Mihai Christodorescu, Xifeng Yan, and Jiawei Han. "Mining Large Information Networks by Graph Summarization." In Link Mining: Models, Algorithms, and Applications, 475–501. New York, NY: Springer New York, 2010. http://dx.doi.org/10.1007/978-1-4419-6515-8_18.

8

Tsitovich, Aliaksei, Natasha Sharygina, Christoph M. Wintersteiger, and Daniel Kroening. "Loop Summarization and Termination Analysis." In Tools and Algorithms for the Construction and Analysis of Systems, 81–95. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-19835-9_9.

9

Rehman, Tohida, Suchandan Das, Debarshi Kumar Sanyal, and Samiran Chattopadhyay. "An Analysis of Abstractive Text Summarization Using Pre-trained Models." In Algorithms for Intelligent Systems, 253–64. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-1657-1_21.

10

Nadaf, Shatajbegum, and Vidyagouri B. Hemadri. "Extractive Summarization of Text Using Weighted Average of Feature Scores." In Algorithms for Intelligent Systems, 223–31. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-33-4893-6_20.


Conference papers on the topic "SUMMARIZATION ALGORITHMS"

1

Aldeghlawi, Maher, Mohammed Q. Alkhatib, and Miguel Velez-Reyes. "Data summarization for hyperspectral image analysis." In Algorithms, Technologies, and Applications for Multispectral and Hyperspectral Imaging XXVII, edited by David W. Messinger and Miguel Velez-Reyes. SPIE, 2021. http://dx.doi.org/10.1117/12.2590762.

2

Tatar, Doina, Andreea Diana Mihis, and Gabriela Serban Czibula. "Lexical Chains Segmentation in Summarization." In 2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. IEEE, 2008. http://dx.doi.org/10.1109/synasc.2008.11.

3

Thakkar, K. S., R. V. Dharaskar, and M. B. Chandak. "Graph-Based Algorithms for Text Summarization." In Third International Conference on Emerging Trends in Engineering and Technology (ICETET 2010). IEEE, 2010. http://dx.doi.org/10.1109/icetet.2010.104.

4

Boonchaisuk, Prayote, and Kanda Runapongsa Saikaew. "Efficient algorithms for Thai tweet summarization." In 2016 International Computer Science and Engineering Conference (ICSEC). IEEE, 2016. http://dx.doi.org/10.1109/icsec.2016.7859926.

5

Liu, Jie, Fuzhen Chen, Xianguo Ma, Zuoyan Gong, Jianliang Zhang, Zhengjian Liu, Yaozu Wang, and YunFei Ma. "Summarization of Sinter Quality Prediction Algorithms." In 2022 37th Youth Academic Annual Conference of Chinese Association of Automation (YAC). IEEE, 2022. http://dx.doi.org/10.1109/yac57282.2022.10023825.

6

Dutulescu, Andreea Nicoleta, Mihai Dascalu, and Stefan Ruseti. "Unsupervised Extractive Summarization with BERT." In 2022 24th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC). IEEE, 2022. http://dx.doi.org/10.1109/synasc57785.2022.00032.

7

Liu, Na, Xiao-Jun Tang, Ying Lu, Ming-Xia Li, Hai-Wen Wang, and Peng Xiao. "Topic-Sensitive Multi-document Summarization Algorithm." In 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP). IEEE, 2014. http://dx.doi.org/10.1109/paap.2014.22.

8

Li, Cong, Shuangxiong Wei, Yuxuan Liu, Siyi Luo, Di Yang, and Zengkai Wang. "Attention based fully convolutional network for video summarization." In International Conference on Algorithms, Microchips, and Network Applications, edited by Fengjie Cen and Ning Sun. SPIE, 2022. http://dx.doi.org/10.1117/12.2636379.

9

Olariu, Andrei. "Clustering to Improve Microblog Stream Summarization." In 2012 14th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC). IEEE, 2012. http://dx.doi.org/10.1109/synasc.2012.10.

10

Dey, Tamal K., Facundo Mémoli, and Yusu Wang. "Multiscale Mapper: Topological Summarization via Codomain Covers." In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms. Philadelphia, PA: Society for Industrial and Applied Mathematics, 2015. http://dx.doi.org/10.1137/1.9781611974331.ch71.

