Dissertations / Theses on the topic 'Summarization'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Summarization.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Bosma, Wauter Eduard. "Discourse oriented summarization." Enschede : Centre for Telematics and Information Technology (CTIT), 2008. http://doc.utwente.nl/58836.
Moon, Brandon B. "Interactive football summarization." Diss., 2010. http://contentdm.lib.byu.edu/ETD/image/etd3337.pdf.
Moon, Brandon B. "Interactive Football Summarization." BYU ScholarsArchive, 2009. https://scholarsarchive.byu.edu/etd/1999.
Sizov, Gleb. "Extraction-Based Automatic Summarization: Theoretical and Empirical Investigation of Summarization Techniques." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2010. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-10861.
A summary is a shortened version of a text that contains the main points of the original content. Automatic summarization is the task of generating such a summary by computer. For example, given a collection of news articles from the last week, an automatic summarizer can create a concise overview of the important events. This summary can replace the original content or help identify the events a person is particularly interested in. Automatic summarization can therefore save a great deal of time for people who deal with large amounts of textual information. The most straightforward way to generate a summary is to select several sentences from the original text and organize them so that they form a coherent text. This approach is called extraction-based summarization and is the topic of this thesis. Extraction-based summarization is a complex task consisting of several challenging subtasks. The essential part of the extraction-based approach is identifying sentences that contain important information. This can be done using graph-based representations and centrality measures that exploit similarities between sentences to identify the most central ones. This thesis provides a comprehensive overview of methods used in extraction-based automatic summarization. In addition, several general natural language processing issues, such as feature selection and text representation models, are discussed with regard to automatic summarization. Part of the thesis is dedicated to graph-based representations and centrality measures used in extraction-based summarization. Theoretical analysis is reinforced with experiments using the summarization framework implemented for this thesis. The task for the experiments is query-focused multi-document extraction-based summarization, that is, summarization of several documents according to a user query.
The experiments investigate several approaches to this task as well as the use of different representation models and similarity and centrality measures. The obtained results indicate that the use of graph centrality measures significantly improves the quality of generated summaries. Among the variety of centrality measures, the degree-based ones perform better than path-based measures. The best performance is achieved when centralities are combined with redundancy removal techniques that prevent the inclusion of similar sentences in a summary. Experiments with representation models reveal that a simple local term count representation performs better than the distributed representation based on latent semantic analysis, which indicates that further investigation of distributed representations with regard to automatic summarization is necessary. The implemented system performs quite well compared with the systems that participated in the DUC 2007 summarization competition. Nevertheless, manual inspection of the generated summaries reveals some flaws of the implemented summarization mechanism that could be addressed by introducing advanced algorithms for sentence simplification and sentence ordering.
Chellal, Abdelhamid. "Event summarization on social media stream : retrospective and prospective tweet summarization." Thesis, Toulouse 3, 2018. http://www.theses.fr/2018TOU30118/document.
User-generated content on social media such as Twitter often provides the latest news before traditional media, which allows both a retrospective summary of events and timely updates whenever a new development occurs. However, social media, while a valuable source of information, can also be overwhelming given the volume and velocity of published information. To shield users from irrelevant and redundant posts, retrospective summarization and prospective notification (real-time summarization) were introduced as two complementary tasks of information seeking on document streams. The former aims to select a list of relevant and non-redundant tweets that capture "what happened". In the latter, systems monitor the live post stream and push relevant and novel notifications as soon as possible. Our work falls within these frameworks and focuses on developing tweet summarization approaches for the two aforementioned scenarios. It aims at providing summaries that capture the key aspects of the event of interest, helping users to efficiently acquire information and follow the development of long ongoing events on social media. Nevertheless, the tweet summarization task faces many challenges that stem from, on the one hand, the high volume, velocity, and variety of the published information and, on the other hand, the quality of tweets, which can vary significantly. In prospective notification, the core task is relevance and novelty detection in real time. For timeliness, a system may choose to push new updates in real time or may trade timeliness for higher notification quality. Our contributions address these challenges. First, we introduce the Word Similarity Extended Boolean Model (WSEBM), a relevance model that does not rely on stream statistics and takes advantage of a word embedding model; we use word similarity instead of traditional weighting techniques.
By doing this, we overcome the shortness and word mismatch issues in tweets. The intuition behind our proposal is that the context-aware similarity measure in word2vec can recognize different words with the same semantic meaning and hence offsets the word mismatch issue when calculating the similarity between a tweet and a topic. Second, we propose to compute the novelty score of an incoming tweet against all words of tweets already pushed to the user instead of using pairwise comparison. The proposed novelty detection method scales better and reduces execution time, which fits real-time tweet filtering. Third, we propose an adaptive learning-to-filter approach that leverages social signals as well as query-dependent features. To overcome the issue of relevance threshold setting, we use a binary classifier that predicts the relevance of the incoming tweet, and we show the gain that can be achieved by taking advantage of ongoing relevance feedback. Finally, we adopt a real-time push strategy and show that the proposed approach achieves promising performance in terms of quality (relevance and novelty) with a low latency cost, whereas state-of-the-art approaches tend to trade latency for higher quality. This thesis also explores a novel approach to generating a retrospective summary that follows a different paradigm from the majority of state-of-the-art methods. We consider summary generation as an optimization problem that takes into account topical and temporal diversity. Tweets are filtered and incrementally clustered into two cluster types: topical clusters based on content similarity and temporal clusters based on publication time. Summary generation is formulated as an integer linear program in which the unknown variables are binary, the objective function is maximized, and constraints ensure that at most one post per cluster is selected, subject to the defined summary length limit.
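The vocabulary-based novelty idea above, scoring an incoming tweet against the accumulated vocabulary of already-pushed notifications rather than comparing it pairwise with each one, can be sketched as follows. The 0.5 threshold and the whitespace tokenizer are illustrative assumptions, not the thesis configuration.

```python
# Vocabulary-based novelty detection: one set lookup per word, independent
# of how many notifications have already been pushed.
def novelty(tweet: str, pushed_vocab: set) -> float:
    """Fraction of the tweet's words unseen in already-pushed notifications."""
    words = tweet.lower().split()
    if not words:
        return 0.0
    return sum(1 for w in words if w not in pushed_vocab) / len(words)

def push_if_novel(tweet: str, pushed_vocab: set, threshold: float = 0.5) -> bool:
    """Push the tweet only if it is novel enough, then grow the vocabulary."""
    if novelty(tweet, pushed_vocab) >= threshold:
        pushed_vocab.update(tweet.lower().split())
        return True
    return False
```

Because the comparison is against a single set rather than every past tweet, cost stays constant as the stream grows, which is the scaling property the abstract highlights.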
Nahnsen, Thade. "Automation of summarization evaluation methods and their application to the summarization process." Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/5278.
Smith, Christian. "Automatic summarization and readability." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-68332.
Seidlhofer, Barbara. "Discourse analysis for summarization." Thesis, University College London (University of London), 1991. http://discovery.ucl.ac.uk/10018780/.
Ceylan, Hakan. "Investigating the Extractive Summarization of Literary Novels." Thesis, University of North Texas, 2011. https://digital.library.unt.edu/ark:/67531/metadc103298/.
Demirtas, Kezban. "Automatic Video Categorization And Summarization." Master's thesis, METU, 2009. http://etd.lib.metu.edu.tr/upload/3/12611113/index.pdf.
Kazantseva, Anna. "Automatic summarization of short fiction." Thesis, University of Ottawa (Canada), 2007. http://hdl.handle.net/10393/27861.
Wen, Chung-Lin. "Event-centric Twitter photo summarization." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/91417.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 71-74).
We develop a novel algorithm based on spectral geometry that summarizes a photo collection into a small subset that represents the collection well. While the definition of a good summarization may not be unique, we focus on two metrics in this thesis: representativeness and diversity. By representativeness we mean that a sampled photo should be similar to other photos in the data set; the intuition is that by regarding each photo as a "vote" towards the scene it depicts, we want to include the photos that receive many "votes". Diversity is also desirable because repeating the same information is an inefficient use of the few slots we have for the summary. We achieve these seemingly contradictory properties by applying diversified sampling on the denser part of the feature space. The proposed method uses diffusion distance to measure the distance between any pair of photos in the dataset. By emphasizing the connectivity of the local neighborhood, we achieve better accuracy than previous methods that used a global distance. The Heat Kernel Signature (HKS) is then used to separate the denser part of the data from the sparser part. By intersecting the denser parts generated by different features, we are able to remove most of the outliers, i.e., photos that have few similar photos in the dataset. Farthest Point Sampling (FPS) is then applied to give a diversified sample, which produces our final summarization. The method can be applied to any image collection that has a specific topic but also a fair proportion of outliers. One scenario that especially motivated us to develop this technique is the Twitter photos of a specific event. Microblogging services have become a major way for people to share new information. However, the huge amount of data, the lack of structure, and the highly noisy nature prevent users from effectively mining useful information from it.
There are methods based on textual data, but the absence of visual information makes them less valuable. To the best of our knowledge, this study is the first to address visual data in Twitter event summarization. Our method's output can produce a kind of "crowd-sourced news", useful for journalists as well as the general public. We illustrate our results by summarizing recent Twitter events and comparing them with summaries generated from metadata such as retweet numbers. Our results are of at least the same quality although produced by a fully automatic mechanism. In some cases, because metadata can be biased by factors such as the number of followers, our results are even better in comparison. We also note that, in our initial pilot study, the high-quality photos we found have little overlap with highly retweeted photos. This suggests that the signal we found is orthogonal to the retweet signal, and the two signals can potentially be combined to achieve even better results.
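The diversification step named in this abstract, Farthest Point Sampling, is easy to sketch in isolation. In this toy version, plain Euclidean distance on 2-D points stands in for the diffusion distance used in the thesis:

```python
# Farthest Point Sampling (FPS): greedily pick k points, each time taking
# the point farthest from everything chosen so far.
import math

def farthest_point_sampling(points, k, start=0):
    """Return indices of k diverse points, seeded at index `start`."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    chosen = [start]
    # Distance from every point to its nearest chosen sample so far.
    mindist = [dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: mindist[i])
        chosen.append(nxt)
        mindist = [min(mindist[i], dist(points[i], points[nxt]))
                   for i in range(len(points))]
    return chosen
```

Two near-duplicate points never both get sampled: once one is chosen, the other's distance to the chosen set collapses, so the next pick jumps to a far-away point, which is exactly the diversity property the abstract asks of the summary.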
Branavan, Satchuthananthavale Rasiah Kuhan. "High compression rate text summarization." Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/44368.
Includes bibliographical references (p. 95-97).
This thesis focuses on methods for condensing large documents into highly concise summaries, achieving compression rates on par with human writers. While the need for such summaries in the current age of information overload is increasing, the desired compression rate has thus far been beyond the reach of automatic summarization systems. The potency of our summarization methods is due to their in-depth modelling of document content in a probabilistic framework. We explore two types of document representation that capture orthogonal aspects of text content. The first represents the semantic properties mentioned in a document in a hierarchical Bayesian model. This method is used to summarize thousands of consumer reviews by identifying the product properties mentioned by multiple reviewers. The second representation captures discourse properties, modelling the connections between different segments of a document. This discriminatively trained model is employed to generate tables of contents for books and lecture transcripts. The summarization methods presented here have been incorporated into large-scale practical systems that help users effectively access information online.
Li, Wei. "Hierarchical Summarization of Video Data." University of Cincinnati / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1186941444.
Subramanian, Hema. "Summarization Of Real Valued Biclusters." University of Cincinnati / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1307442728.
Karlbom, Hannes. "Abstractive Summarization of Podcast Transcriptions." Thesis, Uppsala universitet, Artificiell intelligens, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-443377.
Linhares Pontes, Elvys. "Compressive Cross-Language Text Summarization." Thesis, Avignon, 2018. http://www.theses.fr/2018AVIG0232/document.
The popularization of social networks and digital documents has rapidly increased the information available on the Internet. However, this huge amount of data cannot be analyzed manually. Natural Language Processing (NLP) analyzes the interactions between computers and human languages in order to process and analyze natural language data. NLP techniques incorporate a variety of methods, including linguistics, semantics, and statistics, to extract entities and relationships and to understand a document. Among the many NLP applications, this thesis is concerned with cross-language text summarization, which produces a summary in a language different from the language of the source documents. We also analyzed other NLP tasks (word encoding representation, semantic similarity, sentence and multi-sentence compression) to generate more stable and informative cross-lingual summaries. Most NLP applications, including all types of text summarization, use some similarity measure to analyze and compare the meaning of words, chunks, sentences, and texts. One way to analyze this similarity is to generate a representation of the sentences that captures their meaning. The meaning of a sentence is defined by several elements, such as the context of words and expressions, the order of words, and preceding information. Simple metrics, such as cosine similarity and Euclidean distance, provide a measure of similarity between two sentences; however, they do not analyze the order of words or multi-word expressions. In response to these problems, we propose a neural network model that combines recurrent and convolutional neural networks to estimate the semantic similarity of a pair of sentences (or texts) based on the local and general contexts of words.
Our model predicted better similarity scores than baselines by better analyzing the local and general meanings of words and multi-word expressions. In order to remove redundancies and non-relevant information from similar sentences, we propose a multi-sentence compression method that compresses similar sentences by fusing them into correct, short compressions that contain the main information of those similar sentences. We model clusters of similar sentences as word graphs. Then, we apply an integer linear programming model that guides the compression of these clusters based on a list of keywords: we look for a path in the word graph that has good cohesion and contains as many keywords as possible. Our approach outperformed baselines by generating more informative and correct compressions for French, Portuguese, and Spanish. Finally, we combine these methods to build a cross-language text summarization system. Our system is an {English, French, Portuguese, Spanish}-to-{English, French} cross-language text summarization framework that analyzes the information in both languages to identify the most relevant sentences. Inspired by compressive text summarization methods in monolingual analysis, we adapt our multi-sentence compression method to this problem to keep only the main information. Our system proves to be a good alternative for compressing redundant information while preserving relevant information. It improves informativeness scores without losing grammatical quality for French-to-English cross-lingual summaries. Analyzing {English, French, Portuguese, Spanish}-to-{English, French} cross-lingual summaries, our system significantly outperforms extractive state-of-the-art baselines for all these languages. In addition, we analyze the cross-language summarization of transcript documents; our approach achieved better and more stable scores even for these documents, which contain grammatical errors and missing information.
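The word-graph fusion at the heart of the multi-sentence compression step above can be illustrated with a deliberately small sketch. The real method scores paths with an integer linear program over keywords and cohesion; this toy version merges sentences into a shared word graph and takes the shortest start-to-end path, which is already enough to show the fusion effect:

```python
# Toy multi-sentence compression: fuse similar sentences into a word graph
# and return the shortest path from the start to the end sentinel.
from collections import deque

def build_word_graph(sentences):
    """Directed graph of word adjacencies across all sentences."""
    edges = {}
    for s in sentences:
        words = ["<s>"] + s.lower().split() + ["</s>"]
        for a, b in zip(words, words[1:]):
            edges.setdefault(a, set()).add(b)
    return edges

def shortest_compression(sentences):
    """BFS over the word graph; sorted() makes tie-breaking deterministic."""
    edges = build_word_graph(sentences)
    queue, seen = deque([["<s>"]]), {"<s>"}
    while queue:
        path = queue.popleft()
        if path[-1] == "</s>":
            return " ".join(path[1:-1])
        for nxt in sorted(edges.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return ""
```

Given two overlapping sentences, the shortest path drops the material unique to one of them, producing a compressed fusion of the pair.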
Wu, Jiewen. "WHISK: Web Hosted Information into Summarized Knowledge." DigitalCommons@CalPoly, 2016. https://digitalcommons.calpoly.edu/theses/1633.
Ozsoy, Makbule Gulcin. "Text Summarization Using Latent Semantic Analysis." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12612988/index.pdf.
Oya, Tatsuro. "Automatic abstractive summarization of meeting conversations." Thesis, University of British Columbia, 2014. http://hdl.handle.net/2429/49946.
Full textScience, Faculty of
Computer Science, Department of
Graduate
Liu, Qing. "Summarization of very large spatial dataset." Awarded by: University of New South Wales, School of Computer Science and Engineering, 2006. http://handle.unsw.edu.au/1959.4/25489.
Mlynarski, Angela. "Automatic text summarization in digital libraries." Thesis, Lethbridge, Alta.: University of Lethbridge, Faculty of Arts and Science, 2006. http://hdl.handle.net/10133/270.
Full textxiii, 142 leaves ; 28 cm.
Sanchan, Nattapong. "Domain-focused summarization of polarized debates." Thesis, University of Sheffield, 2018. http://etheses.whiterose.ac.uk/20878/.
Singi Reddy, Dinesh Reddy. "Comparative text summarization of product reviews." Thesis, Kansas State University, 2010. http://hdl.handle.net/2097/7031.
Full textDepartment of Computing and Information Sciences
William H. Hsu
This thesis presents an approach to summarizing product reviews using comparative sentences and sentiment analysis. Specifically, we consider the problem of extracting and scoring features from natural language text for qualitative reviews in a particular domain. When shopping for a product, customers do not have time to learn about all products on the market; similarly, manufacturers have no comprehensive written sources from which to learn about customer opinions. The only available techniques involve gathering customer opinions, often in text form, from e-commerce and social networking web sites and analyzing them, which is a costly and time-consuming process. In this work I address these issues by applying sentiment analysis, an automated method of finding the opinion stated by an author about some entity in a text document. I first gather information about smartphones from many e-commerce web sites. I then present a method to differentiate comparative sentences from normal sentences, form feature sets for each domain, and assign each feature of a product a numerical score and a weight coefficient obtained by statistical machine learning, to be used as that feature's weight when ranking products by linear combinations of their weighted feature scores. I also explain what role comparative sentences play in summarizing the product. To find the polarity of each feature, a statistical algorithm is defined using a small-to-medium sized data set. I then present my experimental environment and results, and conclude with a review of the claims and hypotheses stated at the outset. The approach is evaluated using manually annotated training data as well as data from domain experts. I also demonstrate empirically how different summarization algorithms can be derived from the technique provided by an annotator.
Finally, I review diversified options for customers such as providing alternate products for each feature, top features of a product, and overall rankings for products.
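The final ranking step described above, a linear combination of per-feature sentiment scores with learned weight coefficients, might look like this sketch. The features, scores, and weights here are invented for illustration; in the thesis they come from the sentiment analysis and the statistical learning step.

```python
# Rank products by the weighted sum of their per-feature sentiment scores.
def rank_products(feature_scores, weights):
    """feature_scores: {product: {feature: score}}; weights: {feature: w}."""
    totals = {
        product: sum(weights.get(f, 0.0) * s for f, s in scores.items())
        for product, scores in feature_scores.items()
    }
    return sorted(totals, key=lambda p: -totals[p])

# Hypothetical learned weights and extracted feature polarities.
weights = {"battery": 0.5, "camera": 0.3, "screen": 0.2}
scores = {
    "phone_a": {"battery": 0.9, "camera": 0.4, "screen": 0.6},
    "phone_b": {"battery": 0.3, "camera": 0.9, "screen": 0.8},
}
```

With these numbers, phone_a's strong battery score outweighs phone_b's camera and screen advantages because the battery weight dominates.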
Tohalino, Jorge Andoni Valverde. "Extractive document summarization using complex networks." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-24102018-155954/.
Due to the large amount of textual information available on the Internet, automatic document summarization has gained significant importance. Document summarization focuses on developing techniques for finding relevant, concise content in large volumes of information without altering its original meaning. The goal of this Master's research is to use concepts from graph theory for extractive document summarization, covering both single-document summarization (SDS) and multi-document summarization (MDS). Documents are modeled as networks in which sentences are represented as nodes, and the most relevant sentences are extracted using ranking algorithms. Edges between nodes are established in different ways: the first approach computes edges from the number of nouns shared by two sentences (network nodes), while another creates an edge from the similarity between two sentences, computed over vector representations based on Tf-Idf weighting or word embeddings. In addition, for the multi-document summarization task we distinguish edges linking sentences of different documents (inter-layer) from those connecting sentences of the same document (intra-layer), using multilayer network models in which each layer represents one document of the set to be summarized. Besides measurements typically used in complex networks, such as node degree, clustering coefficient, and shortest paths, the network characterization is also guided by dynamical measurements of complex networks, including symmetry, accessibility, and absorption time. The generated summaries were evaluated on different corpora for Portuguese and English.
The ROUGE-1 metric was used to validate the generated summaries. The results suggest that simpler models, such as noun-based and Tf-Idf networks, performed better than models based on word embeddings. In addition, excellent results were obtained using the multilayer network representation of documents for MDS. Finally, we conclude that several measurements can be used together to improve the characterization of networks for the summarization task.
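The multilayer construction described in this abstract can be sketched as follows: sentences are nodes, each document is a layer, and each edge is tagged intra-layer or inter-layer. Shared lowercase words stand in for the shared-noun counts of the thesis, which would require a POS tagger.

```python
# Build a multilayer sentence network: nodes are (document, sentence) pairs,
# edge weight is the number of shared words, and each edge is tagged
# "intra" (same document/layer) or "inter" (different documents/layers).
def multilayer_edges(documents):
    """documents: list of lists of sentences. Returns (i, j, kind, weight)."""
    nodes = [(d, s, set(sent.lower().split()))
             for d, doc in enumerate(documents)
             for s, sent in enumerate(doc)]
    edges = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            weight = len(nodes[i][2] & nodes[j][2])
            if weight:
                kind = "intra" if nodes[i][0] == nodes[j][0] else "inter"
                edges.append((i, j, kind, weight))
    return edges
```

A ranking algorithm run on this graph can then weight inter-layer and intra-layer links differently, which is the distinction the abstract exploits for MDS.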
Aguiar, C. Z. "Concept Maps Mining for Text Summarization." Universidade Federal do Espírito Santo, 2017. http://repositorio.ufes.br/handle/10/9846.
Concept maps are graphical tools for representing and constructing knowledge. Concepts and relations form the basis of learning, and concept maps have therefore been widely used in different situations and for different purposes in education, one of them being the representation of written text. Even a grammatically complex text can be represented by a concept map containing only the concepts and relations that express what was stated in a more complicated form. However, manually building a concept map demands considerable time and effort to identify and structure knowledge, especially when the map should represent the concepts expressed in a text rather than those of the author's own cognitive structure. Several technological approaches have thus been proposed to ease the construction of concept maps from text. This dissertation proposes a new approach for the automatic construction of concept maps as summaries of scientific texts. The summarization aims to produce a concept map as a condensed representation of the text that preserves its diverse and most important features. Summarization can support text comprehension at a time when students struggle with the cognitive overload caused by the growing amount of textual information available, a growth that can also hinder knowledge construction. We therefore consider the hypothesis that summarizing a text as a concept map can highlight the features needed to assimilate the knowledge in the text while reducing its complexity and the time needed to process it. In this context, we conducted a literature review covering the years 1994 to 2016 on approaches aimed at the automatic construction of concept maps from texts.
From this review, we built a categorization to better identify and analyze the features and characteristics of these technological approaches, and we sought to identify their limitations and gather their best features to inform our own approach. We also present a Concept Map Mining process organized along four dimensions: Data Source Description, Domain Definition, Element Identification, and Map Visualization. To develop a computational architecture that automatically builds concept maps as summaries of academic texts, this research produced the public tool CMBuilder, an online tool for the automatic construction of concept maps from texts, as well as a Java API called ExtroutNLP, which contains libraries for information extraction and public services. To reach this goal, we directed our efforts to the areas of natural language processing and information retrieval. The core task is to extract (concept, relation, concept) propositions from text. Under this premise, the research introduces a pipeline comprising: grammar rules and depth-first search for extracting concepts and relations from text; preposition mapping, anaphora resolution, and named entity exploitation for labeling concepts; concept ranking based on element frequency analysis and map topology; and proposition summarization based on graph topology. The approach also proposes supervised clustering and classification techniques, combined with a thesaurus, for defining the text domain and building a conceptual domain vocabulary. Finally, an objective analysis validating the accuracy of the ExtroutNLP library achieves 0.65 precision on the corpus.
In addition, a subjective analysis validating the quality of the concept maps built by CMBuilder achieves 0.75/0.45 precision/recall for concepts and 0.57/0.23 precision/recall for relations in English, and 0.68/0.38 precision/recall for concepts and 0.41/0.19 precision/recall for relations in Portuguese. Finally, an experiment testing whether concept maps produced by CMBuilder help readers understand the subject of a text yields 60% correct answers for maps extracted from short texts with multiple-choice questions and 77% correct answers for maps extracted from long texts with open-ended questions.
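The core (concept, relation, concept) extraction task named above can be illustrated with a deliberately naive sketch. The real pipeline uses grammar rules and depth-first search over parse structures; this toy regex only handles short subject-verb-object sentences with optional determiners, so it is an assumption-laden stand-in, not the CMBuilder method.

```python
# Naive proposition extraction: match "det? noun verb det? noun" sentences
# and emit (concept, relation, concept) triples.
import re

TRIPLE = re.compile(r"^(?:the\s+|a\s+)?(\w+)\s+(\w+)\s+(?:the\s+|a\s+)?(\w+)$",
                    re.IGNORECASE)

def extract_triples(sentences):
    """Return (concept, relation, concept) triples from simple sentences."""
    triples = []
    for s in sentences:
        m = TRIPLE.match(s.strip().rstrip("."))
        if m:
            triples.append((m.group(1).lower(), m.group(2).lower(),
                            m.group(3).lower()))
    return triples
```

Triples like these would then be ranked by frequency and graph topology before being laid out as a concept map.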
O'Brien, Shayne. "Unsupervised summarization of public talk radio." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123648.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 111-118).
Talk radio exerts significant influence on the political and social dynamics of the United States, but labor-intensive data collection and curation processes have prevented previous works from analyzing its content at scale. Over the past year, the Laboratory for Social Machines and Cortico have created an ingest system to record and automatically transcribe audio from more than 150 public talk radio stations across the country. Using the outputs from this ingest, I introduce "hierarchical compression" for neural unsupervised summarization of spoken opinion in conversational dialogue. By relying on an unsupervised framework that obviates the need for labeled data, the summarization task becomes largely agnostic to human input beyond necessary decisions regarding model architecture, input data, and output length. Trained models are thus able to automatically identify and summarize opinion in a dynamic fashion, which is noted in relevant literature as one of the most significant obstacles to fully unlocking talk radio as a data source for linguistic, ethnographic, and political analysis. To evaluate model performance, I create a novel spoken opinion summarization dataset consisting of compressed versions of "representative," opinion-containing utterances extracted from a hand-curated and crowd-source-annotated dataset of 275 snippets. I use this evaluation dataset to show that my model quantitatively outperforms strong rule- and graph-based unsupervised baselines on ROUGE and METEOR while qualitatively demonstrating fluency and information retention according to human judges. Additional analyses of model outputs show that many improvements are still yet to be made to this model, thus laying the ground for its use in important future work such as characterizing the linguistic structure of spoken opinion "in the wild."
by Shayne O'Brien.
S.M.
S.M. Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences
Pham, Quang-Khai. "Time Sequence Summarization: Theory and Applications." PhD thesis, Université de Nantes, 2010. http://tel.archives-ouvertes.fr/tel-00538512.
Full text
Pham, Quang-Khai. "Time sequence summarization : theory and applications." PhD thesis, Nantes, 2010. http://www.theses.fr/2010NANT2102.
Full text
Niccolai, Lorenzo. "Distillation Knowledge applied on Pegasus for Summarization." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/22202/.
Full text
Casamayor, Gerard. "Semantically-oriented text planning for automatic summarization." Doctoral thesis, Universitat Pompeu Fabra, 2021. http://hdl.handle.net/10803/671530.
Full text
Automatic text summarization is a natural language processing task concerned with the automatic creation of summaries of one or more documents, either by extracting fragments of the input text or by generating a summary from scratch. Recent research on this task has been dominated by a new paradigm in which summarization is approached as a mapping from a sequence of words in the input document to a new sequence of words that summarizes it. Work following this paradigm applies deep supervised learning methods to learn sequence-to-sequence models from large corpora of documents paired with hand-written summaries. Despite impressive results in automatic quantitative evaluations, this approach to summarization also has drawbacks. A first problem is that trained models tend to operate as a black box that prevents obtaining insights or intermediate representations that could be applied to other tasks. This is a significant problem in real-world settings where summaries are not the only output expected from a natural language processing system. Another significant drawback is that deep learning methods are limited to languages and types of summary for which large corpora with human-written summaries exist. Although researchers are experimenting with transfer learning methods to overcome this problem, we are still far from knowing how effective these methods are and how to apply them to situations where summaries must be adapted to queries or preferences formulated by the user.
In cases where learning sequence-to-sequence models is impractical, it is worth returning to a more traditional formulation of automatic summarization in which the input documents are analyzed first, the summary is planned by selecting and organizing contents, and the final summary is generated by extraction or abstraction, using natural language generation methods in the latter case. Separating linguistic analysis, planning and generation makes it possible to apply a different strategy to each task. This thesis addresses the central step of summary planning. Drawing on existing research in word sense disambiguation, automatic text summarization and natural language generation, it presents an unsupervised strategy for the creation of summaries. Following the observation that ranking items (senses or text fragments) is a common method in disambiguation and summarization tasks, we propose as the core of our strategy a method that ranks the lexical senses and words of a text. The resulting ranking contributes to the creation of a graph-based semantic representation from which we select non-redundant contents and organize them for inclusion in the summary. The overall strategy is supported by lexicographic databases that provide knowledge across multiple languages and thematic areas, and by text similarity methods used to compare senses with each other and with the text. The methods presented in this thesis are tested on two separate tasks, disambiguation of word senses and named entities, and extractive summarization of English documents. The disambiguation evaluation shows that our strategy produces results that are useful for tasks beyond summarization, while the extractive summarization evaluation allows us to compare our approach to existing summarization systems.
Although our results do not represent a significant advance over the state of the art in disambiguation and automatic summarization, they suggest that the strategy has great potential.
Reeve, Lawrence H. Han Hyoil. "Semantic annotation and summarization of biomedical text /." Philadelphia, Pa. : Drexel University, 2007. http://hdl.handle.net/1860/1779.
Full text
Hassel, Martin. "Resource Lean and Portable Automatic Text Summarization." Doctoral thesis, Stockholm : Numerisk analys och datalogi Numerical Analysis and Computer Science, Kungliga Tekniska högskolan, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4414.
Full text
Ulrich, Jan. "Supervised machine learning for email thread summarization." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/2363.
Full text
Di Fabrizzio, Giuseppe. "Automatic summarization of opinions in service reviews." Thesis, University of Sheffield, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.632550.
Full text"Fractal summarization." 2003. http://library.cuhk.edu.hk/record=b6073565.
Full text
"August 2003."
Thesis (Ph.D.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references (p. 256-281).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Mode of access: World Wide Web.
Abstracts in English and Chinese.
Costa, Vítor Manuel da. "Update Summarization." Master's thesis, 2014. https://repositorio-aberto.up.pt/handle/10216/77587.
Full text
Costa, Vítor Manuel da. "Update Summarization." Dissertação, 2014. https://repositorio-aberto.up.pt/handle/10216/77587.
Full text
Hassanlou, Nasrin. "Probabilistic graph summarization." Thesis, 2012. http://hdl.handle.net/1828/4403.
Full text
Graduate
Ding, Wei-Ming, and 丁偉民. "Summarization Scoring System." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/41800171399131023093.
Full text
國立臺灣師範大學
資訊教育學系
93
The main purpose of this research is to develop a summarization scoring system for elementary school teachers. The system applies Latent Semantic Analysis (LSA), using Singular Value Decomposition (SVD) to build semantic spaces. We build several semantic spaces that differ in size and writing style, and score a student's summary by comparing its keywords with the teacher's in these spaces. In addition to scoring, we analyze other summarization-scoring indexes to find more appropriate approaches to summarizing Chinese. The participants are students of Xi-men elementary school. After the summarization experiments, we analyze, in the different semantic spaces, the correlation between the scoring indexes calculated by the system and the teacher's scores. The results of the research are: (1) comparing the teacher's and students' summaries in semantic spaces built by the SVD transformation yields good results; (2) scoring students' summaries by comparing their sentences with the teacher's sentences is a promising approach worth further research; (3) differences in the size and writing style of the semantic space affect the analysis of the scoring indexes.
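The LSA comparison the abstract describes can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the thesis's implementation: `build_term_matrix` and `lsa_similarity` are hypothetical helper names, the corpus is a toy one, and real systems would weight terms and segment Chinese text properly.

```python
import numpy as np

def build_term_matrix(docs, vocab):
    """Term-by-document raw count matrix."""
    index = {w: i for i, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.split():
            if w in index:
                m[index[w], j] += 1
    return m

def lsa_similarity(docs, text_a, text_b, k=2):
    """Build a k-dimensional latent semantic space from `docs` via SVD,
    then return the cosine similarity of two texts projected into it."""
    vocab = sorted({w for d in docs for w in d.split()})
    A = build_term_matrix(docs, vocab)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]  # top-k latent term directions
    va = Uk.T @ build_term_matrix([text_a], vocab)[:, 0]
    vb = Uk.T @ build_term_matrix([text_b], vocab)[:, 0]
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom else 0.0
```

A teacher's summary and a student's summary would each be projected this way, with the similarity serving as one scoring index.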
Chiu, Chung-Ren, and 邱中人. "Chinese News Summarization." Thesis, 2000. http://ndltd.ncl.edu.tw/handle/12254466664744727912.
Full text
Kumar, Trun. "Automatic Text Summarization." Thesis, 2014. http://ethesis.nitrkl.ac.in/5619/1/110CS0127.pdf.
Full textKumar, T. "Automatic text summarization." Thesis, 2014. http://ethesis.nitrkl.ac.in/5617/1/E-65.pdf.
Full text
Lee, Hsiang-Pin, and 李祥賓. "Text Summarization on News." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/54130654842944705212.
Full text
東吳大學
資訊科學系
89
The swift development of information technology and the Internet has resulted in a problem of information overload. Hence it is imperative to find ways to help users browse documents efficiently and effectively, and text summarization could be a remedy. Traditionally, text summarization is performed manually, but this costs considerable human effort and cannot satisfy real-time demands, so it is necessary to automate the process. This thesis presents three methods of text summarization on the Reuters news corpus. First, we use Information Retrieval techniques to collect the important vocabulary of the document (the Important Vocabulary Extract Policy). Second, we determine the significance of a sentence by its position in the document (the Optimal Position Policy). Last, we expand the vocabulary of the title (the Title Expand Policy). To express the concept of the document, we extract its important vocabulary and analyze its structure to find which positions the document subject occupies. Moreover, since the title is especially significant, we expand it with related vocabulary from WordNet and use the expanded word set to find appropriate sentences for the summary. In the experiments, we design separate evaluations for the three methods and assess the resulting summaries through text categorization. Experimental results indicate that all of the methods achieve acceptable performance. Finally, the thesis proposes a method that combines two policies, Optimal Position and Title Expand: against a baseline precision of 65.6%, the combined method reaches 71.9% precision, a 9.6% relative improvement.
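The Title Expand Policy described above can be illustrated with a small sketch. This is my own simplification: a toy dictionary stands in for WordNet, and `expand_title` / `rank_sentences` are hypothetical names, not the thesis's code.

```python
def expand_title(title_words, synonyms):
    """Title Expand Policy: grow the title word set with related terms.
    The thesis queries WordNet; `synonyms` is a toy stand-in dictionary."""
    expanded = set(title_words)
    for w in title_words:
        expanded.update(synonyms.get(w, []))
    return expanded

def rank_sentences(sentences, expanded_title):
    """Rank sentences by how many expanded-title words they contain;
    the top-ranked sentences become summary candidates."""
    def overlap(s):
        return len(set(s.lower().split()) & expanded_title)
    return sorted(sentences, key=overlap, reverse=True)
```

The point of the expansion is that a sentence can match the title's topic through a related word ("hurricane" for a title mentioning "storm") even when it repeats no title word verbatim.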
Yang, Jeng-Yuan, and 楊政遠. "Statistical Chinese News Summarization." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/tu9p36.
Full text
國立臺北科技大學
資訊工程系研究所
98
With the growing number of news articles published around the world every day, it would be helpful to reduce the time users need to read them. There are two general ways to summarize documents: multi-document summarization and single-document summarization. Multi-document news summarization resembles a 'hot topics of the week' digest, listing only the most important news reports, while single-document news summarization is closer to a short abstract that helps readers quickly grasp the overall idea of an article. The focus of single-document news summarization is to remove as many unimportant words as possible and preserve only the major keywords. In this thesis, we focus on single-document summarization of Chinese news articles using statistical methods. The proposed architecture is as follows. First, auxiliary vocabularies are collected from news articles and included in the system dictionary; the original articles are kept along with the vocabularies. The vocabularies are stored as word bi-grams, together with their document frequency and term frequency. These statistics are then used to calculate the importance of sentences and to select the most representative sentences as the summary. In our experiments, we adopted news articles only from the 'science and technology' category, since new terms can easily be obtained there. The experimental results showed that the news summaries generated by our system can be effectively clustered with the original news articles, and that they greatly reduce the time needed to read the news. This shows that we have achieved the major goal of the proposed system: reducing news reading time.
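The frequency-based sentence scoring sketched in the abstract can be illustrated as follows. This is a hedged simplification under my own assumptions: it uses unigrams rather than the word bi-grams the thesis stores, skips Chinese word segmentation, and `score_sentences` is a hypothetical name.

```python
import math
from collections import Counter

def score_sentences(sentences, corpus_df, n_docs):
    """Rank sentences by the average tf-idf weight of their words:
    term frequency from the article itself, document frequency from
    the corpus. Higher-scoring sentences are more representative."""
    tf = Counter(w for s in sentences for w in s.split())
    def weight(w):
        # unseen words default to df=1, i.e. maximally rare
        return tf[w] * math.log(n_docs / corpus_df.get(w, 1))
    scored = [(sum(weight(w) for w in s.split()) / len(s.split()), s)
              for s in sentences]
    return [s for _, s in sorted(scored, reverse=True)]
```

A summary is then formed by taking the top-ranked sentences up to the desired length.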
Tsai, Jean Ya-Chin, and 蔡雅晴. "Comic-Styled Movie Summarization." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/88014021431825574473.
Full text
國立臺灣大學
資訊管理學研究所
95
This thesis adopts comics as the form of presentation best suited to movie content summarization. Movies, with their powerful ability to convey stories and evoke emotions through moving frames, are the subject of a significant body of research and application in the video content analysis field. However, while this art form has been widely investigated with existing video analysis technology, none of the prior work has produced story-content summaries with pleasant or satisfactory results. The comic form's naturally rich visual story-telling vocabulary and vivid imagery make it ideal for movie summarization. By re-examining the translation rules between movies and comics, we build an effective system that produces comic-styled movie summaries within a reasonable time frame. In our system, a heuristic pictorial layout and balloon placement algorithm is applied after image processing of keyframes selected by a one-pass video processing step. Comic-style rendering further enhances the appearance of the generated summary. The system is easy to implement, fast, and flexible; it can be adapted to a variety of movie genres and comic styles, and extended to incorporate specific video processing techniques.
Cheng, Kun-You, and 成崑佑. "Content-oriented Video Summarization." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/09382153335369623907.
Full text
國立東華大學
資訊工程學系
94
This thesis presents a new framework of video summarization which can extract and summarize the video shots a user is interested in from a long and complicated video, according to their similarity in motion type and scene. First, shot detection uses color and edge information to make shot boundaries accurate. Then a clustering process classifies the shots according to their similarity in scene and motion type. Finally, we select the important shots of each cluster by estimating their priority value, which measures the importance of each shot through its motion energy and color variation. The proposed method produces a classified video summary that allows users to review and search the video more easily. Experimental results illustrate that the proposed method can successfully classify a video into several clusters of different motion types and scenes, and extract specific shots according to their importance.
沈健誠. "Multi-Document Summarization System." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/67547214470615254060.
Full text
國立清華大學
資訊工程學系
89
Most summarization systems today are designed for a single document. They capture the essence of an individual document but do not merge similar documents into a single summary. Can we develop a multi-document summarization system that condenses related documents about the same event into one summary? If so, the main points of the documents can be displayed clearly and simply in two or three sentences, and users can see at a glance whether the documents are what they want. This reduces the time spent collecting documents and enables users to gather information on the Internet more efficiently. Developing such a multi-document summarization system is the goal of this thesis. The summary produced by the system must satisfy two conditions: it should be indicative and topic-related, tailored to suit the user's query. To achieve this goal, we study the indicativeness and topic relevance of sentences, and the selection of sentences that are important and independent of each other. Finally, unimportant small clauses are deleted to make the final summary more concise. The system generates summaries from 248 documents and fifty topics of NTCIR, with a reduction rate over 95%. Overall, the quality of the summaries produced was satisfactory.
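Selecting sentences that are "important and independent of each other," as the abstract puts it, is commonly realized with maximal marginal relevance (MMR). The thesis does not state that it uses MMR; the sketch below is one standard way to express that criterion, with `mmr_select` and its callback parameters being my own illustrative names.

```python
def mmr_select(sentences, sim, relevance, k=2, lam=0.7):
    """Greedy maximal-marginal-relevance selection: repeatedly pick the
    sentence that is most relevant to the topic (per `relevance`) while
    least similar (per `sim`) to the sentences already chosen."""
    selected, candidates = [], list(sentences)
    while candidates and len(selected) < k:
        def mmr(s):
            redundancy = max((sim(s, t) for t in selected), default=0.0)
            return lam * relevance(s) - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

The trade-off parameter `lam` balances topic relevance against redundancy; across multiple documents on the same event, the redundancy term is what keeps near-duplicate sentences out of the summary.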
Tsai, Jean Ya-Chin. "Comic-Styled Movie Summarization." 2007. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-2307200715130800.
Full text
Chou, Yu-Yu, and 周宥宇. "Spoken Document Summarization : with Structural Support Vector Machine, Domain Adaptation and Abstractive Summarization." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/48591868287184216440.
Full text