Academic literature on the topic 'Text compression'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Text compression.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Text compression"

1

Divakaran, Sajilal, Biji C. L., Anjali C., and Achuthsankar S. Nair. "MALAYALAM TEXT COMPRESSION." International Journal of Information Systems and Engineering 1, no. 1 (April 30, 2013): 1–11. http://dx.doi.org/10.24924/ijise/2013.04/v1.iss1/1.11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Davison, Andrew. "Vague text compression." ACM SIGACT News 24, no. 1 (January 15, 1993): 68–74. http://dx.doi.org/10.1145/152992.153009.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Bol'shakov, I. A., and A. V. Smirnov. "Text compression methods." Journal of Soviet Mathematics 56, no. 1 (August 1991): 2249–62. http://dx.doi.org/10.1007/bf01099202.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Murugesan, G., and Rosario Gilmary. "Compression of text files using genomic code compression algorithm." International Journal of Engineering & Technology 7, no. 2.31 (May 29, 2018): 69. http://dx.doi.org/10.14419/ijet.v7i2.31.13399.

Full text
Abstract:
Text files occupy a substantial amount of memory or disk space, and transmitting them across a network consumes a considerable amount of bandwidth. Compression procedures are explicitly advantageous in telecommunications and information technology because they enable devices to transmit or store the same amount of data in fewer bits. Text compression techniques segment the English passage by observing patterns and provide alternative symbols for larger patterns of text. Compression algorithms are used to diminish the storage of copious information and the expenditure on data storage. Compressing significant and massive clusters of information can lead to an improvement in retrieval time. Novel lossless compression algorithms have been introduced for better compression ratios. In this work, the various existing compression mechanisms that are specific to compressing text files and Deoxyribonucleic acid (DNA) sequence files are analyzed. Their performance is compared in terms of compression ratio, time taken to compress/decompress the sequence, and file size. In the proposed work, the input file is converted to DNA format and then a DNA compression procedure is applied.
APA, Harvard, Vancouver, ISO, and other styles
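The conversion step the abstract above describes, rewriting the input file in a DNA-style alphabet before a DNA-oriented compressor runs, can be sketched with a simple 2-bit base mapping. This is an illustrative assumption, not the paper's actual encoding: the function names and the byte-to-base mapping are hypothetical.

```python
# Hypothetical sketch: each byte becomes four DNA bases (2 bits per base).
BASES = "ACGT"  # 2-bit code: A=00, C=01, G=10, T=11

def text_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(b >> shift) & 0b11] for b in data for shift in (6, 4, 2, 0)
    )

def dna_to_text(dna: str) -> bytes:
    """Invert the mapping: four bases back into one byte."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        b = 0
        for ch in dna[i:i + 4]:
            b = (b << 2) | BASES.index(ch)
        out.append(b)
    return bytes(out)
```

The DNA string is four times longer than the input, so the representation itself saves nothing; the point is only to expose the text to compressors designed for the four-letter DNA alphabet.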
5

P, Srividya. "Optimization of Lossless Compression Algorithms using Multithreading." Journal of Information Technology and Sciences 9, no. 1 (March 2, 2023): 36–42. http://dx.doi.org/10.46610/joits.2022.v09i01.005.

Full text
Abstract:
The process of reducing the number of bits required to characterize data is referred to as compression. The advantages of compression include a reduction in the time taken to transfer data from one point to another, and a reduction in the cost of storage space and network bandwidth. There are two types of compression algorithms, namely lossy compression algorithms and lossless compression algorithms. Lossy algorithms find utility in compressing audio and video signals, whereas lossless algorithms are used in compressing text messages. The advent of the internet and its worldwide usage has raised not only the use but also the storage of text, audio and video files. These multimedia files demand more storage space than traditional files, which has given rise to the requirement for an efficient compression algorithm. There is a considerable improvement in the computing performance of machines due to the advent of the multi-core processor; however, this multi-core architecture is not used by compression algorithms. This paper shows the implementation of lossless compression algorithms, namely the Lempel-Ziv-Markov algorithm, BZip2 and ZLIB, using the concept of multithreading. The results obtained show that the ZLIB algorithm is more efficient in terms of the time taken to compress and decompress the text. The comparison is done both for compression without multithreading and for compression with multithreading.
APA, Harvard, Vancouver, ISO, and other styles
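The multithreading idea in the abstract above can be sketched with the Python standard library: split the input into chunks and compress each chunk in a thread pool. This is a generic illustration, not the paper's implementation; it relies on the fact that CPython's zlib releases the GIL while compressing, so the threads can genuinely overlap on a multi-core machine.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_chunks(data: bytes, n_chunks: int = 4) -> list[bytes]:
    """Split `data` into n_chunks pieces and compress them in parallel."""
    size = -(-len(data) // n_chunks)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        return list(pool.map(zlib.compress, chunks))

def decompress_chunks(blobs: list[bytes]) -> bytes:
    """Decompress each blob and reassemble the original byte string."""
    return b"".join(zlib.decompress(b) for b in blobs)
```

Chunked compression trades a little ratio for speed, since matches cannot cross chunk boundaries; that trade-off is exactly what comparisons like the paper's measure.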
6

Stecuła, Beniamin, Kinga Stecuła, and Adrian Kapczyński. "Compression of Text in Selected Languages—Efficiency, Volume, and Time Comparison." Sensors 22, no. 17 (August 25, 2022): 6393. http://dx.doi.org/10.3390/s22176393.

Full text
Abstract:
The goal of the research was to study the possibility of using the planned language Esperanto for text compression, and to compare the results of text compression in Esperanto with compression in natural languages, represented by Polish and English. The authors performed text compression in a program created in Python using four compression algorithms: zlib, lzma, bz2, and lz4, on four versions of the text: in Polish, English, Esperanto, and Esperanto in x-notation (without characters outside ASCII encoding). After creating the compression program and compressing the texts, the authors compared compression time and the volume of the text before and after compression. The results of the study confirmed the hypothesis that the planned language Esperanto gives better text compression results than the natural languages represented by Polish and English. The confirmation by scientific methods that Esperanto is more optimal for text compression is the scientific added value of the paper.
APA, Harvard, Vancouver, ISO, and other styles
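A minimal sketch of the kind of measurement the study above performs: compressed size and elapsed time per codec for one input text. The paper also used lz4, which is a third-party package, so this stdlib-only sketch covers zlib, bz2 and lzma; the function name is illustrative.

```python
import bz2
import lzma
import time
import zlib

def compare_codecs(text: str) -> dict[str, tuple[int, float]]:
    """Return {codec_name: (compressed_size_bytes, seconds)} for one text."""
    data = text.encode("utf-8")
    results = {}
    for name, compress in [("zlib", zlib.compress),
                           ("bz2", bz2.compress),
                           ("lzma", lzma.compress)]:
        start = time.perf_counter()
        blob = compress(data)
        results[name] = (len(blob), time.perf_counter() - start)
    return results
```

Running this on the same passage rendered in several languages (as the paper does for Polish, English and Esperanto) reduces the comparison to a table of sizes and times.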
7

Nguyen, Vu H., Hien T. Nguyen, Hieu N. Duong, and Vaclav Snasel. "n-Gram-Based Text Compression." Computational Intelligence and Neuroscience 2016 (2016): 1–11. http://dx.doi.org/10.1155/2016/9483646.

Full text
Abstract:
We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It has a significant compression ratio in comparison with those of state-of-the-art methods on the same dataset. Given a text, first, the proposed method splits it into n-grams and then encodes them based on n-gram dictionaries. In the encoding phase, we use a sliding window with a size that ranges from bigram to five-gram to obtain the best encoding stream. Each n-gram is encoded by two to four bytes accordingly, based on its corresponding n-gram dictionary. We collected a 2.5 GB text corpus from some Vietnamese news agencies to build n-gram dictionaries from unigram to five-gram, achieving dictionaries with a size of 12 GB in total. In order to evaluate our method, we collected a testing set of 10 different text files with different sizes. The experimental results indicate that our method achieves a compression ratio of around 90% and outperforms state-of-the-art methods.
APA, Harvard, Vancouver, ISO, and other styles
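The dictionary lookup at the heart of the method above can be illustrated with a greedy longest-match encoder over word n-grams. The paper searches for the best encoding stream with a sliding window and emits two-to-four-byte codes; this simplified sketch (the function name and toy dictionary are hypothetical) shows only the longest-match idea.

```python
def encode_ngrams(words: list[str], dictionary: dict[tuple, int],
                  max_n: int = 5) -> list[int]:
    """Greedy longest-match encoding: at each position, try the 5-gram
    down to the unigram and emit the code of the longest phrase found."""
    codes, i = [], 0
    while i < len(words):
        for n in range(min(max_n, len(words) - i), 0, -1):
            gram = tuple(words[i:i + n])
            if gram in dictionary:
                codes.append(dictionary[gram])
                i += n
                break
        else:
            raise KeyError(f"no dictionary entry for {words[i]!r}")
    return codes
```

With real dictionaries built from a large corpus, frequent multi-word phrases collapse to a single short code, which is where the reported compression comes from.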
8

Omer. "Arabic Short Text Compression." Journal of Computer Science 6, no. 1 (January 1, 2010): 24–28. http://dx.doi.org/10.3844/jcssp.2010.24.28.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

KAUFMAN, YAIR, and SHMUEL T. KLEIN. "SEMI-LOSSLESS TEXT COMPRESSION." International Journal of Foundations of Computer Science 16, no. 06 (December 2005): 1167–78. http://dx.doi.org/10.1142/s012905410500373x.

Full text
Abstract:
A new notion, that of semi-lossless text compression, is introduced, and its applicability in various settings is investigated. First results suggest that it might be hard to exploit the additional redundancy of English texts, but the new methods could be useful in applications where the correct spelling is not important, such as in short emails, and the new notion raises some interesting research problems in several different areas of Computer Science.
APA, Harvard, Vancouver, ISO, and other styles
10

Suherman, and Andysah Putera Utama Siahaan. "Huffman Text Compression Technique." International Journal of Computer Science and Engineering 3, no. 8 (August 25, 2016): 103–8. http://dx.doi.org/10.14445/23488387/ijcse-v3i8p124.

Full text
APA, Harvard, Vancouver, ISO, and other styles
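A minimal sketch of Huffman code construction, the technique examined in the article above: repeatedly merge the two least frequent subtrees, so that rarer symbols end up with longer bit strings and the total encoded length is minimized.

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix-free code table from symbol frequencies."""
    # Each heap item: [total_frequency, tie_breaker, {symbol: code_so_far}]
    heap = [[freq, i, {ch: ""}]
            for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)  # least frequent subtree
        hi = heapq.heappop(heap)  # next least frequent
        lo[2] = {ch: "0" + code for ch, code in lo[2].items()}
        hi[2] = {ch: "1" + code for ch, code in hi[2].items()}
        heapq.heappush(heap, [lo[0] + hi[0], lo[1], {**lo[2], **hi[2]}])
    return heap[0][2]
```

Encoding is then just `"".join(codes[ch] for ch in text)`. (A degenerate input with a single distinct symbol gets an empty code here; real coders special-case it.)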

Dissertations / Theses on the topic "Text compression"

1

Wilson, Timothy David. "Animation of text compression algorithms." Thesis, University of Canterbury. Computer Science, 1992. http://hdl.handle.net/10092/9570.

Full text
Abstract:
It has been said that, there is no particular mystery in animation ... it's very simple and like anything that is simple, it is about the hardest thing in the world to do. Text compression is about finding ways of representing text with the smallest amount of data such that it can be restored to its original state. Text compression algorithms are abstract concepts, and bringing them into the visual domain is difficult, but the effort can bring insight to both the student and the researcher. This thesis presents some animations of text compression methods and observations about producing good educational and research animations. Several algorithm animation systems were used in the animation production, and it was found that although there are several good animation systems fulfilling different functions, little is known about what makes good algorithm animation. A better way of defining animations, and some practical principles for animation discovered while producing these animations, are discussed.
APA, Harvard, Vancouver, ISO, and other styles
2

Branavan, Satchuthananthavale Rasiah Kuhan. "High compression rate text summarization." Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/44368.

Full text
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.
Includes bibliographical references (p. 95-97).
This thesis focuses on methods for condensing large documents into highly concise summaries, achieving compression rates on par with human writers. While the need for such summaries in the current age of information overload is increasing, the desired compression rate has thus far been beyond the reach of automatic summarization systems. The potency of our summarization methods is due to their in-depth modelling of document content in a probabilistic framework. We explore two types of document representation that capture orthogonal aspects of text content. The first represents the semantic properties mentioned in a document in a hierarchical Bayesian model. This method is used to summarize thousands of consumer reviews by identifying the product properties mentioned by multiple reviewers. The second representation captures discourse properties, modelling the connections between different segments of a document. This discriminatively trained model is employed to generate tables of contents for books and lecture transcripts. The summarization methods presented here have been incorporated into large-scale practical systems that help users effectively access information online.
by Satchuthananthavale Rasiah Kuhan Branavan.
S.M.
APA, Harvard, Vancouver, ISO, and other styles
3

Langiu, Alessio. "Optimal Parsing for dictionary text compression." Thesis, Paris Est, 2012. http://www.theses.fr/2012PEST1091/document.

Full text
Abstract:
Dictionary-based compression algorithms include a parsing strategy to transform the input text into a sequence of dictionary phrases. Given a text, such a process is usually not unique and, for compression purposes, it makes sense to find, among the possible parsings, one that minimizes the final compression ratio. This is the parsing problem. An optimal parsing is a parsing strategy, or a parsing algorithm, that solves the parsing problem taking into account all the constraints of a compression algorithm or of a class of homogeneous compression algorithms. Compression algorithm constraints are, for instance, the dictionary itself, i.e. the dynamic set of available phrases, and how much a phrase weighs on the compressed text, i.e. the length of the codeword that represents the phrase, also called the cost of encoding a dictionary pointer. In more than 30 years of history of dictionary-based text compression, plenty of algorithms, variants and extensions have appeared, and this approach has become one of the most appreciated and widely used in almost all storage and communication processes; yet only a few optimal parsing algorithms have been presented. Many compression algorithms still lack optimality of their parsing, or at least a proof of optimality. This happens because there is no general model of the parsing problem that includes all dictionary-based algorithms, and because the existing optimal parsings work under too-restrictive hypotheses. This work focuses on the parsing problem and presents both a general model for dictionary-based text compression, called the Dictionary-Symbolwise theory, and a general parsing algorithm that is proved to be optimal under some realistic hypotheses. 
This algorithm is called Dictionary-Symbolwise Flexible Parsing, and it covers almost all dictionary-based text compression algorithms together with the large class of their variants in which the text is decomposed into a sequence of symbols and dictionary phrases. In this work we further consider the case of a free mixture of a dictionary compressor and a symbolwise compressor; our Dictionary-Symbolwise Flexible Parsing covers this case as well. We indeed obtain an optimal parsing algorithm in the case of dictionary-symbolwise compression where the dictionary is prefix-closed and the cost of encoding a dictionary pointer is variable. The symbolwise compressor is any classical one that works in linear time, as many common variable-length encoders do. Our algorithm works under the assumption that a special graph, described in what follows, is well defined; even when this condition is not satisfied, the same method can be used to obtain almost-optimal parses. In detail, when the dictionary is LZ78-like, we show how to implement our algorithm in linear time; when the dictionary is LZ77-like, our algorithm can be implemented in time O(n log n), where n is the length of the text. Both have O(n) space complexity. Even though the main aim of this work is theoretical, some experimental results are introduced to underline some practical effects of parsing optimality on compression performance, and more detailed experiments are hosted in a dedicated appendix.
APA, Harvard, Vancouver, ISO, and other styles
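The parsing problem studied in this thesis can be illustrated, under a strong simplification, as a shortest-path computation over text positions. The sketch below assumes a unit cost per phrase; the Dictionary-Symbolwise model allows variable pointer costs, which the same dynamic program accommodates by replacing the `+ 1` with the pointer's encoding cost. Names are illustrative.

```python
def optimal_parse(text: str, dictionary: set[str]) -> list[str]:
    """Parse `text` into dictionary phrases using the fewest phrases,
    via dynamic programming: best[i] = cheapest parse of text[:i], and
    every phrase matching at position i is an edge i -> i + len(phrase)."""
    INF = float("inf")
    best = [0.0] + [INF] * len(text)
    choice = [""] * (len(text) + 1)  # phrase ending the best parse at i
    for i in range(len(text)):
        if best[i] == INF:
            continue
        for phrase in dictionary:
            j = i + len(phrase)
            if (j <= len(text) and text.startswith(phrase, i)
                    and best[i] + 1 < best[j]):
                best[j] = best[i] + 1
                choice[j] = phrase
    if best[len(text)] == INF:
        raise ValueError("text cannot be parsed with this dictionary")
    parse, j = [], len(text)
    while j > 0:  # walk back through the recorded choices
        parse.append(choice[j])
        j -= len(choice[j])
    return parse[::-1]
```

A greedy longest-match parse can be strictly worse than this shortest-path parse, which is exactly why optimal parsing is a problem worth modelling.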
4

Ong, Ghim Hwee. "Text compression for transmission and storage." Thesis, Loughborough University, 1989. https://dspace.lboro.ac.uk/2134/13790.

Full text
Abstract:
The increasing use of computers for document preparation and publishing, coupled with a growth in the general information management facilities available on computers, has meant that most documents exist in computer-processable form during their lifetime. This has led to a substantial increase in the demand for data storage facilities, which frequently seems to exceed the provision of storage facilities, despite the advances in storage technology. Furthermore, there is growing demand to transmit these textual documents from one user to another, rather than use a printed form for transfer between sites which then needs to be re-entered into a computer at the receiving site. Transmission facilities are, however, limited, and large documents can be difficult and expensive to transmit. Problems of storage and transmission capacity can be alleviated by compacting the textual information beforehand, providing that there is no loss of information in this process. Conventional compaction techniques have been designed to compact all forms of data (binary as well as text) and have predominantly been based on the byte as the unit of compression. This thesis investigates the alternative of designing a compaction procedure for natural language texts, using the textual word as the unit of compression. Four related alternative techniques are developed and analysed in the thesis. These are designed to be appropriate for different circumstances where either maximum compression or maximum point-to-point transmission speed is of greatest importance, and where the characteristics of the transmission, or storage, medium may be oriented to a seven- or eight-bit data unit. The effectiveness of the four techniques is investigated both theoretically and by practical comparison with a widely used conventional alternative. It is shown that for a wide range of textual material the word-based techniques yield greater compression and require substantially less processing time.
APA, Harvard, Vancouver, ISO, and other styles
5

Jones, Greg 1963-2017. "RADIX 95n: Binary-to-Text Data Conversion." Thesis, University of North Texas, 1991. https://digital.library.unt.edu/ark:/67531/metadc500582/.

Full text
Abstract:
This paper presents Radix 95n, a binary to text data conversion algorithm. Radix 95n (base 95) is a variable length encoding scheme that offers slightly better efficiency than is available with conventional fixed length encoding procedures. Radix 95n advances previous techniques by allowing a greater pool of 7-bit combinations to be made available for 8-bit data translation. Since 8-bit data (i.e. binary files) can prove to be difficult to transfer over 7-bit networks, the Radix 95n conversion technique provides a way to convert data such as compiled programs or graphic images to printable ASCII characters and allows for their transfer over 7-bit networks.
APA, Harvard, Vancouver, ISO, and other styles
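The base-95 idea above can be sketched by treating the byte string as one large integer and rewriting it with the 95 printable ASCII characters. This big-integer version is an illustrative simplification of the thesis's variable-length scheme, not its actual algorithm; since log2(95) ≈ 6.57 bits per character, base 95 is slightly denser than fixed 6-bit encodings such as Base64.

```python
def to_radix95(data: bytes) -> str:
    """Rewrite the bytes (a base-256 integer) in base 95 using the
    printable ASCII characters 32..126 (space through '~')."""
    n = int.from_bytes(b"\x01" + data, "big")  # sentinel keeps leading zeros
    digits = []
    while n:
        n, r = divmod(n, 95)
        digits.append(chr(32 + r))
    return "".join(reversed(digits))

def from_radix95(text: str) -> bytes:
    """Invert to_radix95: rebuild the integer, then drop the sentinel."""
    n = 0
    for ch in text:
        n = n * 95 + (ord(ch) - 32)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]
```

Because every output character is printable ASCII, the encoded form survives 7-bit transmission channels, which is the use case the abstract describes.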
6

He, Meng. "Indexing Compressed Text." Thesis, University of Waterloo, 2003. http://hdl.handle.net/10012/1143.

Full text
Abstract:
As a result of the rapid growth of the volume of electronic data, text compression and indexing techniques are receiving more and more attention. These two issues are usually treated as independent problems, but approaches combining them have recently attracted the attention of researchers. In this thesis, we review and test some of the more effective and some of the more theoretically interesting techniques. Various compression and indexing techniques are presented, and we also present two compressed text indices. Based on these techniques, we implement a compressed full-text index, so that compressed texts can be indexed to support fast queries without decompressing the whole text. The experiments show that our index is compact and supports fast search.
APA, Harvard, Vancouver, ISO, and other styles
7

Blandon, Julio Cesar. "A novel lossless compression technique for text data." FIU Digital Commons, 1999. http://digitalcommons.fiu.edu/etd/1694.

Full text
Abstract:
The focus of this thesis is placed on text data compression based on the fundamental coding scheme referred to as the American Standard Code for Information Interchange or ASCII. The research objective is the development of software algorithms that result in significant compression of text data. Past and current compression techniques have been thoroughly reviewed to ensure proper contrast between the compression results of the proposed technique with those of existing ones. The research problem is based on the need to achieve higher compression of text files in order to save valuable memory space and increase the transmission rate of these text files. It was deemed necessary that the compression algorithm to be developed would have to be effective even for small files and be able to contend with uncommon words as they are dynamically included in the dictionary once they are encountered. A critical design aspect of this compression technique is its compatibility to existing compression techniques. In other words, the developed algorithm can be used in conjunction with existing techniques to yield even higher compression ratios. This thesis demonstrates such capabilities and such outcomes, and the research objective of achieving higher compression ratio is attained.
APA, Harvard, Vancouver, ISO, and other styles
8

Thaper, Nitin 1975. "Using compression for source-based classification of text." Thesis, Massachusetts Institute of Technology, 2001. http://hdl.handle.net/1721.1/86595.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Zhang, Nan. "TRANSFORM BASED AND SEARCH AWARE TEXT COMPRESSION SCHEMES AND COMPRESSED DOMAIN TEXT RETRIEVAL." Doctoral diss., University of Central Florida, 2005. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3938.

Full text
Abstract:
In recent times, we have witnessed an unprecedented growth of textual information via the Internet, digital libraries and archival text in many applications. While a good fraction of this information is of transient interest, useful information of archival value will continue to accumulate. We need ways to manage, organize and transport this data from one point to the other on data communications links with limited bandwidth. We must also have means to speedily find the information we need from this huge mass of data. Sometimes, a single site may also contain large collections of data such as a library database, thereby requiring an efficient search mechanism even to search within the local data. To facilitate information retrieval, an emerging ad hoc standard for uncompressed text is XML, which preprocesses the text by adding user-defined metadata such as DTDs or hyperlinks to enable searching with better efficiency and effectiveness. This increases the file size considerably, underscoring the importance of applying text compression. On account of efficiency (in terms of both space and time), there is a need to keep the data in compressed form for as long as possible. Text compression is concerned with techniques for representing digital text data in alternate representations that take less space. Not only does it help conserve storage space for archival and online data, it also helps system performance by requiring fewer secondary storage (disk or CD-ROM) accesses, and it improves network transmission bandwidth utilization by reducing transmission time. Unlike static images or video, there is no international standard for text compression, although compressed formats like .zip, .gz, and .Z files are increasingly being used. In general, data compression methods are classified as lossless or lossy. Lossless compression allows the original data to be recovered exactly. 
Although used primarily for text data, lossless compression algorithms are useful in special classes of images such as medical imaging, finger print data, astronomical images and data bases containing mostly vital numerical data, tables and text information. Many lossy algorithms use lossless methods at the final stage of the encoding stage underscoring the importance of lossless methods for both lossy and lossless compression applications. In order to be able to effectively utilize the full potential of compression techniques for the future retrieval systems, we need efficient information retrieval in the compressed domain. This means that techniques must be developed to search the compressed text without decompression or only with partial decompression independent of whether the search is done on the text or on some inversion table corresponding to a set of key words for the text. In this dissertation, we make the following contributions: (1) Star family compression algorithms: We have proposed an approach to develop a reversible transformation that can be applied to a source text that improves existing algorithm's ability to compress. We use a static dictionary to convert the English words into predefined symbol sequences. These transformed sequences create additional context information that is superior to the original text. Thus we achieve some compression at the preprocessing stage. We have a series of transforms which improve the performance. Star transform requires a static dictionary for a certain size. To avoid the considerable complexity of conversion, we employ the ternary tree data structure that efficiently converts the words in the text to the words in the star dictionary in linear time. (2) Exact and approximate pattern matching in Burrows-Wheeler transformed (BWT) files: We proposed a method to extract the useful context information in linear time from the BWT transformed text. 
The auxiliary arrays obtained from the BWT inverse transform yield logarithmic search time. Meanwhile, approximate pattern matching can be performed based on the results of exact pattern matching to extract the possible candidates for the approximate pattern matching. A fast verification algorithm can then be applied to those candidates, which may be just small parts of the original text. We present algorithms for both k-mismatch and k-approximate pattern matching in BWT compressed text. A typical compression system based on BWT has Move-to-Front and Huffman coding stages after the transformation. We propose a novel approach to replace the Move-to-Front stage in order to extend compressed-domain search capability all the way to the entropy coding stage. A modification to the Move-to-Front makes it possible to randomly access any part of the compressed text without referring to the part before the access point. (3) Modified LZW algorithm that allows random access and partial decoding for compressed text retrieval: Although many compression algorithms provide good compression ratio and/or time complexity, LZW is the first one studied for compressed pattern matching because of its simplicity and efficiency. Modifications to the LZW algorithm provide the extra advantage of fast random access and partial decoding ability that is especially useful for text retrieval systems. Based on this algorithm, we can provide a dynamic hierarchical semantic structure for the text, so that the text search can be performed at the expected level of granularity. For example, the user can choose to retrieve a single line, a paragraph, or a file, etc. that contains the keywords. More importantly, we will show that a parallel encoding and decoding algorithm is trivial with the modified LZW. Both encoding and decoding can be performed with multiple processors easily, and the encoding and decoding processes are independent with respect to the number of processors.
Ph.D.
School of Computer Science
Engineering and Computer Science
Computer Science
APA, Harvard, Vancouver, ISO, and other styles
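A minimal sketch of the classic LZW algorithm that the dissertation's third contribution modifies (the random-access and partial-decoding extensions are not shown): the encoder grows a phrase dictionary while scanning and emits one code per longest already-known phrase.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Classic LZW: extend the current phrase while it stays in the
    table; on a miss, emit the phrase's code and add the extension."""
    table = {bytes([i]): i for i in range(256)}
    w, out = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            table[wc] = len(table)  # new phrase gets the next code
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same table on the fly; the special case handles a
    code emitted before the decoder has finished defining it."""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = bytearray(w)
    for code in codes[1:]:
        entry = table[code] if code in table else w + w[:1]
        out += entry
        table[len(table)] = w + entry[:1]
        w = entry
    return bytes(out)
```

Because the decoder reconstructs the table deterministically from the code stream alone, standard LZW cannot start decoding mid-stream; enabling exactly that random access is the modification the dissertation proposes.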
10

Linhares, Pontes Elvys. "Compressive Cross-Language Text Summarization." Thesis, Avignon, 2018. http://www.theses.fr/2018AVIG0232/document.

Full text
Abstract:
The popularization of social networks and digital documents has rapidly increased the information available on the Internet. However, this massive amount of data cannot be analyzed manually. Among existing applications of Natural Language Processing (NLP), this thesis focuses on cross-language text summarization, that is, the production of summaries in a language different from that of the source documents. We also analyze other NLP tasks (word representation, semantic similarity, and sentence and multi-sentence compression) to generate more stable and informative cross-lingual summaries. Most NLP applications, including automatic summarization, use a similarity measure to analyze and compare the meaning of words, word sequences, sentences, and texts. One way to analyze this similarity is to generate a representation of these sentences that accounts for their content. The meaning of sentences is defined by several elements, such as the context of words and expressions, word order, and prior information. Simple metrics, such as the cosine measure and Euclidean distance, provide a similarity score between two sentences; however, they do not analyze word order or word sequences. In view of these problems, we propose a neural network model combining recurrent and convolutional neural networks to estimate the semantic similarity of a pair of sentences (or texts) from the local and general contexts of words. On the dataset analyzed, our model predicted better similarity scores than the baseline systems by better capturing the local and general meaning of words and multi-word expressions.
To eliminate redundancies and irrelevant information from similar sentences, we further propose a new multi-sentence compression method that fuses sentences with similar content into short compressions. To do so, we model clusters of similar sentences as word graphs. We then apply an integer linear programming model that guides the compression of these clusters from a list of keywords, searching for a path in the word graph that has good cohesion and contains the maximum number of keywords. Our approach outperforms baseline systems by generating more informative and more correct compressions for French, Portuguese, and Spanish. Finally, we combine the preceding methods to build a cross-language text summarization system. Our system generates cross-lingual summaries by analyzing information in both the source and target languages in order to identify the most relevant sentences. Inspired by compressive summarization methods in monolingual analysis, we adapt our multi-sentence compression method to this problem so as to retain only the main information. Our system proves effective at compressing redundant information and preserving relevant information, improving informativeness scores without losing grammatical quality for French-to-English cross-lingual summaries. Analyzing cross-lingual summaries from English, French, Portuguese, and Spanish into English or French, our system outperforms state-of-the-art extractive systems for all these languages.
In addition, a complementary experiment on automatic video transcripts shows that our approach again achieves better and more stable ROUGE scores, even for documents containing grammatical errors and inaccurate or missing information.
The popularization of social networks and digital documents quickly increased the information available on the Internet. However, this huge amount of data cannot be analyzed manually. Natural Language Processing (NLP) analyzes the interactions between computers and human languages in order to process and analyze natural language data. NLP techniques incorporate a variety of methods, including linguistics, semantics, and statistics, to extract entities and relationships and to understand a document. Among several NLP applications, we are interested, in this thesis, in cross-language text summarization, which produces a summary in a language different from the language of the source documents. We also analyzed other NLP tasks (word encoding representation, semantic similarity, sentence and multi-sentence compression) to generate more stable and informative cross-lingual summaries. Most NLP applications (including all types of text summarization) use some kind of similarity measure to analyze and compare the meaning of words, chunks, sentences, and texts. A way to analyze this similarity is to generate a representation for these sentences that contains their meaning. The meaning of sentences is defined by several elements, such as the context of words and expressions, the order of words, and the previous information. Simple metrics, such as the cosine metric and Euclidean distance, provide a measure of similarity between two sentences; however, they do not analyze the order of words or multi-word expressions. Analyzing these problems, we propose a neural network model that combines recurrent and convolutional neural networks to estimate the semantic similarity of a pair of sentences (or texts) based on the local and general contexts of words.
Our model predicted better similarity scores than baselines by better analyzing the local and general meanings of words and multi-word expressions. In order to remove redundancies and non-relevant information from similar sentences, we propose a multi-sentence compression method that compresses similar sentences by fusing them into correct and short compressions that contain the main information of these similar sentences. We model clusters of similar sentences as word graphs. Then, we apply an integer linear programming model that guides the compression of these clusters based on a list of keywords. We look for a path in the word graph that has good cohesion and contains the maximum number of keywords. Our approach outperformed baselines by generating more informative and correct compressions for the French, Portuguese, and Spanish languages. Finally, we combine these previous methods to build a cross-language text summarization system. Our system is an {English, French, Portuguese, Spanish}-to-{English, French} cross-language text summarization framework that analyzes the information in both languages to identify the most relevant sentences. Inspired by compressive text summarization methods in monolingual analysis, we adapt our multi-sentence compression method to this problem to keep only the main information. Our system proves to be a good alternative for compressing redundant information and preserving relevant information. It improves informativeness scores without losing grammatical quality for French-to-English cross-lingual summaries. Analyzing {English, French, Portuguese, Spanish}-to-{English, French} cross-lingual summaries, our system significantly outperforms state-of-the-art extractive baselines for all these languages. In addition, we analyze the cross-language text summarization of transcript documents. Our approach achieved better and more stable scores even for these documents, which have grammatical errors and missing information.
APA, Harvard, Vancouver, ISO, and other styles
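The multi-sentence compression described in this abstract models a cluster of similar sentences as a word graph and then looks for a cohesive, keyword-rich path through it. The thesis solves this with an integer linear program; the sketch below is only a rough illustration of the word-graph idea, with an invented scoring function (keyword coverage minus path length), invented example sentences, and an exhaustive path search instead of ILP.

```python
from collections import defaultdict

def build_word_graph(sentences):
    """Merge sentences into a word graph: identical word forms share a node,
    and directed edges link consecutive words within each sentence."""
    graph = defaultdict(set)
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            graph[a].add(b)
    return graph

def best_compression(graph, keywords, max_len=12):
    """Enumerate short <s>...</s> paths and keep the one maximizing a toy
    objective: keyword coverage (weighted) minus compression length."""
    best, best_score = None, float("-inf")
    stack = [("<s>", ["<s>"])]
    while stack:
        node, path = stack.pop()
        if node == "</s>":
            words = path[1:-1]
            score = 5 * sum(w in keywords for w in set(words)) - len(words)
            if score > best_score:
                best, best_score = words, score
            continue
        if len(path) > max_len:
            continue
        for nxt in graph[node]:
            # allow a node at most twice so shared words can be reused once
            if path.count(nxt) < 2 or nxt == "</s>":
                stack.append((nxt, path + [nxt]))
    return " ".join(best)

sentences = [
    "the prime minister visited the flooded region on monday",
    "the prime minister visited the region",
]
compression = best_compression(build_word_graph(sentences), {"minister", "region"})
print(compression)
```

With these toy inputs the keyword-weighted objective selects the fused path "the prime minister visited the region", dropping the redundant details while covering both keywords.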

Books on the topic "Text compression"

1

Bell, Timothy C. Text compression. Englewood Cliffs, N.J.: Prentice Hall, 1990.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Storer, James A. Image and Text Compression. Boston, MA: Springer US, 1992.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Storer, James A., ed. Image and Text Compression. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3596-6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

1953-, Storer James A., ed. Image and text compression. Boston: Kluwer Academic, 1992.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

D, Barni Mauro Ph, ed. Document and image compression. Boca Raton, FL: CRC/Taylor & Francis, 2006.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Alistair, Moffat, and Bell Timothy C, eds. Managing gigabytes: Compressing and indexing documents and images. New York: Van Nostrand Reinhold, 1994.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Sabourin, Conrad. Computational character processing: Character coding, input, output, synthesis, ordering, conversion, text compression, encryption, display hashing, literate programming : bibliography. Montréal: Infolingua, 1994.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Sabourin, Conrad. Computational speech processing: Speech analysis, recognition, understanding, compression, transmission, coding, synthesis, text to speech systems, speech to tactile displays, speaker identification, prosody processing : bibliography. Montréal: Infolingua, 1994.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Cold Regions Research and Engineering Laboratory (U.S.), ed. Axial double-ball test versus the uniaxial unconfined compression test for measuring the compressive strength of freshwater and sea ice. [Hanover, N.H.]: US Army Corps of Engineers, Cold Regions Research & Engineering Laboratory, 1993.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Davis, Randall C. Analysis and test of superplastically formed titanium hat-stiffened panels under compression. [S.l.]: [s.n.], 1986.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Text compression"

1

Ferragina, Paolo, and Igor Nitto. "Text Compression." In Encyclopedia of Database Systems, 3046–48. Boston, MA: Springer US, 2009. http://dx.doi.org/10.1007/978-0-387-39940-9_1151.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Ferragina, Paolo, Igor Nitto, and Rossano Venturini. "Text Compression." In Encyclopedia of Database Systems, 1–3. New York, NY: Springer New York, 2017. http://dx.doi.org/10.1007/978-1-4899-7993-3_1151-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Ferragina, Paolo, Igor Nitto, and Rossano Venturini. "Text Compression." In Encyclopedia of Database Systems, 4070–72. New York, NY: Springer New York, 2018. http://dx.doi.org/10.1007/978-1-4614-8265-9_1151.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Konow, Roberto, and Gonzalo Navarro. "Text Index Compression." In Encyclopedia of Database Systems, 1–6. New York, NY: Springer New York, 2017. http://dx.doi.org/10.1007/978-1-4899-7993-3_945-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Navarro, Gonzalo. "Text Index Compression." In Encyclopedia of Database Systems, 3051–55. Boston, MA: Springer US, 2009. http://dx.doi.org/10.1007/978-0-387-39940-9_945.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Konow, Roberto, and Gonzalo Navarro. "Text Index Compression." In Encyclopedia of Database Systems, 4075–81. New York, NY: Springer New York, 2018. http://dx.doi.org/10.1007/978-1-4614-8265-9_945.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Mandal, Mrinal Kr. "Text Representation and Compression." In Multimedia Signals and Systems, 121–44. Boston, MA: Springer US, 2003. http://dx.doi.org/10.1007/978-1-4615-0265-4_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Crochemore, M., F. Mignosi, A. Restivo, and S. Salemi. "Text Compression Using Antidictionaries." In Automata, Languages and Programming, 261–70. Berlin, Heidelberg: Springer Berlin Heidelberg, 1999. http://dx.doi.org/10.1007/3-540-48523-6_23.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Palaniappan, Venka, and Shahram Latifi. "Lossy Text Compression Techniques." In ICCS 2007, 205–10. London: Springer London, 2007. http://dx.doi.org/10.1007/978-1-84628-992-7_28.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Molina, Alejandro, Juan-Manuel Torres-Moreno, Eric SanJuan, Iria da Cunha, and Gerardo Eugenio Sierra Martínez. "Discursive Sentence Compression." In Computational Linguistics and Intelligent Text Processing, 394–407. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-37256-8_33.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Text compression"

1

Gowtham, S., G. Iyshwarya, Kaushik Veluru, A. Tamarai Selvi, and J. Vasudha. "Text compression using ambigrams." In 2010 2nd International Conference on Education Technology and Computer (ICETC). IEEE, 2010. http://dx.doi.org/10.1109/icetc.2010.5529630.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Teuhola, Jukka, and Timo Raita. "Text compression using prediction." In the 9th annual international ACM SIGIR conference. New York, New York, USA: ACM Press, 1986. http://dx.doi.org/10.1145/253168.253192.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Goksu, Hayriye, and Banu Diri. "Morphology based text compression." In 2010 IEEE 18th Signal Processing and Communications Applications Conference (SIU 2010). IEEE, 2010. http://dx.doi.org/10.1109/siu.2010.5651231.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Moore, John P. T., Antonio D. Kheirkhahzadeh, and Jiva N. Bagale. "Towards Markup-Aware Text Compression." In 2014 Data Compression Conference (DCC). IEEE, 2014. http://dx.doi.org/10.1109/dcc.2014.80.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Bille, Philip, Mikko Berggren Ettienne, Travis Gagie, Inge Li Gortz, and Nicola Prezza. "Decompressing Lempel-Ziv Compressed Text." In 2020 Data Compression Conference (DCC). IEEE, 2020. http://dx.doi.org/10.1109/dcc47342.2020.00022.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Kruse, H., and A. Mukherjee. "Data compression using text encryption." In Proceedings DCC '97. Data Compression Conference. IEEE, 1997. http://dx.doi.org/10.1109/dcc.1997.582107.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Satoh, N., T. Morihara, Y. Okada, and S. Yoshida. "Study of Japanese text compression." In Proceedings DCC '97. Data Compression Conference. IEEE, 1997. http://dx.doi.org/10.1109/dcc.1997.582134.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Seker, Abdulkadir, Emre Delibas, and Banu Diri. "DNA sequence compression within traditional text compression algorithms." In 2017 25th Signal Processing and Communications Applications Conference (SIU). IEEE, 2017. http://dx.doi.org/10.1109/siu.2017.7960193.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Fenwick, P. "Symbol ranking text compressors." In Proceedings DCC '97. Data Compression Conference. IEEE, 1997. http://dx.doi.org/10.1109/dcc.1997.582093.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Rajesh, Bulla, Mohammed Javed, P. Nagabhushan, and Watanabe Osamu. "Segmentation of Text-Lines and Words from JPEG Compressed Printed Text Documents Using DCT Coefficients." In 2020 Data Compression Conference (DCC). IEEE, 2020. http://dx.doi.org/10.1109/dcc47342.2020.00083.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Text compression"

1

Trim, M., Matthew Murray, and C. Crane. Modernization and structural evaluation of the improved Overhead Cable System. Engineer Research and Development Center (U.S.), March 2021. http://dx.doi.org/10.21079/11681/40025.

Full text
Abstract:
A modernized Overhead Cable System prototype for a 689 ft (210 m) Improved Ribbon Bridge crossing was designed, assembled, and structurally tested. Two independent structural tests were executed, i.e., a component-level compression test of the BSS tower was performed to determine its load capacity and failure mode; and a system-level ‘dry’ test of the improved OCS prototype was conducted to determine the limit state and failure mode of the entire OCS. In the component-level compression test of the BSS tower, the compressive capacity was determined to be 102 kips, and the failure mode was localized buckling in the legs of the tower section. During system-level testing, the prototype performed well up to 40.5 kips of simulated drag load, which corresponds to a uniformly distributed current velocity of 10.7 ft/s. If a more realistic, less conservative parabolic velocity distribution is assumed instead, the drag load for an 11 ft/s current is 21.1 kips. Under this assumption, the improved OCS prototype has a factor of safety of 1.9, based on a 689-ft crossing and 11-ft/s current. The OCS failed when one of the tower guy wires pulled out of the ground, causing the tower to overturn.
APA, Harvard, Vancouver, ISO, and other styles
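The abstract above reports a simulated drag capacity of 40.5 kips, a parabolic-profile demand of 21.1 kips for an 11 ft/s current, and a resulting factor of safety of 1.9. A quick check of that arithmetic, assuming (our reading, not stated verbatim in the report) that the factor of safety is the ratio of sustained load to demand:

```python
# Values taken from the abstract of the OCS structural evaluation report.
sustained_drag_load_kips = 40.5  # drag load the prototype carried in the dry test
parabolic_demand_kips = 21.1     # drag demand for an 11 ft/s parabolic current profile

factor_of_safety = sustained_drag_load_kips / parabolic_demand_kips
print(round(factor_of_safety, 1))  # matches the reported 1.9
```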
2

Kovacs, Austin. Axial Double-Ball Test Versus the Uniaxial Unconfined Compression Test for Measuring the Compressive Strength of Freshwater and Sea Ice. Fort Belvoir, VA: Defense Technical Information Center, December 1993. http://dx.doi.org/10.21236/ada277025.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Corona, Edmundo. Numerical Simulations of the Kolsky Compression Bar Test. Office of Scientific and Technical Information (OSTI), October 2015. http://dx.doi.org/10.2172/1226520.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Jadaan, Osama M., and Andrew A. Wereszczak. Effective Size Analysis of the Diametral Compression (Brazil) Test Specimen. Office of Scientific and Technical Information (OSTI), April 2009. http://dx.doi.org/10.2172/951944.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Dyer, S., A. Faburada, K. Gallavan, M. Hoogendyk, and P. Hui. Compression Strength and Drop Test Performance of XM232 Case Assemblies. Fort Belvoir, VA: Defense Technical Information Center, December 1995. http://dx.doi.org/10.21236/ada303269.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Shrestha, Som, Vishaldeep Sharma, and Omar Abdelaziz. Test Report #33: Compressor Calorimeter Test of R-410A Alternative: R-32/R-134a Mixture Using a Scroll Compressor. Office of Scientific and Technical Information (OSTI), February 2014. http://dx.doi.org/10.2172/1130959.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Tantawi, Sami. The Next Linear Collider Test Accelerator's RF Pulse Compression and Transmission Systems. Office of Scientific and Technical Information (OSTI), February 1999. http://dx.doi.org/10.2172/10192.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Shrestha, Som S., Vishaldeep Sharma, and Omar Abdelaziz. Compressor Calorimeter Test of R-410A Alternative: R-32/134a Mixture Using a Scroll Compressor. Office of Scientific and Technical Information (OSTI), February 2014. http://dx.doi.org/10.2172/1131502.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Croop, Harold C. Fabrication of Curved Graphite/Epoxy Compression Test Panels and Generation of Material Properties. Fort Belvoir, VA: Defense Technical Information Center, October 1985. http://dx.doi.org/10.21236/ada368444.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Baral, Aniruddha, Jeffrey Roesler, M. Ley, Shinhyu Kang, Loren Emerson, Zane Lloyd, Braden Boyd, and Marllon Cook. High-volume Fly Ash Concrete for Pavements Findings: Volume 1. Illinois Center for Transportation, September 2021. http://dx.doi.org/10.36501/0197-9191/21-030.

Full text
Abstract:
High-volume fly ash concrete (HVFAC) has improved durability and sustainability properties at a lower cost than conventional concrete, but its early-age properties like strength gain, setting time, and air entrainment can present challenges for application to concrete pavements. This research report helps with the implementation of HVFAC for pavement applications by providing guidelines for HVFAC mix design, testing protocols, and new tools for better quality control of HVFAC properties. Calorimeter tests were performed to evaluate the effects of fly ash sources, cement–fly ash interactions, chemical admixtures, and limestone replacement on the setting times and hydration reaction of HVFAC. To better target the initial air-entraining agent dosage for HVFAC, a calibration curve between air-entraining dosage for achieving 6% air content and fly ash foam index test has been developed. Further, a digital foam index test was developed to make this test more consistent across different labs and operators. For a more rapid prediction of hardened HVFAC properties, such as compressive strength, resistivity, and diffusion coefficient, an oxide-based particle model was developed. An HVFAC field test section was also constructed to demonstrate the implementation of a noncontact ultrasonic device for determining the final set time and ideal time to initiate saw cutting. Additionally, a maturity method was successfully implemented that estimates the in-place compressive strength of HVFAC through wireless thermal sensors. An HVFAC mix design procedure using the tools developed in this project such as the calorimeter test, foam index test, and particle-based model was proposed to assist engineers in implementing HVFAC pavements.
APA, Harvard, Vancouver, ISO, and other styles
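The maturity method mentioned at the end of this abstract estimates in-place compressive strength from logged concrete temperatures. The report's exact calibration is not reproduced here; as a generic sketch only, the standard Nurse-Saul maturity index (per ASTM C1074) with a hypothetical datum temperature and invented hourly sensor readings:

```python
def nurse_saul_maturity(temps_c, interval_hours=1.0, datum_c=0.0):
    """Nurse-Saul maturity index in degC-hours: the sum of (T - datum) * dt
    over the logged temperature history, clipped at the datum temperature."""
    return sum(max(t - datum_c, 0.0) * interval_hours for t in temps_c)

# Hypothetical hourly readings from an embedded wireless thermal sensor.
readings_c = [20.0, 25.0, 30.0, 28.0]
maturity = nurse_saul_maturity(readings_c)
print(maturity)  # degC-hours for these invented readings
```

The computed index would then be mapped to a compressive strength through a lab-established strength–maturity calibration curve for the specific mix.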