Doctoral dissertations on the topic "Text compression"
Consult the 50 best academic doctoral dissertations on the topic "Text compression".
Wilson, Timothy David. "Animation of text compression algorithms". Thesis, University of Canterbury. Computer Science, 1992. http://hdl.handle.net/10092/9570.
Branavan, Satchuthananthavale Rasiah Kuhan. "High compression rate text summarization". Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/44368.
Includes bibliographical references (p. 95-97).
This thesis focuses on methods for condensing large documents into highly concise summaries, achieving compression rates on par with human writers. While the need for such summaries in the current age of information overload is increasing, the desired compression rate has thus far been beyond the reach of automatic summarization systems. The potency of our summarization methods is due to their in-depth modelling of document content in a probabilistic framework. We explore two types of document representation that capture orthogonal aspects of text content. The first represents the semantic properties mentioned in a document in a hierarchical Bayesian model. This method is used to summarize thousands of consumer reviews by identifying the product properties mentioned by multiple reviewers. The second representation captures discourse properties, modelling the connections between different segments of a document. This discriminatively trained model is employed to generate tables of contents for books and lecture transcripts. The summarization methods presented here have been incorporated into large-scale practical systems that help users effectively access information online.
by Satchuthananthavale Rasiah Kuhan Branavan.
S.M.
Langiu, Alessio. "Optimal Parsing for dictionary text compression". Thesis, Paris Est, 2012. http://www.theses.fr/2012PEST1091/document.
Dictionary-based compression algorithms include a parsing strategy that transforms the input text into a sequence of dictionary phrases. For a given text this process is usually not unique and, for compression purposes, it makes sense to find a parsing that minimizes the final compression ratio. This is the parsing problem. An optimal parsing is a parsing strategy, or a parsing algorithm, that solves the parsing problem while taking into account all the constraints of a compression algorithm or of a class of homogeneous compression algorithms. Such constraints include, for instance, the dictionary itself, i.e. the dynamic set of available phrases, and how much a phrase weighs on the compressed text, i.e. the length of the codeword that represents the phrase, also called the cost of a dictionary pointer encoding. In more than thirty years of history of dictionary-based text compression, while plenty of algorithms, variants and extensions have appeared and the approach has become one of the most widely used in storage and communication, only a few optimal parsing algorithms have been presented. Many compression algorithms still lack an optimal parsing or, at least, a proof of optimality. This is because there has been no general model of the parsing problem that covers all dictionary-based algorithms, and because the existing optimal parsings work under overly restrictive hypotheses. This work focuses on the parsing problem and presents both a general model for dictionary-based text compression, called the Dictionary-Symbolwise theory, and a general parsing algorithm that is proved to be optimal under some realistic hypotheses.
This algorithm is called Dictionary-Symbolwise Flexible Parsing and covers almost all dictionary-based text compression algorithms, together with the large class of their variants in which the text is decomposed into a sequence of symbols and dictionary phrases. In this work we further consider the case of a free mixture of a dictionary compressor and a symbolwise compressor; our Dictionary-Symbolwise Flexible Parsing covers this case as well. We thus obtain an optimal parsing algorithm for dictionary-symbolwise compression whenever the dictionary is prefix-closed and the cost of encoding a dictionary pointer is variable. The symbolwise compressor can be any classical one that works in linear time, as many common variable-length encoders do. Our algorithm works under the assumption that a special graph, described in what follows, is well defined; even when this condition is not satisfied, the same method can be used to obtain almost-optimal parses. In detail, when the dictionary is LZ78-like, we show how to implement our algorithm in linear time; when the dictionary is LZ77-like, it can be implemented in O(n log n) time. Both implementations have O(n) space complexity. Although the main aim of this work is theoretical, some experimental results are included to illustrate the practical effect of parsing optimality on compression performance, with more detailed experiments in a dedicated appendix.
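The parsing problem described in this abstract can be pictured as a shortest-path computation: text positions are graph nodes, each dictionary phrase contributes an edge weighted by its codeword cost, and an optimal parsing is a minimum-cost path from position 0 to position n. The sketch below is a generic illustration of that idea for a static dictionary and a user-supplied cost function, not the Dictionary-Symbolwise Flexible Parsing algorithm itself; all names are illustrative.

```python
# Hypothetical sketch: optimal parsing as a shortest path over text positions.
# Nodes are positions 0..n; an edge (i, j) exists when text[i:j] is a
# dictionary phrase, weighted by the cost of that phrase's codeword.

def optimal_parse(text, dictionary, cost):
    """Return a minimum-cost sequence of dictionary phrases covering `text`.

    `dictionary` is a set of phrases; `cost(phrase)` is its codeword length.
    Simple dynamic program over a DAG of text positions.
    """
    n = len(text)
    INF = float("inf")
    best = [INF] * (n + 1)   # best[i] = min cost to encode text[:i]
    back = [None] * (n + 1)  # back-pointer to recover the parse
    best[0] = 0
    max_len = max(map(len, dictionary))
    for i in range(n):
        if best[i] == INF:
            continue
        for j in range(i + 1, min(n, i + max_len) + 1):
            phrase = text[i:j]
            if phrase in dictionary:
                c = best[i] + cost(phrase)
                if c < best[j]:
                    best[j], back[j] = c, i
    if best[n] == INF:
        return None  # text cannot be covered by this dictionary
    # Recover the phrase sequence from the back-pointers.
    parse, j = [], n
    while j > 0:
        i = back[j]
        parse.append(text[i:j])
        j = i
    return parse[::-1]

dictionary = {"a", "b", "ab", "ba", "aba"}
print(optimal_parse("ababa", dictionary, cost=lambda p: 1))  # → ['ab', 'aba']
```

With a unit cost per codeword this is the classical "fewest phrases" parsing; a variable-cost `cost` function models the variable-length dictionary pointers the abstract mentions.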
Ong, Ghim Hwee. "Text compression for transmission and storage". Thesis, Loughborough University, 1989. https://dspace.lboro.ac.uk/2134/13790.
Jones, Greg 1963-2017. "RADIX 95n: Binary-to-Text Data Conversion". Thesis, University of North Texas, 1991. https://digital.library.unt.edu/ark:/67531/metadc500582/.
He, Meng. "Indexing Compressed Text". Thesis, University of Waterloo, 2003. http://hdl.handle.net/10012/1143.
Blandon, Julio Cesar. "A novel lossless compression technique for text data". FIU Digital Commons, 1999. http://digitalcommons.fiu.edu/etd/1694.
Thaper, Nitin 1975. "Using compression for source-based classification of text". Thesis, Massachusetts Institute of Technology, 2001. http://hdl.handle.net/1721.1/86595.
Zhang, Nan. "TRANSFORM BASED AND SEARCH AWARE TEXT COMPRESSION SCHEMES AND COMPRESSED DOMAIN TEXT RETRIEVAL". Doctoral diss., University of Central Florida, 2005. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3938.
Ph.D.
School of Computer Science
Engineering and Computer Science
Computer Science
Linhares, Pontes Elvys. "Compressive Cross-Language Text Summarization". Thesis, Avignon, 2018. http://www.theses.fr/2018AVIG0232/document.
The popularization of social networks and digital documents has quickly increased the information available on the Internet. However, this huge amount of data cannot be analyzed manually. Natural Language Processing (NLP) analyzes the interactions between computers and human languages in order to process and analyze natural language data. NLP techniques incorporate a variety of methods, including linguistics, semantics and statistics, to extract entities and relationships and to understand a document. Among several NLP applications, we are interested, in this thesis, in cross-language text summarization, which produces a summary in a language different from the language of the source documents. We also analyzed other NLP tasks (word encoding representation, semantic similarity, sentence and multi-sentence compression) to generate more stable and informative cross-lingual summaries. Most NLP applications (including all types of text summarization) use some kind of similarity measure to analyze and compare the meaning of words, chunks, sentences and texts. A way to analyze this similarity is to generate a representation of these sentences that contains their meaning. The meaning of sentences is defined by several elements, such as the context of words and expressions, the order of words and the previous information. Simple metrics, such as the cosine metric and the Euclidean distance, provide a measure of similarity between two sentences; however, they do not analyze the order of words or multi-word expressions. Analyzing these problems, we propose a neural network model that combines recurrent and convolutional neural networks to estimate the semantic similarity of a pair of sentences (or texts) based on the local and general contexts of words.
Our model predicted better similarity scores than the baselines by better analyzing the local and general meanings of words and multi-word expressions. In order to remove redundancies and non-relevant information from similar sentences, we propose a multi-sentence compression method that compresses similar sentences by fusing them into correct, short compressions that retain the main information of the similar sentences. We model clusters of similar sentences as word graphs. Then, we apply an integer linear programming model that guides the compression of these clusters based on a list of keywords: we look for a path in the word graph that has good cohesion and contains as many keywords as possible. Our approach outperformed baselines by generating more informative and correct compressions for French, Portuguese and Spanish. Finally, we combine these methods to build a cross-language text summarization system. Our system is an {English, French, Portuguese, Spanish}-to-{English, French} cross-language text summarization framework that analyzes the information in both languages to identify the most relevant sentences. Inspired by compressive text summarization methods in monolingual analysis, we adapt our multi-sentence compression method to this problem to keep just the main information. Our system proves to be a good alternative for compressing redundant information while preserving relevant information, and it improves informativeness scores without losing grammatical quality for French-to-English cross-lingual summaries. Analyzing {English, French, Portuguese, Spanish}-to-{English, French} cross-lingual summaries, our system significantly outperforms state-of-the-art extractive baselines for all these languages. In addition, we analyze the cross-language summarization of transcript documents, where our approach achieved better and more stable scores even for documents with grammatical errors and missing information.
Carlsson, Yvonne. "Genericitet i text". Doctoral thesis, Stockholms universitet, Institutionen för nordiska språk, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-81330.
Bell, Timothy. "A unifying theory and improvements for existing approaches to text compression". Thesis, University of Canterbury. Computer Science, 1986. http://hdl.handle.net/10092/8411.
Martin, Wickus. "A lossy, dictionary-based method for short message service (SMS) text compression". Master's thesis, University of Cape Town, 2009. http://hdl.handle.net/11427/6415.
Tao, Tao. "COMPRESSED PATTERN MATCHING FOR TEXT AND IMAGES". Doctoral diss., University of Central Florida, 2005. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2739.
Ph.D.
School of Computer Science
Engineering and Computer Science
Computer Science
Matsubara, Shigeki, Yoshihide Kato and Seiji Egawa. "Sentence Compression by Removing Recursive Structure from Parse Tree". Springer, 2008. http://hdl.handle.net/2237/15113.
Hertz, David. "Secure Text Communication for the Tiger XS". Thesis, Linköping University, Department of Electrical Engineering, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-8011.
The option of communicating via SMS messages can be considered available in all GSM networks. It therefore constitutes an almost universally available method for mobile communication.
The Tiger XS, a device for secure communication manufactured by Sectra, is equipped with an encrypted text message transmission system. As the text message service of this device is becoming increasingly popular and as options to connect the Tiger XS to computers or to a keyboard are being researched, the text message service is in need of upgrade.
This thesis proposes amendments to the existing protocol structure. It thoroughly examines a number of options for source coding of small text messages and makes recommendations on implementing such features. It also suggests security enhancements and introduces a novel form of steganography.
Young, David A. "Compression of Endpoint Identifiers in Delay Tolerant Networking". Ohio University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1385559406.
Gardner-Stephen, Paul Mark. "Explorations In Searching Compressed Nucleic Acid And Protein Sequence Databases And Their Cooperatively-Compressed Indices". Flinders University. Computer Science, Engineering & Mathematics, 2008. http://catalogue.flinders.edu.au./local/adt/public/adt-SFU20081111.105047.
Tam, Wai I. "Compression, indexing and searching of a large structured-text database in a library monitoring and control system (LiMaCS)". Thesis, University of Macau, 1998. http://umaclib3.umac.mo/record=b1636991.
Malatji, Promise Tshepiso. "The development of accented English synthetic voices". Thesis, University of Limpopo, 2019. http://hdl.handle.net/10386/2917.
A Text-to-speech (TTS) synthesis system is a software system that receives text as input and produces speech as output. A TTS synthesis system can be used for, amongst others, language learning and reading out text for people living with different disabilities (physically challenged, visually impaired, etc.), by native and non-native speakers of the target language. Most people relate easily to a second language spoken by a non-native speaker with whom they share a native language. Most online English TTS synthesis systems are developed using native speakers of English. This research study focuses on developing accented English synthetic voices as spoken by non-native speakers in the Limpopo province of South Africa. The Modular Architecture for Research on speech sYnthesis (MARY) TTS engine is used in developing the synthetic voices, and the Hidden Markov Model (HMM) method was used to train them. A secondary training text corpus is used to develop the training speech corpus by recording six speakers reading the text corpus. The quality of the developed synthetic voices is measured in terms of their intelligibility, similarity and naturalness using a listening test. The results are classified based on the evaluators' occupation and gender, alongside the overall results. The subjective listening test indicates that the developed synthetic voices have a high level of acceptance in terms of similarity and intelligibility. Speech analysis software is used to compare the synthesised speech with the human recordings; there is no significant difference in voice pitch between the speakers and the synthetic voices, except for one synthetic voice.
Borggren, Lukas. "Automatic Categorization of News Articles With Contextualized Language Models". Thesis, Linköpings universitet, Artificiell intelligens och integrerade datorsystem, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177004.
Shang, Guokan. "Spoken Language Understanding for Abstractive Meeting Summarization Unsupervised Abstractive Meeting Summarization with Multi-Sentence Compression and Budgeted Submodular Maximization. Energy-based Self-attentive Learning of Abstractive Communities for Spoken Language Understanding Speaker-change Aware CRF for Dialogue Act Classification". Thesis, Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAX011.
With the impressive progress that has been made in transcribing spoken language, it is becoming increasingly possible to exploit transcribed data for tasks that require comprehension of what is said in a conversation. The work in this dissertation, carried out in the context of a project devoted to the development of a meeting assistant, contributes to ongoing efforts to teach machines to understand multi-party meeting speech. We have focused on the challenge of automatically generating abstractive meeting summaries. We first present our results on Abstractive Meeting Summarization (AMS), which aims to take a meeting transcription as input and produce an abstractive summary as output. We introduce a fully unsupervised framework for this task based on multi-sentence compression and budgeted submodular maximization. We also leverage recent advances in word embeddings and graph degeneracy applied to NLP, to take exterior semantic knowledge into account and to design custom diversity and informativeness measures. Next, we discuss our work on Dialogue Act Classification (DAC), whose goal is to assign each utterance in a discourse a label that represents its communicative intention. DAC yields annotations that are useful for a wide variety of tasks, including AMS. We propose a modified neural Conditional Random Field (CRF) layer that takes into account not only the sequence of utterances in a discourse, but also speaker information and, in particular, whether there has been a change of speaker from one utterance to the next. The third part of the dissertation focuses on Abstractive Community Detection (ACD), a sub-task of AMS, in which utterances in a conversation are grouped according to whether they can be jointly summarized by a common abstractive sentence.
We provide a novel approach to ACD in which we first introduce a neural contextual utterance encoder featuring three types of self-attention mechanisms and then train it using the siamese and triplet energy-based meta-architectures. We further propose a general sampling scheme that enables the triplet architecture to capture subtle patterns (e.g., overlapping and nested clusters).
Sjöstrand, Björn. "Evaluation of Compression Testing and Compression Failure Modes of Paperboard : Video analysis of paperboard during short-span compression and the suitability of short- and long-span compression testing of paperboard". Thesis, Karlstads universitet, Institutionen för ingenjörs- och kemivetenskaper (from 2013), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-27519.
Gattis, Sherri L. "Ruggedized Television Compression Equipment for Test Range Systems". International Foundation for Telemetering, 1988. http://hdl.handle.net/10150/615062.
The Wideband Data Protection Program arose from the need to develop digitized, compressed video to enable encryption.
Jas, Abhijit. "Test vector compression techniques for systems-on-chip /". Full text (PDF) from UMI/Dissertation Abstracts International, 2001. http://wwwlib.umi.com/cr/utexas/fullcit?p3008359.
Rocher, Tatiana. "Compression et indexation de séquences annotées". Thesis, Lille 1, 2018. http://www.theses.fr/2018LIL1I004/document.
This thesis in text algorithmics studies the compression, indexing and querying of labeled texts. A labeled text is a text to which we add information. For example, in a V(D)J recombination, a marker for lymphocytes, the text is a DNA sequence and the labels are gene names. A person's immune system can be represented by a set of V(D)J recombinations. With high-throughput sequencing, we have access to millions of V(D)J recombinations, which are stored and need to be retrieved and compared quickly. The first contribution of this manuscript is a compression method for labeled texts that uses the concept of storage by references: the text is divided into sections which point to pre-established labeled sequences. The second contribution offers two indexes for a labeled text. Both use a Burrows-Wheeler transform to index the text and a wavelet tree to index the labels. These indexes allow efficient queries on the text, the labels, or both. We would like to use one of these indexes on the V(D)J recombinations obtained by hematology services during the diagnosis and follow-up of patients suffering from leukemia.
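As a miniature of the two ingredients the abstract names, the sketch below computes a Burrows-Wheeler transform by rotation sorting and answers counting queries with FM-index-style backward search (the wavelet tree over labels is omitted, and the naive construction is only workable for short strings). Function names are illustrative, not taken from the thesis.

```python
# Hypothetical sketch: BWT construction plus backward-search counting.

def bwt(text):
    """Burrows-Wheeler transform via sorted rotations ('$' terminates)."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def count_occurrences(bwt_string, pattern):
    """Count occurrences of `pattern` via backward search over the BWT."""
    first_col = sorted(bwt_string)
    # C[c] = number of characters in the text strictly smaller than c
    C = {c: first_col.index(c) for c in set(bwt_string)}

    def rank(c, i):  # occurrences of c in bwt_string[:i]
        return bwt_string[:i].count(c)

    lo, hi = 0, len(bwt_string)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + rank(c, lo)
        hi = C[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

print(bwt("banana"))                              # → annb$aa
print(count_occurrences(bwt("banana"), "ana"))    # → 2 (overlaps counted)
```

A practical index replaces the linear-time `rank` with a wavelet tree or sampled occurrence table, which is exactly where the thesis's label index plugs in.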
Langiu, Alessio. "Parsing optimal pour la compression du texte par dictionnaire". Phd thesis, Université Paris-Est, 2012. http://tel.archives-ouvertes.fr/tel-00804215.
Navickas, T. A., and S. G. Jones. "PULSE CODE MODULATION DATA COMPRESSION FOR AUTOMATED TEST EQUIPMENT". International Foundation for Telemetering, 1991. http://hdl.handle.net/10150/612065.
Development of automated test equipment for an advanced telemetry system requires continuous monitoring of PCM data while exercising telemetry inputs. This requirement leads to a large amount of data that needs to be stored and later analyzed. For example, a data stream of 4 Mbits/s and a test time of thirty minutes would yield 900 Mbytes of raw data. Along with this raw data, information needs to be stored to correlate the raw data to the test stimulus, giving a total of 1.8 GB of data to be stored and analyzed. There is no method to analyze this amount of data in a reasonable time, so a data compression method is needed to reduce the amount of data collected to a reasonable amount. The solution to the problem was data reduction, accomplished by real-time limit checking, time stamping, and smart software. Limit checking was accomplished by an eight-state finite state machine and four compression algorithms. Time stamping was needed to correlate stimulus to the appropriate output for data reconstruction. The software was written in the C programming language with a DOS extender used to allow it to run in extended mode. A 94-98% compression in the amount of data gathered was accomplished using this method.
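The limit-checking idea above can be illustrated in a few lines: rather than storing every PCM sample, record a time-stamped event only when a channel leaves or re-enters its limits. This is a hypothetical simplification of the eight-state machine described in the abstract; the names and event format are made up for the example.

```python
# Hypothetical sketch of limit-check data reduction: keep only
# time-stamped limit-crossing events instead of the raw sample stream.

def limit_check_compress(samples, low, high):
    """Reduce a sample stream to time-stamped limit-crossing events."""
    events, in_range = [], True
    for t, value in enumerate(samples):
        ok = low <= value <= high
        if ok != in_range:  # state change: record timestamp and value
            events.append((t, value, "in" if ok else "out"))
            in_range = ok
    return events

stream = [5, 6, 7, 12, 13, 6, 5, 4, 15]
print(limit_check_compress(stream, low=0, high=10))
# → [(3, 12, 'out'), (5, 6, 'in'), (8, 15, 'out')]  (9 samples -> 3 events)
```

The time stamps preserve exactly the stimulus-to-output correlation the abstract says is needed for reconstruction.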
Poirier, Régis. "Compression de données pour le test des circuits intégrés". Montpellier 2, 2004. http://www.theses.fr/2004MON20119.
Khayat, Moghaddam Elham. "On low power test and low power compression techniques". Diss., University of Iowa, 2011. https://ir.uiowa.edu/etd/997.
Zacharia, Nadime. "Compression and decompression of test data for scan-based designs". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape11/PQDD_0004/MQ44048.pdf.
Zacharia, Nadime. "Compression and decompression of test data for scan based designs". Thesis, McGill University, 1996. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=20218.
The design of the decompression unit is treated in depth and a design is proposed that minimizes the amount of extra hardware required. In fact, the design of the decompression unit uses flip-flops already on the chip: it is implemented without inserting any additional flip-flops.
The proposed scheme is applied in two different contexts: (1) in (external) deterministic-stored testing, to reduce the memory requirements imposed on the test equipment; and (2) in built-in self test, to design a test pattern generator capable of generating deterministic patterns with modest area and memory requirements.
Experimental results are provided for the largest ISCAS'89 benchmarks. All of these results show that the proposed technique greatly reduces the amount of test data while requiring little area overhead. Compression factors of more than 20 are reported for some circuits.
Pateras, Stephen. "Correlated and cube-contained random patterns : test set compression techniques". Thesis, McGill University, 1991. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=70300.
The concepts of correlated and cube-contained random patterns can be viewed as methods to compress a deterministic test set into a small amount of information which is then used to control the generation of a superset of the deterministic test set. The goal is to make this superset as small as possible while maintaining its containment of the original test set. The two concepts are meant to be used in either a Built-In Self-Test (BIST) environment or with an external tester when the storage requirements of a deterministic test are too large.
Experimental results show that both correlated and cube-contained random patterns can achieve 100% fault coverage of synthesized circuits using orders of magnitude fewer patterns than when equiprobable random patterns are used.
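A minimal sketch of cube containment, under assumptions simplified from the abstract: a deterministic test cube fixes some bits and leaves the rest as don't-cares, and random patterns are drawn so that every pattern contains the cube; only the cube (plus a generator seed) needs storing rather than the full patterns. The helper below is illustrative, not the thesis's generator.

```python
import random

# Hypothetical illustration of "cube-contained random patterns": '0'/'1'
# positions of a test cube are fixed, 'x' (don't-care) positions are
# randomized, so every generated pattern contains the cube by construction.

def expand_cube(cube, rng):
    """Randomize the don't-care positions of a test cube."""
    return "".join(bit if bit in "01" else rng.choice("01") for bit in cube)

rng = random.Random(0)           # a stored seed regenerates the same set
cube = "1xx0x1"                  # deterministic cube with 3 care bits
patterns = [expand_cube(cube, rng) for _ in range(4)]
for p in patterns:
    # every pattern contains the cube: care bits survive unchanged
    assert p[0] == "1" and p[3] == "0" and p[5] == "1"
```

The compression comes from storing the cube and seed instead of the patterns; correlating or weighting the random bits, as the thesis does, shrinks the required superset further.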
Dalmasso, Julien. "Compression de données de test pour architecture de systèmes intégrés basée sur bus ou réseaux et réduction des coûts de test". Thesis, Montpellier 2, 2010. http://www.theses.fr/2010MON20061/document.
While microelectronic systems become more and more complex, test costs have increased in the same way. Recent years have seen many works focused on test cost reduction through test data compression. However, these techniques target individual digital circuits whose structural implementation (netlist) is fully known by the designer; they are therefore not suitable for testing the cores of a complete system. The goal of this PhD work was to provide a new solution for test data compression of integrated circuits that takes into account the paradigm of systems-on-chip (SoC) built from pre-synthesized functions (IPs or cores). Two system testing methods using compression are proposed for two different system architectures: the first concerns SoCs with an IEEE 1500 test architecture (with a bus-based test access mechanism), the second concerns NoC-based systems. Both techniques combine test scheduling with test data compression for better exploration of the design space; the idea is to increase test parallelism at no extra hardware cost. Experimental results on system-on-chip benchmarks show that the use of test data compression leads to a test time reduction of about 50% at the system level.
Willis, Stephen, and Bernd Langer. "A Duel Compression Ethernet Camera Solution for Airborne Applications". International Foundation for Telemetering, 2014. http://hdl.handle.net/10150/577522.
Camera technology is now ubiquitous, with smartphones, laptops, automotive and industrial applications frequently utilizing high-resolution imaging sensors. There is an increasing demand for high-definition cameras in the aerospace market; however, such cameras must meet several requirements that do not apply to average consumer use, including high reliability and ruggedization for harsh environments. A significant issue is managing the large volumes of data that one or more HD cameras produce. One method of addressing this issue is to use compression algorithms that reduce video bandwidth. This can be achieved with dedicated compression units or modules within data acquisition systems. For flight test applications it is important that data from cameras is available for telemetry and coherently synchronized, while also being available for storage. Ideally the data in the telemetry stream should be highly compressed to preserve downlink bandwidth, while the recorded data is lightly compressed to provide maximum quality for onboard/post-flight analysis. This paper discusses the requirements for airborne applications and presents an innovative solution using Ethernet cameras with integrated compression that output two streams of data. This removes the need for dedicated video and compression units while offering all their features, including switching camera sources and optimized video streams.
Limprasert, Tawan. "Behaviour of soil, soil-cement and soil-cement-fiber under multiaxial test". Ohio University / OhioLINK, 1995. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1179260769.
Junior, Célio Anderson da Silva. "Avaliação das propriedades mecânicas de ossos de coelhas submetidas à administração de glicocorticóides". Universidade de São Paulo, 2003. http://www.teses.usp.br/teses/disponiveis/82/82131/tde-11122003-143837/.
Corticosteroids are used in many clinical conditions because they have strong anti-inflammatory and immunosuppressive activities. At the same time, however, they can cause many metabolic alterations and side effects, mainly with prolonged use. In the present research we studied the possible alterations caused by steroids on the mechanical properties of lamellar and trabecular bone of rabbits. The mechanical properties were assessed by bending tests performed on intact femurs and tibiae as well as on samples of cortical bone; compression tests were performed on the L5 vertebra. Thirty-seven female rabbits were randomly distributed into an experimental group (EG animals) and a control group (CG animals). These groups were divided into four subgroups: two experimental and two control. The experimental animals received 2 mg/kg/day of methylprednisolone (Solumedrol®) for three weeks. The control animals received the same volume of intramuscular injections of saline, once a day, for three weeks. From the load-deformation curves, the load and deflection at the yield point were obtained. The ultimate load and the resilience were also obtained for the intact bones. When the specimens were analysed, the ultimate tension was determined. The statistical analyses did not show any difference in the mechanical properties between treated and untreated animals, but the treated animals showed a significant loss of body weight. We feel that these results call for deeper investigation.
Lancel, Jerome. "Analysis and test of a centrifugal compressor". Thesis, University of Sussex, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.250183.
Chen, Liang-Chih, and 陳良智. "On Text Compression Algorithms". Thesis, 1997. http://ndltd.ncl.edu.tw/handle/68972127981738088023.
National Tsing Hua University
Institute of Information Science
85
The purpose of this thesis is to make a comprehensive survey of text compression and to find an optimal algorithm for it. There have been some famous algorithms in this realm. How these algorithms work and why they compress data well are the two topics we are most concerned with, so we first survey these algorithms and analyze their strengths and limitations from a theoretical viewpoint. Then the performance of the algorithms is evaluated by experiments. Finally, context modelling is suggested to further improve the compression ratio; with it, almost all of the redundancy in the source messages is, to our understanding, exhausted. A new algorithm using context modelling is proposed and evaluated.
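The benefit of the context modelling mentioned above can be made concrete with a toy order-1 model: conditioning each symbol on its predecessor typically lowers the total code length an entropy coder would need, compared with an order-0 model. The sketch below only computes those ideal code lengths; it is not a full compressor, and the function names are invented for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch: ideal code lengths (in bits) under an order-0 model
# versus an order-1 context model, as an entropy coder such as arithmetic
# coding would approach them.

def code_length_order0(text):
    counts = Counter(text)
    n = len(text)
    return -sum(c * math.log2(c / n) for c in counts.values())

def code_length_order1(text):
    contexts = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        contexts[prev][cur] += 1
    bits = 0.0
    for ctx_counts in contexts.values():
        total = sum(ctx_counts.values())
        bits -= sum(c * math.log2(c / total) for c in ctx_counts.values())
    return bits

text = "abababababcabababab"  # strongly predictable from the previous char
print(code_length_order0(text), code_length_order1(text))
```

On text with strong local structure the order-1 total is far smaller, which is exactly the redundancy a context-modelling compressor exploits (real coders must also pay for transmitting or adaptively learning the model).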
"Text compression for Chinese documents". Chinese University of Hong Kong, 1995. http://library.cuhk.edu.hk/record=b5888571.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1995.
Includes bibliographical references (leaves 133-137).
Abstract --- p.i
Acknowledgement --- p.iii
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Importance of Text Compression --- p.1
Chapter 1.2 --- Historical Background of Data Compression --- p.2
Chapter 1.3 --- The Essences of Data Compression --- p.4
Chapter 1.4 --- Motivation and Objectives of the Project --- p.5
Chapter 1.5 --- Definition of Important Terms --- p.6
Chapter 1.5.1 --- Data Models --- p.6
Chapter 1.5.2 --- Entropy --- p.10
Chapter 1.5.3 --- Statistical and Dictionary-based Compression --- p.12
Chapter 1.5.4 --- Static and Adaptive Modelling --- p.12
Chapter 1.5.5 --- One-Pass and Two-Pass Modelling --- p.13
Chapter 1.6 --- Benchmarks and Measurements of Results --- p.15
Chapter 1.7 --- Sources of Testing Data --- p.16
Chapter 1.8 --- Outline of the Thesis --- p.16
Chapter 2 --- Literature Survey --- p.18
Chapter 2.1 --- Data compression Algorithms --- p.18
Chapter 2.1.1 --- Statistical Compression Methods --- p.18
Chapter 2.1.2 --- Dictionary-based Compression Methods (Ziv-Lempel Family) --- p.23
Chapter 2.2 --- Cascading of Algorithms --- p.33
Chapter 2.3 --- Problems of Current Compression Programs on Chinese --- p.34
Chapter 2.4 --- Previous Chinese Data Compression Literatures --- p.37
Chapter 3 --- Chinese-related Issues --- p.38
Chapter 3.1 --- Characteristics in Chinese Data Compression --- p.38
Chapter 3.1.1 --- Large and Not Fixed Size Character Set --- p.38
Chapter 3.1.2 --- Lack of Word Segmentation --- p.40
Chapter 3.1.3 --- Rich Semantic Meaning of Chinese Characters --- p.40
Chapter 3.1.4 --- Grammatical Variance of Chinese Language --- p.41
Chapter 3.2 --- Definition of Different Coding Schemes --- p.41
Chapter 3.2.1 --- Big5 Code --- p.42
Chapter 3.2.2 --- GB (Guo Biao) Code --- p.43
Chapter 3.2.3 --- Unicode --- p.44
Chapter 3.2.4 --- HZ (Hanzi) Code --- p.45
Chapter 3.3 --- Entropy of Chinese and Other Languages --- p.45
Chapter 4 --- Huffman Coding on Chinese Text --- p.49
Chapter 4.1 --- The use of the Chinese Character Identification Routine --- p.50
Chapter 4.2 --- Result --- p.51
Chapter 4.3 --- Justification of the Result --- p.53
Chapter 4.4 --- Time and Memory Resources Analysis --- p.58
Chapter 4.5 --- The Heuristic Order-n Huffman Coding for Chinese Text Compression --- p.61
Chapter 4.5.1 --- The Algorithm --- p.62
Chapter 4.5.2 --- Result --- p.63
Chapter 4.5.3 --- Justification of the Result --- p.64
Chapter 4.6 --- Chapter Conclusion --- p.66
Chapter 5 --- The Ziv-Lempel Compression on Chinese Text --- p.67
Chapter 5.1 --- The Chinese LZSS Compression --- p.68
Chapter 5.1.1 --- The Algorithm --- p.69
Chapter 5.1.2 --- Result --- p.73
Chapter 5.1.3 --- Justification of the Result --- p.74
Chapter 5.1.4 --- Time and Memory Resources Analysis --- p.80
Chapter 5.1.5 --- Effects in Controlling the Parameters --- p.81
Chapter 5.2 --- The Chinese LZW Compression --- p.92
Chapter 5.2.1 --- The Algorithm --- p.92
Chapter 5.2.2 --- Result --- p.94
Chapter 5.2.3 --- Justification of the Result --- p.95
Chapter 5.2.4 --- Time and Memory Resources Analysis --- p.97
Chapter 5.2.5 --- Effects in Controlling the Parameters --- p.98
Chapter 5.3 --- A Comparison of the performance of the LZSS and the LZW --- p.100
Chapter 5.4 --- Chapter Conclusion --- p.101
Chapter 6 --- Chinese Dictionary-based Huffman coding --- p.103
Chapter 6.1 --- The Algorithm --- p.104
Chapter 6.2 --- Result --- p.107
Chapter 6.3 --- Justification of the Result --- p.108
Chapter 6.4 --- Effects of Changing the Size of the Dictionary --- p.111
Chapter 6.5 --- Chapter Conclusion --- p.114
Chapter 7 --- Cascading of Huffman coding and LZW compression --- p.116
Chapter 7.1 --- Static Cascading Model --- p.117
Chapter 7.1.1 --- The Algorithm --- p.117
Chapter 7.1.2 --- Result --- p.120
Chapter 7.1.3 --- Explanation and Analysis of the Result --- p.121
Chapter 7.2 --- Adaptive (Dynamic) Cascading Model --- p.125
Chapter 7.2.1 --- The Algorithm --- p.125
Chapter 7.2.2 --- Result --- p.126
Chapter 7.2.3 --- Explanation and Analysis of the Result --- p.127
Chapter 7.3 --- Chapter Conclusion --- p.128
Chapter 8 --- Concluding Remarks --- p.129
Chapter 8.1 --- Conclusion --- p.129
Chapter 8.2 --- Future Work Direction --- p.130
Chapter 8.2.1 --- Improvement in Efficiency and Resources Consumption --- p.130
Chapter 8.2.2 --- The Compressibility of Chinese and Other Languages --- p.131
Chapter 8.2.3 --- Use of Grammar Model --- p.131
Chapter 8.2.4 --- Lossy Compression --- p.131
Chapter 8.3 --- Epilogue --- p.132
Bibliography --- p.133
Liou, Chin-yuan, and 劉欽源. "Text compression schemes : a comparison". Thesis, 1993. http://ndltd.ncl.edu.tw/handle/77425729174310677189.
Perera, Paththamestrige. "Syntactic Sentence Compression for Text Summarization". Thesis, 2013. http://spectrum.library.concordia.ca/977725/1/Paththamestrige_MSc_F2013.pdf.
Zhang, Xiaoxi. "Efficient Parallel Text Compression on GPUs". Thesis, 2011. http://hdl.handle.net/1969.1/ETD-TAMU-2011-12-10308.
Langiu, Alessio. "Optimal Parsing for Dictionary Text Compression". Doctoral thesis, 2012. http://hdl.handle.net/10447/94651.
Ye, Yan. "Text image compression based on pattern matching". Diss., 2002. http://wwwlib.umi.com/cr/ucsd/fullcit?p3036946.
"Context-based compression algorithms for text and image data". 1997. http://library.cuhk.edu.hk/record=b5889317.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1997.
Includes bibliographical references (leaves 80-85).
ABSTRACT --- p.1
Chapter 1. --- INTRODUCTION --- p.2
Chapter 1.1 --- Motivation --- p.4
Chapter 1.2 --- Original Contributions --- p.5
Chapter 1.3 --- Thesis Structure --- p.5
Chapter 2. --- BACKGROUND --- p.7
Chapter 2.1 --- Information Theory --- p.7
Chapter 2.2 --- Early Compression --- p.8
Chapter 2.2.1 --- Some Source Codes --- p.10
Chapter 2.2.1.1 --- Huffman Code --- p.10
Chapter 2.2.1.2 --- Tunstall Code --- p.10
Chapter 2.2.1.3 --- Arithmetic Code --- p.11
Chapter 2.3 --- Modern Techniques for Compression --- p.14
Chapter 2.3.1 --- Statistical Modeling --- p.14
Chapter 2.3.1.1 --- Context Modeling --- p.15
Chapter 2.3.1.2 --- State Based Modeling --- p.17
Chapter 2.3.2 --- Dictionary Based Compression --- p.17
Chapter 2.3.2.1 --- LZ-compression --- p.19
Chapter 2.3.3 --- Other Compression Techniques --- p.20
Chapter 2.3.3.1 --- Block Sorting --- p.20
Chapter 2.3.3.2 --- Context Tree Weighting --- p.21
Chapter 3. --- SYMBOL REMAPPING --- p.22
Chapter 3.1 --- Reviews on Block Sorting --- p.22
Chapter 3.1.1 --- Forward Transformation --- p.23
Chapter 3.1.2 --- Inverse Transformation --- p.24
Chapter 3.2 --- Ordering Method --- p.25
Chapter 3.3 --- Discussions --- p.27
Chapter 4. --- CONTENT PREDICTION --- p.29
Chapter 4.1 --- Prediction and Ranking Schemes --- p.29
Chapter 4.1.1 --- Content Predictor --- p.29
Chapter 4.1.2 --- Ranking Technique --- p.30
Chapter 4.2 --- Reviews on Context Sorting --- p.31
Chapter 4.2.1 --- Context Sorting Basis --- p.31
Chapter 4.3 --- General Framework of Content Prediction --- p.31
Chapter 4.3.1 --- A Baseline Version --- p.32
Chapter 4.3.2 --- Context Length Merge --- p.34
Chapter 4.4 --- Discussions --- p.36
Chapter 5. --- BOUNDED-LENGTH BLOCK SORTING --- p.38
Chapter 5.1 --- Block Sorting with Bounded Context Length --- p.38
Chapter 5.1.1 --- Forward Transformation --- p.38
Chapter 5.1.2 --- Reverse Transformation --- p.39
Chapter 5.2 --- Locally Adaptive Entropy Coding --- p.43
Chapter 5.3 --- Discussion --- p.45
Chapter 6. --- CONTEXT CODING FOR IMAGE DATA --- p.47
Chapter 6.1 --- Digital Images --- p.47
Chapter 6.1.1 --- Redundancy --- p.48
Chapter 6.2 --- Model of a Compression System --- p.49
Chapter 6.2.1 --- Representation --- p.49
Chapter 6.2.2 --- Quantization --- p.50
Chapter 6.2.3 --- Lossless coding --- p.51
Chapter 6.3 --- The Embedded Zerotree Wavelet Coding --- p.51
Chapter 6.3.1 --- Simple Zerotree-like Implementation --- p.53
Chapter 6.3.2 --- Analysis of Zerotree Coding --- p.54
Chapter 6.3.2.1 --- Linkage between Coefficients --- p.55
Chapter 6.3.2.2 --- Design of Uniform Threshold Quantizer with Dead Zone --- p.58
Chapter 6.4 --- Extensions on Wavelet Coding --- p.59
Chapter 6.4.1 --- Coefficients Scanning --- p.60
Chapter 6.5 --- Discussions --- p.61
Chapter 7. --- CONCLUSIONS --- p.63
Chapter 7.1 --- Future Research --- p.64
APPENDIX --- p.65
Chapter A --- Lossless Compression Results --- p.65
Chapter B --- Image Compression Standards --- p.72
Chapter C --- Human Visual System Characteristics --- p.75
Chapter D --- Lossy Compression Results --- p.76
COMPRESSION GALLERY --- p.77
Context-based Wavelet Coding --- p.75
RD-OPT-based JPEG Compression --- p.76
SPIHT Wavelet Compression --- p.77
REFERENCES --- p.80
Wen, Chih-Ming, and 溫智旻. "Text Compression Using a Word-Based Large Alphabet". Thesis, 2005. http://ndltd.ncl.edu.tw/handle/74710510610120099923.
National Taiwan University of Science and Technology
Department of Computer Science and Information Engineering
93
In this thesis, some word-based large-alphabet text compression schemes are studied. After a word token is parsed from an English or Chinese text file, its occurrence probability is predicted with blended predictive models or with a partial-match predictive model. This probability is then encoded by the arithmetic-coding module. To improve the speed of our compression schemes, we have also studied data structures and corresponding processing methods suited to a large alphabet. The schemes studied here are implemented as executable programs that compress and decompress typical text files. Performance is compared between our program and other text compression programs such as GZIP, bzip2, and PPMd. Experimental results show that our program achieves a better compression rate than these programs on both Chinese and English text files. On average, the rate improvements are 17.02%, 5.48%, and 1.12% for Chinese text files, and 12.08%, 2.04%, and 0.29% for English text files, respectively.
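The word-based large-alphabet idea can be sketched roughly as follows; the tokenizer, the order-0 adaptive model, and the +1 smoothing here are illustrative assumptions, not the blended/partial-match models the thesis actually uses.

```python
import math
import re

def word_tokens(text):
    """Split text into alternating word and non-word tokens, so the
    alphabet is the set of distinct tokens rather than single characters."""
    return re.findall(r"\w+|\W+", text)

def adaptive_code_length(tokens):
    """Total bits an adaptive order-0 word model (with a crude +1 novel-token
    slot) would spend; an arithmetic coder can approach this total."""
    counts = {}
    seen = 0
    bits = 0.0
    for tok in tokens:
        vocab = len(counts) + 1                      # +1 slot for a novel token
        p = (counts.get(tok, 0) + 1) / (seen + vocab)
        bits += -math.log2(p)                        # ideal arithmetic-code cost
        counts[tok] = counts.get(tok, 0) + 1
        seen += 1
    return bits

text = "the cat sat on the mat because the cat liked the mat"
tokens = word_tokens(text)
bits = adaptive_code_length(tokens)
```

Because whole words repeat far more often than arbitrary character strings, repeated tokens quickly become cheap, which is the intuition behind treating words as the alphabet.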
Huang, Ping-Feng, and 黃品丰. "Recovering corrupted text files with dictionary-based compression". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/8npz55.
Feng Chia University
Department of Communications Engineering
106
The pace of technology development nowadays is getting faster and faster. New technologies create enormous amounts of data over the internet, and that data is growing exponentially. Storing all of it pushes for highly efficient data compression techniques, which make data transfer faster than ever and storage requirements smaller than ever. However, most data compression techniques concentrate on a high compression ratio rather than on recovering data from corrupted compressed files. A compressed file may fail to decompress correctly if even a few bits in it are corrupted, requiring a re-transmission or the retrieval of another copy from another storage device. To address the need to recover data from such files, we propose a segmented data compression technique. This technique lets us retrieve most of the uncorrupted data even when some contents of the file are lost. The trade-off of the proposed scheme is a less-than-optimal compression ratio in exchange for better protection of data files.
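A segmented scheme of the general kind proposed here can be sketched with per-segment zlib streams and explicit length headers; the segment size and framing below are our assumptions for illustration, not the thesis's actual format.

```python
import struct
import zlib

def compress_segmented(data: bytes, seg_size: int = 4096) -> bytes:
    """Compress each segment independently and prefix it with its
    compressed length, so one corrupted segment cannot cascade."""
    out = bytearray()
    for i in range(0, len(data), seg_size):
        block = zlib.compress(data[i:i + seg_size])
        out += struct.pack(">I", len(block)) + block
    return bytes(out)

def recover_segments(blob: bytes) -> list:
    """Decompress every segment that still decodes; mark corrupted ones."""
    recovered, pos = [], 0
    while pos + 4 <= len(blob):
        (length,) = struct.unpack(">I", blob[pos:pos + 4])
        chunk = blob[pos + 4:pos + 4 + length]
        try:
            recovered.append(zlib.decompress(chunk))
        except zlib.error:
            recovered.append(None)          # this segment is lost; others survive
        pos += 4 + length
    return recovered

data = b"text compression " * 1000
blob = bytearray(compress_segmented(data, seg_size=1024))
(length0,) = struct.unpack(">I", bytes(blob[:4]))
blob[4 + length0 // 2] ^= 0xFF              # flip a byte inside segment 0's payload
parts = recover_segments(bytes(blob))       # only segment 0 fails to decode
```

With a monolithic stream, the same single-byte error would typically destroy everything after the corruption point; here it costs one segment.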
Chuang, Yu-Ting, and 莊侑頲. "Text Detection in Color Images and Compound Document Compression". Thesis, 2003. http://ndltd.ncl.edu.tw/handle/91068801328009029959.
National Taiwan University
Graduate Institute of Communication Engineering
91
Abstract: With the growth of multimedia, news, magazines, Web pages, and the like are everywhere in our lives, and the text in these documents plays an important role when people need to understand the details of their downloaded data. Moreover, to help people who speak different languages easily understand the content of such data, text localization and translation in color images are becoming more and more important; clearly, good text translation can be achieved only if text regions are localized accurately. To achieve good translation performance, we propose a novel approach to detecting text in color images with a very low false-alarm rate. First, neural-network color quantization is used to compact the text colors. Second, 3D histogram analysis chooses several color candidates, and each candidate is extracted to obtain a bi-level image. For each extracted bi-level image, connected-component analysis and several morphological operators are applied to retain bounding boxes that are possible text regions. Finally, a LoG (Laplacian of Gaussian) edge detector verifies the accurate text regions among the candidates. In complex color images, multiple quantization layers can also be integrated to reject non-text parts and further reduce the false-alarm rate. Beyond localizing text regions in color images, the text localization technique can be applied to the compression of compound documents such as magazines and newspapers. This application not only reduces the transmission rate effectively but also keeps the text parts clear under low-bit-rate transmission.
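The connected-component step of such a pipeline can be sketched as follows; the 4-connectivity and the minimum-size filter are generic choices offered for illustration, not the thesis's exact operators.

```python
from collections import deque

def connected_components(bitmap):
    """4-connected components of a binary image (list of 0/1 rows);
    returns each component's bounding box as (top, left, bottom, right)."""
    h, w = len(bitmap), len(bitmap[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if bitmap[y][x] and not seen[y][x]:
                queue = deque([(y, x)])
                seen[y][x] = True
                t, l, b, r = y, x, y, x
                while queue:                      # BFS flood fill of one component
                    cy, cx = queue.popleft()
                    t, l = min(t, cy), min(l, cx)
                    b, r = max(b, cy), max(r, cx)
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and bitmap[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((t, l, b, r))
    return boxes

def plausible_text_boxes(boxes, min_side=2):
    """Reject specks smaller than min_side in either dimension -- a crude
    stand-in for the morphological filtering described above."""
    return [(t, l, b, r) for t, l, b, r in boxes
            if b - t + 1 >= min_side and r - l + 1 >= min_side]

bitmap = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],   # a 3x3 blob (candidate glyph) and one noise pixel
    [1, 1, 1, 0, 0],
]
boxes = connected_components(bitmap)
text_boxes = plausible_text_boxes(boxes)
```

On a real bi-level extraction, each surviving box would then be passed to the edge-based verification stage.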
Lin, Wen Long, and 林文隆. "Wavelet-Based Color Document Compression with Graph and Text Segmentation". Thesis, 1998. http://ndltd.ncl.edu.tw/handle/03869862444590817113.
National Chiao Tung University
Department of Electrical and Control Engineering
86
In this thesis, we apply graphics/text segmentation to wavelet coefficients to separate the graphics and text parts of a color document. Zerotree coding encodes the graphics part, and the text part is coded with a multi-plane text coding method. The color count, the ratio of projection variances, and the fractal dimension, which differ between the graphics and text parts of a block, provide the information used for segmentation. Because these three parameters exhibit strongly fuzzy behavior, we develop a fuzzy rule to perform the segmentation. Simulation results show that image compression with graphics/text segmentation performs well at high compression ratios on color documents. We also discuss optimal bit-rate allocation in color images, the relation between PSNR and the number of wavelet-transform levels, and how the high-frequency coefficients affect image quality.
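Two of the block features named above (color count and projection-variance ratio) can be computed as in this rough sketch; the crisp threshold stands in for the thesis's fuzzy rule, and all names and values here are illustrative assumptions.

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def block_features(block):
    """block: 2D list of pixel values. Returns (distinct color count,
    ratio of horizontal to vertical projection variance)."""
    colors = len({p for row in block for p in row})
    row_proj = [sum(row) for row in block]              # horizontal projection
    col_proj = [sum(col) for col in zip(*block)]        # vertical projection
    ratio = variance(row_proj) / (variance(col_proj) + 1e-9)
    return colors, ratio

def looks_like_text(block, max_colors=4):
    """Crisp placeholder for the fuzzy rule: text blocks tend to use few
    colors (e.g. ink and paper) while picture blocks use many."""
    colors, _ratio = block_features(block)
    return colors <= max_colors

text_block = [            # two-color block, as printed text typically is
    [0, 255, 0, 255],
    [0, 255, 0, 255],
    [0, 0, 0, 0],
    [255, 255, 255, 255],
]
picture_block = [[16 * i + j for j in range(4)] for i in range(4)]  # 16 distinct values
```

A fuzzy version would replace the hard `max_colors` cut-off with membership functions over all three features and combine them with fuzzy rules, as the thesis does.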