Doctoral dissertations on the topic "Floating point"

Consult the top 50 doctoral dissertations on the topic "Floating point".

1

Skogstrøm, Kristian. "Implementation of Floating-point Coprocessor". Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2005. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9202.

Abstract:

This thesis presents the architecture and implementation of a high-performance floating-point coprocessor for Atmel's new microcontroller. The coprocessor architecture is based on a fused multiply-add pipeline developed in the specialization project, TDT4720. This pipeline has been optimized significantly and extended to support negation of all operands and single-precision input and output. New hardware has been designed for the decode/fetch unit, the register file, the compare/convert pipeline and the approximation tables. Division and square root are performed in software using Newton-Raphson iteration. The Verilog RTL implementation has been synthesized at 167 MHz using a 0.18 µm standard cell library. The total area of the final implementation is 107,225 gates. The coprocessor has also been synthesized together with the CPU. Test programs have been run to verify that the coprocessor works correctly. A complete verification of the floating-point coprocessor, however, has not been performed due to time constraints.
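
To make the software division concrete, here is a small C sketch of the Newton-Raphson recurrence x_{k+1} = x_k(2 - b*x_k), which converges quadratically to 1/b. The linear seed polynomial and the iteration count are illustrative assumptions; the actual coprocessor seeds the iteration from its approximation tables.

    #include <math.h>
    #include <stdio.h>

    /* Sketch of software division a/b = a * (1/b) via Newton-Raphson
     * iteration (assumes b > 0; a real implementation would seed x from
     * an approximation table rather than the linear polynomial below). */
    static float nr_divide(float a, float b)
    {
        int e;
        float m = frexpf(b, &e);                 /* b = m * 2^e, m in [0.5, 1) */
        float x = 48.0f/17.0f - (32.0f/17.0f)*m; /* seed: ~4.5 correct bits    */
        for (int k = 0; k < 3; ++k)              /* quadratic convergence:     */
            x = x * (2.0f - m * x);              /* 4.5 -> 9 -> 18 -> 36 bits  */
        return a * ldexpf(x, -e);                /* a * (1/m) * 2^-e = a / b   */
    }

    int main(void)
    {
        printf("%.7g\n", nr_divide(1.0f, 3.0f)); /* prints ~0.3333333 */
        return 0;
    }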

2

Zhang, Yiwei. "Biophysically accurate floating point neuroprocessors". Thesis, University of Bristol, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.544427.

3

Baidas, Zaher Abdulkarim. "High-level floating-point synthesis". Thesis, University of Southampton, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.325049.

4

Duracz, Jan Andrzej. "Verification of floating point programs". Thesis, Aston University, 2010. http://publications.aston.ac.uk/15778/.

Abstract:
In this thesis we present an approach to automated verification of floating-point programs. Existing techniques for automated generation of correctness theorems are extended to produce proof obligations for accuracy guarantees and absence of floating-point exceptions. A prototype automated real-number theorem prover is presented, demonstrating a novel application of function interval arithmetic in the context of subdivision-based numerical theorem proving. The prototype is tested on correctness theorems for two simple yet nontrivial programs, proving exception freedom and tight accuracy guarantees automatically. The experiments show how function intervals can be used to combat the information-loss problems that limit the applicability of traditional interval arithmetic in the context of hard real-number theorem proving.
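
For context on the information-loss remark, the following minimal C sketch shows classical interval arithmetic with outward rounding and its dependency problem; the function intervals developed in the thesis, which mitigate this loss, are not reproduced here.

    #include <fenv.h>
    #include <stdio.h>

    typedef struct { double lo, hi; } interval;

    /* Ordinary interval addition with outward (directed) rounding. */
    static interval iadd(interval a, interval b)
    {
        interval r;
        fesetround(FE_DOWNWARD);  r.lo = a.lo + b.lo;
        fesetround(FE_UPWARD);    r.hi = a.hi + b.hi;
        fesetround(FE_TONEAREST);
        return r;
    }

    int main(void)
    {
        /* Dependency problem: for x in [0, 1], evaluating x - x intervalwise
         * yields [-1, 1] instead of the exact {0} -- the information loss
         * that motivates function intervals. */
        interval x = { 0.0, 1.0 };
        interval neg_x = { -x.hi, -x.lo };
        interval r = iadd(x, neg_x);
        printf("x - x = [%g, %g]\n", r.lo, r.hi);   /* [-1, 1] */
        return 0;
    }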
5

Ross, Johan, and Hans Engström. "Voice Codec for Floating Point Processor". Thesis, Linköping University, Department of Electrical Engineering, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-15763.

Abstract:

As part of an ongoing project at the Department of Electrical Engineering (ISY) at Linköping University, a voice decoder using floating-point formats has been the focus of this master thesis. Previous work developed an mp3 decoder using the same floating-point formats; all of it is expected to be implemented on a single DSP. The ever-present desire to make things smaller, more efficient and less power-consuming is the main motivation for investigating a floating-point format instead of the traditional integer format in a GSM codec. The idea behind the low-precision floating-point format is to reduce the size of the memory, which in turn reduces the total chip area needed and decreases the power consumption. One main question is whether this can be done without losing too much sound quality in the speech. With the integer format, every value in the range can be represented, depending on how many bits are used. With a floating-point format, larger values can be represented using fewer bits than with the integer format, but some values lose their exact representation and must be rounded. From the tests made with the decoder during this thesis, the audible difference between the two formats is very small and can hardly be heard, if at all. The rounding has very little effect on the quality of the sound, and the implemented codec succeeds in reproducing sound quality similar to the GSM standard decoder.
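
The rounding trade-off described above can be reproduced in two lines of C; binary32 stands in here for the low-precision format studied in the thesis.

    #include <stdio.h>

    /* A same-width integer represents every value in its range exactly,
     * while a float trades exhaustive coverage for dynamic range and must
     * round: binary32 has a 24-bit significand, so 2^24 + 1 is lost. */
    int main(void)
    {
        int   i = 16777217;       /* 2^24 + 1, exact as a 32-bit integer */
        float f = 16777217.0f;    /* rounds to 16777216.0                */
        printf("int: %d  float: %.1f\n", i, f);
        return 0;
    }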

6

Englund, Madeleine. "Hybrid Floating-point Units in FPGAs". Thesis, Linköpings universitet, Datorteknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-86587.

Abstract:
Floating-point numbers are used in many applications that would be well suited to higher parallelism than that offered by a CPU. In these cases an FPGA, with its ability to handle multiple calculations simultaneously, could be the solution. Unfortunately, floating-point operations implemented in an FPGA are often resource intensive, which means that many developers avoid floating-point solutions in FPGAs, or avoid using FPGAs for floating-point applications. Here the potential to get less expensive floating-point operations, by using a higher radix for the floating-point numbers and by using and extending the existing DSP block in the FPGA, is investigated. One of the goals is that the FPGA should be usable both by users who have floating point in their designs and by those who do not. In order to motivate hard floating-point blocks in the FPGA, these must not consume too much of the limited resources. This work shows that floating-point addition becomes smaller with the use of the higher radix, while multiplication becomes smaller by using the hardware of the DSP block. When both operations are examined at the same time, it turns out that it is possible to get a reduced area, compared to separate floating-point units, by utilizing both the DSP block and a higher radix for the floating-point numbers.
7

Xiao, Yancheng. "Two floating point LLL reduction algorithms". Thesis, McGill University, 2013. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=114503.

Abstract:
The Lenstra, Lenstra and Lovász (LLL) reduction is the most popular lattice reduction and is a powerful tool for solving many complex problems in mathematics and computer science. The blocking technique casts matrix algorithms in terms of matrix-matrix operations to permit efficient reuse of data. In this thesis, we use the blocking technique to develop two floating-point block LLL reduction algorithms, the left-to-right block LLL (LRBLLL) reduction algorithm and the alternating partition block LLL (APBLLL) reduction algorithm, and give a complexity analysis of the two. We compare these two block LLL reduction algorithms with the original LLL reduction algorithm (in floating-point arithmetic) and the partial LLL (PLLL) reduction algorithm from the literature in terms of CPU run time, flops and relative backward errors. The simulation results show that the overall CPU run time of the two block LLL reduction algorithms is faster than the partial LLL reduction algorithm and much faster than the original LLL, even though the two block algorithms cost more flops than the partial LLL reduction algorithm in some cases. The shortcoming of the two block algorithms is that they may sometimes not be as numerically stable as the original and partial LLL reduction algorithms. The parallelization of APBLLL is discussed.
8

Kupriianova, Olga. "Towards a modern floating-point environment". Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066584/document.

Abstract:
This work investigates two ways of enlarging the current floating-point environment. The first is to support several implementation versions of each mathematical function (elementary, such as exp or log, and special, such as erf or Γ); the second is to provide IEEE 754 operations that mix the inputs and the output of different radices. As the number of possible implementations of each mathematical function is large, this work focuses on code generation. Our code generator supports a huge variety of functions: it generates parametrized implementations for user-specified functions, so it may be considered a black-box function generator. This work contains a novel algorithm for domain splitting and an approach to replace branching during reconstruction by a polynomial. The new domain-splitting algorithm produces fewer subdomains, and the polynomial degrees on adjacent subdomains do not vary much. To produce vectorizable implementations, if-else statements on the reconstruction step have to be avoided. Since the 2008 revision of the IEEE 754 Standard it has been possible to mix numbers of different precisions in one operation. However, there is no mechanism that allows users to mix numbers of different radices in one operation. This research starts an examination of mixed-radix arithmetic with a worst-case search for FMA. A novel algorithm to convert a decimal character sequence of arbitrary length to a binary floating-point number is presented; it is independent of the currently set rounding mode and produces correctly rounded results.
9

Kupriianova, Olga. "Towards a modern floating-point environment". Electronic Thesis or Diss., Paris 6, 2015. http://www.theses.fr/2015PA066584.

Abstract:
This work investigates two ways of enlarging the current floating-point environment. The first is to support several implementation versions of each mathematical function (elementary, such as exp or log, and special, such as erf or Γ); the second is to provide IEEE 754 operations that mix the inputs and the output of different radices. As the number of possible implementations of each mathematical function is large, this work focuses on code generation. Our code generator supports a huge variety of functions: it generates parametrized implementations for user-specified functions, so it may be considered a black-box function generator. This work contains a novel algorithm for domain splitting and an approach to replace branching during reconstruction by a polynomial. The new domain-splitting algorithm produces fewer subdomains, and the polynomial degrees on adjacent subdomains do not vary much. To produce vectorizable implementations, if-else statements on the reconstruction step have to be avoided. Since the 2008 revision of the IEEE 754 Standard it has been possible to mix numbers of different precisions in one operation. However, there is no mechanism that allows users to mix numbers of different radices in one operation. This research starts an examination of mixed-radix arithmetic with a worst-case search for FMA. A novel algorithm to convert a decimal character sequence of arbitrary length to a binary floating-point number is presented; it is independent of the currently set rounding mode and produces correctly rounded results.
10

Aamodt, Tor. "Floating-point to fixed-point compilation and embedded architectural support". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ58787.pdf.

11

Shen, Shumin. "A floating-point analog-to-digital converter". Thesis, University of Ottawa (Canada), 2004. http://hdl.handle.net/10393/26772.

Abstract:
This thesis studies the floating-point analog-to-digital converter (FP-ADC). The first step is to analyze the parallel architecture of the floating-point converter, which is the basis of our research. The characteristics and specifications of the floating-point A/D converter are described. Simulations of the parallel architecture of the floating-point A/D converter were conceived, run and presented here to support the theoretically derived FP-ADC transfer characteristics. After analyzing the parallel architecture of the floating-point A/D converter, the following work provides a way of minimizing the conversion time while keeping the precision of the floating-point A/D converter (FP-ADC) by implementing the parallel architecture with Field Programmable Gate Arrays (FPGAs). The thesis presents the design and practical implementation of the parallel FP-ADC, based on an FPGA and other hybrid off-the-shelf components. The correctness of the design was verified by computer simulation, while the functionality of the implemented FP-ADC was tested on a test bench controlled by a PC. (Abstract shortened by UMI.)
12

Panisset, Jean François. "A double precision floating point convolution processor /". Thesis, McGill University, 1994. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=68047.

Abstract:
Two-dimensional convolution is one of the basic operations in image processing, where it is used as a filtering tool. A kernel of values corresponding to the spatial-domain impulse response of the filter is applied to the original image in order to perform desired operations such as low-pass filtering or edge enhancement.
Since convolution is basically a two-dimensional multiply and accumulate operation, it is computationally intensive. General-purpose computer architectures are often ill-suited to perform two-dimensional convolutions, since they lack the required processing speed or memory bandwidth. This motivated the project to design and build a specialized device which can compute the convolution operation efficiently for such applications.
This thesis addresses the design and implementation of a specialized processor which can perform two-dimensional convolution using double-precision floating-point operands. The selected architecture is based on the concept of the systolic array. These architectures are reviewed particularly for the constraints which impact their logical and physical design, as well as for the numerous applications for which they have been proposed in the literature or have been implemented. After outlining the overall system architecture of the convolution processor, the thesis focuses on the details of the implementation of the bus interface and Direct Memory Access controller. Finally, the performance of the proposed design is evaluated and compared against alternative software implementations of the convolution algorithm on representative architectures. (Abstract shortened by UMI.)
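
For reference, the operation being accelerated is the direct two-dimensional multiply-accumulate shown below (a plain C sketch with illustrative sizes and zero padding at the borders; the processor itself realizes this on a systolic array).

    #include <stdio.h>

    /* Direct 2D convolution: for each output pixel, multiply-accumulate the
     * k x k kernel over the neighbourhood, treating out-of-image pixels as 0. */
    static void convolve2d(int h, int w, double in[h][w],
                           int k, double ker[k][k], double out[h][w])
    {
        int r = k / 2;
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                double acc = 0.0;
                for (int i = 0; i < k; ++i)
                    for (int j = 0; j < k; ++j) {
                        int yy = y + i - r, xx = x + j - r;
                        if (yy >= 0 && yy < h && xx >= 0 && xx < w)
                            acc += ker[i][j] * in[yy][xx];
                    }
                out[y][x] = acc;
            }
    }

    int main(void)
    {
        double img[3][3] = { {1,2,3}, {4,5,6}, {7,8,9} };
        double box[3][3] = { {1.0/9,1.0/9,1.0/9},
                             {1.0/9,1.0/9,1.0/9},
                             {1.0/9,1.0/9,1.0/9} };
        double out[3][3];
        convolve2d(3, 3, img, 3, box, out);
        printf("centre pixel: %g\n", out[1][1]);   /* mean of all nine = 5 */
        return 0;
    }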
13

Zhang, Michael Ruogu 1977. "Software floating-point computation on parallel machines". Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/80133.

Abstract:
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.
Includes bibliographical references (p. 71).
by Michael Ruogu Zhang.
M.Eng.
14

Havermark, Joel. "Bit-Vector Approximations of Floating-Point Arithmetic". Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-372077.

Abstract:
The use of floating-point numbers in safety-critical applications shows a need to reason about them efficiently and automatically. One approach is to use Satisfiability Modulo Theories (SMT). The naive approach to using SMT does not scale well; previous work suggests approximations as a scalable solution. Zeljic, Backeman, Wintersteiger, and Rümmer have created a framework called UppSAT for iterative approximations. The approximations created with UppSAT use a precision to indicate how approximate the formula is. Floating point can be approximated by the simpler fixed-point format, which has the benefit of not having to encode rounding modes and special values, and enables efficient encodings of the operations as bit-vectors. Zeljic et al. have implemented such an approximation in UppSAT, with the precision indicating the number of bits the numbers use. This thesis aims to improve the way the approximation handles precision by providing two new strategies for increasing it, increasing its detail, and changing its maximum. One of the two new strategies is implemented. Both strategies are based on the idea of calculating the number of bits needed to represent a float as fixed-point. The implemented strategy performs worse than the current one, but solves test cases that the current strategy cannot. The reason for the performance is probably a too fast increase of the precision, together with the same precision being used for the whole formula. Even though the implemented strategy is worse, the new strategies and precision domain can provide a base for further improvements of the approximation, and help show that fixed point and approximations in general are suitable for reasoning about floating point, with more approximations left to investigate.
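
The shared idea behind the two strategies, computing how many bits a fixed-point format needs to hold a given float exactly, can be sketched as follows. This is a reconstruction for illustration, not UppSAT's code, and assumes a finite, nonzero input.

    #include <math.h>
    #include <stdio.h>

    /* Count the integer and fractional bits a fixed-point format needs to
     * represent the binary float x exactly (illustrative reconstruction). */
    static void fixed_point_bits(double x, int *int_bits, int *frac_bits)
    {
        int e = ilogb(x);                 /* exponent of the leading bit     */
        double m = fabs(x) * exp2(-e);    /* significand in [1, 2)           */
        int p = 0;                        /* fractional significand bits     */
        while (m != floor(m)) { m *= 2.0; ++p; }
        *int_bits  = e >= 0 ? e + 1 : 0;  /* bits left of the binary point   */
        *frac_bits = p > e ? p - e : 0;   /* bits right of the binary point  */
    }

    int main(void)
    {
        int ib, fb;
        fixed_point_bits(5.75, &ib, &fb); /* 5.75 = 101.11 in binary */
        printf("%d integer bits, %d fraction bits\n", ib, fb);  /* 3 and 2 */
        return 0;
    }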
15

Kolumban, Gaspar. "Low Cost Floating-Point Extensions to a Fixed-Point SIMD Datapath". Thesis, Linköpings universitet, Datorteknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-101586.

Abstract:
The ePUMA architecture is a novel master-multi-SIMD DSP platform aimed at low-power computing, for example in embedded or hand-held devices. It is a configurable and scalable platform designed for multimedia and communications. Numbers with both integer and fractional parts are common in computing because many important algorithms, in signal and image processing for example, make use of them. A good way of representing these numbers is a floating-point representation. The ePUMA platform currently supports a fixed-point representation, so the goal of this thesis is to implement twelve basic floating-point arithmetic operations and two conversion operations on an already existing datapath, conforming as much as possible to the IEEE 754-2008 standard for floating-point representation. The implementation should come at a low cost in hardware and power consumption, with a target frequency of 500 MHz. The implementation is compared with dedicated DesignWare components and with floating point done in software on ePUMA. This thesis presents a solution that on average increases the VPE datapath hardware cost by 15% and the power consumption by 15%. The highest clock frequency achieved with the solution is 473 MHz. The target clock frequency of 500 MHz is thus not reached, but considering the lack of register retiming in the synthesis step, 500 MHz can most likely be achieved with this design.
16

Debski, Michal. "Self-calibrating floating-point analog-to-digital converter". Thesis, University of Ottawa (Canada), 2005. http://hdl.handle.net/10393/26884.

Abstract:
The Floating-Point Analog-to-Digital Converter (FPADC) is an extended version of the fixed-point ADC. It is designed to deal with a broader dynamic range of signals while exhibiting a smaller relative quantization error. The traditional implementation of the FPADC is characterized by high relative precision, but it requires high-precision, high-speed components to achieve that, and the high precision of high-speed components comes at a greater cost. This constraint limits the availability of FPADCs to high-priced designs. The thesis addresses a low-speed, low-cost calibration approach for the FPADC. It presents the architecture, design and implementation platform of a self-calibrating differential predictive FPADC characterized by the use of low-grade components. The precision is kept high by additional hardware that periodically performs calibration cycles. Starting with a review of the FPADC field, the thesis develops an understanding of floating-point ADCs. The implementation is then extended to include a high-precision, low-speed calibrating ADC. A complete implementation of the design is carried out and described. Finally, experimental measurements are performed to test the new FPADC and the acquired results are presented.
17

Pillai, Rajan V. K. "On low power floating point data path architectures". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0021/NQ47712.pdf.

18

Shah, Syed Yawar Ali. "On synthesis and optimization of floating point units". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ59309.pdf.

19

Drolet, Jean. "The design of a floating-point convolution system /". Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=56813.

Abstract:
Convolution is the basic operation behind many image processing algorithms. However, it is a computationally intensive operation. Dedicated hardware exists to implement the fixed-point version of this operation, but recent developments such as laser range data processing now require floating-point arithmetic, which is often performed in software.
This thesis presents the design of a specialized convolution processor that operates on double-precision floating-point data. This convolver is based on an array of systolic cells and may be configured to process both images and one-dimensional signals. Support circuitry handles data format conversion as well as data sequencing for the systolic array. In addition, the processor communicates with the memory of a host computer via a DMA (direct memory access) interface to the VMEbus. In this thesis, the design of these auxiliary subsystems is emphasized and their implementation in application-specific integrated circuits (ASICs) is presented.
20

Jain, Sheetal A. 1980. "Low-power single-precision IEEE Floating-point unit". Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/87426.

21

Hellman, Noah. "Mitchell-Based Approximate Operations on Floating-Point Numbers". Thesis, Linköpings universitet, Datorteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-178882.

Abstract:
By adapting Mitchell's algorithm to floating-point numbers, one can efficiently perform floating-point operations in an approximate logarithmic domain in order to compute approximations of functions such as multiplication, division, square root and others. This work examines how the algorithm can be improved in terms of accuracy and hardware complexity by applying a set of parametrized methods that offer a large design space. Optimal coefficients for a large portion of this space are determined and used to synthesize circuits for both ASICs and FPGAs using the bfloat16 format. Optimal configurations are then extracted to create an optimal curve from which one can select an acceptable error range and obtain a circuit with minimal hardware cost.
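
The base observation behind Mitchell-style designs can be shown on binary32 in C: for a positive normal float, the bit pattern read as an integer is approximately (log2(x) + bias) * 2^23, because Mitchell's approximation log2(1+f) ~ f makes the mantissa field act as the fractional part of the logarithm. The sketch below is the unrefined core idea only; the thesis targets bfloat16 and layers parametrized correction methods on top.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    /* Mitchell-style approximate multiplication: adding bit patterns
     * approximates adding logarithms, i.e. multiplying (valid for
     * positive, finite, normal inputs only). */
    static float mitchell_mul(float a, float b)
    {
        uint32_t ia, ib, ir;
        memcpy(&ia, &a, 4);
        memcpy(&ib, &b, 4);
        ir = ia + ib - 0x3F800000u;   /* subtract one bias (127 << 23) */
        float r;
        memcpy(&r, &ir, 4);
        return r;
    }

    int main(void)
    {
        /* exact 15, Mitchell approximation 14 (within its known ~11% error) */
        printf("exact %g, approx %g\n", 3.0 * 5.0, (double)mitchell_mul(3.0f, 5.0f));
        return 0;
    }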
22

Mishra, Biswajit. "Investigation into a Floating Point Geometric Algebra Processor". Thesis, University of Southampton, 2007. https://eprints.soton.ac.uk/266009/.

Abstract:
The widespread use of Computer Graphics and Computer Vision applications has led to a plethora of hardware implementations that are usually expressed using linear algebraic methods. There are two drawbacks with this approach that pose fundamental challenges to engineers developing hardware and software applications in this area. The first is the complexity and size of the hardware blocks required to practically realize such applications, particularly the multiplication, addition and accumulation operations. Whether the platform is Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs), in both cases there are significant issues in efficiently implementing complex geometric functions using standard mathematical techniques, particularly in floating-point arithmetic. The second major issue is the complexity required for the effective solution of complex multi-dimensional problems, either for scientific computation or for advanced graphical applications. Conventional algebraic techniques do not scale well in hardware terms to more than 3-dimensional problems, so a new approach is desirable to handle these situations. Geometric Algebra (GA) promises to unify the different approaches used in vector algebra, trigonometry, homogeneous coordinates and quaternion algebra into a single framework. Geometric Algebra provides a rich set of geometric primitives to describe points, lines, planes, circles and spheres, along with simple algebraic operations, instead of points and lines alone as in a conventional algebra. This ability to carry out direct operations on such a rich set of primitives makes GA a powerful tool for solving a wide variety of problems in computer vision, graphics and robotics. In all these areas performance is a key issue, so a hardware architecture for GA is considered essential to meet the stringent performance requirements of these applications. In this thesis, a detailed review of the influential research in the development of GA is given, along with the necessary fundamentals of GA. Subsequently, a review of background relating to different implementation strategies provides an important element in understanding the specific requirements and thereby developing the hardware architecture. Based on this study, an architecture was developed for geometric algebra processing that is modular and scalable to higher dimensions. In this architecture, the designer can easily specify the floating-point resolution and the order of the computation, and also configure the trade-offs between hardware area and speed. The modularity and the flexibility of the architecture's interface also provide a platform where the designer can weigh the clock cycles against the resources at hand for any GA-based application. This architecture has been designed not only as a stand-alone core, but can also be configured and used as a coprocessor to a larger system. To demonstrate the performance and flexibility of the GA architecture presented in this thesis, the hardware has been tested extensively using a standard image processing application. The performance results obtained from these experiments are comparable to the results obtained using existing methods. It is also shown, through derivations as well as experiments, that the convolution operation in the image processing application with the GA-based rotor masks belongs to a class of linear vector filters.
This linear vector filter can be applied to image or speech signals where vector filtering is of fundamental interest, which opens up a range of research opportunities in the growing field of color image processing. This work has explored the totally new area of GA hardware, with novel aspects including grade tracking, configurability and linearity of the hardware. From the point of view of software and application development, this work has explored a platform with compiler support and easier programming methods specific to the GA hardware on an FPGA-based platform. This has further increased the practical significance of the work by verifying the GA techniques in a variety of real-world designs. From both points of view it has therefore advanced the state of the art and opened up opportunities for further research in GA hardware.
23

Raina, Saurabh-Kumar. "FLIP, a floating-point library for integer processors". Lyon, École normale supérieure (sciences), 2006. http://www.theses.fr/2006ENSL0369.

24

Collingbourne, Peter Cyrus. "Symbolic crosschecking of data-parallel floating point code". Thesis, Imperial College London, 2013. http://hdl.handle.net/10044/1/10936.

Abstract:
In this thesis we present a symbolic execution-based technique for cross-checking programs accelerated using SIMD or OpenCL against an unaccelerated version, as well as a technique for detecting data races in OpenCL programs. Our techniques are implemented in KLEE-CL, a symbolic execution engine based on KLEE that supports symbolic reasoning on the equivalence between expressions involving both integer and floating-point operations. While the current generation of constraint solvers provide good support for integer arithmetic, there is little support available for floating-point arithmetic, due to the complexity inherent in such computations. The key insight behind our approach is that floating-point values are only reliably equal if they are essentially built by the same operations. This allows us to use an algorithm based on symbolic expression matching augmented with canonicalisation rules to determine path equivalence. Under symbolic execution, we have to verify equivalence along every feasible control-flow path. We reduce the branching factor of this process by aggressively merging conditionals, if-converting branches into select operations via an aggressive phi-node folding transformation. To support the Intel Streaming SIMD Extension (SSE) instruction set, we lower SSE instructions to equivalent generic vector operations, which in turn are interpreted in terms of primitive integer and floating-point operations. To support OpenCL programs, we symbolically model the OpenCL environment using an OpenCL runtime library targeted to symbolic execution. We detect data races by keeping track of all memory accesses using a memory log, and reporting a race whenever we detect that two accesses conflict. By representing the memory log symbolically, we are also able to detect races associated with symbolically indexed accesses of memory objects. We used KLEE-CL to find a number of issues in a variety of open source projects that use SSE and OpenCL, including mismatches between implementations, memory errors, race conditions and compiler bugs.
25

Brown, Ashley W. "Profile-directed specialisation of custom floating-point hardware". Thesis, Imperial College London, 2010. http://hdl.handle.net/10044/1/5604.

Abstract:
We present a methodology for generating floating-point arithmetic hardware designs which are, for suitable applications, much reduced in size, while still retaining performance and IEEE 754 compliance. Our system uses three key parts: a profiling tool, a set of customisable floating-point units and a selection of system integration methods. We use a profiling tool for floating-point behaviour to identify arithmetic operations where fundamental elements of IEEE 754 floating point may be compromised without generating erroneous results in the common case. In the uncommon case, we use simple detection logic to determine when operands lie outside the range of capabilities of the optimised hardware. Out-of-range operations are handled by a separate, fully capable floating-point implementation, either on-chip or by returning calculations to a host processor. We present methods of system integration to achieve this error correction. Thus the system suffers no compromise in IEEE 754 compliance, even when the synthesised hardware alone would generate erroneous results. In particular, we identify from input operands the shift amounts required for input operand alignment and post-operation normalisation. For operations where these are small, we synthesise hardware with reduced-size barrel shifters. We also propose optimisations that take advantage of other profile-exposed behaviours, including removing the hardware required to swap operands in a floating-point adder or subtractor, and reducing the exponent range to fit observed values. We present profiling results for a range of applications, including a selection of computational science programs, SPEC FP 95 benchmarks and the FFmpeg media processing tool, indicating which would be amenable to our method. Selected applications which demonstrate potential for optimisation are then taken through to a hardware implementation. We show up to a 45% decrease in hardware size for a floating-point datapath, with a correctable error rate of less than 3%, even with non-profiled datasets.
26

DeLorimier, Michael DeHon André. "Floating-point sparse matrix-vector multiply for FPGAs /". Diss., Pasadena, Calif. : California Institute of Technology, 2005. http://resolver.caltech.edu/CaltechETD:etd-05132005-144347.

27

McCleeary, Ryan. "Lazy exact real arithmetic using floating point operations". Diss., University of Iowa, 2019. https://ir.uiowa.edu/etd/6991.

Abstract:
Exact real arithmetic systems can deliver any requested precision on the output of a computation. They are used in a wide variety of applications where a high degree of precision is necessary, including differential equation solvers, linear equation solvers, large-scale mathematical models, and SMT solvers. This dissertation proposes a new exact real arithmetic system which uses lazy lists of floating-point numbers to represent real numbers. It proposes algorithms for basic arithmetic computations on these structures and proves their correctness. The proposed system has the advantage that its algorithms can be supported by modern floating-point hardware, while still forming a lazy exact real arithmetic system.
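
A typical hardware-supported building block for such systems is an error-free transformation like Knuth's TwoSum, which recovers the exact rounding error of an addition. It is shown here as a classical illustration, not necessarily the dissertation's own construction; it assumes round-to-nearest and no overflow.

    #include <stdio.h>

    /* Knuth's TwoSum: for any doubles a and b, computes s = fl(a + b) and
     * the exact rounding error e, so that a + b = s + e holds exactly.
     * Transformations like this let lists of floats encode a real number
     * as an unevaluated exact sum. */
    static void two_sum(double a, double b, double *s, double *e)
    {
        *s = a + b;
        double bp = *s - a;        /* the part of b actually absorbed  */
        double ap = *s - bp;       /* the part of a actually absorbed  */
        *e = (a - ap) + (b - bp);  /* what rounding discarded, exactly */
    }

    int main(void)
    {
        double s, e;
        two_sum(1.0, 1e-20, &s, &e);
        printf("s = %g, e = %g\n", s, e);   /* s = 1, e = 1e-20 */
        return 0;
    }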
28

De Blasio, Simone, and Fredrik Ekstedt Karpers. "Comparing the precision in matrix multiplication between Posits and IEEE 754 floating-points: Assessing precision improvement with emerging floating-point formats". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280036.

Abstract:
IEEE 754 floating-point numbers are the current standard way to represent real values in computers, but alternative formats are emerging. One of these emerging formats is Posit. The main characteristic of Posit is that the format allows for higher precision than IEEE 754 floats of the same bit size for numbers of magnitude close to 1, but lower precision for numbers of much smaller or larger magnitude. This study compared the precision of IEEE 754 floating point and Posit in matrix multiplication. Different sizes of matrices were compared, combined with different intervals in which the values of the matrix elements were generated. The results showed that Posits outperformed IEEE 754 floating-point numbers in terms of precision when the values lie in an interval equal to or larger than [−0.01, 0.01), or equal to or smaller than [−100, 100). Matrix size did not affect this much, unless the intermediate format Quire was used to eliminate rounding error. For almost all other intervals, IEEE 754 floats performed better than Posits. Although most of our results favoured IEEE 754 floats, Posits do have a precision benefit if one can be sure the data is within the ideal interval. Posits may still have a role to play in the future of floating-point formats.
29

Robe, Edward D. "SIMULINK modules that emulate digital controllers realized with fixed-point or floating-point arithmetic". Ohio : Ohio University, 1994. http://www.ohiolink.edu/etd/view.cgi?ohiou1180120138.

30

Catanzaro, Bryan C. "Higher radix floating-point representations for FPGA-based arithmetic /". Diss., CLICK HERE for online access, 2005. http://contentdm.lib.byu.edu/ETD/image/etd808.pdf.

31

Dahlberg, Anders. "Evaluation of a Floating Point Acoustic Echo Canceller Implementation". Thesis, Linköping University, Department of Electrical Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-8938.

Abstract:

This master thesis consists of the implementation and evaluation of an AEC (Acoustic Echo Canceller) algorithm on a floating-point architecture. The most important question this thesis tries to answer is whether a floating-point architecture has benefits or drawbacks, relative to a fixed-point architecture, for AEC. In a telephony system there are two common forms of echo: line echo and acoustic echo. Acoustic echo is introduced when sound emanating from a loudspeaker, e.g. in a hands-free or speakerphone, is picked up by a microphone and sent back to the source. The problem with this feedback is that the far-end speaker hears one or more time-delayed versions of her own speech, which is usually perceived as both confusing and annoying unless removed by AEC. In this thesis the performance of a floating-point version of a normalized least-mean-square AEC algorithm was evaluated in an environment designed and implemented to approximate live telephony calls. An instruction-set simulator and assembler available at the start of this thesis were extended to support zero-overhead loops, modular addressing, post-increment of registers and register-write forwarding. With these improvements, a bit-true assembly version was implemented, capable of real-time AEC at 15 million instructions per second. A solution using as few as eight mantissa bits, in an external format used when storing data in memory, was found to have an insignificant effect on the selected AEC implementation's performance. Due to the relatively low memory requirement of the selected AEC algorithm, the use of a small external format has a minor effect on the required memory size. In total this indicates that the possible reduction of the memory requirement and related energy consumption does not justify the added complexity and energy consumption of a floating-point architecture for the selected algorithm. Use of a floating-point format can still be advantageous in speech-related signal processing when the time delay introduced by a subband, or similar frequency-domain, solution is unacceptable. Speech algorithms that have high memory use and small allowed delays are good candidates for a floating-point digital signal processor architecture.
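
The adaptive core of such a canceller is the normalized least-mean-square update. The C sketch below shows one NLMS step; the filter length, step size and regularization constant are illustrative choices, not the thesis's parameters.

    #include <stdio.h>

    #define TAPS 64

    /* One NLMS step: estimate the echo from the last TAPS loudspeaker
     * samples x, subtract it from the microphone sample d, and update the
     * filter weights w with a step normalized by the input energy. */
    static float nlms_step(float w[TAPS], const float x[TAPS], float d, float mu)
    {
        float y = 0.0f, energy = 1e-6f;    /* small constant regularizes /0 */
        for (int i = 0; i < TAPS; ++i) {
            y += w[i] * x[i];              /* echo estimate                 */
            energy += x[i] * x[i];
        }
        float e = d - y;                   /* residual sent to the far end  */
        float g = mu * e / energy;         /* normalized step               */
        for (int i = 0; i < TAPS; ++i)
            w[i] += g * x[i];              /* gradient update               */
        return e;
    }

    int main(void)
    {
        static float w[TAPS], x[TAPS];
        x[0] = 1.0f;                       /* trivial excitation            */
        printf("residual: %g\n", nlms_step(w, x, 0.5f, 0.5f));
        return 0;
    }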

32

Costello, Joseph Patrick. "Behavioural synthesis of low-power floating point CORDIC processors". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0032/MQ65854.pdf.

33

Lyu, Chung-nan. "Pipelined floating point divider with built-in testing circuits". Ohio : Ohio University, 1988. http://www.ohiolink.edu/etd/view.cgi?ohiou1182864748.

34

Côté, Jean-François 1966. "The design of a testable floating point convolution processor /". Thesis, McGill University, 1990. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=60002.

Abstract:
This thesis describes the design of a pipelined double-precision floating-point systolic cell for convolution. The arithmetic operations are distributed over three pipeline stages, enabling the cell to process each set of operands within 16 clock cycles. While offering the same precision obtained on standard computers, the systolic cell reduces the convolution time by as much as three orders of magnitude.
35

Hok, Ho Chun. "Customisable and reconfigurable platform for optimising floating point computations". Thesis, Imperial College London, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.509798.

36

Liew, Daniel Simon. "Symbolic execution of verification languages and floating-point code". Thesis, Imperial College London, 2017. http://hdl.handle.net/10044/1/59705.

Abstract:
The focus of this thesis is a program analysis technique named symbolic execution. We present three main contributions to this field. First, an investigation into comparing several state-of-the-art program analysis tools at the level of an intermediate verification language over a large set of benchmarks, and improvements to the state-of-the-art of symbolic execution for this language. This is explored via a new tool, Symbooglix, that operates on the Boogie intermediate verification language. Second, an investigation into performing symbolic execution of floating-point programs via a standardised theory of floating-point arithmetic that is supported by several existing constraint solvers. This is investigated via two independent extensions of the KLEE symbolic execution engine to support reasoning about floating-point operations (with one tool developed by the thesis author). Third, an investigation into the use of coverage-guided fuzzing as a means for solving constraints over finite data types, inspired by the difficulties associated with solving floating-point constraints. The associated prototype tool, JFS, which builds on the LibFuzzer project, can at present be applied to a wide range of SMT queries over bit-vector and floating-point variables, and shows promise on floating-point constraints.
37

Wittman, Susan Jean. "Servo compensation using a floating point digital signal processor". Thesis, Massachusetts Institute of Technology, 1989. http://hdl.handle.net/1721.1/39018.

38

Lyu, Chuang-nan. "Pipelined floating point divider with built-in testing circuits". Ohio University / OhioLINK, 1988. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1182864748.

39

Ratan, Amrita. "Hardware Modules for Safe Integer and Floating-Point Arithmetic". University of Cincinnati / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1383812316.

40

Lugo, Martinez Jose E. "Strategies for sharing a floating point unit between SPEs". Diss., [La Jolla] : University of California, San Diego, 2010. http://wwwlib.umi.com/cr/ucsd/fullcit?p1470744.

Abstract:
Thesis (M.S.)--University of California, San Diego, 2010.
Title from first page of PDF file (viewed February 17, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (p. 55-57).
41

Costello, Joseph Patrick. "Behavioural synthesis of low-power floating point CORDIC processors". Ottawa : National Library of Canada = Bibliothèque nationale du Canada, 2002. http://www.nlc-bnc.ca/obj/s4/f2/dsk1/tape4/PQDD%5F0032/MQ65854.pdf.

42

Catanzaro, Bryan Christopher. "Higher Radix Floating-Point Representations for FPGA-Based Arithmetic". BYU ScholarsArchive, 2005. https://scholarsarchive.byu.edu/etd/311.

Abstract:
Field Programmable Gate Arrays (FPGAs) are increasingly being used for high-throughput floating-point computation. It is forecasted that by 2009, FPGAs will provide an order of magnitude greater sustained floating-point throughput than conventional processors. FPGA implementations of floating-point operators have historically been designed to use binary floating-point representations, as do general purpose processors. Binary representations were chosen as the standard over three decades ago because they provide maximal numerical accuracy per bit of floating-point data. However, the unique nature of FPGA-based computation makes numerical accuracy per unit of FPGA resources a more important measure of the usefulness of a given floating-point representation. From this viewpoint, higher radix floating-point representations are well suited to FPGA-based computations, especially high precision calculations which require the support of denormalized numbers. This work shows that higher radix representations lead to more efficient use of FPGA resources. For example, a hexadecimal floating-point adder provides a 30% lower Area-Time product than its binary counterpart, and a hexadecimal floating-point multiplier has a 13% lower Area-Time product than its binary counterpart. This savings occurs while still delivering equal worst-case and better average-case numerical accuracy. This work presents a family of higher radix floating-point representations that are designed specifically to interoperate with standard IEEE floating-point, allowing the creation of floating-point datapaths which operate on standard binary floating-point data, yet use higher radix representations internally. Such datapaths provide higher performance by any measure: they are more accurate numerically, consume less FPGA resources and have shorter latencies. When taking into consideration the unique nature of FPGA-based computing systems, this work shows that binary floating-point representations are not optimal for most FPGA-based arithmetic computations. Higher radix representations can therefore be a useful tool for building efficient custom floating-point datapaths on FPGAs.
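
The normalization argument can be illustrated in C: with radix 16 the exponent counts 4-bit digits, so normalization shifts only by whole nibbles and the barrel shifter shrinks accordingly. The decomposition below is an illustration, not the thesis's actual encoding.

    #include <math.h>
    #include <stdio.h>

    /* Decompose x into sig * 16^e16 with |sig| in [1/16, 1): the radix-16
     * analogue of frexp.  Normalization moves in 4-bit steps, which is
     * what makes hexadecimal formats cheap to normalize in FPGA logic. */
    static void to_radix16(double x, double *sig, int *e16)
    {
        int e2;
        double m = frexp(x, &e2);          /* x = m * 2^e2, |m| in [0.5, 1) */
        *e16 = (int)ceil(e2 / 4.0);        /* hexadecimal exponent          */
        *sig = ldexp(m, e2 - 4 * *e16);    /* shift by whole nibbles only   */
    }

    int main(void)
    {
        double sig; int e;
        to_radix16(6.5, &sig, &e);
        printf("6.5 = %g * 16^%d\n", sig, e);   /* 0.40625 * 16^1 */
        return 0;
    }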
43

Coors, Martin. "A floating-point to fixed-point design flow for high performance digital signal processors /". Aachen : Shaker, 2005. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=013834304&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

44

Stenersen, Espen. "Vectorized 128-bit Input FP16/FP32/FP64 Floating-Point Multiplier". Thesis, Norwegian University of Science and Technology, Department of Electronics and Telecommunications, 2008. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8876.

Abstract:

3D graphics accelerators are often limited by their floating-point performance. A Graphics Processing Unit (GPU) has several specialized floating-point units to achieve high throughput and performance. The floating-point units consume a large part of the total area and power consumption, and hence architectural choices are important to evaluate when implementing the design. GPUs are specially tuned for performing a set of operations on large sets of data. The task of a 3D graphics solution is to render an image or a scene. The scene contains geometric primitives as well as descriptions of the light, the way each object reflects light, and the viewer's position and orientation. This thesis evaluates four different pipelined, vectorized floating-point multipliers supporting 16-bit, 32-bit and 64-bit floating-point numbers. The architectures are compared with respect to area usage, power consumption and performance. Two of the architectures are implemented at Register Transfer Level (RTL), tested and synthesized, to see if the assumptions made in the estimation methodologies are accurate enough to select the best architecture to implement, given a set of architectures and constraints. The first architecture trades area for lower power consumption with a throughput of 38.4 Gbit/s at 300 MHz clock frequency, and the second architecture trades power for smaller area with equal throughput. The two architectures are synthesized at 200 MHz, 300 MHz and 400 MHz clock frequency, in a 65 nm low-power standard cell library and a 90 nm general-purpose library, and for different input data format distributions, to compare area and power results across clock frequencies, input data distributions and target technologies. Architecture one has lower power consumption than architecture two at all clock frequencies and input data format distributions. At 300 MHz, architecture one has a total power consumption of 1.9210 mW at 65 nm and 15.4090 mW at 90 nm, while architecture two has 7.3569 mW at 65 nm and 17.4640 mW at 90 nm. Architecture two requires less area than architecture one at all clock frequencies. At 300 MHz, architecture one has a total area of 59816.4414 µm² at 65 nm and 116362.0625 µm² at 90 nm, while architecture two has 50843.0 µm² at 65 nm and 95242.0469 µm² at 90 nm.

45

Lu, Chung-Kuei. "A design of floating point FFT using Genesil Silicon Compiler". Thesis, Monterey, California. Naval Postgraduate School, 1991. http://hdl.handle.net/10945/30956.

Full text of the source
Abstract:
The hardware of the floating-point MULTIPLY, ADD, and SUBTRACT units is designed to support the multiplication, addition, and subtraction operations necessary in the Fast Fourier Transform (FFT). In this thesis, the IEEE floating-point standard is adopted and scaled down to 16 bits, except that the exponent is an excess-8 number represented in radix 2. A 16-bit reduced-word-size floating-point arithmetic unit for high-speed signal analysis was implemented. The layout verification, functional simulation, and timing analysis of these units were performed on the Genesil Silicon Compiler (GSC) system, which was developed to overcome the shortcomings of time-consuming custom layout methods. The design of this thesis work can be used for further investigation of high-speed, pipelined floating-point arithmetic units.
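The abstract fixes the exponent encoding (excess-8) but not the full field layout; a bias of 8 suggests a 4-bit exponent field, which leaves 11 fraction bits in a 16-bit word. The decoder below is therefore a hedged sketch of a plausible layout, not the thesis's verified design:

    # Assumed layout: 1 sign bit, 4-bit exponent (bias 8 = "excess-8"),
    # 11-bit fraction with an implicit leading 1 (normalized numbers only).
    BIAS = 8
    EXP_BITS = 4
    FRAC_BITS = 11

    def decode(word: int) -> float:
        """Decode a 16-bit word of the assumed format to its real value."""
        sign = -1.0 if (word >> 15) & 1 else 1.0
        exp = (word >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
        frac = word & ((1 << FRAC_BITS) - 1)
        significand = 1.0 + frac / (1 << FRAC_BITS)
        return sign * significand * 2.0 ** (exp - BIAS)

    # 0x4000: sign 0, exponent 0b1000 = 8 -> 2^(8-8) = 1, fraction 0
    print(decode(0x4000))  # 1.0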
APA, Harvard, Vancouver, ISO, etc. styles
46

Dutta, Sumit Ph D. Massachusetts Institute of Technology. "Floating-point unit (FPU) designs with nano-electromechanical (NEM) relays". Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/84724.

Full text of the source
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (pages 71-74).
Nano-electromechanical (NEM) relays are an alternative to CMOS transistors as the fabric of digital circuits. Circuits with NEM relays offer energy-efficiency benefits over CMOS since they have zero leakage power and are strategically designed to maintain throughput that is competitive with CMOS despite their slow actuation times. The floating-point unit (FPU) is the most complex arithmetic unit in a computational system. This thesis investigates if the energy-efficiency promise of NEM relays demonstrated before on smaller circuit blocks holds for complex computational structures such as the FPU. The energy, performance, and area trade-offs of FPU designs with NEM relays are examined and compared with that of state-of-the-art CMOS designs in an equivalent scaled process. Circuits that are critical path bottlenecks, including primarily the leading zero detector (LZD) and leading zero anticipator (LZA) blocks, are carefully identified and optimized for low latency and device count. We manage to drop the NEM relay FPU latency from 71 mechanical delays in a CMOS-style implementation to 16 mechanical delays in a NEM relay pass-logic style implementation. The FPU designed with NEM relays features 15x lower energy per operation compared to CMOS.
by Sumit Dutta.
S.M.
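Since the abstract above singles out the leading zero detector (LZD) as a critical-path block, the software model below (our own illustration, not the thesis's relay circuit) shows what that block computes: after an effective subtraction the significand may lose leading bits, and the LZD output is the left shift that re-normalizes it.

    def lzd(value: int, width: int) -> int:
        """Number of leading zero bits in a width-bit unsigned value."""
        for i in range(width - 1, -1, -1):
            if (value >> i) & 1:
                return width - 1 - i
        return width  # all-zero input

    sig = 0b0000101101000000      # 16-bit significand after a subtraction
    shift = lzd(sig, 16)
    print(shift, bin(sig << shift))  # 4, re-normalized significand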
APA, Harvard, Vancouver, ISO, etc. styles
47

Peterson, Scott Thomas. "Experimental response and analysis of the Evergreen Point Floating Bridge". Connect to this title online, 2002. http://www.dissertations.wsu.edu/dissertations/Fall2002/s%5Fpeterson%5F102102.pdf.

Full text of the source
APA, Harvard, Vancouver, ISO, etc. styles
48

Plet, Antoine. "Contribution to error analysis of algorithms in floating-point arithmetic". Thesis, Lyon, 2017. http://www.theses.fr/2017LYSEN038/document.

Full text of the source
Abstract:
Floating-point arithmetic is an approximation of real arithmetic in which each operation may introduce a rounding error. The IEEE 754 standard requires elementary operations to be as accurate as possible, but over the course of a computation, rounding errors accumulate and can lead to totally wrong results. This happens even with an expression as simple as ab + cd, for which the naive algorithm sometimes returns a result with a relative error much larger than 1. It is therefore important to analyze algorithms in floating-point arithmetic in order to control the error they commit. In this thesis, we are interested in the analysis of small building blocks of numerical computing, for which we look for sharp bounds on the relative error. For sufficiently accurate building blocks, in radix β and precision p, we can generally prove an error bound of the form α·u + o(u²), where α > 0 and u = (1/2)·β^(1-p) is the unit roundoff. To characterize the sharpness of such a bound, one can provide numerical examples for the standard precisions that come close to the bound, or an example parametrized by the precision that generates an error of the same form α·u + o(u²), thus proving the asymptotic optimality of the bound. Since checking such parametrized examples by hand is a tedious and error-prone task, we worked on the formalization of a symbolic floating-point arithmetic, over numbers parametrized by the precision, and implemented it as a library in the Maple computer algebra system. We also worked on the error analysis of the basic operations on complex numbers in floating-point arithmetic, and proved a very sharp error bound for an algorithm for the inversion of a complex number. This result suggests computing a complex division as x/y = (1/y)·x rather than with the more classical formula x/y = (x·ȳ)/|y|²: whatever algorithm is used for the multiplication, the error bound is smaller for the "inverse and multiply" approach. This is joint work with my PhD advisors, in collaboration with Claude-Pierre Jeannerod (CR Inria in AriC, at LIP).
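The experiment below (our own illustration; the thesis result is a proven worst-case bound, which random sampling can only hint at) compares the two division schemes in double precision, using exact rational arithmetic as the reference:

    # Scheme A: x/y = (1/y)*x            ("inverse and multiply")
    # Scheme B: x/y = (x*conj(y))/|y|^2  (classical formula)
    from fractions import Fraction
    import random

    def exact_div(x: complex, y: complex) -> tuple[Fraction, Fraction]:
        """Exact complex division over rationals (floats convert exactly)."""
        xr, xi = Fraction(x.real), Fraction(x.imag)
        yr, yi = Fraction(y.real), Fraction(y.imag)
        d = yr * yr + yi * yi
        return (xr * yr + xi * yi) / d, (xi * yr - xr * yi) / d

    def rel_err(approx: complex, exact: tuple[Fraction, Fraction]) -> float:
        er, ei = exact
        num = (Fraction(approx.real) - er) ** 2 + (Fraction(approx.imag) - ei) ** 2
        return float(num / (er * er + ei * ei)) ** 0.5

    random.seed(0)
    worst_a = worst_b = 0.0
    for _ in range(10_000):
        x = complex(random.uniform(-1, 1), random.uniform(-1, 1))
        y = complex(random.uniform(-1, 1), random.uniform(-1, 1))
        a = (1.0 / y) * x                        # scheme A
        d = y.real * y.real + y.imag * y.imag    # scheme B
        b = complex((x.real * y.real + x.imag * y.imag) / d,
                    (x.imag * y.real - x.real * y.imag) / d)
        exact = exact_div(x, y)
        worst_a = max(worst_a, rel_err(a, exact))
        worst_b = max(worst_b, rel_err(b, exact))

    print(f"worst observed relative error, inverse-and-multiply: {worst_a:.3e}")
    print(f"worst observed relative error, classical formula:    {worst_b:.3e}")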
APA, Harvard, Vancouver, ISO, etc. styles
49

El, Moussawi Ali Hassan. "SIMD-aware word length optimization for floating-point to fixed-point conversion targeting embedded processors". Thesis, Rennes 1, 2016. http://www.theses.fr/2016REN1S150/document.

Full text of the source
Abstract:
In order to cut down their cost and/or their power consumption, many embedded processors do not provide hardware support for floating-point arithmetic. However, applications in many domains, such as signal processing, are generally specified using floating-point arithmetic for the sake of simplicity. Porting these applications to such embedded processors requires a software emulation of floating-point arithmetic, which can greatly degrade performance. To avoid this, the application is converted to use fixed-point arithmetic instead, which has the advantage of being more efficient to implement on integer computation units. Floating-point to fixed-point conversion involves a subtle tradeoff between performance and precision: it enables the use of narrower data word lengths at the cost of degraded computation accuracy. Besides, most embedded processors provide SIMD (Single Instruction Multiple Data) support as a means to improve performance. SIMD allows one operation to be executed on multiple data elements in parallel, ultimately reducing the execution time; however, the application usually has to be transformed to take advantage of the SIMD instruction set. This transformation, known as Simdization, is affected by the data word lengths: narrower word lengths enable a higher SIMD parallelism rate. Hence there is also a tradeoff between precision and Simdization. Much existing work has aimed at providing or improving methodologies for automatic floating-point to fixed-point conversion on the one hand, and for Simdization on the other. In the state of the art, the two transformations are considered separately, even though they are strongly related. In this context, we study the interactions between these transformations in order to better exploit the performance/accuracy tradeoff. First, we propose an improved SLP (Superword Level Parallelism) extraction algorithm (a Simdization technique). Then, we propose a new methodology to jointly perform floating-point to fixed-point conversion and SLP extraction. Finally, we implement this work as a fully automated source-to-source compiler flow. Experimental results, targeting four different embedded processors, show the validity of our approach in efficiently exploiting the performance/accuracy tradeoff compared with a typical approach that considers the two transformations independently.
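To make the word-length tradeoff concrete, the sketch below (our own illustration under simple assumptions, not the thesis's flow) quantizes floating-point data to a signed fixed-point format Q(w, f) with w total bits and f fractional bits: a narrower w means more SIMD lanes in a 128-bit register but a larger quantization error.

    def to_fixed(x: float, w: int, f: int) -> int:
        """Round x to the nearest Q(w, f) value, with saturation."""
        scaled = round(x * (1 << f))
        lo, hi = -(1 << (w - 1)), (1 << (w - 1)) - 1
        return max(lo, min(hi, scaled))

    def from_fixed(q: int, f: int) -> float:
        return q / (1 << f)

    data = [0.7071067811865476, -0.25, 0.333333333, -0.9]
    for w in (8, 16, 32):
        f = w - 2   # 1 sign bit, 1 integer bit: values in [-2, 2)
        err = max(abs(x - from_fixed(to_fixed(x, w, f), f)) for x in data)
        print(f"Q({w},{f}): {128 // w:2d} lanes per 128-bit register, "
              f"max quantization error = {err:.2e}")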
APA, Harvard, Vancouver, ISO, etc. styles
50

Coors, Martin [Verfasser]. "A Floating-Point to Fixed-Point Design Flow for High Performance Digital Signal Processors / Martin Coors". Aachen : Shaker, 2005. http://d-nb.info/1181610834/34.

Full text of the source
APA, Harvard, Vancouver, ISO, etc. styles
